Hive-trunk-h0.21 - Build # 925 - Fixed
Changes for Build #922
[cws] HIVE-BUILD. Bump version to 0.9.0-SNAPSHOT (cws)

Changes for Build #923

Changes for Build #924

Changes for Build #925
[sdong] HIVE-2378. Warn user that precision is lost when bigint is implicitly cast to double. (Kevin Wilfong via Siying Dong)
[jvs] HIVE-2412. Update Eclipse configuration to include Mockito dependency (Carl Steinbach via jvs)
[jvs] HIVE-2382. Invalid predicate pushdown from incorrect column expression map for select operator generated by GROUP BY operation (Charles Chen via jvs)

All tests passed

The Apache Jenkins build system has built Hive-trunk-h0.21 (build #925)
Status: Fixed
Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/925/ to view the results.
[jira] [Commented] (HIVE-2412) Update Eclipse configuration to include Mockito dependency
[ https://issues.apache.org/jira/browse/HIVE-2412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094342#comment-13094342 ]

Hudson commented on HIVE-2412:

Integrated in Hive-trunk-h0.21 #925 (See [https://builds.apache.org/job/Hive-trunk-h0.21/925/])
HIVE-2412. Update Eclipse configuration to include Mockito dependency (Carl Steinbach via jvs)

jvs : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1163454
Files :
* /hive/trunk/eclipse-templates/.classpath
* /hive/trunk/eclipse-templates/HiveCLI.launchtemplate

Update Eclipse configuration to include Mockito dependency

Key: HIVE-2412
URL: https://issues.apache.org/jira/browse/HIVE-2412
Project: Hive
Issue Type: Bug
Components: Build Infrastructure
Reporter: Carl Steinbach
Assignee: Carl Steinbach
Fix For: 0.9.0
Attachments: HIVE-2412.1.patch.txt

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2382) Invalid predicate pushdown from incorrect column expression map for select operator generated by GROUP BY operation
[ https://issues.apache.org/jira/browse/HIVE-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094343#comment-13094343 ]

Hudson commented on HIVE-2382:

Integrated in Hive-trunk-h0.21 #925 (See [https://builds.apache.org/job/Hive-trunk-h0.21/925/])
HIVE-2382. Invalid predicate pushdown from incorrect column expression map for select operator generated by GROUP BY operation (Charles Chen via jvs)

jvs : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1163437
Files :
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_ppd.q
* /hive/trunk/ql/src/test/results/clientpositive/groupby_ppd.q.out
* /hive/trunk/ql/src/test/results/compiler/plan/groupby1.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby2.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby3.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby4.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby5.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby6.q.xml

Invalid predicate pushdown from incorrect column expression map for select operator generated by GROUP BY operation

Key: HIVE-2382
URL: https://issues.apache.org/jira/browse/HIVE-2382
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.6.0
Reporter: Charles Chen
Assignee: Charles Chen
Priority: Critical
Fix For: 0.9.0
Attachments: HIVE-2382v1.patch, HIVE-2382v2.patch

When a GROUP BY is specified, a select operator is added before the GROUP BY in SemanticAnalyzer.insertSelectAllPlanForGroupBy. Currently, the column expression map for this is set to the column expression map for the parent operator. This behavior is incorrect as, for example, the parent operator could rearrange the order of the columns (_col0 = _col0, _col1 = _col2, _col2 = _col1) and the new operator should not repeat this.
The predicate pushdown optimization uses the column expression map to track which columns a filter expression refers to at different operators. This results in a filter on incorrect columns.

Here is a simple case of this going wrong: Using
{noformat}
create table invites (id int, foo int, bar int);
{noformat}
executing the query
{noformat}
explain select * from (select foo, bar from (select bar, foo from invites c union all select bar, foo from invites d) b) a group by bar, foo having bar=1;
{noformat}
results in
{noformat}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        a-subquery1:b-subquery1:c
          TableScan
            alias: c
            Filter Operator
              predicate:
                  expr: (foo = 1)
                  type: boolean
              Select Operator
                expressions:
                      expr: bar
                      type: int
                      expr: foo
                      type: int
                outputColumnNames: _col0, _col1
                Union
                  Select Operator
                    expressions:
                          expr: _col1
                          type: int
                          expr: _col0
                          type: int
                    outputColumnNames: _col0, _col1
                    Select Operator
                      expressions:
                            expr: _col0
                            type: int
                            expr: _col1
                            type: int
                      outputColumnNames: _col0, _col1
                      Group By Operator
                        bucketGroup: false
                        keys:
                              expr: _col1
                              type: int
                              expr: _col0
                              type: int
                        mode: hash
                        outputColumnNames: _col0, _col1
                        Reduce Output Operator
                          key expressions:
                                expr: _col0
                                type: int
                                expr: _col1
                                type: int
                          sort order: ++
                          Map-reduce partition columns:
                                expr: _col0
                                type: int
                                expr: _col1
                                type: int
                          tag: -1
        a-subquery2:b-subquery2:d
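The effect described in HIVE-2382 can be illustrated with a toy model. This is only a sketch under stated assumptions: `ColumnMapSketch`, `resolve`, and the three maps are hypothetical names, not Hive's operator classes, and each map simply records which parent column an output column reads from. It suggests why reusing the parent's swapping map on the inserted select-all operator makes a filter on `bar` resolve to `foo`, as in the plan above.

```java
import java.util.List;
import java.util.Map;

// Toy model only: these are NOT Hive's real classes. Each map stands for one
// operator's column expression map (output column -> parent column).
final class ColumnMapSketch {

    // First select over the table scan: _col0 = bar, _col1 = foo.
    static final Map<String, String> FIRST_SELECT = Map.of("_col0", "bar", "_col1", "foo");

    // Select after the union that swaps the two columns.
    static final Map<String, String> SWAP = Map.of("_col0", "_col1", "_col1", "_col0");

    // What the inserted select-all operator should carry: an identity map.
    static final Map<String, String> IDENTITY = Map.of("_col0", "_col0", "_col1", "_col1");

    // Walks from the last operator back to the first, translating the column
    // name at each step, and returns the source column it ends up at.
    static String resolve(String column, List<Map<String, String>> operators) {
        String current = column;
        for (int i = operators.size() - 1; i >= 0; i--) {
            current = operators.get(i).getOrDefault(current, current);
        }
        return current;
    }

    public static void main(String[] args) {
        // Identity map on the inserted operator: the filter lands on bar.
        System.out.println(resolve("_col1", List.of(FIRST_SELECT, SWAP, IDENTITY)));
        // Parent's map repeated on the inserted operator: the swap is applied
        // twice and the filter lands on foo instead.
        System.out.println(resolve("_col1", List.of(FIRST_SELECT, SWAP, SWAP)));
    }
}
```

In this model the first call yields `bar` and the second yields `foo`, which mirrors the `(foo = 1)` predicate in the incorrect plan.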
[jira] [Commented] (HIVE-2378) Warn user that precision is lost when bigint is implicitly cast to double.
[ https://issues.apache.org/jira/browse/HIVE-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094344#comment-13094344 ]

Hudson commented on HIVE-2378:

Integrated in Hive-trunk-h0.21 #925 (See [https://builds.apache.org/job/Hive-trunk-h0.21/925/])
HIVE-2378. Warn user that precision is lost when bigint is implicitly cast to double. (Kevin Wilfong via Siying Dong)

sdong : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1163455
Files :
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ErrorMsg.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeGenericFuncDesc.java
* /hive/trunk/ql/src/test/queries/clientnegative/compare_double_bigint.q
* /hive/trunk/ql/src/test/queries/clientnegative/compare_string_bigint.q
* /hive/trunk/ql/src/test/results/clientnegative/compare_double_bigint.q.out
* /hive/trunk/ql/src/test/results/clientnegative/compare_string_bigint.q.out

Warn user that precision is lost when bigint is implicitly cast to double.

Key: HIVE-2378
URL: https://issues.apache.org/jira/browse/HIVE-2378
Project: Hive
Issue Type: Improvement
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
Attachments: HIVE-2378.1.patch.txt, HIVE-2378.2.patch.txt, HIVE-2378.3.patch.txt

When a bigint is implicitly cast to a double (when a bigint is involved in an equality expression with a string or double), precision may be lost, resulting in unexpected behavior. Until we fix the underlying issue we should throw an error in strict mode, and a warning in nonstrict mode, alerting the user about this.
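The precision loss that HIVE-2378 warns about can be reproduced in plain Java, independent of Hive. The sketch below assumes only that a bigint behaves like a Java `long` and a double like a Java `double`; the class and method names are made up for illustration.

```java
// Minimal illustration, not Hive code: a 64-bit integer above 2^53 cannot
// always be represented exactly as a double, so distinct values may compare
// as equal once both sides are cast.
final class BigintToDoubleSketch {

    // Compares two longs the way an implicit cast to double would.
    static boolean equalAfterCast(long a, long b) {
        return (double) a == (double) b;
    }

    // True when the value survives a long -> double -> long round trip.
    static boolean survivesRoundTrip(long value) {
        return (long) (double) value == value;
    }

    public static void main(String[] args) {
        long big = (1L << 53) + 1;                          // 9007199254740993
        System.out.println(big == big - 1);                 // exact comparison of longs
        System.out.println(equalAfterCast(big, big - 1));   // comparison after the cast
        System.out.println(survivesRoundTrip(big));         // does the cast keep the value?
    }
}
```

Here the two longs differ, yet they compare as equal once cast, which is the kind of unexpected behavior the proposed warning is meant to flag.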
[jira] [Created] (HIVE-2423) Add a One Row per Column-View or \G option to command line output
Add a One Row per Column-View or \G option to command line output

Key: HIVE-2423
URL: https://issues.apache.org/jira/browse/HIVE-2423
Project: Hive
Issue Type: New Feature
Components: CLI
Affects Versions: 0.8.0, 0.9.0
Environment: all
Reporter: Jake Peterson
Fix For: 0.8.0, 0.9.0

The hive client desperately needs better table output. I think a query option that sends output one column per line, together with the column name, would greatly help when looking at wide table data. Here I'm thinking of the \G option from the MySQL client.

So essentially the command line would get a new query-ending token '\G', which would change the output to:

* Row: 1 ***
 Column_A: Value 1
 Column_B: Value 2
Column_CD: 34565434
* Row: 2 ***

The column names are right-aligned, with a single space before the value of that column from the output table row.
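As a rough sketch of the requested layout, the formatter below right-aligns column names and prints one column per line. Everything here is an assumption for illustration: `VerticalOutputSketch` and `formatRow` are hypothetical names, the banner text is a guess at the style (the exact banner in the request did not survive), and nothing here is part of the Hive CLI.

```java
import java.util.List;

// Hypothetical formatter, not Hive CLI code: prints one row vertically with
// right-aligned column names, a colon, one space, then the value.
final class VerticalOutputSketch {

    static String formatRow(int rowNumber, List<String> names, List<String> values) {
        // The widest column name decides how far the others are padded.
        int width = 0;
        for (String name : names) {
            width = Math.max(width, name.length());
        }
        StringBuilder out = new StringBuilder();
        // Banner format is an assumption made for this sketch.
        out.append("*** Row: ").append(rowNumber).append(" ***\n");
        for (int i = 0; i < names.size(); i++) {
            String name = names.get(i);
            out.append(" ".repeat(width - name.length()))
               .append(name)
               .append(": ")
               .append(values.get(i))
               .append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(formatRow(1,
                List.of("Column_A", "Column_B", "Column_CD"),
                List.of("Value 1", "Value 2", "34565434")));
    }
}
```

Padding to the longest name keeps the colons in one column, which is what makes wide rows easy to scan.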
[jira] [Updated] (HIVE-2184) Few improvements in org.apache.hadoop.hive.ql.metadata.Hive.close()
[ https://issues.apache.org/jira/browse/HIVE-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi updated HIVE-2184:

Resolution: Fixed
Fix Version/s: 0.9.0
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)

Committed to trunk. Thanks Chinna!

Few improvements in org.apache.hadoop.hive.ql.metadata.Hive.close()

Key: HIVE-2184
URL: https://issues.apache.org/jira/browse/HIVE-2184
Project: Hive
Issue Type: Bug
Components: Metastore
Affects Versions: 0.5.0, 0.8.0
Environment: Hadoop 0.20.1, Hive0.8.0 and SUSE Linux Enterprise Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5)
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
Fix For: 0.9.0
Attachments: HIVE-2184.1.patch, HIVE-2184.1.patch, HIVE-2184.2.patch, HIVE-2184.3.patch, HIVE-2184.patch

1) Hive.close() calls HiveMetaStoreClient.close(), but in that method the variable standAloneClient never becomes true, so client.shutdown() is never called.
2) After calling metaStoreClient.close(), Hive.close() needs to set metaStoreClient=null.
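The second point, clearing the reference after closing, is a common pattern and can be sketched without any Hive classes. The types below (`FakeClient`, `Session`) are hypothetical stand-ins invented for this example; this is not the committed patch.

```java
// Hypothetical stand-ins, not Hive's Hive or HiveMetaStoreClient classes.
final class CloseSketch {

    // Counts how often it was closed so the behavior can be observed.
    static final class FakeClient {
        int closeCalls = 0;

        void close() {
            closeCalls++;
        }
    }

    static final class Session {
        FakeClient client = new FakeClient();

        // Closes the client once and drops the reference, so a second call is
        // a no-op and later code cannot reuse a closed client by accident.
        void close() {
            if (client != null) {
                client.close();
                client = null;
            }
        }

        boolean hasClient() {
            return client != null;
        }
    }

    public static void main(String[] args) {
        Session session = new Session();
        FakeClient client = session.client;
        session.close();
        session.close(); // safe: the reference was already cleared
        System.out.println(client.closeCalls + " " + session.hasClient());
    }
}
```

Nulling the field also lets a later accessor recreate the client lazily, if the surrounding class is written that way.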
[jira] [Commented] (HIVE-1451) Creating a table stores the full address of namenode in the metadata. This leads to problems when the namenode address changes.
[ https://issues.apache.org/jira/browse/HIVE-1451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094706#comment-13094706 ]

MIS commented on HIVE-1451:

+1 for the issue. This is one of those features which many assume exists by default, but doesn't.
I too have run into this and resolved it by changing the DB_LOCATION_URI column and LOCATION in the tables DBS and SDS respectively to point to the latest namenode URI. (My metastore was on MySQL.)
Fixing this issue would save us from manually changing the namenode URI in the DB should the address of the namenode change.

Creating a table stores the full address of namenode in the metadata. This leads to problems when the namenode address changes.

Key: HIVE-1451
URL: https://issues.apache.org/jira/browse/HIVE-1451
Project: Hive
Issue Type: Bug
Components: Metastore, Query Processor
Affects Versions: 0.5.0
Environment: Any
Reporter: Arvind Prabhakar

Here is an excerpt from table metadata for an arbitrary table {{table1}}:
{noformat}
hive> describe extended table1;
OK
...
Detailed Table Information...
location:hdfs://localhost:9000/user/arvind/hive/warehouse/table1,
...
{noformat}
As can be seen, the full address of namenode is captured in the location information for the table. This information is later used to run any queries on the table - thus making it impossible to change the namenode location once the table has been created. For example, for the above table, a query will fail if the namenode is migrated from port 9000 to 8020:
{noformat}
hive> select * from table1;
OK
Failed with exception java.io.IOException:java.net.ConnectException: Call to localhost/127.0.0.1:9000 failed on connection exception: java.net.ConnectException: Connection refused
Time taken: 10.78 seconds
hive>
{noformat}
It should be possible to change the namenode location regardless of when the tables are created.
Also, any query execution should work with the configured namenode at that point in time rather than requiring the configuration to be exactly the same at the time when the tables were created.
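The manual workaround mentioned in the comment amounts to rewriting the host and port of each stored location while leaving the path alone. The helper below is a sketch of just that string transformation using `java.net.URI`; `NamenodeRewriteSketch` and `withNamenode` are hypothetical names, and applying the result to the metastore tables is deliberately left out.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of Hive: swaps the namenode host and port in
// a stored location and keeps scheme and path unchanged.
final class NamenodeRewriteSketch {

    static String withNamenode(String location, String newHost, int newPort) {
        URI old = URI.create(location);
        try {
            return new URI(old.getScheme(), null, newHost, newPort,
                    old.getPath(), old.getQuery(), old.getFragment()).toString();
        } catch (URISyntaxException e) {
            // Rethrown unchecked so callers do not need a throws clause.
            throw new IllegalArgumentException("cannot rewrite " + location, e);
        }
    }

    public static void main(String[] args) {
        // Location taken from the describe output quoted in the issue.
        System.out.println(withNamenode(
                "hdfs://localhost:9000/user/arvind/hive/warehouse/table1",
                "localhost", 8020));
    }
}
```

Parsing the location as a URI, rather than using a plain string replace, avoids touching a path component that happens to contain the old host or port.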
RE: Hive in EC2
When you launch an EMR cluster (or job flow in EMR terminology), it launches new EC2 instances, optionally with an Elastic IP assigned to the cluster's master host. One does not install EMR on existing EC2 (non-EMR) instances.

-----Original Message-----
From: MIS [mailto:misapa...@gmail.com]
Sent: Wednesday, August 31, 2011 10:38 AM
To: dev@hive.apache.org
Cc: u...@hive.apache.org
Subject: Re: Hive in EC2

But my concern is that I cannot run the Elastic Mapreduce on specific instances which we already own and have elastic IPs.
If it is possible to do so, then using Hive EMR should be fine enough.

Thanks,
MIS

On Wed, Aug 31, 2011 at 12:21 AM, Aggarwal, Vaibhav <vagg...@amazon.com> wrote:

You could also choose to look at Amazon ElasticMapReduce.
It allows you to provision an EC2 cluster of your choice preinstalled with Hive and Hadoop.

https://cwiki.apache.org/confluence/display/Hive/HiveAmazonElasticMapReduce

Thanks
Vaibhav

-----Original Message-----
From: MIS [mailto:misapa...@gmail.com]
Sent: Monday, August 29, 2011 11:03 PM
To: u...@hive.apache.org; hive
Subject: Hive in EC2

Hi,

Can somebody point me to production level setup of Hive in EC2. The intent is to know the setup best practices being employed.

Thanks.
[jira] [Commented] (HIVE-2413) BlockMergeTask ignores client-specified jars
[ https://issues.apache.org/jira/browse/HIVE-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094962#comment-13094962 ]

He Yongqiang commented on HIVE-2413:

[junit] java.lang.IllegalArgumentException: Can not create a Path from an empty string
[junit] at org.apache.hadoop.fs.Path.checkPathArg(Path.java:82)
[junit] at org.apache.hadoop.fs.Path.<init>(Path.java:90)
[junit] at org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:602)
[junit] at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:761)
[junit] at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
[junit] at org.apache.hadoop.hive.ql.io.rcfile.merge.BlockMergeTask.execute(BlockMergeTask.java:203)
[junit] at org.apache.hadoop.hive.ql.exec.DDLTask.mergeFiles(DDLTask.java:410)
[junit] at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:366)
[junit] at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:132)
[junit] at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
[junit] at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1343)
[junit] at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1134)
[junit] at org.apache.hadoop.hive.ql.Driver.run(Driver.java:943)
[junit] at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:253)
[junit] at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:210)
[junit] at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:401)
[junit] at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
[junit] at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:638)
[junit] at org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_concatenate_indexed_table(TestCliDriver.java:1190)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

I got this error with a bunch of test cases. Here are some of them: rcfile_merge3.q, load_fs.q, alter_merge.q, etc.

Can you take a look?

BlockMergeTask ignores client-specified jars

Key: HIVE-2413
URL: https://issues.apache.org/jira/browse/HIVE-2413
Project: Hive
Issue Type: Bug
Components: Query Processor
Reporter: Krishna Kumar
Assignee: Krishna Kumar
Priority: Minor
Attachments: HIVE-2413.v0.patch

User-specified jars are not added to the hadoop tasks while executing a BlockMergeTask, resulting in a ClassNotFoundException.
[jira] [Commented] (HIVE-2417) Merging of compressed rcfiles fails to write the valuebuffer part correctly
[ https://issues.apache.org/jira/browse/HIVE-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094963#comment-13094963 ]

He Yongqiang commented on HIVE-2417:

+1, will commit after tests pass

Merging of compressed rcfiles fails to write the valuebuffer part correctly

Key: HIVE-2417
URL: https://issues.apache.org/jira/browse/HIVE-2417
Project: Hive
Issue Type: Bug
Components: Query Processor
Reporter: Krishna Kumar
Assignee: Krishna Kumar
Attachments: HIVE-2417.v0.patch, HIVE-2417.v1.patch

The blockmerge task does not create proper rc files when merging compressed rc files as the valuebuffer writing is incorrect.
[jira] [Updated] (HIVE-2383) Incorrect alias filtering for predicate pushdown
[ https://issues.apache.org/jira/browse/HIVE-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi updated HIVE-2383:

Resolution: Fixed
Fix Version/s: (was: 0.8.0) 0.9.0
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)

Passed tests and committed to trunk. Thanks Charles!

Incorrect alias filtering for predicate pushdown

Key: HIVE-2383
URL: https://issues.apache.org/jira/browse/HIVE-2383
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.6.0
Reporter: Charles Chen
Assignee: Charles Chen
Priority: Critical
Fix For: 0.9.0
Attachments: HIVE-2383v1.patch, HIVE-2383v2.patch, HIVE-2383v5.patch

The predicate pushdown optimizer starts at the topmost operators and traverses the operator tree, at each stage collecting predicates to be pushed down. At each operator, hive.ql.ppd.OpProcFactory.DefaultPPD.mergeWithChildrenPred is called, which merges the predicates of the children nodes into the current node. The predicates are stored in hive.ql.ppd.ExprWalkerInfo.pushdownPreds as a map from the alias a predicate refers to (a predicate may only refer to one alias at a time as only such predicates can be pushed) to a list of such predicates. Since at each stage the alias the predicate refers to may change (subqueries may change aliases), this is updated for each operator (hive.ql.ppd.ExprWalkerProcFactory.extractPushdownPreds is called, which walks the ExprNodeDesc for each predicate).

When a JoinOperator is encountered, mergeWithChildrenPred is passed an optional parameter aliases which contains a set of aliases that can be pushed per ANSI semantics (see hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases). The part that is incorrect is that aliases are filtered in mergeWithChildrenPred before extractPushdownPreds is called. extractPushdownPreds is what associates the predicates with the correct alias in the current operator's context, so the filtering should happen after it.
In test case Q2 below, when the predicate a.bar=3 comes into the JoinOperator, the incoming alias is a, so it is accepted for pushdown. When brought into the JoinOperator's context, however, since the predicate refers to b.foo in the inner scope, we should not actually accept this for pushdown.

With the test cases
{noformat}
-- Q1: predicate should not be pushed on the right side of a left outer join (this is correct in trunk)
explain
SELECT a.foo as foo1, b.foo as foo2, b.bar
FROM pokes a LEFT OUTER JOIN pokes2 b
ON a.foo=b.foo
WHERE b.bar=3;

-- Q2: predicate should not be pushed on the right side of a left outer join (this is broken in trunk)
explain
SELECT * FROM
(SELECT a.foo as foo1, b.foo as foo2, b.bar
FROM pokes a LEFT OUTER JOIN pokes2 b
ON a.foo=b.foo) a
WHERE a.bar=3;

-- Q3: predicate should be pushed (this is correct in trunk)
explain
SELECT * FROM
(SELECT a.foo as foo1, b.foo as foo2, a.bar
FROM pokes a JOIN pokes2 b
ON a.foo=b.foo) a
WHERE a.bar=3;
{noformat}
The current output is
{noformat}
hive> -- Q1: predicate should not be pushed on the right side of a left outer join
explain
SELECT a.foo as foo1, b.foo as foo2, b.bar
FROM pokes a LEFT OUTER JOIN pokes2 b
ON a.foo=b.foo
WHERE b.bar=3;
OK
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_LEFTOUTERJOIN (TOK_TABREF (TOK_TABNAME pokes) a) (TOK_TABREF (TOK_TABNAME pokes2) b) (= (. (TOK_TABLE_OR_COL a) foo) (. (TOK_TABLE_OR_COL b) foo (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) foo) foo1) (TOK_SELEXPR (. (TOK_TABLE_OR_COL b) foo) foo2) (TOK_SELEXPR (. (TOK_TABLE_OR_COL b) bar))) (TOK_WHERE (= (. (TOK_TABLE_OR_COL b) bar) 3

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        a
          TableScan
            alias: a
            Reduce Output Operator
              key expressions:
                    expr: foo
                    type: int
              sort order: +
              Map-reduce partition columns:
                    expr: foo
                    type: int
              tag: 0
              value expressions:
                    expr: foo
                    type: int
        b
          TableScan
            alias: b
            Reduce Output Operator
              key expressions:
                    expr: foo
                    type: int
              sort order: +
              Map-reduce partition columns:
                    expr: foo
                    type: int
              tag: 1
[jira] [Resolved] (HIVE-1395) Table aliases are ambiguous
[ https://issues.apache.org/jira/browse/HIVE-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi resolved HIVE-1395.

Resolution: Won't Fix

We're fixing the bugs and sticking with the normal SQL rules, which allow duplicate aliases, for the reasons mentioned above.

Table aliases are ambiguous

Key: HIVE-1395
URL: https://issues.apache.org/jira/browse/HIVE-1395
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.6.0
Reporter: Adam Kramer

Consider this query:

SELECT a.num
FROM (
  SELECT a.num AS num, b.num AS num2
  FROM foo a LEFT OUTER JOIN bar b ON a.num=b.num
) a
WHERE a.num2 IS NULL;

...in this case, the table alias 'a' is ambiguous. It could be the outer table (i.e., the subquery result), or it could be the inner table (foo).

In the above case, Hive silently parses the outer reference to a as the inner reference. The result, then, is akin to: SELECT foo.num FROM foo WHERE bar.num IS NULL. This is bad.

The bigger problem, however, is that Hive even lets people use the same table alias at multiple points in the query. We should simply throw an exception during the parse stage if there is any ambiguity in which table is which, just like we do if the column names are ambiguous.

Or, if for some reason we need people to be able to use 'a' to refer to multiple tables or subqueries, it would be excellent if the exact parsing structure were made clear and added to the wiki. In that case, I will file a separate bug JIRA to complain about how it should be different. :)
[jira] [Commented] (HIVE-2400) Update unittests Hadoop version
[ https://issues.apache.org/jira/browse/HIVE-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094999#comment-13094999 ]

Marcin Kurczych commented on HIVE-2400:

I've manually replaced the hadoop-core and hadoop-tools jars with the Hadoop 0.20.3 ones and almost everything worked (all tests, including new ones which were failing because of Hadoop 0.20.1 bugs).

I say almost because I ran into a problem: VersionInfo.getVersion() was returning "Unknown", so I hardcoded something like if ("Unknown".equals(vers)) vers = "0.20.3"; for testing, and then everything went perfectly. This must be a problem with the jars; I used the ones from https://repository.apache.org/index.html#nexus-search;quick~hadoop .

Update unittests Hadoop version

Key: HIVE-2400
URL: https://issues.apache.org/jira/browse/HIVE-2400
Project: Hive
Issue Type: Improvement
Reporter: Marcin Kurczych
Assignee: Marcin Kurczych

Hadoop 0.20.1 used in unittests contains bugs that were fixed in later versions of Hadoop, for example
* har:// connections cannot be indexed by (scheme, authority, username) - the path is significant as well. Caching them in this way limits a hadoop client to opening one archive per filesystem. It seems to be safe not to cache them, since they wrap another connection that does the actual networking.
fixed in https://issues.apache.org/jira/browse/HADOOP-6231 .
[jira] [Resolved] (HIVE-1342) Predicate push down get error result when sub-queries have the same alias name
[ https://issues.apache.org/jira/browse/HIVE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi resolved HIVE-1342.

Resolution: Fixed
Fix Version/s: 0.9.0

Fixed by committing sub-issues (not the patches attached to this issue).

Predicate push down get error result when sub-queries have the same alias name

Key: HIVE-1342
URL: https://issues.apache.org/jira/browse/HIVE-1342
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.6.0
Reporter: Ted Xu
Assignee: Charles Chen
Priority: Critical
Fix For: 0.9.0
Attachments: HIVE-1342v1.patch, HIVE-1342v2.patch, HIVE-1342v3.patch, HIVE-1342v4.patch, cmd.hql, explain, ppd_same_alias_1.patch, ppd_same_alias_2.patch

Query is over-optimized by PPD when sub-queries have the same alias name, see the query:
---
create table if not exists dm_fact_buyer_prd_info_d (
  category_id string
  ,gmv_trade_num int
  ,user_id int
)
PARTITIONED BY (ds int);

set hive.optimize.ppd=true;
set hive.map.aggr=true;

explain
select category_id1, category_id2, assoc_idx
from (
  select category_id1
    , category_id2
    , count(distinct user_id) as assoc_idx
  from (
    select t1.category_id as category_id1
      , t2.category_id as category_id2
      , t1.user_id
    from (
      select category_id, user_id
      from dm_fact_buyer_prd_info_d
      group by category_id, user_id
    ) t1
    join (
      select category_id, user_id
      from dm_fact_buyer_prd_info_d
      group by category_id, user_id
    ) t2 on t1.user_id=t2.user_id
  ) t1
  group by category_id1, category_id2
) t_o
where category_id1 <> category_id2 and assoc_idx 2;
---
The query above fails on execution, throwing the exception: can not cast UDFOpNotEqual(Text, IntWritable) to UDFOpNotEqual(Text, Text).
I explained the query and the execution plan looks really weird (only Stage-1 is shown; see the highlighted predicate):
---
Stage: Stage-1
  Map Reduce
    Alias -> Map Operator Tree:
      t_o:t1:t1:dm_fact_buyer_prd_info_d
        TableScan
          alias: dm_fact_buyer_prd_info_d
          Filter Operator
            predicate:
                expr: *(category_id <> user_id)*
                type: boolean
            Select Operator
              expressions:
                    expr: category_id
                    type: string
                    expr: user_id
                    type: bigint
              outputColumnNames: category_id, user_id
              Group By Operator
                keys:
                      expr: category_id
                      type: string
                      expr: user_id
                      type: bigint
                mode: hash
                outputColumnNames: _col0, _col1
                Reduce Output Operator
                  key expressions:
                        expr: _col0
                        type: string
                        expr: _col1
                        type: bigint
                  sort order: ++
                  Map-reduce partition columns:
                        expr: _col0
                        type: string
                        expr: _col1
                        type: bigint
                  tag: -1
    Reduce Operator Tree:
      Group By Operator
        keys:
              expr: KEY._col0
              type: string
              expr: KEY._col1
              type: bigint
        mode: mergepartial
        outputColumnNames: _col0, _col1
        Select Operator
          expressions:
                expr: _col0
                type: string
                expr: _col1
                type: bigint
          outputColumnNames: _col0, _col1
          File Output Operator
            compressed: true
            GlobalTableId: 0
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
Re: Review Request: HIVE-2337: Predicate pushdown erroneously conservative with outer joins
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1275/
---

(Updated 2011-09-01 00:08:37.474019)

Review request for hive.

Changes
---
Fixed ppd_outer_join4.q.out

Summary
---
https://issues.apache.org/jira/browse/HIVE-2337

This addresses bug HIVE-2337.
https://issues.apache.org/jira/browse/HIVE-2337

Diffs (updated)
---
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java 1163856
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientpositive/ppd_outer_join5.q PRE-CREATION
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_outer_join4.q.out 1163856
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_outer_join5.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/1275/diff

Testing
---
Unit tests passed

Thanks,
Charles
[jira] [Updated] (HIVE-2337) Predicate pushdown erroneously conservative with outer joins
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Charles Chen updated HIVE-2337:

Attachment: HIVE-2337v4.patch

Predicate pushdown erroneously conservative with outer joins

Key: HIVE-2337
URL: https://issues.apache.org/jira/browse/HIVE-2337
Project: Hive
Issue Type: Bug
Components: Query Processor
Reporter: Charles Chen
Assignee: Charles Chen
Attachments: HIVE-2337v1.patch, HIVE-2337v2.patch, HIVE-2337v3.patch, HIVE-2337v4.patch

The predicate pushdown filter is not applying left associativity of joins correctly in determining possible aliases for pushing predicates.

In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for pushing aliases is specified as:
{noformat}
/**
 * Figures out the aliases for whom it is safe to push predicates based on
 * ANSI SQL semantics For inner join, all predicates for all aliases can be
 * pushed For full outer join, none of the predicates can be pushed as that
 * would limit the number of rows for join For left outer join, all the
 * predicates on the left side aliases can be pushed up For right outer
 * join, all the predicates on the right side aliases can be pushed up Joins
 * chain containing both left and right outer joins are treated as full
 * outer join. [...]
 *
 * @param op
 *          Join Operator
 * @param rr
 *          Row resolver
 * @return set of qualified aliases
 */
{noformat}
Since hive joins are left associative, something like "a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d" should be interpreted as "((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d", so there would be cases where joins with both left and right outer joins can have aliases that can be pushed. Here, aliases b and d are eligible to be pushed up while the current criteria provide that none are eligible.
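One way to read the left-associative rule is as a single pass over the join chain that keeps a running set of pushable aliases. The sketch below is an interpretation of the description, not the attached patch and not Hive's `getQualifiedAliases`; the class, enum, and method names are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Interpretation of the issue text only; not Hive code.
final class QualifiedAliasSketch {

    enum JoinType { INNER, LEFT_OUTER, RIGHT_OUTER, FULL_OUTER }

    // One step of a left-associative chain: (everything so far) <type> JOIN rightAlias.
    static final class Join {
        final JoinType type;
        final String rightAlias;

        Join(JoinType type, String rightAlias) {
            this.type = type;
            this.rightAlias = rightAlias;
        }
    }

    static Set<String> qualifiedAliases(String firstAlias, List<Join> chain) {
        Set<String> pushable = new LinkedHashSet<>();
        pushable.add(firstAlias);
        for (Join join : chain) {
            switch (join.type) {
                case INNER:
                    // Both sides stay pushable.
                    pushable.add(join.rightAlias);
                    break;
                case LEFT_OUTER:
                    // Left side keeps what it had; the right alias is not added.
                    break;
                case RIGHT_OUTER:
                    // Only the right alias is pushable from here on.
                    pushable.clear();
                    pushable.add(join.rightAlias);
                    break;
                case FULL_OUTER:
                    // Nothing accumulated so far can be pushed.
                    pushable.clear();
                    break;
            }
        }
        return pushable;
    }

    public static void main(String[] args) {
        // a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d
        System.out.println(qualifiedAliases("a", List.of(
                new Join(JoinType.RIGHT_OUTER, "b"),
                new Join(JoinType.LEFT_OUTER, "c"),
                new Join(JoinType.INNER, "d"))));
    }
}
```

Under this reading the chain from the description yields b and d, and a chain such as t1 FULL OUTER JOIN t2 JOIN t3 yields only t3.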
Using:
{noformat}
create table t1 (id int, key string, value string);
create table t2 (id int, key string, value string);
create table t3 (id int, key string, value string);
create table t4 (id int, key string, value string);
{noformat}
For example, the query
{noformat}
explain select * from t1 full outer join t2 on t1.id=t2.id join t3 on t2.id=t3.id where t3.id=20;
{noformat}
currently gives
{noformat}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        t1
          TableScan
            alias: t1
            Reduce Output Operator
              key expressions:
                    expr: id
                    type: int
              sort order: +
              Map-reduce partition columns:
                    expr: id
                    type: int
              tag: 0
              value expressions:
                    expr: id
                    type: int
                    expr: key
                    type: string
                    expr: value
                    type: string
        t2
          TableScan
            alias: t2
            Reduce Output Operator
              key expressions:
                    expr: id
                    type: int
              sort order: +
              Map-reduce partition columns:
                    expr: id
                    type: int
              tag: 1
              value expressions:
                    expr: id
                    type: int
                    expr: key
                    type: string
                    expr: value
                    type: string
        t3
          TableScan
            alias: t3
            Reduce Output Operator
              key expressions:
                    expr: id
                    type: int
              sort order: +
              Map-reduce partition columns:
                    expr: id
                    type: int
              tag: 2
              value expressions:
                    expr: id
                    type: int
                    expr: key
                    type: string
                    expr: value
                    type: string
      Reduce Operator Tree:
        Join Operator
          condition map:
               Outer Join 0 to 1
               Inner Join 1 to 2
          condition expressions:
            0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
            1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
            2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
          handleSkewJoin: false
          outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
          Filter Operator
            predicate:
[jira] [Commented] (HIVE-2337) Predicate pushdown erroneously conservative with outer joins
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095013#comment-13095013 ]

jirapos...@reviews.apache.org commented on HIVE-2337:

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1275/
---

(Updated 2011-09-01 00:08:37.474019)

Review request for hive.

Changes
---
Fixed ppd_outer_join4.q.out

Summary
---
https://issues.apache.org/jira/browse/HIVE-2337

This addresses bug HIVE-2337.
https://issues.apache.org/jira/browse/HIVE-2337

Diffs (updated)
---
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java 1163856
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientpositive/ppd_outer_join5.q PRE-CREATION
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_outer_join4.q.out 1163856
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_outer_join5.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/1275/diff

Testing
---
Unit tests passed

Thanks,
Charles

Predicate pushdown erroneously conservative with outer joins

Key: HIVE-2337
URL: https://issues.apache.org/jira/browse/HIVE-2337
Project: Hive
Issue Type: Bug
Components: Query Processor
Reporter: Charles Chen
Assignee: Charles Chen
Attachments: HIVE-2337v1.patch, HIVE-2337v2.patch, HIVE-2337v3.patch, HIVE-2337v4.patch

The predicate pushdown filter is not applying left associativity of joins correctly in determining possible aliases for pushing predicates.
In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for pushing aliases are specified as:
{noformat}
/**
 * Figures out the aliases for whom it is safe to push predicates based on
 * ANSI SQL semantics For inner join, all predicates for all aliases can be
 * pushed For full outer join, none of the predicates can be pushed as that
 * would limit the number of rows for join For left outer join, all the
 * predicates on the left side aliases can be pushed up For right outer
 * join, all the predicates on the right side aliases can be pushed up Joins
 * chain containing both left and right outer joins are treated as full
 * outer join. [...]
 *
 * @param op
 *          Join Operator
 * @param rr
 *          Row resolver
 * @return set of qualified aliases
 */
{noformat}
Since Hive joins are left associative, something like "a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d" should be interpreted as "((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d", so there are cases where a chain containing both left and right outer joins still has aliases whose predicates can be pushed. Here, aliases b and d are eligible to be pushed up, while the current criteria provide that none are eligible.
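To make the left-associative reading concrete, here is a minimal sketch that folds over a join chain from left to right and tracks which aliases stay eligible. It is not the code from the attached patches: the class QualifiedAliasSketch, the JoinStep type, and the qualifiedAliases method are hypothetical names used only for illustration, and the rules are simply the ones quoted in the Javadoc applied one join at a time.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; not part of Hive's OpProcFactory.
public class QualifiedAliasSketch {
    public enum JoinType { INNER, LEFT_OUTER, RIGHT_OUTER, FULL_OUTER }

    /** One step of a left-associative chain: "(everything so far) <type> JOIN <rightAlias>". */
    public static final class JoinStep {
        final JoinType type;
        final String rightAlias;

        public JoinStep(JoinType type, String rightAlias) {
            this.type = type;
            this.rightAlias = rightAlias;
        }
    }

    /** Returns the aliases whose predicates may be pushed, evaluating the chain left to right. */
    public static Set<String> qualifiedAliases(String firstAlias, List<JoinStep> steps) {
        Set<String> eligible = new LinkedHashSet<>();
        eligible.add(firstAlias);
        for (JoinStep step : steps) {
            switch (step.type) {
                case INNER:
                    // Both sides keep their status; the new right alias is eligible.
                    eligible.add(step.rightAlias);
                    break;
                case LEFT_OUTER:
                    // Left side keeps its status; the right alias is not eligible.
                    break;
                case RIGHT_OUTER:
                    // Only the right alias is eligible; everything to the left is dropped.
                    eligible.clear();
                    eligible.add(step.rightAlias);
                    break;
                case FULL_OUTER:
                    // Nothing seen so far may be pushed.
                    eligible.clear();
                    break;
            }
        }
        return eligible;
    }

    public static void main(String[] args) {
        // ((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d
        List<JoinStep> chain = new ArrayList<>();
        chain.add(new JoinStep(JoinType.RIGHT_OUTER, "b"));
        chain.add(new JoinStep(JoinType.LEFT_OUTER, "c"));
        chain.add(new JoinStep(JoinType.INNER, "d"));
        System.out.println(qualifiedAliases("a", chain)); // prints [b, d]
    }
}
```

With the chain from the description the sketch yields b and d, matching the aliases named above. For "t1 FULL OUTER JOIN t2 JOIN t3" it yields only t3.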
[jira] [Commented] (HIVE-2337) Predicate pushdown erroneously conservative with outer joins
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095015#comment-13095015 ] Charles Chen commented on HIVE-2337: I've fixed the test output--it seems to be an improvement.
[jira] [Commented] (HIVE-2383) Incorrect alias filtering for predicate pushdown
[ https://issues.apache.org/jira/browse/HIVE-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095020#comment-13095020 ]
John Sichi commented on HIVE-2383:
Oh, um, also: +1.

Incorrect alias filtering for predicate pushdown
Key: HIVE-2383
URL: https://issues.apache.org/jira/browse/HIVE-2383
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.6.0
Reporter: Charles Chen
Assignee: Charles Chen
Priority: Critical
Fix For: 0.9.0
Attachments: HIVE-2383v1.patch, HIVE-2383v2.patch, HIVE-2383v5.patch

The predicate pushdown optimizer starts at the topmost operators and traverses the operator tree, at each stage collecting predicates to be pushed down. At each operator, hive.ql.ppd.OpProcFactory.DefaultPPD.mergeWithChildrenPred is called, which merges the predicates of the children nodes into the current node. The predicates are stored in hive.ql.ppd.ExprWalkerInfo.pushdownPreds as a map from the alias a predicate refers to (a predicate may only refer to one alias at a time, as only such predicates can be pushed) to a list of such predicates. Since the alias a predicate refers to may change at each stage (subqueries may change aliases), this is updated for each operator (hive.ql.ppd.ExprWalkerProcFactory.extractPushdownPreds is called, which walks the ExprNodeDesc for each predicate). When a JoinOperator is encountered, mergeWithChildrenPred is passed an optional parameter, aliases, which contains the set of aliases that can be pushed per ANSI semantics (see hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases). The bug is that mergeWithChildrenPred filters aliases before calling extractPushdownPreds, which associates each predicate with the correct alias in the current operator's context; the filtering should happen afterwards. In test case Q2 below, when the predicate a.bar=3 comes into the JoinOperator, the incoming alias is a, so it is accepted for pushdown.
When brought into the JoinOperator's context, however, the predicate refers to b.bar in the inner scope, so we should not actually accept it for pushdown. With the test cases
{noformat}
-- Q1: predicate should not be pushed on the right side of a left outer join (this is correct in trunk)
explain SELECT a.foo as foo1, b.foo as foo2, b.bar FROM pokes a LEFT OUTER JOIN pokes2 b ON a.foo=b.foo WHERE b.bar=3;

-- Q2: predicate should not be pushed on the right side of a left outer join (this is broken in trunk)
explain SELECT * FROM (SELECT a.foo as foo1, b.foo as foo2, b.bar FROM pokes a LEFT OUTER JOIN pokes2 b ON a.foo=b.foo) a WHERE a.bar=3;

-- Q3: predicate should be pushed (this is correct in trunk)
explain SELECT * FROM (SELECT a.foo as foo1, b.foo as foo2, a.bar FROM pokes a JOIN pokes2 b ON a.foo=b.foo) a WHERE a.bar=3;
{noformat}
The current output is
{noformat}
hive> -- Q1: predicate should not be pushed on the right side of a left outer join
explain SELECT a.foo as foo1, b.foo as foo2, b.bar FROM pokes a LEFT OUTER JOIN pokes2 b ON a.foo=b.foo WHERE b.bar=3;
OK
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_LEFTOUTERJOIN (TOK_TABREF (TOK_TABNAME pokes) a) (TOK_TABREF (TOK_TABNAME pokes2) b) (= (. (TOK_TABLE_OR_COL a) foo) (. (TOK_TABLE_OR_COL b) foo)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) foo) foo1) (TOK_SELEXPR (. (TOK_TABLE_OR_COL b) foo) foo2) (TOK_SELEXPR (. (TOK_TABLE_OR_COL b) bar))) (TOK_WHERE (= (. (TOK_TABLE_OR_COL b) bar) 3))))

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        a
          TableScan
            alias: a
            Reduce Output Operator
              key expressions:
                    expr: foo
                    type: int
              sort order: +
              Map-reduce partition columns:
                    expr: foo
                    type: int
              tag: 0
              value expressions:
                    expr: foo
                    type: int
        b
          TableScan
            alias: b
            Reduce Output Operator
              key expressions:
                    expr: foo
                    type: int
              sort order: +
              Map-reduce partition columns:
                    expr: foo
                    type: int
              tag: 1
              value expressions:
                    expr: foo
                    type: int
                    expr: bar
                    type: int
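The ordering problem described in this issue can be shown with a small stand-alone model. This is only a sketch under simplifying assumptions, not Hive's mergeWithChildrenPred or extractPushdownPreds: the class AliasFilterOrderSketch, the Pred type, and the "alias.column" mapping table are all hypothetical. The mapping stands in for the alias resolution that happens when a predicate is brought into the current operator's context.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the two orderings; names do not correspond to Hive classes.
public class AliasFilterOrderSketch {
    /** A single-alias predicate such as "a.bar = 3". */
    public static final class Pred {
        final String alias;
        final String column;
        final String condition;

        public Pred(String alias, String column, String condition) {
            this.alias = alias;
            this.column = column;
            this.condition = condition;
        }

        @Override
        public String toString() {
            return alias + "." + column + " " + condition;
        }
    }

    /** Rewrites a predicate into the current operator's context; columnMap is keyed by "alias.column". */
    static Pred translate(Pred p, Map<String, String> columnMap) {
        String target = columnMap.get(p.alias + "." + p.column);
        if (target == null) {
            return p; // no renaming at this operator
        }
        String[] parts = target.split("\\.", 2);
        return new Pred(parts[0], parts[1], p.condition);
    }

    /** The order described as broken: filter on the incoming alias, then translate. */
    public static List<Pred> filterThenTranslate(List<Pred> preds, Set<String> qualified,
                                                 Map<String, String> columnMap) {
        List<Pred> pushed = new ArrayList<>();
        for (Pred p : preds) {
            if (qualified.contains(p.alias)) {
                pushed.add(translate(p, columnMap));
            }
        }
        return pushed;
    }

    /** The order the issue asks for: translate first, then filter on the resolved alias. */
    public static List<Pred> translateThenFilter(List<Pred> preds, Set<String> qualified,
                                                 Map<String, String> columnMap) {
        List<Pred> pushed = new ArrayList<>();
        for (Pred p : preds) {
            Pred resolved = translate(p, columnMap);
            if (qualified.contains(resolved.alias)) {
                pushed.add(resolved);
            }
        }
        return pushed;
    }

    public static void main(String[] args) {
        // Q2: the outer a.bar is the inner b.bar; for "a LEFT OUTER JOIN b" only a qualifies.
        Map<String, String> columnMap = new HashMap<>();
        columnMap.put("a.bar", "b.bar");
        Set<String> qualified = new HashSet<>();
        qualified.add("a");
        List<Pred> preds = new ArrayList<>();
        preds.add(new Pred("a", "bar", "= 3"));

        System.out.println(filterThenTranslate(preds, qualified, columnMap)); // prints [b.bar = 3]
        System.out.println(translateThenFilter(preds, qualified, columnMap)); // prints []
    }
}
```

In this model the first ordering wrongly pushes a predicate onto b, the null-supplying side of the left outer join, while the second ordering rejects it.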
Re: Review Request: HIVE-2337: Predicate pushdown erroneously conservative with outer joins
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1275/ --- (Updated 2011-09-01 00:19:17.176704) Review request for hive. Changes --- Rebased to current trunk Summary --- https://issues.apache.org/jira/browse/HIVE-2337 This addresses bug HIVE-2337. https://issues.apache.org/jira/browse/HIVE-2337 Diffs (updated) - http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java 1163875 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_outer_join4.q.out 1163875 Diff: https://reviews.apache.org/r/1275/diff Testing --- Unit tests passed Thanks, Charles
[jira] [Commented] (HIVE-2337) Predicate pushdown erroneously conservative with outer joins
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095022#comment-13095022 ] jirapos...@reviews.apache.org commented on HIVE-2337: - --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1275/ --- (Updated 2011-09-01 00:19:17.176704) Review request for hive. Changes --- Rebased to current trunk Summary --- https://issues.apache.org/jira/browse/HIVE-2337 This addresses bug HIVE-2337. https://issues.apache.org/jira/browse/HIVE-2337 Diffs (updated) - http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java 1163875 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_outer_join4.q.out 1163875 Diff: https://reviews.apache.org/r/1275/diff Testing --- Unit tests passed Thanks, Charles
[jira] [Updated] (HIVE-2337) Predicate pushdown erroneously conservative with outer joins
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Chen updated HIVE-2337: --- Fix Version/s: 0.9.0 Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-2337) Predicate pushdown erroneously conservative with outer joins
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Chen updated HIVE-2337: --- Attachment: HIVE-2337v5.patch
Re: Review Request: Support archiving for multiple partitions if the table is partitioned by multiple columns
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1259/ --- (Updated 2011-09-01 01:23:23.280266) Review request for hive, Paul Yang and namit jain. Changes --- Reverted accidentally deleted line. Summary --- Allowing archiving at chosen level. When table is partitioned by ds, hr, min it allows archiving at ds level, hr level and min level. Corresponding syntaxes are: ALTER TABLE test ARCHIVE PARTITION (ds='2008-04-08'); ALTER TABLE test ARCHIVE PARTITION (ds='2008-04-08', hr='11'); ALTER TABLE test ARCHIVE PARTITION (ds='2008-04-08', hr='11', min='30'); You cannot do much to archived partitions. You can read them. You cannot write to them / overwrite them. You can drop single archived partitions, but not parts of bigger archives. Diffs (updated) - trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1153271 trunk/data/files/archive_corrupt.rc UNKNOWN trunk/metastore/if/hive_metastore.thrift 1153271 trunk/metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.h 1153271 trunk/metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.cpp 1153271 trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Constants.java 1153271 trunk/metastore/src/gen/thrift/gen-php/hive_metastore/hive_metastore_constants.php 1153271 trunk/metastore/src/gen/thrift/gen-py/hive_metastore/constants.py 1153271 trunk/metastore/src/gen/thrift/gen-rb/hive_metastore_constants.rb 1153271 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 1153271 trunk/ql/src/java/org/apache/hadoop/hive/ql/Driver.java 1153271 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ArchiveUtils.java PRE-CREATION trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 1153271 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 1153271 trunk/ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java 1153271 
trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/DummyPartition.java 1153271 trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 1153271 trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 1153271 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 1153271 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1153271 trunk/ql/src/test/queries/clientnegative/archive_insert1.q PRE-CREATION trunk/ql/src/test/queries/clientnegative/archive_insert2.q PRE-CREATION trunk/ql/src/test/queries/clientnegative/archive_insert3.q PRE-CREATION trunk/ql/src/test/queries/clientnegative/archive_insert4.q PRE-CREATION trunk/ql/src/test/queries/clientnegative/archive_multi1.q PRE-CREATION trunk/ql/src/test/queries/clientnegative/archive_multi2.q PRE-CREATION trunk/ql/src/test/queries/clientnegative/archive_multi3.q PRE-CREATION trunk/ql/src/test/queries/clientnegative/archive_multi4.q PRE-CREATION trunk/ql/src/test/queries/clientnegative/archive_multi5.q PRE-CREATION trunk/ql/src/test/queries/clientnegative/archive_multi6.q PRE-CREATION trunk/ql/src/test/queries/clientnegative/archive_multi7.q PRE-CREATION trunk/ql/src/test/queries/clientnegative/archive_partspec1.q PRE-CREATION trunk/ql/src/test/queries/clientnegative/archive_partspec2.q PRE-CREATION trunk/ql/src/test/queries/clientnegative/archive_partspec3.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/archive_corrupt.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/archive_multi.q PRE-CREATION trunk/ql/src/test/results/clientnegative/archive1.q.out 1153271 trunk/ql/src/test/results/clientnegative/archive2.q.out 1153271 trunk/ql/src/test/results/clientnegative/archive_insert1.q.out PRE-CREATION trunk/ql/src/test/results/clientnegative/archive_insert2.q.out PRE-CREATION trunk/ql/src/test/results/clientnegative/archive_insert3.q.out PRE-CREATION trunk/ql/src/test/results/clientnegative/archive_insert4.q.out PRE-CREATION 
trunk/ql/src/test/results/clientnegative/archive_multi1.q.out PRE-CREATION trunk/ql/src/test/results/clientnegative/archive_multi2.q.out PRE-CREATION trunk/ql/src/test/results/clientnegative/archive_multi3.q.out PRE-CREATION trunk/ql/src/test/results/clientnegative/archive_multi4.q.out PRE-CREATION trunk/ql/src/test/results/clientnegative/archive_multi5.q.out PRE-CREATION trunk/ql/src/test/results/clientnegative/archive_multi6.q.out PRE-CREATION trunk/ql/src/test/results/clientnegative/archive_multi7.q.out PRE-CREATION trunk/ql/src/test/results/clientnegative/archive_partspec1.q.out PRE-CREATION trunk/ql/src/test/results/clientnegative/archive_partspec2.q.out PRE-CREATION trunk/ql/src/test/results/clientnegative/archive_partspec3.q.out PRE-CREATION
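The three ALTER TABLE examples in the summary each fix a leading run of the partition columns (ds; ds, hr; ds, hr, min). A minimal sketch of how an archive level could be derived from such a spec follows. It assumes, without confirmation from the review request, that a valid spec must be a non-empty prefix of the table's partition columns; ArchiveLevelSketch and archiveLevel are hypothetical names and are not taken from the ArchiveUtils class in the diff.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; the prefix rule is an assumption made for this illustration.
public class ArchiveLevelSketch {
    /**
     * Returns how many leading partition columns the spec fixes (1 = ds level,
     * 2 = hr level, ...), or -1 if the spec is not a non-empty prefix of the columns.
     */
    public static int archiveLevel(List<String> partitionColumns, Map<String, String> spec) {
        if (spec.isEmpty() || spec.size() > partitionColumns.size()) {
            return -1;
        }
        // The spec has n keys; it is a prefix exactly when the first n columns are all present.
        for (int i = 0; i < spec.size(); i++) {
            if (!spec.containsKey(partitionColumns.get(i))) {
                return -1;
            }
        }
        return spec.size();
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("ds", "hr", "min");

        Map<String, String> dsOnly = new LinkedHashMap<>();
        dsOnly.put("ds", "2008-04-08");
        System.out.println(archiveLevel(columns, dsOnly)); // prints 1

        Map<String, String> dsHr = new LinkedHashMap<>();
        dsHr.put("ds", "2008-04-08");
        dsHr.put("hr", "11");
        System.out.println(archiveLevel(columns, dsHr)); // prints 2

        Map<String, String> hrOnly = new LinkedHashMap<>();
        hrOnly.put("hr", "11");
        System.out.println(archiveLevel(columns, hrOnly)); // prints -1
    }
}
```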
[jira] [Updated] (HIVE-2388) Facing issues while executing commands on hive shell. The system throws following error: only on Windows Cygwin setup
[ https://issues.apache.org/jira/browse/HIVE-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Siddharth tiwari updated HIVE-2388:
Fix Version/s: 0.7.1
Labels: patch (was: )
Release Note: For Cygwin on Windows, please use the attached start.sh to start Hive rather than hive.sh; it calls the same script internally. A version 2 patch with a permanent solution in the jar will be uploaded soon.
Hadoop Flags: [Reviewed]
Status: Patch Available (was: Open)

Facing issues while executing commands on hive shell. The system throws following error: only on Windows Cygwin setup
Key: HIVE-2388
URL: https://issues.apache.org/jira/browse/HIVE-2388
Project: Hive
Issue Type: Bug
Components: CLI, Query Processor
Affects Versions: 0.7.1
Environment: Cygwin Windows
Reporter: Siddharth tiwari
Priority: Critical
Labels: patch
Fix For: 0.7.1
Attachments: start.sh
Original Estimate: 456h
Remaining Estimate: 456h

DDL runs well, but the following command throws an error. Please help with a resolution and how to go about it.

hive> show tables;
FAILED: Hive Internal Error: java.lang.IllegalArgumentException(java.net.URISyntaxException: Relative path in absolute URI: file:C:/cygwin/tmp//siddharth/hive_2011-08-18_03-11-05_208_1818592223695168110)
java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:C:/cygwin/tmp//siddharth/hive_2011-08-18_03-11-05_208_1818592223695168110
	at org.apache.hadoop.fs.Path.initialize(Path.java:140)
	at org.apache.hadoop.fs.Path.<init>(Path.java:132)
	at org.apache.hadoop.hive.ql.Context.getScratchDir(Context.java:142)
	at org.apache.hadoop.hive.ql.Context.getLocalScratchDir(Context.java:168)
	at org.apache.hadoop.hive.ql.Context.getLocalTmpFileURI(Context.java:282)
	at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:205)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:340)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:736)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.net.URISyntaxException: Relative path in absolute URI: file:C:/cygwin/tmp//siddharth/hive_2011-08-18_03-11-05_208_1818592223695168110
	at java.net.URI.checkPath(URI.java:1787)
	at java.net.URI.<init>(URI.java:735)
	at org.apache.hadoop.fs.Path.initialize(Path.java:137)
	... 16 more
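The "Caused by" frames point at the path check in java.net.URI. The JDK rejects a URI that has a scheme but whose path does not start with "/", which is the shape of file:C:/cygwin/tmp/... in the trace. The sketch below reproduces only that JDK behaviour with the multi-argument URI constructor; it does not involve Hadoop's Path class or the attached start.sh, and the class name, the tryBuild helper, and the hive_scratch path are made up for the example.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Stand-alone reproduction of the JDK check; names and paths are illustrative only.
public class RelativePathInAbsoluteUri {
    /** Returns the rejection reason, or null if the URI is accepted. */
    public static String tryBuild(String scheme, String path) {
        try {
            new URI(scheme, null, path, null, null);
            return null;
        } catch (URISyntaxException e) {
            return e.getReason();
        }
    }

    public static void main(String[] args) {
        // Drive-letter path without a leading slash: rejected.
        System.out.println(tryBuild("file", "C:/cygwin/tmp/hive_scratch"));
        // Same path with a leading slash: accepted, so null is printed.
        System.out.println(tryBuild("file", "/C:/cygwin/tmp/hive_scratch"));
    }
}
```

The first call reports "Relative path in absolute URI", the same reason shown in the trace. Whether a leading slash is the right fix for the Cygwin setup is a separate question that this sketch does not answer.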
[jira] [Updated] (HIVE-2388) Facing issues while executing commands on hive shell. The system throws following error: only on Windows Cygwin setup
[ https://issues.apache.org/jira/browse/HIVE-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth tiwari updated HIVE-2388: --- Attachment: start.sh use this file to start hive on cygwin bin/start.sh
[jira] [Updated] (HIVE-2388) Facing issues while executing commands on hive shell. The system throws following error: only on Windows Cygwin setup
[ https://issues.apache.org/jira/browse/HIVE-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth tiwari updated HIVE-2388:
-----------------------------------

Resolution: Fixed
Status: Resolved (was: Patch Available)

Facing issues while executing commands on hive shell. The system throws following error: only on Windows Cygwin setup
-----------------------------------------------------------

Key: HIVE-2388
URL: https://issues.apache.org/jira/browse/HIVE-2388
Project: Hive
Issue Type: Bug
Components: CLI, Query Processor
Affects Versions: 0.7.1
Environment: Cygwin Windows
Reporter: Siddharth tiwari
Priority: Critical
Labels: patch
Fix For: 0.7.1
Attachments: start.sh
Original Estimate: 456h
Remaining Estimate: 456h

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2247) ALTER TABLE RENAME PARTITION
[ https://issues.apache.org/jira/browse/HIVE-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13095082#comment-13095082 ]

jirapos...@reviews.apache.org commented on HIVE-2247:
-----------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1105/
-----------------------------------------------------------

(Updated 2011-09-01 02:23:59.714244)

Review request for Siying Dong.

Changes
-------

+work.getInputs().add(new ReadEntity(oldPart));
+work.getOutputs().add(new WriteEntity(newPart));

Summary
-------

Implement ALTER TABLE PARTITION RENAME function to rename a partition.
Add HiveQL syntax ALTER TABLE bar PARTITION (k1='v1', k2='v2') RENAME TO PARTITION (k1='v3', k2='v4');

This is my first Hive diff, I just learn everything from existing codebase and may not have a good understanding on it.
Feel free to inform me if I make something wrong. Thanks

This addresses bug HIVE-2247.
    https://issues.apache.org/jira/browse/HIVE-2247

Diffs (updated)
-----

  trunk/metastore/if/hive_metastore.thrift 1145366
  trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h 1145366
  trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp 1145366
  trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore_server.skeleton.cpp 1145366
  trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java 1145366
  trunk/metastore/src/gen/thrift/gen-php/hive_metastore/ThriftHiveMetastore.php 1145366
  trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore-remote 1145366
  trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py 1145366
  trunk/metastore/src/gen/thrift/gen-rb/thrift_hive_metastore.rb 1145366
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java 1145366
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1145366
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 1145366
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 1145366
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1145366
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1145366
  trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1145366
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 1145366
  trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 1145366
  trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Partition.java 1145366
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 1145366
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g 1145366
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java 1145366
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/AlterTableDesc.java 1145366
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/DDLWork.java 1145366
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/HiveOperation.java 1145366
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/RenamePartitionDesc.java PRE-CREATION
  trunk/ql/src/test/queries/clientnegative/alter_rename_partition_failure.q PRE-CREATION
  trunk/ql/src/test/queries/clientnegative/alter_rename_partition_failure2.q PRE-CREATION
  trunk/ql/src/test/queries/clientnegative/alter_rename_partition_failure3.q PRE-CREATION
  trunk/ql/src/test/queries/clientpositive/alter_rename_partition.q PRE-CREATION
  trunk/ql/src/test/queries/clientpositive/alter_rename_partition_authorization.q PRE-CREATION
  trunk/ql/src/test/results/clientnegative/alter_rename_partition_failure.q.out PRE-CREATION
  trunk/ql/src/test/results/clientnegative/alter_rename_partition_failure2.q.out PRE-CREATION
  trunk/ql/src/test/results/clientnegative/alter_rename_partition_failure3.q.out PRE-CREATION
  trunk/ql/src/test/results/clientpositive/alter_rename_partition.q.out PRE-CREATION
  trunk/ql/src/test/results/clientpositive/alter_rename_partition_authorization.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/1105/diff

Testing
-------

Add a partition A in the table
Rename partition A to partition B
Show the partitions in the table, it returns partition B.
SELECT the data from partition A, it returns no results
SELECT the data from partition B, it returns the data originally stored in partition A

Thanks,

Weiyan

ALTER TABLE RENAME PARTITION
----------------------------

Key: HIVE-2247
URL: https://issues.apache.org/jira/browse/HIVE-2247
Project: Hive
Issue Type: New Feature
Reporter: Siying Dong
Assignee: Weiyan Wang
Attachments: HIVE-2247.3.patch.txt,
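The test plan in the review request above reads like a small specification of what a partition rename should do. The sketch below models those steps with an in-memory map so the expected outcomes can be checked mechanically. It is a toy under stated assumptions: PartitionTable and its methods are hypothetical names, not the metastore API touched by the patch, and the two failure conditions are guesses, since the thread does not say what the alter_rename_partition_failure tests cover.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a partitioned table (partition spec -> rows). Not Hive code. */
final class PartitionTable {
    private final Map<String, List<String>> partitions = new LinkedHashMap<>();

    void addPartition(String spec, List<String> rows) {
        partitions.put(spec, new ArrayList<>(rows));
    }

    /**
     * Moves the rows stored under oldSpec to newSpec.
     * Assumption: renaming a missing partition, or renaming onto an
     * existing one, is rejected.
     */
    void renamePartition(String oldSpec, String newSpec) {
        if (!partitions.containsKey(oldSpec)) {
            throw new IllegalArgumentException("no such partition: " + oldSpec);
        }
        if (partitions.containsKey(newSpec)) {
            throw new IllegalArgumentException("partition already exists: " + newSpec);
        }
        partitions.put(newSpec, partitions.remove(oldSpec));
    }

    /** Partition specs currently present, in insertion order. */
    List<String> showPartitions() {
        return new ArrayList<>(partitions.keySet());
    }

    /** Rows stored under spec, or an empty list when the partition does not exist. */
    List<String> select(String spec) {
        return partitions.getOrDefault(spec, Collections.<String>emptyList());
    }
}
```

After adding partition A and renaming it to B, showPartitions() lists only B, select on A is empty, and select on B returns the rows originally added under A, which mirrors the five steps in the Testing section.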
[jira] [Updated] (HIVE-2247) ALTER TABLE RENAME PARTITION
[ https://issues.apache.org/jira/browse/HIVE-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weiyan Wang updated HIVE-2247:
------------------------------

Attachment: HIVE-2247.8.patch.txt

+work.getInputs().add(new ReadEntity(oldPart));
+work.getOutputs().add(new WriteEntity(newPart));

ALTER TABLE RENAME PARTITION
----------------------------

Key: HIVE-2247
URL: https://issues.apache.org/jira/browse/HIVE-2247
Project: Hive
Issue Type: New Feature
Reporter: Siying Dong
Assignee: Weiyan Wang
Attachments: HIVE-2247.3.patch.txt, HIVE-2247.4.patch.txt, HIVE-2247.5.patch.txt, HIVE-2247.6.patch.txt, HIVE-2247.7.patch.txt, HIVE-2247.8.patch.txt

We need an ALTER TABLE RENAME PARTITION function that is similar to ALTER TABLE RENAME.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Review Request: HIVE-2337: Predicate pushdown erroneously conservative with outer joins
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1275/#review1710
-----------------------------------------------------------

http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
https://reviews.apache.org/r/1275/#comment3884

    There is a weird non-ASCII character on this line.

- John

On 2011-09-01 00:19:17, Charles Chen wrote:

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1275/
-----------------------------------------------------------

(Updated 2011-09-01 00:19:17)

Review request for hive.

Summary
-------

https://issues.apache.org/jira/browse/HIVE-2337

This addresses bug HIVE-2337.
    https://issues.apache.org/jira/browse/HIVE-2337

Diffs
-----

  http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java 1163875
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_outer_join4.q.out 1163875

Diff: https://reviews.apache.org/r/1275/diff

Testing
-------

Unit tests passed

Thanks,

Charles
[jira] [Commented] (HIVE-2337) Predicate pushdown erroneously conservative with outer joins
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13095104#comment-13095104 ]

jirapos...@reviews.apache.org commented on HIVE-2337:
-----------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1275/#review1710
-----------------------------------------------------------

http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
https://reviews.apache.org/r/1275/#comment3884

    There is a weird non-ASCII character on this line.

- John

On 2011-09-01 00:19:17, Charles Chen wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/1275/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2011-09-01 00:19:17)
bq.
bq. Review request for hive.
bq.
bq. Summary
bq. -------
bq.
bq. https://issues.apache.org/jira/browse/HIVE-2337
bq.
bq. This addresses bug HIVE-2337.
bq. https://issues.apache.org/jira/browse/HIVE-2337
bq.
bq. Diffs
bq. -----
bq.
bq. http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java 1163875
bq. http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_outer_join4.q.out 1163875
bq.
bq. Diff: https://reviews.apache.org/r/1275/diff
bq.
bq. Testing
bq. -------
bq.
bq. Unit tests passed
bq.
bq. Thanks,
bq.
bq. Charles
bq.

Predicate pushdown erroneously conservative with outer joins
------------------------------------------------------------

Key: HIVE-2337
URL: https://issues.apache.org/jira/browse/HIVE-2337
Project: Hive
Issue Type: Bug
Components: Query Processor
Reporter: Charles Chen
Assignee: Charles Chen
Fix For: 0.9.0
Attachments: HIVE-2337v1.patch, HIVE-2337v2.patch, HIVE-2337v3.patch, HIVE-2337v4.patch, HIVE-2337v5.patch

The predicate pushdown filter is not applying left associativity of joins correctly in determining possible aliases for pushing predicates.

In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for pushing aliases is specified as:

{noformat}
    /**
     * Figures out the aliases for whom it is safe to push predicates based on
     * ANSI SQL semantics For inner join, all predicates for all aliases can be
     * pushed For full outer join, none of the predicates can be pushed as that
     * would limit the number of rows for join For left outer join, all the
     * predicates on the left side aliases can be pushed up For right outer
     * join, all the predicates on the right side aliases can be pushed up Joins
     * chain containing both left and right outer joins are treated as full
     * outer join. [...]
     *
     * @param op
     *          Join Operator
     * @param rr
     *          Row resolver
     * @return set of qualified aliases
     */
{noformat}

Since hive joins are left associative, something like a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d should be interpreted as ((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d, so there would be cases where joins with both left and right outer joins can have aliases that can be pushed. Here, aliases b and d are eligible to be pushed up while the current criteria provide that none are eligible.

Using:

{noformat}
create table t1 (id int, key string, value string);
create table t2 (id int, key string, value string);
create table t3 (id int, key string, value string);
create table t4 (id int, key string, value string);
{noformat}

For example, the query

{noformat}
explain select * from t1 full outer join t2 on t1.id=t2.id join t3 on t2.id=t3.id where t3.id=20;
{noformat}

currently gives

{noformat}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        t1
          TableScan
            alias: t1
            Reduce Output Operator
              key expressions:
                    expr: id
                    type: int
              sort order: +
              Map-reduce partition columns:
                    expr: id
                    type: int
              tag: 0
              value expressions:
                    expr: id
                    type: int
                    expr: key
                    type: string
                    expr: value
                    type: string
        t2
          TableScan
            alias: t2
            Reduce Output Operator
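The left-associative reading described in the issue can be sketched as a single left-to-right pass over the join chain, keeping a running set of aliases whose predicates may still be pushed. This is only an illustration of the rule as the description states it, not the code from the attached patches: JoinType, QualifiedAliases and qualified are hypothetical names, and every join operand is assumed to be a single alias.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

enum JoinType { INNER, LEFT_OUTER, RIGHT_OUTER, FULL_OUTER }

final class QualifiedAliases {
    /**
     * aliases has exactly one more element than joins; joins.get(i) joins the
     * result of everything to its left with aliases.get(i + 1).
     */
    static Set<String> qualified(List<String> aliases, List<JoinType> joins) {
        if (aliases.size() != joins.size() + 1) {
            throw new IllegalArgumentException("need exactly one more alias than joins");
        }
        Set<String> ok = new LinkedHashSet<>();
        ok.add(aliases.get(0));
        for (int i = 0; i < joins.size(); i++) {
            String right = aliases.get(i + 1);
            switch (joins.get(i)) {
                case INNER:        // both sides keep their eligibility
                    ok.add(right);
                    break;
                case LEFT_OUTER:   // left side keeps what it had, right side is excluded
                    break;
                case RIGHT_OUTER:  // only the right side remains eligible
                    ok.clear();
                    ok.add(right);
                    break;
                case FULL_OUTER:   // neither side remains eligible
                    ok.clear();
                    break;
            }
        }
        return ok;
    }
}
```

For the chain a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d the pass ends with {b, d}, the aliases the description names as eligible. For the t1 / t2 / t3 example query it ends with {t3}, which under this sketch is why the filter on t3.id would be a candidate for pushing.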
[jira] [Commented] (HIVE-2337) Predicate pushdown erroneously conservative with outer joins
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13095105#comment-13095105 ]

John Sichi commented on HIVE-2337:
----------------------------------

Charles, did you intentionally omit the new ppd_outer_join5.q from the latest patch? Also, there's a weird non-ASCII character in the Javadoc.

Predicate pushdown erroneously conservative with outer joins
------------------------------------------------------------

Key: HIVE-2337
URL: https://issues.apache.org/jira/browse/HIVE-2337
Project: Hive
Issue Type: Bug
Components: Query Processor
Reporter: Charles Chen
Assignee: Charles Chen
Fix For: 0.9.0
Attachments: HIVE-2337v1.patch, HIVE-2337v2.patch, HIVE-2337v3.patch, HIVE-2337v4.patch, HIVE-2337v5.patch
[jira] [Commented] (HIVE-1545) Add a bunch of UDFs and UDAFs
[ https://issues.apache.org/jira/browse/HIVE-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13095106#comment-13095106 ]

cyril liao commented on HIVE-1545:
----------------------------------

com.facebook.hive.udf.lib.UDFUtils is not included. Would you please upload it?

Add a bunch of UDFs and UDAFs
-----------------------------

Key: HIVE-1545
URL: https://issues.apache.org/jira/browse/HIVE-1545
Project: Hive
Issue Type: New Feature
Components: UDF
Reporter: Jonathan Chang
Assignee: Jonathan Chang
Priority: Minor
Attachments: core.tar.gz, ext.tar.gz, udfs.tar.gz, udfs.tar.gz

Here are some UD(A)Fs which can be incorporated into the Hive distribution:

UDFArgMax - Find the 0-indexed index of the largest argument. e.g., ARGMAX(4, 5, 3) returns 1.
UDFBucket - Find the bucket in which the first argument belongs. e.g., BUCKET(x, b_1, b_2, b_3, ...), will return the smallest i such that x > b_{i} but <= b_{i+1}. Returns 0 if x is smaller than all the buckets.
UDFFindInArray - Finds the 1-index of the first element in the array given as the second argument. Returns 0 if not found. Returns NULL if either argument is NULL. E.g., FIND_IN_ARRAY(5, array(1,2,5)) will return 3. FIND_IN_ARRAY(5, array(1,2,3)) will return 0.
UDFGreatCircleDist - Finds the great circle distance (in km) between two lat/long coordinates (in degrees).
UDFLDA - Performs LDA inference on a vector given fixed topics.
UDFNumberRows - Number successive rows starting from 1. Counter resets to 1 whenever any of its parameters changes.
UDFPmax - Finds the maximum of a set of columns. e.g., PMAX(4, 5, 3) returns 5.
UDFRegexpExtractAll - Like REGEXP_EXTRACT except that it returns all matches in an array.
UDFUnescape - Returns the string unescaped (using C/Java style unescaping).
UDFWhich - Given a boolean array, return the indices which are TRUE.
UDFJaccard
UDAFCollect - Takes all the values associated with a row and converts it into a list. Make sure to have: set hive.map.aggr = false;
UDAFCollectMap - Like collect except that it takes tuples and generates a map.
UDAFEntropy - Compute the entropy of a column.
UDAFPearson (BROKEN!!!) - Computes the pearson correlation between two columns.
UDAFTop - TOP(KEY, VAL) - returns the KEY associated with the largest value of VAL.
UDAFTopN (BROKEN!!!) - Like TOP except returns a list of the keys associated with the N (passed as the third parameter) largest values of VAL.
UDAFHistogram

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
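Some of the semantics listed above are simple enough to pin down in plain Java. The sketch below covers ARGMAX, PMAX and FIND_IN_ARRAY as the description words them, using the description's own examples. It does not use Hive's UDF base classes and is not the code in the attached tarballs; UdfSemantics and its method names are hypothetical, and the tie-breaking rule in argMax is an assumption the description does not settle.

```java
import java.util.List;

final class UdfSemantics {
    /** 0-based index of the largest argument; the first occurrence wins on ties (assumption). */
    static int argMax(double... xs) {
        if (xs.length == 0) {
            throw new IllegalArgumentException("at least one argument is required");
        }
        int best = 0;
        for (int i = 1; i < xs.length; i++) {
            if (xs[i] > xs[best]) {
                best = i;
            }
        }
        return best;
    }

    /** Largest of the arguments. */
    static double pmax(double... xs) {
        return xs[argMax(xs)];
    }

    /** 1-based position of needle in haystack, 0 if absent, null if either argument is null. */
    static Integer findInArray(Integer needle, List<Integer> haystack) {
        if (needle == null || haystack == null) {
            return null;
        }
        // List.indexOf returns -1 when the element is absent, so absent maps to 0.
        return haystack.indexOf(needle) + 1;
    }
}
```

The mix of a 0-based result for ARGMAX and a 1-based result for FIND_IN_ARRAY follows the description as written; the 1-based form leaves 0 free to mean "not found".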
Re: Review Request: HIVE-2337: Predicate pushdown erroneously conservative with outer joins
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1275/
-----------------------------------------------------------

(Updated 2011-09-01 04:26:59.076177)

Review request for hive.

Changes
-------

Oops fixed dropped unit test, javadoc character

Summary
-------

https://issues.apache.org/jira/browse/HIVE-2337

This addresses bug HIVE-2337.
    https://issues.apache.org/jira/browse/HIVE-2337

Diffs (updated)
-----

  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_outer_join5.q.out PRE-CREATION
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java 1163875
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientpositive/ppd_outer_join5.q PRE-CREATION
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_outer_join4.q.out 1163875

Diff: https://reviews.apache.org/r/1275/diff

Testing
-------

Unit tests passed

Thanks,

Charles
[jira] [Commented] (HIVE-2337) Predicate pushdown erroneously conservative with outer joins
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13095110#comment-13095110 ]

jirapos...@reviews.apache.org commented on HIVE-2337:
-----------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1275/
-----------------------------------------------------------

(Updated 2011-09-01 04:26:59.076177)

Review request for hive.

Changes
-------

Oops fixed dropped unit test, javadoc character

Summary
-------

https://issues.apache.org/jira/browse/HIVE-2337

This addresses bug HIVE-2337.
    https://issues.apache.org/jira/browse/HIVE-2337

Diffs (updated)
-----

  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_outer_join5.q.out PRE-CREATION
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java 1163875
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientpositive/ppd_outer_join5.q PRE-CREATION
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_outer_join4.q.out 1163875

Diff: https://reviews.apache.org/r/1275/diff

Testing
-------

Unit tests passed

Thanks,

Charles

Predicate pushdown erroneously conservative with outer joins
------------------------------------------------------------

Key: HIVE-2337
URL: https://issues.apache.org/jira/browse/HIVE-2337
Project: Hive
Issue Type: Bug
Components: Query Processor
Reporter: Charles Chen
Assignee: Charles Chen
Fix For: 0.9.0
Attachments: HIVE-2337v1.patch, HIVE-2337v2.patch, HIVE-2337v3.patch, HIVE-2337v4.patch, HIVE-2337v5.patch
[jira] [Updated] (HIVE-2337) Predicate pushdown erroneously conservative with outer joins
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Charles Chen updated HIVE-2337:
-------------------------------

Attachment: HIVE-2337v6.patch

Predicate pushdown erroneously conservative with outer joins
------------------------------------------------------------

Key: HIVE-2337
URL: https://issues.apache.org/jira/browse/HIVE-2337
Project: Hive
Issue Type: Bug
Components: Query Processor
Reporter: Charles Chen
Assignee: Charles Chen
Fix For: 0.9.0
Attachments: HIVE-2337v1.patch, HIVE-2337v2.patch, HIVE-2337v3.patch, HIVE-2337v4.patch, HIVE-2337v5.patch, HIVE-2337v6.patch

The predicate pushdown filter is not applying left associativity of joins correctly in determining possible aliases for pushing predicates.

In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for pushing aliases is specified as:

{noformat}
    /**
     * Figures out the aliases for whom it is safe to push predicates based on
     * ANSI SQL semantics For inner join, all predicates for all aliases can be
     * pushed For full outer join, none of the predicates can be pushed as that
     * would limit the number of rows for join For left outer join, all the
     * predicates on the left side aliases can be pushed up For right outer
     * join, all the predicates on the right side aliases can be pushed up Joins
     * chain containing both left and right outer joins are treated as full
     * outer join. [...]
     *
     * @param op
     *          Join Operator
     * @param rr
     *          Row resolver
     * @return set of qualified aliases
     */
{noformat}

Since hive joins are left associative, something like a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d should be interpreted as ((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d, so there would be cases where joins with both left and right outer joins can have aliases that can be pushed. Here, aliases b and d are eligible to be pushed up while the current criteria provide that none are eligible.

Using:

{noformat}
create table t1 (id int, key string, value string);
create table t2 (id int, key string, value string);
create table t3 (id int, key string, value string);
create table t4 (id int, key string, value string);
{noformat}

For example, the query

{noformat}
explain select * from t1 full outer join t2 on t1.id=t2.id join t3 on t2.id=t3.id where t3.id=20;
{noformat}

currently gives

{noformat}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        t1
          TableScan
            alias: t1
            Reduce Output Operator
              key expressions:
                    expr: id
                    type: int
              sort order: +
              Map-reduce partition columns:
                    expr: id
                    type: int
              tag: 0
              value expressions:
                    expr: id
                    type: int
                    expr: key
                    type: string
                    expr: value
                    type: string
        t2
          TableScan
            alias: t2
            Reduce Output Operator
              key expressions:
                    expr: id
                    type: int
              sort order: +
              Map-reduce partition columns:
                    expr: id
                    type: int
              tag: 1
              value expressions:
                    expr: id
                    type: int
                    expr: key
                    type: string
                    expr: value
                    type: string
        t3
          TableScan
            alias: t3
            Reduce Output Operator
              key expressions:
                    expr: id
                    type: int
              sort order: +
              Map-reduce partition columns:
                    expr: id
                    type: int
              tag: 2
              value expressions:
                    expr: id
                    type: int
                    expr: key
                    type: string
                    expr: value
                    type: string
      Reduce Operator Tree:
        Join Operator
          condition map:
               Outer Join 0 to 1
               Inner Join 1 to 2
          condition expressions:
            0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
            1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
            2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
          handleSkewJoin: false
          outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11,
Re: Review Request: HIVE-2337: Predicate pushdown erroneously conservative with outer joins
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1275/
-----------------------------------------------------------

(Updated 2011-09-01 04:42:25.815081)

Review request for hive.

Changes
-------

Added TestParse changes ??

Summary
-------

https://issues.apache.org/jira/browse/HIVE-2337

This addresses bug HIVE-2337.
    https://issues.apache.org/jira/browse/HIVE-2337

Diffs (updated)
-----

  http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java 1163875
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientpositive/ppd_outer_join5.q PRE-CREATION
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_outer_join4.q.out 1163875
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_outer_join5.q.out PRE-CREATION
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/compiler/plan/input4.q.xml 1163875
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/compiler/plan/join8.q.xml 1163875

Diff: https://reviews.apache.org/r/1275/diff

Testing
-------

Unit tests passed

Thanks,

Charles
[jira] [Commented] (HIVE-2337) Predicate pushdown erroneously conservative with outer joins
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13095116#comment-13095116 ] jirapos...@reviews.apache.org commented on HIVE-2337: - --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1275/ --- (Updated 2011-09-01 04:42:25.815081) Review request for hive. Changes --- Added TestParse changes ?? Summary --- https://issues.apache.org/jira/browse/HIVE-2337 This addresses bug HIVE-2337. https://issues.apache.org/jira/browse/HIVE-2337 Diffs (updated) - http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java 1163875 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientpositive/ppd_outer_join5.q PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_outer_join4.q.out 1163875 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_outer_join5.q.out PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/compiler/plan/input4.q.xml 1163875 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/compiler/plan/join8.q.xml 1163875 Diff: https://reviews.apache.org/r/1275/diff Testing --- Unit tests passed Thanks, Charles Predicate pushdown erroneously conservative with outer joins Key: HIVE-2337 URL: https://issues.apache.org/jira/browse/HIVE-2337 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Charles Chen Assignee: Charles Chen Fix For: 0.9.0 Attachments: HIVE-2337v1.patch, HIVE-2337v2.patch, HIVE-2337v3.patch, HIVE-2337v4.patch, HIVE-2337v5.patch, HIVE-2337v6.patch The predicate pushdown filter is not applying left associativity of joins correctly in determining possible aliases for pushing predicates. 
In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for pushing aliases is specified as: {noformat} /** * Figures out the aliases for whom it is safe to push predicates based on * ANSI SQL semantics For inner join, all predicates for all aliases can be * pushed For full outer join, none of the predicates can be pushed as that * would limit the number of rows for join For left outer join, all the * predicates on the left side aliases can be pushed up For right outer * join, all the predicates on the right side aliases can be pushed up Joins * chain containing both left and right outer joins are treated as full * outer join. [...] * * @param op * Join Operator * @param rr * Row resolver * @return set of qualified aliases */ {noformat} Since hive joins are left associative, something like a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d should be interpreted as ((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d, so there would be cases where joins with both left and right outer joins can have aliases that can be pushed. Here, aliases b and d are eligible to be pushed up while the current criteria provide that none are eligible. 
Using:

{noformat}
create table t1 (id int, key string, value string);
create table t2 (id int, key string, value string);
create table t3 (id int, key string, value string);
create table t4 (id int, key string, value string);
{noformat}

For example, the query

{noformat}
explain select * from t1 full outer join t2 on t1.id=t2.id join t3 on t2.id=t3.id where t3.id=20;
{noformat}

currently gives

{noformat}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        t1
          TableScan
            alias: t1
            Reduce Output Operator
              key expressions:
                expr: id
                type: int
              sort order: +
              Map-reduce partition columns:
                expr: id
                type: int
              tag: 0
              value expressions:
                expr: id
                type: int
                expr: key
                type: string
                expr: value
                type: string
        t2
          TableScan
            alias: t2
            Reduce Output Operator
              key expressions:
                expr: id
                type: int
              sort order: +
              Map-reduce partition columns:
                expr:
[jira] [Updated] (HIVE-2337) Predicate pushdown erroneously conservative with outer joins
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Charles Chen updated HIVE-2337:
-------------------------------

    Attachment: HIVE-2337v7.patch

Predicate pushdown erroneously conservative with outer joins
------------------------------------------------------------

Key: HIVE-2337
URL: https://issues.apache.org/jira/browse/HIVE-2337
Project: Hive
Issue Type: Bug
Components: Query Processor
Reporter: Charles Chen
Assignee: Charles Chen
Fix For: 0.9.0
Attachments: HIVE-2337v1.patch, HIVE-2337v2.patch, HIVE-2337v3.patch, HIVE-2337v4.patch, HIVE-2337v5.patch, HIVE-2337v6.patch, HIVE-2337v7.patch

The predicate pushdown filter is not applying left associativity of joins correctly in determining possible aliases for pushing predicates. In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for pushing aliases is specified as:

{noformat}
/**
 * Figures out the aliases for whom it is safe to push predicates based on
 * ANSI SQL semantics For inner join, all predicates for all aliases can be
 * pushed For full outer join, none of the predicates can be pushed as that
 * would limit the number of rows for join For left outer join, all the
 * predicates on the left side aliases can be pushed up For right outer
 * join, all the predicates on the right side aliases can be pushed up Joins
 * chain containing both left and right outer joins are treated as full
 * outer join. [...]
 *
 * @param op
 *          Join Operator
 * @param rr
 *          Row resolver
 * @return set of qualified aliases
 */
{noformat}

Since hive joins are left associative, something like

  a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d

should be interpreted as

  ((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d

so there would be cases where joins with both left and right outer joins can have aliases that can be pushed. Here, aliases b and d are eligible to be pushed up while the current criteria provide that none are eligible.
Using:

{noformat}
create table t1 (id int, key string, value string);
create table t2 (id int, key string, value string);
create table t3 (id int, key string, value string);
create table t4 (id int, key string, value string);
{noformat}

For example, the query

{noformat}
explain select * from t1 full outer join t2 on t1.id=t2.id join t3 on t2.id=t3.id where t3.id=20;
{noformat}

currently gives

{noformat}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        t1
          TableScan
            alias: t1
            Reduce Output Operator
              key expressions:
                expr: id
                type: int
              sort order: +
              Map-reduce partition columns:
                expr: id
                type: int
              tag: 0
              value expressions:
                expr: id
                type: int
                expr: key
                type: string
                expr: value
                type: string
        t2
          TableScan
            alias: t2
            Reduce Output Operator
              key expressions:
                expr: id
                type: int
              sort order: +
              Map-reduce partition columns:
                expr: id
                type: int
              tag: 1
              value expressions:
                expr: id
                type: int
                expr: key
                type: string
                expr: value
                type: string
        t3
          TableScan
            alias: t3
            Reduce Output Operator
              key expressions:
                expr: id
                type: int
              sort order: +
              Map-reduce partition columns:
                expr: id
                type: int
              tag: 2
              value expressions:
                expr: id
                type: int
                expr: key
                type: string
                expr: value
                type: string
      Reduce Operator Tree:
        Join Operator
          condition map:
            Outer Join 0 to 1
            Inner Join 1 to 2
          condition expressions:
            0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
            1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
            2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
          handleSkewJoin: false
          outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7,
Re: Review Request: HIVE-1989: recognize transitivity of predicates on join keys
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1228/
-----------------------------------------------------------

(Updated 2011-09-01 05:43:51.993258)

Review request for hive.

Changes
-------

Rebase to trunk

Summary
-------

https://issues.apache.org/jira/browse/HIVE-1989

This addresses bug HIVE-1989.
    https://issues.apache.org/jira/browse/HIVE-1989

Diffs (updated)
-----

  http://svn.apache.org/repos/asf/hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1163905
  http://svn.apache.org/repos/asf/hive/trunk/conf/hive-default.xml 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientpositive/ppd_transitivity.q PRE-CREATION
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/auto_join16.q.out 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/cluster.q.out 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/index_auto_mult_tables.q.out 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/index_auto_mult_tables_compact.q.out 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/join16.q.out 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/join38.q.out 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/louter_join_ppr.q.out 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_clusterby.q.out 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_gby_join.q.out 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_join.q.out 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_join2.q.out 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_join3.q.out 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_outer_join1.q.out 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_outer_join2.q.out 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_outer_join4.q.out 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_transitivity.q.out PRE-CREATION
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_udf_case.q.out 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/router_join_ppr.q.out 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/smb_mapjoin9.q.out 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/smb_mapjoin_6.q.out 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/udf_named_struct.q.out 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/union22.q.out 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/compiler/plan/join1.q.xml 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/compiler/plan/join2.q.xml 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/compiler/plan/join3.q.xml 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/compiler/plan/sample1.q.xml 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/compiler/plan/sample2.q.xml 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/compiler/plan/sample3.q.xml 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/compiler/plan/sample4.q.xml 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/compiler/plan/sample5.q.xml 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/compiler/plan/sample6.q.xml 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/compiler/plan/sample7.q.xml 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/compiler/plan/subq.q.xml 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/compiler/plan/udf1.q.xml 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/compiler/plan/udf4.q.xml 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/compiler/plan/udf6.q.xml 1163905
[jira] [Commented] (HIVE-1989) recognize transitivity of predicates on join keys
[ https://issues.apache.org/jira/browse/HIVE-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095134#comment-13095134 ]

jirapos...@reviews.apache.org commented on HIVE-1989:
-----------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1228/
-----------------------------------------------------------

(Updated 2011-09-01 05:43:51.993258)

Review request for hive.

Changes
-------

Rebase to trunk

Summary
-------

https://issues.apache.org/jira/browse/HIVE-1989

This addresses bug HIVE-1989.
    https://issues.apache.org/jira/browse/HIVE-1989

Diffs (updated)
-----

  http://svn.apache.org/repos/asf/hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1163905
  http://svn.apache.org/repos/asf/hive/trunk/conf/hive-default.xml 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientpositive/ppd_transitivity.q PRE-CREATION
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/auto_join16.q.out 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/cluster.q.out 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/index_auto_mult_tables.q.out 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/index_auto_mult_tables_compact.q.out 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/join16.q.out 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/join38.q.out 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/louter_join_ppr.q.out 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_clusterby.q.out 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_gby_join.q.out 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_join.q.out 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_join2.q.out 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_join3.q.out 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_outer_join1.q.out 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_outer_join2.q.out 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_outer_join4.q.out 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_transitivity.q.out PRE-CREATION
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_udf_case.q.out 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/router_join_ppr.q.out 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/smb_mapjoin9.q.out 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/smb_mapjoin_6.q.out 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/udf_named_struct.q.out 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/union22.q.out 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/compiler/plan/join1.q.xml 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/compiler/plan/join2.q.xml 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/compiler/plan/join3.q.xml 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/compiler/plan/sample1.q.xml 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/compiler/plan/sample2.q.xml 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/compiler/plan/sample3.q.xml 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/compiler/plan/sample4.q.xml 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/compiler/plan/sample5.q.xml 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/compiler/plan/sample6.q.xml 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/compiler/plan/sample7.q.xml 1163905
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/compiler/plan/subq.q.xml 1163905