
[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-01-13 Thread David Inbar (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552282#comment-13552282
]

David Inbar commented on HIVE-2206:
---

I will be on vacation through January 14th, but will be checking email and
voicemail periodically.

For all time-critical items, please call my mobile phone.

Many thanks,
David

NOTICE: All information in and attached to this email may be proprietary,
confidential, privileged and otherwise protected from improper or erroneous
disclosure. If you are not the sender's intended recipient, you are not
authorized to intercept, read, print, retain, copy, forward, or disseminate
this message.

add a new optimizer for query correlation discovery and optimization

Key: HIVE-2206
URL: https://issues.apache.org/jira/browse/HIVE-2206
Project: Hive
Issue Type: New Feature
Components: Query Processor
Affects Versions: 0.10.0
Reporter: He Yongqiang
Assignee: Yin Huai
Attachments: HIVE-2206.10-r1384442.patch.txt,
HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt,
HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt,
HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt,
HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt,
HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt,
HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt,
HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt,
HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt,
HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch

This issue proposes a new logical optimizer called Correlation Optimizer,
which merges correlated MapReduce jobs (MR jobs) into a single MR job. The
idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The paper and
slides of YSmart are linked at the bottom.
Since Hive translates queries in a sentence-by-sentence fashion, it generates
a separate MapReduce job for every operation that may need to shuffle the
data (e.g. join and aggregation operations). However, such operations may
involve the correlations explained below, in which case they can be executed
in a single MR job.
# Input Correlation: Multiple MR jobs have input correlation (IC) if their
input relation sets are not disjoint;
# Transit Correlation: Multiple MR jobs have transit correlation (TC) if they
have not only input correlation, but also the same partition key;
# Job Flow Correlation: An MR job has job flow correlation (JFC) with one of
its child nodes if it has the same partition key as that child node.
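The three definitions above can be sketched as simple predicates. This is
only an illustration, not code from the patch: the MRJob class, its field
names and the table names in the usage example are all hypothetical, and a
job is reduced to its input tables, its partition key and its child nodes.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Tuple


@dataclass
class MRJob:
    """Hypothetical, simplified description of one MapReduce job."""
    name: str
    input_tables: FrozenSet[str]      # relations the job reads
    partition_key: Tuple[str, ...]    # columns used to shuffle the data
    children: List["MRJob"] = field(default_factory=list)


def has_input_correlation(a: MRJob, b: MRJob) -> bool:
    # IC: the input relation sets are not disjoint.
    return not a.input_tables.isdisjoint(b.input_tables)


def has_transit_correlation(a: MRJob, b: MRJob) -> bool:
    # TC: input correlation plus the same partition key.
    return has_input_correlation(a, b) and a.partition_key == b.partition_key


def has_job_flow_correlation(job: MRJob, child: MRJob) -> bool:
    # JFC: `child` is one of the job's child nodes and both jobs
    # use the same partition key.
    is_child = any(c is child for c in job.children)
    return is_child and job.partition_key == child.partition_key


# Usage with made-up jobs: two aggregations feeding one join on "key".
agg_a = MRJob("agg_a", frozenset({"t1"}), ("key",))
agg_b = MRJob("agg_b", frozenset({"t1", "t2"}), ("key",))
join = MRJob("join", frozenset(), ("key",), children=[agg_a, agg_b])
print(has_transit_correlation(agg_a, agg_b))   # both read t1, same key
print(has_job_flow_correlation(join, agg_a))   # same key as its child
```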
The current implementation of the correlation optimizer only detects
correlations among MR jobs for reduce-side join operators and reduce-side
aggregation operators (not map-only aggregation). A query will be optimized
if it satisfies the following conditions.
# There exists an MR job for a reduce-side join operator or a reduce-side
aggregation operator that has JFC with all of its parent MR jobs (TCs will
also be exploited if JFC exists);
# All input tables of those correlated MR jobs are original input tables (not
intermediate tables generated by sub-queries); and
# No self join is involved in those correlated MR jobs.
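A rough sketch of how the three conditions above could be checked together.
It is not taken from the patch: the Job class, the reads_intermediate flag
and the way a self join is recognised (the same table listed twice in one
job's inputs) are simplifying assumptions made for this example only.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Job:
    """Hypothetical, simplified description of one MapReduce job."""
    partition_key: Tuple[str, ...]
    input_tables: List[str]           # original tables scanned by this job
    reads_intermediate: bool = False  # True if it scans a sub-query result
    parents: List["Job"] = field(default_factory=list)


def is_candidate(job: Job) -> bool:
    """Return True if `job` and its parents meet all three conditions."""
    group = [job] + job.parents
    # Condition 1: JFC with all parent jobs, i.e. the same partition key.
    if not job.parents:
        return False
    if any(p.partition_key != job.partition_key for p in job.parents):
        return False
    # Condition 2: only original input tables, no intermediate tables.
    if any(j.reads_intermediate for j in group):
        return False
    # Condition 3: no self join (assumed here to mean that no job lists
    # the same table more than once).
    return all(len(j.input_tables) == len(set(j.input_tables)) for j in group)


# Usage with made-up jobs: two aggregations on "key" feeding a join on "key".
agg_a = Job(("key",), ["t1"])
agg_b = Job(("key",), ["t2"])
join = Job(("key",), [], parents=[agg_a, agg_b])
print(is_candidate(join))
```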
The correlation optimizer is implemented as a logical optimizer. The main
reasons are that it only needs to manipulate the query plan tree and that it
can leverage the existing component for generating MR jobs.
The current implementation can serve as a framework for correlation-related
optimizations. I think that it is better than adding individual optimizers.
Several pieces of work can be done in the future to improve this optimizer.
Here are three examples.
# Support queries that involve only TC;
# Support queries in which the input tables of correlated MR jobs include
intermediate tables; and
# Optimize queries involving self join.
References:
Paper and presentation of YSmart.
Paper:
http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
Slides: http://sdrv.ms/UpwJJc

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-11-19 Thread David Inbar (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500474#comment-13500474
]

David Inbar commented on HIVE-2206:
---

I will be on vacation through Friday Nov 23rd, but will be checking email and
voicemail periodically.

For all time-critical items, please call my mobile phone.

Many thanks,
David

