[ https://issues.apache.org/jira/browse/PIG-223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12594044#action_12594044 ]

Pi Song commented on PIG-223:
-----------------------------

That's right. Collecting metadata will help a lot. There are two cases:
1. The user explicitly directs metadata creation. This is like creating indexes 
in an RDBMS.
2. Metadata is created dynamically. This may happen as part of optimization when 
a user runs an ad hoc query.
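
To make the two cases concrete, here is a minimal Python sketch. It assumes a hypothetical in-memory `StatsCatalog`, keyed by (relation, column), that stores exact value-frequency histograms. Neither the class nor its methods exist in Pig: `analyze` stands in for case 1 (user-directed) and `observe` for case 2 (collected dynamically while a query runs).

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Iterable, Mapping, Optional, Tuple

Key = Tuple[str, str]  # (relation name, column name)


@dataclass
class StatsCatalog:
    """Hypothetical per-column statistics store; not part of Pig."""
    histograms: Dict[Key, Counter] = field(default_factory=dict)
    origin: Dict[Key, str] = field(default_factory=dict)  # "user" or "dynamic"

    def analyze(self, relation: str, column: str, rows: Iterable[Mapping]) -> None:
        """Case 1: the user explicitly asks for statistics, much like creating
        an index in an RDBMS. Always (re)builds the histogram."""
        self._store(relation, column, rows, "user")

    def observe(self, relation: str, column: str, rows: Iterable[Mapping]) -> None:
        """Case 2: statistics collected as a side effect of an ad hoc query.
        Assumed design choice: only fill gaps, never replace user-requested stats."""
        if (relation, column) not in self.histograms:
            self._store(relation, column, rows, "dynamic")

    def lookup(self, relation: str, column: str) -> Optional[Counter]:
        """What an optimizer would consult when ordering joins (None if unknown)."""
        return self.histograms.get((relation, column))

    def _store(self, relation: str, column: str, rows: Iterable[Mapping], origin: str) -> None:
        # Count how often each value of the column occurs.
        self.histograms[(relation, column)] = Counter(row[column] for row in rows)
        self.origin[(relation, column)] = origin


if __name__ == "__main__":
    catalog = StatsCatalog()
    catalog.analyze("users", "id", [{"id": 1}, {"id": 1}, {"id": 2}])
    catalog.observe("users", "id", [{"id": 9}])     # ignored: user stats already exist
    catalog.observe("orders", "uid", [{"uid": 1}])  # filled in dynamically
    print(catalog.lookup("users", "id"))            # Counter({1: 2, 2: 1})
```

Whether dynamically gathered statistics should override user-directed ones (they may be fresher) is an open design choice; the sketch simply keeps the explicit ones.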

> Optimization Idea: Dynamic histogram generation for join ordering?
> ------------------------------------------------------------------
>
>                 Key: PIG-223
>                 URL: https://issues.apache.org/jira/browse/PIG-223
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Pi Song
>
> This idea sprang into my mind when I was implementing explicit casting 
> insertion for Type Checking.
> Problem:
> Given a query containing three or more joins, what is the most efficient join 
> order? (Pig has no indexing feature, so statistics are not available.)
> Solution:
> 0. Start with the given plan.
> 1. Somehow select the first join (this is still an open question).
> 2. In the first MapReduce run, insert histogram generators for the columns 
> used in the remaining joins.
> 3. Run MapReduce.
> 4. Use the histograms produced by the generators from step 2 to order the 
> joins in the rest of the plan.
> 5. Continue with further MapReduce runs until the plan finishes.
> There is another open question regarding histograms for joins on calculated 
> columns. In this case, calculating histograms upfront (which forces the 
> calculated column to be evaluated in the first MapReduce run) might conflict 
> with the conventional optimization technique of "pulling filters up and 
> pushing calculations down".
> I'm not sure how useful this would be, because I have never come across a 
> query with three or more joins myself.
> Any opinions?
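
Steps 2 to 4 of the quoted solution can be sketched in Python under simplifying assumptions: every remaining join is an equi-join on the same key column, histograms are exact value counts (real systems usually keep bucketed histograms), and the names `build_histogram`, `join_histogram` and `order_remaining_joins` are hypothetical, not Pig APIs. The greedy rule (always join next the relation that gives the smallest estimated intermediate result) is one common heuristic, not something the issue prescribes.

```python
from collections import Counter
from typing import Dict, Iterable, List, Mapping


def build_histogram(rows: Iterable[Mapping], key: str) -> Counter:
    """Exact value-frequency histogram of one column, i.e. what a
    hypothetical step-2 generator could emit during the first run."""
    return Counter(row[key] for row in rows)


def join_histogram(left: Counter, right: Counter) -> Counter:
    """Histogram of an equi-join result on the shared key: each key value k
    appears left[k] * right[k] times in the output."""
    return Counter({k: n * right[k] for k, n in left.items() if k in right})


def order_remaining_joins(current: Counter, remaining: Dict[str, Counter]) -> List[str]:
    """Greedy ordering (step 4): repeatedly pick the relation whose join with
    the current result yields the fewest rows, then continue from that result."""
    pending = dict(remaining)  # copy so the caller's dict is not modified
    order: List[str] = []
    while pending:
        best = min(pending, key=lambda name: sum(join_histogram(current, pending[name]).values()))
        current = join_histogram(current, pending.pop(best))
        order.append(best)
    return order


if __name__ == "__main__":
    # Histogram (on the join key) of the output of the first, arbitrarily chosen join.
    first = Counter({"a": 2, "b": 1, "c": 5})
    remaining = {
        "R": Counter({"a": 10, "b": 10, "c": 10}),  # would produce 80 rows next
        "S": Counter({"b": 3}),                      # would produce 3 rows next
        "T": Counter({"a": 1, "b": 1, "c": 1}),      # would produce 8 rows next
    }
    print(order_remaining_joins(first, remaining))  # ['S', 'T', 'R']
```

Only the histograms gathered in the first run are needed before ordering, so the first run pays for collection and the later runs benefit. Joins on different columns, or on calculated columns, would need estimates of how histograms propagate through each join, which this sketch does not attempt.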

-- 
This message is automatically generated by JIRA.