[jira] Commented: (PIG-223) Optimization Idea: Dynamic histogram generation for join ordering?

Olga Natkovich (JIRA) Tue, 29 Apr 2008 13:40:38 -0700

    [ https://issues.apache.org/jira/browse/PIG-223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12593104#action_12593104 ]


Olga Natkovich commented on PIG-223:
------------------------------------

I think that as Pig becomes more mature, we will start collecting and storing 
the metadata we need, such as data sizes, column cardinality, sort/partition 
order of the data, etc.

Trying to dynamically compute the information if it is not available sounds 
like a good idea.

> Optimization Idea: Dynamic histogram generation for join ordering?
> ------------------------------------------------------------------
>
>                 Key: PIG-223
>                 URL: https://issues.apache.org/jira/browse/PIG-223
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Pi Song
>
> This idea came to mind when I was implementing explicit cast insertion 
> for Type Checking.
> Problem:
> Given a query containing 3 or more joins, what is the most efficient join 
> order? (Pig doesn't have an indexing feature, so statistics are not available.)
> Solution:
> 0. Start with a given plan.
> 1. Somehow select the first join (this is still an open question).
> 2. Insert a histogram generator for the columns used in the remaining joins 
> into the first MapReduce run.
> 3. Run MapReduce.
> 4. Use the histogram information generated in step 2 to order the joins for 
> the rest of the plan.
> 5. Run further MapReduce jobs until the plan finishes.
> There is another open question regarding histograms for joins based on 
> calculated columns. In this case, calculating the histogram upfront might 
> conflict with the conventional optimization technique of "pulling filters up 
> and pushing calculations down".
> I am not sure how useful this is, because I have never come across a query 
> with 3 joins myself.
> Any opinions?
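
For illustration only, step 4 of the quoted proposal could be sketched as 
below. This is not Pig code: the function names (histogram, 
estimated_join_size, order_remaining_joins), the list-of-dicts row format, and 
the sample relations are all assumptions made for the example. It also assumes 
exact value-frequency histograms and a simple "smallest estimated output 
first" rule, neither of which the issue specifies.

```python
from collections import Counter


def histogram(rows, column):
    """Count how often each value of `column` occurs in `rows` (list of dicts)."""
    return Counter(row[column] for row in rows)


def estimated_join_size(left_hist, right_hist):
    """Equi-join output size implied by two exact value-frequency histograms."""
    return sum(n * right_hist.get(value, 0) for value, n in left_hist.items())


def order_remaining_joins(intermediate_rows, remaining):
    """Order the remaining joins by estimated output size, smallest first.

    `remaining` maps a relation name to (rows, left_column, right_column),
    where left_column is the join key in the intermediate result and
    right_column is the join key in the named relation (hypothetical layout).
    """
    estimates = []
    for name, (rows, left_col, right_col) in remaining.items():
        size = estimated_join_size(
            histogram(intermediate_rows, left_col),
            histogram(rows, right_col),
        )
        estimates.append((size, name))
    return [name for _size, name in sorted(estimates)]


# Made-up output of the first join (steps 1 to 3).
intermediate = [
    {"user": 1, "city": "a"},
    {"user": 1, "city": "b"},
    {"user": 2, "city": "a"},
]
orders = [{"user": 1}, {"user": 1}, {"user": 1}, {"user": 2}]
cities = [{"city": "a"}]

plan = order_remaining_joins(
    intermediate,
    {
        "orders": (orders, "user", "user"),
        "cities": (cities, "city", "city"),
    },
)
print(plan)
```

Note that the sketch estimates every remaining join against the same 
intermediate result and sorts once. It does not re-estimate after each join, 
and it does not address the open questions about choosing the first join or 
handling calculated columns.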

