[jira] [Commented] (HIVE-2095) auto convert map join bug

2012-09-06 Thread Matt Kleiderman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449933#comment-13449933
 ] 

Matt Kleiderman commented on HIVE-2095:
---

I think I'm hitting this issue with a 0.7.1 installation. Can you provide 
information about how big the tables need to be in order to trigger the 
NullPointerException?

> auto convert map join bug
> -------------------------
>
>                 Key: HIVE-2095
>                 URL: https://issues.apache.org/jira/browse/HIVE-2095
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.7.0
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>             Fix For: 0.8.0
>
>         Attachments: HIVE-2095.1.patch, HIVE-2095.2.patch
>
>
> 1)
> When considering a table as the big-table candidate for a map join, if Hive 
> can determine at compile time that the total known size of all the other 
> tables (excluding the candidate) is bigger than a configured value, the 
> candidate is a bad one and should not be put into the plan. Otherwise it has 
> to be filtered out at runtime, which may cost more time.
> 2)
> Added a null check for backup tasks. Without it, a NullPointerException 
> occurs.
> 3)
> CommonJoinResolver needs to know the full mapping of pathToAliases. Otherwise 
> it will make the wrong decision.
> 4)
> Changes made to ConditionalResolverCommonJoin: added pathToAliases, 
> aliasToSize (the alias's input size that is known at compile time, from 
> inputSummary), and the intermediate dir path.
> The logic is: go over all the pathToAliases entries and, for each path that 
> is from the intermediate dir path, add the path's size to all aliases. 
> Finally, choose the big table based on the size information and other data 
> such as aliasToTask.
> 5)
> The conditional task's children contain wrong options, which may cause the 
> join to fail or return incorrect results. When getting all possible children 
> for the conditional task, a whitelist of big tables should be used. Only 
> tables in this whitelist can be considered as a big table.
> Here is the logic:
> +   * Get a list of big table candidates. Only the tables in the returned set
> +   * can be used as the big table in the join operation.
> +   *
> +   * The logic here is to scan the join condition array from left to right.
> +   * If we see an inner join and bigTableCandidates is empty, add both sides
> +   * of this inner join to the big table candidates. If we see a left outer
> +   * join and bigTableCandidates is empty, add the left side to it; if
> +   * bigTableCandidates is not empty, do nothing (which means the
> +   * bigTableCandidates are from the left side). If we see a right outer
> +   * join, clear bigTableCandidates and add the right side to it, which
> +   * means the right side of a right outer join always wins. If we see a
> +   * full outer join, return null immediately (no table can be the big
> +   * table, so a map join cannot be done).
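
The candidate scan described in item 5 can be sketched roughly as follows. 
This is only an illustration of the rule as worded in the comment, not the 
code from the patch: the class name, the JoinType enum and the JoinCond type 
are hypothetical stand-ins, and doing nothing for an inner join when the set 
is already non-empty is an assumption, since the comment only covers the 
empty case.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch; these types are NOT Hive's real classes.
final class BigTableCandidateSketch {

    enum JoinType { INNER, LEFT_OUTER, RIGHT_OUTER, FULL_OUTER }

    /** One join condition between two input positions (illustrative only). */
    static final class JoinCond {
        final int left;
        final int right;
        final JoinType type;

        JoinCond(int left, int right, JoinType type) {
            this.left = left;
            this.right = right;
            this.type = type;
        }
    }

    /**
     * Scans the join conditions from left to right and returns the positions
     * that may serve as the big table, or null if a map join is not possible.
     */
    static Set<Integer> getBigTableCandidates(JoinCond[] conds) {
        Set<Integer> candidates = new LinkedHashSet<>();
        for (JoinCond cond : conds) {
            switch (cond.type) {
                case FULL_OUTER:
                    // No table can be the big table for a full outer join.
                    return null;
                case RIGHT_OUTER:
                    // The right side of a right outer join always wins.
                    candidates.clear();
                    candidates.add(cond.right);
                    break;
                case LEFT_OUTER:
                    // Keep existing candidates; they come from the left side.
                    if (candidates.isEmpty()) {
                        candidates.add(cond.left);
                    }
                    break;
                case INNER:
                    // Assumption: a non-empty set is left unchanged.
                    if (candidates.isEmpty()) {
                        candidates.add(cond.left);
                        candidates.add(cond.right);
                    }
                    break;
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        // Shape: t0 RIGHT OUTER JOIN t1, then t1 JOIN t2.
        JoinCond[] conds = {
            new JoinCond(0, 1, JoinType.RIGHT_OUTER),
            new JoinCond(1, 2, JoinType.INNER),
        };
        System.out.println(getBigTableCandidates(conds)); // prints [1]
    }
}
```

In this sketch only position 1 survives, because the right outer join resets 
the set and the later inner join finds it non-empty.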

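The size bookkeeping from items 1 and 4 might look roughly like the sketch 
below. All names here (AliasSizeSketch, resolveAliasSizes, isViableBigTable, 
the threshold parameter) are hypothetical and are not taken from the patch. 
It also assumes that "all aliases" means the aliases mapped to that path, and 
that a path belongs to the intermediate dir when it starts with that prefix.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the size logic; not Hive's actual resolver code.
final class AliasSizeSketch {

    /**
     * Starts from the sizes known at compile time and adds the size of each
     * intermediate-dir path to every alias that reads from that path.
     */
    static Map<String, Long> resolveAliasSizes(
            Map<String, List<String>> pathToAliases,
            Map<String, Long> aliasToKnownSize,
            Map<String, Long> pathToSize,
            String intermediateDir) {
        Map<String, Long> sizes = new HashMap<>(aliasToKnownSize);
        for (Map.Entry<String, List<String>> entry : pathToAliases.entrySet()) {
            String path = entry.getKey();
            if (!path.startsWith(intermediateDir)) {
                continue; // size already known at compile time
            }
            long size = pathToSize.getOrDefault(path, 0L);
            for (String alias : entry.getValue()) {
                sizes.merge(alias, size, Long::sum);
            }
        }
        return sizes;
    }

    /** A candidate is viable if all the other tables together fit the limit. */
    static boolean isViableBigTable(
            String candidate, Map<String, Long> aliasToSize, long threshold) {
        long others = 0L;
        for (Map.Entry<String, Long> entry : aliasToSize.entrySet()) {
            if (!entry.getKey().equals(candidate)) {
                others += entry.getValue();
            }
        }
        return others <= threshold;
    }

    public static void main(String[] args) {
        Map<String, List<String>> pathToAliases = new HashMap<>();
        pathToAliases.put("/tmp/scratch/stage1", Arrays.asList("a"));
        pathToAliases.put("/warehouse/b", Arrays.asList("b"));

        Map<String, Long> known = new HashMap<>();
        known.put("b", 10_000_000L);

        Map<String, Long> pathToSize = new HashMap<>();
        pathToSize.put("/tmp/scratch/stage1", 5_000_000L);

        Map<String, Long> sizes =
            resolveAliasSizes(pathToAliases, known, pathToSize, "/tmp/scratch");
        System.out.println(isViableBigTable("b", sizes, 6_000_000L)); // prints true
        System.out.println(isViableBigTable("a", sizes, 6_000_000L)); // prints false
    }
}
```

With a 6 MB limit, "b" can be the big table because only "a" (5 MB) has to be 
loaded as the small side, while "a" cannot because "b" (10 MB) would exceed 
the limit.
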
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2095) auto convert map join bug

2011-04-07 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017291#comment-13017291
 ] 

He Yongqiang commented on HIVE-2095:


Uploading a new patch to address namit's comments.

Note: there is an existing bug in Hive that causes the results of 
auto_join29.q to be incorrect. Let's file another JIRA for it.
Basically, if the outer join filter is enabled, the query "SELECT 
/*+mapjoin(src1, src2)*/ * FROM src src1 RIGHT OUTER JOIN src src2 ON (src1.key 
= src2.key AND src1.key < 10 AND src2.key > 10) JOIN src src3 ON (src2.key = 
src3.key AND src3.key < 10) SORT BY src1.key, src1.value, src2.key, src2.value, 
src3.key, src3.value;" will give wrong results in today's Hive.



[jira] [Commented] (HIVE-2095) auto convert map join bug

2011-04-07 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017208#comment-13017208
 ] 

Liyin Tang commented on HIVE-2095:
--

It looks good to me. Thanks, Yongqiang.



[jira] [Commented] (HIVE-2095) auto convert map join bug

2011-04-07 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017064#comment-13017064
 ] 

He Yongqiang commented on HIVE-2095:


https://reviews.apache.org/r/559/



[jira] [Commented] (HIVE-2095) auto convert map join bug

2011-04-07 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017049#comment-13017049
 ] 

Namit Jain commented on HIVE-2095:
--

Can you also create a Review Board request?



[jira] [Commented] (HIVE-2095) auto convert map join bug

2011-04-07 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016871#comment-13016871
 ] 

Liyin Tang commented on HIVE-2095:
--

I will take a look
