[jira] [Commented] (HIVE-20254) CheckNonCombinablePathCallable is buggy

2018-08-13 Thread Qinghui Xu (JIRA)


[ https://issues.apache.org/jira/browse/HIVE-20254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578003#comment-16578003 ]

Qinghui Xu commented on HIVE-20254:
---

I think we should backport this fix to version 1.1.0.

> CheckNonCombinablePathCallable is buggy
> ---
>
> Key: HIVE-20254
> URL: https://issues.apache.org/jira/browse/HIVE-20254
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Qinghui Xu
>Priority: Major
>
> CombineHiveInputFormat lets users avoid combining some parts of their inputs
> (by implementing AvoidSplitCombination).
> We spotted a problem with this when our query reads a lot of partitions
> (more than 100). When there are more than 100 input paths, the combinability
> check runs in parallel:
>  * the input path array is divided into chunks of at most 100 paths each;
>  * each chunk is submitted to a CheckNonCombinablePathCallable;
>  * each CheckNonCombinablePathCallable returns the set of indices of the
> paths that must not be combined.
> The problem is that CheckNonCombinablePathCallable returns relative indices
> (positions inside the chunk) instead of absolute indices (positions in the
> whole input path array). The returned indices are therefore always smaller
> than 100, so paths at index 100 or above are never taken into account when
> deciding which inputs to avoid combining.
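
A minimal Java sketch of the chunked check described above, to make the relative-vs-absolute index bug concrete. All names here (NonCombinableCheckSketch, CheckChunk, findNonCombinable, CHUNK_SIZE) are hypothetical; plain String paths and a Predicate<String> stand in for Hadoop Path objects and the per-path AvoidSplitCombination check. This is not Hive's CheckNonCombinablePathCallable, and the actual HIVE-13968 patch may differ; the `absolute` flag only toggles between the reported behaviour and an offset-by-chunk-start variant.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

public class NonCombinableCheckSketch {
    // Hypothetical chunk size, matching the "at most 100 paths" per chunk in the report.
    static final int CHUNK_SIZE = 100;

    // Checks paths[start .. start + length) and returns the indices of paths that must not be combined.
    static class CheckChunk implements Callable<Set<Integer>> {
        private final String[] paths;
        private final int start;
        private final int length;
        private final Predicate<String> shouldSkipCombine; // hypothetical stand-in for the AvoidSplitCombination check
        private final boolean absolute;                    // false reproduces the reported bug

        CheckChunk(String[] paths, int start, int length,
                   Predicate<String> shouldSkipCombine, boolean absolute) {
            this.paths = paths;
            this.start = start;
            this.length = length;
            this.shouldSkipCombine = shouldSkipCombine;
            this.absolute = absolute;
        }

        @Override
        public Set<Integer> call() {
            Set<Integer> result = new TreeSet<>();
            for (int i = 0; i < length; i++) {
                if (shouldSkipCombine.test(paths[start + i])) {
                    // Buggy: i is the position inside the chunk (always < CHUNK_SIZE).
                    // Fixed: start + i is the position in the whole input path array.
                    result.add(absolute ? start + i : i);
                }
            }
            return result;
        }
    }

    // Splits the paths into chunks, checks the chunks in parallel and merges the results.
    static Set<Integer> findNonCombinable(String[] paths, Predicate<String> skip, boolean absolute) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Set<Integer>>> futures = new ArrayList<>();
            for (int start = 0; start < paths.length; start += CHUNK_SIZE) {
                int length = Math.min(CHUNK_SIZE, paths.length - start);
                futures.add(pool.submit(new CheckChunk(paths, start, length, skip, absolute)));
            }
            Set<Integer> all = new TreeSet<>();
            for (Future<Set<Integer>> f : futures) {
                all.addAll(f.get());
            }
            return all;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while checking paths", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("path check failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // 250 hypothetical partition paths: three chunks of sizes 100, 100 and 50.
        String[] paths = new String[250];
        for (int i = 0; i < paths.length; i++) {
            paths[i] = "/warehouse/t/part=" + i;
        }
        // Hypothetical rule: partitions 5, 150 and 220 must not be combined.
        Predicate<String> skip =
            p -> p.endsWith("=5") || p.endsWith("=150") || p.endsWith("=220");
        System.out.println("relative: " + findNonCombinable(paths, skip, false));
        System.out.println("absolute: " + findNonCombinable(paths, skip, true));
    }
}
```

Running main prints `relative: [5, 20, 50]` and `absolute: [5, 150, 220]`. In this sketch the relative variant not only misses paths 150 and 220 but also flags the unrelated paths at indices 20 and 50, whereas offsetting each result by the chunk's start index recovers the correct positions.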



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20254) CheckNonCombinablePathCallable is buggy

2018-07-30 Thread Szehon Ho (JIRA)


[ https://issues.apache.org/jira/browse/HIVE-20254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562715#comment-16562715 ]

Szehon Ho commented on HIVE-20254:
--

This looks like it was resolved by HIVE-13968.

