Re: Local join instead of data exchange - co-located blocks

Philipp Krause Wed, 14 Mar 2018 00:07:16 -0700

Thank you very much for these information! I'll try to implement these two
steps and post some updates within the next days!


Best regards
Philipp

2018-03-13 5:38 GMT+01:00 Alexander Behm <alex.b...@cloudera.com>:

> Cool that you working on a research project with Impala!
>
> Properly adding such a feature to Impala is a substantial effort, but
> hacking the code for an experiment or two seems doable.
>
> I think you will need to modify two things: (1) the planner to not add
> exchange nodes, and (2) the scheduler to assign the co-located scan ranges
> to the same host.
>
> Here are a few starting points in the code:
>
> 1) DistributedPlanner
> https://github.com/apache/impala/blob/master/fe/src/
> main/java/org/apache/impala/planner/DistributedPlanner.java#L318
>
> The first condition handles the case where no exchange nodes need to be
> added because the join inputs are already suitably partitioned.
> You could hack the code to always go into that codepath, so no exchanges
> are added.
>
> 2) The scheduler
> https://github.com/apache/impala/blob/master/be/src/
> scheduling/scheduler.cc#L226
>
> You'll need to dig through and understand that code so that you can make
> the necessary changes. Change the scan range to host mapping to your
> liking. The rest of the code should just work.
>
> Cheers,
>
> Alex
>
>
> On Mon, Mar 12, 2018 at 6:55 PM, Philipp Krause <philippkrause.mail@
> googlemail.com> wrote:
>
>> Thank you very much for your quick answers!
>> The intention behind this is to improve the execution time and
>> (primarily) to examine the impact of block-co-location (research project)
>> for this particular query (simplified):
>>
>> select A.x, B.y, A.z from tableA as A inner join tableB as B on A.id=B.id
>>
>> The "real" query includes three joins and the data size is in pb-range.
>> Therefore several nodes (5 in the test environment with less data) are used
>> (without any load balancer).
>>
>> Could you give me some hints what code changes are required and which
>> files are affected? I don't know how to give Impala the information that it
>> should only join the local data blocks on each node and then pass it to the
>> "final" node which receives all intermediate results. I hope you can help
>> me to get this working. That would be awesome!
>> Best regards
>> Philipp
>>
>> Am 12.03.2018 um 18:38 schrieb Alexander Behm:
>>
>> I suppose one exception is if your data lives only on a single node. Then
>> you can set num_nodes=1 and make sure to send the query request to the
>> impalad running on the same data node as the target data. Then you should
>> get a local join.
>>
>> On Mon, Mar 12, 2018 at 9:30 AM, Alexander Behm <alex.b...@cloudera.com>
>> wrote:
>>
>>> Such a specific block arrangement is very uncommon for typical Impala
>>> setups, so we don't attempt to recognize and optimize this narrow case. In
>>> particular, such an arrangement tends to be short lived if you have the
>>> HDFS balancer turned on.
>>>
>>> Without making code changes, there is no way today to remove the data
>>> exchanges and make sure that the scheduler assigns scan splits to nodes in
>>> the desired way (co-located, but with possible load imbalance).
>>>
>>> In what way is the current setup unacceptable to you? Is this pre-mature
>>> optimization? If you have certain performance expectations/requirements for
>>> specific queries we might be able to help you improve those. If you want to
>>> pursue this route, please help us by posting complete query profiles.
>>>
>>> Alex
>>>
>>> On Mon, Mar 12, 2018 at 6:29 AM, Philipp Krause <
>>> philippkrause.m...@googlemail.com> wrote:
>>>
>>>> Hello everyone!
>>>>
>>>> In order to prevent network traffic, I'd like to perform local joins on
>>>> each node instead of exchanging the data and perform a join over the
>>>> complete data afterwards. My query is basically a join over three three
>>>> tables on an ID attribute. The blocks are perfectly distributed, so that
>>>> e.g. Table A - Block 0  and Table B - Block 0  are on the same node. These
>>>> blocks contain all data rows with an ID range [0,1]. Table A - Block 1  and
>>>> Table B - Block 1 with an ID range [2,3] are on another node etc. So I want
>>>> to perform a local join per node because any data exchange would be
>>>> unneccessary (except for the last step when the final node recevieves all
>>>> results of the other nodes). Is this possible?
>>>> At the moment the query plan includes multiple data exchanges, although
>>>> the blocks are already perfectly distributed (manually).
>>>> I would be grateful for any help!
>>>>
>>>> Best regards
>>>> Philipp Krause
>>>>
>>>>
>>>
>>
>>
>

Re: Local join instead of data exchange - co-located blocks

Reply via email to