Re: Local join instead of data exchange - co-located blocks

2018-05-14 Thread Lars Volker
Hi Philipp,

Looking at the profile, one of your scan nodes doesn't seem to receive any
scan ranges ("Hdfs split stats" is empty). The other one receives one
split, but it gets filtered out by the runtime filter coming from that
first node ("Files rejected: 1"). You might want to disable runtime filters
for now until you get it sorted out.
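Here is a simplified Python model of the situation described above. The runtime filter is built from the rows produced by the first scan node, and the second scan node uses it to skip whole files whose partition value cannot match. If the first node received no scan ranges, the filter is empty and every probe-side file is rejected, which would be consistent with "Files rejected: 1" in the profile. This is a sketch under stated assumptions, not Impala's implementation: the filter is modelled as an exact set of keys (Impala builds bloom filters), each file is assumed to belong to a single partition, and the names `ProbeFile`, `build_runtime_filter` and `prune_files` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeFile:
    """A probe-side data file and the partition key value it holds (hypothetical model)."""
    path: str
    partition_key: int


def build_runtime_filter(build_rows):
    """Collect join-key values from the build-side scan.

    Simplification: an exact set instead of the bloom filter Impala builds.
    """
    return {row["join_key"] for row in build_rows}


def prune_files(files, runtime_filter):
    """Split probe-side files into (kept, rejected) by testing each partition key."""
    kept, rejected = [], []
    for f in files:
        (kept if f.partition_key in runtime_filter else rejected).append(f)
    return kept, rejected


# Build-side scan node received no scan ranges, so it produced no rows.
probe_files = [ProbeFile("id=4/part-0.parq", 4)]  # hypothetical file name
empty_filter = build_runtime_filter([])
kept, rejected = prune_files(probe_files, empty_filter)
print(len(kept), len(rejected))  # 0 1

# Had the build side scanned its co-located partition, the file would be kept.
full_filter = build_runtime_filter([{"join_key": 4}])
kept, rejected = prune_files(probe_files, full_filter)
print(len(kept), len(rejected))  # 1 0
```

Setting the query option RUNTIME_FILTER_MODE=OFF removes this pruning step, so the profile would then show whether the second scan node actually reads its rows.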

Then you might want to start debugging
in be/src/service/client-request-state.cc:466, which is where the scheduler
gets called. You mentioned that your assignments look OK, so up to that point
things should be correct. If you're uncomfortable poking it all apart with
GDB, you can always print objects using the methods in debug-util.h. From
there, step into coord_->Exec() on line 480. Set the query option num_nodes=1 to
execute everything at the coordinator for easier debugging. Otherwise, the
coordinator will start remote fragments, which you can intercept with a
debugger in ImpalaInternalService::ExecQueryFInstances
(be/src/service/impala-internal-service.cc:42).

Cheers, Lars

On Mon, May 14, 2018 at 1:18 AM, Philipp Krause <
philippkrause.m...@googlemail.com> wrote:

> Hello Alex,
>
> I suppose you're very busy, so I apologize for the interruption. If you
> have any idea of what I could try to solve this problem, please let me
> know. Currently I don't know how to progress and I'd appreciate any help
> you can give me.
>
> Best regards
> Philipp
>
>
> Philipp Krause wrote on Mon, 7 May
> 2018, 12:59:
>
>> I just wanted to add that I tried the join with two other minimal,
>> "fresh" tables. All blocks from both tables were on the same node, but I got
>> the same result: no data was processed. To me, the scan range mapping
>> of my modified version looks the same as the original one. I only
>> noticed one difference, in the query profile:
>> Filter 0 (1.00 MB):
>>  - Files processed: 1 (1)
>>  - Files rejected: 1 (1)
>> ...
>>
>> This filter only appears in my modified version. Hopefully we can find
>> the mistake.
>>
>> On 04.05.2018 at 15:40, Philipp Krause wrote:
>>
>> Hi!
>>
>> The query profile and the scan range mappings are attached
>> (query_profile.txt + scan_ranges.txt). The complete log file is also
>> attached. The mapping looks fine to me, I couldn't find any mistakes there.
>> For example, line 168 (scan_ranges.txt) shows that partition ID=4 is
>> assigned to node_0 and partition ID=10 is assigned to node_1. Both
>> partitions contain all id=4 rows, which should be correct for the join. But
>> I have probably overlooked something in the log.
>>
>> The partition/block setup is as follows:
>> 6 Nodes (1 Namenode, 5 Datanodes)
>> Node 1:
>> Node 2: 0|0 5|5
>> Node 3: 1|1
>> Node 4: 2|2
>> Node 5: 3|3
>> Node 6: 4|4
>>
>> 0|0 means partition_0 from table A and B.
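The layout above can be checked mechanically. The Python sketch below encodes each node's blocks as (partition, table) pairs read off the listing, with "0|0" expanded to partition 0 of tables A and B, and reports any partition whose A and B blocks share no node. It is a minimal sketch under assumptions: it considers only the single location listed per block (HDFS replicas are ignored), and the name `misplaced_partitions` is hypothetical. It only confirms the HDFS placement; whether the scheduler then assigns both scan ranges to the same Impala backend is a separate question, which is what the scheduler logging discussed in this thread is for.

```python
from collections import defaultdict

# Block layout from the listing above; "0|0" = partition_0 of table A and of table B.
# Node 1 holds no blocks in the listing. Replicas are not modelled.
LAYOUT = {
    "Node 1": [],
    "Node 2": [(0, "A"), (0, "B"), (5, "A"), (5, "B")],
    "Node 3": [(1, "A"), (1, "B")],
    "Node 4": [(2, "A"), (2, "B")],
    "Node 5": [(3, "A"), (3, "B")],
    "Node 6": [(4, "A"), (4, "B")],
}


def misplaced_partitions(layout):
    """Return partitions whose table-A and table-B blocks are not on a common node."""
    # partition -> table -> set of nodes holding a block of that partition
    hosts = defaultdict(lambda: defaultdict(set))
    for node, blocks in layout.items():
        for partition, table in blocks:
            hosts[partition][table].add(node)
    return sorted(
        p for p, by_table in hosts.items()
        if not (by_table["A"] & by_table["B"])
    )


print(misplaced_partitions(LAYOUT))  # []
```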
>>
>> Also thanks to Lars for the logging option, which I have used!
>>
>> Best regards
>> Philipp
>>
>> On 04.05.2018 at 07:10, Lars Volker wrote:
>>
>> I haven't followed this thread closely, but you can also print all scan
>> range assignments made by the scheduler by passing -vmodule=scheduler=2 as
>> a startup option. The logging happens in scheduler.cc:612.
>>
>> This wiki page has a way to achieve that using environment variables:
>> https://cwiki.apache.org/confluence/display/IMPALA/Useful+Tips+for+New+Impala+Developers
>>
>> Cheers, Lars
>>
>> On Thu, May 3, 2018 at 8:54 PM, Alexander Behm wrote:
>>
>>> No idea what's going on, but my guess is that something is awry with the
>>> scan-range assignment. Can you attach the full profile? It's probably also
>>> a good idea to print the scan ranges created in
>>> HdfsScanNode.computeScanRangeLocations().
>>>
>>> On Thu, May 3, 2018 at 5:51 PM, Philipp Krause <
>>> philippkrause.m...@googlemail.com> wrote:
>>>
>>>> Hello Alex,
>>>>
>>>> I have tried out several configurations, but I still couldn't find a
>>>> solution for my problem :( In the query summary (see attachment) it looks
>>>> as if no rows are read. Do you have an idea what I have to change? I
>>>> am sorry for the trouble and thank you once more for the great
>>>> support in getting this working!
>>>>
>>>> On 29.04.2018 at 21:21, Philipp Krause wrote:
>>>>
>>>> Hi Alex,
>>>> I got the modified version working on my cluster. The query plan looks
>>>> exactly as wanted (see attachment). This is awesome! Unfortunately, the
>>>> result set is empty. As you can see in query_state.png, the scan progress
>>>> always shows 50% although the query has finished.
>>>>
>>>> The only modification in the code is the if statement you pointed me to
>>>> (I set it to true). Maybe I have to give Impala the information about the
>>>> lhs / rhs join partition since there are no exchange nodes now (like in the
>>>> following lines)? The corresponding partitions / blocks of each table are
>>>> on the same node.
>>>>
>>>> I think

Re: Local join instead of data exchange - co-located blocks

2018-05-14 Thread Philipp Krause
Hello Alex,

I suppose you're very busy, so I apologize for the interruption. If you
have any idea of what I could try to solve this problem, please let me
know. Currently I don't know how to progress and I'd appreciate any help
you can give me.

Best regards
Philipp

Philipp Krause wrote on Mon, 7 May
2018, 12:59:

> I just wanted to add that I tried the join with two other minimal,
> "fresh" tables. All blocks from both tables were on the same node, but I got
> the same result: no data was processed. To me, the scan range mapping
> of my modified version looks the same as the original one. I only
> noticed one difference, in the query profile:
> Filter 0 (1.00 MB):
>  - Files processed: 1 (1)
>  - Files rejected: 1 (1)
> ...
>
> This filter only appears in my modified version. Hopefully we can find the
> mistake.
>
> On 04.05.2018 at 15:40, Philipp Krause wrote:
>
> Hi!
>
> The query profile and the scan range mappings are attached
> (query_profile.txt + scan_ranges.txt). The complete log file is also
> attached. The mapping looks fine to me, I couldn't find any mistakes there.
> For example, line 168 (scan_ranges.txt) shows that partition ID=4 is
> assigned to node_0 and partition ID=10 is assigned to node_1. Both
> partitions contain all id=4 rows, which should be correct for the join. But
> I have probably overlooked something in the log.
>
> The partition/block setup is as follows:
> 6 Nodes (1 Namenode, 5 Datanodes)
> Node 1:
> Node 2: 0|0 5|5
> Node 3: 1|1
> Node 4: 2|2
> Node 5: 3|3
> Node 6: 4|4
>
> 0|0 means partition_0 from table A and B.
>
> Also thanks to Lars for the logging option, which I have used!
>
> Best regards
> Philipp
>
> On 04.05.2018 at 07:10, Lars Volker wrote:
>
> I haven't followed this thread closely, but you can also print all scan
> range assignments made by the scheduler by passing -vmodule=scheduler=2 as
> a startup option. The logging happens in scheduler.cc:612.
>
> This wiki page has a way to achieve that using environment variables:
> https://cwiki.apache.org/confluence/display/IMPALA/Useful+Tips+for+New+Impala+Developers
>
> Cheers, Lars
>
> On Thu, May 3, 2018 at 8:54 PM, Alexander Behm wrote:
>
>> No idea what's going on, but my guess is that something is awry with the
>> scan-range assignment. Can you attach the full profile? It's probably also
>> a good idea to print the scan ranges created in
>> HdfsScanNode.computeScanRangeLocations().
>>
>> On Thu, May 3, 2018 at 5:51 PM, Philipp Krause <
>> philippkrause.m...@googlemail.com> wrote:
>>
>>> Hello Alex,
>>>
>>> I have tried out several configurations, but I still couldn't find a
>>> solution for my problem :( In the query summary (see attachment) it looks
>>> as if no rows are read. Do you have an idea what I have to change? I
>>> am sorry for the trouble and thank you once more for the great
>>> support in getting this working!
>>>
>>> On 29.04.2018 at 21:21, Philipp Krause wrote:
>>>
>>> Hi Alex,
>>> I got the modified version working on my cluster. The query plan looks
>>> exactly as wanted (see attachment). This is awesome! Unfortunately, the
>>> result set is empty. As you can see in query_state.png, the scan progress
>>> always shows 50% although the query has finished.
>>>
>>> The only modification in the code is the if statement you pointed me to
>>> (I set it to true). Maybe I have to give Impala the information about the
>>> lhs / rhs join partition since there are no exchange nodes now (like in the
>>> following lines)? The corresponding partitions / blocks of each table are
>>> on the same node.
>>>
>>> I think we are very close to the final result and I hope you can help me
>>> once more. Thank you so much!
>>>
>>> Best regards
>>> Philipp
>>>
>>> On 24.04.2018 at 18:00, Alexander Behm wrote:
>>>
>>> On Tue, Apr 24, 2018 at 5:31 AM, Philipp Krause <
>>> philippkrause.m...@googlemail.com> wrote:
>>>
>>>> To prevent the broadcast join, I could simply use the shuffle hint
>>>> in the query:
>>>>
>>>> SELECT * FROM business_partition_1 INNER JOIN [SHUFFLE]
>>>> business_partition_2 WHERE
>>>> business_partition_1.businessid=business_partition_2.businessid
>>>>
>>>
>>> Not sure what version of Impala you are using, and whether hints
>>> override any changes you might make. I suggest you make the code work as
>>> you wish without requiring hints.
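The idea behind the [SHUFFLE] plan, and behind replacing the exchange with a local join on co-located partitions, can be shown with a small Python model: if both tables are partitioned with the same function of the join key, every matching pair of rows ends up on the same node, so joining each node's local fragments gives the same result as one global join. This is a simplified sketch, not Impala's exchange or join implementation; `partition_of`, `distribute`, `hash_join`, `local_join` and the choice of `key % num_nodes` as the partitioning function are all hypothetical.

```python
from collections import defaultdict


def partition_of(key, num_nodes):
    """Hypothetical partitioning function; both tables must use the same one."""
    return key % num_nodes


def distribute(rows, num_nodes):
    """Assign each (key, payload) row to a node by its join key."""
    nodes = defaultdict(list)
    for key, payload in rows:
        nodes[partition_of(key, num_nodes)].append((key, payload))
    return nodes


def hash_join(left, right):
    """Plain in-memory hash join on the key: build on right, probe with left."""
    table = defaultdict(list)
    for key, payload in right:
        table[key].append(payload)
    return [(key, lp, rp) for key, lp in left for rp in table[key]]


def local_join(left, right, num_nodes):
    """Join co-partitioned inputs node by node, with no data exchange between nodes."""
    left_parts = distribute(left, num_nodes)
    right_parts = distribute(right, num_nodes)
    result = []
    for node in range(num_nodes):
        result.extend(hash_join(left_parts[node], right_parts[node]))
    return result


a = [(1, "a1"), (4, "a4"), (6, "a6")]
b = [(4, "b4"), (6, "b6"), (7, "b7")]
print(sorted(local_join(a, b, num_nodes=5)))  # [(4, 'a4', 'b4'), (6, 'a6', 'b6')]
```

In this model, if partition i of the two tables were placed on different nodes (or partitioned with different functions), the per-node joins would silently miss matches and return fewer rows, rather than raising an error.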
>>>

>>>> I think the broadcast is currently only used because of my very small
>>>> test tables.
>>>>
>>>> This gives me the plan attached as partitioned_shuffle.png. Since my
>>>> modified version isn't working yet, I partitioned both tables on businessid
>>>> in Impala. The "hack" should only help to get into the if-condition if I
>>>> partition the data manually, right? But in this case (if the partitioning
>>>> is done by Impala itself) Impala should get into