Hi Philipp,

The ScanRangeAssignment log entry gets printed by the scheduler in L918, in
PrintAssignment(). For each host and each plan node it shows the scan
ranges that were assigned. per_node_scan_ranges is the per-host part of
that assignment. When inspecting the full logs you should be able to
correlate the two, and they should not differ.

Next, you should be able to see that both scan node children of the join
have scan ranges assigned to them. If that is not the case, the scheduler
might have made a wrong decision (a standalone sketch of that check follows
below).
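
For illustration, here is a minimal, self-contained sketch of that check.
The types are simplified stand-ins (plain strings and ints) rather than
Impala's actual TNetworkAddress / TPlanNodeId / TScanRangeParams, but the
nesting - host, then plan node id, then list of scan ranges - mirrors what
PrintAssignment() logs and what ends up in per_node_scan_ranges:

#include <iostream>
#include <map>
#include <string>
#include <vector>

// Simplified stand-ins for the scheduler's assignment structure:
// host -> plan node id -> assigned scan ranges (here just file names).
using PerNodeScanRanges = std::map<int, std::vector<std::string>>;
using AssignmentByHost = std::map<std::string, PerNodeScanRanges>;

int main() {
  AssignmentByHost assignment = {
      {"vm-cluster-node6:22000",
       {{0, {"164b134d9e26eb2c-..._data.0.parq"}},
        {1, {"ad42accc923aa106-..._data.0.parq"}}}},
  };

  // Both scan node children of the join (node_id=0 and node_id=1) should
  // have at least one range on every host that is expected to join locally.
  for (const auto& host_entry : assignment) {
    for (int node_id : {0, 1}) {
      auto it = host_entry.second.find(node_id);
      const bool ok = it != host_entry.second.end() && !it->second.empty();
      std::cout << host_entry.first << " node_id=" << node_id
                << (ok ? ": OK" : ": NO SCAN RANGES") << std::endl;
    }
  }
  return 0;
}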

Once you get the assignments right, the join node should have data on both
sides, regardless of exchanges.

I hope this helps you track down the issue. It might also help to rename
the log files to make them more readable; currently it's hard to tell what
belongs where.

Cheers, Lars

On Wed, May 23, 2018 at 6:25 AM, Philipp Krause <
philippkrause.m...@googlemail.com> wrote:

> Hi Lars,
> thanks for the clarification. I was looking for the individual
> ScanRangeAssignments (node_id=0 & node_id=1) on each datanode. E.g. for
> datanode 6 the assignment looks as follows:
>
> ScanRangeAssignment: server=TNetworkAddress {
>   01: hostname (string) = "vm-cluster-node6",
>   02: port (i32) = 22000,
> }
> 17:53:33.602  INFO  cc:916
> node_id=0 ranges=TScanRangeParams {
>   01: scan_range (struct) = TScanRange {
>     01: hdfs_file_split (struct) = THdfsFileSplit {
>       01: file_name (string) = "164b134d9e26eb2c-
> 9c0a6dc800000003_1787762321_data.0.parq",
> ...
>
> 17:53:33.602  INFO  cc:916
> node_id=1 ranges=TScanRangeParams {
>   01: scan_range (struct) = TScanRange {
>     01: hdfs_file_split (struct) = THdfsFileSplit {
>       01: file_name (string) = "ad42accc923aa106-
> da67788400000003_857029511_data.0.parq",
> ...
>
>
> This seems correct to me, since both corresponding partitions / parquet
> files are on the same node. Is this right, or am I mistaken here? I guess
> these lines only provide information about which partitions each node
> needs and do not describe the final scan-range-to-node assignment (as I
> first thought)? Is the latter expressed in per_node_scan_ranges?
>
>
> TPlanFragmentInstanceCtx {
>   01: fragment_idx (i32) = 0,
>   02: fragment_instance_id (struct) = TUniqueId {
>     01: hi (i64) = -413574937583451838,
>     02: lo (i64) = 7531803561076719616,
>   },
>   03: per_fragment_instance_idx (i32) = 0,
>   04: per_node_scan_ranges (map) = map<i32,list>[0] {
>   },
>   05: per_exch_num_senders (map) = map<i32,i32>[1] {
>     3 -> 5,
>   },
>   06: sender_id (i32) = -1,
> }
>
> Here, I wonder about the exchange.
>
> fragment_instance_ctx:
> TPlanFragmentInstanceCtx {
>   01: fragment_idx (i32) = 1,
>   02: fragment_instance_id (struct) = TUniqueId {
>     01: hi (i64) = -413574937583451838,
>     02: lo (i64) = 7531803561076719621,
>   },
>   03: per_fragment_instance_idx (i32) = 4,
>   04: per_node_scan_ranges (map) = map<i32,list>[1] {
>     0 -> list<struct>[2] {
>       [0] = TScanRangeParams {
>         01: scan_range (struct) = TScanRange {
>           01: hdfs_file_split (struct) = THdfsFileSplit {
>             01: file_name (string) = "164b134d9e26eb2c-
> 9c0a6dc800000004_1915463945_data.0.parq",
>             ...
>           },
>         },
>         ...
>       },
>       [1] = TScanRangeParams {
>         01: scan_range (struct) = TScanRange {
>           01: hdfs_file_split (struct) = THdfsFileSplit {
>             01: file_name (string) = "164b134d9e26eb2c-
> 9c0a6dc800000004_1023833177_data.0.parq",
>             ...
>           },
>         },
>         ...
>       },
>     },
>   },
>   05: per_exch_num_senders (map) = map<i32,i32>[0] {
>   },
>   06: sender_id (i32) = 4,
> }
>
> Why are only two partitions listed here (partitions 0 and 5, which are on
> datanode 2)? As you already said, the build side is always empty but the
> probe side is always filled. So shouldn't at least one partition per node
> be listed? Could you also clarify the difference between
> ScanRangeAssignments (where, in my opinion, everything looks correct) and
> per_node_scan_ranges? What I don't really get is why the build side is
> empty although the correct partitions are logged in ScanRangeAssignments
> (but missing in per_node_scan_ranges). Thank you very much in advance!
>
> Best regards
> Philipp
>
> On 21.05.2018 at 22:57, Lars Volker wrote:
>
> Hi Philipp,
>
> The distributed profile shows that the HDFS scan on the build side of the
> join does not have any scan ranges assigned to it. You mentioned that you
> "rechecked my scan assignments and they seem to be fine". You should be
> able to see them in the plan using a debugger or some print statements.
> Check my previous email for tips on where to start debugging.
>
> If you search for "per_node_scan_ranges" in the log files, you'll see
> that in the num_nodes=0 case, only one node has scan ranges assigned to it.
> You might want to double check that the scheduler does what you expect in
> that case, possibly by stepping through ComputeScanRangeAssignments.
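>
> (To do that search mechanically, a small standalone helper along the lines
> below can scan an impalad log and flag fragment instances whose
> per_node_scan_ranges map is empty. The "map<i32,list>[0]" pattern is taken
> from the Thrift dumps quoted above; this is only an illustration, not part
> of Impala.)
>
> #include <fstream>
> #include <iostream>
> #include <string>
>
> // Report per_node_scan_ranges occurrences that carry no scan ranges.
> int main(int argc, char** argv) {
>   if (argc != 2) {
>     std::cerr << "usage: " << argv[0] << " <impalad log file>" << std::endl;
>     return 1;
>   }
>   std::ifstream in(argv[1]);
>   std::string line;
>   int line_no = 0, with_ranges = 0, without_ranges = 0;
>   while (std::getline(in, line)) {
>     ++line_no;
>     if (line.find("per_node_scan_ranges") == std::string::npos) continue;
>     if (line.find("map<i32,list>[0]") != std::string::npos) {
>       ++without_ranges;
>       std::cout << "line " << line_no << ": empty per_node_scan_ranges"
>                 << std::endl;
>     } else {
>       ++with_ranges;
>     }
>   }
>   std::cout << with_ranges << " instance(s) with scan ranges, "
>             << without_ranges << " without" << std::endl;
>   return 0;
> }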
>
> Cheers, Lars
>
> On Mon, May 21, 2018 at 12:02 PM, Philipp Krause <
> philippkrause.mail@googlemail.com> wrote:
>
>> Yes, of course - sorry!
>>
>> On 21.05.2018 at 20:52, Lars Volker wrote:
>>
>> I only see the logs, can you attach the profiles, too?
>>
>> Thanks, Lars
>>
>> On Mon, May 21, 2018 at 11:51 AM, Philipp Krause <
>> philippkrause.m...@googlemail.com> wrote:
>>
>>> Hi Lars,
>>> this makes sense! Thanks for the explanation! For a better comparison,
>>> both profiles / logs are attached. I hope you can detect the problem. Thank
>>> you very much for your help!
>>>
>>> Best regards
>>> Philipp
>>>
>>>
>>> On 17.05.2018 at 23:32, Lars Volker wrote:
>>>
>>> Hi Philipp,
>>>
>>> My idea was to debug the query startup, both on the coordinator and on
>>> the executors. These are the same process in your case, but startup still
>>> goes through the ExecQueryFInstances RPC. You'll want to find where the
>>> rows get lost between the scan nodes and the join. Can you attach the
>>> profile and logs for the same query with num_nodes=0?
>>>
>>> Cheers, Lars
>>>
>>> On Wed, May 16, 2018 at 6:52 AM, Philipp Krause <
>>> philippkrause.m...@googlemail.com> wrote:
>>>
>>>> Hi Lars,
>>>>
>>>> thank you very much for your quick response! I disabled runtime filters,
>>>> so one of the scan nodes now passes its rows to the hash join fragment.
>>>> If I additionally set num_nodes=1, the second scan node also receives and
>>>> passes its data to the join fragment, and the query works fine (the split
>>>> stats are not empty - the query profile is attached). I rechecked my scan
>>>> assignments and they seem to be fine. I also created two other
>>>> partitioned tables with only a few rows for testing. Here, all
>>>> blocks/partitions are on the same node (so no manual block movement was
>>>> necessary). Unfortunately the result was the same: one of the scan nodes
>>>> remains empty unless I set num_nodes=1.
>>>> I'm not really sure what exactly to debug in the lines you mentioned and
>>>> what to look for. Should I try to print the schedule object itself via
>>>> schedule_.get()? Maybe you have a presumption of what the problem might
>>>> be? I'll try to proceed with debugging in the meantime. If you need any
>>>> other logs or anything else, please let me know. I'm really eager to get
>>>> this working.
>>>>
>>>> Best regards and thank you very much for your help,
>>>> Philipp
>>>>
>>>> On 14.05.2018 at 18:39, Lars Volker wrote:
>>>>
>>>> Hi Philipp,
>>>>
>>>> Looking at the profile, one of your scan nodes doesn't seem to receive
>>>> any scan ranges ("Hdfs split stats" is empty). The other one receives one
>>>> split, but it gets filtered out by the runtime filter coming from that
>>>> first node ("Files rejected: 1"). You might want to disable runtime filters
>>>> for now until you get it sorted out.
>>>>
>>>> Then you might want to start debugging in 
>>>> be/src/service/client-request-state.cc:466,
>>>> which is where the scheduler gets called. You mentioned that your
>>>> assignments look OK, so up to that point things should be correct. If
>>>> you're uncomfortable poking it all apart with GDB, you can always print
>>>> objects using the methods in debug-util.h. From there, step down into
>>>> coord_->Exec() at L480. Set query option num_nodes=1 to execute everything
>>>> at the coordinator for easier debugging. Otherwise, the coordinator will
>>>> start remote fragments, which you can intercept with a debugger in
>>>> ImpalaInternalService::ExecQueryFInstances
>>>> (be/src/service/impala-internal-service.cc:42).
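>>>>
>>>> (Side note on the "print objects" step: if it's unclear which helper in
>>>> debug-util.h fits, the Thrift C++ library's generic printer works on any
>>>> generated struct. A minimal sketch - DumpThrift is a made-up name, and
>>>> calling it from the ExecQueryFInstances path is only an illustration, not
>>>> existing Impala code:)
>>>>
>>>> #include <iostream>
>>>> #include <thrift/protocol/TDebugProtocol.h>
>>>>
>>>> // Print any Thrift-generated struct (for example the request that
>>>> // arrives at ExecQueryFInstances) while stepping through query startup.
>>>> template <typename ThriftStruct>
>>>> void DumpThrift(const char* label, const ThriftStruct& value) {
>>>>   std::cout << label << ": "
>>>>             << apache::thrift::ThriftDebugString(value) << std::endl;
>>>> }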
>>>>
>>>> Cheers, Lars
>>>>
>>>> On Mon, May 14, 2018 at 1:18 AM, Philipp Krause <
>>>> philippkrause.m...@googlemail.com> wrote:
>>>>
>>>>> Hello Alex,
>>>>>
>>>>> I suppose you're very busy, so I apologize for the interruption. If
>>>>> you have any idea of what I could try to solve this problem, please let me
>>>>> know. Currently I don't know how to progress and I'd appreciate any help
>>>>> you can give me.
>>>>>
>>>>> Best regards
>>>>> Philipp
>>>>>
>>>>>
>>>>> Philipp Krause <philippkrause.m...@googlemail.com> wrote on Mon, May 7,
>>>>> 2018, 12:59:
>>>>>
>>>>>> I just wanted to add that I tried the join with two other, minimal and
>>>>>> "fresh" tables. All blocks from both tables were on the same node, but
>>>>>> I got the same result: no data was processed. To me, the scan range
>>>>>> mapping of my modified version looks the same as in the original one.
>>>>>> I only noticed a difference in the query profile:
>>>>>> Filter 0 (1.00 MB):
>>>>>>              - Files processed: 1 (1)
>>>>>>              - Files rejected: 1 (1)
>>>>>> ...
>>>>>>
>>>>>> This filter only appears in my modified version. Hopefully we can
>>>>>> find the mistake.
>>>>>>
>>>>>> On 04.05.2018 at 15:40, Philipp Krause wrote:
>>>>>>
>>>>>> Hi!
>>>>>>
>>>>>> The query profile and the scan range mappings are attached
>>>>>> (query_profile.txt + scan_ranges.txt). The complete log file is also
>>>>>> attached. The mapping looks fine to me; I couldn't find any mistakes
>>>>>> there. For example, line 168 (scan_ranges.txt) shows that partition
>>>>>> ID=4 is assigned to node_0 and partition ID=10 is assigned to node_1.
>>>>>> Both partitions contain all id=4 rows, which should be correct for the
>>>>>> join. But perhaps I have overlooked something in the log.
>>>>>>
>>>>>> The partition/block setup is as follows:
>>>>>> 6 Nodes (1 Namenode, 5 Datanodes)
>>>>>> Node 1:
>>>>>> Node 2: 0|0 5|5
>>>>>> Node 3: 1|1
>>>>>> Node 4: 2|2
>>>>>> Node 5: 3|3
>>>>>> Node 6: 4|4
>>>>>>
>>>>>> 0|0 means partition_0 from table A and B.
>>>>>>
>>>>>> Also thanks to Lars for the logging option, which I have used!
>>>>>>
>>>>>> Best regards
>>>>>> Philipp
>>>>>>
>>>>>> On 04.05.2018 at 07:10, Lars Volker wrote:
>>>>>>
>>>>>> I haven't followed this thread closely, but you can also print all
>>>>>> scan range assignments made by the scheduler by passing
>>>>>> -vmodule=scheduler=2 as a startup option. The logging happens in
>>>>>> scheduler.cc:612
>>>>>> <https://github.com/apache/impala/blob/master/be/src/scheduling/scheduler.cc#L612>.
>>>>>>
>>>>>>
>>>>>> This wiki page has a way to achieve that using environment variables:
>>>>>> https://cwiki.apache.org/confluence/display/IMPALA/Useful+Tips+for+New+Impala+Developers
>>>>>>
>>>>>> Cheers, Lars
>>>>>>
>>>>>> On Thu, May 3, 2018 at 8:54 PM, Alexander Behm <
>>>>>> alex.b...@cloudera.com> wrote:
>>>>>>
>>>>>>> No idea what's going on, but my guess is something is awry with the
>>>>>>> scan-range assignment. Can you attach the full profile? It's probably
>>>>>>> also good to print the scan ranges created in
>>>>>>> HdfsScanNode.computeScanRangeLocations().
>>>>>>>
>>>>>>> On Thu, May 3, 2018 at 5:51 PM, Philipp Krause <
>>>>>>> philippkrause.m...@googlemail.com> wrote:
>>>>>>>
>>>>>>>> Hello Alex,
>>>>>>>>
>>>>>>>> I have tried out several configurations, but I still couldn't find a
>>>>>>>> solution for my problem :( In the query summary (see attachment) it
>>>>>>>> looks as if no rows are read. Do you have an idea what I have to
>>>>>>>> change? I am sorry for the inconvenience and thank you once more for
>>>>>>>> the great support in getting this working!
>>>>>>>>
>>>>>>>> On 29.04.2018 at 21:21, Philipp Krause wrote:
>>>>>>>>
>>>>>>>> Hi Alex,
>>>>>>>> I got the modified version working on my cluster. The query plan
>>>>>>>> looks exactly as wanted (see attachment). This is awesome!
>>>>>>>> Unfortunately the result set is empty. As you can see in
>>>>>>>> query_state.png, the scan progress always shows 50% although the
>>>>>>>> query has finished.
>>>>>>>>
>>>>>>>> The only modification in the code is the if statement you pointed me
>>>>>>>> to (I set it to true). Maybe I have to give Impala the information
>>>>>>>> about the lhs / rhs join partitions since there are no exchange nodes
>>>>>>>> now (as in the following lines)? The corresponding partitions / blocks
>>>>>>>> of each table are on the same node.
>>>>>>>>
>>>>>>>> I think we are very close to the final result and I hope you can
>>>>>>>> help me once more. Thank you so much!
>>>>>>>>
>>>>>>>> Best regards
>>>>>>>> Philipp
>>>>>>>>
>>>>>>>> On 24.04.2018 at 18:00, Alexander Behm wrote:
>>>>>>>>
>>>>>>>> On Tue, Apr 24, 2018 at 5:31 AM, Philipp Krause <
>>>>>>>> philippkrause.m...@googlemail.com> wrote:
>>>>>>>>
>>>>>>>>> To prevent the broadcast join I could simply use the shuffle
>>>>>>>>> operator in the query:
>>>>>>>>>
>>>>>>>>> SELECT * FROM business_partition_1 INNER JOIN [SHUFFLE]
>>>>>>>>> business_partition_2 WHERE
>>>>>>>>> business_partition_1.businessid=business_partition_2.businessid
>>>>>>>>>
>>>>>>>>
>>>>>>>> Not sure what version of Impala you are using, and whether hints
>>>>>>>> override any changes you might make. I suggest you make the code work
>>>>>>>> as you wish without requiring hints.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> I think the broadcast is currently only used because of my very
>>>>>>>>> small test tables.
>>>>>>>>>
>>>>>>>>> This gives me the plan attached as partitioned_shuffle.png. Since my
>>>>>>>>> modified version isn't working yet, I partitioned both tables on
>>>>>>>>> businessid in Impala. The "hack" should only help to get into the
>>>>>>>>> if-condition if I partition the data manually, right? But in this
>>>>>>>>> case (if the partitioning is done by Impala itself) Impala should get
>>>>>>>>> into the if-condition anyway. Unfortunately I can't see a difference
>>>>>>>>> in the plan compared to my unpartitioned tables (unpartitioned.png)
>>>>>>>>> concerning the exchange nodes. My goal is actually to get rid of all
>>>>>>>>> exchange nodes, since the corresponding data is already present on
>>>>>>>>> each node. Actually the plan should look the same except for the 03
>>>>>>>>> and 04 exchanges then.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I understand the picture and goal. I suggest you read, understand,
>>>>>>>> and modify the code in DistributedPlanner.createHashJoinFragment()
>>>>>>>> to create the plan shape that you want.
>>>>>>>> I don't know how you are producing these plans. Are you sure your
>>>>>>>> code changes are taking effect?
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Are there other changes necessary to just take Table1-partition/block
>>>>>>>>> X and Table2-partition/block Y on each node and join them without any
>>>>>>>>> data exchange? Actually each node should take all its local blocks for
>>>>>>>>> both tables, join them, and pass the results back to the coordinator
>>>>>>>>> where all results come together (please see explanation.png).
>>>>>>>>>
>>>>>>>>
>>>>>>>> No. You just need to create the plan without exchange nodes, as
>>>>>>>> we've already gone through early in this thread.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> I'm looking forward to hearing from you. I hope we can realise this.
>>>>>>>>>
>>>>>>>>> Best regards
>>>>>>>>> Philipp
>>>>>>>>>
>>>>>>>>> On 24.04.2018 at 07:03, Alexander Behm wrote:
>>>>>>>>>
>>>>>>>>> On Mon, Apr 23, 2018 at 7:24 PM, Philipp Krause <
>>>>>>>>> philippkrause.m...@googlemail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Alex,
>>>>>>>>>> thanks for the information! I've compiled the cdh5.13.1-release
>>>>>>>>>> and replaced impala-frontend-0.1-SNAPSHOT.jar (which seems to
>>>>>>>>>> include the changes in the DistributedPlanner.java). There still 
>>>>>>>>>> seems to
>>>>>>>>>> to be a method missing after replacing the jar but I'll try to 
>>>>>>>>>> figure that
>>>>>>>>>> out.
>>>>>>>>>>
>>>>>>>>>> I have two questions concerning the code fragment in the
>>>>>>>>>> DistributedPlanner.java you pointed to me.
>>>>>>>>>>
>>>>>>>>>> First:
>>>>>>>>>> The attached graphic shows the query plan for two tables which are
>>>>>>>>>> partitioned on the join attribute (standard Impala version without
>>>>>>>>>> any changes). If I change the if-condition to true for my modified
>>>>>>>>>> version, I expect to get the same result for my "hand made"
>>>>>>>>>> partitions (outside of Impala). But why is there an exchange
>>>>>>>>>> broadcast (even in the standard version)? I mean, if I have a
>>>>>>>>>> partition of Table 1 with ID=0 on Node X, why is there a broadcast
>>>>>>>>>> of the partition of Table 2 with ID=0 on Node Y? Actually this
>>>>>>>>>> partition only has to be sent to Node X, where the matching
>>>>>>>>>> partition of Table 1 is located (instead of sending it to all nodes
>>>>>>>>>> (broadcast)). Or is this exactly the case and it's only shown as a
>>>>>>>>>> broadcast here in the graphics?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> You are reading the plan correctly. Impala simply does not
>>>>>>>>> implement that optimization.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Second:
>>>>>>>>>> All corresponding blocks of the partitioned tables are on the same
>>>>>>>>>> node (e.g. Table 1 Partition with ID=0, Table 2 Partition with ID=0
>>>>>>>>>> => Node 1, etc.). This is what I did manually. As already mentioned
>>>>>>>>>> before, I want to join these partitions (blocks) locally on each
>>>>>>>>>> node. But if I'm correct, the modification in the DistributedPlanner
>>>>>>>>>> will also only lead to the plan in the attached graphic, so that no
>>>>>>>>>> further exchanges are created if I use my "hand made" partitions.
>>>>>>>>>> But there is still the broadcast exchange which distributes the
>>>>>>>>>> partitions across the cluster, which isn't necessary because all
>>>>>>>>>> needed blocks are already on the same node and are ready to be
>>>>>>>>>> joined locally. Is there a way to realise that and get rid of the
>>>>>>>>>> broadcast exchange?
>>>>>>>>>>
>>>>>>>>>> You are right. That hack in DistributedPlanner only works for
>>>>>>>>> partitioned hash joins. You can probably stitch the plan together at
>>>>>>>>> an earlier place, e.g.:
>>>>>>>>> https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java#L506
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Please correct me if I'm wrong with my assumptions.
>>>>>>>>>>
>>>>>>>>>> Thank you very much!
>>>>>>>>>>
>>>>>>>>>> Best regards
>>>>>>>>>> Philipp
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 18.04.2018 at 07:16, Alexander Behm wrote:
>>>>>>>>>>
>>>>>>>>>> Your CM-managed cluster must be running a "compatible" Impala
>>>>>>>>>> version already for this trick to work. It looks like your catalog
>>>>>>>>>> binary is trying to find a method which does not exist in the .jar,
>>>>>>>>>> presumably because your .jar is built based on a different version
>>>>>>>>>> of Impala where that method does not exist anymore.
>>>>>>>>>>
>>>>>>>>>> It looks like you have CDH 5.13.3 installed. CDH 5.13.3 is based
>>>>>>>>>> on Impala 2.10, see:
>>>>>>>>>> https://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_vd_cdh_package_tarball_513.html#cm_vd_cdh_package_tarball_513
>>>>>>>>>>
>>>>>>>>>> That means this binary copying trick will only work with a
>>>>>>>>>> modified version of Impala 2.10, and very likely will not work with a
>>>>>>>>>> different version.
>>>>>>>>>>
>>>>>>>>>> It's probably easier to test with the mini cluster first.
>>>>>>>>>> Alternatively, it "might" work if you replace all the binaries
>>>>>>>>>> mentioned above, but it's quite possible that will not work.
>>>>>>>>>>
>>>>>>>>>> On Sun, Apr 15, 2018 at 7:13 PM, Philipp Krause <
>>>>>>>>>> philippkrause.m...@googlemail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Alex! Thank you for the list! The build of the modified
>>>>>>>>>>> cdh5-trunk branch (debug mode) was successful. After replacing
>>>>>>>>>>> "impala-frontend-0.1-SNAPSHOT.jar" in
>>>>>>>>>>> /opt/cloudera/parcels/CDH-5.13.1-1.cdh5.13.1.p0.2/jars/ I got
>>>>>>>>>>> the following error in my existing cluster:
>>>>>>>>>>> F0416 01:16:45.402997 17897 catalog.cc:69] NoSuchMethodError:
>>>>>>>>>>> getCatalogObjects
>>>>>>>>>>> When I switch back to the original jar file the error is gone.
>>>>>>>>>>> So something must be wrong with this file, I guess. But I wonder
>>>>>>>>>>> about the error in catalog.cc, because I didn't touch any .cc files.
>>>>>>>>>>>
>>>>>>>>>>> I also replaced "impala-data-source-api-1.0-SNAPSHOT.jar". The
>>>>>>>>>>> other jar files do not exist in my Impala installation (CDH-5.13.1).
>>>>>>>>>>>
>>>>>>>>>>> What am I doing wrong?
>>>>>>>>>>>
>>>>>>>>>>> Best regards
>>>>>>>>>>> Philipp
>>>>>>>>>>>
>>>>>>>>>>> On 13.04.2018 at 20:12, Alexander Behm wrote:
>>>>>>>>>>>
>>>>>>>>>>> Here's the full list. It might not be minimal, but
>>>>>>>>>>> copying/overwriting these should work.
>>>>>>>>>>>
>>>>>>>>>>> debug/service/impalad
>>>>>>>>>>> debug/service/libfesupport.so
>>>>>>>>>>> debug/service/libService.a
>>>>>>>>>>> release/service/impalad
>>>>>>>>>>> release/service/libfesupport.so
>>>>>>>>>>> release/service/libService.a
>>>>>>>>>>> yarn-extras-0.1-SNAPSHOT.jar
>>>>>>>>>>> impala-data-source-api-1.0-SNAPSHOT-sources.jar
>>>>>>>>>>> impala-data-source-api-1.0-SNAPSHOT.jar
>>>>>>>>>>> impala-frontend-0.1-SNAPSHOT-tests.jar
>>>>>>>>>>> impala-frontend-0.1-SNAPSHOT.jar
>>>>>>>>>>> libkudu_client.so.0.1.0
>>>>>>>>>>> libstdc++.so.6.0.20
>>>>>>>>>>> impala-no-sse.bc
>>>>>>>>>>> impala-sse.bc
>>>>>>>>>>> libimpalalzo.so
>>>>>>>>>>>
>>>>>>>>>>> If you are only modifying the Java portion (like
>>>>>>>>>>> DistributedPlanner), then only copying/replacing the *.jar files
>>>>>>>>>>> should be sufficient.
>>>>>>>>>>>
>>>>>>>>>>> ...
>
> [Message clipped]
