Re: Big big query

Christian Schwabe Thu, 04 Sep 2014 11:16:18 -0700

Hello Hyunsik,

are the more detailed information been helpful? I could offer you attend a 
TeamViewer session to examine the problem more closely. Would that be a 
proposal on which you would let you in? So at least ensured that I must not 
publish queries.





   Due to the time difference in Germany would have to find only a suitable 
date and time.



Warm regards,
Chris


Am 01.09.2014 20:26:35, schrieb Christian Schwabe:
> Hello>  > Hyunsik> ,
>  
> the late reply>  > does not matter> .>  > I> 'm glad>  > you>  > support>  > 
> me so much> .>  > That's>  > worth a lot> !>  > Unfortunately I can>  > not>  
> > post>  > the>  > query>  > for privacy>  > reasons> .>  > Can I>  > send>  
> > you>  > other>  > information>  to > identify>  > the problem>  > in more 
> detail?>  

> I can try>  > to describe>  > the>  > general>  > query> :>  > The>  > 
> projection>  > refers to>  > 43>  > fields> ,>  > it>  > is a>  > left join>  
> > combined with an>  > inner join> ,>  > the>  > join conditions>  > are 
> made>  > for up to>  > four>  > fields> .>  
> The > tables are saved in > parquet> -format.>  
> First>  > table>  > (> 12,853,555>  > rows>  > 544,6MB> )>  
> Second>  > table>  > (> 17,896,966>  > rows>  > 361,1MB> )>  
> Third>  > table>  > (> 18,825,153>  > rows>  > 943,2MB> )>  
> Is there a way>  to force > Tajo>  > to a specific>  > algorithm> ?
> 
> You've>  > talked about>  > to give>  > another advice> ,>  > in the case>  > 
> of>  > the>  > nested>  > join is>  > the only choice.>  > What would>  > 
> this>  > mean>  > excactly> ?
> 
> Warm regards,
> Chris
> 
> Am 01.09.2014 19:09:59, schrieb Hyunsik Choi:
> > Hi Chris,
> > Could you share your query? I can see a part of your query from your 
> > screenshot, but the query is not entire one. 
> > 
> > I think that the first problem is that the query planner chooses nested 
> > loop join which is very inefficient join algorithm. In some cases, there is 
> > no alternative if there is no join condition or join condition is theta 
> > join. But, I'm suspecting that it may be a bug because most of joins are 
> > equi-joins.
> > > > 

> > I'll give another advice If block nested-loop join is only choice.
> > 
> > Best regards,
> > > > Hyunsik
> > 
> > 
> > On Mon, Sep 1, 2014 at 8:54 PM, Christian Schwabe > > <> > 
> > [email protected]> > >> >  wrote:

> > > Hello guys,> > > 

> > > Does anyone have an idea what I could do against the failure of a "nested 
> > > loop join"?
> > > 

> > > 
> > > Best regards,
> > > > > > Chris
> > > 
> > > Am 27.08.2014 10:55:27, schrieb Christian Schwabe:
> > > > Hello Jinhang,
> > > > Thank you for your helpful hints. In fact, he makes an impact. You can 
> > > > see it on the attached screenshot. Now he seems to hang when merging 
> > > > the two tables. This helpful hint you should definitely include in your 
> > > > documentation. I have previously read this parameter still nowhere …
> > > > > > > > 

> > > > The log show the following (partial list):
> > > > 
> > > > 

> > > > 2014-08-27 10:13:09,583 INFO org.apache.tajo.worker.Task: * Local task 
> > > > dir: file:/tmp/tajo-chris/tmpdir/q_1409126149411_0001/output/5/1_0
> > > > > > > > 
> > > > 2014-08-27 10:13:09,583 INFO org.apache.tajo.worker.Task: 
> > > > ==================================
> > > > > > > > 2014-08-27 10:13:09,584 INFO org.apache.tajo.worker.Task: the 
> > > > > > > > directory is created  
> > > > > > > > file:/tmp/tajo-chris/tmpdir/q_1409126149411_0001/in/eb_1409126149411_0001_000005/1/0/eb_1409126149411_0001_000003
> > > > > > > > 
> > > > 2014-08-27 10:13:09,584 INFO org.apache.tajo.worker.Task: the directory 
> > > > is created  
> > > > file:/tmp/tajo-chris/tmpdir/q_1409126149411_0001/in/eb_1409126149411_0001_000005/1/0/eb_1409126149411_0001_000004
> > > > > > > > 2014-08-27 10:13:09,600 INFO 
> > > > > > > > org.apache.tajo.worker.TaskAttemptContext: Query status of 
> > > > > > > > ta_1409126149411_0001_000005_000001_00 is changed to TA_RUNNING
> > > > > > > > 
> > > > 2014-08-27 10:13:09,602 INFO org.apache.tajo.worker.Fetcher: Status: 
> > > > FETCH_FETCHING, URI:> > > > 
> > > > http://192.168.178.101:52334/?qid=q_1409126149411_0001&sid=4&p=1&type=h&ta=68_0,17_0,50_0,32_0,61_0,25_0,55_0,60_0,47_0,69_0,16_0,38_0,2_0,24_0,33_0,11_0,5_0,41_0,44_0,66_0,52_0,19_0,8_0,30_0,63_0,27_0,21_0,26_0,72_0,67_0,36_0,57_0,18_0,49_0,31_0,13_0,29_0,62_0,10_0,0_0,46_0,65_0,3_0,1_0,54_0,39_0,15_0,37_0,23_0,6_0,64_0,59_0,73_0,51_0,28_0,42_0,34_0,12_0,20_0,9_0,56_0,35_0,45_0,48_0,53_0,4_0,70_0,71_0,58_0,40_0,22_0,7_0,14_0,43_0
> > > > > > > > 
> > > > 2014-08-27 10:13:09,602 INFO org.apache.tajo.worker.Fetcher: Status: 
> > > > FETCH_FETCHING, URI:> > > > 
> > > > http://christians-mbp.fritz.box:52334/?qid=q_1409126149411_0001&sid=3&p=1&type=h&ta=1_0
> > > > > > > > 
> > > > 2014-08-27 10:13:09,602 INFO 
> > > > org.apache.tajo.pullserver.TajoPullServerService: Current number of 
> > > > shuffle connections (2)
> > > > > > > > 2014-08-27 10:13:09,602 INFO 
> > > > > > > > org.apache.tajo.pullserver.TajoPullServerService: Current 
> > > > > > > > number of shuffle connections (3)
> > > > > > > > 
> > > > 2014-08-27 10:13:09,603 INFO 
> > > > org.apache.tajo.pullserver.TajoPullServerService: PullServer request 
> > > > param: shuffleType=h, sid=3, partId=1, taskIds=[1_0]
> > > > > > > > 2014-08-27 10:13:09,603 INFO 
> > > > > > > > org.apache.tajo.pullserver.TajoPullServerService: PullServer 
> > > > > > > > baseDir: /tmp/tajo-chris/tmpdir/q_1409126149411_0001/output
> > > > > > > > 
> > > > 2014-08-27 10:13:09,604 INFO 
> > > > org.apache.tajo.pullserver.TajoPullServerService: PullServer request 
> > > > param: shuffleType=h, sid=4, partId=1, 
> > > > taskIds=[68_0,17_0,50_0,32_0,61_0,25_0,55_0,60_0,47_0,69_0,16_0,38_0,2_0,24_0,33_0,11_0,5_0,41_0,44_0,66_0,52_0,19_0,8_0,30_0,63_0,27_0,21_0,26_0,72_0,67_0,36_0,57_0,18_0,49_0,31_0,13_0,29_0,62_0,10_0,0_0,46_0,65_0,3_0,1_0,54_0,39_0,15_0,37_0,23_0,6_0,64_0,59_0,73_0,51_0,28_0,42_0,34_0,12_0,20_0,9_0,56_0,35_0,45_0,48_0,53_0,4_0,70_0,71_0,58_0,40_0,22_0,7_0,14_0,43_0]
> > > > > > > > 
> > > > 2014-08-27 10:13:09,604 INFO 
> > > > org.apache.tajo.pullserver.TajoPullServerService: PullServer baseDir: 
> > > > /tmp/tajo-chris/tmpdir/q_1409126149411_0001/output
> > > > > > > > 2014-08-27 10:13:15,313 INFO org.apache.tajo.worker.Fetcher: 
> > > > > > > > Data fetch is done (total received bytes: 991342257)
> > > > > > > > 
> > > > 2014-08-27 10:13:15,317 INFO org.apache.tajo.worker.Fetcher: Status: 
> > > > FETCH_FINISHED, URI:> > > > 
> > > > http://192.168.178.101:52334/?qid=q_1409126149411_0001&sid=4&p=1&type=h&ta=68_0,17_0,50_0,32_0,61_0,25_0,55_0,60_0,47_0,69_0,16_0,38_0,2_0,24_0,33_0,11_0,5_0,41_0,44_0,66_0,52_0,19_0,8_0,30_0,63_0,27_0,21_0,26_0,72_0,67_0,36_0,57_0,18_0,49_0,31_0,13_0,29_0,62_0,10_0,0_0,46_0,65_0,3_0,1_0,54_0,39_0,15_0,37_0,23_0,6_0,64_0,59_0,73_0,51_0,28_0,42_0,34_0,12_0,20_0,9_0,56_0,35_0,45_0,48_0,53_0,4_0,70_0,71_0,58_0,40_0,22_0,7_0,14_0,43_0
> > > > > > > > 
> > > > 2014-08-27 10:13:18,685 INFO org.apache.tajo.worker.Fetcher: Data fetch 
> > > > is done (total received bytes: 2225872857)
> > > > > > > > 2014-08-27 10:13:18,685 INFO org.apache.tajo.worker.Fetcher: 
> > > > > > > > Status: FETCH_FINISHED, URI:> > > > 
> > > > > > > > http://christians-mbp.fritz.box:52334/?qid=q_1409126149411_0001&sid=3&p=1&type=h&ta=1_0
> > > > > > > > 
> > > > 2014-08-27 10:13:18,686 INFO org.apache.tajo.worker.Task: 
> > > > ta_1409126149411_0001_000005_000001_00 All fetches are done!
> > > > > > > > 2014-08-27 10:13:18,688 INFO 
> > > > > > > > org.apache.tajo.engine.planner.PhysicalPlannerImpl: Left Outer 
> > > > > > > > Join (9) chooses [Nested Loop Join].
> > > > > > > > 

> > > > 

> > > > What means „Nested Loop Join“? The last entry in the above log is 
> > > > already passed since 40 minutes. Nothing happens since that point. 
> > > > 

> > > > Best regards,
> > > > > > > > 
> > > > Chris
> > > > 

> > > > 
> > > > > > > > 

> > > > Am 26.08.2014 um 08:17 schrieb Jinhang Choi <> > > > 
> > > > [email protected]> > > > >:
> > > > 
> > > > > 
> > > > > At first, you can take a try of changing resource-tracker's heartbeat 
> > > > > timeout before investigating the actual problem> > > > > . > > > > > 
> > > > > worker log shows on-going fetcher operations even though 
> > > > > acknowledging heartbeat responses are delayed.> > > > > 
> > > > > besides, master log indicates Worker's deactivation with 
> > > > > LivelinessMonitor's timeout as follow:> > > > > ====> > > > > 
> > > > > 2014-08-25 21:18:47,968 INFO 
> > > > > org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: 
> > > > > Expired:christians-mbp.fritz.box:28093:28091 Timed out after 120 
> > > > > secs> > > > > 
> > > > > 2014-08-25 21:18:47,978 INFO org.apache.tajo.master.rm.Worker: 
> > > > > Deactivating Node christians-mbp.fritz.box:28093:28091 as it is now 
> > > > > LOST> > > > > ====
> > > > > > > > > >  
> > > > > How about testing tajo.resource-tracker.heartbeat.timeout-secs in 
> > > > > tajo-site.xml like this?> > > > > 
> > > > >  
> > > > > <configuration>> > > > > ....> > > > >     <property>> > > > >        
> > > > >  <name>tajo.resource-tracker.heartbeat.timeout-secs</name>> > > > >   
> > > > >       <value>240000</value>  // or, your own longer value. default is 
> > > > > 120*1000 (2 minutes)> > > > > 
> > > > >     </property>> > > > > </configuration>> > > > >  
> > > > > Sincerely,> > > > > ----------------------------------------------
> > > > > Jinhang Choi, CTO.
> > > > > Linewalks Inc. Seoul 137-860, Korea
> > > > > Office: +82 70 7702 3043
> > > > > 
FAX: +82 2 2055 0612> > > > > -----Original Message-----
> > > > > From:> > > > >  "Christian Schwabe"<> > > > > 
> > > > > [email protected]> > > > > > 
> > > > > To:> > > > >  <> > > > > [email protected]> > > > > >; "Jinhang 
> > > > > Choi"<> > > > > [email protected]> > > > > >; 
> > > > > Cc:> > > > >  
> > > > > Sent:> > > > >  2014-08-26 (화) 12:09:37
> > > > > Subject:> > > > >  Re: Big big query
> > > > > 
> > > > > Hello everyone,> > > > >  
> > > > > > > > > > I've tested a lot again today. I want to share my 
> > > > > > > > > > experiences and discuss it here. I have attached again a 
> > > > > > > > > > log of the Worker for the query which runs seemingly 
> > > > > > > > > > endless. 
> > > > > > > > > > 
> > > > > As requested by you I have assigned to the worker 4GB. 8GB has my 
> > > > > MacBook available.
> > > > > > > > > > Also assign more memory is no solution.
> > > > > > > > > > Joins on small tables with windows_functions an little 
> > > > > > > > > > content seem to be no problem.
> > > > > > > > > > 
> > > > > Joins with many columns and large tables, as it is a table in this 
> > > > > example seems to be a real problem. My guess at this point is an 
> > > > > incorrect memory management of Tajo.
> > > > > 
> > > > > > > > > > I have a video made   by you better show the WebUI and to 
> > > > > > > > > > get a better picture of the situation. In combination with 
> > > > > > > > > > the submitted logs I hope here together to find a solution.
> > > > > > > > > > 
> > > > > Here is the video: > > > > > http://youtu.be/_TKzRluzg38
> > > > > > > > > > 
> > > > >  
> > > > > > > > > >  
> > > > > > > > > > Best regards,
> > > > > > > > > > Chris
> > > > > > > > > >  
> > > > > 
> > > > > 

> > > > > Am 25.08.2014 um 08:52 schrieb Jinhang Choi <> > > > > 
> > > > > [email protected]> > > > > >:
> > > > > > > > > > 

> > > > > > Dear Christian,> > > > > > worker log indicates that "GC overhead 
> > > > > > limit exceeded."> > > > > > would you mind to extend worker's heap 
> > > > > > memory size at tajo-env.sh?> > > > > > 
> > > > > > (please refer to > > > > > > 
> > > > > > http://tajo.apache.org/docs/current/configuration/worker_configuration.html)
> > > > > > 
> > > > > > Sincerely,
> > > > > > 
----------------------------------------------
> > > > > > Jinhang Choi, CTO.
> > > > > > Linewalks Inc. Seoul 137-860, Korea
> > > > > > Office: +82 70 7702 3043
> > > > > > FAX: +82 2 2055 0612> > > > > > 
> > > > > > -----Original Message-----
> > > > > > From:> > > > > >  "Christian Schwabe"<> > > > > > 
> > > > > > [email protected]> > > > > > > 
> > > > > > To:> > > > > >  <> > > > > > [email protected]> > > > > > >; 
> > > > > > Cc:> > > > > >  
> > > > > > Sent:> > > > > >  2014-08-25 (월) 15:33:15
> > > > > > Subject:> > > > > >  Re: Big big query
> > > > > > 
> > > > > > 
> > > > > > Hello Hyunsik,> > > > > > sorry. My fault. I will send you another 
> > > > > > email with the attached logs.
> > > > > > > > > > > > 

> > > > > > Best regards,> > > > > > Chris
> > > > > > 
> > > > > > Am 25.08.2014 08:28:17, schrieb Hyunsik Choi:
> > > > > > > Hi Chris,> > > > > > >  
> > > > > > > > > > > > > > As Jihoon mentioned, it would be better to find the 
> > > > > > > > > > > > > > problems if you attach master and worker logs.
> > > > > > > > > > > > > > 
> > > > > > >  
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > hyunsik
> > > > > > > 
> > > > > > > > > > > > > > 

> > > > > > > 
> > > > > > > On Sun, Aug 24, 2014 at 4:17 PM, Christian Schwabe > > > > > > > 
> > > > > > > <> > > > > > > [email protected]> > > > > > > >> > > > > 
> > > > > > > > >  wrote:
> > > > > > > > 

> > > > > > > > Hello guys> > > > > > > >  
> > > > > > > > > > > > > > > > i started following query last night and this 
> > > > > > > > > > > > > > > > morning have seen that still ran the query with 
> > > > > > > > > > > > > > > > the fact that has the display of procent not 
> > > > > > > > > > > > > > > > changed and only ran on time. So I have to 
> > > > > > > > > > > > > > > > start again this morning reproduce the query to 
> > > > > > > > > > > > > > > > the error. The result you see in the table 
> > > > > > > > > > > > > > > > below:
> > > > > > > > > > > > > > > > 

> > > > > > > > Running Queries> > > > > > > > QueryId> > > > > > > > Query 
> > > > > > > > Master> > > > > > > > 
Started> > > > > > > > 
Progress> > > > > > > > Time> > > > > > > > Status> > > > > > > > 
sql> > > > > > > > 
Kill Query> > > > > > > > q_1408862421753_0001> > > > > > > > 
> > > > > > > > christians-mbp.fritz.box> > > > > > > > 2014-08-24 08:46:33> > 
> > > > > > > > > > > > > > 
45%> > > > > > > > 
15 mins, 48 sec> > > > > > > > QUERY_RUNNING> > > > > > > > INSERT INTO 
TMP_DFKKKO_DFKKOP select pos.validthru, pos.mandt, pos.opbel, pos.opupk, 
pos.opupz, pos.bukrs, pos.augrs, pos.abwkt, pos.hvorg, pos.tvorg, pos.kofiz, 
pos.hkont, pos.mwskz, pos.mwszkz, pos.xanza, pos.stakz, pos.astkz, pos.opsta, 
pos.infoz, pos.inkps, pos.betrh, pos.studt, ko.fikey, ko.blart, ko.herkf, 
ko.stbel, ko.storb, ko.ernam, ko.cpudt, ko.cputm, ko.bldat, ko.budat from 
dfkkop_hist pos left join dfkkopw_hist wdh on ( pos.validthru = wdh.validthru 
and pos.mandt = wdh.mandt and pos.opbel = wdh.opbel and pos.whgrp = wdh.whgrp ) 
inner join dfkkko_hist ko on ( pos.validthru = ko.validthru and pos.mandt = 
ko.mandt and pos.opbel = ko.opbel )> > > > > > > > 

> > > > > > > >  
> > > > > > > > 
> > > > > > > > > > > > > > > > 

Second Screenshot:
> > > > > > > > > > > > > > > > Running Queries> > > > > > > > QueryId> > > > > 
> > > > > > > > > > > > > > > > > > > Query Master> > > > > > > > 
Started> > > > > > > > 
Progress> > > > > > > > Time> > > > > > > > Status> > > > > > > > 
sql> > > > > > > > 
Kill Query> > > > > > > > q_1408862421753_0001> > > > > > > > 
> > > > > > > > christians-mbp.fritz.box> > > > > > > > 2014-08-24 08:46:33> > 
> > > > > > > > > > > > > > 
43%> > > > > > > > 
23 mins, 21 sec> > > > > > > > QUERY_RUNNING> > > > > > > > INSERT INTO 
TMP_DFKKKO_DFKKOP select pos.validthru, pos.mandt, pos.opbel, pos.opupk, 
pos.opupz, pos.bukrs, pos.augrs, pos.abwkt, pos.hvorg, pos.tvorg, pos.kofiz, 
pos.hkont, pos.mwskz, pos.mwszkz, pos.xanza, pos.stakz, pos.astkz, pos.opsta, 
pos.infoz, pos.inkps, pos.betrh, pos.studt, ko.fikey, ko.blart, ko.herkf, 
ko.stbel, ko.storb, ko.ernam, ko.cpudt, ko.cputm, ko.bldat, ko.budat from 
dfkkop_hist pos left join dfkkopw_hist wdh on ( pos.validthru = wdh.validthru 
and pos.mandt = wdh.mandt and pos.opbel = wdh.opbel and pos.whgrp = wdh.whgrp ) 
inner join dfkkko_hist ko on ( pos.validthru = ko.validthru and pos.mandt = 
ko.mandt and pos.opbel = ko.opbel )> > > > > > > > 

> > > > > > > >  
> > > > > > > > > > > > > > > > Third Screenshot:
> > > > > > > > > > > > > > > >  
> > > > > > > > > > > > > > > > Finished Queries> > > > > > > > QueryId> > > > 
> > > > > > > > > > > > > > > > > > > > Query Master> > > > > > > > 
Started> > > > > > > > 
Finished> > > > > > > > Time> > > > > > > > Status> > > > > > > > 
sql> > > > > > > > 
> > > > > > > > q_1408862421753_0001> > > > > > > > christians-mbp.fritz.box> > 
> > > > > > > > > > > > > > 
> > > > > > > > 2014-08-24 08:46:33> > > > > > > > -> > > > > > > > -> > > > > 
> > > > > > > > > > > 
> > > > > > > > 
QUERY_RUNNING> > > > > > > > INSERT INTO TMP_DFKKKO_DFKKOP select 
pos.validthru, pos.mandt, pos.opbel, pos.opupk, pos.opupz, pos.bukrs, 
pos.augrs, pos.abwkt, pos.hvorg, pos.tvorg, pos.kofiz, pos.hkont, pos.mwskz, 
pos.mwszkz, pos.xanza, pos.stakz, pos.astkz, pos.opsta, pos.infoz, pos.inkps, 
pos.betrh, pos.studt, ko.fikey, ko.blart, ko.herkf, ko.stbel, ko.storb, 
ko.ernam, ko.cpudt, ko.cputm, ko.bldat, ko.budat from dfkkop_hist pos left join 
dfkkopw_hist wdh on ( pos.validthru = wdh.validthru and pos.mandt = wdh.mandt 
and pos.opbel = wdh.opbel and pos.whgrp = wdh.whgrp ) inner join dfkkko_hist ko 
on ( pos.validthru = ko.validthru and pos.mandt = ko.mandt and pos.opbel = 
ko.opbel )> > > > > > > > 
> > > > > > > >  
> > > > > > > > 
> > > > > > > > > > > > > > > > As you can see, the query is still running but 
> > > > > > > > > > > > > > > > there is no image data and progress more. 
> > > > > > > > > > > > > > > >  
> > > > > > > > > > > > > > > > I attached a log in which you can see the 
> > > > > > > > > > > > > > > > output in the console. Striking here is the 
> > > > > > > > > > > > > > > > display of the percentage jumps sometimes and 
> > > > > > > > > > > > > > > > not further altered towards the end and remains 
> > > > > > > > > > > > > > > > constant.
> > > > > > > > > > > > > > > > 

> > > > > > > > The data size of the tables to which I JOINS by lead is for 
> > > > > > > > dfkkop_hist 5,83GB, dfkkopw_hist 2,47GB and dfkkko_hist 2.35 
> > > > > > > > GB. As I write this, the description of the query is still 
> > > > > > > > running.
> > > > > > > > > > > > > > > > I know these are large amounts of data, but I 
> > > > > > > > > > > > > > > > would have expected which thus constitutes the 
> > > > > > > > > > > > > > > > colloquial no problem. Can you imagine why it 
> > > > > > > > > > > > > > > > comes here to this problem?
> > > > > > > > > > > > > > > > 

> > > > > > > >  
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > >  
> > > > > > > > > > > > > > > >  
> > > > > > > > > > > > > > > > Best regards,
> > > > > > > > > > > > > > > > 
> > > > > > > > Chris
> > > > > > > > 
> > > > > > > > 
> > > > > > >  > > > > > > > 
> > > > > >  > > > > > > 



> > > > > > > > > > > > 

> > > > >  
> > > > > 



> > > > 
> > > > 
> > > 


> > > >

Re: Big big query

Reply via email to