Thank you Mai for your comment.

On Fri, Sep 12, 2014 at 9:26 AM, Thanh Mai <[email protected]> wrote:
> Hi Christian,
>
> I have attached to this email a patch file for a JIRA issue
> (https://issues.apache.org/jira/browse/TAJO-983). You can try
> to apply this patch to your Tajo source base. Although I made this patch for
> a different problem (and the patch is still under review), it might help a
> little bit because this patch reduces the data transmission. Since you are
> joining relatively big tables, there will be many data chunks transferred
> around. As a result, this patch can improve the processing speed. Anyway,
> you can revert to the stable code base anytime if you do not like the patch.
>
> Besides, when you compile and install Tajo, be sure to include the flag
> "-Dhadoop.version=2.4.0". About 2 weeks ago, I also got a never-ending-query
> problem due to a bug in the Pull Server, which seems to be caused by the
> Hadoop native library. With the "-Dhadoop.version=2.4.0" flag, the problem
> disappears.
>
> Sincerely,
>
> Mai Hai Thanh
>
>
> On Fri, Sep 12, 2014 at 6:52 AM, Christian Schwabe
> <[email protected]> wrote:
>>
>> Hi Hyunsik,
>>
>> thank you for your assistance.
>> Sorry, your suggestions were unsuccessful. I tried several parameter
>> sizes and started some test runs. Every time the query ran on endlessly
>> and the nested loop join was selected.
>> Here are the set statements I used:
>>
>> 1) \set OUTER_HASH_JOIN_SIZE_LIMIT 1024
>> 2) \set OUTER_HASH_JOIN_SIZE_LIMIT 4096
>>
>> 3) \set JOIN_TASK_INPUT_SIZE 64
>> 3.1) \set JOIN_PER_SHUFFLE_SIZE 64
>>
>> 4) \set JOIN_TASK_INPUT_SIZE 32
>> 4.1) \set JOIN_PER_SHUFFLE_SIZE 32
>>
>> The biggest table in parquet format is 1GB.
>> If I understand correctly, that may be over-large, but setting
>> OUTER_HASH_JOIN_SIZE_LIMIT to the value 4096 should be sufficient, right?
>>
>> So many problems. I honestly cannot understand or comprehend this. What I'm
>> doing here is nothing extraordinary?!
>>
>> Best regards,
>> a sad Chris
>>
>>
>> On 11.09.2014 at 17:14, Hyunsik Choi <[email protected]> wrote:
>>
>> > Thank you Chris for sharing detailed information. I can reproduce the
>> > problem.
>> >
>> > The plan choice algorithm works as follows:
>> > 1. If the smaller of the two join tables is less than some
>> > threshold (OUTER_HASH_JOIN_SIZE_LIMIT), it chooses the hash left outer
>> > join algorithm.
>> > 2. Otherwise, it chooses the nested loop left outer join algorithm.
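Sketched as code, that two-step selection rule looks roughly like this (a simplified illustration with made-up names, not Tajo's actual planner code):

```python
# Simplified sketch of the outer-join plan choice described above.
# The threshold corresponds to OUTER_HASH_JOIN_SIZE_LIMIT (in megabytes);
# the function and parameter names are illustrative, not Tajo's real API.
def choose_outer_join_algorithm(smallest_table_mb: float,
                                size_limit_mb: float) -> str:
    if smallest_table_mb <= size_limit_mb:
        # the small side fits the in-memory limit: build a hash table on it
        return "hash left outer join"
    # fallback when the small side is too big to hash in memory
    return "nested loop left outer join"
```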
>> >
>> > Your problem is that your smallest table size exceeds
>> > OUTER_HASH_JOIN_SIZE_LIMIT. There are several solutions you can try.
>> >
>> > First of all, you can try to directly increase
>> > OUTER_HASH_JOIN_SIZE_LIMIT. Please use the set command in tsql to
>> > increase the threshold (in megabytes) as follows:
>> >
>> > tajo> \set OUTER_HASH_JOIN_SIZE_LIMIT 256
>> >
>> > Instead of 256 MB, you can use a number appropriate for your machine.
>> > But this solution can fail if the available memory is smaller than
>> > the smallest table.
>> >
>> > The better solution is to decrease JOIN_TASK_INPUT_SIZE and
>> > JOIN_PER_SHUFFLE_SIZE. Their default sizes are 128 MB. If you decrease
>> > them, each join task will be smaller. As a result, the smallest table
>> > will fit within the current hash join threshold. To do so, please type
>> > the set commands as follows:
>> >
>> > tajo> \set JOIN_TASK_INPUT_SIZE 64
>> > tajo> \set JOIN_PER_SHUFFLE_SIZE 64
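The reasoning behind shrinking the task sizes can be sketched with rough arithmetic (an illustrative model only, assuming input is split evenly across tasks; not Tajo's actual scheduler, and the MB figures are made up for the example):

```python
# Rough model: smaller per-task inputs mean more tasks, so each task sees
# a proportionally smaller slice of the smallest table and is more likely
# to stay under the hash-join threshold.
def per_task_small_side_mb(smallest_table_mb, total_input_mb, task_input_mb):
    num_tasks = max(1, total_input_mb // task_input_mb)
    return smallest_table_mb / num_tasks

# with the 128 MB default vs. the suggested 64 MB task input size:
print(per_task_small_side_mb(545, 1024, 128))  # 68.125 MB per task
print(per_task_small_side_mb(545, 1024, 64))   # 34.0625 MB per task
```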
>> >
>> >
>> > Thanks,
>> > Hyunsik
>> >
>> >
>> > On Thu, Sep 11, 2014 at 7:20 PM, Christian Schwabe
>> > <[email protected]> wrote:
>> >> Hi Hyunsik,
>> >>
>> >>
>> >> thanks for your response and for taking on this problem immediately.
>> >> Many thanks.
>> >> Attached the QueryPlan for my problematic Query.
>> >> If you need any further information feel free to ask.
>> >> Step 'eb_1410429199927_0001_000005' is the problematic step.
>> >>
>> >>
>> >> Warm regards,
>> >> Chris
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> On 11.09.2014 at 04:21, Hyunsik Choi <[email protected]> wrote:
>> >>
>> >> Hi Chris,
>> >>
>> >> Could you share the plan of your problematic query? I'm trying to
>> >> reproduce
>> >> your problem, but the current planner of my system does not choose
>> >> nested
>> >> loop join. So, I'd like to see your query plan. The query plan will be
>> >> shown
>> >> in the query detail page in web UI.
>> >>
>> >> Usually, this kind of bug can be solved within a few hours once I can reproduce it.
>> >>
>> >> Best regards,
>> >> Hyunsik
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> On Thu, Sep 11, 2014 at 1:54 AM, Hyunsik Choi <[email protected]>
>> >> wrote:
>> >>>
>> >>> Hi Chris,
>> >>>
>> >>> I was on vacation for the holiday. Tomorrow, after I dig into it
>> >>> more, I'll respond with how long it will take to solve the problem.
>> >>>
>> >>> Best regards,
>> >>> Hyunsik
>> >>>
>> >>> On Wed, Sep 10, 2014 at 8:07 PM, Christian Schwabe
>> >>> <[email protected]> wrote:
>> >>>>
>> >>>> Hello Hyunsik,
>> >>>>
>> >>>> is there any new insight yet? The time by which I need to finish my
>> >>>> thesis is approaching its end, and so far it has not been possible
>> >>>> to use Tajo for the purpose for which it was originally intended. If
>> >>>> this does not change in the near future, before my work on it runs
>> >>>> out, I will have to recommend against taking this software into
>> >>>> production use. That would be bad, because I did not expect this
>> >>>> outcome.
>> >>>>
>> >>>> Best regards,
>> >>>> Chris
>> >>>>
>> >>>>
>> >>>>
>> >>>> On 06.09.2014 at 14:01:41, Hyunsik Choi wrote:
>> >>>>
>> >>>> Thank you for sharing your query. I'll dig into this problem and
>> >>>> respond soon.
>> >>>>
>> >>>> Best regards,
>> >>>> Hyunsik
>> >>>>
>> >>>>
>> >>>> On Fri, Sep 5, 2014 at 11:33 PM, Christian Schwabe
>> >>>> <[email protected]> wrote:
>> >>>>
>> >>>>
>> >>>> Hello Hyunsik,
>> >>>>
>> >>>> thanks for your response.
>> >>>> Here is the anonymized query.
>> >>>> If you need any further information feel free to ask ;)
>> >>>>
>> >>>>>> QUERY:
>> >>>> drop table if exists anonym1;
>> >>>> create table anonym1 using parquet
>> >>>> as
>> >>>> select
>> >>>> table1.a,
>> >>>> table1.b,
>> >>>> table1.c,
>> >>>> case when table1.ag = '1800-01-01' and table3.ah = '' then table3.ai
>> >>>> else
>> >>>> table1.ai end      as ai,
>> >>>> table1.d,
>> >>>> table1.e,
>> >>>> table1.f,
>> >>>> table1.g,
>> >>>> case when table1.ag = '1800-01-01' and table3.ah = ''  then table3.aj
>> >>>> else table1.aj end      as aj,
>> >>>> case when table1.ag = '1800-01-01' and table3.ah = ''  then table3.ak
>> >>>> else table1.ak end      as ak,
>> >>>> table1.h,
>> >>>> table1.i,
>> >>>> table1.j,
>> >>>> table1.k,
>> >>>> table1.l,
>> >>>> table1.m,
>> >>>> table1.n,
>> >>>> table1.o,
>> >>>> table1.p,
>> >>>> table1.q,
>> >>>> table1.r,
>> >>>> table1.s,
>> >>>> table1.t,
>> >>>> table1.u,
>> >>>> case when table1.ag = '1800-01-01' and table3.ah = ''  then table3.ag
>> >>>> else table1.ag end      as ag,
>> >>>> case when table1.ag = '1800-01-01' and table3.ah = ''  then table3.al
>> >>>> else table1.al end      as al,
>> >>>> case when table1.ag = '1800-01-01' and table3.ah = ''  then table3.am
>> >>>> else table1.am end      as am,
>> >>>> case when table1.ag = '1800-01-01' and table3.ah = ''  then table3.an
>> >>>> else table1.an end      as an,
>> >>>> case when table1.ag = '1800-01-01' and table3.ah = ''  then table3.ao
>> >>>> else table1.ao end      as ao,
>> >>>> case when table1.ag = '1800-01-01' and table3.ah = ''  then table3.ap
>> >>>> else table1.ap end      as ap,
>> >>>> table1.v,
>> >>>> table2.w,
>> >>>> table2.x,
>> >>>> table2.y,
>> >>>> table2.z,
>> >>>> table2.aa,
>> >>>> table2.ab,
>> >>>> table2.ac,
>> >>>> table2.ad,
>> >>>> table2.ae,
>> >>>> table2.af
>> >>>> from anonym2 table1
>> >>>> left join dfkkopw_hist table3 on (
>> >>>> table1.a = table3.a
>> >>>> and table1.b = table3.b
>> >>>> and table1.c = table3.c
>> >>>> and table1.aq = table3.aq
>> >>>> )
>> >>>> inner join anonym3 table2 on (
>> >>>> table1.a = table2.a
>> >>>> and table1.b = table2.b
>> >>>> and table1.c = table2.c
>> >>>> )
>> >>>> ;
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> Warm regards,
>> >>>> Chris
>> >>>>
>> >>>> On 05.09.2014 at 13:46:44, Hyunsik Choi wrote:
>> >>>>
>> >>>> Hi Chris,
>> >>>>
>> >>>> That's ok. I'd like to suggest the following approaches; I've used
>> >>>> them before to fix particular user query cases. I expect that you
>> >>>> can do at least one of them.
>> >>>>
>> >>>> - The first is to change the column names and table names in order
>> >>>> to hide your business logic, if possible.
>> >>>> - The second is to share only the join conditions, with the field
>> >>>> names changed.
>> >>>>
>> >>>> Best regards,
>> >>>> Hyunsik
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Fri, Sep 5, 2014 at 3:15 AM, Christian Schwabe
>> >>>> <[email protected]> wrote:
>> >>>>
>> >>>> Hello Hyunsik,
>> >>>>
>> >>>> has the more detailed information been helpful? I could offer you a
>> >>>> TeamViewer session to examine the problem more closely. Would that
>> >>>> be a proposal you could agree to? That way it is at least ensured
>> >>>> that I do not have to publish the queries.
>> >>>> Due to the time difference with Germany, we would only have to find
>> >>>> a suitable date and time.
>> >>>>
>> >>>>
>> >>>> Warm regards,
>> >>>> Chris
>> >>>>
>> >>>>
>> >>>> On 01.09.2014 at 20:26:35, Christian Schwabe wrote:
>> >>>>
>> >>>> Hello Hyunsik,
>> >>>>
>> >>>> the late reply does not matter. I'm glad you support me so much.
>> >>>> That's worth a lot! Unfortunately, I cannot post the query for
>> >>>> privacy reasons. Can I send you other information to identify the
>> >>>> problem in more detail?
>> >>>>
>> >>>> I can try to describe the general query: the projection refers to 43
>> >>>> fields; it is a left join combined with an inner join, and the join
>> >>>> conditions use up to four fields.
>> >>>> The tables are stored in parquet format:
>> >>>> First table (12,853,555 rows, 544.6 MB)
>> >>>> Second table (17,896,966 rows, 361.1 MB)
>> >>>> Third table (18,825,153 rows, 943.2 MB)
>> >>>> Is there a way to force Tajo to use a specific algorithm?
>> >>>>
>> >>>> You talked about giving further advice in case the nested loop join
>> >>>> is the only choice. What would this mean exactly?
>> >>>>
>> >>>> Warm regards,
>> >>>> Chris
>> >>>>
>> >>>> On 01.09.2014 at 19:09:59, Hyunsik Choi wrote:
>> >>>>
>> >>>> Hi Chris,
>> >>>>
>> >>>> Could you share your query? I can see a part of your query from your
>> >>>> screenshot, but it is not the entire one.
>> >>>>
>> >>>> I think that the first problem is that the query planner chooses a
>> >>>> nested loop join, which is a very inefficient join algorithm. In
>> >>>> some cases there is no alternative, e.g. if there is no join
>> >>>> condition or the join condition is a theta join. But I suspect that
>> >>>> it may be a bug, because most of your joins are equi-joins.
>> >>>>
>> >>>> I'll give further advice if block nested-loop join is the only choice.
>> >>>>
>> >>>> Best regards,
>> >>>> Hyunsik
>> >>>>
>> >>>>
>> >>>> On Mon, Sep 1, 2014 at 8:54 PM, Christian Schwabe
>> >>>> <[email protected]> wrote:
>> >>>>
>> >>>>
>> >>>> Hello guys,
>> >>>>
>> >>>> Does anyone have an idea what I could do about the never-ending
>> >>>> "nested loop join"?
>> >>>>
>> >>>>
>> >>>> Best regards,
>> >>>> Chris
>> >>>>
>> >>>> On 27.08.2014 at 10:55:27, Christian Schwabe wrote:
>> >>>>
>> >>>> Hello Jinhang,
>> >>>>
>> >>>> Thank you for your helpful hints. In fact, it makes a difference;
>> >>>> you can see it in the attached screenshot. Now it seems to hang when
>> >>>> merging the two tables. You should definitely include this helpful
>> >>>> hint in your documentation; I have not read about this parameter
>> >>>> anywhere before …
>> >>>>
>> >>>> The log shows the following (partial listing):
>> >>>>
>> >>>>
>> >>>> 2014-08-27 10:13:09,583 INFO org.apache.tajo.worker.Task: * Local
>> >>>> task
>> >>>> dir: file:/tmp/tajo-chris/tmpdir/q_1409126149411_0001/output/5/1_0
>> >>>> 2014-08-27 10:13:09,583 INFO org.apache.tajo.worker.Task:
>> >>>> ==================================
>> >>>> 2014-08-27 10:13:09,584 INFO org.apache.tajo.worker.Task: the
>> >>>> directory
>> >>>> is created
>> >>>>
>> >>>> file:/tmp/tajo-chris/tmpdir/q_1409126149411_0001/in/eb_1409126149411_0001_000005/1/0/eb_1409126149411_0001_000003
>> >>>> 2014-08-27 10:13:09,584 INFO org.apache.tajo.worker.Task: the
>> >>>> directory
>> >>>> is created
>> >>>>
>> >>>> file:/tmp/tajo-chris/tmpdir/q_1409126149411_0001/in/eb_1409126149411_0001_000005/1/0/eb_1409126149411_0001_000004
>> >>>> 2014-08-27 10:13:09,600 INFO
>> >>>> org.apache.tajo.worker.TaskAttemptContext:
>> >>>> Query status of ta_1409126149411_0001_000005_000001_00 is changed to
>> >>>> TA_RUNNING
>> >>>> 2014-08-27 10:13:09,602 INFO org.apache.tajo.worker.Fetcher: Status:
>> >>>> FETCH_FETCHING,
>> >>>>
>> >>>> URI:http://192.168.178.101:52334/?qid=q_1409126149411_0001&sid=4&p=1&type=h&ta=68_0,17_0,50_0,32_0,61_0,25_0,55_0,60_0,47_0,69_0,16_0,38_0,2_0,24_0,33_0,11_0,5_0,41_0,44_0,66_0,52_0,19_0,8_0,30_0,63_0,27_0,21_0,26_0,72_0,67_0,36_0,57_0,18_0,49_0,31_0,13_0,29_0,62_0,10_0,0_0,46_0,65_0,3_0,1_0,54_0,39_0,15_0,37_0,23_0,6_0,64_0,59_0,73_0,51_0,28_0,42_0,34_0,12_0,20_0,9_0,56_0,35_0,45_0,48_0,53_0,4_0,70_0,71_0,58_0,40_0,22_0,7_0,14_0,43_0
>> >>>> 2014-08-27 10:13:09,602 INFO org.apache.tajo.worker.Fetcher: Status:
>> >>>> FETCH_FETCHING,
>> >>>>
>> >>>> URI:http://christians-mbp.fritz.box:52334/?qid=q_1409126149411_0001&sid=3&p=1&type=h&ta=1_0
>> >>>> 2014-08-27 10:13:09,602 INFO
>> >>>> org.apache.tajo.pullserver.TajoPullServerService: Current number of
>> >>>> shuffle
>> >>>> connections (2)
>> >>>> 2014-08-27 10:13:09,602 INFO
>> >>>> org.apache.tajo.pullserver.TajoPullServerService: Current number of
>> >>>> shuffle
>> >>>> connections (3)
>> >>>> 2014-08-27 10:13:09,603 INFO
>> >>>> org.apache.tajo.pullserver.TajoPullServerService: PullServer request
>> >>>> param:
>> >>>> shuffleType=h, sid=3, partId=1, taskIds=[1_0]
>> >>>> 2014-08-27 10:13:09,603 INFO
>> >>>> org.apache.tajo.pullserver.TajoPullServerService: PullServer baseDir:
>> >>>> /tmp/tajo-chris/tmpdir/q_1409126149411_0001/output
>> >>>> 2014-08-27 10:13:09,604 INFO
>> >>>> org.apache.tajo.pullserver.TajoPullServerService: PullServer request
>> >>>> param:
>> >>>> shuffleType=h, sid=4, partId=1,
>> >>>>
>> >>>> taskIds=[68_0,17_0,50_0,32_0,61_0,25_0,55_0,60_0,47_0,69_0,16_0,38_0,2_0,24_0,33_0,11_0,5_0,41_0,44_0,66_0,52_0,19_0,8_0,30_0,63_0,27_0,21_0,26_0,72_0,67_0,36_0,57_0,18_0,49_0,31_0,13_0,29_0,62_0,10_0,0_0,46_0,65_0,3_0,1_0,54_0,39_0,15_0,37_0,23_0,6_0,64_0,59_0,73_0,51_0,28_0,42_0,34_0,12_0,20_0,9_0,56_0,35_0,45_0,48_0,53_0,4_0,70_0,71_0,58_0,40_0,22_0,7_0,14_0,43_0]
>> >>>> 2014-08-27 10:13:09,604 INFO
>> >>>> org.apache.tajo.pullserver.TajoPullServerService: PullServer baseDir:
>> >>>> /tmp/tajo-chris/tmpdir/q_1409126149411_0001/output
>> >>>> 2014-08-27 10:13:15,313 INFO org.apache.tajo.worker.Fetcher: Data
>> >>>> fetch
>> >>>> is done (total received bytes: 991342257)
>> >>>> 2014-08-27 10:13:15,317 INFO org.apache.tajo.worker.Fetcher: Status:
>> >>>> FETCH_FINISHED,
>> >>>>
>> >>>> URI:http://192.168.178.101:52334/?qid=q_1409126149411_0001&sid=4&p=1&type=h&ta=68_0,17_0,50_0,32_0,61_0,25_0,55_0,60_0,47_0,69_0,16_0,38_0,2_0,24_0,33_0,11_0,5_0,41_0,44_0,66_0,52_0,19_0,8_0,30_0,63_0,27_0,21_0,26_0,72_0,67_0,36_0,57_0,18_0,49_0,31_0,13_0,29_0,62_0,10_0,0_0,46_0,65_0,3_0,1_0,54_0,39_0,15_0,37_0,23_0,6_0,64_0,59_0,73_0,51_0,28_0,42_0,34_0,12_0,20_0,9_0,56_0,35_0,45_0,48_0,53_0,4_0,70_0,71_0,58_0,40_0,22_0,7_0,14_0,43_0
>> >>>> 2014-08-27 10:13:18,685 INFO org.apache.tajo.worker.Fetcher: Data
>> >>>> fetch
>> >>>> is done (total received bytes: 2225872857)
>> >>>> 2014-08-27 10:13:18,685 INFO org.apache.tajo.worker.Fetcher: Status:
>> >>>> FETCH_FINISHED,
>> >>>>
>> >>>> URI:http://christians-mbp.fritz.box:52334/?qid=q_1409126149411_0001&sid=3&p=1&type=h&ta=1_0
>> >>>> 2014-08-27 10:13:18,686 INFO org.apache.tajo.worker.Task:
>> >>>> ta_1409126149411_0001_000005_000001_00 All fetches are done!
>> >>>> 2014-08-27 10:13:18,688 INFO
>> >>>> org.apache.tajo.engine.planner.PhysicalPlannerImpl: Left Outer Join
>> >>>> (9)
>> >>>> chooses [Nested Loop Join].
>> >>>>
>> >>>>
>> >>>> What does "Nested Loop Join" mean? The last entry in the above log
>> >>>> was already 40 minutes ago. Nothing has happened since that point.
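For context, a nested loop join compares every row of one input with every row of the other, which is why it can appear to hang on multi-million-row tables. A minimal illustration (a toy sketch, not Tajo's implementation):

```python
# Minimal nested loop equi-join: O(len(left) * len(right)) comparisons.
# With the ~13M x ~18M row tables from this thread, that is on the order
# of 2e14 comparisons, hence the apparent hang.
def nested_loop_join(left, right, key):
    out = []
    for l in left:            # outer loop over every left row
        for r in right:       # inner loop rescans the whole right input
            if l[key] == r[key]:
                out.append((l, r))
    return out

rows_l = [{"a": 1}, {"a": 2}]
rows_r = [{"a": 2}, {"a": 3}]
# nested_loop_join(rows_l, rows_r, "a") -> [({'a': 2}, {'a': 2})]
```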
>> >>>>
>> >>>>
>> >>>> Best regards,
>> >>>> Chris
>> >>>>
>> >>>>
>> >>>> On 26.08.2014 at 08:17, Jinhang Choi <[email protected]> wrote:
>> >>>>
>> >>>> First, you can try changing the resource tracker's heartbeat timeout
>> >>>> before investigating the actual problem. The worker log shows
>> >>>> ongoing fetcher operations even though the acknowledging heartbeat
>> >>>> responses are delayed.
>> >>>>
>> >>>> Besides, the master log indicates the worker's deactivation due to
>> >>>> the LivelinessMonitor's timeout, as follows:
>> >>>>
>> >>>> ====
>> >>>>
>> >>>> 2014-08-25 21:18:47,968 INFO
>> >>>> org.apache.hadoop.yarn.util.AbstractLivelinessMonitor:
>> >>>> Expired:christians-mbp.fritz.box:28093:28091 Timed out after 120 secs
>> >>>>
>> >>>> 2014-08-25 21:18:47,978 INFO org.apache.tajo.master.rm.Worker:
>> >>>> Deactivating Node christians-mbp.fritz.box:28093:28091 as it is now
>> >>>> LOST
>> >>>>
>> >>>> ====
>> >>>>
>> >>>>
>> >>>> How about testing tajo.resource-tracker.heartbeat.timeout-secs in
>> >>>> tajo-site.xml like this?
>> >>>>
>> >>>>
>> >>>>
>> >>>> <configuration>
>> >>>>
>> >>>> ....
>> >>>>
>> >>>>    <property>
>> >>>>
>> >>>>        <name>tajo.resource-tracker.heartbeat.timeout-secs</name>
>> >>>>
>> >>>>        <value>240000</value>  <!-- or your own longer value; the default
>> >>>> is 120*1000 (2 minutes) -->
>> >>>>
>> >>>>    </property>
>> >>>>
>> >>>> </configuration>
>> >>>>
>> >>>>
>> >>>>
>> >>>> Sincerely,
>> >>>>
>> >>>> ----------------------------------------------
>> >>>> Jinhang Choi, CTO.
>> >>>> Linewalks Inc. Seoul 137-860, Korea
>> >>>> Office: +82 70 7702 3043
>> >>>> FAX: +82 2 2055 0612
>> >>>>
>> >>>> -----Original Message-----
>> >>>> From: "Christian Schwabe"<[email protected]>
>> >>>> To: <[email protected]>; "Jinhang Choi"<[email protected]>;
>> >>>> Cc:
>> >>>> Sent: 2014-08-26 (화) 12:09:37
>> >>>> Subject: Re: Big big query
>> >>>>
>> >>>> Hello everyone,
>> >>>>
>> >>>>
>> >>>> I've tested a lot again today and want to share my experiences and
>> >>>> discuss them here. I have again attached a log of the worker for the
>> >>>> query which runs seemingly endlessly.
>> >>>> As you requested, I have assigned 4 GB to the worker; my MacBook has
>> >>>> 8 GB available.
>> >>>> Assigning even more memory is no solution either.
>> >>>> Joins on small tables with window functions and little content seem
>> >>>> to be no problem.
>> >>>> Joins with many columns and large tables, as in this example, seem
>> >>>> to be a real problem. My guess at this point is incorrect memory
>> >>>> management in Tajo.
>> >>>> I have made a video to better show you the WebUI and to give you a
>> >>>> better picture of the situation. In combination with the submitted
>> >>>> logs, I hope we can find a solution together.
>> >>>> Here is the video: http://youtu.be/_TKzRluzg38
>> >>>>
>> >>>>
>> >>>> Best regards,
>> >>>> Chris
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> On 25.08.2014 at 08:52, Jinhang Choi <[email protected]> wrote:
>> >>>>
>> >>>> Dear Christian,
>> >>>>
>> >>>> The worker log indicates "GC overhead limit exceeded."
>> >>>>
>> >>>> Would you mind extending the worker's heap memory size in tajo-env.sh?
>> >>>>
>> >>>> (please refer to
>> >>>>
>> >>>> http://tajo.apache.org/docs/current/configuration/worker_configuration.html)
>> >>>>
>> >>>> Sincerely,
>> >>>> ----------------------------------------------
>> >>>> Jinhang Choi, CTO.
>> >>>> Linewalks Inc. Seoul 137-860, Korea
>> >>>> Office: +82 70 7702 3043
>> >>>> FAX: +82 2 2055 0612
>> >>>>
>> >>>> -----Original Message-----
>> >>>> From: "Christian Schwabe"<[email protected]>
>> >>>> To: <[email protected]>;
>> >>>> Cc:
>> >>>> Sent: 2014-08-25 (월) 15:33:15
>> >>>> Subject: Re: Big big query
>> >>>>
>> >>>>
>> >>>> Hello Hyunsik,
>> >>>>
>> >>>> sorry. My fault. I will send you another email with the attached
>> >>>> logs.
>> >>>>
>> >>>> Best regards,
>> >>>> Chris
>> >>>>
>> >>>> On 25.08.2014 at 08:28:17, Hyunsik Choi wrote:
>> >>>>
>> >>>> Hi Chris,
>> >>>>
>> >>>> As Jihoon mentioned, it would be easier to find the problem if you
>> >>>> attach the master and worker logs.
>> >>>>
>> >>>> Thanks,
>> >>>> hyunsik
>> >>>>
>> >>>>
>> >>>> On Sun, Aug 24, 2014 at 4:17 PM, Christian Schwabe
>> >>>> <[email protected]> wrote:
>> >>>>
>> >>>> Hello guys,
>> >>>>
>> >>>> I started the following query last night and saw this morning that
>> >>>> it was still running, with the percentage display unchanged and only
>> >>>> the elapsed time increasing. So this morning I had to start the
>> >>>> query again to reproduce the error. You can see the result in the
>> >>>> table below:
>> >>>>
>> >>>> Running Queries
>> >>>>
>> >>>> QueryId | Query Master | Started | Progress | Time | Status | sql
>> >>>> q_1408862421753_0001 | christians-mbp.fritz.box | 2014-08-24 08:46:33 |
>> >>>> 45% | 15 mins, 48 sec | QUERY_RUNNING | INSERT INTO TMP_DFKKKO_DFKKOP select
>> >>>> pos.validthru,
>> >>>> pos.mandt, pos.opbel, pos.opupk, pos.opupz, pos.bukrs, pos.augrs,
>> >>>> pos.abwkt,
>> >>>> pos.hvorg, pos.tvorg, pos.kofiz, pos.hkont, pos.mwskz, pos.mwszkz,
>> >>>> pos.xanza, pos.stakz, pos.astkz, pos.opsta, pos.infoz, pos.inkps,
>> >>>> pos.betrh,
>> >>>> pos.studt, ko.fikey, ko.blart, ko.herkf, ko.stbel, ko.storb,
>> >>>> ko.ernam,
>> >>>> ko.cpudt, ko.cputm, ko.bldat, ko.budat from dfkkop_hist pos left join
>> >>>> dfkkopw_hist wdh on ( pos.validthru = wdh.validthru and pos.mandt =
>> >>>> wdh.mandt and pos.opbel = wdh.opbel and pos.whgrp = wdh.whgrp ) inner
>> >>>> join
>> >>>> dfkkko_hist ko on ( pos.validthru = ko.validthru and pos.mandt =
>> >>>> ko.mandt
>> >>>> and pos.opbel = ko.opbel )
>> >>>>
>> >>>> Second Screenshot:
>> >>>>
>> >>>> Running Queries
>> >>>>
>> >>>> QueryId | Query Master | Started | Progress | Time | Status | sql
>> >>>> q_1408862421753_0001 | christians-mbp.fritz.box | 2014-08-24 08:46:33 |
>> >>>> 43% | 23 mins, 21 sec | QUERY_RUNNING | INSERT INTO TMP_DFKKKO_DFKKOP select
>> >>>> pos.validthru,
>> >>>> pos.mandt, pos.opbel, pos.opupk, pos.opupz, pos.bukrs, pos.augrs,
>> >>>> pos.abwkt,
>> >>>> pos.hvorg, pos.tvorg, pos.kofiz, pos.hkont, pos.mwskz, pos.mwszkz,
>> >>>> pos.xanza, pos.stakz, pos.astkz, pos.opsta, pos.infoz, pos.inkps,
>> >>>> pos.betrh,
>> >>>> pos.studt, ko.fikey, ko.blart, ko.herkf, ko.stbel, ko.storb,
>> >>>> ko.ernam,
>> >>>> ko.cpudt, ko.cputm, ko.bldat, ko.budat from dfkkop_hist pos left join
>> >>>> dfkkopw_hist wdh on ( pos.validthru = wdh.validthru and pos.mandt =
>> >>>> wdh.mandt and pos.opbel = wdh.opbel and pos.whgrp = wdh.whgrp ) inner
>> >>>> join
>> >>>> dfkkko_hist ko on ( pos.validthru = ko.validthru and pos.mandt =
>> >>>> ko.mandt
>> >>>> and pos.opbel = ko.opbel )
>> >>>>
>> >>>> Third Screenshot:
>> >>>>
>> >>>>
>> >>>> Finished Queries
>> >>>>
>> >>>> QueryId | Query Master | Started | Finished | Time | Status | sql
>> >>>> q_1408862421753_0001 | christians-mbp.fritz.box | 2014-08-24 08:46:33 |
>> >>>> - | - | QUERY_RUNNING | INSERT INTO TMP_DFKKKO_DFKKOP select pos.validthru,
>> >>>> pos.mandt,
>> >>>> pos.opbel, pos.opupk, pos.opupz, pos.bukrs, pos.augrs, pos.abwkt,
>> >>>> pos.hvorg,
>> >>>> pos.tvorg, pos.kofiz, pos.hkont, pos.mwskz, pos.mwszkz, pos.xanza,
>> >>>> pos.stakz, pos.astkz, pos.opsta, pos.infoz, pos.inkps, pos.betrh,
>> >>>> pos.studt,
>> >>>> ko.fikey, ko.blart, ko.herkf, ko.stbel, ko.storb, ko.ernam, ko.cpudt,
>> >>>> ko.cputm, ko.bldat, ko.budat from dfkkop_hist pos left join
>> >>>> dfkkopw_hist wdh
>> >>>> on ( pos.validthru = wdh.validthru and pos.mandt = wdh.mandt and
>> >>>> pos.opbel =
>> >>>> wdh.opbel and pos.whgrp = wdh.whgrp ) inner join dfkkko_hist ko on (
>> >>>> pos.validthru = ko.validthru and pos.mandt = ko.mandt and pos.opbel =
>> >>>> ko.opbel )
>> >>>>
>> >>>> As you can see, the query is still running, but the display shows no
>> >>>> further progress.
>> >>>>
>> >>>> I attached a log in which you can see the console output. What is
>> >>>> striking here is that the percentage display sometimes jumps and
>> >>>> then stops changing towards the end, remaining constant.
>> >>>> The data sizes of the tables I join are 5.83 GB for dfkkop_hist,
>> >>>> 2.47 GB for dfkkopw_hist and 2.35 GB for dfkkko_hist. As I write
>> >>>> this, the query is still running.
>> >>>> I know these are large amounts of data, but I would have expected
>> >>>> that, colloquially speaking, this would be no problem. Can you
>> >>>> imagine why this problem occurs here?
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> Best regards,
>> >>>> Chris
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>
>> >>
>> >>
>> >>
>>
>
