As a first step, before investigating the root cause, you could try raising the
resource tracker's heartbeat timeout. The worker log shows ongoing fetcher
operations even though heartbeat acknowledgements are delayed.
Besides, the master log shows the worker being deactivated by the
LivelinessMonitor's timeout, as follows:
====
2014-08-25 21:18:47,968 INFO 
org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: 
Expired:christians-mbp.fritz.box:28093:28091 Timed out after 120 secs
2014-08-25 21:18:47,978 INFO org.apache.tajo.master.rm.Worker: Deactivating 
Node christians-mbp.fritz.box:28093:28091 as it is now LOST
==== 
How about setting tajo.resource-tracker.heartbeat.timeout-secs in tajo-site.xml
like this?
 
<configuration>
....
    <property>
        <name>tajo.resource-tracker.heartbeat.timeout-secs</name>
        <value>240000</value>
        <!-- or your own longer value; the default is 120*1000 ms (2 minutes) -->
    </property>
</configuration>
 
Sincerely,
----------------------------------------------
Jinhang Choi, CTO.
Linewalks Inc. Seoul 137-860, Korea
Office: +82 70 7702 3043
FAX: +82 2 2055 0612
-----Original Message-----
From: "Christian Schwabe"<[email protected]> 
To: <[email protected]>; "Jinhang Choi"<[email protected]>; 
Cc: 
Sent: 2014-08-26 (Tue) 12:09:37
Subject: Re: Big big query

Hello everyone,
 I've done a lot more testing today. I want to share my experiences and discuss
them here. I have again attached a log of the worker for the query that seems
to run endlessly. As you requested, I assigned 4 GB to the worker; my MacBook
has 8 GB available. Assigning more memory is not a solution either. Joins on
small tables with window functions and little content seem to be no problem.
Joins with many columns and large tables, as the table in this example, seem to
be a real problem. My guess at this point is incorrect memory management in
Tajo. I have made a video to better show you the WebUI and give you a better
picture of the situation. In combination with the submitted logs, I hope we can
find a solution here together. Here is the video: http://youtu.be/_TKzRluzg38

Best regards,
Chris


Am 25.08.2014 um 08:52 schrieb Jinhang Choi <[email protected]>:
Dear Christian,
The worker log indicates "GC overhead limit exceeded."
Would you mind extending the worker's heap memory size in tajo-env.sh?
(Please refer to
http://tajo.apache.org/docs/current/configuration/worker_configuration.html)
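For example, a minimal sketch of the relevant tajo-env.sh line; the variable
name and value below are my assumption from that documentation page, so please
verify them against your Tajo version:

```shell
# tajo-env.sh -- worker heap size in MB (example value; variable name assumed
# from the worker configuration docs). Raising this can help avoid
# "GC overhead limit exceeded" during large joins.
export TAJO_WORKER_HEAPSIZE=4000
```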

Sincerely,
----------------------------------------------
Jinhang Choi, CTO.
Linewalks Inc. Seoul 137-860, Korea
Office: +82 70 7702 3043
FAX: +82 2 2055 0612
-----Original Message-----
From: "Christian Schwabe"<[email protected]> 
To: <[email protected]>; 
Cc: 
Sent: 2014-08-25 (Mon) 15:33:15
Subject: Re: Big big query


Hello Hyunsik,
sorry. My fault. I will send you another email with the attached logs.
Best regards,Chris

Am 25.08.2014 08:28:17, schrieb Hyunsik Choi:
Hi Chris,
As Jihoon mentioned, it would be easier to find the problem if you attach the
master and worker logs.
Thanks,
hyunsik


On Sun, Aug 24, 2014 at 4:17 PM, Christian Schwabe 
<[email protected]> wrote:

Hello guys,
I started the following query last night and saw this morning that it was
still running, with the progress percentage unchanged; only the elapsed time
kept increasing. So this morning I started the query again to reproduce the
error. You can see the result in the table below:
First Screenshot -- Running Queries:
  QueryId:      q_1408862421753_0001
  Query Master: christians-mbp.fritz.box
  Started:      2014-08-24 08:46:33
  Progress:     45%
  Time:         15 mins, 48 sec
  Status:       QUERY_RUNNING
  SQL:
    INSERT INTO TMP_DFKKKO_DFKKOP
    select pos.validthru, pos.mandt, pos.opbel, pos.opupk, pos.opupz,
           pos.bukrs, pos.augrs, pos.abwkt, pos.hvorg, pos.tvorg,
           pos.kofiz, pos.hkont, pos.mwskz, pos.mwszkz, pos.xanza,
           pos.stakz, pos.astkz, pos.opsta, pos.infoz, pos.inkps,
           pos.betrh, pos.studt, ko.fikey, ko.blart, ko.herkf,
           ko.stbel, ko.storb, ko.ernam, ko.cpudt, ko.cputm,
           ko.bldat, ko.budat
    from dfkkop_hist pos
    left join dfkkopw_hist wdh
      on ( pos.validthru = wdh.validthru and pos.mandt = wdh.mandt
           and pos.opbel = wdh.opbel and pos.whgrp = wdh.whgrp )
    inner join dfkkko_hist ko
      on ( pos.validthru = ko.validthru and pos.mandt = ko.mandt
           and pos.opbel = ko.opbel )

Second Screenshot -- Running Queries:
  QueryId:      q_1408862421753_0001
  Query Master: christians-mbp.fritz.box
  Started:      2014-08-24 08:46:33
  Progress:     43%
  Time:         23 mins, 21 sec
  Status:       QUERY_RUNNING
  SQL:          (same query as above)

Third Screenshot -- Finished Queries:
  QueryId:      q_1408862421753_0001
  Query Master: christians-mbp.fritz.box
  Started:      2014-08-24 08:46:33
  Finished:     --
  Status:       QUERY_RUNNING
  SQL:          (same query as above)

As you can see, the query is still running, but there is no further progress.
I attached a log in which you can see the console output. What is striking is
that the displayed percentage sometimes jumps, then stops changing toward the
end and remains constant.
The sizes of the tables I am joining are 5.83 GB for dfkkop_hist, 2.47 GB for
dfkkopw_hist, and 2.35 GB for dfkkko_hist. As I write this, the query described
above is still running. I know these are large amounts of data, but I would
have expected them to be, colloquially speaking, no problem. Can you imagine
why this problem occurs here?
 
  Best regards,
Chris
 
 



 

