The "Other" jobs here means those inactive jobs, including completed and
discarded. As time goes on, the number increases. If you found the "Jobs"
tab on web UI loading is very slow, you'd better do a cleanup; The job
engine is doing the same thing, retrieving the jobs from HBase. I'm not
sure whether that is the root cause in your case, because 20 ~ 30 minutes
is too long a time for loading 500 records, but it worth a try.

>From v1.2, run "./bin/metadata clean --delete true" will also drop those
old inactive jobs (old than 30 days I remeber). Before doing that, take a
metadata backup is a must, please check
https://kylin.apache.org/docs/howto/howto_backup_metadata.html

For v1.1, I'm afraid there is no simple way to cleanup the old jobs.

2016-01-25 1:10 GMT+08:00 Prashant Prakash <[email protected]>:

> Hi,
>
> We are running kylin v1.1.1 on our cluster. Over the time we have
> observed degradation in cube building process. By debugging we have
> confirmed that all the Executable, are running fine, the issue lies in
> DefaultScheduler.FetcherRunner. By default FetcherRunner is scheduled to
> run at every 60 sec, but in our case the lag is ~ 20 - 30 min.
>
> [kylin@ec2-184-169-138-213 logs]$ tail -1000f kylin_job.log | grep "Job
> Fetcher"
>                                                               
> [pool-6-thread-1]:[2016-01-24
> *06:12:53*,465][INFO][org.apache.kylin.job.impl.threadpool.DefaultScheduler$FetcherRunner.run(DefaultScheduler.java:112)]
> - Job Fetcher: 0 running, 1 actual running, 4 ready, 533 others
>
> [pool-6-thread-1]:[2016-01-24 
> *06:34:15*,518][INFO][org.apache.kylin.job.impl.threadpool.DefaultScheduler$FetcherRunner.run(DefaultScheduler.java:112)]
> - Job Fetcher: 0 running, 1 actual running, 4 ready, 533 others
>
> [pool-6-thread-1]:[2016-01-24 
> *06:55:46*,133][INFO][org.apache.kylin.job.impl.threadpool.DefaultScheduler$FetcherRunner.run(DefaultScheduler.java:112)]
> - Job Fetcher: 0 running, 1 actual running, 4 ready, 533
> others[pool-6-thread-1]:[2016-01-24
> 07:18:52,040][INFO][org.apache.kylin.job.impl.threadpool.DefaultScheduler$FetcherRunner.run(DefaultScheduler.java:112)]
> - Job Fetcher: 0 running, 1 actual running, 4 ready, 533 others
>
> *1. *One key number in the log is *533 others, *which I suspect mostly
> contains jobs with state "SUCCEED" (Have not verified). The reason behind
> the guess is that we are not deleting the jobs after they get completed.
> There is a method  ExecutableManager.deleteJob but as of now it is being
> called only from Test module.
>
> *2. *Over time we have observed the number in *Others* column grow. From
> thread dump it looks like FetcherRunner most of the time is fetching
> jobOutput.
>
> Because of above the two reason I suspect build of jobs with state "
> SUCCEED"/ DISCARDED" might be the reason behind performance degradation.
>
> Please provide feedback on wether my hypothesis have some merit.
>
>
> *Thread dump:*
>
> "pool-6-thread-1" #819 prio=5 os_prio=0 tid=0x00007fe264591000 nid=0x25c9
> in Object.wait() [0x00007fe237a9d000]
>
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>
>         at java.lang.Object.wait(Native Method)
>
>         at java.lang.Object.wait(Object.java:460)
>
>         at java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:348)
>
>         at
> org.apache.hadoop.hbase.client.ResultBoundedCompletionService.poll(ResultBoundedCompletionService.java:155)
>
>         - locked <0x00000003d8026808> (a
> [Lorg.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture;)
>
>         at
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:168)
>
>         at
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:57)
>
>         at
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
>
>         at
> org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:293)
>
>         at
> org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:393)
>
>         at
> org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:337)
>
>         at
> org.apache.hadoop.hbase.client.AbstractClientScanner$1.hasNext(AbstractClientScanner.java:94)
>
>         at
> org.apache.kylin.common.persistence.HBaseResourceStore.getByScan(HBaseResourceStore.java:283)
>
>         at
> org.apache.kylin.common.persistence.HBaseResourceStore.getResourceImpl(HBaseResourceStore.java:201)
>
>         at
> org.apache.kylin.common.persistence.ResourceStore.getResource(ResourceStore.java:130)
>
>         at
> org.apache.kylin.job.dao.ExecutableDao.readJobOutputResource(ExecutableDao.java:90)
>
>         at
> org.apache.kylin.job.dao.ExecutableDao.getJobOutput(ExecutableDao.java:179)
>
>         at
> org.apache.kylin.job.manager.ExecutableManager.getOutput(ExecutableManager.java:119)
>
>         at org.apache.kylin.job.impl.threadpool.DefaultScheduler$
> FetcherRunner.run(DefaultScheduler.java:93)
>
>         - locked <0x0000000482aa9c98> (a
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$FetcherRunner)
>
>         at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>
>         at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>
>         at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>
>         at java.lang.Thread.run(Thread.java:745)
>
>
> Regards
>
> Prashant
>



-- 
Best regards,

Shaofeng Shi

Reply via email to