Given that the exception actually happens in Hadoop code:

> java.lang.NullPointerException
>     at org.apache.hadoop.mapreduce.Job.getTrackingURL(Job.java:380)
And since you had built cubes successfully before, you might want to check recent changes to your Hadoop environment; it seems to be broken somewhere.

On Fri, Jul 14, 2017 at 11:07 AM, crossme <cros...@aliyun.com> wrote:

> >>>>> Log output from the third step:
>
> Counters: 53
>   File System Counters
>     FILE: Number of bytes read=326082918
>     FILE: Number of bytes written=639475115
>     FILE: Number of read operations=0
>     FILE: Number of large read operations=0
>     FILE: Number of write operations=0
>     HDFS: Number of bytes read=375767996
>     HDFS: Number of bytes written=154906
>     HDFS: Number of read operations=48
>     HDFS: Number of large read operations=0
>     HDFS: Number of write operations=8
>   Job Counters
>     Failed reduce tasks=7
>     Killed reduce tasks=4
>     Launched map tasks=9
>     Launched reduce tasks=15
>     Data-local map tasks=7
>     Rack-local map tasks=2
>     Total time spent by all maps in occupied slots (ms)=554536
>     Total time spent by all reduces in occupied slots (ms)=1035019
>     Total time spent by all map tasks (ms)=554536
>     Total time spent by all reduce tasks (ms)=1035019
>     Total vcore-seconds taken by all map tasks=554536
>     Total vcore-seconds taken by all reduce tasks=1035019
>     Total megabyte-seconds taken by all map tasks=567844864
>     Total megabyte-seconds taken by all reduce tasks=1059859456
>   Map-Reduce Framework
>     Map input records=8758042
>     Map output records=70064417
>     Map output bytes=1547698142
>     Map output materialized bytes=310833429
>     Input split bytes=96597
>     Combine input records=70064417
>     Combine output records=42960200
>     Reduce input groups=289020
>     Reduce shuffle bytes=8450082
>     Reduce input records=1652047
>     Reduce output records=0
>     Spilled Records=87572447
>     Shuffled Maps =36
>     Failed Shuffles=0
>     Merged Map outputs=36
>     GC time elapsed (ms)=6372
>     CPU time spent (ms)=677610
>     Physical memory (bytes) snapshot=8539713536
>     Virtual memory (bytes) snapshot=36823269376
>     Total committed heap usage (bytes)=8535932928
>   Shuffle Errors
>     BAD_ID=0
>     CONNECTION=0
>     IO_ERROR=0
>     WRONG_LENGTH=0
>     WRONG_MAP=0
>     WRONG_REDUCE=0
>   File Input Format Counters
>     Bytes Read=0
>   File Output Format Counters
>     Bytes Written=0
>   org.apache.kylin.engine.mr.steps.FactDistinctColumnsMapper$RawDataCounter
>     BYTES=1721507003
>
> >>>>> The following errors are from kylin.log; there are no errors in the Hadoop log files:
>
> 2017-07-14 10:40:10,427 INFO [pool-9-thread-1] threadpool.DefaultScheduler:124 : Job Fetcher: 1 should running, 1 actual running, 0 stopped, 0 ready, 10 already succeed, 8 error, 6 discarded, 0 others
> 2017-07-14 10:40:12,442 DEBUG [Job 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-297] dao.ExecutableDao:217 : updating job output, id: 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-02
> 2017-07-14 10:40:22,450 DEBUG [Job 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-297] dao.ExecutableDao:217 : updating job output, id: 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-02
> 2017-07-14 10:40:32,457 DEBUG [Job 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-297] dao.ExecutableDao:217 : updating job output, id: 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-02
> 2017-07-14 10:40:42,467 DEBUG [Job 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-297] dao.ExecutableDao:217 : updating job output, id: 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-02
> 2017-07-14 10:40:52,478 DEBUG [Job 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-297] dao.ExecutableDao:217 : updating job output, id: 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-02
> 2017-07-14 10:41:02,493 DEBUG [Job 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-297] dao.ExecutableDao:217 : updating job output, id: 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-02
> 2017-07-14 10:41:10,430 INFO [pool-9-thread-1] threadpool.DefaultScheduler:124 : Job Fetcher: 1 should running, 1 actual running, 0 stopped, 0 ready, 10 already succeed, 8 error, 6 discarded, 0 others
> 2017-07-14 10:41:12,518 DEBUG [Job 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-297] dao.ExecutableDao:217 : updating job output, id: 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-02
> 2017-07-14 10:41:22,527 DEBUG [Job 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-297] dao.ExecutableDao:217 : updating job output, id: 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-02
> 2017-07-14 10:41:32,535 DEBUG [Job 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-297] dao.ExecutableDao:217 : updating job output, id: 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-02
> 2017-07-14 10:41:42,544 DEBUG [Job 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-297] dao.ExecutableDao:217 : updating job output, id: 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-02
> 2017-07-14 10:41:53,548 INFO [Job 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-297] ipc.Client:867 : Retrying connect to server: dn1/10.50.229.209:51098. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
> 2017-07-14 10:41:54,549 INFO [Job 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-297] ipc.Client:867 : Retrying connect to server: dn1/10.50.229.209:51098. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
> 2017-07-14 10:41:55,549 INFO [Job 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-297] ipc.Client:867 : Retrying connect to server: dn1/10.50.229.209:51098. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
> 2017-07-14 10:41:55,663 INFO [Job 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-297] mapred.ClientServiceDelegate:277 : Application state is completed. FinalApplicationStatus=FAILED. Redirecting to job history server
> 2017-07-14 10:41:55,686 ERROR [Job 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-297] common.HadoopJobStatusChecker:58 : error check status
> java.io.IOException: Job status not available
>     at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:334)
>     at org.apache.hadoop.mapreduce.Job.getStatus(Job.java:341)
>     at org.apache.kylin.engine.mr.common.HadoopJobStatusChecker.checkStatus(HadoopJobStatusChecker.java:38)
>     at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:153)
>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:124)
>     at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:64)
>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:124)
>     at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:142)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:748)
> 2017-07-14 10:41:55,687 ERROR [Job 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-297] common.MapReduceExecutable:197 : error execute MapReduceExecutable{id=14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-02, name=Extract Fact Table Distinct Columns, state=RUNNING}
> java.lang.NullPointerException
>     at org.apache.hadoop.mapreduce.Job.getTrackingURL(Job.java:380)
>     at org.apache.kylin.engine.mr.common.HadoopCmdOutput.getInfo(HadoopCmdOutput.java:61)
>     at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:162)
>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:124)
>     at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:64)
>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:124)
>     at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:142)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:748)
> 2017-07-14 10:41:55,687 DEBUG [Job 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-297] dao.ExecutableDao:217 : updating job output, id: 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-02
> 2017-07-14 10:41:55,697 DEBUG [Job 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-297] dao.ExecutableDao:217 : updating job output, id: 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-02
> 2017-07-14 10:41:55,703 INFO [Job 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-297] execution.ExecutableManager:389 : job id:14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-02 from RUNNING to ERROR
> 2017-07-14 10:41:55,713 DEBUG [Job 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-297] dao.ExecutableDao:217 : updating job output, id: 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0
> 2017-07-14 10:41:55,728 DEBUG [Job 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-297] dao.ExecutableDao:217 : updating job output, id: 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0
> 2017-07-14 10:41:55,731 DEBUG [Job 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-297] dao.ExecutableDao:217 : updating job output, id: 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0
> 2017-07-14 10:41:55,734 INFO [Job 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-297] execution.ExecutableManager:389 : job id:14691c4a-64d2-4b1d-ace5-d2d6ad9618d0 from RUNNING to ERROR
> 2017-07-14 10:41:55,734 WARN [Job 14691c4a-64d2-4b1d-ace5-d2d6ad9618d0-297] execution.AbstractExecutable:258 : no need to send email, user list is empty
> 2017-07-14 10:41:55,745 INFO [pool-9-thread-1] threadpool.DefaultScheduler:124 : Job Fetcher: 0 should running, 0 actual running, 0 stopped, 0 ready, 10 already succeed, 9 error, 6 discarded, 0 others
> 2017-07-14 10:42:10,431 INFO [pool-9-thread-1] threadpool.DefaultScheduler:124 : Job Fetcher: 0 should running, 0 actual running, 0 stopped, 0 ready, 10 already succeed, 9 error, 6 discarded, 0 others
> 2017-07-14 10:43:10,432 INFO [pool-9-thread-1] threadpool.DefaultScheduler:124 : Job Fetcher: 0 should running, 0 actual running, 0 stopped, 0 ready, 10 already succeed, 9 error, 6 discarded, 0 others
> 2017-07-14 10:44:10,431 INFO [pool-9-thread-1] threadpool.DefaultScheduler:124 : Job Fetcher: 0 should running, 0 actual running, 0 stopped, 0 ready, 10 already succeed, 9 error, 6 discarded, 0 others
>
> ------------------------------------------------------------------
> From: crossme <cros...@aliyun.com>
> Sent: Thursday, July 13, 2017, 19:51
> To: user <user@kylin.apache.org>
> Subject: common.HadoopJobStatusChecker:58 : error check status
>
> Hi All,
>
> The cube build fails at Step 3, Extract Fact Table Distinct Columns. The error message is below; any help would be appreciated.
>
> Note: I created four test cubes. Only one of them ran through every step, was built successfully, and can be queried. The other cubes fail at the third step with the error log below; the job status cannot be obtained.
>
> Production environment: CDH 5.9, Kylin 2.0
>
> 2017-07-13 14:16:39,835 INFO [Job 3895f42c-8ee4-4eee-a0fc-9b511f9c0be4-437] mapred.ClientServiceDelegate:277 : Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
> 2017-07-13 14:16:39,849 ERROR [Job 3895f42c-8ee4-4eee-a0fc-9b511f9c0be4-437] common.HadoopJobStatusChecker:58 : error check status
> java.io.IOException: Job status not available
>     at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:334)
>     at org.apache.hadoop.mapreduce.Job.getStatus(Job.java:341)
>     at org.apache.kylin.engine.mr.common.HadoopJobStatusChecker.checkStatus(HadoopJobStatusChecker.java:38)
>     at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:153)
>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:124)
>     at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:64)
>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:124)
>     at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:142)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:748)
> 2017-07-13 14:16:39,850 ERROR [Job 3895f42c-8ee4-4eee-a0fc-9b511f9c0be4-437] common.MapReduceExecutable:197 : error execute MapReduceExecutable{id=3895f42c-8ee4-4eee-a0fc-9b511f9c0be4-07, name=Convert Cuboid Data to HFile, state=RUNNING}
> java.lang.NullPointerException
>     at org.apache.hadoop.mapreduce.Job.getTrackingURL(Job.java:380)
>     at org.apache.kylin.engine.mr.common.HadoopCmdOutput.getInfo(HadoopCmdOutput.java:61)
>     at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:162)
>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:124)
>     at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:64)
>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:124)
>     at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:142)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:748)
>
> yarn-site.xml:
>
> <property>
>   <name>yarn.resourcemanager.webapp.address</name>
>   <value>dn1:8088</value>
> </property>
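A note on the config above: the yarn-site.xml snippet only pins the ResourceManager web address. Once an application completes, the MapReduce client is redirected to the job history server (exactly what the log shows), so a missing or wrong history server address on the Kylin node is one plausible cause of "Job status not available". A hedged sketch of the standard mapred-site.xml properties, with the host name as a placeholder for whatever node actually runs the JobHistory Server on your cluster:

```xml
<!-- mapred-site.xml on the Kylin/client node.
     "jhs-host" is a placeholder; substitute your JobHistory Server node. -->
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>jhs-host:10020</value> <!-- JHS IPC port (default 10020) -->
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>jhs-host:19888</value> <!-- JHS web UI/REST port (default 19888) -->
</property>
```

You can sanity-check reachability from the Kylin node with `mapred job -status <job_id>`, or by curling the history server REST endpoint `http://jhs-host:19888/ws/v1/history/info`.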
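For what it's worth, the two stack traces are really one failure: `getTrackingURL` throws NullPointerException only because the preceding status fetch failed with "Job status not available", leaving the cached status unset. A minimal illustrative sketch of that failure mode (class and method names are made up for illustration; this is not the actual Hadoop source):

```java
import java.io.IOException;

// Illustrative sketch only -- NOT the real org.apache.hadoop.mapreduce.Job
// source. It shows why a tracking-URL getter can NPE right after a
// "Job status not available" error: the status fetch failed, so the cached
// status field is still null when the getter dereferences it.
public class TrackingUrlSketch {

    static class Status {
        final String trackingUrl;
        Status(String trackingUrl) { this.trackingUrl = trackingUrl; }
    }

    static class Job {
        private Status status; // set only by a successful status fetch

        // Stand-in for a status refresh: fails when the history server
        // (or application master) cannot be reached, leaving status null.
        void updateStatus(boolean serverReachable) throws IOException {
            if (!serverReachable) {
                throw new IOException("Job status not available");
            }
            status = new Status("http://rm:8088/proxy/application_x/");
        }

        String getTrackingURL() {
            return status.trackingUrl; // NPE when the fetch never succeeded
        }
    }

    // Reproduces the shape of the two log entries: first the IOException
    // from the status check, then the NPE from the tracking-URL getter.
    static String demo() {
        Job job = new Job();
        try {
            job.updateStatus(false); // server unreachable, as in the log
        } catch (IOException e) {
            // the scheduler logs "error check status" here and carries on
        }
        try {
            return job.getTrackingURL();
        } catch (NullPointerException e) {
            return "npe"; // the secondary failure seen in the log
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "npe"
    }
}
```

So the NPE is a symptom, not the root cause; the thing to chase is why the status lookup stopped working.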