https://bugzilla.wikimedia.org/show_bug.cgi?id=71948
--- Comment #1 from christ...@quelltextlich.at ---

The Oozie job for checking that partition has status KILLED [1], and it seems
to have been killed by user hdfs at 17:28 [2]. A few minutes later, bundles
were restarted, so I assume the partition check was killed deliberately.

However, the job's sequence statistics had not been fully computed (the job
was killed at 95% of the reduce step), so I started the recomputation job by
hand. The sequence stats recomputation is done, and the partition has neither
missing nor duplicate entries. Hence, I manually marked the partition good.

[1]
qchris@analytics1027:~$ oozie job -verbose -info 0037425-140725140105408-oozie-oozi-W
Job ID : 0037425-140725140105408-oozie-oozi-W
------------------------------------------------------------------------------------------------------------------------------------
Workflow Name : hive_add_partition-wmf_raw.webrequest-upload,2014,10,10,15-wf
App Path      : hdfs://analytics-hadoop/wmf/refinery/current/oozie/webrequest/partition/add/workflow.xml
Status        : KILLED
Run           : 0
User          : hdfs
Group         : -
Created       : 2014-10-10 17:04:54 GMT
Started       : 2014-10-10 17:04:54 GMT
Last Modified : 2014-10-10 17:28:15 GMT
Ended         : 2014-10-10 17:28:13 GMT
CoordAction ID: 0003812-140725140105408-oozie-oozi-C@2060

Actions
------------------------------------------------------------------------------------------------------------------------------------
ID Console URL Error Code Error Message External ID External Status Name Retries Tracker URI Type Started Status Ended
------------------------------------------------------------------------------------------------------------------------------------
0037425-140725140105408-oozie-oozi-W@:start: - - - - OK :start: 0 - :START: 2014-10-10 17:04:54 GMT OK 2014-10-10 17:04:54 GMT
------------------------------------------------------------------------------------------------------------------------------------
0037425-140725140105408-oozie-oozi-W@add_partition http://analytics1027.eqiad.wmnet:11000/oozie?job=0037426-140725140105408-oozie-oozi-W - - 0037426-140725140105408-oozie-oozi-W SUCCEEDED add_partition 0 local sub-workflow 2014-10-10 17:04:54 GMT OK 2014-10-10 17:05:11 GMT
------------------------------------------------------------------------------------------------------------------------------------
0037425-140725140105408-oozie-oozi-W@generate_sequence_statistics http://analytics1010.eqiad.wmnet:8088/proxy/application_1409078537822_38526/ - - job_1409078537822_38526 KILLED generate_sequence_statistics 0 resourcemanager.analytics.eqiad.wmnet:8032 hive 2014-10-10 17:05:11 GMT KILLED 2014-10-10 17:28:15 GMT
------------------------------------------------------------------------------------------------------------------------------------

[2] See HDFS's
/var/log/hadoop-yarn/apps/hdfs/logs/application_1409078537822_38526/analytics1029.eqiad.wmnet_8041
line 607:

2014-10-10 17:28:13,907 INFO [IPC Server handler 0 on 36062] org.apache.hadoop.mapreduce.v2.app.client.MRClientService: Kill job job_1409078537822_38526 received from hdfs (auth:SIMPLE) at 10.64.36.127

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
_______________________________________________
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l