https://bugzilla.wikimedia.org/show_bug.cgi?id=71948

--- Comment #1 from christ...@quelltextlich.at ---
The Oozie job that checks that partition has status KILLED [1], and
seems to have been killed by user hdfs at 17:28 [2].
A few minutes later, the bundles were restarted, so I assume the
partition check was killed deliberately.

However, since the job's sequence statistics had not been fully
computed (the job was killed at 95% of the reduce step), I started the
recomputation job by hand.

The sequence stats recomputation is done, and the partition has neither
missing nor duplicate entries.

Hence, I manually marked the partition good.
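
For readers unfamiliar with this kind of check, here is a minimal sketch of
what "neither missing nor duplicates" amounts to. It assumes each record
carries a per-host sequence number that increases by one. The function names
and the dictionary layout are hypothetical illustrations, not the actual
refinery job or its Hive query.

```python
from collections import Counter


def sequence_stats(seqs):
    """Summarise the sequence numbers seen for one host in one partition."""
    if not seqs:
        return {"expected": 0, "actual": 0, "missing": 0, "duplicates": 0}
    counts = Counter(seqs)
    # A gap-free run from the lowest to the highest number seen.
    expected = max(counts) - min(counts) + 1
    actual = len(seqs)
    distinct = len(counts)
    return {
        "expected": expected,
        "actual": actual,
        "missing": expected - distinct,    # numbers inside the range never seen
        "duplicates": actual - distinct,   # extra copies of numbers already seen
    }


def partition_is_good(stats_per_host):
    """A partition is good only if no host shows missing or duplicate numbers."""
    return all(
        s["missing"] == 0 and s["duplicates"] == 0
        for s in stats_per_host.values()
    )
```

Under these assumptions, a partition would be marked good only when
partition_is_good() holds for the statistics of every host.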


[1]

qchris@analytics1027:~$ oozie job -verbose -info 0037425-140725140105408-oozie-oozi-W
Job ID : 0037425-140725140105408-oozie-oozi-W
------------------------------------------------------------------------------------------------------------------------------------
Workflow Name : hive_add_partition-wmf_raw.webrequest-upload,2014,10,10,15-wf
App Path      : hdfs://analytics-hadoop/wmf/refinery/current/oozie/webrequest/partition/add/workflow.xml
Status        : KILLED
Run           : 0
User          : hdfs
Group         : -
Created       : 2014-10-10 17:04:54 GMT
Started       : 2014-10-10 17:04:54 GMT
Last Modified : 2014-10-10 17:28:15 GMT
Ended         : 2014-10-10 17:28:13 GMT
CoordAction ID: 0003812-140725140105408-oozie-oozi-C@2060

Actions
------------------------------------------------------------------------------------------------------------------------------------
ID      Console URL     Error Code      Error Message   External ID    
External Status Name    Retries Tracker URI     Type    Started Status  Ended
------------------------------------------------------------------------------------------------------------------------------------
0037425-140725140105408-oozie-oozi-W@:start:    -       -       -       -      
OK      :start: 0       -       :START: 2014-10-10 17:04:54 GMT OK     
2014-10-10 17:04:54 GMT
------------------------------------------------------------------------------------------------------------------------------------
0037425-140725140105408-oozie-oozi-W@add_partition     
http://analytics1027.eqiad.wmnet:11000/oozie?job=0037426-140725140105408-oozie-oozi-W
  -       -     0037426-140725140105408-oozie-oozi-W     SUCCEEDED      
add_partition   0       local   sub-workflow    2014-10-10 17:04:54 GMT OK     
2014-10-10 17:05:11 GMT
------------------------------------------------------------------------------------------------------------------------------------
0037425-140725140105408-oozie-oozi-W@generate_sequence_statistics      
http://analytics1010.eqiad.wmnet:8088/proxy/application_1409078537822_38526/   
-       -       job_1409078537822_38526 KILLED  generate_sequence_statistics    0      
resourcemanager.analytics.eqiad.wmnet:8032      hive    2014-10-10 17:05:11 GMT
KILLED  2014-10-10 17:28:15 GMT
------------------------------------------------------------------------------------------------------------------------------------



[2] See line 607 of the HDFS file
/var/log/hadoop-yarn/apps/hdfs/logs/application_1409078537822_38526/analytics1029.eqiad.wmnet_8041
which reads:
:2014-10-10 17:28:13,907 INFO [IPC Server handler 0 on 36062]
org.apache.hadoop.mapreduce.v2.app.client.MRClientService: Kill job
job_1409078537822_38526 received from hdfs (auth:SIMPLE) at 10.64.36.127
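
To pull the who, when, and where out of a kill message like the one above,
something along these lines could be used. This is only a sketch: the regular
expression is written against the single log line quoted here, and the names
KILL_RE and parse_kill_line are made up for illustration.

```python
import re

# Pattern fitted to the MRClientService "Kill job" line quoted above.
KILL_RE = re.compile(
    r"(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} INFO .*?"
    r"Kill job (?P<job>job_\d+_\d+) received from (?P<user>\S+) "
    r"\(auth:(?P<auth>\w+)\) at (?P<addr>\S+)"
)


def parse_kill_line(line):
    """Return the kill details as a dict, or None if the line does not match."""
    # Collapse the mail wrapping (newlines, repeated spaces) into single spaces.
    match = KILL_RE.search(" ".join(line.split()))
    return match.groupdict() if match else None


LOG_LINE = """2014-10-10 17:28:13,907 INFO [IPC Server handler 0 on 36062]
org.apache.hadoop.mapreduce.v2.app.client.MRClientService: Kill job
job_1409078537822_38526 received from hdfs (auth:SIMPLE) at 10.64.36.127"""

details = parse_kill_line(LOG_LINE)
print(details["user"], details["time"], details["addr"])
```

The extracted time (17:28:13) can then be compared with the "Ended" field of
the workflow in [1].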
