Author: tucu
Date: Wed Mar 14 23:01:26 2012
New Revision: 1300779
URL: http://svn.apache.org/viewvc?rev=1300779&view=rev
Log:
OOZIE-646 Doc changes: Drilldown and PigStats (params via tucu)
Modified:
incubator/oozie/trunk/docs/src/site/twiki/DG_CommandLineTool.twiki
incubator/oozie/trunk/docs/src/site/twiki/WorkflowFunctionalSpec.twiki
incubator/oozie/trunk/release-log.txt
Modified: incubator/oozie/trunk/docs/src/site/twiki/DG_CommandLineTool.twiki
URL: http://svn.apache.org/viewvc/incubator/oozie/trunk/docs/src/site/twiki/DG_CommandLineTool.twiki?rev=1300779&r1=1300778&r2=1300779&view=diff
==============================================================================
--- incubator/oozie/trunk/docs/src/site/twiki/DG_CommandLineTool.twiki (original)
+++ incubator/oozie/trunk/docs/src/site/twiki/DG_CommandLineTool.twiki Wed Mar 14 23:01:26 2012
@@ -373,6 +373,33 @@ The filter option syntax is: <code>[stat
Multiple values must be specified as different name-value pairs. When multiple filters are specified, all Coordinator actions that satisfy any one of the filters are retrieved (the query does an OR across the filter values for the status).
Currently, the filter option can be used only with the =info= option on a Coordinator job.
+The example below shows how the =verbose= option can be used to retrieve action statistics for a job:
+
+<verbatim>
+$ oozie job -oozie http://localhost:8080/oozie -info 0000001-111219170928042-oozie-para-W@mr-node -verbose
+ID : 0000001-111219170928042-oozie-para-W@mr-node
+------------------------------------------------------------------------------------------------------------------------------------
+Console URL : http://localhost:50030/jobdetails.jsp?jobid=job_201112191708_0006
+Error Code : -
+Error Message : -
+External ID : job_201112191708_0006
+External Status : SUCCEEDED
+Name : mr-node
+Retries : 0
+Tracker URI : localhost:9001
+Type : map-reduce
+Started : 2011-12-20 01:12
+Status : OK
+Ended : 2011-12-20 01:12
+External Stats : {"org.apache.hadoop.mapred.JobInProgress$Counter":{"TOTAL_LAUNCHED_REDUCES":1,"TOTAL_LAUNCHED_MAPS":1,"DATA_LOCAL_MAPS":1},"ACTION_TYPE":"MAP_REDUCE","FileSystemCounters":{"FILE_BYTES_READ":1746,"HDFS_BYTES_READ":1409,"FILE_BYTES_WRITTEN":3524,"HDFS_BYTES_WRITTEN":1547},"org.apache.hadoop.mapred.Task$Counter":{"REDUCE_INPUT_GROUPS":33,"COMBINE_OUTPUT_RECORDS":0,"MAP_INPUT_RECORDS":33,"REDUCE_SHUFFLE_BYTES":0,"REDUCE_OUTPUT_RECORDS":33,"SPILLED_RECORDS":66,"MAP_OUTPUT_BYTES":1674,"MAP_INPUT_BYTES":1409,"MAP_OUTPUT_RECORDS":33,"COMBINE_INPUT_RECORDS":0,"REDUCE_INPUT_RECORDS":33}}
+External ChildIDs : null
+------------------------------------------------------------------------------------------------------------------------------------
+</verbatim>
+
+The two fields, External Stats and External ChildIDs, display the action statistics (counter information for a map-reduce action, PigStats information for a pig action) and the child IDs of the given job.
+
+Note that External Stats collection can be turned on or off by setting the property _oozie.action.external.stats.write_ to _true_ or _false_ in workflow.xml. By default it is set to _false_ (External Stats are not collected). External ChildIDs are always stored.
+
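Since the External Stats field is a single JSON string in the CLI output, it can be post-processed client-side. A minimal Python sketch, not part of Oozie itself (the values are abbreviated from the example output above; group and counter names are as printed by Hadoop):

```python
import json

# External Stats value as printed by `oozie job -info ... -verbose`,
# abbreviated from the example output above.
stats_json = (
    '{"ACTION_TYPE":"MAP_REDUCE",'
    '"FileSystemCounters":{"FILE_BYTES_READ":1746,"HDFS_BYTES_READ":1409,'
    '"FILE_BYTES_WRITTEN":3524,"HDFS_BYTES_WRITTEN":1547}}'
)

stats = json.loads(stats_json)

# Outer keys are counter groups (plus ACTION_TYPE); inner keys are counters.
print(stats["ACTION_TYPE"])                            # MAP_REDUCE
print(stats["FileSystemCounters"]["FILE_BYTES_READ"])  # 1746
```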
---+++ Checking the xml definition of a Workflow, Coordinator or Bundle Job
Example:
Modified: incubator/oozie/trunk/docs/src/site/twiki/WorkflowFunctionalSpec.twiki
URL: http://svn.apache.org/viewvc/incubator/oozie/trunk/docs/src/site/twiki/WorkflowFunctionalSpec.twiki?rev=1300779&r1=1300778&r2=1300779&view=diff
==============================================================================
--- incubator/oozie/trunk/docs/src/site/twiki/WorkflowFunctionalSpec.twiki (original)
+++ incubator/oozie/trunk/docs/src/site/twiki/WorkflowFunctionalSpec.twiki Wed Mar 14 23:01:26 2012
@@ -655,6 +655,8 @@ The =configuration= element, if present,
Properties specified in the =configuration= element override properties specified in the file specified in the =job-xml= element.
+External Stats can be turned on or off by setting the property _oozie.action.external.stats.write_ to _true_ or _false_ in the =configuration= element of workflow.xml. The default value of this property is _false_.
+
The =file= element, if present, must specify the target symbolic link for binaries by separating the original file and target with a # (file#target-sym-link). This is not required for libraries.
The =mapper= and =reducer= processes for streaming jobs should specify the executable command with URL encoding, e.g. '%' should be replaced by '%25'.
@@ -685,6 +687,10 @@ The =mapper= and =reducer= process for s
<name>mapred.reduce.tasks</name>
<value>${firstJobReducers}</value>
</property>
+ <property>
+ <name>oozie.action.external.stats.write</name>
+ <value>true</value>
+ </property>
</configuration>
</map-reduce>
<ok to="myNextAction"/>
@@ -893,6 +899,8 @@ The =configuration= element, if present,
Properties specified in the =configuration= element override properties
specified in the file specified in the
=job-xml= element.
+External Stats can be turned on or off by setting the property _oozie.action.external.stats.write_ as _true_ or _false_ in the =configuration= element of workflow.xml. The default value of this property is _false_.
+
The inline and job-xml configuration properties are passed to the Hadoop jobs submitted by the Pig runtime.
The =script= element contains the pig script to execute. The pig script can be templatized with variables of the
@@ -927,6 +935,10 @@ All the above elements can be parameteri
<name>mapred.compress.map.output</name>
<value>true</value>
</property>
+ <property>
+ <name>oozie.action.external.stats.write</name>
+ <value>true</value>
+ </property>
</configuration>
<script>/mypigscript.pig</script>
<argument>-param</argument>
@@ -1550,7 +1562,273 @@ The outer Map key is a counter group nam
The Hadoop EL constants defined in the previous section provide access to the Hadoop built-in record counters.
----++++ 4.2.6 HDFS EL Functions
+This function can also be used to access specific action statistics information. Examples of action stats and how to access them through EL functions (referenced in the workflow xml) are given below.
+
+*Example of MR action stats:*
+<verbatim>
+{
+ "ACTION_TYPE": "MAP_REDUCE",
+ "org.apache.hadoop.mapred.JobInProgress$Counter": {
+ "TOTAL_LAUNCHED_REDUCES": 1,
+ "TOTAL_LAUNCHED_MAPS": 1,
+ "DATA_LOCAL_MAPS": 1
+ },
+ "FileSystemCounters": {
+ "FILE_BYTES_READ": 1746,
+ "HDFS_BYTES_READ": 1409,
+ "FILE_BYTES_WRITTEN": 3524,
+ "HDFS_BYTES_WRITTEN": 1547
+ },
+ "org.apache.hadoop.mapred.Task$Counter": {
+ "REDUCE_INPUT_GROUPS": 33,
+ "COMBINE_OUTPUT_RECORDS": 0,
+ "MAP_INPUT_RECORDS": 33,
+ "REDUCE_SHUFFLE_BYTES": 0,
+ "REDUCE_OUTPUT_RECORDS": 33,
+ "SPILLED_RECORDS": 66,
+ "MAP_OUTPUT_BYTES": 1674,
+ "MAP_INPUT_BYTES": 1409,
+ "MAP_OUTPUT_RECORDS": 33,
+ "COMBINE_INPUT_RECORDS": 0,
+ "REDUCE_INPUT_RECORDS": 33
+ }
+}
+</verbatim>
+
+Below is a workflow that shows how to access specific information from the MR stats using the hadoop:counters() EL function.
+<verbatim>
+<workflow-app xmlns="uri:oozie:workflow:0.2" name="map-reduce-wf">
+ <start to="mr-node"/>
+ <action name="mr-node">
+ <map-reduce>
+ <job-tracker>${jobTracker}</job-tracker>
+ <name-node>${nameNode}</name-node>
+ <prepare>
+ <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/${outputDir}"/>
+ </prepare>
+ <configuration>
+ <property>
+ <name>mapred.job.queue.name</name>
+ <value>${queueName}</value>
+ </property>
+ <property>
+ <name>mapred.mapper.class</name>
+ <value>org.apache.oozie.example.SampleMapper</value>
+ </property>
+ <property>
+ <name>mapred.reducer.class</name>
+ <value>org.apache.oozie.example.SampleReducer</value>
+ </property>
+ <property>
+ <name>mapred.map.tasks</name>
+ <value>1</value>
+ </property>
+ <property>
+ <name>mapred.input.dir</name>
+ <value>/user/${wf:user()}/${examplesRoot}/input-data/text</value>
+ </property>
+ <property>
+ <name>mapred.output.dir</name>
+ <value>/user/${wf:user()}/${examplesRoot}/output-data/${outputDir}</value>
+ </property>
+ <property>
+ <name>oozie.action.external.stats.write</name>
+ <value>true</value>
+ </property>
+ </configuration>
+ </map-reduce>
+ <ok to="java1"/>
+ <error to="fail"/>
+ </action>
+ <action name="java1">
+ <java>
+ <job-tracker>${jobTracker}</job-tracker>
+ <name-node>${nameNode}</name-node>
+ <configuration>
+ <property>
+ <name>mapred.job.queue.name</name>
+ <value>${queueName}</value>
+ </property>
+ </configuration>
+ <main-class>MyTest</main-class>
+ <arg>${hadoop:counters("mr-node")["FileSystemCounters"]["FILE_BYTES_READ"]}</arg>
+ <capture-output/>
+ </java>
+ <ok to="end" />
+ <error to="fail" />
+ </action>
+ <kill name="fail">
+ <message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
+ </kill>
+ <end name="end"/>
+</workflow-app>
+</verbatim>
+
+
+*Example of Pig action stats:*
+<verbatim>
+{
+ "ACTION_TYPE": "PIG",
+ "JOB_GRAPH": "job_201112191708_0008",
+ "PIG_VERSION": "0.9.0",
+ "FEATURES": "UNKNOWN",
+ "ERROR_MESSAGE": null,
+ "NUMBER_JOBS": "1",
+ "RECORD_WRITTEN": "33",
+ "BYTES_WRITTEN": "1410",
+ "HADOOP_VERSION": "0.20.2",
+ "SCRIPT_ID": "bbe016e9-f678-43c3-96fc-f24359957582",
+ "PROACTIVE_SPILL_COUNT_RECORDS": "0",
+ "PROACTIVE_SPILL_COUNT_OBJECTS": "0",
+ "RETURN_CODE": "0",
+ "ERROR_CODE": "-1",
+ "SMM_SPILL_COUNT": "0",
+ "DURATION": "36850",
+ "job_201112191708_0008": {
+ "MAP_INPUT_RECORDS": "33",
+ "MIN_REDUCE_TIME": "0",
+ "MULTI_STORE_COUNTERS": {},
+ "MAX_REDUCE_TIME": "0",
+ "NUMBER_REDUCES": "0",
+ "ERROR_MESSAGE": null,
+ "RECORD_WRITTEN": "33",
+ "HDFS_BYTES_WRITTEN": "1410",
+ "JOB_ID": "job_201112191708_0008",
+ "REDUCE_INPUT_RECORDS": "0",
+ "AVG_REDUCE_TIME": "0",
+ "MAX_MAP_TIME": "9169",
+ "BYTES_WRITTEN": "1410",
+ "Alias": "A,B",
+ "REDUCE_OUTPUT_RECORDS": "0",
+ "SMMS_SPILL_COUNT": "0",
+ "PROACTIVE_SPILL_COUNT_RECS": "0",
+ "PROACTIVE_SPILL_COUNT_OBJECTS": "0",
+ "HADOOP_COUNTERS": null,
+ "MIN_MAP_TIME": "9169",
+ "MAP_OUTPUT_RECORDS": "33",
+ "AVG_MAP_TIME": "9169",
+ "FEATURE": "MAP_ONLY",
+ "NUMBER_MAPS": "1"
+ }
+}
+</verbatim>
+
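In the Pig stats above, JOB_GRAPH names the launched Hadoop job(s), and each job ID is itself a top-level key holding the per-job statistics. A minimal Python sketch of that drilldown, not part of Oozie itself (it uses a subset of the JSON above):

```python
import json

# Subset of the Pig action stats shown above.
pig_stats = json.loads('''
{
  "ACTION_TYPE": "PIG",
  "JOB_GRAPH": "job_201112191708_0008",
  "job_201112191708_0008": {"Alias": "A,B", "MAP_INPUT_RECORDS": "33"}
}
''')

# ${hadoop:counters("pig-node")["JOB_GRAPH"]} yields the launched job ID(s);
# each ID keys a nested object of per-job stats.
for job_id in pig_stats["JOB_GRAPH"].split(","):
    job = pig_stats[job_id]
    print(job_id, job["Alias"], job["MAP_INPUT_RECORDS"])
```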
+Below is a workflow that shows how to access specific information from the Pig stats using the hadoop:counters() EL function.
+<verbatim>
+<workflow-app xmlns="uri:oozie:workflow:0.2" name="pig-wf">
+ <start to="pig-node"/>
+ <action name="pig-node">
+ <pig>
+ <job-tracker>${jobTracker}</job-tracker>
+ <name-node>${nameNode}</name-node>
+ <prepare>
+ <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/pig"/>
+ </prepare>
+ <configuration>
+ <property>
+ <name>mapred.job.queue.name</name>
+ <value>${queueName}</value>
+ </property>
+ <property>
+ <name>mapred.compress.map.output</name>
+ <value>true</value>
+ </property>
+ <property>
+ <name>oozie.action.external.stats.write</name>
+ <value>true</value>
+ </property>
+ </configuration>
+ <script>id.pig</script>
+ <param>INPUT=/user/${wf:user()}/${examplesRoot}/input-data/text</param>
+ <param>OUTPUT=/user/${wf:user()}/${examplesRoot}/output-data/pig</param>
+ </pig>
+ <ok to="java1"/>
+ <error to="fail"/>
+ </action>
+ <action name="java1">
+ <java>
+ <job-tracker>${jobTracker}</job-tracker>
+ <name-node>${nameNode}</name-node>
+ <configuration>
+ <property>
+ <name>mapred.job.queue.name</name>
+ <value>${queueName}</value>
+ </property>
+ </configuration>
+ <main-class>MyTest</main-class>
+ <arg> ${hadoop:counters("pig-node")["JOB_GRAPH"]}</arg>
+ <capture-output/>
+ </java>
+ <ok to="end" />
+ <error to="fail" />
+ </action>
+ <kill name="fail">
+ <message>Pig failed, error
message[${wf:errorMessage(wf:lastErrorNode())}]</message>
+ </kill>
+ <end name="end"/>
+</workflow-app>
+</verbatim>
+
+---++++ 4.2.6 Hadoop Jobs EL Function
+
+The function _wf:actionData()_ can be used to access the Hadoop job IDs of actions such as Pig, by specifying the key _hadoopJobs_. An example is shown below.
+
+<verbatim>
+<workflow-app xmlns="uri:oozie:workflow:0.2" name="pig-wf">
+ <start to="pig-node"/>
+ <action name="pig-node">
+ <pig>
+ <job-tracker>${jobTracker}</job-tracker>
+ <name-node>${nameNode}</name-node>
+ <prepare>
+ <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/pig"/>
+ </prepare>
+ <configuration>
+ <property>
+ <name>mapred.job.queue.name</name>
+ <value>${queueName}</value>
+ </property>
+ <property>
+ <name>mapred.compress.map.output</name>
+ <value>true</value>
+ </property>
+ </configuration>
+ <script>id.pig</script>
+ <param>INPUT=/user/${wf:user()}/${examplesRoot}/input-data/text</param>
+ <param>OUTPUT=/user/${wf:user()}/${examplesRoot}/output-data/pig</param>
+ </pig>
+ <ok to="java1"/>
+ <error to="fail"/>
+ </action>
+ <action name="java1">
+ <java>
+ <job-tracker>${jobTracker}</job-tracker>
+ <name-node>${nameNode}</name-node>
+ <configuration>
+ <property>
+ <name>mapred.job.queue.name</name>
+ <value>${queueName}</value>
+ </property>
+ </configuration>
+ <main-class>MyTest</main-class>
+ <arg> ${wf:actionData("pig-node")["hadoopJobs"]}</arg>
+ <capture-output/>
+ </java>
+ <ok to="end" />
+ <error to="fail" />
+ </action>
+ <kill name="fail">
+ <message>Pig failed, error
message[${wf:errorMessage(wf:lastErrorNode())}]</message>
+ </kill>
+ <end name="end"/>
+</workflow-app>
+</verbatim>
+
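For illustration only: assuming the _hadoopJobs_ value is a comma-separated string of Hadoop job IDs (the IDs below are hypothetical), it can be split client-side in the downstream action:

```python
# Hypothetical value of ${wf:actionData("pig-node")["hadoopJobs"]},
# assumed here to be a comma-separated list of Hadoop job IDs.
hadoop_jobs = "job_201112191708_0008,job_201112191708_0009"

# Split into individual job IDs, ignoring any empty entries.
job_ids = [j.strip() for j in hadoop_jobs.split(",") if j.strip()]
print(job_ids)  # ['job_201112191708_0008', 'job_201112191708_0009']
```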
+---++++ 4.2.7 HDFS EL Functions
For all the functions in this section the path must include the FS URI. For example =hdfs://foo:9000/user/tucu=.
Modified: incubator/oozie/trunk/release-log.txt
URL: http://svn.apache.org/viewvc/incubator/oozie/trunk/release-log.txt?rev=1300779&r1=1300778&r2=1300779&view=diff
==============================================================================
--- incubator/oozie/trunk/release-log.txt (original)
+++ incubator/oozie/trunk/release-log.txt Wed Mar 14 23:01:26 2012
@@ -1,5 +1,6 @@
-- Oozie 3.2.0 release
+OOZIE-646 Doc changes: Drilldown and PigStats (params via tucu)
OOZIE-4 Configuration of Maximum output len for each action (harsh via tucu)
OOZIE-759 Fix config defaults in EmailActionExecutor (harsh via tucu)
OOZIE-675 checkMultipleTimeInstances doesn't work for EL extensions. EX: ${coord:formatTime(coord:current(0),'yyyy-MM-dd')} (sriksun via tucu)