[ 
https://issues.apache.org/jira/browse/OOZIE-588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164860#comment-13164860
 ] 

[email protected] commented on OOZIE-588:
-----------------------------------------------------



bq.  On 2011-12-07 21:29:50, Prakhar Sharma wrote:
bq.  > 
trunk/core/src/main/java/org/apache/oozie/client/rest/JsonWorkflowAction.java, 
line 325
bq.  > <https://reviews.apache.org/r/3047/diff/1/?file=62775#file62775line325>
bq.  >
bq.  >     [Cosmetic comment]
bq.  >     Place getStats() next to setStats()?

Moved.


bq.  On 2011-12-07 21:29:50, Prakhar Sharma wrote:
bq.  > 
trunk/core/src/main/java/org/apache/oozie/action/hadoop/MapReduceActionExecutor.java,
 line 200
bq.  > <https://reviews.apache.org/r/3047/diff/1/?file=62774#file62774line200>
bq.  >
bq.  >     [Cosmetic comment]
bq.  >     Compiler will most likely remove local variables 'name' and 'value'. 
You can also :)

Changed.


bq.  On 2011-12-07 21:29:50, Prakhar Sharma wrote:
bq.  > 
trunk/core/src/main/java/org/apache/oozie/action/hadoop/MapReduceActionExecutor.java,
 line 158
bq.  > <https://reviews.apache.org/r/3047/diff/1/?file=62774#file62774line158>
bq.  >
bq.  >     Remove <<Params>> :)

Removed.


bq.  On 2011-12-07 21:29:50, Prakhar Sharma wrote:
bq.  > 
trunk/core/src/main/java/org/apache/oozie/action/hadoop/MapReduceActionExecutor.java,
 line 152
bq.  > <https://reviews.apache.org/r/3047/diff/1/?file=62774#file62774line152>
bq.  >
bq.  >     Link to the comment above regarding ActionType parameter to MRStats 
ctor being redundant.

The first argument is no more required to be passed (ActionType is set 
implicitly inside the class based on the type of stats).


bq.  On 2011-12-07 21:29:50, Prakhar Sharma wrote:
bq.  > trunk/core/src/main/java/org/apache/oozie/action/hadoop/MRStats.java, 
line 32
bq.  > <https://reviews.apache.org/r/3047/diff/1/?file=62773#file62773line32>
bq.  >
bq.  >     Isn't ActionType parameter to the ctor redundant? Will this class 
ever create an object with value of ActionType parameter other than MAP_REDUCE? 
If yes, then you can ignore this comment.

Yes, it seems redundant. Removing it from here. Instead, it ll be set inside 
the ctor based on the type of action.


bq.  On 2011-12-07 21:29:50, Prakhar Sharma wrote:
bq.  > 
trunk/core/src/main/java/org/apache/oozie/action/hadoop/ActionStats.java, line 
39
bq.  > <https://reviews.apache.org/r/3047/diff/1/?file=62771#file62771line39>
bq.  >
bq.  >     Parent class should not know the type of derived classes. 
setCurrentActionType is being called only from derived classes. This method can 
be just removed.

Removing the setter.


bq.  On 2011-12-07 21:29:50, Prakhar Sharma wrote:
bq.  > 
trunk/core/src/main/java/org/apache/oozie/action/hadoop/ActionStats.java, line 
25
bq.  > <https://reviews.apache.org/r/3047/diff/1/?file=62771#file62771line25>
bq.  >
bq.  >     Make currentActionType protected and set it directly in derived 
classes and do away with the corresponding setter.
bq.  >     
bq.  >     Even better design will be not to have the currentActionType in 
parent class altogether and have an abstract method (like toJSON()) which 
returns the type. Motivation being: Parent should not care who is deriving from 
it.

Made currentActionType protected. Corresponding setter will be removed.


bq.  On 2011-12-07 21:29:50, Prakhar Sharma wrote:
bq.  > trunk/core/src/main/java/org/apache/oozie/DagELFunctions.java, line 116
bq.  > <https://reviews.apache.org/r/3047/diff/1/?file=62768#file62768line116>
bq.  >
bq.  >     Rational behind short circuit condition: action.getData() == null
bq.  >     
bq.  >     Inside the if block you are not accessing return value of 
action.getData(). Looks like you might end up missing some cases due to the 
check.
bq.  >     
bq.  >     If ensuring (action.getData() != null && 
action.getExternalChildIds() != null) not being true is the motivation, then 
may be consider throwing an exception?

Agreed. This might miss some cases.

Removing getData() check from the if condition check. Retaining just the 
required check alone (which is to test that externalChildIDs are not null).


bq.  On 2011-12-07 21:29:50, Prakhar Sharma wrote:
bq.  > trunk/client/src/main/java/org/apache/oozie/client/rest/JsonTags.java, 
line 69
bq.  > <https://reviews.apache.org/r/3047/diff/1/?file=62766#file62766line69>
bq.  >
bq.  >     Rename "externalchildids" to "externalChildIds".
bq.  >     Rest of the variable values appear to be following the later format.

Renamed.


bq.  On 2011-12-07 21:29:50, Prakhar Sharma wrote:
bq.  > trunk/core/src/main/java/org/apache/oozie/DagELFunctions.java, line 42
bq.  > <https://reviews.apache.org/r/3047/diff/1/?file=62768#file62768line42>
bq.  >
bq.  >     Specific reason for the trailing ':' in value of HADOOP_JOBS_PREFIX?

Hadoop Job IDs are stored in the WorkflowInstance as a property 
(hadoopJobs:job1,job2). Since, we are creating this property in the 
DAGELFunctions.java, we need to prefix "hadoopJobs:" to the data retrieved from 
getExternalChildIDs() function - ("job1,job2").

For the other variables stored in WorkflowInstance, this prefixing is not 
required as they are already retrieved as properties (A:a).


- params


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3047/#review3711
-----------------------------------------------------------


On 2011-12-07 23:53:52, params wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3047/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-12-07 23:53:52)
bq.  
bq.  
bq.  Review request for oozie, Mohammad Islam and Angelo K. Huang.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  OOZIE-588: Oozie to allow drill down to hadoop job's details
bq.  
bq.  Summary:
bq.  https://issues.apache.org/jira/browse/OOZIE-588
bq.  
bq.  Features Implemented:
bq.  - Expose action stats information (hadoop counters and pig stats) through 
CLI -info -verbose option
bq.  - Expose external child IDs (eg: hadoop job IDs for the pig action) 
through CLI -info -verbose option
bq.  - Expose job counter info as EL Functions
bq.  - Expose external child IDs as EL Functions
bq.  - User level configuration to specify if stats should be collected or not 
(with a default setting)
bq.  - Oozie level configuration to specify the max allowable size for stats
bq.  
bq.  
bq.  This addresses bug OOZIE-588.
bq.      https://issues.apache.org/jira/browse/OOZIE-588
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    trunk/client/src/main/java/org/apache/oozie/cli/OozieCLI.java 1211308 
bq.    trunk/client/src/main/java/org/apache/oozie/client/WorkflowAction.java 
1211308 
bq.    trunk/client/src/main/java/org/apache/oozie/client/rest/JsonTags.java 
1211308 
bq.    trunk/client/src/main/java/org/apache/oozie/client/rest/JsonToBean.java 
1211308 
bq.    trunk/core/src/main/java/org/apache/oozie/DagELFunctions.java 1211308 
bq.    trunk/core/src/main/java/org/apache/oozie/WorkflowActionBean.java 
1211308 
bq.    trunk/core/src/main/java/org/apache/oozie/action/ActionExecutor.java 
1211308 
bq.    trunk/core/src/main/java/org/apache/oozie/action/hadoop/ActionStats.java 
PRE-CREATION 
bq.    trunk/core/src/main/java/org/apache/oozie/action/hadoop/ActionType.java 
PRE-CREATION 
bq.    trunk/core/src/main/java/org/apache/oozie/action/hadoop/MRStats.java 
PRE-CREATION 
bq.    
trunk/core/src/main/java/org/apache/oozie/action/hadoop/MapReduceActionExecutor.java
 1211308 
bq.    
trunk/core/src/main/java/org/apache/oozie/client/rest/JsonWorkflowAction.java 
1211308 
bq.    trunk/core/src/main/java/org/apache/oozie/command/wf/ActionCommand.java 
1211308 
bq.    trunk/core/src/main/java/org/apache/oozie/command/wf/ActionXCommand.java 
1211308 
bq.    
trunk/core/src/main/java/org/apache/oozie/executor/jpa/WorkflowActionGetJPAExecutor.java
 1211308 
bq.    
trunk/core/src/main/java/org/apache/oozie/executor/jpa/WorkflowActionSubsetGetJPAExecutor.java
 1211308 
bq.    
trunk/core/src/main/java/org/apache/oozie/executor/jpa/WorkflowActionsGetForJobJPAExecutor.java
 1211308 
bq.    
trunk/core/src/main/java/org/apache/oozie/executor/jpa/WorkflowActionsRunningGetJPAExecutor.java
 1211308 
bq.    
trunk/core/src/main/java/org/apache/oozie/executor/jpa/WorkflowJobGetActionsJPAExecutor.java
 1211308 
bq.    trunk/core/src/main/java/org/apache/oozie/store/WorkflowStore.java 
1211308 
bq.    trunk/core/src/test/java/org/apache/oozie/TestActionBean.java 1211308 
bq.    
trunk/core/src/test/java/org/apache/oozie/action/decision/TestDecisionActionExecutor.java
 1211308 
bq.    
trunk/core/src/test/java/org/apache/oozie/action/hadoop/ActionExecutorTestCase.java
 1211308 
bq.    
trunk/core/src/test/java/org/apache/oozie/action/hadoop/TestHadoopELFunctions.java
 1211308 
bq.    
trunk/core/src/test/java/org/apache/oozie/action/hadoop/TestMapReduceActionError.java
 1211308 
bq.    
trunk/core/src/test/java/org/apache/oozie/action/hadoop/TestMapReduceActionExecutor.java
 1211308 
bq.    
trunk/core/src/test/java/org/apache/oozie/action/ssh/TestSshActionExecutor.java 
1211308 
bq.    
trunk/core/src/test/java/org/apache/oozie/client/rest/TestJsonWorkflowAction.java
 1211308 
bq.  
bq.  Diff: https://reviews.apache.org/r/3047/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  - Tested against existing test cases and added tests where required.
bq.  - Tests for the new classes MRStats and OoziePigStats have not been added. 
This is work in progress.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  params
bq.  
bq.


                
> Oozie to allow drill down to hadoop job's details  
> ---------------------------------------------------
>
>                 Key: OOZIE-588
>                 URL: https://issues.apache.org/jira/browse/OOZIE-588
>             Project: Oozie
>          Issue Type: New Feature
>            Reporter: Mohammad Kamrul Islam
>
> High-level Requirements:
> -----------------------------------
> Since Oozie is designed as the gateway to grid, we need to support WS API for 
> most common hadoop commands through Oozie.  User doesn't want to go to 
> multiple system to get the required data. Based on these, we propose to 
> implement the following requirements into Oozie.
>  
> R1: Oozie will provide WS endpoints to get hadoop job details (including job 
> counters).
> R2: It will support both types of hadoop jobs : MR job created for MR action, 
>  MR jobs created as part of pig script.
> R3:  In addition, for pig action, oozie will provide a way to query the pig 
> stats.
> Proposed design:
> ----------------------
>      D1: Oozie will store the *summary* jobcounter /pigstats into oozie DB. 
> The items in the summary stats will be determined by oozie to limit the size. 
> However,the commonly used stats will be include into the summary. It is 
> important to note that summary information will be collected *after* the job 
> finished.
>      
>     D2: If the user asks for *details* hadoop job stats ,  the user needs to 
> query using different WS API. In this query, a user will specify *a* hadoop 
> job id.  Oozie will directly query the hadoop JT/RM/HS. Since it is an 
> external call with undetermined response time, Oozie will provide only one 
> hadoop job id per-request to avoid the timeout in WS call. Caveats: If hadoop 
> is down or the job is not in JT/RM/History Server, Oozie will fail to collect 
> the details. 
>     
>      D3: For pig, Oozie will store the pig-generated hadoop ids in it DB and 
> will expose that to user throw the "verbose" query.
>      D4: Oozie will need to collect those summary pig stats and corresponding 
> job counters and store it in Oozie DB. PigStats has a way of getting job 
> counter for each hadoop job that it submits. We could use that API to collect 
> summary counters for pig-created jobs.
>      D5: The complete/detail pigstats will be stored into Pig Launcher Mapper 
> as job counter. So that if a user wants to get the detail pig stats, we could 
> get it from the LM directly.
>      
> Open questions:
> ----------------------
> * What should be in the summary counters/stats? 
> * What is the max size of stats?
> Advanced planning: <Not in the scope of this task, but might required for 
> design to support later>
> --------------------------
> * Some users are asking to query the job stats when the job is RUNNING. They 
> need it to decide for subsequent job submissions.
> * By the above design , user could use D2, to get the counter when MR action 
> is running.
> * However, for pig, it is not that straight forward. Because Pig submits the 
> jobs during execution. But the new PigRunner provide a listener concept where 
> user can get the notifications such as when a new MR job submitted and its ID.
> * By using this, Oozie could get the running hadoop job id instantly. In 
> future, user might want this to query using D2.
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to