[jira] [Commented] (OOZIE-588) Oozie to allow drill down to hadoop job's details

[email protected] (Commented) (JIRA) Thu, 08 Dec 2011 11:35:09 -0800

    [ 
https://issues.apache.org/jira/browse/OOZIE-588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13165467#comment-13165467
 ]


[email protected] commented on OOZIE-588:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3059/#review3708
-----------------------------------------------------------



trunk/core/src/main/java/org/apache/oozie/action/hadoop/JavaActionExecutor.java
<https://reviews.apache.org/r/3059/#comment8447>

    I believe you need to handle the HadoopAccessorException here 



trunk/core/src/main/java/org/apache/oozie/action/hadoop/LauncherMapper.java
<https://reviews.apache.org/r/3059/#comment8450>

    use a defined constant for "oozie.action.stats.properties"
    
    it seems it has been used for some properties 
[CONF_OOZIE_EXTERNAL_STATS_MAX_SIZE] but not for others.



trunk/core/src/main/java/org/apache/oozie/action/hadoop/LauncherMapper.java
<https://reviews.apache.org/r/3059/#comment8448>

    what happen if it doesn't exist



trunk/core/src/main/java/org/apache/oozie/action/hadoop/LauncherMapper.java
<https://reviews.apache.org/r/3059/#comment8451>

    same as before ... 



trunk/core/src/main/java/org/apache/oozie/action/hadoop/LauncherMapper.java
<https://reviews.apache.org/r/3059/#comment8301>

    remove extra tabs 



trunk/core/src/main/java/org/apache/oozie/action/hadoop/LauncherMapper.java
<https://reviews.apache.org/r/3059/#comment8452>

    The map method seems very long which make it hard to read and increase its 
error prune, specially with a catch all at the end. it may be better to add you 
new code in a sperate methods instead of increasing the size of the method



trunk/core/src/main/java/org/apache/oozie/action/hadoop/LauncherMapper.java
<https://reviews.apache.org/r/3059/#comment8453>

    use constant .. specially this has been used more than once in this file 



trunk/core/src/main/java/org/apache/oozie/action/hadoop/LauncherMapper.java
<https://reviews.apache.org/r/3059/#comment8454>

    same as before



trunk/core/src/main/java/org/apache/oozie/action/hadoop/OoziePigStats.java
<https://reviews.apache.org/r/3059/#comment8459>

    do not suppress "unchecked" unless you really need it



trunk/core/src/main/java/org/apache/oozie/action/hadoop/OoziePigStats.java
<https://reviews.apache.org/r/3059/#comment8455>

    same as before putting the separator in the loop doesn't seem right 
specially the first element will not be separated



trunk/core/src/main/java/org/apache/oozie/action/hadoop/OoziePigStats.java
<https://reviews.apache.org/r/3059/#comment8456>

    i do not see any type casting here, why you suppress "unchecked"



trunk/core/src/main/java/org/apache/oozie/action/hadoop/OoziePigStats.java
<https://reviews.apache.org/r/3059/#comment8457>

    same as before



trunk/core/src/main/java/org/apache/oozie/action/hadoop/OoziePigStats.java
<https://reviews.apache.org/r/3059/#comment8458>

    same as before



trunk/core/src/main/java/org/apache/oozie/action/hadoop/PigMain.java
<https://reviews.apache.org/r/3059/#comment8299>

    Another practice that is used in different pieces of code is to define the 
prefix [in this case it will be "oozie.action" as a constant and reuse it in 
the three constant. In this case if the prefex has been changed, there is not 
need to change all of them.



trunk/core/src/main/java/org/apache/oozie/action/hadoop/PigMain.java
<https://reviews.apache.org/r/3059/#comment8434>

    not clear to me if the comment reflect the reality. did you check for the 
pig version here or for the embedded python



trunk/core/src/main/java/org/apache/oozie/action/hadoop/PigMain.java
<https://reviews.apache.org/r/3059/#comment8398>

    extra line



trunk/core/src/main/java/org/apache/oozie/action/hadoop/PigMain.java
<https://reviews.apache.org/r/3059/#comment8437>

    check for the existence of the file and for exception
    



trunk/core/src/main/java/org/apache/oozie/action/hadoop/PigMain.java
<https://reviews.apache.org/r/3059/#comment8441>

    same as before or more the creation of the file to writeExternalData and 
make all the validation ther



trunk/core/src/main/java/org/apache/oozie/action/hadoop/PigMain.java
<https://reviews.apache.org/r/3059/#comment8435>

    it looks odd to have try, finally block without catching any exception. Do 
you need to handle exception here
    
    Especially if you were not able to write to a file.. at least you have to 
log the error here



trunk/core/src/main/java/org/apache/oozie/action/hadoop/PigMain.java
<https://reviews.apache.org/r/3059/#comment8442>

    make the 100 constant or configurable .. hard coding is not always a good 
programming practice



trunk/core/src/main/java/org/apache/oozie/action/hadoop/PigMain.java
<https://reviews.apache.org/r/3059/#comment8446>

    setting the value of the separator inside the for loop seems unnecessary.
    it seems also you are not separating the first jobId.
    
    



trunk/core/src/main/resources/oozie-default.xml
<https://reviews.apache.org/r/3059/#comment8397>

    you may add what -1 means here  [unlimited]


- Mohamed


On 2011-12-08 08:54:20, Virag Kothari wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3059/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-12-08 08:54:20)
bq.  
bq.  
bq.  Review request for oozie, Mohammad Islam and Angelo K. Huang.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  This patch and https://reviews.apache.org/r/3047/diff addresses Oozie-588
bq.  
bq.  
bq.  Docs for how to access pig stats through el functions needs to be added 
bq.  
bq.  
bq.  This addresses bug Oozie-588.
bq.      https://issues.apache.org/jira/browse/Oozie-588
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    trunk/core/src/test/java/org/apache/oozie/action/hadoop/TestPigMain.java 
1211573 
bq.    trunk/core/src/test/java/org/apache/oozie/action/hadoop/PigTestCase.java 
1211573 
bq.    
trunk/core/src/test/java/org/apache/oozie/action/hadoop/TestJavaActionExecutor.java
 1211573 
bq.    
trunk/core/src/test/java/org/apache/oozie/action/hadoop/TestPigActionExecutor.java
 1211573 
bq.    trunk/core/src/main/resources/oozie-default.xml 1211573 
bq.    trunk/core/src/main/java/org/apache/oozie/action/hadoop/PigMain.java 
1211573 
bq.    
trunk/core/src/main/java/org/apache/oozie/action/hadoop/PigActionExecutor.java 
1211573 
bq.    
trunk/core/src/main/java/org/apache/oozie/action/hadoop/LauncherMapper.java 
1211573 
bq.    
trunk/core/src/main/java/org/apache/oozie/action/hadoop/OoziePigStats.java 
PRE-CREATION 
bq.    
trunk/core/src/main/java/org/apache/oozie/action/hadoop/JavaActionExecutor.java 
1211573 
bq.  
bq.  Diff: https://reviews.apache.org/r/3059/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Test cases for PigMain and PigAE added
bq.  Tested pig action after running examples
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Virag
bq.  
bq.


                
> Oozie to allow drill down to hadoop job's details  
> ---------------------------------------------------
>
>                 Key: OOZIE-588
>                 URL: https://issues.apache.org/jira/browse/OOZIE-588
>             Project: Oozie
>          Issue Type: New Feature
>            Reporter: Mohammad Kamrul Islam
>
> High-level Requirements:
> -----------------------------------
> Since Oozie is designed as the gateway to grid, we need to support WS API for 
> most common hadoop commands through Oozie.  User doesn't want to go to 
> multiple system to get the required data. Based on these, we propose to 
> implement the following requirements into Oozie.
>  
> R1: Oozie will provide WS endpoints to get hadoop job details (including job 
> counters).
> R2: It will support both types of hadoop jobs : MR job created for MR action, 
>  MR jobs created as part of pig script.
> R3:  In addition, for pig action, oozie will provide a way to query the pig 
> stats.
> Proposed design:
> ----------------------
>      D1: Oozie will store the *summary* jobcounter /pigstats into oozie DB. 
> The items in the summary stats will be determined by oozie to limit the size. 
> However,the commonly used stats will be include into the summary. It is 
> important to note that summary information will be collected *after* the job 
> finished.
>      
>     D2: If the user asks for *details* hadoop job stats ,  the user needs to 
> query using different WS API. In this query, a user will specify *a* hadoop 
> job id.  Oozie will directly query the hadoop JT/RM/HS. Since it is an 
> external call with undetermined response time, Oozie will provide only one 
> hadoop job id per-request to avoid the timeout in WS call. Caveats: If hadoop 
> is down or the job is not in JT/RM/History Server, Oozie will fail to collect 
> the details. 
>     
>      D3: For pig, Oozie will store the pig-generated hadoop ids in it DB and 
> will expose that to user throw the "verbose" query.
>      D4: Oozie will need to collect those summary pig stats and corresponding 
> job counters and store it in Oozie DB. PigStats has a way of getting job 
> counter for each hadoop job that it submits. We could use that API to collect 
> summary counters for pig-created jobs.
>      D5: The complete/detail pigstats will be stored into Pig Launcher Mapper 
> as job counter. So that if a user wants to get the detail pig stats, we could 
> get it from the LM directly.
>      
> Open questions:
> ----------------------
> * What should be in the summary counters/stats? 
> * What is the max size of stats?
> Advanced planning: <Not in the scope of this task, but might required for 
> design to support later>
> --------------------------
> * Some users are asking to query the job stats when the job is RUNNING. They 
> need it to decide for subsequent job submissions.
> * By the above design , user could use D2, to get the counter when MR action 
> is running.
> * However, for pig, it is not that straight forward. Because Pig submits the 
> jobs during execution. But the new PigRunner provide a listener concept where 
> user can get the notifications such as when a new MR job submitted and its ID.
> * By using this, Oozie could get the running hadoop job id instantly. In 
> future, user might want this to query using D2.
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (OOZIE-588) Oozie to allow drill down to hadoop job's details

Reply via email to