[jira] [Commented] (CRUNCH-272) Unable to correlate crunch jobs within Oozie

2014-06-28 Micah Whitacre (JIRA)

[ https://issues.apache.org/jira/browse/CRUNCH-272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047038#comment-14047038 ]

Micah Whitacre commented on CRUNCH-272:
---

I completely agree with Robert that it'd be best for the Crunch action to be part 
of Oozie, as that makes integration and deployment a lot easier.  The difficulty 
the custom action will face is adding structure to how Crunch pipelines 
get launched.  What we might want to add in Crunch is a common 
launching API that lets the launching job report its PipelineResult objects, much 
like the CrunchOozieLauncher.  This way, instead of the Oozie action taking a 
generic main class, it would take a launcher class that returns the PipelineResults.
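A minimal sketch of the kind of launcher contract described above. `CrunchLauncher` and `collectJobIds` are hypothetical names, and `PipelineResult` here is a simplified stand-in that only holds job IDs; it is not Crunch's real `org.apache.crunch.PipelineResult`. The point is only that the action receives structured results instead of calling an opaque `main()`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LauncherSketch {

    // Simplified stand-in for Crunch's PipelineResult: only the MR job IDs it ran.
    static final class PipelineResult {
        final List<String> jobIds;

        PipelineResult(List<String> jobIds) {
            this.jobIds = jobIds;
        }
    }

    // Hypothetical contract an Oozie Crunch action could invoke instead of a generic main().
    interface CrunchLauncher {
        List<PipelineResult> launch(String[] args);
    }

    // Action side (hypothetical): run the launcher and flatten every job ID it reports.
    static List<String> collectJobIds(CrunchLauncher launcher, String[] args) {
        List<String> ids = new ArrayList<>();
        for (PipelineResult result : launcher.launch(args)) {
            ids.addAll(result.jobIds);
        }
        return ids;
    }

    public static void main(String[] args) {
        // Fake launcher reporting two pipeline runs; a real one would build and run an MRPipeline.
        CrunchLauncher fake = launchArgs -> Arrays.asList(
                new PipelineResult(Arrays.asList("job_a_0001", "job_a_0002")),
                new PipelineResult(Arrays.asList("job_a_0003")));
        System.out.println(collectJobIds(fake, args));
    }
}
```

Once the action holds the flattened ID list, it could store it against the workflow action the same way built-in actions do, which is the piece a generic main class cannot provide.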

 Unable to correlate crunch jobs within Oozie
 

 Key: CRUNCH-272
 URL: https://issues.apache.org/jira/browse/CRUNCH-272
 Project: Crunch
  Issue Type: Improvement
Reporter: Mike Zimmerman
Assignee: Micah Whitacre
 Attachments: CRUNCH-272.patch, CRUNCH-272_prototype.patch


 I'm not really sure whether this should be logged against Oozie or Crunch, so please 
 feel free to move it as needed.
 I would like to request a way to decorate map/reduce jobs that are spawned by 
 a Crunch pipeline so that I can programmatically determine their origin.  The 
 primary use case for this is integration with Oozie.  Oozie launches a single 
 map job to run a Java action (in our case this Java action runs a Crunch 
 job).  Tracing from this original launcher job to the jobs created by 
 the Crunch job is impossible without trawling through logs.  This leaves a big black 
 hole for the system operator trying to assess the performance/impact of these jobs.  
 My initial thought was to provide a simple way to set a correlationId or 
 similar on a map/reduce job and then make it queryable within Oozie.  
 Obviously, that request would have to come after the correlation 
 feature was available within map/reduce.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CRUNCH-272) Unable to correlate crunch jobs within Oozie

2014-06-27 Robert Kanter (JIRA)

[ https://issues.apache.org/jira/browse/CRUNCH-272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046476#comment-14046476 ]

Robert Kanter commented on CRUNCH-272:
--

I'd say that it's best if Oozie owns this rather than Crunch.  Otherwise, users 
have to add an extra jar to Oozie, add some configs to oozie-site, manually 
create a Crunch sharelib, etc.  If we put it in Oozie, then from the users' 
perspective, this is all built-in and done for them.

I'll try to take a look early next week.  In the meantime, perhaps you should 
create an OOZIE JIRA to create a Crunch action?

 Unable to correlate crunch jobs within Oozie
 

 Key: CRUNCH-272
 URL: https://issues.apache.org/jira/browse/CRUNCH-272
 Project: Crunch
  Issue Type: Improvement
Reporter: Mike Zimmerman
Assignee: Micah Whitacre
 Attachments: CRUNCH-272.patch, CRUNCH-272_prototype.patch


[jira] [Commented] (CRUNCH-272) Unable to correlate crunch jobs within Oozie

2014-05-27 Micah Whitacre (JIRA)

[ https://issues.apache.org/jira/browse/CRUNCH-272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010181#comment-14010181 ]

Micah Whitacre commented on CRUNCH-272:
---

Unfortunately, my approach is not a complete solution.  Specifically, I missed 
this line[1] of code, embedded inside the launcher action, that 
actually ties the properties back into the action and subsequently has the 
values stored in Oozie.  This means that we will need a custom Oozie 
launching action/code, which isn't horrible, but I'm not sure we have a set 
structure that would let us create a schema for launching Crunch pipelines.

[1] - https://github.com/cloudera/oozie/blob/a659fd0f2e56850a35e38a6174667b0c07a75b57/core/src/main/java/org/apache/oozie/action/hadoop/HiveActionExecutor.java#L123

 Unable to correlate crunch jobs within Oozie
 

 Key: CRUNCH-272
 URL: https://issues.apache.org/jira/browse/CRUNCH-272
 Project: Crunch
  Issue Type: Improvement
Reporter: Mike Zimmerman
Assignee: Micah Whitacre
 Attachments: CRUNCH-272_prototype.patch


[jira] [Commented] (CRUNCH-272) Unable to correlate crunch jobs within Oozie

2014-05-22 Micah Whitacre (JIRA)

[ https://issues.apache.org/jira/browse/CRUNCH-272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006389#comment-14006389 ]

Micah Whitacre commented on CRUNCH-272:
---

Logged CRUNCH-400 to track the change regarding tracking the materialized stages.  
I'll proceed with vetting this prototype + tests, assuming the missing stages will 
be handled on the other issue.

 Unable to correlate crunch jobs within Oozie
 

 Key: CRUNCH-272
 URL: https://issues.apache.org/jira/browse/CRUNCH-272
 Project: Crunch
  Issue Type: Improvement
Reporter: Mike Zimmerman
Assignee: Micah Whitacre
 Attachments: CRUNCH-272_prototype.patch


[jira] [Commented] (CRUNCH-272) Unable to correlate crunch jobs within Oozie

2014-04-13 Josh Wills (JIRA)

[ https://issues.apache.org/jira/browse/CRUNCH-272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967997#comment-13967997 ]

Josh Wills commented on CRUNCH-272:
---

I like the ideas in this patch.  Should the Pipeline object have a way of 
tracking all of the PipelineResult objects corresponding to the jobs that 
were run during the life of the Pipeline, so that we can get around the 
materialize() cases?
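One way to read this suggestion, as a hedged sketch: the pipeline records every result it produces, whether the run came from an explicit `run()` or was forced on behalf of a `materialize()` call whose result the caller never sees. `TrackingPipeline`, `execute` and `allResults` are hypothetical names, and plain strings stand in for `PipelineResult` objects; this is not Crunch's actual implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class TrackingPipelineSketch {

    // Hypothetical pipeline wrapper that remembers every result it produces.
    static final class TrackingPipeline {
        private final List<String> history = new ArrayList<>();
        private int runCount = 0;

        // Stand-in for executing the planned MR jobs; returns a label for the result.
        private String execute(String trigger) {
            runCount++;
            String result = trigger + "#" + runCount;
            history.add(result); // recorded no matter which entry point caused the run
            return result;
        }

        String run() {
            return execute("run");
        }

        // A materialize() call may force a run whose result the caller never sees directly.
        Iterable<String> materialize() {
            execute("materialize");
            return Collections.emptyList();
        }

        // What a launcher (or Oozie) could query once the Java action finishes.
        List<String> allResults() {
            return Collections.unmodifiableList(history);
        }
    }

    public static void main(String[] args) {
        TrackingPipeline p = new TrackingPipeline();
        p.materialize();
        p.run();
        System.out.println(p.allResults()); // [materialize#1, run#2]
    }
}
```

Keeping the history inside the pipeline means a launcher only needs the pipeline reference at the end, rather than every call site that might trigger a run.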

 Unable to correlate crunch jobs within Oozie
 

 Key: CRUNCH-272
 URL: https://issues.apache.org/jira/browse/CRUNCH-272
 Project: Crunch
  Issue Type: Improvement
Reporter: Mike Zimmerman
Assignee: Micah Whitacre
 Attachments: CRUNCH-272_prototype.patch


[jira] [Commented] (CRUNCH-272) Unable to correlate crunch jobs within Oozie

2014-03-31 Micah Whitacre (JIRA)

[ https://issues.apache.org/jira/browse/CRUNCH-272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13955974#comment-13955974 ]

Micah Whitacre commented on CRUNCH-272:
---

I spent some time last week poking through the Oozie code, trying to figure out 
how Oozie handles this for other processes like Hive.  It seems like the way they 
solved the problem for the Hive action[1] could be applied generically to the 
Java action.  Therefore I logged an enhancement against Oozie to possibly 
solve this.[2]

[1] - https://github.com/apache/oozie/blob/master/sharelib/hive/src/main/java/org/apache/oozie/action/hadoop/HiveMain.java#L295
[2] - https://issues.apache.org/jira/browse/OOZIE-1767
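For comparison, a plain Java action can already hand values back to a workflow through Oozie's `<capture-output/>` element: the main class writes a Java properties file to the path given by the `oozie.action.output.properties` system property, and later nodes read the values with `wf:actionData()`. This is a different technique from the HiveMain code linked above, and Oozie limits the size of captured output. The sketch assumes a hypothetical `crunch.job.ids` key and uses placeholder job IDs; in a real action they would come from the pipeline's results.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

public class CaptureOutputSketch {

    // System property Oozie sets for Java actions declared with <capture-output/>.
    static final String OUTPUT_PROP = "oozie.action.output.properties";

    // Hypothetical key under which the child MR job IDs are published.
    static final String JOB_IDS_KEY = "crunch.job.ids";

    // Writes the job IDs as one comma-separated property to the given file.
    static void writeJobIds(File target, List<String> jobIds) throws IOException {
        Properties props = new Properties();
        props.setProperty(JOB_IDS_KEY, String.join(",", jobIds));
        try (OutputStream out = new FileOutputStream(target)) {
            props.store(out, "Crunch child jobs");
        }
    }

    public static void main(String[] args) throws IOException {
        String path = System.getProperty(OUTPUT_PROP);
        if (path == null) {
            // Not running under Oozie with <capture-output/>; nothing to publish.
            System.err.println(OUTPUT_PROP + " is not set");
            return;
        }
        // Placeholder IDs; a real action would take them from its PipelineResults.
        writeJobIds(new File(path), Arrays.asList("job_a_0001", "job_a_0002"));
    }
}
```

This only makes the IDs visible to later workflow nodes; it does not link them into Oozie's own view of the action, which is the gap the custom-action discussion is about.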

 Unable to correlate crunch jobs within Oozie
 

 Key: CRUNCH-272
 URL: https://issues.apache.org/jira/browse/CRUNCH-272
 Project: Crunch
  Issue Type: Improvement
Reporter: Mike Zimmerman

[jira] [Commented] (CRUNCH-272) Unable to correlate crunch jobs within Oozie

2013-10-02 Josh Wills (JIRA)

[ https://issues.apache.org/jira/browse/CRUNCH-272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784257#comment-13784257 ]

Josh Wills commented on CRUNCH-272:
---

The MRPipeline constructor has an option to specify a string that is used as a 
common name for all of the MR jobs kicked off by that pipeline.  Could your 
Java action pass a string in via the command line that could be passed to the 
MRPipeline instance and used to track the jobs that were spawned by the action?
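A sketch of this suggestion, assuming the workflow passes a correlation value as the first `<arg>` of the Java action (for example `${wf:id()}`). With Crunch on the classpath the name would go to the `MRPipeline(Class<?>, String, Configuration)` constructor; to stay self-contained, the example only builds the name and shows the prefix match an operator-side tool might apply. `pipelineName` and `belongsTo` are hypothetical helpers, and the sketch assumes the generated MR job names begin with the pipeline name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class PipelineNameSketch {

    // Hypothetical helper: embed the correlation value (e.g. the Oozie workflow ID) in the name.
    static String pipelineName(String appName, String[] args) {
        String correlationId = args.length > 0 ? args[0] : "no-correlation-id";
        return appName + "[" + correlationId + "]";
    }

    // Hypothetical operator-side check: does this MR job name belong to the pipeline?
    static boolean belongsTo(String jobName, String pipelineName) {
        return jobName.startsWith(pipelineName);
    }

    public static void main(String[] args) {
        String name = pipelineName("WordCount", new String[] {"wf-0000001-example"});
        // With Crunch available, e.g.: new MRPipeline(MyApp.class, name, conf)
        List<String> jobNames = Arrays.asList(
                name + ": stage 1",
                "SomeOtherJob",
                name + ": stage 2");
        List<String> mine = jobNames.stream()
                .filter(j -> belongsTo(j, name))
                .collect(Collectors.toList());
        System.out.println(mine.size()); // 2
    }
}
```

The closing bracket in the name keeps prefix matching unambiguous: a pipeline named for `wf-4` will not claim jobs from `wf-42`.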

 Unable to correlate crunch jobs within Oozie
 

 Key: CRUNCH-272
 URL: https://issues.apache.org/jira/browse/CRUNCH-272
 Project: Crunch
  Issue Type: Improvement
Reporter: Mike Zimmerman

--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (CRUNCH-272) Unable to correlate crunch jobs within Oozie

2013-10-01 Micah Whitacre (JIRA)

[ https://issues.apache.org/jira/browse/CRUNCH-272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13783430#comment-13783430 ]

Micah Whitacre commented on CRUNCH-272:
---

Which persona would you want this feature to be available to?  A programmer 
creating the Pipeline + Java action that Oozie is launching?  An Oozie 
workflow owner assembling multiple Java actions and therefore not able to 
access Crunch APIs?

The reason I ask is that some of these capabilities are already available in 
certain situations.  Also, since Crunch has no concept of Oozie and Oozie has no 
concept of Crunch, it is very difficult to tie the two together.

 Unable to correlate crunch jobs within Oozie
 

 Key: CRUNCH-272
 URL: https://issues.apache.org/jira/browse/CRUNCH-272
 Project: Crunch
  Issue Type: Improvement
Reporter: Mike Zimmerman
