Hi all,

I’m building an Oozie workflow to schedule the generate, fetch, etc. cycle. 
Right now I'm trying to feed the list of generated segments into the following 
fetch stage.

The “crawl” script assumes that the most recently added segment is un-fetched 
and uses some HDFS shell scripting to determine its name and store it in a 
shell variable, but I’d like to avoid this and somehow feed the list of 
generated segments directly into the following step.

I have the feeling that I could use the Oozie “capture data from action” option 
but I think that will require fiddling with the Generator class source; that’s 
ok but I’m a bit wary of adding custom code that may not be part of the core 
distribution. Has anyone already done something similar, preferably without 
touching the source? (e.g. 
http://qnalist.com/questions/2330221/nutch-oozie-and-elasticsearch but it now 
404s on GitHub)
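
To make the idea concrete, here is a rough sketch of one way it might work
without touching the Generator source. It is only an assumption on my part,
not something I have running: list the segment names before and after the
generate step, take the difference, and print it as a single key=value line
so that an Oozie shell action with <capture-output/> can pick it up. The
function names, the "segments" key and the two input files are hypothetical.

```python
import sys


def new_segments(before, after):
    """Return the names present in `after` but not in `before`, sorted."""
    return sorted(set(after) - set(before))


def to_capture_line(segments, key="segments"):
    """Format the list as one key=value line (Java properties style)."""
    return "{}={}".format(key, ",".join(segments))


def read_names(path):
    """Read one segment name per line, skipping blank lines."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


# Hypothetical usage: script.py before.txt after.txt
# How the two listings are produced (e.g. from an HDFS listing) is left open.
if __name__ == "__main__" and len(sys.argv) == 3:
    before = read_names(sys.argv[1])
    after = read_names(sys.argv[2])
    print(to_capture_line(new_segments(before, after)))
```

The fetch step could then presumably read the value with something like
${wf:actionData('list-segments')['segments']}, where 'list-segments' is a
made-up action name. Captured output is size-limited in Oozie, so a very long
list might not fit.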


Best,
Edoardo 

-- 
Edoardo Causarano