Hi all,

I'm building an Oozie workflow to schedule the generate, fetch, etc. steps. Right now I'm trying to feed the list of generated segments into the following fetch stage.
The "crawl" script assumes that the most recently added segment is un-fetched, and uses some HDFS shell scripting to determine its name and stuff it into a shell variable. I'd like to avoid this and somehow feed the list of generated segments directly into the following step.

I have the feeling that I could use Oozie's "capture data from action" option, but I think that will require fiddling with the Generator class source. That's OK, but I'm a bit wary of adding custom code that may not be part of the core distribution.

Has anyone already done something similar, preferably without touching the source? (e.g. http://qnalist.com/questions/2330221/nutch-oozie-and-elasticsearch, but the GitHub link there now returns a 404)

Best,
Edoardo Causarano
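
P.S. To make the idea concrete, here is a rough sketch of the kind of thing I have in mind, without touching Generator. It assumes an Oozie shell action with `<capture-output/>`, which reads the action's stdout as Java properties (`key=value`). The script name, the `segments` key and the `crawl/segments` path are placeholders of mine, and it assumes a generated-but-unfetched segment is one that contains `crawl_generate` but no `crawl_fetch`. It parses plain `hdfs dfs -ls` output, so paths containing whitespace would break it.

```python
import posixpath
import sys


def unfetched_segments(ls_lines):
    """Return segment dirs that have crawl_generate but no crawl_fetch.

    ls_lines: lines of `hdfs dfs -ls 'crawl/segments/*'` output, i.e. one
    line per sub-directory of each segment, with the path as the last field.
    """
    children = {}
    for line in ls_lines:
        fields = line.split()
        if len(fields) < 8:
            # Skip "Found N items" headers and blank lines.
            continue
        segment, child = posixpath.split(fields[-1])
        children.setdefault(segment, set()).add(child)
    return sorted(
        seg for seg, names in children.items()
        if "crawl_generate" in names and "crawl_fetch" not in names
    )


def as_capture_output(segments, key="segments"):
    """Format the list as a single Java-properties line for capture-output."""
    return "{}={}".format(key, ",".join(segments))


if __name__ == "__main__":
    print(as_capture_output(unfetched_segments(sys.stdin)))
```

The shell action would run something like `hdfs dfs -ls 'crawl/segments/*' | python unfetched_segments.py`, and the fetch step could then read the value with `${wf:actionData('list-segments')['segments']}`, where `list-segments` is whatever the shell action node ends up being called.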

