Hello experienced Oozie users,

I am new to Apache Oozie and I am facing the following task, which I don't
know how to solve according to "best" practices, if it is even possible. It
would be great to get several opinions on it. The situation is as follows:

I have an MR job which takes a data stream and sorts the records by date
into separate output directories. The number of directories is finite, but
it is huge and moving, e.g. for the last ten years the total number of data
buckets is 10 times 365. Then I have an archive which has essentially the
same per-date folder structure and accumulates the data from various
sources for a given date. The problem is that we don't know beforehand
which dates an MR run will process, and we need to "merge" that data into
the archive. We first need to check which dates the output contains and
then assemble the paths for a "merge" MR job, which does the merging,
cleaning, removal of possible duplicates, etc. Having 10*365 predefined
jobs looks horrible.
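
To make the "check the dates and assemble the paths" step concrete, here is
a rough Python sketch of what I have in mind. The yyyy-MM-dd directory
names, the two root paths and the function name are only assumptions for
illustration; our real layout may differ, and in practice the directory
names would come from listing the MR output on HDFS rather than from a
hard-coded list.

```python
import re
from typing import Iterable, List, Tuple

# Assumption: each date bucket is a directory named yyyy-MM-dd.
DATE_DIR = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def merge_inputs(output_root: str,
                 archive_root: str,
                 dir_names: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair each date directory found in the MR output with the
    archive directory for the same date.

    Entries that do not look like a date (e.g. _SUCCESS, _logs)
    are skipped. The result is sorted by date.
    """
    pairs = []
    for name in sorted(set(dir_names)):
        if DATE_DIR.match(name):
            pairs.append((f"{output_root}/{name}", f"{archive_root}/{name}"))
    return pairs


if __name__ == "__main__":
    # Hypothetical listing of the output directory after one MR run.
    listing = ["2013-05-02", "_SUCCESS", "2013-05-01"]
    for new_data, archived in merge_inputs("/out", "/archive", listing):
        print(new_data, "->", archived)
```

The idea is that only the dates actually present in one run produce merge
inputs, so nothing has to be predefined per date.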

I hope the case is clear, and I would be really grateful for any thoughts
or suggestions.

Thanks
Jakub
