Hello experienced Oozie users, I am new to Apache Oozie and I am facing the following task, which I don't know how to solve according to "best" practices, if that is even possible. It would be great to get several opinions on it. The situation is as follows:
I have an MR job which categorizes data from a data stream by date into several output directories. The number of directories is finite but huge, and the window moves: e.g. for the last ten years, 10 times 365 is the total number of data buckets. Then I have an archive which has essentially the same per-date folders and accumulates the data from various sources for a given date. The problem is that we don't know beforehand which dates an MR run will process, and we need to "merge" that data into the archive. We first need to check which dates the output contains and then assemble the paths for a "merge" MR job, which does the merging, cleaning, removal of possible duplicates, etc. Having 10*365 predefined jobs looks horrible.

I hope the case is clear, and I would be really grateful for any thoughts or suggestions.

Thanks,
Jakub
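
For illustration, the "check which dates the output contains, then assemble paths" step could look roughly like the Python sketch below. Everything specific in it is an assumption and not part of the actual setup: the bucket directories are assumed to be named `YYYY-MM-DD`, and `merge_plan`, `/staging/out` and `/archive` are hypothetical names. It only builds the list of path pairs from directory names that were already listed; it does not talk to HDFS or Oozie.

```python
import re
from datetime import date

# Assumed naming of the per-date bucket directories (hypothetical: YYYY-MM-DD).
DATE_DIR = re.compile(r"^(\d{4})-(\d{2})-(\d{2})$")


def merge_plan(output_dirs, output_root="/staging/out", archive_root="/archive"):
    """Return (date, new_data_path, archive_path) for each date bucket present.

    output_dirs: directory names found under the first job's output.
    output_root / archive_root: hypothetical locations, adjust as needed.
    """
    plan = []
    for name in sorted(set(output_dirs)):
        match = DATE_DIR.match(name)
        if not match:
            continue  # not a date bucket, e.g. _SUCCESS or _logs
        try:
            day = date(*map(int, match.groups()))
        except ValueError:
            continue  # looks like a date but is not a valid one
        plan.append(
            (day.isoformat(), f"{output_root}/{name}", f"{archive_root}/{name}")
        )
    return plan


if __name__ == "__main__":
    found = ["2013-05-01", "_SUCCESS", "2013-04-30", "2013-02-30"]
    for day, new_data, archive in merge_plan(found):
        print(day, new_data, archive)
```

With a list like this, a single parameterized merge job could in principle be driven by the dates that actually appeared in a run, instead of by 10*365 predefined jobs.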
