Hi Denis,

Oozie can scale to tens of thousands of "overall" workflow actions, i.e. actions that are executed via multiple workflows and are staggered in time. For parallel actions in a single workflow, the most we have come across is around 100 forks, and at that size streaming the logs was somewhat slow. The total number of actions in a workflow can exceed 100; it is limited by the size of the internal queue (default 10,000) that the Oozie server maintains to insert and then process the various commands on those actions. For such a high-scale application, you can consider running Oozie with 8GB of memory.
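
If it helps to see where a given workflow sits relative to those numbers, here is a rough sketch of a pre-check that counts the actions and the widest fork in a workflow definition. It is only an illustration, not part of Oozie: the function names, the sample workflow, and the two thresholds are assumptions, with the thresholds taken from the figures above (about 100 parallel paths observed, and the default queue size of 10,000). It also assumes "forks" means parallel paths under a single fork node.

```python
import xml.etree.ElementTree as ET

# Rough guides taken from the figures in this thread, not hard limits.
OBSERVED_MAX_FORK_WIDTH = 100   # widest fork reported to work (with slow log streaming)
DEFAULT_QUEUE_SIZE = 10_000     # default size of the server's internal command queue


def local_name(tag):
    """Strip the XML namespace, e.g. '{uri:oozie:workflow:0.4}action' -> 'action'."""
    return tag.rsplit("}", 1)[-1]


def workflow_stats(xml_text):
    """Return (total action count, widest fork) for one workflow definition."""
    root = ET.fromstring(xml_text.strip())
    actions = 0
    widest_fork = 0
    # Action and fork nodes are direct children of <workflow-app>.
    for node in root:
        name = local_name(node.tag)
        if name == "action":
            actions += 1
        elif name == "fork":
            paths = sum(1 for child in node if local_name(child.tag) == "path")
            widest_fork = max(widest_fork, paths)
    return actions, widest_fork


def size_warnings(actions, widest_fork):
    """Compare the counts against the rough guides above (hypothetical helper)."""
    notes = []
    if widest_fork > OBSERVED_MAX_FORK_WIDTH:
        notes.append(f"fork with {widest_fork} paths exceeds ~{OBSERVED_MAX_FORK_WIDTH}")
    if actions >= DEFAULT_QUEUE_SIZE:
        notes.append(f"{actions} actions reaches the default queue size {DEFAULT_QUEUE_SIZE}")
    return notes


# Tiny made-up workflow, used only to exercise the functions.
SAMPLE = """
<workflow-app xmlns="uri:oozie:workflow:0.4" name="demo">
  <start to="split"/>
  <fork name="split">
    <path start="a"/>
    <path start="b"/>
  </fork>
  <action name="a"><fs/><ok to="merge"/><error to="fail"/></action>
  <action name="b"><fs/><ok to="merge"/><error to="fail"/></action>
  <join name="merge" to="end"/>
  <kill name="fail"><message>failed</message></kill>
  <end name="end"/>
</workflow-app>
"""

actions, widest_fork = workflow_stats(SAMPLE)
print(actions, widest_fork, size_warnings(actions, widest_fork))
```

A workflow that trips either check may be a candidate for splitting into several smaller workflows staggered in time, which is the pattern described above as scaling well.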
Improvements around reducing the memory footprint have been recently added to trunk and will be available in the next release.

-- Mona

On 12/18/13 11:45 AM, "Denis Yuen" <[email protected]> wrote:

>Hi,
>
>Are there any good resources or does anyone have experience regarding
>running workflows with a very large number of actions?
>
>We're currently using an Oozie install allocated with 4GB of memory
>connected to a postgres database and we're successfully running workflows
>with hundreds of actions. However, we're having trouble scaling up to
>workflows that contain tens of thousands of actions. For example, errors
>like "E0603: SQL error in operation, Ran out of memory retrieving query
>results" or "E0603: SQL error in operation, An I/O error occured while
>sending to the backend" occur in the Oozie logs, but we also see other
>symptoms like the Oozie console becoming very slow and unresponsive.
>
>What are the typical and maximum workflow sizes that people have seen?
>Both in terms of total number of actions in a workflow or the maximum
>number of actions after a fork in a workflow would be useful.
>
>I want to get an idea of whether we're even in the ballpark so that its
>worthwhile looking at tuning the various configuration settings for Oozie
>or whether we're simply too far out to be reasonable.
>
>Thanks!
>
>-- Denis
