Hi,

I want to build a result verification system for MapReduce based on replication. It uses two kinds of tasks: quizzes and MapReduce tasks. If the submitted job is a quiz, I want to replicate it to all the worker nodes; if it is a MapReduce task, I want to replicate it to only a fraction of the worker nodes.

For this, I thought of introducing a new field, jobtype, in workflow.xml. Parsing it would produce a set of new workflow.xml files that replicate the task across the worker nodes, with the number of replicas determined by the job type.

I am new to Oozie, so I have no idea where the code would need to be modified, or even whether workflow.xml is parsed at all. Could you tell me whether it is possible to modify the Oozie code to implement this concept?

Thanks,
Tina
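
To make the replication rule above concrete, here is a minimal sketch of how the replica count might be derived from the job type. It is not Oozie code and uses no Oozie API; the names `replica_count`, `QUIZ`, `MAPREDUCE`, and the `fraction` parameter (with its default of 0.5) are all assumptions made for illustration.

```python
import math

# Hypothetical job-type labels; the message only distinguishes
# "quiz" jobs from ordinary MapReduce tasks.
QUIZ = "quiz"
MAPREDUCE = "mapreduce"


def replica_count(job_type, num_workers, fraction=0.5):
    """Return how many worker nodes should receive a copy of the task.

    Quizzes go to every worker; MapReduce tasks go to a fraction of
    the workers (rounded up, and never fewer than one).
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    if job_type == QUIZ:
        return num_workers
    if job_type == MAPREDUCE:
        if not 0 < fraction <= 1:
            raise ValueError("fraction must be in the range (0, 1]")
        return max(1, math.ceil(fraction * num_workers))
    raise ValueError("unknown job type: %r" % (job_type,))
```

For example, with 8 workers a quiz would get 8 replicas, while a MapReduce task with `fraction=0.25` would get 2.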

On Mon, May 19, 2014 at 10:53 PM, Mona Chitnis <[email protected]> wrote:

> Hi Tina,
>
> Oozie is not meant currently to influence resource management. It
> coordinates and tracks workflows but the decision about number of M-R
> tasks (aka number of nodes parallelly executing the workflow) rests with
> Hadoop. Of course, we can pass mapreduce configuration parameters through
> Oozie, such as split size, number of map and reduce tasks desired, to
> influence resource management, but that is done at a best-effort basis in
> principle.
>
> Can you provide us more details of your use-case? It sounds interesting
> but not sure if Oozie would be the place for this kind of logic.
>
> --
> Mona
>
> On 5/19/14, 3:08 AM, "Tina Samuel" <[email protected]> wrote:
>
> >I would like to modify the Oozie code to introduce a new scheduling
> >pattern in Hadoop. I am new to Oozie. I read that there is a file called
> >workflow.xml which has the actions that are to be performed by Hadoop. I
> >want to introduce a new field to the job, something like a JOB_TYPE. For
> >eg, if a job belongs to TYPE_1, then it should be replicated in all the
> >worker nodes. If a job belongs to TYPE_2, then it should be replicated in
> >only a fraction of nodes. Is it possible to modify the parser of Oozie
> >which parses the workflow.xml? Please do help
> >
> >--
> >Tina
> >

--
Tina
