Hi,

Here's my situation: I currently have a Hadoop v1 cluster, and I made some changes to CapacityTaskScheduler to meet the following requirement.
My MR job fetches data from HDFS (the data is generated by someone else) and puts it on a particular node in the cluster. I need the data to be distributed evenly across the nodes over time, so to choose the target node I changed CapacityTaskScheduler to select the node with the most free disk space.

Now I want to move to YARN. What changes do I need to make to meet this requirement? I have some thoughts; please comment.

1) Implement a new policy for the Fair Scheduler. If I go this way, what tricky parts do I need to watch out for?

Another, more general question: if I have two applications in my YARN cluster, one MR and another XXX, and they have different resource requirements, is it possible to use a different scheduler per application in the same cluster?

Thanks
--
Anfernee
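The node-selection rule described in the question (send the data to the node with the most free disk space so that usage evens out over time) can be sketched independently of any scheduler API. This is a minimal Python sketch under stated assumptions: the functions `pick_target_node` and `place_blocks`, the node names, and the free-space figures are all hypothetical, and it is neither the actual CapacityTaskScheduler change nor a YARN or Fair Scheduler API.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def pick_target_node(free_space: Dict[str, int]) -> Optional[str]:
    """Return the node with the most free disk space, or None if there are no nodes.

    Ties are broken by the smallest node name so the choice is deterministic.
    """
    if not free_space:
        return None
    # Sorting key: more free space first (negated), then node name ascending.
    return min(free_space, key=lambda node: (-free_space[node], node))


def place_blocks(
    free_space: Dict[str, int], block_size: int, count: int
) -> Tuple[List[str], Dict[str, int]]:
    """Greedily place `count` blocks, each on the node with the most free space.

    Works on a copy, so the caller's dict is not modified. Returns the list of
    chosen nodes and the free space left on each node afterwards.
    """
    remaining = dict(free_space)
    placements: List[str] = []
    for _ in range(count):
        node = pick_target_node(remaining)
        if node is None or remaining[node] < block_size:
            break  # no node can hold another block
        remaining[node] -= block_size
        placements.append(node)
    return placements, remaining


if __name__ == "__main__":
    # Hypothetical cluster: free space (GB) per node.
    cluster = {"node1": 100, "node2": 40, "node3": 70}
    placements, remaining = place_blocks(cluster, block_size=10, count=12)
    print(Counter(placements))
    print(remaining)
```

With these hypothetical figures, the 12 placements go 7 to node1, 1 to node2 and 4 to node3, leaving 30 GB free on every node. Because the emptiest node always receives the next block, free space converges across nodes, which is the "even over time" behaviour the question asks for.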
