Rajesh Balamohan created HIVE-7910:
--------------------------------------

             Summary: Enhance natural order scheduler to prevent downstream 
vertex from monopolizing the cluster resources
                 Key: HIVE-7910
                 URL: https://issues.apache.org/jira/browse/HIVE-7910
             Project: Hive
          Issue Type: Bug
            Reporter: Rajesh Balamohan


M2             M7
    \              /
(sg) \            /
       R3        / (b)
        \       /
     (b) \     /
          \   /
            M5
            |
            R6 

Please refer to the attachment (task runtime SVG).  In this case, M5 was 
scheduled much earlier than R3 (R3 is shown in green in the diagram) and 
retained a large number of containers, leaving R3 with fewer containers to 
work with. 

Attaching the output from the status monitor captured while the job ran.  
Map_5 took up almost all of the containers, whereas Reducer_3 got only a 
fraction of the capacity.

Map_2: 1/1      Map_5: 0(+373)/1000     Map_7: 1/1      Reducer_3: 0/8000       Reducer_6: 0/1
Map_2: 1/1      Map_5: 0(+374)/1000     Map_7: 1/1      Reducer_3: 0/8000       Reducer_6: 0/1
Map_2: 1/1      Map_5: 0(+374)/1000     Map_7: 1/1      Reducer_3: 0(+1)/8000   Reducer_6: 0/1
....
Map_2: 1/1      Map_5: 0(+374)/1000     Map_7: 1/1      Reducer_3: 14(+7)/8000  Reducer_6: 0/1
Map_2: 1/1      Map_5: 0(+374)/1000     Map_7: 1/1      Reducer_3: 63(+14)/8000 Reducer_6: 0/1
Map_2: 1/1      Map_5: 0(+374)/1000     Map_7: 1/1      Reducer_3: 159(+22)/8000        Reducer_6: 0/1
Map_2: 1/1      Map_5: 0(+374)/1000     Map_7: 1/1      Reducer_3: 308(+29)/8000        Reducer_6: 0/1
...
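
To put a number on the imbalance, the status lines can be parsed and the 
share of running task attempts per vertex computed.  The sketch below is 
illustrative only: the helper names (parse_status, running_share) are 
hypothetical, and it assumes the "(+N)" field counts running attempts and 
that each running attempt holds one container.

```python
import re

# Matches entries such as "Map_5: 0(+374)/1000" or "Map_2: 1/1".
# Assumed layout: name: completed(+running)/total, with "(+running)" optional.
STATUS_RE = re.compile(r"(\w+):\s*(\d+)(?:\(\+(\d+)\))?/(\d+)")


def parse_status(line):
    """Return {vertex: (completed, running, total)} for one status line."""
    return {
        name: (int(done), int(running or 0), int(total))
        for name, done, running, total in STATUS_RE.findall(line)
    }


def running_share(status):
    """Return each vertex's fraction of all currently running attempts."""
    total_running = sum(running for _, running, _ in status.values())
    if total_running == 0:
        return {vertex: 0.0 for vertex in status}
    return {
        vertex: running / total_running
        for vertex, (_, running, _) in status.items()
    }


# Last status line from the output above.
line = ("Map_2: 1/1      Map_5: 0(+374)/1000     Map_7: 1/1      "
        "Reducer_3: 308(+29)/8000        Reducer_6: 0/1")
status = parse_status(line)
share = running_share(status)
print(f"Map_5: {share['Map_5']:.1%}  Reducer_3: {share['Reducer_3']:.1%}")
```

For the last line of the output, Map_5 holds 374 of the 403 running 
attempts (roughly 93%), while Reducer_3 holds 29.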


Creating this JIRA as a placeholder for a scheduler enhancement. One 
possibility could be to schedule fewer tasks in downstream vertices, based on 
the information available for the upstream vertex.
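
One way to picture this idea is a cap on downstream tasks that grows with 
upstream progress.  The sketch below is only an illustration under stated 
assumptions: the function name, the proportional policy, and the min_tasks 
parameter are hypothetical and do not describe the existing scheduler or 
any committed design.

```python
def downstream_task_cap(upstream_done, upstream_total, downstream_total,
                        min_tasks=1):
    """Return how many downstream tasks may be scheduled right now.

    Hypothetical policy: allow downstream tasks in proportion to the
    fraction of upstream tasks that have completed, but always allow at
    least min_tasks so the downstream vertex can make some progress.
    """
    if upstream_total <= 0:
        # No upstream information available: do not restrict.
        return downstream_total
    proportional = downstream_total * upstream_done // upstream_total
    return max(min_tasks, min(downstream_total, proportional))


# Reducer_3 (upstream) at 308/8000 complete, Map_5 (downstream) with 1000 tasks.
print(downstream_task_cap(308, 8000, 1000))
```

Under this policy, with Reducer_3 at 308 of 8000 tasks complete, Map_5 would 
be limited to 38 of its 1000 tasks instead of the 374 seen in the status 
output, leaving the remaining containers available to Reducer_3.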



--
This message was sent by Atlassian JIRA
(v6.2#6252)
