[jira] [Commented] (YARN-1024) Define a virtual core unambigiously
[ https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766124#comment-13766124 ]

Eli Collins commented on YARN-1024:
---

bq. keeping virtual cores to express parallelism sounds good as it is clear it is not a real core.

Hm, I read this the other way. If a framework asks for three vcores on a host it intends to run some code on three real physical cores at the same time. If a long lived framework wants to reserve 2 cores per host it would ask for 2 cores (and 100% YCU per core).

Sandy's proposal, switching to cores and YCU instead of just vcores, is equivalent to the proposal above of getting rid of vcores and supporting fractional cores. A vcore becomes a core and YCU is just a way to express that you want a fraction of a core. Sounds good to me.

Define a virtual core unambigiously
---
Key: YARN-1024
URL: https://issues.apache.org/jira/browse/YARN-1024
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Arun C Murthy
Assignee: Arun C Murthy
Attachments: CPUasaYARNresource.pdf

We need to clearly define the meaning of a virtual core unambiguously so that it's easy to migrate applications between clusters. For e.g. here is Amazon EC2 definition of ECU: http://aws.amazon.com/ec2/faqs/#What_is_an_EC2_Compute_Unit_and_why_did_you_introduce_it
Essentially we need to clearly define a YARN Virtual Core (YVC).
Equivalently, we can use ECU itself: *One EC2 Compute Unit provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.*

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
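The equivalence described in this comment (cores plus a per-core YCU percentage versus fractional cores) can be illustrated with a little arithmetic. This is only a sketch: the function name `fractional_cores` is hypothetical, and treating YCU as a percentage of one core is an assumption taken from the "100% YCU per core" wording above.

```python
from fractions import Fraction


def fractional_cores(cores: int, ycu_percent_per_core: int) -> Fraction:
    """Express a (cores, YCU-percent-per-core) request as fractional cores.

    Assumption for illustration: YCU is given as a percentage of one core.
    """
    if cores < 0 or not 0 <= ycu_percent_per_core <= 100:
        raise ValueError("cores must be >= 0 and YCU percent within 0..100")
    return Fraction(cores * ycu_percent_per_core, 100)


if __name__ == "__main__":
    # A long-lived framework reserving 2 full cores per host.
    print(fractional_cores(2, 100))
    # A single mostly idle thread asking for half a core.
    print(fractional_cores(1, 50))
```

Under this reading, a request for N cores at 100% is the same as N whole cores, and anything below 100% is a fraction of a core.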
[jira] [Commented] (YARN-1024) Define a virtual core unambigiously
[ https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750263#comment-13750263 ]

Chris Riccomini commented on YARN-1024:
---

Jumping into this late. I'm thinking about this discussion from an end user's perspective.

1. It seems to me that the only time you'd want a YCU value that's not -1 is when you're running a thread that uses less than 100% of the CPU. Is that a correct statement?

2. As an end user, how do I know what YCU value is reasonable for my job? In the distcp example, how do I figure out that 500 YCUs is reasonable? Are we expecting users to run their job on an isolated box, run top, then do some arithmetic? Alternatively, are we expecting them to run their job repeatedly, and tune down their YCU request until the point that it becomes too

Instinctively, I kind of have the feeling that the concept of the YCU is more useful if it encompasses more than just CPU. The benefit of Amazon's ECU is that it's fairly straightforward to reason about. You get a pre-defined slice of memory, CPU, disk, and network. If the primary goal is simplicity (stated above), why wouldn't you go that route, vs. limiting YCUs to being a strictly CPU-related concept? This leads to (perhaps significantly) worse cluster utilization, but it's a simpler model for the end user. As I understand it, this is kind of how memory was being treated prior to adding CPU resources (i.e. asking for 20% of the memory on a host is really just a proxy for 20% of machine resources as a whole).
[jira] [Commented] (YARN-1024) Define a virtual core unambigiously
[ https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750270#comment-13750270 ]

Chris Riccomini commented on YARN-1024:
---

Realized I'm conflating ECU with AWS machine instances. I retract the whole last paragraph. :)
[jira] [Commented] (YARN-1024) Define a virtual core unambigiously
[ https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750286#comment-13750286 ]

Sandy Ryza commented on YARN-1024:
--

bq. It seems to me that the only time you'd want a YCU value that's not -1 is when you're running a thread that uses less than 100% of the CPU. Is that a correct statement?

That's correct. This is common for data-intensive tasks that can be more I/O-bound than CPU-bound.

bq. As an end user, how do I know what YCU value is reasonable for my job?

I think selecting the right value is an inherently difficult task. I think we would expect different users with different amounts of technical proficiency to do it in different ways. Something like:
* Simple: Use the default value on the cluster.
* Intermediate: Notice your tasks are running too slow and increase YCUs. Or notice your tasks aren't getting scheduled enough and decrease them.
* Advanced: Do the thing with top.
[jira] [Commented] (YARN-1024) Define a virtual core unambigiously
[ https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13747701#comment-13747701 ]

Sandy Ryza commented on YARN-1024:
--

Filed YARN-1089 for adding YCUs.
[jira] [Commented] (YARN-1024) Define a virtual core unambigiously
[ https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13748062#comment-13748062 ]

Sandy Ryza commented on YARN-1024:
--

I wrote up a more detailed proposal and attached a PDF of it.
[jira] [Commented] (YARN-1024) Define a virtual core unambigiously
[ https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13741051#comment-13741051 ]

Arun C Murthy commented on YARN-1024:
-

[~sandyr] Thanks for taking the time to clearly elucidate our long always-half-confused discussion i.e. the longwindedness! I think we could be close to a solution here, I really do - though, I'm not betting my house yet. *smile*

To paraphrase your proposal (mainly for my own benefit):
# Split current (get,set)VirtualCores into (get,set)YCUPerCore and (get,set)Cores.
# There is a cluster-wide constant of maxYCUPerCore.
# The schedulers use {{core * YCUPerCore}} to do resource-allocation comparisons.

The one issue that we need to think about is that we'll need to enhance the schedulers to track how many YCUs are available on which core on any given node... you could have 5 YCUs in a node but split 3-2-1 across 3 cores. Any good ideas on how to get to this?
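The three paraphrased points can be sketched in a few lines. This is an illustration only: `CpuRequest`, `normalize`, and `fits` are hypothetical names, the numbers are made up, and capping the per-core value at maxYCUPerCore is an assumption based on the proposal being paraphrased rather than on any existing YARN API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuRequest:
    """Hypothetical CPU request: number of cores and YCUs wanted per core."""
    cores: int
    ycu_per_core: int

    def total_ycus(self) -> int:
        # The quantity the schedulers would compare: cores * YCUPerCore.
        return self.cores * self.ycu_per_core


def normalize(request: CpuRequest, max_ycu_per_core: int) -> CpuRequest:
    """Cap the per-core value at the cluster-wide constant (assumed behavior)."""
    return CpuRequest(request.cores, min(request.ycu_per_core, max_ycu_per_core))


def fits(request: CpuRequest, available_ycus: int, max_ycu_per_core: int) -> bool:
    """Compare the normalized request against the free YCUs on a node."""
    return normalize(request, max_ycu_per_core).total_ycus() <= available_ycus


if __name__ == "__main__":
    req = CpuRequest(cores=2, ycu_per_core=1500)
    print(normalize(req, max_ycu_per_core=1000).total_ycus())
    print(fits(req, available_ycus=2500, max_ycu_per_core=1000))
```

Note that this sketch tracks free YCUs per node only, not per core, which is exactly the open question raised at the end of the comment.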
[jira] [Commented] (YARN-1024) Define a virtual core unambigiously
[ https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13741081#comment-13741081 ]

Alejandro Abdelnur commented on YARN-1024:
--

Instead of cores, could we talk about parallelism, given that there are CPUs with hyper-threading and a physical core may be seen as more than one from a parallelism perspective? (i.e. a single-threaded MR task would consume at most 1/2 of a core)
[jira] [Commented] (YARN-1024) Define a virtual core unambigiously
[ https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13741191#comment-13741191 ]

Sandy Ryza commented on YARN-1024:
--

bq. To paraphrase your proposal...

Thanks for summarizing. That captures it perfectly.

bq. The one issue that we need to think about is that we'll need to enhance the schedulers to track how much YCUs are available on which core on any given node... you could have 5 YCUs in a node but split 3-2-1 across 3 cores. Any good ideas on how to get to this?

Correct me if I'm wrong, but you're talking about availability based on usage, not heterogeneous cores within a node, right? If my assumptions about how we can view CPUs are sufficient, I'm thinking we shouldn't need this, at least for a start. I.e. if we have two threads to run and two cores to run them on, we can be agnostic to whether the OS scheduler is running each on its own core or splitting both across the two. The CGroups properties discussed in YARN-810 allow you to limit the total processing power that a process gets without pinning its threads to cores.

Assigning tasks to cores might matter for things like cache performance, so I agree it's a useful thing to work on eventually. But I think any solution will either end up with a decent amount of fragmentation or require doing some NP-hard combinatorial optimization repeatedly.
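As a rough illustration of limiting total processing power without pinning threads, the sketch below turns a YCU request into a CFS bandwidth quota. `cpu.cfs_period_us` and `cpu.cfs_quota_us` are real Linux cgroup (v1) CPU controller settings, but whether YARN-810 uses exactly these knobs is not stated in this thread, and the mapping from YCUs to a quota (including the name `cfs_quota_us` for the helper and the `ycus_per_core` parameter) is an assumption made for the example.

```python
CFS_PERIOD_US = 100_000  # common default for cpu.cfs_period_us (100 ms)


def cfs_quota_us(container_ycus: int, ycus_per_core: int,
                 period_us: int = CFS_PERIOD_US) -> int:
    """CPU time (microseconds per period) a container may consume.

    A quota larger than the period allows more than one core's worth of
    time, spread over whichever cores the OS scheduler chooses, so no
    thread is pinned to a particular core.
    """
    if container_ycus < 0 or ycus_per_core <= 0 or period_us <= 0:
        raise ValueError("arguments must be positive (container_ycus >= 0)")
    return container_ycus * period_us // ycus_per_core


if __name__ == "__main__":
    # Hypothetical node where one core is worth 1000 YCUs.
    print(cfs_quota_us(1500, 1000))  # one and a half cores' worth of time
    print(cfs_quota_us(500, 1000))   # half a core's worth of time
```

The value would be written to the container's cgroup; the kernel then throttles the group as a whole once the quota for a period is used up.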
[jira] [Commented] (YARN-1024) Define a virtual core unambigiously
[ https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13741213#comment-13741213 ]

Arun C Murthy commented on YARN-1024:
-

bq. Correct me if I'm wrong, but you're talking about availability based on usage, not heterogeneous cores within a node, right? If my assumptions about how we can view CPUs are sufficient, I'm thinking we shouldn't need this, at least for a start.

Ah, good point. Although I would like to really think through the implications of heterogeneous nodes in the cluster. Even so, I don't think there is anything here we'd be blocked on. Does anyone disagree?

Now, the important question. If we agree, in a broad sense, on [~sandyr]'s proposal - are we happy with our current APIs, particularly in light of 2.1.0-beta? One option is for us to use the current (get,set)VirtualCores as the basis for 'cores' or 'parallelism' going forward and introduce a new (get,set)YCUPerCore. Is that ok? What do you guys think?

Thanks.
[jira] [Commented] (YARN-1024) Define a virtual core unambigiously
[ https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13741289#comment-13741289 ]

Sandy Ryza commented on YARN-1024:
--

My opinion is that we shouldn't delay the release for adding in both, and that we can add in YCUs in 2.1.1. If we want to change virtual cores to 'cores' or 'parallelism', I could post a refactoring patch by EOD. I also wouldn't cry if we left it as virtual cores.
[jira] [Commented] (YARN-1024) Define a virtual core unambigiously
[ https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13741518#comment-13741518 ]

Robert Joseph Evans commented on YARN-1024:
---

I am fine with that too.
[jira] [Commented] (YARN-1024) Define a virtual core unambigiously
[ https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13739853#comment-13739853 ]

Robert Joseph Evans commented on YARN-1024:
---

{quote}Sorry for the longwindedness.{quote}

From what people have told me you still have a long ways to go before you approach me for longwindedness :).

My initial gut reaction is that only having two numbers to express the request seems too simplified, but the more I think about it the more I am OK with it, although I think I would change the numbers to be total YCUs requested and minimum YCUs per core. This gives the user better visibility into how the scheduler is treating these numbers so they can better reason about them. The total YCUs is the value used for scheduling. The minimum YCUs per core is compared to the maxComputeUnitsPerCore, as was suggested, to reject a request as not possible or, in the case of a heterogeneous environment, to restrict the hosts that this container can run on. Although I am OK with the original proposal too.

I would also like us to have a flag that would either limit the container to the requested CPU and let it have no more even when more is available, or would let it expand to use whatever CPU was free, but would be guaranteed to get at least the YCUs requested. This is likely something that would have to be done on a separate JIRA though. Without this I don't see a way to really get simplicity, predictability, or consistency.

1 MB of RAM is fairly simple to understand. It can be measured without too much of a problem just by running the process. Most users do a simple search for the correct value: run with the default, and if it does not work, increase the amount and run again. 1 YCU is very complex to measure for an application. If I cannot restrict a container to never use more than what was requested, I cannot consistently predict how long it will take to run later. Without this I don't know how to answer the question I know will come up: what should I set these values to?
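The alternative pair of numbers suggested in this comment (total YCUs requested plus minimum YCUs per core) could be checked roughly as below. This is a sketch under stated assumptions: the `Nodes` mapping, the function names `is_possible` and `candidate_hosts`, and the sample figures are all hypothetical and not part of YARN.

```python
from typing import Dict, List, Tuple

# Hypothetical node description: host name -> (free YCUs, YCUs per core).
Nodes = Dict[str, Tuple[int, int]]


def is_possible(min_ycu_per_core: int, nodes: Nodes) -> bool:
    """A request is impossible if no node has cores that powerful."""
    return any(per_core >= min_ycu_per_core for _, per_core in nodes.values())


def candidate_hosts(total_ycus: int, min_ycu_per_core: int,
                    nodes: Nodes) -> List[str]:
    """Hosts with strong enough cores and enough free YCUs right now."""
    return sorted(
        name
        for name, (free, per_core) in nodes.items()
        if per_core >= min_ycu_per_core and free >= total_ycus
    )


if __name__ == "__main__":
    nodes = {"fast": (4000, 1000), "slow": (6000, 500)}
    print(is_possible(800, nodes))
    print(candidate_hosts(3000, 800, nodes))
    print(candidate_hosts(3000, 400, nodes))
```

Here the total is the only number used for capacity accounting, while the per-core minimum acts purely as a filter on which hosts qualify.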
[jira] [Commented] (YARN-1024) Define a virtual core unambigiously
[ https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13739900#comment-13739900 ]

Sandy Ryza commented on YARN-1024:
--

bq. I would also like us to have a flag that would either limit the container to the requested CPU and let it have no more even when more is available, or would let it expand to use whatever CPU was free, but would be guaranteed to get at least the YCUs requested.

YARN-810 should handle this. The plan is to make it a cluster config, but feel free to chime in there if you think it needs to be an app config.

bq. 1 YCU is very complex to measure for an application.

Agreed that YCUs are very complex to measure and set for applications, and I don't think there is any good way around this. YARN-810 will help considerably, but still won't make it close to as easy as configuring memory.

bq. although I think I would change the numbers to be total YCUs requested and minimum YCUs per core.

Because of the complexity discussed above in dealing with YCUs, I strongly believe that we should keep one of the parameters as just the number of cores, which allows a user to separate the concerns of "how much parallelism can my task take advantage of?" and "how CPU-bound is my task?". This will also give us something in common with every other cluster resource manager I have surveyed (Condor, Maui, Torque, etc.).
[jira] [Commented] (YARN-1024) Define a virtual core unambigiously
[ https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738985#comment-13738985 ]

Sandy Ryza commented on YARN-1024:
--

I've been thinking a lot about this, and wanted to propose a modified approach, inspired by an offline discussion with Arun and his max-vcores idea (https://issues.apache.org/jira/browse/YARN-1024?focusedCommentId=13730074page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13730074).

First, my assumptions about how CPUs work:
* A CPU is essentially a bathtub full of processing power that can be doled out to threads, with a limit per thread based on the power of each core within it.
* To give X processing power to a thread means that within a standard unit of time, roughly some number of instructions proportional to X can be executed for that thread.
* No more than a certain amount of processing power (the amount of processing power per core) can be given to each thread.
* We can use CGroups to say that a task gets some fraction of the system's processing power.
* This means that if we have 5 cores with Y processing power each, we can give 5 threads Y processing power each, or 6 threads 5Y/6 processing power each, but we can't give 4 threads 5Y/4 processing power each.
* It never makes sense to use CGroups to assign a higher fraction of the system's processing power than (numthreads the task can take advantage of / number of cores) to a task.
* Equivalently, if my CPU has X processing power per core, it never makes sense to assign more than (numthreads the task can take advantage of) * X processing power to a task.

So as long as we account for that last constraint, we can essentially view processing power as a fluid resource like memory.

With this in mind, we can:
1. Split virtual cores into cores and yarnComputeUnitsPerCore. Requests can include both and nodes can be configured with both.
2. Have a cluster-defined maxComputeUnitsPerCore, which would be the smallest yarnComputeUnitsPerCore on any node. We min all yarnComputeUnitsPerCore requests with this number when they hit the RM.
3. Use YCUs, not cores, for scheduling. I.e. the scheduler thinks of a node's CPU capacity in terms of the number of YCUs it can handle and thinks of a resource's CPU request in terms of its (normalized yarnComputeUnitsPerCore * # cores). We use YCUs for DRF.
4. If we make YCUs small enough, no need for fractional anything.

This reduces to a number-of-cores-based approach if all containers are requested with yarnComputeUnitsPerCore=infinity, and reduces to a YCU approach if maxComputeUnitsPerCore is set to infinity. Predictability, simplicity, and scheduling flexibility can be traded off per cluster without overloading the same concept with multiple definitions.

This doesn't take into account heterogeneous hardware within a cluster, but I think (2) can be tweaked to handle this by holding a value for each node (can elaborate on how this would work). It also doesn't take into account pinning threads to CPUs, but I don't think it's any less extensible for ultimately dealing with this than other proposals.

Sorry for the longwindedness. Bobby, would this provide the flexibility you're looking for?
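The CPU assumptions listed in this comment (the "bathtub" of processing power with a per-thread cap) can be checked with a few lines of arithmetic. The function names `per_thread_power` and `useful_cgroup_fraction` are hypothetical, and capping the fraction at 1.0 is an added assumption, since a task cannot be given more than the whole system.

```python
def per_thread_power(num_cores: int, power_per_core: float,
                     num_threads: int) -> float:
    """Power each thread gets when the node is shared evenly.

    The node's total power is divided among the threads, but no single
    thread can receive more than the power of one core.
    """
    if num_cores <= 0 or num_threads <= 0:
        raise ValueError("num_cores and num_threads must be positive")
    total = num_cores * power_per_core
    return min(power_per_core, total / num_threads)


def useful_cgroup_fraction(task_threads: int, num_cores: int) -> float:
    """Largest share of the system worth assigning to a task via CGroups."""
    if task_threads < 0 or num_cores <= 0:
        raise ValueError("task_threads must be >= 0 and num_cores positive")
    return min(1.0, task_threads / num_cores)


if __name__ == "__main__":
    y = 6.0  # arbitrary processing power per core
    for threads in (4, 5, 6):
        print(threads, per_thread_power(5, y, threads))
    print(useful_cgroup_fraction(2, 8))
```

With 5 cores, 4 threads still get only Y each (not 5Y/4), 5 threads get Y each, and 6 threads get 5Y/6 each, which matches the example given in the comment.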
[jira] [Commented] (YARN-1024) Define a virtual core unambigiously
[ https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13736919#comment-13736919 ]

Robert Joseph Evans commented on YARN-1024:
---

Perhaps I am missing something here. The goals Arun has asked for are simplicity, predictability, and consistency. Simplicity I totally agree with, but I do not totally agree with always having predictability and consistency after simplicity, and I do not agree that they are always required. These two come with a trade-off with utilization, and this is something that Sandy brought up, although not directly.

For HBase, guaranteed resources, in terms of both parallelism and raw CPU speed, are important because it is using those to provide a service where predictability and consistency are needed. If the HBase AM cannot truly express to YARN what it needs because of simplicity, HBase on YARN will not be used, because it will not behave the way users need/expect it to. Similarly, if HBase is allowed to steal resources from others, you can easily request too few resources on an underutilized cluster, and when the cluster is under load it falls apart. This is similar for me with my desire for Storm on YARN. I am happy to use a complex API to express my needs if it means that I get what I need.

On the other hand, if I am doing MR batch processing, most of the time (but not all of it) I am doing single-threaded processing and I really just want it to fill in the gaps and use as much unused CPU as it can. Yes, some MR jobs have strict SLAs, but most do not, and it is best if we can provide a scheduler that can balance both.

I also don't agree that because YARN lacks the ability to schedule everything that impacts performance, including network and disk IO, we should skip doing CPU correctly. Some applications are truly CPU bound and they will benefit. For other resources we can add them to YARN as they are needed until we do meet the goal of predictability and consistency.
[jira] [Commented] (YARN-1024) Define a virtual core unambigiously
[ https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13730622#comment-13730622 ] Junping Du commented on YARN-1024: --
I would also prefer #1 for scheduling resources, as #2 is only meaningful for charging/billing, as [~philip] mentioned above. For #2, a simple calculation like the ECU (it was released in 2006/2007 and has not changed in 7 years, which goes against Moore's law :)) has two commonly questioned scenarios:
- Assigning multiple slow p-cores (4 x 1G) to a single-threaded task that asks for one fast core (1 x 4G, mapped to multiple vcores) cannot help performance and wastes CPU resources: the unused cores still consume timer interrupts, and the idle loop costs resources too. In addition, maintaining a consistent memory view among multiple vCPUs consumes resources. All of this is unnecessary. It is also possible for the OS CPU scheduler to migrate a single-threaded workload among multiple vCPUs, thereby losing cache locality.
- Assigning a single faster p-core (1 x 4G) to a multi-threaded task that asks for multiple slow cores (4 x 1G) will cause performance issues, as Steve mentioned above and in YARN-972: too much overhead from process context switches and cache misses.
#1 sounds more reasonable. 1 vcore doesn't have to be 1 pcore; it could be mapped to 1 vCPU under virtualization and be overcommitted later (with a configured ratio) by the virtualization platform.
[jira] [Commented] (YARN-1024) Define a virtual core unambigiously
[ https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13730961#comment-13730961 ] Eli Collins commented on YARN-1024: ---
bq. vcores are optional anyway (only used in DRF)
Sandy corrected me offline: while this is true for the CS, it is not true for the FS, which by default (w/o DRF) will not schedule more containers' worth of vcores than the configured vcores. That seems like it could lead to under-utilization, given that the default resource calculator only uses memory and not every container needs a whole core. By default the # of vcores is the # of cores on the machine and MR asks for containers w/ 1 vcore, so we effectively have vcore=pcore today as the default (reinforced by the decision to remove the notion of pcore in YARN-782).
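The under-utilization concern in the comment above can be illustrated with a little arithmetic. The sketch below is a simplified model and not Fair Scheduler code; the function name `max_containers` and the node sizes are made up for illustration.

```python
def max_containers(node_mem_mb, node_vcores, container_mem_mb,
                   container_vcores, enforce_vcores):
    """Number of identical containers that fit on one node (toy model)."""
    by_memory = node_mem_mb // container_mem_mb
    if not enforce_vcores:
        # Memory-only accounting: vcores are ignored when placing containers.
        return by_memory
    # With a vcore cap, the tighter of the two limits wins.
    by_vcores = node_vcores // container_vcores
    return min(by_memory, by_vcores)


# Hypothetical node: 8 cores (hence 8 vcores by default), 48 GB for containers.
# Each container asks for 2 GB and 1 vcore, as an MR task would.
memory_only = max_containers(48 * 1024, 8, 2048, 1, enforce_vcores=False)
vcore_capped = max_containers(48 * 1024, 8, 2048, 1, enforce_vcores=True)
print(memory_only, vcore_capped)
```

Under these assumed numbers the vcore cap, not memory, decides how many containers run, which only matters if the containers do not each keep a whole core busy.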
[jira] [Commented] (YARN-1024) Define a virtual core unambigiously
[ https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13729724#comment-13729724 ] Arun C Murthy commented on YARN-1024: -
bq. If I were to package my simulator and give it to other people on other clusters, it would still be true that it spins one CPU. Its runtime, however, would vary depending on the horsepower.
I don't see the conflict. If you don't care about predictable runtime, you could still say "I want to run on 1 virtual core". Given the above non-requirement on predictability, whether it's 1 (virtual) core out of 16 physical cores or out of 1024 virtual cores is immaterial, isn't it? And yes, you still get only 1 physical core, since the virtual core is mapped to a single physical core. The point of specifying a virtual core is that you get predictable performance when you migrate your application between clusters, and other goodness. What am I missing here?
[jira] [Commented] (YARN-1024) Define a virtual core unambigiously
[ https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13729833#comment-13729833 ] Steve Loughran commented on YARN-1024: --
I was the one trying to convince Sandy that a uniform core metric is dangerous; it's like when a MIP was a VAX-equivalent Million Instructions.
# Different parts have different performance in terms of FPU and memory IO bandwidth, even if the integer perf is the same (hence people like to get Intel parts over AMD parts on EC2 allocations).
# There's also the hyperthreading issue: is an HT core the equivalent of a real core? (No, but Linux treats them the same, AFAIK.)
# Over time, as 2007 gets further away, the metric becomes less relevant.
# EC2 also includes RAM (e.g. m1.small has the same CPU as m1.medium, only less RAM; AWS considers medium as having 2x ECUs).
One thing I was arguing against in YARN-972 is allocating fractions of a real core: if I say 1 core, I get a single core, irrespective of performance. If ECUs are used, and I ask for 1 ECU, does that mean that I get 0.50 of a bigger core, or a free upgrade? I'm happy if I ask for 8 ECUs and get a guarantee of not being on a CPU with fewer than 8 ECUs, making it a minimum requirement on the CPU perf.
[jira] [Commented] (YARN-1024) Define a virtual core unambigiously
[ https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13729839#comment-13729839 ] Sandy Ryza commented on YARN-1024: --
If I am used to running my single-threaded task on a fast core (let's say rated at 250 YVCs), and then I migrate it to another cluster with slower cores (let's say rated at 150 YVCs), and still request 250 YVCs, my task will run no faster than if I had requested 150 YVCs. I won't get predictable performance, and, from a scheduling perspective, I'd be better off requesting 150 YVCs on the slower cluster.
In a single pcore-to-vcore world, if I know that my task is CPU-bound and uses X threads, I know that each vcore I ask for, up to X vcores, will predictably improve its performance, whatever cluster I am running on. In a world where different cores have different YVCs, I don't get a clear concept of when I should increase the YVCs I request, and the advantage of doing so depends mostly on the cluster I am running on.
A virtual core definition based on processing power masks the fact that two 1.5 GHz cores mean something very different than three 1.0 GHz cores, and makes it very hard to reason about how many virtual cores to request.
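The two examples in the comment above can be worked through numerically. This is only a sketch under stated assumptions: a thread runs on at most one core at a time, scheduling overhead is ignored, and the helper names `effective_thread_speed` and `usable_capacity` are hypothetical, not YARN APIs.

```python
def effective_thread_speed(requested_yvcs, core_rating_yvcs):
    """A single thread cannot run faster than the core it lands on."""
    return min(requested_yvcs, core_rating_yvcs)


def usable_capacity(threads, core_ratings):
    """Total processing power a task with `threads` threads can actually use.

    Each thread occupies at most one core, so only the `threads`
    fastest cores contribute.
    """
    fastest_first = sorted(core_ratings, reverse=True)
    return sum(fastest_first[:threads])


# Migrating a single-threaded task from 250-YVC cores to 150-YVC cores:
# asking for 250 buys nothing over asking for 150.
print(effective_thread_speed(250, 150), effective_thread_speed(150, 150))

# Two 1.5 GHz cores and three 1.0 GHz cores both total 3.0 GHz,
# yet what a task can use depends on its thread count.
two_fast = [1.5, 1.5]
three_slow = [1.0, 1.0, 1.0]
for threads in (1, 2, 3):
    print(threads, usable_capacity(threads, two_fast),
          usable_capacity(threads, three_slow))
```

In this toy model the two nodes are equivalent only for a task with three or more threads; for one or two threads the node with fewer, faster cores offers more usable capacity, which is the distinction a single aggregate number hides.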
[jira] [Commented] (YARN-1024) Define a virtual core unambigiously
[ https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13729895#comment-13729895 ] Jason Lowe commented on YARN-1024: --
Agree that the example posed by [~sandyr] shows that a single unit in the request cannot properly convey the ask. Chatted briefly about this offline with [~revans2] and [~nroberts], and we think that in general there needs to be a way to show the parallelism needed along with some performance guarantee for those threads. That basically leads us to a path where, in the generalized case, we're asking for a list of vcore units, where the number of entries in the list represents the desired hardware parallelism and the value of each entry represents the performance needed for that execution thread.
Using this with Sandy's example, asking for a single unit of 250 YVCs means the container would not be allocated on a node with three cores each rated at 150 YVCs, because none of the cores meets the single-threaded performance needed by the container. If another job came along and asked for three cores each at 100 YVCs, that could still run on a node that only has a single core rated at 500 YVCs, because that core likely has enough horsepower to multitask the three threads and give each the required performance.
I understand where [~ste...@apache.org] is coming from re: the dangers of developing one unit to rule them all, but I also think there needs to be *some* way to convey performance requirements. Sandy's example shows that just because a job ran fine with one core on some box doesn't mean the job is going to run fine with one core on another. We will not be able to develop a metric that covers all the hardware architecture differences, but if a metric works in the vast majority of cases then I think that's a net win over no metric.
The APIs are already set for 2.1, and I believe the common case will be jobs where a single thread dominates the overall CPU request of the container. In that sense, we can map the existing API call to a single vcore ask and add another API where the ask can be a list/array of vcore asks. This could get complicated in the scheduler for an architecture where the effective vcore rating of the processors is not homogeneous (it brings up the spectre of processor-pinning and per-processor scheduling), but I don't think this will be a common architecture.
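The list-of-asks idea above can be sketched as a placement check. This is not a YARN API: the function `fits` is hypothetical, and it assumes that one core may host several asks as long as their sum stays within its rating. Because that makes the problem a form of bin packing, the sketch uses a first-fit-decreasing heuristic, which may reject some placements that an exhaustive search would find.

```python
def fits(ask, core_ratings):
    """Return True if every per-thread ask can be placed on some core.

    ask          -- list of YVC values, one entry per execution thread
    core_ratings -- list of YVC ratings, one entry per core on the node
    """
    free = sorted(core_ratings, reverse=True)
    # Place the most demanding threads first (first-fit decreasing).
    for need in sorted(ask, reverse=True):
        for i, capacity in enumerate(free):
            if need <= capacity:
                free[i] = capacity - need
                break
        else:
            # No single core has enough headroom left for this thread.
            return False
    return True


# One thread needing 250 YVCs does not fit on three 150-YVC cores.
print(fits([250], [150, 150, 150]))
# Three threads at 100 YVCs each can share a single 500-YVC core.
print(fits([100, 100, 100], [500]))
```

Both results match the two cases described in the comment: the first request is refused because no individual core is fast enough, and the second is accepted because one fast core has room for all three threads.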
[jira] [Commented] (YARN-1024) Define a virtual core unambigiously
[ https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13730074#comment-13730074 ] Arun C Murthy commented on YARN-1024: -
bq. If I am used to running my single-threaded task on a fast core (let's say rated at 250 YVCs), and then I migrate it to another cluster with slower cores (let's say rated at 150 YVCs), and still request 250 YVCs, my task will run no faster than if I had requested it with 150 YVCs.
[~sandyr] That is why you'd set a max-vcores in CS/FS of 150. This prevents users from falling into that trap. So, that should solve it - correct?
[jira] [Commented] (YARN-1024) Define a virtual core unambigiously
[ https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13730077#comment-13730077 ] Arun C Murthy commented on YARN-1024: -
[~jlowe] Yep, it does make sense to talk about a more explicit 'vector of cores' model as we've discussed in the past - that said, I agree it's too early.
[jira] [Commented] (YARN-1024) Define a virtual core unambigiously
[ https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13730080#comment-13730080 ] Arun C Murthy commented on YARN-1024: -
Overall, yes, there are certainly issues with a strict definition of a vcore etc., but we need to do *just enough* for now - not solve all possible permutations. Basic requirements are simplicity, predictability and consistency - in that order.
[jira] [Commented] (YARN-1024) Define a virtual core unambigiously
[ https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13730195#comment-13730195 ] Sandy Ryza commented on YARN-1024: --
Jason, Steve, and Arun, you bring up good points that I think have helped me understand some of my assumptions. I agree that simplicity, predictability, and consistency are our most important requirements. I agree with Jason that at least two values - processing power per core and # of cores - are required to fully express a request, and that, in spite of this, we should not use both: a single value is better than nothing.
We have a tradeoff between
* A definition that offers some predictability between clusters, but only makes sense for requests for a single physical core or less per container.
* A definition that offers predictability only on homogeneous hardware, but that functions sensibly for requests for both more and less than a single physical core.
I thought that one of the exciting things about allowing requests for CPU would be that YARN would be able to better accommodate multi-threaded CPU-intensive frameworks like MPI and Storm. Predictability between clusters seems to matter a lot less to me. A ton of other factors interfere with this kind of predictability. The speed that hardware permits a task to read from disk or over the network has can have just as large an impact on the processing power it consumes as whatever the task is doing. I don't believe that we will be able to attain predictability to the degree that it will provide much value.
[jira] [Commented] (YARN-1024) Define a virtual core unambigiously
[ https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13730197#comment-13730197 ] Sandy Ryza commented on YARN-1024: --
bq. The speed that hardware permits a task to read from disk or over the network has can have just as large an impact on the processing power it consumes as whatever the task is doing.
Meant: The speed that hardware permits a task to read from disk or over the network can have just as large an impact on the processing power it consumes as whatever the task is doing.
[jira] [Commented] (YARN-1024) Define a virtual core unambigiously
[ https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13730301#comment-13730301 ] Eli Collins commented on YARN-1024: ---
I agree we need to define the meaning of a virtual core unambiguously (otherwise we won't be able to support two different frameworks on the same cluster that may have differing ideas of what a vcore is). I also agree with Phil that there are essentially two high-level use cases:
1. Jobs that want to express how much CPU capacity the job needs. Real world example - a distcp job wants to express that it needs 100 containers but only a fraction of a CPU for each, since it will spend most of its time blocking on IO.
2. Services - ie long-lived frameworks (ie ones that support 2-level scheduling) - that want to request cores on many machines on a cluster and want to express CPU-level parallelism and aggregate demand (because they will schedule fine-grain requests w/in their long-lived containers). Eg a framework should be able to ask for two containers on a host, each with one core, so it can get two containers that can execute in parallel, each on a full core. This is assuming we plan to support long-running services in Yarn (YARN-896), which is hopefully not controversial. Real world example is HBase, which may want 2 guaranteed cores per host on a given set of hosts.
Seems like there are two high-level approaches:
1. Get rid of vcores. If we define 1vcore=1pcore (1vcore=1vcpu for virtual environments) and support fractional cores (YARN-972), then services can ask for 1 or more vcores knowing they're getting real cores, and jobs just ask for whatever fraction of a vcore they think they need. This is really abandoning the concept of a virtual core because it's actually expressing a physical requirement (like memory, we assume Yarn is not dramatically over-committing the host). We can handle heterogeneous CPUs via attributes (as discussed in other Yarn jiras), since most clusters in my experience don't have wildly different processors (eg 1 or 2 generations is common), and attributes are sufficient to express policies like "all my cores should have equal/comparable performance".
2. Keep going with vcores as a CPU unit of measurement. If we define 1vcore=1ECU (works 1:1 for virtual environments) then services (#1) need to understand the power of a core so they can ask for that many vcores - essentially they are just undoing the virtualization. YARN would need to make sure that two containers, each with 1 pcore's worth of vcores, do in fact give you two cores (just like hypervisors schedule vcpus for the same VM on different pcores to ensure parallelism), but there would be no guarantee that two containers on the same host each w/ one vcore would run in parallel. Jobs that want fractional cores would just express 1vcore per container and work their way up based on the experience of running on the cluster (or also undo the virtualization by calculating vcore/pcore if they know what fraction of a pcore they want). Heterogeneous CPUs do not fall out naturally (still need attributes), since there's no guarantee you can describe the difference between two CPUs as roughly 1 or more vcores (eg 2.4 - vs 2.0 Ghz 1ECU); however, there's no need for fractional vcores.
I think either is reasonable and can be made to work, though I think #1 is preferable because:
- Some frameworks want to express containers in physical resources (this is consistent with how YARN handles memory)
- You can support jobs that don't want a full core via fractional cores (or slightly over-committing cores)
- You can support heterogeneous cores via attributes (I want equivalent containers)
- vcores are optional anyway (only used in DRF) and therefore only need to be expressed if you care about physical cores, because you need to reserve them or say you want a fraction of one
Either way I think vcore is the wrong name, because in #1 1vcore=1pcore so there's no virtualization, and in #2 1 vcore is not a virtualization of a core (10 vcores does not give me 10 levels of parallelism), it's _just a unit_ (like an ECU).
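The difference between the two approaches shows up in how a request is written. The sketch below assumes a cluster whose cores are each rated at 2.5 ECU; that rating and the helper `vcores_for_pcores` are invented for illustration and come from neither YARN nor EC2.

```python
import math


def vcores_for_pcores(pcores_wanted, ecus_per_pcore):
    """Approach #2: 1 vcore = 1 ECU, so the caller undoes the virtualization.

    There are no fractional vcores in this approach, so the result is
    rounded up to a whole number of units.
    """
    return math.ceil(pcores_wanted * ecus_per_pcore)


ECUS_PER_PCORE = 2.5  # hypothetical rating of one physical core

# Approach #1: 1 vcore = 1 pcore, fractional requests allowed.
hbase_request_v1 = 2      # two real cores per host
distcp_request_v1 = 0.25  # a quarter of one core per container

# Approach #2: the same intent has to be converted into units first.
hbase_request_v2 = vcores_for_pcores(2, ECUS_PER_PCORE)
distcp_request_v2 = vcores_for_pcores(0.25, ECUS_PER_PCORE)
print(hbase_request_v2, distcp_request_v2)
```

Under #1 the request reads directly as a number of cores, while under #2 the same request changes whenever the per-core rating changes, and the resulting number no longer says how many threads can run in parallel.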
[jira] [Commented] (YARN-1024) Define a virtual core unambigiously
[ https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13728952#comment-13728952 ] Arun C Murthy commented on YARN-1024: -
We need to push on YARN-160 and normalize to a YARN Virtual Core (YVC) or the ECU itself.