>
> Question is when I am modifying job details, in particular, scaling up
> instances based on demand, do I use the startJobUpdate or the addInstances
> API?

If all you need is to scale out (i.e. add more instances of the existing
config), then 'addInstances' is the easier and faster API to work with.
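
For illustration only, a rough Python sketch of that call against a thrift
client generated from api.thrift might look like the following. The client
setup, module path, job key values and the InstanceKey-based signature (the
new one referenced in AURORA-1258) are my assumptions; check the comments in
api.thrift for the exact signature your version exposes.

  # Rough sketch only. Assumes 'client' is a generated thrift client already
  # connected to the scheduler's /api endpoint, and assumes the new
  # InstanceKey-based addInstances signature.
  from gen.apache.aurora.api.ttypes import InstanceKey, JobKey, ResponseCode

  key = InstanceKey(
      jobKey=JobKey(role='www-data', environment='prod', name='hello'),
      instanceId=0)                     # existing instance whose config is cloned
  resp = client.addInstances(key, 2)    # scale out by 2 instances
  assert resp.responseCode == ResponseCode.OK, resp.details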

The 'startJobUpdate' can still be used to add/remove instances, but its
main goal is to perform a rolling update of existing instances. So, the
guideline may be:
- addInstances: Adds instances of the existing config. Use for a quick
scale out of an *existing* job, e.g. in an autoscaler solution.
- killTasks: Kills existing instances. Use to reduce the number of running
instances or to kill the job entirely.
- startJobUpdate: Performs a rolling update of the existing job. This is a
more involved API that requires the full job configuration. It's capable of
updating existing services in a safe manner by performing health evaluation
during the rollout. It's also capable of rolling back a failed update
automatically. You can also add instances of the *new* config during the
update as well as remove instances of the *old* config if needed.

Only killTasks and startJobUpdate are potentially destructive (capable of
killing instances); addInstances never kills anything.
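
As another purely illustrative sketch (same assumed 'client' and job key as
above, and the TaskQuery-based signature discussed further down in this
thread), killing just instance 5 of that job via killTasks could look
roughly like this:

  # Rough sketch only; struct and field names are assumptions on my part.
  from gen.apache.aurora.api.ttypes import JobKey, TaskQuery

  query = TaskQuery(
      jobKeys={JobKey(role='www-data', environment='prod', name='hello')},
      instanceIds={5})                  # only instance 5; omit to kill them all
  resp = client.killTasks(query, None)  # Lock is deprecated, pass NULL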

> And if I reduce the instances (for eg, from 6 to 5), will the API
> (addInstances or startJobUpdate) also kill the last instance of the job?

The startJobUpdate ensures the job instance count matches the one specified
in the .aurora config. So, if you reduced the count from 6 to 5 (via
killTasks) AND your .aurora config still has 'instances=6', your job will be
restored back to the full count. Think of startJobUpdate as the ultimate
source of truth: no matter what the current job state is (w.r.t. instance
counts or task configs), it will attempt to bring everything to the common
denominator in terms of task_config and instance_count.
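
To make that concrete, a minimal (made-up) .aurora snippet could look like
the one below; the cluster/role/task names are illustrative, but the
'instances' value is what startJobUpdate will reconcile the running job to:

  # hello.aurora (illustrative). 'hello_task' is assumed to be a Task
  # defined earlier in the real config file.
  jobs = [
    Service(
      cluster='devcluster',
      role='www-data',
      environment='prod',
      name='hello',
      task=hello_task,
      instances=6,   # an update now brings the live job back to 6 instances
    )
  ]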

On Wed, Mar 16, 2016 at 11:36 AM, Krish <[email protected]> wrote:

> Thanks, Maxim & Bill!
>
> I would love some more clarifications to the below observations.
>
> A little googling helped me find
> https://issues.apache.org/jira/browse/AURORA-1258, which then led me to
> http://markmail.org/message/al26gmpwlcns3oer#query:+page:1+mid:2smaej5n5e54li3g+state:results
> .
>
> Question is when I am modifying job details, in particular, scaling up
> instances based on demand, do I use the startJobUpdate or the addInstances
> API?
> Seems like addInstances is supposed to do this, but you mention that
> startJobUpdate is also supposed to be the "main API to change your
> service job in any way (including adding, removing or modifying instances)."
>
> Also, if both are valid, under what scenarios would one use startJobUpdate?
> Which one will be non-destructive? As in, which API does not kill current
> instances while adding new ones?
>
> And if I reduce the instances (for eg, from 6 to 5), will the API
> (addInstances or startJobUpdate) also kill the last instance of the job?
>
>
> --
> κρισhναν
>
> On Wed, Mar 16, 2016 at 10:30 PM, Bill Farner <[email protected]> wrote:
>
>> Regarding documentation - Maxim is correct that there isn't much in the
>> way of independent/holistic docs for the thrift API.  There is, however,
>> scant javadoc-style documentation within the IDL spec itself:
>> https://github.com/apache/aurora/blob/master/api/src/main/thrift/org/apache/aurora/gen/api.thrift
>>
>> If you are looking to use the thrift API directly, the most difficult API
>> method will be defining the ExecutorConfig.data value when calling
>> createJob.  Please don't hesitate to ask for assistance if you get to that
>> point!
>>
>> On Wed, Mar 16, 2016 at 9:19 AM, Maxim Khutornenko <[email protected]>
>> wrote:
>>
>>>> 1. All APIs require thrift inputs of the structs specified, and return
>>>> thrift values only in Response.result field.
>>>
>>> Correct. There is also a 'details' field that may carry additional messages
>>> (of an error or informational nature).
>>>
>>>> 2. Is there a set of examples in the documentation to help understand
>>>> Thrift API better?
>>>
>>> The thrift API is largely undocumented. There is an effort to bring up a
>>> fully supported REST API that will presumably get documented and become
>>> much easier to use. It's mostly in flux now.
>>>
>>>> 3. createJob(JobDescription desc, Lock lock):
>>>
>>> This is the API to use when you want a brand new service or ad-hoc (batch)
>>> job created. The JobDescription is populated from the .aurora config. You may
>>> want to trace "aurora job create" client command implementation to see how
>>> it happens.
>>>
>>>> 4. What is the Lock object? I see that some APIs require locking and
>>>> some don't. For example, createJob needs a Lock object as parameter, & I am
>>>> assuming that it is required so that one does not create multiple jobs with
>>>> the same JobKey.
>>>
>>> Ignore this object as it's an echo of the old client updater. It's now
>>> deprecated and will be removed soon. You can pass NULL for now.
>>>
>>>> 5. addInstances(AddInstancesConfig cfg, Lock lock):
>>>
>>> Another echo of the client updater, but this time it's got a second life.
>>> Check out its new signature and comments in the api.thrift. It's
>>> essentially a "scale-out" API that can add instances to the existing job
>>> without changing the underlying task assumptions.
>>>
>>>> 6. getPendingResult(TaskQuery taskquery):
>>>
>>> It's actually 'getPendingReason' and is currently used exclusively by
>>> the UI to get the reason for a task's PENDING state.
>>>
>>>> 7. setQuota & getQuota for setting user level quotas.
>>>
>>> This is to set role-level quota. Currently only required for tasks with
>>> 'production=True'. Search through our docs for more details.
>>>
>>>> 8. killTasks to kill all running instances of a job in the cluster.
>>>
>>> It's quite versatile and can be used to kill some or all instances of
>>> the job.
>>>
>>>> 9. startJobUpdate(JobUpdateRequest request, string message):
>>>
>>> Your observations are correct. This is the main API to change your
>>> service job in any way (including adding, removing or modifying instances).
>>>
>>>> An aurora scheduling question is if I start a job with 5 instances, and
>>>> there are resources available to run only 4 of them, does the entire job
>>>> block, or only the 5th instance of the job blocks?
>>>
>>> The scheduler will try to schedule as many instances as it can. Those
>>> that cannot find resources will remain in PENDING state until more
>>> resources are available. In your particular example, only the 5th will
>>> remain PENDING.
>>>
>>>
>>> On Wed, Mar 16, 2016 at 5:54 AM, Krish <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>> I was going through the Aurora Thrift API to determine how to add new
>>>> jobs.
>>>> I am using aurora v0.12 released last month and have upgraded to mesos
>>>> v0.25 accordingly.
>>>>
>>>> Below is a summary of my (very limited) understanding of some APIs, &
>>>> it would help if someone could point out flaws in my understanding:
>>>>
>>>>    1. All APIs require thrift inputs of the structs specified, and
>>>>    return thrift values only in Response.result field.
>>>>
>>>>    2. Is there a set of examples in the documentation to help
>>>>    understand Thrift API better?
>>>>
>>>>    3. createJob(JobDescription desc, Lock lock):
>>>>    This is basically the API to replace the Aurora DSL/.aurora files
>>>>    for job configuration.
>>>>
>>>>    4. What is the Lock object? I see that some APIs require locking
>>>>    and some don't. For example, createJob needs a Lock object as
>>>>    parameter, & I am assuming that it is required so that one does not
>>>>    create multiple jobs with the same JobKey.
>>>>
>>>>    5. addInstances(AddInstancesConfig cfg, Lock lock):
>>>>    By the naming convention, it seems this is used to increase the
>>>>    number of instances of a job. It will not result in stopping of current
>>>>    instances of the job.
>>>>
>>>>    My second explanation for this API: Since it uses a set of
>>>>    instanceIds, this is used for adding an already running job on slaves
>>>>    to the internal data structures of Aurora to track the job.
>>>>
>>>>    6. getPendingResult(TaskQuery taskquery):
>>>>    Return the reason (in string) about why the job is PENDING. For
>>>>    example: insufficient CPU.
>>>>
>>>>    7. setQuota & getQuota for setting user level quotas.
>>>>
>>>>    8. killTasks to kill all running instances of a job in the cluster.
>>>>
>>>>    9. startJobUpdate(JobUpdateRequest request, string message):
>>>>    Used for updating jobs with the new TaskConfig specified. Can be
>>>>    used if resource requirements change. For example: If I wanted aurora
>>>>    to update the version of the container used for a job using the
>>>>    TaskConfig.Container attribute.
>>>>
>>>>
>>>> An aurora scheduling question is if I start a job with 5 instances, and
>>>> there are resources available to run only 4 of them, does the entire job
>>>> block, or only the 5th instance of the job blocks?
>>>>
>>>> Thanks!
>>>>
>>>> --
>>>> κρισhναν
>>>>
>>>
>>>
>>
>
