TL;DR:
Use only a file named .dockercfg for docker credentials in mesos
tasks!
Long story:
---------------
Holy smokescreens!
This is for reporting & documenting purposes only, so that others don't
have to pull their hair out like I did for the past few evenings!
A little background:
I am running Ubuntu 14.04 on my system, where docker stores its credentials
in ~/.docker/config.json as:
cat ~/.docker/config.json
{
  "auths": {
    "repo.example.com:5000": {
      "auth": "<snip>",
      "email": "<snip>"
    }
  }
}
And I am doing all these experiments on a CoreOS system, which stores the
credentials in ~/.dockercfg as:
core@aurora-1 ~ $ cat ~/.dockercfg
{
  "repo.example.com:5000": {
    "auth": "<snip>",
    "email": "<snip>"
  }
}
Since my container was an Ubuntu 14.04 container (as was my local system),
I used the Ubuntu credential file format. As a result, the slave task could
not read the docker credentials, because I had stored them as
~/.docker/config.json.
After digging through the aurora, mesos, and thermos source code (a lot of
find, grep and regex matching), I saw this in mesos/src/docker/docker.cpp:
// Set HOME variable to pick up *.dockercfg*.
map<string, string> environment = os::environment();

environment["HOME"] = directory;
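
As a rough illustration of the lines above: the environment is copied and
HOME is pointed at a directory, so a docker client that uses the legacy
layout would then look for $HOME/.dockercfg in that directory. The sketch
below mimics this in Python rather than the C++ that mesos uses. The
function names and the /tmp/sandbox path are made up for the example, and
the lookup path is an assumption about the legacy docker client, not
something taken from the mesos source.

```python
import os
import posixpath


def docker_pull_environment(directory, base_env=None):
    """Copy the environment and point HOME at `directory`.

    Mirrors the intent of the mesos snippet; this is not the mesos code.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["HOME"] = directory
    return env


def expected_credentials_path(env):
    """Where a legacy docker client would look for credentials (assumption)."""
    return posixpath.join(env["HOME"], ".dockercfg")


# Hypothetical sandbox path, used only for illustration.
env = docker_pull_environment("/tmp/sandbox", {"HOME": "/root", "PATH": "/usr/bin"})
print(expected_credentials_path(env))
```

If the file copied next to the task has any other name, a lookup like this
would miss it, which matches what I was seeing.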
I changed the filename and the JSON content, changed the
thermos_executor_resources flag accordingly, and bam, docker pull works!
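
For anyone repeating this, here is a minimal sketch of the content change,
assuming the only difference between the two formats is the outer "auths"
wrapper seen in the two files shown earlier. The function name to_dockercfg
is made up for the example, and the registry entry is the same placeholder
as above.

```python
import json


def to_dockercfg(config_json_text):
    """Return the .dockercfg-style mapping for a config.json-style document.

    Assumption: the newer format only adds an outer "auths" key. If that key
    is absent, the input is treated as already being in the old format.
    """
    config = json.loads(config_json_text)
    return config.get("auths", config)


# Placeholder credentials, copied from the example files above.
example = """
{
  "auths": {
    "repo.example.com:5000": {
      "auth": "<snip>",
      "email": "<snip>"
    }
  }
}
"""

converted = to_dockercfg(example)
# This is the text that would be saved under the name .dockercfg.
print(json.dumps(converted, indent=2))
```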
Well, the mesos documentation does say "To run an image from a private
repository, one can include the URI pointing to a .dockercfg that contains
login information." and I must have read it a dozen times!
But I never thought that they literally meant '.dockercfg' as the name of
the file!
--
κρισhναν
On Thu, Mar 3, 2016 at 1:45 PM, Krish <[email protected]> wrote:
>
> I have got the docker config file copied into the sandbox using the
> thermos_executor_resources flag; however docker is still not able to find
> the credentials file for doing an appropriate pull of image from a private
> repo.
>
> When I try to use the library/hello-world:latest image from public docker
> repo to check if everything works fine without the credentials, I encounter
> a different problem:
> exec: "/bin/sh": stat /bin/sh: no such file or directory
> Error response from daemon: Cannot start container
> de93dc344d44b41bccccff49e508001a97ff23a8964e637d32a506a31fd4d946: [8]
> System error: exec: "/bin/sh": stat /bin/sh: no such file or directory
>
> I was referring to this email for guidance on setting up a mesos slave:
> http://mail-archives.apache.org/mod_mbox/aurora-dev/201503.mbox/%3CCAKB1MkHR=+nonesllx9buwttdthsku46pw_wr4b+_z9p59+...@mail.gmail.com%3E
>
> So, I cannot get the credentials file to be used by docker, and if I
> bypass authentication, I can do a docker pull, but encounter a weird error
> in launching the hello-world image.
>
> Am I missing out on checking any log files generated? I currently refer to
> mesos-slave stdout and the sandbox stderr file.
> Any configuration parameter I am missing for this to happen?
>
> Any pointers will be really helpful. Thanks in advance.
>
>
>
> --
> κρισhναν
>
> On Sun, Feb 28, 2016 at 3:37 PM, Krish <[email protected]> wrote:
>
>> Continuing my earlier chain of thought, I found this in the mesos bug
>> list:
>> MESOS-4242 - Allow Docker private registry credentials to be passed from
>> framework.
>> How does one pass credentials using the framework? As it seems the
>> .docker/config.json is not read from the slave.
>>
>>
>>
>>
>> --
>> κρισhναν
>>
>> On Sat, Feb 27, 2016 at 11:46 PM, Krish <[email protected]>
>> wrote:
>>
>>> I couldn't complete my PoC project earlier (got busy with other
>>> work). Well, it is never too late and here's my update and issue.
>>>
>>> I have a 3 node zk (3.5.1 alpha), mesos-master (v0.24.1) & aurora
>>> (v0.11.0) running.
>>> I was stuck in a problem where I was using mesos 0.25.0 & aurora 0.9.0 &
>>> got a protobuf field not set error - ExecutorInfo field.
>>>
>>> I have a mesos agent running in docker container on coreos and it can
>>> access the host docker just fine.
>>> I have also put the docker login credentials file at the right location
>>> for it to access the private docker registry.
>>> I can manually trigger a docker pull and docker run without issues from
>>> the slave (which is also reflected properly outside the slave container
>>> with docker images and docker ps).
>>>
>>> However, when I try to run an aurora job with hello-docker container,
>>> the slave prints out the log that docker pull has failed; more specifically:
>>> " failed to start: Failed to 'docker pull
>>> private_repo.com:5000/krish/test:latest': exit status = exited with
>>> status 1 stderr = Error: image krish/test:latest not found"
>>>
>>> My hunch is that when using docker run from aurora DSL, it does not read
>>> the docker credentials file properly and hence fails. I can reproduce the
>>> exact same error when I delete the credentials file from the slave and
>>> trigger a pull.
>>>
>>> Is the hunch right? If yes, is there a way to resolve this? Maybe source
>>> it some way before the run command?
>>>
>>>
>>>
>>> --
>>> κρισhναν
>>>
>>> On Tue, Oct 27, 2015 at 10:35 PM, Bill Farner <[email protected]>
>>> wrote:
>>>
>>>> (1) clusters.json is written by you, configuring the CLI client with
>>>> instructions for what clusters are available and how to discover them.
>>>>
>>>> (2) That's expected - mesos only allows one active replica of a
>>>> framework at a time, this signals which one is active.
>>>>
>>>> (3) The observer is essentially a web server that allows you to browse
>>>> a task's sandbox directory and other information about it. You will need
>>>> to configure it to run on your worker/agent nodes for that functionality to
>>>> work (it's linked from the scheduler web UI).
>>>>
>>>> (4) You could indeed implement that behavior externally. There is a
>>>> reason:
>>>> https://github.com/apache/aurora/blob/master/api/src/main/thrift/org/apache/aurora/gen/api.thrift#L556-L559
>>>>
>>>> (5) That is correct. The scheduler exposes a thrift API that you would
>>>> use (a REST API is coming, but ground has not yet been broken). If you go
>>>> this route, I suggest you skip the DSL and use the JSON task description
>>>> format that is shipped over the API. There's no good documentation on
>>>> this, but we can help you through it and would be grateful for a writeup of
>>>> your approach!
>>>>
>>>>
>>>> On Mon, Oct 26, 2015 at 11:44 PM, Krish <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Folks,
>>>>> Firstly, thanks for all the help. Am happy to report that I have set
>>>>> up zk, mesos & aurora, & can work further towards my idea of having an
>>>>> auto-scaling cluster.
>>>>> I have some further questions about the work done so far & things I
>>>>> plan to do:
>>>>>
>>>>> 1. Is the /etc/aurora/clusters.json file created by the scheduler
>>>>> or does it need to be handcrafted? I had to manually edit the file to
>>>>> get my `aurora job ...` cli to work.
>>>>>
>>>>> 2. I am running a cluster of 3 coreOS VMs on vagrant with zk,
>>>>> mesos & aurora in a docker container. Only 1 of them outputs '1' when I
>>>>> look at the 'framework_registered' field. Is this expected? How do I
>>>>> verify that they are working as a cluster?
>>>>>
>>>>> 3. From the documentation, I see that there is an observer that
>>>>> needs to be listening on port 1338. What is the observer socket & its
>>>>> purpose? I have aurora listening only on ports 8081 (http port) & 8083
>>>>> (libprocess).
>>>>>
>>>>> 4. I read about the 'PENDING' field in aurora documentation, as
>>>>> Bill suggested, & realize that it just shows that a task is waiting for
>>>>> some reason (for want of resources, in my case, as 0 slaves have
>>>>> registered). I was thinking of adding a hook to the pending state; say
>>>>> if a task is PENDING for 5 minutes for lack of resources in the
>>>>> cluster, then spin up a new machine. Is this the right approach to
>>>>> take? Does aurora provide reasons for why a task is in PENDING state?
>>>>>
>>>>> => aurora job status testcluster/$USER/test/hello_world
>>>>> INFO] Checking status of testcluster/ubuntu/test/hello_world
>>>>> Active tasks (1):
>>>>> Task role: ubuntu, env: test, name: hello_world, instance:
>>>>> 0, status:
>>>>> PENDING on None
>>>>> cpus: 0.1, ram: 16 MB, disk: 16 MB
>>>>> events:
>>>>> 2015-10-23 04:55:33 PENDING: None
>>>>> Inactive tasks (0):
>>>>>
>>>>> 5. Aurora defines jobs in a .aurora config file & if I decide to
>>>>> increase/decrease the number of instances in my cluster, then I need to
>>>>> create/overwrite the relevant .aurora file and trigger the `aurora
>>>>> update ...` command. Is this right?
>>>>> If yes, is there an HTTP API I can invoke remotely which triggers
>>>>> this update?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> κρισhναν
>>>>>
>>>>> On Fri, Oct 23, 2015 at 8:09 AM, Joshua Cohen <[email protected]
>>>>> > wrote:
>>>>>
>>>>>> I suspect your error from `aurora job create ...` is due to the
>>>>>> aurora config you're using referencing `/vagrant/hello_world.py` which
>>>>>> does
>>>>>> not exist (as you say: you're not even using Vagrant). Can you link the
>>>>>> .aurora config you're using?
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Joshua
>>>>>>
>>>>>> On Thu, Oct 22, 2015 at 3:22 PM, Krish <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks, Zameer.
>>>>>>>
>>>>>>> I had to modify /etc/aurora/clusters.json:
>>>>>>> [
>>>>>>> {
>>>>>>> "auth_mechanism": "UNAUTHENTICATED",
>>>>>>> "name": "testcluster",
>>>>>>> "scheduler_zk_path": "/scheduler/aurora",
>>>>>>> "slave_root": "/var/lib/mesos",
>>>>>>> "slave_run_directory": "latest",
>>>>>>> "zk": "127.0.1.1"
>>>>>>> }
>>>>>>> ]
>>>>>>>
>>>>>>> I have a hello_world.aurora in my home folder. However the following
>>>>>>> command errors out:
>>>>>>> ~$ aurora job create testcluster/testrole/test/hellojob
>>>>>>> ./hello_world.aurora
>>>>>>> Error loading configuration: [Errno 2] No such file or directory:
>>>>>>> '/vagrant/hello_world.py'
>>>>>>>
>>>>>>> A job list does work:
>>>>>>> ~$ aurora job list testcluster
>>>>>>> INFO] Retrieving jobs for role None
>>>>>>>
>>>>>>> I am not even using the vagrant. I am using zk & mesos on the same
>>>>>>> machine as aurora. How do I submit these job templates to aurora?
>>>>>>>
>>>>>>> Any pointers to documentation will be helpful.
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> κρισhναν
>>>>>>>
>>>>>>> On Wed, Oct 21, 2015 at 11:09 PM, Zameer Manji <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Version 0.9.0 does not have the gc executor. Version 0.9.0 uses
>>>>>>>> Mesos' task reconciliation
>>>>>>>> <http://mesos.apache.org/documentation/latest/reconciliation/> API
>>>>>>>> instead.
>>>>>>>>
>>>>>>>> On Wed, Oct 21, 2015 at 9:28 AM, Krish <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks Bill for the location to the debs. I was finally able to
>>>>>>>>> run aurora. :)
>>>>>>>>>
>>>>>>>>> I did find thermos_executor.pex & thermos_observer after
>>>>>>>>> installing aurora-executor. I still could not find gc_executor.pex on
>>>>>>>>> my
>>>>>>>>> system.
>>>>>>>>> Is there a location from where I can download the binaries for
>>>>>>>>> *.pex or build them from scratch?
>>>>>>>>>
>>>>>>>>> root@dev:/# find . -name "*.pex"
>>>>>>>>> ./usr/share/aurora/bin/thermos_executor.pex
>>>>>>>>> ./usr/share/aurora/bin/kaurora_admin.pex
>>>>>>>>> ./usr/share/aurora/bin/kaurora.pex
>>>>>>>>> ./usr/share/aurora/bin/thermos.pex
>>>>>>>>> ./usr/share/aurora/bin/thermos_observer.pex
>>>>>>>>> ./home/ubuntu/.pex
>>>>>>>>> ./root/.pex
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> κρισhναν
>>>>>>>>>
>>>>>>>>> On Tue, Oct 20, 2015 at 11:46 PM, Bill Farner <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Aurora currently requires an executor, so setting it to /dev/null
>>>>>>>>>> will not work. Happy to talk further about your thoughts around
>>>>>>>>>> sidestepping the executor.
>>>>>>>>>>
>>>>>>>>>> As for working with the scheduler source code, it's a standard
>>>>>>>>>> gradle project and we tend to use intellij. Docs to help ramp on
>>>>>>>>>> that:
>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/developing-aurora-scheduler.md
>>>>>>>>>>
>>>>>>>>>> As for builds - the .zip is a source distribution, so it won't
>>>>>>>>>> have any pre-built binaries. If you're on debian, we have official
>>>>>>>>>> debs
>>>>>>>>>> here: https://bintray.com/apache/aurora
>>>>>>>>>> You can see how they're built here (and can build your own
>>>>>>>>>> packages): https://github.com/apache/aurora-packaging
>>>>>>>>>> We're close to having official RPMs, but none to speak of yet.
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 20, 2015 at 9:47 AM, Krish <[email protected]
>>>>>>>>>> > wrote:
>>>>>>>>>>
>>>>>>>>>>> Stephen,
>>>>>>>>>>> I am trying to get started and run aurora without thermos
>>>>>>>>>>> executor (setting it to /dev/null does not help) - on a local linux
>>>>>>>>>>> box for
>>>>>>>>>>> now & planning to containerize/dockerize it later.
>>>>>>>>>>>
>>>>>>>>>>> Can you please point me to the right documentation (or a pointer
>>>>>>>>>>> to the cli parsing source code) which can help me resolve this?
>>>>>>>>>>> Also, are there any steps to import the source code into eclipse to
>>>>>>>>>>> browse & analyze code for this?
>>>>>>>>>>>
>>>>>>>>>>> Also, where do i find all the *.pex files? They are not present
>>>>>>>>>>> in the zip file nor anywhere in the built source code.
>>>>>>>>>>>
>>>>>>>>>>> I know I am asking too many queries on a single thread here, &
>>>>>>>>>>> would appreciate the help.
>>>>>>>>>>> I think at the end of this, I will put the steps I followed in a
>>>>>>>>>>> gist/blog so others might find their way around, & not struggle as
>>>>>>>>>>> much.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> κρισhναν
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 20, 2015 at 4:09 PM, Erb, Stephan <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Krish,
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> you don't have to set framework_authentication_file and
>>>>>>>>>>>> zk_digest_credentials. The scheduler help text is misleading here
>>>>>>>>>>>> as
>>>>>>>>>>>> everything will work fine if you leave those empty.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> In addition, it looks like you are misunderstanding the usage of
>>>>>>>>>>>> the thermos_executor_path command line flag of the scheduler.
>>>>>>>>>>>> It is supposed to point to the binary containing the generic
>>>>>>>>>>>> Aurora executor (thermos_executor.pex). You only need the
>>>>>>>>>>>> hello_world.aurora once your scheduler is up and running. It
>>>>>>>>>>>> serves as an example input for the aurora command line client,
>>>>>>>>>>>> which can be used to schedule jobs and services on an Aurora
>>>>>>>>>>>> master.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Have you tried to use the vagrant box? Just type `vagrant up` in
>>>>>>>>>>>> a checkout of the Aurora source code. It gives you a running
>>>>>>>>>>>> scheduler to play with. Once you have understood how it works,
>>>>>>>>>>>> you can start trying to install it on your own (by
>>>>>>>>>>>> reverse-engineering the vagrant box).
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Hope this helps a little,
>>>>>>>>>>>>
>>>>>>>>>>>> Stephan
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>> *From:* Krish <[email protected]>
>>>>>>>>>>>> *Sent:* Tuesday, October 20, 2015 11:39 AM
>>>>>>>>>>>> *To:* Bill Farner
>>>>>>>>>>>> *Cc:* [email protected]; Erb, Stephan
>>>>>>>>>>>>
>>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>>
>>>>>>>>>>>> Bill/Stephen,
>>>>>>>>>>>> I still get a stacktrace when running the aurora scheduler CLI.
>>>>>>>>>>>>
>>>>>>>>>>>> I do not know what to specify for
>>>>>>>>>>>> -framework_authentication_file & -zk_digest_credentials, and they
>>>>>>>>>>>> are
>>>>>>>>>>>> required arguments.
>>>>>>>>>>>>
>>>>>>>>>>>> I am not using any authentication on Mesos master, do I still
>>>>>>>>>>>> need the framework_authentication_file parameter?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> rm -rf /db /backup_dir
>>>>>>>>>>>> mesos-log initialize --path="/db"
>>>>>>>>>>>> export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
>>>>>>>>>>>> JAVA_OPTS="-Xmx1536m -Xms256m"
>>>>>>>>>>>> /usr/local/aurora-scheduler/bin/aurora-scheduler
>>>>>>>>>>>> -backup_dir=/backup_dir
>>>>>>>>>>>> -cluster_name=tc
>>>>>>>>>>>> -mesos_master_address=zk://localhost:2181/mesos/master
>>>>>>>>>>>> -serverset_path=/scheduler/aurora -zk_endpoints=localhost:2181
>>>>>>>>>>>> -native_log_quorum_size=1 -vlog=SEVERE -logtostderr=false
>>>>>>>>>>>> -native_log_file_path=/db
>>>>>>>>>>>> -thermos_executor_path=/home/ubuntu/hello_world.aurora
>>>>>>>>>>>> ...
>>>>>>>>>>>> ...
>>>>>>>>>>>> INFO: Binding org.apache.aurora.scheduler.http.Utilization to
>>>>>>>>>>>> GuiceManagedComponentProvider with the scope "PerRequest"
>>>>>>>>>>>> Oct 20, 2015 9:27:40 AM
>>>>>>>>>>>> org.apache.aurora.scheduler.cron.quartz.CronModule provideTimeZone
>>>>>>>>>>>> WARNING: Cron schedules are configured to fire according to
>>>>>>>>>>>> timezone Greenwich Mean Time but system timezone is set to
>>>>>>>>>>>> Coordinated Universal Time
>>>>>>>>>>>> Oct 20, 2015 9:27:41 AM
>>>>>>>>>>>> org.eclipse.jetty.server.AbstractConnector doStart
>>>>>>>>>>>> INFO: Started [email protected]:43843
>>>>>>>>>>>> E1020 09:27:41.290 THREAD1
>>>>>>>>>>>> org.apache.aurora.scheduler.SchedulerLifecycle$9.execute: Caught
>>>>>>>>>>>> unchecked exception: com.google.inject.ProvisionException: Guice
>>>>>>>>>>>> provision errors:
>>>>>>>>>>>>
>>>>>>>>>>>> 1) Error in custom provider,
>>>>>>>>>>>> java.lang.IllegalArgumentException: Path cannot be null
>>>>>>>>>>>> at
>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>> at
>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>> while locating org.apache.mesos.Log
>>>>>>>>>>>> at
>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>>>>>>> while locating
>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>>>>
>>>>>>>>>>>> 1 error
>>>>>>>>>>>> com.google.inject.ProvisionException: Guice provision errors:
>>>>>>>>>>>>
>>>>>>>>>>>> 1) Error in custom provider,
>>>>>>>>>>>> java.lang.IllegalArgumentException: Path cannot be null
>>>>>>>>>>>> at
>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>> at
>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>> while locating org.apache.mesos.Log
>>>>>>>>>>>> at
>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>>>>>>> while locating
>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>>>>
>>>>>>>>>>>> 1 error
>>>>>>>>>>>> at
>>>>>>>>>>>> com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:987)
>>>>>>>>>>>> at
>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLog.open(MesosLog.java:136
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 20, 2015 at 6:14 AM, Bill Farner <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> The typical flow is that you keep your .aurora file checked
>>>>>>>>>>>>> into git, and commit every time you deploy/update. When you
>>>>>>>>>>>>> change your
>>>>>>>>>>>>> file, you will instruct Aurora to update the live job (have a
>>>>>>>>>>>>> look at aurora
>>>>>>>>>>>>> update -h). Aurora will perform a rolling upgrade of your
>>>>>>>>>>>>> job to the new config. You'll use this same flow for updating
>>>>>>>>>>>>> your job's
>>>>>>>>>>>>> software as well as resizing the job.
>>>>>>>>>>>>>
>>>>>>>>>>>>> For (3), you could set up alerting for stats that the
>>>>>>>>>>>>> scheduler exports. Have a look here for monitoring background:
>>>>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/monitoring.md
>>>>>>>>>>>>>
>>>>>>>>>>>>> You'll want to look at scheduler stats related to
>>>>>>>>>>>>> 'pending'.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Oct 19, 2015 at 12:16 PM, Krish <
>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for the pointer. Now I notice that the
>>>>>>>>>>>>>> aurora-scheduler script has the --thermos_executor_path as a
>>>>>>>>>>>>>> mandatory
>>>>>>>>>>>>>> requirement.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have a couple of questions on how the
>>>>>>>>>>>>>> thermos_executor/.aurora config file functions:
>>>>>>>>>>>>>> 1. Do we have to statically define the file beforehand?
>>>>>>>>>>>>>> 2. What happens when we want to dynamically change the
>>>>>>>>>>>>>> config, say increasing the number of instances of a service
>>>>>>>>>>>>>> required? Does
>>>>>>>>>>>>>> aurora require a reboot then?
>>>>>>>>>>>>>> 3. How do I get notified about the message mesos sends when
>>>>>>>>>>>>>> it cannot schedule tasks for lack of resources? Should I depend
>>>>>>>>>>>>>> on aurora
>>>>>>>>>>>>>> for this or try to look for a hook into mesos?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think a little bit of context would help here.
>>>>>>>>>>>>>> What I plan to check is to run a very basic job/task inside a
>>>>>>>>>>>>>> docker container with aurora & wait for a 'resource not
>>>>>>>>>>>>>> available' message
>>>>>>>>>>>>>> from mesos, and accordingly call an api to spin up a new node in
>>>>>>>>>>>>>> my cluster.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Oct 19, 2015 at 1:24 PM, Erb, Stephan <
>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I believe you are missing the thermos_executor options that
>>>>>>>>>>>>>>> have to be passed to the scheduler command line.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> See
>>>>>>>>>>>>>>> https://github.com/apache/aurora/blob/33d7e2170a86f54722a02a2dc9cb1e09fb52df25/examples/vagrant/upstart/aurora-scheduler.conf#L39
>>>>>>>>>>>>>>> for an example
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>>>> *From:* Krish <[email protected]>
>>>>>>>>>>>>>>> *Sent:* Monday, October 19, 2015 8:45 AM
>>>>>>>>>>>>>>> *To:* [email protected]
>>>>>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>> I am a n00b with apache aurora & trying to experiment with some
>>>>>>>>>>>>>>> things on my local machine with zookeeper and mesos-master
>>>>>>>>>>>>>>> running locally. They have initialized properly. When I try to
>>>>>>>>>>>>>>> run aurora with the required options, I get the following error,
>>>>>>>>>>>>>>> & googling hasn't helped me much here.
>>>>>>>>>>>>>>> Appreciate any help. Thanks in advance.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>> WARNING: Method [public void
>>>>>>>>>>>>>>> org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl.applySnapshot(java.lang.Object)]
>>>>>>>>>>>>>>> is synthetic and is being intercepted by
>>>>>>>>>>>>>>> [com.twitter.common.inject.TimedInterceptor@604c5de8]. This
>>>>>>>>>>>>>>> could indicate a bug. The method
>>>>>>>>>>>>>>> may be intercepted twice, or may not be intercepted at all.
>>>>>>>>>>>>>>> Exception in thread "main"
>>>>>>>>>>>>>>> com.google.inject.CreationException: Guice creation errors:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1) An exception was caught and reported. Message: A value
>>>>>>>>>>>>>>> may only be retrieved from a variable that has a default or has
>>>>>>>>>>>>>>> been
>>>>>>>>>>>>>>> set.
>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2) Could not find a suitable constructor in
>>>>>>>>>>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings. Classes
>>>>>>>>>>>>>>> must have either one (and only one) constructor annotated with
>>>>>>>>>>>>>>> @Inject or a zero-argument constructor that is not private.
>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings.class(ExecutorSettings.java:43)
>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:204)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2 errors
>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>> com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:435)
>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:154)
>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:106)
>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>> com.google.inject.Guice.createInjector(Guice.java:95)
>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>> com.google.inject.Guice.createInjector(Guice.java:83)
>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.configureInjection(AppLauncher.java:120)
>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.run(AppLauncher.java:87)
>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:181)
>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:142)
>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain.main(SchedulerMain.java:263)
>>>>>>>>>>>>>>> Caused by: java.lang.IllegalStateException: A value may only
>>>>>>>>>>>>>>> be retrieved from a variable that has a default or has been set.
>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>> com.google.common.base.Preconditions.checkState(Preconditions.java:176)
>>>>>>>>>>>>>>> at com.twitter.common.args.Arg.get(Arg.java:82)
>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:206)
>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>> com.google.inject.AbstractModule.configure(AbstractModule.java:59)
>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>> com.google.inject.util.Modules$2.configure(Modules.java:114)
>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>> com.google.inject.spi.Elements.getElements(Elements.java:101)
>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:103)
>>>>>>>>>>>>>>> ... 7 more
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Complete logs are present @http://pastebin.com/i72HvbYi.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Zameer Manji
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>