Wow! I'm glad you got it working! To help the next poor soul trying to do this, would you be willing to put up a doc patch on our end?
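In case it helps whoever writes that doc patch: going by the two files quoted below, the layouts differ only in the outer "auths" wrapper, so the conversion can be scripted. Here is a minimal sketch in Python, assuming the input has the same shape as the ~/.docker/config.json in Krish's mail. The function name to_dockercfg is made up for illustration; it is not something Mesos, Aurora, or Docker ships.

```python
import json


def to_dockercfg(config_json_text):
    """Convert ~/.docker/config.json text to the legacy .dockercfg layout.

    Assumption: the input nests registries under a top-level "auths" key,
    as in the example quoted in this thread.
    """
    config = json.loads(config_json_text)
    auths = config.get("auths")
    if auths is None:
        # Without "auths" the file may already be in the legacy layout.
        raise ValueError('no "auths" key; input may already be a .dockercfg')
    # The legacy layout puts the registry entries at the top level.
    return json.dumps(auths, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Placeholder credentials, same shape as the example in the thread.
    example = """
    {
      "auths": {
        "repo.example.com:5000": {"auth": "<snip>", "email": "<snip>"}
      }
    }
    """
    print(to_dockercfg(example))
```

The output would then be saved under the literal file name .dockercfg and shipped into the sandbox (Krish used thermos_executor_resources for that step).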
On Thursday, March 3, 2016, Krish <[email protected]> wrote:

> TLDR;
> Use only file with the name .dockercfg for docker credentials in mesos
> tasks!
>
> Long story:
> ---------------
> Holy smokescreens!
> This is for reporting & documenting purposes only, so that others don't
> have to pull their hair like I did for the past few evenings!
>
> A little background:
> I am running Ubuntu 14.04 on my system and docker stores its credentials
> in the ~/.docker/config.json as
> cat ~/.docker/config.json
> {
>   "auths": {
>     "repo.example.com:5000": {
>       "auth": "<snip>",
>       "email": "<snip>"
>     }
>   }
> }
>
> And I am doing all these experiments on a coreOS system which stores the
> credentials in ~/.dockercfg as
> core@aurora-1 ~ $ cat ~/.dockercfg
> {
>   "repo.example.com:5000": {
>     "auth": "<snip>",
>     "email": "<snip>"
>   }
> }
>
> Since my container was an Ubuntu 14.04 container (as was my local system),
> I used the ubuntu credential file format, i.e. I couldn't get the slave
> task to read the docker credentials as I had stored it as
> ~/.docker/config.json.
> After parsing through (a lot of find's, grep's and regex matching) aurora,
> mesos, and thermos source code, I saw in mesos/src/docker/docker.cpp:
>
> 1126 // Set HOME variable to pick up *.dockercfg*.
> 1127 map<string, string> environment = os::environment();
> 1128
> 1129 environment["HOME"] = directory;
> 1130
>
> Changed the filename and the json content, changed the
> thermos_executor_resources, and bam, docker pull works!
>
> Well, the mesos documentation does say "To run an image from a private
> repository, one can include the URI pointing to a .dockercfg that contains
> login information." and I would have read it a dozen times!
> But I never thought that they literally meant '.dockercfg' as the name of
> the file!
>
> --
> κρισhναν
>
> On Thu, Mar 3, 2016 at 1:45 PM, Krish <[email protected]> wrote:
>
>> I have got the docker config file copied into the sandbox using the
>> thermos_executor_resources flag; however docker is still not able to find
>> the credentials file for doing an appropriate pull of image from a private
>> repo.
>>
>> When I try to use the library/hello-world:latest image from public docker
>> repo to check if everything works fine without the credentials, I encounter
>> a different problem:
>> exec: "/bin/sh": stat /bin/sh: no such file or directory
>> Error response from daemon: Cannot start container
>> de93dc344d44b41bccccff49e508001a97ff23a8964e637d32a506a31fd4d946: [8]
>> System error: exec: "/bin/sh": stat /bin/sh: no such file or directory
>>
>> I was referring to this email for guidance on setting up a mesos slave:
>> http://mail-archives.apache.org/mod_mbox/aurora-dev/201503.mbox/%3CCAKB1MkHR=+nonesllx9buwttdthsku46pw_wr4b+_z9p59+...@mail.gmail.com%3E
>>
>> So, I cannot get the credentials file to be used by docker, and if I
>> bypass authentication, I can do a docker pull, but encounter a weird error
>> in launching the hello-world image.
>>
>> Am I missing out on checking any log files generated? I currently refer
>> to mesos-slave stdout and the sandbox stderr file.
>> Any configuration parameter I am missing for this to happen?
>>
>> Any pointers will be really helpful. Thanks in advance.
>>
>> --
>> κρισhναν
>>
>> On Sun, Feb 28, 2016 at 3:37 PM, Krish <[email protected]> wrote:
>>
>>> Continuing my earlier chain of thought, I found this in the mesos bug
>>> list:
>>> MESOS-4242 - Allow Docker private registry credentials to be passed from
>>> framework.
>>> How does one pass credentials using the framework? As it seems the
>>> .docker/config.json is not read from the slave.
>>>
>>> --
>>> κρισhναν
>>>
>>> On Sat, Feb 27, 2016 at 11:46 PM, Krish <[email protected]> wrote:
>>>
>>>> I couldn't complete my PoC before project before (got busy with other
>>>> work). Well, it is never too late and here's my update and issue.
>>>>
>>>> I have a 3 node zk (3.5.1 alpha), mesos-master (v0.24.1) & aurora
>>>> (v0.11.0) running.
>>>> I was stuck in a problem where I was using mesos 0.25.0 & aurora 0.9.0
>>>> & got a protobuf field not set error - ExecutorInfo field.
>>>>
>>>> I have a mesos agent running in docker container on coreos and it can
>>>> access the host docker just fine.
>>>> I have also put the docker login credentials file at the right location
>>>> for it to access the private docker registry.
>>>> I can manually trigger a docker pull and docker run without issues from
>>>> the slave (which is also reflected properly outside the slave container
>>>> with docker images and docker ps).
>>>>
>>>> However, when I try to run an aurora job with hello-docker container,
>>>> the slave prints out the log that docker pull has failed; more
>>>> specifically:
>>>> " failed to start: Failed to 'docker pull
>>>> private_repo.com:5000/krish/test:latest': exit status = exited with
>>>> status 1 stderr = Error: image krish/test:latest not found"
>>>>
>>>> My hunch is that when using docker run from aurora DSL, it does not
>>>> read the docker credentials file properly and hence fails. I can reproduce
>>>> the exact same error when I delete the credentials file from the slave and
>>>> trigger a pull.
>>>>
>>>> Is the hunch right? If yes, is there a way to resolve this? Maybe
>>>> source it some way before the run command?
>>>> >>>> >>>> >>>> -- >>>> κρισhναν >>>> >>>> On Tue, Oct 27, 2015 at 10:35 PM, Bill Farner <[email protected] >>>> <javascript:_e(%7B%7D,'cvml','[email protected]');>> wrote: >>>> >>>>> (1) clusters.json is written by you, configuring the CLI client with >>>>> instructions for what clusters are available and how to discover them. >>>>> >>>>> (2) That's expected - mesos only allows one active replica of a >>>>> framework at a time, this signals which one is active. >>>>> >>>>> (3) The observer is essentially a web server that allows you to browse >>>>> a task's sandbox directory and other information about it. You will need >>>>> to configure it to run on your worker/agent nodes for that functionality >>>>> to >>>>> work (it's linked from the scheduler web UI). >>>>> >>>>> (4) You could indeed implement that behavior externally. There is a >>>>> reason: >>>>> https://github.com/apache/aurora/blob/master/api/src/main/thrift/org/apache/aurora/gen/api.thrift#L556-L559 >>>>> >>>>> (5) That is correct. The scheduler exposes a thrift API that you >>>>> would use (a REST API is coming, but ground has not yet been broken). If >>>>> you go this route, i suggest you skip the DSL and use the JSON task >>>>> description format that is shipped over the API. There's not good >>>>> documentation on this, but we can help you through it and would be >>>>> grateful >>>>> for a writeup of your approach! >>>>> >>>>> >>>>> On Mon, Oct 26, 2015 at 11:44 PM, Krish <[email protected] >>>>> <javascript:_e(%7B%7D,'cvml','[email protected]');>> wrote: >>>>> >>>>>> Hi Folks, >>>>>> Firstly, thanks for all the help. Am happy to report that I have set >>>>>> up zk, mesos & aurora, & can work further towards my idea of having an >>>>>> auto-scaling cluster. >>>>>> I have some further questions about the work done so far & things I >>>>>> plan to do: >>>>>> >>>>>> 1. Is the /etc/aurora/clusters.json file created by the scheduled >>>>>> or does it need to be handcrafted? 
I had to manually edit the file to >>>>>> get >>>>>> my `aurora job ...` cli to work. >>>>>> >>>>>> 2. I am running a cluster of 3 coreOS VMs on vagrant with zk, >>>>>> mesos & aurora in a docker container. Only 1 of them outputs '1' when >>>>>> I >>>>>> look at the framework_registered' field. Is this expected? How do I >>>>>> verify >>>>>> that they are working as a cluster? >>>>>> >>>>>> 3. From the documentation, I see that there is an observer that >>>>>> needs to be listening on port 1338. What is the observer socket & its >>>>>> purpose? I have aurora listening only on ports 8081 (http port) & 8083 >>>>>> (libprocess). >>>>>> >>>>>> 4. I read about the 'PENDING' field in aurora documentation, as >>>>>> Bill suggested, & realize that it just shows that a task is waiting >>>>>> for >>>>>> some reasons (for want of resources, in my case, as 0 slaves have >>>>>> registered). I was thinking of adding a hook to the pending state; >>>>>> say if a >>>>>> task is PENDING for 5 minutes for lack of resources in the cluster, >>>>>> then >>>>>> spin up a new machine. Is this the right approach to take? Does aurora >>>>>> provide reasons for why is a task in PENDING state? >>>>>> >>>>>> => aurora job status testcluster/$USER/test/hello_world >>>>>> INFO] Checking status of testcluster/ubuntu/test/hello_world >>>>>> Active tasks (1): >>>>>> Task role: ubuntu, env: test, name: hello_world, instance: >>>>>> 0, status: >>>>>> PENDING on None >>>>>> cpus: 0.1, ram: 16 MB, disk: 16 MB >>>>>> events: >>>>>> 2015-10-23 04:55:33 PENDING: None >>>>>> Inactive tasks (0): >>>>>> >>>>>> 5. Aurora defines job/s is a .aurora config file & if I decide to >>>>>> increase/decrease the number of instances in my cluster, then I need >>>>>> to >>>>>> create/overwrite the concerned the .aurora and trigger the `aurora >>>>>> update >>>>>> ...` command. Is this right? >>>>>> If yes, is there an HTTP API I can invoke remotely which triggers >>>>>> this update? 
>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> κρισhναν >>>>>> >>>>>> On Fri, Oct 23, 2015 at 8:09 AM, Joshua Cohen < >>>>>> [email protected] >>>>>> <javascript:_e(%7B%7D,'cvml','[email protected]');>> wrote: >>>>>> >>>>>>> I suspect your error from `aurora job create ...` is due to the >>>>>>> aurora config you're using referencing `/vagrant/hello_world.py` which >>>>>>> does >>>>>>> not exist (as you say: you're not even using Vagrant). Can you link the >>>>>>> .aurora config you're using? >>>>>>> >>>>>>> Cheers, >>>>>>> >>>>>>> Joshua >>>>>>> >>>>>>> On Thu, Oct 22, 2015 at 3:22 PM, Krish <[email protected] >>>>>>> <javascript:_e(%7B%7D,'cvml','[email protected]');>> wrote: >>>>>>> >>>>>>>> Thanks, Zameer. >>>>>>>> >>>>>>>> I had to modify /etc/aurora/clusters.json: >>>>>>>> [ >>>>>>>> { >>>>>>>> "auth_mechanism": "UNAUTHENTICATED", >>>>>>>> "name": "testcluster", >>>>>>>> "scheduler_zk_path": "/scheduler/aurora", >>>>>>>> "slave_root": "/var/lib/mesos", >>>>>>>> "slave_run_directory": "latest", >>>>>>>> "zk": "127.0.1.1" >>>>>>>> } >>>>>>>> ] >>>>>>>> >>>>>>>> I have a hello_world.aurora in my home folder. However the >>>>>>>> following command errors out: >>>>>>>> ~$ aurora job create testcluster/testrole/test/hellojob >>>>>>>> ./hello_world.aurora >>>>>>>> Error loading configuration: [Errno 2] No such file or directory: >>>>>>>> '/vagrant/hello_world.py' >>>>>>>> >>>>>>>> A job list does work: >>>>>>>> ~$ aurora job list testcluster >>>>>>>> INFO] Retrieving jobs for role None >>>>>>>> >>>>>>>> I am not even using the vagrant. I am using zk & mesos on the same >>>>>>>> machine as aurora. How do I submit these job templates to aurora? >>>>>>>> >>>>>>>> Any pointers to documentation will be helpful. >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> κρισhναν >>>>>>>> >>>>>>>> On Wed, Oct 21, 2015 at 11:09 PM, Zameer Manji <[email protected] >>>>>>>> <javascript:_e(%7B%7D,'cvml','[email protected]');>> wrote: >>>>>>>> >>>>>>>>> Version 0.9.0 does not have the gc executor. 
Version 0.9.0 uses >>>>>>>>> Mesos' task reconciliation >>>>>>>>> <http://mesos.apache.org/documentation/latest/reconciliation/> API >>>>>>>>> instead. >>>>>>>>> >>>>>>>>> On Wed, Oct 21, 2015 at 9:28 AM, Krish <[email protected] >>>>>>>>> <javascript:_e(%7B%7D,'cvml','[email protected]');>> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Thanks Bill for the location to the debs. I was finally able to >>>>>>>>>> run aurora. :) >>>>>>>>>> >>>>>>>>>> I did find thermos_executor.pex & thermos_observer after >>>>>>>>>> installing aurora-executor. I still could not find gc_executor.pex >>>>>>>>>> on my >>>>>>>>>> system. >>>>>>>>>> Is there a location from where I can download the binaries for >>>>>>>>>> *.pex or build them from scratch? >>>>>>>>>> >>>>>>>>>> root@dev:/# find . -name "*.pex" >>>>>>>>>> ./usr/share/aurora/bin/thermos_executor.pex >>>>>>>>>> ./usr/share/aurora/bin/kaurora_admin.pex >>>>>>>>>> ./usr/share/aurora/bin/kaurora.pex >>>>>>>>>> ./usr/share/aurora/bin/thermos.pex >>>>>>>>>> ./usr/share/aurora/bin/thermos_observer.pex >>>>>>>>>> ./home/ubuntu/.pex >>>>>>>>>> ./root/.pex >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> κρισhναν >>>>>>>>>> >>>>>>>>>> On Tue, Oct 20, 2015 at 11:46 PM, Bill Farner <[email protected] >>>>>>>>>> <javascript:_e(%7B%7D,'cvml','[email protected]');>> wrote: >>>>>>>>>> >>>>>>>>>>> Aurora currently requires an executor, so setting it to >>>>>>>>>>> /dev/null will not work. Happy to talk further about your thoughts >>>>>>>>>>> around >>>>>>>>>>> sidestepping the executor. >>>>>>>>>>> >>>>>>>>>>> As for working with the scheduler source code, it's a standard >>>>>>>>>>> gradle project and we tend to use intellij. Docs to help ramp on >>>>>>>>>>> that: >>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/developing-aurora-scheduler.md >>>>>>>>>>> >>>>>>>>>>> As for builds - the .zip is a source distribution, so it won't >>>>>>>>>>> have any pre-built binaries. 
If you're on debian, we have official >>>>>>>>>>> debs >>>>>>>>>>> here: https://bintray.com/apache/aurora >>>>>>>>>>> You can see how they're built here (and can build your own) >>>>>>>>>>> packages: https://github.com/apache/aurora-packaging >>>>>>>>>>> We're close to having official RPMs, but none to speak of yet. >>>>>>>>>>> >>>>>>>>>>> On Tue, Oct 20, 2015 at 9:47 AM, Krish < >>>>>>>>>>> [email protected] >>>>>>>>>>> <javascript:_e(%7B%7D,'cvml','[email protected]');>> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Stephen, >>>>>>>>>>>> I am trying to get started and run aurora without thermos >>>>>>>>>>>> executor (setting it to /dev/null does not help) - on a local >>>>>>>>>>>> linux box for >>>>>>>>>>>> now & planning to containerize/dockerize it later. >>>>>>>>>>>> >>>>>>>>>>>> Can you please point me to the right documentation (or a >>>>>>>>>>>> pointer to the cli parsing source code) which can help me resolve >>>>>>>>>>>> this? >>>>>>>>>>>> Also, are there any steps steps to import source code into eclipse >>>>>>>>>>>> to >>>>>>>>>>>> browse & analyze code for this. >>>>>>>>>>>> >>>>>>>>>>>> Also, where do i find all the *.pex files? They are not present >>>>>>>>>>>> in the zip file nor anywhere in the built source code. >>>>>>>>>>>> >>>>>>>>>>>> I know I am asking too many queries on a single thread here, & >>>>>>>>>>>> would appreciate the help. >>>>>>>>>>>> I think at the end of this, I will put the steps I followed in >>>>>>>>>>>> a gist/blog so others might find their way around, & not struggle >>>>>>>>>>>> as much. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> κρισhναν >>>>>>>>>>>> >>>>>>>>>>>> On Tue, Oct 20, 2015 at 4:09 PM, Erb, Stephan < >>>>>>>>>>>> [email protected] >>>>>>>>>>>> <javascript:_e(%7B%7D,'cvml','[email protected]');>> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi Krish, >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> you don't have to set framework_authentication_file and >>>>>>>>>>>>> zk_digest_credentials. 
The scheduler help text is misleading here >>>>>>>>>>>>> as >>>>>>>>>>>>> everything will work fine if you leave those empty. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> In addition, looks like you are misunderstanding the usage of >>>>>>>>>>>>> the thermos_executor_path command line flag of the scheduler. >>>>>>>>>>>>> It is supposed to point to the binary containing the generic >>>>>>>>>>>>> Aurora executor (thermos_executor.pex). You only need the >>>>>>>>>>>>> hello_world.aurora >>>>>>>>>>>>> once your scheduler is up an running. It serves as an example >>>>>>>>>>>>> input for the >>>>>>>>>>>>> aurora command line client which can be used to scheduler jobs >>>>>>>>>>>>> and services >>>>>>>>>>>>> on an Aurora master. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Have you tried to use the vagrant box? Just type 'vagrant >>>>>>>>>>>>> up`in a checkout of the Aurora source code. It gives you a running >>>>>>>>>>>>> scheduler to play with. Once you have understood how it works, >>>>>>>>>>>>> you can >>>>>>>>>>>>> start trying to install it on your own (by reverse-engineering >>>>>>>>>>>>> the vagrant >>>>>>>>>>>>> box). >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Hope this helps a little, >>>>>>>>>>>>> >>>>>>>>>>>>> Stephan >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> ------------------------------ >>>>>>>>>>>>> *From:* Krish <[email protected] >>>>>>>>>>>>> <javascript:_e(%7B%7D,'cvml','[email protected]');>> >>>>>>>>>>>>> *Sent:* Tuesday, October 20, 2015 11:39 AM >>>>>>>>>>>>> *To:* Bill Farner >>>>>>>>>>>>> *Cc:* [email protected] >>>>>>>>>>>>> <javascript:_e(%7B%7D,'cvml','[email protected]');>; >>>>>>>>>>>>> Erb, Stephan >>>>>>>>>>>>> >>>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora >>>>>>>>>>>>> >>>>>>>>>>>>> Bill/Stephen, >>>>>>>>>>>>> I still get a stacktrace when running the aurora scheduler CLI. 
>>>>>>>>>>>>> >>>>>>>>>>>>> I do not know what to specify for >>>>>>>>>>>>> -framework_authentication_file & -zk_digest_credentials, and >>>>>>>>>>>>> they are >>>>>>>>>>>>> required arguments. >>>>>>>>>>>>> >>>>>>>>>>>>> I am not using any authentication on Mesos master, do I still >>>>>>>>>>>>> need the framework_authentication_file parameter? >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> rm -rf /db /backup_dir >>>>>>>>>>>>> mesos-log initialize --path="/db" >>>>>>>>>>>>> export JAVA_HOME=/usr/lib/jvm/java-8-oracle/ >>>>>>>>>>>>> JAVA_OPTS="-Xmx1536m -Xms256m" >>>>>>>>>>>>> /usr/local/aurora-scheduler/bin/aurora-scheduler >>>>>>>>>>>>> -backup_dir=/backup_dir >>>>>>>>>>>>> -cluster_name=tc >>>>>>>>>>>>> -mesos_master_address=zk://localhost:2181/mesos/master >>>>>>>>>>>>> -serverset_path=/scheduler/aurora -zk_endpoints=localhost:2181 >>>>>>>>>>>>> -native_log_quorum_size=1 -vlog=SEVERE -logtostderr=false >>>>>>>>>>>>> -native_log_file_path=/db >>>>>>>>>>>>> -thermos_executor_path=/home/ubuntu/hello_world.aurora >>>>>>>>>>>>> ... >>>>>>>>>>>>> ... 
>>>>>>>>>>>>> INFO: Binding org.apache.aurora.scheduler.http.Utilization to >>>>>>>>>>>>> GuiceManagedCompon >>>>>>>>>>>>> entProvider with the scope "PerRequest" >>>>>>>>>>>>> Oct 20, 2015 9:27:40 AM >>>>>>>>>>>>> org.apache.aurora.scheduler.cron.quartz.CronModule provi >>>>>>>>>>>>> deTimeZone >>>>>>>>>>>>> WARNING: Cron schedules are configured to fire according to >>>>>>>>>>>>> timezone Greenwich M >>>>>>>>>>>>> ean Time but system timezone is set to Coordinated Universal >>>>>>>>>>>>> Time >>>>>>>>>>>>> Oct 20, 2015 9:27:41 AM >>>>>>>>>>>>> org.eclipse.jetty.server.AbstractConnector doStart >>>>>>>>>>>>> INFO: Started [email protected]:43843 >>>>>>>>>>>>> E1020 09:27:41.290 THREAD1 >>>>>>>>>>>>> org.apache.aurora.scheduler.SchedulerLifecycle$9.exec >>>>>>>>>>>>> ute: Caught unchecked exception: >>>>>>>>>>>>> com.google.inject.ProvisionException: Guice pro >>>>>>>>>>>>> vision errors: >>>>>>>>>>>>> >>>>>>>>>>>>> 1) Error in custom provider, >>>>>>>>>>>>> java.lang.IllegalArgumentException: Path cannot be null at >>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(Mesos >>>>>>>>>>>>> LogStreamModule.java:117) >>>>>>>>>>>>> at >>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(Mesos >>>>>>>>>>>>> LogStreamModule.java:117) >>>>>>>>>>>>> while locating org.apache.mesos.Log >>>>>>>>>>>>> at >>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterf >>>>>>>>>>>>> ace(MesosLogStreamModule.java:152) >>>>>>>>>>>>> while locating >>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.LogInterface >>>>>>>>>>>>> >>>>>>>>>>>>> 1 error >>>>>>>>>>>>> com.google.inject.ProvisionException: Guice provision errors: >>>>>>>>>>>>> >>>>>>>>>>>>> 1) Error in custom provider, >>>>>>>>>>>>> java.lang.IllegalArgumentException: Path cannot be >>>>>>>>>>>>> null >>>>>>>>>>>>> at >>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(Mesos >>>>>>>>>>>>> 
LogStreamModule.java:117) >>>>>>>>>>>>> at >>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(Mesos >>>>>>>>>>>>> LogStreamModule.java:117) >>>>>>>>>>>>> while locating org.apache.mesos.Log >>>>>>>>>>>>> at >>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterf >>>>>>>>>>>>> ace(MesosLogStreamModule.java:152) >>>>>>>>>>>>> while locating >>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.LogInterface >>>>>>>>>>>>> >>>>>>>>>>>>> 1 error >>>>>>>>>>>>> at >>>>>>>>>>>>> com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:987) >>>>>>>>>>>>> at >>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLog.open(MesosLog.java:136 >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> κρισhναν >>>>>>>>>>>>> >>>>>>>>>>>>> On Tue, Oct 20, 2015 at 6:14 AM, Bill Farner < >>>>>>>>>>>>> [email protected] >>>>>>>>>>>>> <javascript:_e(%7B%7D,'cvml','[email protected]');>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> The typical flow is that you keep your .aurora file checked >>>>>>>>>>>>>> into git, and commit every time you deploy/update. When you >>>>>>>>>>>>>> change your >>>>>>>>>>>>>> file, you will instruct Aurora to update the live job (have a >>>>>>>>>>>>>> look at aurora >>>>>>>>>>>>>> update -h). Aurora will perform a rolling upgrade of your >>>>>>>>>>>>>> job to the new config. You'll use this same flow for updating >>>>>>>>>>>>>> your job's >>>>>>>>>>>>>> software as well as resizing the job. >>>>>>>>>>>>>> >>>>>>>>>>>>>> For (3), you could set up alerting for stats that the >>>>>>>>>>>>>> scheduler exports. Have a look here for monitoring background: >>>>>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/monitoring.md >>>>>>>>>>>>>> >>>>>>>>>>>>>> You'll find want to look at scheduler stats related to >>>>>>>>>>>>>> 'pending'. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> On Mon, Oct 19, 2015 at 12:16 PM, Krish < >>>>>>>>>>>>>> [email protected] >>>>>>>>>>>>>> <javascript:_e(%7B%7D,'cvml','[email protected]');>> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks for the pointer. Now I notice that the >>>>>>>>>>>>>>> aurora-scheduler script has the --thermos_executor_path as a >>>>>>>>>>>>>>> mandatory >>>>>>>>>>>>>>> requirement. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I have a couple of questions on how the >>>>>>>>>>>>>>> thermos_executor/.aurora config file functions: >>>>>>>>>>>>>>> 1. Do we have to statically define the file beforehand? >>>>>>>>>>>>>>> 2. What happens when we want to dynamically change the >>>>>>>>>>>>>>> config, say increasing the number of instances of a service >>>>>>>>>>>>>>> required? Does >>>>>>>>>>>>>>> aurora require a reboot then? >>>>>>>>>>>>>>> 3. How do I get notified about the message mesos sends when >>>>>>>>>>>>>>> it cannot schedule tasks for lack of resources? Should I depend >>>>>>>>>>>>>>> on aurora >>>>>>>>>>>>>>> for this or try to look for a hook into mesos? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I think a little bit of context would help here. >>>>>>>>>>>>>>> What I plan to check is to run a very basic job/task inside >>>>>>>>>>>>>>> a docker container with aurora & wait for a 'resource not >>>>>>>>>>>>>>> available' >>>>>>>>>>>>>>> message from mesos, and accordingly call an api to spin up a >>>>>>>>>>>>>>> new node in my >>>>>>>>>>>>>>> cluster. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> κρισhναν >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Mon, Oct 19, 2015 at 1:24 PM, Erb, Stephan < >>>>>>>>>>>>>>> [email protected] >>>>>>>>>>>>>>> <javascript:_e(%7B%7D,'cvml','[email protected]');> >>>>>>>>>>>>>>> > wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I believe you are missing the thermos_executor options that >>>>>>>>>>>>>>>> have to be passed to the scheduler command line. 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> See >>>>>>>>>>>>>>>> https://github.com/apache/aurora/blob/33d7e2170a86f54722a02a2dc9cb1e09fb52df25/examples/vagrant/upstart/aurora-scheduler.conf#L39 >>>>>>>>>>>>>>>> for an example >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Best Regards, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Stephan >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> ------------------------------ >>>>>>>>>>>>>>>> *From:* Krish <[email protected] >>>>>>>>>>>>>>>> <javascript:_e(%7B%7D,'cvml','[email protected]');> >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> *Sent:* Monday, October 19, 2015 8:45 AM >>>>>>>>>>>>>>>> *To:* [email protected] >>>>>>>>>>>>>>>> <javascript:_e(%7B%7D,'cvml','[email protected]');> >>>>>>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>> I am a n00b with apache aurora & trying to experiment some >>>>>>>>>>>>>>>> things on my local machine with zookeeper and mesos-master >>>>>>>>>>>>>>>> running locally. >>>>>>>>>>>>>>>> They have initialized properly. When I try to run aurora with >>>>>>>>>>>>>>>> the required >>>>>>>>>>>>>>>> options, I get the following error, & googing hasn't helped me >>>>>>>>>>>>>>>> much here. >>>>>>>>>>>>>>>> Appreciate any help. Thanks in advance. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> ... >>>>>>>>>>>>>>>> ... >>>>>>>>>>>>>>>> WARNING: Method [public void >>>>>>>>>>>>>>>> org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl.applySnapshot(java.lang.Object)] >>>>>>>>>>>>>>>> is synthetic and is being intercepted by >>>>>>>>>>>>>>>> [com.twitter.common.inject.TimedInterceptor@604c5de8]. >>>>>>>>>>>>>>>> This could indicate a bug. The method >>>>>>>>>>>>>>>> may be intercepted twice, or may not be intercepted at all. >>>>>>>>>>>>>>>> Exception in thread "main" >>>>>>>>>>>>>>>> com.google.inject.CreationException: Guice creation errors: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 1) An exception was caught and reported. 
Message: A value >>>>>>>>>>>>>>>> may only be retrieved from a variable that has a default or >>>>>>>>>>>>>>>> has been >>>>>>>>>>>>>>>> set. >>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 2) Could not find a suitable constructor in >>>>>>>>>>>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings. Classes >>>>>>>>>>>>>>>> must have >>>>>>>>>>>>>>>> either one (a >>>>>>>>>>>>>>>> nd only one) constructor annotated with @Inject or a >>>>>>>>>>>>>>>> zero-argument constructor that is not private. >>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings.class(ExecutorSettings.java:43) >>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:204) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 2 errors >>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>> com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:435) >>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:154) >>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:106) >>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>> com.google.inject.Guice.createInjector(Guice.java:95) >>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>> com.google.inject.Guice.createInjector(Guice.java:83) >>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.configureInjection(AppLauncher.java:120) >>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.run(AppLauncher.java:87) >>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:181) >>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:142) >>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>> 
org.apache.aurora.scheduler.app.SchedulerMain.main(SchedulerMain.java:263) >>>>>>>>>>>>>>>> Caused by: java.lang.IllegalStateException: A value may >>>>>>>>>>>>>>>> only be retrieved from a variable that has a default or has >>>>>>>>>>>>>>>> been set. >>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>> com.google.common.base.Preconditions.checkState(Preconditions.java:176) >>>>>>>>>>>>>>>> at com.twitter.common.args.Arg.get(Arg.java:82) >>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:206) >>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>> com.google.inject.AbstractModule.configure(AbstractModule.java:59) >>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223) >>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>> com.google.inject.util.Modules$2.configure(Modules.java:114) >>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223) >>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>> com.google.inject.spi.Elements.getElements(Elements.java:101) >>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133) >>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:103) >>>>>>>>>>>>>>>> ... 7 more >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Complete logs are present @http://pastebin.com/i72HvbYi. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>> κρισhναν >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Zameer Manji >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >
