Can you guide me on how to do that? Should I start a new page and submit it, or would you prefer it as an entry in an existing doc? That will be the short-term (couple of hours) item on my checklist.
Actually, as I said before, I have in mind to blog about my entire design and implementation process: the how and the why of docker configuration, private docker repo setup, coreos cluster setup, and zk, mesos master, and aurora containerisation and setup, along with their monitoring (I have decided on bosun.org with cAdvisor), plus a short guide on how to run both containerized and non-containerized jobs in production. I had to refer to more than a dozen sites, blogs, manuals, and source files to get this far, and I got help from engineers on various mailing lists. A unified guide should be helpful, imho.

On Thursday 3 March 2016, Bill Farner <[email protected]> wrote:

> Wow! I'm glad you got it working! To help the next poor soul trying to
> do this, would you be willing to put up a doc patch on our end?
>
> On Thursday, March 3, 2016, Krish <[email protected]> wrote:
>
>> TLDR;
>> Use only a file named .dockercfg for docker credentials in mesos
>> tasks!
>>
>> Long story:
>> ---------------
>> Holy smokescreens!
>> This is for reporting & documenting purposes only, so that others don't
>> have to pull their hair out like I did for the past few evenings!
>>
>> A little background:
>> I am running Ubuntu 14.04 on my system, and docker stores its credentials
>> in ~/.docker/config.json as
>> cat ~/.docker/config.json
>> {
>>   "auths": {
>>     "repo.example.com:5000": {
>>       "auth": "<snip>",
>>       "email": "<snip>"
>>     }
>>   }
>> }
>>
>> And I am doing all these experiments on a coreOS system, which stores the
>> credentials in ~/.dockercfg as
>> core@aurora-1 ~ $ cat ~/.dockercfg
>> {
>>   "repo.example.com:5000": {
>>     "auth": "<snip>",
>>     "email": "<snip>"
>>   }
>> }
>>
>> Since my container was an Ubuntu 14.04 container (as was my local
>> system), I used the Ubuntu credential file format. As a result, I couldn't
>> get the slave task to read the docker credentials, because I had stored
>> them as ~/.docker/config.json.
>> After digging through the aurora, mesos, and thermos source code (a lot
>> of find, grep, and regex matching), I saw in
>> mesos/src/docker/docker.cpp:
>>
>> // Set HOME variable to pick up .dockercfg.
>> map<string, string> environment = os::environment();
>>
>> environment["HOME"] = directory;
>>
>> I changed the filename and the JSON content, changed the
>> thermos_executor_resources, and bam, docker pull works!
>>
>> Well, the mesos documentation does say "To run an image from a private
>> repository, one can include the URI pointing to a .dockercfg that contains
>> login information." and I must have read it a dozen times!
>> But I never thought that they literally meant '.dockercfg' as the name of
>> the file!
>>
>> --
>> κρισhναν
>>
>> On Thu, Mar 3, 2016 at 1:45 PM, Krish <[email protected]> wrote:
>>
>>> I have got the docker config file copied into the sandbox using the
>>> thermos_executor_resources flag; however, docker is still not able to find
>>> the credentials file for pulling an image from a private repo.
>>>
>>> When I try to use the library/hello-world:latest image from the public
>>> docker repo to check whether everything works without the credentials, I
>>> encounter a different problem:
>>> exec: "/bin/sh": stat /bin/sh: no such file or directory
>>> Error response from daemon: Cannot start container
>>> de93dc344d44b41bccccff49e508001a97ff23a8964e637d32a506a31fd4d946: [8]
>>> System error: exec: "/bin/sh": stat /bin/sh: no such file or directory
>>>
>>> I was referring to this email for guidance on setting up a mesos slave:
>>> http://mail-archives.apache.org/mod_mbox/aurora-dev/201503.mbox/%3CCAKB1MkHR=+nonesllx9buwttdthsku46pw_wr4b+_z9p59+...@mail.gmail.com%3E
>>>
>>> So, I cannot get the credentials file to be used by docker, and if I
>>> bypass authentication, I can do a docker pull but encounter a weird error
>>> when launching the hello-world image.
>>>
>>> Am I missing any log files that I should check? I currently refer
>>> to the mesos-slave stdout and the sandbox stderr file.
>>> Is there any configuration parameter I am missing for this to work?
>>>
>>> Any pointers will be really helpful. Thanks in advance.
>>>
>>> --
>>> κρισhναν
>>>
>>> On Sun, Feb 28, 2016 at 3:37 PM, Krish <[email protected]> wrote:
>>>
>>>> Continuing my earlier chain of thought, I found this in the mesos bug
>>>> list:
>>>> MESOS-4242 - Allow Docker private registry credentials to be passed
>>>> from framework.
>>>> How does one pass credentials using the framework? It seems the
>>>> .docker/config.json is not read from the slave.
>>>>
>>>> --
>>>> κρισhναν
>>>>
>>>> On Sat, Feb 27, 2016 at 11:46 PM, Krish <[email protected]> wrote:
>>>>
>>>>> I couldn't complete my PoC project before (got busy with other
>>>>> work). Well, it is never too late, and here's my update and issue.
>>>>>
>>>>> I have a 3-node zk (3.5.1 alpha), mesos-master (v0.24.1) & aurora
>>>>> (v0.11.0) running.
>>>>> I was stuck on a problem where I was using mesos 0.25.0 & aurora 0.9.0
>>>>> & got a "protobuf field not set" error for the ExecutorInfo field.
>>>>>
>>>>> I have a mesos agent running in a docker container on coreos, and it can
>>>>> access the host docker just fine.
>>>>> I have also put the docker login credentials file at the right
>>>>> location for it to access the private docker registry.
>>>>> I can manually trigger a docker pull and docker run without issues
>>>>> from the slave (which is also reflected properly outside the slave
>>>>> container with docker images and docker ps).
>>>>>
>>>>> However, when I try to run an aurora job with a hello-docker container,
>>>>> the slave logs that docker pull has failed; more specifically:
>>>>> " failed to start: Failed to 'docker pull
>>>>> private_repo.com:5000/krish/test:latest': exit status = exited with
>>>>> status 1 stderr = Error: image krish/test:latest not found"
>>>>>
>>>>> My hunch is that when using docker run from the aurora DSL, it does not
>>>>> read the docker credentials file properly and hence fails. I can reproduce
>>>>> the exact same error when I delete the credentials file from the slave and
>>>>> trigger a pull.
>>>>>
>>>>> Is the hunch right? If yes, is there a way to resolve this? Maybe
>>>>> source it some way before the run command?
>>>>>
>>>>> --
>>>>> κρισhναν
>>>>>
>>>>> On Tue, Oct 27, 2015 at 10:35 PM, Bill Farner <[email protected]> wrote:
>>>>>
>>>>>> (1) clusters.json is written by you, configuring the CLI client with
>>>>>> instructions for what clusters are available and how to discover them.
>>>>>>
>>>>>> (2) That's expected - mesos only allows one active replica of a
>>>>>> framework at a time; this signals which one is active.
>>>>>>
>>>>>> (3) The observer is essentially a web server that allows you to
>>>>>> browse a task's sandbox directory and other information about it. You will
>>>>>> need to configure it to run on your worker/agent nodes for that
>>>>>> functionality to work (it's linked from the scheduler web UI).
>>>>>>
>>>>>> (4) You could indeed implement that behavior externally. There is a
>>>>>> reason:
>>>>>> https://github.com/apache/aurora/blob/master/api/src/main/thrift/org/apache/aurora/gen/api.thrift#L556-L559
>>>>>>
>>>>>> (5) That is correct. The scheduler exposes a thrift API that you
>>>>>> would use (a REST API is coming, but ground has not yet been broken). If
>>>>>> you go this route, I suggest you skip the DSL and use the JSON task
>>>>>> description format that is shipped over the API.
>>>>>> There's no good documentation on this, but we can help you through it
>>>>>> and would be grateful for a writeup of your approach!
>>>>>>
>>>>>> On Mon, Oct 26, 2015 at 11:44 PM, Krish <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Folks,
>>>>>>> Firstly, thanks for all the help. I am happy to report that I have set
>>>>>>> up zk, mesos & aurora, & can work further towards my idea of having an
>>>>>>> auto-scaling cluster.
>>>>>>> I have some further questions about the work done so far & things I
>>>>>>> plan to do:
>>>>>>>
>>>>>>> 1. Is the /etc/aurora/clusters.json file created by the
>>>>>>> scheduler, or does it need to be handcrafted? I had to manually edit
>>>>>>> the file to get my `aurora job ...` cli to work.
>>>>>>>
>>>>>>> 2. I am running a cluster of 3 coreOS VMs on vagrant with zk,
>>>>>>> mesos & aurora in a docker container. Only 1 of them outputs '1' when I
>>>>>>> look at the 'framework_registered' field. Is this expected? How do I
>>>>>>> verify that they are working as a cluster?
>>>>>>>
>>>>>>> 3. From the documentation, I see that there is an observer that
>>>>>>> needs to be listening on port 1338. What is the observer socket & its
>>>>>>> purpose? I have aurora listening only on ports 8081 (http port) & 8083
>>>>>>> (libprocess).
>>>>>>>
>>>>>>> 4. I read about the 'PENDING' field in the aurora documentation, as
>>>>>>> Bill suggested, & realize that it just shows that a task is waiting for
>>>>>>> some reason (for want of resources, in my case, as 0 slaves have
>>>>>>> registered). I was thinking of adding a hook to the pending state; say,
>>>>>>> if a task is PENDING for 5 minutes for lack of resources in the cluster,
>>>>>>> then spin up a new machine. Is this the right approach to take? Does
>>>>>>> aurora provide reasons for why a task is in the PENDING state?
>>>>>>>
>>>>>>> => aurora job status testcluster/$USER/test/hello_world
>>>>>>> INFO] Checking status of testcluster/ubuntu/test/hello_world
>>>>>>> Active tasks (1):
>>>>>>> Task role: ubuntu, env: test, name: hello_world, instance: 0, status: PENDING on None
>>>>>>> cpus: 0.1, ram: 16 MB, disk: 16 MB
>>>>>>> events:
>>>>>>> 2015-10-23 04:55:33 PENDING: None
>>>>>>> Inactive tasks (0):
>>>>>>>
>>>>>>> 5. Aurora defines job/s in a .aurora config file, & if I decide
>>>>>>> to increase/decrease the number of instances in my cluster, then I need to
>>>>>>> create/overwrite the concerned .aurora file and trigger the `aurora update
>>>>>>> ...` command. Is this right?
>>>>>>> If yes, is there an HTTP API I can invoke remotely which
>>>>>>> triggers this update?
>>>>>>>
>>>>>>> --
>>>>>>> κρισhναν
>>>>>>>
>>>>>>> On Fri, Oct 23, 2015 at 8:09 AM, Joshua Cohen <[email protected]> wrote:
>>>>>>>
>>>>>>>> I suspect your error from `aurora job create ...` is due to the
>>>>>>>> aurora config you're using referencing `/vagrant/hello_world.py`, which
>>>>>>>> does not exist (as you say: you're not even using Vagrant). Can you link
>>>>>>>> the .aurora config you're using?
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>> Joshua
>>>>>>>>
>>>>>>>> On Thu, Oct 22, 2015 at 3:22 PM, Krish <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Thanks, Zameer.
>>>>>>>>>
>>>>>>>>> I had to modify /etc/aurora/clusters.json:
>>>>>>>>> [
>>>>>>>>>   {
>>>>>>>>>     "auth_mechanism": "UNAUTHENTICATED",
>>>>>>>>>     "name": "testcluster",
>>>>>>>>>     "scheduler_zk_path": "/scheduler/aurora",
>>>>>>>>>     "slave_root": "/var/lib/mesos",
>>>>>>>>>     "slave_run_directory": "latest",
>>>>>>>>>     "zk": "127.0.1.1"
>>>>>>>>>   }
>>>>>>>>> ]
>>>>>>>>>
>>>>>>>>> I have a hello_world.aurora in my home folder.
>>>>>>>>> However, the following command errors out:
>>>>>>>>> ~$ aurora job create testcluster/testrole/test/hellojob ./hello_world.aurora
>>>>>>>>> Error loading configuration: [Errno 2] No such file or directory: '/vagrant/hello_world.py'
>>>>>>>>>
>>>>>>>>> A job list does work:
>>>>>>>>> ~$ aurora job list testcluster
>>>>>>>>> INFO] Retrieving jobs for role None
>>>>>>>>>
>>>>>>>>> I am not even using vagrant. I am using zk & mesos on the same
>>>>>>>>> machine as aurora. How do I submit these job templates to aurora?
>>>>>>>>>
>>>>>>>>> Any pointers to documentation will be helpful.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> κρισhναν
>>>>>>>>>
>>>>>>>>> On Wed, Oct 21, 2015 at 11:09 PM, Zameer Manji <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Version 0.9.0 does not have the gc executor. Version 0.9.0 uses
>>>>>>>>>> Mesos' task reconciliation
>>>>>>>>>> <http://mesos.apache.org/documentation/latest/reconciliation/> API
>>>>>>>>>> instead.
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 21, 2015 at 9:28 AM, Krish <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks Bill for the location to the debs. I was finally able to
>>>>>>>>>>> run aurora. :)
>>>>>>>>>>>
>>>>>>>>>>> I did find thermos_executor.pex & thermos_observer after
>>>>>>>>>>> installing aurora-executor. I still could not find gc_executor.pex on my
>>>>>>>>>>> system.
>>>>>>>>>>> Is there a location from where I can download the binaries for
>>>>>>>>>>> *.pex or build them from scratch?
>>>>>>>>>>>
>>>>>>>>>>> root@dev:/# find . -name "*.pex"
>>>>>>>>>>> ./usr/share/aurora/bin/thermos_executor.pex
>>>>>>>>>>> ./usr/share/aurora/bin/kaurora_admin.pex
>>>>>>>>>>> ./usr/share/aurora/bin/kaurora.pex
>>>>>>>>>>> ./usr/share/aurora/bin/thermos.pex
>>>>>>>>>>> ./usr/share/aurora/bin/thermos_observer.pex
>>>>>>>>>>> ./home/ubuntu/.pex
>>>>>>>>>>> ./root/.pex
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> κρισhναν
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 20, 2015 at 11:46 PM, Bill Farner <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Aurora currently requires an executor, so setting it to
>>>>>>>>>>>> /dev/null will not work. Happy to talk further about your thoughts
>>>>>>>>>>>> around sidestepping the executor.
>>>>>>>>>>>>
>>>>>>>>>>>> As for working with the scheduler source code, it's a standard
>>>>>>>>>>>> gradle project and we tend to use intellij. Docs to help ramp up on that:
>>>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/developing-aurora-scheduler.md
>>>>>>>>>>>>
>>>>>>>>>>>> As for builds - the .zip is a source distribution, so it won't
>>>>>>>>>>>> have any pre-built binaries. If you're on debian, we have official debs
>>>>>>>>>>>> here: https://bintray.com/apache/aurora
>>>>>>>>>>>> You can see how they're built (and can build your own packages)
>>>>>>>>>>>> here: https://github.com/apache/aurora-packaging
>>>>>>>>>>>> We're close to having official RPMs, but none to speak of yet.
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 20, 2015 at 9:47 AM, Krish <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Stephan,
>>>>>>>>>>>>> I am trying to get started and run aurora without the thermos
>>>>>>>>>>>>> executor (setting it to /dev/null does not help) - on a local linux
>>>>>>>>>>>>> box for now & planning to containerize/dockerize it later.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can you please point me to the right documentation (or a
>>>>>>>>>>>>> pointer to the cli parsing source code) which can help me resolve this?
>>>>>>>>>>>>> Also, are there any steps to import the source code into eclipse to
>>>>>>>>>>>>> browse & analyze code for this?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also, where do I find all the *.pex files? They are not
>>>>>>>>>>>>> present in the zip file nor anywhere in the built source code.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I know I am asking too many queries on a single thread here, &
>>>>>>>>>>>>> would appreciate the help.
>>>>>>>>>>>>> I think at the end of this, I will put the steps I followed in
>>>>>>>>>>>>> a gist/blog so others might find their way around, & not struggle
>>>>>>>>>>>>> as much.
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 4:09 PM, Erb, Stephan <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Krish,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> you don't have to set framework_authentication_file and
>>>>>>>>>>>>>> zk_digest_credentials. The scheduler help text is misleading here, as
>>>>>>>>>>>>>> everything will work fine if you leave those empty.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In addition, it looks like you are misunderstanding the usage of
>>>>>>>>>>>>>> the thermos_executor_path command line flag of the
>>>>>>>>>>>>>> scheduler. It is supposed to point to the binary containing the generic
>>>>>>>>>>>>>> Aurora executor (thermos_executor.pex). You only need the
>>>>>>>>>>>>>> hello_world.aurora once your scheduler is up and running. It serves as
>>>>>>>>>>>>>> an example input for the aurora command line client, which can be used
>>>>>>>>>>>>>> to schedule jobs and services on an Aurora master.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Have you tried to use the vagrant box? Just type `vagrant up`
>>>>>>>>>>>>>> in a checkout of the Aurora source code. It gives you a running
>>>>>>>>>>>>>> scheduler to play with. Once you have understood how it works, you can
>>>>>>>>>>>>>> start trying to install it on your own (by reverse-engineering the
>>>>>>>>>>>>>> vagrant box).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hope this helps a little,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>>> *From:* Krish <[email protected]>
>>>>>>>>>>>>>> *Sent:* Tuesday, October 20, 2015 11:39 AM
>>>>>>>>>>>>>> *To:* Bill Farner
>>>>>>>>>>>>>> *Cc:* [email protected]; Erb, Stephan
>>>>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Bill/Stephan,
>>>>>>>>>>>>>> I still get a stacktrace when running the aurora scheduler CLI.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I do not know what to specify for
>>>>>>>>>>>>>> -framework_authentication_file & -zk_digest_credentials, and they are
>>>>>>>>>>>>>> required arguments.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am not using any authentication on the Mesos master; do I still
>>>>>>>>>>>>>> need the framework_authentication_file parameter?
>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> rm -rf /db /backup_dir >>>>>>>>>>>>>> mesos-log initialize --path="/db" >>>>>>>>>>>>>> export JAVA_HOME=/usr/lib/jvm/java-8-oracle/ >>>>>>>>>>>>>> JAVA_OPTS="-Xmx1536m -Xms256m" >>>>>>>>>>>>>> /usr/local/aurora-scheduler/bin/aurora-scheduler >>>>>>>>>>>>>> -backup_dir=/backup_dir >>>>>>>>>>>>>> -cluster_name=tc >>>>>>>>>>>>>> -mesos_master_address=zk://localhost:2181/mesos/master >>>>>>>>>>>>>> -serverset_path=/scheduler/aurora -zk_endpoints=localhost:2181 >>>>>>>>>>>>>> -native_log_quorum_size=1 -vlog=SEVERE -logtostderr=false >>>>>>>>>>>>>> -native_log_file_path=/db >>>>>>>>>>>>>> -thermos_executor_path=/home/ubuntu/hello_world.aurora >>>>>>>>>>>>>> ... >>>>>>>>>>>>>> ... >>>>>>>>>>>>>> INFO: Binding org.apache.aurora.scheduler.http.Utilization to >>>>>>>>>>>>>> GuiceManagedCompon >>>>>>>>>>>>>> entProvider with the scope "PerRequest" >>>>>>>>>>>>>> Oct 20, 2015 9:27:40 AM >>>>>>>>>>>>>> org.apache.aurora.scheduler.cron.quartz.CronModule provi >>>>>>>>>>>>>> deTimeZone >>>>>>>>>>>>>> WARNING: Cron schedules are configured to fire according to >>>>>>>>>>>>>> timezone Greenwich M >>>>>>>>>>>>>> ean Time but system timezone is set to Coordinated Universal >>>>>>>>>>>>>> Time >>>>>>>>>>>>>> Oct 20, 2015 9:27:41 AM >>>>>>>>>>>>>> org.eclipse.jetty.server.AbstractConnector doStart >>>>>>>>>>>>>> INFO: Started [email protected]:43843 >>>>>>>>>>>>>> E1020 09:27:41.290 THREAD1 >>>>>>>>>>>>>> org.apache.aurora.scheduler.SchedulerLifecycle$9.exec >>>>>>>>>>>>>> ute: Caught unchecked exception: >>>>>>>>>>>>>> com.google.inject.ProvisionException: Guice pro >>>>>>>>>>>>>> vision errors: >>>>>>>>>>>>>> >>>>>>>>>>>>>> 1) Error in custom provider, >>>>>>>>>>>>>> java.lang.IllegalArgumentException: Path cannot be null at >>>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(Mesos >>>>>>>>>>>>>> LogStreamModule.java:117) >>>>>>>>>>>>>> at >>>>>>>>>>>>>> 
org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(Mesos >>>>>>>>>>>>>> LogStreamModule.java:117) >>>>>>>>>>>>>> while locating org.apache.mesos.Log >>>>>>>>>>>>>> at >>>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterf >>>>>>>>>>>>>> ace(MesosLogStreamModule.java:152) >>>>>>>>>>>>>> while locating >>>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.LogInterface >>>>>>>>>>>>>> >>>>>>>>>>>>>> 1 error >>>>>>>>>>>>>> com.google.inject.ProvisionException: Guice provision errors: >>>>>>>>>>>>>> >>>>>>>>>>>>>> 1) Error in custom provider, >>>>>>>>>>>>>> java.lang.IllegalArgumentException: Path cannot be >>>>>>>>>>>>>> null >>>>>>>>>>>>>> at >>>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(Mesos >>>>>>>>>>>>>> LogStreamModule.java:117) >>>>>>>>>>>>>> at >>>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(Mesos >>>>>>>>>>>>>> LogStreamModule.java:117) >>>>>>>>>>>>>> while locating org.apache.mesos.Log >>>>>>>>>>>>>> at >>>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterf >>>>>>>>>>>>>> ace(MesosLogStreamModule.java:152) >>>>>>>>>>>>>> while locating >>>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.LogInterface >>>>>>>>>>>>>> >>>>>>>>>>>>>> 1 error >>>>>>>>>>>>>> at >>>>>>>>>>>>>> com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:987) >>>>>>>>>>>>>> at >>>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLog.open(MesosLog.java:136 >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> κρισhναν >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 6:14 AM, Bill Farner < >>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> The typical flow is that you keep your .aurora file checked >>>>>>>>>>>>>>> into git, and commit every time you deploy/update. 
>>>>>>>>>>>>>>> When you change your file, you will instruct Aurora to update
>>>>>>>>>>>>>>> the live job (have a look at aurora update -h). Aurora will
>>>>>>>>>>>>>>> perform a rolling upgrade of your job to the new config. You'll
>>>>>>>>>>>>>>> use this same flow for updating your job's software as well as
>>>>>>>>>>>>>>> resizing the job.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> For (3), you could set up alerting for stats that the
>>>>>>>>>>>>>>> scheduler exports. Have a look here for monitoring background:
>>>>>>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/monitoring.md
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> You'll want to look at scheduler stats related to 'pending'.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Oct 19, 2015 at 12:16 PM, Krish <[email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks for the pointer. Now I notice that the
>>>>>>>>>>>>>>>> aurora-scheduler script has the --thermos_executor_path as a
>>>>>>>>>>>>>>>> mandatory requirement.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I have a couple of questions on how the
>>>>>>>>>>>>>>>> thermos_executor/.aurora config file functions:
>>>>>>>>>>>>>>>> 1. Do we have to statically define the file beforehand?
>>>>>>>>>>>>>>>> 2. What happens when we want to dynamically change the
>>>>>>>>>>>>>>>> config, say increasing the number of instances of a service
>>>>>>>>>>>>>>>> required? Does aurora require a reboot then?
>>>>>>>>>>>>>>>> 3. How do I get notified about the message mesos sends when
>>>>>>>>>>>>>>>> it cannot schedule tasks for lack of resources? Should I depend
>>>>>>>>>>>>>>>> on aurora for this or try to look for a hook into mesos?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think a little bit of context would help here.
>>>>>>>>>>>>>>>> What I plan to check is to run a very basic job/task inside
>>>>>>>>>>>>>>>> a docker container with aurora & wait for a 'resource not
>>>>>>>>>>>>>>>> available' message from mesos, and accordingly call an api to
>>>>>>>>>>>>>>>> spin up a new node in my cluster.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Oct 19, 2015 at 1:24 PM, Erb, Stephan <[email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I believe you are missing the thermos_executor options
>>>>>>>>>>>>>>>>> that have to be passed to the scheduler command line.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> See
>>>>>>>>>>>>>>>>> https://github.com/apache/aurora/blob/33d7e2170a86f54722a02a2dc9cb1e09fb52df25/examples/vagrant/upstart/aurora-scheduler.conf#L39
>>>>>>>>>>>>>>>>> for an example
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>>>>>> *From:* Krish <[email protected]>
>>>>>>>>>>>>>>>>> *Sent:* Monday, October 19, 2015 8:45 AM
>>>>>>>>>>>>>>>>> *To:* [email protected]
>>>>>>>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>> I am a n00b with apache aurora & trying to experiment with some
>>>>>>>>>>>>>>>>> things on my local machine with zookeeper and mesos-master
>>>>>>>>>>>>>>>>> running locally. They have initialized properly. When I try to
>>>>>>>>>>>>>>>>> run aurora with the required options, I get the following
>>>>>>>>>>>>>>>>> error, & googling hasn't helped me much here.
>>>>>>>>>>>>>>>>> Appreciate any help. Thanks in advance.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>> WARNING: Method [public void >>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl.applySnapshot(java.lang.Object)] >>>>>>>>>>>>>>>>> is synthetic and is being intercepted by >>>>>>>>>>>>>>>>> [com.twitter.common.inject.TimedInterceptor@604c5de8]. >>>>>>>>>>>>>>>>> This could indicate a bug. The method >>>>>>>>>>>>>>>>> may be intercepted twice, or may not be intercepted at >>>>>>>>>>>>>>>>> all. >>>>>>>>>>>>>>>>> Exception in thread "main" >>>>>>>>>>>>>>>>> com.google.inject.CreationException: Guice creation errors: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 1) An exception was caught and reported. Message: A value >>>>>>>>>>>>>>>>> may only be retrieved from a variable that has a default or >>>>>>>>>>>>>>>>> has been >>>>>>>>>>>>>>>>> set. >>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133) >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 2) Could not find a suitable constructor in >>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings. Classes >>>>>>>>>>>>>>>>> must have >>>>>>>>>>>>>>>>> either one (a >>>>>>>>>>>>>>>>> nd only one) constructor annotated with @Inject or a >>>>>>>>>>>>>>>>> zero-argument constructor that is not private. 
>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings.class(ExecutorSettings.java:43) >>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:204) >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 2 errors >>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>> com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:435) >>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:154) >>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:106) >>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>> com.google.inject.Guice.createInjector(Guice.java:95) >>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>> com.google.inject.Guice.createInjector(Guice.java:83) >>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.configureInjection(AppLauncher.java:120) >>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.run(AppLauncher.java:87) >>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:181) >>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:142) >>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain.main(SchedulerMain.java:263) >>>>>>>>>>>>>>>>> Caused by: java.lang.IllegalStateException: A value may >>>>>>>>>>>>>>>>> only be retrieved from a variable that has a default or has >>>>>>>>>>>>>>>>> been set. 
>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>> com.google.common.base.Preconditions.checkState(Preconditions.java:176) >>>>>>>>>>>>>>>>> at com.twitter.common.args.Arg.get(Arg.java:82) >>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:206) >>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>> com.google.inject.AbstractModule.configure(AbstractModule.java:59) >>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223) >>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>> com.google.inject.util.Modules$2.configure(Modules.java:114) >>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223) >>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>> com.google.inject.spi.Elements.getElements(Elements.java:101) >>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133) >>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:103) >>>>>>>>>>>>>>>>> ... 7 more >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Complete logs are present @http://pastebin.com/i72HvbYi. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>> κρισhναν >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Zameer Manji >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> -- Thumb typed mail
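
P.S. For anyone landing on this thread later, the "change the filename and the JSON content" step from the TLDR can be sketched in a few lines of Python. This is only a sketch under assumptions: the function names (to_legacy_dockercfg, convert) are made up for illustration, and the two layouts are taken from the cat output quoted in this thread, nothing more. It reads a config.json-style file, lifts the registries out from under "auths", and writes them to a file that should be named .dockercfg.

```python
import json
from pathlib import Path


def to_legacy_dockercfg(config: dict) -> dict:
    """Map a ~/.docker/config.json-style dict to the ~/.dockercfg layout.

    Assumption: the newer layout nests registries under "auths", while the
    legacy layout has the registries at the top level (as quoted in the thread).
    """
    auths = config.get("auths")
    if auths is None:
        # No "auths" key: treat the input as already being in the legacy layout.
        return dict(config)
    return dict(auths)


def convert(src: Path, dst: Path) -> None:
    """Read src (config.json layout) and write dst (legacy .dockercfg layout)."""
    config = json.loads(src.read_text())
    dst.write_text(json.dumps(to_legacy_dockercfg(config), indent=2) + "\n")
```

Calling convert(Path.home() / ".docker" / "config.json", Path(".dockercfg")) would write the legacy file into the current directory; it still has to be shipped into the task sandbox (I used thermos_executor_resources for that) under exactly the name .dockercfg.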
