This can also be avoided by setting DOCKER_CONFIG as an OS environment variable.
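
As a rough illustration (a minimal Python sketch, not what Mesos itself runs), this is the effect of the variable: the docker CLI looks for config.json in the directory named by DOCKER_CONFIG, so a child process started with it set should find the credentials even when HOME points at the sandbox. The helper names and the /root/.docker path below are assumptions made up for the example.

```python
import os
import subprocess


def docker_client_env(config_dir, base_env=None):
    """Return a copy of the environment with DOCKER_CONFIG set to config_dir.

    The docker CLI reads config.json from the directory named by
    DOCKER_CONFIG, independently of where HOME points.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["DOCKER_CONFIG"] = os.fspath(config_dir)
    return env


def has_client_config(config_dir):
    """Report whether config_dir contains a config.json file."""
    return os.path.isfile(os.path.join(os.fspath(config_dir), "config.json"))


def pull_with_config(image, config_dir):
    """Run `docker pull` with DOCKER_CONFIG set (needs a docker CLI on PATH)."""
    return subprocess.run(
        ["docker", "pull", image],
        env=docker_client_env(config_dir),
        check=True,
    )
```

Something like pull_with_config("private_repo.com:5000/krish/test:latest", "/root/.docker") is only a quick way to check the credentials by hand; on an agent the same variable would go into the environment file that the service loads.
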
The issue shows up when Docker images from a private registry are pulled on a Mesos agent. Mesos versions before 0.26 only support v1 registries, which require the .dockercfg config file, whereas Docker 1.8+ stores its config in $HOME/.docker/config.json. Mesos 0.26 fixes this in the universal containerizer's puller. To work around it on older versions, enable an environment file in the mesos-agent systemd service that sets DOCKER_CONFIG to, say, $HOME/.docker/ so that config.json is picked up correctly.

See MESOS-2969 and MESOS-3031, caused by docker/docker#12009.

-Jake

On Thu, Mar 3, 2016 at 11:30 AM, Krish wrote:

> Used rbt for the first time and some weird thing happened to the console,
> and it got submitted!
> https://reviews.apache.org/r/44341/
>
> Will sure keep the list posted with any new info. Thanks.
>
> --
> κρισhναν
>
> On Thu, Mar 3, 2016 at 9:20 PM, Bill Farner wrote:
>
>> Likely in an existing page, preferably wherever you think would have
>> saved you the trial and error!
>>
>> I look forward to the blog post, be sure to shoot a link here once it's
>> up!
>>
>> Thanks!
>>
>> On Thursday, March 3, 2016, Krish wrote:
>>
>>> Can you guide me how to do that? Should I start with a new page and then
>>> submit it or would you like that as an entry in some existing doc?
>>> That will be the short term (couple of hours) item on my checklist.
>>>
>>> Actually, as I said before, I have in mind to blog about my entire
>>> design and implementation process - the how and the why of docker
>>> configuration, private docker repo setup, coreos cluster setup, and zk,
>>> mesos master, aurora containerisation and setup, along with their
>>> monitoring (have decided on bosun.org with cAdvisor). And a short guide
>>> as to how to run both containerized and non containerized jobs in
>>> production.
>>> I had to refer to a dozen and more sites and blogs and manuals and
>>> source to get so far; and got help from engineers in various mailing lists.
>>> A unified guide should be helpful, imho.
>>>
>>> On Thursday 3 March 2016, Bill Farner wrote:
>>>
>>>> Wow! I'm glad you got it working! To help the next poor soul trying
>>>> to do this, would you be willing to put up a doc patch on our end?
>>>>
>>>> On Thursday, March 3, 2016, Krish wrote:
>>>>
>>>>> TLDR;
>>>>> Use only file with the name .dockercfg for docker credentials in mesos
>>>>> tasks!
>>>>>
>>>>> Long story:
>>>>> ---------------
>>>>> Holy smokescreens!
>>>>> This is for reporting & documenting purposes only, so that others
>>>>> don't have to pull their hair like I did for the past few evenings!
>>>>>
>>>>> A little background:
>>>>> I am running Ubuntu 14.04 on my system and docker stores its
>>>>> credentials in the ~/.docker/config.json as
>>>>> cat ~/.docker/config.json
>>>>> {
>>>>>   "auths": {
>>>>>     "repo.example.com:5000": {
>>>>>       "auth": "<snip>",
>>>>>       "email": "<snip>"
>>>>>     }
>>>>>   }
>>>>> }
>>>>>
>>>>> And I am doing all these experiments on a coreOS system which stores
>>>>> the credentials in ~/.dockercfg as
>>>>> core@aurora-1 ~ $ cat ~/.dockercfg
>>>>> {
>>>>>   "repo.example.com:5000": {
>>>>>     "auth": "<snip>",
>>>>>     "email": "<snip>"
>>>>>   }
>>>>> }
>>>>>
>>>>> Since my container was an Ubuntu 14.04 container (as was my local
>>>>> system), I used the ubuntu credential file format, i.e. I couldn't get the
>>>>> slave task to read the docker credentials as I had stored it as
>>>>> ~/.docker/config.json.
>>>>> After parsing through (a lot of find's, grep's and regex matching)
>>>>> aurora, mesos, and thermos source code, I saw in
>>>>> mesos/src/docker/docker.cpp:
>>>>>
>>>>> 1126 // Set HOME variable to pick up *.dockercfg*.
>>>>> 1127 map<string, string> environment = os::environment();
>>>>> 1128
>>>>> 1129 environment["HOME"] = directory;
>>>>> 1130
>>>>>
>>>>> Changed the filename and the json content, changed the
>>>>> thermos_executor_resources, and bam, docker pull works!
>>>>>
>>>>> Well, the mesos documentation does say "To run an image from a private
>>>>> repository, one can include the URI pointing to a .dockercfg that contains
>>>>> login information." and I would have read it a dozen times!
>>>>> But I never thought that they literally meant '.dockercfg' as the name
>>>>> of the file!
>>>>>
>>>>> --
>>>>> κρισhναν
>>>>>
>>>>> On Thu, Mar 3, 2016 at 1:45 PM, Krish wrote:
>>>>>
>>>>>> I have got the docker config file copied into the sandbox using the
>>>>>> thermos_executor_resources flag; however docker is still not able to find
>>>>>> the credentials file for doing an appropriate pull of image from a
>>>>>> private repo.
>>>>>>
>>>>>> When I try to use the library/hello-world:latest image from public
>>>>>> docker repo to check if everything works fine without the credentials, I
>>>>>> encounter a different problem:
>>>>>> exec: "/bin/sh": stat /bin/sh: no such file or directory
>>>>>> Error response from daemon: Cannot start container
>>>>>> de93dc344d44b41bccccff49e508001a97ff23a8964e637d32a506a31fd4d946: [8]
>>>>>> System error: exec: "/bin/sh": stat /bin/sh: no such file or directory
>>>>>>
>>>>>> I was referring to this email for guidance on setting up a mesos
>>>>>> slave:
>>>>>> http://mail-archives.apache.org/mod_mbox/aurora-dev/201503.mbox/%3CCAKB1MkHR=+nonesllx9buwttdthsku46pw_wr4b+_z9p59+...@mail.gmail.com%3E
>>>>>>
>>>>>> So, I cannot get the credentials file to be used by docker, and if I
>>>>>> bypass authentication, I can do a docker pull, but encounter a weird
>>>>>> error in launching the hello-world image.
>>>>>>
>>>>>> Am I missing out on checking any log files generated?
>>>>>> I currently refer to mesos-slave stdout and the sandbox stderr file.
>>>>>> Any configuration parameter I am missing for this to happen?
>>>>>>
>>>>>> Any pointers will be really helpful. Thanks in advance.
>>>>>>
>>>>>> --
>>>>>> κρισhναν
>>>>>>
>>>>>> On Sun, Feb 28, 2016 at 3:37 PM, Krish wrote:
>>>>>>
>>>>>>> Continuing my earlier chain of thought, I found this in the mesos
>>>>>>> bug list:
>>>>>>> MESOS-4242 - Allow Docker private registry credentials to be passed
>>>>>>> from framework.
>>>>>>> How does one pass credentials using the framework? As it seems the
>>>>>>> .docker/config.json is not read from the slave.
>>>>>>>
>>>>>>> --
>>>>>>> κρισhναν
>>>>>>>
>>>>>>> On Sat, Feb 27, 2016 at 11:46 PM, Krish wrote:
>>>>>>>
>>>>>>>> I couldn't complete my PoC before project before (got busy with
>>>>>>>> other work). Well, it is never too late and here's my update and issue.
>>>>>>>>
>>>>>>>> I have a 3 node zk (3.5.1 alpha), mesos-master (v0.24.1) & aurora
>>>>>>>> (v0.11.0) running.
>>>>>>>> I was stuck in a problem where I was using mesos 0.25.0 & aurora
>>>>>>>> 0.9.0 & got a protobuf field not set error - ExecutorInfo field.
>>>>>>>>
>>>>>>>> I have a mesos agent running in docker container on coreos and it
>>>>>>>> can access the host docker just fine.
>>>>>>>> I have also put the docker login credentials file at the right
>>>>>>>> location for it to access the private docker registry.
>>>>>>>> I can manually trigger a docker pull and docker run without issues
>>>>>>>> from the slave (which is also reflected properly outside the slave
>>>>>>>> container with docker images and docker ps).
>>>>>>>>
>>>>>>>> However, when I try to run an aurora job with hello-docker
>>>>>>>> container, the slave prints out the log that docker pull has failed;
>>>>>>>> more specifically:
>>>>>>>> " failed to start: Failed to 'docker pull
>>>>>>>> private_repo.com:5000/krish/test:latest': exit status = exited
>>>>>>>> with status 1 stderr = Error: image krish/test:latest not found"
>>>>>>>>
>>>>>>>> My hunch is that when using docker run from aurora DSL, it does not
>>>>>>>> read the docker credentials file properly and hence fails. I can
>>>>>>>> reproduce the exact same error when I delete the credentials file
>>>>>>>> from the slave and trigger a pull.
>>>>>>>>
>>>>>>>> Is the hunch right? If yes, is there a way to resolve this? Maybe
>>>>>>>> source it some way before the run command?
>>>>>>>>
>>>>>>>> --
>>>>>>>> κρισhναν
>>>>>>>>
>>>>>>>> On Tue, Oct 27, 2015 at 10:35 PM, Bill Farner wrote:
>>>>>>>>
>>>>>>>>> (1) clusters.json is written by you, configuring the CLI client
>>>>>>>>> with instructions for what clusters are available and how to discover
>>>>>>>>> them.
>>>>>>>>>
>>>>>>>>> (2) That's expected - mesos only allows one active replica of a
>>>>>>>>> framework at a time, this signals which one is active.
>>>>>>>>>
>>>>>>>>> (3) The observer is essentially a web server that allows you to
>>>>>>>>> browse a task's sandbox directory and other information about it.
>>>>>>>>> You will need to configure it to run on your worker/agent nodes for
>>>>>>>>> that functionality to work (it's linked from the scheduler web UI).
>>>>>>>>>
>>>>>>>>> (4) You could indeed implement that behavior externally. There is
>>>>>>>>> a reason:
>>>>>>>>> https://github.com/apache/aurora/blob/master/api/src/main/thrift/org/apache/aurora/gen/api.thrift#L556-L559
>>>>>>>>>
>>>>>>>>> (5) That is correct.
The scheduler exposes a thrift API that you >>>>>>>>> would use (a REST API is coming, but ground has not yet been broken). >>>>>>>>> If >>>>>>>>> you go this route, i suggest you skip the DSL and use the JSON task >>>>>>>>> description format that is shipped over the API. There's not good >>>>>>>>> documentation on this, but we can help you through it and would be >>>>>>>>> grateful >>>>>>>>> for a writeup of your approach! >>>>>>>>> >>>>>>>>> >>>>>>>>> On Mon, Oct 26, 2015 at 11:44 PM, Krish <[email protected] >>>>>>>>> > wrote: >>>>>>>>> >>>>>>>>>> Hi Folks, >>>>>>>>>> Firstly, thanks for all the help. Am happy to report that I have >>>>>>>>>> set up zk, mesos & aurora, & can work further towards my idea of >>>>>>>>>> having an >>>>>>>>>> auto-scaling cluster. >>>>>>>>>> I have some further questions about the work done so far & things >>>>>>>>>> I plan to do: >>>>>>>>>> >>>>>>>>>> 1. Is the /etc/aurora/clusters.json file created by the >>>>>>>>>> scheduled or does it need to be handcrafted? I had to manually >>>>>>>>>> edit the >>>>>>>>>> file to get my `aurora job ...` cli to work. >>>>>>>>>> >>>>>>>>>> 2. I am running a cluster of 3 coreOS VMs on vagrant with zk, >>>>>>>>>> mesos & aurora in a docker container. Only 1 of them outputs '1' >>>>>>>>>> when I >>>>>>>>>> look at the framework_registered' field. Is this expected? How do >>>>>>>>>> I verify >>>>>>>>>> that they are working as a cluster? >>>>>>>>>> >>>>>>>>>> 3. From the documentation, I see that there is an observer >>>>>>>>>> that needs to be listening on port 1338. What is the observer >>>>>>>>>> socket & its >>>>>>>>>> purpose? I have aurora listening only on ports 8081 (http port) & >>>>>>>>>> 8083 >>>>>>>>>> (libprocess). >>>>>>>>>> >>>>>>>>>> 4. 
I read about the 'PENDING' field in aurora documentation, >>>>>>>>>> as Bill suggested, & realize that it just shows that a task is >>>>>>>>>> waiting for >>>>>>>>>> some reasons (for want of resources, in my case, as 0 slaves have >>>>>>>>>> registered). I was thinking of adding a hook to the pending >>>>>>>>>> state; say if a >>>>>>>>>> task is PENDING for 5 minutes for lack of resources in the >>>>>>>>>> cluster, then >>>>>>>>>> spin up a new machine. Is this the right approach to take? Does >>>>>>>>>> aurora >>>>>>>>>> provide reasons for why is a task in PENDING state? >>>>>>>>>> >>>>>>>>>> => aurora job status testcluster/$USER/test/hello_world >>>>>>>>>> INFO] Checking status of testcluster/ubuntu/test/hello_world >>>>>>>>>> Active tasks (1): >>>>>>>>>> Task role: ubuntu, env: test, name: hello_world, >>>>>>>>>> instance: 0, status: >>>>>>>>>> PENDING on None >>>>>>>>>> cpus: 0.1, ram: 16 MB, disk: 16 MB >>>>>>>>>> events: >>>>>>>>>> 2015-10-23 04:55:33 PENDING: None >>>>>>>>>> Inactive tasks (0): >>>>>>>>>> >>>>>>>>>> 5. Aurora defines job/s is a .aurora config file & if I >>>>>>>>>> decide to increase/decrease the number of instances in my >>>>>>>>>> cluster, then I >>>>>>>>>> need to create/overwrite the concerned the .aurora and trigger >>>>>>>>>> the `aurora >>>>>>>>>> update ...` command. Is this right? >>>>>>>>>> If yes, is there an HTTP API I can invoke remotely which >>>>>>>>>> triggers this update? >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> κρισhναν >>>>>>>>>> >>>>>>>>>> On Fri, Oct 23, 2015 at 8:09 AM, Joshua Cohen < >>>>>>>>>> [email protected]> wrote: >>>>>>>>>> >>>>>>>>>>> I suspect your error from `aurora job create ...` is due to the >>>>>>>>>>> aurora config you're using referencing `/vagrant/hello_world.py` >>>>>>>>>>> which does >>>>>>>>>>> not exist (as you say: you're not even using Vagrant). Can you link >>>>>>>>>>> the >>>>>>>>>>> .aurora config you're using? 
>>>>>>>>>>> >>>>>>>>>>> Cheers, >>>>>>>>>>> >>>>>>>>>>> Joshua >>>>>>>>>>> >>>>>>>>>>> On Thu, Oct 22, 2015 at 3:22 PM, Krish < >>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>> >>>>>>>>>>>> Thanks, Zameer. >>>>>>>>>>>> >>>>>>>>>>>> I had to modify /etc/aurora/clusters.json: >>>>>>>>>>>> [ >>>>>>>>>>>> { >>>>>>>>>>>> "auth_mechanism": "UNAUTHENTICATED", >>>>>>>>>>>> "name": "testcluster", >>>>>>>>>>>> "scheduler_zk_path": "/scheduler/aurora", >>>>>>>>>>>> "slave_root": "/var/lib/mesos", >>>>>>>>>>>> "slave_run_directory": "latest", >>>>>>>>>>>> "zk": "127.0.1.1" >>>>>>>>>>>> } >>>>>>>>>>>> ] >>>>>>>>>>>> >>>>>>>>>>>> I have a hello_world.aurora in my home folder. However the >>>>>>>>>>>> following command errors out: >>>>>>>>>>>> ~$ aurora job create testcluster/testrole/test/hellojob >>>>>>>>>>>> ./hello_world.aurora >>>>>>>>>>>> Error loading configuration: [Errno 2] No such file or >>>>>>>>>>>> directory: '/vagrant/hello_world.py' >>>>>>>>>>>> >>>>>>>>>>>> A job list does work: >>>>>>>>>>>> ~$ aurora job list testcluster >>>>>>>>>>>> INFO] Retrieving jobs for role None >>>>>>>>>>>> >>>>>>>>>>>> I am not even using the vagrant. I am using zk & mesos on the >>>>>>>>>>>> same machine as aurora. How do I submit these job templates to >>>>>>>>>>>> aurora? >>>>>>>>>>>> >>>>>>>>>>>> Any pointers to documentation will be helpful. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> κρισhναν >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Oct 21, 2015 at 11:09 PM, Zameer Manji < >>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Version 0.9.0 does not have the gc executor. Version 0.9.0 >>>>>>>>>>>>> uses Mesos' task reconciliation >>>>>>>>>>>>> <http://mesos.apache.org/documentation/latest/reconciliation/> API >>>>>>>>>>>>> instead. >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Oct 21, 2015 at 9:28 AM, Krish < >>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks Bill for the location to the debs. I was finally able >>>>>>>>>>>>>> to run aurora. 
:) >>>>>>>>>>>>>> >>>>>>>>>>>>>> I did find thermos_executor.pex & thermos_observer after >>>>>>>>>>>>>> installing aurora-executor. I still could not find >>>>>>>>>>>>>> gc_executor.pex on my >>>>>>>>>>>>>> system. >>>>>>>>>>>>>> Is there a location from where I can download the binaries >>>>>>>>>>>>>> for *.pex or build them from scratch? >>>>>>>>>>>>>> >>>>>>>>>>>>>> root@dev:/# find . -name "*.pex" >>>>>>>>>>>>>> ./usr/share/aurora/bin/thermos_executor.pex >>>>>>>>>>>>>> ./usr/share/aurora/bin/kaurora_admin.pex >>>>>>>>>>>>>> ./usr/share/aurora/bin/kaurora.pex >>>>>>>>>>>>>> ./usr/share/aurora/bin/thermos.pex >>>>>>>>>>>>>> ./usr/share/aurora/bin/thermos_observer.pex >>>>>>>>>>>>>> ./home/ubuntu/.pex >>>>>>>>>>>>>> ./root/.pex >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> κρισhναν >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 11:46 PM, Bill Farner < >>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Aurora currently requires an executor, so setting it to >>>>>>>>>>>>>>> /dev/null will not work. Happy to talk further about your >>>>>>>>>>>>>>> thoughts around >>>>>>>>>>>>>>> sidestepping the executor. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> As for working with the scheduler source code, it's a >>>>>>>>>>>>>>> standard gradle project and we tend to use intellij. Docs to >>>>>>>>>>>>>>> help ramp on >>>>>>>>>>>>>>> that: >>>>>>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/developing-aurora-scheduler.md >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> As for builds - the .zip is a source distribution, so it >>>>>>>>>>>>>>> won't have any pre-built binaries. 
If you're on debian, we >>>>>>>>>>>>>>> have official >>>>>>>>>>>>>>> debs here: https://bintray.com/apache/aurora >>>>>>>>>>>>>>> You can see how they're built here (and can build your own) >>>>>>>>>>>>>>> packages: https://github.com/apache/aurora-packaging >>>>>>>>>>>>>>> We're close to having official RPMs, but none to speak of >>>>>>>>>>>>>>> yet. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 9:47 AM, Krish < >>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Stephen, >>>>>>>>>>>>>>>> I am trying to get started and run aurora without thermos >>>>>>>>>>>>>>>> executor (setting it to /dev/null does not help) - on a local >>>>>>>>>>>>>>>> linux box for >>>>>>>>>>>>>>>> now & planning to containerize/dockerize it later. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Can you please point me to the right documentation (or a >>>>>>>>>>>>>>>> pointer to the cli parsing source code) which can help me >>>>>>>>>>>>>>>> resolve this? >>>>>>>>>>>>>>>> Also, are there any steps steps to import source code into >>>>>>>>>>>>>>>> eclipse to >>>>>>>>>>>>>>>> browse & analyze code for this. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Also, where do i find all the *.pex files? They are not >>>>>>>>>>>>>>>> present in the zip file nor anywhere in the built source code. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I know I am asking too many queries on a single thread >>>>>>>>>>>>>>>> here, & would appreciate the help. >>>>>>>>>>>>>>>> I think at the end of this, I will put the steps I followed >>>>>>>>>>>>>>>> in a gist/blog so others might find their way around, & not >>>>>>>>>>>>>>>> struggle as >>>>>>>>>>>>>>>> much. 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>> κρισhναν >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 4:09 PM, Erb, Stephan < >>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hi Krish, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> you don't have to set framework_authentication_file and >>>>>>>>>>>>>>>>> zk_digest_credentials. The scheduler help text is misleading >>>>>>>>>>>>>>>>> here as >>>>>>>>>>>>>>>>> everything will work fine if you leave those empty. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> In addition, looks like you are misunderstanding the usage >>>>>>>>>>>>>>>>> of the thermos_executor_path command line flag of the >>>>>>>>>>>>>>>>> scheduler. It is supposed to point to the binary containing >>>>>>>>>>>>>>>>> the generic >>>>>>>>>>>>>>>>> Aurora executor (thermos_executor.pex). You only need >>>>>>>>>>>>>>>>> the hello_world.aurora once your scheduler is up an >>>>>>>>>>>>>>>>> running. It serves as an example input for the aurora command >>>>>>>>>>>>>>>>> line client >>>>>>>>>>>>>>>>> which can be used to scheduler jobs and services on an Aurora >>>>>>>>>>>>>>>>> master. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Have you tried to use the vagrant box? Just type 'vagrant >>>>>>>>>>>>>>>>> up`in a checkout of the Aurora source code. It gives you a >>>>>>>>>>>>>>>>> running >>>>>>>>>>>>>>>>> scheduler to play with. Once you have understood how it >>>>>>>>>>>>>>>>> works, you can >>>>>>>>>>>>>>>>> start trying to install it on your own (by >>>>>>>>>>>>>>>>> reverse-engineering the vagrant >>>>>>>>>>>>>>>>> box). 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hope this helps a little, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Stephan >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ------------------------------ >>>>>>>>>>>>>>>>> *From:* Krish <[email protected]> >>>>>>>>>>>>>>>>> *Sent:* Tuesday, October 20, 2015 11:39 AM >>>>>>>>>>>>>>>>> *To:* Bill Farner >>>>>>>>>>>>>>>>> *Cc:* [email protected]; Erb, Stephan >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Bill/Stephen, >>>>>>>>>>>>>>>>> I still get a stacktrace when running the aurora scheduler >>>>>>>>>>>>>>>>> CLI. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I do not know what to specify for >>>>>>>>>>>>>>>>> -framework_authentication_file & -zk_digest_credentials, and >>>>>>>>>>>>>>>>> they are >>>>>>>>>>>>>>>>> required arguments. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I am not using any authentication on Mesos master, do I >>>>>>>>>>>>>>>>> still need the framework_authentication_file parameter? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> rm -rf /db /backup_dir >>>>>>>>>>>>>>>>> mesos-log initialize --path="/db" >>>>>>>>>>>>>>>>> export JAVA_HOME=/usr/lib/jvm/java-8-oracle/ >>>>>>>>>>>>>>>>> JAVA_OPTS="-Xmx1536m -Xms256m" >>>>>>>>>>>>>>>>> /usr/local/aurora-scheduler/bin/aurora-scheduler >>>>>>>>>>>>>>>>> -backup_dir=/backup_dir >>>>>>>>>>>>>>>>> -cluster_name=tc >>>>>>>>>>>>>>>>> -mesos_master_address=zk://localhost:2181/mesos/master >>>>>>>>>>>>>>>>> -serverset_path=/scheduler/aurora -zk_endpoints=localhost:2181 >>>>>>>>>>>>>>>>> -native_log_quorum_size=1 -vlog=SEVERE -logtostderr=false >>>>>>>>>>>>>>>>> -native_log_file_path=/db >>>>>>>>>>>>>>>>> -thermos_executor_path=/home/ubuntu/hello_world.aurora >>>>>>>>>>>>>>>>> ... >>>>>>>>>>>>>>>>> ... 
>>>>>>>>>>>>>>>>> INFO: Binding org.apache.aurora.scheduler.http.Utilization >>>>>>>>>>>>>>>>> to GuiceManagedCompon >>>>>>>>>>>>>>>>> entProvider with the scope "PerRequest" >>>>>>>>>>>>>>>>> Oct 20, 2015 9:27:40 AM >>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.cron.quartz.CronModule provi >>>>>>>>>>>>>>>>> deTimeZone >>>>>>>>>>>>>>>>> WARNING: Cron schedules are configured to fire according >>>>>>>>>>>>>>>>> to timezone Greenwich M >>>>>>>>>>>>>>>>> ean Time but system timezone is set to Coordinated >>>>>>>>>>>>>>>>> Universal Time >>>>>>>>>>>>>>>>> Oct 20, 2015 9:27:41 AM >>>>>>>>>>>>>>>>> org.eclipse.jetty.server.AbstractConnector doStart >>>>>>>>>>>>>>>>> INFO: Started [email protected]:43843 >>>>>>>>>>>>>>>>> E1020 09:27:41.290 THREAD1 >>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.SchedulerLifecycle$9.exec >>>>>>>>>>>>>>>>> ute: Caught unchecked exception: >>>>>>>>>>>>>>>>> com.google.inject.ProvisionException: Guice pro >>>>>>>>>>>>>>>>> vision errors: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 1) Error in custom provider, >>>>>>>>>>>>>>>>> java.lang.IllegalArgumentException: Path cannot be null at >>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(Mesos >>>>>>>>>>>>>>>>> LogStreamModule.java:117) >>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(Mesos >>>>>>>>>>>>>>>>> LogStreamModule.java:117) >>>>>>>>>>>>>>>>> while locating org.apache.mesos.Log >>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterf >>>>>>>>>>>>>>>>> ace(MesosLogStreamModule.java:152) >>>>>>>>>>>>>>>>> while locating >>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.LogInterface >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 1 error >>>>>>>>>>>>>>>>> com.google.inject.ProvisionException: Guice provision >>>>>>>>>>>>>>>>> errors: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 1) Error in custom provider, >>>>>>>>>>>>>>>>> java.lang.IllegalArgumentException: 
Path cannot be >>>>>>>>>>>>>>>>> null >>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(Mesos >>>>>>>>>>>>>>>>> LogStreamModule.java:117) >>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(Mesos >>>>>>>>>>>>>>>>> LogStreamModule.java:117) >>>>>>>>>>>>>>>>> while locating org.apache.mesos.Log >>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterf >>>>>>>>>>>>>>>>> ace(MesosLogStreamModule.java:152) >>>>>>>>>>>>>>>>> while locating >>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.LogInterface >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 1 error >>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>> com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:987) >>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLog.open(MesosLog.java:136 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>> κρισhναν >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 6:14 AM, Bill Farner < >>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> The typical flow is that you keep your .aurora file >>>>>>>>>>>>>>>>>> checked into git, and commit every time you deploy/update. >>>>>>>>>>>>>>>>>> When you change >>>>>>>>>>>>>>>>>> your file, you will instruct Aurora to update the live job >>>>>>>>>>>>>>>>>> (have a look at aurora >>>>>>>>>>>>>>>>>> update -h). Aurora will perform a rolling upgrade of >>>>>>>>>>>>>>>>>> your job to the new config. You'll use this same flow for >>>>>>>>>>>>>>>>>> updating your >>>>>>>>>>>>>>>>>> job's software as well as resizing the job. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> For (3), you could set up alerting for stats that the >>>>>>>>>>>>>>>>>> scheduler exports. 
Have a look here for monitoring >>>>>>>>>>>>>>>>>> background: >>>>>>>>>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/monitoring.md >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> You'll find want to look at scheduler stats related to >>>>>>>>>>>>>>>>>> 'pending'. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Mon, Oct 19, 2015 at 12:16 PM, Krish < >>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thanks for the pointer. Now I notice that the >>>>>>>>>>>>>>>>>>> aurora-scheduler script has the --thermos_executor_path as >>>>>>>>>>>>>>>>>>> a mandatory >>>>>>>>>>>>>>>>>>> requirement. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I have a couple of questions on how the >>>>>>>>>>>>>>>>>>> thermos_executor/.aurora config file functions: >>>>>>>>>>>>>>>>>>> 1. Do we have to statically define the file beforehand? >>>>>>>>>>>>>>>>>>> 2. What happens when we want to dynamically change the >>>>>>>>>>>>>>>>>>> config, say increasing the number of instances of a service >>>>>>>>>>>>>>>>>>> required? Does >>>>>>>>>>>>>>>>>>> aurora require a reboot then? >>>>>>>>>>>>>>>>>>> 3. How do I get notified about the message mesos sends >>>>>>>>>>>>>>>>>>> when it cannot schedule tasks for lack of resources? Should >>>>>>>>>>>>>>>>>>> I depend on >>>>>>>>>>>>>>>>>>> aurora for this or try to look for a hook into mesos? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I think a little bit of context would help here. >>>>>>>>>>>>>>>>>>> What I plan to check is to run a very basic job/task >>>>>>>>>>>>>>>>>>> inside a docker container with aurora & wait for a >>>>>>>>>>>>>>>>>>> 'resource not available' >>>>>>>>>>>>>>>>>>> message from mesos, and accordingly call an api to spin up >>>>>>>>>>>>>>>>>>> a new node in my >>>>>>>>>>>>>>>>>>> cluster. 
>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>> κρισhναν >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Mon, Oct 19, 2015 at 1:24 PM, Erb, Stephan < >>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I believe you are missing the thermos_executor options >>>>>>>>>>>>>>>>>>>> that have to be passed to the scheduler command line. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> See >>>>>>>>>>>>>>>>>>>> https://github.com/apache/aurora/blob/33d7e2170a86f54722a02a2dc9cb1e09fb52df25/examples/vagrant/upstart/aurora-scheduler.conf#L39 >>>>>>>>>>>>>>>>>>>> for an example >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Best Regards, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Stephan >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> ------------------------------ >>>>>>>>>>>>>>>>>>>> *From:* Krish <[email protected]> >>>>>>>>>>>>>>>>>>>> *Sent:* Monday, October 19, 2015 8:45 AM >>>>>>>>>>>>>>>>>>>> *To:* [email protected] >>>>>>>>>>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>> I am a n00b with apache aurora & trying to experiment >>>>>>>>>>>>>>>>>>>> some things on my local machine with zookeeper and >>>>>>>>>>>>>>>>>>>> mesos-master running >>>>>>>>>>>>>>>>>>>> locally. They have initialized properly. When I try to run >>>>>>>>>>>>>>>>>>>> aurora with the >>>>>>>>>>>>>>>>>>>> required options, I get the following error, & googing >>>>>>>>>>>>>>>>>>>> hasn't helped me >>>>>>>>>>>>>>>>>>>> much here. >>>>>>>>>>>>>>>>>>>> Appreciate any help. Thanks in advance. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> ... >>>>>>>>>>>>>>>>>>>> ... 
>>>>>>>>>>>>>>>>>>>> WARNING: Method [public void >>>>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl.applySnapshot(java.lang.Object)] >>>>>>>>>>>>>>>>>>>> is synthetic and is being intercepted by >>>>>>>>>>>>>>>>>>>> [com.twitter.common.inject.TimedInterceptor@604c5de8]. >>>>>>>>>>>>>>>>>>>> This could indicate a bug. The method >>>>>>>>>>>>>>>>>>>> may be intercepted twice, or may not be intercepted at >>>>>>>>>>>>>>>>>>>> all. >>>>>>>>>>>>>>>>>>>> Exception in thread "main" >>>>>>>>>>>>>>>>>>>> com.google.inject.CreationException: Guice creation errors: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> 1) An exception was caught and reported. Message: A >>>>>>>>>>>>>>>>>>>> value may only be retrieved from a variable that has a >>>>>>>>>>>>>>>>>>>> default or has been >>>>>>>>>>>>>>>>>>>> set. >>>>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133) >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> 2) Could not find a suitable constructor in >>>>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings. >>>>>>>>>>>>>>>>>>>> Classes must have >>>>>>>>>>>>>>>>>>>> either one (a >>>>>>>>>>>>>>>>>>>> nd only one) constructor annotated with @Inject or a >>>>>>>>>>>>>>>>>>>> zero-argument constructor that is not private. 
>>>>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings.class(ExecutorSettings.java:43) >>>>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:204) >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> 2 errors >>>>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>>>> com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:435) >>>>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:154) >>>>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:106) >>>>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>>>> com.google.inject.Guice.createInjector(Guice.java:95) >>>>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>>>> com.google.inject.Guice.createInjector(Guice.java:83) >>>>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.configureInjection(AppLauncher.java:120) >>>>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.run(AppLauncher.java:87) >>>>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:181) >>>>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:142) >>>>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain.main(SchedulerMain.java:263) >>>>>>>>>>>>>>>>>>>> Caused by: java.lang.IllegalStateException: A value may >>>>>>>>>>>>>>>>>>>> only be retrieved from a variable that has a default or >>>>>>>>>>>>>>>>>>>> has been set. 
>>>>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>>>> com.google.common.base.Preconditions.checkState(Preconditions.java:176) >>>>>>>>>>>>>>>>>>>> at com.twitter.common.args.Arg.get(Arg.java:82) >>>>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:206) >>>>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>>>> com.google.inject.AbstractModule.configure(AbstractModule.java:59) >>>>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223) >>>>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>>>> com.google.inject.util.Modules$2.configure(Modules.java:114) >>>>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223) >>>>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>>>> com.google.inject.spi.Elements.getElements(Elements.java:101) >>>>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133) >>>>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:103) >>>>>>>>>>>>>>>>>>>> ... 7 more >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Complete logs are present @http://pastebin.com/i72HvbYi >>>>>>>>>>>>>>>>>>>> . >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>>> κρισhναν >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Zameer Manji >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>> >>> -- >>> >>> Thumb typed mail >>> >>> >
