Likely in an existing page, preferably wherever you think it would have saved you the trial and error!
I look forward to the blog post; be sure to shoot a link here once it's up! Thanks!

On Thursday, March 3, 2016, Krish <[email protected]> wrote:
> Can you guide me on how to do that? Should I start with a new page and then submit it, or would you like it as an entry in some existing doc?
> That will be the short-term (couple of hours) item on my checklist.
>
> Actually, as I said before, I plan to blog about my entire design and implementation process: the how and the why of the Docker configuration, private Docker repo setup, CoreOS cluster setup, and the containerisation and setup of ZK, the Mesos master and Aurora, along with their monitoring (I have decided on bosun.org with cAdvisor). And a short guide on how to run both containerized and non-containerized jobs in production.
> I had to refer to more than a dozen sites, blogs, manuals and source files to get this far, and got help from engineers on various mailing lists.
> A unified guide should be helpful, IMHO.
>
>
> On Thursday 3 March 2016, Bill Farner <[email protected]> wrote:
>
>> Wow! I'm glad you got it working! To help the next poor soul trying to do this, would you be willing to put up a doc patch on our end?
>>
>> On Thursday, March 3, 2016, Krish <[email protected]> wrote:
>>
>>> TLDR;
>>> Use only a file named .dockercfg for Docker credentials in Mesos tasks!
>>>
>>> Long story:
>>> ---------------
>>> Holy smokescreens!
>>> This is for reporting & documenting purposes only, so that others don't have to pull their hair out like I did for the past few evenings!
>>>
>>> A little background:
>>> I am running Ubuntu 14.04 on my system, and Docker stores its credentials in ~/.docker/config.json as:
>>> cat ~/.docker/config.json
>>> {
>>>   "auths": {
>>>     "repo.example.com:5000": {
>>>       "auth": "<snip>",
>>>       "email": "<snip>"
>>>     }
>>>   }
>>> }
>>>
>>> And I am doing all these experiments on a CoreOS system, which stores the credentials in ~/.dockercfg as:
>>> core@aurora-1 ~ $ cat ~/.dockercfg
>>> {
>>>   "repo.example.com:5000": {
>>>     "auth": "<snip>",
>>>     "email": "<snip>"
>>>   }
>>> }
>>>
>>> Since my container was an Ubuntu 14.04 container (as was my local system), I used the Ubuntu credential file format, i.e. ~/.docker/config.json, and I couldn't get the slave task to read the Docker credentials stored that way.
>>> After combing through the Aurora, Mesos and Thermos source code (a lot of finds, greps and regex matching), I found this in mesos/src/docker/docker.cpp:
>>>
>>>   // Set HOME variable to pick up .dockercfg.
>>>   map<string, string> environment = os::environment();
>>>
>>>   environment["HOME"] = directory;
>>>
>>> I changed the file name and the JSON content, updated the thermos_executor_resources flag, and bam, docker pull works!
>>>
>>> Well, the Mesos documentation does say "To run an image from a private repository, one can include the URI pointing to a .dockercfg that contains login information.", and I must have read it a dozen times!
>>> But I never thought they literally meant '.dockercfg' as the name of the file!
>>>
>>> --
>>> κρισhναν
>>>
>>> On Thu, Mar 3, 2016 at 1:45 PM, Krish <[email protected]> wrote:
>>>
>>>> I have got the Docker config file copied into the sandbox using the thermos_executor_resources flag; however, Docker is still not able to find the credentials file to pull the image from a private repo.
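A minimal Python sketch of the conversion Krish ended up doing by hand: it reads the newer ~/.docker/config.json layout (registries nested under "auths") and writes the legacy flat layout to a file named exactly .dockercfg. Per the docker.cpp excerpt, Mesos points HOME at the task's sandbox directory before pulling, so the file has to land at <sandbox>/.dockercfg (Krish got it there with thermos_executor_resources). The helper names and the output directory are illustrative assumptions, not part of Mesos or Aurora.

```python
import json
from pathlib import Path


def config_json_to_dockercfg(config_json_text: str) -> str:
    """Convert ~/.docker/config.json content to the legacy .dockercfg layout.

    config.json nests per-registry entries under "auths"; .dockercfg keeps
    the same entries at the top level (compare the two files in Krish's message).
    """
    config = json.loads(config_json_text)
    auths = config.get("auths")
    if not auths:
        raise ValueError('no "auths" section found in config.json')
    return json.dumps(auths, indent=2, sort_keys=True)


def write_dockercfg(config_json_path: Path, out_dir: Path) -> Path:
    """Write <out_dir>/.dockercfg; the file name must be exactly '.dockercfg'."""
    out_path = out_dir / ".dockercfg"
    out_path.write_text(config_json_to_dockercfg(config_json_path.read_text()))
    out_path.chmod(0o600)  # the file holds registry credentials
    return out_path
```

For example, write_dockercfg(Path.home() / ".docker" / "config.json", Path("/tmp/stage")) would produce /tmp/stage/.dockercfg (a hypothetical staging directory), ready to be shipped into the sandbox.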
>>>>
>>>> When I try to use the library/hello-world:latest image from the public Docker repo to check whether everything works without credentials, I encounter a different problem:
>>>> exec: "/bin/sh": stat /bin/sh: no such file or directory
>>>> Error response from daemon: Cannot start container de93dc344d44b41bccccff49e508001a97ff23a8964e637d32a506a31fd4d946: [8] System error: exec: "/bin/sh": stat /bin/sh: no such file or directory
>>>>
>>>> I was referring to this email for guidance on setting up a Mesos slave:
>>>> http://mail-archives.apache.org/mod_mbox/aurora-dev/201503.mbox/%3CCAKB1MkHR=+nonesllx9buwttdthsku46pw_wr4b+_z9p59+...@mail.gmail.com%3E
>>>>
>>>> So I cannot get Docker to use the credentials file, and if I bypass authentication I can do a docker pull, but I hit a weird error when launching the hello-world image.
>>>>
>>>> Am I missing any log files I should check? I currently refer to the mesos-slave stdout and the sandbox stderr file.
>>>> Is there a configuration parameter I am missing?
>>>>
>>>> Any pointers will be really helpful. Thanks in advance.
>>>>
>>>> On Sun, Feb 28, 2016 at 3:37 PM, Krish <[email protected]> wrote:
>>>>
>>>>> Continuing my earlier chain of thought, I found this in the Mesos bug list:
>>>>> MESOS-4242 - Allow Docker private registry credentials to be passed from framework.
>>>>> How does one pass credentials using the framework? It seems .docker/config.json is not read by the slave.
>>>>>
>>>>> On Sat, Feb 27, 2016 at 11:46 PM, Krish <[email protected]> wrote:
>>>>>
>>>>>> I couldn't complete my PoC project before (got busy with other work). Well, it is never too late, and here's my update and issue.
>>>>>>
>>>>>> I have a 3-node zk (3.5.1 alpha), mesos-master (v0.24.1) & aurora (v0.11.0) setup running.
>>>>>> I was stuck on a problem with Mesos 0.25.0 & Aurora 0.9.0, where I got a 'protobuf field not set' error for the ExecutorInfo field.
>>>>>>
>>>>>> I have a Mesos agent running in a Docker container on CoreOS, and it can access the host Docker just fine.
>>>>>> I have also put the Docker login credentials file in the right location for it to access the private Docker registry.
>>>>>> I can manually trigger a docker pull and docker run from the slave without issues (which is also reflected properly outside the slave container by docker images and docker ps).
>>>>>>
>>>>>> However, when I try to run an Aurora job with a hello-docker container, the slave logs that docker pull has failed; more specifically:
>>>>>> " failed to start: Failed to 'docker pull private_repo.com:5000/krish/test:latest': exit status = exited with status 1 stderr = Error: image krish/test:latest not found"
>>>>>>
>>>>>> My hunch is that when the container is run from the Aurora DSL, the Docker credentials file is not read, and hence the pull fails. I can reproduce the exact same error when I delete the credentials file from the slave and trigger a pull.
>>>>>>
>>>>>> Is the hunch right? If so, is there a way to resolve this? Maybe source the file somehow before the run command?
>>>>>>
>>>>>> On Tue, Oct 27, 2015 at 10:35 PM, Bill Farner <[email protected]> wrote:
>>>>>>
>>>>>>> (1) clusters.json is written by you; it configures the CLI client with what clusters are available and how to discover them.
>>>>>>>
>>>>>>> (2) That's expected: Mesos only allows one active replica of a framework at a time, and this field signals which one is active.
>>>>>>>
>>>>>>> (3) The observer is essentially a web server that allows you to browse a task's sandbox directory and other information about it. You will need to configure it to run on your worker/agent nodes for that functionality to work (it's linked from the scheduler web UI).
>>>>>>>
>>>>>>> (4) You could indeed implement that behavior externally. And Aurora does provide a reason for a task being PENDING; see:
>>>>>>> https://github.com/apache/aurora/blob/master/api/src/main/thrift/org/apache/aurora/gen/api.thrift#L556-L559
>>>>>>>
>>>>>>> (5) That is correct. The scheduler exposes a Thrift API that you would use (a REST API is coming, but ground has not yet been broken). If you go this route, I suggest you skip the DSL and use the JSON task description format that is shipped over the API. There isn't good documentation on this, but we can help you through it and would be grateful for a write-up of your approach!
>>>>>>>
>>>>>>> On Mon, Oct 26, 2015 at 11:44 PM, Krish <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Folks,
>>>>>>>> Firstly, thanks for all the help. I'm happy to report that I have set up ZK, Mesos & Aurora, and can work further towards my idea of an auto-scaling cluster.
>>>>>>>> I have some further questions about the work done so far & things I plan to do:
>>>>>>>>
>>>>>>>> 1. Is the /etc/aurora/clusters.json file created by the scheduler, or does it need to be handcrafted? I had to edit the file manually to get my `aurora job ...` CLI commands to work.
>>>>>>>>
>>>>>>>> 2. I am running a cluster of 3 CoreOS VMs on Vagrant with zk, mesos & aurora in a Docker container. Only 1 of them outputs '1' when I look at the 'framework_registered' field. Is this expected? How do I verify that they are working as a cluster?
>>>>>>>>
>>>>>>>> 3. From the documentation, I see that there is an observer that needs to be listening on port 1338. What is the observer socket & its purpose? I have Aurora listening only on ports 8081 (HTTP port) & 8083 (libprocess).
>>>>>>>>
>>>>>>>> 4. I read about the 'PENDING' state in the Aurora documentation, as Bill suggested, & realize that it just shows that a task is waiting for some reason (lack of resources, in my case, as 0 slaves have registered). I was thinking of adding a hook on the PENDING state: say, if a task is PENDING for 5 minutes for lack of resources in the cluster, then spin up a new machine. Is this the right approach? Does Aurora provide reasons for why a task is in the PENDING state?
>>>>>>>>
>>>>>>>> => aurora job status testcluster/$USER/test/hello_world
>>>>>>>> INFO] Checking status of testcluster/ubuntu/test/hello_world
>>>>>>>> Active tasks (1):
>>>>>>>>     Task role: ubuntu, env: test, name: hello_world, instance: 0, status: PENDING on None
>>>>>>>>       cpus: 0.1, ram: 16 MB, disk: 16 MB
>>>>>>>>       events:
>>>>>>>>        2015-10-23 04:55:33 PENDING: None
>>>>>>>> Inactive tasks (0):
>>>>>>>>
>>>>>>>> 5. Aurora defines jobs in a .aurora config file, & if I decide to increase/decrease the number of instances in my cluster, then I need to create/overwrite the relevant .aurora file and trigger the `aurora update ...` command. Is this right?
>>>>>>>> If so, is there an HTTP API I can invoke remotely which triggers this update?
>>>>>>>>
>>>>>>>> On Fri, Oct 23, 2015 at 8:09 AM, Joshua Cohen <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> I suspect your error from `aurora job create ...` is due to the Aurora config you're using referencing `/vagrant/hello_world.py`, which does not exist (as you say, you're not even using Vagrant). Can you link the .aurora config you're using?
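On item 4 (auto-scaling when a task stays PENDING), which Bill says can be implemented outside Aurora: a minimal Python sketch, assuming you poll `aurora job status` for a single task and capture its text output. It reads the event lines (format copied from Krish's status output), finds the most recent event, and reports whether the task has been PENDING longer than a threshold (five minutes, as Krish proposes). pending_since and should_scale_up are hypothetical helpers; actually provisioning a node is left to your own tooling, and the timestamps are assumed to be in the same timezone as `now`.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# Event lines as printed by `aurora job status`, e.g. "2015-10-23 04:55:33 PENDING: None"
EVENT_RE = re.compile(r"^\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) ([A-Z_]+):")


def pending_since(status_text: str) -> Optional[datetime]:
    """Return when the task entered PENDING, or None if its latest event is not PENDING.

    Assumes the status text covers a single task, as in Krish's example.
    """
    events = []
    for line in status_text.splitlines():
        match = EVENT_RE.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            events.append((stamp, match.group(2)))
    if not events:
        return None
    stamp, state = max(events)  # the most recent event reflects the current state
    return stamp if state == "PENDING" else None


def should_scale_up(status_text: str, now: datetime,
                    threshold: timedelta = timedelta(minutes=5)) -> bool:
    """Hypothetical policy: ask for more capacity once a task is PENDING past the threshold."""
    since = pending_since(status_text)
    return since is not None and now - since >= threshold
```

A cron job could feed each status snapshot to should_scale_up and call a cloud provider's API when it returns True; keeping the policy in a pure function makes the threshold easy to tune and test.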
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>>
>>>>>>>>> Joshua
>>>>>>>>>
>>>>>>>>> On Thu, Oct 22, 2015 at 3:22 PM, Krish <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks, Zameer.
>>>>>>>>>>
>>>>>>>>>> I had to modify /etc/aurora/clusters.json:
>>>>>>>>>> [
>>>>>>>>>>   {
>>>>>>>>>>     "auth_mechanism": "UNAUTHENTICATED",
>>>>>>>>>>     "name": "testcluster",
>>>>>>>>>>     "scheduler_zk_path": "/scheduler/aurora",
>>>>>>>>>>     "slave_root": "/var/lib/mesos",
>>>>>>>>>>     "slave_run_directory": "latest",
>>>>>>>>>>     "zk": "127.0.1.1"
>>>>>>>>>>   }
>>>>>>>>>> ]
>>>>>>>>>>
>>>>>>>>>> I have a hello_world.aurora in my home folder. However, the following command errors out:
>>>>>>>>>> ~$ aurora job create testcluster/testrole/test/hellojob ./hello_world.aurora
>>>>>>>>>> Error loading configuration: [Errno 2] No such file or directory: '/vagrant/hello_world.py'
>>>>>>>>>>
>>>>>>>>>> A job list does work:
>>>>>>>>>> ~$ aurora job list testcluster
>>>>>>>>>> INFO] Retrieving jobs for role None
>>>>>>>>>>
>>>>>>>>>> I am not even using Vagrant. I am running zk & mesos on the same machine as Aurora. How do I submit these job templates to Aurora?
>>>>>>>>>>
>>>>>>>>>> Any pointers to documentation will be helpful.
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 21, 2015 at 11:09 PM, Zameer Manji <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Version 0.9.0 does not have the GC executor. Version 0.9.0 uses Mesos' task reconciliation <http://mesos.apache.org/documentation/latest/reconciliation/> API instead.
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Oct 21, 2015 at 9:28 AM, Krish <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks, Bill, for the location of the debs. I was finally able to run Aurora. :)
>>>>>>>>>>>>
>>>>>>>>>>>> I did find thermos_executor.pex & thermos_observer after installing aurora-executor. I still could not find gc_executor.pex on my system.
>>>>>>>>>>>> Is there a location where I can download the *.pex binaries, or can I build them from scratch?
>>>>>>>>>>>>
>>>>>>>>>>>> root@dev:/# find . -name "*.pex"
>>>>>>>>>>>> ./usr/share/aurora/bin/thermos_executor.pex
>>>>>>>>>>>> ./usr/share/aurora/bin/kaurora_admin.pex
>>>>>>>>>>>> ./usr/share/aurora/bin/kaurora.pex
>>>>>>>>>>>> ./usr/share/aurora/bin/thermos.pex
>>>>>>>>>>>> ./usr/share/aurora/bin/thermos_observer.pex
>>>>>>>>>>>> ./home/ubuntu/.pex
>>>>>>>>>>>> ./root/.pex
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 20, 2015 at 11:46 PM, Bill Farner <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Aurora currently requires an executor, so setting it to /dev/null will not work. Happy to talk further about your thoughts around sidestepping the executor.
>>>>>>>>>>>>>
>>>>>>>>>>>>> As for working with the scheduler source code, it's a standard Gradle project and we tend to use IntelliJ. Docs to help you ramp up on that:
>>>>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/developing-aurora-scheduler.md
>>>>>>>>>>>>>
>>>>>>>>>>>>> As for builds: the .zip is a source distribution, so it won't have any pre-built binaries. If you're on Debian, we have official debs here: https://bintray.com/apache/aurora
>>>>>>>>>>>>> You can see how they're built (and build your own packages) here: https://github.com/apache/aurora-packaging
>>>>>>>>>>>>> We're close to having official RPMs, but none to speak of yet.
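As Bill says, clusters.json is hand-written, so it can just as well be generated per environment. A small Python sketch that renders the file from the same keys as the /etc/aurora/clusters.json Krish got working; make_cluster_entry and render_clusters_json are hypothetical helpers, the defaults are simply the values Krish used, and any other keys the Aurora client may accept are not covered.

```python
import json


def make_cluster_entry(name: str, zk: str,
                       scheduler_zk_path: str = "/scheduler/aurora",
                       slave_root: str = "/var/lib/mesos",
                       slave_run_directory: str = "latest",
                       auth_mechanism: str = "UNAUTHENTICATED") -> dict:
    """Build one cluster entry with the keys from the working clusters.json in this thread."""
    return {
        "auth_mechanism": auth_mechanism,
        "name": name,
        "scheduler_zk_path": scheduler_zk_path,
        "slave_root": slave_root,
        "slave_run_directory": slave_run_directory,
        "zk": zk,
    }


def render_clusters_json(entries: list) -> str:
    """Render clusters.json, which is a JSON list of cluster entries."""
    names = [entry["name"] for entry in entries]
    # Cluster names address jobs on the CLI (testcluster/role/env/job), so keep them unique.
    if len(names) != len(set(names)):
        raise ValueError("cluster names must be unique")
    return json.dumps(entries, indent=2, sort_keys=True)
```

Writing render_clusters_json([make_cluster_entry("testcluster", "127.0.1.1")]) to /etc/aurora/clusters.json reproduces Krish's file; only the zk value would change when moving to a real ZooKeeper host.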
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 9:47 AM, Krish <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Stephan,
>>>>>>>>>>>>>> I am trying to get started and run Aurora without the Thermos executor (setting it to /dev/null does not help), on a local Linux box for now, and I plan to containerize/dockerize it later.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can you please point me to the right documentation (or to the CLI parsing source code) which can help me resolve this? Also, are there any steps to import the source code into Eclipse to browse & analyze it?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Also, where do I find all the *.pex files? They are not present in the zip file, nor anywhere in the built source code.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I know I am asking too many questions in a single thread here, & would appreciate the help.
>>>>>>>>>>>>>> I think at the end of this, I will put the steps I followed in a gist/blog so others might find their way around & not struggle as much.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 4:09 PM, Erb, Stephan <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Krish,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> you don't have to set framework_authentication_file and zk_digest_credentials. The scheduler help text is misleading here, as everything will work fine if you leave those empty.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In addition, it looks like you are misunderstanding the usage of the thermos_executor_path command line flag of the scheduler. It is supposed to point to the binary containing the generic Aurora executor (thermos_executor.pex). You only need hello_world.aurora once your scheduler is up and running. It serves as an example input for the aurora command line client, which can be used to schedule jobs and services on an Aurora master.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Have you tried using the Vagrant box? Just type `vagrant up` in a checkout of the Aurora source code. It gives you a running scheduler to play with. Once you have understood how it works, you can try installing it on your own (by reverse-engineering the Vagrant box).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hope this helps a little,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>>>> From: Krish <[email protected]>
>>>>>>>>>>>>>>> Sent: Tuesday, October 20, 2015 11:39 AM
>>>>>>>>>>>>>>> To: Bill Farner
>>>>>>>>>>>>>>> Cc: [email protected]; Erb, Stephan
>>>>>>>>>>>>>>> Subject: Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Bill/Stephan,
>>>>>>>>>>>>>>> I still get a stacktrace when running the Aurora scheduler CLI.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I do not know what to specify for -framework_authentication_file & -zk_digest_credentials, and they are required arguments.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I am not using any authentication on the Mesos master; do I still need the framework_authentication_file parameter?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> rm -rf /db /backup_dir
>>>>>>>>>>>>>>> mesos-log initialize --path="/db"
>>>>>>>>>>>>>>> export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
>>>>>>>>>>>>>>> JAVA_OPTS="-Xmx1536m -Xms256m"
>>>>>>>>>>>>>>> /usr/local/aurora-scheduler/bin/aurora-scheduler
>>>>>>>>>>>>>>> -backup_dir=/backup_dir
>>>>>>>>>>>>>>> -cluster_name=tc
>>>>>>>>>>>>>>> -mesos_master_address=zk://localhost:2181/mesos/master
>>>>>>>>>>>>>>> -serverset_path=/scheduler/aurora -zk_endpoints=localhost:2181
>>>>>>>>>>>>>>> -native_log_quorum_size=1 -vlog=SEVERE -logtostderr=false
>>>>>>>>>>>>>>> -native_log_file_path=/db
>>>>>>>>>>>>>>> -thermos_executor_path=/home/ubuntu/hello_world.aurora
>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>> INFO: Binding org.apache.aurora.scheduler.http.Utilization to GuiceManagedComponentProvider with the scope "PerRequest"
>>>>>>>>>>>>>>> Oct 20, 2015 9:27:40 AM org.apache.aurora.scheduler.cron.quartz.CronModule provideTimeZone
>>>>>>>>>>>>>>> WARNING: Cron schedules are configured to fire according to timezone Greenwich Mean Time but system timezone is set to Coordinated Universal Time
>>>>>>>>>>>>>>> Oct 20, 2015 9:27:41 AM org.eclipse.jetty.server.AbstractConnector doStart
>>>>>>>>>>>>>>> INFO: Started [email protected]:43843
>>>>>>>>>>>>>>> E1020 09:27:41.290 THREAD1 org.apache.aurora.scheduler.SchedulerLifecycle$9.execute: Caught unchecked exception: com.google.inject.ProvisionException: Guice provision errors:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1) Error in custom provider, java.lang.IllegalArgumentException: Path cannot be null
>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>>>>>>>>>>   while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1 error
>>>>>>>>>>>>>>> com.google.inject.ProvisionException: Guice provision errors:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1) Error in custom provider, java.lang.IllegalArgumentException: Path cannot be null
>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>>>>>>>>>>   while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1 error
>>>>>>>>>>>>>>>     at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:987)
>>>>>>>>>>>>>>>     at org.apache.aurora.scheduler.log.mesos.MesosLog.open(MesosLog.java:136
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 6:14 AM, Bill Farner <[email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The typical flow is that you keep your .aurora file checked into git, and commit every time you deploy/update. When you change your file, you instruct Aurora to update the live job (have a look at `aurora update -h`). Aurora will perform a rolling upgrade of your job to the new config. You'll use this same flow for updating your job's software as well as for resizing the job.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> For (3), you could set up alerting on stats that the scheduler exports. Have a look here for monitoring background:
>>>>>>>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/monitoring.md
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> You'll want to look at scheduler stats related to 'pending'.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Oct 19, 2015 at 12:16 PM, Krish <[email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks for the pointer. Now I notice that the aurora-scheduler script has --thermos_executor_path as a mandatory requirement.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I have a couple of questions on how the thermos_executor/.aurora config file functions:
>>>>>>>>>>>>>>>>> 1. Do we have to statically define the file beforehand?
>>>>>>>>>>>>>>>>> 2. What happens when we want to dynamically change the config, say increasing the number of instances of a service? Does Aurora require a reboot then?
>>>>>>>>>>>>>>>>> 3. How do I get notified about the message Mesos sends when it cannot schedule tasks for lack of resources? Should I depend on Aurora for this, or try to look for a hook into Mesos?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I think a little bit of context would help here.
>>>>>>>>>>>>>>>>> What I plan to check is to run a very basic job/task inside a Docker container with Aurora & wait for a 'resource not available' message from Mesos, and accordingly call an API to spin up a new node in my cluster.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Mon, Oct 19, 2015 at 1:24 PM, Erb, Stephan <[email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I believe you are missing the thermos_executor options that have to be passed on the scheduler command line.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> See https://github.com/apache/aurora/blob/33d7e2170a86f54722a02a2dc9cb1e09fb52df25/examples/vagrant/upstart/aurora-scheduler.conf#L39 for an example.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>>>>>>> From: Krish <[email protected]>
>>>>>>>>>>>>>>>>>> Sent: Monday, October 19, 2015 8:45 AM
>>>>>>>>>>>>>>>>>> To: [email protected]
>>>>>>>>>>>>>>>>>> Subject: Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>> I am a n00b with Apache Aurora & trying to experiment with some things on my local machine, with ZooKeeper and mesos-master running locally. They have initialized properly. When I try to run Aurora with the required options, I get the following error, & googling hasn't helped me much here.
>>>>>>>>>>>>>>>>>> I'd appreciate any help. Thanks in advance.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>> WARNING: Method [public void org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl.applySnapshot(java.lang.Object)] is synthetic and is being intercepted by [com.twitter.common.inject.TimedInterceptor@604c5de8]. This could indicate a bug. The method may be intercepted twice, or may not be intercepted at all.
>>>>>>>>>>>>>>>>>> Exception in thread "main" com.google.inject.CreationException: Guice creation errors:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 1) An exception was caught and reported. Message: A value may only be retrieved from a variable that has a default or has been set.
>>>>>>>>>>>>>>>>>>   at com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2) Could not find a suitable constructor in org.apache.aurora.scheduler.mesos.ExecutorSettings. Classes must have either one (and only one) constructor annotated with @Inject or a zero-argument constructor that is not private.
>>>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.mesos.ExecutorSettings.class(ExecutorSettings.java:43)
>>>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:204)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2 errors
>>>>>>>>>>>>>>>>>>     at com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:435)
>>>>>>>>>>>>>>>>>>     at com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:154)
>>>>>>>>>>>>>>>>>>     at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:106)
>>>>>>>>>>>>>>>>>>     at com.google.inject.Guice.createInjector(Guice.java:95)
>>>>>>>>>>>>>>>>>>     at com.google.inject.Guice.createInjector(Guice.java:83)
>>>>>>>>>>>>>>>>>>     at com.twitter.common.application.AppLauncher.configureInjection(AppLauncher.java:120)
>>>>>>>>>>>>>>>>>>     at com.twitter.common.application.AppLauncher.run(AppLauncher.java:87)
>>>>>>>>>>>>>>>>>>     at com.twitter.common.application.AppLauncher.launch(AppLauncher.java:181)
>>>>>>>>>>>>>>>>>>     at com.twitter.common.application.AppLauncher.launch(AppLauncher.java:142)
>>>>>>>>>>>>>>>>>>     at org.apache.aurora.scheduler.app.SchedulerMain.main(SchedulerMain.java:263)
>>>>>>>>>>>>>>>>>> Caused by: java.lang.IllegalStateException: A value may only be retrieved from a variable that has a default or has been set.
>>>>>>>>>>>>>>>>>>     at com.google.common.base.Preconditions.checkState(Preconditions.java:176)
>>>>>>>>>>>>>>>>>>     at com.twitter.common.args.Arg.get(Arg.java:82)
>>>>>>>>>>>>>>>>>>     at org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:206)
>>>>>>>>>>>>>>>>>>     at com.google.inject.AbstractModule.configure(AbstractModule.java:59)
>>>>>>>>>>>>>>>>>>     at com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>>>>>>>     at com.google.inject.util.Modules$2.configure(Modules.java:114)
>>>>>>>>>>>>>>>>>>     at com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>>>>>>>     at com.google.inject.spi.Elements.getElements(Elements.java:101)
>>>>>>>>>>>>>>>>>>     at com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>>>>>>>     at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:103)
>>>>>>>>>>>>>>>>>>     ... 7 more
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Complete logs are at http://pastebin.com/i72HvbYi.
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Zameer Manji
