Used rbt for the first time and some weird thing happened to the console, and it got submitted! https://reviews.apache.org/r/44341/
Will be sure to keep the list posted with any new info.
Thanks.

--
κρισhναν

On Thu, Mar 3, 2016 at 9:20 PM, Bill Farner <[email protected]> wrote:

> Likely in an existing page, preferably wherever you think would have saved
> you the trial and error!
>
> I look forward to the blog post, be sure to shoot a link here once it's up!
>
> Thanks!
>
>
> On Thursday, March 3, 2016, Krish <[email protected]> wrote:
>
>> Can you guide me how to do that? Should I start with a new page and then
>> submit it or would you like that as an entry in some existing doc?
>> That will be the short term (couple of hours) item on my checklist.
>>
>> Actually, as I said before, I have in mind to blog about my entire design
>> and implementation process - the how and the why of docker configuration,
>> private docker repo setup, coreos cluster setup, and zk, mesos master,
>> aurora containerisation and setup, along with their monitoring (have
>> decided on bosun.org with cAdvisor). And a short guide as to how to run
>> both containerized and non containerized jobs in production.
>> I had to refer to a dozen and more sites and blogs and manuals and source
>> to get so far; and got help from engineers in various mailing lists.
>> A unified guide should be helpful, imho.
>>
>>
>> On Thursday 3 March 2016, Bill Farner <[email protected]> wrote:
>>
>>> Wow! I'm glad you got it working! To help the next poor soul trying to
>>> do this, would you be willing to put up a doc patch on our end?
>>>
>>> On Thursday, March 3, 2016, Krish <[email protected]> wrote:
>>>
>>>> TLDR;
>>>> Use only a file named .dockercfg for docker credentials in mesos
>>>> tasks!
>>>>
>>>> Long story:
>>>> ---------------
>>>> Holy smokescreens!
>>>> This is for reporting & documenting purposes only, so that others don't
>>>> have to pull their hair out like I did for the past few evenings!
>>>>
>>>> A little background:
>>>> I am running Ubuntu 14.04 on my system and docker stores its
>>>> credentials in ~/.docker/config.json as
>>>> cat ~/.docker/config.json
>>>> {
>>>>   "auths": {
>>>>     "repo.example.com:5000": {
>>>>       "auth": "<snip>",
>>>>       "email": "<snip>"
>>>>     }
>>>>   }
>>>> }
>>>>
>>>> And I am doing all these experiments on a coreOS system which stores
>>>> the credentials in ~/.dockercfg as
>>>> core@aurora-1 ~ $ cat ~/.dockercfg
>>>> {
>>>>   "repo.example.com:5000": {
>>>>     "auth": "<snip>",
>>>>     "email": "<snip>"
>>>>   }
>>>> }
>>>>
>>>> Since my container was an Ubuntu 14.04 container (as was my local
>>>> system), I used the Ubuntu credential file format, and so I couldn't get
>>>> the slave task to read the docker credentials, as I had stored them as
>>>> ~/.docker/config.json.
>>>> After parsing through (a lot of find's, grep's and regex matching)
>>>> aurora, mesos, and thermos source code, I saw in
>>>> mesos/src/docker/docker.cpp:
>>>>
>>>>   // Set HOME variable to pick up *.dockercfg*.
>>>>   map<string, string> environment = os::environment();
>>>>
>>>>   environment["HOME"] = directory;
>>>>
>>>> Changed the filename and the json content, changed the
>>>> thermos_executor_resources, and bam, docker pull works!
>>>>
>>>> Well, the mesos documentation does say "To run an image from a private
>>>> repository, one can include the URI pointing to a .dockercfg that contains
>>>> login information." and I must have read it a dozen times!
>>>> But I never thought that they literally meant '.dockercfg' as the name
>>>> of the file!
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> κρισhναν
>>>>
>>>> On Thu, Mar 3, 2016 at 1:45 PM, Krish <[email protected]>
>>>> wrote:
>>>>
>>>>>
>>>>> I have got the docker config file copied into the sandbox using the
>>>>> thermos_executor_resources flag; however docker is still not able to find
>>>>> the credentials file for doing an appropriate pull of the image from a
>>>>> private repo.
>>>>>
>>>>> When I try to use the library/hello-world:latest image from the public
>>>>> docker repo to check if everything works fine without the credentials, I
>>>>> encounter a different problem:
>>>>> exec: "/bin/sh": stat /bin/sh: no such file or directory
>>>>> Error response from daemon: Cannot start container
>>>>> de93dc344d44b41bccccff49e508001a97ff23a8964e637d32a506a31fd4d946: [8]
>>>>> System error: exec: "/bin/sh": stat /bin/sh: no such file or directory
>>>>>
>>>>> I was referring to this email for guidance on setting up a mesos
>>>>> slave:
>>>>> http://mail-archives.apache.org/mod_mbox/aurora-dev/201503.mbox/%3CCAKB1MkHR=+nonesllx9buwttdthsku46pw_wr4b+_z9p59+...@mail.gmail.com%3E
>>>>>
>>>>> So, I cannot get the credentials file to be used by docker, and if I
>>>>> bypass authentication, I can do a docker pull, but encounter a weird error
>>>>> in launching the hello-world image.
>>>>>
>>>>> Am I missing out on checking any log files generated? I currently
>>>>> refer to mesos-slave stdout and the sandbox stderr file.
>>>>> Any configuration parameter I am missing for this to happen?
>>>>>
>>>>> Any pointers will be really helpful. Thanks in advance.
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> κρισhναν
>>>>>
>>>>> On Sun, Feb 28, 2016 at 3:37 PM, Krish <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Continuing my earlier chain of thought, I found this in the mesos bug
>>>>>> list:
>>>>>> MESOS-4242 - Allow Docker private registry credentials to be passed
>>>>>> from framework.
>>>>>> How does one pass credentials using the framework? As it seems the
>>>>>> .docker/config.json is not read from the slave.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> κρισhναν
>>>>>>
>>>>>> On Sat, Feb 27, 2016 at 11:46 PM, Krish <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> I couldn't complete my PoC project before (got busy with
>>>>>>> other work). Well, it is never too late and here's my update and issue.
>>>>>>>
>>>>>>> I have a 3 node zk (3.5.1 alpha), mesos-master (v0.24.1) & aurora
>>>>>>> (v0.11.0) running.
>>>>>>> I was stuck in a problem where I was using mesos 0.25.0 & aurora
>>>>>>> 0.9.0 & got a protobuf field not set error - ExecutorInfo field.
>>>>>>>
>>>>>>> I have a mesos agent running in docker container on coreos and it
>>>>>>> can access the host docker just fine.
>>>>>>> I have also put the docker login credentials file at the right
>>>>>>> location for it to access the private docker registry.
>>>>>>> I can manually trigger a docker pull and docker run without issues
>>>>>>> from the slave (which is also reflected properly outside the slave
>>>>>>> container with docker images and docker ps).
>>>>>>>
>>>>>>> However, when I try to run an aurora job with hello-docker
>>>>>>> container, the slave prints out the log that docker pull has failed;
>>>>>>> more specifically:
>>>>>>> " failed to start: Failed to 'docker pull
>>>>>>> private_repo.com:5000/krish/test:latest': exit status = exited with
>>>>>>> status 1 stderr = Error: image krish/test:latest not found"
>>>>>>>
>>>>>>> My hunch is that when using docker run from aurora DSL, it does not
>>>>>>> read the docker credentials file properly and hence fails. I can
>>>>>>> reproduce the exact same error when I delete the credentials file from
>>>>>>> the slave and trigger a pull.
>>>>>>>
>>>>>>> Is the hunch right? If yes, is there a way to resolve this? Maybe
>>>>>>> source it some way before the run command?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> κρισhναν
>>>>>>>
>>>>>>> On Tue, Oct 27, 2015 at 10:35 PM, Bill Farner <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> (1) clusters.json is written by you, configuring the CLI client
>>>>>>>> with instructions for what clusters are available and how to discover
>>>>>>>> them.
>>>>>>>>
>>>>>>>> (2) That's expected - mesos only allows one active replica of a
>>>>>>>> framework at a time, this signals which one is active.
>>>>>>>>
>>>>>>>> (3) The observer is essentially a web server that allows you to
>>>>>>>> browse a task's sandbox directory and other information about it. You
>>>>>>>> will need to configure it to run on your worker/agent nodes for that
>>>>>>>> functionality to work (it's linked from the scheduler web UI).
>>>>>>>>
>>>>>>>> (4) You could indeed implement that behavior externally. There is
>>>>>>>> a reason:
>>>>>>>> https://github.com/apache/aurora/blob/master/api/src/main/thrift/org/apache/aurora/gen/api.thrift#L556-L559
>>>>>>>>
>>>>>>>> (5) That is correct. The scheduler exposes a thrift API that you
>>>>>>>> would use (a REST API is coming, but ground has not yet been broken).
>>>>>>>> If you go this route, I suggest you skip the DSL and use the JSON task
>>>>>>>> description format that is shipped over the API. There's not good
>>>>>>>> documentation on this, but we can help you through it and would be
>>>>>>>> grateful for a writeup of your approach!
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Oct 26, 2015 at 11:44 PM, Krish <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Folks,
>>>>>>>>> Firstly, thanks for all the help. Am happy to report that I have
>>>>>>>>> set up zk, mesos & aurora, & can work further towards my idea of
>>>>>>>>> having an auto-scaling cluster.
>>>>>>>>> I have some further questions about the work done so far & things
>>>>>>>>> I plan to do:
>>>>>>>>>
>>>>>>>>> 1. Is the /etc/aurora/clusters.json file created by the
>>>>>>>>> scheduler or does it need to be handcrafted? I had to manually
>>>>>>>>> edit the file to get my `aurora job ...` cli to work.
>>>>>>>>>
>>>>>>>>> 2. I am running a cluster of 3 coreOS VMs on vagrant with zk,
>>>>>>>>> mesos & aurora in a docker container. Only 1 of them outputs '1'
>>>>>>>>> when I look at the 'framework_registered' field. Is this expected?
>>>>>>>>> How do I verify that they are working as a cluster?
>>>>>>>>>
>>>>>>>>> 3. From the documentation, I see that there is an observer
>>>>>>>>> that needs to be listening on port 1338. What is the observer
>>>>>>>>> socket & its purpose? I have aurora listening only on ports 8081
>>>>>>>>> (http port) & 8083 (libprocess).
>>>>>>>>>
>>>>>>>>> 4. I read about the 'PENDING' field in aurora documentation,
>>>>>>>>> as Bill suggested, & realize that it just shows that a task is
>>>>>>>>> waiting for some reason (for want of resources, in my case, as 0
>>>>>>>>> slaves have registered). I was thinking of adding a hook to the
>>>>>>>>> pending state; say if a task is PENDING for 5 minutes for lack of
>>>>>>>>> resources in the cluster, then spin up a new machine. Is this the
>>>>>>>>> right approach to take? Does aurora provide reasons for why a task
>>>>>>>>> is in PENDING state?
>>>>>>>>>
>>>>>>>>> => aurora job status testcluster/$USER/test/hello_world
>>>>>>>>> INFO] Checking status of testcluster/ubuntu/test/hello_world
>>>>>>>>> Active tasks (1):
>>>>>>>>>   Task role: ubuntu, env: test, name: hello_world, instance: 0, status: PENDING on None
>>>>>>>>>     cpus: 0.1, ram: 16 MB, disk: 16 MB
>>>>>>>>>     events:
>>>>>>>>>       2015-10-23 04:55:33 PENDING: None
>>>>>>>>> Inactive tasks (0):
>>>>>>>>>
>>>>>>>>> 5. Aurora defines jobs in a .aurora config file & if I decide
>>>>>>>>> to increase/decrease the number of instances in my cluster, then I
>>>>>>>>> need to create/overwrite the concerned .aurora file and trigger the
>>>>>>>>> `aurora update ...` command. Is this right?
>>>>>>>>> If yes, is there an HTTP API I can invoke remotely which
>>>>>>>>> triggers this update?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> κρισhναν
>>>>>>>>>
>>>>>>>>> On Fri, Oct 23, 2015 at 8:09 AM, Joshua Cohen <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> I suspect your error from `aurora job create ...` is due to the
>>>>>>>>>> aurora config you're using referencing `/vagrant/hello_world.py`
>>>>>>>>>> which does not exist (as you say: you're not even using Vagrant).
>>>>>>>>>> Can you link the .aurora config you're using?
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>>
>>>>>>>>>> Joshua
>>>>>>>>>>
>>>>>>>>>> On Thu, Oct 22, 2015 at 3:22 PM, Krish <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks, Zameer.
>>>>>>>>>>>
>>>>>>>>>>> I had to modify /etc/aurora/clusters.json:
>>>>>>>>>>> [
>>>>>>>>>>>   {
>>>>>>>>>>>     "auth_mechanism": "UNAUTHENTICATED",
>>>>>>>>>>>     "name": "testcluster",
>>>>>>>>>>>     "scheduler_zk_path": "/scheduler/aurora",
>>>>>>>>>>>     "slave_root": "/var/lib/mesos",
>>>>>>>>>>>     "slave_run_directory": "latest",
>>>>>>>>>>>     "zk": "127.0.1.1"
>>>>>>>>>>>   }
>>>>>>>>>>> ]
>>>>>>>>>>>
>>>>>>>>>>> I have a hello_world.aurora in my home folder. However the
>>>>>>>>>>> following command errors out:
>>>>>>>>>>> ~$ aurora job create testcluster/testrole/test/hellojob
>>>>>>>>>>> ./hello_world.aurora
>>>>>>>>>>> Error loading configuration: [Errno 2] No such file or
>>>>>>>>>>> directory: '/vagrant/hello_world.py'
>>>>>>>>>>>
>>>>>>>>>>> A job list does work:
>>>>>>>>>>> ~$ aurora job list testcluster
>>>>>>>>>>> INFO] Retrieving jobs for role None
>>>>>>>>>>>
>>>>>>>>>>> I am not even using the vagrant. I am using zk & mesos on the
>>>>>>>>>>> same machine as aurora. How do I submit these job templates to
>>>>>>>>>>> aurora?
>>>>>>>>>>>
>>>>>>>>>>> Any pointers to documentation will be helpful.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> κρισhναν
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Oct 21, 2015 at 11:09 PM, Zameer Manji <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Version 0.9.0 does not have the gc executor. Version 0.9.0 uses
>>>>>>>>>>>> Mesos' task reconciliation
>>>>>>>>>>>> <http://mesos.apache.org/documentation/latest/reconciliation/> API
>>>>>>>>>>>> instead.
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Oct 21, 2015 at 9:28 AM, Krish <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks Bill for the location to the debs. I was finally able
>>>>>>>>>>>>> to run aurora. :)
>>>>>>>>>>>>>
>>>>>>>>>>>>> I did find thermos_executor.pex & thermos_observer after
>>>>>>>>>>>>> installing aurora-executor. I still could not find
>>>>>>>>>>>>> gc_executor.pex on my system.
>>>>>>>>>>>>> Is there a location from where I can download the binaries for
>>>>>>>>>>>>> *.pex or build them from scratch?
>>>>>>>>>>>>>
>>>>>>>>>>>>> root@dev:/# find . -name "*.pex"
>>>>>>>>>>>>> ./usr/share/aurora/bin/thermos_executor.pex
>>>>>>>>>>>>> ./usr/share/aurora/bin/kaurora_admin.pex
>>>>>>>>>>>>> ./usr/share/aurora/bin/kaurora.pex
>>>>>>>>>>>>> ./usr/share/aurora/bin/thermos.pex
>>>>>>>>>>>>> ./usr/share/aurora/bin/thermos_observer.pex
>>>>>>>>>>>>> ./home/ubuntu/.pex
>>>>>>>>>>>>> ./root/.pex
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 11:46 PM, Bill Farner <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Aurora currently requires an executor, so setting it to
>>>>>>>>>>>>>> /dev/null will not work. Happy to talk further about your
>>>>>>>>>>>>>> thoughts around sidestepping the executor.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As for working with the scheduler source code, it's a
>>>>>>>>>>>>>> standard gradle project and we tend to use intellij. Docs to
>>>>>>>>>>>>>> help ramp up on that:
>>>>>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/developing-aurora-scheduler.md
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As for builds - the .zip is a source distribution, so it
>>>>>>>>>>>>>> won't have any pre-built binaries. If you're on debian, we have
>>>>>>>>>>>>>> official debs here: https://bintray.com/apache/aurora
>>>>>>>>>>>>>> You can see how they're built (and can build your own
>>>>>>>>>>>>>> packages) here: https://github.com/apache/aurora-packaging
>>>>>>>>>>>>>> We're close to having official RPMs, but none to speak of yet.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 9:47 AM, Krish <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Stephan,
>>>>>>>>>>>>>>> I am trying to get started and run aurora without thermos
>>>>>>>>>>>>>>> executor (setting it to /dev/null does not help) - on a local
>>>>>>>>>>>>>>> linux box for now & planning to containerize/dockerize it later.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Can you please point me to the right documentation (or a
>>>>>>>>>>>>>>> pointer to the cli parsing source code) which can help me
>>>>>>>>>>>>>>> resolve this?
>>>>>>>>>>>>>>> Also, are there any steps to import source code into
>>>>>>>>>>>>>>> eclipse to browse & analyze code for this?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Also, where do I find all the *.pex files? They are not
>>>>>>>>>>>>>>> present in the zip file nor anywhere in the built source code.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I know I am asking too many queries on a single thread here,
>>>>>>>>>>>>>>> & would appreciate the help.
>>>>>>>>>>>>>>> I think at the end of this, I will put the steps I followed
>>>>>>>>>>>>>>> in a gist/blog so others might find their way around, & not
>>>>>>>>>>>>>>> struggle as much.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 4:09 PM, Erb, Stephan <[email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Krish,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> you don't have to set framework_authentication_file and
>>>>>>>>>>>>>>>> zk_digest_credentials. The scheduler help text is misleading
>>>>>>>>>>>>>>>> here as everything will work fine if you leave those empty.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In addition, it looks like you are misunderstanding the usage
>>>>>>>>>>>>>>>> of the thermos_executor_path command line flag of the
>>>>>>>>>>>>>>>> scheduler. It is supposed to point to the binary containing
>>>>>>>>>>>>>>>> the generic Aurora executor (thermos_executor.pex). You only
>>>>>>>>>>>>>>>> need the hello_world.aurora once your scheduler is up and
>>>>>>>>>>>>>>>> running. It serves as an example input for the aurora command
>>>>>>>>>>>>>>>> line client which can be used to schedule jobs and services
>>>>>>>>>>>>>>>> on an Aurora master.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Have you tried to use the vagrant box? Just type `vagrant
>>>>>>>>>>>>>>>> up` in a checkout of the Aurora source code. It gives you a
>>>>>>>>>>>>>>>> running scheduler to play with. Once you have understood how
>>>>>>>>>>>>>>>> it works, you can start trying to install it on your own (by
>>>>>>>>>>>>>>>> reverse-engineering the vagrant box).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hope this helps a little,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>>>>> *From:* Krish <[email protected]>
>>>>>>>>>>>>>>>> *Sent:* Tuesday, October 20, 2015 11:39 AM
>>>>>>>>>>>>>>>> *To:* Bill Farner
>>>>>>>>>>>>>>>> *Cc:* [email protected]; Erb, Stephan
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Bill/Stephan,
>>>>>>>>>>>>>>>> I still get a stacktrace when running the aurora scheduler
>>>>>>>>>>>>>>>> CLI.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I do not know what to specify for
>>>>>>>>>>>>>>>> -framework_authentication_file & -zk_digest_credentials, and
>>>>>>>>>>>>>>>> they are required arguments.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I am not using any authentication on Mesos master, do I
>>>>>>>>>>>>>>>> still need the framework_authentication_file parameter?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> rm -rf /db /backup_dir
>>>>>>>>>>>>>>>> mesos-log initialize --path="/db"
>>>>>>>>>>>>>>>> export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
>>>>>>>>>>>>>>>> JAVA_OPTS="-Xmx1536m -Xms256m"
>>>>>>>>>>>>>>>> /usr/local/aurora-scheduler/bin/aurora-scheduler
>>>>>>>>>>>>>>>> -backup_dir=/backup_dir
>>>>>>>>>>>>>>>> -cluster_name=tc
>>>>>>>>>>>>>>>> -mesos_master_address=zk://localhost:2181/mesos/master
>>>>>>>>>>>>>>>> -serverset_path=/scheduler/aurora -zk_endpoints=localhost:2181
>>>>>>>>>>>>>>>> -native_log_quorum_size=1 -vlog=SEVERE -logtostderr=false
>>>>>>>>>>>>>>>> -native_log_file_path=/db
>>>>>>>>>>>>>>>> -thermos_executor_path=/home/ubuntu/hello_world.aurora
>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>> INFO: Binding org.apache.aurora.scheduler.http.Utilization to GuiceManagedComponentProvider with the scope "PerRequest"
>>>>>>>>>>>>>>>> Oct 20, 2015 9:27:40 AM org.apache.aurora.scheduler.cron.quartz.CronModule provideTimeZone
>>>>>>>>>>>>>>>> WARNING: Cron schedules are configured to fire according to timezone Greenwich Mean Time but system timezone is set to Coordinated Universal Time
>>>>>>>>>>>>>>>> Oct 20, 2015 9:27:41 AM org.eclipse.jetty.server.AbstractConnector doStart
>>>>>>>>>>>>>>>> INFO: Started [email protected]:43843
>>>>>>>>>>>>>>>> E1020 09:27:41.290 THREAD1 org.apache.aurora.scheduler.SchedulerLifecycle$9.execute: Caught unchecked exception: com.google.inject.ProvisionException: Guice provision errors:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1) Error in custom provider, java.lang.IllegalArgumentException: Path cannot be null
>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>>>>>>>>>>>   while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1 error
>>>>>>>>>>>>>>>> com.google.inject.ProvisionException: Guice provision errors:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1) Error in custom provider, java.lang.IllegalArgumentException: Path cannot be null
>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>>>>>>>>>>>   while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1 error
>>>>>>>>>>>>>>>>   at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:987)
>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLog.open(MesosLog.java:136
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 6:14 AM, Bill Farner <[email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The typical flow is that you keep your .aurora file
>>>>>>>>>>>>>>>>> checked into git, and commit every time you deploy/update.
>>>>>>>>>>>>>>>>> When you change your file, you will instruct Aurora to update
>>>>>>>>>>>>>>>>> the live job (have a look at aurora update -h). Aurora will
>>>>>>>>>>>>>>>>> perform a rolling upgrade of your job to the new config.
>>>>>>>>>>>>>>>>> You'll use this same flow for updating your job's software
>>>>>>>>>>>>>>>>> as well as resizing the job.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> For (3), you could set up alerting for stats that the
>>>>>>>>>>>>>>>>> scheduler exports. Have a look here for monitoring
>>>>>>>>>>>>>>>>> background:
>>>>>>>>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/monitoring.md
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> You'll want to look at scheduler stats related to
>>>>>>>>>>>>>>>>> 'pending'.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Mon, Oct 19, 2015 at 12:16 PM, Krish <[email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks for the pointer. Now I notice that the
>>>>>>>>>>>>>>>>>> aurora-scheduler script has the --thermos_executor_path as a
>>>>>>>>>>>>>>>>>> mandatory requirement.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I have a couple of questions on how the
>>>>>>>>>>>>>>>>>> thermos_executor/.aurora config file functions:
>>>>>>>>>>>>>>>>>> 1. Do we have to statically define the file beforehand?
>>>>>>>>>>>>>>>>>> 2. What happens when we want to dynamically change the
>>>>>>>>>>>>>>>>>> config, say increasing the number of instances of a service
>>>>>>>>>>>>>>>>>> required? Does aurora require a reboot then?
>>>>>>>>>>>>>>>>>> 3. How do I get notified about the message mesos sends
>>>>>>>>>>>>>>>>>> when it cannot schedule tasks for lack of resources? Should
>>>>>>>>>>>>>>>>>> I depend on aurora for this or try to look for a hook into
>>>>>>>>>>>>>>>>>> mesos?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I think a little bit of context would help here.
>>>>>>>>>>>>>>>>>> What I plan to check is to run a very basic job/task
>>>>>>>>>>>>>>>>>> inside a docker container with aurora & wait for a 'resource
>>>>>>>>>>>>>>>>>> not available' message from mesos, and accordingly call an
>>>>>>>>>>>>>>>>>> api to spin up a new node in my cluster.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Mon, Oct 19, 2015 at 1:24 PM, Erb, Stephan <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I believe you are missing the thermos_executor options
>>>>>>>>>>>>>>>>>>> that have to be passed to the scheduler command line.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> See
>>>>>>>>>>>>>>>>>>> https://github.com/apache/aurora/blob/33d7e2170a86f54722a02a2dc9cb1e09fb52df25/examples/vagrant/upstart/aurora-scheduler.conf#L39
>>>>>>>>>>>>>>>>>>> for an example
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>>>>>>>> *From:* Krish <[email protected]>
>>>>>>>>>>>>>>>>>>> *Sent:* Monday, October 19, 2015 8:45 AM
>>>>>>>>>>>>>>>>>>> *To:* [email protected]
>>>>>>>>>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>> I am a n00b with apache aurora & trying to experiment
>>>>>>>>>>>>>>>>>>> with some things on my local machine with zookeeper and
>>>>>>>>>>>>>>>>>>> mesos-master running locally. They have initialized
>>>>>>>>>>>>>>>>>>> properly. When I try to run aurora with the required
>>>>>>>>>>>>>>>>>>> options, I get the following error, & googling hasn't
>>>>>>>>>>>>>>>>>>> helped me much here.
>>>>>>>>>>>>>>>>>>> Appreciate any help. Thanks in advance.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>>> WARNING: Method [public void org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl.applySnapshot(java.lang.Object)] is synthetic and is being intercepted by [com.twitter.common.inject.TimedInterceptor@604c5de8]. This could indicate a bug. The method may be intercepted twice, or may not be intercepted at all.
>>>>>>>>>>>>>>>>>>> Exception in thread "main" com.google.inject.CreationException: Guice creation errors:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 1) An exception was caught and reported. Message: A value may only be retrieved from a variable that has a default or has been set.
>>>>>>>>>>>>>>>>>>>   at com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 2) Could not find a suitable constructor in org.apache.aurora.scheduler.mesos.ExecutorSettings. Classes must have either one (and only one) constructor annotated with @Inject or a zero-argument constructor that is not private.
>>>>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.mesos.ExecutorSettings.class(ExecutorSettings.java:43)
>>>>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:204)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 2 errors
>>>>>>>>>>>>>>>>>>>   at com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:435)
>>>>>>>>>>>>>>>>>>>   at com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:154)
>>>>>>>>>>>>>>>>>>>   at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:106)
>>>>>>>>>>>>>>>>>>>   at com.google.inject.Guice.createInjector(Guice.java:95)
>>>>>>>>>>>>>>>>>>>   at com.google.inject.Guice.createInjector(Guice.java:83)
>>>>>>>>>>>>>>>>>>>   at com.twitter.common.application.AppLauncher.configureInjection(AppLauncher.java:120)
>>>>>>>>>>>>>>>>>>>   at com.twitter.common.application.AppLauncher.run(AppLauncher.java:87)
>>>>>>>>>>>>>>>>>>>   at com.twitter.common.application.AppLauncher.launch(AppLauncher.java:181)
>>>>>>>>>>>>>>>>>>>   at com.twitter.common.application.AppLauncher.launch(AppLauncher.java:142)
>>>>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.app.SchedulerMain.main(SchedulerMain.java:263)
>>>>>>>>>>>>>>>>>>> Caused by: java.lang.IllegalStateException: A value may only be retrieved from a variable that has a default or has been set.
>>>>>>>>>>>>>>>>>>>   at com.google.common.base.Preconditions.checkState(Preconditions.java:176)
>>>>>>>>>>>>>>>>>>>   at com.twitter.common.args.Arg.get(Arg.java:82)
>>>>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:206)
>>>>>>>>>>>>>>>>>>>   at com.google.inject.AbstractModule.configure(AbstractModule.java:59)
>>>>>>>>>>>>>>>>>>>   at com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>>>>>>>>   at com.google.inject.util.Modules$2.configure(Modules.java:114)
>>>>>>>>>>>>>>>>>>>   at com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>>>>>>>>   at com.google.inject.spi.Elements.getElements(Elements.java:101)
>>>>>>>>>>>>>>>>>>>   at com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>>>>>>>>   at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:103)
>>>>>>>>>>>>>>>>>>>   ... 7 more
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Complete logs are present @ http://pastebin.com/i72HvbYi.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Zameer Manji
>>
>> --
>>
>> Thumb typed mail
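
For anyone landing on this thread later: the credentials fix described near the top (rename the file to .dockercfg and drop the "auths" wrapper) can be sketched in a few lines of Python. This is only an illustration based on the two JSON layouts quoted above; the helper name to_dockercfg is made up for this example and is not part of docker, mesos or aurora, and it assumes the file holds nothing but registry entries.

```python
import json


def to_dockercfg(config):
    """Return the flat registry -> credentials mapping used by ~/.dockercfg.

    Hypothetical helper: accepts either layout quoted in the thread, the
    ~/.docker/config.json one (entries nested under "auths") or an
    already-flat .dockercfg mapping.
    """
    if "auths" in config:
        return dict(config["auths"])
    return dict(config)


if __name__ == "__main__":
    # Same shape as the ~/.docker/config.json example in the thread.
    config_json = {
        "auths": {
            "repo.example.com:5000": {"auth": "<snip>", "email": "<snip>"}
        }
    }
    # The printed JSON is what would go into a file literally named .dockercfg.
    print(json.dumps(to_dockercfg(config_json), indent=2))
```

The resulting file still has to be shipped into the task sandbox (the thread does this with thermos_executor_resources), since mesos points HOME at the sandbox directory before running docker pull.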
