Hi Folks,
Firstly, thanks for all the help. I am happy to report that I have set up zk,
mesos & aurora, and can now work towards my idea of having an
auto-scaling cluster.
I have some further questions about the work done so far & things I plan to
do:
1. Is the /etc/aurora/clusters.json file created by the scheduler, or
does it need to be handcrafted? I had to manually edit the file to get my
`aurora job ...` cli to work.
2. I am running a cluster of 3 CoreOS VMs on vagrant with zk, mesos &
aurora in a docker container. Only 1 of them outputs '1' when I look at the
'framework_registered' field. Is this expected? How do I verify that they
are working as a cluster?
3. From the documentation, I see that there is an observer that needs to
be listening on port 1338. What is the observer socket & its purpose? I
have aurora listening only on ports 8081 (http port) & 8083 (libprocess).
4. I read about the 'PENDING' state in the aurora documentation, as Bill
suggested, & realize that it just shows that a task is waiting for some
reason (for want of resources, in my case, as 0 slaves have registered). I
was thinking of adding a hook to the pending state; say, if a task is
PENDING for 5 minutes for lack of resources in the cluster, then spin up a
new machine. Is this the right approach to take? Does aurora provide
the reason why a task is in the PENDING state?
=> aurora job status testcluster/$USER/test/hello_world
INFO] Checking status of testcluster/ubuntu/test/hello_world
Active tasks (1):
Task role: ubuntu, env: test, name: hello_world, instance: 0, status: PENDING on None
cpus: 0.1, ram: 16 MB, disk: 16 MB
events:
2015-10-23 04:55:33 PENDING: None
Inactive tasks (0):
5. Aurora defines jobs in a .aurora config file, & if I decide to
increase/decrease the number of instances in my cluster, then I need to
create/overwrite the relevant .aurora file and trigger the `aurora update
...` command. Is this right?
If yes, is there an HTTP API I can invoke remotely which triggers this
update?
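
To make question 4 concrete, here is a rough sketch in Python of the check I
have in mind. Everything in it is my own placeholder and not an Aurora API:
the `PendingTask` record, the `stale_pending` and `should_scale_up` helpers,
and the 5-minute threshold. How the PENDING timestamps are obtained from
aurora (CLI output, scheduler stats, or something else) is deliberately left
open, and actually provisioning a machine is left to the caller.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

# Assumed threshold from question 4: 5 minutes in PENDING.
PENDING_TIMEOUT = timedelta(minutes=5)


@dataclass
class PendingTask:
    """Illustrative record of a PENDING task; not an Aurora type."""
    job_key: str             # e.g. "testcluster/ubuntu/test/hello_world"
    instance: int
    pending_since: datetime  # timestamp of the PENDING event


def stale_pending(tasks: List[PendingTask], now: datetime,
                  timeout: timedelta = PENDING_TIMEOUT) -> List[PendingTask]:
    """Return the tasks that have been PENDING for at least `timeout`."""
    return [t for t in tasks if now - t.pending_since >= timeout]


def should_scale_up(tasks: List[PendingTask], now: datetime,
                    timeout: timedelta = PENDING_TIMEOUT) -> bool:
    """True if any task has waited too long; the caller would then add a node."""
    return bool(stale_pending(tasks, now, timeout))


if __name__ == "__main__":
    # Timestamp taken from the status output above.
    task = PendingTask("testcluster/ubuntu/test/hello_world", 0,
                       datetime(2015, 10, 23, 4, 55, 33))
    print(should_scale_up([task], now=datetime(2015, 10, 23, 5, 1, 0)))
```

The decision is kept as a pure function of the task list and the current
time, so it can be tested without a running cluster and reused whichever way
the pending information ends up being collected.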
--
κρισhναν
On Fri, Oct 23, 2015 at 8:09 AM, Joshua Cohen <[email protected]>
wrote:
> I suspect your error from `aurora job create ...` is due to the aurora
> config you're using referencing `/vagrant/hello_world.py` which does not
> exist (as you say: you're not even using Vagrant). Can you link the .aurora
> config you're using?
>
> Cheers,
>
> Joshua
>
> On Thu, Oct 22, 2015 at 3:22 PM, Krish <[email protected]> wrote:
>
>> Thanks, Zameer.
>>
>> I had to modify /etc/aurora/clusters.json:
>> [
>> {
>> "auth_mechanism": "UNAUTHENTICATED",
>> "name": "testcluster",
>> "scheduler_zk_path": "/scheduler/aurora",
>> "slave_root": "/var/lib/mesos",
>> "slave_run_directory": "latest",
>> "zk": "127.0.1.1"
>> }
>> ]
>>
>> I have a hello_world.aurora in my home folder. However the following
>> command errors out:
>> ~$ aurora job create testcluster/testrole/test/hellojob
>> ./hello_world.aurora
>> Error loading configuration: [Errno 2] No such file or directory:
>> '/vagrant/hello_world.py'
>>
>> A job list does work:
>> ~$ aurora job list testcluster
>> INFO] Retrieving jobs for role None
>>
>> I am not even using the vagrant. I am using zk & mesos on the same
>> machine as aurora. How do I submit these job templates to aurora?
>>
>> Any pointers to documentation will be helpful.
>>
>>
>> --
>> κρισhναν
>>
>> On Wed, Oct 21, 2015 at 11:09 PM, Zameer Manji <[email protected]> wrote:
>>
>>> Version 0.9.0 does not have the gc executor. Version 0.9.0 uses Mesos' task
>>> reconciliation
>>> <http://mesos.apache.org/documentation/latest/reconciliation/> API
>>> instead.
>>>
>>> On Wed, Oct 21, 2015 at 9:28 AM, Krish <[email protected]>
>>> wrote:
>>>
>>>> Thanks Bill for the location to the debs. I was finally able to run
>>>> aurora. :)
>>>>
>>>> I did find thermos_executor.pex & thermos_observer after installing
>>>> aurora-executor. I still could not find gc_executor.pex on my system.
>>>> Is there a location from where I can download the binaries for *.pex or
>>>> build them from scratch?
>>>>
>>>> root@dev:/# find . -name "*.pex"
>>>> ./usr/share/aurora/bin/thermos_executor.pex
>>>> ./usr/share/aurora/bin/kaurora_admin.pex
>>>> ./usr/share/aurora/bin/kaurora.pex
>>>> ./usr/share/aurora/bin/thermos.pex
>>>> ./usr/share/aurora/bin/thermos_observer.pex
>>>> ./home/ubuntu/.pex
>>>> ./root/.pex
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> κρισhναν
>>>>
>>>> On Tue, Oct 20, 2015 at 11:46 PM, Bill Farner <[email protected]>
>>>> wrote:
>>>>
>>>>> Aurora currently requires an executor, so setting it to /dev/null will
>>>>> not work. Happy to talk further about your thoughts around sidestepping
>>>>> the executor.
>>>>>
>>>>> As for working with the scheduler source code, it's a standard gradle
>>>>> project and we tend to use intellij. Docs to help ramp on that:
>>>>> https://github.com/apache/aurora/blob/master/docs/developing-aurora-scheduler.md
>>>>>
>>>>> As for builds - the .zip is a source distribution, so it won't have
>>>>> any pre-built binaries. If you're on debian, we have official debs here:
>>>>> https://bintray.com/apache/aurora
>>>>> You can see how they're built (and build your own packages) here:
>>>>> https://github.com/apache/aurora-packaging
>>>>> We're close to having official RPMs, but none to speak of yet.
>>>>>
>>>>> On Tue, Oct 20, 2015 at 9:47 AM, Krish <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Stephen,
>>>>>> I am trying to get started and run aurora without thermos executor
>>>>>> (setting it to /dev/null does not help) - on a local linux box for now &
>>>>>> planning to containerize/dockerize it later.
>>>>>>
>>>>>> Can you please point me to the right documentation (or a pointer to
>>>>>> the cli parsing source code) which can help me resolve this? Also, are
>>>>>> there any steps to import the source code into eclipse to browse &
>>>>>> analyze the code?
>>>>>>
>>>>>> Also, where do I find all the *.pex files? They are not present in
>>>>>> the zip file nor anywhere in the built source code.
>>>>>>
>>>>>> I know I am asking too many queries on a single thread here, & would
>>>>>> appreciate the help.
>>>>>> I think at the end of this, I will put the steps I followed in a
>>>>>> gist/blog so others might find their way around, & not struggle as much.
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> κρισhναν
>>>>>>
>>>>>> On Tue, Oct 20, 2015 at 4:09 PM, Erb, Stephan <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Hi Krish,
>>>>>>>
>>>>>>>
>>>>>>> you don't have to set framework_authentication_file and
>>>>>>> zk_digest_credentials. The scheduler help text is misleading here as
>>>>>>> everything will work fine if you leave those empty.
>>>>>>>
>>>>>>>
>>>>>>> In addition, it looks like you are misunderstanding the usage of the
>>>>>>> thermos_executor_path command line flag of the scheduler. It is
>>>>>>> supposed to point to the binary containing the generic Aurora executor
>>>>>>> (thermos_executor.pex). You only need the hello_world.aurora once
>>>>>>> your scheduler is up and running. It serves as an example input for the
>>>>>>> aurora command line client, which can be used to schedule jobs and
>>>>>>> services on an Aurora master.
>>>>>>>
>>>>>>>
>>>>>>> Have you tried to use the vagrant box? Just type `vagrant up` in a
>>>>>>> checkout of the Aurora source code. It gives you a running scheduler to
>>>>>>> play with. Once you have understood how it works, you can start trying
>>>>>>> to install it on your own (by reverse-engineering the vagrant box).
>>>>>>>
>>>>>>>
>>>>>>> Hope this helps a little,
>>>>>>>
>>>>>>> Stephan
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ------------------------------
>>>>>>> *From:* Krish <[email protected]>
>>>>>>> *Sent:* Tuesday, October 20, 2015 11:39 AM
>>>>>>> *To:* Bill Farner
>>>>>>> *Cc:* [email protected]; Erb, Stephan
>>>>>>>
>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>
>>>>>>> Bill/Stephen,
>>>>>>> I still get a stacktrace when running the aurora scheduler CLI.
>>>>>>>
>>>>>>> I do not know what to specify for -framework_authentication_file
>>>>>>> & -zk_digest_credentials, and they are required arguments.
>>>>>>>
>>>>>>> I am not using any authentication on Mesos master, do I still need
>>>>>>> the framework_authentication_file parameter?
>>>>>>>
>>>>>>>
>>>>>>> rm -rf /db /backup_dir
>>>>>>> mesos-log initialize --path="/db"
>>>>>>> export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
>>>>>>> JAVA_OPTS="-Xmx1536m -Xms256m"
>>>>>>> /usr/local/aurora-scheduler/bin/aurora-scheduler -backup_dir=/backup_dir
>>>>>>> -cluster_name=tc -mesos_master_address=zk://localhost:2181/mesos/master
>>>>>>> -serverset_path=/scheduler/aurora -zk_endpoints=localhost:2181
>>>>>>> -native_log_quorum_size=1 -vlog=SEVERE -logtostderr=false
>>>>>>> -native_log_file_path=/db
>>>>>>> -thermos_executor_path=/home/ubuntu/hello_world.aurora
>>>>>>> ...
>>>>>>> ...
>>>>>>> INFO: Binding org.apache.aurora.scheduler.http.Utilization to
>>>>>>> GuiceManagedComponentProvider with the scope "PerRequest"
>>>>>>> Oct 20, 2015 9:27:40 AM
>>>>>>> org.apache.aurora.scheduler.cron.quartz.CronModule provideTimeZone
>>>>>>> WARNING: Cron schedules are configured to fire according to timezone
>>>>>>> Greenwich Mean Time but system timezone is set to Coordinated Universal Time
>>>>>>> Oct 20, 2015 9:27:41 AM org.eclipse.jetty.server.AbstractConnector
>>>>>>> doStart
>>>>>>> INFO: Started [email protected]:43843
>>>>>>> E1020 09:27:41.290 THREAD1
>>>>>>> org.apache.aurora.scheduler.SchedulerLifecycle$9.execute: Caught unchecked exception:
>>>>>>> com.google.inject.ProvisionException: Guice provision errors:
>>>>>>>
>>>>>>> 1) Error in custom provider, java.lang.IllegalArgumentException:
>>>>>>> Path cannot be null
>>>>>>> at
>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>> at
>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>> while locating org.apache.mesos.Log
>>>>>>> at
>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>> while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>
>>>>>>> 1 error
>>>>>>> com.google.inject.ProvisionException: Guice provision errors:
>>>>>>>
>>>>>>> 1) Error in custom provider, java.lang.IllegalArgumentException:
>>>>>>> Path cannot be null
>>>>>>> at
>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>> at
>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>> while locating org.apache.mesos.Log
>>>>>>> at
>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>> while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>
>>>>>>> 1 error
>>>>>>> at
>>>>>>> com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:987)
>>>>>>> at
>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLog.open(MesosLog.java:136
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> κρισhναν
>>>>>>>
>>>>>>> On Tue, Oct 20, 2015 at 6:14 AM, Bill Farner <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> The typical flow is that you keep your .aurora file checked into
>>>>>>>> git, and commit every time you deploy/update. When you change your
>>>>>>>> file,
>>>>>>>> you will instruct Aurora to update the live job (have a look at aurora
>>>>>>>> update -h). Aurora will perform a rolling upgrade of your job to
>>>>>>>> the new config. You'll use this same flow for updating your job's
>>>>>>>> software
>>>>>>>> as well as resizing the job.
>>>>>>>>
>>>>>>>> For (3), you could set up alerting for stats that the scheduler
>>>>>>>> exports. Have a look here for monitoring background:
>>>>>>>> https://github.com/apache/aurora/blob/master/docs/monitoring.md
>>>>>>>>
>>>>>>>> You'll want to look at scheduler stats related to 'pending'.
>>>>>>>>
>>>>>>>> On Mon, Oct 19, 2015 at 12:16 PM, Krish <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks for the pointer. Now I notice that the aurora-scheduler
>>>>>>>>> script has the --thermos_executor_path as a mandatory requirement.
>>>>>>>>>
>>>>>>>>> I have a couple of questions on how the thermos_executor/.aurora
>>>>>>>>> config file functions:
>>>>>>>>> 1. Do we have to statically define the file beforehand?
>>>>>>>>> 2. What happens when we want to dynamically change the config, say
>>>>>>>>> increasing the number of instances of a service required? Does aurora
>>>>>>>>> require a reboot then?
>>>>>>>>> 3. How do I get notified about the message mesos sends when it
>>>>>>>>> cannot schedule tasks for lack of resources? Should I depend on
>>>>>>>>> aurora for
>>>>>>>>> this or try to look for a hook into mesos?
>>>>>>>>>
>>>>>>>>> I think a little bit of context would help here.
>>>>>>>>> What I plan to check is to run a very basic job/task inside a
>>>>>>>>> docker container with aurora & wait for a 'resource not available'
>>>>>>>>> message
>>>>>>>>> from mesos, and accordingly call an api to spin up a new node in my
>>>>>>>>> cluster.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> κρισhναν
>>>>>>>>>
>>>>>>>>> On Mon, Oct 19, 2015 at 1:24 PM, Erb, Stephan <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> I believe you are missing the thermos_executor options that have
>>>>>>>>>> to be passed to the scheduler command line.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> See
>>>>>>>>>> https://github.com/apache/aurora/blob/33d7e2170a86f54722a02a2dc9cb1e09fb52df25/examples/vagrant/upstart/aurora-scheduler.conf#L39
>>>>>>>>>> for an example
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Best Regards,
>>>>>>>>>>
>>>>>>>>>> Stephan
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ------------------------------
>>>>>>>>>> *From:* Krish <[email protected]>
>>>>>>>>>> *Sent:* Monday, October 19, 2015 8:45 AM
>>>>>>>>>> *To:* [email protected]
>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>> I am a n00b with apache aurora & trying to experiment with some things
>>>>>>>>>> on my local machine with zookeeper and mesos-master running locally.
>>>>>>>>>> They have initialized properly. When I try to run aurora with the required
>>>>>>>>>> options, I get the following error, & googling hasn't helped me much
>>>>>>>>>> here.
>>>>>>>>>> Appreciate any help. Thanks in advance.
>>>>>>>>>>
>>>>>>>>>> ...
>>>>>>>>>> ...
>>>>>>>>>> WARNING: Method [public void
>>>>>>>>>> org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl.applySnapshot(java.lang.Object)]
>>>>>>>>>> is synthetic and is being intercepted by
>>>>>>>>>> [com.twitter.common.inject.TimedInterceptor@604c5de8]. This
>>>>>>>>>> could indicate a bug. The method
>>>>>>>>>> may be intercepted twice, or may not be intercepted at all.
>>>>>>>>>> Exception in thread "main" com.google.inject.CreationException:
>>>>>>>>>> Guice creation errors:
>>>>>>>>>>
>>>>>>>>>> 1) An exception was caught and reported. Message: A value may
>>>>>>>>>> only be retrieved from a variable that has a default or has been
>>>>>>>>>> set.
>>>>>>>>>> at
>>>>>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>
>>>>>>>>>> 2) Could not find a suitable constructor in
>>>>>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings. Classes must have
>>>>>>>>>> either one (and only one) constructor annotated with @Inject or a
>>>>>>>>>> zero-argument constructor that is not private.
>>>>>>>>>> at
>>>>>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings.class(ExecutorSettings.java:43)
>>>>>>>>>> at
>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:204)
>>>>>>>>>>
>>>>>>>>>> 2 errors
>>>>>>>>>> at
>>>>>>>>>> com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:435)
>>>>>>>>>> at
>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:154)
>>>>>>>>>> at
>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:106)
>>>>>>>>>> at com.google.inject.Guice.createInjector(Guice.java:95)
>>>>>>>>>> at com.google.inject.Guice.createInjector(Guice.java:83)
>>>>>>>>>> at
>>>>>>>>>> com.twitter.common.application.AppLauncher.configureInjection(AppLauncher.java:120)
>>>>>>>>>> at
>>>>>>>>>> com.twitter.common.application.AppLauncher.run(AppLauncher.java:87)
>>>>>>>>>> at
>>>>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:181)
>>>>>>>>>> at
>>>>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:142)
>>>>>>>>>> at
>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain.main(SchedulerMain.java:263)
>>>>>>>>>> Caused by: java.lang.IllegalStateException: A value may only be
>>>>>>>>>> retrieved from a variable that has a default or has been set.
>>>>>>>>>> at
>>>>>>>>>> com.google.common.base.Preconditions.checkState(Preconditions.java:176)
>>>>>>>>>> at com.twitter.common.args.Arg.get(Arg.java:82)
>>>>>>>>>> at
>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:206)
>>>>>>>>>> at
>>>>>>>>>> com.google.inject.AbstractModule.configure(AbstractModule.java:59)
>>>>>>>>>> at
>>>>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>> at
>>>>>>>>>> com.google.inject.util.Modules$2.configure(Modules.java:114)
>>>>>>>>>> at
>>>>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>> at
>>>>>>>>>> com.google.inject.spi.Elements.getElements(Elements.java:101)
>>>>>>>>>> at
>>>>>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>> at
>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:103)
>>>>>>>>>> ... 7 more
>>>>>>>>>>
>>>>>>>>>> Complete logs are available at http://pastebin.com/i72HvbYi.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> κρισhναν
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Zameer Manji
>>>
>>
>>
>