Re: Stacktrace when running Apache Aurora

Jake Farrell Thu, 03 Mar 2016 11:30:03 -0800

This can also be avoided by setting DOCKER_CONFIG as an OS environment
variable.


The issue occurs when Docker images from a private registry are pulled on a
Mesos agent, because Mesos versions < 0.26 only support v1 registries, which
require the .dockercfg config file, whereas Docker 1.8+ stores its config in
$HOME/.docker/config.json. Mesos 0.26 fixed this in the universal
containerizer puller; as a workaround, this patch enables an environment file
in the mesos-agent systemd service that sets DOCKER_CONFIG to, say,
$HOME/.docker/ so that config.json is picked up correctly.

MESOS-2969, MESOS-3031 caused by docker/docker#12009
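To make the lookup concrete, here is a minimal Python sketch of where a Docker 1.8+ client looks for registry credentials. It assumes the client reads $DOCKER_CONFIG/config.json when DOCKER_CONFIG is set, otherwise $HOME/.docker/config.json, and falls back to the legacy $HOME/.dockercfg only when config.json is missing. `docker_credentials_file` is a hypothetical helper, not part of Docker or Mesos. Under these assumptions, since Mesos (< 0.26) points HOME at the task sandbox before `docker pull`, a config.json fetched into the sandbox root is not found, while a fetched .dockercfg is.

```python
import os
from typing import Callable, Mapping, Optional


def docker_credentials_file(env: Mapping[str, str],
                            exists: Callable[[str], bool]) -> Optional[str]:
    """Return the credentials file a Docker 1.8+ client would read, or None.

    Assumed order: $DOCKER_CONFIG/config.json (or $HOME/.docker/config.json
    when DOCKER_CONFIG is unset), then the legacy $HOME/.dockercfg.
    """
    home = env.get("HOME", "")
    config_dir = env.get("DOCKER_CONFIG") or os.path.join(home, ".docker")
    for path in (os.path.join(config_dir, "config.json"),
                 os.path.join(home, ".dockercfg")):
        if exists(path):
            return path
    return None


if __name__ == "__main__":
    sandbox = "/tmp/sandbox"  # hypothetical sandbox path; Mesos sets HOME to it
    fetched = {sandbox + "/config.json", sandbox + "/.dockercfg"}
    # Only the .dockercfg is found: config.json is not under $HOME/.docker/.
    print(docker_credentials_file({"HOME": sandbox}, fetched.__contains__))
```

In real use, `os.path.exists` can be passed as `exists`; the set lookup just keeps the example off the filesystem.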


-Jake



On Thu, Mar 3, 2016 at 11:30 AM, Krish <[email protected]> wrote:

> Used rbt for the first time and some weird thing happened to the console,
> and it got submitted!
> https://reviews.apache.org/r/44341/
>
> Will sure keep the list posted with any new info. Thanks.
>
>
>
> --
> κρισhναν
>
> On Thu, Mar 3, 2016 at 9:20 PM, Bill Farner <[email protected]> wrote:
>
>> Likely in an existing page, preferably wherever you think would have
>> saved you the trial and error!
>>
>> I look forward to the blog post, be sure to shoot a link here once it's
>> up!
>>
>> Thanks!
>>
>>
>> On Thursday, March 3, 2016, Krish <[email protected]> wrote:
>>
>>> Can you guide me on how to do that? Should I start a new page and then
>>> submit it, or would you like it as an entry in an existing doc?
>>> That will be the short-term (couple of hours) item on my checklist.
>>>
>>> Actually, as I said before, I plan to blog about my entire design and
>>> implementation process: the how and the why of Docker configuration,
>>> private Docker repo setup, CoreOS cluster setup, and ZK, Mesos master and
>>> Aurora containerisation and setup, along with their monitoring (I have
>>> decided on bosun.org with cAdvisor), plus a short guide on how to run
>>> both containerized and non-containerized jobs in production.
>>> I had to refer to more than a dozen sites, blogs, manuals and source
>>> files to get this far, and got help from engineers on various mailing
>>> lists. A unified guide should be helpful, IMHO.
>>>
>>>
>>> On Thursday 3 March 2016, Bill Farner <[email protected]> wrote:
>>>
>>>> Wow!  I'm glad you got it working!  To help the next poor soul trying
>>>> to do this, would you be willing to put up a doc patch on our end?
>>>>
>>>> On Thursday, March 3, 2016, Krish <[email protected]> wrote:
>>>>
>>>>> TLDR;
>>>>> Use only a file named .dockercfg for Docker credentials in Mesos
>>>>> tasks!
>>>>>
>>>>> Long story:
>>>>> ---------------
>>>>> Holy smokescreens!
>>>>> This is for reporting and documentation purposes only, so that others
>>>>> don't have to pull their hair out like I did for the past few evenings!
>>>>>
>>>>> A little background:
>>>>> I am running Ubuntu 14.04 on my system, and Docker stores its
>>>>> credentials in ~/.docker/config.json as:
>>>>> cat ~/.docker/config.json
>>>>> {
>>>>>   "auths": {
>>>>>     "repo.example.com:5000": {
>>>>>       "auth": "<snip>",
>>>>>       "email": "<snip>"
>>>>>     }
>>>>>   }
>>>>> }
>>>>>
>>>>> And I am doing all these experiments on a CoreOS system, which stores
>>>>> the credentials in ~/.dockercfg as:
>>>>> core@aurora-1 ~ $ cat ~/.dockercfg
>>>>> {
>>>>>   "repo.example.com:5000": {
>>>>>     "auth": "<snip>",
>>>>>     "email": "<snip>"
>>>>>   }
>>>>> }
>>>>>
>>>>> Since my container was an Ubuntu 14.04 container (as was my local
>>>>> system), I used the Ubuntu credential file format; that is, I had
>>>>> stored the credentials as ~/.docker/config.json, and I couldn't get the
>>>>> slave task to read them.
>>>>> After combing through the Aurora, Mesos, and Thermos source code (a lot
>>>>> of finds, greps and regex matching), I saw this in
>>>>> mesos/src/docker/docker.cpp:
>>>>>
>>>>>   // Set HOME variable to pick up .dockercfg.
>>>>>   map<string, string> environment = os::environment();
>>>>>
>>>>>   environment["HOME"] = directory;
>>>>>
>>>>> I changed the filename and the JSON content, updated
>>>>> thermos_executor_resources, and bam, docker pull works!
>>>>>
>>>>> Well, the Mesos documentation does say "To run an image from a private
>>>>> repository, one can include the URI pointing to a .dockercfg that contains
>>>>> login information.", and I must have read it a dozen times!
>>>>> But I never thought that they literally meant '.dockercfg' as the name
>>>>> of the file!
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> κρισhναν
>>>>>
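The renaming step described above can be sketched in Python as a conversion from the newer config.json layout to the legacy .dockercfg layout, based only on the two files quoted in this message: config.json nests each registry under an "auths" key, while .dockercfg is the bare registry map. `config_json_to_dockercfg` is a hypothetical helper; any other top-level keys a Docker version may add to config.json are simply dropped.

```python
import json


def config_json_to_dockercfg(config_json: str) -> str:
    """Rewrite a ~/.docker/config.json document in the legacy .dockercfg layout.

    Only the "auths" section is kept: each registry entry (auth, email) is
    moved to the top level, matching the legacy layout.
    """
    config = json.loads(config_json)
    legacy = dict(config.get("auths", {}))
    return json.dumps(legacy, indent=2)


if __name__ == "__main__":
    sample = ('{"auths": {"repo.example.com:5000": '
              '{"auth": "<snip>", "email": "<snip>"}}}')
    # Save this output as a file named .dockercfg before shipping it to the sandbox.
    print(config_json_to_dockercfg(sample))
```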
>>>>> On Thu, Mar 3, 2016 at 1:45 PM, Krish <[email protected]> wrote:
>>>>>
>>>>>>
>>>>>> I have got the Docker config file copied into the sandbox using the
>>>>>> thermos_executor_resources flag; however, Docker is still not able to
>>>>>> find the credentials file needed to pull an image from a private repo.
>>>>>>
>>>>>> When I try to use the library/hello-world:latest image from the public
>>>>>> Docker repo to check whether everything works without the credentials,
>>>>>> I encounter a different problem:
>>>>>> exec: "/bin/sh": stat /bin/sh: no such file or directory
>>>>>> Error response from daemon: Cannot start container
>>>>>> de93dc344d44b41bccccff49e508001a97ff23a8964e637d32a506a31fd4d946: [8]
>>>>>> System error: exec: "/bin/sh": stat /bin/sh: no such file or directory
>>>>>>
>>>>>> I was referring to this email for guidance on setting up a mesos
>>>>>> slave:
>>>>>> http://mail-archives.apache.org/mod_mbox/aurora-dev/201503.mbox/%3CCAKB1MkHR=+nonesllx9buwttdthsku46pw_wr4b+_z9p59+...@mail.gmail.com%3E
>>>>>>
>>>>>> So, I cannot get the credentials file to be used by Docker, and if I
>>>>>> bypass authentication, I can do a docker pull but hit a weird error
>>>>>> when launching the hello-world image.
>>>>>>
>>>>>> Am I missing any log files I should be checking? I currently refer to
>>>>>> the mesos-slave stdout and the sandbox stderr file.
>>>>>> Is there a configuration parameter I am missing?
>>>>>>
>>>>>> Any pointers will be really helpful. Thanks in advance.
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> κρισhναν
>>>>>>
>>>>>> On Sun, Feb 28, 2016 at 3:37 PM, Krish <[email protected]> wrote:
>>>>>>
>>>>>>> Continuing my earlier chain of thought, I found this in the Mesos
>>>>>>> bug list:
>>>>>>> MESOS-4242 - Allow Docker private registry credentials to be passed
>>>>>>> from framework.
>>>>>>> How does one pass credentials through the framework? It seems that
>>>>>>> .docker/config.json is not read by the slave.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> κρισhναν
>>>>>>>
>>>>>>> On Sat, Feb 27, 2016 at 11:46 PM, Krish <[email protected]> wrote:
>>>>>>>
>>>>>>>> I couldn't complete my PoC project earlier (I got busy with other
>>>>>>>> work). Well, it is never too late, and here's my update and issue.
>>>>>>>>
>>>>>>>> I have a 3-node ZK (3.5.1 alpha), mesos-master (v0.24.1) and Aurora
>>>>>>>> (v0.11.0) setup running.
>>>>>>>> Earlier I was stuck on a problem with Mesos 0.25.0 and Aurora 0.9.0,
>>>>>>>> where I got a protobuf 'field not set' error for the ExecutorInfo field.
>>>>>>>>
>>>>>>>> I have a Mesos agent running in a Docker container on CoreOS, and it
>>>>>>>> can access the host Docker just fine.
>>>>>>>> I have also put the Docker login credentials file at the right
>>>>>>>> location for it to access the private Docker registry.
>>>>>>>> I can manually trigger a docker pull and docker run without issues
>>>>>>>> from the slave (which is also reflected properly outside the slave
>>>>>>>> container with docker images and docker ps).
>>>>>>>>
>>>>>>>> However, when I try to run an Aurora job with the hello-docker
>>>>>>>> container, the slave logs that docker pull has failed; more
>>>>>>>> specifically:
>>>>>>>> " failed to start: Failed to 'docker pull
>>>>>>>> private_repo.com:5000/krish/test:latest': exit status = exited
>>>>>>>> with status 1 stderr = Error: image krish/test:latest not found"
>>>>>>>>
>>>>>>>> My hunch is that when docker run is used from the Aurora DSL, it does
>>>>>>>> not read the Docker credentials file properly and hence fails. I can
>>>>>>>> reproduce the exact same error when I delete the credentials file from
>>>>>>>> the slave and trigger a pull.
>>>>>>>>
>>>>>>>> Is the hunch right? If yes, is there a way to resolve this? Maybe
>>>>>>>> source it somehow before the run command?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> κρισhναν
>>>>>>>>
>>>>>>>> On Tue, Oct 27, 2015 at 10:35 PM, Bill Farner <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> (1) clusters.json is written by you, configuring the CLI client
>>>>>>>>> with instructions for what clusters are available and how to
>>>>>>>>> discover them.
>>>>>>>>>
>>>>>>>>> (2) That's expected: Mesos only allows one active replica of a
>>>>>>>>> framework at a time, and this signals which one is active.
>>>>>>>>>
>>>>>>>>> (3) The observer is essentially a web server that allows you to
>>>>>>>>> browse a task's sandbox directory and other information about it.
>>>>>>>>> You will need to configure it to run on your worker/agent nodes for
>>>>>>>>> that functionality to work (it's linked from the scheduler web UI).
>>>>>>>>>
>>>>>>>>> (4) You could indeed implement that behavior externally.  There is
>>>>>>>>> a reason:
>>>>>>>>> https://github.com/apache/aurora/blob/master/api/src/main/thrift/org/apache/aurora/gen/api.thrift#L556-L559
>>>>>>>>>
>>>>>>>>> (5) That is correct. The scheduler exposes a Thrift API that you
>>>>>>>>> would use (a REST API is coming, but ground has not yet been
>>>>>>>>> broken). If you go this route, I suggest you skip the DSL and use the
>>>>>>>>> JSON task description format that is shipped over the API. There
>>>>>>>>> isn't good documentation on this, but we can help you through it and
>>>>>>>>> would be grateful for a writeup of your approach!
>>>>>>>>>
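Point (4) above says the "spin up a machine after 5 minutes of PENDING" behaviour can live outside Aurora. A minimal Python sketch of just the decision step, assuming you already poll task statuses yourself (for example through the Thrift API mentioned in (5)); `TaskSnapshot` and `stuck_pending` are hypothetical names, and actually provisioning a machine is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass
class TaskSnapshot:
    """Hypothetical record filled in by whatever polls the scheduler."""
    task_id: str
    status: str
    status_since: datetime


def stuck_pending(tasks: Iterable[TaskSnapshot], now: datetime,
                  threshold: timedelta = timedelta(minutes=5)) -> List[str]:
    """Return ids of tasks that have been PENDING for at least `threshold`."""
    return [t.task_id for t in tasks
            if t.status == "PENDING" and now - t.status_since >= threshold]


if __name__ == "__main__":
    # Timestamp taken from the `aurora job status` output quoted in this thread.
    seen = [TaskSnapshot("hello_world/0", "PENDING",
                         datetime(2015, 10, 23, 4, 55, 33))]
    now = datetime(2015, 10, 23, 5, 1, 0)
    for task_id in stuck_pending(seen, now):
        print(f"{task_id} pending too long; request a new agent here")
```

Keeping the threshold check separate from polling and provisioning makes it easy to test and to swap the data source later.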
>>>>>>>>>
>>>>>>>>> On Mon, Oct 26, 2015 at 11:44 PM, Krish <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Folks,
>>>>>>>>>> Firstly, thanks for all the help. I am happy to report that I have
>>>>>>>>>> set up ZK, Mesos and Aurora, and can work further towards my idea of
>>>>>>>>>> having an auto-scaling cluster.
>>>>>>>>>> I have some further questions about the work done so far and things
>>>>>>>>>> I plan to do:
>>>>>>>>>>
>>>>>>>>>>    1. Is the /etc/aurora/clusters.json file created by the
>>>>>>>>>>    scheduler, or does it need to be handcrafted? I had to manually
>>>>>>>>>>    edit the file to get my `aurora job ...` CLI to work.
>>>>>>>>>>
>>>>>>>>>>    2. I am running a cluster of 3 CoreOS VMs on Vagrant with ZK,
>>>>>>>>>>    Mesos and Aurora in a Docker container. Only 1 of them outputs
>>>>>>>>>>    '1' when I look at the 'framework_registered' field. Is this
>>>>>>>>>>    expected? How do I verify that they are working as a cluster?
>>>>>>>>>>
>>>>>>>>>>    3. From the documentation, I see that there is an observer
>>>>>>>>>>    that needs to be listening on port 1338. What is the observer,
>>>>>>>>>>    and what is its purpose? I have Aurora listening only on ports
>>>>>>>>>>    8081 (HTTP port) and 8083 (libprocess).
>>>>>>>>>>
>>>>>>>>>>    4. I read about the 'PENDING' state in the Aurora documentation,
>>>>>>>>>>    as Bill suggested, and realize that it just shows that a task is
>>>>>>>>>>    waiting for some reason (for want of resources, in my case, as 0
>>>>>>>>>>    slaves have registered). I was thinking of adding a hook to the
>>>>>>>>>>    pending state; say, if a task is PENDING for 5 minutes for lack of
>>>>>>>>>>    resources in the cluster, then spin up a new machine. Is this the
>>>>>>>>>>    right approach to take? Does Aurora provide reasons for why a task
>>>>>>>>>>    is in the PENDING state?
>>>>>>>>>>
>>>>>>>>>>    => aurora job status testcluster/$USER/test/hello_world
>>>>>>>>>>     INFO] Checking status of testcluster/ubuntu/test/hello_world
>>>>>>>>>>    Active tasks (1):
>>>>>>>>>>           Task role: ubuntu, env: test, name: hello_world, instance: 0, status: PENDING on None
>>>>>>>>>>              cpus: 0.1, ram: 16 MB, disk: 16 MB
>>>>>>>>>>              events:
>>>>>>>>>>               2015-10-23 04:55:33 PENDING: None
>>>>>>>>>>    Inactive tasks (0):
>>>>>>>>>>
>>>>>>>>>>    5. Aurora defines jobs in a .aurora config file, and if I
>>>>>>>>>>    decide to increase/decrease the number of instances in my
>>>>>>>>>>    cluster, I need to create/overwrite the relevant .aurora file and
>>>>>>>>>>    trigger the `aurora update ...` command. Is this right?
>>>>>>>>>>    If yes, is there an HTTP API I can invoke remotely which
>>>>>>>>>>    triggers this update?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> κρισhναν
>>>>>>>>>>
>>>>>>>>>> On Fri, Oct 23, 2015 at 8:09 AM, Joshua Cohen <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> I suspect your error from `aurora job create ...` is due to the
>>>>>>>>>>> Aurora config you're using referencing `/vagrant/hello_world.py`,
>>>>>>>>>>> which does not exist (as you say, you're not even using Vagrant).
>>>>>>>>>>> Can you link the .aurora config you're using?
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>>
>>>>>>>>>>> Joshua
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Oct 22, 2015 at 3:22 PM, Krish <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks, Zameer.
>>>>>>>>>>>>
>>>>>>>>>>>> I had to modify /etc/aurora/clusters.json:
>>>>>>>>>>>> [
>>>>>>>>>>>>   {
>>>>>>>>>>>>     "auth_mechanism": "UNAUTHENTICATED",
>>>>>>>>>>>>     "name": "testcluster",
>>>>>>>>>>>>     "scheduler_zk_path": "/scheduler/aurora",
>>>>>>>>>>>>     "slave_root": "/var/lib/mesos",
>>>>>>>>>>>>     "slave_run_directory": "latest",
>>>>>>>>>>>>     "zk": "127.0.1.1"
>>>>>>>>>>>>   }
>>>>>>>>>>>> ]
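As Bill explains in (1) further up the thread, this file is handwritten configuration telling the CLI which clusters exist. A minimal Python sketch of how the cluster component of a job key such as testcluster/testrole/test/hellojob can be matched against it, assuming only that the file is a JSON list of objects keyed by "name" as shown above; `find_cluster` is hypothetical and is not the Aurora client's own loader.

```python
import json
from typing import Any, Dict


def find_cluster(clusters_json: str, name: str) -> Dict[str, Any]:
    """Return the clusters.json entry whose "name" matches `name`."""
    for cluster in json.loads(clusters_json):
        if cluster.get("name") == name:
            return cluster
    raise KeyError(f"cluster {name!r} is not defined in clusters.json")


if __name__ == "__main__":
    clusters = '''[{"auth_mechanism": "UNAUTHENTICATED", "name": "testcluster",
                    "scheduler_zk_path": "/scheduler/aurora",
                    "slave_root": "/var/lib/mesos",
                    "slave_run_directory": "latest", "zk": "127.0.1.1"}]'''
    # The first component of the job key selects the clusters.json entry.
    job_key = "testcluster/testrole/test/hellojob"
    entry = find_cluster(clusters, job_key.split("/")[0])
    print(entry["zk"], entry["scheduler_zk_path"])
```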
>>>>>>>>>>>>
>>>>>>>>>>>> I have a hello_world.aurora in my home folder. However, the
>>>>>>>>>>>> following command errors out:
>>>>>>>>>>>> ~$ aurora job create testcluster/testrole/test/hellojob ./hello_world.aurora
>>>>>>>>>>>> Error loading configuration: [Errno 2] No such file or directory: '/vagrant/hello_world.py'
>>>>>>>>>>>>
>>>>>>>>>>>> A job list does work:
>>>>>>>>>>>> ~$ aurora job list testcluster
>>>>>>>>>>>>  INFO] Retrieving jobs for role None
>>>>>>>>>>>>
>>>>>>>>>>>> I am not even using Vagrant. I am running ZK and Mesos on the same
>>>>>>>>>>>> machine as Aurora. How do I submit these job templates to Aurora?
>>>>>>>>>>>>
>>>>>>>>>>>> Any pointers to documentation will be helpful.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Oct 21, 2015 at 11:09 PM, Zameer Manji <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Version 0.9.0 does not have the gc executor. Version 0.9.0
>>>>>>>>>>>>> uses Mesos' task reconciliation
>>>>>>>>>>>>> <http://mesos.apache.org/documentation/latest/reconciliation/> API
>>>>>>>>>>>>> instead.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Oct 21, 2015 at 9:28 AM, Krish <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks, Bill, for pointing me to the debs. I was finally able
>>>>>>>>>>>>>> to run Aurora. :)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I did find thermos_executor.pex and thermos_observer after
>>>>>>>>>>>>>> installing aurora-executor. I still could not find
>>>>>>>>>>>>>> gc_executor.pex on my system.
>>>>>>>>>>>>>> Is there a location from which I can download the *.pex
>>>>>>>>>>>>>> binaries, or build them from scratch?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> root@dev:/# find . -name "*.pex"
>>>>>>>>>>>>>> ./usr/share/aurora/bin/thermos_executor.pex
>>>>>>>>>>>>>> ./usr/share/aurora/bin/kaurora_admin.pex
>>>>>>>>>>>>>> ./usr/share/aurora/bin/kaurora.pex
>>>>>>>>>>>>>> ./usr/share/aurora/bin/thermos.pex
>>>>>>>>>>>>>> ./usr/share/aurora/bin/thermos_observer.pex
>>>>>>>>>>>>>> ./home/ubuntu/.pex
>>>>>>>>>>>>>> ./root/.pex
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 11:46 PM, Bill Farner <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Aurora currently requires an executor, so setting it to
>>>>>>>>>>>>>>> /dev/null will not work. Happy to talk further about your
>>>>>>>>>>>>>>> thoughts around sidestepping the executor.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> As for working with the scheduler source code, it's a
>>>>>>>>>>>>>>> standard Gradle project and we tend to use IntelliJ. Docs to
>>>>>>>>>>>>>>> help you ramp up on that:
>>>>>>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/developing-aurora-scheduler.md
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> As for builds, the .zip is a source distribution, so it won't
>>>>>>>>>>>>>>> have any pre-built binaries. If you're on Debian, we have
>>>>>>>>>>>>>>> official debs here: https://bintray.com/apache/aurora
>>>>>>>>>>>>>>> You can see how they're built (and build your own packages)
>>>>>>>>>>>>>>> here: https://github.com/apache/aurora-packaging
>>>>>>>>>>>>>>> We're close to having official RPMs, but none to speak of
>>>>>>>>>>>>>>> yet.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 9:47 AM, Krish <[email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Stephan,
>>>>>>>>>>>>>>>> I am trying to get started and run Aurora without the Thermos
>>>>>>>>>>>>>>>> executor (setting it to /dev/null does not help), on a local
>>>>>>>>>>>>>>>> Linux box for now, and I plan to containerize/dockerize it later.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Can you please point me to the right documentation (or to the
>>>>>>>>>>>>>>>> CLI parsing source code) that can help me resolve this?
>>>>>>>>>>>>>>>> Also, are there any steps to import the source code into
>>>>>>>>>>>>>>>> Eclipse to browse and analyze it?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Also, where do I find all the *.pex files? They are not
>>>>>>>>>>>>>>>> present in the zip file nor anywhere in the built source code.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I know I am asking too many questions on a single thread
>>>>>>>>>>>>>>>> here, and would appreciate the help.
>>>>>>>>>>>>>>>> I think at the end of this, I will put the steps I followed
>>>>>>>>>>>>>>>> in a gist/blog so others might find their way around and not
>>>>>>>>>>>>>>>> struggle as much.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 4:09 PM, Erb, Stephan <[email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Krish,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> you don't have to set framework_authentication_file and
>>>>>>>>>>>>>>>>> zk_digest_credentials. The scheduler help text is misleading
>>>>>>>>>>>>>>>>> here, as everything will work fine if you leave those empty.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> In addition, it looks like you are misunderstanding the usage
>>>>>>>>>>>>>>>>> of the thermos_executor_path command line flag of the
>>>>>>>>>>>>>>>>> scheduler. It is supposed to point to the binary containing
>>>>>>>>>>>>>>>>> the generic Aurora executor (thermos_executor.pex). You only
>>>>>>>>>>>>>>>>> need the hello_world.aurora once your scheduler is up and
>>>>>>>>>>>>>>>>> running. It serves as an example input for the aurora command
>>>>>>>>>>>>>>>>> line client, which can be used to schedule jobs and services
>>>>>>>>>>>>>>>>> on an Aurora master.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
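The thermos_executor_path mix-up above is easy to catch before launching the scheduler. A minimal Python sketch of such a pre-flight check, using Krish's own flags from this thread; `parse_flags` and `check_executor_flag` are hypothetical helpers, and the ".pex" suffix test is only a heuristic based on the /usr/share/aurora/bin/thermos_executor.pex path seen earlier, not a rule enforced by Aurora.

```python
from typing import Dict, List


def parse_flags(argv: List[str]) -> Dict[str, str]:
    """Collect '-name=value' arguments into a dict (other tokens are ignored)."""
    flags = {}
    for arg in argv:
        if arg.startswith("-") and "=" in arg:
            name, value = arg.lstrip("-").split("=", 1)
            flags[name] = value
    return flags


def check_executor_flag(flags: Dict[str, str]) -> List[str]:
    """Report problems with -thermos_executor_path, per the advice above."""
    path = flags.get("thermos_executor_path", "")
    if not path:
        return ["-thermos_executor_path is required"]
    if not path.endswith(".pex"):
        return [f"-thermos_executor_path={path} does not look like the "
                "thermos_executor.pex binary"]
    return []


if __name__ == "__main__":
    argv = ["-cluster_name=tc",
            "-thermos_executor_path=/home/ubuntu/hello_world.aurora"]
    print(check_executor_flag(parse_flags(argv)))
```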
>>>>>>>>>>>>>>>>> Have you tried to use the Vagrant box? Just type `vagrant up`
>>>>>>>>>>>>>>>>> in a checkout of the Aurora source code. It gives you a running
>>>>>>>>>>>>>>>>> scheduler to play with. Once you have understood how it works,
>>>>>>>>>>>>>>>>> you can start trying to install it on your own (by
>>>>>>>>>>>>>>>>> reverse-engineering the Vagrant box).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hope this helps a little,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>>>>>> *From:* Krish <[email protected]>
>>>>>>>>>>>>>>>>> *Sent:* Tuesday, October 20, 2015 11:39 AM
>>>>>>>>>>>>>>>>> *To:* Bill Farner
>>>>>>>>>>>>>>>>> *Cc:* [email protected]; Erb, Stephan
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Bill/Stephan,
>>>>>>>>>>>>>>>>> I still get a stacktrace when running the Aurora scheduler
>>>>>>>>>>>>>>>>> CLI.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I do not know what to specify for
>>>>>>>>>>>>>>>>> -framework_authentication_file and -zk_digest_credentials, and
>>>>>>>>>>>>>>>>> they are required arguments.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I am not using any authentication on the Mesos master; do I
>>>>>>>>>>>>>>>>> still need the framework_authentication_file parameter?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> rm -rf /db /backup_dir
>>>>>>>>>>>>>>>>> mesos-log initialize --path="/db"
>>>>>>>>>>>>>>>>> export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
>>>>>>>>>>>>>>>>> JAVA_OPTS="-Xmx1536m  -Xms256m"
>>>>>>>>>>>>>>>>> /usr/local/aurora-scheduler/bin/aurora-scheduler 
>>>>>>>>>>>>>>>>> -backup_dir=/backup_dir
>>>>>>>>>>>>>>>>> -cluster_name=tc 
>>>>>>>>>>>>>>>>> -mesos_master_address=zk://localhost:2181/mesos/master
>>>>>>>>>>>>>>>>> -serverset_path=/scheduler/aurora -zk_endpoints=localhost:2181
>>>>>>>>>>>>>>>>> -native_log_quorum_size=1 -vlog=SEVERE -logtostderr=false
>>>>>>>>>>>>>>>>> -native_log_file_path=/db
>>>>>>>>>>>>>>>>> -thermos_executor_path=/home/ubuntu/hello_world.aurora
>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>> INFO: Binding org.apache.aurora.scheduler.http.Utilization to GuiceManagedComponentProvider with the scope "PerRequest"
>>>>>>>>>>>>>>>>> Oct 20, 2015 9:27:40 AM org.apache.aurora.scheduler.cron.quartz.CronModule provideTimeZone
>>>>>>>>>>>>>>>>> WARNING: Cron schedules are configured to fire according to timezone Greenwich Mean Time but system timezone is set to Coordinated Universal Time
>>>>>>>>>>>>>>>>> Oct 20, 2015 9:27:41 AM org.eclipse.jetty.server.AbstractConnector doStart
>>>>>>>>>>>>>>>>> INFO: Started [email protected]:43843
>>>>>>>>>>>>>>>>> E1020 09:27:41.290 THREAD1 org.apache.aurora.scheduler.SchedulerLifecycle$9.execute: Caught unchecked exception: com.google.inject.ProvisionException: Guice provision errors:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1) Error in custom provider, java.lang.IllegalArgumentException: Path cannot be null
>>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>>>>>>>>>>>>   while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1 error
>>>>>>>>>>>>>>>>> com.google.inject.ProvisionException: Guice provision errors:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1) Error in custom provider, java.lang.IllegalArgumentException: Path cannot be null
>>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>>>>>>>>>>>>   while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1 error
>>>>>>>>>>>>>>>>>         at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:987)
>>>>>>>>>>>>>>>>>         at org.apache.aurora.scheduler.log.mesos.MesosLog.open(MesosLog.java:136
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 6:14 AM, Bill Farner <[email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The typical flow is that you keep your .aurora file checked
>>>>>>>>>>>>>>>>>> into git, and commit every time you deploy/update. When you
>>>>>>>>>>>>>>>>>> change your file, you instruct Aurora to update the live job
>>>>>>>>>>>>>>>>>> (have a look at aurora update -h). Aurora will perform a
>>>>>>>>>>>>>>>>>> rolling upgrade of your job to the new config. You'll use this
>>>>>>>>>>>>>>>>>> same flow for updating your job's software as well as resizing
>>>>>>>>>>>>>>>>>> the job.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> For (3), you could set up alerting on stats that the scheduler
>>>>>>>>>>>>>>>>>> exports. Have a look here for monitoring background:
>>>>>>>>>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/monitoring.md
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> You'll want to look at scheduler stats related to 'pending'.
>>>>>>>>>>>>>>>>>>
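Following the pointer to the scheduler's exported stats, here is a minimal Python sketch of pulling out the 'pending'-related ones. It assumes the scheduler serves its stats as plain-text "name value" lines at /vars on its HTTP port (8081 in Krish's setup); `fetch_vars` and `pending_stats` are hypothetical helpers, and the sample stat names are made up, so check monitoring.md for the real ones.

```python
import urllib.request
from typing import Dict


def fetch_vars(base_url: str = "http://localhost:8081") -> str:
    """Fetch the scheduler's plain-text stats (assumed to be served at /vars)."""
    with urllib.request.urlopen(base_url + "/vars") as response:
        return response.read().decode("utf-8")


def pending_stats(vars_text: str) -> Dict[str, str]:
    """Keep only 'name value' lines whose stat name mentions 'pending'."""
    stats = {}
    for line in vars_text.splitlines():
        name, sep, value = line.partition(" ")
        if sep and "pending" in name.lower():
            stats[name] = value.strip()
    return stats


if __name__ == "__main__":
    # Hypothetical stat names; real names depend on the Aurora version.
    sample = "example_pending_tasks 3\nexample_uptime_secs 120\n"
    print(pending_stats(sample))  # {'example_pending_tasks': '3'}
```

Parsing is kept separate from fetching so the filter can be tested offline and reused by an alerting job.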
>>>>>>>>>>>>>>>>>> On Mon, Oct 19, 2015 at 12:16 PM, Krish <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks for the pointer. Now I notice that the aurora-scheduler
>>>>>>>>>>>>>>>>>>> script has --thermos_executor_path as a mandatory argument.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I have a couple of questions on how the
>>>>>>>>>>>>>>>>>>> thermos_executor/.aurora config file functions:
>>>>>>>>>>>>>>>>>>> 1. Do we have to statically define the file beforehand?
>>>>>>>>>>>>>>>>>>> 2. What happens when we want to dynamically change the config,
>>>>>>>>>>>>>>>>>>> say, increasing the number of instances of a service? Does
>>>>>>>>>>>>>>>>>>> Aurora require a restart then?
>>>>>>>>>>>>>>>>>>> 3. How do I get notified about the message Mesos sends when it
>>>>>>>>>>>>>>>>>>> cannot schedule tasks for lack of resources? Should I depend on
>>>>>>>>>>>>>>>>>>> Aurora for this or look for a hook into Mesos?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I think a little bit of context would help here.
>>>>>>>>>>>>>>>>>>> What I plan to check is running a very basic job/task inside a
>>>>>>>>>>>>>>>>>>> Docker container with Aurora, waiting for a 'resource not
>>>>>>>>>>>>>>>>>>> available' message from Mesos, and accordingly calling an API
>>>>>>>>>>>>>>>>>>> to spin up a new node in my cluster.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Mon, Oct 19, 2015 at 1:24 PM, Erb, Stephan <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I believe you are missing the thermos_executor options
>>>>>>>>>>>>>>>>>>>> that have to be passed to the scheduler command line.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> See
>>>>>>>>>>>>>>>>>>>> https://github.com/apache/aurora/blob/33d7e2170a86f54722a02a2dc9cb1e09fb52df25/examples/vagrant/upstart/aurora-scheduler.conf#L39
>>>>>>>>>>>>>>>>>>>> for an example
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>>>>>>>>> *From:* Krish <[email protected]>
>>>>>>>>>>>>>>>>>>>> *Sent:* Monday, October 19, 2015 8:45 AM
>>>>>>>>>>>>>>>>>>>> *To:* [email protected]
>>>>>>>>>>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>> I am a n00b with Apache Aurora and am trying to experiment with
>>>>>>>>>>>>>>>>>>>> some things on my local machine, with ZooKeeper and mesos-master
>>>>>>>>>>>>>>>>>>>> running locally. They have initialized properly. When I try to
>>>>>>>>>>>>>>>>>>>> run Aurora with the required options, I get the following error,
>>>>>>>>>>>>>>>>>>>> and googling hasn't helped me much here.
>>>>>>>>>>>>>>>>>>>> Appreciate any help. Thanks in advance.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>>>> WARNING: Method [public void org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl.applySnapshot(java.lang.Object)] is synthetic and is being intercepted by [com.twitter.common.inject.TimedInterceptor@604c5de8]. This could indicate a bug.  The method may be intercepted twice, or may not be intercepted at all.
>>>>>>>>>>>>>>>>>>>> Exception in thread "main" com.google.inject.CreationException: Guice creation errors:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 1) An exception was caught and reported. Message: A value may only be retrieved from a variable that has a default or has been set.
>>>>>>>>>>>>>>>>>>>>   at com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 2) Could not find a suitable constructor in org.apache.aurora.scheduler.mesos.ExecutorSettings. Classes must have either one (and only one) constructor annotated with @Inject or a zero-argument constructor that is not private.
>>>>>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.mesos.ExecutorSettings.class(ExecutorSettings.java:43)
>>>>>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:204)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 2 errors
>>>>>>>>>>>>>>>>>>>>         at com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:435)
>>>>>>>>>>>>>>>>>>>>         at com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:154)
>>>>>>>>>>>>>>>>>>>>         at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:106)
>>>>>>>>>>>>>>>>>>>>         at com.google.inject.Guice.createInjector(Guice.java:95)
>>>>>>>>>>>>>>>>>>>>         at com.google.inject.Guice.createInjector(Guice.java:83)
>>>>>>>>>>>>>>>>>>>>         at com.twitter.common.application.AppLauncher.configureInjection(AppLauncher.java:120)
>>>>>>>>>>>>>>>>>>>>         at com.twitter.common.application.AppLauncher.run(AppLauncher.java:87)
>>>>>>>>>>>>>>>>>>>>         at com.twitter.common.application.AppLauncher.launch(AppLauncher.java:181)
>>>>>>>>>>>>>>>>>>>>         at com.twitter.common.application.AppLauncher.launch(AppLauncher.java:142)
>>>>>>>>>>>>>>>>>>>>         at org.apache.aurora.scheduler.app.SchedulerMain.main(SchedulerMain.java:263)
>>>>>>>>>>>>>>>>>>>> Caused by: java.lang.IllegalStateException: A value may only be retrieved from a variable that has a default or has been set.
>>>>>>>>>>>>>>>>>>>>         at com.google.common.base.Preconditions.checkState(Preconditions.java:176)
>>>>>>>>>>>>>>>>>>>>         at com.twitter.common.args.Arg.get(Arg.java:82)
>>>>>>>>>>>>>>>>>>>>         at org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:206)
>>>>>>>>>>>>>>>>>>>>         at com.google.inject.AbstractModule.configure(AbstractModule.java:59)
>>>>>>>>>>>>>>>>>>>>         at com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>>>>>>>>>         at com.google.inject.util.Modules$2.configure(Modules.java:114)
>>>>>>>>>>>>>>>>>>>>         at com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>>>>>>>>>         at com.google.inject.spi.Elements.getElements(Elements.java:101)
>>>>>>>>>>>>>>>>>>>>         at com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>>>>>>>>>         at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:103)
>>>>>>>>>>>>>>>>>>>>         ... 7 more
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Complete logs are at http://pastebin.com/i72HvbYi
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Zameer Manji
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>
>>> --
>>>
>>> Thumb typed mail
>>>
>>>
>
