Re: Stacktrace when running Apache Aurora

Bill Farner Thu, 03 Mar 2016 07:37:10 -0800

Wow!  I'm glad you got it working!  To help the next poor soul trying to do
this, would you be willing to put up a doc patch on our end?


On Thursday, March 3, 2016, Krish <[email protected]> wrote:

> TL;DR:
> Use only a file named .dockercfg for docker credentials in mesos
> tasks!
>
> Long story:
> ---------------
> Holy smokescreens!
> This is for reporting & documenting purposes only, so that others don't
> have to pull their hair like I did for the past few evenings!
>
> A little background:
> I am running Ubuntu 14.04 on my system and docker stores its credentials
> in the ~/.docker/config.json as
> cat ~/.docker/config.json
> {
> "auths": {
> "repo.example.com:5000": {
> "auth": "<snip>",
> "email": "<snip>"
> }
> }
> }
>
> And I am doing all these experiments on a coreOS system which stores the
> credentials  in ~/.dockercfg as
> core@aurora-1 ~ $ cat ~/.dockercfg
> {
>   "repo.example.com:5000": {
>     "auth": "<snip>",
>     "email": "<snip>"
>   }
> }
>
> Since my container was an Ubuntu 14.04 container (as was my local system),
> I used the Ubuntu credential file format, i.e. ~/.docker/config.json.
> The slave task could not read the docker credentials stored that way.
> After digging through the aurora, mesos, and thermos source code (with a
> lot of find, grep and regex matching), I saw in
> mesos/src/docker/docker.cpp:
>
>   // Set HOME variable to pick up *.dockercfg*.
>   map<string, string> environment = os::environment();
>
>   environment["HOME"] = directory;
>
> I changed the filename and the JSON content, changed the
> thermos_executor_resources flag, and bam, docker pull works!
>
> Well, the mesos documentation does say "To run an image from a private
> repository, one can include the URI pointing to a .dockercfg that contains
> login information." and I must have read it a dozen times!
> But I never thought that they literally meant '.dockercfg' as the name of
> the file!
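
For anyone hitting the same problem: the two credential layouts quoted above
differ only in the "auths" wrapper, so the conversion can be scripted. Below
is a minimal sketch in Python, assuming the config.json has the shape shown
above. The function name to_dockercfg is made up for illustration and is not
part of docker, mesos, or aurora.

```python
import json


def to_dockercfg(config_json_text):
    """Convert ~/.docker/config.json content to the legacy .dockercfg layout.

    Assumption: the input looks like the config.json quoted in this thread,
    i.e. registry entries are nested under a top-level "auths" key.
    """
    config = json.loads(config_json_text)
    auths = config.get("auths")
    if auths is None:
        raise ValueError('no "auths" key; input may already be a .dockercfg')
    # The legacy layout puts the registry entries at the top level.
    return json.dumps(auths, indent=2, sort_keys=True)


example = (
    '{"auths": {"repo.example.com:5000": '
    '{"auth": "<snip>", "email": "<snip>"}}}'
)
print(to_dockercfg(example))
```

The output would then be saved under the exact file name .dockercfg before it
is handed to the task.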
>
>
>
>
> --
> κρισhναν
>
> On Thu, Mar 3, 2016 at 1:45 PM, Krish <[email protected]> wrote:
>
>>
>> I have got the docker config file copied into the sandbox using the
>> thermos_executor_resources flag; however docker is still not able to find
>> the credentials file for doing an appropriate pull of image from a private
>> repo.
>>
>> When I try to use the library/hello-world:latest image from public docker
>> repo to check if everything works fine without the credentials, I encounter
>> a different problem:
>> exec: "/bin/sh": stat /bin/sh: no such file or directory
>> Error response from daemon: Cannot start container
>> de93dc344d44b41bccccff49e508001a97ff23a8964e637d32a506a31fd4d946: [8]
>> System error: exec: "/bin/sh": stat /bin/sh: no such file or directory
>>
>> I was referring to this email for guidance on setting up a mesos slave:
>> http://mail-archives.apache.org/mod_mbox/aurora-dev/201503.mbox/%3CCAKB1MkHR=+nonesllx9buwttdthsku46pw_wr4b+_z9p59+...@mail.gmail.com%3E
>>
>> So, I cannot get the credentials file to be used by docker, and if I
>> bypass authentication, I can do a docker pull, but encounter a weird error
>> in launching the hello-world image.
>>
>> Am I missing out on checking any log files generated? I currently refer
>> to mesos-slave stdout and the sandbox stderr file.
>> Any configuration parameter I am missing for this to happen?
>>
>> Any pointers will be really helpful. Thanks in advance.
>>
>>
>>
>> --
>> κρισhναν
>>
>> On Sun, Feb 28, 2016 at 3:37 PM, Krish <[email protected]> wrote:
>>
>>> Continuing my earlier chain of thought, I found this in the mesos bug
>>> list:
>>> MESOS-4242 - Allow Docker private registry credentials to be passed from
>>> framework.
>>> How does one pass credentials using the framework? It seems that
>>> .docker/config.json is not read on the slave.
>>>
>>>
>>>
>>>
>>> --
>>> κρισhναν
>>>
>>> On Sat, Feb 27, 2016 at 11:46 PM, Krish <[email protected]> wrote:
>>>
>>>> I couldn't complete my PoC project earlier (got busy with other
>>>> work). Well, it is never too late and here's my update and issue.
>>>>
>>>> I have a 3 node zk (3.5.1 alpha), mesos-master (v0.24.1) & aurora
>>>> (v0.11.0) running.
>>>> I was stuck in a problem where I was using mesos 0.25.0 & aurora 0.9.0
>>>> & got a protobuf field not set error - ExecutorInfo field.
>>>>
>>>> I have a mesos agent running in docker container on coreos and it can
>>>> access the host docker just fine.
>>>> I have also put the docker login credentials file at the right location
>>>> for it to access the private docker registry.
>>>> I can manually trigger a docker pull and docker run without issues from
>>>> the slave (which is also reflected properly outside the slave container
>>>> with docker images and docker ps).
>>>>
>>>> However, when I try to run an aurora job with the hello-docker container,
>>>> the slave logs that docker pull has failed; more specifically:
>>>> " failed to start: Failed to 'docker pull
>>>> private_repo.com:5000/krish/test:latest': exit status = exited with
>>>> status 1 stderr = Error: image krish/test:latest not found"
>>>>
>>>> My hunch is that when using docker run from aurora DSL, it does not
>>>> read the docker credentials file properly and hence fails. I can reproduce
>>>> the exact same error when I delete the credentials file from the slave and
>>>> trigger a pull.
>>>>
>>>> Is the hunch right? If yes, is there a way to resolve this? Maybe
>>>> source it some way before the run command?
>>>>
>>>>
>>>>
>>>> --
>>>> κρισhναν
>>>>
>>>> On Tue, Oct 27, 2015 at 10:35 PM, Bill Farner <[email protected]> wrote:
>>>>
>>>>> (1) clusters.json is written by you, configuring the CLI client with
>>>>> instructions for what clusters are available and how to discover them.
>>>>>
>>>>> (2) That's expected - mesos only allows one active replica of a
>>>>> framework at a time, this signals which one is active.
>>>>>
>>>>> (3) The observer is essentially a web server that allows you to browse
>>>>> a task's sandbox directory and other information about it.  You will need
>>>>> to configure it to run on your worker/agent nodes for that functionality
>>>>> to work (it's linked from the scheduler web UI).
>>>>>
>>>>> (4) You could indeed implement that behavior externally.  There is a
>>>>> reason:
>>>>> https://github.com/apache/aurora/blob/master/api/src/main/thrift/org/apache/aurora/gen/api.thrift#L556-L559
>>>>>
>>>>> (5) That is correct.  The scheduler exposes a thrift API that you
>>>>> would use (a REST API is coming, but ground has not yet been broken).  If
>>>>> you go this route, I suggest you skip the DSL and use the JSON task
>>>>> description format that is shipped over the API.  There isn't good
>>>>> documentation on this, but we can help you through it and would be
>>>>> grateful for a writeup of your approach!
>>>>>
>>>>>
>>>>> On Mon, Oct 26, 2015 at 11:44 PM, Krish <[email protected]> wrote:
>>>>>
>>>>>> Hi Folks,
>>>>>> Firstly, thanks for all the help. Am happy to report that I have set
>>>>>> up zk, mesos & aurora, & can work further towards my idea of having an
>>>>>> auto-scaling cluster.
>>>>>> I have some further questions about the work done so far & things I
>>>>>> plan to do:
>>>>>>
>>>>>>    1. Is the /etc/aurora/clusters.json file created by the scheduler
>>>>>>    or does it need to be handcrafted? I had to manually edit the file to
>>>>>>    get my `aurora job ...` cli to work.
>>>>>>
>>>>>>    2. I am running a cluster of 3 coreOS VMs on vagrant with zk,
>>>>>>    mesos & aurora in a docker container. Only 1 of them outputs '1' when
>>>>>>    I look at the 'framework_registered' field. Is this expected? How do I
>>>>>>    verify that they are working as a cluster?
>>>>>>
>>>>>>    3. From the documentation, I see that there is an observer that
>>>>>>    needs to be listening on port 1338. What is the observer socket & its
>>>>>>    purpose? I have aurora listening only on ports 8081 (http port) & 8083
>>>>>>    (libprocess).
>>>>>>
>>>>>>    4. I read about the 'PENDING' field in the aurora documentation, as
>>>>>>    Bill suggested, & realize that it just shows that a task is waiting
>>>>>>    for some reason (for want of resources, in my case, as 0 slaves have
>>>>>>    registered). I was thinking of adding a hook to the pending state;
>>>>>>    say, if a task is PENDING for 5 minutes for lack of resources in the
>>>>>>    cluster, then spin up a new machine. Is this the right approach to
>>>>>>    take? Does aurora provide reasons for why a task is in the PENDING
>>>>>>    state?
>>>>>>
>>>>>>    => aurora job status testcluster/$USER/test/hello_world
>>>>>>     INFO] Checking status of testcluster/ubuntu/test/hello_world
>>>>>>    Active tasks (1):
>>>>>>           Task role: ubuntu, env: test, name: hello_world, instance: 0, status: PENDING on None
>>>>>>              cpus: 0.1, ram: 16 MB, disk: 16 MB
>>>>>>              events:
>>>>>>               2015-10-23 04:55:33 PENDING: None
>>>>>>    Inactive tasks (0):
>>>>>>
>>>>>>    5. Aurora defines jobs in a .aurora config file & if I decide to
>>>>>>    increase/decrease the number of instances in my cluster, then I need
>>>>>>    to create/overwrite the relevant .aurora file and trigger the
>>>>>>    `aurora update ...` command. Is this right?
>>>>>>    If yes, is there an HTTP API I can invoke remotely which triggers
>>>>>>    this update?
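
The external hook from question 4 could be approximated by polling
`aurora job status` and reading its event lines. What follows is only a
sketch in Python under several assumptions: the event lines look like the
"2015-10-23 04:55:33 PENDING: None" line in the output quoted above, the
output describes a single task, and the timestamps use the same clock as the
caller. The names last_event and should_scale_up are hypothetical, and
spinning up a machine is left to the caller.

```python
import re
from datetime import datetime, timedelta

# Matches event lines such as "2015-10-23 04:55:33 PENDING: None".
EVENT = re.compile(
    r"^\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) ([A-Z_]+):", re.MULTILINE
)


def last_event(status_output):
    """Return (timestamp, status) of the last event line, or None."""
    events = EVENT.findall(status_output)
    if not events:
        return None
    stamp, status = events[-1]
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), status


def should_scale_up(status_output, now, threshold=timedelta(minutes=5)):
    """True if the most recent event is PENDING and is at least `threshold` old."""
    event = last_event(status_output)
    if event is None:
        return False
    since, status = event
    return status == "PENDING" and now - since >= threshold


sample = """Active tasks (1):
          events:
           2015-10-23 04:55:33 PENDING: None
Inactive tasks (0):
"""
print(should_scale_up(sample, now=datetime(2015, 10, 23, 5, 1, 0)))
```

A task that is PENDING for a reason other than missing resources would also
trigger this check, so a real hook would need an additional signal.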
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> κρισhναν
>>>>>>
>>>>>> On Fri, Oct 23, 2015 at 8:09 AM, Joshua Cohen <[email protected]> wrote:
>>>>>>
>>>>>>> I suspect your error from `aurora job create ...` is due to the
>>>>>>> aurora config you're using referencing `/vagrant/hello_world.py`, which
>>>>>>> does not exist (as you say: you're not even using Vagrant). Can you link
>>>>>>> the .aurora config you're using?
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Joshua
>>>>>>>
>>>>>>> On Thu, Oct 22, 2015 at 3:22 PM, Krish <[email protected]> wrote:
>>>>>>>
>>>>>>>> Thanks, Zameer.
>>>>>>>>
>>>>>>>> I had to modify  /etc/aurora/clusters.json:
>>>>>>>> [
>>>>>>>>   {
>>>>>>>>     "auth_mechanism": "UNAUTHENTICATED",
>>>>>>>>     "name": "testcluster",
>>>>>>>>     "scheduler_zk_path": "/scheduler/aurora",
>>>>>>>>     "slave_root": "/var/lib/mesos",
>>>>>>>>     "slave_run_directory": "latest",
>>>>>>>>     "zk": "127.0.1.1"
>>>>>>>>   }
>>>>>>>> ]
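
Since clusters.json has to be written by hand, a small script can generate it
and check that a job key names a configured cluster. This is a sketch in
Python, assuming the same keys as the file quoted above and the
cluster/role/env/name job key layout used in this thread. The helper names
render_clusters_json and cluster_of are hypothetical, and the default values
are simply copied from the quoted file.

```python
import json


def render_clusters_json(name, zk_host, scheduler_zk_path="/scheduler/aurora"):
    """Return clusters.json text for a single unauthenticated cluster."""
    cluster = {
        "auth_mechanism": "UNAUTHENTICATED",
        "name": name,
        "scheduler_zk_path": scheduler_zk_path,
        # Values below are copied from the file quoted in this thread.
        "slave_root": "/var/lib/mesos",
        "slave_run_directory": "latest",
        "zk": zk_host,
    }
    return json.dumps([cluster], indent=2, sort_keys=True)


def cluster_of(job_key):
    """Return the cluster part of a cluster/role/env/name job key."""
    parts = job_key.split("/")
    if len(parts) != 4:
        raise ValueError("expected cluster/role/env/name, got %r" % job_key)
    return parts[0]


text = render_clusters_json("testcluster", "127.0.1.1")
known = {c["name"] for c in json.loads(text)}
print(cluster_of("testcluster/testrole/test/hellojob") in known)
```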
>>>>>>>>
>>>>>>>> I have a hello_world.aurora in my home folder. However the
>>>>>>>> following command errors out:
>>>>>>>> ~$ aurora job create testcluster/testrole/test/hellojob
>>>>>>>> ./hello_world.aurora
>>>>>>>> Error loading configuration: [Errno 2] No such file or directory:
>>>>>>>> '/vagrant/hello_world.py'
>>>>>>>>
>>>>>>>> A job list does work:
>>>>>>>> ~$ aurora job list testcluster
>>>>>>>>  INFO] Retrieving jobs for role None
>>>>>>>>
>>>>>>>> I am not even using vagrant. I am using zk & mesos on the same
>>>>>>>> machine as aurora. How do I submit these job templates to aurora?
>>>>>>>>
>>>>>>>> Any pointers to documentation will be helpful.
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> κρισhναν
>>>>>>>>
>>>>>>>> On Wed, Oct 21, 2015 at 11:09 PM, Zameer Manji <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Version 0.9.0 does not have the gc executor. Version 0.9.0 uses
>>>>>>>>> Mesos' task reconciliation
>>>>>>>>> <http://mesos.apache.org/documentation/latest/reconciliation/> API
>>>>>>>>> instead.
>>>>>>>>>
>>>>>>>>> On Wed, Oct 21, 2015 at 9:28 AM, Krish <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks Bill for the location to the debs. I was finally able to
>>>>>>>>>> run aurora. :)
>>>>>>>>>>
>>>>>>>>>> I did find thermos_executor.pex & thermos_observer after
>>>>>>>>>> installing aurora-executor. I still could not find gc_executor.pex
>>>>>>>>>> on my system.
>>>>>>>>>> Is there a location from where I can download the binaries for
>>>>>>>>>> *.pex or build them from scratch?
>>>>>>>>>>
>>>>>>>>>> root@dev:/# find . -name "*.pex"
>>>>>>>>>> ./usr/share/aurora/bin/thermos_executor.pex
>>>>>>>>>> ./usr/share/aurora/bin/kaurora_admin.pex
>>>>>>>>>> ./usr/share/aurora/bin/kaurora.pex
>>>>>>>>>> ./usr/share/aurora/bin/thermos.pex
>>>>>>>>>> ./usr/share/aurora/bin/thermos_observer.pex
>>>>>>>>>> ./home/ubuntu/.pex
>>>>>>>>>> ./root/.pex
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> κρισhναν
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 20, 2015 at 11:46 PM, Bill Farner <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Aurora currently requires an executor, so setting it to
>>>>>>>>>>> /dev/null will not work.  Happy to talk further about your thoughts
>>>>>>>>>>> around sidestepping the executor.
>>>>>>>>>>>
>>>>>>>>>>> As for working with the scheduler source code, it's a standard
>>>>>>>>>>> gradle project and we tend to use intellij.  Docs to help ramp up on
>>>>>>>>>>> that:
>>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/developing-aurora-scheduler.md
>>>>>>>>>>>
>>>>>>>>>>> As for builds - the .zip is a source distribution, so it won't
>>>>>>>>>>> have any pre-built binaries.  If you're on debian, we have official
>>>>>>>>>>> debs here: https://bintray.com/apache/aurora
>>>>>>>>>>> You can see how they're built (and can build your own packages)
>>>>>>>>>>> here: https://github.com/apache/aurora-packaging
>>>>>>>>>>> We're close to having official RPMs, but none to speak of yet.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 20, 2015 at 9:47 AM, Krish <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Stephan,
>>>>>>>>>>>> I am trying to get started and run aurora without thermos
>>>>>>>>>>>> executor (setting it to /dev/null does not help) - on a local
>>>>>>>>>>>> linux box for now & planning to containerize/dockerize it later.
>>>>>>>>>>>>
>>>>>>>>>>>> Can you please point me to the right documentation (or a
>>>>>>>>>>>> pointer to the cli parsing source code) which can help me resolve
>>>>>>>>>>>> this? Also, are there any steps to import the source code into
>>>>>>>>>>>> eclipse to browse & analyze code for this?
>>>>>>>>>>>>
>>>>>>>>>>>> Also, where do i find all the *.pex files? They are not present
>>>>>>>>>>>> in the zip file nor anywhere in the built source code.
>>>>>>>>>>>>
>>>>>>>>>>>> I know I am asking too many queries on a single thread here, &
>>>>>>>>>>>> would appreciate the help.
>>>>>>>>>>>> I think at the end of this, I will put the steps I followed in
>>>>>>>>>>>> a gist/blog so others might find their way around, & not struggle
>>>>>>>>>>>> as much.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 20, 2015 at 4:09 PM, Erb, Stephan <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Krish,
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> you don't have to set framework_authentication_file and
>>>>>>>>>>>>> zk_digest_credentials. The scheduler help text is misleading here
>>>>>>>>>>>>> as everything will work fine if you leave those empty.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> In addition, it looks like you are misunderstanding the usage of
>>>>>>>>>>>>> the thermos_executor_path command line flag of the scheduler.
>>>>>>>>>>>>> It is supposed to point to the binary containing the generic
>>>>>>>>>>>>> Aurora executor (thermos_executor.pex).  You only need the
>>>>>>>>>>>>> hello_world.aurora once your scheduler is up and running. It serves
>>>>>>>>>>>>> as an example input for the aurora command line client, which can
>>>>>>>>>>>>> be used to schedule jobs and services on an Aurora master.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Have you tried to use the vagrant box? Just type `vagrant up`
>>>>>>>>>>>>> in a checkout of the Aurora source code. It gives you a running
>>>>>>>>>>>>> scheduler to play with. Once you have understood how it works,
>>>>>>>>>>>>> you can start trying to install it on your own (by
>>>>>>>>>>>>> reverse-engineering the vagrant box).
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hope this helps a little,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>> *From:* Krish <[email protected]>
>>>>>>>>>>>>> *Sent:* Tuesday, October 20, 2015 11:39 AM
>>>>>>>>>>>>> *To:* Bill Farner
>>>>>>>>>>>>> *Cc:* [email protected]; Erb, Stephan
>>>>>>>>>>>>>
>>>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>>>
>>>>>>>>>>>>> Bill/Stephan,
>>>>>>>>>>>>> I still get a stacktrace when running the aurora scheduler CLI.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I do not know what to specify for
>>>>>>>>>>>>> -framework_authentication_file & -zk_digest_credentials, and
>>>>>>>>>>>>> they are required arguments.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am not using any authentication on Mesos master, do I still
>>>>>>>>>>>>> need the framework_authentication_file parameter?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> rm -rf /db /backup_dir
>>>>>>>>>>>>> mesos-log initialize --path="/db"
>>>>>>>>>>>>> export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
>>>>>>>>>>>>> JAVA_OPTS="-Xmx1536m  -Xms256m"
>>>>>>>>>>>>> /usr/local/aurora-scheduler/bin/aurora-scheduler 
>>>>>>>>>>>>> -backup_dir=/backup_dir
>>>>>>>>>>>>> -cluster_name=tc 
>>>>>>>>>>>>> -mesos_master_address=zk://localhost:2181/mesos/master
>>>>>>>>>>>>> -serverset_path=/scheduler/aurora -zk_endpoints=localhost:2181
>>>>>>>>>>>>> -native_log_quorum_size=1 -vlog=SEVERE -logtostderr=false
>>>>>>>>>>>>> -native_log_file_path=/db
>>>>>>>>>>>>> -thermos_executor_path=/home/ubuntu/hello_world.aurora
>>>>>>>>>>>>> ...
>>>>>>>>>>>>> ...
>>>>>>>>>>>>> INFO: Binding org.apache.aurora.scheduler.http.Utilization to GuiceManagedComponentProvider with the scope "PerRequest"
>>>>>>>>>>>>> Oct 20, 2015 9:27:40 AM org.apache.aurora.scheduler.cron.quartz.CronModule provideTimeZone
>>>>>>>>>>>>> WARNING: Cron schedules are configured to fire according to timezone Greenwich Mean Time but system timezone is set to Coordinated Universal Time
>>>>>>>>>>>>> Oct 20, 2015 9:27:41 AM org.eclipse.jetty.server.AbstractConnector doStart
>>>>>>>>>>>>> INFO: Started [email protected]:43843
>>>>>>>>>>>>> E1020 09:27:41.290 THREAD1 org.apache.aurora.scheduler.SchedulerLifecycle$9.execute: Caught unchecked exception: com.google.inject.ProvisionException: Guice provision errors:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1) Error in custom provider, java.lang.IllegalArgumentException: Path cannot be null
>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>>>>>>>>   while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1 error
>>>>>>>>>>>>> com.google.inject.ProvisionException: Guice provision errors:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1) Error in custom provider, java.lang.IllegalArgumentException: Path cannot be null
>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>>>>>>>>   while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1 error
>>>>>>>>>>>>>         at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:987)
>>>>>>>>>>>>>         at org.apache.aurora.scheduler.log.mesos.MesosLog.open(MesosLog.java:136
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 6:14 AM, Bill Farner <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> The typical flow is that you keep your .aurora file checked
>>>>>>>>>>>>>> into git, and commit every time you deploy/update.  When you
>>>>>>>>>>>>>> change your file, you will instruct Aurora to update the live job
>>>>>>>>>>>>>> (have a look at aurora update -h).  Aurora will perform a rolling
>>>>>>>>>>>>>> upgrade of your job to the new config.  You'll use this same flow
>>>>>>>>>>>>>> for updating your job's software as well as resizing the job.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> For (3), you could set up alerting for stats that the
>>>>>>>>>>>>>> scheduler exports.  Have a look here for monitoring background:
>>>>>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/monitoring.md
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> You'll want to look at scheduler stats related to
>>>>>>>>>>>>>> 'pending'.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Oct 19, 2015 at 12:16 PM, Krish <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for the pointer. Now I notice that the
>>>>>>>>>>>>>>> aurora-scheduler script has --thermos_executor_path as a
>>>>>>>>>>>>>>> mandatory requirement.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have a couple of questions on how the
>>>>>>>>>>>>>>> thermos_executor/.aurora config file functions:
>>>>>>>>>>>>>>> 1. Do we have to statically define the file beforehand?
>>>>>>>>>>>>>>> 2. What happens when we want to dynamically change the
>>>>>>>>>>>>>>> config, say increasing the number of instances of a service
>>>>>>>>>>>>>>> required? Does aurora require a reboot then?
>>>>>>>>>>>>>>> 3. How do I get notified about the message mesos sends when
>>>>>>>>>>>>>>> it cannot schedule tasks for lack of resources? Should I depend
>>>>>>>>>>>>>>> on aurora for this or try to look for a hook into mesos?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think a little bit of context would help here.
>>>>>>>>>>>>>>> What I plan to check is to run a very basic job/task inside
>>>>>>>>>>>>>>> a docker container with aurora & wait for a 'resource not
>>>>>>>>>>>>>>> available' message from mesos, and accordingly call an api to
>>>>>>>>>>>>>>> spin up a new node in my cluster.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Oct 19, 2015 at 1:24 PM, Erb, Stephan <[email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I believe you are missing the thermos_executor options that
>>>>>>>>>>>>>>>> have to be passed to the scheduler command line.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> See
>>>>>>>>>>>>>>>> https://github.com/apache/aurora/blob/33d7e2170a86f54722a02a2dc9cb1e09fb52df25/examples/vagrant/upstart/aurora-scheduler.conf#L39
>>>>>>>>>>>>>>>> for an example
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>>>>> *From:* Krish <[email protected]>
>>>>>>>>>>>>>>>> *Sent:* Monday, October 19, 2015 8:45 AM
>>>>>>>>>>>>>>>> *To:* [email protected]
>>>>>>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>> I am a n00b with apache aurora & trying to experiment with some
>>>>>>>>>>>>>>>> things on my local machine with zookeeper and mesos-master
>>>>>>>>>>>>>>>> running locally. They have initialized properly. When I try to
>>>>>>>>>>>>>>>> run aurora with the required options, I get the following error,
>>>>>>>>>>>>>>>> & googling hasn't helped me much here.
>>>>>>>>>>>>>>>> Appreciate any help. Thanks in advance.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>> WARNING: Method [public void
>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl.applySnapshot(java.lang.Object)]
>>>>>>>>>>>>>>>> is synthetic and is being intercepted by
>>>>>>>>>>>>>>>> [com.twitter.common.inject.TimedInterceptor@604c5de8].
>>>>>>>>>>>>>>>> This could indicate a bug.  The method
>>>>>>>>>>>>>>>>  may be intercepted twice, or may not be intercepted at all.
>>>>>>>>>>>>>>>> Exception in thread "main"
>>>>>>>>>>>>>>>> com.google.inject.CreationException: Guice creation errors:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1) An exception was caught and reported. Message: A value may only be retrieved from a variable that has a default or has been set.
>>>>>>>>>>>>>>>>   at com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2) Could not find a suitable constructor in org.apache.aurora.scheduler.mesos.ExecutorSettings. Classes must have either one (and only one) constructor annotated with @Inject or a zero-argument constructor that is not private.
>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.mesos.ExecutorSettings.class(ExecutorSettings.java:43)
>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:204)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2 errors
>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>> com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:435)
>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:154)
>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:106)
>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>> com.google.inject.Guice.createInjector(Guice.java:95)
>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>> com.google.inject.Guice.createInjector(Guice.java:83)
>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.configureInjection(AppLauncher.java:120)
>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.run(AppLauncher.java:87)
>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:181)
>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:142)
>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain.main(SchedulerMain.java:263)
>>>>>>>>>>>>>>>> Caused by: java.lang.IllegalStateException: A value may
>>>>>>>>>>>>>>>> only be retrieved from a variable that has a default or has 
>>>>>>>>>>>>>>>> been set.
>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>> com.google.common.base.Preconditions.checkState(Preconditions.java:176)
>>>>>>>>>>>>>>>>         at com.twitter.common.args.Arg.get(Arg.java:82)
>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:206)
>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>> com.google.inject.AbstractModule.configure(AbstractModule.java:59)
>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>> com.google.inject.util.Modules$2.configure(Modules.java:114)
>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>> com.google.inject.spi.Elements.getElements(Elements.java:101)
>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:103)
>>>>>>>>>>>>>>>>         ... 7 more
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Complete logs are present @http://pastebin.com/i72HvbYi.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Zameer Manji
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
