Re: Stacktrace when running Apache Aurora

Krish Thu, 03 Mar 2016 08:31:12 -0800

Used rbt for the first time and some weird thing happened to the console,
and it got submitted!
https://reviews.apache.org/r/44341/


Will be sure to keep the list posted with any new info. Thanks.



--
κρισhναν

On Thu, Mar 3, 2016 at 9:20 PM, Bill Farner <[email protected]> wrote:

> Likely in an existing page, preferably wherever you think would have saved
> you the trial and error!
>
> I look forward to the blog post, be sure to shoot a link here once it's up!
>
> Thanks!
>
>
> On Thursday, March 3, 2016, Krish <[email protected]> wrote:
>
>> Can you guide me on how to do that? Should I start with a new page and then
>> submit it, or would you like that as an entry in some existing doc?
>> That will be the short-term (couple of hours) item on my checklist.
>>
>> Actually, as I said before, I have in mind to blog about my entire design
>> and implementation process - the how and the why of docker configuration,
>> private docker repo setup, coreos cluster setup, and zk, mesos master,
>> aurora containerisation and setup, along with their monitoring (have
>> decided on bosun.org with cAdvisor). And a short guide on how to run
>> both containerized and non-containerized jobs in production.
>> I had to refer to a dozen and more sites and blogs and manuals and source
>> to get so far; and got help from engineers in various mailing lists.
>> A unified guide should be helpful, imho.
>>
>>
>> On Thursday 3 March 2016, Bill Farner <[email protected]> wrote:
>>
>>> Wow!  I'm glad you got it working!  To help the next poor soul trying to
>>> do this, would you be willing to put up a doc patch on our end?
>>>
>>> On Thursday, March 3, 2016, Krish <[email protected]> wrote:
>>>
>>>> TL;DR:
>>>> Use only a file named .dockercfg for docker credentials in mesos
>>>> tasks!
>>>>
>>>> Long story:
>>>> ---------------
>>>> Holy smokescreens!
>>>> This is for reporting & documenting purposes only, so that others don't
>>>> have to pull their hair like I did for the past few evenings!
>>>>
>>>> A little background:
>>>> I am running Ubuntu 14.04 on my system and docker stores its
>>>> credentials in the ~/.docker/config.json as
>>>> cat ~/.docker/config.json
>>>> {
>>>>   "auths": {
>>>>     "repo.example.com:5000": {
>>>>       "auth": "<snip>",
>>>>       "email": "<snip>"
>>>>     }
>>>>   }
>>>> }
>>>>
>>>> And I am doing all these experiments on a coreOS system which stores
>>>> the credentials  in ~/.dockercfg as
>>>> core@aurora-1 ~ $ cat ~/.dockercfg
>>>> {
>>>>   "repo.example.com:5000": {
>>>>     "auth": "<snip>",
>>>>     "email": "<snip>"
>>>>   }
>>>> }
>>>>
>>>> Since my container was an Ubuntu 14.04 container (as was my local
>>>> system), I used the Ubuntu credential file format, and the slave task
>>>> could not read the docker credentials because I had stored them as
>>>> ~/.docker/config.json.
>>>> After parsing through (a lot of find's, grep's and regex matching)
>>>> aurora, mesos, and thermos source code, I saw in
>>>> mesos/src/docker/docker.cpp:
>>>>
>>>>   // Set HOME variable to pick up *.dockercfg*.
>>>>   map<string, string> environment = os::environment();
>>>>
>>>>   environment["HOME"] = directory;
>>>>
>>>> Changed the filename and the json content, changed the
>>>> thermos_executor_resources, and bam, docker pull works!
>>>>
>>>> Well, the mesos documentation does say "To run an image from a private
>>>> repository, one can include the URI pointing to a .dockercfg that contains
>>>> login information." and I must have read it a dozen times!
>>>> But I never thought that they literally meant '.dockercfg' as the name
>>>> of the file!
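
For anyone hitting the same thing, the format change described above can be sketched in a few lines of Python. This is only a rough illustration under assumptions: the function name to_dockercfg is made up here, and it assumes the only difference between the two files is the "auths" wrapper seen in the two examples quoted above.

```python
import json


def to_dockercfg(config_json_text):
    """Flatten ~/.docker/config.json content into the older .dockercfg layout.

    The newer file nests registries under an "auths" key; the older file
    lists them at the top level, as in the two examples quoted above.
    """
    config = json.loads(config_json_text)
    # Only the "auths" mapping is carried over; any other keys are dropped.
    return json.dumps(config.get("auths", {}), indent=2, sort_keys=True)


if __name__ == "__main__":
    sample = """
    {
      "auths": {
        "repo.example.com:5000": {"auth": "<snip>", "email": "<snip>"}
      }
    }
    """
    print(to_dockercfg(sample))
```

The output would then be saved as a file literally named .dockercfg before it is shipped into the sandbox.
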
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> κρισhναν
>>>>
>>>> On Thu, Mar 3, 2016 at 1:45 PM, Krish <[email protected]>
>>>> wrote:
>>>>
>>>>>
>>>>> I have got the docker config file copied into the sandbox using the
>>>>> thermos_executor_resources flag; however docker is still not able to find
>>>>> the credentials file for doing an appropriate pull of image from a private
>>>>> repo.
>>>>>
>>>>> When I try to use the library/hello-world:latest image from public
>>>>> docker repo to check if everything works fine without the credentials, I
>>>>> encounter a different problem:
>>>>> exec: "/bin/sh": stat /bin/sh: no such file or directory
>>>>> Error response from daemon: Cannot start container
>>>>> de93dc344d44b41bccccff49e508001a97ff23a8964e637d32a506a31fd4d946: [8]
>>>>> System error: exec: "/bin/sh": stat /bin/sh: no such file or directory
>>>>>
>>>>> I was referring to this email for guidance on setting up a mesos
>>>>> slave:
>>>>> http://mail-archives.apache.org/mod_mbox/aurora-dev/201503.mbox/%3CCAKB1MkHR=+nonesllx9buwttdthsku46pw_wr4b+_z9p59+...@mail.gmail.com%3E
>>>>>
>>>>> So, I cannot get the credentials file to be used by docker, and if I
>>>>> bypass authentication, I can do a docker pull, but encounter a weird error
>>>>> in launching the hello-world image.
>>>>>
>>>>> Am I missing out on checking any log files generated? I currently
>>>>> refer to mesos-slave stdout and the sandbox stderr file.
>>>>> Any configuration parameter I am missing for this to happen?
>>>>>
>>>>> Any pointers will be really helpful. Thanks in advance.
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> κρισhναν
>>>>>
>>>>> On Sun, Feb 28, 2016 at 3:37 PM, Krish <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Continuing my earlier chain of thought, I found this in the mesos bug
>>>>>> list:
>>>>>> MESOS-4242 - Allow Docker private registry credentials to be passed
>>>>>> from framework.
>>>>>> How does one pass credentials using the framework? As it seems the
>>>>>> .docker/config.json is not read from the slave.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> κρισhναν
>>>>>>
>>>>>> On Sat, Feb 27, 2016 at 11:46 PM, Krish <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> I couldn't complete my PoC project earlier (got busy with
>>>>>>> other work). Well, it is never too late, and here's my update and issue.
>>>>>>>
>>>>>>> I have a 3 node zk (3.5.1 alpha), mesos-master (v0.24.1) & aurora
>>>>>>> (v0.11.0) running.
>>>>>>> I was stuck in a problem where I was using mesos 0.25.0 & aurora
>>>>>>> 0.9.0 & got a protobuf field not set error - ExecutorInfo field.
>>>>>>>
>>>>>>> I have a mesos agent running in docker container on coreos and it
>>>>>>> can access the host docker just fine.
>>>>>>> I have also put the docker login credentials file at the right
>>>>>>> location for it to access the private docker registry.
>>>>>>> I can manually trigger a docker pull and docker run without issues
>>>>>>> from the slave (which is also reflected properly outside the slave
>>>>>>> container with docker images and docker ps).
>>>>>>>
>>>>>>> However, when I try to run an aurora job with a hello-docker
>>>>>>> container, the slave logs that the docker pull has failed; more
>>>>>>> specifically:
>>>>>>> " failed to start: Failed to 'docker pull
>>>>>>> private_repo.com:5000/krish/test:latest': exit status = exited with
>>>>>>> status 1 stderr = Error: image krish/test:latest not found"
>>>>>>>
>>>>>>> My hunch is that when using docker run from aurora DSL, it does not
>>>>>>> read the docker credentials file properly and hence fails. I can
>>>>>>> reproduce the exact same error when I delete the credentials file from
>>>>>>> the slave and trigger a pull.
>>>>>>>
>>>>>>> Is the hunch right? If yes, is there a way to resolve this? Maybe
>>>>>>> source it some way before the run command?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> κρισhναν
>>>>>>>
>>>>>>> On Tue, Oct 27, 2015 at 10:35 PM, Bill Farner <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> (1) clusters.json is written by you, configuring the CLI client
>>>>>>>> with instructions for what clusters are available and how to discover
>>>>>>>> them.
>>>>>>>>
>>>>>>>> (2) That's expected - mesos only allows one active replica of a
>>>>>>>> framework at a time, this signals which one is active.
>>>>>>>>
>>>>>>>> (3) The observer is essentially a web server that allows you to
>>>>>>>> browse a task's sandbox directory and other information about it.  You
>>>>>>>> will need to configure it to run on your worker/agent nodes for that
>>>>>>>> functionality to work (it's linked from the scheduler web UI).
>>>>>>>>
>>>>>>>> (4) You could indeed implement that behavior externally.  There is
>>>>>>>> a reason:
>>>>>>>> https://github.com/apache/aurora/blob/master/api/src/main/thrift/org/apache/aurora/gen/api.thrift#L556-L559
>>>>>>>>
>>>>>>>> (5) That is correct.  The scheduler exposes a thrift API that you
>>>>>>>> would use (a REST API is coming, but ground has not yet been broken).
>>>>>>>> If you go this route, I suggest you skip the DSL and use the JSON task
>>>>>>>> description format that is shipped over the API.  There isn't good
>>>>>>>> documentation on this, but we can help you through it and would be
>>>>>>>> grateful for a writeup of your approach!
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Oct 26, 2015 at 11:44 PM, Krish <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Folks,
>>>>>>>>> Firstly, thanks for all the help. Am happy to report that I have
>>>>>>>>> set up zk, mesos & aurora, & can work further towards my idea of
>>>>>>>>> having an auto-scaling cluster.
>>>>>>>>> I have some further questions about the work done so far & things
>>>>>>>>> I plan to do:
>>>>>>>>>
>>>>>>>>>    1. Is the /etc/aurora/clusters.json file created by the
>>>>>>>>>    scheduler or does it need to be handcrafted? I had to manually
>>>>>>>>>    edit the file to get my `aurora job ...` cli to work.
>>>>>>>>>
>>>>>>>>>    2. I am running a cluster of 3 coreOS VMs on vagrant with zk,
>>>>>>>>>    mesos & aurora in a docker container. Only 1 of them outputs '1'
>>>>>>>>>    when I look at the 'framework_registered' field. Is this expected?
>>>>>>>>>    How do I verify that they are working as a cluster?
>>>>>>>>>
>>>>>>>>>    3. From the documentation, I see that there is an observer
>>>>>>>>>    that needs to be listening on port 1338. What is the observer
>>>>>>>>>    socket & its purpose? I have aurora listening only on ports 8081
>>>>>>>>>    (http port) & 8083 (libprocess).
>>>>>>>>>
>>>>>>>>>    4. I read about the 'PENDING' field in aurora documentation,
>>>>>>>>>    as Bill suggested, & realize that it just shows that a task is
>>>>>>>>>    waiting for some reason (for want of resources, in my case, as 0
>>>>>>>>>    slaves have registered). I was thinking of adding a hook to the
>>>>>>>>>    pending state; say if a task is PENDING for 5 minutes for lack of
>>>>>>>>>    resources in the cluster, then spin up a new machine. Is this the
>>>>>>>>>    right approach to take? Does aurora provide reasons for why a task
>>>>>>>>>    is in PENDING state?
>>>>>>>>>
>>>>>>>>>    => aurora job status testcluster/$USER/test/hello_world
>>>>>>>>>     INFO] Checking status of testcluster/ubuntu/test/hello_world
>>>>>>>>>    Active tasks (1):
>>>>>>>>>           Task role: ubuntu, env: test, name: hello_world,
>>>>>>>>>    instance: 0, status:
>>>>>>>>>    PENDING on None
>>>>>>>>>              cpus: 0.1, ram: 16 MB, disk: 16 MB
>>>>>>>>>              events:
>>>>>>>>>               2015-10-23 04:55:33 PENDING: None
>>>>>>>>>    Inactive tasks (0):
>>>>>>>>>
>>>>>>>>>    5. Aurora defines jobs in a .aurora config file & if I decide
>>>>>>>>>    to increase/decrease the number of instances in my cluster, then I
>>>>>>>>>    need to create/overwrite the concerned .aurora file and trigger the
>>>>>>>>>    `aurora update ...` command. Is this right?
>>>>>>>>>    If yes, is there an HTTP API I can invoke remotely which
>>>>>>>>>    triggers this update?
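
The idea in item 4 above (spin up a machine once a task has been PENDING for 5 minutes) could look roughly like the sketch below. It is a hypothetical illustration only: should_scale_up and PENDING_LIMIT are names invented here, and how the PENDING timestamp is actually obtained from Aurora (CLI output, scheduler stats, or the API) is deliberately left out.

```python
from datetime import datetime, timedelta

PENDING_LIMIT = timedelta(minutes=5)  # threshold taken from item 4 above


def should_scale_up(pending_since, now, limit=PENDING_LIMIT):
    """Return True once a task has been PENDING for at least `limit`."""
    return now - pending_since >= limit


if __name__ == "__main__":
    # Timestamp format copied from the `aurora job status` events output.
    pending_since = datetime.strptime("2015-10-23 04:55:33", "%Y-%m-%d %H:%M:%S")
    now = pending_since + timedelta(minutes=6)
    if should_scale_up(pending_since, now):
        # Placeholder action; the real call to add a node is not shown.
        print("task pending too long: request a new machine")
```

Keeping the decision as a pure function of two timestamps means it can be tested without a running cluster, whatever ends up feeding it.
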
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> κρισhναν
>>>>>>>>>
>>>>>>>>> On Fri, Oct 23, 2015 at 8:09 AM, Joshua Cohen <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> I suspect your error from `aurora job create ...` is due to the
>>>>>>>>>> aurora config you're using referencing `/vagrant/hello_world.py`
>>>>>>>>>> which does not exist (as you say: you're not even using Vagrant). Can
>>>>>>>>>> you link the .aurora config you're using?
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>>
>>>>>>>>>> Joshua
>>>>>>>>>>
>>>>>>>>>> On Thu, Oct 22, 2015 at 3:22 PM, Krish <[email protected]
>>>>>>>>>> > wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks, Zameer.
>>>>>>>>>>>
>>>>>>>>>>> I had to modify  /etc/aurora/clusters.json:
>>>>>>>>>>> [
>>>>>>>>>>>   {
>>>>>>>>>>>     "auth_mechanism": "UNAUTHENTICATED",
>>>>>>>>>>>     "name": "testcluster",
>>>>>>>>>>>     "scheduler_zk_path": "/scheduler/aurora",
>>>>>>>>>>>     "slave_root": "/var/lib/mesos",
>>>>>>>>>>>     "slave_run_directory": "latest",
>>>>>>>>>>>     "zk": "127.0.1.1"
>>>>>>>>>>>   }
>>>>>>>>>>> ]
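
A quick sanity check for a hand-written clusters.json, as a rough sketch. The key list is copied from the file quoted above; whether the client really needs every one of them is an assumption, and missing_keys is a name made up for this example.

```python
import json

# Keys taken from the clusters.json quoted above; treating all of them as
# required is an assumption made for this sketch.
EXPECTED_KEYS = {
    "name", "zk", "scheduler_zk_path", "auth_mechanism",
    "slave_root", "slave_run_directory",
}


def missing_keys(clusters_json_text):
    """Map each cluster name to the expected keys it does not define."""
    clusters = json.loads(clusters_json_text)
    report = {}
    for index, cluster in enumerate(clusters):
        name = cluster.get("name", "<cluster %d>" % index)
        report[name] = sorted(EXPECTED_KEYS - set(cluster))
    return report


if __name__ == "__main__":
    sample = '[{"name": "testcluster", "zk": "127.0.1.1"}]'
    for name, missing in missing_keys(sample).items():
        print(name, "is missing:", ", ".join(missing) or "nothing")
```
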
>>>>>>>>>>>
>>>>>>>>>>> I have a hello_world.aurora in my home folder. However the
>>>>>>>>>>> following command errors out:
>>>>>>>>>>> ~$ aurora job create testcluster/testrole/test/hellojob ./hello_world.aurora
>>>>>>>>>>> Error loading configuration: [Errno 2] No such file or directory: '/vagrant/hello_world.py'
>>>>>>>>>>>
>>>>>>>>>>> A job list does work:
>>>>>>>>>>> ~$ aurora job list testcluster
>>>>>>>>>>>  INFO] Retrieving jobs for role None
>>>>>>>>>>>
>>>>>>>>>>> I am not even using Vagrant. I am using zk & mesos on the
>>>>>>>>>>> same machine as aurora. How do I submit these job templates to
>>>>>>>>>>> aurora?
>>>>>>>>>>>
>>>>>>>>>>> Any pointers to documentation will be helpful.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> κρισhναν
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Oct 21, 2015 at 11:09 PM, Zameer Manji <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Version 0.9.0 does not have the gc executor. Version 0.9.0 uses
>>>>>>>>>>>> Mesos' task reconciliation
>>>>>>>>>>>> <http://mesos.apache.org/documentation/latest/reconciliation/> API
>>>>>>>>>>>> instead.
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Oct 21, 2015 at 9:28 AM, Krish <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks Bill for the location to the debs. I was finally able
>>>>>>>>>>>>> to run aurora. :)
>>>>>>>>>>>>>
>>>>>>>>>>>>> I did find thermos_executor.pex & thermos_observer after
>>>>>>>>>>>>> installing aurora-executor. I still could not find
>>>>>>>>>>>>> gc_executor.pex on my system.
>>>>>>>>>>>>> Is there a location from where I can download the binaries for
>>>>>>>>>>>>> *.pex or build them from scratch?
>>>>>>>>>>>>>
>>>>>>>>>>>>> root@dev:/# find . -name "*.pex"
>>>>>>>>>>>>> ./usr/share/aurora/bin/thermos_executor.pex
>>>>>>>>>>>>> ./usr/share/aurora/bin/kaurora_admin.pex
>>>>>>>>>>>>> ./usr/share/aurora/bin/kaurora.pex
>>>>>>>>>>>>> ./usr/share/aurora/bin/thermos.pex
>>>>>>>>>>>>> ./usr/share/aurora/bin/thermos_observer.pex
>>>>>>>>>>>>> ./home/ubuntu/.pex
>>>>>>>>>>>>> ./root/.pex
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 11:46 PM, Bill Farner <
>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Aurora currently requires an executor, so setting it to
>>>>>>>>>>>>>> /dev/null will not work.  Happy to talk further about your
>>>>>>>>>>>>>> thoughts around sidestepping the executor.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As for working with the scheduler source code, it's a
>>>>>>>>>>>>>> standard gradle project and we tend to use intellij.  Docs to
>>>>>>>>>>>>>> help ramp on that:
>>>>>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/developing-aurora-scheduler.md
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As for builds - the .zip is a source distribution, so it
>>>>>>>>>>>>>> won't have any pre-built binaries.  If you're on debian, we have
>>>>>>>>>>>>>> official debs here: https://bintray.com/apache/aurora
>>>>>>>>>>>>>> You can see how they're built (and can build your own
>>>>>>>>>>>>>> packages) here: https://github.com/apache/aurora-packaging
>>>>>>>>>>>>>> We're close to having official RPMs, but none to speak of yet.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 9:47 AM, Krish <
>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Stephen,
>>>>>>>>>>>>>>> I am trying to get started and run aurora without thermos
>>>>>>>>>>>>>>> executor (setting it to /dev/null does not help) - on a local
>>>>>>>>>>>>>>> linux box for now & planning to containerize/dockerize it later.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Can you please point me to the right documentation (or a
>>>>>>>>>>>>>>> pointer to the cli parsing source code) which can help me
>>>>>>>>>>>>>>> resolve this?
>>>>>>>>>>>>>>> Also, are there any steps to import the source code into
>>>>>>>>>>>>>>> eclipse to browse & analyze code for this?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Also, where do I find all the *.pex files? They are not
>>>>>>>>>>>>>>> present in the zip file nor anywhere in the built source code.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I know I am asking too many queries on a single thread here,
>>>>>>>>>>>>>>> & would appreciate the help.
>>>>>>>>>>>>>>> I think at the end of this, I will put the steps I followed
>>>>>>>>>>>>>>> in a gist/blog so others might find their way around, & not
>>>>>>>>>>>>>>> struggle as much.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 4:09 PM, Erb, Stephan <
>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Krish,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> you don't have to set framework_authentication_file and
>>>>>>>>>>>>>>>> zk_digest_credentials. The scheduler help text is misleading
>>>>>>>>>>>>>>>> here as everything will work fine if you leave those empty.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In addition, it looks like you are misunderstanding the usage
>>>>>>>>>>>>>>>> of the thermos_executor_path command line flag of the
>>>>>>>>>>>>>>>> scheduler. It is supposed to point to the binary containing
>>>>>>>>>>>>>>>> the generic Aurora executor (thermos_executor.pex).  You only need
>>>>>>>>>>>>>>>> the hello_world.aurora once your scheduler is up and running. It
>>>>>>>>>>>>>>>> serves as an example input for the aurora command line client,
>>>>>>>>>>>>>>>> which can be used to schedule jobs and services on an Aurora
>>>>>>>>>>>>>>>> master.
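
To make the point about thermos_executor_path concrete, here is a small hedged sketch that assembles the scheduler flags and refuses a path that does not look like the executor binary. The flag names and values are the ones quoted in this thread; scheduler_flags is a hypothetical helper, and the ".pex" suffix check is only a heuristic added for illustration, not something the scheduler itself does.

```python
# Flag names and values are the ones quoted in this thread; the ".pex"
# suffix check is only a heuristic added for this sketch.
def scheduler_flags(thermos_executor_path):
    """Build the scheduler flag list, rejecting a non-executor path early."""
    if not thermos_executor_path.endswith(".pex"):
        raise ValueError(
            "thermos_executor_path should point to the executor binary, "
            "not %r" % thermos_executor_path
        )
    return [
        "-cluster_name=tc",
        "-mesos_master_address=zk://localhost:2181/mesos/master",
        "-serverset_path=/scheduler/aurora",
        "-zk_endpoints=localhost:2181",
        "-native_log_quorum_size=1",
        "-native_log_file_path=/db",
        "-backup_dir=/backup_dir",
        "-thermos_executor_path=" + thermos_executor_path,
    ]


if __name__ == "__main__":
    # Executor location as listed elsewhere in this thread by `find`.
    for flag in scheduler_flags("/usr/share/aurora/bin/thermos_executor.pex"):
        print(flag)
```

Passing /home/ubuntu/hello_world.aurora, as in the command quoted in this thread, would raise the ValueError instead of reaching the scheduler.
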
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Have you tried to use the vagrant box? Just type `vagrant
>>>>>>>>>>>>>>>> up` in a checkout of the Aurora source code. It gives you a
>>>>>>>>>>>>>>>> running scheduler to play with. Once you have understood how it
>>>>>>>>>>>>>>>> works, you can start trying to install it on your own (by
>>>>>>>>>>>>>>>> reverse-engineering the vagrant box).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hope this helps a little,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>>>>> *From:* Krish <[email protected]>
>>>>>>>>>>>>>>>> *Sent:* Tuesday, October 20, 2015 11:39 AM
>>>>>>>>>>>>>>>> *To:* Bill Farner
>>>>>>>>>>>>>>>> *Cc:* [email protected]; Erb, Stephan
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Bill/Stephen,
>>>>>>>>>>>>>>>> I still get a stacktrace when running the aurora scheduler
>>>>>>>>>>>>>>>> CLI.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I do not know what to specify for
>>>>>>>>>>>>>>>> -framework_authentication_file & -zk_digest_credentials, and
>>>>>>>>>>>>>>>> they are required arguments.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I am not using any authentication on Mesos master, do I
>>>>>>>>>>>>>>>> still need the framework_authentication_file parameter?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> rm -rf /db /backup_dir
>>>>>>>>>>>>>>>> mesos-log initialize --path="/db"
>>>>>>>>>>>>>>>> export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
>>>>>>>>>>>>>>>> JAVA_OPTS="-Xmx1536m  -Xms256m"
>>>>>>>>>>>>>>>> /usr/local/aurora-scheduler/bin/aurora-scheduler \
>>>>>>>>>>>>>>>>   -backup_dir=/backup_dir \
>>>>>>>>>>>>>>>>   -cluster_name=tc \
>>>>>>>>>>>>>>>>   -mesos_master_address=zk://localhost:2181/mesos/master \
>>>>>>>>>>>>>>>>   -serverset_path=/scheduler/aurora \
>>>>>>>>>>>>>>>>   -zk_endpoints=localhost:2181 \
>>>>>>>>>>>>>>>>   -native_log_quorum_size=1 \
>>>>>>>>>>>>>>>>   -vlog=SEVERE \
>>>>>>>>>>>>>>>>   -logtostderr=false \
>>>>>>>>>>>>>>>>   -native_log_file_path=/db \
>>>>>>>>>>>>>>>>   -thermos_executor_path=/home/ubuntu/hello_world.aurora
>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>> INFO: Binding org.apache.aurora.scheduler.http.Utilization to GuiceManagedComponentProvider with the scope "PerRequest"
>>>>>>>>>>>>>>>> Oct 20, 2015 9:27:40 AM org.apache.aurora.scheduler.cron.quartz.CronModule provideTimeZone
>>>>>>>>>>>>>>>> WARNING: Cron schedules are configured to fire according to timezone Greenwich Mean Time but system timezone is set to Coordinated Universal Time
>>>>>>>>>>>>>>>> Oct 20, 2015 9:27:41 AM org.eclipse.jetty.server.AbstractConnector doStart
>>>>>>>>>>>>>>>> INFO: Started [email protected]:43843
>>>>>>>>>>>>>>>> E1020 09:27:41.290 THREAD1 org.apache.aurora.scheduler.SchedulerLifecycle$9.execute: Caught unchecked exception: com.google.inject.ProvisionException: Guice provision errors:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1) Error in custom provider, java.lang.IllegalArgumentException: Path cannot be null
>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>>>>>>>>>>>   while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1 error
>>>>>>>>>>>>>>>> com.google.inject.ProvisionException: Guice provision errors:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1) Error in custom provider, java.lang.IllegalArgumentException: Path cannot be null
>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>>>>>>>>>>>   while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1 error
>>>>>>>>>>>>>>>>         at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:987)
>>>>>>>>>>>>>>>>         at org.apache.aurora.scheduler.log.mesos.MesosLog.open(MesosLog.java:136
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 6:14 AM, Bill Farner <
>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The typical flow is that you keep your .aurora file
>>>>>>>>>>>>>>>>> checked into git, and commit every time you deploy/update.
>>>>>>>>>>>>>>>>> When you change your file, you will instruct Aurora to update the
>>>>>>>>>>>>>>>>> live job (have a look at aurora update -h).  Aurora will perform a
>>>>>>>>>>>>>>>>> rolling upgrade of your job to the new config.  You'll use this same
>>>>>>>>>>>>>>>>> flow for updating your job's software as well as resizing the job.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> For (3), you could set up alerting for stats that the
>>>>>>>>>>>>>>>>> scheduler exports.  Have a look here for monitoring background:
>>>>>>>>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/monitoring.md
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> You'll want to look at scheduler stats related to
>>>>>>>>>>>>>>>>> 'pending'.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Mon, Oct 19, 2015 at 12:16 PM, Krish <
>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks for the pointer. Now I notice that the
>>>>>>>>>>>>>>>>>> aurora-scheduler script has the --thermos_executor_path as a
>>>>>>>>>>>>>>>>>> mandatory requirement.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I have a couple of questions on how the
>>>>>>>>>>>>>>>>>> thermos_executor/.aurora config file functions:
>>>>>>>>>>>>>>>>>> 1. Do we have to statically define the file beforehand?
>>>>>>>>>>>>>>>>>> 2. What happens when we want to dynamically change the
>>>>>>>>>>>>>>>>>> config, say increasing the number of instances of a service
>>>>>>>>>>>>>>>>>> required? Does aurora require a reboot then?
>>>>>>>>>>>>>>>>>> 3. How do I get notified about the message mesos sends
>>>>>>>>>>>>>>>>>> when it cannot schedule tasks for lack of resources? Should
>>>>>>>>>>>>>>>>>> I depend on aurora for this or try to look for a hook into mesos?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I think a little bit of context would help here.
>>>>>>>>>>>>>>>>>> What I plan to check is to run a very basic job/task
>>>>>>>>>>>>>>>>>> inside a docker container with aurora & wait for a 'resource
>>>>>>>>>>>>>>>>>> not available' message from mesos, and accordingly call an api to
>>>>>>>>>>>>>>>>>> spin up a new node in my cluster.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Mon, Oct 19, 2015 at 1:24 PM, Erb, Stephan <
>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I believe you are missing the thermos_executor options
>>>>>>>>>>>>>>>>>>> that have to be passed to the scheduler command line.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> See
>>>>>>>>>>>>>>>>>>> https://github.com/apache/aurora/blob/33d7e2170a86f54722a02a2dc9cb1e09fb52df25/examples/vagrant/upstart/aurora-scheduler.conf#L39
>>>>>>>>>>>>>>>>>>> for an example
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>>>>>>>> *From:* Krish <[email protected]>
>>>>>>>>>>>>>>>>>>> *Sent:* Monday, October 19, 2015 8:45 AM
>>>>>>>>>>>>>>>>>>> *To:* [email protected]
>>>>>>>>>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>> I am a n00b with apache aurora & trying to experiment with
>>>>>>>>>>>>>>>>>>> some things on my local machine with zookeeper and
>>>>>>>>>>>>>>>>>>> mesos-master running locally. They have initialized properly. When
>>>>>>>>>>>>>>>>>>> I try to run aurora with the required options, I get the following
>>>>>>>>>>>>>>>>>>> error, & googling hasn't helped me much here.
>>>>>>>>>>>>>>>>>>> Appreciate any help. Thanks in advance.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>>> WARNING: Method [public void org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl.applySnapshot(java.lang.Object)] is synthetic and is being intercepted by [com.twitter.common.inject.TimedInterceptor@604c5de8]. This could indicate a bug.  The method may be intercepted twice, or may not be intercepted at all.
>>>>>>>>>>>>>>>>>>> Exception in thread "main" com.google.inject.CreationException: Guice creation errors:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 1) An exception was caught and reported. Message: A value may only be retrieved from a variable that has a default or has been set.
>>>>>>>>>>>>>>>>>>>   at com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 2) Could not find a suitable constructor in org.apache.aurora.scheduler.mesos.ExecutorSettings. Classes must have either one (and only one) constructor annotated with @Inject or a zero-argument constructor that is not private.
>>>>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.mesos.ExecutorSettings.class(ExecutorSettings.java:43)
>>>>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:204)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 2 errors
>>>>>>>>>>>>>>>>>>>         at com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:435)
>>>>>>>>>>>>>>>>>>>         at com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:154)
>>>>>>>>>>>>>>>>>>>         at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:106)
>>>>>>>>>>>>>>>>>>>         at com.google.inject.Guice.createInjector(Guice.java:95)
>>>>>>>>>>>>>>>>>>>         at com.google.inject.Guice.createInjector(Guice.java:83)
>>>>>>>>>>>>>>>>>>>         at com.twitter.common.application.AppLauncher.configureInjection(AppLauncher.java:120)
>>>>>>>>>>>>>>>>>>>         at com.twitter.common.application.AppLauncher.run(AppLauncher.java:87)
>>>>>>>>>>>>>>>>>>>         at com.twitter.common.application.AppLauncher.launch(AppLauncher.java:181)
>>>>>>>>>>>>>>>>>>>         at com.twitter.common.application.AppLauncher.launch(AppLauncher.java:142)
>>>>>>>>>>>>>>>>>>>         at org.apache.aurora.scheduler.app.SchedulerMain.main(SchedulerMain.java:263)
>>>>>>>>>>>>>>>>>>> Caused by: java.lang.IllegalStateException: A value may only be retrieved from a variable that has a default or has been set.
>>>>>>>>>>>>>>>>>>>         at com.google.common.base.Preconditions.checkState(Preconditions.java:176)
>>>>>>>>>>>>>>>>>>>         at com.twitter.common.args.Arg.get(Arg.java:82)
>>>>>>>>>>>>>>>>>>>         at org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:206)
>>>>>>>>>>>>>>>>>>>         at com.google.inject.AbstractModule.configure(AbstractModule.java:59)
>>>>>>>>>>>>>>>>>>>         at com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>>>>>>>>         at com.google.inject.util.Modules$2.configure(Modules.java:114)
>>>>>>>>>>>>>>>>>>>         at com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>>>>>>>>         at com.google.inject.spi.Elements.getElements(Elements.java:101)
>>>>>>>>>>>>>>>>>>>         at com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>>>>>>>>         at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:103)
>>>>>>>>>>>>>>>>>>>         ... 7 more
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Complete logs are available at http://pastebin.com/i72HvbYi
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Zameer Manji
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>
>> --
>>
>> Thumb typed mail
>>
>>
