Fab, thanks Vinod. Turns out that feature (different FQDN to serve the ui up on)
might well be really useful for us, so every cloud has a silver lining :)

Back to the metadata feature though - do you know why just the 'id' of
the slaves isn't used?
As it stands, adding disk storage, cores or RAM to a slave will cause
it to drop out of the cluster -
does checking the whole metadata provide any benefit vs. checking the id?
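
To make the question concrete, here is a rough Python sketch of the two
checks being compared. It is not the actual Mesos code: the class, its
fields and both function names are made up for illustration, loosely
modelled on the SlaveInfo message Vinod mentions.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SlaveInfo:
    """Hypothetical, simplified stand-in for SlaveInfo in mesos.proto."""
    id: str
    hostname: str
    cpus: float
    mem_mb: int
    disk_mb: int


def compatible_whole_info(checkpointed: SlaveInfo, current: SlaveInfo) -> bool:
    # Current behaviour as described in this thread: any difference in the
    # checkpointed info counts as a new slave, so recovery is refused.
    return checkpointed == current


def compatible_id_only(checkpointed: SlaveInfo, current: SlaveInfo) -> bool:
    # The alternative being asked about: only the id has to match.
    return checkpointed.id == current.id


# Illustrative values only.
old = SlaveInfo(id="slave-1", hostname="node1.example.com",
                cpus=4.0, mem_mb=8192, disk_mb=100000)
new = replace(old, mem_mb=16384)  # same slave after a RAM upgrade

print(compatible_whole_info(old, new))
print(compatible_id_only(old, new))
```

With the whole-info check the upgraded slave no longer matches its
checkpointed state; with the id-only check it still would.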

On 18 June 2014 19:46, Vinod Kone <vinodk...@gmail.com> wrote:
> Filed https://issues.apache.org/jira/browse/MESOS-1506 for fixing
> flags/documentation.
>
>
> On Wed, Jun 18, 2014 at 11:33 AM, Dick Davies <d...@hellooperator.net>
> wrote:
>>
>> Thanks, it might be worth correcting the docs in that case then.
>> This URL says it'll use the system hostname, not the reverse DNS of
>> the ip argument:
>>
>> http://mesos.apache.org/documentation/latest/configuration/
>>
>> re: the CFS thing - this was while running Docker on the slaves - that
>> also uses cgroups so maybe resources were getting split with mesos or
>> something? (I'm still reading up on cgroups) - definitely wasn't the
>> case until cfs was enabled.
>>
>>
>> On 18 June 2014 18:34, Vinod Kone <vinodk...@gmail.com> wrote:
>> > Hey Dick,
>> >
>> > Regarding slave recovery, any changes in the SlaveInfo (see mesos.proto)
>> > are considered as a new slave and hence recovery doesn't proceed forward.
>> > This is because Master caches SlaveInfo and it is quite complex to
>> > reconcile the differences in SlaveInfo. So we decided to fail on any
>> > SlaveInfo changes for now.
>> >
>> > In your particular case, https://issues.apache.org/jira/browse/MESOS-672
>> > was committed in 0.18.0 which fixed redirection of WebUI. Included in this
>> > fix is https://reviews.apache.org/r/17573/ which changed how
>> > SlaveInfo.hostname is calculated. Since you are not providing a hostname
>> > via "--hostname" flag, slave now deduces the hostname from "--ip" flag.
>> > Looks like in your cluster the hostname corresponding to that ip is
>> > different than what 'os::hostname()' gives.
>> >
>> > Couple of options to move forward. If you want slave recovery, provide
>> > "--hostname" that matches the previous hostname. If you don't care about
>> > recovery, just remove the meta directory ("rm -rf /var/mesos/meta") so
>> > that the slave starts as a fresh one (since you are not using cgroups,
>> > you will have to manually kill any old executors/tasks that are still
>> > alive on the slave).
>> >
>> > Not sure about your comment on CFS. Enabling CFS shouldn't change how
>> > much memory the slave sees as available. More details/logs would help
>> > diagnose the issue.
>> >
>> > HTH,
>> >
>> >
>> >
>> > On Wed, Jun 18, 2014 at 4:26 AM, Dick Davies <d...@hellooperator.net>
>> > wrote:
>> >>
>> >> Should have said, the CLI for this is :
>> >>
>> >> /usr/local/sbin/mesos-slave --master=zk://10.10.10.105:2181/mesos
>> >> --log_dir=/var/log/mesos --ip=10.10.10.101 --work_dir=/var/mesos
>> >>
>> >> (note IP is specified, hostname is not - docs indicated hostname arg
>> >> will default to the fqdn of host, but it appears to be using the value
>> >> passed as 'ip' instead.)
>> >>
>> >> On 18 June 2014 12:00, Dick Davies <d...@hellooperator.net> wrote:
>> >> > Hi, we recently bumped 0.17.0 -> 0.18.2 and the slaves
>> >> > now show their IPs rather than their FQDNs on the mesos UI.
>> >> >
>> >> > This broke slave recovery with the error:
>> >> >
>> >> > "Failed to perform recovery: Incompatible slave info detected"
>> >> >
>> >> >
>> >> > cpu, mem, disk, ports are all the same. so is the 'id' field.
>> >> >
>> >> > the only things that have changed are the 'hostname' and
>> >> > webui_hostname arguments
>> >> > (the CLI we're passing in is exactly the same as it was on 0.17.0, so
>> >> > presumably this is down to a change in mesos conventions).
>> >> >
>> >> > I've had similar issues enabling CFS in test environments (slaves show
>> >> > less free memory and refuse to recover).
>> >> >
>> >> > is the 'id' field not enough to uniquely identify a slave?
>> >
>> >
>
>
