Hi @Haosdent,

We have multiple networks; that could be one of the problems. I tried all three of them and it still shows the same error. Can you help me understand what value the hostname flag expects in such a scenario?
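The earlier reply says the WebUI relies on the masters' hostnames to find the leading master. One way to narrow down which name to use is to run a check on the machine where the browser runs. The minimal Python sketch below does two things. It tests which candidate names resolve there, and it pulls the `cluster`, `hostname` and `leader` fields out of a saved `/master/state` response. It works under a few assumptions:

- It uses the OS resolver. A browser may resolve names differently, but a name the OS cannot resolve will most likely fail in the browser too.
- It does not test whether port 5050 is reachable.
- The three field names are an assumption about the 1.1 state output. A field that is missing just comes back as None.

```python
import json
import socket
from typing import Dict, Iterable, Optional, Tuple


def resolve_candidates(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each name to the IPv4 address it resolves to on this machine, or None."""
    results: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            results[name] = socket.gethostbyname(name)
        except socket.gaierror:
            # Roughly the failure the browser reports as net::ERR_NAME_NOT_RESOLVED.
            results[name] = None
    return results


def state_summary(state_body: str) -> Tuple[Optional[str], Optional[str], Optional[str]]:
    """Return (cluster, hostname, leader) from a /master/state JSON body.

    The field names are an assumption; any field absent from the body is None.
    """
    state = json.loads(state_body)
    return state.get("cluster"), state.get("hostname"), state.get("leader")
```

For example, `resolve_candidates(["nid00016", "nid00032", "nid00000.local"])` checks the three `--hostname` values from the `ps` output in this thread. Any name that maps to `None` is one the WebUI's JSONP requests will likely fail on as well. For the second check, save the body of `http://<master-ip>:5050/master/state` (for example with curl) and pass its text to `state_summary`.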
On Thu, Dec 15, 2016 at 6:08 PM, haosdent <[email protected]> wrote:

> Hi, @haripriya
>
> What's the hostname flag that you use to start the master?
> According to the screenshot you posted before, I think you need to set it
> to something like `socrates-nid000xxx.us.cray.com`.
> However, in the error log you posted above, you set the hostname flag to
> nid00016, which could not be resolved.
>
> On Fri, Dec 16, 2016 at 6:51 AM, Haripriya Ayyalasomayajula <
> [email protected]> wrote:
>
>> Hello @Haosdent,
>>
>> After setting the hostname, I still see the error. This is the output
>> I see in the Chrome developer tools:
>>
>> Failed to load resource: the server responded with a status of 404 (Not Found)
>> http://nid00016:5050/metrics/snapshot?jsonp=angular.callbacks._2
>> Failed to load resource: net::ERR_NAME_NOT_RESOLVED
>> http://nid00016:5050/master/state?jsonp=angular.callbacks._3
>> Failed to load resource: net::ERR_NAME_NOT_RESOLVED
>> http://nid00016:5050/master/state?jsonp=angular.callbacks._4
>> Failed to load resource: net::ERR_NAME_NOT_RESOLVED
>> http://nid00016:5050/metrics/snapshot?jsonp=angular.callbacks._5
>> Failed to load resource: net::ERR_NAME_NOT_RESOLVED
>> http://nid00016:5050/master/state?jsonp=angular.callbacks._6
>> Failed to load resource: net::ERR_NAME_NOT_RESOLVED
>> http://nid00016:5050/metrics/snapshot?jsonp=angular.callbacks._7
>> Failed to load resource: net::ERR_NAME_NOT_RESOLVED
>> http://nid00016:5050/master/state?jsonp=angular.callbacks._8
>> Failed to load resource: net::ERR_NAME_NOT_RESOLVED
>> http://nid00016:5050/metrics/snapshot?jsonp=angular.callbacks._9
>> Failed to load resource: net::ERR_NAME_NOT_RESOLVED
>> http://nid00016:5050/master/state?jsonp=angular.callbacks._a
>> Failed to load resource: net::ERR_NAME_NOT_RESOLVED
>> http://nid00016:5050/metrics/snapshot?jsonp=angular.callbacks._b
>> Failed to load resource: net::ERR_NAME_NOT_RESOLVED
>> http://nid00016:5050/master/state?jsonp=angular.callbacks._c
>> Failed to load resource: net::ERR_NAME_NOT_RESOLVED
>> http://nid00016:5050/metrics/snapshot?jsonp=angular.callbacks._d
>> Failed to load resource: net::ERR_NAME_NOT_RESOLVED
>> http://nid00016:5050/master/state?jsonp=angular.callbacks._e
>> Failed to load resource: net::ERR_NAME_NOT_RESOLVED
>> http://nid00016:5050/metrics/snapshot?jsonp=angular.callbacks._f
>> Failed to load resource: net::ERR_NAME_NOT_RESOLVED
>> http://nid00016:5050/master/state?jsonp=angular.callbacks._g
>> Failed to load resource: net::ERR_NAME_NOT_RESOLVED
>> http://nid00016:5050/metrics/snapshot?jsonp=angular.callbacks._h
>> Failed to load resource: net::ERR_NAME_NOT_RESOLVED
>> angular-1.2.3.min.js:70 GET http://nid00016:5050/master/state?jsonp=angular.callbacks._i net::ERR_NAME_NOT_RESOLVED
>>     g @ angular-1.2.3.min.js:70
>>     (anonymous function) @ angular-1.2.3.min.js:71
>>     D @ angular-1.2.3.min.js:68
>>     h @ angular-1.2.3.min.js:66
>>     D @ angular-1.2.3.min.js:91
>>     D @ angular-1.2.3.min.js:91
>>     (anonymous function) @ angular-1.2.3.min.js:93
>>     $eval @ angular-1.2.3.min.js:101
>>     $digest @ angular-1.2.3.min.js:98
>>     $apply @ angular-1.2.3.min.js:101
>>     (anonymous function) @ angular-1.2.3.min.js:111
>>     e @ angular-1.2.3.min.js:33
>>     (anonymous function) @ angular-1.2.3.min.js:37
>> angular-1.2.3.min.js:70 GET http://nid00016:5050/metrics/snapshot?jsonp=angular.callbacks._j net::ERR_NAME_NOT_RESOLVED
>>     g @ angular-1.2.3.min.js:70
>>     (anonymous function) @ angular-1.2.3.min.js:71
>>     D @ angular-1.2.3.min.js:68
>>     h @ angular-1.2.3.min.js:66
>>     D @ angular-1.2.3.min.js:91
>>     D @ angular-1.2.3.min.js:91
>>     (anonymous function) @ angular-1.2.3.min.js:93
>>     $eval @ angular-1.2.3.min.js:101
>>     $digest @ angular-1.2.3.min.js:98
>>     $apply @ angular-1.2.3.min.js:101
>>     (anonymous function) @ angular-1.2.3.min.js:111
>>     e @ angular-1.2.3.min.js:33
>>     (anonymous function) @ angular-1.2.3.min.js:37
>>
>> Also, regarding the "cluster flag", here is my output:
>>
>> nid00016: root 14940 2.5 0.0 2080192 85012 ? Ssl 16:44 0:08 /usr/sbin/mesos-master --zk=zk://192.168.0.1:2181,192.168.0.17:2181,192.168.0.33:2181/mesos --port=5050 --log_dir=/var/log/mesos --acls=/etc/mesos_acls.json --authenticate_frameworks=true --cluster="socrates" --credentials=/etc/marathon-auth/credentials --hostname=nid00016 --quorum=2 --work_dir=/var/lib/mesos
>> nid00016: root 14965 0.0 0.0 107892 612 ? S 16:44 0:00 logger -p user.info -t mesos-master[14940]
>> nid00016: root 14966 0.0 0.0 107892 692 ? S 16:44 0:00 logger -p user.err -t mesos-master[14940]
>> nid00016: root 15892 0.0 0.0 113116 1604 ? Ss 16:50 0:00 bash -c ps -aux | grep mesos-master
>> nid00016: root 15959 0.0 0.0 112644 948 ? S 16:50 0:00 grep mesos-master
>>
>> nid00032: root 30018 2.5 0.0 2670032 26480 ? Ssl 16:44 0:08 /usr/sbin/mesos-master --zk=zk://192.168.0.1:2181,192.168.0.17:2181,192.168.0.33:2181/mesos --port=5050 --log_dir=/var/log/mesos --acls=/etc/mesos_acls.json --authenticate_frameworks=true --cluster="socrates" --credentials=/etc/marathon-auth/credentials --hostname=nid00032 --quorum=2 --work_dir=/var/lib/mesos
>> nid00032: root 30043 0.0 0.0 107892 612 ? S 16:44 0:00 logger -p user.info -t mesos-master[30018]
>> nid00032: root 30044 0.0 0.0 107892 692 ? S 16:44 0:00 logger -p user.err -t mesos-master[30018]
>> nid00032: root 31091 0.0 0.0 113116 1604 ? Ss 16:50 0:00 bash -c ps -aux | grep mesos-master
>> nid00032: root 31158 0.0 0.0 112644 948 ? S 16:50 0:00 grep mesos-master
>>
>> nid00000: root 49753 3.7 0.0 3259912 27584 ? Ssl 16:44 0:13 /usr/sbin/mesos-master --zk=zk://192.168.0.1:2181,192.168.0.17:2181,192.168.0.33:2181/mesos --port=5050 --log_dir=/var/log/mesos --acls=/etc/mesos_acls.json --authenticate_frameworks=true --cluster="socrates" --credentials=/etc/marathon-auth/credentials --hostname=nid00000.local --quorum=2 --work_dir=/var/lib/mesos
>> nid00000: root 49778 0.0 0.0 107892 612 ? S 16:44 0:00 logger -p user.info -t mesos-master[49753]
>> nid00000: root 49779 0.0 0.0 107892 692 ? S 16:44 0:00 logger -p user.err -t mesos-master[49753]
>> nid00000: root 50887 0.0 0.0 113116 1604 ? Ss 16:50 0:00 bash -c ps -aux | grep mesos-master
>> nid00000: root 50954 0.0 0.0 112648 948 ? S 16:50 0:00 grep mesos-master
>>
>> On Tue, Dec 6, 2016 at 6:58 PM, haosdent <[email protected]> wrote:
>>
>>> Hi, @Haripriya
>>>
>>> It looks like there are some problems in your master flags.
>>>
>>> > I'm attaching a snapshot of the error I've seen in Chrome with this
>>> > email. It'll be great if you can suggest if I'm missing any configuration
>>> > or if it's some bug.
>>>
>>> According to the screenshot you attached, the hostnames are incorrect on
>>> your servers. The Mesos WebUI depends on them to find the leading master.
>>> A workaround is to specify the `--hostname` flag when starting your
>>> masters. For example, launch your masters with
>>>
>>> ```
>>> $ mesos-master --hostname=socrates-nid000xxx.us.cray.com xxx
>>> ```
>>>
>>> > Is it something to do with a stale state of mesos anywhere or the way
>>> > I'm passing cluster? I have a config file named cluster in
>>> > /etc/mesos-master/ and when I restart the cluster it picks up the config
>>> > files.
>>>
>>> You need to ensure the flags of every master contain
>>> `--cluster=your_cluster_name`.
>>>
>>> Could you run `ps aux | grep mesos-master` on every master and paste
>>> the outputs here?
>>>
>>>
>>> On Wed, Dec 7, 2016 at 4:39 AM, Haripriya Ayyalasomayajula <
>>> [email protected]> wrote:
>>>
>>>> Hello, @Haosdent,
>>>>
>>>> Thanks for suggesting these.
>>>> I'm attaching a snapshot of the error I've seen in Chrome with this
>>>> email. It'll be great if you can suggest if I'm missing any configuration
>>>> or if it's some bug.
>>>>
>>>> And for the second part, my `/master/state` endpoint does not return
>>>> "cluster" anywhere. It returned 75k lines of JSON, so I'm not pasting all of
>>>> it.
>>>> {
>>>>   "activated_slaves": 37.0,
>>>>   "build_date": "2016-11-16 01:31:49",
>>>>   "build_time": 1479259909.0,
>>>>   "build_user": "centos",
>>>>   "completed_frameworks": [
>>>>     {
>>>>       "active": true,
>>>>       ..........
>>>>
>>>>   "start_time": 1480967418.42687,
>>>>   "unregistered_frameworks": [],
>>>>   "version": "1.1.0"
>>>> }
>>>>
>>>> Is it something to do with a stale state of mesos anywhere or the way
>>>> I'm passing cluster? I have a config file named cluster in
>>>> /etc/mesos-master/ and when I restart the cluster it picks up the config
>>>> files.
>>>>
>>>> On Mon, Dec 5, 2016 at 6:24 PM, haosdent <[email protected]> wrote:
>>>>
>>>>> Hi, @Haripriya
>>>>>
>>>>> > (less than 1 min though the jobs are running just fine).
>>>>> > Is there any new configuration that has to be added?
>>>>>
>>>>> Since 1.0, the WebUI uses JSONP to send requests. May I have
>>>>> your error logs from Safari, Chrome and Firefox?
>>>>> You can open the console as described at
>>>>> https://developers.google.com/web/tools/chrome-devtools/console/
>>>>>
>>>>> > The UI does not display the name of the cluster despite using the
>>>>> > --cluster flag.
>>>>>
>>>>> The --cluster flag works fine for me. Could you paste the output of your
>>>>> `/master/state` endpoint in an email? I would like to check the value of
>>>>> the `cluster` field in it.
>>>>>
>>>>> On Tue, Dec 6, 2016 at 5:34 AM, Haripriya Ayyalasomayajula <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I have two issues with the web UI in Mesos 1.1:
>>>>>>
>>>>>> 1. Earlier, when I was using Mesos 0.28, the Mesos web UI would try to
>>>>>> reconnect only when there were network issues or a newly elected
>>>>>> leader. After upgrading to 1.1, we see that it doesn't work on Safari
>>>>>> (it shows that no leader is elected even when a leader is elected and
>>>>>> jobs are running happily). It works on Chrome and Firefox but tries to
>>>>>> reconnect very often (less than 1 min though the jobs are running just fine).
>>>>>>
>>>>>> Is there any new configuration that has to be added?
>>>>>>
>>>>>> 2. The UI does not display the name of the cluster despite using the
>>>>>> --cluster flag.
>>>>>>
>>>>>> /usr/sbin/mesos-master --zk=zk://mesos1:2181,mesos2:2181,mesos3:2181/mesos --port=5050 --log_dir=/var/log/mesos --acls=/etc/mesos_acls.json --authenticate_frameworks=true --cluster="cluster1" --credentials=/etc/auth/credentials --quorum=2 --work_dir=/var/lib/mesos
>>>>>>
>>>>>> I also tried adding the name of the cluster without quotes (cluster1
>>>>>> instead of "cluster1"), but that doesn't work either.
>>>>>>
>>>>>> /usr/sbin/mesos-master --zk=zk://mesos1:2181,mesos2:2181,mesos3:2181/mesos --port=5050 --log_dir=/var/log/mesos --acls=/etc/mesos_acls.json --authenticate_frameworks=true --cluster=cluster1 --credentials=/etc/auth/credentials --quorum=2 --work_dir=/var/lib/mesos
>>>>>>
>>>>>> I greatly appreciate any help!
>>>>>>
>>>>>> --
>>>>>> Thanks,
>>>>>> Haripriya
>>>>>
>>>>>
>>>>> --
>>>>> Best Regards,
>>>>> Haosdent Huang
>>>>
>>>>
>>>> --
>>>> Thanks,
>>>> Haripriya
>>>
>>>
>>> --
>>> Best Regards,
>>> Haosdent Huang
>>
>>
>> --
>> Regards,
>> Haripriya Ayyalasomayajula
>
>
> --
> Best Regards,
> Haosdent Huang

--
Regards,
Haripriya Ayyalasomayajula

