Hi, @haripriya

What value of the `--hostname` flag do you use to start each master? Based on the screenshot you posted earlier, I think you need to set it to something like `socrates-nid000xxx.us.cray.com`. However, in the error log you posted above, the hostname flag is set to `nid00016`, which your browser could not resolve (`net::ERR_NAME_NOT_RESOLVED`).
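One way to confirm this is to check, from the machine running the browser, whether each advertised hostname resolves. The Python sketch below is a minimal illustration, not part of Mesos: `resolves` and `report` are hypothetical helper names, and `socket.getaddrinfo` only approximates the browser's lookup (Chrome may use its own DNS client).

```python
from __future__ import annotations

import socket


def resolves(hostname: str) -> bool:
    """Return True if `hostname` resolves to at least one address on this machine."""
    try:
        # getaddrinfo uses the system resolver (hosts file, DNS, search domains).
        return len(socket.getaddrinfo(hostname, None)) > 0
    except (socket.gaierror, UnicodeError):
        # gaierror: unknown name or resolver failure; UnicodeError: invalid label.
        return False


def report(hostnames: list[str]) -> dict[str, bool]:
    """Map each hostname (e.g. each master's --hostname value) to whether it resolves."""
    return {name: resolves(name) for name in hostnames}
```

For example, `report(["nid00016", "nid00032", "nid00000.local"])` run on the workstation with the browser would likely return `False` for every name that shows up as `net::ERR_NAME_NOT_RESOLVED` in the console.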
On Fri, Dec 16, 2016 at 6:51 AM, Haripriya Ayyalasomayajula <
[email protected]> wrote:

> Hello @Haosdent,
>
> After I tried to use hostname, I still see the error. This is the output I
> see in developer tools for chrome:
>
> Failed to load resource: the server responded with a status of 404 (Not Found)
> http://nid00016:5050/metrics/snapshot?jsonp=angular.callbacks._2 Failed to load resource: net::ERR_NAME_NOT_RESOLVED
> http://nid00016:5050/master/state?jsonp=angular.callbacks._3 Failed to load resource: net::ERR_NAME_NOT_RESOLVED
> http://nid00016:5050/master/state?jsonp=angular.callbacks._4 Failed to load resource: net::ERR_NAME_NOT_RESOLVED
> http://nid00016:5050/metrics/snapshot?jsonp=angular.callbacks._5 Failed to load resource: net::ERR_NAME_NOT_RESOLVED
> http://nid00016:5050/master/state?jsonp=angular.callbacks._6 Failed to load resource: net::ERR_NAME_NOT_RESOLVED
> http://nid00016:5050/metrics/snapshot?jsonp=angular.callbacks._7 Failed to load resource: net::ERR_NAME_NOT_RESOLVED
> http://nid00016:5050/master/state?jsonp=angular.callbacks._8 Failed to load resource: net::ERR_NAME_NOT_RESOLVED
> http://nid00016:5050/metrics/snapshot?jsonp=angular.callbacks._9 Failed to load resource: net::ERR_NAME_NOT_RESOLVED
> http://nid00016:5050/master/state?jsonp=angular.callbacks._a Failed to load resource: net::ERR_NAME_NOT_RESOLVED
> http://nid00016:5050/metrics/snapshot?jsonp=angular.callbacks._b Failed to load resource: net::ERR_NAME_NOT_RESOLVED
> http://nid00016:5050/master/state?jsonp=angular.callbacks._c Failed to load resource: net::ERR_NAME_NOT_RESOLVED
> http://nid00016:5050/metrics/snapshot?jsonp=angular.callbacks._d Failed to load resource: net::ERR_NAME_NOT_RESOLVED
> http://nid00016:5050/master/state?jsonp=angular.callbacks._e Failed to load resource: net::ERR_NAME_NOT_RESOLVED
> http://nid00016:5050/metrics/snapshot?jsonp=angular.callbacks._f Failed to load resource: net::ERR_NAME_NOT_RESOLVED
> http://nid00016:5050/master/state?jsonp=angular.callbacks._g Failed to load resource: net::ERR_NAME_NOT_RESOLVED
> http://nid00016:5050/metrics/snapshot?jsonp=angular.callbacks._h Failed to load resource: net::ERR_NAME_NOT_RESOLVED
> angular-1.2.3.min.js:70 GET http://nid00016:5050/master/state?jsonp=angular.callbacks._i net::ERR_NAME_NOT_RESOLVED
>   g @ angular-1.2.3.min.js:70
>   (anonymous function) @ angular-1.2.3.min.js:71
>   D @ angular-1.2.3.min.js:68
>   h @ angular-1.2.3.min.js:66
>   D @ angular-1.2.3.min.js:91
>   D @ angular-1.2.3.min.js:91
>   (anonymous function) @ angular-1.2.3.min.js:93
>   $eval @ angular-1.2.3.min.js:101
>   $digest @ angular-1.2.3.min.js:98
>   $apply @ angular-1.2.3.min.js:101
>   (anonymous function) @ angular-1.2.3.min.js:111
>   e @ angular-1.2.3.min.js:33
>   (anonymous function) @ angular-1.2.3.min.js:37
> angular-1.2.3.min.js:70 GET http://nid00016:5050/metrics/snapshot?jsonp=angular.callbacks._j net::ERR_NAME_NOT_RESOLVED
>   g @ angular-1.2.3.min.js:70
>   (anonymous function) @ angular-1.2.3.min.js:71
>   D @ angular-1.2.3.min.js:68
>   h @ angular-1.2.3.min.js:66
>   D @ angular-1.2.3.min.js:91
>   D @ angular-1.2.3.min.js:91
>   (anonymous function) @ angular-1.2.3.min.js:93
>   $eval @ angular-1.2.3.min.js:101
>   $digest @ angular-1.2.3.min.js:98
>   $apply @ angular-1.2.3.min.js:101
>   (anonymous function) @ angular-1.2.3.min.js:111
>   e @ angular-1.2.3.min.js:33
>   (anonymous function) @ angular-1.2.3.min.js:37
>
> Also, regarding the "cluster flag", here is my output:
>
> nid00016: root 14940 2.5 0.0 2080192 85012 ? Ssl 16:44 0:08
> /usr/sbin/mesos-master --zk=zk://192.168.0.1:2181,192.168.0.17:2181,192.168.0.33:2181/mesos
> --port=5050 --log_dir=/var/log/mesos --acls=/etc/mesos_acls.json
> --authenticate_frameworks=true --cluster="socrates"
> --credentials=/etc/marathon-auth/credentials --hostname=nid00016
> --quorum=2 --work_dir=/var/lib/mesos
>
> nid00016: root 14965 0.0 0.0 107892 612 ? S 16:44 0:00
> logger -p user.info -t mesos-master[14940]
>
> nid00016: root 14966 0.0 0.0 107892 692 ? S 16:44 0:00
> logger -p user.err -t mesos-master[14940]
>
> nid00016: root 15892 0.0 0.0 113116 1604 ? Ss 16:50 0:00
> bash -c ps -aux | grep mesos-master
>
> nid00016: root 15959 0.0 0.0 112644 948 ? S 16:50 0:00
> grep mesos-master
>
> nid00032: root 30018 2.5 0.0 2670032 26480 ? Ssl 16:44 0:08
> /usr/sbin/mesos-master --zk=zk://192.168.0.1:2181,192.168.0.17:2181,192.168.0.33:2181/mesos
> --port=5050 --log_dir=/var/log/mesos --acls=/etc/mesos_acls.json
> --authenticate_frameworks=true --cluster="socrates"
> --credentials=/etc/marathon-auth/credentials --hostname=nid00032
> --quorum=2 --work_dir=/var/lib/mesos
>
> nid00032: root 30043 0.0 0.0 107892 612 ? S 16:44 0:00
> logger -p user.info -t mesos-master[30018]
>
> nid00032: root 30044 0.0 0.0 107892 692 ? S 16:44 0:00
> logger -p user.err -t mesos-master[30018]
>
> nid00032: root 31091 0.0 0.0 113116 1604 ? Ss 16:50 0:00
> bash -c ps -aux | grep mesos-master
>
> nid00032: root 31158 0.0 0.0 112644 948 ? S 16:50 0:00
> grep mesos-master
>
> nid00000: root 49753 3.7 0.0 3259912 27584 ? Ssl 16:44 0:13
> /usr/sbin/mesos-master --zk=zk://192.168.0.1:2181,192.168.0.17:2181,192.168.0.33:2181/mesos
> --port=5050 --log_dir=/var/log/mesos --acls=/etc/mesos_acls.json
> --authenticate_frameworks=true --cluster="socrates"
> --credentials=/etc/marathon-auth/credentials --hostname=nid00000.local
> --quorum=2 --work_dir=/var/lib/mesos
>
> nid00000: root 49778 0.0 0.0 107892 612 ? S 16:44 0:00
> logger -p user.info -t mesos-master[49753]
>
> nid00000: root 49779 0.0 0.0 107892 692 ? S 16:44 0:00
> logger -p user.err -t mesos-master[49753]
>
> nid00000: root 50887 0.0 0.0 113116 1604 ? Ss 16:50 0:00
> bash -c ps -aux | grep mesos-master
>
> nid00000: root 50954 0.0 0.0 112648 948 ? S 16:50 0:00
> grep mesos-master
>
> On Tue, Dec 6, 2016 at 6:58 PM, haosdent <[email protected]> wrote:
>
>> Hi, @Haripriya It looks like there are some problems in your master flags.
>>
>> > I'm attaching a snapshot of the error I've seen in Chrome with this
>> email. It'll be great if you can suggest if I'm missing any configuration
>> or if it's a bug.
>>
>> According to the screenshot you attached, the hostnames are incorrect on
>> your servers. The Mesos WebUI depends on them to find the leading master.
>> A workaround is to specify the `--hostname` flag when starting your
>> masters. For example, launch your masters with
>>
>> ```
>> $ mesos-master --hostname=socrates-nid000xxx.us.cray.com xxx
>> ```
>>
>> > Is it something to do with a stale state of mesos anywhere or the way
>> I'm passing cluster? I have a config file named cluster in
>> /etc/mesos-master/ and when I restart the cluster it picks up the config
>> files.
>>
>> You need to ensure the flags of every master contain
>> `--cluster=your_cluster_name`.
>>
>> Could you run `ps aux | grep mesos-master` on every master and paste
>> the outputs here?
>>
>> On Wed, Dec 7, 2016 at 4:39 AM, Haripriya Ayyalasomayajula <
>> [email protected]> wrote:
>>
>>> Hello, @Haosdent,
>>>
>>> Thanks for suggesting these.
>>> I'm attaching a snapshot of the error I've seen in Chrome with this
>>> email. It'll be great if you can suggest if I'm missing any configuration
>>> or if it's a bug.
>>>
>>> And for the second part, my `/master/state` endpoint does not return
>>> "cluster" anywhere. It returned 75k lines of JSON so I'm not pasting all of
>>> it.
>>> {
>>>   "activated_slaves": 37.0,
>>>   "build_date": "2016-11-16 01:31:49",
>>>   "build_time": 1479259909.0,
>>>   "build_user": "centos",
>>>   "completed_frameworks": [
>>>     {
>>>       "active": true,
>>>       ..........
>>>
>>>   "start_time": 1480967418.42687,
>>>   "unregistered_frameworks": [],
>>>   "version": "1.1.0"
>>> }
>>>
>>> Is it something to do with a stale state of mesos anywhere or the way
>>> I'm passing cluster? I have a config file named cluster in
>>> /etc/mesos-master/ and when I restart the cluster it picks up the config
>>> files.
>>>
>>> On Mon, Dec 5, 2016 at 6:24 PM, haosdent <[email protected]> wrote:
>>>
>>>> Hi, @Haripriya
>>>>
>>>> > (less than 1 min though the jobs are running just fine).
>>>> > Is there any new configuration that has to be added?
>>>>
>>>> We changed the WebUI to send requests via JSONP starting with 1.0.
>>>> May I have your error logs from Safari, Chrome and Firefox?
>>>> You can open the console as described at
>>>> https://developers.google.com/web/tools/chrome-devtools/console/
>>>>
>>>> > The UI does not display the name of the cluster despite using the
>>>> --cluster flag.
>>>>
>>>> The --cluster flag works fine for me. Could you paste the output of your
>>>> `/master/state` endpoint in this email? I would like to check the value
>>>> of the `cluster` field in it.
>>>>
>>>> On Tue, Dec 6, 2016 at 5:34 AM, Haripriya Ayyalasomayajula <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I have two issues with the web UI in Mesos 1.1:
>>>>>
>>>>> 1. Earlier, when I was using Mesos 0.28, the Mesos web UI would try to
>>>>> reconnect only when there were network issues or when a new leader was
>>>>> elected. After upgrading to 1.1, it doesn't work on Safari (it shows
>>>>> that no leader is elected even when a leader has been elected and jobs
>>>>> are running happily). It works on Chrome and Firefox but tries to
>>>>> reconnect very often (less than 1 min, though the jobs are running just
>>>>> fine).
>>>>>
>>>>> Is there any new configuration that has to be added?
>>>>>
>>>>> 2. The UI does not display the name of the cluster despite using the
>>>>> --cluster flag.
>>>>>
>>>>> /usr/sbin/mesos-master --zk=zk://mesos1:2181,mesos2:2181,mesos3:2181/mesos
>>>>> --port=5050 --log_dir=/var/log/mesos --acls=/etc/mesos_acls.json
>>>>> --authenticate_frameworks=true --cluster="cluster1"
>>>>> --credentials=/etc/auth/credentials --quorum=2 --work_dir=/var/lib/mesos
>>>>>
>>>>> I also tried adding the name of the cluster without quotes: cluster1
>>>>> instead of "cluster1", but that doesn't work either.
>>>>>
>>>>> /usr/sbin/mesos-master --zk=zk://mesos1:2181,mesos2:2181,mesos3:2181/mesos
>>>>> --port=5050 --log_dir=/var/log/mesos --acls=/etc/mesos_acls.json
>>>>> --authenticate_frameworks=true --cluster=cluster1
>>>>> --credentials=/etc/auth/credentials --quorum=2 --work_dir=/var/lib/mesos
>>>>>
>>>>> I greatly appreciate any help!
>>>>>
>>>>> --
>>>>> Thanks,
>>>>> Haripriya
>>>>
>>>> --
>>>> Best Regards,
>>>> Haosdent Huang
>>>
>>> --
>>> Thanks,
>>> Haripriya
>>
>> --
>> Best Regards,
>> Haosdent Huang
>
> --
> Regards,
> Haripriya Ayyalasomayajula

--
Best Regards,
Haosdent Huang
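The `ps aux` output quoted in the Dec 16 message can also be compared mechanically. The sketch below is a hypothetical helper (not a Mesos tool) that assumes pdsh-style lines, `<node>: <ps columns> /usr/sbin/mesos-master --flag=value ...`, with one process per line as `ps` prints them before e-mail wrapping. On the three masters in this thread it shows that `--cluster` is identical everywhere, while `--hostname` is a bare node name on two masters and `nid00000.local` on the third. Because `ps` shows arguments without shell quoting, the cluster value carries literal quote characters (`"socrates"`), which may be worth ruling out as a factor.

```python
from __future__ import annotations

# Excerpt of the `ps aux | grep mesos-master` output quoted in this thread,
# unwrapped so that each process is on a single line.
SAMPLE = """\
nid00016: root 14940 2.5 0.0 2080192 85012 ? Ssl 16:44 0:08 /usr/sbin/mesos-master --zk=zk://192.168.0.1:2181,192.168.0.17:2181,192.168.0.33:2181/mesos --port=5050 --log_dir=/var/log/mesos --acls=/etc/mesos_acls.json --authenticate_frameworks=true --cluster="socrates" --credentials=/etc/marathon-auth/credentials --hostname=nid00016 --quorum=2 --work_dir=/var/lib/mesos
nid00016: root 14965 0.0 0.0 107892 612 ? S 16:44 0:00 logger -p user.info -t mesos-master[14940]
nid00016: root 15892 0.0 0.0 113116 1604 ? Ss 16:50 0:00 bash -c ps -aux | grep mesos-master
nid00016: root 15959 0.0 0.0 112644 948 ? S 16:50 0:00 grep mesos-master
nid00032: root 30018 2.5 0.0 2670032 26480 ? Ssl 16:44 0:08 /usr/sbin/mesos-master --zk=zk://192.168.0.1:2181,192.168.0.17:2181,192.168.0.33:2181/mesos --port=5050 --log_dir=/var/log/mesos --acls=/etc/mesos_acls.json --authenticate_frameworks=true --cluster="socrates" --credentials=/etc/marathon-auth/credentials --hostname=nid00032 --quorum=2 --work_dir=/var/lib/mesos
nid00000: root 49753 3.7 0.0 3259912 27584 ? Ssl 16:44 0:13 /usr/sbin/mesos-master --zk=zk://192.168.0.1:2181,192.168.0.17:2181,192.168.0.33:2181/mesos --port=5050 --log_dir=/var/log/mesos --acls=/etc/mesos_acls.json --authenticate_frameworks=true --cluster="socrates" --credentials=/etc/marathon-auth/credentials --hostname=nid00000.local --quorum=2 --work_dir=/var/lib/mesos
"""


def master_flags(ps_output: str) -> dict[str, dict[str, str]]:
    """Map node -> {flag name: value} for every mesos-master process line."""
    result: dict[str, dict[str, str]] = {}
    for line in ps_output.splitlines():
        node, sep, rest = line.partition(": ")
        # Skip lines without a pdsh prefix and the logger/bash/grep helpers.
        if not sep or "/mesos-master " not in rest:
            continue
        flags: dict[str, str] = {}
        # ps joins argv with spaces and adds no shell quoting, so a plain
        # whitespace split recovers the arguments (values must not contain spaces).
        for arg in rest.split():
            if arg.startswith("--") and "=" in arg:
                name, _, value = arg[2:].partition("=")
                flags[name] = value
        result[node] = flags
    return result


def values_of(flags_by_node: dict[str, dict[str, str]], flag: str) -> dict[str, str | None]:
    """Return the value of `flag` on each node (None where the flag is missing)."""
    return {node: flags.get(flag) for node, flags in flags_by_node.items()}


if __name__ == "__main__":
    flags = master_flags(SAMPLE)
    clusters = values_of(flags, "cluster")
    print("same --cluster on every master:", len(set(clusters.values())) == 1)
    print("--hostname per master:", values_of(flags, "hostname"))
```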

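Haosdent notes in the thread that the WebUI switched to JSONP in 1.0, which is why every URL in the console log carries a `?jsonp=angular.callbacks._N` parameter: the endpoint wraps its JSON in a call to that callback name. The sketch below uses hypothetical helper names and only the standard library to request `/master/state` the same way and pull out the fields discussed in the thread. It assumes the body has the form `callback(<json>)`, optionally followed by `;`, that `cluster` is a top-level field present only when `--cluster` is set, and that the master's flags appear under a `flags` object; please verify these against your Mesos version.

```python
from __future__ import annotations

import json
import re
import urllib.parse
import urllib.request


def jsonp_url(master: str, endpoint: str, callback: str) -> str:
    """Build a URL shaped like the WebUI's, e.g. http://host:5050/master/state?jsonp=cb."""
    query = urllib.parse.urlencode({"jsonp": callback})
    return f"http://{master}{endpoint}?{query}"


def unwrap_jsonp(body: str, callback: str) -> dict:
    """Strip `callback(` ... `)` (plus an optional trailing `;`) and parse the JSON inside."""
    match = re.fullmatch(rf"\s*{re.escape(callback)}\((.*)\)\s*;?\s*", body, re.DOTALL)
    if match is None:
        raise ValueError("response is not wrapped in the expected callback")
    return json.loads(match.group(1))


def cluster_summary(state: dict) -> dict:
    """Pick out the fields discussed in this thread from a /master/state document."""
    flags = state.get("flags", {})
    return {
        "cluster": state.get("cluster"),  # assumed absent when --cluster is unset
        "cluster_flag": flags.get("cluster"),
        "hostname": state.get("hostname"),
        "leader": state.get("leader"),
    }


def fetch_state(master: str, callback: str = "cb") -> dict:
    """Fetch /master/state via JSONP from `master` ("host:port") and parse it."""
    url = jsonp_url(master, "/master/state", callback)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return unwrap_jsonp(resp.read().decode("utf-8"), callback)
```

For example, `cluster_summary(fetch_state("nid00016:5050"))`, run from a host that can resolve `nid00016`, would show whether the master reports a `cluster` value at all and, if so, whether it includes quote characters.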

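On the quoting question from the first message: when a command is typed at a POSIX shell prompt, the shell removes the quotes before `mesos-master` starts, so `--cluster="cluster1"` and `--cluster=cluster1` yield the same argument, consistent with neither spelling making a difference. The `ps` output later in the thread, however, shows `--cluster="socrates"` with the quotes intact, which is what would happen if the flag were assembled from a config file's contents without shell parsing. That is an assumption about how the packaging reads `/etc/mesos-master/`; `flag_from_file_contents` below is a hypothetical stand-in for that behaviour, not the real wrapper. `shlex.split` is used to mimic shell word splitting.

```python
from __future__ import annotations

import shlex


def shell_argv(command_line: str) -> list[str]:
    """Split a command line the way a POSIX shell would, including quote removal."""
    return shlex.split(command_line)


def flag_from_file_contents(name: str, contents: str) -> str:
    """Hypothetical wrapper behaviour: build --name=<contents> with no shell parsing."""
    return f"--{name}={contents.strip()}"


quoted = shell_argv('mesos-master --cluster="cluster1" --quorum=2')
unquoted = shell_argv("mesos-master --cluster=cluster1 --quorum=2")
# Typed at a shell prompt, both spellings reach mesos-master identically.
print(quoted == unquoted)  # True
# If a file containing "socrates" (with quotes) is passed through verbatim,
# the quotes become part of the flag value.
print(flag_from_file_contents("cluster", '"socrates"\n'))  # --cluster="socrates"
```

Whether a quoted value is actually what keeps the WebUI from showing the cluster name is not established in this thread; writing the file without quotes would be a cheap way to test it.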