Hi @haripriya, ping me on Mesos Slack (https://mesos.slack.com/) when you are available; I think it would speed up solving your problem. My ID is @haosdent. If you have not joined Mesos Slack before, you can join it here (https://mesos-slackin.herokuapp.com).
On Tue, Dec 20, 2016 at 2:22 AM, Haripriya Ayyalasomayajula <[email protected]> wrote:

> Hi @Haosdent,
>
> We have multiple networks; that could be one of the problems. I tried with
> all 3 of them and it still shows the same error. Can you help me understand
> what exactly hostname expects in such a scenario?
>
> On Thu, Dec 15, 2016 at 6:08 PM, haosdent <[email protected]> wrote:
>
>> Hi, @haripriya What's the hostname flag that you use to start the master?
>> According to the screenshot you posted before, I think you need to set it
>> to something like `socrates-nid000xxx.us.cray.com`.
>> However, in the error log you posted above, you set the hostname flag to
>> nid00016, which could not be resolved.
>>
>> On Fri, Dec 16, 2016 at 6:51 AM, Haripriya Ayyalasomayajula <[email protected]> wrote:
>>
>>> Hello @Haosdent,
>>>
>>> After I tried to use hostname, I still see the error. This is the output
>>> I see in developer tools for Chrome:
>>>
>>> Failed to load resource: the server responded with a status of 404 (Not Found)
>>> http://nid00016:5050/metrics/snapshot?jsonp=angular.callbacks._2 Failed to load resource: net::ERR_NAME_NOT_RESOLVED
>>> http://nid00016:5050/master/state?jsonp=angular.callbacks._3 Failed to load resource: net::ERR_NAME_NOT_RESOLVED
>>> http://nid00016:5050/master/state?jsonp=angular.callbacks._4 Failed to load resource: net::ERR_NAME_NOT_RESOLVED
>>> http://nid00016:5050/metrics/snapshot?jsonp=angular.callbacks._5 Failed to load resource: net::ERR_NAME_NOT_RESOLVED
>>> http://nid00016:5050/master/state?jsonp=angular.callbacks._6 Failed to load resource: net::ERR_NAME_NOT_RESOLVED
>>> http://nid00016:5050/metrics/snapshot?jsonp=angular.callbacks._7 Failed to load resource: net::ERR_NAME_NOT_RESOLVED
>>> http://nid00016:5050/master/state?jsonp=angular.callbacks._8 Failed to load resource: net::ERR_NAME_NOT_RESOLVED
>>> http://nid00016:5050/metrics/snapshot?jsonp=angular.callbacks._9 Failed to load resource: net::ERR_NAME_NOT_RESOLVED
>>> http://nid00016:5050/master/state?jsonp=angular.callbacks._a Failed to load resource: net::ERR_NAME_NOT_RESOLVED
>>> http://nid00016:5050/metrics/snapshot?jsonp=angular.callbacks._b Failed to load resource: net::ERR_NAME_NOT_RESOLVED
>>> http://nid00016:5050/master/state?jsonp=angular.callbacks._c Failed to load resource: net::ERR_NAME_NOT_RESOLVED
>>> http://nid00016:5050/metrics/snapshot?jsonp=angular.callbacks._d Failed to load resource: net::ERR_NAME_NOT_RESOLVED
>>> http://nid00016:5050/master/state?jsonp=angular.callbacks._e Failed to load resource: net::ERR_NAME_NOT_RESOLVED
>>> http://nid00016:5050/metrics/snapshot?jsonp=angular.callbacks._f Failed to load resource: net::ERR_NAME_NOT_RESOLVED
>>> http://nid00016:5050/master/state?jsonp=angular.callbacks._g Failed to load resource: net::ERR_NAME_NOT_RESOLVED
>>> http://nid00016:5050/metrics/snapshot?jsonp=angular.callbacks._h Failed to load resource: net::ERR_NAME_NOT_RESOLVED
>>> angular-1.2.3.min.js:70 GET http://nid00016:5050/master/state?jsonp=angular.callbacks._i net::ERR_NAME_NOT_RESOLVED
>>> g @ angular-1.2.3.min.js:70
>>> (anonymous function) @ angular-1.2.3.min.js:71
>>> D @ angular-1.2.3.min.js:68
>>> h @ angular-1.2.3.min.js:66
>>> D @ angular-1.2.3.min.js:91
>>> D @ angular-1.2.3.min.js:91
>>> (anonymous function) @ angular-1.2.3.min.js:93
>>> $eval @ angular-1.2.3.min.js:101
>>> $digest @ angular-1.2.3.min.js:98
>>> $apply @ angular-1.2.3.min.js:101
>>> (anonymous function) @ angular-1.2.3.min.js:111
>>> e @ angular-1.2.3.min.js:33
>>> (anonymous function) @ angular-1.2.3.min.js:37
>>> angular-1.2.3.min.js:70 GET http://nid00016:5050/metrics/snapshot?jsonp=angular.callbacks._j net::ERR_NAME_NOT_RESOLVED
>>> g @ angular-1.2.3.min.js:70
>>> (anonymous function) @ angular-1.2.3.min.js:71
>>> D @ angular-1.2.3.min.js:68
>>> h @ angular-1.2.3.min.js:66
>>> D @ angular-1.2.3.min.js:91
>>> D @ angular-1.2.3.min.js:91
>>> (anonymous function) @ angular-1.2.3.min.js:93
>>> $eval @ angular-1.2.3.min.js:101
>>> $digest @ angular-1.2.3.min.js:98
>>> $apply @ angular-1.2.3.min.js:101
>>> (anonymous function) @ angular-1.2.3.min.js:111
>>> e @ angular-1.2.3.min.js:33
>>> (anonymous function) @ angular-1.2.3.min.js:37
>>>
>>>
>>> Also, regarding the "cluster flag", here is my output:
>>>
>>> nid00016: root 14940 2.5 0.0 2080192 85012 ? Ssl 16:44 0:08 /usr/sbin/mesos-master --zk=zk://192.168.0.1:2181,192.168.0.17:2181,192.168.0.33:2181/mesos --port=5050 --log_dir=/var/log/mesos --acls=/etc/mesos_acls.json --authenticate_frameworks=true --cluster="socrates" --credentials=/etc/marathon-auth/credentials --hostname=nid00016 --quorum=2 --work_dir=/var/lib/mesos
>>>
>>> nid00016: root 14965 0.0 0.0 107892 612 ? S 16:44 0:00 logger -p user.info -t mesos-master[14940]
>>>
>>> nid00016: root 14966 0.0 0.0 107892 692 ? S 16:44 0:00 logger -p user.err -t mesos-master[14940]
>>>
>>> nid00016: root 15892 0.0 0.0 113116 1604 ? Ss 16:50 0:00 bash -c ps -aux | grep mesos-master
>>>
>>> nid00016: root 15959 0.0 0.0 112644 948 ? S 16:50 0:00 grep mesos-master
>>>
>>> nid00032: root 30018 2.5 0.0 2670032 26480 ? Ssl 16:44 0:08 /usr/sbin/mesos-master --zk=zk://192.168.0.1:2181,192.168.0.17:2181,192.168.0.33:2181/mesos --port=5050 --log_dir=/var/log/mesos --acls=/etc/mesos_acls.json --authenticate_frameworks=true --cluster="socrates" --credentials=/etc/marathon-auth/credentials --hostname=nid00032 --quorum=2 --work_dir=/var/lib/mesos
>>>
>>> nid00032: root 30043 0.0 0.0 107892 612 ? S 16:44 0:00 logger -p user.info -t mesos-master[30018]
>>>
>>> nid00032: root 30044 0.0 0.0 107892 692 ? S 16:44 0:00 logger -p user.err -t mesos-master[30018]
>>>
>>> nid00032: root 31091 0.0 0.0 113116 1604 ? Ss 16:50 0:00 bash -c ps -aux | grep mesos-master
>>>
>>> nid00032: root 31158 0.0 0.0 112644 948 ? S 16:50 0:00 grep mesos-master
>>>
>>> nid00000: root 49753 3.7 0.0 3259912 27584 ? Ssl 16:44 0:13 /usr/sbin/mesos-master --zk=zk://192.168.0.1:2181,192.168.0.17:2181,192.168.0.33:2181/mesos --port=5050 --log_dir=/var/log/mesos --acls=/etc/mesos_acls.json --authenticate_frameworks=true --cluster="socrates" --credentials=/etc/marathon-auth/credentials --hostname=nid00000.local --quorum=2 --work_dir=/var/lib/mesos
>>>
>>> nid00000: root 49778 0.0 0.0 107892 612 ? S 16:44 0:00 logger -p user.info -t mesos-master[49753]
>>>
>>> nid00000: root 49779 0.0 0.0 107892 692 ? S 16:44 0:00 logger -p user.err -t mesos-master[49753]
>>>
>>> nid00000: root 50887 0.0 0.0 113116 1604 ? Ss 16:50 0:00 bash -c ps -aux | grep mesos-master
>>>
>>> nid00000: root 50954 0.0 0.0 112648 948 ? S 16:50 0:00 grep mesos-master
>>>
>>> On Tue, Dec 6, 2016 at 6:58 PM, haosdent <[email protected]> wrote:
>>>
>>>> Hi, @Haripriya It looks like there are some problems in your master
>>>> flags.
>>>>
>>>> > I'm attaching a snapshot of the error I've seen in Chrome with this
>>>> > email. It'll be great if you can suggest if I'm missing any
>>>> > configuration or if it's some bug.
>>>>
>>>> According to the screenshot you attached, the hostnames are incorrect
>>>> on your servers. Mesos WebUI depends on that to find the leading master.
>>>> A workaround is to specify the `--hostname` flag when starting your
>>>> masters. For example, launch your masters with
>>>>
>>>> ```
>>>> $ mesos-master --hostname=socrates-nid000xxx.us.cray.com xxx
>>>> ```
>>>>
>>>> > Is it something to do with a stale state of mesos anywhere or the
>>>> > way I'm passing cluster? I have a config file named cluster in
>>>> > /etc/mesos-master/ and when I restart the cluster it picks up the
>>>> > config files.
>>>>
>>>> You need to ensure the flags of every master contain
>>>> `--cluster=your_cluster_name`.
>>>>
>>>> Could you run `ps aux | grep mesos-master` on every master and paste
>>>> the outputs here?
>>>>
>>>>
>>>> On Wed, Dec 7, 2016 at 4:39 AM, Haripriya Ayyalasomayajula <[email protected]> wrote:
>>>>
>>>>> Hello, @Haosdent,
>>>>>
>>>>> Thanks for suggesting these.
>>>>> I'm attaching a snapshot of the error I've seen in Chrome with this
>>>>> email. It'll be great if you can suggest if I'm missing any
>>>>> configuration or if it's some bug.
>>>>>
>>>>> And for the second part, my `/master/state` endpoint does not return
>>>>> "cluster" anywhere. It returned 75k lines of JSON so I'm not pasting
>>>>> all of it.
>>>>> {
>>>>>   "activated_slaves": 37.0,
>>>>>   "build_date": "2016-11-16 01:31:49",
>>>>>   "build_time": 1479259909.0,
>>>>>   "build_user": "centos",
>>>>>   "completed_frameworks": [
>>>>>     {
>>>>>       "active": true,
>>>>> ..........
>>>>>
>>>>>
>>>>>
>>>>>   "start_time": 1480967418.42687,
>>>>>   "unregistered_frameworks": [],
>>>>>   "version": "1.1.0"
>>>>> }
>>>>>
>>>>> Is it something to do with a stale state of mesos anywhere or the way
>>>>> I'm passing cluster? I have a config file named cluster in
>>>>> /etc/mesos-master/ and when I restart the cluster it picks up the
>>>>> config files.
>>>>>
>>>>> On Mon, Dec 5, 2016 at 6:24 PM, haosdent <[email protected]> wrote:
>>>>>
>>>>>> Hi, @Haripriya
>>>>>>
>>>>>> > (less than 1 min though the jobs are running just fine).
>>>>>> > Is there any new configuration that has to be added?
>>>>>>
>>>>>> We changed to using JSONP to send requests in the WebUI since 1.0.
>>>>>> May I have your error log in Safari, Chrome and Firefox?
>>>>>> You could open it via
>>>>>> https://developers.google.com/web/tools/chrome-devtools/console/
>>>>>>
>>>>>> > The UI does not display the name of the cluster despite using the
>>>>>> > --cluster flag.
>>>>>>
>>>>>> The --cluster flag works fine for me. Could you paste the output of
>>>>>> your `/master/state` endpoint in the email? I would like to check the
>>>>>> value of the `cluster` field in it.
>>>>>>
>>>>>> On Tue, Dec 6, 2016 at 5:34 AM, Haripriya Ayyalasomayajula <[email protected]> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I have two issues with the web UI in Mesos 1.1
>>>>>>>
>>>>>>> 1. Earlier when I was using Mesos 0.28, the Mesos web UI would try
>>>>>>> to reconnect only when there were network issues or when there was
>>>>>>> a newly elected leader. After upgrading to 1.1, we see that it won't
>>>>>>> work on Safari (it shows no leader is elected even when there is a
>>>>>>> leader elected and jobs are running happily); it works on Chrome and
>>>>>>> Firefox but tries to reconnect very often (less than 1 min though
>>>>>>> the jobs are running just fine).
>>>>>>>
>>>>>>> Is there any new configuration that has to be added?
>>>>>>>
>>>>>>>
>>>>>>> 2. The UI does not display the name of the cluster despite using the
>>>>>>> --cluster flag.
>>>>>>>
>>>>>>> /usr/sbin/mesos-master --zk=zk://mesos1:2181,mesos2:2181,mesos3:2181/mesos --port=5050 --log_dir=/var/log/mesos --acls=/etc/mesos_acls.json --authenticate_frameworks=true --cluster="cluster1" --credentials=/etc/auth/credentials --quorum=2 --work_dir=/var/lib/mesos
>>>>>>>
>>>>>>>
>>>>>>> I also tried adding the name of the cluster without quotes: cluster1
>>>>>>> instead of "cluster1", but that doesn't work either.
>>>>>>>
>>>>>>> /usr/sbin/mesos-master --zk=zk://mesos1:2181,mesos2:2181,mesos3:2181/mesos --port=5050 --log_dir=/var/log/mesos --acls=/etc/mesos_acls.json --authenticate_frameworks=true --cluster=cluster1 --credentials=/etc/auth/credentials --quorum=2 --work_dir=/var/lib/mesos
>>>>>>>
>>>>>>> I greatly appreciate any help!
>>>>>>>
>>>>>>> --
>>>>>>> Thanks,
>>>>>>> Haripriya
>>>>>>
>>>>>> --
>>>>>> Best Regards,
>>>>>> Haosdent Huang
>>>>>
>>>>> --
>>>>> Thanks,
>>>>> Haripriya
>>>>
>>>> --
>>>> Best Regards,
>>>> Haosdent Huang
>>>
>>> --
>>> Regards,
>>> Haripriya Ayyalasomayajula
>>
>> --
>> Best Regards,
>> Haosdent Huang
>
> --
> Regards,
> Haripriya Ayyalasomayajula

--
Best Regards,
Haosdent Huang
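
For anyone following the thread, the two checks discussed above (whether the name passed to `--hostname` resolves, and whether `/master/state` reports a `cluster` value) could be sketched in Python roughly as follows. This is only an illustrative sketch and not part of Mesos: the helper names `summarize_master_state` and `resolves` are hypothetical, and it assumes the `/master/state` JSON exposes top-level `cluster` and `hostname` fields. The resolution check has to run on the machine where the browser runs, since that is where the WebUI requests are issued from.

```python
import json
import socket


def summarize_master_state(state_json):
    """Pull the fields relevant to this thread out of a /master/state body.

    Assumption: the JSON has top-level "cluster" and "hostname" keys.
    Missing keys come back as None instead of raising.
    """
    state = json.loads(state_json)
    return {"cluster": state.get("cluster"), "hostname": state.get("hostname")}


def resolves(hostname):
    """Return True if this machine can resolve `hostname` to an address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        # Roughly what the browser reports as ERR_NAME_NOT_RESOLVED.
        return False
```

A possible way to use it: save the output of `http://<master>:5050/master/state` to a file, pass its contents to `summarize_master_state`, and then call `resolves` on the returned hostname from the workstation where the browser runs. If `cluster` comes back as `None`, the leading master probably was not started with `--cluster`; if `resolves` returns `False`, the browser would not be able to reach the master under that name either.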

