BTW, the "lsof" command shows only 16 file descriptors. I don't know why the Mesos master tries to close "fd 17".

Best Regards
Nan Xiao

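One possible reason for the mismatch, offered as an assumption and not something the logs confirm: lsof only lists descriptors that are open at the moment it runs, so the descriptor of a short-lived connection that has already been closed would not show up. The Python sketch below assumes a Unix-like system, and the helper names `is_open` and `demo` are hypothetical. It shows a socket's descriptor number becoming invalid as soon as the socket is closed.

```python
import os
import socket
from typing import Tuple


def is_open(fd: int) -> bool:
    """Return True if fd refers to an open descriptor in this process."""
    try:
        os.fstat(fd)
        return True
    except OSError:
        # EBADF: the number no longer names an open descriptor.
        return False


def demo() -> Tuple[int, bool, bool]:
    """Open a socket, close it, and report whether its fd was valid before/after."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    fd = sock.fileno()
    open_before = is_open(fd)
    sock.close()
    # A snapshot taken now (as lsof would) no longer sees this descriptor.
    open_after = is_open(fd)
    return fd, open_before, open_after


if __name__ == "__main__":
    fd, before, after = demo()
    print(f"fd {fd}: open before close = {before}, open after close = {after}")
```

The descriptor number printed depends on what else the process has open, so it will differ from run to run and from the "fd 17" in the master log.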
On Tue, Dec 29, 2015 at 11:32 AM, Nan Xiao <[email protected]> wrote:
> Hi Klaus,
>
> Firstly, thanks very much for your answer!
>
> The km processes are all live:
> root 129474 128024 2 22:26 pts/0 00:00:00 km apiserver
> --address=15.242.100.60 --etcd-servers=http://15.242.100.60:4001
> --service-cluster-ip-range=10.10.10.0/24 --port=8888
> --cloud-provider=mesos --cloud-config=mesos-cloud.conf --secure-port=0
> --v=1
> root 129509 128024 2 22:26 pts/0 00:00:00 km
> controller-manager --master=15.242.100.60:8888 --cloud-provider=mesos
> --cloud-config=./mesos-cloud.conf --v=1
> root 129538 128024 0 22:26 pts/0 00:00:00 km scheduler
> --address=15.242.100.60 --mesos-master=15.242.100.56:5050
> --etcd-servers=http://15.242.100.60:4001 --mesos-user=root
> --api-servers=15.242.100.60:8888 --cluster-dns=10.10.10.10
> --cluster-domain=cluster.local --v=2
>
> All the logs also seem OK, except the logs from scheduler.log:
> ......
> I1228 22:26:37.883092 129538 messenger.go:381] Receiving message
> mesos.internal.InternalMasterChangeDetected from
> scheduler(1)@15.242.100.60:33077
> I1228 22:26:37.883225 129538 scheduler.go:374] New master
> [email protected]:5050 detected
> I1228 22:26:37.883268 129538 scheduler.go:435] No credentials were
> provided. Attempting to register scheduler without authentication.
> I1228 22:26:37.883356 129538 scheduler.go:928] Registering with
> master: [email protected]:5050
> I1228 22:26:37.883460 129538 messenger.go:187] Sending message
> mesos.internal.RegisterFrameworkMessage to [email protected]:5050
> I1228 22:26:37.883504 129538 scheduler.go:881] will retry
> registration in 1.209320575s if necessary
> I1228 22:26:37.883758 129538 http_transporter.go:193] Sending message
> to [email protected]:5050 via http
> I1228 22:26:37.883873 129538 http_transporter.go:587] libproc target
> URL http://15.242.100.56:5050/master/mesos.internal.RegisterFrameworkMessage
> I1228 22:26:39.093560 129538 scheduler.go:928] Registering with
> master: [email protected]:5050
> I1228 22:26:39.093659 129538 messenger.go:187] Sending message
> mesos.internal.RegisterFrameworkMessage to [email protected]:5050
> I1228 22:26:39.093702 129538 scheduler.go:881] will retry
> registration in 3.762036352s if necessary
> I1228 22:26:39.093765 129538 http_transporter.go:193] Sending message
> to [email protected]:5050 via http
> I1228 22:26:39.093847 129538 http_transporter.go:587] libproc target
> URL http://15.242.100.56:5050/master/mesos.internal.RegisterFrameworkMessage
> ......
>
> From the log, the Mesos master rejected k8s's registration, and
> k8s keeps retrying.
>
> Have you met this issue before? Thanks very much in advance!
> Best Regards
> Nan Xiao
>
>
> On Mon, Dec 28, 2015 at 7:26 PM, Klaus Ma <[email protected]> wrote:
>> It seems Kubernetes is down; would you help check Kubernetes's status
>> (km)?
>>
>> ----
>> Da (Klaus), Ma (马达) | PMP® | Advisory Software Engineer
>> Platform Symphony/DCOS Development & Support, STG, IBM GCG
>> +86-10-8245 4084 | [email protected] | http://k82.me
>>
>> On Mon, Dec 28, 2015 at 6:35 PM, Nan Xiao <[email protected]> wrote:
>>>
>>> Hi all,
>>>
>>> Greetings from me!
>>>
>>> I am trying to follow this tutorial
>>>
>>> (https://github.com/kubernetes/kubernetes/blob/master/docs/getting-started-guides/mesos.md)
>>> to deploy "k8s on Mesos" on local machines. k8s is built from the
>>> newest master branch, and Mesos is version 0.26.
>>>
>>> After starting the Mesos master (IP: 15.242.100.56), the Mesos
>>> slave (IP: 15.242.100.16), and k8s (IP: 15.242.100.60), I can see the
>>> following logs from the Mesos master:
>>>
>>> ......
>>> I1227 22:52:34.494478 8069 master.cpp:4269] Received update of slave
>>> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-S0 at slave(1)@15.242.100.16:5051
>>> (pqsfc016.ftc.rdlabs.hpecorp.net) with total oversubscribed resources
>>> I1227 22:52:34.494940 8065 hierarchical.cpp:400] Slave
>>> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-S0
>>> (pqsfc016.ftc.rdlabs.hpecorp.net) updated with oversubscribed
>>> resources (total: cpus(*):32; mem(*):127878; disk(*):4336;
>>> ports(*):[31000-32000], allocated: )
>>> I1227 22:53:06.740757 8053 http.cpp:334] HTTP GET for
>>> /master/state.json from 15.242.100.60:56219 with
>>> User-Agent='Go-http-client/1.1'
>>> I1227 22:53:07.736419 8065 http.cpp:334] HTTP GET for
>>> /master/state.json from 15.242.100.60:56241 with
>>> User-Agent='Go-http-client/1.1'
>>> I1227 22:53:07.767196 8070 http.cpp:334] HTTP GET for
>>> /master/state.json from 15.242.100.60:56252 with
>>> User-Agent='Go-http-client/1.1'
>>> I1227 22:53:08.808171 8053 http.cpp:334] HTTP GET for
>>> /master/state.json from 15.242.100.60:56272 with
>>> User-Agent='Go-http-client/1.1'
>>> I1227 22:53:08.815811 8060 master.cpp:2176] Received SUBSCRIBE call
>>> for framework 'Kubernetes' at scheduler(1)@15.242.100.60:59488
>>> I1227 22:53:08.816182 8060 master.cpp:2247] Subscribing framework
>>> Kubernetes with checkpointing enabled and capabilities [ ]
>>> I1227 22:53:08.817294 8052 hierarchical.cpp:195] Added framework
>>> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-0000
>>> I1227 22:53:08.817464 8050 master.cpp:1122] Framework
>>> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-0000 (Kubernetes) at
>>> scheduler(1)@15.242.100.60:59488 disconnected
>>> E1227 22:53:08.817497 8073 process.cpp:1911] Failed to shutdown
>>> socket with fd 17: Transport endpoint is not connected
>>> I1227 22:53:08.817533 8050 master.cpp:2472] Disconnecting framework
>>> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-0000 (Kubernetes) at
>>> scheduler(1)@15.242.100.60:59488
>>> I1227 22:53:08.817595 8050 master.cpp:2496] Deactivating framework
>>> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-0000 (Kubernetes) at
>>> scheduler(1)@15.242.100.60:59488
>>> I1227 22:53:08.817797 8050 master.cpp:1146] Giving framework
>>> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-0000 (Kubernetes) at
>>> scheduler(1)@15.242.100.60:59488 7625.14222623576weeks to failover
>>> W1227 22:53:08.818389 8062 master.cpp:4840] Master returning
>>> resources offered to framework
>>> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-0000 because the framework has
>>> terminated or is inactive
>>> I1227 22:53:08.818397 8052 hierarchical.cpp:273] Deactivated
>>> framework 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-0000
>>> I1227 22:53:08.819046 8066 hierarchical.cpp:744] Recovered
>>> cpus(*):32; mem(*):127878; disk(*):4336; ports(*):[31000-32000]
>>> (total: cpus(*):32; mem(*):127878; disk(*):4336;
>>> ports(*):[31000-32000], allocated: ) on slave
>>> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-S0 from framework
>>> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-0000
>>> ......
>>>
>>> I can't figure out why the Mesos master complains "Failed to shutdown
>>> socket with fd 17: Transport endpoint is not connected".
>>> Could someone give some clues on this issue?
>>>
>>> Thanks very much in advance!
>>>
>>> Best Regards
>>> Nan Xiao
>>
>>
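On the error text itself: "Transport endpoint is not connected" is the standard Linux message for errno ENOTCONN, which shutdown() returns when the socket has no connected peer. The Python sketch below assumes a Unix-like system, and the function name `shutdown_unconnected` is hypothetical. It reproduces the errno with a TCP socket that was never connected, which is the simplest reliable trigger. It only illustrates how the message can arise; it is not taken from the Mesos source and does not establish what happened on the master.

```python
import errno
import os
import socket


def shutdown_unconnected() -> int:
    """Call shutdown() on a TCP socket with no peer and return the errno (0 if none)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # No connect() or accept() has happened, so there is nothing to shut down.
        sock.shutdown(socket.SHUT_RDWR)
    except OSError as exc:
        return exc.errno
    finally:
        sock.close()
    return 0


if __name__ == "__main__":
    code = shutdown_unconnected()
    # The message wording is platform-specific; the errno name is portable.
    print(errno.errorcode.get(code, code), "-", os.strerror(code))
```

In the master log the error appears right after the framework is reported as disconnected, which would be consistent with the peer already being gone when the shutdown was attempted, though the log alone does not prove that ordering.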

