http://stackoverflow.com/questions/24559616/mesos-scheduler-slave-continuously-gets-disconnected
On Wed, Oct 15, 2014 at 9:57 AM, Brian Devins <[email protected]> wrote:

> Also Johannes, is there a network segment between Spark and the Mesos
> master? This looks like behavior I have seen before when the Master cannot
> connect back to the framework. The master also needs to be able to reach
> the Spark machine by IP
>
> From: Tim Chen <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Wednesday, October 15, 2014 at 12:52 PM
> To: "[email protected]" <[email protected]>
> Subject: Re: Connecting spark from a different Machine to mesos cluster
>
> Hi Johannes,
>
> When you started your 2nd shell, what log output from the slave do you
> see for that framework?
>
> Master seems to think it's already terminated.
>
> Tim
>
> On Wed, Oct 15, 2014 at 6:31 AM, Johannes Schillinger (Intern)
> <[email protected]> wrote:
>
>> Hi Tim,
>>
>> We are running Spark 1.1.0 with Hadoop 2.4. Mesos is in Version 0.20.1
>> all in binary releases.
>>
>> The Spark console is running in default mode, which is fine grained.
>>
>> The Spark process is started from a physical Machine running Ubuntu, the
>> Mesos nodes are running in VMs also in Ubuntu.
>>
>> This is the output of the Spark Shell:
>>
>> ------------------------------------------------------------------------
>>
>> Spark assembly has been built with Hive, including Datanucleus jars on classpath
>> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
>> 14/10/15 15:18:24 INFO SecurityManager: Changing view acls to: USERNAME,
>> 14/10/15 15:18:24 INFO SecurityManager: Changing modify acls to: USERNAME,
>> 14/10/15 15:18:24 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(USERNAME, ); users with modify permissions: Set(USERNAME, )
>> 14/10/15 15:18:24 INFO HttpServer: Starting HTTP Server
>> 14/10/15 15:18:24 INFO Utils: Successfully started service 'HTTP class server' on port 42469.
>> Welcome to
>>       ____              __
>>      / __/__  ___ _____/ /__
>>     _\ \/ _ \/ _ `/ __/  '_/
>>    /___/ .__/\_,_/_/ /_/\_\   version 1.1.0
>>       /_/
>>
>> Using Scala version 2.10.4 (OpenJDK 64-Bit Server VM, Java 1.7.0_65)
>> Type in expressions to have them evaluated.
>> Type :help for more information.
>> 14/10/15 15:18:26 WARN Utils: Your hostname, karwjohannes01 resolves to a loopback address: 127.0.1.1; using CLIENT_IP instead (on interface eth0)
>> 14/10/15 15:18:26 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
>> 14/10/15 15:18:27 INFO SecurityManager: Changing view acls to: USERNAME,
>> 14/10/15 15:18:27 INFO SecurityManager: Changing modify acls to: USERNAME,
>> 14/10/15 15:18:27 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(USERNAME, ); users with modify permissions: Set(USERNAME, )
>> 14/10/15 15:18:27 INFO Slf4jLogger: Slf4jLogger started
>> 14/10/15 15:18:27 INFO Remoting: Starting remoting
>> 14/10/15 15:18:27 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@CLIENT_IP:51879]
>> 14/10/15 15:18:27 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@CLIENT_IP:51879]
>> 14/10/15 15:18:27 INFO Utils: Successfully started service 'sparkDriver' on port 51879.
>> 14/10/15 15:18:27 INFO SparkEnv: Registering MapOutputTracker
>> 14/10/15 15:18:27 INFO SparkEnv: Registering BlockManagerMaster
>> 14/10/15 15:18:27 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20141015151827-1a2e
>> 14/10/15 15:18:27 INFO Utils: Successfully started service 'Connection manager for block manager' on port 60963.
>> 14/10/15 15:18:27 INFO ConnectionManager: Bound socket to port 60963 with id = ConnectionManagerId(CLIENT_IP,60963)
>> 14/10/15 15:18:27 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
>> 14/10/15 15:18:27 INFO BlockManagerMaster: Trying to register BlockManager
>> 14/10/15 15:18:27 INFO BlockManagerMasterActor: Registering block manager CLIENT_IP:60963 with 265.4 MB RAM
>> 14/10/15 15:18:27 INFO BlockManagerMaster: Registered BlockManager
>> 14/10/15 15:18:27 INFO HttpFileServer: HTTP File server directory is /tmp/spark-b032c76c-93e1-473e-802c-c55e12e85d41
>> 14/10/15 15:18:27 INFO HttpServer: Starting HTTP Server
>> 14/10/15 15:18:27 INFO Utils: Successfully started service 'HTTP file server' on port 47989.
>> 14/10/15 15:18:27 INFO Utils: Successfully started service 'SparkUI' on port 4040.
>> 14/10/15 15:18:27 INFO SparkUI: Started SparkUI at http://CLIENT_IP:4040
>> 14/10/15 15:18:27 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>> I1015 15:18:28.524736 4748 sched.cpp:139] Version: 0.20.1
>> I1015 15:18:28.527180 4750 sched.cpp:235] New master detected at master@MESOS_MASTER_IP:5050
>> I1015 15:18:28.527300 4750 sched.cpp:243] No credentials provided.
>> Attempting to register without authentication
>>
>> ------------------------------------------------------------------------
>>
>> Mesos master WARNING log:
>>
>> W1015 14:13:00.235213 1118 master.cpp:3452] Master returning resources offered to framework 20141007-102213-343139338-5050-1037-3490 because the framework has terminated or is inactive
>> W1015 14:13:35.244055 1121 master.cpp:3452] Master returning resources offered to framework 20141007-102213-343139338-5050-1037-3525 because the framework has terminated or is inactive
>> W1015 14:13:50.252436 1121 master.cpp:3452] Master returning resources offered to framework 20141007-102213-343139338-5050-1037-3540 because the framework has terminated or is inactive
>> W1015 14:14:05.252708 1117 master.cpp:3452] Master returning resources offered to framework 20141007-102213-343139338-5050-1037-3555 because the framework has terminated or is inactive
>>
>> Mesos slave WARNING log:
>>
>> W1015 13:58:19.103196 1211 slave.cpp:1421] Cannot shut down unknown framework 20141007-102213-343139338-5050-1037-3116
>> W1015 13:58:20.104650 1210 slave.cpp:1421] Cannot shut down unknown framework 20141007-102213-343139338-5050-1037-3117
>> W1015 13:58:21.119839 1211 slave.cpp:1421] Cannot shut down unknown framework 20141007-102213-343139338-5050-1037-3118
>> W1015 13:58:22.115965 1210 slave.cpp:1421] Cannot shut down unknown framework 20141007-102213-343139338-5050-1037-3119
>> W1015 13:58:23.104925 1211 slave.cpp:1421] Cannot shut down unknown framework 20141007-102213-343139338-5050-1037-3120
>> W1015 13:58:24.104652 1210 slave.cpp:1421] Cannot shut down unknown framework 20141007-102213-343139338-5050-1037-3121
>> W1015 13:58:59.853744 1212 slave.cpp:1421] Cannot shut down unknown framework 20141007-102213-343139338-5050-1037-3122
>> W1015 13:59:00.853086 1214 slave.cpp:1421] Cannot shut down unknown framework 20141007-102213-343139338-5050-1037-3123
>> W1015 13:59:01.853137 1212 slave.cpp:1421] Cannot shut down unknown framework 20141007-102213-343139338-5050-1037-3124
>> W1015 13:59:03.318259 1214 slave.cpp:1421] Cannot shut down unknown framework 20141007-102213-343139338-5050-1037-3029
>>
>> I hope this information helps, please ask if you have any more questions
>> and thank you for your help!
>>
>> Johannes
>>
>> *From:* Tim St Clair [mailto:[email protected]]
>> *Sent:* Wednesday, 15 October 2014 15:11
>> *To:* [email protected]
>> *Subject:* Re: Connecting spark from a different Machine to mesos cluster
>>
>> Details?
>>
>> 1. What versions are you running?
>> 2. Fine grained mode or Coarse Grained?
>> 3. Are you running in VMs?
>>
>> Logs always help too.
>>
>> Cheers,
>> Tim
>>
>> ------------------------------
>>
>> *From: *"Johannes Schillinger (Intern)" <[email protected]>
>> *To: *[email protected]
>> *Sent: *Wednesday, October 15, 2014 7:42:36 AM
>> *Subject: *Connecting spark from a different Machine to mesos cluster
>>
>> Hi,
>>
>> we are currently trying to get a mesos cluster running as a base for
>> Spark.
>>
>> The mesos cluster itself runs and connecting a spark shell from the
>> machine the master runs on works perfectly.
>> We can see the Framework being started and the slaves working.
>>
>> If we try to connect the exact same shell from a different machine to the
>> exact same cluster the screen stops at
>>
>> … 4013 sched.cpp:243] No credentials provided. Attempting to register
>> without authentication
>>
>> The cluster spins up a framework every two seconds with a new ID and
>> stops it immediately. This continues (we stopped it after a few dozen
>> starts).
>>
>> We can see the frameworks being started in the master- and slave-logs as
>> well as the command of the master to terminate it.
>>
>> Has anyone ever encountered a similar problem or has any advice on
>> solving this problem?
>>
>> Thanks!
>>
>> Johannes
>>
>> --
>> Cheers,
>> Timothy St. Clair
>> Red Hat Inc.
>
> Brian Devins* |* Java Developer
> [email protected]
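
The `WARN Utils` line in the shell output (hostname resolving to 127.0.1.1) and Brian's question about the master reaching the Spark machine by IP both concern the address the driver machine advertises. A minimal Python sketch of that one check follows. It assumes nothing about Spark or Mesos internals; `is_loopback` and `resolves_to_loopback` are hypothetical helper names, and the `resolver` parameter exists only so the logic can be exercised without real name resolution.

```python
import ipaddress
import socket


def is_loopback(address: str) -> bool:
    """Return True if the IP address string lies in a loopback range (e.g. 127.0.0.0/8)."""
    return ipaddress.ip_address(address).is_loopback


def resolves_to_loopback(hostname: str, resolver=socket.gethostbyname) -> bool:
    """Resolve hostname with the given resolver and report whether the result is loopback.

    The default resolver, socket.gethostbyname, returns a single IPv4 address string.
    """
    return is_loopback(resolver(hostname))
```

Calling `resolves_to_loopback(socket.gethostname())` on the Spark machine would show whether its own name maps to a loopback address, which a master on another host cannot connect back to. If it does, setting `SPARK_LOCAL_IP` (as the warning suggests) to an address the master can route to is one thing to try. Mesos' libprocess also reads a `LIBPROCESS_IP` environment variable; whether that is the cause here is an assumption the thread does not confirm.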

