Just as a general note, especially when dealing with VMs, NTP is *required*; otherwise, we've found that the clocks drift and updates can get out of sync, which can result in some pretty strange behavior.
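If you want a quick sanity check of clock skew on each node, something like the Python sketch below might help. It sends a bare SNTP query and compares the server's timestamp with the local clock. The server name pool.ntp.org and the helper names are my own assumptions for illustration, not anything from this thread, and it ignores network delay asymmetry, so treat the number as a rough indication only. For real use, just run ntpd or chrony on every box.

```python
import socket
import struct
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800


def build_request():
    # 48-byte SNTP request: LI=0, VN=3, Mode=3 (client) gives first byte 0x1B.
    return b"\x1b" + 47 * b"\0"


def parse_transmit_time(packet):
    """Return the server's transmit timestamp as Unix seconds (float)."""
    if len(packet) < 48:
        raise ValueError("short NTP packet")
    # Transmit timestamp: bytes 40..47, 32-bit seconds + 32-bit fraction.
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def clock_offset(server="pool.ntp.org", timeout=2.0):
    """Rough offset (server minus local) in seconds.

    The server name is an assumed default; point it at your own NTP host.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sent = time.time()
        sock.sendto(build_request(), (server, 123))
        packet, _ = sock.recvfrom(512)
        received = time.time()
    # Compare against the midpoint of the round trip on the local clock.
    return parse_transmit_time(packet) - (sent + received) / 2
```

Calling clock_offset() on the Spark machine and on each Mesos VM and comparing the results should show whether any node has drifted noticeably.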
----- Original Message -----
> From: "Brian Devins" <[email protected]>
> To: [email protected]
> Sent: Wednesday, October 15, 2014 11:57:19 AM
> Subject: Re: Connecting spark from a different Machine to mesos cluster
>
> Also Johannes, is there a network segment between Spark and the Mesos master?
> This looks like behavior I have seen before when the Master cannot connect
> back to the framework. The master also needs to be able to reach the Spark
> machine by IP
>
> From: Tim Chen < [email protected] >
> Reply-To: " [email protected] " < [email protected] >
> Date: Wednesday, October 15, 2014 at 12:52 PM
> To: " [email protected] " < [email protected] >
> Subject: Re: Connecting spark from a different Machine to mesos cluster
>
> Hi Johannes,
>
> When you started your 2nd shell, what log output from the slave do you see
> for that framework?
>
> Master seems to think it's already terminated.
>
> Tim
>
> On Wed, Oct 15, 2014 at 6:31 AM, Johannes Schillinger (Intern) <
> [email protected] > wrote:
>
> > Hi Tim,
> >
> > We are running Spark 1.1.0 with Hadoop 2.4. Mesos is in Version 0.20.1 all
> > in binary releases.
> >
> > The Spark console is running in default mode, which is fine grained.
> >
> > The Spark process is started from a physical Machine running Ubuntu, the
> > Mesos nodes are running in VMs also in Ubuntu.
> >
> > This is the output of the Spark Shell:
> >
> > --------------------------------------------------------------------------------------------------------------------------------
> > Spark assembly has been built with Hive, including Datanucleus jars on classpath
> > Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
> > 14/10/15 15:18:24 INFO SecurityManager: Changing view acls to: USERNAME,
> > 14/10/15 15:18:24 INFO SecurityManager: Changing modify acls to: USERNAME,
> > 14/10/15 15:18:24 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(USERNAME, ); users with modify permissions: Set(USERNAME, )
> > 14/10/15 15:18:24 INFO HttpServer: Starting HTTP Server
> > 14/10/15 15:18:24 INFO Utils: Successfully started service 'HTTP class server' on port 42469.
> > Welcome to
> >       ____              __
> >      / __/__  ___ _____/ /__
> >     _\ \/ _ \/ _ `/ __/  '_/
> >    /___/ .__/\_,_/_/ /_/\_\   version 1.1.0
> >       /_/
> >
> > Using Scala version 2.10.4 (OpenJDK 64-Bit Server VM, Java 1.7.0_65)
> > Type in expressions to have them evaluated.
> > Type :help for more information.
> > 14/10/15 15:18:26 WARN Utils: Your hostname, karwjohannes01 resolves to a loopback address: 127.0.1.1; using CLIENT_IP instead (on interface eth0)
> > 14/10/15 15:18:26 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
> > 14/10/15 15:18:27 INFO SecurityManager: Changing view acls to: USERNAME,
> > 14/10/15 15:18:27 INFO SecurityManager: Changing modify acls to: USERNAME,
> > 14/10/15 15:18:27 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(USERNAME, ); users with modify permissions: Set(USERNAME, )
> > 14/10/15 15:18:27 INFO Slf4jLogger: Slf4jLogger started
> > 14/10/15 15:18:27 INFO Remoting: Starting remoting
> > 14/10/15 15:18:27 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@CLIENT_IP:51879]
> > 14/10/15 15:18:27 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@CLIENT_IP:51879]
> > 14/10/15 15:18:27 INFO Utils: Successfully started service 'sparkDriver' on port 51879.
> > 14/10/15 15:18:27 INFO SparkEnv: Registering MapOutputTracker
> > 14/10/15 15:18:27 INFO SparkEnv: Registering BlockManagerMaster
> > 14/10/15 15:18:27 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20141015151827-1a2e
> > 14/10/15 15:18:27 INFO Utils: Successfully started service 'Connection manager for block manager' on port 60963.
> > 14/10/15 15:18:27 INFO ConnectionManager: Bound socket to port 60963 with id = ConnectionManagerId(CLIENT_IP,60963)
> > 14/10/15 15:18:27 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
> > 14/10/15 15:18:27 INFO BlockManagerMaster: Trying to register BlockManager
> > 14/10/15 15:18:27 INFO BlockManagerMasterActor: Registering block manager CLIENT_IP:60963 with 265.4 MB RAM
> > 14/10/15 15:18:27 INFO BlockManagerMaster: Registered BlockManager
> > 14/10/15 15:18:27 INFO HttpFileServer: HTTP File server directory is /tmp/spark-b032c76c-93e1-473e-802c-c55e12e85d41
> > 14/10/15 15:18:27 INFO HttpServer: Starting HTTP Server
> > 14/10/15 15:18:27 INFO Utils: Successfully started service 'HTTP file server' on port 47989.
> > 14/10/15 15:18:27 INFO Utils: Successfully started service 'SparkUI' on port 4040.
> > 14/10/15 15:18:27 INFO SparkUI: Started SparkUI at http://CLIENT_IP:4040
> > 14/10/15 15:18:27 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> > I1015 15:18:28.524736 4748 sched.cpp:139] Version: 0.20.1
> > I1015 15:18:28.527180 4750 sched.cpp:235] New master detected at master@MESOS_MASTER_IP:5050
> > I1015 15:18:28.527300 4750 sched.cpp:243] No credentials provided. Attempting to register without authentication
> > --------------------------------------------------------------------------------------------------------------------------------
> >
> > Mesos master WARNING log:
> >
> > W1015 14:13:00.235213 1118 master.cpp:3452] Master returning resources offered to framework 20141007-102213-343139338-5050-1037-3490 because the framework has terminated or is inactive
> > W1015 14:13:35.244055 1121 master.cpp:3452] Master returning resources offered to framework 20141007-102213-343139338-5050-1037-3525 because the framework has terminated or is inactive
> > W1015 14:13:50.252436 1121 master.cpp:3452] Master returning resources offered to framework 20141007-102213-343139338-5050-1037-3540 because the framework has terminated or is inactive
> > W1015 14:14:05.252708 1117 master.cpp:3452] Master returning resources offered to framework 20141007-102213-343139338-5050-1037-3555 because the framework has terminated or is inactive
> >
> > Mesos slave WARNING log:
> >
> > W1015 13:58:19.103196 1211 slave.cpp:1421] Cannot shut down unknown framework 20141007-102213-343139338-5050-1037-3116
> > W1015 13:58:20.104650 1210 slave.cpp:1421] Cannot shut down unknown framework 20141007-102213-343139338-5050-1037-3117
> > W1015 13:58:21.119839 1211 slave.cpp:1421] Cannot shut down unknown framework 20141007-102213-343139338-5050-1037-3118
> > W1015 13:58:22.115965 1210 slave.cpp:1421] Cannot shut down unknown framework 20141007-102213-343139338-5050-1037-3119
> > W1015 13:58:23.104925 1211 slave.cpp:1421] Cannot shut down unknown framework 20141007-102213-343139338-5050-1037-3120
> > W1015 13:58:24.104652 1210 slave.cpp:1421] Cannot shut down unknown framework 20141007-102213-343139338-5050-1037-3121
> > W1015 13:58:59.853744 1212 slave.cpp:1421] Cannot shut down unknown framework 20141007-102213-343139338-5050-1037-3122
> > W1015 13:59:00.853086 1214 slave.cpp:1421] Cannot shut down unknown framework 20141007-102213-343139338-5050-1037-3123
> > W1015 13:59:01.853137 1212 slave.cpp:1421] Cannot shut down unknown framework 20141007-102213-343139338-5050-1037-3124
> > W1015 13:59:03.318259 1214 slave.cpp:1421] Cannot shut down unknown framework 20141007-102213-343139338-5050-1037-3029
> >
> > I hope this information helps, please ask if you have any more questions
> > and thank you for your help!
> >
> > Johannes
> >
> > From: Tim St Clair [mailto: [email protected] ]
> > Sent: Wednesday, 15 October 2014 15:11
> > To: [email protected]
> > Subject: Re: Connecting spark from a different Machine to mesos cluster
> >
> > Details?
> >
> > 1. What versions are you running?
> > 2. Fine grained mode or coarse grained?
> > 3. Are you running in VMs?
> >
> > Logs always help too.
> >
> > Cheers,
> > Tim
> >
> > > From: "Johannes Schillinger (Intern)" < [email protected] >
> > > To: [email protected]
> > > Sent: Wednesday, October 15, 2014 7:42:36 AM
> > > Subject: Connecting spark from a different Machine to mesos cluster
> > >
> > > Hi,
> > >
> > > we are currently trying to get a mesos cluster running as a base for
> > > Spark.
> > >
> > > The mesos cluster itself runs and connecting a spark shell from the
> > > machine the master runs on works perfectly.
> > >
> > > We can see the Framework being started and the slaves working.
> > >
> > > If we try to connect the exact same shell from a different machine to the
> > > exact same cluster the screen stops at
> > >
> > > … 4013 sched.cpp:243] No credentials provided. Attempting to register
> > > without authentication
> > >
> > > The cluster spins up a framework every two seconds with a new ID and
> > > stops it immediately. This continues (we stopped it after a few dozen
> > > starts).
> > >
> > > We can see the frameworks being started in the master- and slave-logs as
> > > well as the command of the master to terminate it.
> > >
> > > Has anyone ever encountered a similar problem or has any advice on
> > > solving this problem?
> > >
> > > Thanks!
> > >
> > > Johannes
> >
> > --
> > Cheers,
> > Timothy St. Clair
> > Red Hat Inc.
>
> Brian Devins | Java Developer
> [email protected]

--
Cheers,
Timothy St. Clair
Red Hat Inc.

