[jira] [Updated] (SPARK-2586) Lack of information to figure out connection to Tachyon master is inactive/ down
[ https://issues.apache.org/jira/browse/SPARK-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Ash updated SPARK-2586:
------------------------------
Description: When running Spark with Tachyon, if the connection to the Tachyon master is down (due to a network problem or the master node being down), there is no clear log or error message to indicate it. Here is a sample log from running the SparkTachyonPi example while connecting to Tachyon:

{noformat}
14/07/15 16:43:10 INFO Utils: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
14/07/15 16:43:10 WARN Utils: Your hostname, henry-pivotal.local resolves to a loopback address: 127.0.0.1; using 10.64.5.148 instead (on interface en5)
14/07/15 16:43:10 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
14/07/15 16:43:11 INFO SecurityManager: Changing view acls to: hsaputra
14/07/15 16:43:11 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hsaputra)
14/07/15 16:43:11 INFO Slf4jLogger: Slf4jLogger started
14/07/15 16:43:11 INFO Remoting: Starting remoting
14/07/15 16:43:11 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sp...@office-5-148.pa.gopivotal.com:53203]
14/07/15 16:43:11 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sp...@office-5-148.pa.gopivotal.com:53203]
14/07/15 16:43:11 INFO SparkEnv: Registering MapOutputTracker
14/07/15 16:43:11 INFO SparkEnv: Registering BlockManagerMaster
14/07/15 16:43:11 INFO DiskBlockManager: Created local directory at /var/folders/nv/nsr_3ysj0wgfq93fqp0rdt3wgp/T/spark-local-20140715164311-e63c
14/07/15 16:43:11 INFO ConnectionManager: Bound socket to port 53204 with id = ConnectionManagerId(office-5-148.pa.gopivotal.com,53204)
14/07/15 16:43:11 INFO MemoryStore: MemoryStore started with capacity 2.1 GB
14/07/15 16:43:11 INFO BlockManagerMaster: Trying to register BlockManager
14/07/15 16:43:11 INFO BlockManagerMasterActor: Registering block manager office-5-148.pa.gopivotal.com:53204 with 2.1 GB RAM
14/07/15 16:43:11 INFO BlockManagerMaster: Registered BlockManager
14/07/15 16:43:11 INFO HttpServer: Starting HTTP Server
14/07/15 16:43:11 INFO HttpBroadcast: Broadcast server started at http://10.64.5.148:53205
14/07/15 16:43:11 INFO HttpFileServer: HTTP File server directory is /var/folders/nv/nsr_3ysj0wgfq93fqp0rdt3wgp/T/spark-b2fb12ae-4608-4833-87b6-b335da00738e
14/07/15 16:43:11 INFO HttpServer: Starting HTTP Server
14/07/15 16:43:12 INFO SparkUI: Started SparkUI at http://office-5-148.pa.gopivotal.com:4040
2014-07-15 16:43:12.210 java[39068:1903] Unable to load realm info from SCDynamicStore
14/07/15 16:43:12 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/07/15 16:43:12 INFO SparkContext: Added JAR examples/target/scala-2.10/spark-examples-1.1.0-SNAPSHOT-hadoop2.4.0.jar at http://10.64.5.148:53206/jars/spark-examples-1.1.0-SNAPSHOT-hadoop2.4.0.jar with timestamp 1405467792813
14/07/15 16:43:12 INFO AppClient$ClientActor: Connecting to master spark://henry-pivotal.local:7077...
14/07/15 16:43:12 INFO SparkContext: Starting job: reduce at SparkTachyonPi.scala:43
14/07/15 16:43:12 INFO DAGScheduler: Got job 0 (reduce at SparkTachyonPi.scala:43) with 2 output partitions (allowLocal=false)
14/07/15 16:43:12 INFO DAGScheduler: Final stage: Stage 0(reduce at SparkTachyonPi.scala:43)
14/07/15 16:43:12 INFO DAGScheduler: Parents of final stage: List()
14/07/15 16:43:12 INFO DAGScheduler: Missing parents: List()
14/07/15 16:43:12 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[1] at map at SparkTachyonPi.scala:39), which has no missing parents
14/07/15 16:43:13 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (MappedRDD[1] at map at SparkTachyonPi.scala:39)
14/07/15 16:43:13 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
14/07/15 16:43:13 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20140715164313-
14/07/15 16:43:13 INFO AppClient$ClientActor: Executor added: app-20140715164313-/0 on worker-20140715164009-office-5-148.pa.gopivotal.com-52519 (office-5-148.pa.gopivotal.com:52519) with 8 cores
14/07/15 16:43:13 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140715164313-/0 on hostPort office-5-148.pa.gopivotal.com:52519 with 8 cores, 512.0 MB RAM
14/07/15 16:43:13 INFO AppClient$ClientActor: Executor updated: app-20140715164313-/0 is now RUNNING
14/07/15 16:43:15 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkexecu...@office-5-148.pa.gopivotal.com:53213/user/Executor#-423405256] with ID 0
14/07/15 16:43:15 INFO TaskSetManager: Re-computing pending task lists.
14/07/15 16:43:15 INFO TaskSetManager: Starting task 0.0:0 as TID 0 on executor 0: office-5-148.pa.gopivotal.com (PROCESS_LOCAL)
14/07/15 16:43:15 INFO
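The log above proceeds normally with no indication that the Tachyon master is unreachable. The behavior the reporter is asking for amounts to a fail-fast reachability check with an explicit error message. The sketch below is purely illustrative (not Spark or Tachyon API; the function name and message are hypothetical) and shows, with a plain TCP probe, the kind of early check and clear diagnostic that would surface the problem instead of silence:

```python
import socket


def check_master_reachable(host: str, port: int, timeout: float = 5.0) -> None:
    """Raise a descriptive error if the (hypothetical) master is unreachable.

    A plain TCP connect with a timeout: if nothing is listening at host:port,
    fail immediately with an actionable message instead of hanging silently.
    """
    try:
        # Attempt a TCP connection; close it right away if it succeeds.
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError as e:
        # Translate the low-level socket error into a clear, user-facing one.
        raise ConnectionError(
            f"Cannot reach Tachyon master at {host}:{port}: {e}. "
            "Check that the master process is running and the network is up."
        ) from e
```

Run against a live listener the check returns silently; against a closed port it raises `ConnectionError` with the host, port, and underlying cause in the message, which is exactly the information missing from the log above.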
[jira] [Updated] (SPARK-2586) Lack of information to figure out connection to Tachyon master is inactive/ down
[ https://issues.apache.org/jira/browse/SPARK-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Saputra updated SPARK-2586:
---------------------------------
Labels: tachyon (was: )

Key: SPARK-2586
URL: https://issues.apache.org/jira/browse/SPARK-2586
Project: Spark
Issue Type: Bug
Components: Spark Core
Reporter: Henry Saputra
Labels: tachyon