How big is your data set? Did you set the SPARK_MEM and SPARK_WORKER_MEMORY environment variables?
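For 0.8-era standalone clusters these are typically set in conf/spark-env.sh on every node; something like the fragment below (the specific values here are illustrative assumptions, not a recommendation for your machines):

```shell
# conf/spark-env.sh -- example values only; size these to your instances
export SPARK_WORKER_MEMORY=58g   # total memory the worker may hand out to executors
export SPARK_MEM=56g             # heap requested per application; keep <= SPARK_WORKER_MEMORY
```

If SPARK_MEM is left unset, the launch scripts fall back to a default that may not match what your job actually needs on the larger data set.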
On Thu, Dec 12, 2013 at 9:07 AM, Walrus theCat <[email protected]> wrote:

> Hi all,
>
> I've had smashing success with Spark 0.7.x with this code, and this same
> code on Spark 0.8.0 using a smaller data set. However, when I try to use a
> larger data set, some strange behavior occurs.
>
> I'm trying to do L2 regularization with Logistic Regression using the new
> ML Lib.
>
> Reading through the logs, everything looks and works fine with the smaller
> data set. The larger data set, which works just fine with Spark 0.7.x,
> evidences some bizarre behavior. 8 of my 25 slaves had STDERR logs that
> looked something like this (only the command they should have executed):
>
> Spark Executor Command: "java" "-cp"
> ":/root/jars/aspectjrt.jar:/root/jars/aspectjweaver.jar:/root/jars/aws-java-sdk-1.4.5.jar:/root/jars/aws-java-sdk-1.4.5-javadoc.jar:/root/jars/aws-java-sdk-1.4.5-sources.jar:/root/jars/aws-java-sdk-flow-build-tools-1.4.5.jar:/root/jars/commons-codec-1.3.jar:/root/jars/commons-logging-1.1.1.jar:/root/jars/freemarker-2.3.18.jar:/root/jars/httpclient-4.1.1.jar:/root/jars/httpcore-4.1.jar:/root/jars/jackson-core-asl-1.8.7.jar:/root/jars/mail-1.4.3.jar:/root/jars/spring-beans-3.0.7.jar:/root/jars/spring-context-3.0.7.jar:/root/jars/spring-core-3.0.7.jar:/root/jars/stax-1.2.0.jar:/root/jars/stax-api-1.0.1.jar:/root/spark/conf:/root/spark/assembly/target/scala-2.9.3/spark-assembly_2.9.3-0.8.0-incubating-hadoop1.0.4.jar"
> "-Djava.library.path=/root/ephemeral-hdfs/lib/native/"
> "-Dspark.default.parallelism=400" "-Dspark.akka.threads=8"
> "-Dspark.local.dir=/mnt/spark" "-Dspark.worker.timeout=60000"
> "-Dspark.akka.timeout=60000"
> "-Dspark.storage.blockManagerHeartBeatMs=60000"
> "-Dspark.akka.retry.wait=60000" "-Dspark.akka.frameSize=10000" "-Xms61G"
> "-Xmx61G" "-Dspark.default.parallelism=400" "-Dspark.akka.threads=8"
> "-Dspark.local.dir=/mnt/spark" "-Dspark.worker.timeout=60000"
> "-Dspark.akka.timeout=60000"
> "-Dspark.storage.blockManagerHeartBeatMs=60000"
> "-Dspark.akka.retry.wait=60000" "-Dspark.akka.frameSize=10000" "-Xms61G"
> "-Xmx61G" "-Dspark.default.parallelism=400" "-Dspark.akka.threads=8"
> "-Dspark.local.dir=/mnt/spark" "-Dspark.worker.timeout=60000"
> "-Dspark.akka.timeout=60000"
> "-Dspark.storage.blockManagerHeartBeatMs=60000"
> "-Dspark.akka.retry.wait=60000" "-Dspark.akka.frameSize=10000" "-Xms61G"
> "-Xmx61G" "-Xms62464M" "-Xmx62464M"
> "org.apache.spark.executor.StandaloneExecutorBackend"
> "akka://[email protected]:34981/user/StandaloneScheduler"
> "33" "ip-10-33-139-73.ec2.internal" "8"
> ========================================
>
> The log starts complaining that it's losing executors and then dies in a
> ball of fire, no reference to anything in my code whatsoever. Stack is
> below. Please help!
>
> Thanks
>
> 13/12/12 16:23:12 INFO scheduler.DAGScheduler: Failed to run reduce at GradientDescent.scala:144
> Exception in thread "main" org.apache.spark.SparkException: Job failed: Error: Disconnected from Spark cluster
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:760)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:758)
>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>         at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:758)
>         at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:379)
>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:441)
>         at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:149)
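For anyone reproducing this: the failing reduce at GradientDescent.scala:144 sits inside MLlib's mini-batch SGD loop, which a driver along these lines would exercise. This is a minimal sketch against the 0.8.0 MLlib API, not the poster's actual code — the master URL, input path, and "label f1 f2 ..." line format are assumptions, and note that wiring in the L2 penalty (SquaredL2Updater from org.apache.spark.mllib.optimization) may require configuring the optimizer rather than the plain train call shown here:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.regression.LabeledPoint

object LRDriver {
  def main(args: Array[String]) {
    // Hypothetical master URL and application name.
    val sc = new SparkContext("spark://<master>:7077", "lr-l2")

    // Assumed input: one example per line, "label f1 f2 ...".
    val points = sc.textFile("hdfs:///path/to/training").map { line =>
      val parts = line.split(' ').map(_.toDouble)
      // In 0.8.0, LabeledPoint takes an Array[Double] of features.
      LabeledPoint(parts.head, parts.tail)
    }.cache()

    // Each SGD iteration runs a gradient reduce over the cached RDD --
    // this is the reduce that fails in the trace above.
    val model = LogisticRegressionWithSGD.train(points, 100)
    println(model.weights.mkString(" "))
    sc.stop()
  }
}
```

The relevant point is that every iteration re-reduces over the full cached data set, so executors sized too small for the larger data set will start dropping exactly here.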
