Re: Spark streaming job failing after some time.
I have figured out the problem here. Turned out that there was a problem with my SparkConf when I was running my application with yarn in cluster mode. I was setting my master to be local[4] inside my application, whereas I was setting it to yarn-cluster with spark-submit. Now I have changed my SparkConf in my application to not to hardcore master and it works. The application was running for some time since yarn application master attempts retry for maxNumTries and waits between each retry before it completely fails. I was getting appropriate results from my streaming job during this time. Now, I can't figure out as to why it should run successfully during this time even if it could not find SparkContext. I am sure there should be good reason behind this behavior. Anyone has any idea on this? Thanks, Pankaj Channe On Saturday, November 22, 2014, pankaj channe wrote: > Thanks Akhil for your input. > > I have already tried with 3 executors and it still results into the same > problem. So as Sean mentioned, the problem does not seem to be related to > that. > > > On Sat, Nov 22, 2014 at 11:00 AM, Sean Owen wrote: > >> That doesn't seem to be the problem though. It processes but then stops. >> Presumably there are many executors. >> On Nov 22, 2014 9:40 AM, "Akhil Das" wrote: >> >>> For Spark streaming, you must always set *--executor-cores* to a value >>> which is >= 2. Or else it will not do any processing. >>> >>> Thanks >>> Best Regards >>> >>> On Sat, Nov 22, 2014 at 8:39 AM, pankaj channe >>> wrote: >>> I have seen similar posts on this issue but could not find solution. Apologies if this has been discussed here before. I am running a spark streaming job with yarn on a 5 node cluster. I am using following command to submit my streaming job. spark-submit --class class_name --master yarn-cluster --num-executors 1 --driver-memory 1g --executor-memory 1g --executor-cores 1 my_app.jar After running for some time, the job stops. The application log shows following two errors: 14/11/21 22:05:04 WARN yarn.ApplicationMaster: Unable to retrieve SparkContext in spite of waiting for 10, maxNumTries = 10 Exception in thread "main" java.lang.NullPointerException at org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkContextInitialized(ApplicationMaster.scala:218) at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:107) at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:410) at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:53) at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:52) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594) at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:52) at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:409) at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala) and later... Failed to list files for dir: /data2/hadoop/yarn/local/usercache/user_name/appcache/application_1416332002106_0009/spark-local-20141121220325-b529/20 at org.apache.spark.util.Utils$.listFilesSafely(Utils.scala:673) at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:685) at org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:686) at org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:685) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34) at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:685) at org.apache.spark.storage.DiskBlockManager$$anonfun$stop$1.apply(DiskBlockManager.scala:181) at org.apache.spark.storage.DiskBlockManager$$anonfun$stop$1.apply(DiskBlockManager.scala:178) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) at org.apache.spark.storage.DiskBlockManager.stop(DiskBlockManager.scala:178) at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply$mcV$sp(DiskBlockManager.scala:171) at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply(DiskBlockManager.scala:169) at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply(DiskBlockManager.scala:169) at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311) at org.apache.spark.storage.DiskBlock
Re: Spark streaming job failing after some time.
Thanks Akhil for your input. I have already tried with 3 executors and it still results into the same problem. So as Sean mentioned, the problem does not seem to be related to that. On Sat, Nov 22, 2014 at 11:00 AM, Sean Owen wrote: > That doesn't seem to be the problem though. It processes but then stops. > Presumably there are many executors. > On Nov 22, 2014 9:40 AM, "Akhil Das" wrote: > >> For Spark streaming, you must always set *--executor-cores* to a value >> which is >= 2. Or else it will not do any processing. >> >> Thanks >> Best Regards >> >> On Sat, Nov 22, 2014 at 8:39 AM, pankaj channe >> wrote: >> >>> I have seen similar posts on this issue but could not find solution. >>> Apologies if this has been discussed here before. >>> >>> I am running a spark streaming job with yarn on a 5 node cluster. I am >>> using following command to submit my streaming job. >>> >>> spark-submit --class class_name --master yarn-cluster --num-executors 1 >>> --driver-memory 1g --executor-memory 1g --executor-cores 1 my_app.jar >>> >>> >>> After running for some time, the job stops. The application log shows >>> following two errors: >>> >>> 14/11/21 22:05:04 WARN yarn.ApplicationMaster: Unable to retrieve >>> SparkContext in spite of waiting for 10, maxNumTries = 10 >>> Exception in thread "main" java.lang.NullPointerException >>> at >>> org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkContextInitialized(ApplicationMaster.scala:218) >>> at >>> org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:107) >>> at >>> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:410) >>> at >>> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:53) >>> at >>> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:52) >>> at java.security.AccessController.doPrivileged(Native Method) >>> at javax.security.auth.Subject.doAs(Subject.java:415) >>> at >>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594) >>> at >>> org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:52) >>> at >>> org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:409) >>> at >>> org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala) >>> >>> >>> and later... >>> >>> Failed to list files for dir: >>> /data2/hadoop/yarn/local/usercache/user_name/appcache/application_1416332002106_0009/spark-local-20141121220325-b529/20 >>> at org.apache.spark.util.Utils$.listFilesSafely(Utils.scala:673) >>> at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:685) >>> at >>> org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:686) >>> at >>> org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:685) >>> at >>> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) >>> at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34) >>> at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:685) >>> at >>> org.apache.spark.storage.DiskBlockManager$$anonfun$stop$1.apply(DiskBlockManager.scala:181) >>> at >>> org.apache.spark.storage.DiskBlockManager$$anonfun$stop$1.apply(DiskBlockManager.scala:178) >>> at >>> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) >>> at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) >>> at >>> org.apache.spark.storage.DiskBlockManager.stop(DiskBlockManager.scala:178) >>> at >>> org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply$mcV$sp(DiskBlockManager.scala:171) >>> at >>> org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply(DiskBlockManager.scala:169) >>> at >>> org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply(DiskBlockManager.scala:169) >>> at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311) >>> at >>> org.apache.spark.storage.DiskBlockManager$$anon$1.run(DiskBlockManager.scala:169) >>> >>> >>> Note: I am building my jar on my local with spark dependency added in >>> pom.xml and running it on cluster running spark. >>> >>> >>> -Pankaj >>> >> >>
Re: Spark streaming job failing after some time.
That doesn't seem to be the problem though. It processes but then stops. Presumably there are many executors. On Nov 22, 2014 9:40 AM, "Akhil Das" wrote: > For Spark streaming, you must always set *--executor-cores* to a value > which is >= 2. Or else it will not do any processing. > > Thanks > Best Regards > > On Sat, Nov 22, 2014 at 8:39 AM, pankaj channe > wrote: > >> I have seen similar posts on this issue but could not find solution. >> Apologies if this has been discussed here before. >> >> I am running a spark streaming job with yarn on a 5 node cluster. I am >> using following command to submit my streaming job. >> >> spark-submit --class class_name --master yarn-cluster --num-executors 1 >> --driver-memory 1g --executor-memory 1g --executor-cores 1 my_app.jar >> >> >> After running for some time, the job stops. The application log shows >> following two errors: >> >> 14/11/21 22:05:04 WARN yarn.ApplicationMaster: Unable to retrieve >> SparkContext in spite of waiting for 10, maxNumTries = 10 >> Exception in thread "main" java.lang.NullPointerException >> at >> org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkContextInitialized(ApplicationMaster.scala:218) >> at >> org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:107) >> at >> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:410) >> at >> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:53) >> at >> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:52) >> at java.security.AccessController.doPrivileged(Native Method) >> at javax.security.auth.Subject.doAs(Subject.java:415) >> at >> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594) >> at >> org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:52) >> at >> org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:409) >> at >> org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala) >> >> >> and later... >> >> Failed to list files for dir: >> /data2/hadoop/yarn/local/usercache/user_name/appcache/application_1416332002106_0009/spark-local-20141121220325-b529/20 >> at org.apache.spark.util.Utils$.listFilesSafely(Utils.scala:673) >> at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:685) >> at >> org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:686) >> at >> org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:685) >> at >> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) >> at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34) >> at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:685) >> at >> org.apache.spark.storage.DiskBlockManager$$anonfun$stop$1.apply(DiskBlockManager.scala:181) >> at >> org.apache.spark.storage.DiskBlockManager$$anonfun$stop$1.apply(DiskBlockManager.scala:178) >> at >> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) >> at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) >> at >> org.apache.spark.storage.DiskBlockManager.stop(DiskBlockManager.scala:178) >> at >> org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply$mcV$sp(DiskBlockManager.scala:171) >> at >> org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply(DiskBlockManager.scala:169) >> at >> org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply(DiskBlockManager.scala:169) >> at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311) >> at >> org.apache.spark.storage.DiskBlockManager$$anon$1.run(DiskBlockManager.scala:169) >> >> >> Note: I am building my jar on my local with spark dependency added in >> pom.xml and running it on cluster running spark. >> >> >> -Pankaj >> > >
Re: Spark streaming job failing after some time.
For Spark streaming, you must always set *--executor-cores* to a value which is >= 2. Or else it will not do any processing. Thanks Best Regards On Sat, Nov 22, 2014 at 8:39 AM, pankaj channe wrote: > I have seen similar posts on this issue but could not find solution. > Apologies if this has been discussed here before. > > I am running a spark streaming job with yarn on a 5 node cluster. I am > using following command to submit my streaming job. > > spark-submit --class class_name --master yarn-cluster --num-executors 1 > --driver-memory 1g --executor-memory 1g --executor-cores 1 my_app.jar > > > After running for some time, the job stops. The application log shows > following two errors: > > 14/11/21 22:05:04 WARN yarn.ApplicationMaster: Unable to retrieve > SparkContext in spite of waiting for 10, maxNumTries = 10 > Exception in thread "main" java.lang.NullPointerException > at > org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkContextInitialized(ApplicationMaster.scala:218) > at > org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:107) > at > org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:410) > at > org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:53) > at > org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:52) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594) > at > org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:52) > at > org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:409) > at > org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala) > > > and later... > > Failed to list files for dir: > /data2/hadoop/yarn/local/usercache/user_name/appcache/application_1416332002106_0009/spark-local-20141121220325-b529/20 > at org.apache.spark.util.Utils$.listFilesSafely(Utils.scala:673) > at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:685) > at > org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:686) > at > org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:685) > at > scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34) > at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:685) > at > org.apache.spark.storage.DiskBlockManager$$anonfun$stop$1.apply(DiskBlockManager.scala:181) > at > org.apache.spark.storage.DiskBlockManager$$anonfun$stop$1.apply(DiskBlockManager.scala:178) > at > scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) > at > org.apache.spark.storage.DiskBlockManager.stop(DiskBlockManager.scala:178) > at > org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply$mcV$sp(DiskBlockManager.scala:171) > at > org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply(DiskBlockManager.scala:169) > at > org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply(DiskBlockManager.scala:169) > at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311) > at > org.apache.spark.storage.DiskBlockManager$$anon$1.run(DiskBlockManager.scala:169) > > > Note: I am building my jar on my local with spark dependency added in > pom.xml and running it on cluster running spark. > > > -Pankaj >
Spark streaming job failing after some time.
I have seen similar posts on this issue but could not find solution. Apologies if this has been discussed here before. I am running a spark streaming job with yarn on a 5 node cluster. I am using following command to submit my streaming job. spark-submit --class class_name --master yarn-cluster --num-executors 1 --driver-memory 1g --executor-memory 1g --executor-cores 1 my_app.jar After running for some time, the job stops. The application log shows following two errors: 14/11/21 22:05:04 WARN yarn.ApplicationMaster: Unable to retrieve SparkContext in spite of waiting for 10, maxNumTries = 10 Exception in thread "main" java.lang.NullPointerException at org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkContextInitialized(ApplicationMaster.scala:218) at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:107) at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:410) at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:53) at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:52) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594) at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:52) at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:409) at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala) and later... Failed to list files for dir: /data2/hadoop/yarn/local/usercache/user_name/appcache/application_1416332002106_0009/spark-local-20141121220325-b529/20 at org.apache.spark.util.Utils$.listFilesSafely(Utils.scala:673) at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:685) at org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:686) at org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:685) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34) at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:685) at org.apache.spark.storage.DiskBlockManager$$anonfun$stop$1.apply(DiskBlockManager.scala:181) at org.apache.spark.storage.DiskBlockManager$$anonfun$stop$1.apply(DiskBlockManager.scala:178) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) at org.apache.spark.storage.DiskBlockManager.stop(DiskBlockManager.scala:178) at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply$mcV$sp(DiskBlockManager.scala:171) at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply(DiskBlockManager.scala:169) at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply(DiskBlockManager.scala:169) at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311) at org.apache.spark.storage.DiskBlockManager$$anon$1.run(DiskBlockManager.scala:169) Note: I am building my jar on my local with spark dependency added in pom.xml and running it on cluster running spark. -Pankaj