It looks like you are having issues with the files getting distributed to the cluster. What is the exception you are getting now?
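If it is still the "file does not exist" failure, it is worth first confirming that the client machine is actually picking up your Hadoop configs. A rough sanity check, assuming a typical CDH layout (the paths and app name below are just examples, adjust to your install):

    # example path for a CDH install; point Spark at the Hadoop client configs
    export HADOOP_CONF_DIR=/etc/hadoop/conf
    ls $HADOOP_CONF_DIR/core-site.xml

    # re-run with --verbose and watch for "Uploading resource ... -> hdfs://..." lines
    spark-submit --verbose --master yarn-cluster your_app.py 2>&1 | grep -i resource

If you still only see "Source and destination file systems are the same", the client still thinks the default filesystem is local.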
On Wednesday, August 19, 2015, Ramkumar V <ramkumar.c...@gmail.com> wrote:

> Thanks a lot for your suggestion. I had modified HADOOP_CONF_DIR in
> spark-env.sh so that core-site.xml is under HADOOP_CONF_DIR. I can now
> see logs like the ones you showed above. The job now runs for 3 minutes
> and stores results every minute, but after some time there is an
> exception. How do I fix this exception? Can you please explain where it
> is going wrong?
>
> *Log link: http://pastebin.com/xL9jaRUa *
>
> *Thanks*,
> <https://in.linkedin.com/in/ramkumarcs31>
>
>
> On Wed, Aug 19, 2015 at 1:54 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>
>> HADOOP_CONF_DIR is the environment variable that points to the Hadoop
>> conf directory. I am not sure how CDH organizes that; make sure
>> core-site.xml is under HADOOP_CONF_DIR.
>>
>> On Wed, Aug 19, 2015 at 4:06 PM, Ramkumar V <ramkumar.c...@gmail.com> wrote:
>>
>>> We are using Cloudera 5.3.1. Since it is one of the earlier versions
>>> of CDH, it does not support the latest version of Spark, so I
>>> installed Spark 1.4.1 separately on my machine. I am not able to do
>>> spark-submit in cluster mode. How do I put core-site.xml under the
>>> classpath? It would be very helpful if you could explain in detail how
>>> to solve this issue.
>>>
>>> *Thanks*,
>>> <https://in.linkedin.com/in/ramkumarcs31>
>>>
>>>
>>> On Fri, Aug 14, 2015 at 8:25 AM, Jeff Zhang <zjf...@gmail.com> wrote:
>>>
>>>> 1. 15/08/12 13:24:49 INFO Client: Source and destination file
>>>> systems are the same. Not copying
>>>> file:/home/hdfs/spark-1.4.1/assembly/target/scala-2.10/spark-assembly-1.4.1-hadoop2.5.0-cdh5.3.5.jar
>>>> 2. 15/08/12 13:24:49 INFO Client: Source and destination file
>>>> systems are the same. Not copying
>>>> file:/home/hdfs/spark-1.4.1/external/kafka-assembly/target/spark-streaming-kafka-assembly_2.10-1.4.1.jar
>>>> 3. 15/08/12 13:24:49 INFO Client: Source and destination file
>>>> systems are the same. Not copying
>>>> file:/home/hdfs/spark-1.4.1/python/lib/pyspark.zip
>>>> 4. 15/08/12 13:24:49 INFO Client: Source and destination file
>>>> systems are the same. Not copying
>>>> file:/home/hdfs/spark-1.4.1/python/lib/py4j-0.8.2.1-src.zip
>>>> 5. 15/08/12 13:24:49 INFO Client: Source and destination file
>>>> systems are the same. Not copying
>>>> file:/home/hdfs/spark-1.4.1/examples/src/main/python/streaming/kyt.py
>>>>
>>>> 1. diagnostics: Application application_1437639737006_3808 failed 2
>>>> times due to AM Container for appattempt_1437639737006_3808_000002
>>>> exited with exitCode: -1000 due to: File
>>>> file:/home/hdfs/spark-1.4.1/python/lib/pyspark.zip does not exist
>>>> 2. .Failing this attempt.. Failing the application.
>>>>
>>>> The machine where you run Spark is the client machine, while the YARN
>>>> AM is running on another machine, and the YARN AM complains that the
>>>> files are not found, as your logs show. From the logs, it seems that
>>>> these files are not copied to HDFS as local resources. I suspect that
>>>> you did not put core-site.xml under your classpath, so Spark cannot
>>>> detect your remote file system and will not copy the files to HDFS as
>>>> local resources.
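>>>> For example, the core-site.xml under HADOOP_CONF_DIR needs to name
>>>> the real HDFS namenode as the default filesystem, otherwise the
>>>> client treats file:/ as the destination. A rough check could look
>>>> something like this (namenode-host:8020 is a placeholder for your
>>>> cluster):
>>>>
>>>>     # confirm the client config points at HDFS, not the local FS
>>>>     grep -A1 fs.defaultFS $HADOOP_CONF_DIR/core-site.xml
>>>>     # expected output is something like:
>>>>     #   <name>fs.defaultFS</name>
>>>>     #   <value>hdfs://namenode-host:8020</value>
>>>>     # (older configs may use the deprecated key fs.default.name instead)
>>>>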
>>>> Usually in yarn-cluster mode, you should be able to see logs like the
>>>> following:
>>>>
>>>> > 15/08/14 10:48:49 INFO yarn.Client: Preparing resources for our AM container
>>>> > 15/08/14 10:48:49 INFO yarn.Client: Uploading resource file:/Users/abc/github/spark/assembly/target/scala-2.10/spark-assembly-1.5.0-SNAPSHOT-hadoop2.6.0.jar -> hdfs://0.0.0.0:9000/user/abc/.sparkStaging/application_1439432662178_0019/spark-assembly-1.5.0-SNAPSHOT-hadoop2.6.0.jar
>>>> > 15/08/14 10:48:50 INFO yarn.Client: Uploading resource file:/Users/abc/github/spark/spark.py -> hdfs://0.0.0.0:9000/user/abc/.sparkStaging/application_1439432662178_0019/spark.py
>>>> > 15/08/14 10:48:50 INFO yarn.Client: Uploading resource file:/Users/abc/github/spark/python/lib/pyspark.zip -> hdfs://0.0.0.0:9000/user/abc/.sparkStaging/application_1439432662178_0019/pyspark.zip
>>>>
>>>> On Thu, Aug 13, 2015 at 2:50 PM, Ramkumar V <ramkumar.c...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have a cluster of 1 master and 2 slaves. I'm running a Spark
>>>>> Streaming job on the master and I want to utilize all nodes in my
>>>>> cluster. I have specified some parameters like driver memory and
>>>>> executor memory in my code. When I pass --deploy-mode cluster
>>>>> --master yarn-cluster to spark-submit, it gives the following error.
>>>>>
>>>>> Log link: *http://pastebin.com/kfyVWDGR *
>>>>>
>>>>> How can I fix this issue? Please let me know if I'm doing something
>>>>> wrong.
>>>>>
>>>>> *Thanks*,
>>>>> Ramkumar V
>>>>>
>>>>
>>>> --
>>>> Best Regards
>>>>
>>>> Jeff Zhang
>>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>

--
Thanks,
Hari