How to set the number of executors on Spark on YARN
Hi~ I want to set the number of executors to 16, but strangely the executor cores setting seems to affect the executor count on Spark on YARN. I don't know why, and I don't know how to set the executor number correctly.

./bin/spark-submit --class com.hequn.spark.SparkJoins \
  --master yarn-cluster \
  --num-executors 16 \
  --driver-memory 2g \
  --executor-memory 10g \
  --executor-cores 4 \
  /home/sparkjoins-1.0-SNAPSHOT.jar

The UI shows there are *7 executors*.

./bin/spark-submit --class com.hequn.spark.SparkJoins \
  --master yarn-cluster \
  --num-executors 16 \
  --driver-memory 2g \
  --executor-memory 10g \
  --executor-cores 2 \
  /home/sparkjoins-1.0-SNAPSHOT.jar

The UI shows there are *9 executors*.

./bin/spark-submit --class com.hequn.spark.SparkJoins \
  --master yarn-cluster \
  --num-executors 16 \
  --driver-memory 2g \
  --executor-memory 10g \
  --executor-cores 1 \
  /home/sparkjoins-1.0-SNAPSHOT.jar

The UI shows there are *9 executors*.

The cluster contains 16 nodes. Each node has 64G RAM.
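A note for readers: --num-executors is only an upper bound, not a guarantee. YARN grants one container per executor and stops granting on a node once that node's configured memory (yarn.nodemanager.resource.memory-mb) or vcore capacity is used up, which is one common reason raising --executor-cores lowers the executor count shown in the UI. As a minimal sketch (my own, not from the thread, and assuming a Spark release where the spark-submit flags map to the spark.executor.* properties shown), the same settings can also be passed through SparkConf:

import org.apache.spark.{SparkConf, SparkContext}

object SparkJoinsSubmit {
  def main(args: Array[String]): Unit = {
    // Placeholder app: only the executor-related settings matter here.
    val conf = new SparkConf()
      .setAppName("SparkJoins")
      .set("spark.executor.instances", "16") // same as --num-executors
      .set("spark.executor.memory", "10g")   // same as --executor-memory
      .set("spark.executor.cores", "4")      // same as --executor-cores
    val sc = new SparkContext(conf)
    // ... job body ...
    sc.stop()
  }
}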
Re: Hadoop 2.3 Centralized Cache vs RDD
I tried centralized cache step by step following the Apache Hadoop official website, but it seems centralized cache doesn't work. See: http://stackoverflow.com/questions/22293358/centralized-cache-failed-in-hadoop-2-3 . Has anyone succeeded?

2014-05-15 5:30 GMT+08:00 William Kang :

> Hi,
> Any comments or thoughts on the implications of the newly released feature
> from Hadoop 2.3 on the centralized cache? How different is it from RDD?
>
> Many thanks.
>
>
> Cao
Re: Re: Re: RDD usage
Hi~ I wrote a program to test this. A non-idempotent "compute" function in foreach does change the values of the RDD. It may look a little crazy to do so, since modifying the RDD makes it impossible to keep the RDD fault-tolerant in Spark :)

2014-03-25 11:11 GMT+08:00 林武康 :

> Hi hequn, I dug into the source of Spark a bit deeper, and I got some
> ideas. Firstly, immutability is a feature of RDDs but not a solid rule;
> there are ways to change it, for example, an RDD with a non-idempotent
> "compute" function, though it is really a bad design to make that function
> non-idempotent because of the uncontrollable side effects. I agree with
> Mark that foreach can modify the elements of an RDD, but we should avoid
> this because it will affect all the RDDs generated from the changed RDD
> and make the whole process inconsistent and unstable.
>
> Some rough opinions on the immutable feature of RDDs; a full discussion
> can make it clearer. Any ideas?
> ------
> From: hequn cheng
> Sent: 2014/3/25 10:40
> To: user@spark.apache.org
> Subject: Re: Re: RDD usage
>
> First question:
> If you save your modified RDD like this:
> points.foreach(p=>p.y = another_value).collect() or
> points.foreach(p=>p.y = another_value).saveAsTextFile(...)
> the modified RDD will be materialized, and this will not use any worker's
> memory.
> If you have more transformations after the map(), Spark will pipeline all
> the transformations and build a DAG. Very little memory will be used in
> this stage, and the memory will be freed soon.
> Only cache() will persist your RDD in memory for a long time.
> Second question:
> Once an RDD is created, it cannot be changed, due to its immutability.
> You can only create a new RDD from an existing RDD or from the file
> system.
>
>
> 2014-03-25 9:45 GMT+08:00 林武康 :
>
>> Hi hequn, a related question: does that mean the memory usage will be
>> doubled? And furthermore, if the compute function of an RDD is not
>> idempotent, the RDD will change while the job is running, is that right?
>> ------
>> From: hequn cheng
>> Sent: 2014/3/25 9:35
>> To: user@spark.apache.org
>> Subject: Re: RDD usage
>>
>> points.foreach(p=>p.y = another_value) will return a new modified RDD.
>>
>>
>> 2014-03-24 18:13 GMT+08:00 Chieh-Yen :
>>
>>> Dear all,
>>>
>>> I have a question about the usage of RDDs.
>>> I implemented a class called AppDataPoint; it looks like:
>>>
>>> case class AppDataPoint(input_y : Double, input_x : Array[Double])
>>> extends Serializable {
>>>   var y : Double = input_y
>>>   var x : Array[Double] = input_x
>>>   ..
>>> }
>>>
>>> Furthermore, I created the RDD with the following function.
>>>
>>> def parsePoint(line: String): AppDataPoint = {
>>>   /* Some related work for parsing */
>>>   ..
>>> }
>>>
>>> Assume the RDD is called "points":
>>>
>>> val lines = sc.textFile(inputPath, numPartition)
>>> var points = lines.map(parsePoint _).cache()
>>>
>>> The question is that I tried to modify the values of this RDD; the
>>> operation is:
>>>
>>> points.foreach(p=>p.y = another_value)
>>>
>>> The operation works.
>>> There is no warning or error message shown by the system, and the
>>> results are right.
>>> I wonder whether this kind of modification of an RDD is a correct and
>>> in fact workable design.
>>> The usage page says that RDDs are immutable; is there any suggestion?
>>>
>>> Thanks a lot.
>>>
>>> Chieh-Yen Lin
>>
>>
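To make the pitfall concrete, here is a minimal sketch (my own illustration, not code from the thread; the Point class and values are placeholders, and behavior can differ with serialized storage levels). Mutating elements inside foreach only changes the cached copies, so the change silently disappears once the cached blocks are dropped and the lineage recomputes the partitions; deriving a new RDD with map is the safe alternative:

import org.apache.spark.{SparkConf, SparkContext}

// Mutable field, like the thread's AppDataPoint.
case class Point(var y: Double, x: Array[Double])

object MutationPitfall {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("MutationPitfall").setMaster("local[2]"))

    // The lineage recreates fresh Point objects from the ints on recompute.
    val points = sc.parallelize(1 to 4)
      .map(i => Point(i.toDouble, Array(i.toDouble)))
      .cache()

    points.foreach(p => p.y = 0.0)                    // mutate the cached copies
    println(points.map(_.y).collect().mkString(","))  // 0.0 everywhere while cached

    points.unpersist()                                // drop the cached blocks
    println(points.map(_.y).collect().mkString(","))  // recomputed: 1.0,2.0,3.0,4.0

    // The fault-tolerant way: derive a new RDD instead of mutating in place.
    val zeroed = points.map(p => Point(0.0, p.x))
    println(zeroed.map(_.y).collect().mkString(","))
    sc.stop()
  }
}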
Re: Re: RDD usage
First question:
If you save your modified RDD like this:
points.foreach(p=>p.y = another_value).collect() or
points.foreach(p=>p.y = another_value).saveAsTextFile(...)
the modified RDD will be materialized, and this will not use any worker's memory.
If you have more transformations after the map(), Spark will pipeline all the transformations and build a DAG. Very little memory will be used in this stage, and the memory will be freed soon.
Only cache() will persist your RDD in memory for a long time.
Second question:
Once an RDD is created, it cannot be changed, due to its immutability. You can only create a new RDD from an existing RDD or from the file system.

2014-03-25 9:45 GMT+08:00 林武康 :

> Hi hequn, a related question: does that mean the memory usage will be
> doubled? And furthermore, if the compute function of an RDD is not
> idempotent, the RDD will change while the job is running, is that right?
> ------
> From: hequn cheng
> Sent: 2014/3/25 9:35
> To: user@spark.apache.org
> Subject: Re: RDD usage
>
> points.foreach(p=>p.y = another_value) will return a new modified RDD.
>
>
> 2014-03-24 18:13 GMT+08:00 Chieh-Yen :
>
>> Dear all,
>>
>> I have a question about the usage of RDDs.
>> I implemented a class called AppDataPoint; it looks like:
>>
>> case class AppDataPoint(input_y : Double, input_x : Array[Double])
>> extends Serializable {
>>   var y : Double = input_y
>>   var x : Array[Double] = input_x
>>   ..
>> }
>>
>> Furthermore, I created the RDD with the following function.
>>
>> def parsePoint(line: String): AppDataPoint = {
>>   /* Some related work for parsing */
>>   ..
>> }
>>
>> Assume the RDD is called "points":
>>
>> val lines = sc.textFile(inputPath, numPartition)
>> var points = lines.map(parsePoint _).cache()
>>
>> The question is that I tried to modify the values of this RDD; the
>> operation is:
>>
>> points.foreach(p=>p.y = another_value)
>>
>> The operation works.
>> There is no warning or error message shown by the system, and the
>> results are right.
>> I wonder whether this kind of modification of an RDD is a correct and
>> in fact workable design.
>> The usage page says that RDDs are immutable; is there any suggestion?
>>
>> Thanks a lot.
>>
>> Chieh-Yen Lin
>
>
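As a small illustration of the pipelining point (my own sketch, not from the thread; names and sizes are placeholders): chained transformations are fused and evaluated lazily, one element at a time, so no intermediate RDD is materialized unless it is explicitly cached.

import org.apache.spark.{SparkConf, SparkContext}

object PipelineDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("PipelineDemo").setMaster("local[2]"))
    val nums = sc.parallelize(1 to 1000000)

    // These two transformations are pipelined: each element flows through
    // filter and map in one pass, so no intermediate RDD is stored.
    val doubledEvens = nums.filter(_ % 2 == 0).map(_ * 2)

    // Nothing has run yet; only the DAG exists. The first action
    // executes the whole pipeline.
    println(doubledEvens.count())

    // Only an explicit cache() keeps the data in memory: the next action
    // computes and caches it, and later actions reuse the blocks.
    doubledEvens.cache()
    println(doubledEvens.sum())
    sc.stop()
  }
}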
Re: RDD usage
points.foreach(p=>p.y = another_value) will return a new modified RDD.

2014-03-24 18:13 GMT+08:00 Chieh-Yen :

> Dear all,
>
> I have a question about the usage of RDDs.
> I implemented a class called AppDataPoint; it looks like:
>
> case class AppDataPoint(input_y : Double, input_x : Array[Double])
> extends Serializable {
>   var y : Double = input_y
>   var x : Array[Double] = input_x
>   ..
> }
>
> Furthermore, I created the RDD with the following function.
>
> def parsePoint(line: String): AppDataPoint = {
>   /* Some related work for parsing */
>   ..
> }
>
> Assume the RDD is called "points":
>
> val lines = sc.textFile(inputPath, numPartition)
> var points = lines.map(parsePoint _).cache()
>
> The question is that I tried to modify the values of this RDD; the
> operation is:
>
> points.foreach(p=>p.y = another_value)
>
> The operation works.
> There is no warning or error message shown by the system, and the
> results are right.
> I wonder whether this kind of modification of an RDD is a correct and
> in fact workable design.
> The usage page says that RDDs are immutable; is there any suggestion?
>
> Thanks a lot.
>
> Chieh-Yen Lin
>
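One clarifying note (my addition, not part of the original exchange): foreach is an action that returns Unit, so it does not itself return a new RDD; it is map that produces a new, modified RDD. A minimal spark-shell sketch, reusing the thread's AppDataPoint and `points` names with a hypothetical `another_value`:

import org.apache.spark.rdd.RDD

// `points` and AppDataPoint come from the question above;
// `another_value` is a hypothetical Double.
def updateY(points: RDD[AppDataPoint], another_value: Double): RDD[AppDataPoint] =
  points.map(p => AppDataPoint(another_value, p.x)) // new RDD; originals untouched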
Re: What's the lifecycle of an rdd? Can I control it?
Use persist and unpersist.
unpersist: mark the RDD as non-persistent, and remove all blocks for it from memory and disk.

2014-03-19 16:40 GMT+08:00 林武康 :

> Hi, can anyone tell me about the lifecycle of an RDD? I searched through
> the official website and still can't figure it out. Can I use an RDD in
> some stages and then destroy it in order to release memory, because no
> later stage will use this RDD any more? Is it possible?
>
> Thanks!
>
> Sincerely
> Lin wukang
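A minimal sketch of that lifecycle (my own illustration; names and storage level are placeholders): persist the RDD while the stages that need it run, then unpersist it once no downstream stage will use it again.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object RddLifecycle {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("RddLifecycle").setMaster("local[2]"))
    val rdd = sc.parallelize(1 to 1000).persist(StorageLevel.MEMORY_AND_DISK)

    // Stages that reuse the RDD read from the persisted blocks.
    val total = rdd.sum()
    val largest = rdd.max()
    println(s"sum=$total max=$largest")

    // No later stage needs it: mark it non-persistent and free its
    // blocks from memory and disk.
    rdd.unpersist()
    sc.stop()
  }
}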
How to set the number of tasks in a container
When I increase my input data size, the executors fail and are lost. See below:

14/03/11 20:44:18 INFO AppClient$ClientActor: Executor updated: app-20140311204343-0008/8 is now FAILED (Command exited with code 134)
14/03/11 20:44:18 INFO SparkDeploySchedulerBackend: Executor app-20140311204343-0008/8 removed: Command exited with code 134
14/03/11 20:44:18 INFO SparkDeploySchedulerBackend: Executor 8 disconnected, so removing it
14/03/11 20:44:18 ERROR TaskSchedulerImpl: Lost executor 8 on Salve10.Hadoop: Unknown executor exit code (134) (died from signal 6?)
14/03/11 20:44:18 INFO TaskSetManager: Re-queueing tasks for 8 from TaskSet 1.0
14/03/11 20:44:18 WARN TaskSetManager: Lost TID 8 (task 1.0:22)
14/03/11 20:44:18 WARN TaskSetManager: Lost TID 49 (task 1.0:49)
14/03/11 20:44:18 WARN TaskSetManager: Lost TID 31 (task 1.0:40)
14/03/11 20:44:18 WARN TaskSetManager: Lost TID 1 (task 1.0:10)
14/03/11 20:44:18 WARN TaskSetManager: Lost TID 15 (task 1.0:48)
14/03/11 20:44:18 INFO AppClient$ClientActor: Executor added: app-20140311204343-0008/11 on worker-20140311172513-Salve10.Hadoop-56435 (Salve10.Hadoop:56435) with 8 cores
14/03/11 20:44:18 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140311204343-0008/11 on hostPort Salve10.Hadoop:56435 with 8 cores, 4.0 GB RAM
14/03/11 20:44:18 INFO AppClient$ClientActor: Executor updated: app-20140311204343-0008/11 is now RUNNING

It seems that too many tasks are running in one executor and the total task memory exceeds "spark.executor.memory". So how do I set the number of tasks in an executor? Thank you.
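For reference: each task reserves spark.task.cpus cores (1 by default), so the number of tasks running concurrently in an executor is the executor's cores divided by spark.task.cpus. A minimal sketch (my own, with placeholder values, not a diagnosis of the failure above) that raises per-executor memory and halves the concurrency on an 8-core executor:

import org.apache.spark.{SparkConf, SparkContext}

object LimitTasksPerExecutor {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("LimitTasksPerExecutor")
      // Give each executor more memory (placeholder value).
      .set("spark.executor.memory", "8g")
      // Each task reserves 2 cores, so an 8-core executor runs at most
      // 4 tasks at once instead of 8.
      .set("spark.task.cpus", "2")
      // Optionally cap the total cores the application takes.
      .set("spark.cores.max", "32")
    val sc = new SparkContext(conf)
    // ... job body ...
    sc.stop()
  }
}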
Re: SPARK_JAVA_OPTS not picked up by the application
Have you sent spark-env.sh to the slave nodes?

2014-03-11 6:47 GMT+08:00 Linlin :

>
> Hi,
>
> I have a Java option (-Xss) setting specified in SPARK_JAVA_OPTS in
> spark-env.sh. I noticed that after stopping/restarting the Spark cluster,
> the master/worker daemons have the setting applied, but the setting is
> not propagated to the executors, and my application continues to behave
> the same. I am not sure if there is a way to specify it through SparkConf,
> like SparkConf.set(), and what the correct way is to set this up for a
> particular Spark application.
>
> Thank you!
>
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/SPARK-JAVA-OPTS-not-picked-up-by-the-application-tp2483.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
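On the SparkConf question: in later Spark releases the per-executor JVM flags can be set through the spark.executor.extraJavaOptions property (an assumption on my part that your version already supports it; SPARK_JAVA_OPTS predates that property). A minimal sketch with a placeholder stack size:

import org.apache.spark.{SparkConf, SparkContext}

object StackSizeOption {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("StackSizeOption")
      // Pass -Xss to every executor JVM (4m is a placeholder value;
      // heap size itself still belongs in spark.executor.memory).
      .set("spark.executor.extraJavaOptions", "-Xss4m")
    val sc = new SparkContext(conf)
    // ... application ...
    sc.stop()
  }
}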