Could you share more details about the dataset and the algorithm? For example, if the dataset has 10M+ features, it may be slow for the driver to collect the weights from executors (just a blind guess).

-Xiangrui
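
To put rough numbers on that guess, here is a minimal back-of-the-envelope sketch. It assumes the weights (or gradients) are dense vectors of 8-byte doubles and that every task sends one such vector back to the driver per iteration. The feature count is the hypothetical "10M+" from the guess, and the task count of 128 is borrowed from `spark.default.parallelism` in the quoted configuration; neither is a measured value, and the object and method names are made up for illustration.

```scala
// Rough estimate only: ignores serialization, object headers and network overhead.
object WeightPayloadEstimate {
  // A JVM Double occupies 8 bytes.
  val BytesPerDouble = 8L

  /** Bytes in a dense vector holding one Double per feature. */
  def denseVectorBytes(numFeatures: Long): Long =
    numFeatures * BytesPerDouble

  /** Bytes arriving at the driver if every task returns one dense vector. */
  def driverBytesPerIteration(numFeatures: Long, numTasks: Int): Long =
    denseVectorBytes(numFeatures) * numTasks

  def main(args: Array[String]): Unit = {
    val numFeatures = 10000000L // hypothetical: the "10M+" guess, not a known figure
    val numTasks = 128          // assumption: one task per unit of spark.default.parallelism

    val perVectorMB = denseVectorBytes(numFeatures) / 1e6
    val perIterationGB = driverBytesPerIteration(numFeatures, numTasks) / 1e9

    println(f"one dense vector: $perVectorMB%.1f MB")
    println(f"received by the driver per iteration: $perIterationGB%.2f GB")
  }
}
```

Under these assumptions a single vector is 80 MB and the driver would receive about 10 GB per iteration, which could plausibly leave the executors idle while the driver aggregates. If the real feature count is much smaller, this explanation does not apply.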
On Tue, Jul 29, 2014 at 9:15 PM, Tan Tim <unname...@gmail.com> wrote:
> Hi, all
>
> [Setting]
>
> Input data:
> The data is on HDFS: 10 parts (text files), each about 2.3G.
>
> Spark cluster:
> Runs on CentOS; 8 machines, each with 8 cores and 128G of memory.
>
> The settings for the SparkContext:
> val conf = new SparkConf().setMaster("spark://xxx-xxx-xx001:12036").
>   setAppName("OWLQN").setSparkHome("/var/bh/lib/spark-0.9.1-bin-hadoop1").
>   setJars(List(jarFile))
> conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
> conf.set("spark.kryo.registrator", "LRRegistrator")
> conf.set("spark.executor.memory", "64g")
> conf.set("spark.default.parallelism", "128")
> conf.set("spark.akka.timeout", "60")
> conf.set("spark.storage.memoryFraction", "0.7")
> conf.set("spark.kryoserializer.buffer.mb", "1024")
> conf.set("spark.cores.max", "64")
> conf.set("spark.speculation", "true")
> conf.set("spark.storage.blockManagerTimeoutIntervalMs", "60000")
> val sc = new SparkContext(conf)
>
> [Trouble]
>
> Executors do not start concurrently:
> In every stage, the executors do not start at the same time. Some executors
> have finished all their tasks while others have not begun any, as the
> web UI shows (some executors have finished 10 tasks, and the other two are
> still not shown in the web UI):
>
> Following Andrew Xia's suggestion, I added a sleep after creating the
> SparkContext, but some stages still have this problem.
>
> I/O and CPU are never fully used:
> When tasks start, the CPUs are not fully used: CPU usage goes above 100%
> for less than 2 seconds and then drops to 1%, even though the tasks have
> not finished. The same thing happens with I/O.
>
> The attached file is the log for some stages. Every stage takes 3.5 minutes
> on average, which is too slow compared with another experiment (the same
> task run on a cluster of Ubuntu machines instead of CentOS).
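
As a quick sanity check on the quoted settings, the arithmetic they imply can be sketched as below. Every number comes from the message above; the object and field names are hypothetical, and the sketch only relates the configured values to each other. It does not model how Spark actually schedules tasks or splits the input.

```scala
// Sanity-check arithmetic on the quoted cluster and SparkConf values.
object ClusterSizing {
  val machines = 8
  val coresPerMachine = 8
  val coresMax = 64            // spark.cores.max
  val defaultParallelism = 128 // spark.default.parallelism
  val inputParts = 10
  val partSizeGB = 2.3         // "about 2.3G" per part

  /** Physical cores available across the cluster. */
  def totalCores: Int = machines * coresPerMachine

  /** Tasks per core if a stage has exactly defaultParallelism tasks. */
  def tasksPerCore: Double = defaultParallelism.toDouble / coresMax

  /** Approximate total input size in GB. */
  def totalInputGB: Double = inputParts * partSizeGB

  def main(args: Array[String]): Unit = {
    println(s"total cores: $totalCores (spark.cores.max = $coresMax)")
    println(s"tasks per core at default parallelism: $tasksPerCore")
    println(f"total input: $totalInputGB%.1f GB")
  }
}
```

So `spark.cores.max` matches the 64 physical cores, a stage with 128 tasks would give each core two tasks, and the input totals about 23G. If a stage has noticeably fewer tasks than cores, some executors would have nothing to run, which is one possible (unconfirmed) reading of the web UI observation.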