Hi, all
There is a discussion about the accumulator on Stack Overflow:
http://stackoverflow.com/questions/27357440/spark-accumalator-value-is-different-when-inside-rdd-and-outside-rdd
I commented on this question (as user Tim). Based on the output I tried, I
have two questions:
1. Why the addInPlace
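For readers following the SO thread: Spark merges each task's local accumulator value back into the driver-side total via the `addInPlace` method of `AccumulatorParam`, so the value observed inside an RDD operation is only that task's partial value, while the driver sees the merged total. A minimal local sketch of that merge semantics (no Spark required; the `Param` trait below only mimics the shape of `AccumulatorParam`, it is not Spark's actual class):

```scala
// Local sketch of how Spark combines per-task accumulator values.
// Param mimics the AccumulatorParam[T] contract: zero gives the
// identity element, addInPlace merges two partial values.
trait Param[T] {
  def zero(initial: T): T
  def addInPlace(a: T, b: T): T
}

object IntParam extends Param[Int] {
  def zero(initial: Int): Int = 0
  def addInPlace(a: Int, b: Int): Int = a + b
}

// Each "partition" computes its own partial sum (what code running
// inside a task would see), then the driver folds the partials
// together with addInPlace to get the final value.
def driverSideValue(partitions: Seq[Seq[Int]], param: Param[Int]): Int =
  partitions
    .map(p => p.foldLeft(param.zero(0))(param.addInPlace)) // per-task partials
    .foldLeft(param.zero(0))(param.addInPlace)             // merged on the driver

val parts = Seq(Seq(1, 2), Seq(3, 4), Seq(5))
// Inside a task only one partial (e.g. 3 for the first partition) is
// visible; only the driver, after the job finishes, sees 15.
```

This is why reading an accumulator inside a transformation gives a different (smaller) value than reading it on the driver after the action completes.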
I also encountered a similar problem: after some stages, all the tasks
are assigned to one machine, and the stage execution gets slower and slower.
*[the spark conf setting]*
val conf = new SparkConf().setMaster(sparkMaster).setAppName("ModelTraining")
  .setSparkHome(sparkHome).setJars(List(jarFi
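The snippet above is cut off mid-call. For reference, a complete `SparkConf` of this shape would look like the sketch below; `sparkMaster`, `sparkHome`, and `jarFile` are placeholder values, not the poster's actual configuration:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical values standing in for the poster's truncated ones.
val sparkMaster = "spark://master:7077"
val sparkHome   = "/var/bh/lib/spark-0.9.1-bin-hadoop1"
val jarFile     = "target/scala-2.10/modeltraining.jar"

val conf = new SparkConf()
  .setMaster(sparkMaster)
  .setAppName("ModelTraining")
  .setSparkHome(sparkHome)
  .setJars(List(jarFile)) // jars shipped to the executors

val sc = new SparkContext(conf)
```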
Hi, all,
I can see the job failed from the web UI. But when I run ps on the
client (the machine from which I submitted the job), I find the process
still exists:
user_tt 5971 2.6 2.2 15030180 3029840 ? Sl 11:41 4:37 java -cp
/var/bh/lib/spark-0.9.1-bin-hadoop1/assembly/target/scala-2.10/spark
> have the same internode
> bandwidth? -Xiangrui
>
> On Tue, Jul 29, 2014 at 11:06 PM, Tan Tim wrote:
> > input data is evenly distributed to the executors.
> >
> > The input data is on HDFS, not on the Spark cluster. How can I make
> the
> > data distr
? -Xiangrui
>
> On Tue, Jul 29, 2014 at 10:46 PM, Tan Tim wrote:
> > The application is Logistic Regression (OWLQN); we developed a sparse
> vector
> > version. The feature dimension is 1M+, but it's very sparse. This
> application
> > can run on another Spark cluster, and
The application is Logistic Regression (OWLQN); we developed a sparse vector
version. The feature dimension is 1M+, but it's very sparse. This application
can run on another Spark cluster, where every stage takes about 50 seconds and
every executor has high CPU usage. The only difference is the OS (the fast
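With 1M+ nominal dimensions but very sparse features, the per-example cost should depend only on the nonzeros. A minimal sketch of the kind of sparse dot product such a "sparse vector version" relies on (the `SparseVec` type and names are illustrative, not the poster's actual code):

```scala
// Sparse vector as parallel arrays: sorted feature indices plus the
// corresponding nonzero values.
final case class SparseVec(indices: Array[Int], values: Array[Double])

// Dot product against a dense weight vector: O(nnz) work, independent
// of the 1M+ nominal dimension.
def dot(w: Array[Double], x: SparseVec): Double = {
  var s = 0.0
  var i = 0
  while (i < x.indices.length) {
    s += w(x.indices(i)) * x.values(i)
    i += 1
  }
  s
}

// Logistic-loss margin for one example with label y in {-1, +1};
// the OWLQN gradient for the example is built from this quantity.
def margin(w: Array[Double], x: SparseVec, y: Double): Double = y * dot(w, x)
```

If a stage with work like this runs on only one machine, the wall-clock cost scales with the total nonzeros on that machine, which is consistent with stages getting slower as tasks pile up on a single executor.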