Hi,

I have seen similar behavior before. As far as I can tell, the root cause is an out-of-memory error; I verified this by monitoring memory usage while the job ran. In my case I had a 30 GB file and was running on a single machine with 16 GB, so I knew it would fail. But instead of raising an exception, some part of the system just keeps churning.

My suggestions:
- Review the JVM memory settings (try larger values), make sure the settings are propagated to all the workers, and monitor memory while the job is running.
- Another approach is to split the file and try with progressively increasing sizes, to find where it starts to fail.
- I also see symptoms of failed connections. I can't say for certain that this is the problem, but check your topology and network connectivity.

Out of curiosity, what kind of machines are you running? Bare metal? EC2? How much memory? 64-bit OS? I assume these are big machines, so the resources themselves might not be the problem.
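To make the two suggestions concrete, here is a rough sketch; the class name and memory values are placeholders for your own job, and the flag names are the standard spark-submit ones:

```shell
# Where the memory settings go (hypothetical job and values; adjust to your cluster):
#
#   spark-submit --class org.apache.spark.examples.SparkPageRank \
#     --driver-memory 8g --executor-memory 8g \
#     examples.jar hdfs:///path/to/input 10
#
# To test with progressively larger inputs, carve slices off the file
# and re-run the job on each slice until it starts to struggle.

printf 'line %d\n' $(seq 1 1000) > input.txt   # stand-in for the large input

total=$(wc -c < input.txt)
for frac in 10 50 100; do
  bytes=$(( total * frac / 100 ))
  head -c "$bytes" input.txt > "slice_${frac}pct.txt"
done
ls -l slice_*.txt
```

If the 10% slice succeeds and the 50% slice hangs, that points strongly at memory rather than the data itself.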
Cheers
<k/>

On Sat, Jun 21, 2014 at 12:55 PM, yxzhao <yxz...@ualr.edu> wrote:
> I run the pagerank example processing a large data set, 5GB in size, using 48
> machines. The job got stuck at the time point: 14/05/20 21:32:17, as the
> attached log shows. It was stuck there for more than 10 hours and then I
> killed it at last. But I did not find any information explaining why it was
> stuck. Any suggestions? Thanks.
>
> Spark_OK_48_pagerank.log
> <http://apache-spark-user-list.1001560.n3.nabble.com/file/n8075/Spark_OK_48_pagerank.log>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Processing-Large-Data-Stuck-tp8075.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.