Hello Group,
I'm using Spark 0.8.0 with Scala 2.9.3.
I have two issues:
*1) Job hangs when the number of input files exceeds roughly 4000*
First, I was using "local" as the master URL argument, like this:
    val sc = new SparkContext("local", "AnonApp", "/usr/local/spark/")

    // Read all files in a directory
    var t = sc.textFile(fileName)
    t.map(each_line => some_functions(each_line)).saveAsTextFile("/output/" + filename)
My job runs fine for sample inputs (~20 files), but when the number of
input files increases (~7000 files), execution stops at around
4000 files (it hung once at 4200 files and once at 4216 files).
This is my console output:
"13/10/14 11:54:08 INFO mapred.FileOutputCommitter: Saved output of task
'attempt_201310141154_0000_m_000000_4212' to
file:/usr/local/spark/ram_examples/AnonApp/sample-ANON/MSC/06/MAZ0320111206074848911831.cdr.gz
13/10/14 11:54:08 INFO mapred.FileInputFormat: Total input paths to process
: 1
13/10/14 11:54:08 INFO mapred.FileOutputCommitter: Saved output of task
'attempt_201310141154_0000_m_000000_4213' to
file:/usr/local/spark/ram_examples/AnonApp/sample-ANON/MSC/06/KBL0320111206154040475980.cdr.gz
13/10/14 11:54:08 INFO mapred.FileInputFormat: Total input paths to process
: 1"
<Control waits here>
When the job hung, I checked the output folder. The _temporary directory had
been created, but I'm not sure why the program hangs there. Execution simply
stops and waits at the point shown in the console output above.
I saw a post on the user group suggesting that I increase my
*ulimit* (the limit on the number of open files), but my ulimit is already
set to unlimited.
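
To double-check the limit from inside the JVM rather than from the shell, something like the following could be run in the same process as the job. It is only a sketch: the object and method names (FdCheck, fdCounts, report) are made up for illustration, and it assumes a Unix-like HotSpot/OpenJDK JVM where the operating-system MXBean implements com.sun.management.UnixOperatingSystemMXBean. It is worth checking because in bash a plain `ulimit` reports the file-size limit, while the open-files limit is shown by `ulimit -n`.

```scala
import java.lang.management.ManagementFactory
import com.sun.management.UnixOperatingSystemMXBean

// Hypothetical helper, not part of Spark: reports how many file descriptors
// this JVM currently has open and the maximum it is allowed to open.
object FdCheck {
  // Some((open, max)) when the JVM exposes the Unix counters, otherwise None.
  def fdCounts(): Option[(Long, Long)] =
    ManagementFactory.getOperatingSystemMXBean match {
      case os: UnixOperatingSystemMXBean =>
        Some((os.getOpenFileDescriptorCount, os.getMaxFileDescriptorCount))
      case _ => None
    }

  // Human-readable one-line summary, suitable for println.
  def report(): String = fdCounts() match {
    case Some((open, max)) =>
      "open file descriptors: " + open + " / limit: " + max
    case None =>
      "file descriptor counts are not available on this JVM"
  }
}
```

Calling `println(FdCheck.report())` every few hundred files would show whether the open count climbs toward the limit as more files are processed, which would point to descriptors not being released.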
*2) Job hangs when I change the master URL to local[2] (I have 2 cores)*
The program described above works fine with "local" for a sample input of
20 files. But when I change "local" to "local[2]" in the SparkContext, the
same program hangs in the same way as shown above. When making this change
(local -> local[2]), am I expected to change anything else?
Is there any pattern common to these two failures? And apart from the
console output, is there a place where I can see logs to understand what is
going on when the program hangs?
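
One generic way to see what a hung JVM is doing is a thread dump, which shows the stack each thread is blocked in. The JDK's `jstack <pid>` tool produces one from outside the process. The sketch below does roughly the same from inside the driver using only standard JDK calls. The names ThreadDump, dump and startWatchdog are hypothetical, and nothing here is Spark-specific or taken from Spark's API.

```scala
// Hypothetical helper, not part of Spark: formats and periodically prints
// the stack traces of all live threads in the current JVM.
object ThreadDump {
  // Builds a text dump with one header line per thread followed by its frames.
  def dump(): String = {
    val sb = new StringBuilder
    val it = Thread.getAllStackTraces.entrySet.iterator
    while (it.hasNext) {
      val entry = it.next()
      val thread = entry.getKey
      sb.append("\"" + thread.getName + "\" state=" + thread.getState + "\n")
      entry.getValue.foreach(frame => sb.append("    at " + frame + "\n"))
    }
    sb.toString
  }

  // Starts a daemon thread that prints a dump to stderr every intervalMs
  // milliseconds; being a daemon, it does not keep the JVM alive on its own.
  def startWatchdog(intervalMs: Long): Thread = {
    val watchdog = new Thread(new Runnable {
      def run(): Unit = {
        while (true) {
          Thread.sleep(intervalMs)
          System.err.println(dump())
        }
      }
    })
    watchdog.setDaemon(true)
    watchdog.start()
    watchdog
  }
}
```

Starting the watchdog right after creating the SparkContext, for example `ThreadDump.startWatchdog(30000L)`, would print a dump every 30 seconds. Comparing the dumps taken after the job stops making progress should show which threads are waiting and on what.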
Regards,
Ram.