Vijay - you said the job gets stuck but you also said it eventually
completes. What do you mean by stuck? Do you mean that there are
periods of low CPU utilization?

If you can run jstack during one of the periods and post the output
that would be most helpful.
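
In case it helps, here is a minimal sketch of how the dumps could be captured automatically while the job runs. This is not from the thread; the 5% CPU threshold, the 2-second poll interval, and the way the PID is located are all assumptions to adjust for your setup:

```shell
#!/usr/bin/env bash
# Sketch: whenever the job's CPU drops near zero (one of the "stuck"
# periods), append a jstack dump to a file for later inspection.

# Succeeds when the sampled CPU% is below the threshold (both numeric).
is_stalled() { awk -v c="$1" -v t="$2" 'BEGIN { exit !(c < t) }'; }

# monitor PID OUTFILE: poll CPU every 2s and dump stacks during stalls.
monitor() {
  local pid=$1 out=$2
  while kill -0 "$pid" 2>/dev/null; do
    cpu=$(ps -o %cpu= -p "$pid" | tr -d ' ')
    if is_stalled "$cpu" 5; then
      { date; jstack "$pid"; } >> "$out"   # one dump per stalled sample
    fi
    sleep 2
  done
}

# Usage (PID lookup via jps is an assumption; match your app's main class):
#   monitor "$(jps | awk '/Main/ {print $1; exit}')" stalls.txt
```

Posting the dumps from a few consecutive stalled samples would show whether the threads are blocked, waiting, or in GC.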

On Wed, Nov 27, 2013 at 1:04 AM, Vijay Gaikwad <[email protected]> wrote:
> The server has 100+ GB of memory. Virtual memory for my job is 60 GB and
> reserved is 20-30 GB, so there is plenty of memory to spare even when the
> job is stuck. I am not sure if it is GC, because there is still a lot of
> memory the job could have used. The job's memory consumption stays the same
> after it resumes, and there is no swapping either (I observe all this using
> the top command). However, as the job progresses through all the
> halt-and-resume cycles, the memory used slowly increases but never reaches
> the maximum.
>
> When the job gets stuck, CPU drops to 0% and memory is unchanged.
>
> I have observed this behavior with my other Spark scripts too, which run on
> multiple small files. I thought it was because I was using a single machine,
> but I believe that shouldn't be the case.
>
> Have any of you observed such behavior?
> Thx
>
> -Vijay
> University of Washington
>
> On Nov 27, 2013 12:44 AM, "Liu, Raymond" <[email protected]> wrote:
>>
>> What about memory usage; any GC problems? When you say it gets stuck, do
>> you mean 0% or 1200% CPU with no progress?
>>
>> Raymond
>>
>> From: Vijay Gaikwad [mailto:[email protected]]
>> Sent: Wednesday, November 27, 2013 2:54 PM
>> To: [email protected]
>> Subject: Re: local[k] job gets stuck - spark 0.8.0
>>
>> Hi Patrick,
>>
>> Sorry I don't have access to web UI.
>> So I have been running these jobs on larger servers and letting them run.
>> I have observed that when I run a job with "local[12]", it runs at full
>> throttle, around 1200% CPU, for some time, but then the processing drops
>> to 0%.
>> After a few seconds it starts processing again and returns to high CPU
>> utilization. This cycle repeats until the job completes.
>> Interestingly, I have observed similar behavior with simple "local" jobs.
>>
>> Is it the nature of the job that is causing this? I am processing a 70 GB
>> file and performing simple map and reduce operations, and the machine has
>> a sufficient 100 GB of RAM.
>> Any thoughts?
>>
>> Vijay Gaikwad
>> University of Washington MSIM
>> [email protected]
>> (206) 261-5828
>>
>> On Nov 25, 2013, at 11:43 AM, Patrick Wendell <[email protected]> wrote:
>>
>>
>> When it gets stuck, what does it show in the web UI? Also, can you run
>> a jstack on the process and attach the output... that might explain
>> what's going on.
>>
>> On Mon, Nov 25, 2013 at 11:30 AM, Vijay Gaikwad <[email protected]>
>> wrote:
>>
>> I am using Apache Spark 0.8.0 to process a large data file and perform
>> some basic .map and .reduceByKey operations on the RDD.
>>
>> Since I am using a single machine with multiple processors, I specify
>> local[8] as the master URL when creating the SparkContext:
>>
>> val sc = new SparkContext("local[8]", "Tower-Aggs", SPARK_HOME )
>>
>> But whenever I specify multiple processors, the job gets stuck
>> (pauses/halts) randomly. There is no definite place where it gets stuck;
>> it's just random. Sometimes it doesn't happen at all. I am not sure
>> whether the job would continue, because it stays stuck for so long that I
>> abort it.
>>
>> But when I just use "local" in place of "local[8]", the job runs
>> seamlessly without ever getting stuck.
>>
>> val sc = new SparkContext("local", "Tower-Aggs", SPARK_HOME )
>>
>> I am not able to figure out where the problem is.
>>
>> I am using Scala 2.9.3 and sbt to build and run the application.
>>
>>
>>
>> http://stackoverflow.com/questions/20187048/apache-spark-localk-master-url-job-gets-stuck
>>
>> Thx
>> Vijay Gaikwad
>> University of Washington MSIM
>> [email protected]
>> (206) 261-5828
>>
>
