On Fri, Feb 7, 2014 at 7:48 AM, Aaron Davidson <[email protected]> wrote:

> Sorry for the delay. By long-running, I just meant if you were running an
> iterative algorithm that was slowing down over time. We have observed this
> in the spark-perf benchmark; as file system state builds up, the job can
> slow down. Once the job finishes, however, it is cleaned up and should not
> affect subsequent jobs.
>
> I can think of three other possibilities for a slowdown: (1) unclean
> shutdown of Spark (i.e., kill -9), which doesn't allow us to clean up our
> data
>

By 'shutdown of Spark', do you mean shutting down the spark app, or the
spark cluster?

How is it possible to gracefully shut down a spark app?
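
Is calling sc.stop() at the end of the driver enough? For context, this is
roughly what our app does now (a simplified sketch; the app name and
structure are illustrative):

    import org.apache.spark.{SparkConf, SparkContext}

    object MyApp {
      def main(args: Array[String]) {
        val conf = new SparkConf().setAppName("my-app")
        val sc = new SparkContext(conf)
        try {
          // ... job logic ...
        } finally {
          // Release executors and let Spark clean up its per-app state.
          sc.stop()
        }
      }
    }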


> (2) buildup of logs in the work/ directory or files in the Spark tmp
> directory, and (3) bug in Spark (woo!).
>
>
>> On Tue, Feb 4, 2014 at 5:58 AM, Aureliano Buendia <[email protected]> wrote:
>
>>
>>
>>
>> On Mon, Feb 3, 2014 at 12:26 AM, Aaron Davidson <[email protected]> wrote:
>>
>>> Are you seeing any exceptions in between running apps? Does restarting
>>> the master/workers actually cause Spark to speed back up again? It's
>>> possible, for instance, that you run out of disk space, which should cause
>>> exceptions but not go away by restarting the master/workers.
>>>
>>
>> Not really, no exceptions and plenty of disk space left. At this point
>> I'm not certain that restarting spark master/workers definitely helps. The
>> only thing that does help is bringing up a fresh ec2 cluster, which is not
>> a solution. This could suggest that Spark leaves behind some state that
>> builds up every time the app is executed.
>>
>>
>>>
>>> One thing to worry about is long-running jobs or shells.
>>>
>>
>> What do you mean by long-running jobs?
>>
>>
>>> Currently, state buildup of a single job in Spark *is* a problem, as
>>> certain state such as shuffle files and RDD metadata is not cleaned up
>>> until the job (or shell) exits. We have hacky ways to reduce this, and are
>>> working on a long-term solution. However, separate, consecutive jobs should
>>> be independent in terms of performance.
>>>
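
Would spark.cleaner.ttl be one of those hacky ways? A minimal sketch of what
I have in mind (assuming the Spark 0.9 SparkConf API; the one-hour TTL is an
arbitrary choice):

    import org.apache.spark.{SparkConf, SparkContext}

    // Ask Spark to periodically drop metadata (e.g. shuffle state) older
    // than the TTL, given in seconds. Risky if old RDDs are reused later.
    val conf = new SparkConf()
      .setAppName("my-app")
      .set("spark.cleaner.ttl", "3600")
    val sc = new SparkContext(conf)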
>>>
>>> On Sat, Feb 1, 2014 at 8:27 PM, 尹绪森 <[email protected]> wrote:
>>>
>>>> Is your spark app an iterative one? If so, your app is creating a big
>>>> DAG in every iteration. You should checkpoint it periodically, say, one
>>>> checkpoint every 10 iterations.
>>>>
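
If I understand the suggestion, it would look roughly like this (a sketch;
the checkpoint directory, RDD, and loop body are illustrative):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.rdd.RDD

    val sc = new SparkContext(new SparkConf().setAppName("my-app"))
    // Must be set before any RDD is checkpointed; on a cluster this
    // should be a reliable store such as HDFS.
    sc.setCheckpointDir("hdfs:///tmp/checkpoints")

    var rdd: RDD[Int] = sc.parallelize(1 to 1000)
    for (i <- 1 to 100) {
      rdd = rdd.map(_ + 1)      // each iteration extends the lineage (DAG)
      if (i % 10 == 0) {
        rdd.checkpoint()        // truncate the lineage every 10 iterations
        rdd.count()             // force an action so the checkpoint runs
      }
    }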
>>>>
>>>> 2014-02-01 Aureliano Buendia <[email protected]>:
>>>>
>>>> Hi,
>>>>>
>>>>> I've noticed my spark app (on ec2) gets slower and slower as I
>>>>> repeatedly execute it.
>>>>>
>>>>> With a fresh ec2 cluster, it is snappy and takes about 15 mins to
>>>>> complete; after running the same app 4 times, it slows down to 40 mins
>>>>> or more.
>>>>>
>>>>> While the cluster gets slower, the monitoring metrics show less and
>>>>> less activity (almost no cpu or io).
>>>>>
>>>>> When it gets slow, sometimes the number of running tasks (light blue
>>>>> in web ui progress bar) is zero, and only the number of completed tasks
>>>>> (dark blue) increments.
>>>>>
>>>>> Is this a known spark issue?
>>>>>
>>>>> Do I need to restart spark master and workers in between running apps?
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards
>>>> -----------------------------------
>>>> Xusen Yin    尹绪森
>>>> Beijing Key Laboratory of Intelligent Telecommunications Software and
>>>> Multimedia
>>>> Beijing University of Posts & Telecommunications
>>>> Intel Labs China
>>>> Homepage: http://yinxusen.github.io/
>>>>
>>>
>>>
>>
>
