You could also try raising the soft `nofile` limit in /etc/security/limits.conf
to some ridiculously high value if you haven't done so already.
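For example (just a sketch; substitute whatever user actually runs your Spark
workers and whatever ceiling you're comfortable with), the limits.conf entries
could look like:

  spark  soft  nofile  1000000
  spark  hard  nofile  1000000

The new limits only apply to sessions started after the change, so the worker
processes need to be restarted from a fresh login to pick them up.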

On Fri, Apr 3, 2015 at 2:09 AM Akhil Das <ak...@sigmoidanalytics.com> wrote:

> Did you try these?
>
> - Disable shuffle spill: spark.shuffle.spill=false
> - Enable log rotation:
>
> sparkConf.set("spark.executor.logs.rolling.strategy", "size")
> .set("spark.executor.logs.rolling.size.maxBytes", "1024")
> .set("spark.executor.logs.rolling.maxRetainedFiles", "3")
>
>
> Thanks
> Best Regards
>
> On Fri, Apr 3, 2015 at 9:09 AM, a mesar <amesa...@gmail.com> wrote:
>
>> Yes, with spark.cleaner.ttl set there is no cleanup. We pass
>> --properties-file spark-dev.conf to spark-submit, where spark-dev.conf
>> contains:
>>
>> spark.master spark://10.250.241.66:7077
>> spark.logConf true
>> spark.cleaner.ttl 1800
>> spark.executor.memory 10709m
>> spark.cores.max 4
>> spark.shuffle.consolidateFiles true
>>
>> On Thu, Apr 2, 2015 at 7:12 PM, Tathagata Das <t...@databricks.com>
>> wrote:
>>
>>> Are you saying that even with the spark.cleaner.ttl set your files are
>>> not getting cleaned up?
>>>
>>> TD
>>>
>>> On Thu, Apr 2, 2015 at 8:23 AM, andrem <amesa...@gmail.com> wrote:
>>>
>>>> Apparently Spark Streaming 1.3.0 is not cleaning up its internal files,
>>>> and the worker nodes eventually run out of inodes. We see tons of old
>>>> shuffle_*.data and *.index files that are never deleted. How do we get
>>>> Spark to remove these files?
>>>>
>>>> We have a simple standalone app with one RabbitMQ receiver and a
>>>> two-node cluster (2 x r3.large AWS instances). The batch interval is
>>>> 10 minutes, after which we process the data and write the results to
>>>> the DB. No windowing or state management is used.
>>>>
>>>> I've pored over the documentation and tried setting the following
>>>> properties, but they have not helped. As a workaround we're using a
>>>> cron script that periodically cleans up old files, but this has a bad
>>>> smell to it.
>>>>
>>>> SPARK_WORKER_OPTS in spark-env.sh on every worker node
>>>>   spark.worker.cleanup.enabled true
>>>>   spark.worker.cleanup.interval
>>>>   spark.worker.cleanup.appDataTtl
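>>>>   e.g. in spark-env.sh (the interval/TTL values below are placeholders):
>>>>     SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true \
>>>>       -Dspark.worker.cleanup.interval=1800 \
>>>>       -Dspark.worker.cleanup.appDataTtl=86400"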
>>>>
>>>> Also tried on the driver side:
>>>>   spark.cleaner.ttl
>>>>   spark.shuffle.consolidateFiles true
>>>>
>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Worker-runs-out-of-inodes-tp22355.html
>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>> For additional commands, e-mail: user-h...@spark.apache.org
>>>>
>>>>
>>>
>>
>
