Yes, even with spark.cleaner.ttl set there is no cleanup. We pass --properties-file
spark-dev.conf to spark-submit, where spark-dev.conf contains:

spark.master spark://10.250.241.66:7077
spark.logConf true
spark.cleaner.ttl 1800
spark.executor.memory 10709m
spark.cores.max 4
spark.shuffle.consolidateFiles true
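
(For reference, the worker-side cleanup settings mentioned downthread are normally passed as JVM flags through SPARK_WORKER_OPTS in spark-env.sh rather than as plain property lines; a sketch with illustrative values, seconds in both cases:)

```shell
# spark-env.sh on each worker node -- values are illustrative;
# restart the workers after editing for this to take effect.
export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true \
  -Dspark.worker.cleanup.interval=1800 \
  -Dspark.worker.cleanup.appDataTtl=604800"
```

As far as I know, the worker cleanup only removes directories of *stopped* applications, which may be why it never kicks in for a long-running streaming app.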

On Thu, Apr 2, 2015 at 7:12 PM, Tathagata Das <[email protected]> wrote:

> Are you saying that even with the spark.cleaner.ttl set your files are not
> getting cleaned up?
>
> TD
>
> On Thu, Apr 2, 2015 at 8:23 AM, andrem <[email protected]> wrote:
>
>> Apparently Spark Streaming 1.3.0 is not cleaning up its internal files and
>> the worker nodes eventually run out of inodes.
>> We see tons of old shuffle_*.data and *.index files that are never
>> deleted.
>> How do we get Spark to remove these files?
>>
>> We have a simple standalone app with one RabbitMQ receiver and a two-node
>> cluster (2 x r3.large AWS instances).
>> Batch interval is 10 minutes, after which we process data and write results
>> to the DB. No windowing or state management is used.
>>
>> I've pored over the documentation and tried setting the following
>> properties, but they have not helped.
>> As a workaround we're using a cron script that periodically cleans up old
>> files, but this has a bad smell to it.
>>
>> SPARK_WORKER_OPTS in spark-env.sh on every worker node
>>   spark.worker.cleanup.enabled true
>>   spark.worker.cleanup.interval
>>   spark.worker.cleanup.appDataTtl
>>
>> Also tried on the driver side:
>>   spark.cleaner.ttl
>>   spark.shuffle.consolidateFiles true
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Worker-runs-out-of-inodes-tp22355.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>>
>
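
For what it's worth, the cron workaround mentioned above can be as simple as a single crontab entry like the following (path and retention are illustrative; point it at wherever SPARK_LOCAL_DIRS resolves on your workers):

```shell
# Illustrative hourly cron job: delete shuffle files older than one day
# under Spark's local scratch dirs (default /tmp/spark-*).
0 * * * * find /tmp/spark-* -name 'shuffle_*' -mmin +1440 -delete
```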
