We want to monitor an HDFS (or local) directory, read the CSV files that appear in it, and delete them after successful processing (mainly so that we do not run out of disk space).

I'm not quite sure how to achieve this with the current implementation. Previously, when we read binary data (unsplittable files), we used a small hack and deleted the files in our FileInputFormat. Now we want to use splits, and detecting which split is 'the last one' is no longer so obvious. Deleting files this way is of course also problematic when it comes to checkpointing.
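
To make the 'last split' problem concrete, here is a rough sketch of the bookkeeping it seems to require: count the outstanding splits per file and delete the file only when the count reaches zero. This is plain Java, not Flink API. The class `SplitCompletionTracker` and its methods are hypothetical names made up for illustration. It assumes a local file and that all splits of a file are reported to one tracker instance, which does not hold when splits are processed by different parallel tasks, and it ignores checkpointing entirely.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not part of Flink): tracks unfinished splits per file. */
final class SplitCompletionTracker {

    // Number of splits still to be processed, keyed by file.
    private final Map<Path, Integer> remaining = new HashMap<>();

    /** Call once per file, when its splits are created. */
    public synchronized void register(Path file, int numSplits) {
        if (numSplits <= 0) {
            throw new IllegalArgumentException("numSplits must be positive");
        }
        remaining.put(file, numSplits);
    }

    /**
     * Call when one split of the file has been fully processed.
     * Returns true if this was the last outstanding split, in which
     * case the file has been deleted (if it still existed).
     */
    public synchronized boolean splitDone(Path file) throws IOException {
        Integer left = remaining.get(file);
        if (left == null) {
            throw new IllegalStateException("Unknown file: " + file);
        }
        if (left > 1) {
            remaining.put(file, left - 1);
            return false;
        }
        // Last split finished: forget the file and remove it from disk.
        remaining.remove(file);
        Files.deleteIfExists(file);
        return true;
    }
}
```

Even if something like this worked for a single reader, a restore from a checkpoint taken before the delete would look for a file that no longer exists, which is why we are asking for a supported approach.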

So my question is: is there an idiomatic way of deleting processed files?


