Re: ContinuousFileMonitoringFunction - deleting file after processing

2016-12-01 Thread Kostas Kloudas
Hi Maciek, The first point seems interesting and we should definitely look into that, also for other filesystems e.g. HDFS. It would be nice if we could together find a more “one-size-fits-all” solution. Because local fs rounds up to a second but other filesystems may have different

Re: ContinuousFileMonitoringFunction - deleting file after processing

2016-11-27 Thread Kostas Kloudas
Hi Maciek, Thanks for bringing this up again and sorry for not opening the discussion yet. I will check it out and get back to you during week. Kostas > On Nov 26, 2016, at 9:40 PM, Maciek Próchniak wrote: > > Hi Kostas, > > I didn't see any discussion on dev mailing list, so

Re: ContinuousFileMonitoringFunction - deleting file after processing

2016-11-26 Thread Maciek Próchniak
Hi Kostas, I didn't see any discussion on dev mailing list, so I'd like to share our problems/solutions (we had a busy month...;) 1. we refactored ContinuousFileMonitoringFunction so that state includes not only lastModificationTime, but also list of files that have exactly this

Re: ContinuousFileMonitoringFunction - deleting file after processing

2016-10-18 Thread Kostas Kloudas
Hi Maciek, I agree with you that 1ms is often too long :P This is the reason why I will open a discussion to have all the ideas/ requirements / shortcomings in a single place. This way the community can track and influence what is coming next. Hopefully I will do it in the afternoon and I will

Re: ContinuousFileMonitoringFunction - deleting file after processing

2016-10-18 Thread Maciek Próchniak
Hi Kostas, thanks for quick answer. I wouldn't dare to delete files in InputFormat if they were splitted and processed in parallel... As for using notifyCheckpointComplete - thanks for suggestion, it looks pretty interesting, I'll try to try it out. Although I wonder a bit if relying only

Re: ContinuousFileMonitoringFunction - deleting file after processing

2016-10-18 Thread Kostas Kloudas
Hi Maciek, Just a follow-up on the previous email, given that splits are read in parallel, when the ContinuousFileMonitoringFunction forwards the last split, it does not mean that the final splits is going to be processed last. If the node it gets assigned is fast enough then it may be

ContinuousFileMonitoringFunction - deleting file after processing

2016-10-18 Thread Maciek Próchniak
Hi, we want to monitor hdfs (or local) directory, read csv files that appear and after successful processing - delete them (mainly not to run out of disk space...) I'm not quite sure how to achieve it with current implementation. Previously, when we read binary data (unsplittable files) we