Hello!

I am fetching files from an FTP server (several GB over the next years). The files are produced daily in directories named after the date, like

- 20120501
- 20120502
- ...

I have read-only access and I am not the only consumer. The server keeps only the last month or so, and I fetch on a daily basis. To avoid fetching files twice I want to use an IdempotentRepository implementation.

I don't want to save each file name in a database or a text file, because the service will run for years and this is just unnecessary data. What I want to store is the last processed date only. This handles just the directories and would mean that I need some other strategy for the files; I could combine this approach with the default in-memory store. But let's just stick to the directories: I read the directory sorted by file name.

The IdempotentRepository is called by the FtpConsumer with

- start()
- contains() for every directory and file
- add() for files only

and that's it. No stop(), no confirm(). When I have errors, remove() is sometimes called.

Since the repository is called only with a String (the full path), I have no information about whether I am dealing with a directory or a file. I know it from the structure, but I am not able to implement a generic solution.

Anyway, the idea is:

- Store the LastProcessedDate inside the repository.
- contains(): if the path contains an already processed date (< LastProcessedDate), I skip it (return true); otherwise I return false.
- add(): if add() jumps to the next directory, I set the LastProcessedDate to the previous directory.

The only problem is the last processed directory: even when it is finished, I do not get the chance to mark it as processed (set LastProcessedDate to its value).

So finally my questions: do you think this approach makes sense, and if yes, how would you deal with the last processed directory? If not, how would you solve it?

Thanks and kind regards,
Christian
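
For illustration, the idea described above could be sketched roughly as follows. This is only a sketch under several assumptions: the class name `LastDateRepository` and the helper `dateOf` are made up, the class deliberately does not implement Camel's IdempotentRepository interface so that it compiles without Camel on the classpath (it only mirrors the contains()/add()/remove() calls), the first 8-digit run in the path is assumed to be the date directory, and a date equal to the stored one is treated as already processed. Files of the directory currently being read are kept in a plain in-memory set, as suggested in the post. The open question remains visible here too: the newest directory is never promoted to LastProcessedDate.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: remembers only the last completed date directory,
// plus the files of the directory that is currently being consumed.
class LastDateRepository {
    // Assumption: the first standalone 8-digit run in the path is the yyyyMMdd directory.
    private static final Pattern DATE_DIR = Pattern.compile("(?<!\\d)(\\d{8})(?!\\d)");

    private String lastProcessedDate;   // null until a directory is known to be complete
    private String currentDate;         // directory that add() was last called for
    private final Set<String> currentFiles = new HashSet<>();

    LastDateRepository(String lastProcessedDate) {
        this.lastProcessedDate = lastProcessedDate;
    }

    static String dateOf(String path) {
        Matcher m = DATE_DIR.matcher(path);
        return m.find() ? m.group(1) : null;
    }

    // Called for every directory and file: true means "skip it".
    synchronized boolean contains(String path) {
        String date = dateOf(path);
        if (date == null) {
            return false; // no date in the path, so nothing to compare against
        }
        // yyyyMMdd strings sort chronologically, so a string comparison is enough.
        if (lastProcessedDate != null && date.compareTo(lastProcessedDate) <= 0) {
            return true;
        }
        return currentFiles.contains(path);
    }

    // Called for files only. A jump to a later date marks the previous directory as done.
    synchronized boolean add(String path) {
        String date = dateOf(path);
        if (date != null && (currentDate == null || date.compareTo(currentDate) > 0)) {
            if (currentDate != null) {
                lastProcessedDate = currentDate;
                currentFiles.clear(); // older files are now covered by the date check
            }
            currentDate = date;
        }
        return currentFiles.add(path);
    }

    // Called on errors so that the file is fetched again on the next poll.
    synchronized boolean remove(String path) {
        return currentFiles.remove(path);
    }

    synchronized String getLastProcessedDate() {
        return lastProcessedDate;
    }
}
```

In a real route the stored date would also have to be written somewhere durable (for example a one-line file) whenever it changes, otherwise a restart starts from scratch.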
