Hello!

I am fetching files from an FTP server (several GB over the coming years). The 
files are produced daily in directories which correspond to the date, like

-       20120501
-       20120502
-       ...

I have only read rights and I am not the only consumer. This means that they 
keep the last month or so on the server and I fetch on a daily basis. To avoid 
fetching files twice I want to use an IdempotentRepository implementation. I 
don't want to record each file in a database or in a text file because the 
service will run for years and this is just unnecessary data.

What I want to store is the last processed date only. This handles just the 
directories and would mean that I need some other strategy for the files. I 
could combine this approach with the default in-memory store. But let's 
just stick to the directories:

I read the directory sorted by file name. The IdempotentRepository is called by 
the FtpConsumer with

- start()
- contains() for every directory and file
- add() for files only

and that's it. No stop(), no confirm(). When I have errors, remove() is 
sometimes called. Since the repository is called only with a String (the full 
path), I have no information about whether I am dealing with a directory or a 
file. I know it from the structure, but I am not able to implement a generic 
solution.
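
To illustrate what "I know it from the structure" could look like, here is a 
rough sketch. DatedPath is a hypothetical helper, not part of Camel, and it 
assumes that the date directory is the only eight-digit (yyyyMMdd) segment of 
interest and that anything after it is a file. It is not a generic solution.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a Camel class): classifies a path by its layout.
final class DatedPath {
    // Assumption: the date directory is a path segment of exactly eight digits.
    private static final Pattern DATE_SEGMENT =
            Pattern.compile("(?:^|/)(\\d{8})(/.*)?$");

    final String date;        // e.g. "20120501"
    final boolean directory;  // true when nothing follows the date segment

    private DatedPath(String date, boolean directory) {
        this.date = date;
        this.directory = directory;
    }

    // Returns null when the path has no eight-digit segment.
    static DatedPath parse(String path) {
        Matcher m = DATE_SEGMENT.matcher(path);
        if (!m.find()) {
            return null;
        }
        String rest = m.group(2);
        boolean isDirectory = rest == null || rest.equals("/");
        return new DatedPath(m.group(1), isDirectory);
    }
}
```

With this, "20120501" is classified as a directory and "data/20120501/file.txt" 
as a file belonging to 20120501.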

Anyway the idea is:

- Store the LastProcessedDate inside the repository
- contains(): if the path contains an already processed date 
(<LastProcessedDate) then I skip it (return true) otherwise return false.
- add(): if add() jumps to the next directory I set the LastProcessedDate to 
the directory before
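
As a minimal sketch, the three points above might look like this. The class 
only mirrors contains() and add(); it deliberately does not implement Camel's 
IdempotentRepository interface so that it runs on its own. The class name, the 
lastProcessedDate() accessor and the eight-digit path pattern are assumptions 
made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: mirrors contains()/add(), not wired into Camel.
final class LastProcessedDateRepository {
    // Assumption: the first eight-digit path segment is the date directory.
    private static final Pattern DATE_SEGMENT =
            Pattern.compile("(?:^|/)(\\d{8})(?:/|$)");

    private String lastProcessedDate; // the only state that would be persisted
    private String currentDate;       // directory whose files are being added

    LastProcessedDateRepository(String lastProcessedDate) {
        this.lastProcessedDate = lastProcessedDate;
    }

    // Skip (return true) everything strictly before LastProcessedDate.
    boolean contains(String path) {
        String date = dateOf(path);
        // yyyyMMdd strings of equal length sort chronologically.
        return date != null && lastProcessedDate != null
                && date.compareTo(lastProcessedDate) < 0;
    }

    // Called for files only; detects the jump to a later directory.
    boolean add(String path) {
        String date = dateOf(path);
        if (date != null) {
            if (currentDate != null && date.compareTo(currentDate) > 0) {
                // The previous directory is finished, so remember it.
                lastProcessedDate = currentDate;
            }
            currentDate = date;
        }
        return true;
    }

    String lastProcessedDate() {
        return lastProcessedDate;
    }

    private static String dateOf(String path) {
        Matcher m = DATE_SEGMENT.matcher(path);
        return m.find() ? m.group(1) : null;
    }
}
```

For example, after add("20120501/a.dat"), add("20120501/b.dat") and 
add("20120502/a.dat"), lastProcessedDate() returns 20120501. Because the 
comparison is strict, as in the description, a path in the directory equal to 
LastProcessedDate is not skipped by contains().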

The only problem is the last processed directory: even if it is finished I do 
not get the chance to mark it as processed (set LastProcessedDate to its value).

So finally my questions: do you think this approach makes sense, and if yes, 
how would you deal with the last processed directory?
If not, how would you solve it?

Thanks and kind regards, Christian
