Nigel Allen wrote:
Greetings
We have been using the Thunderbird plugins "Remove Duplicates" and
"Remove Duplicates (Alternate)" for some time.
The situation is as follows. A customer of ours has a single IMAP
account under which there are some hundreds of folders. There are (in
total) 325,000+ emails in these folders and more are added every day
(email archive system). These folders are also used for charging
purposes, so on a monthly basis we de-duplicate all the folders and
add up all the received and sent mail in each of the folders for that month.
The folders look like this:
Imap Archive Account
++++++++Customer 0001
++++++++Customer 0002
++++++++Customer 0003
++++++++
++++++++Customer 9999
The problem is that the de-duplication run is taking (as you would
expect) longer and longer each month as it compares every email with
every other email (we de-duplicate from the IMAP account down and
include all folders). It's now up to 24+ hours. The only way I can
think of shortening this is to select and de-duplicate each of the
individual sub-folders (Customers) as a separate operation. That will
be a lot faster in pure comparison but will tie up a valuable resource
(me) for hours on end - maybe an entire day.
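
The cost of comparing every email with every other email grows with the square of the message count. A common alternative is to remember a key for each message already seen, so each message is looked at only once. The following is a minimal sketch of that idea, not what the Thunderbird plugins do internally. It assumes that a duplicate means "same Message-ID in the same folder"; the function name find_duplicates and the (folder, message_id) input shape are made up for illustration.

```python
def find_duplicates(messages):
    """Return the positions of messages that repeat an earlier one.

    `messages` is an iterable of (folder, message_id) pairs.
    Assumption: two messages are duplicates only if they share both
    the folder and the Message-ID header value.
    """
    seen = set()
    duplicates = []
    for index, (folder, message_id) in enumerate(messages):
        key = (folder, message_id)
        if key in seen:
            # An earlier message in this folder had the same Message-ID.
            duplicates.append(index)
        else:
            seen.add(key)
    return duplicates


if __name__ == "__main__":
    sample = [
        ("Customer 0001", "<a@example>"),
        ("Customer 0001", "<a@example>"),  # repeat within the same folder
        ("Customer 0002", "<a@example>"),  # same ID, different folder
    ]
    print(find_duplicates(sample))
```

Each message costs one set lookup, so the total work grows roughly in line with the number of messages rather than with its square.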
Does anyone know of a way in which we could run the de-duplication
process on all the sub-folders at once on the server itself - which
should be way faster?
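
One possible way to do this on the server is a small script that walks every sub-folder and de-duplicates each one on its own. The sketch below uses Python's standard mailbox module and rests on assumptions not stated above: that Dovecot stores this account in Maildir format (an mbox setup would need different handling), that sub-folders follow the Maildir++ layout, and that messages with the same Message-ID in one folder are duplicates. The function names and the path in the usage note are hypothetical. It only reports by default, and it would be prudent to try it on a copy of the mail store first.

```python
import mailbox


def find_duplicate_keys(box):
    """Return keys of messages whose Message-ID was already seen in this folder."""
    seen = set()
    duplicates = []
    # Walk the messages in key order and keep the first copy of each Message-ID.
    for key in sorted(box.keys()):
        message_id = box[key].get("Message-ID")
        if message_id is None:
            continue  # no Message-ID header: leave the message alone
        message_id = message_id.strip()
        if message_id in seen:
            duplicates.append(key)
        else:
            seen.add(message_id)
    return duplicates


def dedupe_account(root, delete=False):
    """Report (and optionally delete) duplicates in each sub-folder of a Maildir.

    Returns a dict mapping folder name to the number of duplicates found.
    Nothing is removed unless delete=True.
    """
    account = mailbox.Maildir(root, create=False)
    report = {}
    for name in account.list_folders():
        folder = account.get_folder(name)
        duplicates = find_duplicate_keys(folder)
        report[name] = len(duplicates)
        if delete:
            for key in duplicates:
                folder.discard(key)
    return report
```

A dry run would look like dedupe_account("/path/to/archive/Maildir"), where the path is a placeholder for wherever the account actually lives. Passing delete=True removes the extra copies after the report has been checked.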
Currently running CentOS 4.8, Sendmail and Dovecot.
Thanks in anticipation.
Nigel.
Not really what you were after, I know, but DBMail has the option of
removing duplicate emails as it receives them.
Perhaps Dovecot has a similar feature?
--
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html