Greetings
We have been using the Thunderbird plugins "Remove Duplicates" and
"Remove Duplicates (Alternate)" for some time.
The situation is as follows. A customer of ours has a single IMAP
account under which there are some hundreds of folders. There are (in
total) 325,000+ emails in these folders and more are added every day
(email archive system). These folders are also used for charging
purposes, so on a monthly basis we de-duplicate all the folders and add
up all the emails received and sent in each of the folders for that month.
The folders look like this:
Imap Archive Account
++++++++Customer 0001
++++++++Customer 0002
++++++++Customer 0003
++++++++
++++++++Customer 9999
The problem is that the de-duplication run is taking (as you would
expect) longer and longer each month as it compares every email with
every other email (we de-duplicate from the IMAP account down and
include all folders). It's now up to 24+ hours. The only way I can think
of shortening this is to select and de-duplicate each of the individual
sub-folders (Customers) as a separate operation. That will be a lot
faster in pure comparison but will tie up a valuable resource (me) for
hours on end - maybe an entire day.
Does anyone know of a way in which we could run the de-duplication
process on all the sub-folders at once on the server itself - which
should be way faster?
Currently running CentOS 4.8, Sendmail and Dovecot.
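To make the question concrete, here is a rough sketch of the kind of server-side pass I have in mind. It is not tested against our archive and it rests on assumptions: that Dovecot stores the mail as Maildir (it may be mbox here), that Python 3 can be installed on the server (CentOS 4.8 does not ship it), and that two messages with the same Message-ID in the same folder count as duplicates. The function names are made up for illustration. It only reports duplicates and deletes nothing. Because each folder is checked on its own with a set of Message-IDs already seen, every message is read once instead of being compared with every other message.

```python
import os
from email.parser import BytesHeaderParser


def maildir_folders(root):
    """Yield every directory under root that has both cur/ and new/ (assumed Maildir)."""
    for dirpath, dirnames, _ in os.walk(root):
        if "cur" in dirnames and "new" in dirnames:
            yield dirpath


def find_duplicates(folder):
    """Return paths of messages whose Message-ID was already seen in this folder."""
    parser = BytesHeaderParser()  # parses headers only, the body is not interpreted
    seen = set()
    duplicates = []
    for sub in ("cur", "new"):
        subdir = os.path.join(folder, sub)
        for name in sorted(os.listdir(subdir)):
            path = os.path.join(subdir, name)
            with open(path, "rb") as fh:
                msg_id = parser.parse(fh)["Message-ID"]
            if msg_id is None:
                continue  # no Message-ID header: leave the message alone
            msg_id = str(msg_id).strip()
            if msg_id in seen:
                duplicates.append(path)  # a later copy of a message already seen
            else:
                seen.add(msg_id)
    return duplicates


def report(root):
    """Map each folder to its list of duplicate paths; nothing is deleted."""
    return {folder: find_duplicates(folder) for folder in maildir_folders(root)}
```

Calling report() on the top of the archive account's Maildir would give one list of suspect files per customer folder, which could be reviewed before anything is removed.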
Thanks in anticipation.
Nigel.
--
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html