[SLUG] Removing IMAP duplicates

Nigel Allen Thu, 03 Dec 2009 16:03:11 -0800

Greetings

We have been using the Thunderbird plugins "Remove Duplicates" and "Remove Duplicates (Alternate)" for some time.

The situation is as follows. A customer of ours has a single IMAP account under which there are some hundreds of folders. There are (in total) 325,000+ emails in these folders and more are added every day (email archive system). These folders are also used for charging purposes, so on a monthly basis we de-duplicate all the folders and add up all the received and sent in each of the folders for that month.


The folders look like this:

Imap Archive Account
++++++++Customer 0001
++++++++Customer 0002
++++++++Customer 0003
++++++++
++++++++Customer 9999

The problem is that the de-duplication run is taking (as you would expect) longer and longer each month as it compares every email with every other email (we de-duplicate from the IMAP account down and include all folders). It's now up to 24+ hours. The only way I can think of shortening this is to select and de-duplicate each of the individual sub-folders (Customers) as a separate operation. That will be a lot faster in pure comparison but will tie up a valuable resource (me) for hours on end - maybe an entire day.

Does anyone know of a way in which we could run the de-duplication process on all the sub-folders at once on the server itself - which should be way faster?


Currently running CentOS 4.8, sendmail and dovecot.
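
To make the per-folder idea concrete, here is a minimal sketch in Python of how duplicates could be found one folder at a time with a hash set, so each message is looked at once instead of being compared with every other message. It is not what the Thunderbird plugins do. The names dedup_key and find_duplicates are hypothetical, and treating two messages as duplicates when their Message-ID headers match (falling back to a hash of the raw message when the header is missing) is an assumption about what "duplicate" should mean here.

```python
import email
import hashlib


def dedup_key(raw: bytes) -> str:
    """Return a key that is equal for messages we treat as duplicates.

    Assumption: a matching Message-ID header means "same message".
    If the header is missing, fall back to a SHA-1 of the raw bytes.
    """
    msg = email.message_from_bytes(raw)
    msg_id = msg.get("Message-ID")
    if msg_id:
        return "id:" + str(msg_id).strip()
    return "sha1:" + hashlib.sha1(raw).hexdigest()


def find_duplicates(folders):
    """Find duplicates within each folder independently.

    folders maps a folder name to a list of (message_key, raw_bytes).
    Returns a dict of folder name -> list of message keys that repeat
    an earlier message in the same folder (the first copy is kept).
    """
    duplicates = {}
    for name, messages in folders.items():
        seen = set()      # dedup keys already seen in this folder
        repeats = []
        for message_key, raw in messages:
            key = dedup_key(raw)
            if key in seen:
                repeats.append(message_key)
            else:
                seen.add(key)
        duplicates[name] = repeats
    return duplicates
```

If the server happened to store mail as Maildir (not stated above, so this is only an assumption), the (message_key, raw_bytes) pairs could be read with the standard mailbox module, for example via mailbox.Maildir(path).iterkeys() and get_bytes(key). Nothing should be deleted until the reported list has been checked by hand.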

Thanks in anticipation.

Nigel.

--
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
