Re: Re: Tracking Mail After Folder Moves [unsigned]

Bernd Waibel Sun, 29 Mar 2015 12:56:17 -0700

Hi Jerry

I would try to generate a UUID with the standard Java classes (the UUID class or the
SecureRandom class).
As for how unique they are, you should search the web.
I used SecureRandom.
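A minimal Java sketch of the two JDK options mentioned above. `UUID.randomUUID()` returns a random (version 4) UUID generated with a cryptographically strong pseudo-random number generator; the second method builds the same kind of UUID directly from `SecureRandom` bytes. The class name `TrackingIds` is made up for this example.

```java
import java.security.SecureRandom;
import java.util.UUID;

// Two ways to obtain a random (version 4) UUID from standard JDK classes.
final class TrackingIds {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Simplest option: the JDK generates it with a cryptographically
    // strong random number generator.
    static UUID fromUuidClass() {
        return UUID.randomUUID();
    }

    // The same kind of UUID built by hand from SecureRandom bytes:
    // 122 random bits plus the version and variant bits of RFC 4122.
    static UUID fromSecureRandom() {
        byte[] b = new byte[16];
        RANDOM.nextBytes(b);
        b[6] = (byte) ((b[6] & 0x0f) | 0x40); // version 4
        b[8] = (byte) ((b[8] & 0x3f) | 0x80); // IETF variant
        long msb = 0;
        long lsb = 0;
        for (int i = 0; i < 8; i++) {
            msb = (msb << 8) | (b[i] & 0xff);
        }
        for (int i = 8; i < 16; i++) {
            lsb = (lsb << 8) | (b[i] & 0xff);
        }
        return new UUID(msb, lsb);
    }
}
```

With 122 random bits per UUID, accidental collisions are extremely unlikely in practice, although that is a probabilistic guarantee rather than an absolute one.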


You could also hash the whole email object, but this may be time-consuming and
may not be unique enough.

I think creating a UUID using SecureRandom together with your selection of
attributes would be good enough.
And I do not think your hash is a hack.
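A sketch of the attribute-hash approach, assuming the chosen attributes are available as strings (which fields to include is up to you; `AttributeHash` is an illustrative name). Each value is fed into SHA-256 followed by a zero byte, so that shifting text from one field into the next changes the input.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Deterministic hex key derived from selected message attributes.
final class AttributeHash {
    static String sha256Hex(String... attributes) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE implementation is required to support SHA-256.
            throw new IllegalStateException(e);
        }
        for (String value : attributes) {
            String v = (value == null) ? "" : value; // null treated as empty
            md.update(v.getBytes(StandardCharsets.UTF_8));
            md.update((byte) 0); // field separator: ("ab","c") differs from ("a","bc")
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }
}
```

The hash is deterministic: two different messages with identical values in every chosen field (for instance the same message stored in two employees' mailboxes) get the same key, which is the duplicate case Jerry worries about. Hashing the whole raw message works the same way, just over more bytes.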

Adding the UUID as a header may be a solution. See the AddHeader mailet.
Self-defined headers are normally left unchanged by mail systems.
But headers may be lost when a mail is copied; this depends on the mail client.
I am not sure, but I think they stay intact on move and delete.
I am not sure about copy, since a copy creates a new mail object.

I think you will keep the headers if the mail client copies the mail to another
IMAP folder.
Resending or forwarding may strip most headers.
You may need to test this.

If you use a header, it is common to use an X- header, e.g. X-PPP-UUID, where PPP
is a name for your product.
A header can have several values, as with the Received header.
You could use addHeader to add another value, or replaceHeader to make sure there
is only one.
I think replaceHeader is what you need.
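To make the add/replace distinction concrete, here is a small model of a multi-valued header section. It only mimics the semantics: in the JavaMail API, `addHeader` appends a value and `setHeader` replaces all existing values of that header. `HeaderBag` and the header name `X-PPP-UUID` (the hyphenated form usual for X- headers) are illustrative.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy header section: one name can carry several values, as with Received.
final class HeaderBag {
    private final Map<String, List<String>> headers = new LinkedHashMap<>();

    // Header names are case-insensitive, so lookups use a lower-case key.
    private static String key(String name) {
        return name.toLowerCase(Locale.ROOT);
    }

    // Like addHeader: keeps existing values and appends one more.
    void add(String name, String value) {
        headers.computeIfAbsent(key(name), k -> new ArrayList<>()).add(value);
    }

    // Like a replacing set: afterwards exactly one value is present.
    void replace(String name, String value) {
        List<String> single = new ArrayList<>();
        single.add(value);
        headers.put(key(name), single);
    }

    List<String> values(String name) {
        List<String> v = headers.get(key(name));
        return v == null ? Collections.emptyList() : Collections.unmodifiableList(v);
    }
}
```

For a tracking id, the replacing variant guarantees a single value even if a header of that name was already present.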

Adding the header at the time the server receives the email sounds good to me.
At that point you could check that the header does not already exist.
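A sketch of "add the header on receipt only if it is missing", assuming the headers are available as a case-insensitive map. In James this would presumably run inside a custom mailet on the incoming processor chain; that wiring, the `ReceiptStamp` class and the header name are all assumptions for illustration.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.UUID;

final class ReceiptStamp {
    static final String HEADER = "X-PPP-UUID"; // hypothetical header name

    // Case-insensitive header map, so "x-ppp-uuid" and "X-PPP-UUID"
    // are treated as the same header.
    static Map<String, String> newHeaderMap() {
        return new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
    }

    // Returns the existing tracking id, or stores and returns a fresh one.
    // Running it twice on the same message never changes the id.
    static String ensureTrackingId(Map<String, String> headers) {
        String existing = headers.get(HEADER);
        if (existing != null && !existing.trim().isEmpty()) {
            return existing.trim();
        }
        String id = UUID.randomUUID().toString();
        headers.put(HEADER, id);
        return id;
    }
}
```

Note that a header arriving from outside is trusted as-is here; if that matters, foreign values would need to be stripped or renamed first.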


But outgoing mails will not have this header?
I am not sure how to handle headers on outgoing mails.


Greetings
Bernd


-------- Original Message --------
From: Jerry Malcolm <techst...@malcolms.com>
Date: 28.03.2015 17:53 (GMT+01:00)
To: James Users List <server-user@james.apache.org>
Subject: Re: Re: Tracking Mail After Folder Moves [unsigned]

Hi Bernd,

Thanks for the response. Here's the problem scenario. A company has
a lot of dealings with their client, "Joe". Several different employees
have to email Joe regularly, so correspondence with Joe is spread across
several mail accounts and several folders in each account (sent, inbox,
archive 2013, archive 2014, etc., and some employees might have a "Joe's
stuff" folder). There are no restrictions on which folder a
particular email might be in (and it may be moved to a different
folder tomorrow). The 'boss' wants to be able to see a list of 'all
email to Joe'.

As mail arrives, I extract relevant search criteria and store them in my
search engine database, so it is easy for me to assemble the list of
emails to Joe. The one thing I still need is to be able to then
query JAMES_MAIL and extract the actual mail records. A simple,
guaranteed, unchangeable UUID in JAMES_MAIL is ALL I need. I simply
store the UUID as the key in my search engine. (Once I find the record,
I can determine the account/folder from the record and then use standard
IMAP functions to access the mail item.)
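A rough JDBC sketch of that lookup, assuming Jerry's extra column. The search engine keeps only the tracking id; the current mailbox and UID are read from the mail table when a result is opened, so moving a message does not invalidate the index as long as the id travels with the message. The column names `MAILBOX_ID`, `MAIL_UID` and `TRACKING_UUID` are placeholders, not the actual James schema; check them against your version.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Optional;

final class MailLookup {
    // Placeholder column names; adapt them to the real JAMES_MAIL schema.
    static final String SQL =
            "SELECT MAILBOX_ID, MAIL_UID FROM JAMES_MAIL WHERE TRACKING_UUID = ?";

    // Current location of a message: which mailbox, and its UID there.
    static final class Location {
        final long mailboxId;
        final long uid;

        Location(long mailboxId, long uid) {
            this.mailboxId = mailboxId;
            this.uid = uid;
        }
    }

    // Resolves a search-engine key to where the message lives right now.
    // Empty means the message was deleted after it was indexed. If copies
    // share an id, only the first row returned by the database is used.
    static Optional<Location> find(Connection conn, String trackingId) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            ps.setString(1, trackingId);
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) {
                    return Optional.empty();
                }
                return Optional.of(new Location(rs.getLong(1), rs.getLong(2)));
            }
        }
    }
}
```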

My main design points...

1) The UUID must be immutable, folder-independent, and independent of folder
moves.
2) I do not want to duplicate a nearly million-entry mail db somewhere
else. I want to pull the mail from the existing JAMES_MAIL db.
3) New mail and deleted mail will be handled by search engine sync
utilities and are not an issue.

In summary: my search engine finds the index/key record it wants.
It must then locate that particular mail item in the JAMES_MAIL table.

I'm currently generating a hash UUID from various fields such as
from, to, subject, etc. It works fairly well at generating a
unique id/key, but it still feels like a hack. And I'm sure there will be
situations where the calculated hash duplicates that of a different email,
so it's not 100%.

As you originally theorized, the simplest solution would have been to
have the db autogenerate an incrementing id. But as you pointed out, the
copy/delete on folder move kills that id. Perhaps add a UUID header to
the mail when it first comes in, if no such header exists, and then
always reflect that UUID header value to the JAMES_MAIL table's UUID
field for db query use (??). Headers remain intact in the case of
Thunderbird's copy/delete, correct?
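This idea makes the header the source of truth and the table column a queryable copy of it. A minimal, JDK-only sketch of reading the header back out of a raw RFC 5322 message, for example in a sync job that fills the column. How the raw message is read from JAMES_MAIL is not shown, `HeaderExtractor` is an illustrative name, and folded lines are joined with a single space as a simplification.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Optional;

final class HeaderExtractor {
    // Returns the first value of the named header in a raw message.
    // Only the header section is scanned (it ends at the first empty line).
    // Folded continuation lines are joined with a single space.
    static Optional<String> firstValue(String rawMessage, String headerName) {
        String prefix = headerName + ":";
        try (BufferedReader in = new BufferedReader(new StringReader(rawMessage))) {
            StringBuilder value = null;
            String line;
            while ((line = in.readLine()) != null && !line.isEmpty()) {
                boolean folded = line.startsWith(" ") || line.startsWith("\t");
                if (value != null) {
                    if (!folded) {
                        break; // next header starts, our value is complete
                    }
                    value.append(' ').append(line.trim());
                } else if (!folded && line.regionMatches(true, 0, prefix, 0, prefix.length())) {
                    value = new StringBuilder(line.substring(prefix.length()));
                }
            }
            return value == null ? Optional.empty() : Optional.of(value.toString().trim());
        } catch (IOException e) {
            // StringReader does no real I/O, so this is not expected.
            throw new IllegalStateException(e);
        }
    }
}
```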

Thoughts?

Jerry

On 3/13/2015 6:15 PM, Bernd Waibel wrote:
> Sorry,
>
> Thought about again:
> I think using a sequence is wrong. Because Thunderbird makes a "COPY", you will
> get a new ID for the "B:42" mail, and as I understand it, that is not what you
> need.
>
> Greetings
> Bernd
>
> -----Original Message-----
> From: Bernd Waibel [mailto:bwai...@intarsys.de]
> Sent: Saturday, 14 March 2015 00:07
> To: James Users List
> Subject: Re: Tracking Mail After Folder Moves [unsigned]
>
> Hello Jerry,
>
> just a few thoughts about alternatives (not sure I got your problem).
>
> Why not use a database sequence field or AUTO_INCREMENT field instead of a
> UUID, and let the database handle the ID creation?
> But if you would like to use UUIDs, make sure their creation is not subject
> to a race condition, as briefly described here for PostgreSQL sequences:
> http://www.neilconway.org/docs/sequences/
> James is multithreaded.
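The race warned about here is the classic read-then-increment problem (for example `SELECT MAX(id) + 1` followed by a separate `INSERT`): two threads can read the same maximum. Inside one JVM the atomic equivalent is `AtomicLong.incrementAndGet()`; across processes or servers a database sequence or auto-increment column plays that role. A small sketch with an illustrative `IdSequence` class that checks uniqueness under concurrency:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

final class IdSequence {
    private final AtomicLong last = new AtomicLong();

    // Atomic read-increment-write: two threads never receive the same id.
    long next() {
        return last.incrementAndGet();
    }

    // Hands out ids from several threads at once and returns how many
    // distinct values were seen.
    static int distinctIds(int threads, int perThread) throws InterruptedException {
        IdSequence seq = new IdSequence();
        Set<Long> seen = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < perThread; i++) {
                    seen.add(seq.next());
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return seen.size();
    }
}
```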
>
> Maybe the UUID field should be indexed, if you search for it often (a 
> sequence field does not need to be indexed).
>
> Maybe a database trigger on insert could create your "index table" entries, and
> another trigger could delete them on delete.
>
> You said you will have an hourly indexing delay when using cron. What
> happens if a new mail arrives and the user moves it to another folder
> immediately, before it is indexed? Is this OK for your process?
> That is just the way I handle my mails: on arrival I move them to a new
> folder (after reading).
>
>
> But a good indexing solution implemented in James would be nice, too. ;-)
>
>
> Greetings
> Bernd
>
> -----Original Message-----
> From: Jerry Malcolm [mailto:techst...@malcolms.com]
> Sent: Friday, 13 March 2015 22:08
> To: server-user@james.apache.org
> Subject: Re: Tracking Mail After Folder Moves
>
> Benoit,
>
> Thanks for the info.  Kinda what I was suspecting.  Here's what I've done so 
> far...
>
> My ultimate objective is to maintain a searchable index for all of the 
> hundreds of thousands of emails stored in my JAMES mail db.  As previously 
> discussed, this is only possible assuming I have a way to later locate a 
> particular email that I have built an index for (assuming the user will move 
> it around between folders...)
>
> 1) Step one was to add one more column to the JAMES_MAIL table for my own 
> globally-unique UUID
> 2) When JAMES stores an email, this column defaults to -1, so I'll know it 
> hasn't yet been indexed
> 3) A cron job runs hourly and creates an index for the new mail. It also
> adds the matching index records, with all of the keyword info I want to track,
> into my own separate index table.
> 4) I have code to process index queries and identify the UUID for the desired 
> mail
> 5) I query the JAMES_MAIL table for the mail record using the UUID value.  I 
> then extract the folder and ID info in that record.
> 6) Finally, I go back around to the 'front door' and use the standard IMAP 
> interface with the folder and ID info to access the desired email for the 
> user.
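Steps 2 and 3 can be sketched in memory as follows, assuming the added column holds the string "-1" until the row is indexed. `MailRow`, the subject-only keyword extraction and the keyword map are simplifications standing in for the real table and index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.UUID;

final class HourlyIndexer {
    static final String NOT_INDEXED = "-1"; // column default from step 2

    // Stand-in for a JAMES_MAIL row with the extra tracking column.
    static final class MailRow {
        final String subject;
        String trackingId = NOT_INDEXED;

        MailRow(String subject) {
            this.subject = subject;
        }
    }

    // Stand-in for the separate index table: keyword -> tracking ids.
    final Map<String, List<String>> index = new HashMap<>();

    // Step 3: assign an id to every unindexed row and record its keywords.
    // Returns the number of rows indexed in this run.
    int run(List<MailRow> rows) {
        int indexed = 0;
        for (MailRow row : rows) {
            if (!NOT_INDEXED.equals(row.trackingId)) {
                continue; // already indexed in an earlier run
            }
            row.trackingId = UUID.randomUUID().toString();
            for (String word : row.subject.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!word.isEmpty()) {
                    index.computeIfAbsent(word, k -> new ArrayList<>()).add(row.trackingId);
                }
            }
            indexed++;
        }
        return indexed;
    }
}
```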
>
> Granted, emails can be deleted.  I periodically clean out index entries for 
> UUIDs that no longer exist.
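The periodic cleanup amounts to a set difference between the ids referenced by the index and the ids still present in the mail table. A sketch over an in-memory keyword index; `liveIds` is assumed to come from a query over the tracking column, `IndexCleanup` is an illustrative name, and the index lists must be mutable.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class IndexCleanup {
    // Drops index references to tracking ids that are no longer in the mail
    // table, then drops keywords that no longer reference anything.
    // Returns the number of references removed.
    static int purge(Map<String, List<String>> index, Set<String> liveIds) {
        int removed = 0;
        Iterator<Map.Entry<String, List<String>>> it = index.entrySet().iterator();
        while (it.hasNext()) {
            List<String> ids = it.next().getValue();
            int before = ids.size();
            ids.removeIf(id -> !liveIds.contains(id));
            removed += before - ids.size();
            if (ids.isEmpty()) {
                it.remove();
            }
        }
        return removed;
    }
}
```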
>
> This is all pretty much working.  But as you said, this is going to require 
> remerging everything each time I upgrade JAMES. I'm not really thrilled with 
> modifying the schema for JAMES db tables.  I wouldn't expect all of my 
> indexing functionality to be in JAMES. But I would love to have JAMES 
> maintain a single global UUID column in JAMES_MAIL.  That would make merging 
> my functionality with JAMES much cleaner.
>
> As I said, this is pretty much working now the way I described.  I just 
> decided to bring it up here on the forum to make sure I'm not re-inventing 
> the wheel or something by overlooking existing functionality in JAMES.  It 
> appears now that I'm blazing new trails and not duplicating anything that's 
> existing.  But if there's any talk in the future, I definitely want to keep 
> up with discussions.
>
> Thanks again.
>
> Jerry
>
> On 3/13/2015 11:42 AM, Benoit Tellier wrote:
>> Hi Jerry,
>>
>> You are right. This is what happens when you drag and drop an
>> e-mail in Thunderbird from folder A to B:
>>
>>    1 : The client receives a mail in folder A. The mail is identified by
>> the pair ( mailbox path + uid ). The mailbox path ( or mailbox Id ) is
>> folder specific. The uid is a long, generated per mailbox. It makes no
>> sense on its own.
>> Let's say we have ( A : 36 ).
>>
>>    2 : You perform the drag and drop.
>>
>>    3 : Thunderbird issues a UID COPY command.
>>
>>    4 : So you have the exact same mail in B, let's say ( B : 42 ).
>>
>>    5 : James dispatches an Added event for ( B : 42 ) ( here we don't know
>> where this mail came from ).
>>
>>    6 : Your client performs a UID EXPUNGE command on ( A : 36 ).
>>
>>    7 : ( A : 36 ) is deleted.
>>
>>    8 : You get the delete event for ( A : 36 ) ( here we don't know
>> where this mail went ).
>>
>> Note that the events I mentioned trigger IDLE notifications, so
>> Thunderbird becomes aware of what is happening. Then it reads ( B : 42 )
>> and displays it.
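A toy model of the sequence above, assuming nothing about James internals: each folder hands out its own UIDs, a COPY stores the content under a new UID in the target folder, and the expunge removes the original. Afterwards nothing in the store links ( A : 36 ) to ( B : 42 ). `ToyImapStore` is purely illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

final class ToyImapStore {
    // A folder with its own UID counter; a message is addressed by (folder, uid).
    static final class Mailbox {
        private long nextUid;
        final TreeMap<Long, String> messages = new TreeMap<>();

        Mailbox(long firstUid) {
            this.nextUid = firstUid;
        }

        long append(String content) {
            long uid = nextUid++;
            messages.put(uid, content);
            return uid;
        }
    }

    final Map<String, Mailbox> folders = new HashMap<>();

    Mailbox create(String name, long firstUid) {
        Mailbox box = new Mailbox(firstUid);
        folders.put(name, box);
        return box;
    }

    // UID COPY: same content, new UID in the target folder, no link back.
    long uidCopy(String from, long uid, String to) {
        return folders.get(to).append(folders.get(from).messages.get(uid));
    }

    // Expunge of the original after the client marked it deleted.
    void expunge(String folder, long uid) {
        folders.get(folder).messages.remove(uid);
    }

    public static void main(String[] args) {
        ToyImapStore store = new ToyImapStore();
        store.create("A", 36);
        store.create("B", 42);
        long a = store.folders.get("A").append("mail from Joe");
        long b = store.uidCopy("A", a, "B");
        store.expunge("A", a);
        System.out.println("(A : " + a + ") -> (B : " + b + ")"); // prints (A : 36) -> (B : 42)
    }
}
```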
>>
>> Well, to sum up:
>>
>>    - There is no global e-mail identifier that survives a copy.
>>    - You cannot base such a feature on events.
>>
>> So what can you do ?
>>
>> If I were you, I would do this:
>>
>>    1 : Choose a MAILBOX implementation ( the one your client wants to
>> use? ).
>>
>>    2 : Generate a value in the mapper's add operation ( either a long
>> (if you want it sorted) or a UUID ).
>>
>>    3 : Provide a custom message implementation with an accessor for
>> this value.
>>
>>    3.5 : Use this new message type everywhere in your mapper.
>>
>>    4 : On message mapper copy calls, cast the copied message to
>> your message type, and copy the field without modifying it.
>>
>>    5 : There you are ( note that this value may not be unique, as a message
>> can get copied but not deleted ). You can just build it, replace the
>> old jar for your MAILBOX implementation with the new one, and restart
>> your James server ( yes, it works ). Note: update the db schemas
>> before restarting James ;-)
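Steps 2 to 4 can be sketched as follows. This is not the James mailbox API: `TrackedMessage` and `TrackingMapper` are hypothetical stand-ins showing the one rule that matters, namely that the id is generated once on add and copied unchanged on copy. Message content is omitted, and UIDs are counted per mailbox starting at 1 for simplicity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Step 3: a message type with an accessor for the extra, immutable id.
final class TrackedMessage {
    private final long mailboxId;
    private final long uid;
    private final String trackingId;

    TrackedMessage(long mailboxId, long uid, String trackingId) {
        this.mailboxId = mailboxId;
        this.uid = uid;
        this.trackingId = trackingId;
    }

    long mailboxId() { return mailboxId; }
    long uid() { return uid; }
    String trackingId() { return trackingId; }
}

// Hypothetical mapper covering steps 2 and 4.
final class TrackingMapper {
    private final List<TrackedMessage> rows = new ArrayList<>();
    private final Map<Long, Long> lastUid = new HashMap<>();

    private long nextUid(long mailboxId) {
        return lastUid.merge(mailboxId, 1L, Long::sum);
    }

    // Step 2: generate the id exactly once, when the message is first added.
    TrackedMessage add(long mailboxId) {
        TrackedMessage m = new TrackedMessage(mailboxId, nextUid(mailboxId), UUID.randomUUID().toString());
        rows.add(m);
        return m;
    }

    // Step 4: the copy gets a new mailbox and UID but keeps the original id.
    TrackedMessage copy(TrackedMessage original, long targetMailboxId) {
        TrackedMessage m = new TrackedMessage(targetMailboxId, nextUid(targetMailboxId), original.trackingId());
        rows.add(m);
        return m;
    }

    void delete(TrackedMessage m) {
        rows.remove(m);
    }

    List<TrackedMessage> findByTrackingId(String id) {
        List<TrackedMessage> hits = new ArrayList<>();
        for (TrackedMessage m : rows) {
            if (m.trackingId().equals(id)) {
                hits.add(m);
            }
        }
        return hits;
    }
}
```

The intermediate state, with two rows sharing one id before the delete, is exactly the caveat in step 5: a copy that is never followed by a delete leaves the id non-unique.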
>>
>> Note that you do not need more. Such a feature cannot be accessed
>> over IMAP, but you can read it using another application, so you are
>> compelled to access it through your mail storage ( you said that was no
>> problem... ).
>>
>> Don't worry, such a feature is not that hard to implement.
>>
>> Drawbacks: you may have to merge it with other James releases ( or
>> get it accepted in the project? ).
>>
>>
>> Hope it helps,
>>
>> Benoit
>>
>>
>> On 13/03/2015 16:50, Jerry Malcolm wrote:
>>> This is somewhat an IMAP question.  But also a JAMES implementation
>>> question.  My client has a massive amount of mail that must be kept
>>> and accessed.  They use Thunderbird and Outlook to do the normal mail
>>> handling stuff.  No problems at all on the client side.  But on the
>>> back end, I need to sort and organize and keep track of emails and be
>>> able to pull them up using a web interface on demand, completely
>>> independent of folders that they may currently be in.  In other
>>> words, I need to keep track of 'email x' and be able to find it at a
>>> later time no matter how many times the user moves it from folder to folder.
>>>
>>> I believe I understand the philosophy of IMAP for the client is to
>>> find a folder, display the contents, refresh periodically and
>>> add/remove mail from its records for that folder as contents change.
>>> Basically if the user moves a mail item from one folder to another,
>>> the first folder recognizes it's no longer there, and is done with
>>> it.  The other folder subsequently realizes it has a new email item
>>> and displays it.  But there is no knowledge that this is the same
>>> email.  Have I got it pretty much correct?
>>>
>>> So... I realize I may be stretching/bending the intent of IMAP.  But
>>> that doesn't diminish the fact that I have the requirement.  I've dug
>>> through all of the database table schemas for JAMES and have a pretty
>>> good handle on how mail is stored and tracked internally. But I may
>>> have missed something.  So my main question is.... is there a way for
>>> me to permanently track an email item and be able to locate it at
>>> some point down the road even if it's been moved around folders several 
>>> times?
>>> Basically, is there a global unique ID for every email stored?  BTW....
>>> I'm not bound by having to use only IMAP.  I have no problem at all
>>> back-dooring to the JAMES database and writing code to use SQL to
>>> track through the database tables to find the email.  I just don't
>>> think there is anything unique/unchangeable that will allow me to
>>> permanently track a particular email.
>>>
>>> Am I totally off the wall in considering something like this?  Seems
>>> a complete waste to have to duplicate a hundred gigs of mail data for
>>> my own archive when JAMES has a perfectly good copy of everything.
>>>
>>> Suggestions?
>>>
>>> Thanks.
>>>
>>> Jerry
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: server-user-unsubscr...@james.apache.org
>>> For additional commands, e-mail: server-user-h...@james.apache.org
>>>
>

