Re: Tracking Mail After Folder Moves

Robert Munn Mon, 16 Mar 2015 19:32:42 -0700

Certainly it all depends on your needs and your resource constraints. Solr
or ElasticSearch would provide a robust and feature-rich search system, but
you would ideally build a custom search interface for your needs, and as
you noted, possibly spin up new infrastructure to support it.


Solr isn't a database per se; it is an indexing/search system, so syncing
it is a matter of indexing your email into it. A mailet that adds all new
emails to Solr would do it. If you ever think your index is out of sync,
all you do is a full re-indexing of the content. Most of the work is up
front: mapping content into Solr, building the indexing process into the
mailet, and building a general administrative re-indexing and search system
(which you can build in any programming language of your choice).
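
As a rough illustration of the mapping and re-indexing steps (a minimal
sketch, not James or SolrJ code), the Python below turns a raw message into
a flat document and prepares a JSON request for Solr's update handler. The
field names, the core name "mail", the URL, and the choice of Message-ID as
the document key are all assumptions made for illustration. Nothing is sent
over the network.

```python
import hashlib
import json
import urllib.request
from email import message_from_bytes, policy

# Hypothetical endpoint: a local Solr with a core named "mail".
SOLR_UPDATE_URL = "http://localhost:8983/solr/mail/update?commit=true"


def email_to_solr_doc(raw: bytes) -> dict:
    """Map one raw RFC 5322 message to a flat dict of (hypothetical) fields."""
    msg = message_from_bytes(raw, policy=policy.default)

    # Assumption: Message-ID is good enough as a document key. It is part of
    # the message headers, so it is the same in every folder copy. Fall back
    # to a hash of the raw bytes when the header is missing.
    msg_id = msg["Message-ID"]
    doc_id = str(msg_id).strip() if msg_id else hashlib.sha1(raw).hexdigest()

    # Index only the plain-text body in this sketch.
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""

    return {
        "id": doc_id,
        "subject": str(msg["Subject"] or ""),
        "from": str(msg["From"] or ""),
        "to": str(msg["To"] or ""),
        "body": body,
    }


def build_update_request(docs, solr_url=SOLR_UPDATE_URL):
    """Build (but do not send) a POST carrying the documents as a JSON array."""
    payload = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        solr_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def reindex_request(raw_messages, solr_url=SOLR_UPDATE_URL):
    """Full re-index: map every stored message again and post them together."""
    docs = [email_to_solr_doc(raw) for raw in raw_messages]
    return build_update_request(docs, solr_url)
```

Passing the returned request to urllib.request.urlopen() would perform the
actual post. Because Solr replaces an existing document that has the same
unique key, posting every message again can double as the full re-index,
provided "id" is configured as the schema's uniqueKey.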

Either way, good luck with your implementation.




On Mon, Mar 16, 2015 at 3:17 PM, Jerry Malcolm <techst...@malcolms.com>
wrote:

> Robert,
>
> I guess that's always a possibility.  But it seems like the overhead would
> be huge:  Purchasing, installing, and learning a completely new database
> system, doubling the several terabytes of storage, both primary and backup
> to duplicate the entire existing mail db, writing and maintaining code to
> guarantee the databases are 100% in sync (which experience tells me they'll
> never stay 100% in sync, no matter how much code I write), and then writing
> the code to search and extract the data from the duplicate parallel
> database system.... All of this vs. adding another column to the
> JAMES_MAIL table.  Not sure I want to go that route.  I've already got
> the code that maintains the search keys and does the searches.  I simply
> need a way to retrieve the one record out of nearly a million records in
> the JAMES_MAIL table that contains the email that my search engine has
> found.
>
>
> On 3/14/2015 6:18 PM, Robert Munn wrote:
>
>> Jerry,
>>
>> Have you considered using an external system for this purpose?
>>
>> Offhand, I would think you could use either a NoSQL store like Cassandra
>> or a search system like Solr to maintain your searchable history. You could
>> create a mailet to insert new emails into the external system, then you
>> don’t need to modify James. Moving emails should not be a problem, even if
>> James makes copies of them. Solr is easy to set up and use, and a few
>> hundred thousand documents is a small document store for Solr to index,
>> especially since your documents will not change once they are inserted.
>>
>> Using Solr would also give you the benefits of sophisticated searching
>> while still allowing you to store the full text of each email.
>>
>>
>> On Mar 13, 2015, at 2:08 PM, Jerry Malcolm <techst...@malcolms.com>
>> wrote:
>>
>>
>>> Benoit,
>>>
>>> Thanks for the info.  Kinda what I was suspecting.  Here's what I've
>>> done so far...
>>>
>>> My ultimate objective is to maintain a searchable index for all of the
>>> hundreds of thousands of emails stored in my JAMES mail db.  As previously
>>> discussed, this is only possible assuming I have a way to later locate a
>>> particular email that I have built an index for (assuming the user will
>>> move it around between folders...)
>>>
>>> 1) Step one was to add one more column to the JAMES_MAIL table for my
>>> own globally-unique UUID
>>> 2) When JAMES stores an email, this column defaults to -1, so I'll know
>>> it hasn't yet been indexed
>>> 3) A cron job runs hourly and creates an index for the new mail. It
>>> also adds the matching index records with all of the keyword info I want to
>>> track into my own separate index table.
>>> 4) I have code to process index queries and identify the UUID for the
>>> desired mail
>>> 5) I query the JAMES_MAIL table for the mail record using the UUID
>>> value.  I then extract the folder and ID info in that record.
>>> 6) Finally, I go back around to the 'front door' and use the standard
>>> IMAP interface with the folder and ID info to access the desired email for
>>> the user.
>>>
>>> Granted, emails can be deleted.  I periodically clean out index entries
>>> for UUIDs that no longer exist.
>>>
>>> This is all pretty much working.  But as you said, this is going to
>>> require remerging everything each time I upgrade JAMES. I'm not really
>>> thrilled with modifying the schema for JAMES db tables.  I wouldn't expect
>>> all of my indexing functionality to be in JAMES. But I would love to have
>>> JAMES maintain a single global UUID column in JAMES_MAIL.  That would make
>>> merging my functionality with JAMES much cleaner.
>>>
>>> As I said, this is pretty much working now the way I described.  I just
>>> decided to bring it up here on the forum to make sure I'm not re-inventing
>>> the wheel or something by overlooking existing functionality in JAMES.  It
>>> appears now that I'm blazing new trails and not duplicating anything that's
>>> existing.  But if there's any talk in the future, I definitely want to keep
>>> up with discussions.
>>>
>>> Thanks again.
>>>
>>> Jerry
>>>
>>> On 3/13/2015 11:42 AM, Benoit Tellier wrote:
>>>
>>>> Hi Jerry,
>>>>
>>>> You are right ... This is what happens when you drag and drop an e-mail
>>>> in Thunderbird from folder A to B :
>>>>
>>>>   1 : The client receives a mail in folder A . The mail is identified by
>>>> the pair ( mailbox path + uid ). Mailbox path ( or mailbox Id ) is folder
>>>> specific. Uid is a long, generated per mailbox. It makes no sense alone.
>>>> Let's say we have ( A : 36 ).
>>>>
>>>>   2 : You perform the drag and drop
>>>>
>>>>   3 : Thunderbird issues a UID COPY command.
>>>>
>>>>   4 : So you have the exact same mail in B, let's say ( B : 42 ).
>>>>
>>>>   5 : James dispatches an Added event for ( B : 42 ) ( Here we don't know
>>>> where this mail came from )
>>>>
>>>>   6 : Your client performs a UID EXPUNGE command on ( A : 36 ).
>>>>
>>>>   7 : ( A : 36 ) is deleted
>>>>
>>>>   8 : You get the delete event for ( A : 36 ) ( Here we don't know where
>>>> this mail came from )
>>>>
>>>> Note that the events I mentioned trigger the IDLE operation, so
>>>> Thunderbird becomes aware of what is happening. Then it reads ( B : 42 )
>>>> and displays it.
>>>>
>>>> Well, to sum up :
>>>>
>>>>   - You do not have a global e-mail identifier that survives a copy.
>>>>   - You cannot base such a feature on events.
>>>>
>>>> So what can you do ?
>>>>
>>>> If I were you, I would do this :
>>>>
>>>>   1 : Choose a MAILBOX implementation ( the one your client wants to
>>>> use ? ).
>>>>
>>>>   2 : Generate a value on the mapper's add operation ( either a long, if
>>>> you want it sorted, or a UUID ).
>>>>
>>>>   3 : Provide a custom message implementation with an accessor for this
>>>> value.
>>>>
>>>>   3.5 : Everywhere in your mapper, you need to use this new message
>>>> type.
>>>>
>>>>   4 : Upon message mapper copy calls, cast the copied message into
>>>> your message type, and copy the field without modifying it.
>>>>
>>>>   5 : Here we are ( note that this value may not be unique, as a message
>>>> can get copied but not deleted ). You can just build it, replace the
>>>> old jar for your MAILBOX implementation with the new one, and restart
>>>> your James server ( yes it works ). Note : update the db schemas before
>>>> restarting James ;-)
>>>>
>>>> Note that you do not need more : such a feature cannot be accessed over
>>>> IMAP, but you can read it using another application. So you are
>>>> compelled to access it through your mail storage ( you said it was no
>>>> problem ... )
>>>>
>>>> Don't worry, such a feature is not that hard to implement.
>>>>
>>>> Drawbacks : you may have to merge it with other James releases. ( Or get
>>>> it accepted in the project ? ).
>>>>
>>>>
>>>> Hope it helps,
>>>>
>>>> Benoit
>>>>
>>>>
>>>> On 13/03/2015 16:50, Jerry Malcolm wrote:
>>>>
>>>>> This is somewhat an IMAP question.  But also a JAMES implementation
>>>>> question.  My client has a massive amount of mail that must be kept and
>>>>> accessed.  They use Thunderbird and Outlook to do the normal mail
>>>>> handling stuff.  No problems at all on the client side.  But on the
>>>>> back end, I need to sort and organize and keep track of emails and be
>>>>> able to pull them up using a web interface on demand, completely
>>>>> independent of folders that they may currently be in.  In other words,
>>>>> I need to keep track of 'email x' and be able to find it at a later
>>>>> time no matter how many times the user moves it from folder to folder.
>>>>>
>>>>> I believe I understand the philosophy of IMAP for the client is to find
>>>>> a folder, display the contents, refresh periodically and add/remove
>>>>> mail from its records for that folder as contents change.  Basically if
>>>>> the user moves a mail item from one folder to another, the first folder
>>>>> recognizes it's no longer there, and is done with it.  The other folder
>>>>> subsequently realizes it has a new email item and displays it.  But
>>>>> there is no knowledge that this is the same email.  Have I got it
>>>>> pretty much correct?
>>>>>
>>>>> So... I realize I may be stretching/bending the intent of IMAP.  But
>>>>> that doesn't diminish the fact that I have the requirement.  I've dug
>>>>> through all of the database table schemas for JAMES and have a pretty
>>>>> good handle on how mail is stored and tracked internally. But I may
>>>>> have missed something.  So my main question is.... is there a way for
>>>>> me to permanently track an email item and be able to locate it at some
>>>>> point down the road even if it's been moved around folders several
>>>>> times?  Basically, is there a global unique ID for every email stored?
>>>>> BTW.... I'm not bound by having to use only IMAP.  I have no problem at
>>>>> all back-dooring to the JAMES database and writing code to use SQL to
>>>>> track through the database tables to find the email.  I just don't
>>>>> think there is anything unique/unchangeable that will allow me to
>>>>> permanently track a particular email.
>>>>>
>>>>> Am I totally off the wall in considering something like this?  Seems a
>>>>> complete waste to have to duplicate a hundred gigs of mail data for my
>>>>> own archive when JAMES has a perfectly good copy of everything.
>>>>>
>>>>> Suggestions?
>>>>>
>>>>> Thanks.
>>>>>
>>>>> Jerry
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: server-user-unsubscr...@james.apache.org
>>>>> For additional commands, e-mail: server-user-h...@james.apache.org
>>>>>
>
