Robert,

I guess that's always a possibility. But it seems like the overhead would be huge: purchasing, installing, and learning a completely new database system; doubling several terabytes of storage, both primary and backup, to duplicate the entire existing mail db; writing and maintaining code to guarantee the databases stay 100% in sync (and experience tells me they'll never stay 100% in sync, no matter how much code I write); and then writing the code to search and extract the data from the duplicate parallel database system... All of this vs. adding another column to the JAMES_MAIL table. Not sure I want to go that route. I've already got the code that maintains the search keys and does the searches. I simply need a way to retrieve the one record, out of nearly a million records in the JAMES_MAIL table, that contains the email my search engine has found.
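For reference, pulling one row out of roughly a million is cheap as long as the added column is indexed. The following is a minimal sketch using Python's built-in sqlite3 module as a stand-in database; the simplified table, the column names GLOBAL_ID, MAILBOX_ID and MAIL_UID, and the index name are assumptions for illustration, not necessarily the real JAMES schema. Other databases report query plans with their own syntax.

```python
import sqlite3

# Hypothetical, simplified stand-in for JAMES_MAIL with only the columns needed here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE JAMES_MAIL ("
    " MAILBOX_ID INTEGER, MAIL_UID INTEGER, GLOBAL_ID TEXT DEFAULT '-1',"
    " PRIMARY KEY (MAILBOX_ID, MAIL_UID))"
)
# Indexing the added column turns the lookup into an index search instead of a full scan.
conn.execute("CREATE INDEX idx_james_mail_global_id ON JAMES_MAIL (GLOBAL_ID)")


def find_mail(conn, global_id):
    """Return (MAILBOX_ID, MAIL_UID) for a global id, or None if it no longer exists."""
    return conn.execute(
        "SELECT MAILBOX_ID, MAIL_UID FROM JAMES_MAIL WHERE GLOBAL_ID = ?", (global_id,)
    ).fetchone()


def query_plan(conn, global_id):
    """Return the human-readable plan text SQLite chooses for the lookup."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN"
        " SELECT MAILBOX_ID, MAIL_UID FROM JAMES_MAIL WHERE GLOBAL_ID = ?",
        (global_id,),
    ).fetchall()
    # The plan detail is the last column of each row.
    return " ".join(str(row[-1]) for row in rows)


conn.executemany(
    "INSERT INTO JAMES_MAIL (MAILBOX_ID, MAIL_UID, GLOBAL_ID) VALUES (?, ?, ?)",
    [(1, uid, f"id-{uid}") for uid in range(1, 1001)],
)
print(find_mail(conn, "id-500"))   # (1, 500)
print(query_plan(conn, "id-500"))  # names idx_james_mail_global_id
```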

On 3/14/2015 6:18 PM, Robert Munn wrote:
Jerry,

Have you considered using an external system for this purpose?

Offhand, I would think you could use either a NoSQL store like Cassandra or a 
search system like Solr to maintain your searchable history. You could create a 
mailet to insert new emails into the external system, then you don’t need to 
modify James. Moving emails should not be a problem, even if James makes copies 
of them. Solr is easy to set up and use, and a few hundred thousand documents 
is a small document store for Solr to index, especially since your documents 
will not change once they are inserted.

Using Solr would also give you the benefits of sophisticated searching while 
still allowing you to store the full text of each email.

On Mar 13, 2015, at 2:08 PM, Jerry Malcolm <techst...@malcolms.com> wrote:


Benoit,

Thanks for the info.  Kinda what I was suspecting.  Here's what I've done so 
far...

My ultimate objective is to maintain a searchable index for all of the hundreds 
of thousands of emails stored in my JAMES mail db.  As previously discussed, 
this is only possible if I have a way to later locate a particular email 
that I have built an index for (given that the user will move it around between 
folders...)

1) Step one was to add one more column to the JAMES_MAIL table for my own 
globally-unique UUID
2) When JAMES stores an email, this column defaults to -1, so I'll know it 
hasn't yet been indexed
3) A cron job runs hourly and creates an index for the new mail. It also adds 
the matching index records with all of the keyword info I want to track into my 
own separate index table.
4) I have code to process index queries and identify the UUID for the desired 
mail
5) I query the JAMES_MAIL table for the mail record using the UUID value.  I 
then extract the folder and ID info in that record.
6) Finally, I go back around to the 'front door' and use the standard IMAP 
interface with the folder and ID info to access the desired email for the user.
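Steps 1 through 5 above might look roughly like the following minimal sketch, again with Python's sqlite3 standing in for the real database. The table MAIL_KEYWORD_INDEX, the SUBJECT column used as the text to index, and the naive keyword extraction are hypothetical placeholders; step 6 (fetching the message over IMAP with the returned mailbox and UID) is not shown.

```python
import sqlite3
import uuid

# Hypothetical, simplified schema. GLOBAL_ID is the added column ('-1' = not yet
# indexed); SUBJECT stands in for whatever text the real job extracts keywords from.
SCHEMA = """
CREATE TABLE JAMES_MAIL (
    MAILBOX_ID INTEGER, MAIL_UID INTEGER, SUBJECT TEXT,
    GLOBAL_ID TEXT DEFAULT '-1',
    PRIMARY KEY (MAILBOX_ID, MAIL_UID));
CREATE TABLE MAIL_KEYWORD_INDEX (GLOBAL_ID TEXT, KEYWORD TEXT);
"""


def index_new_mail(conn):
    """Step 3: give every unindexed row a UUID and record its keywords."""
    pending = conn.execute(
        "SELECT MAILBOX_ID, MAIL_UID, SUBJECT FROM JAMES_MAIL WHERE GLOBAL_ID = '-1'"
    ).fetchall()
    for mailbox_id, mail_uid, subject in pending:
        global_id = str(uuid.uuid4())
        conn.execute(
            "UPDATE JAMES_MAIL SET GLOBAL_ID = ? WHERE MAILBOX_ID = ? AND MAIL_UID = ?",
            (global_id, mailbox_id, mail_uid),
        )
        # Placeholder keyword extraction: the distinct lower-cased words of the subject.
        conn.executemany(
            "INSERT INTO MAIL_KEYWORD_INDEX (GLOBAL_ID, KEYWORD) VALUES (?, ?)",
            [(global_id, word) for word in set(subject.lower().split())],
        )
    conn.commit()
    return len(pending)


def locate(conn, keyword):
    """Steps 4-5: keyword -> global ids -> current (MAILBOX_ID, MAIL_UID) locations."""
    return conn.execute(
        "SELECT m.MAILBOX_ID, m.MAIL_UID FROM MAIL_KEYWORD_INDEX k"
        " JOIN JAMES_MAIL m ON m.GLOBAL_ID = k.GLOBAL_ID"
        " WHERE k.KEYWORD = ? ORDER BY m.MAILBOX_ID, m.MAIL_UID",
        (keyword.lower(),),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO JAMES_MAIL (MAILBOX_ID, MAIL_UID, SUBJECT) VALUES (1, 36, 'Invoice March')"
)
print(index_new_mail(conn))     # 1
print(locate(conn, "invoice"))  # [(1, 36)]
```

Running the job a second time finds nothing with GLOBAL_ID = '-1', so it is safe to schedule hourly.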

Granted, emails can be deleted.  I periodically clean out index entries for 
UUIDs that no longer exist.
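The periodic cleanup could be a single DELETE, sketched here with the same hypothetical table names. The IS NOT NULL filter matters: in SQL, x NOT IN (...) evaluates to NULL rather than true whenever the subquery returns a NULL, so a single NULL GLOBAL_ID would silently stop every deletion.

```python
import sqlite3


def purge_stale_entries(conn):
    """Delete index rows whose GLOBAL_ID no longer appears in JAMES_MAIL; return the count."""
    cur = conn.execute(
        "DELETE FROM MAIL_KEYWORD_INDEX WHERE GLOBAL_ID NOT IN"
        " (SELECT GLOBAL_ID FROM JAMES_MAIL WHERE GLOBAL_ID IS NOT NULL)"
    )
    conn.commit()
    return cur.rowcount


# Hypothetical demo data: mail 'a' still exists, mail 'b' has been deleted from JAMES_MAIL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE JAMES_MAIL (MAILBOX_ID INTEGER, MAIL_UID INTEGER, GLOBAL_ID TEXT);
CREATE TABLE MAIL_KEYWORD_INDEX (GLOBAL_ID TEXT, KEYWORD TEXT);
INSERT INTO JAMES_MAIL VALUES (1, 36, 'a'), (1, 37, NULL);
INSERT INTO MAIL_KEYWORD_INDEX VALUES ('a', 'invoice'), ('b', 'invoice'), ('b', 'march');
""")
print(purge_stale_entries(conn))  # 2
```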

This is all pretty much working.  But as you said, this is going to require 
re-merging everything each time I upgrade JAMES. I'm not really thrilled with 
modifying the schema for JAMES db tables.  I wouldn't expect all of my indexing 
functionality to be in JAMES. But I would love to have JAMES maintain a single 
global UUID column in JAMES_MAIL.  That would make merging my functionality 
with JAMES much cleaner.

As I said, this is pretty much working now the way I described.  I just decided 
to bring it up here on the forum to make sure I'm not reinventing the wheel by 
overlooking existing functionality in JAMES.  It appears now that I'm blazing 
new trails and not duplicating anything that already exists.  But if this comes 
up again in the future, I definitely want to keep up with the discussion.

Thanks again.

Jerry

On 3/13/2015 11:42 AM, Benoit Tellier wrote:
Hi Jerry,

You are right... Here is what happens when you drag and drop an e-mail
in Thunderbird from folder A to folder B:

  1 : The client receives a mail in folder A. The mail is identified by the
pair (mailbox path + UID). The mailbox path (or mailbox ID) is folder
specific. The UID is a long generated per mailbox; it makes no sense on its own.
Let's say we have (A : 36).

  2 : You perform the drag and drop.

  3 : Thunderbird issues a UID COPY command.

  4 : You now have the exact same mail in B; let's say (B : 42).

  5 : James dispatches an Added event for (B : 42). (Here we don't know
where this mail came from.)

  6 : Your client performs a UID EXPUNGE command on (A : 36).

  7 : (A : 36) is deleted.

  8 : You get a delete event for (A : 36). (Here we don't know where
this mail went.)

Note that the events I listed trigger IDLE notifications, so
Thunderbird becomes aware of what is happening. It then reads (B : 42)
and displays it.
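The sequence above can be modelled in a few lines. This is a toy Python simulation, not James code: the Mailbox class and the uid_copy/uid_expunge helpers are hypothetical, the event names are illustrative, and a real client also sets the \Deleted flag before expunging. It shows that the only identity a listener ever sees is the (mailbox, uid) pair, and nothing links the Added event to the later delete.

```python
from dataclasses import dataclass, field


@dataclass
class Mailbox:
    name: str
    next_uid: int
    messages: dict = field(default_factory=dict)  # uid -> raw message bytes


events = []  # (event type, mailbox name, uid): everything a listener receives


def uid_copy(src, uid, dst):
    """UID COPY: the destination assigns its own, unrelated UID."""
    new_uid = dst.next_uid
    dst.next_uid += 1
    dst.messages[new_uid] = src.messages[uid]
    events.append(("Added", dst.name, new_uid))
    return new_uid


def uid_expunge(box, uid):
    """UID EXPUNGE: the source copy disappears; the event names only (box, uid)."""
    del box.messages[uid]
    events.append(("Expunged", box.name, uid))


a = Mailbox("A", next_uid=37, messages={36: b"Subject: hello"})
b = Mailbox("B", next_uid=42)
new_uid = uid_copy(a, 36, b)  # steps 3-5
uid_expunge(a, 36)            # steps 6-8
print(events)  # [('Added', 'B', 42), ('Expunged', 'A', 36)]
```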

Well, to sum up:

  - You do not have a global e-mail identifier that survives a copy.
  - You cannot base such a feature on events.

So what can you do?

If I were you, I would do this :

  1 : Choose a MAILBOX implementation (the one your client wants to
use?).

  2 : Generate a value in the mapper's add operation (either a long, if
you want it sorted, or a UUID).

  3 : Provide a custom message implementation with an accessor for this
value.

  3.5 : Use this new message type everywhere in your mapper.

  4 : On message mapper copy calls, cast the copied message to
your message type and copy the field without modifying it.

  5 : Here we are (note that this value may not be unique, as a message can
be copied without being deleted). Just build it, replace the
old jar for your MAILBOX implementation with the new one, and restart
your James server (yes, it works). Note: update the DB schemas before
restarting James ;-)
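To make steps 2 to 4 concrete, here is a toy in-memory model in Python. It is not the James MessageMapper API (a real implementation would be Java inside the chosen MAILBOX module), and the class and method names are hypothetical. It shows the invariant that matters: add() generates the global id, copy() carries it over unchanged, and a copy without a delete leaves two messages sharing one id, as noted in step 5.

```python
import itertools
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GlobalIdMessage:
    """Step 3: custom message type carrying the extra value (hypothetical name)."""
    mailbox: str
    uid: int
    content: bytes
    global_id: int


class GlobalIdMessageMapper:
    """Toy mapper; a real James mapper has a different and much richer interface."""

    def __init__(self):
        self._rows = {}                        # (mailbox, uid) -> GlobalIdMessage
        self._next_uid = {}                    # mailbox -> next UID to hand out
        self._global_ids = itertools.count(1)  # step 2: a long, so ids sort by arrival

    def _allocate_uid(self, mailbox):
        uid = self._next_uid.get(mailbox, 1)
        self._next_uid[mailbox] = uid + 1
        return uid

    def add(self, mailbox, content):
        """Step 2: a brand-new message gets a fresh global id."""
        msg = GlobalIdMessage(mailbox, self._allocate_uid(mailbox), content,
                              next(self._global_ids))
        self._rows[(mailbox, msg.uid)] = msg
        return msg

    def copy(self, target_mailbox, original):
        """Step 4: new mailbox and UID, but the global id is copied unchanged."""
        msg = replace(original, mailbox=target_mailbox,
                      uid=self._allocate_uid(target_mailbox))
        self._rows[(target_mailbox, msg.uid)] = msg
        return msg

    def delete(self, mailbox, uid):
        del self._rows[(mailbox, uid)]

    def find_by_global_id(self, global_id):
        """What an outside application would do by querying the storage directly."""
        return sorted((m.mailbox, m.uid) for m in self._rows.values()
                      if m.global_id == global_id)


mapper = GlobalIdMessageMapper()
original = mapper.add("A", b"Subject: hello")
moved = mapper.copy("B", original)  # drag and drop is a copy...
mapper.delete("A", original.uid)    # ...followed by an expunge
print(mapper.find_by_global_id(original.global_id))  # [('B', 1)]
kept = mapper.copy("C", moved)      # copy without delete: the id is now shared
print(mapper.find_by_global_id(original.global_id))  # [('B', 1), ('C', 1)]
```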

Note that you do not need anything more: such a feature cannot be accessed over
IMAP, but you can read it from another application. So you are
compelled to access it through your mail storage (you said that was no
problem...).

Don't worry, such a feature is not that hard to implement.

Drawbacks: you may have to merge it with each new James release (or get
it accepted into the project?).


Hope it helps,

Benoit


On 13/03/2015 16:50, Jerry Malcolm wrote:
This is partly an IMAP question, but also a JAMES implementation
question.  My client has a massive amount of mail that must be kept and
accessed.  They use Thunderbird and Outlook to do the normal mail
handling stuff.  No problems at all on the client side.  But on the back
end, I need to sort and organize and keep track of emails and be able to
pull them up using a web interface on demand, completely independent of
folders that they may currently be in.  In other words, I need to keep
track of 'email x' and be able to find it at a later time no matter how
many times the user moves it from folder to folder.

As I understand it, the IMAP philosophy for the client is to find a
folder, display its contents, refresh periodically, and add/remove mail
from its records for that folder as the contents change.  Basically, if the
user moves a mail item from one folder to another, the first folder
recognizes it's no longer there and is done with it.  The other folder
subsequently realizes it has a new email item and displays it.  But
there is no knowledge that this is the same email.  Have I got it pretty
much correct?

So... I realize I may be stretching/bending the intent of IMAP.  But
that doesn't diminish the fact that I have the requirement.  I've dug
through all of the database table schemas for JAMES and have a pretty
good handle on how mail is stored and tracked internally. But I may have
missed something.  So my main question is: is there a way for me to
permanently track an email item and be able to locate it at some point
down the road, even if it's been moved between folders several times?
Basically, is there a globally unique ID for every stored email?  BTW,
I'm not bound to using only IMAP.  I have no problem at all
back-dooring to the JAMES database and writing code to use SQL to track
through the database tables to find the email.  I just don't think there
is anything unique/unchangeable that will allow me to permanently track
a particular email.

Am I totally off the wall in considering something like this?  It seems a
complete waste to have to duplicate a hundred gigs of mail data for my
own archive when JAMES has a perfectly good copy of everything.

Suggestions?

Thanks.

Jerry

---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscr...@james.apache.org
For additional commands, e-mail: server-user-h...@james.apache.org



-----
No virus found in this message.
Checked by AVG - www.avg.com
Version: 2015.0.5856 / Virus Database: 4306/9292 - Release Date: 03/13/15
