Certainly, it all depends on your needs and your resource constraints. Solr or ElasticSearch would provide a robust, feature-rich search system, but you would probably have to build a custom search interface for your needs and, as you noted, possibly spin up new infrastructure to support it.
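As a rough illustration of what indexing an email into Solr involves, here is a minimal Python sketch. It assumes a Solr core named `mail` at a local URL and a schema with the fields `message_id`, `from`, `to`, `subject`, and `body`. All of those names, and the helper functions themselves, are hypothetical. The only thing it relies on from Solr is that the update handler accepts a JSON array of documents.

```python
import json
import urllib.request
from email import message_from_string

# Hypothetical endpoint: a core named "mail" on a local Solr instance.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mail/update?commit=true"

SAMPLE = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Quarterly report\n"
    "Message-ID: <1@example.com>\n"
    "\n"
    "Numbers attached.\n"
)


def email_to_solr_doc(raw_email, doc_id):
    """Map one raw RFC 822 message onto a flat dict of (assumed) Solr fields."""
    msg = message_from_string(raw_email)
    body = ""
    # Use the first text/plain part as the searchable body.
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)
            charset = part.get_content_charset() or "utf-8"
            body = payload.decode(charset, errors="replace")
            break
    return {
        "id": doc_id,  # your own stable identifier, e.g. a UUID
        "message_id": msg.get("Message-ID", ""),
        "from": msg.get("From", ""),
        "to": msg.get("To", ""),
        "subject": msg.get("Subject", ""),
        "body": body,
    }


def build_update_payload(docs):
    """Serialize documents as a JSON array for Solr's update handler."""
    return json.dumps(docs).encode("utf-8")


def post_to_solr(payload, update_url=SOLR_UPDATE_URL):
    """POST the payload to Solr and return the HTTP status code."""
    request = urllib.request.Request(
        update_url,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    doc = email_to_solr_doc(SAMPLE, "abc-123")
    print(build_update_payload([doc]).decode("utf-8"))
    # post_to_solr(...) is left uncalled here; it needs a running Solr.
```

In a real deployment the same mapping would live in the mailet (in Java), and a full re-index would simply run it over every stored message. The `id` could be whatever UUID you already track.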
Solr isn't a database per se; it is an indexing/search system, so syncing it is a matter of indexing your email into it. A mailet that adds all new emails to Solr would do it. If you ever think your index is out of sync, all you do is a full re-indexing of the content.

Most of the work is up front: mapping content into Solr, building the indexing process into the mailet, and building a general administrative re-indexing and search system (which you can build in any programming language of your choice).

Either way, good luck with your implementation.

On Mon, Mar 16, 2015 at 3:17 PM, Jerry Malcolm <techst...@malcolms.com> wrote:

> Robert,
>
> I guess that's always a possibility. But it seems like the overhead would
> be huge: purchasing, installing, and learning a completely new database
> system, doubling the several terabytes of storage, both primary and backup,
> to duplicate the entire existing mail db, writing and maintaining code to
> guarantee the databases are 100% in sync (which experience tells me they'll
> never stay 100% in sync, no matter how much code I write), and then writing
> the code to search and extract the data from the duplicate parallel
> database system.... All of this vs. adding another column to the JAMES_MAIL
> table. Not sure I want to go that route. I've already got the code that
> maintains the search keys and does the searches. I simply need a way to
> retrieve the one record out of nearly a million records in the JAMES_MAIL
> table that contains the email that my search engine has found.
>
>
> On 3/14/2015 6:18 PM, Robert Munn wrote:
>
>> Jerry,
>>
>> Have you considered using an external system for this purpose?
>>
>> Offhand, I would think you could use either a NoSQL store like Cassandra
>> or a search system like Solr to maintain your searchable history. You could
>> create a mailet to insert new emails into the external system, then you
>> don’t need to modify James.
>> Moving emails should not be a problem, even if
>> James makes copies of them. Solr is easy to set up and use, and a few
>> hundred thousand documents is a small document store for Solr to index,
>> especially since your documents will not change once they are inserted.
>>
>> Using Solr would also give you the benefits of sophisticated searching
>> while still allowing you to store the full text of each email.
>>
>>
>> On Mar 13, 2015, at 2:08 PM, Jerry Malcolm <techst...@malcolms.com>
>> wrote:
>>
>>> Benoit,
>>>
>>> Thanks for the info. Kinda what I was suspecting. Here's what I've
>>> done so far...
>>>
>>> My ultimate objective is to maintain a searchable index for all of the
>>> hundreds of thousands of emails stored in my JAMES mail db. As previously
>>> discussed, this is only possible assuming I have a way to later locate a
>>> particular email that I have built an index for (assuming the user will
>>> move it around between folders...)
>>>
>>> 1) Step one was to add one more column to the JAMES_MAIL table for my
>>> own globally-unique UUID
>>> 2) When JAMES stores an email, this column defaults to -1, so I'll know
>>> it hasn't yet been indexed
>>> 3) A cron job runs hourly and creates an index for the new mail. It
>>> also adds the matching index records with all of the keyword info I want to
>>> track into my own separate index table.
>>> 4) I have code to process index queries and identify the UUID for the
>>> desired mail
>>> 5) I query the JAMES_MAIL table for the mail record using the UUID
>>> value. I then extract the folder and ID info in that record.
>>> 6) Finally, I go back around to the 'front door' and use the standard
>>> IMAP interface with the folder and ID info to access the desired email for
>>> the user.
>>>
>>> Granted, emails can be deleted. I periodically clean out index entries
>>> for UUIDs that no longer exist.
>>>
>>> This is all pretty much working.
>>> But as you said, this is going to
>>> require remerging everything each time I upgrade JAMES. I'm not really
>>> thrilled with modifying the schema for JAMES db tables. I wouldn't expect
>>> all of my indexing functionality to be in JAMES. But I would love to have
>>> JAMES maintain a single global UUID column in JAMES_MAIL. That would make
>>> merging my functionality with JAMES much cleaner.
>>>
>>> As I said, this is pretty much working now the way I described. I just
>>> decided to bring it up here on the forum to make sure I'm not re-inventing
>>> the wheel or something by overlooking existing functionality in JAMES. It
>>> appears now that I'm blazing new trails and not duplicating anything that's
>>> existing. But if there's any talk in the future, I definitely want to keep
>>> up with discussions.
>>>
>>> Thanks again.
>>>
>>> Jerry
>>>
>>> On 3/13/2015 11:42 AM, Benoit Tellier wrote:
>>>
>>>> Hi Jerry,
>>>>
>>>> You are right ... This is what happens when you drag and drop an e-mail
>>>> in Thunderbird from folder A to B :
>>>>
>>>> 1 : The client receives a mail in folder A. The mail is identified by the
>>>> pair ( mailbox path + uid ). The mailbox path ( or mailbox Id ) is folder
>>>> specific. The uid is a long, generated per mailbox. It makes no sense on
>>>> its own. Let's say we have ( A : 36 ).
>>>>
>>>> 2 : You perform the drag and drop
>>>>
>>>> 3 : Thunderbird issues a UID COPY command.
>>>>
>>>> 4 : So you have the exact same mail in B, let's say ( B : 42 ).
>>>>
>>>> 5 : James dispatches an Added event for ( B : 42 ) ( Here we don't know
>>>> where this mail came from )
>>>>
>>>> 6 : Your client performs a UID EXPUNGE command on ( A : 36 ).
>>>>
>>>> 7 : ( A : 36 ) is deleted
>>>>
>>>> 8 : You get the delete event for ( A : 36 ) ( Here we don't know where
>>>> this mail came from )
>>>>
>>>> Note that the events I mentioned trigger IDLE operations, and
>>>> Thunderbird becomes aware of what is happening. Then it reads ( B : 42 )
>>>> and displays it.
>>>>
>>>> Well, to sum up :
>>>>
>>>> - You do not have a global e-mail identifier that survives a copy.
>>>> - You cannot base such a feature on events.
>>>>
>>>> So what can you do ?
>>>>
>>>> If I were you, I would do this :
>>>>
>>>> 1 : Choose a MAILBOX implementation ( the one your client wants to
>>>> use ? ),
>>>> 2 : Generate a value on the mapper's add operation ( either a long, if
>>>> you want it sorted, or a UUID ).
>>>>
>>>> 3 : Provide a custom message implementation with an accessor for this
>>>> value.
>>>>
>>>> 3.5 : Everywhere in your mapper, you need to use this new message
>>>> type.
>>>>
>>>> 4 : Upon message mapper copy calls, cast the copied message into
>>>> your message type, and copy the field without modifying it.
>>>>
>>>> 5 : Here we are ( note that this value may not be unique, as a message can
>>>> get copied but not deleted ). You can just build it, replace the
>>>> old jar for your MAILBOX implementation with the new one, and restart
>>>> your James server ( yes, it works ). Note : update the db schemas before
>>>> restarting James ;-)
>>>>
>>>> Note that you do not need more : such a feature cannot be accessed over
>>>> IMAP, but you can read it using another application. So you are
>>>> compelled to access it through your mail storage ( you said that was no
>>>> problem ... )
>>>>
>>>> Don't worry, such a feature is not that hard to implement.
>>>>
>>>> Drawbacks : you may have to merge it with other James releases. ( Or get
>>>> it accepted in the project ? ).
>>>>
>>>>
>>>> Hope it helps,
>>>>
>>>> Benoit
>>>>
>>>>
>>>> On 13/03/2015 16:50, Jerry Malcolm wrote:
>>>>
>>>>> This is somewhat an IMAP question. But also a JAMES implementation
>>>>> question. My client has a massive amount of mail that must be kept and
>>>>> accessed. They use Thunderbird and Outlook to do the normal mail
>>>>> handling stuff. No problems at all on the client side.
>>>>> But on the back end, I need to sort and organize and keep track of
>>>>> emails and be able to pull them up using a web interface on demand,
>>>>> completely independent of folders that they may currently be in. In
>>>>> other words, I need to keep track of 'email x' and be able to find it
>>>>> at a later time no matter how many times the user moves it from folder
>>>>> to folder.
>>>>>
>>>>> I believe I understand the philosophy of IMAP for the client is to find
>>>>> a folder, display the contents, refresh periodically and add/remove mail
>>>>> from its records for that folder as contents change. Basically if the
>>>>> user moves a mail item from one folder to another, the first folder
>>>>> recognizes it's no longer there, and is done with it. The other folder
>>>>> subsequently realizes it has a new email item and displays it. But
>>>>> there is no knowledge that this is the same email. Have I got it pretty
>>>>> much correct?
>>>>>
>>>>> So... I realize I may be stretching/bending the intent of IMAP. But
>>>>> that doesn't diminish the fact that I have the requirement. I've dug
>>>>> through all of the database table schemas for JAMES and have a pretty
>>>>> good handle on how mail is stored and tracked internally. But I may have
>>>>> missed something. So my main question is.... is there a way for me to
>>>>> permanently track an email item and be able to locate it at some point
>>>>> down the road even if it's been moved around folders several times?
>>>>> Basically, is there a global unique ID for every email stored? BTW....
>>>>> I'm not bound by having to use only IMAP. I have no problem at all
>>>>> back-dooring to the JAMES database and writing code to use SQL to track
>>>>> through the database tables to find the email. I just don't think there
>>>>> is anything unique/unchangeable that will allow me to permanently track
>>>>> a particular email.
>>>>>
>>>>> Am I totally off the wall in considering something like this? Seems a
>>>>> complete waste to have to duplicate a hundred gigs of mail data for my
>>>>> own archive when JAMES has a perfectly good copy of everything.
>>>>>
>>>>> Suggestions?
>>>>>
>>>>> Thanks.
>>>>>
>>>>> Jerry
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: server-user-unsubscr...@james.apache.org
>>>>> For additional commands, e-mail: server-user-h...@james.apache.org