I may be missing something, but emails have a Message-ID header that is unique by definition. Would that do as the uniqueKey?
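To make the suggestion concrete, here is a rough sketch of how a key could be derived from the Message-ID header. It is not part of the SOLR-934 patch: the class name `MailUniqueKey`, the method `of`, and the hashed fallback for mails that arrive without a Message-ID are all my own assumptions. With JavaMail the header value would typically come from `MimeMessage.getMessageID()`; the sketch takes it as a plain string so it runs with the JDK alone.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of the SOLR-934 patch.
final class MailUniqueKey {

    // Returns the value to store in the uniqueKey field for one mail.
    // messageId is the raw Message-ID header value and may be null or blank.
    static String of(String messageId, String from, String sentDate, String subject) {
        if (messageId != null) {
            String id = messageId.trim();
            // Strip the surrounding angle brackets: "<abc@host>" becomes "abc@host".
            if (id.length() > 1 && id.startsWith("<") && id.endsWith(">")) {
                id = id.substring(1, id.length() - 1).trim();
            }
            if (!id.isEmpty()) {
                return id;
            }
        }
        // Assumed fallback: hash a few headers so that a mail without a
        // Message-ID still gets a stable key across re-imports.
        return "sha1:" + sha1Hex(from + "\n" + sentDate + "\n" + subject);
    }

    private static String sha1Hex(String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(text.getBytes(StandardCharsets.UTF_8))) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }
}
```

The returned string would go into whichever field the schema declares as uniqueKey. The fallback only matters if some mails in the mailbox lack the header; if that never happens, the first branch alone is enough.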
On Fri, Apr 3, 2009 at 1:12 PM, Shalin Shekhar Mangar (JIRA) <[email protected]> wrote:
>
> [ https://issues.apache.org/jira/browse/SOLR-934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Shalin Shekhar Mangar updated SOLR-934:
> ---------------------------------------
>
> Attachment: SOLR-934.patch
>
> Changes:
> # Parse and store the fetchMailsSince string during init.
> # Return the sentDate as a Date object rather than as a long timestamp
> # Removed context as an argument from the getXFromContext methods
> # Removed unused getLongFromContext method
>
> I just indexed a month's worth of my gmail inbox. Works great!
>
> One question, what is the uniqueKey that we should use when indexing emails?
> I couldn't figure out so I removed the uniqueKey from my schema to try this
> out.
>
> Next steps:
> # Enhance the ant build file to copy the dependencies to example/solr/lib
> just like Solr Cell does.
> # Add a wiki page with instructions to setup, list of dependencies, example
> schema and data-config.xml
>
>> Enable importing of mails into a solr index through DIH.
>> --------------------------------------------------------
>>
>> Key: SOLR-934
>> URL: https://issues.apache.org/jira/browse/SOLR-934
>> Project: Solr
>> Issue Type: New Feature
>> Components: contrib - DataImportHandler
>> Affects Versions: 1.4
>> Reporter: Preetam Rao
>> Assignee: Shalin Shekhar Mangar
>> Fix For: 1.4
>>
>> Attachments: SOLR-934.patch, SOLR-934.patch, SOLR-934.patch,
>> SOLR-934.patch, SOLR-934.patch
>>
>> Original Estimate: 24h
>> Remaining Estimate: 24h
>>
>> Enable importing of mails into solr through DIH. Take one or more mailbox
>> credentials, download and index their content along with the content from
>> attachments. The folders to fetch can be made configurable based on various
>> criteria. Apache Tika is used for extracting content from different kinds of
>> attachments.
>> JavaMail is used for mail box related operations like fetching
>> mails, filtering them etc.
>> The basic configuration for one mail box is as below:
>> {code:xml}
>> <document>
>>   <entity processor="MailEntityProcessor" user="[email protected]"
>>           password="something" host="imap.gmail.com" protocol="imaps"/>
>> </document>
>> {code}
>> The below is the list of all configuration available:
>> {color:green}Required{color}
>> ---------
>> *user*
>> *pwd*
>> *protocol* (only "imaps" supported now)
>> *host*
>> {color:green}Optional{color}
>> ---------
>> *folders* - comma separated list of folders.
>> If not specified, default folder is used. Nested folders can be specified
>> like a/b/c
>> *recurse* - index subfolders. Defaults to true.
>> *exclude* - comma separated list of patterns.
>> *include* - comma separated list of patterns.
>> *batchSize* - mails to fetch at once in a given folder.
>> Only headers can be prefetched in Javamail IMAP.
>> *readTimeout* - defaults to 60000ms
>> *conectTimeout* - defaults to 30000ms
>> *fetchSize* - IMAP config. 32KB default
>> *fetchMailsSince* -
>> date/time in "yyyy-MM-dd HH:mm:ss" format, mails received after which will
>> be fetched. Useful for delta import.
>> *customFilter* - class name.
>> {code}
>> import javax.mail.Folder;
>> import javax.mail.search.SearchTerm;
>> import javax.mail.search.SubjectTerm;
>> // MyFilter is a placeholder name for the class given in customFilter
>> public class MyFilter implements MailEntityProcessor.CustomFilter {
>>   public SearchTerm getCustomSearch(Folder folder) {
>>     // example only: match mails whose subject contains "solr"
>>     return new SubjectTerm("solr");
>>   }
>> }
>> {code}
>> *processAttachement* - defaults to true
>> The below are the indexed fields.
>> {code}
>> // Fields To Index
>> // single valued
>> private static final String SUBJECT = "subject";
>> private static final String FROM = "from";
>> private static final String SENT_DATE = "sentDate";
>> private static final String XMAILER = "xMailer";
>> // multi valued
>> private static final String TO_CC_BCC = "allTo";
>> private static final String FLAGS = "flags";
>> private static final String CONTENT = "content";
>> private static final String ATTACHMENT = "attachement";
>> private static final String ATTACHMENT_NAMES = "attachementNames";
>> // flag values
>> private static final String FLAG_ANSWERED = "answered";
>> private static final String FLAG_DELETED = "deleted";
>> private static final String FLAG_DRAFT = "draft";
>> private static final String FLAG_FLAGGED = "flagged";
>> private static final String FLAG_RECENT = "recent";
>> private static final String FLAG_SEEN = "seen";
>> {code}
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>
