I may be missing something, but every email carries a Message-ID
header that is meant to be globally unique. Would that work as the uniqueKey?
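
To make the idea concrete, here is a rough sketch of how the header value
could be turned into a key. It is only an illustration, not code from the
patch: the class name, the helper methods and the "messageId" field name are
all made up, and the field would still have to be declared as the uniqueKey
in the schema. It assumes the raw header value is already available as a
String and that the row is the usual field-name to value map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of SOLR-934.patch.
class MessageIdKey {
    // Assumed field name; it would have to be the uniqueKey in schema.xml.
    static final String MESSAGE_ID = "messageId";

    // Normalize a raw Message-ID header value such as "<abc@example.com>":
    // trim whitespace and drop the surrounding angle brackets.
    // Returns null when there is nothing usable.
    static String toUniqueKey(String rawHeader) {
        if (rawHeader == null) {
            return null;
        }
        String id = rawHeader.trim();
        if (id.length() >= 2 && id.startsWith("<") && id.endsWith(">")) {
            id = id.substring(1, id.length() - 1);
        }
        return id.isEmpty() ? null : id;
    }

    // Put the normalized key into the row if one could be derived.
    static Map<String, Object> withUniqueKey(Map<String, Object> row, String rawHeader) {
        String key = toUniqueKey(rawHeader);
        if (key != null) {
            row.put(MESSAGE_ID, key);
        }
        return row;
    }
}
```

A mail without a usable Message-ID header simply gets no key in this sketch,
so that case would need its own handling.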

On Fri, Apr 3, 2009 at 1:12 PM, Shalin Shekhar Mangar (JIRA)
<[email protected]> wrote:
>
>     [ https://issues.apache.org/jira/browse/SOLR-934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Shalin Shekhar Mangar updated SOLR-934:
> ---------------------------------------
>
>    Attachment: SOLR-934.patch
>
> Changes:
> # Parse and store the fetchMailsSince string during init.
> # Return the sentDate as a Date object rather than as a long timestamp
> # Removed context as an argument from the getXFromContext methods
> # Removed unused getLongFromContext method
>
> I just indexed a month's worth of my gmail inbox. Works great!
>
> One question: what uniqueKey should we use when indexing emails? 
> I couldn't figure it out, so I removed the uniqueKey from my schema to try this 
> out.
>
> Next steps:
> # Enhance the ant build file to copy the dependencies to example/solr/lib 
> just like Solr Cell does.
> # Add a wiki page with instructions to setup, list of dependencies, example 
> schema and data-config.xml
>
>> Enable importing of mails into a solr index through DIH.
>> --------------------------------------------------------
>>
>>                 Key: SOLR-934
>>                 URL: https://issues.apache.org/jira/browse/SOLR-934
>>             Project: Solr
>>          Issue Type: New Feature
>>          Components: contrib - DataImportHandler
>>    Affects Versions: 1.4
>>            Reporter: Preetam Rao
>>            Assignee: Shalin Shekhar Mangar
>>             Fix For: 1.4
>>
>>         Attachments: SOLR-934.patch, SOLR-934.patch, SOLR-934.patch, 
>> SOLR-934.patch, SOLR-934.patch
>>
>>   Original Estimate: 24h
>>  Remaining Estimate: 24h
>>
>> Enable importing of mails into solr through DIH. Take one or more mailbox 
>> credentials, download and index their content along with the content from 
>> attachments. The folders to fetch can be made configurable based on various 
>> criteria. Apache Tika is used for extracting content from different kinds of 
>> attachments. JavaMail is used for mail box related operations like fetching 
>> mails, filtering them etc.
>> The basic configuration for one mail box is as below:
>> {code:xml}
>> <document>
>>    <entity processor="MailEntityProcessor" user="[email protected]"
>>                 password="something" host="imap.gmail.com" protocol="imaps"/>
>> </document>
>> {code}
>> The full list of available configuration options:
>> {color:green}Required{color}
>> ---------
>> *user*
>> *pwd*
>> *protocol*  (only "imaps" supported now)
>> *host*
>> {color:green}Optional{color}
>> ---------
>> *folders* - comma separated list of folders.
>> If not specified, the default folder is used. Nested folders can be specified 
>> like a/b/c
>> *recurse* - index subfolders. Defaults to true.
>> *exclude* - comma separated list of patterns.
>> *include* - comma separated list of patterns.
>> *batchSize* - mails to fetch at once in a given folder.
>> Only headers can be prefetched in JavaMail IMAP.
>> *readTimeout* - defaults to 60000ms
>> *conectTimeout* - defaults to 30000ms
>> *fetchSize* - IMAP config. 32KB default
>> *fetchMailsSince* -
>> date/time in "yyyy-MM-dd HH:mm:ss" format; only mails received after it 
>> are fetched. Useful for delta import.
>> *customFilter* - class name.
>> {code}
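>> import javax.mail.Folder;
>> import javax.mail.search.SearchTerm;
>> // The class named in customFilter must implement this interface,
>> // which is nested in MailEntityProcessor.
>> public interface CustomFilter {
>>   // Returns the search term used to filter mails in the given folder.
>>   SearchTerm getCustomSearch(Folder folder);
>> }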
>> {code}
>> *processAttachement* - defaults to true
>> The indexed fields are listed below.
>> {code}
>>   // Fields To Index
>>   // single valued
>>   private static final String SUBJECT = "subject";
>>   private static final String FROM = "from";
>>   private static final String SENT_DATE = "sentDate";
>>   private static final String XMAILER = "xMailer";
>>   // multi valued
>>   private static final String TO_CC_BCC = "allTo";
>>   private static final String FLAGS = "flags";
>>   private static final String CONTENT = "content";
>>   private static final String ATTACHMENT = "attachement";
>>   private static final String ATTACHMENT_NAMES = "attachementNames";
>>   // flag values
>>   private static final String FLAG_ANSWERED = "answered";
>>   private static final String FLAG_DELETED = "deleted";
>>   private static final String FLAG_DRAFT = "draft";
>>   private static final String FLAG_FLAGGED = "flagged";
>>   private static final String FLAG_RECENT = "recent";
>>   private static final String FLAG_SEEN = "seen";
>> {code}
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>
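
On the fetchMailsSince change quoted above (parsing the string once during
init): a minimal sketch of that parsing, using the "yyyy-MM-dd HH:mm:ss"
pattern from the issue description. The class and method names are made up
for illustration and are not taken from the patch. Strict (non-lenient)
parsing is my own assumption, so that a mistyped date fails early instead of
being silently adjusted.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical helper, not part of SOLR-934.patch.
class FetchMailsSince {
    // Pattern quoted in the issue description.
    static final String PATTERN = "yyyy-MM-dd HH:mm:ss";

    // Parse the configured value once; returns null when it is not set.
    // Throws ParseException when the value does not match the pattern.
    static Date parse(String value) throws ParseException {
        if (value == null || value.trim().isEmpty()) {
            return null;
        }
        // SimpleDateFormat is not thread-safe, so create one per call.
        SimpleDateFormat fmt = new SimpleDateFormat(PATTERN);
        fmt.setLenient(false);
        return fmt.parse(value.trim());
    }
}
```

The resulting Date is interpreted in the JVM's default time zone here, which
may or may not be what a delta import wants.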
