[
https://issues.apache.org/jira/browse/SOLR-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12660208#action_12660208
]
Noble Paul commented on SOLR-934:
---------------------------------
Looks good. A few observations:
* init() must call super.init()
* Right before returning from nextRow(), call super.applyTransformer(row)
* Returning null from nextRow() signals the end of rows. Close any connections and do any cleanup at that point
* 'exclude' and 'include' should either allow escaping the comma (between
multiple regexes) or just take a single regex for the time being
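On the last point, one possible convention is to treat "\," as a literal comma and leave every other backslash sequence alone, so regex escapes such as "\d" still work. A minimal sketch, assuming that convention (the patch does not define one); PatternListParser is a hypothetical helper name, not part of the patch:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of SOLR-934.patch.
public class PatternListParser {

  /**
   * Splits a comma-separated attribute value into compiled regex patterns.
   * Assumed convention: "\," is a literal comma inside a pattern; any other
   * backslash pair (e.g. "\d", "\.", "\\") is passed through unchanged.
   */
  public static List<Pattern> parse(String value) {
    List<Pattern> patterns = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    for (int i = 0; i < value.length(); i++) {
      char c = value.charAt(i);
      if (c == '\\' && i + 1 < value.length()) {
        char next = value.charAt(i + 1);
        if (next == ',') {
          current.append(',');            // escaped comma stays in the regex
        } else {
          current.append(c).append(next); // keep other escapes intact
        }
        i++;                              // the pair has been consumed
      } else if (c == ',') {
        addPattern(patterns, current);    // unescaped comma ends a pattern
        current.setLength(0);
      } else {
        current.append(c);
      }
    }
    addPattern(patterns, current);
    return patterns;
  }

  // Trims surrounding whitespace and skips empty entries.
  private static void addPattern(List<Pattern> patterns, StringBuilder raw) {
    String regex = raw.toString().trim();
    if (!regex.isEmpty()) {
      patterns.add(Pattern.compile(regex));
    }
  }

  public static void main(String[] args) {
    // Two patterns; the second contains a literal comma written as "\,".
    for (Pattern p : parse("Trash.*,Lists/a{1\\,3}")) {
      System.out.println(p.pattern());
    }
  }
}
```

Taking a single regex for now avoids all of this, at the cost of users having to fold alternatives into one expression with "|".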
> Enable importing of mails into a solr index through DIH.
> --------------------------------------------------------
>
> Key: SOLR-934
> URL: https://issues.apache.org/jira/browse/SOLR-934
> Project: Solr
> Issue Type: New Feature
> Components: contrib - DataImportHandler
> Affects Versions: 1.4
> Reporter: Preetam Rao
> Assignee: Shalin Shekhar Mangar
> Fix For: 1.4
>
> Attachments: SOLR-934.patch, SOLR-934.patch
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> Enable importing of mails into Solr through DIH. Take one or more mailbox
> credentials, then download and index their content along with the content of
> attachments. The folders to fetch can be configured based on various
> criteria. Apache Tika is used to extract content from different kinds of
> attachments. JavaMail is used for mailbox operations such as fetching
> and filtering mails.
> The basic configuration for one mailbox is shown below:
> {code:xml}
> <document>
> <entity processor="MailEntityProcessor" user="[email protected]"
> password="something" host="imap.gmail.com" protocol="imaps"/>
> </document>
> {code}
> The full list of available configuration options is below:
> {color:green}Required{color}
> ---------
> *user*
> *pwd*
> *protocol* (only "imaps" supported now)
> *host*
> {color:green}Optional{color}
> ---------
> *folders* - comma-separated list of folders.
> If not specified, the default folder is used. Nested folders can be specified
> like a/b/c
> *recurse* - index subfolders. Defaults to true.
> *exclude* - comma-separated list of patterns.
> *include* - comma-separated list of patterns.
> *batchSize* - number of mails to fetch at once in a given folder.
> Only headers can be prefetched in JavaMail IMAP.
> *readTimeout* - defaults to 60000ms
> *conectTimeout* - defaults to 30000ms
> *fetchSize* - IMAP config. 32KB default
> *fetchMailsSince* -
> date/time in milliseconds; only mails received after it are fetched. Useful
> for delta import.
> *customFilter* - class name.
> {code}
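> import javax.mail.Flags;
> import javax.mail.Folder;
> import javax.mail.search.FlagTerm;
> import javax.mail.search.SearchTerm;
> // MyFilter and its body are placeholders: this one selects mails not yet seen.
> public class MyFilter implements MailEntityProcessor.CustomFilter {
>   public SearchTerm getCustomSearch(Folder folder) {
>     return new FlagTerm(new Flags(Flags.Flag.SEEN), false);
>   }
> }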
> {code}
> *processAttachement* - defaults to true
> The indexed fields are listed below.
> {code}
> // Fields To Index
> // single valued
> private static final String SUBJECT = "subject";
> private static final String FROM = "from";
> private static final String SENT_DATE = "sentDate";
> private static final String XMAILER = "xMailer";
> // multi valued
> private static final String TO_CC_BCC = "allTo";
> private static final String FLAGS = "flags";
> private static final String CONTENT = "content";
> private static final String ATTACHMENT = "attachement";
> private static final String ATTACHMENT_NAMES = "attachementNames";
> // flag values
> private static final String FLAG_ANSWERED = "answered";
> private static final String FLAG_DELETED = "deleted";
> private static final String FLAG_DRAFT = "draft";
> private static final String FLAG_FLAGGED = "flagged";
> private static final String FLAG_RECENT = "recent";
> private static final String FLAG_SEEN = "seen";
> {code}