[
https://issues.apache.org/jira/browse/SOLR-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12660194#action_12660194
]
preetam edited comment on SOLR-934 at 1/1/09 7:00 AM:
----------------------------------------------------------
Most of the features are implemented now.
Test cases also updated.
- recursion into subfolders is supported
- folders can be selected/excluded by a comma-separated list of patterns
- mails can be fetched since a predefined receive date/time
- custom filters can be plugged in
- batching is supported
TODO
- currently the test bed needs to be set up manually. The folders should be
created in the test case setup().
- support POP3
- any reviews/feedback/cleanup
I am attaching all the dependency jars so that one does not have to search for
them. Maybe they should be integrated through ant-maven tasks or Maven
directly.
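To make the folder selection above concrete, here is a minimal sketch of how comma-separated include/exclude patterns could be applied to folder names. It is not taken from the patch: the FolderFilter class is hypothetical, and it assumes that the patterns are Java regular expressions matched against the full folder name, that exclude takes precedence over include, and that an empty include list accepts every folder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the SOLR-934 patch.
final class FolderFilter {
    private final List<Pattern> include;
    private final List<Pattern> exclude;

    FolderFilter(String includeCsv, String excludeCsv) {
        this.include = compile(includeCsv);
        this.exclude = compile(excludeCsv);
    }

    // Split a comma-separated attribute value into compiled regex patterns.
    private static List<Pattern> compile(String csv) {
        List<Pattern> patterns = new ArrayList<>();
        if (csv == null) {
            return patterns;
        }
        for (String part : csv.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                patterns.add(Pattern.compile(trimmed));
            }
        }
        return patterns;
    }

    // True if any pattern matches the whole folder name.
    private static boolean matchesAny(List<Pattern> patterns, String name) {
        for (Pattern p : patterns) {
            if (p.matcher(name).matches()) {
                return true;
            }
        }
        return false;
    }

    // Assumed semantics: exclude wins; an empty include list accepts everything.
    boolean accept(String folderName) {
        if (matchesAny(exclude, folderName)) {
            return false;
        }
        return include.isEmpty() || matchesAny(include, folderName);
    }
}
```

Under these assumptions, new FolderFilter("INBOX.*, Work", ".*Spam.*") accepts "INBOX/lists" and "Work" but rejects "INBOX/Spam" and "Drafts".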
was (Author: preetam):
Most of the features are implemented now.
Test cases also updated.
> Enable importing of mails into a solr index through DIH.
> --------------------------------------------------------
>
> Key: SOLR-934
> URL: https://issues.apache.org/jira/browse/SOLR-934
> Project: Solr
> Issue Type: New Feature
> Components: contrib - DataImportHandler
> Affects Versions: 1.4
> Reporter: Preetam Rao
> Assignee: Shalin Shekhar Mangar
> Fix For: 1.4
>
> Attachments: SOLR-934.patch, SOLR-934.patch
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> Enable importing of mails into solr through DIH. Take one or more mailbox
> credentials, download and index their content along with the content from
> attachments. The folders to fetch can be made configurable based on various
> criteria. Apache Tika is used for extracting content from different kinds of
> attachments. JavaMail is used for mailbox-related operations such as fetching
> and filtering mails.
> The basic configuration for one mailbox is shown below:
> {code:xml}
> <document>
> <entity processor="MailEntityProcessor" user="[email protected]"
> password="something" host="imap.gmail.com" protocol="imaps"/>
> </document>
> {code}
> Below is the list of all available configuration options:
> {color:green}Required{color}
> ---------
> *user*
> *pwd*
> *protocol* (only "imaps" supported now)
> *host*
> {color:green}Optional{color}
> ---------
> *folders* - comma-separated list of folders.
> If not specified, the default folder is used. Nested folders can be specified
> like a/b/c
> *recurse* - index subfolders. Defaults to true.
> *exclude* - comma-separated list of patterns.
> *include* - comma-separated list of patterns.
> *batchSize* - number of mails to fetch at once in a given folder.
> Only headers can be prefetched in JavaMail IMAP.
> *readTimeout* - defaults to 60000ms
> *conectTimeout* - defaults to 30000ms
> *fetchSize* - IMAP config. 32KB default
> *fetchMailsSince* -
> date/time in milliseconds; only mails received after it will be fetched. Useful
> for delta import.
> *customFilter* - class name.
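> {code}
> import javax.mail.Folder;
> import javax.mail.search.SearchTerm;
> // The class named by customFilter implements this interface,
> // which is nested in MailEntityProcessor.
> public interface CustomFilter {
>   // Returns the search term used to filter mails in the given folder.
>   SearchTerm getCustomSearch(Folder folder);
> }
> {code}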
> *processAttachement* - defaults to true
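The fetchMailsSince option above is described only as a date/time in milliseconds. Assuming this means milliseconds since the Unix epoch (an assumption, not confirmed by the description), a value could be computed roughly as follows. The FetchSince class and its method name are hypothetical and exist only for illustration.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Hypothetical helper for illustration only; not part of the SOLR-934 patch.
final class FetchSince {
    // Convert a date/time, interpreted as UTC, to milliseconds since the
    // Unix epoch. That fetchMailsSince expects epoch milliseconds is an
    // assumption.
    static long toEpochMillis(LocalDateTime utcDateTime) {
        return utcDateTime.toInstant(ZoneOffset.UTC).toEpochMilli();
    }
}
```

For example, FetchSince.toEpochMillis(LocalDateTime.of(2009, 1, 1, 0, 0)) yields 1230768000000, which under the stated assumption would be the value to use for fetching mails received since the start of 2009 (UTC).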
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.