[
https://issues.apache.org/jira/browse/SOLR-934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Preetam Rao updated SOLR-934:
-----------------------------
Description:
Enable importing of mails into Solr through DIH. Take one or more mailbox
credentials, then download and index the mail content along with the content of
attachments. The folders to fetch can be configured based on various
criteria. Apache Tika is used for extracting content from different kinds of
attachments. JavaMail is used for mailbox-related operations such as fetching
and filtering mails.
The basic configuration for one mailbox is shown below:
{code:xml}
<document>
<entity processor="MailEntityProcessor" user="[email protected]"
password="something" host="imap.gmail.com" protocol="imaps"/>
</document>
{code}
The full list of available configuration attributes:
{color:green}Required{color}
---------
*user*
*pwd*
*protocol* (only "imaps" is supported for now)
*host*
{color:green}Optional{color}
---------
*folders* - comma-separated list of folders.
If not specified, the default folder is used. Nested folders can be specified like
a/b/c
*recurse* - index subfolders. Defaults to true.
*exclude* - comma-separated list of patterns.
*include* - comma-separated list of patterns.
*batchSize* - number of mails to fetch at once in a given folder.
Only headers can be prefetched in JavaMail IMAP.
*readTimeout* - defaults to 60000ms
*connectTimeout* - defaults to 30000ms
*fetchSize* - IMAP config. 32KB default
*fetchMailsSince* - date/time in milliseconds; only mails received after it are
fetched. Useful for delta import.
*customFilter* - class name.
{code}
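import javax.mail.Folder;
import javax.mail.search.SearchTerm;
import javax.mail.search.SubjectTerm;

// "MyFilter" is a placeholder name for the class given in customFilter.
public class MyFilter implements MailEntityProcessor.CustomFilter {
  public SearchTerm getCustomSearch(Folder folder) {
    return new SubjectTerm("solr"); // illustration only: mails with "solr" in the subject
  }
}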
{code}
*processAttachement* - defaults to true
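The *include*, *exclude* and *fetchMailsSince* options above can be pictured with the minimal sketch below. It is not the code from the attached patch: the class {{MailImportFilters}} and its methods are hypothetical names. It assumes that the patterns are Java regular expressions matched against the full folder name, that *exclude* wins over *include*, and that an empty *include* list accepts every folder.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only; not part of the patch.
class MailImportFilters {

    // Splits a comma-separated attribute value into compiled patterns.
    // Treating each entry as a Java regular expression is an assumption.
    static List<Pattern> parsePatterns(String csv) {
        List<Pattern> patterns = new ArrayList<>();
        if (csv == null) {
            return patterns;
        }
        for (String part : csv.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                patterns.add(Pattern.compile(trimmed));
            }
        }
        return patterns;
    }

    // A folder is accepted when it matches no exclude pattern and, if any
    // include patterns are given, matches at least one of them.
    static boolean folderAccepted(String folderName, String include, String exclude) {
        for (Pattern p : parsePatterns(exclude)) {
            if (p.matcher(folderName).matches()) {
                return false;
            }
        }
        List<Pattern> includes = parsePatterns(include);
        if (includes.isEmpty()) {
            return true;
        }
        for (Pattern p : includes) {
            if (p.matcher(folderName).matches()) {
                return true;
            }
        }
        return false;
    }

    // fetchMailsSince is an epoch timestamp in milliseconds; a mail qualifies
    // when it was received strictly after that instant.
    static boolean receivedAfter(Instant received, long fetchMailsSinceMillis) {
        return received.toEpochMilli() > fetchMailsSinceMillis;
    }

    public static void main(String[] args) {
        String exclude = "Trash,.*Spam.*";
        System.out.println(folderAccepted("inbox", "", exclude));
        System.out.println(folderAccepted("[Gmail]/Spam", "", exclude));
        // Delta import: keep only mails received after the stored cutoff.
        long lastImportMillis = 1000L;
        System.out.println(receivedAfter(Instant.ofEpochMilli(2000L), lastImportMillis));
    }
}
```

For a delta import, the timestamp of the previous run would be passed as *fetchMailsSince*, so that only mails received after that point are fetched again.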
was:
Enable importing of mails into Solr through DIH. Take one or more mailbox
credentials, download and index their content along with the content from
attachments.
The folders to fetch can be made configurable based on various criteria.
Apache Tika can be used for extracting content from different kinds of
attachments.
JavaMail can be used for mailbox-related operations like fetching mails,
filtering them etc.
The basic configuration for one mailbox can look something like this:
{code:xml}
<document>
<entity processor="MailEntityProcessor" user="[email protected]"
password="something" host="imap.gmail.com" protocol="imaps"
folder="test1"/>
</document>
{code}
- This can be enhanced with timeouts, list to be read from a file, folder
filters, delta import etc.
Remaining Estimate: 24h (was: 120h)
Original Estimate: 24h (was: 120h)
> Enable importing of mails into a solr index through DIH.
> --------------------------------------------------------
>
> Key: SOLR-934
> URL: https://issues.apache.org/jira/browse/SOLR-934
> Project: Solr
> Issue Type: New Feature
> Components: contrib - DataImportHandler
> Affects Versions: 1.4
> Reporter: Preetam Rao
> Assignee: Shalin Shekhar Mangar
> Fix For: 1.4
>
> Attachments: SOLR-934.patch, SOLR-934.patch
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.