[
https://issues.apache.org/jira/browse/SOLR-934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Preetam Rao updated SOLR-934:
-----------------------------
Attachment: SOLR-934.patch
Rough cut version. Tested with sample mails from my gmail account.
- Indexes one folder from an IMAP account.
- Indexes attachments of various types such as ppt, word, txt, and anything
else that Tika supports.
TODO
--------
- recurse into folders
- performance tuning
- support filter criteria for folders
- support more than one mailbox
- support pop3
USAGE
----------
For each mail it creates a document with the following attributes:
// Created fields
// single valued
"subject"
"from"
"sent_date"
"sent_date_display"
"X_Mailer"
// multi valued
"all_to"
"flags"
"content"
"Attachement"
// flag values
"answered"
"deleted"
"draft"
"flagged"
"recent"
"seen"
COMPILE
-------------
Dependencies:
JavaMail API jar
Activation jar
Tika and its dependent jars
How should we go about adding these dependencies?
> Enable importing of mails into a solr index through DIH.
> --------------------------------------------------------
>
> Key: SOLR-934
> URL: https://issues.apache.org/jira/browse/SOLR-934
> Project: Solr
> Issue Type: New Feature
> Components: contrib - DataImportHandler
> Affects Versions: 1.4
> Reporter: Preetam Rao
> Priority: Minor
> Fix For: 1.4
>
> Attachments: SOLR-934.patch
>
> Original Estimate: 120h
> Remaining Estimate: 120h
>
> Enable importing of mails into solr through DIH. Take one or more mailbox
> credentials, download and index their content along with the content from
> attachments.
> The folders to fetch can be made configurable based on various criteria.
> Apache Tika can be used for extracting content from different kinds of
> attachments.
> JavaMail can be used for mailbox-related operations such as fetching and
> filtering mails.
> The basic configuration for one mail box can look something like this:
> <document>
>   <entity processor="org.apache.solr.handler.dataimport.MailEntityProcessor"
>           user="[email protected]"
>           password="something"
>           host="imap.gmail.com"
>           protocol="imaps"
>           folder="test1"/>
> </document>
> - This can be enhanced with timeouts, list to be read from a file, folder
> filters, delta import etc.
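
One way the entity attributes above could be turned into JavaMail session settings is sketched below. This is an illustration only, not the code in the patch: the class and method names (MailEntityConfigSketch, toSessionProperties) are hypothetical, and treating all five attributes as required is an assumption. The property keys follow JavaMail's "mail.<protocol>.<name>" convention, and the timeout keys are the ones documented for the IMAP provider. The sketch uses only the JDK, so the JavaMail jar is not needed to run it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

public class MailEntityConfigSketch {

    // Validates the entity attributes and maps them to JavaMail session properties.
    public static Properties toSessionProperties(Map<String, String> attrs, int timeoutMillis) {
        // Assumption: every attribute shown in the sample configuration is required.
        for (String required : new String[] {"user", "password", "host", "protocol", "folder"}) {
            String value = attrs.get(required);
            if (value == null || value.isEmpty()) {
                throw new IllegalArgumentException("missing entity attribute: " + required);
            }
        }
        String protocol = attrs.get("protocol");
        Properties props = new Properties();
        props.setProperty("mail.store.protocol", protocol);
        props.setProperty("mail." + protocol + ".host", attrs.get("host"));
        props.setProperty("mail." + protocol + ".user", attrs.get("user"));
        // Timeouts in milliseconds, so a slow server cannot stall the import.
        props.setProperty("mail." + protocol + ".connectiontimeout", String.valueOf(timeoutMillis));
        props.setProperty("mail." + protocol + ".timeout", String.valueOf(timeoutMillis));
        // The password is deliberately kept out of the properties; it would be
        // passed to Store.connect(host, user, password) when connecting.
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("user", "user@example.com");   // placeholder address
        attrs.put("password", "something");
        attrs.put("host", "imap.gmail.com");
        attrs.put("protocol", "imaps");
        attrs.put("folder", "test1");
        toSessionProperties(attrs, 20000)
                .forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Failing fast on a missing attribute keeps configuration mistakes from surfacing later as connection errors in the middle of an import.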