[ 
https://issues.apache.org/jira/browse/SOLR-934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Preetam Rao updated SOLR-934:
-----------------------------

    Attachment: SOLR-934.patch

Rough-cut version. Tested with sample mails from my Gmail account.

- Indexes one folder from an IMAP account.
- Indexes attachments of various types, such as ppt, word, txt, and anything 
else that Tika supports.

TODO
--------
- recurse into folders
- performance tuning
- support filter criteria for folders
- support more than one mailbox
- support POP3

USAGE
----------

For each mail it creates a document with the following attributes:
    // Created fields
    // single valued
    "subject"
    "from"
    "sent_date"
    "sent_date_display"
    "X_Mailer"
    // multi valued
    "all_to"
    "flags"
    "content"
    "Attachement"
    
    // flag values
    "answered"
    "deleted"
    "draft"
    "flagged"
    "recent"
    "seen"
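
As a rough illustration of how the fields listed above might be declared on the Solr side, here is a minimal schema.xml sketch. Only the field names and the single/multi valued split come from the list above. The field types ("string", "text", "date") and the indexed/stored settings are assumptions, and the fragment assumes those types are defined as in the Solr example schema. It is not taken from the patch.

```xml
<!-- Hypothetical schema.xml fragment. Field names are copied from the list
     above; types and indexed/stored settings are assumptions. -->
<fields>
  <!-- single valued -->
  <field name="subject" type="text" indexed="true" stored="true"/>
  <field name="from" type="string" indexed="true" stored="true"/>
  <field name="sent_date" type="date" indexed="true" stored="true"/>
  <field name="sent_date_display" type="string" indexed="false" stored="true"/>
  <field name="X_Mailer" type="string" indexed="true" stored="true"/>
  <!-- multi valued -->
  <field name="all_to" type="string" indexed="true" stored="true" multiValued="true"/>
  <field name="flags" type="string" indexed="true" stored="true" multiValued="true"/>
  <field name="content" type="text" indexed="true" stored="true" multiValued="true"/>
  <field name="Attachement" type="text" indexed="true" stored="true" multiValued="true"/>
</fields>
```

With a declaration along these lines, "flags" would hold zero or more of the flag values listed above for each mail.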

COMPILE
-------------
Dependencies:
JavaMail API jar
Activation jar
Tika and its dependent jars

How should we go about adding these dependencies?





> Enable importing of mails into a solr index through DIH.
> --------------------------------------------------------
>
>                 Key: SOLR-934
>                 URL: https://issues.apache.org/jira/browse/SOLR-934
>             Project: Solr
>          Issue Type: New Feature
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Preetam Rao
>            Priority: Minor
>             Fix For: 1.4
>
>         Attachments: SOLR-934.patch
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> Enable importing of mails into Solr through DIH. Take the credentials for 
> one or more mailboxes, then download and index their content along with the 
> content of attachments.
> The folders to fetch can be made configurable based on various criteria.
> Apache Tika can be used for extracting content from different kinds of 
> attachments.
> JavaMail can be used for mailbox-related operations such as fetching and 
> filtering mails.
> The basic configuration for one mailbox can look something like this:
> <document>
>   <entity processor="org.apache.solr.handler.dataimport.MailEntityProcessor"
>           user="[email protected]"
>           password="something"
>           host="imap.gmail.com"
>           protocol="imaps"
>           folder="test1"/>
> </document>
> - This can be enhanced with timeouts, a list read from a file, folder 
> filters, delta import, etc.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
