[ 
https://issues.apache.org/jira/browse/SOLR-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12659046#action_12659046
 ] 

Shalin Shekhar Mangar commented on SOLR-934:
--------------------------------------------

Thanks for this Preetam, looks great!

A few suggestions:
# Use the Lucene code style -- you can get the code style settings for 
Eclipse/IDEA from http://wiki.apache.org/solr/HowToContribute
# Let us use the Java variable naming convention for the fields, e.g. 
sent_date becomes sentDate
# I don't think we need sent_date_display; people can always format the 
date and display it as they want
# All the attributes for the entity processor should be templatized, e.g. 
user="${dataimporter.request.user}" and so on. You'd need to use 
context.getVariableResolver().replaceTokens(attr)
# The Profile class looks unnecessary. The values can be stored directly as 
private variables
# Attachment names can be another multi-valued field
# Exceptions thrown while connecting must be propagated so that users know 
why the connection is failing.
# For delta imports, we can just provide an olderThan and newerThan syntax. 
That should be enough
# Streaming is recommended instead of calling folder.getMessages(). We can use 
getMessages(int start, int end) and the batchSize can be a configurable 
parameter with some sane default.
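
To illustrate the batching point above, here is a minimal sketch of how the 
message ranges could be computed. The class and method names are hypothetical 
and not part of the patch; it only assumes that JavaMail message numbers are 
1-based and that getMessages(int start, int end) takes an inclusive range.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of SOLR-934.patch. */
class MessageBatches {

    /**
     * Splits the message numbers 1..messageCount into inclusive
     * {start, end} pairs of at most batchSize messages each.
     * Each pair is meant to be passed to Folder.getMessages(start, end).
     */
    static List<int[]> ranges(int messageCount, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<int[]> out = new ArrayList<int[]>();
        // long arithmetic avoids int overflow for very large counts or batch sizes
        for (long start = 1; start <= messageCount; start += batchSize) {
            int end = (int) Math.min(start + batchSize - 1, (long) messageCount);
            out.add(new int[] {(int) start, end});
        }
        return out;
    }
}
```

The entity processor could then walk these pairs and call 
folder.getMessages(range[0], range[1]) for one pair at a time, so only 
batchSize messages are held at once. For example, 5 messages with a batchSize 
of 2 give the ranges 1-2, 3-4 and 5-5.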

Support for recursive folders will be awesome.

> Enable importing of mails into a solr index through DIH.
> --------------------------------------------------------
>
>                 Key: SOLR-934
>                 URL: https://issues.apache.org/jira/browse/SOLR-934
>             Project: Solr
>          Issue Type: New Feature
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Preetam Rao
>            Priority: Minor
>             Fix For: 1.4
>
>         Attachments: SOLR-934.patch
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> Enable importing of mails into solr through DIH. Take one or more mailbox 
> credentials, download and index their content along with the content from 
> attachments.
> The folders to fetch can be made configurable based on various criteria.
> Apache Tika can be used for extracting content from different kinds of 
> attachments.
> JavaMail can be used for mail box related operations like fetching mails, 
> filtering them etc.
> The basic configuration for one mail box can look something like this:
> <document>
>    <entity processor="org.apache.solr.handler.dataimport.MailEntityProcessor"
>  user="[email protected]"
> password="something"
> host="imap.gmail.com"
> protocol="imaps"
> folder="test1"/>
> </document>
> - This can be enhanced with timeouts, list to be read from a file, folder 
> filters, delta import etc.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
