[ https://issues.apache.org/jira/browse/SOLR-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12659046#action_12659046 ]
Shalin Shekhar Mangar commented on SOLR-934:
--------------------------------------------
Thanks for this, Preetam. Looks great!
A few suggestions:
# Use the Lucene code style; you can get code style settings for Eclipse/IntelliJ IDEA from
http://wiki.apache.org/solr/HowToContribute
# Let us use the Java variable naming convention for the fields, e.g. sent_date
becomes sentDate
# I don't think we need sent_date_display; people can always format the
date and display it as they want
# All the attributes for the entity processor should be templatized, e.g.
user="${dataimporter.request.user}" and so on. You'd need to resolve them with
context.getVariableResolver().replaceTokens(attr)
# The Profile class looks unnecessary. The values can be stored directly as
private fields
# Attachment names can go into another multi-valued field
# Exceptions thrown while connecting must be propagated so that users know why the
connection is failing.
# For delta imports, we can just provide an olderThan and newerThan syntax. That
should be enough.
# Streaming is recommended instead of calling folder.getMessages() for the whole folder. We can use
getMessages(int start, int end) and make batchSize a configurable
parameter with a sane default.
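A rough sketch of the streaming and delta-import points, in plain Java so it runs without JavaMail. The default batch size of 100, the yyyy-MM-dd HH:mm:ss format for newerThan/olderThan, and the helper names are all assumptions, not existing DIH or JavaMail API. The comment in main marks where folder.getMessages(start, end) would be called; JavaMail message numbers are 1-based and that range is inclusive.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

// Hypothetical helpers for a mail entity processor (not part of DIH or JavaMail).
class MailBatchSketch {
    // Assumed default used when no batchSize attribute is configured.
    static final int DEFAULT_BATCH_SIZE = 100;

    // Splits message numbers 1..messageCount into inclusive [start, end] ranges.
    static List<int[]> batches(int messageCount, int batchSize) {
        if (batchSize <= 0) {
            batchSize = DEFAULT_BATCH_SIZE;
        }
        List<int[]> ranges = new ArrayList<>();
        for (int start = 1; start <= messageCount; start += batchSize) {
            int end = Math.min(start + batchSize - 1, messageCount);
            ranges.add(new int[] {start, end});
        }
        return ranges;
    }

    // True if sent is strictly after newerThan and strictly before olderThan;
    // a null bound means that side of the window is open.
    static boolean inWindow(Date sent, Date newerThan, Date olderThan) {
        if (sent == null) {
            return false;
        }
        if (newerThan != null && !sent.after(newerThan)) {
            return false;
        }
        if (olderThan != null && !sent.before(olderThan)) {
            return false;
        }
        return true;
    }

    // Parses a newerThan/olderThan attribute value; the format is an assumption.
    static Date parseBound(String value) throws ParseException {
        if (value == null || value.isEmpty()) {
            return null;
        }
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        fmt.setLenient(false);
        return fmt.parse(value);
    }

    public static void main(String[] args) throws ParseException {
        for (int[] r : batches(250, 100)) {
            // With JavaMail, this is where folder.getMessages(r[0], r[1]) would go.
            System.out.println(r[0] + "-" + r[1]);
        }
        Date newerThan = parseBound("2008-12-01 00:00:00");
        Date sent = parseBound("2008-12-24 10:30:00");
        System.out.println(inWindow(sent, newerThan, null));
    }
}
```

Keeping only one batch of messages in play at a time is the point of the getMessages(start, end) suggestion; the date check can then be applied per message as each batch is processed.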
Support for recursive folders would be awesome.
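For recursive folders, a depth-first walk is probably enough. MailFolder below is a hypothetical stand-in for javax.mail.Folder: with JavaMail the children would come from Folder.list() and the holdsMessages flag from (getType() & Folder.HOLDS_MESSAGES) != 0. The folder names and the "/" separator are illustrative only; real servers use their own hierarchy separator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FolderWalkSketch {
    // Hypothetical stand-in for javax.mail.Folder: a full name, whether the
    // folder can hold messages, and its subfolders.
    static final class MailFolder {
        final String fullName;
        final boolean holdsMessages;
        final List<MailFolder> children;

        MailFolder(String fullName, boolean holdsMessages, MailFolder... children) {
            this.fullName = fullName;
            this.holdsMessages = holdsMessages;
            this.children = Arrays.asList(children);
        }
    }

    // Depth-first, pre-order: records each folder that can hold messages,
    // then descends into its subfolders.
    static void collect(MailFolder folder, List<String> out) {
        if (folder.holdsMessages) {
            out.add(folder.fullName);
        }
        for (MailFolder child : folder.children) {
            collect(child, out);
        }
    }

    public static void main(String[] args) {
        MailFolder root = new MailFolder("", false,
                new MailFolder("INBOX", true),
                new MailFolder("test1", true,
                        new MailFolder("test1/2008", true),
                        new MailFolder("test1/2009", true)));
        List<String> names = new ArrayList<>();
        collect(root, names);
        System.out.println(names); // [INBOX, test1, test1/2008, test1/2009]
    }
}
```

Folder filters could then be applied to the collected names before any messages are fetched.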
> Enable importing of mails into a solr index through DIH.
> --------------------------------------------------------
>
> Key: SOLR-934
> URL: https://issues.apache.org/jira/browse/SOLR-934
> Project: Solr
> Issue Type: New Feature
> Components: contrib - DataImportHandler
> Affects Versions: 1.4
> Reporter: Preetam Rao
> Priority: Minor
> Fix For: 1.4
>
> Attachments: SOLR-934.patch
>
> Original Estimate: 120h
> Remaining Estimate: 120h
>
> Enable importing of mail into Solr through DIH. Take one or more mailbox
> credentials, download the mail, and index its content along with the content of
> attachments.
> The folders to fetch can be made configurable based on various criteria.
> Apache Tika can be used for extracting content from different kinds of
> attachments.
> JavaMail can be used for mailbox operations like fetching mail,
> filtering it, etc.
> The basic configuration for one mailbox can look something like this:
> <document>
> <entity processor="org.apache.solr.handler.dataimport.MailEntityProcessor"
> user="[email protected]"
> password="something"
> host="imap.gmail.com"
> protocol="imaps"
> folder="test1"/>
> </document>
> - This can be enhanced with timeouts, a list of mailboxes read from a file, folder
> filters, delta imports, etc.
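To illustrate templatizing the attributes in a config like the one above, here is a minimal stand-in for ${...} resolution. It is not DIH's VariableResolver: the regex, the flat map of variables, and the empty-string fallback for unknown names are assumptions of this sketch. Inside the processor, context.getVariableResolver().replaceTokens(attr) would do this work.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemplateSketch {
    // Matches ${name}, capturing the name between the braces.
    private static final Pattern TOKEN = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replaces each ${name} with its value from vars; in this sketch,
    // names with no value are replaced by an empty string.
    static String replaceTokens(String template, Map<String, String> vars) {
        if (template == null) {
            return null;
        }
        Matcher m = TOKEN.matcher(template);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String value = vars.getOrDefault(m.group(1), "");
            // quoteReplacement keeps '$' or '\' in values from being interpreted.
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical request parameters, exposed under dataimporter.request.*
        Map<String, String> vars = new HashMap<>();
        vars.put("dataimporter.request.user", "someone");
        vars.put("dataimporter.request.password", "secret");

        String[] attrs = {
            "${dataimporter.request.user}",
            "${dataimporter.request.password}",
            "imap.gmail.com"
        };
        for (String attr : attrs) {
            System.out.println(replaceTokens(attr, vars));
        }
    }
}
```

Literal values such as host="imap.gmail.com" pass through unchanged, so templatizing every attribute costs nothing for configs that do not use variables.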