Hello everyone.
I have created a collection and indexed mails from a Gmail mailbox.
However, only plain-text content is indexed. Neither HTML-formatted
messages nor attachments are indexed.
To index mails, I have added the following libs to solrconfig.xml:
<lib dir="${solr.install.dir:../../..}/contrib/extraction/lib"
regex=".*\.jar" />
<lib dir="${solr.install.dir:../../..}/dist/"
regex="solr-cell-\d.*\.jar" />
I created mail-data-config.xml as follows:
<dataConfig>
<document>
<!--
Note - In order to index attachments, set
processAttachement="true" and drop
Tika and its dependencies to example-DIH/solr/mail/lib directory
-->
<entity processor="MailEntityProcessor" user="xx...@gmail.com"
password="xxxxxx" host="imap.gmail.com" protocol="imaps"
fetchMailsSince="2018-01-31 00:00:00" batchSize="20"
folders="inbox" processAttachement="true" name="mail_entity"/>
</document>
</dataConfig>
I also added the following to solrconfig.xml:
<requestHandler name="/dataimport" class="solr.DataImportHandler">
<lst name="defaults">
<str name="config">mail-data-config.xml</str>
</lst>
</requestHandler>
Thanks in advance for your support :)
--
Dimitris Kardarakos