Hello everyone.

I have created a collection and indexed mails from a Gmail mailbox. However, only plain text gets indexed: neither HTML-formatted mail content nor attachments are indexed.

To index mails, I have added the following libs to solrconfig.xml:

<lib dir="${solr.install.dir:../../..}/contrib/extraction/lib" regex=".*\.jar" />
<lib dir="${solr.install.dir:../../..}/dist/" regex="solr-cell-\d.*\.jar" />

I created mail-data-config.xml as below:

<dataConfig>
  <document>
      <!--
        Note - In order to index attachments, set processAttachement="true" and drop
        Tika and its dependencies to example-DIH/solr/mail/lib directory
       -->
      <entity processor="MailEntityProcessor" user="xx...@gmail.com"
            password="xxxxxx" host="imap.gmail.com" protocol="imaps"
            fetchMailsSince="2018-01-31 00:00:00" batchSize="20" folders="inbox" processAttachement="true" name="mail_entity"/>
  </document>
</dataConfig>

I also added the following to solrconfig.xml:

  <requestHandler name="/dataimport" class="solr.DataImportHandler">
    <lst name="defaults">
      <str name="config">mail-data-config.xml</str>
    </lst>
  </requestHandler>

Thanks in advance for your support :)

--
Dimitris Kardarakos
