I would like to start using DIH (DataImportHandler) to index some RSS feeds and mail folders.

To get started I tried the RSS example from the wiki but, as it stands,
Solr complains about the missing id field. After some experimenting I
found two ways to fill the id:

- <copyField source="link" dest="id"/> in schema.xml
This works but isn't very flexible. Perhaps I have other types of
records with a real id or a multivalued link-field. Then this solution
would break.

- Changing the id field to type "uuid"
Again I would like to keep real ids where I have them and not a random UUID.

What didn't work, but looks like potentially the best solution, is to
fill the id in my data-config by using the link twice:
  <field column="link"         xpath="/RDF/item/link" />
  <field column="id"           xpath="/RDF/item/link" />
This would keep the definition local to this single data source, but I
don't get any docs and there is no error message either. No trace of any
inserts whatsoever. Is it possible to fill the id that way?
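
One variant that may be worth trying is to read the link once and derive
the id from it with a TemplateTransformer instead of a second xpath. This
is an untested sketch, not a confirmed fix: the entity name "rss" and the
feed URL are made-up placeholders, and it is only an assumption that this
sidesteps the problem described above.

  <entity name="rss"
          processor="XPathEntityProcessor"
          url="http://example.com/feed.rdf"
          forEach="/RDF/item"
          transformer="TemplateTransformer">
    <!-- link is extracted from the feed once -->
    <field column="link" xpath="/RDF/item/link" />
    <!-- id is built from the extracted link, not from a second xpath -->
    <field column="id"   template="${rss.link}" />
  </entity>

Since the template lives inside the entity, other sources with real ids
would not be affected.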

Another question, regarding MailEntityProcessor.
I found this example:
<document>
   <entity processor="MailEntityProcessor"
           user="someb...@gmail.com"
           password="something"
           host="imap.gmail.com"
           protocol="imaps"
           folders="x,y,z"/>
</document>

But what is the dataSource (and the tag enclosing <document>)? That is,
what would a minimal but complete data-config.xml look like to index
mails from an IMAP server?
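
A sketch of what such a file might look like, assuming the root element
is <dataConfig> as in the wiki's RSS example, and assuming that
MailEntityProcessor needs no <dataSource> element because it connects to
the mail server itself. Untested; the entity name "mail" and all account
details are placeholders.

<dataConfig>
   <!-- no dataSource element: assumed unnecessary for MailEntityProcessor -->
   <document>
      <entity name="mail"
              processor="MailEntityProcessor"
              user="user@example.com"
              password="something"
              host="imap.example.com"
              protocol="imaps"
              folders="x,y,z"/>
   </document>
</dataConfig>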

And finally, is it possible to combine the definitions for several
RSS-Feeds and Mail-accounts into one data-config? Or do I need a
separate config file and request handler for each of them?
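
What I have in mind is something like the following, with one root
entity per source under a single <document>. This is an untested sketch:
the entity names, URL and account details are placeholders, and the
dataSource type may need to be HttpDataSource on older releases.

<dataConfig>
   <!-- used by the RSS entity only -->
   <dataSource type="URLDataSource" />
   <document>
      <!-- first source: an RSS feed -->
      <entity name="rss"
              processor="XPathEntityProcessor"
              url="http://example.com/feed.rdf"
              forEach="/RDF/item">
         <field column="link" xpath="/RDF/item/link" />
      </entity>
      <!-- second source: an IMAP account -->
      <entity name="mail"
              processor="MailEntityProcessor"
              user="user@example.com"
              password="something"
              host="imap.example.com"
              protocol="imaps"
              folders="x,y,z"/>
   </document>
</dataConfig>

If this works, a single source could presumably be imported with
/dataimport?command=full-import&entity=rss. As far as I understand,
full-import cleans the index by default, so clean=false would probably
be needed when importing the entities one at a time.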

-Michael
