I would like to start using DIH to index some RSS feeds and mail folders.

To get started, I tried the RSS example from the wiki, but as it stands Solr complains about the missing id field. After some experimenting I found two ways to fill the id:

- <copyField source="link" dest="id"/> in schema.xml
  This works but isn't very flexible. I may have other types of records with a real id, or a multivalued link field, and then this solution would break.

- Changing the id field to type "uuid"
  Again, I would like to keep real ids where I have them rather than use a random UUID.

What didn't work, but looks like potentially the best solution, is to fill the id in my data-config by using the link twice:

    <field column="link" xpath="/RDF/item/link" />
    <field column="id" xpath="/RDF/item/link" />

This would be a definition just for this single data source, but I don't get any docs (and no error message either). There is no trace of any inserts whatsoever. Is it possible to fill the id that way?

Another question, regarding MailEntityProcessor. I found this example:

    <document>
      <entity processor="MailEntityProcessor"
              user="someb...@gmail.com"
              password="something"
              host="imap.gmail.com"
              protocol="imaps"
              folders="x,y,z"/>
    </document>

But what is the dataSource (the tag enclosing document)? That is, what would a minimal but complete data-config.xml look like to index mails from an IMAP server?

And finally, is it possible to combine the definitions for several RSS feeds and mail accounts into one data-config? Or do I need a separate config file and request handler for each of them?

-Michael