Solr 4.1.0

We've been using the DataImportHandler (DIH) to pull data in from a MySQL database for quite some time now. We now want to strip all the HTML content out of many fields using the HTMLStripTransformer (http://wiki.apache.org/solr/DataImportHandler#HTMLStripTransformer). Unfortunately, while it seems to work fine for "top-level" entities, we can't get it to work for sub-entities:
(Not our exact schema; reduced for example purposes.)

    <entity name="blocks" dataSource="database" transformer="HTMLStripTransformer"
            query="SELECT id as blockId, name as blockTitle, content as content FROM engagement_block">
        <field column="content" stripHTML="true" />  <!-- THIS WORKS! -->
        <entity name="blockReplies" dataSource="database" transformer="HTMLStripTransformer"
                query="SELECT br.other_content AS replyContent FROM block_reply br">
            <field column="other_content" stripHTML="true" />  <!-- THIS DOESN'T WORK! -->
        </entity>
    </entity>

We've tried placing the sub-entity's field definition at several different nesting levels in the XML, to no avail. I'm curious whether what we're attempting simply isn't supported, or whether we're just trying the wrong things.

Thanks,
Andy Pickler
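One difference between the two entities that may matter (an assumption to verify, not a confirmed diagnosis): in the top-level entity the alias `content as content` keeps the database column name, whereas the sub-entity aliases `other_content` to `replyContent` but its `<field>` still refers to `other_content`. If DIH's JDBC data source keys each row by the column label the query returns (the `AS` alias), the transformer would look for `other_content`, find nothing, and leave `replyContent` untouched. A sketch of the sub-entity with the `<field>` pointed at the alias instead:

```xml
<entity name="blockReplies" dataSource="database" transformer="HTMLStripTransformer"
        query="SELECT br.other_content AS replyContent FROM block_reply br">
    <!-- Assumption: column names the alias returned by the query, not the underlying DB column -->
    <field column="replyContent" stripHTML="true" />
</entity>
```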