[
https://issues.apache.org/jira/browse/SOLR-1003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12670099#action_12670099
]
Shalin Shekhar Mangar commented on SOLR-1003:
---------------------------------------------
No, not really. If HTML is embedded inside an XML document, it needs to be
encoded properly (replace '<' with '&lt;', etc.). The example described here does
not contain HTML; rather, it contains XML nodes inside the "xhtml:p" node,
mixed with text nodes. This is the same example that led to the discovery of
SOLR-999.
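
To illustrate the distinction, here is a minimal sketch in Python (not Solr code) using the standard-library xml.etree.ElementTree parser. The wrapping "doc" element, the namespace declaration, and the describe() helper are assumptions added only so the snippets parse on their own. It shows that escaped HTML arrives as a single text node, while the example in this issue produces real child elements mixed with text.

```python
import xml.etree.ElementTree as ET

# Assumed namespace declaration; the issue's snippet omits it.
NS = 'xmlns:xhtml="http://www.w3.org/1999/xhtml"'

# Case 1: HTML embedded as escaped text. The parser sees one text node.
escaped = f'<doc {NS}><xhtml:p>This is &lt;b&gt;bold&lt;/b&gt;</xhtml:p></doc>'

# Case 2: real child elements mixed with text nodes, as in this issue.
mixed = f'<doc {NS}><xhtml:p>This is <xhtml:b>bold</xhtml:b></xhtml:p></doc>'


def describe(xml_text):
    """Return the leading text and the child tag names of the first child of the root."""
    p = ET.fromstring(xml_text)[0]
    return p.text, [child.tag for child in p]


escaped_text, escaped_children = describe(escaped)
mixed_text, mixed_children = describe(mixed)

print(repr(escaped_text), escaped_children)
print(repr(mixed_text), mixed_children)
```

In the first case the markup is just character data and there is nothing to flatten; in the second, the text is spread across several nodes, which is the situation this issue is about.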
> XPathEntityprocessor must allow slurping all text from a given xml node and
> its children
> ----------------------------------------------------------------------------------------
>
> Key: SOLR-1003
> URL: https://issues.apache.org/jira/browse/SOLR-1003
> Project: Solr
> Issue Type: New Feature
> Components: contrib - DataImportHandler
> Affects Versions: 1.4
> Reporter: Noble Paul
> Priority: Minor
> Fix For: 1.4
>
> Attachments: SOLR-1003.patch
>
>
> Take an example:
> {code:xml}
> <xhtml:p>This text is
> <xhtml:b>bold</xhtml:b> and this text is
> <xhtml:u>underlined</xhtml:u>!
> </xhtml:p>
> {code}
> It may be useful to get all the text from all the tags in <xhtml:p>, ignoring
> the tag names.
> The configuration of the field may look like:
> {code:xml}
> <field column="para" xpath="/p" flatten="true"/>
> {code}
>
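
As a rough illustration of what "slurping all text" from a node and its children means, here is a minimal sketch in Python using xml.etree.ElementTree. It is not the attached patch and not DataImportHandler code. The namespace declaration, the flatten_text() and normalize() helpers, and the whitespace collapsing are assumptions made for the example; the actual flatten behaviour in Solr may treat whitespace differently.

```python
import xml.etree.ElementTree as ET

# The sample from the issue, with an assumed namespace declaration so it parses.
SAMPLE = """<xhtml:p xmlns:xhtml="http://www.w3.org/1999/xhtml">This text is
<xhtml:b>bold</xhtml:b> and this text is
<xhtml:u>underlined</xhtml:u>!
</xhtml:p>"""


def flatten_text(element):
    """Concatenate every text node under element, in document order, ignoring tag names."""
    return "".join(element.itertext())


def normalize(text):
    """Collapse runs of whitespace (an illustrative choice, not part of the issue)."""
    return " ".join(text.split())


flat = normalize(flatten_text(ET.fromstring(SAMPLE)))
print(flat)  # This text is bold and this text is underlined!
```

The point of the sketch is only that the text of the node and the text of all its descendants end up in one value, which is what the proposed flatten="true" attribute on the field is meant to request.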