Re: Need help on getting HTML content

Sebastian Nagel Fri, 16 Dec 2016 07:58:28 -0800

Hi,

the only way is to serialize the DOM subtree below the <math> element
back to HTML, save that HTML string in the parse metadata, and then write
it to the index as an extra field via an indexing filter.


See, e.g., org.apache.nutch.util.DomUtil.saveDom(OutputStream, Element)
for how to "serialize" a DOM subtree.
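
To illustrate the serialization step, here is a minimal sketch that uses
only the JDK's javax.xml.transform API rather than Nutch's DomUtil. The
class name MathSubtreeSerializer and its methods are hypothetical, and the
tag-name lookup assumes lower-case element names; depending on how the HTML
parser is configured, the DOM may report "MATH" instead.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Nutch: turns a DOM subtree back into markup.
final class MathSubtreeSerializer {

  /** Serializes the subtree rooted at the given node to a markup string. */
  static String serialize(Node node) {
    try {
      // An identity transformer copies the DOM subtree to the output unchanged.
      Transformer t = TransformerFactory.newInstance().newTransformer();
      t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
      StringWriter out = new StringWriter();
      t.transform(new DOMSource(node), new StreamResult(out));
      return out.toString();
    } catch (Exception e) {
      throw new IllegalStateException("Could not serialize DOM subtree", e);
    }
  }

  /** Returns the markup of the first <math> element, or null if there is none. */
  static String firstMathMarkup(Document doc) {
    // Assumption: element names are lower-case in this DOM.
    NodeList nodes = doc.getElementsByTagName("math");
    return nodes.getLength() == 0 ? null : serialize(nodes.item(0));
  }

  /** Parses a well-formed markup string; only used here to build a demo DOM. */
  static Document parse(String markup) {
    try {
      return DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(markup)));
    } catch (Exception e) {
      throw new IllegalStateException("Could not parse markup", e);
    }
  }

  public static void main(String[] args) {
    Document doc = parse("<p>Area: <math><mi>x</mi></math></p>");
    // Prints the markup of the <math> subtree, tags included.
    System.out.println(firstMathMarkup(doc));
  }
}
```

In a Nutch plugin, the resulting string would presumably be added to the
parse metadata from an HTML parse filter (under a key of your choosing) and
then copied to a document field by an indexing filter, as described above.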

Best,
Sebastian

On 12/16/2016 07:27 AM, [email protected] wrote:
> Hi,
> 
> 
> For a particular tag (<math>), I need to save the entire HTML of the tag.
> 
> Right now I am only able to save the text content, via getText() called in
> HTMLParser.java.
> 
> I have not found a way to store the HTML content.
> 
> 
> Please share your thoughts on this.
> 
> [math tag.png]
> 
> 
> Thanks in advance,
> 
> -Ashok.
