Re: remove and clean CDATA out of xml

Tony Mechelynck Mon, 01 Feb 2010 07:04:01 -0800

On 01/02/10 15:10, bw wrote:

Hello,


I have a big xml solr feed out of my content management system that
includes wysiwyg html tags inside CDATA tags.

I am looking for a way to remove the CDATA and only get the text.
CURRENT:
<add>
   <doc>
      <some_title>My title</some_title>
         <content><![[CDATA[
<p>The<strong>keyword</strong>  is nice to have but is not needed to
include in a solr feed</p><p><table cellspacing="2" cellpadding="2"
border="1" width="100%"><tbody><tr><td>&#201;tape 1&nbsp;:</td></tr>
]]></content>
   </doc>
   <doc>
      ....
   </doc>
</add>

WANTED:
<add>
   <doc>
      <some_title>My title</some_title>
         <content>The keyword is nice to have but is not needed to
include in a solr feed</content>
   </doc>
   <doc>
      ....
   </doc>
</add>

any vim tricks to do this?

thx

That's a hard one. I think you would have to write an ad-hoc function,
using search() and maybe :mark, unless you always have a linebreak after
<![[CDATA[ and another one before the corresponding ]]>, in which case
the following (untested) might work:


        1
        %g/<!\[\[CDATA\[/.+1;/]]>/-1s/<.\{-}>//g
        %s/<!\[\[CDATA\[\|]]>//

but only if you have no other ]]>
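
If the linebreak assumption does not hold, a single multi-line
substitute may do instead of a search()-based function. The following is
only a sketch and has not been run against the real feed. It assumes
that every CDATA section is closed by the first ]]> after it, and that
no tag contains a literal > inside an attribute value. The optional
\[\? is an addition so that both the feed's <![[CDATA[ and the standard
<![CDATA[ are matched.

```vim
" Sketch only, not tested on the real feed.
" The pattern captures everything between the CDATA opener and the first
" following ]]>, across line breaks (\_. also matches end-of-line).
" The replacement expression deletes every <...> tag from the captured
" text and drops the CDATA delimiters themselves.
%s/<!\[\[\?CDATA\[\(\_.\{-}\)]]>/\=substitute(submatch(1), '<\_[^>]*>', '', 'g')/g
```

This only deletes the tags: words that were separated by nothing but a
tag run together ("Thekeyword"), and entities such as &nbsp; and &#201;
are left in place, so a further pass would be needed to get the exact
WANTED text.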


Best regards,
Tony.
--
hundred-and-one symptoms of being an internet addict:
253. You wait for a slow loading web page before going to the toilet.

--
You received this message from the "vim_use" maillist.
For more information, visit http://www.vim.org/maillist.php
