bw wrote:
I am looking for a way to remove the CDATA and only get the text.
CURRENT:
<add>
  <doc>
     <some_title>My title</some_title>
        <content><![[CDATA[
<p>The <strong>keyword</strong> is nice to have but is not needed to
include in a solr feed</p><p><table cellspacing="2" cellpadding="2"
border="1" width="100%"><tbody><tr><td>&#201;tape 1&nbsp;:</td></tr>
]]></content>
  </doc>
  <doc>
     ....
  </doc>
</add>

WANTED:
<add>
  <doc>
     <some_title>My title</some_title>
        <content>The keyword is nice to have but is not needed to
include in a solr feed

what happens to the rest of the content here?

</content>
  </doc>
  <doc>
     ....
  </doc>
</add>

any vim tricks to do this?

You might be able to do something like

:%s/<!\[\[CDATA\[\(\%(\%(]]>\)\@!\_.\)\{-}\)]]>/\=substitute(submatch(1), '<[^>]*>', '', 'g')/g

(all on one line)
It doesn't post-process XML entities (so things like &#201; and &nbsp; are left as they are), but otherwise it worked on your example.
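If the entities matter, that part is probably easier outside of Vim. Here is a rough Python sketch of the same idea, not a tested solution for your real feed. The function name strip_cdata is made up for illustration, and it assumes the file fits in memory and that the CDATA holds HTML-style markup. It accepts both the "<![[CDATA[" opener from your sample and the standard "<![CDATA[", strips the tags, decodes entities with the standard library's html.unescape, and then re-escapes &, < and > so the result is still legal XML text (that last step is my assumption about what a Solr feed would want).

```python
import html
import re

# Matches "<![[CDATA[" as posted in the sample; the optional second "["
# means a standard "<![CDATA[" opener is matched as well.
CDATA_RE = re.compile(r"<!\[\[?CDATA\[(.*?)\]\]>", re.DOTALL)
TAG_RE = re.compile(r"<[^>]*>")


def strip_cdata(text):
    """Replace each CDATA section with its tag-free, entity-decoded text."""
    def repl(match):
        # Same tag stripping as the substitute() call in the Vim command.
        inner = TAG_RE.sub("", match.group(1))
        # Decode entities such as &#201; and &nbsp; (the latter becomes U+00A0).
        decoded = html.unescape(inner)
        # Assumption: re-escape &, < and > so the output stays valid XML text.
        return html.escape(decoded, quote=False)
    return CDATA_RE.sub(repl, text)


if __name__ == "__main__":
    import sys
    sys.stdout.write(strip_cdata(sys.stdin.read()))
```

Saved under some name of your choosing, it could be run as a filter from within Vim with :%!python3 yourscript.py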

-tim



--
You received this message from the "vim_use" maillist.
For more information, visit http://www.vim.org/maillist.php
