Your last comment made me think. I would like all the HTML-encoded parts like &Eacute;, &eacute;, &rsquo; etc. to be transformed into real UTF-8 characters (É, é and ’), as the feed should be UTF-8.
Any tips here?

On 01/02/2010, Christian Brabandt <[email protected]> wrote:
> On Mon, February 1, 2010 3:10 pm, bw wrote:
>> I am looking for a way to remove the CDATA and only get the text.
>> CURRENT:
>> <add>
>> <doc>
>> <some_title>My title</some_title>
>> <content><![[CDATA[
>> <p>The <strong>keyword</strong> is nice to have but is not needed to
>> include in a solr feed</p><p><table cellspacing="2" cellpadding="2"
>> border="1" width="100%"><tbody><tr><td>&Eacute;tape 1 :</td></tr>
>> ]]></content>
>> </doc>
>> <doc>
>> ....
>> </doc>
>> </add>
>>
>> WANTED:
>> <add>
>> <doc>
>> <some_title>My title</some_title>
>> <content>The keyword is nice to have but is not needed to
>> include in a solr feed</content>
>> </doc>
>> <doc>
>> ....
>> </doc>
>> </add>
>>
>> any vim tricks to do this?
>
> If the start and end pattern are always in a separate line, you could
> possibly use something like this:
> :g/\V<![[CDATA[/+,/\V]]>/-s/<\_[^>]*>//g
> followed by an additional
> :%s/\V<![[CDATA[\|]]>//
> to remove the remaining <![[CDATA start and end delimiters.
>
> Alternatively, you could use something like
> :%s/\V<![[CDATA[\_.\{-}]]/\=substitute(submatch(0),
> '\(<[^>]*>\)\|\(^\V![[CDATA[\)\|\(\V]]\$\)', '', 'g')/
> (1 line, barely tested, should work in your example case).
>
> Nevertheless, both leave the &Eacute;tape 1 : parts in your text. So
> you might be able to put the expression
> :s/&[^;]*;//
> into the previous expression, which would then look like this:
> %s/\V<![[CDATA[\_.\{-}]]/\=substitute(submatch(0),
> '\(<[^>]*>\)\|\(^\V![[CDATA[\)\|\(\V]]\$\)\|\m\(&[^;]*;\)', '', 'g')/
> and should work. However, I have it only barely tested.
>
> regards,
> Christian
>
> --
> You received this message from the "vim_use" maillist.
> For more information, visit http://www.vim.org/maillist.php

--
[Bb](astia{2}n)?\s?[Ww](ak{2}ie)?$
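
For the entity question at the top of this message, one possible approach is to run a sub-replace-expression over the buffer instead of deleting the entities. This is only a minimal, untested sketch: the name g:entity_map is made up for illustration, the table holds just the three entities mentioned above and would need extending, and it assumes 'encoding' is utf-8 so that nr2char() yields UTF-8 characters.

```vim
" Hypothetical lookup table: entity name -> literal UTF-8 character.
" Only the three entities from this thread are listed; extend as needed.
let g:entity_map = {'Eacute': 'É', 'eacute': 'é', 'rsquo': '’'}

" Replace each named entity found in the table; entities that are not
" in the table are put back unchanged via the submatch(0) default.
:%s/&\(\a\+\);/\=get(g:entity_map, submatch(1), submatch(0))/g

" Decimal numeric references such as &#233; can be converted directly.
:%s/&#\(\d\+\);/\=nr2char(submatch(1))/g

" Make sure the file is written out as UTF-8.
:set fileencoding=utf-8
```

Leaving unknown entities untouched, rather than deleting them as :s/&[^;]*;// does, makes it easy to search for whatever is left with /&\a\+; and add those names to the table.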
