Hi David,

I think there are two ways to do this:

1. By scripting:
Something like the function below should work.
If you need to handle all the special characters (about a hundred of them), the second way is probably better.

function StripTags pHtml -- returns the meaningful text from a web page
  local tRegex,tPrevText
  constant kHtml = "&eacute;,&agrave;,&ccedil;,&gt;,&lt;,&ecirc;,&egrave;,&copy;,&bull;,&#39;,&middot;,&amp;"
  constant kConvertedHtml = "é,à,ç,>,<,ê,è,©,•,',·,&"
  -----
  replace return with space in pHtml
  replace numtochar(13) with empty in pHtml
  replace tab with empty in pHtml
  -----
  put replacetext(pHtml,"(?Usi)<SCRIPT.*</SCRIPT>","") into pHtml
  put replacetext(pHtml,"(?Usi)<STYLE.*</STYLE>","") into pHtml -- also matches <style type="...">
  put replacetext(pHtml,"(?Usi)<\?.*\?>","") into pHtml
  -----
  replace "&nbsp;" with space in pHtml
  replace "<BR>" with return in pHtml
  replace "<p>" with return in pHtml
  -----
  put  "<[^><]*>" into tRegex
  put replacetext(pHtml,tRegex,"") into pHtml
  put replacetext(pHtml,tRegex,"") into pHtml
  -----
  repeat until tPrevText is pHtml
    put pHtml into tPrevText
    put replacetext(pHtml," +",space) into pHtml
    put replacetext(pHtml,"^ ","") into pHtml
  end repeat
  -----
  replace (space & return) with return in pHtml
  replace (return & space) with return in pHtml
  filter pHtml without empty
  -----
  replace "&quot;" with quote in pHtml
  repeat with i = 1 to the number of items of kHtml
    replace item i of kHtml with item i of kConvertedHtml in pHtml
  end repeat
  -----
  return pHtml
end StripTags
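
To try the function quickly, something like the following could go in a button script. This is only a minimal sketch: the sample HTML string and the mouseUp handler are made up for illustration and are not part of the script above.

```livecode
on mouseUp
  local tHtml
  -- Hypothetical sample input, only for trying the function out
  put "<html><body><p>Caf&eacute; &amp; bar</p><BR>Second line</body></html>" into tHtml
  -- Show the stripped text in a dialog
  answer StripTags(tHtml)
end mouseUp
```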

2. By placing the text into a field:
We discussed this approach some months ago, and it turned out (I think it was Richard who pointed it out) that the fastest way seemed to be to use a field in a substack without opening the substack (if I remember correctly :-)

function StripTags pHtml
  set the htmlText of fld "HtmlTemplate" of stack "HtmlConverter" to pHtml
  return the text of fld "HtmlTemplate" of stack "HtmlConverter"
end StripTags

Best Regards from Paris,
Eric Chatonet

On 7 Jan 2006, at 01:10, David Bovill wrote:

On 7 Jan 2006, at 23:30, Eric Chatonet wrote:

Hi David,

From the docs:

    Á    &Aacute;
    á    &aacute;
    Â    &Acirc;
    â    &acirc;
    ´    &acute;
    Æ    &AElig;
    æ    &aelig;
    À    &Agrave;
    à    &agrave;
    Å    &Aring;
    å    &aring;
    Ã    &Atilde;
    ã    &atilde;
    Ä    &Auml;
    ä    &auml;

And many others.


This is from the htmlText property - yes? But that requires me to set the htmlText of a field... which is not such fun for a parser :) Guess I will have to manually stick them all in an array?
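
If the entities do have to be handled by hand, one possible shape is a small lookup table walked in a loop. This is only a sketch under assumptions: the name DecodeEntities is made up, the table holds just a few of the pairs listed above, and it is stored as a list of "entity,character" lines rather than a true array so that upper- and lower-case entity names can be told apart with the caseSensitive property.

```livecode
function DecodeEntities pText
  local tTable, tPair
  -- Hypothetical lookup table: one "entity,character" pair per line.
  -- Only a few entries from the docs list are shown; extend as needed.
  put "&Aacute;,Á" & return & \
      "&aacute;,á" & return & \
      "&Acirc;,Â" & return & \
      "&acirc;,â" & return & \
      "&AElig;,Æ" & return & \
      "&aelig;,æ" into tTable
  -- Entity names such as &Aacute; and &aacute; differ only by case,
  -- so the replacement must be case-sensitive.
  set the caseSensitive to true
  repeat for each line tPair in tTable
    replace item 1 of tPair with item 2 of tPair in pText
  end repeat
  return pText
end DecodeEntities
```

Since the caseSensitive property is reset when the handler ends, the setting should not leak into the calling script.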

----------------------------------------------------------------------------------------------
http://www.sosmartsoftware.com/    [EMAIL PROTECTED]/


_______________________________________________
use-revolution mailing list
[email protected]
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution
