Hi again,

On 2 Aug 08, at 16:31, H Baric wrote:

* Get only the text from a web page: no HTML tags, no formatting, etc.

LOL
This is a case that needs an additional code snippet, as I said in a previous email :-)

put StripTags(thePage) into field "The Page"
---------------------------------------------------------
function StripTags pHtml -- returns the meaningful text from a web page
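  local tRegex, tPrevText, i
  -- entities as they appear in the HTML source, and the characters they stand for
  -- "&amp;" must stay last so that "&amp;lt;" ends up as the text "&lt;", not "<"
  constant kHtml = "&eacute;,&agrave;,&ccedil;,&gt;,&lt;,&ecirc;,&egrave;,&copy;,&bull;,&#39;,&middot;,&amp;"
  constant kConvertedHtml = "é,à,ç,>,<,ê,è,©,•,',·,&"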
  -----
  replace return with space in pHtml
  replace numtochar(13) with empty in pHtml
  replace tab with empty in pHtml
  -----
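  -- drop whole blocks that are not page text (U = ungreedy, s = dot matches line breaks, i = ignore case)
  put replacetext(pHtml,"(?Usi)<SCRIPT.*</SCRIPT>","") into pHtml
  -- "<STYLE" without ">" so that <style type="text/css"> is removed too
  put replacetext(pHtml,"(?Usi)<STYLE.*</STYLE>","") into pHtml
  -- HTML comments, which may themselves contain "<" or ">"
  put replacetext(pHtml,"(?Us)<!--.*-->","") into pHtml
  put replacetext(pHtml,"(?Usi)<\?.*\?>","") into pHtml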
  -----
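  replace "&nbsp;" with space in pHtml
  -- line breaks and paragraphs become returns; also catches <br/>, <br /> and <p class="...">
  put replacetext(pHtml,"(?i)<br\s*/?>",return) into pHtml
  put replacetext(pHtml,"(?i)<p(\s[^>]*)?>",return) into pHtml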
  -----
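  -- remove every remaining tag; the second pass catches tags exposed by the first,
  -- e.g. "<a <b>>" becomes "<a >" after one pass and disappears after two
  put "<[^><]*>" into tRegex
  put replacetext(pHtml,tRegex,"") into pHtml
  put replacetext(pHtml,tRegex,"") into pHtml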
  -----
  repeat until tPrevText is pHtml
    put pHtml into tPrevText
    put replacetext(pHtml," +",space) into pHtml
    put replacetext(pHtml,"^ ","") into pHtml
  end repeat
  -----
  replace (space & return) with return in pHtml
  replace (return & space) with return in pHtml
  filter pHtml without empty
  -----
  replace "&quot;" with quote in pHtml
  repeat with i = 1 to the number of items of kHtml
    replace item i of kHtml with item i of kConvertedHtml in pHtml
  end repeat
  -----
  return pHtml
end StripTags
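The one-line call at the top assumes thePage already holds the HTML source of the page. Here is a minimal sketch of fetching it first, assuming a button script, a placeholder URL and a field named "The Page"; StripTags has to be in the same script or further along the message path:

```livecode
-- hypothetical button script: the URL and the field name are placeholders
on mouseUp
  local tPage
  -- blocking download; the result is empty when it succeeded
  put url "http://www.example.com/" into tPage
  if the result is not empty then
    answer "Could not load the page:" && the result
    exit mouseUp
  end if
  put StripTags(tPage) into field "The Page"
end mouseUp
```

The blocking "put url" form keeps the example short; for large pages you may prefer a non-blocking load so the interface stays responsive.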

Best regards from Paris,
Eric Chatonet.
----------------------------------------------------------------
Plugins and tutorials for Revolution: http://www.sosmartsoftware.com/
Email: [EMAIL PROTECTED]/
----------------------------------------------------------------


_______________________________________________
use-revolution mailing list
[email protected]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution
