New topic: 

Stripping HTML, just cant get it right :P

<http://forums.realsoftware.com/viewtopic.php?t=38310>

Author: silvs
Posted: Thu Mar 24, 2011 10:20 pm
Joined: Fri Aug 27, 2010 12:28 am
Posts: 3

It's really starting to drive me up the wall.

I have tried the following:

the MBS plugin,
the current examples in this forum,
examples from RBU,
regex (as below, plus some alternates; this seems to be the closest, but some
HTML still causes me grief)
Code:
  rx.searchPattern = "<script.+?<\s?/script>"
  rx.searchPattern = "<style.+?<\s?/style>"
  rx.searchPattern = "<.+?>"
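
For illustration, here is a minimal sketch of how the three patterns above could be applied in sequence. It is written in Python rather than REALbasic, purely to show the pattern logic, and it rests on assumptions that are not in the post: that part of the trouble comes from "." not matching line breaks by default and from upper-case tags, that comments and entities also need handling, and that a helper called strip_html (a hypothetical name) does the work. A regex pass like this is only an approximation and will still miss some malformed markup.

```python
import html
import re

# IGNORECASE so <SCRIPT> also matches; DOTALL so "." also matches line breaks,
# which lets one pattern cover a script or style block spanning many lines.
FLAGS = re.IGNORECASE | re.DOTALL

# The three patterns from the post, applied in order.
# The comment pattern in front is an addition (assumption), not from the post.
PATTERNS = [
    r"<!--.*?-->",
    r"<script.+?<\s?/script>",
    r"<style.+?<\s?/style>",
    r"<.+?>",
]


def strip_html(source: str) -> str:
    """Hypothetical helper: return a rough plain-text version of an HTML string."""
    text = source
    for pattern in PATTERNS:
        # Replace each match with a space so neighbouring words do not fuse.
        text = re.sub(pattern, " ", text, flags=FLAGS)
    # Turn entities such as &amp; back into characters.
    text = html.unescape(text)
    # Collapse runs of whitespace (including line breaks) into single spaces.
    return " ".join(text.split())
```

The order matters: script and style blocks have to go before the generic tag pattern, otherwise their contents are left behind as text.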


A good example URL that breaks my code pretty much every time is any product
page from Amazon,
 e.g. http://www.amazon.com/exec/obidos/ASIN/ ... 2/sofa-20/

The tool is for people to check their own URLs and count keyword density, so I
need to be able to analyse the plain-text version of their pages. With the
current regex and some other trimming I seem to be able to cover most URLs, but
the Amazon example shows that it's not good enough.
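
To make the goal concrete, here is a small sketch of a keyword density count on already-stripped text. It is in Python for illustration only and assumes a definition the post does not spell out: density is the number of occurrences of a single-word keyword divided by the total word count, as a percentage. The name keyword_density is hypothetical.

```python
import re
from collections import Counter


def keyword_density(text: str, keyword: str) -> float:
    """Hypothetical helper: percentage of words in text that equal keyword."""
    # Lower-case everything and split into simple word tokens.
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0  # avoid dividing by zero on empty pages
    return 100.0 * Counter(words)[keyword.lower()] / len(words)
```

Any leftover script or markup fragments inflate the word count and push the density down, which is why the stripping step has to be reliable first.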

This is for xPlatform btw, so the Mac textutil is out of the question,
unfortunately.
_________________
RS 2011r1 Enterprise
MBP 17" i7: 10.7
AMD2 64bit: Ubuntu 11.04
MBP 15" i7: Windows7  