It depends on how bad the table soup is, but I usually like to
start with some basic cleanup using find/replace with regular
expressions in DWMX (Dreamweaver MX).
Search for: </{0,1}(table|tr|td)([^>]*)>
Replace with: (nothing, leave the field empty)
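For anyone who wants to try the pattern outside Dreamweaver, here is a minimal sketch of the same search/replace in Python. The function name strip_table_tags and the sample markup are made up for illustration, and the IGNORECASE flag is an assumption added so that uppercase tags such as <TABLE> are caught too.

```python
import re

# Same pattern as in the search field above, compiled for Python's re module.
# Assumption: IGNORECASE is added here so <TABLE> and <Td> also match.
TAG_RE = re.compile(r"</{0,1}(table|tr|td)([^>]*)>", re.IGNORECASE)


def strip_table_tags(html: str) -> str:
    """Remove table/tr/td tags but keep whatever sits between them."""
    return TAG_RE.sub("", html)


# Hypothetical sample markup, not taken from any real page.
sample = '<table width="100%"><tr><td class="nav">Home</td><td>News</td></tr></table>'
print(strip_table_tags(sample))  # prints: HomeNews
```

One thing to watch for: as written, the pattern also matches any tag whose name merely starts with one of the listed names (for example <track>). Putting \b right after the group, as in </{0,1}(table|tr|td)\b([^>]*)>, is one way to avoid that.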
You can, of course, add more tag names in there depending on your situation,
with the obvious caveat that this only removes the tags themselves, not the
content in between opening/closing tags. So if you have, say, a block
of JavaScript:
<script type="text/javascript">
blahblah
</script>
Running a replace for </{0,1}(script)([^>]*)> will leave blahblah in place...
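The difference can be sketched in Python. The first substitution is the tags-only pattern from above; the second is a hypothetical variant (not something from the original tip) that matches from the opening tag through to the closing tag, so the script body goes too. The sample markup is invented, and the variant is still only a regex: it would stumble on a literal "</script>" inside a string in the script itself.

```python
import re

# Hypothetical page fragment with a script block between two paragraphs.
html = (
    "<p>Intro</p>\n"
    '<script type="text/javascript">\n'
    "blahblah\n"
    "</script>\n"
    "<p>Outro</p>"
)

# Tags only: the opening and closing tags go, the script body stays behind.
tags_only = re.sub(r"</{0,1}(script)([^>]*)>", "", html)

# Whole element (assumed variant): DOTALL lets "." cross newlines, and the
# non-greedy ".*?" stops at the first closing </script> tag.
whole_element = re.sub(
    r"<script\b[^>]*>.*?</script\s*>",
    "",
    html,
    flags=re.DOTALL | re.IGNORECASE,
)

print(repr(tags_only))
print(repr(whole_element))
```

Whether the same whole-element pattern works unchanged in Dreamweaver's find/replace depends on how its regex engine treats "." and newlines, so test it on a copy first.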
Anyway, your mileage may vary, and it will still need a good amount of
hand-cleaning...but at least it removes the tougher stains left on the tablecloth.
Patrick
________________________________
Patrick H. Lauke
Webmaster / University of Salford
http://www.salford.ac.uk
> -----Original Message-----
> From: Lea de Groot
> Sent: 21 July 2004 10:00
> To: [EMAIL PROTECTED]
> Subject: [WSG] technique of converting to tablefree layout
>
>
> What are people's preferred techniques for 'screen scraping' existing
> sites to get the text from a tag-soup table layout?
> When a page has copious links and such, simply copying the text from
> the browser doesn't always give enough content to be a useful quick
> method.
>
> Lea
> --
> Lea de Groot
> Elysian Systems - I Understand the Internet
> <http://elysiansystems.com/>
> Web Design, Usability, Information Architecture, Search Engine
> Optimisation
> Brisbane, Australia
> *****************************************************
> The discussion list for http://webstandardsgroup.org/
> See http://webstandardsgroup.org/mail/guidelines.cfm
> for some hints on posting to the list & getting help
> *****************************************************
>
>