[Rails] Re: Removing extranious html

Marnen Laibow-Koser Wed, 07 Oct 2009 20:53:03 -0700

Morgan Morgan wrote:
> I can't seem to find a way to do this..  i have a bunch of html files
> that i just need to remove from the <!DOCTYPE to the <BODY> tag on the
> top then i need to remove from </body> to </html> on the bottom.
> 
> i looked at gsub and i'm learning regular expressions but i can't seem
> to figure out how they work.  so far i've been able to figure out how to
> kill single words and single letters but not whole blocks of letters and
> words.
> 
> it's mildly frustrating.
> 
> well if anyone can help it would be greatly appreciated. i'm off to my
> regex book.


Your regex book will be the best help, but here's a clue: I think you're 
going about it inside-out.  It would probably be easiest to extract the 
entire <body> element.  It's relatively simple to write a regex that 
will cover most cases, but if you have to cover absolutely every valid 
case, you may want to use Nokogiri, Hpricot, or JavaScript DOM 
manipulation instead.
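
As a rough illustration of the "extract the body" approach, here is a minimal sketch in plain Ruby. The names extract_body and BODY_PATTERN are hypothetical, made up for this example, and the pattern assumes a single <body> element whose attribute values contain no ">" character. It is the "covers most cases" kind of regex, not a full HTML parser.

```ruby
# Hypothetical pattern: captures everything between the opening and closing
# body tags. The i flag ignores case (<BODY> vs <body>); the m flag lets "."
# match newlines so the capture can span multiple lines.
BODY_PATTERN = %r{<body\b[^>]*>(.*?)</body\s*>}im

# Hypothetical helper: returns the inner contents of the first <body>
# element, or nil when no body element is found.
def extract_body(html)
  match = BODY_PATTERN.match(html)
  match && match[1].strip
end

sample = <<~HTML
  <!DOCTYPE html>
  <html>
  <head><title>Example</title></head>
  <BODY class="main">
  <p>Keep this.</p>
  </BODY>
  </html>
HTML

puts extract_body(sample)
```

Capturing what you want to keep means there is no need for two separate gsub calls to strip the top and bottom of each file. For input that this pattern does not handle, a real parser such as Nokogiri is the safer route.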

> 
> thanks in advanced.

Best,
--
Marnen Laibow-Koser
http://www.marnen.org
[email protected]
-- 
Posted via http://www.ruby-forum.com/.
