Morgan Morgan wrote:
> I can't seem to find a way to do this.. i have a bunch of html files
> that i just need to remove from the <!DOCTYPE to the <BODY> tag on the
> top then i need to remove from </body> to </html> on the bottom.
>
> i looked at gsub and i'm learning regular expressions but i can't seem
> to figure out how they work. so far i've been able to figure out how to
> kill single words and single letters but not whole blocks of letters and
> words.
>
> it's mildly frustrating.
>
> well if anyone can help it would be greatly appreciated. i'm off to my
> regex book.

Your regex book will be the best help, but here's a clue: I think you're going about it inside-out. It would probably be easiest to extract the entire <body> element rather than strip away everything around it. It's relatively simple to write a regex that covers most cases, but if you have to cover absolutely every valid case, you may want to use Nokogiri, Hpricot, or JavaScript DOM manipulation instead.

>
> thanks in advanced.

Best,
--
Marnen Laibow-Koser
http://www.marnen.org
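
To make the "extract the <body> element" idea concrete, here is a minimal sketch in plain Ruby. It is only an illustration of the regex route that covers most cases, not a full HTML parser. The constant BODY_RE, the helper extract_body, and the sample markup are hypothetical names made up for this example. It assumes one <body> element per file and no literal ">" inside the tag's attributes.

```ruby
# Hypothetical helper, not from the original thread: returns whatever sits
# between the opening <body ...> tag and the closing </body> tag.
#
#   i  -> case-insensitive, so <BODY> and <body> both match
#   m  -> in Ruby, lets "." match newlines, so the match can span lines
#   .*? is non-greedy: it stops at the first </body> it finds
BODY_RE = /<body\b[^>]*>(.*?)<\/body\s*>/im

def extract_body(html)
  match = BODY_RE.match(html)
  # nil when the input has no <body> element at all
  match ? match[1].strip : nil
end

# Made-up sample document, used only to exercise the helper.
sample = <<~HTML
  <!DOCTYPE html>
  <html>
  <head><title>Example</title></head>
  <BODY class="page">
  <p>Hello</p>
  </body>
  </html>
HTML

puts extract_body(sample)
```

For real files you could read each one with File.read, pass the string to extract_body, and write the result back with File.write. If the markup is messy enough that this pattern misses cases, that is the point at which a real parser such as Nokogiri becomes the safer choice.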

