From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Andy Postulka
Sent: 21 October 2008 17:47
To: perl-win32-users@listserv.ActiveState.com
Subject: RegExp matching over multiple lines
> Hi to everyone,
>
> First off I hope I'm posting this to the right Mailing list.
> If not, please forgive me in advance. I just started using ActivePerl
>
> I'm having difficulty using RegExp to match/extract across several lines in an HTML file.
>
> I want to match and extract everything between a pair of HTML tags.
> This Match/Extraction can occur over multiple lines in an HTML file.
>
> I'm using the following RegExp to test the HTML file for a Match/Extraction
>
> /<div >(.*?)<\/div>/s
>
>
> Here is my Test HTML file.
>
> Case #1
> <div > Some text Number 1
> </div>
>
> Case #2
> <div > Some text Number 2 </div>
>
> Case #3
> <div >
> Some text Number 3
> </div>
>
> The only match that occurs is "Case #2" when the Tag pair occurs on the same line.
> So that tells me the RegExp works for a single line.
>
> When I split the Tag Pair over more than one line (Case #1 & Case #3) it fails to match.
> Since the "." does not match the "new line" characters I've used the "s" modifier, but that does not seem to work
>
> I've tried using different combinations of the "s" & "m" modifiers which didn't seem to help any
> I've also tried to "chomp()" each line to strip the new line characters from each line before applying the
> RegExp and use the different combination of the "sm" modifiers. That didn't work either.

That regular expression works fine on that data for me. There must be something you are not telling us. It would have helped if you had posted real code, e.g.

------------------------------------
use strict;
use warnings;

# Undefine the input record separator so that <DATA> returns
# the whole input as a single string (slurp mode).
local $/;
my $data = <DATA>;

# /s lets "." match newlines; /g finds every match in turn.
while ($data =~ /<div >(.*?)<\/div>/sg) {
    print "Found '$1'\n";
}

__DATA__
Case #1
<div > Some text Number 1
</div>

Case #2
<div > Some text Number 2 </div>

Case #3
<div >
Some text Number 3
</div>
------------------------------------

However, the best advice that I can give is that regular expressions are not a good tool to parse HTML, but HTML::Parser is.
Or better still one of the modules that is derived from it, e.g. HTML::TokeParser or HTML::TreeBuilder.

HTH

-- 
Brian Raven
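
To illustrate the HTML::TokeParser suggestion above, here is a minimal sketch. It assumes the module is installed (it is part of the HTML-Parser distribution) and uses a hypothetical inline copy of the three test cases from the question; the variable names are my own, not from the original post.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical test document, modelled on the three cases in the question.
my $html = <<'END_HTML';
Case #1
<div > Some text Number 1
</div>

Case #2
<div > Some text Number 2 </div>

Case #3
<div >
Some text Number 3
</div>
END_HTML

# Passing a scalar reference makes the parser read from the string
# rather than treating the argument as a file name.
my $parser = HTML::TokeParser->new(\$html)
    or die "Cannot create parser";

my @found;

# get_tag('div') skips ahead to the next <div> start tag and
# returns undef once there are no more.
while ($parser->get_tag('div')) {
    # get_trimmed_text('/div') returns the text up to the closing
    # </div>, with runs of whitespace (including newlines) collapsed
    # and leading/trailing whitespace removed.
    push @found, $parser->get_trimmed_text('/div');
}

print "Found '$_'\n" for @found;
```

Because the parser works on tokens rather than lines, it makes no difference whether the tag pair sits on one line or is spread over several. Note that this sketch only collects plain text; nested markup inside the div would need extra handling.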