RE: RegExp matching over multiple lines

Brian Raven Tue, 21 Oct 2008 10:16:09 -0700

From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
Andy Postulka
Sent: 21 October 2008 17:47
To: perl-win32-users@listserv.ActiveState.com
Subject: RegExp matching over multiple lines


> Hi to everyone,
> 
> First off I hope I'm posting this to the right Mailing list. 
> If not, please forgive me in advance. I just started using ActivePerl
> 
> I'm having difficulty using RegExp to match/extract across several
> lines in an HTML file.
> 
> I want to match and extract everything between a pair of HTML tags. 
> This Match/Extraction can occur over multiple lines in an HTML file.
>  
> I'm using the following RegExp to test the HTML file for a
> Match/Extraction
> 
>                 /<div >(.*?)<\/div>/s
> 
> 
> Here is my Test HTML file.
> 
> Case #1
> <div > Some text  Number 1
> </div> 
> 
> Case #2
> <div > Some text  Number 2 </div>
> 
> Case #3 
> <div >
>   Some text  Number 3
> </div>
> 
> The only match that occurs is "Case #2", when the Tag pair occurs on
> the same line.
> So that tells me the RegExp works for a single line.
> 
> When I split the Tag Pair over more than one line (Case #1 & Case #3)
> it fails to match.
> Since the "." does not match the "new line" characters I've used the
> "s" modifier, but that does not seem to work.
> 
> I've tried using different combinations of the "s" & "m" modifiers,
> which didn't seem to help any.
> I've also tried to "chomp()" each line to strip the new line
> characters from each line before applying the RegExp and use the
> different combination of the "sm" modifiers. That didn't work either.

That regular expression works fine on that data for me. There must be
something you are not telling us. It would have helped if you posted
real code, e.g.

------------------------------------
use strict;
use warnings;

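# Undefine the input record separator so that <DATA> returns the whole
# input as one string instead of one line at a time.
local $/;
my $data = <DATA>;
# /s lets "." match newlines; /g finds every <div >...</div> pair.
while ($data =~ /<div >(.*?)<\/div>/sg) {
    print "Found '$1'\n";
}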
__DATA__
Case #1
<div > Some text  Number 1
</div> 

Case #2
<div > Some text  Number 2 </div>

Case #3 
<div >
  Some text  Number 3
</div>
------------------------------------
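
One possible cause, and this is only a guess since the original code was
not posted, is that the file is being read one line at a time, so the
regular expression never sees both tags in the same string. The sketch
below contrasts the two approaches on the same test data. The variable
names and the in-memory filehandle are my own, for illustration only.

```perl
use strict;
use warnings;

# Sample input copied from the three test cases in the question.
my $html = <<'END_HTML';
Case #1
<div > Some text  Number 1
</div>

Case #2
<div > Some text  Number 2 </div>

Case #3
<div >
  Some text  Number 3
</div>
END_HTML

my $re = qr/<div >(.*?)<\/div>/s;

# Line by line: each match attempt sees a single line only, so a tag
# pair split over several lines can never match, with or without /s.
open my $fh, '<', \$html or die "Cannot open in-memory file: $!";
my @by_line;
while (my $line = <$fh>) {
    push @by_line, $1 if $line =~ $re;
}
close $fh;

# Whole text in one string: /s now lets "." run across the newlines.
my @slurped = $html =~ /$re/g;

printf "line by line: %d match(es)\n", scalar @by_line;
printf "slurped:      %d match(es)\n", scalar @slurped;
```

If that is what is happening, it would also explain why chomp() made no
difference: removing the newline still leaves one line per match attempt.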

However, the best advice that I can give is that regular expressions are
not a good tool to parse HTML, but HTML::Parser is. Or better still one
of the modules that is derived from it, e.g. HTML::TokeParser or
HTML::TreeBuilder.
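
As a rough illustration of the parser approach, here is a minimal sketch
using HTML::TokeParser on the same test data. It assumes the module is
installed (it ships with the HTML-Parser distribution, not with core
Perl), and the variable names are hypothetical. Note that
get_trimmed_text collapses runs of white space, so the extracted text is
not byte-for-byte what the regular expression captures.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Sample input copied from the three test cases in the question.
my $html = <<'END_HTML';
Case #1
<div > Some text  Number 1
</div>

Case #2
<div > Some text  Number 2 </div>

Case #3
<div >
  Some text  Number 3
</div>
END_HTML

# Passing a reference to a scalar makes the parser read from the string.
my $parser = HTML::TokeParser->new(\$html)
    or die "Cannot create parser";

my @texts;
# Skip forward to each <div> start tag, then collect the text up to the
# matching </div>, wherever the line breaks happen to fall.
while ($parser->get_tag("div")) {
    push @texts, $parser->get_trimmed_text("/div");
}

print "Found '$_'\n" for @texts;
```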

HTH

-- 
Brian Raven 

