That did the trick. Thanks.
Bill
$Bill Luebkert wrote:
Bill Platt wrote:
Hello,
I have included a section of code below
that is driving me nuts.
If I don't run the substitution operations,
then I can successfully extract the URL
and the embedded anchor text from
$parsed_html.
Once I include the substitution operations,
I can no longer extract the same results.
Even though the substituted text looks
correct, I cannot see why any combination of
the substitution operations breaks my code.
Can you offer any suggestions?
if($parsed_html =~ m/href/)
{
$parsed_html =~ s/\s+/ /gs;
$parsed_html =~ s/>/">/gs;
# Note: the s/>/">/ line above could cause problems later, since it also turns every </A> into </A">.
$parsed_html =~ s/=http/="http/gis;
$parsed_html =~ s/"+/"/gs;
$parsed_html =~ s/'"/'/gs;
$_ = "$parsed_html";
@urlmatch = (@urlmatch,$2,$4) while m{
< \s*
A \s+ HREF \s* = \s* (["']) (.*?) (["'])
\s* > \s* (.*?) \s* <\/a \s* >
# Note: after the substitutions the closing tag is </A">, so there is a " before the last > that you will need to account for.
}gsix;
print "0=$urlmatch[0]<BR>1=$urlmatch[1]<BR>2=$urlmatch[2]<BR>";
print "3=$urlmatch[3]<BR>4=$urlmatch[4]<BR>5=$urlmatch[5]<BR>";
print "s0=$0<BR>s1=$1<BR>s2=$2<BR>s3=$3<BR>s4=$4<BR>s5=$5<BR>";
print "$_<BR><HR>$parsed_html<BR><HR>";
}
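A small standalone check (not part of the original exchange; the sample string and variable names are made up for illustration) shows what goes wrong. The blanket `s/>/">/` step also rewrites the closing tag, so a pattern ending in `<\/a\s*>` can no longer match, while the same pattern with an optional quote before the final `>` can.

```perl
use strict;
use warnings;

# Hypothetical sample anchor, already quoted, so only the '>' substitution matters.
my $html = '<A HREF="http://www.fubar.com/">URL</A>';

# The blanket substitution from the original code: every '>' becomes '">'.
(my $munged = $html) =~ s/>/">/g;
$munged =~ s/"+/"/g;    # collapse the doubled quote after the HREF value
print "$munged\n";      # <A HREF="http://www.fubar.com/">URL</A">

# Closing tag as written in the original regex: no room for the stray quote.
my $strict_close = qr{<\s*A\s+HREF\s*=\s*(["'])(.*?)(["'])\s*>\s*(.*?)\s*</a\s*>}is;

# Same pattern, but allowing an optional quote before the final '>'.
my $tolerant_close = qr{<\s*A\s+HREF\s*=\s*(["'])(.*?)(["'])\s*>\s*(.*?)\s*</a"?\s*>}is;

my $strict_ok   = $munged =~ $strict_close   ? 1 : 0;
my $tolerant_ok = $munged =~ $tolerant_close ? 1 : 0;
print "original closing pattern matches: $strict_ok\n";     # 0
print "tolerant closing pattern matches: $tolerant_ok\n";   # 1
```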
my @urlmatch;
my $parsed_html =
"<A HREF=http://www.fubar.com/>URL</A>\n<A HREF=http://www.fubar2.com/>URL2</A>\n";
if ($parsed_html =~ m/href/i) {
$parsed_html =~ s/\s+/ /gs;
$parsed_html =~ s/>/">/gs;
$parsed_html =~ s/=http/="http/gis;
$parsed_html =~ s/"+/"/gs;
$parsed_html =~ s/'"/'/gs;
$_ = $parsed_html;
print "\$_=$_\n";
while ( # note: I added "* before the final > of the RE to allow for the stray quote (or just drop the \s*> part)
/<\s*A\s+HREF\s*=\s*(["'])(.*?)(["'])\s*>\s*([^<]*)\s*<\/a"*\s*>/gis) {
# print $n variables out:
for (1..9) {
eval "print '<BR>$_=', \$$_, \"\\n\" if defined \$$_";   # e.g. print '<BR>2=', $2, "\n" if defined $2
}
}
}
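A different approach, sketched here as an alternative rather than what was suggested in the thread: quote only the bare HREF values instead of rewriting every `>`, so `</A>` is left alone and the original closing pattern keeps working. It assumes an unquoted HREF value never contains whitespace, quotes or `>` (that is what the `[^\s"'>]+` class encodes), and the sample input is hypothetical. The loop collects URL/text pairs into `@urlmatch` the way the original postfix `while` intended. For real-world pages, an HTML parsing module such as HTML::TokeParser is likely to be more robust than any regex.

```perl
use strict;
use warnings;

# Hypothetical sample: one bare HREF value and one already quoted.
my $parsed_html = qq{<A HREF=http://www.fubar.com/>URL</A>\n}
                . qq{<a href="http://www.fubar2.com/">URL2</a>\n};

$parsed_html =~ s/\s+/ /g;    # collapse whitespace, as in the original

# Quote bare HREF values only; '>' characters elsewhere are left untouched.
# Assumption: a bare value is a run of characters other than whitespace, quotes and '>'.
$parsed_html =~ s/(\bHREF\s*=\s*)([^\s"'>]+)/$1"$2"/gi;

# Collect (URL, anchor text) pairs, like the original @urlmatch loop.
my @urlmatch;
while ($parsed_html =~ m{
    <\s*A\s+HREF\s*=\s*(["'])(.*?)\1    # URL inside matching quotes
    \s*>\s*(.*?)\s*</a\s*>              # anchor text up to the closing tag
}gisx) {
    push @urlmatch, $2, $3;
}

for (my $i = 0; $i < @urlmatch; $i += 2) {
    print "URL=$urlmatch[$i] TEXT=$urlmatch[$i + 1]\n";
}
```

Because the closing tag is never rewritten, no special handling for a stray `"` is needed in the pattern.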
_______________________________________________
Perl-Unix-Users mailing list
[email protected]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs