Re: redundant extraction.

Jimmy S. Lim Wed, 26 Apr 2000 08:38:41 -0700

hi!

i would be very grateful if anyone could spare some time to help a newbie.

i wrote:

open(ABC, "TEST.html") or die "Can't open TEST: $!\n";
while ($line = <ABC>)
{
        if ($line =~ /hello\s\d\d\d/)
        {
                $hello = $&;
        }
        if ($line =~ /E\d\d\d-\d\d\d\d/ or $line =~ /E\d\d\d\d\d\d\d/)
        {
                $number = $&;
        }
        print "$number : $hello\n";
}

to extract some information from an external html file.

below are the contents of file TEST.html.

E1234567 The quick brown fox jumps over the lazy dog hello 123.

the quick brown E0404040 toad jumps over the lazy croc hello 456
the quick E0487474 brown duck jumps over the lazy cat hello 789
E040-4774 the quick brown chick jumps over the lazy pig hello 101

this is what i get:

E1234567 : hello 123
E1234567 : hello 123
E1234567 : hello 123
E0404040 : hello 456
E0487474 : hello 789
E040-4774 : hello 101

apparently, the information from the 1st line was printed 3 times. i guess this is because of the 2 carriage returns (blank lines) after the first line of TEST.html. if i cannot change the html file, how do i fix this problem?

i also tried another html file which didn't appear to have any carriage returns. i got a worse result, with 20 or more repeated entries extracted from the file. if it is not because of the carriage returns, what could be the problem?
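
One possible explanation for both cases, offered as a sketch rather than a definitive diagnosis: the loop prints on every line it reads, and the two variables keep whatever they matched last, so any line where the patterns fail (blank or not) prints the previous values again. The version below only prints when both patterns match on the current line. The helper name extract_pairs and the @lines array are hypothetical and stand in for reading TEST.html; the sample has two blank lines after the first line, as described above. It also uses capture groups and a single alternation instead of $& and two matches joined with "or", which is a swapped-in technique and not part of the original script.

```perl
use strict;
use warnings;

# Sample input copied from the TEST.html contents above (assumed layout:
# two blank lines after the first line).
my @lines = (
    "E1234567 The quick brown fox jumps over the lazy dog hello 123.\n",
    "\n",
    "\n",
    "the quick brown E0404040 toad jumps over the lazy croc hello 456\n",
    "the quick E0487474 brown duck jumps over the lazy cat hello 789\n",
    "E040-4774 the quick brown chick jumps over the lazy pig hello 101\n",
);

# extract_pairs is a hypothetical helper, not part of the original script.
sub extract_pairs {
    my @found;
    for my $line (@_) {
        # Declared inside the loop, so both start out undefined on every
        # line and cannot carry a value over from an earlier line.
        my ($hello)  = $line =~ /(hello\s\d\d\d)/;
        my ($number) = $line =~ /(E\d\d\d-\d\d\d\d|E\d\d\d\d\d\d\d)/;

        # Skip any line where either pattern failed to match.
        next unless defined $hello and defined $number;

        push @found, "$number : $hello";
    }
    return @found;
}

print "$_\n" for extract_pairs(@lines);
```

With the sample above this prints four lines, one per line of the file that actually contains both pieces of information. To read from the real file, the same loop body should work inside while (my $line = <ABC>) with a print in place of the push.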
