Hi!
I'd be very grateful if anyone could spare some time to help a newbie.
I wrote:
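use strict;
use warnings;

open(my $abc, '<', 'TEST.html') or die "Can't open TEST.html: $!\n";
my ($hello, $pager);
while (my $line = <$abc>) {
    # keep the "hello NNN" text from the current line, if any
    if ($line =~ /(hello\s\d{3})/) { $hello = $1 }
    # keep the pager ID (E123-4567 or E1234567) from the current line, if any
    if ($line =~ /(E\d{3}-\d{4})/ or $line =~ /(E\d{7})/) { $pager = $1 }
    print "$pager : $hello\n";
}
close($abc);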
 
to extract some information from an external HTML file.
Below are the contents of TEST.html:
E1234567 The quick brown fox jumps over the lazy dog hello 123.
 
the quick brown E0404040 toad jumps over the lazy croc hello 456                              
the quick E0487474  brown duck jumps over the lazy cat hello 789   
E040-4774 the quick brown chick jumps over the lazy pig hello 101    
 
This is what I get:
E1234567 : hello123
E1234567 : hello 123
E1234567 : hello 123
E0404040 : hello 456
E0487474 : hello 789
E040-4774 : hello 101
 
Apparently, the information from the first line is repeated three times. I guess this is because of the two carriage returns after the first line of TEST.html. If I cannot change the HTML file, how do I fix this problem?
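For what it's worth, here is a minimal sketch of one possible fix, assuming the goal is one output line per input line that contains both a pager ID and a "hello NNN" string. In the loop above, $hello and $pager are never reset and the print runs for every line read, so any line that matches neither pattern (a blank line, for example) prints the values left over from an earlier line. In this sketch the variables are declared with my inside the loop, so they start out undefined on each line, and a pair is kept only when both patterns matched on that line. extract_pairs is a hypothetical helper name, and the sample text is read from an in-memory filehandle so the sketch runs without TEST.html.

```perl
use strict;
use warnings;

# Hypothetical helper: returns [pager, hello] pairs, one per input line
# on which BOTH patterns match. Lines that match neither (blank lines,
# HTML markup) produce nothing, so no stale values are repeated.
sub extract_pairs {
    my ($fh) = @_;
    my @pairs;
    while (my $line = <$fh>) {
        # Fresh, undefined variables on every line.
        my ($pager, $hello);
        $pager = $1 if $line =~ /(E\d{3}-\d{4})/ or $line =~ /(E\d{7})/;
        $hello = $1 if $line =~ /(hello\s\d{3})/;
        push @pairs, [$pager, $hello] if defined $pager and defined $hello;
    }
    return @pairs;
}

# The TEST.html contents from the post (trailing spaces dropped).
my $sample = <<'END';
E1234567 The quick brown fox jumps over the lazy dog hello 123.

the quick brown E0404040 toad jumps over the lazy croc hello 456
the quick E0487474  brown duck jumps over the lazy cat hello 789
E040-4774 the quick brown chick jumps over the lazy pig hello 101
END

# In-memory filehandle so the sketch runs on its own; for the real file use
# open(my $fh, '<', 'TEST.html') or die "Can't open TEST.html: $!\n";
open(my $fh, '<', \$sample) or die "Can't open sample: $!\n";
for my $pair (extract_pairs($fh)) {
    print "$pair->[0] : $pair->[1]\n";
}
close($fh);
```

Run as is, this should print one line each for E1234567, E0404040, E0487474 and E040-4774, with no repeats for the blank line.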
 
I also tried another HTML file which apparently didn't have any obvious carriage returns, and got a worse result: the same information was extracted 20 or more times. If it is not because of the carriage returns, what could be the problem?
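One hedged way to see what is happening with the other file is to report, for every line, its line number and what each pattern matched on that line. The sketch assumes the same two patterns as the original loop; line_report is a hypothetical helper, and the HTML snippet is a made-up sample built from the TEST.html sentences, not the actual second file. Any line reported with no match is a line that, in the original loop, reprints the previous values, because the print there does not depend on a match; a file with a lot of markup on separate lines would produce many such repeats even without blank lines.

```perl
use strict;
use warnings;

# Hypothetical diagnostic helper: for every input line, record the line
# number ($.) plus what each pattern found on THAT line (undef if nothing).
sub line_report {
    my ($fh) = @_;
    my @rows;
    while (my $line = <$fh>) {
        my ($pager, $hello);
        $pager = $1 if $line =~ /(E\d{3}-\d{4})/ or $line =~ /(E\d{7})/;
        $hello = $1 if $line =~ /(hello\s\d{3})/;
        push @rows, [$., $pager, $hello];
    }
    return @rows;
}

# Made-up HTML sample (NOT the poster's second file), for illustration only.
my $html = <<'END';
<html>
<body>
<p>E1234567 The quick brown fox jumps over the lazy dog hello 123.</p>
<p>the quick brown E0404040 toad jumps over the lazy croc hello 456</p>
</body>
</html>
END

# For a real file (hypothetical name):
# open(my $fh, '<', 'other.html') or die "Can't open other.html: $!\n";
open(my $fh, '<', \$html) or die "Can't open sample: $!\n";
for my $row (line_report($fh)) {
    my ($n, $pager, $hello) = @$row;
    printf "line %d: pager=%s hello=%s\n", $n, $pager // '-', $hello // '-';
}
close($fh);
```

On this sample, lines 1, 2, 5 and 6 show no match at all; with the original loop, each of those lines would print a stale (or empty) result.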
