Lango, Trevor M. wrote:
First, I apologize for the lame "reply" format. I am forced to use Microsoft Outlook Web Access (shudder) at work and, wouldn't you know, it doesn't offer any options for mail format.
Based on your rules above, TALL0047A and TAL0047A do in fact match
No, actually: however many alpha characters are present in each have to
match. If the number of alpha characters in the first set of each field
differs between the two lists, there is no match.
Are you really saying:
From both items
remove trailing alphas
No. If trailing alphas are present they must also match.
take the last 4 digits
remove any leading zeros
Yes.
Do the strings always start with alphas? Or are there sometimes numerics
within the first 1-4 characters?
Yes - always start with alphas.
Is there stuff between the leading and ending portions, such that the
entries may be more than 10 characters long?
There will never be more than 4 leading alphas, 5 numerics, and 2 trailing
alphas.
So if the string always starts with alphas, followed by digits,
followed (optionally) by alphas, and the digits must match once leading
zeros are removed, then you could:
# Here is one method (as always, there are many ways to do it):
# read each file, parse the sections of each string, and put those in a hash;
# then compare the hashes, deleting the keys when you get a match.
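use strict;
my (@new, %file1, %file2);
# load file1, keyed by the normalized form of each entry
open(my $fh1, "<", "file1") or die "Cannot open file1: $!";
while (<$fh1>) {
    chomp;    # drop the newline so the report is not double-spaced
    my $key = join("", parse($_));
    $file1{$key} = $_;
}
close($fh1);
# load file2 the same way
open(my $fh2, "<", "file2") or die "Cannot open file2: $!";
while (<$fh2>) {
    chomp;
    my $key = join("", parse($_));
    $file2{$key} = $_;
}
close($fh2);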
# walk a copy of %file1 so that deleting from the original is safe
my %tmp = %file1;
while (my ($key, $value) = each %tmp) {
    if (exists $file2{$key}) {
        delete $file1{$key};
        delete $file2{$key};
        push @new, $value;
    }
}
print "matching\n";
print join("\n", @new),"\n";
print "in file1 but not file2\n";
print join("\n", sort values %file1),"\n";
print "in file2 but not file1\n";
print join("\n", sort values %file2),"\n";
sub parse {
    my $str = $_[0];
    # Capture the parts: leading alphas, followed by n digits,
    # followed optionally by alphas (empty string if there are none)
    my @str = $str =~ /([a-zA-Z]+)(\d+)([a-zA-Z]*)/
        or return;         # no match: return an empty list
    $str[1] =~ s/^0+//;    # strip leading 0s from digit portion
    return @str;
}
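
If you want the pattern itself to enforce the limits stated earlier (never more than 4 leading alphas, 5 numerics, and 2 trailing alphas), something like the following might work. This is only a sketch under assumptions: parse_strict and key_for are made-up names, not part of the script above, and rejecting any entry that does not fit the stated shape is my guess at what you would want.

```perl
use strict;
use warnings;

# parse_strict is a hypothetical helper, not part of the script above.
# Assumption: an entry is 1-4 alphas, 1-5 digits, then 0-2 alphas.
# Anything else yields an empty list.
sub parse_strict {
    my ($str) = @_;
    my @parts = $str =~ /^\s*([a-zA-Z]{1,4})(\d{1,5})([a-zA-Z]{0,2})\s*$/
        or return;
    $parts[1] =~ s/^0+(?=\d)//;    # strip leading zeros, keep a lone "0"
    return @parts;
}

# Build a comparison key; the "|" separator is an arbitrary choice.
sub key_for {
    my @parts = parse_strict($_[0]);
    return @parts ? join("|", @parts) : undef;
}

for my $pair (["TALL0047A", "TAL0047A"], ["TAL0047A", "TAL47A"]) {
    my ($k1, $k2) = map { key_for($_) } @$pair;
    printf "%s vs %s: %s\n", @$pair, ($k1 eq $k2 ? "match" : "no match");
}
```

With these two pairs, the first should report "no match" (the leading alphas differ in length) and the second "match" (the digits agree once leading zeros are stripped).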