Lango, Trevor M. wrote:
First, I apologize for the lame "reply" format. I am forced to use Microsoft Outlook Web Access (shudder) at work and, wouldn't you know, it doesn't offer any options for mail format.
Based on your rules above, TALL0047A and TAL0047A do in fact match


No, actually - however many characters are present in each, they have to match.  If 
the number of alpha characters in the first set of each field differs between 
the two lists - no match.


Are you really saying:


From both items
remove trailing alphas

No. If trailing alphas are present they must also match.

take the last 4 digits
remove any leading zeros

Yes.
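
The digit rule confirmed above (keep the last 4 digits, then drop leading zeros) can be sketched on its own. This is only an illustration: digit_key is a made-up name, and keeping a lone 0 for an all-zero run is an assumption, not something stated in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: reduce a run of digits to its comparison form.
sub digit_key {
    my ($digits) = @_;
    my ($last4) = $digits =~ /(\d{1,4})$/   # keep at most the last 4 digits
        or return;                          # no digits at the end: give up
    $last4 =~ s/^0+(?=\d)//;                # drop leading zeros, keep a lone 0
    return $last4;
}

# Made-up inputs; each of these reduces to 47.
print digit_key($_), "\n" for qw(0047 47 10047);
```

Note that a 5-digit value such as 10047 reduces to 47 under this rule, which is not the same as stripping leading zeros from the whole digit run.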


Do the strings always start with alphas? Or are there sometimes numerics
within the first 1-4 characters?

Yes - always start with alphas.



Is there stuff between the leading and ending portions, such that the
entries may be more than 10 characters long?


There will never be more than 4 leading alphas, 5 numerics, and 2 trailing 
alphas.
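
Those limits could be checked up front with bounded quantifiers. A minimal sketch, assuming at least one leading alpha and at least one digit are required (the thread only gives the upper bounds); is_valid_entry is a hypothetical name and the sample entries are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: true if an entry fits the stated limits
# (1-4 leading alphas, 1-5 digits, 0-2 trailing alphas).
# The lower bounds of 1 are an assumption.
sub is_valid_entry {
    my ($entry) = @_;
    return $entry =~ /^[A-Za-z]{1,4}\d{1,5}[A-Za-z]{0,2}$/ ? 1 : 0;
}

# Made-up sample entries, not taken from the real files.
for my $entry (qw(TAL0047A TALL0047A TALLX0047A TAL0047ABC 0047A)) {
    printf "%-12s %s\n", $entry, is_valid_entry($entry) ? "ok" : "rejected";
}
```

Rejecting malformed lines before comparing keeps them from being silently treated as matches or mismatches.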

So if the string always starts with alphas, followed by digits, followed (optionally) by alphas, and the digits must match once leading zeros are removed, then you could:

# here is one method (as always there are many ways to do it)
# read each file, parse the sections of each str, put those in a hash
# then compare the hashes, deleting the keys when you get a match

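use strict;

my (@new, %file1, %file2);

# key each line of file1 by its parsed, normalized parts
open (my $fh1, '<', "file1") or die "Cannot open file1: $!";
while (<$fh1>) {
        chomp;  # drop the newline so the output below is not double-spaced
        my $key = join("",parse($_));
        $file1{$key} = $_;
}
close ($fh1);

# same again for file2
open (my $fh2, '<', "file2") or die "Cannot open file2: $!";
while (<$fh2>) {
        chomp;
        my $key = join("",parse($_));
        $file2{$key} = $_;
}
close ($fh2);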

my %tmp = %file1;
while (my ($key,$value) = each %tmp) {

        if (defined $file2{$key}) {
                delete $file1{$key};
                delete $file2{$key};

                push @new, $value;
        }
}

print "matching\n";
print join("\n", @new),"\n";

print "in file1 but not file2\n";
print join("\n", sort values %file1),"\n";

print "in file2 but not file1\n";
print join("\n", sort values %file2),"\n";

sub parse {
        my $str = $_[0];

        # Capture the parts, leading alpha, followed by n digits,
        # followed optionally by alphas. Return an empty list if the
        # line does not have that shape.
        $str =~ /([a-zA-Z]+)(\d+)([a-zA-Z]*)/ or return;

        my @str = ($1,$2,$3);           # put the matches back into an array
        $str[1] =~ s/^0+(?=\d)//;       # strip leading 0s, but keep a lone 0

        return @str;
}
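
To see how the parse-and-join key behaves on the cases discussed earlier in the thread, here is a small self-contained sketch. entry_key is a hypothetical stand-in for the join("",parse($_)) step, anchored at both ends as an extra assumption, and the sample pairs are made up rather than taken from the real files.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the parse-and-join step: returns a
# comparison key, or undef if the entry does not fit the pattern.
sub entry_key {
    my ($entry) = @_;
    my ($lead, $digits, $trail) = $entry =~ /^([a-zA-Z]+)(\d+)([a-zA-Z]*)$/
        or return;
    $digits =~ s/^0+(?=\d)//;    # 0047 and 47 compare equal
    return join "", $lead, $digits, $trail;
}

# Made-up sample pairs.
for my $pair (["TAL0047A", "TAL47A"],
              ["TALL0047A", "TAL0047A"],
              ["TAL0047A", "TAL0047"]) {
    my ($x, $y) = @$pair;
    my $same = entry_key($x) eq entry_key($y) ? "match" : "no match";
    print "$x vs $y: $same\n";
}
```

Only the first pair matches: the second differs in its leading alphas and the third in its trailing alphas, which is consistent with the rules given above.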
