I have two lists, not necessarily of the same length. List #1 has two columns. List #2 has one column. I would like to do the following:
Scan list #1 line by line. If a match for column #1 in list #1 is found in list #2, extract the matching lines and put them in a new list (#3). Otherwise, leave the contents of lists #1 and #2 as they are.

If I expected the contents of the first column of each list to match exactly (character for character), this would be a simple task with C++ or the like. However, the contents will not necessarily be perfectly identical. I do believe they are close enough to identical, though, to use pattern matching via Perl or the like.

Personally this is difficult for me (as a Perl noob). I know how to scan through a file for a pre-determined pattern, but I don't understand how to scan through a file for a pattern that is essentially given by a line in another file. I have not found anything in my reading of the Perl documentation that explains how to read a file and use its contents as the pattern to search for in another file (suggestions on excellent Perl doc sources appreciated also!).

This is what the contents of the lists may look like:

    TALL0047A
    TAL0047A
    TAL047A
    TAL47A
    TA0047A
    TA047A
    TA47A
    T0047A
    T047A
    T47A
    T0047
    T047
    T47

Examples of matching:

    TALL0047A   TALL047A   match
    TALL0047A   TAL0047A   not a match
    TALL0047A   TAL0470A   not a match

The contents will always be one to four alpha characters, followed by one to four numeric characters, possibly followed by one or two alpha characters.

A match is defined as both of the following criteria being met:

- The one to four digits being identical (excluding leading zeroes)
- The first one to four letters being identical

It is absolutely imperative that any algorithm used does not produce false positives: if a line is extracted as a match, it must without a doubt actually be a match. It is not so critical if a possible match is passed up.

The lists will contain thousands or tens of thousands of entries, so I am just looking for a clever way to automate as much of the process as possible.
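
To make the two criteria above concrete, here is a minimal sketch of the rule, written in Python rather than Perl purely for illustration. It rests on several assumptions that are not in the description above: the names `match_key` and `find_matches` are made up, the lists are already loaded into memory, letters are compared case-sensitively, and the optional trailing letters are ignored because the criteria do not mention them. Entries that do not fit the stated letters-digits-letters shape are never matched, which errs on the side of missed matches rather than false positives.

```python
import re

# Assumed entry shape: 1-4 letters, 1-4 digits, then 0-2 trailing letters.
ENTRY_RE = re.compile(r"([A-Za-z]{1,4})([0-9]{1,4})([A-Za-z]{0,2})")

def match_key(entry):
    """Return (letters, number) for a well-formed entry, else None."""
    m = ENTRY_RE.fullmatch(entry.strip())
    if m is None:
        return None  # malformed entries never match anything
    letters, digits, _suffix = m.groups()  # suffix ignored (assumption)
    return (letters, int(digits))  # int() drops leading zeroes

def find_matches(list1_rows, list2_entries):
    """list1_rows: (column1, column2) pairs; list2_entries: strings.

    Returns list #3 as (column1, column2, matching list #2 entries).
    """
    # Index list #2 by key so each list #1 row needs only one lookup.
    index = {}
    for entry in list2_entries:
        key = match_key(entry)
        if key is not None:
            index.setdefault(key, []).append(entry)

    list3 = []
    for col1, col2 in list1_rows:
        key = match_key(col1)
        if key is not None and key in index:
            list3.append((col1, col2, index[key]))
    return list3

# Hypothetical data; the second-column values "x" and "y" are placeholders.
list1 = [("TALL0047A", "x"), ("TAL0470A", "y")]
list2 = ["TALL047A", "TAL0047A"]
print(find_matches(list1, list2))
# prints [('TALL0047A', 'x', ['TALL047A'])]
```

Reducing each entry to a key and comparing keys, instead of building a search pattern from each line of one file, sidesteps the "pattern taken from another file" problem altogether. If differing trailing letters should also rule out a match, adding the suffix to the key would be the stricter variant.
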
I expect to have to check a portion of the lists by hand; I would simply appreciate reducing the number of lines that have to be checked manually. Perl seems ideal, but I'm just not savvy enough with it (yet!) to make it work.

I know there are some Perl gurus lurking about in LUGOD, so if any of you have a spare moment to lend this some thought, thanks!

Thank you in advance for any suggestions!

- Trevor
