[EMAIL PROTECTED] wrote: > Hi All, > > I wonder could someone help me with this? > > What I want to do is search through a list of letters and look for > adjacent groups of letters that form sequences, not in the usual way of > matching say abc to another appearance later on in the list but to look > for transposed patterns. > > The groups of letters can be made up of between 2 to 4 letters. > It is not know beforehand how the groups are formed, so I can't say > here is a group, look for a sequence formed from this. > There must be at least three appearances of the groups for the sequence > to be pass. > More than one sequence may show in a list, i.e. first abc, bcd and cde > followed by bd, df, fa, ac at some later point. > Ideally I would like to know the start and end position of the > sequence/sequences. > > I'm looking for patterns such as these: > > ['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e'] > ['a', 'c', 'f', 'c', 'e', 'h', 'e', 'g', 'c'] > ['c', 'a', 'a', 'f', 'f', 'd', 'd', 'b'] > > But these would be wrong: > > ['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e', 'f', 'a', 'd'] - This one > fails because of the 'jump' of the last group of three letters in > relation to the others. To pass the last group would need to have been > def. However, the previous three groups are ok and would be passed as a > sequence. > > ['a', 'c', 'f', 'a', 'f', 'e', 'e', 'g', 'c'] - Here, although the > first and last groups have a similar sequence this would fail because > of the intervening group (afe) not following the pattern. > > > > Thanks for any help, > > Malcolm > I'm not sure I understand exactly what you want to do, but some of the following may help
# Good sequences s1 = ['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e'] s2 = ['a', 'c', 'f', 'c', 'e', 'h', 'e', 'g', 'c'] s3 = ['c', 'a', 'a', 'f', 'f', 'd', 'd', 'b'] # Bad sequences b1 = ['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e', 'f', 'a', 'd'] b2 = ['a', 'c', 'f', 'a', 'f', 'e', 'e', 'g', 'c'] def make_groups(sequence, n): """return groups of length n from sequence Usage: >>> list(make_groups(s1,3)) [['a', 'b', 'c'], ['b', 'c', 'd'], ['c', 'd', 'e']] """ length = len(sequence) if length % n: raise ValueError, "Sequence length %s is not multiple of %s"\ % (length, n) for i in xrange(0, length, n): yield sequence[i:i+n] def make_diffs(groups): """return groups in canonical form where the first note in each sequence is 0, and each subsequent note is expressed as a positive interval from the first. Example: >>> list(make_diffs([['a', 'b', 'c'], ['b', 'c', 'd'], ['c', 'd', 'e']])) [(0, 1, 2), (0, 1, 2), (0, 1, 2)] """ for group in groups: base, basechr = [0], ord(group[0]) for char in group[1:]: diff = ord(char) - basechr # although not specified it seems that you want # to compare diffs mod 7 i.e. -2 == +5 base.append(diff % 7) yield tuple(base) def compare_forms(sequence): """test whether all groups of a sequence have the same form""" for length in range(4,2,-1): # look for longest match first forms = {} try: for group in make_diffs(make_groups(sequence, length)): forms[group] = forms.get(group,0)+1 print forms except ValueError: # raised if the sequence is not a multiple of length pass >>> compare_forms(s1) {(0, 1, 2): 3} # found three instances of (0,1,2) >>> compare_forms(s2) {(0, 2, 5): 3} # found three instances >>> compare_forms(s3) {(0, 5, 5, 3): 2} # found two instances >>> compare_forms(b1) {(0, 1, 3, 6): 1, (0, 1, 2, 1): 1, (0, 1, 0, 1): 1} # found 3 4-tuples {(0, 1, 2): 3, (0, 2, 5): 1} # found 3 instances of (0,1,2) and 1 other >>> compare_forms(b2) {(0, 5, 4): 1, (0, 2, 5): 2} >>> HTH Michael -- http://mail.python.org/mailman/listinfo/python-list