Ok, I've seen various passes at this problem using regex, split('='), etc., but the solutions seem fairly fragile, and the OP doesn't seem happy with any of them. Here is how this problem looks if you were going to try breaking it up with pyparsing:

- Each line starts with an integer, and the string "ALA"
- "ALA" is followed by a series of "X = 1.2"-type attributes, where the value part might be missing.
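
As a quick aside, the two-part line structure described above can be spelled out with nothing but the standard library. This is only a rough sketch to make the structure concrete, not the pyparsing solution: the function name parse_line is made up for illustration, and it assumes whitespace around every '=' and that every value present is numeric.

```python
def parse_line(line, default=0.0):
    """Parse '<integer> ALA KEY = [value] KEY = [value] ...'.

    Returns (number, attrs), where attrs maps each key to a float.
    Assumes tokens are whitespace-separated (hypothetical helper,
    for illustration only).
    """
    tokens = line.split()
    if len(tokens) < 2 or tokens[1] != "ALA":
        raise ValueError("expected '<integer> ALA ...': %r" % line)
    number = int(tokens[0])

    attrs = {}
    i = 2
    while i < len(tokens):
        key = tokens[i]
        if i + 1 >= len(tokens) or tokens[i + 1] != "=":
            raise ValueError("expected '=' after %r" % key)
        i += 2  # skip past the key and the '='

        # The next token is a value only if it is not itself a key,
        # i.e. it is not immediately followed by '='.
        next_is_key = i + 1 < len(tokens) and tokens[i + 1] == "="
        if i < len(tokens) and not next_is_key:
            attrs[key] = float(tokens[i])
            i += 1
        else:
            attrs[key] = default  # value part is missing
    return number, attrs


number, attrs = parse_line("85 ALA H = 8.60 N = CA = HA = 4.65 C =")
# number is 85; attrs["H"] is 8.6, attrs["N"] and attrs["C"] fall back to 0.0
```

The lookahead for '=' is what decides whether a value is missing; that is the same job the optional value part does in a grammar-based approach.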

And to implement (with a few bells and whistles thrown in for free):

data = """48 ALA H = 8.33 N = 120.77 CA = 55.18 HA = 4.12 C = 181.50
104 ALA H = 7.70 N = 121.21 CA = 54.32 HA = 4.21 C =
85 ALA H = 8.60 N = CA = HA = 4.65 C =""".splitlines()

from pyparsing import *

# define some basic data expressions
integer = Word(nums)
real = Combine(Word(nums) + "." + Word(nums))

# use parse actions to automatically convert numeric
# strings to actual numbers at parse time
integer.setParseAction(lambda tokens:int(tokens[0]))
real.setParseAction(lambda tokens:float(tokens[0]))

# define expressions for 'X = 1.2' assignments; note that the
# value might be missing, so use Optional - we'll fill in
# a default value of 0.0 if no value is given
keyValue = Word(alphas.upper()) + '=' + \
                Optional(real|integer, default=0.0)

# define overall expression for the data on a line
dataline = integer + "ALA" + OneOrMore(Group(keyValue))("kvdata")

# attach parse action to define named values in the returned tokens
def assignDataByKey(tokens):
    for k,_,v in tokens.kvdata:
        tokens[k] = v
dataline.setParseAction(assignDataByKey)

# for each line in the input data, parse it and print some of the data fields
for d in data:
    print d
    parsedData = dataline.parseString(d)
    print parsedData.dump()
    print parsedData.CA
    print parsedData.N
    print

Prints out:

48 ALA H = 8.33 N = 120.77 CA = 55.18 HA = 4.12 C = 181.50
[48, 'ALA', ['H', '=', 8.3300000000000001], ['N', '=', 120.77], ['CA', '=', 55.18], ['HA', '=', 4.1200000000000001], ['C', '=', 181.5]]
- C: 181.5
- CA: 55.18
- H: 8.33
- HA: 4.12
- N: 120.77
- kvdata: [['H', '=', 8.3300000000000001], ['N', '=', 120.77], ['CA', '=', 55.18], ['HA', '=', 4.1200000000000001], ['C', '=', 181.5]]
55.18
120.77

104 ALA H = 7.70 N = 121.21 CA = 54.32 HA = 4.21 C =
[104, 'ALA', ['H', '=', 7.7000000000000002], ['N', '=', 121.20999999999999], ['CA', '=', 54.32], ['HA', '=', 4.21], ['C', '=', 0.0]]
- C: 0.0
- CA: 54.32
- H: 7.7
- HA: 4.21
- N: 121.21
- kvdata: [['H', '=', 7.7000000000000002], ['N', '=', 121.20999999999999], ['CA', '=', 54.32], ['HA', '=', 4.21], ['C', '=', 0.0]]
54.32
121.21

85 ALA H = 8.60 N = CA = HA = 4.65 C =
[85, 'ALA', ['H', '=', 8.5999999999999996], ['N', '=', 0.0], ['CA', '=', 0.0], ['HA', '=', 4.6500000000000004], ['C', '=', 0.0]]
- C: 0.0
- CA: 0.0
- H: 8.6
- HA: 4.65
- N: 0.0
- kvdata: [['H', '=', 8.5999999999999996], ['N', '=', 0.0], ['CA', '=', 0.0], ['HA', '=', 4.6500000000000004], ['C', '=', 0.0]]
0.0
0.0

Learn more about pyparsing at http://pyparsing.wikispaces.com.

-- Paul

_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor