On Mon, 1 Aug 2005, Kent Johnson wrote:
> [EMAIL PROTECTED] wrote:
> > hi,
> >
> > I have large txt file with lines like this:
> >
> > ['DDB0216437'] 1166 1174 9 ZZZ 100
> >
> > What I want to do is quickly count the number of lines that share a
> > value in the 4th column and 5th (i.e. in this line I would count all
> > the line that have '9' and 'ZZZ'). Anyone got any ideas for the
> > quickest way to do this? The solution I have is really ugly. thanks,

A dictionary approach may also be useful. The following example should
help illustrate the technique:

######
>>> def histogram(iterable):
...     """Returns a list of (element, count) pairs, one per unique element in iterable."""
...     d = {}
...     for x in iterable:
...         d[x] = d.get(x, 0) + 1
...     return d.items()
...
>>> histogram("this is a test of the emergency broadcast system this is only a test")
[('a', 4), (' ', 13), ('c', 2), ('b', 1), ('e', 7), ('d', 1), ('g', 1),
('f', 1), ('i', 4), ('h', 3), ('m', 2), ('l', 1), ('o', 3), ('n', 2),
('s', 9), ('r', 2), ('t', 9), ('y', 3)]
######

This is a fairly straightforward way of doing letter-frequency stuff. We
can see from the histogram that the letters ['b', 'd', 'f', 'g', 'l'] are
solitary and occur only once.

This tallying approach can also be applied to the original poster's
question with the columns of a text file, as long as we figure out what
we want to tally up. Here, that would be the pair of values in the 4th
and 5th columns of each line.

Good luck!
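P.S. In case it helps, here is a rough sketch of how the same d.get()
tally might look for the file's columns. It is only one possible way to
do it, and it rests on some assumptions: the columns are separated by
whitespace, the first column never contains a space, and short or blank
lines can be skipped. The name count_column_pairs and every sample line
except the first are made up for illustration.

```python
def count_column_pairs(lines, first=3, second=4):
    """Tally lines by the pair of values found in two columns.

    Assumption: columns are whitespace-separated. `first` and `second`
    are zero-based, so 3 and 4 mean the 4th and 5th columns.
    """
    counts = {}
    for line in lines:
        fields = line.split()
        if len(fields) <= max(first, second):
            continue  # skip blank lines and lines with too few columns
        key = (fields[first], fields[second])
        counts[key] = counts.get(key, 0) + 1
    return counts


# Only the first line comes from the original post; the rest are invented.
sample_lines = [
    "['DDB0216437'] 1166 1174 9 ZZZ 100",
    "['DDB0000001'] 10 20 9 ZZZ 95",
    "['DDB0000002'] 30 40 7 YYY 80",
    "",
]

counts = count_column_pairs(sample_lines)
for key, n in sorted(counts.items()):
    print(key, n)
```

Since the function just loops over lines, an open file object can be
passed in place of the list, so the whole file never has to be held in
memory at once.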