Japhy Bartlett wrote:

> I'm not sure that they cared about how you used file.readlines(), I think
> the memory comment was a hint about instantiating Counter()s

Then they would have been clueless ;)

Both Schtvveer's original script and his subsequent "Verschlimmbesserung" --
beautiful German word for making things worse when trying to improve them --
use only two Counters at any given time. The second version is very
inefficient because it builds the same Counter over and over again -- but
this does not affect peak memory usage much.

Here's the original version that triggered the comment:

[Schtvveer Schvrveve]

> import sys
> from collections import Counter
>
> def main(args):
>     filename = args[1]
>     word = args[2]
>     print countAnagrams(word, filename)
>
> def countAnagrams(word, filename):
>
>     fileContent = readFile(filename)
>
>     counter = Counter(word)
>     num_of_anagrams = 0
>
>     for i in range(0, len(fileContent)):
>         if counter == Counter(fileContent[i]):
>             num_of_anagrams += 1
>
>     return num_of_anagrams
>
> def readFile(filename):
>
>     with open(filename) as f:
>         content = f.readlines()
>
>     content = [x.strip() for x in content]
>
>     return content
>
> if __name__ == '__main__':
>     main(sys.argv)
>

referenced as before.py below, and here's a variant that removes
readlines(), range(), and the [x.strip() for x in content] list
comprehension, the goal being minimal changes, not code as I would write it
from scratch.

# after.py
import sys
from collections import Counter

def main(args):
    filename = args[1]
    word = args[2]
    print countAnagrams(word, filename)

def countAnagrams(word, filename):

    fileContent = readFile(filename)
    counter = Counter(word)
    num_of_anagrams = 0

    for line in fileContent:
        if counter == Counter(line):
            num_of_anagrams += 1

    return num_of_anagrams

def readFile(filename):
    # this relies on garbage collection to close the file
    # which should normally be avoided
    for line in open(filename):
        yield line.strip()

if __name__ == '__main__':
    main(sys.argv)

How to measure memory usage?

I found

<https://stackoverflow.com/questions/774556/peak-memory-usage-of-a-linux-unix-process>

and as test data I use files containing 10**5 and 10**6 integers. With that
setup (snipping everything but memory usage from the time -v output):

$ /usr/bin/time -v python before.py anagrams5.txt 123
6
        Maximum resident set size (kbytes): 17340

$ /usr/bin/time -v python before.py anagrams6.txt 123
6
        Maximum resident set size (kbytes): 117328

$ /usr/bin/time -v python after.py anagrams5.txt 123
6
        Maximum resident set size (kbytes): 6432

$ /usr/bin/time -v python after.py anagrams6.txt 123
6
        Maximum resident set size (kbytes): 6432

See the pattern? before.py uses O(N) memory, after.py O(1).

Run your own tests if you need more datapoints or prefer a different method
to measure memory consumption.
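
For anyone who would rather measure from inside Python than with
/usr/bin/time, here is a minimal sketch using tracemalloc. It assumes
Python 3.4 or later (the scripts above are Python 2), and the helper names
read_all, read_lazily, count_anagrams, peak_bytes and write_test_file are
made up for this illustration. Note that tracemalloc only tracks allocations
made through Python's allocator, so its numbers are not comparable to the
maximum resident set size; only the trend should match.

```python
# Sketch only: needs Python 3.4+ for tracemalloc; all helper names are hypothetical.
import os
import tempfile
import tracemalloc
from collections import Counter


def read_all(filename):
    # Same strategy as before.py: the whole file ends up in a list.
    with open(filename) as f:
        return [line.strip() for line in f.readlines()]


def read_lazily(filename):
    # Same strategy as after.py, but the file is closed explicitly.
    with open(filename) as f:
        for line in f:
            yield line.strip()


def count_anagrams(word, lines):
    # Count the lines that use exactly the same characters as word.
    counter = Counter(word)
    return sum(1 for line in lines if counter == Counter(line))


def peak_bytes(reader, word, filename):
    # Return (anagram count, peak traced allocation in bytes) for one reader.
    tracemalloc.start()
    try:
        count = count_anagrams(word, reader(filename))
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return count, peak


def write_test_file(n):
    # Made-up test data: the integers 0 .. n-1, one per line.
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        for i in range(n):
            f.write("%d\n" % i)
    return path


if __name__ == "__main__":
    for n in (10**4, 10**5):
        path = write_test_file(n)
        try:
            for reader in (read_all, read_lazily):
                count, peak = peak_bytes(reader, "123", path)
                print(reader.__name__, n, count, peak)
        finally:
            os.remove(path)
```

I would expect the peak for read_all to grow roughly with the number of
lines while the peak for read_lazily stays about flat, but the absolute
numbers depend on the interpreter, so none are quoted here.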