Hi,

I am doing a series of very simple string operations on lines I am
reading from a large file (~15 million lines). I store the result of
these operations in a simple instance of a class, and then put it
inside a hash table. I found that this is unusually slow. For
example:

import time
from collections import defaultdict

class myclass(object):
    __slots__ = ("a", "b", "c", "d")
    def __init__(self, a, b, c, d):
        self.a = a
        self.b = b
        self.c = c
        self.d = d
    def __str__(self):
        return "%s_%s_%s_%s" %(self.a, self.b, self.c, self.d)
    def __hash__(self):
        return hash((self.a, self.b, self.c, self.d))
    def __eq__(self, other):
        return (self.a == other.a and
                self.b == other.b and
                self.c == other.c and
                self.d == other.d)
    __repr__ = __str__

n = 15000000
table = defaultdict(int)
t1 = time.time()
for k in range(1, n):
    myobj = myclass('a' + str(k), 'b', 'c', 'd')
    table[myobj] = 1
t2 = time.time()
print("time (minutes): %.2f" % ((t2 - t1) / 60.0))

This takes a very long time to run: 11 minutes! For the sake of the
example I am not reading anything from a file here, but in my real code
I do. Also, I use 'a' + str(k) here, but in my real code this is some
simple string operation on the line I read from the file. However, I
found that the above code shows the real bottleneck, since reading my
file into memory (using readlines()) takes only about 4 seconds. I then
have to iterate over these lines, but I still think that is more
efficient than the 'for line in file' approach, which is even slower.
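
To narrow down where the time actually goes, a scaled-down sketch
like the one below could time each step on its own: building the
strings, creating the instances, and inserting them into a dict. The
helper name time_phases, the class name MyClass, and the smaller n are
assumptions made for illustration only; the numbers it prints will
vary by machine and Python version.

```python
import time

class MyClass(object):
    # Same shape as the class above, repeated so this sketch runs alone.
    __slots__ = ("a", "b", "c", "d")
    def __init__(self, a, b, c, d):
        self.a = a
        self.b = b
        self.c = c
        self.d = d
    def __hash__(self):
        return hash((self.a, self.b, self.c, self.d))
    def __eq__(self, other):
        return (self.a == other.a and self.b == other.b and
                self.c == other.c and self.d == other.d)

def time_phases(n):
    """Return ((string_secs, instance_secs, insert_secs), table_size)."""
    t0 = time.time()
    keys = ['a' + str(k) for k in range(1, n)]        # string work only
    t1 = time.time()
    objs = [MyClass(s, 'b', 'c', 'd') for s in keys]  # instance creation only
    t2 = time.time()
    table = {}
    for obj in objs:                                  # hashing + dict insert only
        table[obj] = 1
    t3 = time.time()
    return (t1 - t0, t2 - t1, t3 - t2), len(table)

# Hypothetical small run; scale n up gradually to see how each phase grows.
timings, size = time_phases(100000)
print("strings: %.3fs, instances: %.3fs, inserts: %.3fs (size %d)"
      % (timings[0], timings[1], timings[2], size))
```

If one phase grows much faster than linearly as n increases, that is
the part worth looking at first.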

In the above code, is there a way to optimize the creation of the class
instances? I am using defaultdicts instead of ordinary dicts, so I
don't know how else to optimize that part of the code. Is there a way
to optimize the way the class is written? If it takes only a few
seconds to read 15 million lines into memory, it doesn't make sense to
me that turning them into simple objects along the way would take that
much longer.
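
One variation that might be worth trying is sketched below: keying
the dict on plain tuples instead of class instances (so hashing and
comparison happen in C rather than in Python-level __hash__ and
__eq__), and pausing the cyclic garbage collector during the bulk
load. The function name build_table and its pause_gc flag are
hypothetical, and whether either change helps is an assumption that
would need measuring on the real data and Python version.

```python
import gc

def build_table(n, pause_gc=True):
    """Insert n-1 tuple keys into a dict, optionally with the cyclic GC paused."""
    was_enabled = gc.isenabled()
    if pause_gc:
        gc.disable()          # avoid repeated GC passes while allocating many objects
    try:
        table = {}
        for k in range(1, n):
            # A plain tuple stands in for the myclass instance as the key.
            table[('a' + str(k), 'b', 'c', 'd')] = 1
    finally:
        if pause_gc and was_enabled:
            gc.enable()       # restore the collector to its previous state
    return table

table = build_table(100000)
print(len(table))
```

The tuple carries the same four fields as the class, so lookups by
(a, b, c, d) still work; what is lost is attribute access by name.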
