I have a program that essentially loops through a text file that's
about 800 MB in size containing tab-separated data. My program
parses this file and stores its fields in a dictionary of lists.

f = open('data.txt', 'r')  # placeholder filename
for line in f:
  split_values = line.strip().split('\t')
  # do stuff with split_values

Currently, this is very slow in Python, even if all I do is break up
each line using split() and store its values in a dictionary, indexed
by one of the tab-separated values in the file.
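To make the dictionary part concrete, it looks roughly like this (for
illustration, assume the first column is the key and 'data.txt' is a
placeholder name; the real key column varies):

from collections import defaultdict

table = defaultdict(list)  # key column -> list of rows sharing that key
f = open('data.txt', 'r')
for line in f:
  fields = line.rstrip('\n').split('\t')
  # index by the first field; keep the remaining fields as one row
  table[fields[0]].append(fields[1:])
f.close()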

I'm not sure what the situation is, but I regularly skim through tab-delimited files of a similar size and haven't noticed any problems like the ones you describe. You might try tweaking the optional (and infrequently specified) bufsize parameter of the open()/file() call:

  bufsize = 4 * 1024 * 1024 # buffer 4 megs at a time
  f = file('in.txt', 'r', bufsize)
  for line in f:
    split_values = line.strip().split('\t')
    # do stuff with split_values

If not specified, you're at the mercy of the system default (possibly OS-specific). You can read more at [1], along with the associated warning about setvbuf().
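
If you want to check whether the buffer size actually makes a difference on your box, a quick-and-dirty timing loop along these lines (the file name and sizes below are just placeholders) should tell you:

  import time

  # placeholder sizes; the last matches the 4 MB buffer above
  for bufsize in (8 * 1024, 256 * 1024, 4 * 1024 * 1024):
    start = time.time()
    f = file('in.txt', 'r', bufsize)
    for line in f:
      split_values = line.strip().split('\t')
    f.close()
    print bufsize, time.time() - start  # elapsed seconds for this bufsize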

-tkc


[1] http://docs.python.org/library/functions.html#open
