Re: efficient text file search -solution
noro [EMAIL PROTECTED] wrote:
> OK, am not sure why, but
>
>     fList=file('somefile').read()
>     if fList.find('string') != -1:
>         print 'FOUND'
>
> works much much faster. it is strange since i thought
> 'for line in file('somefile')' is optimized and reads pages into memory,

Step back and think about what each is doing at a high level of description: file.read reads the contents of the file into memory in one go, end of story. file.[x]readlines reads (some or all of) the contents of the file into memory, does a linear search on it for end-of-line characters, and copies out the line(s) into some new bits of memory. Line-by-line processing has a *lot* more work to do (unless you're read()ing a really big file which is going to make heavy demands on memory allocation) and it should be no surprise that it's slower.

-- 
\S -- [EMAIL PROTECTED] -- http://www.chaos.org.uk/~sion/
  ___  |  Frankly I have no feelings towards penguins one way or the other
  \X/  |    -- Arthur C. Clarke
  her nu becomeþ se bera eadward ofdun hlæddre heafdes bæce bump bump bump
-- 
http://mail.python.org/mailman/listinfo/python-list
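The two approaches compared above can be put side by side. This is only a rough sketch in current Python (open() instead of the thread's file()); the function names, the sample file and its size are assumptions made up for illustration, and the timings will depend on the machine and the file.

```python
import os
import tempfile
import timeit


def found_whole_file(path, needle):
    """Read the file in one go and search the resulting string once."""
    with open(path, encoding="utf-8") as f:
        return needle in f.read()


def found_line_by_line(path, needle):
    """Iterate over lines, stopping at the first line that contains needle."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if needle in line:
                return True
    return False


def make_sample_file(n_lines=200_000):
    """Write a throwaway file whose only match is on the last line."""
    fd, path = tempfile.mkstemp(suffix=".txt", text=True)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        for i in range(n_lines):
            f.write("line %d of filler text\n" % i)
        f.write("the string we want\n")
    return path


if __name__ == "__main__":
    path = make_sample_file()
    try:
        # Time five runs of each approach on the same file.
        for func in (found_whole_file, found_line_by_line):
            seconds = timeit.timeit(lambda: func(path, "string we want"), number=5)
            print("%-20s %.3f s" % (func.__name__, seconds))
    finally:
        os.remove(path)
```

Both functions give the same answer as long as the search string contains no newline; only the whole-file version can match text that spans a line break.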
efficient text file search.
Is there a more efficient method to find a string in a text file than:

    f=file('somefile')
    for line in f:
        if 'string' in line:
            print 'FOUND'

?

BTW: does "for line in f:" read a block of lines into memory, or does it simply call f.readline() many times?

thanks
amit
Re: efficient text file search.
noro [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]
> Is there a more efficient method to find a string in a text file than:
>
>     f=file('somefile')
>     for line in f:
>         if 'string' in line:
>             print 'FOUND'

yes, more efficient would be: grep (http://www.gnu.org/software/grep/)
Re: efficient text file search.
:) via python...

Luuk wrote:
> noro [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]
>> Is there a more efficient method to find a string in a text file than:
>>
>>     f=file('somefile')
>>     for line in f:
>>         if 'string' in line:
>>             print 'FOUND'
>
> yes, more efficient would be: grep (http://www.gnu.org/software/grep/)
Re: efficient text file search.
noro [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]
> :) via python...
>
> Luuk wrote:
>> noro [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]
>>> Is there a more efficient method to find a string in a text file than:
>>>
>>>     f=file('somefile')
>>>     for line in f:
>>>         if 'string' in line:
>>>             print 'FOUND'
>>
>> yes, more efficient would be: grep (http://www.gnu.org/software/grep/)

ok, a more serious answer: some googling turned up the following.

Second paragraph of chapter 14 of http://www.amk.ca/python/2.1/

  * The speed of line-oriented file I/O has been improved because people often complain about its lack of speed, and because it's often been used as a naïve benchmark. The readline() method of file objects has therefore been rewritten to be much faster. The exact amount of the speedup will vary from platform to platform depending on how slow the C library's getc() was, but is around 66%, and potentially much faster on some particular operating systems. Tim Peters did much of the benchmarking and coding for this change, motivated by a discussion in comp.lang.python.

    A new module and method for file objects was also added, contributed by Jeff Epler. The new method, xreadlines(), is similar to the existing xrange() built-in. xreadlines() returns an opaque sequence object that only supports being iterated over, reading a line on every iteration but not reading the entire file into memory as the existing readlines() method does. You'd use it like this:

        for line in sys.stdin.xreadlines():
            # ... do something for each line ...
            ...

    For a fuller discussion of the line I/O changes, see the python-dev summary for January 1-15, 2001 at http://www.amk.ca/python/dev/2001-01-1.html.
Re: efficient text file search.
noro wrote:
> Is there a more efficient method to find a string in a text file than:
>
>     f=file('somefile')
>     for line in f:
>         if 'string' in line:
>             print 'FOUND'

Probably better to read the whole file at once if it isn't too big:

    f = file('somefile')
    data = f.read()
    if 'string' in data:
        print 'FOUND'
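When the file is too big to read comfortably into one string, memory-mapping it is one possible middle ground: the operating system pages the data in as the search proceeds. A minimal sketch in current Python, assuming the search string is given as bytes; found_with_mmap is a name made up for this example.

```python
import mmap


def found_with_mmap(path, needle):
    """Search a file for a byte string through a read-only memory map."""
    with open(path, "rb") as f:
        try:
            # Length 0 maps the whole file.
            mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        except ValueError:
            # An empty file cannot be mapped, so it cannot contain needle.
            return False
        with mm:
            return mm.find(needle) != -1
```

Usage would look like found_with_mmap('somefile', b'string'). Whether this beats a plain read() depends on the file size and the platform.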
Re: efficient text file search.
noro wrote:
> Is there a more efficient method to find a string in a text file than:
>
>     f=file('somefile')
>     for line in f:
>         if 'string' in line:
>             print 'FOUND'
>
> ?
>
> BTW: does "for line in f:" read a block of lines into memory, or does it simply call f.readline() many times?
>
> thanks
> amit

If your file is sorted by some key in the data, you can build a very fast binary search with mmap in Python.
Re: efficient text file search.
noro wrote:
> Is there a more efficient method to find a string in a text file than:
>
>     f=file('somefile')
>     for line in f:
>         if 'string' in line:
>             print 'FOUND'
>             break
>             ^^^

Add a 'break' after the print statement - that way you won't have to read the entire file unless the string isn't there. That's probably not the sort of advice you're after though :-)

Can't see why reading the entire file in as the other poster suggested would help, and seeing as "for line in f:" is now regarded as the pythonic way of working with lines of text in a file, I'd assume that the implementation would be at least as fast as "for line in f.xreadlines():".
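The early exit suggested above can be sketched in current Python as follows; the function names are hypothetical. any() stops consuming the file at the first matching line, which has the same effect as the explicit break.

```python
def found_early_exit(path, needle):
    """Return True as soon as one line contains needle."""
    with open(path, encoding="utf-8") as f:
        # any() stops pulling lines from f at the first True value.
        return any(needle in line for line in f)


def first_match_line(path, needle):
    """Return the 1-based number of the first matching line, or None."""
    with open(path, encoding="utf-8") as f:
        for number, line in enumerate(f, start=1):
            if needle in line:
                return number  # nothing after this line is read
    return None
```

The saving only shows up when the string occurs early in the file; if it is absent, every line is still read.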
Re: efficient text file search.
Luuk wrote:
> [snip]
> some googling turned up the following.
> Second paragraph of chapter 14 of http://www.amk.ca/python/2.1/
> [snip]
> For a fuller discussion of the line I/O changes, see the python-dev summary for January 1-15, 2001 at http://www.amk.ca/python/dev/2001-01-1.html.

That is *HISTORY*. That is Python 2.1. That is the year 2001. xreadlines is as dead as a dodo.
Re: efficient text file search.
can you add some more info, or point me to a link? i haven't found anything about binary search with mmap() in the python documentation. the files are very big...

thanks
amit

Bill Scherer wrote:
> noro wrote:
>> Is there a more efficient method to find a string in a text file than:
>>
>>     f=file('somefile')
>>     for line in f:
>>         if 'string' in line:
>>             print 'FOUND'
>>
>> ?
>>
>> BTW: does "for line in f:" read a block of lines into memory, or does it simply call f.readline() many times?
>>
>> thanks
>> amit
>
> If your file is sorted by some key in the data, you can build a very fast binary search with mmap in Python.
Re: efficient text file search.
noro wrote:
> Bill Scherer wrote:
>> noro wrote:
>>> Is there a more efficient method to find a string in a text file than:
>>>
>>>     f=file('somefile')
>>>     for line in f:
>>>         if 'string' in line:
>>>             print 'FOUND'
>>>
>>> ?
>>>
>>> BTW: does "for line in f:" read a block of lines into memory, or does it simply call f.readline() many times?
>>>
>>> thanks
>>> amit
>>
>> If your file is sorted by some key in the data, you can build a very fast binary search with mmap in Python.
>
> can you add some more info, or point me to a link? i haven't found anything about binary search with mmap() in the python documentation. the files are very big...

[please don't top-post: add your latest comments at the end so the story reads from the beginning].

I think this is probably not going to help you. A binary search is only useful if you want to locate a value in an ordered list. Since your original posting made it seem like the text you are looking for could appear in any position in any line of the file, a binary search doesn't do you any good at all (in fact it complicates things and slows them down unnecessarily) because you'd still need to look at all lines.

Plus, if the lines are of variable length then you'd need to start by creating an index of them, meaning you'd have to go right through the file anyway.

regards
 Steve
-- 
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden
Re: efficient text file search.
John Machin [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]
> Luuk wrote:
>> [snip]
>> some googling turned up the following.
>> Second paragraph of chapter 14 of http://www.amk.ca/python/2.1/
>> [snip]
>> For a fuller discussion of the line I/O changes, see the python-dev summary for January 1-15, 2001 at http://www.amk.ca/python/dev/2001-01-1.html.
>
> That is *HISTORY*. That is Python 2.1. That is the year 2001. xreadlines is as dead as a dodo.

That's why I started my reply with: "some googling turned up the following". I did not state that further googling was unneeded ;-)
Re: efficient text file search.
i'm not sure. each line in the text file is an index string. i can sort the file and use some binary search on it (I need to do a number of searches). there are 1219137 indexes in the file, so maybe a memory-efficient sort algorithm is in order. how can mmap help me? is there any binary search algorithm for text files out there, or do i need to write one?

Steve Holden wrote:
> noro wrote:
>> Bill Scherer wrote:
>>> noro wrote:
>>>> Is there a more efficient method to find a string in a text file than:
>>>>
>>>>     f=file('somefile')
>>>>     for line in f:
>>>>         if 'string' in line:
>>>>             print 'FOUND'
>>>>
>>>> ?
>>>>
>>>> BTW: does "for line in f:" read a block of lines into memory, or does it simply call f.readline() many times?
>>>>
>>>> thanks
>>>> amit
>>>
>>> If your file is sorted by some key in the data, you can build a very fast binary search with mmap in Python.
>>
>> can you add some more info, or point me to a link? i haven't found anything about binary search with mmap() in the python documentation. the files are very big...
>
> [please don't top-post: add your latest comments at the end so the story reads from the beginning].
>
> I think this is probably not going to help you. A binary search is only useful if you want to locate a value in an ordered list. Since your original posting made it seem like the text you are looking for could appear in any position in any line of the file, a binary search doesn't do you any good at all (in fact it complicates things and slows them down unnecessarily) because you'd still need to look at all lines.
>
> Plus, if the lines are of variable length then you'd need to start by creating an index of them, meaning you'd have to go right through the file anyway.
>
> regards
>  Steve
> -- 
> Steve Holden +44 150 684 7255 +1 800 494 3119
> Holden Web LLC/Ltd http://www.holdenweb.com
> Skype: holdenweb http://holdenweb.blogspot.com
> Recent Ramblings http://del.icio.us/steve.holden
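For the case described here (one key per line, whole-line lookups, file already sorted), a binary search over a memory map could look roughly like the sketch below. It is not from the thread: find_sorted_line is a made-up name, and the code assumes the file is sorted in plain byte order with LF line endings. It handles variable-length lines by backing up from the midpoint to the preceding newline instead of building an index first. It does not help with the original substring-anywhere search.

```python
import mmap


def find_sorted_line(path, key):
    """Return the byte offset of the line equal to key, or -1 if absent.

    Assumes the file is sorted in byte order, one key per line, with LF
    line endings. key is a bytes object without the trailing newline.
    """
    with open(path, "rb") as f:
        try:
            mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        except ValueError:
            return -1  # an empty file cannot be mapped
        with mm:
            size = len(mm)
            lo, hi = 0, size  # lo and hi always sit on line starts (or at size)
            while lo < hi:
                mid = (lo + hi) // 2
                # Back up from mid to the start of the line that contains it.
                nl = mm.rfind(b"\n", lo, mid)
                start = lo if nl == -1 else nl + 1
                end = mm.find(b"\n", start)
                if end == -1:
                    end = size  # last line has no trailing newline
                if mm[start:end] < key:
                    lo = min(end + 1, size)  # key can only be in a later line
                else:
                    hi = start  # key is this line or an earlier one
            # lo is now the start of the first line >= key, if there is one.
            if lo < size:
                end = mm.find(b"\n", lo)
                if end == -1:
                    end = size
                if mm[lo:end] == key:
                    return lo
            return -1
```

Each lookup touches only a handful of pages of the file, so repeated searches stay cheap even for a file with over a million lines; the one-off cost is sorting the file in the same byte order the comparison uses.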
Re: efficient text file search.
noro wrote:
> Is there a more efficient method to find a string in a text file than:
>
>     f=file('somefile')
>     for line in f:
>         if 'string' in line:
>             print 'FOUND'
>
> ?

Is this something you want to do only once for a given file? The replies so far seem to imply so, and in this case I doubt that you can do anything more efficient. OTOH, if the same file is to be searched repeatedly for different strings, an appropriate indexing scheme can speed things up considerably on average.

George
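No particular indexing scheme is named above. One simple possibility, assuming the searches are for whole lines rather than arbitrary substrings, is to read the file once into a dictionary and answer every later query from memory. build_line_index is a hypothetical helper written for this sketch.

```python
def build_line_index(path):
    """Map each distinct line (newline stripped) to its first line number."""
    index = {}
    with open(path, encoding="utf-8") as f:
        for number, line in enumerate(f, start=1):
            # setdefault keeps the first occurrence when a line repeats.
            index.setdefault(line.rstrip("\n"), number)
    return index
```

After index = build_line_index('somefile'), a test such as 'string' in index is a hash lookup instead of a pass over the file. The cost is one full read up front plus enough memory to hold every distinct line.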
efficient text file search -solution
OK, am not sure why, but

    fList=file('somefile').read()
    if fList.find('string') != -1:
        print 'FOUND'

works much much faster.

it is strange since i thought 'for line in file('somefile')' is optimized and reads pages into memory. i guess not..

George Sakkis wrote:
> noro wrote:
>> Is there a more efficient method to find a string in a text file than:
>>
>>     f=file('somefile')
>>     for line in f:
>>         if 'string' in line:
>>             print 'FOUND'
>>
>> ?
>
> Is this something you want to do only once for a given file? The replies so far seem to imply so, and in this case I doubt that you can do anything more efficient. OTOH, if the same file is to be searched repeatedly for different strings, an appropriate indexing scheme can speed things up considerably on average.
>
> George