I'm not sure. Each line in the text file is an index string. I can sort the file and use some kind of binary search on it (I need to do a number of searches). There are 1219137 indexes in the file, so maybe a memory-efficient sort algorithm is in order. How can mmap help me? Is there any binary search algorithm for text files out there, or do I need to write one?
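
For reference, here is a minimal sketch of what a binary search over a sorted, memory-mapped text file might look like. It is written for Python 3 and rests on several assumptions: the file holds one key per line, lines end in "\n", and the file is already sorted in plain byte order. The names find_line and contains are made up for this illustration and are not part of the mmap module. Because the search probes byte offsets and then backs up to the start of the enclosing line, it needs no separate index for variable-length lines.

```python
import mmap


def find_line(mm, key):
    """Return the byte offset of a line equal to key, or -1 if absent.

    mm is an mmap (or bytes) object; key is bytes without the newline.
    Assumes "\n"-terminated lines sorted in byte order.
    """
    lo, hi = 0, len(mm)  # lo and hi always sit on line starts (or EOF)
    while lo < hi:
        mid = (lo + hi) // 2
        # Back up from mid to the start of the line that contains it.
        nl = mm.rfind(b"\n", lo, mid)
        start = lo if nl == -1 else nl + 1
        # Find the end of that line (EOF if there is no final newline).
        end = mm.find(b"\n", start)
        if end == -1:
            end = len(mm)
        line = mm[start:end]
        if line < key:
            lo = min(end + 1, len(mm))  # continue after this line
        elif line > key:
            hi = start                  # continue before this line
        else:
            return start
    return -1


def contains(path, key):
    """Hypothetical helper: True if the sorted file at path has a line == key."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; an empty file raises ValueError.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return find_line(mm, key) != -1
```

With about 1.2 million lines, each lookup touches roughly 20 lines, and the operating system only pages in the parts of the file that are actually probed. Note that this only finds whole-line matches; it does not help if the string can appear anywhere inside a line.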

Steve Holden wrote:
> noro wrote:
> > Bill Scherer wrote:
> >
> >>noro wrote:
> >>
> >>
> >>>Is there a more efficient method to find a string in a text file then:
> >>>
> >>>f=file('somefile')
> >>>for line in f:
> >>>    if 'string' in line:
> >>>         print 'FOUND'
> >>>
> >>>?
> >>>
> >>>BTW:
> >>>does "for line in f: " read a block of line to te memory or is it
> >>>simply calls f.readline() many times?
> >>>
> >>>thanks
> >>>amit
> >>>
> >>>
> >>
> >>If your file is sorted by some key in the data, you can build a very
> >>fast binary search with mmap in Python.
> >
> >
> > can you add some more info, or point me to a link, i haven't found
> > anything about binary search in mmap() in python documents.
> >
> > the files are very big...
> >
> [please don't "top-post": add your latest comments at the end so the
> story reads from the beginning].
>
> I think this is probably not going to help you. A binary search is only
> useful if you want to locate a value in an ordered list. Since your
> original posting made it seem like the text you are looking for could
> appear in any position in any line of the file a binary search doesn't
> do you any good at all (in fact it complicates things and slows them
> down unnecessarily) because you'd still need to look at all lines.
>
> Plus, if the lines are of variable length then you'd need to start by
> creating an index of them, meaning you'd have to go right through the
> file anyway.
>
> regards
> Steve
> --
> Steve Holden +44 150 684 7255 +1 800 494 3119
> Holden Web LLC/Ltd http://www.holdenweb.com
> Skype: holdenweb http://holdenweb.blogspot.com
> Recent Ramblings http://del.icio.us/steve.holden

--
http://mail.python.org/mailman/listinfo/python-list