Re: efficient text file search -solution

2006-09-12 Thread Sion Arrowsmith
noro [EMAIL PROTECTED] wrote:
> OK, I'm not sure why, but
>
> fList=file('somefile').read()
> if fList.find('string') != -1:
>    print 'FOUND'
>
> works much, much faster.
>
> It is strange, since I thought 'for line in file('somefile')' was
> optimized and read pages into memory,

Step back and think about what each is doing at a high level of
description: file.read reads the contents of the file into memory
in one go, end of story. file.[x]readlines reads (some or all of)
the contents of the file into memory, does a linear search on it
for end-of-line characters, and copies out the line(s) into some
new bits of memory. Line-by-line processing has a *lot* more work
to do (unless you're read()ing a really big file which is going to
make heavy demands on memory allocation) and it should be no
surprise that it's slower.
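
A rough way to see the difference described above is to time both
approaches on the same file. The sketch below is only illustrative: it
uses current Python 3 syntax (open() and print() rather than the file()
and print statement used in this thread), the function names and the
generated test file are made up for the example, and the timings will
vary with file size, platform and disk cache.

```python
import os
import tempfile
import time


def search_whole(path, needle):
    """Read the file in one go and search the resulting string."""
    with open(path) as f:
        return needle in f.read()


def search_by_line(path, needle):
    """Iterate line by line; every line is split out and copied first."""
    with open(path) as f:
        for line in f:
            if needle in line:
                return True
    return False


def timed(func, *args):
    """Return (result, elapsed seconds) for a single call of func."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical test data: many short lines, with the match on the last.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("some filler text\n" * 200000)
        tmp.write("the string is here\n")
    try:
        for func in (search_whole, search_by_line):
            found, seconds = timed(func, tmp.name, "string")
            print(func.__name__, found, "%.4f s" % seconds)
    finally:
        os.remove(tmp.name)
```

Putting the match on the last line makes both functions touch the whole
file, so the comparison is between the two reading strategies rather
than between different amounts of I/O.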

-- 
\S -- [EMAIL PROTECTED] -- http://www.chaos.org.uk/~sion/
  ___  |  Frankly I have no feelings towards penguins one way or the other
  \X/  |-- Arthur C. Clarke
   her nu becomeþ se bera eadward ofdun hlæddre heafdes bæce bump bump bump
-- 
http://mail.python.org/mailman/listinfo/python-list

efficient text file search.

2006-09-11 Thread noro
Is there a more efficient method to find a string in a text file than:

f=file('somefile')
for line in f:
    if 'string' in line:
        print 'FOUND'

?

BTW:
does 'for line in f:' read a block of lines into memory, or does it
simply call f.readline() many times?

thanks
amit



Re: efficient text file search.

2006-09-11 Thread Luuk

noro [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]
> Is there a more efficient method to find a string in a text file than:
>
> f=file('somefile')
> for line in f:
>     if 'string' in line:
>         print 'FOUND'



yes, more efficient would be:
grep (http://www.gnu.org/software/grep/)





Re: efficient text file search.

2006-09-11 Thread noro
:)

via python...




Re: efficient text file search.

2006-09-11 Thread Luuk

noro [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]
> :)
>
> via python...
>
> Luuk wrote:
>> noro [EMAIL PROTECTED] wrote in message
>> news:[EMAIL PROTECTED]
>>> Is there a more efficient method to find a string in a text file than:
>>>
>>> f=file('somefile')
>>> for line in f:
>>>     if 'string' in line:
>>>         print 'FOUND'
>>
>> yes, more efficient would be:
>> grep (http://www.gnu.org/software/grep/)


OK, a more serious answer:

Some googling turned up the following.
Second paragraph of chapter 14 of http://www.amk.ca/python/2.1/

The speed of line-oriented file I/O has been improved because people
often complain about its lack of speed, and because it's often been
used as a naïve benchmark. The readline() method of file objects has
therefore been rewritten to be much faster. The exact amount of the
speedup will vary from platform to platform depending on how slow the
C library's getc() was, but is around 66%, and potentially much faster
on some particular operating systems. Tim Peters did much of the
benchmarking and coding for this change, motivated by a discussion in
comp.lang.python.

A new module and method for file objects was also added, contributed
by Jeff Epler. The new method, xreadlines(), is similar to the existing
xrange() built-in. xreadlines() returns an opaque sequence object that
only supports being iterated over, reading a line on every iteration
but not reading the entire file into memory as the existing readlines()
method does. You'd use it like this:

for line in sys.stdin.xreadlines():
    # ... do something for each line ...
    ...

For a fuller discussion of the line I/O changes, see the python-dev
summary for January 1-15, 2001 at
http://www.amk.ca/python/dev/2001-01-1.html.




Re: efficient text file search.

2006-09-11 Thread Kent Johnson
noro wrote:
> Is there a more efficient method to find a string in a text file than:
>
> f=file('somefile')
> for line in f:
>     if 'string' in line:
>         print 'FOUND'

Probably better to read the whole file at once if it isn't too big:
f = file('somefile')
data = f.read()
if 'string' in data:
    print 'FOUND'
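
For a file that is too big to read in one go, one possible middle
ground is to read it in large fixed-size chunks and search each chunk.
This is a minimal sketch in current Python 3, not something proposed in
the thread: file_contains and chunk_size are names invented for the
example, and the file is searched as bytes. The only subtle part is
carrying the tail of each chunk over to the next one, so that a match
lying across a chunk boundary is not missed.

```python
import os
import tempfile


def file_contains(path, needle, chunk_size=1 << 20):
    """Return True if the bytes `needle` occur anywhere in the file."""
    if not needle:
        return True
    overlap = len(needle) - 1
    tail = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return False
            window = tail + chunk
            if needle in window:
                return True
            # Keep the last len(needle) - 1 bytes so that a match
            # straddling two chunks is seen on the next pass.
            tail = window[-overlap:] if overlap else b""


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
        tmp.write(b"x" * 10 + b"string" + b"y" * 10)
    try:
        # A tiny chunk size forces the match to straddle chunk boundaries.
        print(file_contains(tmp.name, b"string", chunk_size=4))
    finally:
        os.remove(tmp.name)
```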


Re: efficient text file search.

2006-09-11 Thread Bill Scherer
noro wrote:

> Is there a more efficient method to find a string in a text file than:
>
> f=file('somefile')
> for line in f:
>     if 'string' in line:
>         print 'FOUND'
>
> ?
>
> BTW:
> does 'for line in f:' read a block of lines into memory, or does it
> simply call f.readline() many times?
>
> thanks
> amit

If your file is sorted by some key in the data, you can build a very 
fast binary search with mmap in Python.
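
A minimal sketch of what such a search could look like, in current
Python 3. It rests on several assumptions that are not stated in the
thread: the file is sorted in plain byte order, lines end in "\n", and
the key being looked up is the whole line. The function name
sorted_file_contains is invented for the example.

```python
import mmap
import os
import tempfile


def sorted_file_contains(path, key):
    """Binary-search a file of byte-sorted lines for a line equal to key."""
    if os.path.getsize(path) == 0:
        return False  # an empty file cannot be memory-mapped
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            size = len(mm)
            lo, hi = 0, size  # lo is always the offset of a line start
            while lo < hi:
                mid = (lo + hi) // 2
                # Back up from mid to the start of the line containing it.
                nl = mm.rfind(b"\n", lo, mid)
                start = lo if nl == -1 else nl + 1
                end = mm.find(b"\n", start)
                if end == -1:
                    end = size
                if mm[start:end] < key:
                    lo = end + 1  # the key can only be on a later line
                else:
                    hi = start    # this line or an earlier one
            if lo >= size:
                return False
            end = mm.find(b"\n", lo)
            if end == -1:
                end = size
            return mm[lo:end] == key
        finally:
            mm.close()


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
        tmp.write(b"apple\nbanana\ncherry\ndate\n")
    try:
        print(sorted_file_contains(tmp.name, b"cherry"))
        print(sorted_file_contains(tmp.name, b"coconut"))
    finally:
        os.remove(tmp.name)
```

Because the lines have different lengths, the search bisects on byte
offsets and then backs up to the nearest line start, so no separate
index of line positions is needed. It only answers "is this exact line
present?"; it cannot find a substring in the middle of a line.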


Re: efficient text file search.

2006-09-11 Thread Ant

noro wrote:
> Is there a more efficient method to find a string in a text file than:
>
> f=file('somefile')
> for line in f:
>     if 'string' in line:
>         print 'FOUND'
          break
          ^^^^^
Add a 'break' after the print statement - that way you won't have to
read the entire file unless the string isn't there. That's probably not
the sort of advice you're after though :-)

I can't see why reading the entire file in, as the other poster
suggested, would help. Seeing as 'for line in f:' is now regarded as
the Pythonic way of working with lines of text in a file, I'd assume
that the implementation is at least as fast as
'for line in f.xreadlines():'.



Re: efficient text file search.

2006-09-11 Thread John Machin

Luuk wrote:
[snip]
> Some googling turned up the following.
> Second paragraph of chapter 14 of http://www.amk.ca/python/2.1/
[snip]
> For a fuller discussion of the line I/O changes, see the python-dev summary
> for January 1-15, 2001 at http://www.amk.ca/python/dev/2001-01-1.html.

That is *HISTORY*. That is Python 2.1. That is the year 2001.
xreadlines is as dead as a dodo.



Re: efficient text file search.

2006-09-11 Thread noro
Can you add some more info, or point me to a link? I haven't found
anything about binary search with mmap() in the Python documentation.

The files are very big...

thanks
amit



Re: efficient text file search.

2006-09-11 Thread Steve Holden
noro wrote:
> Bill Scherer wrote:
>> noro wrote:
>>> Is there a more efficient method to find a string in a text file than:
>>>
>>> f=file('somefile')
>>> for line in f:
>>>     if 'string' in line:
>>>         print 'FOUND'
>>>
>>> ?
>>>
>>> BTW:
>>> does 'for line in f:' read a block of lines into memory, or does it
>>> simply call f.readline() many times?
>>>
>>> thanks
>>> amit
>>
>> If your file is sorted by some key in the data, you can build a very
>> fast binary search with mmap in Python.
>
> Can you add some more info, or point me to a link? I haven't found
> anything about binary search with mmap() in the Python documentation.
>
> The files are very big...
 
[please don't top-post: add your latest comments at the end so the 
story reads from the beginning].

I think this is probably not going to help you. A binary search is only
useful if you want to locate a value in an ordered list. Since your
original posting made it seem like the text you are looking for could
appear in any position in any line of the file, a binary search doesn't
do you any good at all (in fact it complicates things and slows them
down unnecessarily) because you'd still need to look at all lines.

Plus, if the lines are of variable length then you'd need to start by 
creating an index of them, meaning you'd have to go right through the 
file anyway.
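
The index mentioned above can be sketched as a list of byte offsets,
one per line, built in a single pass over the file. This is only an
illustration in current Python 3; build_line_index and read_line are
names invented for the example. Building the list costs one full read
of the file, which is the point being made: the index pays off only
when the same file is consulted many times afterwards.

```python
import os
import tempfile


def build_line_index(path):
    """Return a list holding the byte offset at which each line starts."""
    offsets = []
    pos = 0
    with open(path, "rb") as f:
        for line in f:
            offsets.append(pos)
            pos += len(line)
    return offsets


def read_line(path, offsets, n):
    """Fetch line n (0-based) by seeking straight to its recorded offset."""
    with open(path, "rb") as f:
        f.seek(offsets[n])
        return f.readline().rstrip(b"\r\n")


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
        tmp.write(b"a\nbbb\ncc\n")
    try:
        index = build_line_index(tmp.name)
        print(index)
        print(read_line(tmp.name, index, 1))
    finally:
        os.remove(tmp.name)
```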

regards
  Steve
-- 
Steve Holden   +44 150 684 7255  +1 800 494 3119
Holden Web LLC/Ltd  http://www.holdenweb.com
Skype: holdenweb   http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden



Re: efficient text file search.

2006-09-11 Thread Luuk

John Machin [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]
>
> Luuk wrote:
> [snip]
>> Some googling turned up the following.
>> Second paragraph of chapter 14 of http://www.amk.ca/python/2.1/
> [snip]
>> For a fuller discussion of the line I/O changes, see the python-dev
>> summary
>> for January 1-15, 2001 at http://www.amk.ca/python/dev/2001-01-1.html.
>
> That is *HISTORY*. That is Python 2.1. That is the year 2001.
> xreadlines is as dead as a dodo.


That's why I started my reply with:
"some googling turned up the following."
I did not state that further googling was unneeded ;-)




Re: efficient text file search.

2006-09-11 Thread noro
I'm not sure.

Each line in the text file is an index string. I can sort the file
and use some binary tree search on it. (I need to do a number of
searches.) There are 1219137 indexes in the file, so maybe a
memory-efficient sort algorithm is in order.
How can mmap help me?
Is there any binary search algorithm for text files out there, or do I
need to write one?





Re: efficient text file search.

2006-09-11 Thread George Sakkis
noro wrote:

> Is there a more efficient method to find a string in a text file than:
>
> f=file('somefile')
> for line in f:
>     if 'string' in line:
>         print 'FOUND'
>
> ?

Is this something you want to do only once for a given file ? The
replies so far seem to imply so and in this case I doubt that you can
do anything more efficient. OTOH, if the same file is to be searched
repeatedly for different strings, an appropriate indexing scheme can
speed things up considerably on average.
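
One very simple indexing scheme, sketched here in current Python 3, is
to load every line into a set once and answer each later query with a
membership test. It assumes that lookups are for whole lines (exact
match) and that the set of lines fits in memory; it does nothing for
substring searches. load_keys is a name invented for the example.

```python
import os
import tempfile


def load_keys(path):
    """Read every line once and return the set of lines, newline stripped."""
    with open(path) as f:
        return {line.rstrip("\n") for line in f}


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
        tmp.write("alpha\nbeta\ngamma\n")
    try:
        keys = load_keys(tmp.name)        # one pass over the file
        for wanted in ("beta", "delta"):  # many cheap lookups afterwards
            print(wanted, wanted in keys)
    finally:
        os.remove(tmp.name)
```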

George



efficient text file search -solution

2006-09-11 Thread noro
OK, I'm not sure why, but

fList=file('somefile').read()
if fList.find('string') != -1:
   print 'FOUND'

works much, much faster.

It is strange, since I thought 'for line in file('somefile')' was
optimized and read pages into memory.
I guess not...

