Re: Python fh.seek() oddity

2009-08-17 Thread Bobby Powers
On Fri, Aug 14, 2009 at 2:26 PM, Martin Langhoff <martin.langh...@gmail.com> wrote:
> On Fri, Aug 14, 2009 at 2:02 PM, Martin Langhoff <martin.langh...@gmail.com> wrote:
>> Seems one of my roles in life is to find all the oddities in Python.
>> Hints from truly experienced Pythonistas welcome.
>
> Feels good to blame Python, but after a walk outside I managed to get
> it to seek backwards. And even found the bug in my code.
>
> Now, let me tell you, Perl bugs are *even worse* than this ;-)

Would you mind sharing what the bug was, or posting the updated script?

yours,
Bobby




 m
 --
  martin.langh...@gmail.com
  mar...@laptop.org -- School Server Architect
  - ask interesting questions
  - don't get distracted with shiny stuff  - working code first
  - http://wiki.laptop.org/go/User:Martinlanghoff
 ___
 Devel mailing list
 Devel@lists.laptop.org
 http://lists.laptop.org/listinfo/devel



Re: Python fh.seek() oddity

2009-08-17 Thread Martin Langhoff
On Mon, Aug 17, 2009 at 8:40 PM, Bobby Powers <bobbypow...@gmail.com> wrote:
> Would you mind sharing what the bug was, or posting the updated script?

I'm merging the functions in the script into the bitfrost.leases lib.
The bug had to do with miscounting offsets: I had forgotten to count
buftail (the tail of the previous read that gets prepended to each new
buffer) in the offset. Today I fixed an additional bug in the
definition of needlelength.
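For anyone hitting the same thing: the original loop seeks to
page * step + objkey.start(), but objkey.start() is an index into
buftail + buf, which begins len(buftail) bytes before page * step. A
minimal Python 3 sketch of the corrected bookkeeping follows. It is not
the bitfrost.leases code; find_offset is a hypothetical helper that
works on a binary file and a literal needle, and it keeps a tail of
len(needle) - 1 bytes, which is enough because a needle that fit
entirely in the tail would already have been found in the previous
buffer.

```python
import io


def find_offset(fh, needle, page=4096):
    """Return the absolute offset of the first `needle` (bytes) in the
    binary file `fh`, or -1 if it is not there.

    Hypothetical helper: reads `page`-sized chunks and prepends the
    last len(needle) - 1 bytes of the previous buffer (the "tail"),
    so a needle straddling a page boundary is still found.
    """
    tail_len = len(needle) - 1
    fh.seek(0)
    chunk_start = 0      # absolute offset of the chunk just read
    buftail = b''        # always ends exactly at chunk_start
    while True:
        chunk = fh.read(page)
        if not chunk:    # EOF
            return -1
        buf = buftail + chunk
        pos = buf.find(needle)
        if pos != -1:
            # buf begins len(buftail) bytes *before* chunk_start, so the
            # tail has to be subtracted when converting to a file offset.
            return chunk_start - len(buftail) + pos
        buftail = buf[-tail_len:] if tail_len > 0 else b''
        chunk_start += len(chunk)


# A needle that straddles the 4096-byte page boundary.
data = b'x' * 4094 + b'"abc":"ABC"' + b'y' * 10
off = find_offset(io.BytesIO(data), b'"abc":"')
print(off, data[off:off + 7])  # 4094 b'"abc":"'
```

Dropping the "- len(buftail)" term reproduces the symptom: the
returned offset lands len(buftail) bytes past the real match.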

With those fixed, it passes my tests -- it finds the keys even on the
page boundaries ok.

It's not fast, and it's not elegant, but it fixes a showstopper bug
for large deployments, so I'll post a patch for review & comment tmw.

cheers,



m


Python fh.seek() oddity

2009-08-14 Thread Martin Langhoff
Seems one of my roles in life is to find all the oddities in Python.
Hints from truly experienced Pythonistas welcome.

I am finding that after I do

  fh.seek(pos)
  buf = fh.read(pagesz)
  match = regexobj.search(buf)

the next fh.seek() will always be to _at least_ the end of the match.
I can no longer fh.seek(pos) or fh.seek(pos+1) -- the call succeeds
but the next read() _always_ starts at the end of the last match. But!
We never seek to that particular point.

Is this expected? Known? Normal when reading the docs under the
influence of powerful drugs?
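One way to test this is to check the file object in isolation:
re.search() works on the string that read() returned, so it cannot move
the file position, and a backwards seek() followed by read() should
return the bytes at the requested offset. A minimal Python 3 self-check,
using a throwaway temporary file in binary mode (seek_back_check is a
made-up name for this example):

```python
import re
import tempfile


def seek_back_check(pos=5, pagesz=4096):
    """Read a page, regex-search it, then seek back to `pos` and re-read.

    Returns (end of the first match, the 3 bytes read at `pos`).
    """
    with tempfile.TemporaryFile() as fh:   # default mode is 'w+b'
        fh.write(b'0123456789' * 1000)
        fh.seek(0)
        buf = fh.read(pagesz)
        match = re.search(b'789', buf)     # searches the in-memory copy only
        fh.seek(pos)                       # a position before match.end()
        return match.end(), fh.read(3)


print(seek_back_check())  # (10, b'567')
```

If this returns the bytes at the requested offset (here b'567', which
lies before the match end of 10), the file object is honoring the seek,
and the more likely culprit is the offset being passed to it.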

Sample script is attached for the truly curious - try it on a large
CJSON file, providing a separate file with dict keys to search for.
Once you get past the first 'page', and the code has to re-seek after
it found the match, you'll see the problem.
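The script checks found == k.swapcase(), which suggests test data in
which every value is its key with the case swapped. A Python 3 sketch
for generating such a pair of files; the layout (a single flat dict
with no whitespace between tokens) and the file names are assumptions,
not the original test data:

```python
import json
import os
import tempfile


def make_test_files(directory, keys):
    """Write a compact JSON dict mapping each key to key.swapcase(),
    plus a keys file with one key per line.

    Returns (keys_path, json_path). File names are arbitrary.
    """
    json_path = os.path.join(directory, 'big.json')
    keys_path = os.path.join(directory, 'keys.txt')
    with open(json_path, 'w') as f:
        # No whitespace between tokens, so each entry reads "KEY":"key"
        json.dump({k: k.swapcase() for k in keys}, f,
                  sort_keys=True, separators=(',', ':'))
    with open(keys_path, 'w') as f:
        f.write('\n'.join(keys) + '\n')
    return keys_path, json_path


# ~20k entries of ~30 bytes each, i.e. far past the first 4 KB page
keys = ['CSN%08dB' % i for i in range(20000)]
keys_path, json_path = make_test_files(tempfile.mkdtemp(), keys)
```

The script takes the keys file as its first argument and the JSON file
as its second.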

I suspect it's related to Python possibly having the file mmap()ed
behind the scenes, without telling me.

(What I am writing is actually a grep over very large files, for cases
where Python's mmap is not available; say, for instance, our wonky
initrd.)

cheers,



m
#!/usr/bin/python

import re

def grep_for_lease_mmap(fpath, sn):
    """Search a potentially larger-than-mem cjson file for
    something that looks like a lease or a series of leases.

    Uses mmap.

    returns a string or False
    """
    import mmap
    fh = open(fpath, 'rb')
    # read-only mapping; we never write to the file
    m = mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ)

    # find the start of it: "SN":"
    rx = re.compile('"' + re.escape(sn) + '":"')
    objkey = rx.search(m)
    objend = None

    if objkey:
        # find the tail - the first non-escaped
        # doublequotes. This relies on sigs not
        # having escape chars themselves.
        # TODO: Negative look-behind assertion to handle
        # escaped values.
        rx = re.compile('"')
        objend = rx.search(m, objkey.end())

    if objkey and objend:
        # the needle ends after the value's opening quote,
        # so the value runs up to (not including) objend
        found = m[objkey.end():objend.start()]
    else:
        found = False

    m.close()
    fh.close()

    return found

def grep_for_lease(fpath, sn):
    """Search a potentially larger-than-mem cjson file for
    something that looks like a lease or a series of leases.

    Uses old read()s

    returns a string or False
    """
    # Use read()s, but keep stuff aligned to 4KB pages
    # so we stand a chance to hit the fast paths.
    page = 4096 #* 1024
    step = 0

    # the needle is "SN":" - the serial plus 4 quote/colon chars
    needlerx = re.compile('"' + re.escape(sn) + '":"')
    needlelength = len(sn) + 4

    fh = open(fpath, 'rb')

    buftail = ''
    objkey = None
    objend = None

    while True:

        chunk = fh.read(page)
        if chunk == '': # EOF
            break

        buf = buftail + chunk

        objkey = needlerx.search(buf)
        if objkey:
            # found the needle - issue a read
            # from there and break.
            # buf starts len(buftail) chars before page * step,
            # so buftail must be counted in the offset.
            fh.seek(page * step - len(buftail) + objkey.start())
            buf = fh.read(page)
            # re-search for objkey - to get the offsets right
            objkey = needlerx.search(buf)
            break

        # prep for next read - keep tail
        # in case needle is on the boundary
        buftail = buf[-needlelength:]
        step = step + 1
        fh.seek(page * step)
        print " [ Seek to %s ] " % (page * step)

    if objkey:
        # find the tail - the first non-escaped
        # doublequotes. This relies on sigs not
        # having escape chars themselves.
        # TODO: Negative look-behind assertion to handle
        # escaped values.
        rx = re.compile('"')
        objend = rx.search(buf, objkey.end())

    if objkey and objend:
        found = buf[objkey.end():objend.start()]
    else:
        found = False

    fh.close()

    return found

import sys

# argv[1]: file with one key per line; argv[2]: the big JSON file
fh = open(sys.argv[1])
bigdata = {}
lines = fh.readlines()
for k in lines:
    k = k.strip()
    print "Looking for %s" % k
    found = grep_for_lease(sys.argv[2], k)
    if found:
        if found == k.swapcase():
            print "... found good match"
        else:
            print "BAD MATCH %s" % found
    else:
        print "NO MATCH"


#found = grep_for_lease('/media/soas/big.json', 'CSN7470319B')
#
#if found:
#    print "Found: " + found
#else:
#    print 'not found'
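The TODO in both functions asks for a negative look-behind assertion to
skip escaped quotes. A look-behind such as (?<!\\)" handles \" but
rejects a real closing quote that follows an escaped backslash (\\").
A Python 3 sketch of a different technique, matching the string body
one plain character or one escape pair at a time; extract_value is a
made-up name, and the compact "key":"value" layout is an assumption:

```python
import re


def extract_value(buf, key):
    """Return the still-escaped JSON string value stored under `key`
    in `buf`, or None if the key is not found.

    Assumes compact JSON, i.e. "key":"value" with no whitespace.
    """
    # body = any run of (non-quote, non-backslash char) or (backslash + any char)
    rx = re.compile(r'"' + re.escape(key) + r'":"((?:[^"\\]|\\.)*)"')
    m = rx.search(buf)
    return m.group(1) if m else None


doc = r'{"a":"x\"y","b":"x\\","c":"z"}'
print(extract_value(doc, 'a'))  # x\"y
print(extract_value(doc, 'b'))  # x\\
print(extract_value(doc, 'c'))  # z
```

The value is returned still escaped, as it appears in the file; the
caller can unescape it if needed.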



Re: Python fh.seek() oddity

2009-08-14 Thread Martin Langhoff
On Fri, Aug 14, 2009 at 2:02 PM, Martin Langhoff <martin.langh...@gmail.com> wrote:
> Seems one of my roles in life is to find all the oddities in Python.
> Hints from truly experienced Pythonistas welcome.

Feels good to blame Python, but after a walk outside I managed to get
it to seek backwards. And even found the bug in my code.

Now, let me tell you, Perl bugs are *even worse* than this ;-)




m