Re: Canonical way of dealing with null-separated lines?

John Machin Thu, 24 Feb 2005 13:00:06 -0800

On Thu, 24 Feb 2005 11:53:32 -0500, Christopher De Vries wrote:


>On Wed, Feb 23, 2005 at 10:54:50PM -0500, Douglas Alan wrote:
>> Is there a canonical way of iterating over the lines of a file that
>> are null-separated rather than newline-separated?
>
>I'm not sure if there is a canonical method, but I would recommend using a
>generator to get something like this, where 'f' is a file object:
>
>def readnullsep(f):
>    # Need a place to put potential pieces of a null separated string
>    # across buffer boundaries
>    retain = []
>
>    while True:
>        instr = f.read(2048)
>        if len(instr)==0:
>            # End of file
>            break
>
>        # Split over nulls
>        splitstr = instr.split('\0')
>
>        # Combine with anything left over from previous read
>        retain.append(splitstr[0])
>        splitstr[0] = ''.join(retain)
>
>        # Keep last piece for next loop and yield the rest
>        retain = [splitstr[-1]]
>        for element in splitstr[:-1]:

(1) Inefficient: the slice splitstr[:-1] copies every element but the
last on each read.

>            yield element
>
>    # yield anything left over
>    yield retain[0]

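(2) Dies with an IndexError when the input file is empty, because retain
is still [] when retain[0] is evaluated.

(3) As noted by the OP, can return a spurious empty line at the end,
which happens whenever the file ends with a null.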
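
To make points (2) and (3) concrete, here is a small sketch. It assumes
Python 3 and uses io.StringIO as a stand-in for a real file (both are my
assumptions, not part of the quoted post); the generator body is the quoted
one with its comments trimmed.

```python
import io

def readnullsep(f):
    # The quoted generator, reproduced so this block runs on its own.
    retain = []
    while True:
        instr = f.read(2048)
        if len(instr) == 0:
            break
        splitstr = instr.split('\0')
        retain.append(splitstr[0])
        splitstr[0] = ''.join(retain)
        retain = [splitstr[-1]]
        for element in splitstr[:-1]:
            yield element
    yield retain[0]

# Point (3): input that ends with a null yields a trailing empty record.
print(list(readnullsep(io.StringIO('a\0b\0'))))   # ['a', 'b', '']

# Point (2): empty input never fills retain, so retain[0] fails.
try:
    list(readnullsep(io.StringIO('')))
except IndexError as exc:
    print('empty input raised IndexError:', exc)
```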

Try this:

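!def readweird(f, line_end='\0', bufsiz=8192):
!    # Yield the line_end-terminated records of f, minus the terminator.
!    # retain holds an unterminated piece carried over from the last read.
!    retain = ''
!    while True:
!        instr = f.read(bufsiz)
!        if not instr:
!            # End of file
!            break
!        splitstr = instr.split(line_end)
!        if splitstr[-1]:
!            # last piece not terminated: finish the carried-over record,
!            # then carry the new tail over to the next read
!            if retain:
!                splitstr[0] = retain + splitstr[0]
!            retain = splitstr.pop()
!        else:
!            # buffer ended on a terminator: nothing to carry over
!            if retain:
!                splitstr[0] = retain + splitstr[0]
!                retain = ''
!            # drop the empty string that split() leaves after the terminator
!            del splitstr[-1]
!        for element in splitstr:
!            yield element
!    if retain:
!        # file ended without a final terminator
!        yield retain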
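
A quick way to exercise it, sketched under the assumption of Python 3 with
io.StringIO and io.BytesIO standing in for real files (neither appears in the
thread). The function is restated without the '!' prefixes so the block runs
on its own, and the tiny bufsiz values are chosen only to force records to
straddle buffer boundaries.

```python
import io

def readweird(f, line_end='\0', bufsiz=8192):
    # Same logic as the generator above, restated for a runnable example.
    retain = ''
    while True:
        instr = f.read(bufsiz)
        if not instr:
            break
        splitstr = instr.split(line_end)
        if splitstr[-1]:
            # Last piece not terminated: carry it over to the next read.
            if retain:
                splitstr[0] = retain + splitstr[0]
            retain = splitstr.pop()
        else:
            if retain:
                splitstr[0] = retain + splitstr[0]
                retain = ''
            del splitstr[-1]
        for element in splitstr:
            yield element
    if retain:
        yield retain

# Records split across reads are reassembled; no spurious trailing record.
print(list(readweird(io.StringIO('one\0two\0'), bufsiz=3)))   # ['one', 'two']

# Empty input simply yields nothing.
print(list(readweird(io.StringIO(''))))                       # []

# Binary input also works if line_end is given as bytes.
print(list(readweird(io.BytesIO(b'a\0bc\0d'), line_end=b'\0', bufsiz=2)))
# [b'a', b'bc', b'd']
```

For a file opened in binary mode under Python 3, line_end has to be a bytes
object such as b'\0'; the empty-string initial value of retain is harmless
there because it is only ever concatenated when it is non-empty.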

Cheers,
John