Re: read Unicode characters one by one in python2

Chris Angelico Sun, 25 Feb 2018 13:00:06 -0800

On Mon, Feb 26, 2018 at 3:57 AM, Steven D'Aprano
<[email protected]> wrote:
> On Mon, 26 Feb 2018 01:50:16 +1100, Chris Angelico wrote:
>
>> If you actually need character-by-character, you'd need "for character
>> in fh.read()" rather than iterating over the file itself. Iterating over
>> a file yields lines.
>
> Indeed. But I wonder if there's a performance cost/gain to iterating over
> each line, rather than reading one char at a time?
>
> for line in file:
>     for c in line:
>         ...
>
> Too lazy to actually test it myself, but just tossing this idea out in
> case anyone else cares to give it a try.
>


Depends on the size of the file. For a small file, you could read the
whole thing into memory in a single disk operation, and then splitting
into lines is a waste of time; but for a gigantic file, reading
everything into RAM means crazy-expensive transfer/copy, so it'd be
HEAPS more efficient to work line by line - particularly if you don't
need the whole file.
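
To make the comparison concrete, here is a minimal sketch of the two approaches, not a benchmark. The function names are made up for illustration, and it assumes a UTF-8 file opened with io.open so that it yields Unicode characters on Python 2 as well as 3.

```python
import io

def count_chars_whole(path, encoding="utf-8"):
    """Read the whole file into memory, then walk its characters."""
    count = 0
    with io.open(path, encoding=encoding) as fh:
        for c in fh.read():      # one big unicode string in RAM
            count += 1
    return count

def count_chars_by_line(path, encoding="utf-8"):
    """Walk the file line by line; only one line is held at a time."""
    count = 0
    with io.open(path, encoding=encoding) as fh:
        for line in fh:          # iterating over a file yields lines
            for c in line:
                count += 1
    return count
```

Both functions visit the same characters; wrapping each call in timeit on files of different sizes would be one way to see where the trade-off falls on a given machine.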

But if you do want to cut the process off partway through, having
nested loops means a simple "break" won't work: it only exits the
inner loop, and the outer one carries on with the next line. So
that's a different consideration.
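
One possible way around the nested-loop "break" issue is to hide both loops inside a generator, so the caller sees a single flat loop. This is only a sketch: iter_chars and read_until are hypothetical names, and UTF-8 is assumed.

```python
import io

def iter_chars(path, encoding="utf-8"):
    """Yield the file's characters one at a time, reading line by line."""
    with io.open(path, encoding=encoding) as fh:
        for line in fh:
            for c in line:
                yield c

def read_until(path, stop, encoding="utf-8"):
    """Collect characters up to, but not including, the first `stop`."""
    seen = []
    for c in iter_chars(path, encoding):
        if c == stop:
            break                # a single break ends the whole scan
        seen.append(c)
    return u"".join(seen)
```

Because the caller only has one loop, a plain "break" stops everything, and the rest of the file is never read.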

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list
