Re: [Python-Dev] TextIO seek and tell cookies

2016-09-26 Thread Guido van Rossum
Yeah, that should work. The implementation is something like a byte
offset to the start of a line plus a character count, plus some misc
flags. I found this implementation in the 2.6 code (the last version
where it was pure Python code):

def _pack_cookie(self, position, dec_flags=0,
                 bytes_to_feed=0, need_eof=0, chars_to_skip=0):
    # The meaning of a tell() cookie is: seek to position, set the
    # decoder flags to dec_flags, read bytes_to_feed bytes, feed them
    # into the decoder with need_eof as the EOF flag, then skip
    # chars_to_skip characters of the decoded result.  For most simple
    # decoders, tell() will often just give a byte offset in the file.
    return (position | (dec_flags<<64) | (bytes_to_feed<<128) |
            (chars_to_skip<<192) | bool(need_eof)<<256)

def _unpack_cookie(self, bigint):
    rest, position = divmod(bigint, 1<<64)
    rest, dec_flags = divmod(rest, 1<<64)
    rest, bytes_to_feed = divmod(rest, 1<<64)
    need_eof, chars_to_skip = divmod(rest, 1<<64)
    return position, dec_flags, bytes_to_feed, need_eof, chars_to_skip
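
As a rough illustration of the layout quoted above, the two methods can be rewritten as free functions and exercised directly. This is only a sketch of the 2.6-era pure-Python packing; the names pack_cookie and unpack_cookie and the sample field values are made up for the example, and the C implementation in later versions may lay the fields out differently.

```python
def pack_cookie(position, dec_flags=0, bytes_to_feed=0,
                need_eof=0, chars_to_skip=0):
    # Same field layout as the quoted _pack_cookie: each field gets
    # its own 64-bit slot, with need_eof as a single bit on top.
    return (position | (dec_flags << 64) | (bytes_to_feed << 128) |
            (chars_to_skip << 192) | bool(need_eof) << 256)


def unpack_cookie(bigint):
    # Peel the fields off again, lowest 64 bits first.
    rest, position = divmod(bigint, 1 << 64)
    rest, dec_flags = divmod(rest, 1 << 64)
    rest, bytes_to_feed = divmod(rest, 1 << 64)
    need_eof, chars_to_skip = divmod(rest, 1 << 64)
    return position, dec_flags, bytes_to_feed, need_eof, chars_to_skip


# With every extra field at zero, the cookie is just the byte offset.
simple = pack_cookie(1234)

# Hypothetical cookie carrying decoder state; it no longer fits in 64 bits.
stateful = pack_cookie(1234, dec_flags=1, bytes_to_feed=3, chars_to_skip=2)

print(simple)                         # 1234
print(stateful.bit_length())          # 194
print(unpack_cookie(stateful))        # (1234, 1, 3, 0, 2)
print(unpack_cookie(stateful + 10))   # (1244, 1, 3, 0, 2)
```

In this layout, adding a small offset only disturbs the low-order position field, but the result still carries the old bytes_to_feed and chars_to_skip, so it need not name a meaningful place in the text.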

On Mon, Sep 26, 2016 at 3:43 PM, Greg Ewing  wrote:
> Ben Leslie wrote:
>>
>> But the idea of transmitting these offsets outside of a running
>> process is not something that I had anticipated. It got me thinking:
>> is there a guarantee that these opaque values returned from tell() is
>> stable across different versions of Python?
>
>
> Are they even guaranteed to work on a different file
> object in the same process? I.e. if you read some stuff
> from a file, do tell() on it, then close it, open it
> again and seek() with that token, are you guaranteed to
> end up at the same place in the file?
>
> --
> Greg
>
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/guido%40python.org



-- 
--Guido van Rossum (python.org/~guido)
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] TextIO seek and tell cookies

2016-09-26 Thread Greg Ewing

Ben Leslie wrote:
> But the idea of transmitting these offsets outside of a running
> process is not something that I had anticipated. It got me thinking:
> is there a guarantee that these opaque values returned from tell() is
> stable across different versions of Python?


Are they even guaranteed to work on a different file
object in the same process? I.e. if you read some stuff
from a file, do tell() on it, then close it, open it
again and seek() with that token, are you guaranteed to
end up at the same place in the file?
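
One way to try this scenario is a small experiment along the following lines. It is only a sketch: the file contents and variable names are invented, and seeing it work for one UTF-8 file on one build says nothing about whether the behaviour is guaranteed.

```python
import os
import tempfile

# Write a small UTF-8 file containing some non-ASCII text.
with tempfile.NamedTemporaryFile("w", encoding="utf-8", newline="\n",
                                 suffix=".txt", delete=False) as tmp:
    tmp.write("na\u00efve line\nsecond line\nthird line\n")
    path = tmp.name

try:
    # First file object: read one line, remember the cookie, then close.
    with open(path, encoding="utf-8", newline="\n") as first:
        first_line = first.readline()
        cookie = first.tell()

    # Second, independent file object: seek with the saved cookie.
    with open(path, encoding="utf-8", newline="\n") as second:
        second.seek(cookie)
        resumed_line = second.readline()
finally:
    os.remove(path)

print(repr(first_line), repr(resumed_line))
```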

--
Greg


Re: [Python-Dev] TextIO seek and tell cookies

2016-09-26 Thread Random832
On Mon, Sep 26, 2016, at 05:30, Ben Leslie wrote:
> I think the case of JSON or SQL database is even more important though.
> 
> tell/seek can return 129-bit integers (maybe even more? my maths might
> be off here).
> 
> The very large integers that can be returned by tell() will break
> serialization to JSON, and storing in a SQL database (at least for
> most database types).
> 
> What is the value of comparing these to plain integers? Unless you
> happen to know the magic encoding it isn't going to be very useful I
> think?

I assume the value is that in the circumstances in which all of the
flags and other bits are zero, they can be used as offsets in precisely
the way that you used them. It may also be possible that in some cases
where they are not zero, doing arithmetic with them is still "safe"
since the real offset is still in the low-order bits. I don't know if
those circumstances are predictable enough for it to be worthwhile.
Changing it would obviously break code that does this (unless, perhaps,
it were changed to be a class with arithmetic operators); the question
is whether such code "deserves" to be broken.

In my own tests, even a UTF-8-sig file with DOS line endings "worked".
Does anyone have information about what circumstances can reliably cause
tell() to return values that are *not* simple integers? Maybe it has
something to do with working with stateful encodings like iso-2022 or
UTF-7?

What was the situation that caused your problem?


Re: [Python-Dev] TextIO seek and tell cookies

2016-09-26 Thread Ben Leslie
It was pointed out in private email that technically JSON can
represent very large integers even if ECMAScript itself can't.
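
A short sketch of that distinction, using a made-up cookie value wider than 64 bits. Python's json module keeps arbitrary-precision integers intact; float() is used here only as a stand-in for a consumer that stores every number as an IEEE 754 double, which is an assumption about the receiving side.

```python
import json

# Hypothetical cookie wider than 64 bits; the low bits hold a byte offset.
cookie = (1 << 128) | 1234

# Python's json module writes and reads the integer exactly.
round_tripped = json.loads(json.dumps({"pos": cookie}))["pos"]

# A consumer limited to double precision cannot hold the value exactly.
as_double = int(float(cookie))

print(round_tripped == cookie)  # True
print(as_double == cookie)      # False
```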

But the idea of transmitting these offsets outside of a running
process is not something that I had anticipated. It got me thinking:
is there a guarantee that these opaque values returned from tell() are
stable across different versions of Python? My reading of opaque is
that it could be subject to change, but that possibly isn't the
intent.

It seems that since the sizeof(int) and sizeof(Py_off_t) could be
different in different builds of Python even of the same version,
then the opaque value returned is necessarily going to be different
between builds of even the same version of Python.

It seems like it would be prudent to discourage the sharing of these
opaque cookies (such as via a database or interchange formats) as
you'd have to be very sure that they would be interpreted correctly in
any receiving instance.

Cheers,

Ben

On 26 September 2016 at 02:30, Ben Leslie  wrote:
> I think the case of JSON or SQL database is even more important though.
>
> tell/seek can return 129-bit integers (maybe even more? my maths might
> be off here).
>
> The very large integers that can be returned by tell() will break
> serialization to JSON, and storing in a SQL database (at least for
> most database types).
>
> What is the value of comparing these to plain integers? Unless you
> happen to know the magic encoding it isn't going to be very useful I
> think?
>
> Cheers,
>
> Ben
>
> On 25 September 2016 at 21:18, Guido van Rossum  wrote:
>> Be careful though, comparing these to plain integers should probably
>> be allowed, and we also should make sure that things like
>> serialization via JSON or storing in an SQL database don't break. I
>> personally think it's one of those "learn not to touch the stove"
>> cases and there's limited value in making this API idiot proof.
>>
>> On Sun, Sep 25, 2016 at 9:05 PM, Nick Coghlan  wrote:
>>> On 26 September 2016 at 10:21, MRAB  wrote:
>>>> On 2016-09-26 00:21, Ben Leslie wrote:
>>>>> Are there any downsides to this? I've made some progress developing a
>>>>> patch to change this functionality. Is it worth polishing and
>>>>> submitting?
>>>>>
>>>> An alternative might be a subclass of int.
>>>
>>> It could make sense to use a subclass of int that emitted deprecation
>>> warnings for integer arithmetic, and then eventually disallowed it
>>> entirely.
>>>
>>> Cheers,
>>> Nick.
>>>
>>> --
>>> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
>>
>>
>>
>> --
>> --Guido van Rossum (python.org/~guido)


Re: [Python-Dev] TextIO seek and tell cookies

2016-09-26 Thread Ben Leslie
I think the case of JSON or SQL database is even more important though.

tell/seek can return 129-bit integers (maybe even more? my maths might
be off here).

The very large integers that can be returned by tell() will break
serialization to JSON, and storing in a SQL database (at least for
most database types).
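
The SQL half of this can be checked with the standard library's sqlite3 module. In this sketch the cookie value, table names and the text-column workaround are all invented for illustration; other databases and drivers may behave differently.

```python
import sqlite3

cookie = (1 << 128) | 1234  # hypothetical cookie wider than 64 bits

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookmarks (name TEXT, pos INTEGER)")

try:
    conn.execute("INSERT INTO bookmarks VALUES (?, ?)", ("big", cookie))
    stored_as_integer = True
except OverflowError:
    # SQLite INTEGER values are limited to signed 64 bits.
    stored_as_integer = False

# One possible workaround (an assumption, not something proposed in
# this thread): keep the cookie as decimal text and convert it back.
conn.execute("CREATE TABLE bookmarks_text (name TEXT, pos TEXT)")
conn.execute("INSERT INTO bookmarks_text VALUES (?, ?)", ("big", str(cookie)))
(pos_text,) = conn.execute("SELECT pos FROM bookmarks_text").fetchone()
restored = int(pos_text)
conn.close()

print(stored_as_integer, restored == cookie)
```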

What is the value of comparing these to plain integers? Unless you
happen to know the magic encoding it isn't going to be very useful I
think?

Cheers,

Ben

On 25 September 2016 at 21:18, Guido van Rossum  wrote:
> Be careful though, comparing these to plain integers should probably
> be allowed, and we also should make sure that things like
> serialization via JSON or storing in an SQL database don't break. I
> personally think it's one of those "learn not to touch the stove"
> cases and there's limited value in making this API idiot proof.
>
> On Sun, Sep 25, 2016 at 9:05 PM, Nick Coghlan  wrote:
>> On 26 September 2016 at 10:21, MRAB  wrote:
>>> On 2016-09-26 00:21, Ben Leslie wrote:
>>>> Are there any downsides to this? I've made some progress developing a
>>>> patch to change this functionality. Is it worth polishing and
>>>> submitting?
>>>>
>>> An alternative might be a subclass of int.
>>
>> It could make sense to use a subclass of int that emitted deprecation
>> warnings for integer arithmetic, and then eventually disallowed it
>> entirely.
>>
>> Cheers,
>> Nick.
>>
>> --
>> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
>
>
>
> --
> --Guido van Rossum (python.org/~guido)


Re: [Python-Dev] TextIO seek and tell cookies

2016-09-26 Thread Ben Leslie
On 25 September 2016 at 17:21, MRAB  wrote:
> On 2016-09-26 00:21, Ben Leslie wrote:
>>
>> Hi all,
>>
>> I recently shot myself in the foot by assuming that TextIO.tell
>> returned integers rather than opaque cookies. Specifically I was
>> adding an offset to the value returned by TextIO.tell. In retrospect
>> this doesn't make sense.
>>
>> Now, I don't want to drive change simply because I failed to read the
>> documentation carefully, but I think the current API is very easy to
>> misuse. Most of the time TextIO.tell returns a cookie that is actually
>> an integer and adding an offset to it and seek-ing works fine.
>>
>> The only indication you get that you are mis-using the API is that
>> sometimes tell returns a cookie that when you add an integer offset to
>> it will cause seek() to fail with an OverflowError.
>>
>> Would it be possible to change the API to return something more
>> opaque? E.g.: rather than converting the C cookie structure to a long,
>> could it instead be converted to a bytes() object?
>>
>> (I.e.: Change textiowrapper_build_cookie to use
>> PyBytes_FromStringAndSize rather than _PyLong_FromByteArray and
>> equivalent for textiowrapper_parse_cookie).
>>
>> This would ensure the return value is never mis-used and is probably
>> also faster using bytes objects than converting to/from an integer.
>>
> why would it be faster? It's an integer internally.


It isn't an integer internally though, it is a cookie:

typedef struct {
    Py_off_t start_pos;
    int dec_flags;
    int bytes_to_feed;
    int chars_to_skip;
    char need_eof;
} cookie_type;

The memory view of this structure is then converted to a long. Surely
converting to a PyLong is more work than converting to bytes?
In any case, performance really isn't the motivation here.
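
To make the struct-to-integer step concrete, here is a rough Python model of it. The format string is an assumption (8-byte Py_off_t, 4-byte int, fields packed little-endian with no padding), and build_cookie and parse_cookie are illustrative names, not the C functions themselves; actual builds may use different sizes.

```python
import struct

# Assumed layout: 8-byte offset, three 4-byte ints, one 1-byte flag,
# little-endian and unpadded. Real builds may differ.
COOKIE_FORMAT = "<qiiib"


def build_cookie(start_pos, dec_flags=0, bytes_to_feed=0,
                 chars_to_skip=0, need_eof=0):
    # Pack the fields into a byte buffer, then view the buffer as one int.
    raw = struct.pack(COOKIE_FORMAT, start_pos, dec_flags,
                      bytes_to_feed, chars_to_skip, need_eof)
    return raw, int.from_bytes(raw, "little")


def parse_cookie(value):
    # Reverse direction: raises OverflowError if value is too wide.
    raw = value.to_bytes(struct.calcsize(COOKIE_FORMAT), "little")
    return struct.unpack(COOKIE_FORMAT, raw)


raw_simple, int_simple = build_cookie(1234)
raw_skip, int_skip = build_cookie(1234, chars_to_skip=1)

print(len(raw_simple), int_simple)   # 21 1234
print(int_skip.bit_length())         # 129
print(parse_cookie(int_skip))        # (1234, 0, 0, 1, 0)
```

Under these assumed sizes the bytes form is always 21 bytes long, while the integer form looks like a plain offset whenever the other fields are zero.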

Cheers,

Ben


Re: [Python-Dev] TextIO seek and tell cookies

2016-09-25 Thread Peter Ludemann via Python-Dev
On 25 September 2016 at 21:18, Guido van Rossum  wrote:

> Be careful though, comparing these to plain integers should probably
> be allowed,


There's a good reason why it's "opaque" ... why would you want to make it
less opaque?

And I'm curious why Python didn't adopt the fgetpos/fsetpos style that
makes the data structure completely opaque (fpos_t). IIRC, this was added
to C when the ANSI standard was first written, to allow cross-platform
compatibility in cases where ftell/fseek was difficult (or impossible) to
fully implement. Maybe those reasons don't matter any more (e.g., dealing
with record-oriented or keyed file systems) ...



> and we also should make sure that things like
> serialization via JSON or storing in an SQL database don't break. I
> personally think it's one of those "learn not to touch the stove"
> cases and there's limited value in making this API idiot proof.
>
> On Sun, Sep 25, 2016 at 9:05 PM, Nick Coghlan  wrote:
> > On 26 September 2016 at 10:21, MRAB  wrote:
> >> On 2016-09-26 00:21, Ben Leslie wrote:
> >>> Are there any downsides to this? I've made some progress developing a
> >>> patch to change this functionality. Is it worth polishing and
> >>> submitting?
> >>>
> >> An alternative might be a subclass of int.
> >
> > It could make sense to use a subclass of int that emitted deprecation
> > warnings for integer arithmetic, and then eventually disallowed it
> > entirely.
> >
> > Cheers,
> > Nick.
> >
> > --
> > Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
>
>
>
> --
> --Guido van Rossum (python.org/~guido)


Re: [Python-Dev] TextIO seek and tell cookies

2016-09-25 Thread Guido van Rossum
Be careful though, comparing these to plain integers should probably
be allowed, and we also should make sure that things like
serialization via JSON or storing in an SQL database don't break. I
personally think it's one of those "learn not to touch the stove"
cases and there's limited value in making this API idiot proof.

On Sun, Sep 25, 2016 at 9:05 PM, Nick Coghlan  wrote:
> On 26 September 2016 at 10:21, MRAB  wrote:
>> On 2016-09-26 00:21, Ben Leslie wrote:
>>> Are there any downsides to this? I've made some progress developing a
>>> patch to change this functionality. Is it worth polishing and
>>> submitting?
>>>
>> An alternative might be a subclass of int.
>
> It could make sense to use a subclass of int that emitted deprecation
> warnings for integer arithmetic, and then eventually disallowed it
> entirely.
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia



-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] TextIO seek and tell cookies

2016-09-25 Thread Nick Coghlan
On 26 September 2016 at 10:21, MRAB  wrote:
> On 2016-09-26 00:21, Ben Leslie wrote:
>> Are there any downsides to this? I've made some progress developing a
>> patch to change this functionality. Is it worth polishing and
>> submitting?
>>
> An alternative might be a subclass of int.

It could make sense to use a subclass of int that emitted deprecation
warnings for integer arithmetic, and then eventually disallowed it
entirely.
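
A minimal sketch of what such a subclass might look like. The class name TellCookie and the warning text are hypothetical, and only addition and subtraction are covered here. Comparison and JSON serialization still behave like a plain int, which is the compatibility concern raised elsewhere in the thread.

```python
import json
import warnings


class TellCookie(int):
    """Hypothetical cookie type: still an int, but arithmetic warns."""

    def _warn(self):
        warnings.warn("arithmetic on tell() cookies is deprecated",
                      DeprecationWarning, stacklevel=3)

    def __add__(self, other):
        self._warn()
        return int.__add__(self, other)

    __radd__ = __add__

    def __sub__(self, other):
        self._warn()
        return int.__sub__(self, other)


cookie = TellCookie(1234)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    same = (cookie == 1234)       # comparison: no warning
    dumped = json.dumps(cookie)   # serialization: no warning
    shifted = cookie + 10         # arithmetic: DeprecationWarning

print(same, dumped, shifted, len(caught))
```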

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] TextIO seek and tell cookies

2016-09-25 Thread MRAB

On 2016-09-26 00:21, Ben Leslie wrote:

> Hi all,
>
> I recently shot myself in the foot by assuming that TextIO.tell
> returned integers rather than opaque cookies. Specifically I was
> adding an offset to the value returned by TextIO.tell. In retrospect
> this doesn't make sense.
>
> Now, I don't want to drive change simply because I failed to read the
> documentation carefully, but I think the current API is very easy to
> misuse. Most of the time TextIO.tell returns a cookie that is actually
> an integer and adding an offset to it and seek-ing works fine.
>
> The only indication you get that you are mis-using the API is that
> sometimes tell returns a cookie that when you add an integer offset to
> it will cause seek() to fail with an OverflowError.
>
> Would it be possible to change the API to return something more
> opaque? E.g.: rather than converting the C cookie structure to a long,
> could it instead be converted to a bytes() object?
>
> (I.e.: Change textiowrapper_build_cookie to use
> PyBytes_FromStringAndSize rather than _PyLong_FromByteArray and
> equivalent for textiowrapper_parse_cookie).
>
> This would ensure the return value is never mis-used and is probably
> also faster using bytes objects than converting to/from an integer.

Why would it be faster? It's an integer internally.

> Are there any downsides to this? I've made some progress developing a
> patch to change this functionality. Is it worth polishing and
> submitting?


An alternative might be a subclass of int.


[Python-Dev] TextIO seek and tell cookies

2016-09-25 Thread Ben Leslie
Hi all,

I recently shot myself in the foot by assuming that TextIO.tell
returned integers rather than opaque cookies. Specifically I was
adding an offset to the value returned by TextIO.tell. In retrospect
this doesn't make sense.

Now, I don't want to drive change simply because I failed to read the
documentation carefully, but I think the current API is very easy to
misuse. Most of the time TextIO.tell returns a cookie that is actually
an integer and adding an offset to it and seek-ing works fine.

The only indication you get that you are mis-using the API is that
sometimes tell returns a cookie that when you add an integer offset to
it will cause seek() to fail with an OverflowError.
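
For contrast, a usage pattern that treats the cookie as opaque might look like the sketch below: every position passed to seek() is one that tell() returned earlier, and nothing is ever added to it. The sample text and variable names are invented for the example.

```python
import io

raw = io.BytesIO("\u00fcn\u00efcode line\nsecond line\nthird line\n".encode("utf-8"))
text = io.TextIOWrapper(raw, encoding="utf-8", newline="\n")

# Record the cookie at the start of every line; never compute one.
line_starts = []
while True:
    cookie = text.tell()
    line = text.readline()
    if not line:
        break
    line_starts.append(cookie)

# Jump back to the third line using only a recorded cookie.
text.seek(line_starts[2])
third = text.readline()

print(len(line_starts), repr(third))
```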

Would it be possible to change the API to return something more
opaque? E.g.: rather than converting the C cookie structure to a long,
could it instead be converted to a bytes() object?

(I.e.: Change textiowrapper_build_cookie to use
PyBytes_FromStringAndSize rather than _PyLong_FromByteArray and
equivalent for textiowrapper_parse_cookie).
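
The effect of the proposal can be approximated from pure Python with a pair of wrapper functions. This is only a sketch of the idea, not the C patch: tell_token and seek_token are hypothetical helpers that wrap the existing integer cookie in a bytes object.

```python
import io


def tell_token(stream):
    """Return the current position as an opaque bytes token."""
    cookie = stream.tell()
    length = max(1, (cookie.bit_length() + 7) // 8)
    return cookie.to_bytes(length, "little")


def seek_token(stream, token):
    """Seek to a position previously returned by tell_token()."""
    return stream.seek(int.from_bytes(token, "little"))


text = io.TextIOWrapper(io.BytesIO(b"one\ntwo\nthree\n"),
                        encoding="utf-8", newline="\n")
text.readline()
token = tell_token(text)
text.readline()
seek_token(text, token)
line = text.readline()

try:
    token + 4  # adding an offset to a bytes token is a TypeError
    arithmetic_allowed = True
except TypeError:
    arithmetic_allowed = False

print(repr(line), arithmetic_allowed)
```

With a bytes token the mistake described at the top of this message fails immediately, rather than only for the occasional cookie that carries decoder state.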

This would ensure the return value is never mis-used and is probably
also faster using bytes objects than converting to/from an integer.

Are there any downsides to this? I've made some progress developing a
patch to change this functionality. Is it worth polishing and
submitting?

Cheers,

Ben