Re: [Python-Dev] Fuzziness in io module specs

2009-09-20 Thread Pascal Chambon
Well, system compatibility argues strongly in favor of not letting the
filepointer go past EOF.
However, is it really necessary to move the pointer to EOF in EVERY case?
I mean, if I extend the file, or if I shrink it without going below
my current filepointer, I really don't expect the io system to move
my pointer to the end of file "just for fun". With these patterns,
people would have to remember their current filepointer in order to
come back to where they were, and that's not pretty imo...


If we agree on the simple mandatory invariant 0 <= filepointer <= EOF
(for cross-platform safety), then we only have to enforce it where the
rule would be broken: shrinking the file below the filepointer, and
seeking past the end of file. In all other situations the filepointer
should stay where the user put it. Shouldn't it?


  
Concerning the naming of truncate(), would it be possible to deprecate
it and alias it to "resize()"? It's not very gratifying to have
duplicated methods at the beginning of a major release, but I also feel
that "truncate" is a misleading term that had better be replaced ASAP.
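If such a rename went ahead, the usual pattern would be to keep the old name as a thin alias that emits a DeprecationWarning. A sketch on top of io.BytesIO, where the class name ResizableBytesIO and its resize() method are hypothetical:

```python
import io
import warnings


class ResizableBytesIO(io.BytesIO):
    """Hypothetical stream class where resize() is the preferred name."""

    def resize(self, size=None):
        # Delegate to the existing implementation; only the name changes.
        return super().truncate(size)

    def truncate(self, size=None):
        # Keep the old name working, but steer callers towards resize().
        warnings.warn("truncate() is deprecated, use resize()",
                      DeprecationWarning, stacklevel=2)
        return self.resize(size)
```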


Regards,
Pascal
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] POSIX [Fuzziness in io module specs]

2009-09-20 Thread Pascal Chambon




What we could do with is better platform-independent
ways of distinguishing particular error conditions,
such as file not found, out of space, etc., either
using subclasses of IOError or mapping error codes
to a set of platform-independent ones.



Well, mapping all errors (including C ones and Windows-specific ones) to
a common set would indeed be extremely useful for developers.
I guess some advanced Windows errors will never have equivalents
elsewhere, but does anyone know of an error code set that would
cover all the memory, filesystem, io and locking aspects?
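One possible shape for this, sketched with the errno module's symbolic names (which stay portable even where the numeric codes differ between platforms) and a small hierarchy of OSError subclasses (IOError is an alias of OSError in Python 3). The class names and the to_portable() helper are illustrative assumptions, not an existing API:

```python
import errno


class PortableIOError(OSError):
    """Hypothetical base class for platform-independent I/O errors."""


class FileNotFound(PortableIOError):
    pass


class OutOfSpace(PortableIOError):
    pass


class AccessDenied(PortableIOError):
    pass


# Hypothetical mapping from symbolic errno codes to the portable classes.
_ERRNO_TO_CLASS = {
    errno.ENOENT: FileNotFound,
    errno.ENOSPC: OutOfSpace,
    errno.EACCES: AccessDenied,
    errno.EPERM: AccessDenied,
}


def to_portable(exc):
    """Return an instance of the matching portable class, or exc itself if unmapped."""
    cls = _ERRNO_TO_CLASS.get(exc.errno)
    if cls is None:
        return exc
    return cls(exc.errno, exc.strerror, exc.filename)
```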



Regards,
Pascal


Re: [Python-Dev] Fuzziness in io module specs

2009-09-20 Thread Greg Ewing

Pascal Chambon wrote:

  Concerning the naming of truncate(), would it be possible to deprecate 
it and alias it to "resize()" ? It's not very gratifying to have 
duplicated methods at the beginning of a major release, but I feel too 
that "truncate" is a misleading term, that had better be replaced asap.


There's something to be said for that, but there's also
something to be said for following established conventions,
and there's a long-established precedent from the C library
for having a function called truncate() that behaves this
way.
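For reference, the POSIX ftruncate() call, which Python exposes as os.ftruncate(), does not modify the file offset. A quick check, assuming a POSIX system (the demo function name is hypothetical):

```python
import os
import tempfile


def ftruncate_demo():
    """Shrink a file below its offset with os.ftruncate(); return (offset, size)."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"0123456789")        # offset is now 10
        os.lseek(fd, 4, os.SEEK_SET)       # move the offset to 4
        os.ftruncate(fd, 2)                # shrink the file below the offset
        offset = os.lseek(fd, 0, os.SEEK_CUR)
        size = os.fstat(fd).st_size
        return offset, size
    finally:
        os.close(fd)
        os.unlink(path)
```

The offset stays at 4 even though the file is now only 2 bytes long.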

--
Greg


Re: [Python-Dev] Fuzziness in io module specs - PEP update proposition

2009-09-20 Thread Pascal Chambon

Hello

After weighing things up here and there, here is what I have come up with.
Comments and issue notifications are more than welcome, of course. The
exception thingy is not yet addressed.


Regards,
Pascal


*Truncate and file pointer semantics*

Rationale:

The current implementation of truncate() always moves the file pointer to
the new end of file.


This behaviour is useful for compatibility when the file has been
shrunk and the file pointer would otherwise be past its end, since some
platforms might require 0 <= filepointer <= filesize.


However, there are several arguments against these semantics:

   * Most common standards (POSIX, Win32...) allow the file pointer to
 be past the end of file, and define the behaviour of other stream
 methods in this case
   * In many cases, there is no reason to move the filepointer when
 truncating (if we're extending the file, or shrinking it
 without going below the file pointer)
   * Making 0 <= filepointer <= filesize a global rule of the python IO
 module doesn't seem possible, since it would require
 modifications to the semantics of other methods (e.g. seek() would have to
 raise exceptions or silently disobey when asked to move the
 filepointer past the end of file), and would lead to incoherent
 situations when files are accessed concurrently without locking (what
 if another process truncates the file you're writing to 0 bytes?)
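The concurrent case in the last point can be reproduced with two handles on the same file. On a POSIX system the writer's pointer simply ends up past EOF, and the next write fills the gap with zeros. A sketch, where the function name is hypothetical and the second handle stands in for "another process":

```python
import os
import tempfile


def write_after_foreign_truncate():
    """Empty a file through one handle while another handle sits at its end."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "r+b") as writer, open(path, "r+b") as other:
            writer.write(b"ABCDEFGHIJ")
            writer.flush()                 # writer's pointer is at 10
            other.truncate(0)              # "another process" empties the file
            pointer = writer.tell()        # still 10, i.e. now past EOF
            writer.write(b"x")
            writer.flush()                 # bytes [0, 10) read back as zeros
        with open(path, "rb") as f:
            return pointer, f.read()
    finally:
        os.unlink(path)
```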

So here are the proposed semantics, which match established conventions:

*RawIOBase.truncate(n: int = None) -> int*

*(same for BufferedIOBase.truncate(pos: int = None) -> int)*

Resizes the file to the size specified by the positive integer n, or to
the current filepointer position if n is None.


The file must be opened with write permissions.

If the file was previously larger than n, the extra data is discarded. 
If the file was previously shorter than n, its size is increased, and 
the extended area appears as if it were zero-filled.


In any case, the file pointer is left unchanged, and may point beyond 
the end of file.
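For comparison, the io documentation of recent CPython 3 releases states that truncate() leaves the stream position unchanged and that an extended area is zero-filled on most systems, which is essentially the behaviour described here. A quick check, assuming such a release on a POSIX system (truncate_demo is a hypothetical name):

```python
import os
import tempfile


def truncate_demo():
    """Extend, then shrink, a file while the pointer sits at offset 5."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w+b") as f:
            f.write(b"abc")
            f.seek(5)                  # pointer past EOF (size is 3)
            f.truncate(8)              # extend: size becomes 8
            after_extend = f.tell()    # pointer unchanged
            f.seek(0)
            extended = f.read()        # extended area reads back as zeros
            f.seek(5)
            f.truncate(2)              # shrink below the pointer
            after_shrink = f.tell()    # still 5, i.e. beyond EOF
            f.seek(0)
            data = f.read()
        return after_extend, extended, after_shrink, data
    finally:
        os.unlink(path)
```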


Note: trying to read past the end of file returns an empty bytes object,
and trying to write past the end of file extends it, zero-filling the gap.
On the rare platforms which don't support file pointers beyond the end of
file, all these behaviours shall be emulated by internally storing
the "wanted" file pointer position (silently extending the file, if
necessary, when a write operation occurs).
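The "wanted pointer" emulation described in the note can be modelled in a few lines. This is a toy in-memory sketch, not an io class: VirtualPointerFile and its methods are hypothetical, and a real implementation would sit on top of the platform's file calls:

```python
import io


class VirtualPointerFile:
    """Toy in-memory file whose pointer may sit beyond the end of the data."""

    def __init__(self, data=b""):
        self._data = bytearray(data)
        self._pos = 0          # the "wanted" position, possibly > len(self._data)

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            new = offset
        elif whence == io.SEEK_CUR:
            new = self._pos + offset
        elif whence == io.SEEK_END:
            new = len(self._data) + offset
        else:
            raise ValueError("invalid whence")
        if new < 0:
            raise ValueError("negative seek position")
        self._pos = new        # seeking past EOF is allowed; nothing is extended yet
        return new

    def read(self, n=-1):
        if self._pos >= len(self._data):
            return b""         # at or past EOF: empty result, pointer unchanged
        end = len(self._data) if n < 0 else self._pos + n
        chunk = bytes(self._data[self._pos:end])
        self._pos += len(chunk)
        return chunk

    def write(self, b):
        gap = self._pos - len(self._data)
        if gap > 0:
            self._data.extend(bytes(gap))   # silently zero-fill up to the pointer
        self._data[self._pos:self._pos + len(b)] = b
        self._pos += len(b)
        return len(b)

    def truncate(self, n=None):
        if n is None:
            n = self._pos
        if n < 0:
            raise ValueError("negative size")
        if n < len(self._data):
            del self._data[n:]
        else:
            self._data.extend(bytes(n - len(self._data)))  # zero-filled extension
        return n               # the pointer is deliberately left where it was

    def getvalue(self):
        return bytes(self._data)
```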




*Proposition of doc update*

*RawIOBase*.read(n: int) -> bytes

Read up to n bytes from the object and return them. Fewer than n bytes 
may be returned if the operating system call returns fewer than n bytes. 
If 0 bytes are returned, and n was not 0, this indicates end of file. If 
the object is in non-blocking mode and no bytes are available, the call 
returns None.
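The short-read rules above mean that callers needing exactly n bytes have to loop. A sketch, where TrickleRaw is a hypothetical raw stream that never returns more than 2 bytes per call, and read_exactly() is a hypothetical helper that treats b"" as EOF and None as "no data yet":

```python
import io


class TrickleRaw(io.RawIOBase):
    """Hypothetical raw stream returning at most 2 bytes per call (like a slow pipe)."""

    def __init__(self, data):
        super().__init__()
        self._data = data
        self._pos = 0

    def readable(self):
        return True

    def readinto(self, b):
        chunk = self._data[self._pos:self._pos + min(2, len(b))]
        b[:len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)


def read_exactly(raw, n):
    """Read n bytes, looping over short reads; may return fewer at EOF."""
    parts = []
    remaining = n
    while remaining > 0:
        chunk = raw.read(remaining)
        if chunk is None:          # non-blocking and nothing available yet
            raise BlockingIOError("no data available yet")
        if not chunk:              # b"" with remaining > 0 means EOF
            break
        parts.append(chunk)
        remaining -= len(chunk)
    return b"".join(parts)
```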


*RawIOBase*.readinto(b: bytes) -> int

Read up to len(b) bytes from the object and store them in b, returning
the number of bytes read. Like .read, fewer than len(b) bytes may be
read, and 0 indicates end of file if len(b) is not 0. None is returned if a
non-blocking object has no bytes available. The length of b is never
changed.







Re: [Python-Dev] Fuzziness in io module specs - PEP update proposition

2009-09-20 Thread MRAB

Pascal Chambon wrote:

Hello

After weighing things up here and there, here is what I have come up with.
Comments and issue notifications are more than welcome, of course. The
exception thingy is not yet addressed.


Regards,
Pascal


*Truncate and file pointer semantics*

Rationale:

The current implementation of truncate() always moves the file pointer to
the new end of file.


This behaviour is useful for compatibility when the file has been
shrunk and the file pointer would otherwise be past its end, since some
platforms might require 0 <= filepointer <= filesize.


However, there are several arguments against these semantics:

* Most common standards (POSIX, Win32…) allow the file pointer to be
  past the end of file, and define the behaviour of other stream
  methods in this case
* In many cases, there is no reason to move the filepointer when
  truncating (if we’re extending the file, or shrinking it
  without going below the file pointer)
* Making 0 <= filepointer <= filesize a global rule of the python IO
  module doesn’t seem possible, since it would require
  modifications to the semantics of other methods (e.g. seek() would have to
  raise exceptions or silently disobey when asked to move the
  filepointer past the end of file), and would lead to incoherent
  situations when files are accessed concurrently without locking (what
  if another process truncates the file you’re writing to 0 bytes?)

So here are the proposed semantics, which match established conventions:

*RawIOBase.truncate(n: int = None) -> int*

*(same for BufferedIOBase.truncate(pos: int = None) -> int)*

Resizes the file to the size specified by the positive integer n, or to
the current filepointer position if n is None.



The new size could be positive or zero.


The file must be opened with write permissions.

If the file was previously larger than n, the extra data is discarded. 
If the file was previously shorter than n, its size is increased, and 
the extended area appears as if it were zero-filled.


In any case, the file pointer is left unchanged, and may point beyond 
the end of file.


Note: trying to read past the end of file returns an empty bytes object,
and trying to write past the end of file extends it, zero-filling the gap.
On the rare platforms which don’t support file pointers beyond the end of
file, all these behaviours shall be emulated by internally storing
the “wanted” file pointer position (silently extending the file, if
necessary, when a write operation occurs).


 


*Proposition of doc update*

*RawIOBase*.read(n: int) -> bytes

Read up to n bytes from the object and return them. Fewer than n bytes 
may be returned if the operating system call returns fewer than n bytes. 
If 0 bytes are returned, and n was not 0, this indicates end of file. If 
the object is in non-blocking mode and no bytes are available, the call 
returns None.


*RawIOBase*.readinto(b: bytes) -> int

Read up to len(b) bytes from the object and store them in b, returning
the number of bytes read. Like .read, fewer than len(b) bytes may be
read, and 0 indicates end of file if len(b) is not 0. None is returned if a
non-blocking object has no bytes available. The length of b is never
changed.



I thought 'bytes' was immutable!

If you're going to read into a list or array, it would be nice to also
be able to give the start index and either the end index or the count
(start defaults to 0, end defaults to len).


Re: [Python-Dev] Fuzziness in io module specs

2009-09-20 Thread MRAB

Greg Ewing wrote:

Pascal Chambon wrote:

  Concerning the naming of truncate(), would it be possible to 
deprecate it and alias it to "resize()" ? It's not very gratifying to 
have duplicated methods at the beginning of a major release, but I 
feel too that "truncate" is a misleading term, that had better be 
replaced asap.


There's something to be said for that, but there's also
something to be said for following established conventions,
and there's a long-established precedent from the C library
for having a function called truncate() that behaves this
way.


But Python isn't C. :-)



Re: [Python-Dev] Fuzziness in io module specs - PEP update proposition

2009-09-20 Thread Antoine Pitrou

Hello,

> *Truncate and file pointer semantics*
[snip]

The major problem here is that you are changing the current semantics.
I don't think the argument you are making for it is strong enough to 
counter-balance the backwards compatibility issue. The situation would be 
different if 3.0 hadn't been released yet.

Besides, we already broke compatibility with 3.0/3.1, let's not give 
users the impression that we don't care about compatibility anymore.

Regards

Antoine.



[Python-Dev] please consider changing --enable-unicode default to ucs4

2009-09-20 Thread Zooko O'Whielacronx
Dear Pythonistas:

This issue causes serious problems.  Users occasionally get binaries built for a
compatible Linux and Python version but with a different UCS2-vs-UCS4 setting,
and those users get mysterious memory corruption errors which are hard to
diagnose.  It is possible that these situations also open up security
vulnerabilities.  A couple such instances are documented on
http://bugs.python.org/setuptools/issue78, but you can find more by googling.
I would like to get this problem fixed!

In order to help address this issue I sampled what UCS size is used by python
executables in the wild.  I instrumented a few buildslaves that are contributed
by various people to the Tahoe-LAFS project to print out their platform, python
version, and sys.maxunicode.  The full results are appended below.
maxunicode: 1114111 means that the python executable was configured with
--enable-unicode=ucs4, and maxunicode: 65535 means that the python executable
was configured with --enable-unicode=ucs2 or just with --enable-unicode .  The
only incompatibilities that I found arise because some packagers have
deliberately chosen the UCS4 configuration while others have left the default
setting.
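The classification used for the survey boils down to a check of sys.maxunicode. A sketch (unicode_build is a hypothetical name); note that since Python 3.3 (PEP 393) sys.maxunicode is always 1114111, so the check only separates narrow and wide builds on older interpreters:

```python
import sys


def unicode_build():
    """Classify the running interpreter the same way as the survey results."""
    if sys.maxunicode == 0x10FFFF:    # 1114111: --enable-unicode=ucs4
        return "ucs4"
    if sys.maxunicode == 0xFFFF:      # 65535: --enable-unicode=ucs2 or the default
        return "ucs2"
    raise RuntimeError("unexpected sys.maxunicode: %r" % sys.maxunicode)
```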

In the three cases where someone configured python with UCS2, one of the three
is certainly an accident (a custom-built python executable on an Ubuntu server)
and the other two just use the default instead of specifically configuring ucs2
in their configurations of Python and I suspect that they don't know the
difference and that it was an accident that they built a Python incompatible
with other distributions of their operating system.

In sum, while it would be good to add the unicode setting to the platform's ABI
(as discussed in setuptools ticket #78), it would also be good to make the
default value UCS4 instead of UCS2.  This would fix all three of the potential
incompatibilities that I found (listed below), and once we have proper inclusion
of the unicode setting in the ABI in order to prevent the memory corruption,
defaulting to UCS4 would increase the likelihood that a binary built on one
distribution would be usable on another.

I'm sure that someone can come up with a reason why UCS2 is better than UCS4,
but I'm also sure that the benefits of compatibility outweigh any benefits of
UCS2 encoding, and that the widespread use of UCS4 demonstrates that there is
nothing fatally wrong with it, and that people who really value UCS2 encoding
more than compatibility can choose that for themselves by explicitly
setting UCS2.

Let me restate that I am not suggesting taking away anyone's options, only
making the setting default to the compatible option for people who don't
specify it.
Hm, I guess that means that it should default to UCS2 on Windows and Mac and
to UCS4 on Linux and Solaris.

Regards,

Zooko

Ubuntu 6.10 "edgy" i386: python: 2.4.4c1 (#2, Mar  7 2008, 03:03:38)  [GCC 4.1.2
20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu5)], maxunicode: 1114111
Ubuntu 7.04 "feisty": python: 2.5.1 (r251:54863, Jul 31 2008, 22:53:39)  [GCC
4.1.2 (Ubuntu 4.1.2-0ubuntu4)], maxunicode: 1114111
Ubuntu 7.10 "gutsy" i386: python: 2.5.1 (r251:54863, Jul 31 2008, 23:17:40)
[GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)], maxunicode: 1114111
Ubuntu 8.04 "hardy" amd64: python: 2.5.2 (r252:60911, Jul 22 2009, 15:33:10)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)], maxunicode: 1114111
Ubuntu 8.04 "hardy" i386: *custom* python: 2.6 (r26:66714, Oct  2 2008,
13:40:28)  [GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)], maxunicode: 65535
Ubuntu 8.04 "hardy" i386: python: 2.5.2 (r252:60911, Jul 22 2009, 15:35:03)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)], maxunicode: 1114111
Ubuntu 9.04 "jaunty" amd64: *custom* python: 2.6.2 (release26-maint, Apr 19
2009, 01:58:18)  [GCC 4.3.3], maxunicode: 1114111

Debian 4.0 "etch" i386: python: 2.4.4 (#2, Oct 22 2008, 19:52:44)  [GCC 4.1.2
20061115 (prerelease) (Debian 4.1.1-21)], maxunicode: 1114111
Debian 5.0 "lenny" i386: python: 2.5.2 (r252:60911, Jan  4 2009, 17:40:26)  [GCC
4.3.2], maxunicode: 1114111
Debian 5.0 "lenny" amd64: python: 2.5.2 (r252:60911, Jan  4 2009, 21:59:32)
[GCC 4.3.2], maxunicode: 1114111
Debian 5.0 "lenny" armv5tel: python: 2.5.2 (r252:60911, Jan  5 2009, 02:00:00)
[GCC 4.3.2], maxunicode: 1114111
Debian unstable "squeeze/sid" i386: python: 2.5.4 (r254:67916, Feb 17 2009,
20:16:45)  [GCC 4.3.3], maxunicode: 1114111

Fedora 11 "leonidas" amd64: python: 2.6 (r26:66714, Jul  4 2009, 17:37:13)  [GCC
4.4.0 20090506 (Red Hat 4.4.0-4)], maxunicode: 1114111

ArchLinux: python: 2.6.2 (r262:71600, Jul 20 2009, 02:23:30)  [GCC 4.4.0
20090630 (prerelease)], maxunicode: 65535

NetBSD 4: python: 2.5.2 (r252:60911, Mar 20 2009, 14:00:07)  [GCC 4.1.2 20060628
prerelease (NetBSD nb2 20060711)], maxunicode: 65535

OpenSolaris SunOS-5.11-i86pc-i386-32bit: python: 2.4.4 (#1, Mar 10 2009,
09:35:36) [C], maxunicode: 65535
Nexenta NCP1 SunOS-5.11-i86pc-i386-32bit: python: 2.4.3 (#2, May  3 2006,
19:12:42)  [GCC 4.0.3 (GNU_OpenSolaris 4.0.3-1nexenta

Re: [Python-Dev] please consider changing --enable-unicode default to ucs4

2009-09-20 Thread Benjamin Peterson
2009/9/20 Zooko O'Whielacronx:
> Dear Pythonistas:
>
> This issue causes serious problems.  Users occasionally get binaries built for a
> compatible Linux and Python version but with a different UCS2-vs-UCS4 setting,
> and those users get mysterious memory corruption errors which are hard to
> diagnose.  It is possible that these situations also open up security
> vulnerabilities.  A couple such instances are documented on
> http://bugs.python.org/setuptools/issue78, but you can find more by googling.
> I would like to get this problem fixed!

You may want to have a look at the archives of the last time this was
extensively discussed:
http://mail.python.org/pipermail/python-dev/2008-July/080886.html


-- 
Regards,
Benjamin


Re: [Python-Dev] please consider changing --enable-unicode default to ucs4

2009-09-20 Thread Zooko O'Whielacronx
I'm sorry, I should have mentioned that I did read those archives
before I posted my letter.  That discussion was all about whether UCS2
or UCS4 is better.  I consider that question to be mostly irrelevant
to this issue, which is about compatibility for people who don't
choose to configure that setting themselves.  Platforms or people who
prefer UCS2 will continue to use it as appropriate.  UCS4 is clearly
good enough for the vast majority of Linux users, and having fewer
mysterious segfaults and potential security vulnerabilities would be
an important improvement to the user experience of Python on Linux.

I should mention that the reason I'm spending time on this right now
is that it is currently blocking me from being able to distribute
binaries of Python packages which will work for all of my Linux users.

Regards,

Zooko


Re: [Python-Dev] please consider changing --enable-unicode default to ucs4

2009-09-20 Thread Antoine Pitrou
Zooko O'Whielacronx writes:
> 
> Users occasionally get binaries built for a
> compatible Linux and Python version but with a different UCS2-vs-UCS4 setting,
> and those users get mysterious memory corruption errors which are hard to
> diagnose.

What "binaries" are you talking about?
AFAIK, C extensions should fail loading when they have the wrong UCS2/4 setting.
That's the reason we have all those #define's in unicodeobject.h: the actual
function names end up being different and, therefore, are not found when 
linking.

> In order to help address this issue I sampled what UCS size is used by python
> executables in the wild.

For information, all Mandriva versions I've used until now have had their
Pythons built with UCS2 (maxunicode == 65535).

Regards

Antoine.




Re: [Python-Dev] Fuzziness in io module specs - PEP update proposition

2009-09-20 Thread Daniel Stutzbach
On Sun, Sep 20, 2009 at 4:48 AM, Pascal Chambon wrote:

> *RawIOBase*.readinto(b: bytes) -> int
>

"bytes" are immutable.  The signature is:

*RawIOBase*.readinto(b: bytearray) -> int

Your efforts in working on clarifying these important corner cases are
appreciated. :-)
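With the corrected signature, the caller supplies a mutable buffer and gets back a count. A sketch using io.BytesIO, which is not a RawIOBase subclass but implements readinto() with a compatible contract; the helper names are hypothetical:

```python
import io


def read_into_new_buffer(raw, size):
    """Read up to `size` bytes via readinto(); return bytes, or None if no data yet."""
    buf = bytearray(size)          # must be mutable: readinto() writes into it
    n = raw.readinto(buf)
    if n is None:                  # non-blocking stream with nothing available
        return None
    return bytes(buf[:n])          # len(buf) itself never changes


def bytes_target_rejected():
    """Check that an immutable bytes object is refused as a readinto() target."""
    try:
        io.BytesIO(b"abc").readinto(bytes(3))
    except TypeError:
        return True
    return False
```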

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC 


Re: [Python-Dev] please consider changing --enable-unicode default to ucs4

2009-09-20 Thread Zooko O'Whielacronx
On Sun, Sep 20, 2009 at 8:27 AM, Antoine Pitrou wrote:
>
> What "binaries" are you talking about?

I mean extension modules with native code, which means .so shared
library files on unix.

> AFAIK, C extensions should fail loading when they have the wrong UCS2/4 
> setting.

That would be an improvement!  Unfortunately we instead get mysterious
misbehavior of the module, e.g.:

http://bugs.python.org/setuptools/msg309
http://allmydata.org/trac/tahoe/ticket/704#comment:5

> For information, all Mandriva versions I've used until now have had their
> Pythons built with UCS2 (maxunicode == 65535).

Thank you for the data point.  This means that binary extension
modules built on Mandriva can't be ported to Ubuntu or vice versa.
However, is this an argument for or against changing the default
setting to UCS4?  Changing the default setting wouldn't interfere with
Mandriva's decision, right?

Regards,

Zooko


Re: [Python-Dev] Fuzziness in io module specs - PEP update proposition

2009-09-20 Thread Pascal Chambon

Daniel Stutzbach wrote:
On Sun, Sep 20, 2009 at 4:48 AM, Pascal Chambon wrote:


*RawIOBase*.readinto(b: bytes) -> int


"bytes" are immutable.  The signature is:

*RawIOBase*.readinto(b: bytearray) -> int

Your efforts in working on clarifying these important corner cases are
appreciated. :-)



You're welcome B-)

Indeed, my copy/paste of the current PEP was an epic fail: as you'll all
have recognized, readinto actually deals with bytearrays, contrary to
what the current PEP says

-> http://www.python.org/dev/peps/pep-3116/

RawIOBase.read(int) indeed takes a positive-or-zero integer (I am used
to reading "positive" that way, as opposed to "strictly positive")


Does MRAB's suggestion of providing beginning and end offsets for the
bytearray meet people's expectations? Personally, I feel readinto is a
very low-level method, mostly used by read() to get a result from
low-level native functions (fread, readfile), and read() always provides
a buffer of the proper size... are there cases in which these two
additional arguments would provide some real gain?



Concerning the "backward compatibility" problem, I agree we should not
break specifications, but breaking implementation details is another
thing for me. It's a golden rule in programmers' world: thou shalt
NEVER rely on implementation details. Programs that count on these (e.g.
assuming that listdir() will always return "." and ".." as its first
items... until it doesn't anymore) run into huge problems when changing
platform or API version. When programming with the current
truncate(), I would always have moved the file pointer explicitly after
truncating the file, simply because I had no idea what might happen to it
(nothing was documented about this at the moment, and looking at the
sources is really not a sustainable practice).
So well, it's a pity if some early 3.1 users relied on it, but if we
stick to the current semantics we still have a real coherency problem:
seek() is not limited in range, and some experienced programmers might
be trapped by this unconventional truncate() if they rely on POSIX or
previous Python versions... I really dislike the idea that truncate()
might move my file offset even when there is no reason for it.


Regards,
Pascal






Re: [Python-Dev] please consider changing --enable-unicode default to ucs4

2009-09-20 Thread Zooko O'Whielacronx
On Sun, Sep 20, 2009 at 8:27 AM, Antoine Pitrou wrote:
> For information, all Mandriva versions I've used until now have had their
> Pythons built with UCS2 (maxunicode == 65535).

By the way, I was investigating this, and discovered an issue on the
Mandriva tracker which suggests that they intend to switch to UCS4 in
the next release in order to avoid compatibility problems like these.
(Not because they think that UCS4 is better than UCS2.)

https://qa.mandriva.com/show_bug.cgi?id=48570

Regards,

Zooko


Re: [Python-Dev] please consider changing --enable-unicode default to ucs4

2009-09-20 Thread Antoine Pitrou
On Sun, 20 Sep 2009 10:17:45 -0600, Zooko O'Whielacronx wrote:
> 
>> AFAIK, C extensions should fail loading when they have the wrong UCS2/4
>> setting.
> 
> That would be an improvement!  Unfortunately we instead get mysterious
> misbehavior of the module, e.g.:
> 
> http://bugs.python.org/setuptools/msg309
> http://allmydata.org/trac/tahoe/ticket/704#comment:5

The bug reports in themselves aren't very explicit, and they don't seem 
to be related to any native extension. So I'm not sure why you're talking 
about "mysterious memory corruption errors" in your original mail, 
because there doesn't seem to be such a thing happening at all.

Please note that there's a bug related to a non-portable peephole
optimization of some unicode constants; it may explain the
aforementioned problems (or perhaps not):
http://bugs.python.org/issue5057

I expect the solution to this bug to be rather easy (just disable the 
optimization, since it isn't really useful), but someone has to care 
enough to produce a patch.

>> For information, all Mandriva versions I've used until now have had
>> their Pythons built with UCS2 (maxunicode == 65535).
> 
> Thank you for the data point.  This means that binary extension modules
> built on Mandriva can't be ported to Ubuntu or vice versa.

"Ported" they can certainly be, you just have to recompile.

> However, is
> this an argument for or against changing the default setting to UCS4? 
> Changing the default setting wouldn't interfere with Mandriva's
> decision, right?

Well, let's put it this way:
- either you expect the default setting to be observed by everyone, and 
it *will* interfere with someone's current decision
- or you don't expect the default setting to be observed by everyone, and 
then there's no point in changing it because it won't stop your problems

Either way, my mentioning of Mandriva was just meant as an additional 
data point to those you already provided ;-)

Regards

Antoine.



Re: [Python-Dev] please consider changing --enable-unicode default to ucs4

2009-09-20 Thread Antoine Pitrou
On Sun, 20 Sep 2009 10:33:23 -0600, Zooko O'Whielacronx wrote:
> 
> By the way, I was investigating this, and discovered an issue on the
> Mandriva tracker which suggests that they intend to switch to UCS4 in
> the next release in order to avoid compatibility problems like these.

Trying to use a Fedora or Suse RPM under Mandriva (or the other way 
round) isn't reasonable and is certainly not supported.
I don't understand why this so-called "compatibility problem" should be 
taken seriously by anyone.

Regards

Antoine.




Re: [Python-Dev] please consider changing --enable-unicode default to ucs4

2009-09-20 Thread M.-A. Lemburg
Zooko O'Whielacronx wrote:
> On Sun, Sep 20, 2009 at 8:27 AM, Antoine Pitrou wrote:
>>
>> What "binaries" are you talking about?
> 
> I mean extension modules with native code, which means .so shared
> library files on unix.

Those will not load unless they are for the right UCS-version of
Python. The extensions will give an ImportError if they are
using any Unicode APIs - we go to great lengths in the
Unicode API to make sure that you cannot mix UCS2 and UCS4 APIs.

I'm not exactly sure what you are trying to achieve by making
UCS4 the default... if you build extensions using the system
Python version, distutils will automatically build the right
UCS-version for you.

>> AFAIK, C extensions should fail loading when they have the wrong UCS2/4 
>> setting.
> 
> That would be an improvement!  Unfortunately we instead get mysterious
> misbehavior of the module, e.g.:
> 
> http://bugs.python.org/setuptools/msg309
> http://allmydata.org/trac/tahoe/ticket/704#comment:5

Those don't appear to be related to UCS2 vs. UCS4 but rather
some problem with the UTF-8 data those users are trying to load.

The fact that setuptools completely ignores that UCS2 and UCS4
Pythons are two different builds is not really a Python Unicode
problem but a setuptools design issue, so you should probably
complain there.

>> For information, all Mandriva versions I've used until now have had their
>> Pythons built with UCS2 (maxunicode == 65535).
> 
> Thank you for the data point.  This means that binary extension
> modules built on Mandriva can't be ported to Ubuntu or vice versa.
> However, is this an argument for or against changing the default
> setting to UCS4?  Changing the default setting wouldn't interfere with
> Mandriva's decision, right?

Depends on what you mean with "ported": of course you can port a
source RPM between UCS2 and UCS4 builds. This just requires a
recompile.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Sep 20 2009)
>>> Python/Zope Consulting and Support ...http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/


::: Try our new mxODBC.Connect Python Database Interface for free ! 


   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611
   http://www.egenix.com/company/contact/


Re: [Python-Dev] Fuzziness in io module specs - PEP update proposition

2009-09-20 Thread MRAB

Pascal Chambon wrote:

Daniel Stutzbach wrote:
On Sun, Sep 20, 2009 at 4:48 AM, Pascal Chambon wrote:


*RawIOBase*.readinto(b: bytes) -> int


"bytes" are immutable.  The signature is:

*RawIOBase*.readinto(b: bytearray) -> int

Your efforts in working on clarifying these important corner cases are
appreciated. :-)



You're welcome B-)

Indeed, my copy/paste of the current PEP was an epic fail: as you'll all
have recognized, readinto actually deals with bytearrays, contrary to
what the current PEP says

-> http://www.python.org/dev/peps/pep-3116/

RawIOBase.read(int) indeed takes a positive-or-zero integer (I am used
to reading "positive" that way, as opposed to "strictly positive")


Does MRAB's suggestion of providing beginning and end offsets for the
bytearray meet people's expectations? Personally, I feel readinto is a
very low-level method, mostly used by read() to get a result from
low-level native functions (fread, readfile), and read() always provides
a buffer of the proper size... are there cases in which these two
additional arguments would provide some real gain?



It's useful if you want to fill the buffer but 'read' might return fewer
bytes than you asked for because it returns only what's available. That
might not happen for files, but it might for other forms of I/O. Other
languages, like Delphi and Java, which read into pre-existing arrays,
have or allow the extra parameters.
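For what it's worth, a memoryview slice can already express a start/end window without new parameters, since readinto() accepts any writable buffer. A sketch of filling such a window while retrying short reads; readinto_range is a hypothetical helper, not an io method:

```python
import io


def readinto_range(raw, buf, start=0, end=None):
    """Hypothetical helper: fill buf[start:end] from raw, retrying short reads.

    Returns the number of bytes stored; stops early at EOF or when a
    non-blocking stream has nothing available (readinto() returns None).
    """
    if end is None:
        end = len(buf)
    view = memoryview(buf)[start:end]   # zero-copy window onto the buffer
    total = 0
    while total < len(view):
        n = raw.readinto(view[total:])
        if not n:                       # None (no data yet) or 0 (EOF)
            break
        total += n
    return total
```

The design choice here is to keep readinto() itself minimal and let the caller carve out the window, which is roughly what the extra start/end (or count) parameters in Delphi and Java provide.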


Concerning the "backward compatibility" problem, I agree we should not
break specifications, but breaking implementation details is another
thing for me. It's a golden rule in programmers' world: thou shalt
NEVER rely on implementation details. Programs that count on these (e.g.
assuming that listdir() will always return "." and ".." as its first
items... until it doesn't anymore) run into huge problems when changing
platform or API version. When programming with the current
truncate(), I would always have moved the file pointer explicitly after
truncating the file, simply because I had no idea what might happen to it
(nothing was documented about this at the moment, and looking at the
sources is really not a sustainable practice).
So well, it's a pity if some early 3.1 users relied on it, but if we
stick to the current semantics we still have a real coherency problem:
seek() is not limited in range, and some experienced programmers might
be trapped by this unconventional truncate() if they rely on POSIX or
previous Python versions... I really dislike the idea that truncate()
might move my file offset even when there is no reason for it.



Well, if it's consistent and documented (and not totally stupid), I
can't really complain. :-)


Re: [Python-Dev] please consider changing --enable-unicode default to ucs4

2009-09-20 Thread Martin v. Löwis
Zooko O'Whielacronx wrote:
> I'm sorry, I should have mentioned that I did read those archives
> before I posted my letter.  That discussion was all about whether UCS2
> or UCS4 is better.  I consider that question to be mostly irrelevant
> to this issue, which is about compatibility for people who don't
> choose to configure that setting themselves. 

You surely must have missed the sentence

"For that reason I think it's also better that the configure script
continues to default to UTF-16 -- this will give the UTF-16 support
code the necessary exercise."

This is effectively a BDFL pronouncement. Nothing has changed the
validity of the premise of the statement, so the conclusion remains
valid, as well.

Regards,
Martin