date:20090424

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-24 Thread Simon Cross

On Wed, Apr 22, 2009 at 8:50 AM, "Martin v. Löwis"  wrote:
> For Python 3, one proposed solution is to provide two sets of APIs: a
> byte-oriented one, and a character-oriented one, where the
> character-oriented one would be limited to not being able to represent
> all data accurately. Unfortunately, for Windows, the situation would
> be exactly the opposite: the byte-oriented interface cannot represent
> all data; only the character-oriented API can. As a consequence,
> libraries and applications that want to support all user data in a
> cross-platform manner have to accept mish-mash of bytes and characters
> exactly in the way that caused endless troubles for Python 2.x.

Is the second part of this actually true? My understanding may be
flawed, but surely all Unicode data can be converted to and from bytes
using UTF-8? Obviously not all byte sequences are valid UTF-8, but
this doesn't prevent one from creating an arbitrary Unicode string
using "utf-8 bytes".decode("utf-8").  Given this, can't people who
must have access to all files / environment data just use the bytes
interface?
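
A minimal sketch of the asymmetry in question, assuming Python 3 semantics (`str` for text, `bytes` for raw data); the sample strings are made up for illustration. Ordinary text survives a round trip through UTF-8, but an arbitrary byte string need not be decodable at all:

```python
# Text -> UTF-8 bytes -> text is lossless for ordinary (non-surrogate) text.
text = "L\u00f6wis \u65e5\u672c\u8a9e"
round_trip = text.encode("utf-8").decode("utf-8")

# The reverse direction is not guaranteed: these bytes are not valid UTF-8.
raw = b"\xff\xfe-not-utf8"
try:
    raw.decode("utf-8")
    decodable = True
except UnicodeDecodeError:
    decodable = False
```

So a bytes interface can carry any text the OS hands over as UTF-8, while a strict character interface cannot carry every POSIX filename.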

Disclosure: My gut reaction is that the solution described in the PEP
is a hack, but I'm hardly a character encoding expert.  My feeling is
that the correct solution is to either standardise on the bytes
interface as the lowest common denominator, or to add a Path type (and
I guess an EnvironmentalData type) and use the new type to attempt to
hide the differences.

Schiavo
Simon
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

[Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-24 Thread Simon Cross

On Fri, Apr 24, 2009 at 12:04 PM, Glenn Linderman  wrote:
> The goal of Unicode users everywhere is to use Unicode for everything, no?
>  After all, all "real" files should have Unicode based names, and the only
> proper byte sequences that should exist are UTF-8 encoding Unicode bytes.
>  (Cheek to tongue: Get out of here!)

Humour aside :), the expectation that filenames are Unicode data
simply doesn't agree with the reality of POSIX file systems.  I think
an approach similar to that adopted by glib [1] could work -- i.e. use
the bytes API and provide some tools to assist application developers
in converting them to and from Unicode strings (these tools are then
where all the guess work about what encoding to use can live).
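
As a rough illustration of that split (not glib's actual API), the sketch below keeps the bytes name as the source of truth and derives a display string from it. The helper name `display_name` and the UTF-8 default are assumptions made for this example only:

```python
def display_name(raw: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical helper: best-effort text for showing a bytes filename."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        # Lossy fallback, for display only. Keep `raw` for real file access.
        return raw.decode(encoding, errors="replace")


clean = display_name(b"report.txt")
messy = display_name(b"caf\xe9.txt")  # Latin-1 bytes, not valid UTF-8
```

The guesswork about encodings lives in the helper, and the application never has to pretend the display string can be turned back into the original name.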

[1] 
http://library.gnome.org/devel/glib/stable/glib-Character-Set-Conversion.html

Schiavo
Simon

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-24 Thread Simon Cross

On Fri, Apr 24, 2009 at 11:22 AM, Glenn Linderman  wrote:
> Oh clearly it is a hack.  The right solution of a Path type (and friends)
> was discarded in earlier discussion, because it would impact too much
> existing code.  The use of bytes would be annoying in the context of py3,
> where things that you want to display are in str (Unicode).  So there is no
> solution that allows the use of str, and the robustness of bytes, and is
> 100% compatible with existing practice. Hence the desire is to find a hack
> that is "good enough".  At least, that is my understanding and synopsis.

What about keeping the bytes interface (utf-8 encoded Unicode on
Windows) and adding a Path type (and friends) interface that mirrors
it?

> (Sorry Simon, but it is still the same thread, anyway.)

Python discussions do seem to womble through a rather large set of
mailing lists and news groups. :)

Schiavo
Simon

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-24 Thread Paul Moore

2009/4/24 Simon Cross :
> On Fri, Apr 24, 2009 at 12:04 PM, Glenn Linderman  wrote:
>> The goal of Unicode users everywhere is to use Unicode for everything, no?
>>  After all, all "real" files should have Unicode based names, and the only
>> proper byte sequences that should exist are UTF-8 encoding Unicode bytes.
>>  (Cheek to tongue: Get out of here!)
>
> Humour aside :), the expectation that filenames are Unicode data
> simply doesn't agree with the reality of POSIX file systems.

However, it *does* agree with the reality of Windows file systems. The
fundamental problem here is that there is a strong OS disparity - for
Windows, the OS uses Unicode, for POSIX, the OS uses bytes.
Traditionally, Python has been happy to expose OS differences, and let
application code address platform portability issues. But this is such
a fundamental area, that doing so is problematic - it could easily
result in *more* code being OS-specific (in subtle,
only-affects-non-Latin-alphabet-using-users manners) rather than less.

That is why it makes sense to have *some* means of normalising things
in a way that does the best it can. The raw bytes interfaces should be
available for POSIX users writing low-level code that *must* handle
all possible nightmare scenarios[1], but Martin's proposal is designed
to handle "the majority of cases" in a platform-independent way. To
that end, a string-based interface makes sense, as frankly that's how
"normal" users think of filenames. The rest of Martin's proposal seems
to follow the same sort of practical approach.

Paul.

[1] Maybe there's a need for a Unicode interface on Windows that
doesn't do *any* encoding, even in the face of garbled Unicode - I
don't know low-level details well enough to be sure here. But the same
principle applies, that "get the raw data, regardless" is a low-level
OS-specific operation, and should not be the one used in day-to-day
programming.

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-24 Thread Glenn Linderman

On approximately 4/24/2009 12:59 AM, came the following characters from 
the keyboard of Simon Cross:

> On Wed, Apr 22, 2009 at 8:50 AM, "Martin v. Löwis"  wrote:
>> For Python 3, one proposed solution is to provide two sets of APIs: a
>> byte-oriented one, and a character-oriented one, where the
>> character-oriented one would be limited to not being able to represent
>> all data accurately. Unfortunately, for Windows, the situation would
>> be exactly the opposite: the byte-oriented interface cannot represent
>> all data; only the character-oriented API can. As a consequence,
>> libraries and applications that want to support all user data in a
>> cross-platform manner have to accept mish-mash of bytes and characters
>> exactly in the way that caused endless troubles for Python 2.x.
>
> Is the second part of this actually true? My understanding may be
> flawed, but surely all Unicode data can be converted to and from bytes
> using UTF-8? Obviously not all byte sequences are valid UTF-8, but
> this doesn't prevent one from creating an arbitrary Unicode string
> using "utf-8 bytes".decode("utf-8").  Given this, can't people who
> must have access to all files / environment data just use the bytes
> interface?
>
> Disclosure: My gut reaction is that the solution described in the PEP
> is a hack, but I'm hardly a character encoding expert.  My feeling is
> that the correct solution is to either standardise on the bytes
> interface as the lowest common denominator, or to add a Path type (and
> I guess an EnvironmentalData type) and use the new type to attempt to
> hide the differences.

Oh clearly it is a hack.  The right solution of a Path type (and 
friends) was discarded in earlier discussion, because it would impact 
too much existing code.  The use of bytes would be annoying in the 
context of py3, where things that you want to display are in str 
(Unicode).  So there is no solution that allows the use of str, and the 
robustness of bytes, and is 100% compatible with existing practice. 
Hence the desire is to find a hack that is "good enough".  At least, 
that is my understanding and synopsis.

I never saw MvL's original message with the PEP delivered to my mailbox, 
but some of the replies came there, so I found and extensively replied 
to it using the Google group / usenet.  My reply never showed up here 
and no one has commented on it either... Should I repost via the mailing 
list?  I think so... I'll just paste it in here, with one tweak I 
noticed after I sent it fixed... (Sorry Simon, but it is still the same 
thread, anyway.) (Sorry to others, if my original reply was seen, and 
just wasn't worth replying to.)

On Apr 21, 11:50 pm, "Martin v. Löwis"  wrote:

> I'm proposing the following PEP for inclusion into Python 3.1.
> Please comment.

Basically the scheme doesn't work.  Aside from that, it is very close.

There are tons of encoding schemes that could work... they don't have
to include half-surrogates or bytes.  What they have to do, is make
sure that they are uniformly applied to all appropriate strings.

The problem with this, and other preceding schemes that have been
discussed here, is that there is no means of ascertaining whether a
particular file name str was obtained from a str API, or was funny-
decoded from a bytes API... and thus, there is no means of reliably
ascertaining whether a particular filename str should be passed to a
str API, or funny-encoded back to bytes.
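
The ambiguity can be sketched with the error handler that later shipped in Python 3.1 as `surrogateescape`. Note the assumption: that handler escapes undecodable bytes to lone surrogates U+DC80..U+DCFF, not to the U+F01xx characters mentioned in this discussion, so this illustrates the general concern rather than the exact draft under review. The sample filename is made up:

```python
raw = b"caf\xe9.txt"  # Latin-1 encoded name; 0xE9 is not valid UTF-8 here

# Decoding: the undecodable byte 0xE9 becomes the lone surrogate U+DCE9.
escaped = raw.decode("utf-8", "surrogateescape")

# A str with the same code points, as a character API might hand it over.
from_str_api = "caf\udce9.txt"

# Once both are plain str objects, nothing records where each came from.
same = escaped == from_str_api

# Encoding with the same handler restores the original bytes.
round_tripped = escaped.encode("utf-8", "surrogateescape")
```

The round trip works, but only because the caller already knows the string should be funny-encoded; the string itself carries no such marker.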

The assumption in the 2nd Discussion paragraph may hold for a large
percentage of cases, maybe even including some number of 9s, but it is
not guaranteed, and cannot be enforced, therefore there are cases that
could fail.  Whether those failure cases are a concern or not is an
open question.  Picking a character (I don't find U+F01xx in the
Unicode standard, so I don't know what it is) that is obscure, and
unlikely to be used in "real" file names, might help the heuristic
nature of the encoding and decoding avoid most conflicts, but provides
no guarantee that data puns will not occur in practice.  Today's
obscure character is tomorrow's commonly used character, perhaps.
Someone not on this list may be happily using that character for their
own nefarious, incompatible purpose.

As I realized in the email-sig, in talking about decoding corrupted
headers, there is only one way to guarantee this... to encode _all_
character sequences, from _all_ interfaces.  Basically it requires
reserving an escape character (I'll use ? in these examples -- yes, an
ASCII question mark -- happens to be illegal in Windows filenames so
all the better on that platform, but the specific character doesn't
matter... avoiding / \ and . is probably good, though).

So the rules would be, when obtaining a file name from the bytes OS
interface, that doesn't properly decode according to UTF-8, decode it
by placing a ? at the beginning, then for each decodable UTF-8
sequence, add a Unicode character -- unless the character is ?, in
which case you add two ??, and for each non-decodable byte sequenc

[Python-Dev] version for blender Vista

2009-04-24 Thread Yuma Scott



Can you tell me which installer of Python I need to work with
Blender and Windows Vista Home Premium?

Thanks!
Yuma Scott

Re: [Python-Dev] version for blender Vista

2009-04-24 Thread Senthil Kumaran

From:

http://mail.python.org/mailman/listinfo/python-dev

About Python-Dev

***Do not post general Python questions to this list. For help with
Python please see the Python help page.***

On this list the key Python developers discuss the future of the
language and its implementation. Topics include Python design issues,
release mechanics, and maintenance of existing releases.

On Fri, Apr 24, 2009 at 7:04 PM, Yuma Scott  wrote:
>
> Can you tell me which installer of Python I need to work with
> Blender and Windows Vista Home Premium?
> Thanks!
> Yuma Scott

-- 
-- 
Senthil

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-24 Thread James Y Knight


On Apr 24, 2009, at 8:00 AM, Paul Moore wrote:

> However, it *does* agree with the reality of Windows file systems. The
> fundamental problem here is that there is a strong OS disparity - for
> Windows, the OS uses Unicode, for POSIX, the OS uses bytes.


It's unfortunately the case that this isn't *precisely* true. Windows
uses arbitrary 16-bit sequences, just as unix uses arbitrary 8-bit
sequences. Neither one is required by the operating system to be a
proper unicode encoding. The main difference is that there is already
a widely accepted way to decode an improperly-encoded 16-bit sequence
with the utf-16 codec: simply leave the lone surrogates in place.
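
A small sketch of that behaviour, assuming a Python version (3.4 or later) where the `surrogatepass` error handler is available for the UTF-16 codecs; the byte values are chosen only to produce an unpaired surrogate:

```python
# Two 16-bit units, little-endian: an unpaired high surrogate 0xD800, then 'a'.
units = b"\x00\xd8" + "a".encode("utf-16-le")

# A strict UTF-16 decoder rejects the unpaired surrogate.
try:
    units.decode("utf-16-le")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Leaving the surrogate in place keeps the data and allows a round trip.
lenient = units.decode("utf-16-le", "surrogatepass")
restored = lenient.encode("utf-16-le", "surrogatepass")
```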


James

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-24 Thread Aahz

On Fri, Apr 24, 2009, Paul Moore wrote:
> 2009/4/24 Simon Cross :
>> 
>> Humour aside :), the expectation that filenames are Unicode data
>> simply doesn't agree with the reality of POSIX file systems.
> 
> However, it *does* agree with the reality of Windows file systems. The
> fundamental problem here is that there is a strong OS disparity - for
> Windows, the OS uses Unicode, for POSIX, the OS uses bytes.
> Traditionally, Python has been happy to expose OS differences, and let
> application code address platform portability issues. But this is such
> a fundamental area, that doing so is problematic - it could easily
> result in *more* code being OS-specific (in subtle,
> only-affects-non-Latin-alphabet-using-users manners) rather than less.

The part that I haven't seen clearly addressed so far is what happens
when disks get mounted across OSes (e.g. NFS).

While I agree that there should be a layer on top that can handle "most"
situations, it also seems clear that the raw layer needs to be readily
accessible.
-- 
Aahz ([email protected])   <*> http://www.pythoncraft.com/

"If you think it's expensive to hire a professional to do the job, wait
until you hire an amateur."  --Red Adair

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-24 Thread Antoine Pitrou

Aahz writes:
> 
> The part that I haven't seen clearly addressed so far is what happens
> when disks get mounted across OSes (e.g. NFS).

Unless there's some kind of native NFS API for file access, it is hopelessly out
of scope for Python. We use whatever the C library exports to us, and don't have
any control over filesystem details.


[Python-Dev] Dates in python-dev

2009-04-24 Thread MRAB


Hi,

I've recently subscribed to this list and received my first "Summary of
Python tracker Issues". What I find annoying are the dates, for example:

ACTIVITY SUMMARY (04/17/09 - 04/24/09)

3 x double-digits (have we learned nothing from Y2K? :-)) with the
_middle_ ones changing fastest!

I know it's the US standard, but Python is global. Could we have an
'international' style instead, say, year-month-day:

ACTIVITY SUMMARY (2009-04-17 - 2009-04-24)
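
For illustration only, the conversion being asked for is a one-liner with the standard `datetime` module. The helper name `to_iso` is hypothetical, and this is not the tracker's actual report code:

```python
from datetime import datetime


def to_iso(us_date: str) -> str:
    """Convert a US-style MM/DD/YY date string to ISO 8601 (YYYY-MM-DD)."""
    return datetime.strptime(us_date, "%m/%d/%y").date().isoformat()


start = to_iso("04/17/09")
end = to_iso("04/24/09")
```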

Thank you for your attention, etc.

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-24 Thread Paul Moore

2009/4/24 Antoine Pitrou :
> Aahz writes:
>>
>> The part that I haven't seen clearly addressed so far is what happens
>> when disks get mounted across OSes (e.g. NFS).
>
> Unless there's some kind of native NFS API for file access, it is hopelessly out
> of scope for Python. We use whatever the C library exports to us, and don't have
> any control over filesystem details.

For "raw" level stuff (bytes on Unix, Unicode-nearly (:-)) on Windows)
that's right. Resist the temptation to guess and all that.

For the level Martin is (as far as I can tell) aiming at [1], we need
some defined rules on how to behave (relatively) sanely. Windows is
fairly easy - "nearly-Unicode" to Unicode isn't too bad. But on Unix,
you're dealing with bytes-to-Unicode in the absence of a clearly
stated encoding - which is a known can of worms...

In my view:

The pros for Martin's proposal are a uniform cross-platform interface,
and a user-friendly API for the common case.
The cons are subtle and complex corner cases, and lack of agreement on
the validity of the proposed encoding in those cases.

The fact that the bytes APIs won't go away probably mitigates the cons
to a large extent (again, in my view...)

Paul.

[1] Actually, all the PEP says is "With this PEP, a uniform treatment
of these data as characters becomes possible." An argument as to why
this is a good thing would be a useful addition to the PEP. At the
moment it's more or less treated as self-evident - which I agree with,
but which clearly the Unix people here are not as certain of.

[Python-Dev] Summary of Python tracker Issues

2009-04-24 Thread Python tracker


ACTIVITY SUMMARY (04/17/09 - 04/24/09)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue 
number.  Do NOT respond to this message.


 2227 open (+32) / 15427 closed (+17) / 17654 total (+49)

Open issues with patches:   865

Average duration of open issues: 641 days.
Median duration of open issues: 395 days.

Open Issues Breakdown
   open  2175 (+31)
pending    52 ( +1)

Issues Created Or Reopened (51)
___

Builtin round function is sometimes inaccurate for floats  04/18/09
CLOSED http://bugs.python.org/issue1869  reopened  marketdickinson
       patch

logging to file + encoding  04/20/09
CLOSED http://bugs.python.org/issue5170  reopened  shamilbi

IDLE cannot find windows chm file  04/17/09
       http://bugs.python.org/issue5783  created  rhettinger
       patch

raw deflate format and zlib module  04/17/09
       http://bugs.python.org/issue5784  created  phr

Condition.wait() does not respect its timeout  04/18/09
CLOSED http://bugs.python.org/issue5785  created  Kjir

len(reversed([1,2,3])) does not work anymore in 2.6.2  04/19/09
       http://bugs.python.org/issue5786  reopened  rhettinger

object.__getattribute__(super, '__bases__') crashes the interpre  04/19/09
CLOSED http://bugs.python.org/issue5787  reopened  alexer

datetime.timedelta is inconvenient to use...  04/18/09
       http://bugs.python.org/issue5788  created  bquinlan
       patch

powerset recipe listed twice in itertools docs  04/19/09
CLOSED http://bugs.python.org/issue5789  created  stevenjd
       easy

itertools.izip python code has a typo  04/19/09
CLOSED http://bugs.python.org/issue5790  created  stevenjd

title information of unicodedata is wrong in some cases  04/19/09
CLOSED http://bugs.python.org/issue5791  created  cfbolz

Enable short float repr() on Solaris/x86  04/19/09
       http://bugs.python.org/issue5792  created  marketdickinson
       easy

Rationalize isdigit / isalpha / tolower / ... uses throughout Py  04/19/09
       http://bugs.python.org/issue5793  created  marketdickinson
       easy

pickle/cPickle of recursive tuples create pickles that cPickle c  04/19/09
       http://bugs.python.org/issue5794  created  cwitty

test_distutils failure on the ppc Debian buildbot  04/19/09
CLOSED http://bugs.python.org/issue5795  created  pitrou

test_posix, test_pty crash under Windows  04/19/09
CLOSED http://bugs.python.org/issue5796  created  pitrou
       patch

there is en exception om Create User page  04/20/09
       http://bugs.python.org/issue5797  created  nabeel

test_asynchat fails on Mac OSX  04/20/09
       http://bugs.python.org/issue5798  created  cartman

Change ntpath functions to implicitly support UNC paths  04/20/09
       http://bugs.python.org/issue5799  created  larry
       patch

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-24 Thread Stephen J. Turnbull

James Y Knight writes:

 > It's unfortunately the case that this isn't *precisely* true. Windows  
 > uses arbitrary 16-bit sequences, just as unix uses arbitrary 8-bit  
 > sequences.

Including U+FFFE and U+FFFF "not a character nowhere nohow"?  Just
when I was thinking Microsoft would actually nail one


Re: [Python-Dev] Dates in python-dev

2009-04-24 Thread Arfrever Frehtes Taifersar Arahesis

2009-04-24 18:29:29 MRAB napisał(a):
> Hi,
> 
> I've recently subscribed to this list and received my first "Summary of
> Python tracker Issues". What I find annoying are the dates, for example:
> 
>  ACTIVITY SUMMARY (04/17/09 - 04/24/09)
> 
> 3 x double-digits (have we learned nothing from Y2K? :-)) with the
> _middle_ ones changing fastest!
> 
> I know it's the US standard, but Python is global. Could we have an
> 'international' style instead, say, year-month-day:
> 
>  ACTIVITY SUMMARY (2009-04-17 - 2009-04-24)

+1.
ISO 8601 should be mandatory.

-- 
Arfrever Frehtes Taifersar Arahesis



Re: [Python-Dev] Dates in python-dev

2009-04-24 Thread Oleg Broytmann

On Fri, Apr 24, 2009 at 05:29:29PM +0100, MRAB wrote:
> I've recently subscribed to this list and received my first "Summary of
> Python tracker Issues". What I find annoying are the dates, for example:
>
> ACTIVITY SUMMARY (04/17/09 - 04/24/09)
>
> 3 x double-digits (have we learned nothing from Y2K? :-)) with the
> _middle_ ones changing fastest!
>
> I know it's the US standard, but Python is global. Could we have an
> 'international' style instead, say, year-month-day:
>
> ACTIVITY SUMMARY (2009-04-17 - 2009-04-24)

   +1000 from me!

Oleg.
-- 
 Oleg Broytmann    http://phd.pp.ru/    [email protected]
   Programmers don't die, they just GOSUB without RETURN.

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-24 Thread Stephen J. Turnbull

Paul Moore writes:

 > The pros for Martin's proposal are a uniform cross-platform interface,
 > and a user-friendly API for the common case.

A more accurate phrasing would be "... a user-friendly API for those
who feel very lucky today."  Which is the common case, of course, but
spins a little differently.

 > [1] Actually, all the PEP says is "With this PEP, a uniform
 > treatment of these data as characters becomes possible." An
 > argument as to why this is a good thing would be a useful addition
 > to the PEP. At the moment it's more or less treated as self-evident
 > - which I agree with, but which clearly the Unix people here are
 > not as certain of.

Well, the problem is that both parts are false.  If you didn't start
with a valid string in a known encoding, you shouldn't treat it as
characters because it's not.  Hand it to a careful API, and you'll get
an Exception raised in your face.  And that's precisely why it's not
obviously a good thing.  Careful clients will have to treat it as
"transcoded bytes", and so the people who develop those clients get no
benefit.  OTOH, at least some of those who feel lucky and use it
naively are going to turn out to be wrong.
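
One way to picture a "careful API" (an interpretation, not necessarily the case meant here) is any consumer that insists on well-formed text, such as a strict UTF-8 encoder. The sketch assumes the `surrogateescape` handler as it later shipped in Python 3.1, and a made-up filename:

```python
# A filename containing a byte that is not valid UTF-8, decoded leniently.
name = b"caf\xe9.txt".decode("utf-8", "surrogateescape")

# A strict consumer refuses the escaped string: lone surrogates are not
# encodable as UTF-8 without the matching error handler.
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```

Code that writes such a name to a log, a database, or a network protocol in strict mode hits this error only when an unusual filename turns up, which is the "feel lucky" aspect.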

That said, I'm +0 on the PEP as is.  It's a little bit better than the
current situation in that developers who would otherwise just punt on
dealing with the other world (ie, Windows for Unix hackers, and Unix
for Windows coders) will have a unified interface so it'll maybe work
automagically (when you're lucky :-) in that other world, too.  And if
somebody comes up with an idea of true genius for handling the
underlying problem, or even just a slight practical improvement, then
everybody who uses this API can benefit simply by upgrading Python.

Re: [Python-Dev] Dates in python-dev

2009-04-24 Thread Stephen J. Turnbull

Followups directed to Tracker-Discuss, where the people who can do
something about it are hanging out.  (They're here too, but I'm pretty
sure they'd rather discuss this issue on that list.)

Arfrever Frehtes Taifersar Arahesis writes:
 > 2009-04-24 18:29:29 MRAB napisał(a):
 > > Hi,
 > > 
 > > I've recently subscribed to this list and received my first "Summary of
 > > Python tracker Issues". What I find annoying are the dates, for example:
 > > 
 > >  ACTIVITY SUMMARY (04/17/09 - 04/24/09)
 > > 
 > > 3 x double-digits (have we learned nothing from Y2K? :-)) with the
 > > _middle_ ones changing fastest!
 > > 
 > > I know it's the US standard, but Python is global. Could we have an
 > > 'international' style instead, say, year-month-day:
 > > 
 > >  ACTIVITY SUMMARY (2009-04-17 - 2009-04-24)
 > 
 > +1.
 > ISO 8601 should be mandatory.

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-24 Thread Antoine Pitrou

Stephen J. Turnbull writes:
> 
> Well, the problem is that both parts are false.  If you didn't start
> with a valid string in a known encoding, you shouldn't treat it as
> characters because it's not.  Hand it to a careful API, and you'll get
> an Exception raised in your face.

Which "careful API" are you talking about?

> OTOH, at least some of those who feel lucky and use it
> naively are going to turn out to be wrong.

Why will they turn out to be wrong?




[Python-Dev] PyEval_Call* convenience functions

2009-04-24 Thread Tim Lesher

Is there a reason that the PyEval_CallFunction() and
PyEval_CallMethod() convenience functions remain undocumented? (i.e.,
would a doc-and-test patch to correct this be rejected?)

I didn't see any mention of this coming up in python-dev before.

Also, despite its name, PyEval_CallMethod() is quite useful for
calling module-level functions or classes (given that it's just a
PyObject_GetAttrString plus the implementation of
PyEval_CallFunction).  Is there any reason (beyond its undocumented
status) to believe this use case would ever be deprecated?

Thanks.

-- 
Tim Lesher 

Re: [Python-Dev] [Tracker-discuss] Dates in python-dev

2009-04-24 Thread Daniel Diniz

http://psf.upfronthosting.co.za/roundup/meta/issue274

[Python-Dev] Tuples and underorderable types

2009-04-24 Thread Raymond Hettinger


Does anyone have any ideas about what to do with issue 5830 and handling the 
problem in a general way (not just for sched)?

The basic problem is that decorate/compare/undecorate patterns no longer
work when the primary sort keys are equal and the secondary keys are
unorderable (which is now the case for many callables).


   >>> tasks = [(10, lambda: 0), (20, lambda: 1), (10, lambda: 2)]
   >>> tasks.sort()
   Traceback (most recent call last):
   ...
   TypeError: unorderable types: function() < function()

Would it make sense to provide a default ordering whenever the types are the 
same?

   def object.__lt__(self, other):
       if type(self) == type(other):
           return id(self) < id(other)
       raise TypeError


Raymond



Re: [Python-Dev] Tuples and underorderable types

2009-04-24 Thread Antoine Pitrou

Raymond Hettinger writes:
> 
> Would it make sense to provide a default ordering whenever the types are
> the same?

This doesn't work when they are not the same :-)

Instead, you could make the decorating a bit more sophisticated:

  decorated = [(key, id(value), value) for key, value in blah(values)]

or even:

  decorated = [(key, n, value) for n, (key, value) in enumerate(blah(values))]
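
A runnable sketch of the index-decorated variant, assuming Python 3 and using the `tasks` list from earlier in the thread. Since `enumerate` yields `(index, item)` pairs, each item is unpacked as a nested tuple:

```python
tasks = [(10, lambda: 0), (20, lambda: 1), (10, lambda: 2)]

# Decorate: the unique index n breaks ties, so the functions are never compared.
decorated = [(prio, n, func) for n, (prio, func) in enumerate(tasks)]
decorated.sort()

# Undecorate and call in sorted order.
ordered = [func for prio, n, func in decorated]
results = [func() for func in ordered]
```

Because the index is unique and increasing, equal priorities keep their original relative order.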




Re: [Python-Dev] Tuples and underorderable types

2009-04-24 Thread Raymond Hettinger




>> Would it make sense to provide a default ordering whenever the types are
>> the same?
>
> This doesn't work when they are not the same :-)


_ ~
@ @
\_/


> Instead, you could make the decorating a bit more sophisticated:
>
>  decorated = [(key, id(value), value) for key, value in blah(values)]
>
> or even:
>
>  decorated = [(key, n, value) for n, (key, value) in enumerate(blah(values))]


I already do something along those lines in heapq.nsmallest() 
and nlargest() to preserve sort stability. 
The real issue isn't how to fix one particular module.

The problem is that a basic python pattern is now broken
in a way that may not readily surface during testing.

I'm wondering if there is something we can do to mitigate
the issue in a general way.  It bites that the venerable technique
of tuple sorting has lost some of its mojo.  This may be
an unintended consequence of eliminating default comparisons.


Raymond

Re: [Python-Dev] Tuples and underorderable types

2009-04-24 Thread Scott Dial

Raymond Hettinger wrote:
> Would it make sense to provide a default ordering whenever the types are
> the same?
> 
>    def object.__lt__(self, other):
>        if type(self) == type(other):
>            return id(self) < id(other)
>        raise TypeError

No. This only makes it more difficult for someone wanting to behave
smartly with incomparable types. I can easily imagine someone wanting
incomparable objects to be treated as equal wrt. sorting. I am thinking
especially with respect to keeping the sort stable. I think many
developers would be surprised to find that

 >>> tasks = [(10, lambda: 0), (20, lambda: 1), (10, lambda: 2)]
 >>> tasks.sort()
 >>> assert tasks[0][1]() == 0

is not guaranteed.

Moreover, I fail to see this as a bug in general if you accept that
not all objects can be totally ordered. We shouldn't be patching the
object base class because of legacy code that relied on sorting
tuples; this code should be updated to use a key function.

-Scott

-- 
Scott Dial
[email protected]
[email protected]

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-24 Thread Stephen J. Turnbull

Antoine Pitrou writes:
 > Stephen J. Turnbull writes:
 > > 
 > > Well, the problem is that both parts are false.  If you didn't start
 > > with a valid string in a known encoding, you shouldn't treat it as
 > > characters because it's not.  Hand it to a careful API, and you'll get
 > > an Exception raised in your face.
 > 
 > Which "careful API" are you talking about?
 >
 > > OTOH, at least some of those who feel lucky and use it
 > > naively are going to turn out to be wrong.
 > 
 > Why will they turn out to be wrong?

To quote the PEP:

"""
While providing a uniform API to non-decodable bytes, this interface
has the limitation that chosen representation only "works" if the data
get converted back to bytes with the python-escape error handler
also. Encoding the data with the locale's encoding and the (default)
strict error handler will raise an exception, encoding them with UTF-8
will produce non-sensical data.

For most applications, we assume that they eventually pass data
received from a system interface back into the same system
interfaces.
"""

But you can't know that.  These are now "just strings", which could
end up in pickles and other persistent objects, be passed across
network interfaces (remote copy, for example), etc, etc, and there is
no way to guarantee that the recipient will understand the rules,
unless the application encapsulates them in some kind of
representation that says "I look like a Unicode but I'm really just
encoded bytes."  But the whole point is to turn them into plain old
strings so people *don't have to bother* keeping track.
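
A small sketch of the limitation quoted from the PEP, written against the error handler as it eventually shipped in Python 3.1 under the name "surrogateescape" (the draft discussed here calls it "python-escape"). The file name bytes are invented for illustration. The escaped string only round-trips when the same handler is used on the way out; a strict encode fails.

```python
# Bytes that are not valid UTF-8 (0xE9 is Latin-1 for e-acute).
raw = b"caf\xe9.txt"

# Decoding with the escape handler smuggles the bad byte through as a
# lone surrogate instead of raising.
name = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes.
round_trip = name.encode("utf-8", "surrogateescape")

# A recipient that does not know the rules and encodes strictly gets
# an exception instead.
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```

This is the situation described above: the value looks like an ordinary str, but anything that receives it through a pickle or a network interface has to know about the handler to get the bytes back.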

As I already said, this is no worse than the current situation, but it
gives the impression that Python has a standard "solution".  (Yes, I
know Martin doesn't claim it's a solution to any of those problems.
The point is user perception.)

I have to wonder whether having a standard way of not solving any
problems is better than having no standard way of not solving any
problems.  It may be, and it probably can't hurt, which is why I'm +0.


Re: [Python-Dev] Tuples and underorderable types

2009-04-24 Thread Martin v. Löwis

> I'm wondering if there is something we can do to mitigate
> the issue in a general way.  It bites that the venerable technique
> of tuple sorting has lost some of its mojo.  This may be
> an unintended consequence of eliminating default comparisons.

I would discourage use of the decorate/sort/undecorate pattern,
and encourage use of the key= argument. Or, if you really need
to decorate into a tuple, still pass a key= argument.
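
A minimal sketch of the key= suggestion, reusing the task list from earlier in the thread. Sorting the tuples directly fails in 3.x once two priorities tie; sorting on the first element alone never compares the callables, and the sort stays stable.

```python
tasks = [(10, lambda: 0), (20, lambda: 1), (10, lambda: 2)]

# Plain tuple comparison falls through to the lambdas when the
# priorities are equal, and functions are unorderable in 3.x.
try:
    sorted(tasks)
    plain_sort_failed = False
except TypeError:
    plain_sort_failed = True

# With a key function only the priorities are compared; entries with
# equal priority keep their original relative order.
tasks.sort(key=lambda task: task[0])
results = [action() for _, action in tasks]
```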

Regards,
Martin

Re: [Python-Dev] Tuples and underorderable types

2009-04-24 Thread Daniel Diniz

Raymond Hettinger wrote:
> The problem is that a basic python pattern is now broken
> in a way that may not readily surface during testing.
>
> I'm wondering if there is something we can do to mitigate
> the issue in a general way.  It bites that the venerable technique
> of tuple sorting has lost some of its mojo.  This may be
> an unintended consequence of eliminating default comparisons.

There could be a high performance, non-lame version of the mapping
pattern below available in the stdlib (or at least in the docs):

keymap = {type(lambda: 1) : id}

def decorate_helper(tup):
    return tuple(keymap[type(i)](i) if type(i) in keymap else i for i in tup)

tasks = [(10, lambda: 0), (20, lambda: 1), (10, lambda: 2)]
tasks.sort(key=decorate_helper)

This works when comparing different types too, but then some care must
be taken to avoid bad surprises:

keymap[type(1j)] = abs
imaginary_tasks = [(10j, lambda: 0), (20, lambda: 1), (10+1j, lambda: 2)]
imaginary_tasks.sort(key=decorate_helper) # not so bad if intended

mixed_tasks = [(lambda: 0,), (0.0,), (2**32,)]
mixed_tasks.sort(key=decorate_helper) # oops, not the same order as in 2.x

Regards,
Daniel

Re: [Python-Dev] PyEval_Call* convenience functions

2009-04-24 Thread Georg Brandl

Tim Lesher wrote:
> Is there a reason that the PyEval_CallFunction() and
> PyEval_CallMethod() convenience functions remain undocumented? (i.e.,
> would a doc-and-test patch to correct this be rejected?)
> 
> I didn't see any mention of this coming up in python-dev before.
> 
> Also, despite its name, PyEval_CallMethod() is quite useful for
> calling module-level functions or classes (given that it's just a
> PyObject_GetAttrString plus the implementation of
> PyEval_CallFunction).  Is there any reason (beyond its undocumented
> status) to believe this use case would ever be deprecated?

FWIW, there's also PyObject_CallMethod(); all PyObject_Call* variants are
documented, but none of the PyEval_Call* functions are.  I actually don't
know why we have two sets of these, with partially conflicting definitions;
perhaps someone else can shed some light?

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


Re: [Python-Dev] Tuples and underorderable types

2009-04-24 Thread Aahz

On Fri, Apr 24, 2009, Raymond Hettinger wrote:
>
> I'm wondering if there is something we can do to mitigate the issue in
> a general way.  It bites that the venerable technique of tuple sorting
> has lost some of its mojo.  This may be an unintended consequence of
> eliminating default comparisons.

My understanding was that this was entirely an *intended* consequence of
eliminating default comparisons.  Not so much in the sense that it was
desired by itself, but that the whole discussion of whether to keep
moving forward in stripping out default comparisons explicitly revolved
around whether this kind of difficulty warranted the overall
simplification we now have (I don't remember off-hand whether this
specific case was discussed, though).

I think that anyone who wants to suggest reverting to some kind of
default comparison behavior needs to write up a PEP and clearly summarize
all previous discussion prior to 3.0 release, then go through the usual
grind of starting with python-ideas before coming back to python-dev.
-- 
Aahz ([email protected])   <*> http://www.pythoncraft.com/

"If you think it's expensive to hire a professional to do the job, wait
until you hire an amateur."  --Red Adair

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-24 Thread Lino Mastrodomenico

2009/4/22 "Martin v. Löwis" :
> To convert non-decodable bytes, a new error handler "python-escape" is
> introduced, which decodes non-decodable bytes using into a private-use
> character U+F01xx, which is believed to not conflict with private-use
> characters that currently exist in Python codecs.

Why not use U+DCxx for non-UTF-8 encodings too?

Overall I like the PEP: I think it's the best proposal so far that
doesn't put a heavy burden on applications that only want to do
simple things with the API.

-- 
Lino Mastrodomenico

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-24 Thread Glenn Linderman

On approximately 4/24/2009 11:40 AM, came the following characters from 
the keyboard of Stephen J. Turnbull:

> Antoine Pitrou writes:
>  > Stephen J. Turnbull writes:
>  > > 
>  > > Well, the problem is that both parts are false.  If you didn't start
>  > > with a valid string in a known encoding, you shouldn't treat it as
>  > > characters because it's not.  Hand it to a careful API, and you'll get
>  > > an Exception raised in your face.
>  > 
>  > Which "careful API" are you talking about?
>  >
>  > > OTOH, at least some of those who feel lucky and use it
>  > > naively are going to turn out to be wrong.
>  > 
>  > Why will they turn out to be wrong?

Because the encoding is not reliably reversible.  That is why I proposed 
one that is.

> To quote the PEP:
>
> """
> While providing a uniform API to non-decodable bytes, this interface
> has the limitation that chosen representation only "works" if the data
> get converted back to bytes with the python-escape error handler
> also. Encoding the data with the locale's encoding and the (default)
> strict error handler will raise an exception, encoding them with UTF-8
> will produce non-sensical data.
>
> For most applications, we assume that they eventually pass data
> received from a system interface back into the same system
> interfaces.
> """

And so my encoding (1) doesn't alter the data stream for any valid 
Windows file name, which is where the naivest of users reside, (2) 
doesn't alter the data stream for any Posix file name that was encoded 
as UTF-8 sequences and doesn't contain ? characters in the file name [I 
perceive the use of ? in file names to be rare on Posix, because of 
experience, and because of the other problems caused by such use], and 
(3) doesn't introduce data puns within applications that are correctly 
coded to know the encoding occurs.  The encoding technique in the PEP 
not only can produce data puns, thus not being reversible, it also 
provides no reliable mechanism to know that this has occurred.

> But you can't know that.  These are now "just strings", which could
> end up in pickles and other persistent objects, be passed across
> network interfaces (remote copy, for example), etc, etc, and there is
> no way to guarantee that the recipient will understand the rules,
> unless the application encapsulates them in some kind of
> representation that says "I look like a Unicode but I'm really just
> encoded bytes."

This could happen.  Well-formed programs need to use the encoding at the 
boundaries.  Python could encapsulate its interfaces to the file system, 
but cannot encapsulate other interfaces.  Fortunately, something that is 
pickled would probably be unpickled by Python, and therefore all would 
be well.  But any interface that expects a file name, and is not 
encapsulated by Python, must be encapsulated by the application.

> But the whole point is to turn them into plain old
> strings so people *don't have to bother* keeping track.

And if that is the point, it isn't worth doing.  If the point is that it 
can minimize the amount of existing, file name manipulation code that 
uses string manipulations, that must be reworked to be functional during 
a 2to3 migration, then it can be worth doing.  But I think it should be 
done with an encoding that doesn't introduce undetectable data puns, 
whether mine or some different encoding with that characteristic, but 
not the one presently in the PEP, because it does introduce undetectable 
data puns.

> As I already said, this is no worse than the current situation, but it
> gives the impression that Python has a standard "solution".  (Yes, I
> know Martin doesn't claim it's a solution to any of those problems.
> The point is user perception.)
>
> I have to wonder whether having a standard way of not solving any
> problems is better than having no standard way of not solving any
> problems.  It may be, and it probably can't hurt, which is why I'm +0.

Interesting phraseology there, Stephen!

I'm +1 on the concept, -1 on the PEP, due solely to the lack of a 
reversible encoding.

--
Glenn -- http://nevcal.com/
===
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

Re: [Python-Dev] Dates in python-dev

2009-04-24 Thread Glenn Linderman

On approximately 4/24/2009 10:06 AM, came the following characters from 
the keyboard of Oleg Broytmann:

> On Fri, Apr 24, 2009 at 05:29:29PM +0100, MRAB wrote:
>> I've recently subscribed to this list and received my first "Summary of
>> Python tracker Issues". What I find annoying are the dates, for example:
>>
>> ACTIVITY SUMMARY (04/17/09 - 04/24/09)
>>
>> 3 x double-digits (have we learned nothing from Y2K? :-)) with the
>> _middle_ ones changing fastest!
>>
>> I know it's the US standard, but Python is global. Could we have an
>> 'international' style instead, say, year-month-day:
>>
>> ACTIVITY SUMMARY (2009-04-17 - 2009-04-24)
>
>    +1000 from me!
>
> Oleg.



You missed a prime opportunity, Oleg...

+2000 from me!
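
For what it's worth, a minimal sketch of the requested format using only the standard library. The function name to_iso is hypothetical, and nothing here is taken from the actual tracker summary script; it only shows the US-style stamp from the example being turned into year-month-day.

```python
from datetime import datetime

def to_iso(us_date):
    # Hypothetical helper: parse the two-digit US style (month/day/year)
    # and emit ISO 8601, which sorts correctly as plain text.
    return datetime.strptime(us_date, "%m/%d/%y").date().isoformat()

header = "ACTIVITY SUMMARY ({} - {})".format(to_iso("04/17/09"), to_iso("04/24/09"))
```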


--
Glenn -- http://nevcal.com/
===
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-24 Thread Martin v. Löwis

> Why not use U+DCxx for non-UTF-8 encodings too?

I thought of that, and was tricked into believing that only U+DC8x
is a half surrogate. Now I see that you are right, and have fixed
the PEP accordingly.
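
A short sketch of the U+DCxx mapping for a non-UTF-8 codec, using the handler name that was eventually released ("surrogateescape" rather than the draft's "python-escape"). The ASCII codec is chosen only because every byte from 0x80 up is undecodable in it.

```python
# Every byte from 0x80 to 0xFF is undecodable as ASCII.
raw = bytes(range(0x80, 0x100))

# Each undecodable byte 0xNN becomes the lone surrogate U+DCNN.
escaped = raw.decode("ascii", "surrogateescape")
code_points = [ord(ch) for ch in escaped]

# Encoding with the same handler restores the original bytes.
restored = escaped.encode("ascii", "surrogateescape")
```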

Regards,
Martin

Re: [Python-Dev] Summary of Python tracker Issues

2009-04-24 Thread Terry Reedy


Python tracker wrote:
[snip]

In going through this, I notice a lot of effort by Mark Dickinson and 
others to get some details of numbers computation and display right in 
time for 3.1.  As a certain-to-be beneficiary, I want to thank all who 
contributed.


Terry Jan Reedy


Re: [Python-Dev] Summary of Python tracker Issues

2009-04-24 Thread Mark Dickinson

On Fri, Apr 24, 2009 at 9:25 PM, Terry Reedy  wrote:
> In going through this, I notice a lot of effort by Mark Dickinson and others

Many others, but Eric Smith's name needs to be in big lights here.
There's no way the short float repr would have been ready for 3.1 if
Eric hadn't shown an interest in this at PyCon, and then taken on
the major internal replumbing job this entailed for all of Python's
string formatting.

> 3.1.  As a certain-to-be beneficiary, I want to thank all who contributed.

Glad you like it!

Mark

Re: [Python-Dev] Summary of Python tracker Issues

2009-04-24 Thread Eric Smith


Mark Dickinson wrote:
> On Fri, Apr 24, 2009 at 9:25 PM, Terry Reedy wrote:
>> In going through this, I notice a lot of effort by Mark Dickinson and others
>
> Many others, but Eric Smith's name needs to be in big lights here.
> There's no way the short float repr would have been ready for 3.1 if
> Eric hadn't shown an interest in this at PyCon, and then taken on
> the major internal replumbing job this entailed for all of Python's
> string formatting.

Not to get too much into a mutual admiration mode, but Mark did the 
parts involving hard thinking.

>> 3.1.  As a certain-to-be beneficiary, I want to thank all who contributed.
>
> Glad you like it!


Me, too. I think it's going to be great once we get it all straightened 
out. And I think we're close!


Eric.


Re: [Python-Dev] Deprecating PyOS_ascii_formatd

2009-04-24 Thread Eric Smith


Eric Smith wrote:
> Assuming that Mark's and my changes in the py3k-short-float-repr branch
> get checked in shortly, I'd like to deprecate PyOS_ascii_formatd. Its
> functionality is largely being replaced by PyOS_double_to_string, which
> we're introducing on our branch.

We've checked the changes in, and everything looks good as far as I can 
tell.

> My proposal is to deprecate PyOS_ascii_formatd in 3.1 and remove it in 3.2.

Having heard no dissent, I'd like to go ahead and deprecate this API. 
What are the mechanics of deprecating this? Just documentation, or is 
there something I should do in the code to generate a warning? Any 
pointers to examples would be great.

> The 2.7 situation is trickier, because we're not planning on backporting
> the short-float-repr work back to 2.7. In 2.7 I guess we'll leave
> PyOS_ascii_formatd around, unfortunately.

I backported the new API to 2.7, so I'll also deprecate 
PyOS_ascii_formatd there.


Eric.


Re: [Python-Dev] Deprecating PyOS_ascii_formatd

2009-04-24 Thread Benjamin Peterson

2009/4/24 Eric Smith :
>> My proposal is to deprecate PyOS_ascii_formatd in 3.1 and remove it in
>> 3.2.
>
> Having heard no dissent, I'd like to go ahead and deprecate this API. What
> are the mechanics of deprecating this? Just documentation, or is there
> something I should do in the code to generate a warning? Any pointers to
> examples would be great.

You can use PyErr_WarnEx().
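
PyErr_WarnEx() is the C-level counterpart of warnings.warn(), so the mechanics can be sketched in Python. This is only an illustration: old_formatd is a hypothetical stand-in, not the real PyOS_ascii_formatd, and the message text is made up.

```python
import warnings

def old_formatd(value):
    """Hypothetical deprecated helper standing in for a deprecated API."""
    warnings.warn(
        "old_formatd() is deprecated; use format() instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller
    )
    return format(value, ".2f")

# Record the warning instead of printing it, so it can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_formatd(3.14159)
```

The deprecated function still does its job; the warning is emitted as a side effect and is only visible when the active warning filters let DeprecationWarning through.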



-- 
Regards,
Benjamin

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-24 Thread Toshio Kuratomi

Glenn Linderman wrote:
> On approximately 4/24/2009 11:40 AM, came the following characters from
> And so my encoding (1) doesn't alter the data stream for any valid
> Windows file name, and where the naivest of users reside (2) doesn't
> alter the data stream for any Posix file name that was encoded as UTF-8
> sequences and doesn't contain ? characters in the file name [I perceive
> the use of ? in file names to be rare on Posix, because of experience,
> and because of the other problems caused by such use] (3) doesn't
> introduce data puns within applications that are correctly coded to know
> the encoding occurs.  The encoding technique in the PEP not only can
> produce data puns, thus not being reversible, it provides no reliable
> mechanism to know that this has occurred.
> 
Uhm, not arguing with your goals but '?' is unfortunately reasonably
easy to get into a filename.  For instance, I've had to download a lot
of scratch built packages from our buildsystem recently.  Scratch builds
have URLs with query strings in them so::

wget 'http://koji.fedoraproject.org/koji/getfile?taskID=1318059&name=monodevelop-debugger-gdb-2.0-1.1.i586.rpm'

Which results in the filename:
  getfile?taskID=1318059&name=monodevelop-debugger-gdb-2.0-1.1.i586.rpm

-Toshio




Re: [Python-Dev] Tuples and underorderable types

2009-04-24 Thread Terry Reedy

Raymond Hettinger wrote:
> Does anyone have any ideas about what to do with issue 5830 and handling
> the problem in a general way (not just for sched)?
>
> The basic problem is that decorate/compare/undecorate patterns no longer
> work when the primary sort keys are equal and the secondary keys are
> unorderable (which is now the case for many callables).
>
>    >>> tasks = [(10, lambda: 0), (20, lambda: 1), (10, lambda: 2)]
>    >>> tasks.sort()
>    Traceback (most recent call last):
>    ...
>    TypeError: unorderable types: function() < function()
>
> Would it make sense to provide a default ordering whenever the types are
> the same?
>
>    def object.__lt__(self, other):
>        if type(self) == type(other):
>            return id(self) < id(other)
>        raise TypeError

The immediate problem with this is that 'same type', or not, is 
sometimes a somewhat arbitrary implementation detail.  In 2.x, 
40 could be int or long, depending on the build.  In 3.0, that 
difference disappeared.  User-defined and builtin functions are 
different classes for implementation, not conceptual reasons.  (This 
could potentially bite what I understand to be your r71844/5 fix.) 
Unbound methods used to be the same class as bound methods (as I 
remember).  In 3.0, the wrapping disappeared and they are the same thing 
as the underlying function.  In 2.x, ascii text and binary data might 
both be str.  Now they might be str and bytes.

Universal ordering and default ordering by id was broken (and doomed) 
when Guido decided that complex numbers should not be comparable either 
lexicographically or by id.  Your proposed object.__lt__ would reverse 
that decision, unless, of course, complex was special-cased (again) to 
over-ride it, but then we would be back to the 2.x situation of mixed 
rules and exceptions.
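
A few of the type distinctions above can be checked directly in a 3.x interpreter. This is only a sketch of the observations, with a throwaway class C invented for the purpose.

```python
# Builtin and user-defined functions are different classes.
builtin_type = type(len)
user_type = type(lambda: 0)

class C:
    def method(self):
        return None

# In 3.x, a function looked up on the class is just the plain function.
unbound_is_function = type(C.method) is user_type

# Text and binary data are now distinct types.
text_and_bytes_differ = type("text") is not type(b"data")

# Complex numbers are not orderable at all.
try:
    1j < 2j
    complex_orderable = True
except TypeError:
    complex_orderable = False
```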

Terry Jan Reedy


Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-24 Thread Paul Moore

2009/4/24 Stephen J. Turnbull :
> Paul Moore writes:
>
>  > The pros for Martin's proposal are a uniform cross-platform interface,
>  > and a user-friendly API for the common case.
>
> A more accurate phrasing would be "... a user-friendly API for those
> who feel very lucky today."  Which is the common case, of course, but
> spins a little differently.

Sorry, but I think you're misrepresenting things. I'd have probably
let you off if you'd missed out the "very" - but I do think that it's
the common case. Consider:

- Windows systems where broken Unicode (lone surrogates or whatever)
isn't involved
- Unix systems where the user's stated filesystem encoding is correct

Can you honestly say that this isn't the vast majority of real-world
environments? (IIRC, you are based in Japan, so it may well be true
that the likelihood of problems is a lot higher where you are than
where I am - the UK - but I suspect that averaging out, things are
generally as above).

>  > [1] Actually, all the PEP says is "With this PEP, a uniform
>  > treatment of these data as characters becomes possible." An
>  > argument as to why this is a good thing would be a useful addition
>  > to the PEP. At the moment it's more or less treated as self-evident
>  > - which I agree with, but which clearly the Unix people here are
>  > not as certain of.
>
> Well, the problem is that both parts are false.

I can't work out which "parts" you are referring to here.

> If you didn't start
> with a valid string in a known encoding, you shouldn't treat it as
> characters because it's not.

Again, that's the purist argument. If you have a string (of bytes, I
guess) and a 99% certain guess as to the correct encoding, then I'd
argue that, as long as (a) it's not mission-critical (lives or backups
depend on it) and (b) you have a means of failing relatively
gracefully, you have every reason to make the assumption about
encoding.

After all, what's the alternative? Ultimately, you have a byte string
and no encoding. You make some assumption, or you can do hardly
anything. What use is "Processing file \x66\x6f\x6f" as a progress
indicator for a program that scans a directory? (That was "foo" for
people who can't read latin-1 written in hex :-))

> Hand it to a careful API, and you'll get
> an Exception raised in your face.  And that's precisely why it's not
> obviously a good thing.  Careful clients will have to treat it as
> "transcoded bytes", and so the people who develop those clients get no
> benefit.  OTOH, at least some of those who feel lucky and use it
> naively are going to turn out to be wrong.

But 99% of the time, "it" is a perfectly acceptable string.
(Percentage invented out of thin air, admitted :-)) Remember, only
when the system encounters an undecodable byte sequence, would a
technically invalid string be generated - and as far as I can tell,
the main case when that would happen is on Unix, if the user specifies
UTF-8 as the encoding, and the actual filesystem uses something else,
*and* there's a file with a name whose byte sequence is invalid UTF-8.
I'm *really* struggling to see that as a common scenario.

Admittedly, there are other, possibly more common, cases where the
string translation is valid, but semantically not what the user
expects - user says CP1252, but filesystem is CP850, say. As a UK
Windows user, I'm used to seeing CP850 vs CP1252 confusions like this
- "£" replaced with ú is the common case. It happens occasionally, and
occasionally causes code to behave unexpectedly. But it doesn't
reformat my hard drive and the alternative (having to be extra-careful
to tell every program precisely which encoding I'm using in every
situation) would make programs effectively unusable.
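
The pound-sign confusion is easy to reproduce. This sketch assumes the Windows code page involved is CP1252 (Western European), since that is the code page in which the byte for "£" reads back as "ú" under CP850.

```python
pound = "\u00a3"                    # the pound sign
stored = pound.encode("cp1252")     # written by a Windows GUI program
misread = stored.decode("cp850")    # read back by a console program

# Every byte has a meaning in both code pages, so nothing raises;
# the result is simply the wrong character.
```

No exception is involved at any point, which is why this kind of mismatch is annoying rather than fatal.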

> That said, I'm +0 on the PEP as is.

So I'm largely preaching to the converted here. After all, lukewarm
acceptance from someone with experience of Asian encoding issues is
pretty much the equivalent of resounding support from someone who only
ever works in English! :-)

Paul.

Re: [Python-Dev] Tuples and underorderable types

2009-04-24 Thread Raymond Hettinger




> I would discourage use of the decorate/sort/undecorate pattern,
> and encourage use of the key= argument. Or, if you really need
> to decorate into a tuple, still pass a key= argument.

The bug report was actually about the sched module which used
heapq to prioritize tuples consisting of times, priorities, and actions.
I fixed and closed the original bug a few hours ago but had a
thought that the pattern itself may be ubiquitous (especially with heapq).
ISTM that other bugs like this are lurking about.  But all of you 
guys seem to think the status quo is fine, so that's the end of it.


Cheers,


Raymond

[Python-Dev] "Length of str " changes after passed in Python 2.5

2009-04-24 Thread [email protected]


---

I have the following code:

#  len(all_svs) = 10

# then I call a function with 2 list parameters

def proc_line(line,all_svs) :

# inside the function the length of the list "all_svs" is 1 more -> 11
# I had to work around it

for i in range(len(all_svs)  - 1 ) :  # somehow the length of all_svs is incremented !!!

--

Is this a compiler bug ??

Or is it because this is my first try at Python?

Thanks


Re: [Python-Dev] "Length of str " changes after passed in Python 2.5

2009-04-24 Thread Aahz

On Fri, Apr 24, 2009, [email protected] wrote:
>
> I have the following code:
> #  len(all_svs) = 10
> 
> # the I call a function  with 2 list parameters
> def proc_line(line,all_svs) :
> 
> # inside the function the length of the list "all_svs" is 1 more -> 11
> # I had to workaround it

This sounds like a usage question.  Please use comp.lang.python (or
possibly the tutor mailing list).
-- 
Aahz ([email protected])   <*> http://www.pythoncraft.com/

"If you think it's expensive to hire a professional to do the job, wait
until you hire an amateur."  --Red Adair

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-24 Thread James Y Knight


On Apr 24, 2009, at 6:05 PM, Paul Moore wrote:

> - Windows systems where broken Unicode (lone surrogates or whatever)
> isn't involved
> - Unix systems where the user's stated filesystem encoding is correct
>
> Can you honestly say that this isn't the vast majority of real-world
> environments? (IIRC, you are based in Japan, so it may well be true
> that the likelihood of problems is a lot higher where you are than
> where I am - the UK - but I suspect that averaging out, things are
> generally as above).


In my experience, it is normal on most unix systems that some programs  
(mostly daemons) are running in default "POSIX" locale, others (most  
user programs) are running in the "en_US.utf-8" locale, and some  
luddite users have set themselves to "en_US.8859-1". All running on  
the same system.


James

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-24 Thread Terry Reedy


Toshio Kuratomi wrote:
> Glenn Linderman wrote:
>> On approximately 4/24/2009 11:40 AM, came the following characters from
>> And so my encoding (1) doesn't alter the data stream for any valid
>> Windows file name, and where the naivest of users reside (2) doesn't
>> alter the data stream for any Posix file name that was encoded as UTF-8
>> sequences and doesn't contain ? characters in the file name [I perceive
>> the use of ? in file names to be rare on Posix, because of experience,
>> and because of the other problems caused by such use] (3) doesn't
>> introduce data puns within applications that are correctly coded to know
>> the encoding occurs.  The encoding technique in the PEP not only can
>> produce data puns, thus not being reversible, it provides no reliable
>> mechanism to know that this has occurred.
>
> Uhm  Not arguing with your goals but '?' is unfortunately reasonably
> easy to get into a filename.  For instance, I've had to download a lot
> of scratch built packages from our buildsystem recently.  Scratch builds
> have url's with query strings in them so::


Is NUL \0 allowed in POSIX file names?  If not, could that be used as an 
escape char.  If it is not legal, then custom translated strings that 
escape in the wild would raise a red flag as soon as something else 
tried to use them.



Re: [Python-Dev] Tuples and underorderable types

2009-04-24 Thread Terry Reedy


Raymond Hettinger wrote:
>> I would discourage use of the decorate/sort/undecorate pattern,
>> and encourage use of the key= argument. Or, if you really need
>> to decorate into a tuple, still pass a key= argument.
>
> The bug report was actually about the sched module which used
> heapq to prioritize tuples consisting of times, priorities, and actions.
> I fixed and closed the original bug a few hours ago but had a
> thought that the pattern itself may be ubiquitous (especially with heapq).
> ISTM that other bugs like this are lurking about.  But all of you guys
> seem to think the status quo is fine, so that's the end of it.


If you define the bug as the sched module not being updated to the 3.0 
order, then there are possibly more.


I notice that most of the heapq functions do not take a key function 
argument.  Has or will this change in the future?  Or is making 
key-decorated tuples the responsibility of the user?  (I can see that a 
key func would work better with PriQueue class where the key func is 
passed just once.)
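
To make the idea concrete, here is a minimal sketch of a priority queue where the key function is passed just once, built on the existing heapq functions. The class name `KeyedHeap` and the event dicts are hypothetical, not anything in the stdlib. Each entry is decorated as (key, sequence number, item), so a tie on the key is settled by the sequence number and the comparison never reaches the possibly unorderable item.

```python
import heapq
import itertools


class KeyedHeap:
    """Hypothetical priority queue; the key function is supplied once."""

    def __init__(self, key):
        self._key = key
        self._heap = []
        self._counter = itertools.count()  # unique tie-breaker per entry

    def push(self, item):
        # The sequence number is unique, so tuple comparison stops before
        # it would have to compare two items directly.
        heapq.heappush(self._heap, (self._key(item), next(self._counter), item))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


events = KeyedHeap(key=lambda event: event["time"])
events.push({"time": 5, "action": "late"})
events.push({"time": 1, "action": "first"})
events.push({"time": 1, "action": "second"})  # same key; dicts are unorderable
order = [events.pop()["action"] for _ in range(len(events))]
```

Entries with equal keys come out in insertion order, so `order` is first, second, late, and no TypeError is raised for the two dicts that share a time.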


tjr


Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-24 Thread Toshio Kuratomi

Terry Reedy wrote:

> Is NUL \0 allowed in POSIX file names?  If not, could that be used as an
> escape char.  If it is not legal, then custom translated strings that
> escape in the wild would raise a red flag as soon as something else
> tried to use them.
> 
AFAIK NUL should be okay but I haven't read a specification to reach
that conclusion.  Is that a proposal?  Should I go find someone who has
read the relevant standards to find out?

-Toshio




Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-24 Thread Cameron Simpson

On 24Apr2009 18:20, Toshio Kuratomi  wrote:
| Terry Reedy wrote:
| > Is NUL \0 allowed in POSIX file names?  If not, could that be used as an
| > escape char.  If it is not legal, then custom translated strings that
| > escape in the wild would raise a red flag as soon as something else
| > tried to use them.
| > 
| AFAIK NUL should be okay but I haven't read a specification to reach
| that conclusion.  Is that a proposal?  Should I go find someone who has
| read the relevant standards to find out?

NUL cannot occur in a POSIX file path, if for no other reason than that
the API uses C strings, which are NUL terminated.

So, yes, you could use NUL as an escape character if you're sure you're
never dealing with _non_POSIX pathnames:-)
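
A quick way to see the C-string limitation from Python is sketched below. It assumes a Python 3 interpreter; the helper `rejects_nul` and the path names are made up for the example. Recent releases raise ValueError for an embedded NUL, and the sketch also catches TypeError on the assumption that some older releases used that instead.

```python
import os


def rejects_nul(path):
    """Return True if Python refuses to hand `path` to the operating system."""
    try:
        os.stat(path)
    except (ValueError, TypeError):
        # Raised before any system call is made: the path cannot be
        # represented as a NUL-terminated C string.
        return True
    except OSError:
        # The path reached the OS, which reported an ordinary error
        # such as "no such file".
        return False
    return False


nul_in_str = rejects_nul("escape\0marker")
nul_in_bytes = rejects_nul(b"escape\0marker")
plain = rejects_nul("no-such-file-for-this-sketch")
```

Both NUL-containing paths are rejected before they reach the system, while the plain name is passed through, which matches the "red flag" behaviour suggested earlier in the thread.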

Cheers,
-- 
Cameron Simpson  DoD#743
http://www.cskk.ezoshosting.com/cs/

| I'm the female partner of a climber (I don't climb) and until now, I was
| under the impression that climbers are cool people, but alas, you had to
| ruin it for me.
*REAL* climbers are crude, impolite, solitary, abrupt, arrogant.  Sport
climbers are cool.
- Rene Tio  in rec.climbing