Re: [Python-Dev] Re: hierarchical named groups extension to the re library

2005-04-07 Thread Anthony Baxter
On Sunday 03 April 2005 16:48, Martin v. Löwis wrote:
> If this kind of functionality would fall on immediate rejection for
> some reason, even writing the PEP might be pointless.

Note that even if something is rejected, the PEP itself is useful - it
collects knowledge in a format that's far more accessible than searching
the mailing list archives. 

(note that I'm not talking about this particular case, but about PEPs in
general - I have no opinion on the current proposal, because I'm not a
heavy user of REs)

-- 
Anthony Baxter <[EMAIL PROTECTED]>
It's never too late to have a happy childhood.
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Unicode byte order mark decoding

2005-04-07 Thread M.-A. Lemburg
Nicholas Bastin wrote:
> 
> On Apr 5, 2005, at 6:19 AM, M.-A. Lemburg wrote:
> 
>> Note that the UTF-16 codec is strict w/r to the presence
>> of the BOM mark: you get a UnicodeError if a stream does
>> not start with a BOM mark. For the UTF-8-SIG codec, this
>> should probably be relaxed to not require the BOM.
> 
> 
> I've actually been confused about this point for quite some time now,
> but never had a chance to bring it up.  I do not understand why
> UnicodeError should be raised if there is no BOM.  I know that PEP-100
> says:
> 
> 'utf-16': 16-bit variable length encoding (little/big endian)
> 
> and:
> 
> Note: 'utf-16' should be implemented by using and requiring byte order
> marks (BOM) for file input/output.
> 
> But this appears to be in error, at least in the current unicode
> standard.  'utf-16', as defined by the unicode standard, is big-endian
> in the absence of a BOM:
> 
> ---
> 3.10.D42:  UTF-16 encoding scheme:
> ...
> * The UTF-16 encoding scheme may or may not begin with a BOM.  However,
> when there is no BOM, and in the absence of a higher-level protocol, the
> byte order of the UTF-16 encoding scheme is big-endian.
> ---

The problem is "in the absence of a higher level protocol": the
codec doesn't know anything about a protocol - it's the application
using the codec that knows which protocol gets used. It's a lot
safer to require the BOM for UTF-16 streams and raise an exception
to have the application decide whether to use UTF-16-BE or the
by far more common UTF-16-LE.

Unlike for the UTF-8 codec, the BOM for UTF-16 is a configuration
parameter, not merely a signature.
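The difference between a signature and a configuration parameter is easy to see with Python 3's codecs. This is a sketch only: the 2005 stream codecs discussed here behaved differently when the BOM was missing. For UTF-8-SIG the BOM is optional and simply stripped. For UTF-16 the BOM decides how the bytes that follow are read.

```python
import codecs

# UTF-8-SIG: the BOM is a signature only; with or without it the
# decoded text is the same.
with_sig = (codecs.BOM_UTF8 + b"A").decode("utf-8-sig")
without_sig = b"A".decode("utf-8-sig")

# UTF-16: the BOM selects the byte order, so the same payload bytes
# (0x00, 0x41) decode to different characters.
payload = b"\x00A"
as_be = (codecs.BOM_UTF16_BE + payload).decode("utf-16")  # U+0041
as_le = (codecs.BOM_UTF16_LE + payload).decode("utf-16")  # U+4100

print(repr(with_sig), repr(without_sig), repr(as_be), repr(as_le))
```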

In terms of history, I don't recall whether your quote was
already in the standard at the time I wrote the PEP. You are the
first to have reported a problem with the current implementation
(which has been around since 2000), so I believe that application
writers are more comfortable with the way the UTF-16 codec
is currently implemented. Explicit is better than implicit :-)

> The current implementation of the utf-16 codecs makes for some
> irritating gymnastics to write the BOM into the file before reading it
> if it contains no BOM, which seems quite like a bug in the codec. 

The codec writes a BOM in the first call to .write() - it
doesn't write a BOM before reading from the file.

> I allow for the possibility that this was ambiguous in the standard when
> the PEP was written, but it is certainly not ambiguous now.

See above.

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Apr 07 2005)
>>> Python/Zope Consulting and Support ...http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/


::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! 


[Python-Dev] threading (GilState) question

2005-04-07 Thread Michael Hudson
I recently redid how the readline module handled threads around
callbacks into Python (the previous code was insane).

This resulted in the following bug report:

http://www.python.org/sf/1176893

Which is correctly assigned to me as it's clearly a result of my
recent checkin.  However, I think my code is correct and the fault
lies elsewhere.

Basically, if you call PyGilState_Release before PyEval_InitThreads
you crash, because PyEval_ReleaseThread gets called while
interpreter_lock is NULL.  This is very simple to make go away -- the
problem is that there are several ways!

Point the first is that I really think this is a bug in the GilState
APIs: the readline API isn't inherently multi-threaded and so it would
be insane to call PyEval_InitThreads() in initreadline, yet it has to
cope with being called in a multithreaded situation.  If you can't use
the GilState APIs in this situation, what are they for?

Option 1) Call PyEval_ThreadsInitialized() in PyGilState_Release().
Non-invasive, but bleh.

Option 2) Call PyEval_SaveThread() instead of
PyEval_ReleaseThread()[1] in PyGilState_Release().  This is my
favourite option (PyGilState_Ensure() calls PyEval_RestoreThread which
is PyEval_SaveThread()'s "mate") and I guess you can distill this long
mail into the question "why doesn't PyGilState_Release do this
already?"

Option 3) Make PyEval_ReleaseThread() not crash when interpreter_lock
== NULL.  Easy, but it's actually documented that you can't do this.
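A toy Python model can make the pairing argument concrete. It is not the C API. ToyEval and its methods are hypothetical stand-ins, and the behaviour it models is an assumption based on the description above: the SaveThread/RestoreThread pair skips the lock while it is still NULL, whereas ReleaseThread assumes the lock exists.

```python
import threading


class ToyEval:
    """Hypothetical toy model of the interpreter lock, not the C API."""

    def __init__(self):
        # Like interpreter_lock: stays None until threads are initialized.
        self.interpreter_lock = None

    def init_threads(self):
        # Stand-in for PyEval_InitThreads: create the lock and take it.
        if self.interpreter_lock is None:
            self.interpreter_lock = threading.Lock()
            self.interpreter_lock.acquire()

    def restore_thread(self):
        # Stand-in for PyEval_RestoreThread: skips a missing lock.
        if self.interpreter_lock is not None:
            self.interpreter_lock.acquire()

    def save_thread(self):
        # Stand-in for PyEval_SaveThread: skips a missing lock.
        if self.interpreter_lock is not None:
            self.interpreter_lock.release()

    def release_thread(self):
        # Stand-in for PyEval_ReleaseThread: assumes the lock exists;
        # the real function crashes where this raises.
        if self.interpreter_lock is None:
            raise RuntimeError("release_thread() before init_threads()")
        self.interpreter_lock.release()


def gilstate_ensure(ev):
    # PyGILState_Ensure pairs with the RestoreThread-style call.
    ev.restore_thread()


def gilstate_release(ev, use_save_thread):
    # use_save_thread=False models the current code, True models option 2.
    if use_save_thread:
        ev.save_thread()
    else:
        ev.release_thread()


# Threads never initialized, as in a readline callback:
ev = ToyEval()
gilstate_ensure(ev)
try:
    gilstate_release(ev, use_save_thread=False)
except RuntimeError as exc:
    print("current pairing:", exc)

ev = ToyEval()
gilstate_ensure(ev)
gilstate_release(ev, use_save_thread=True)  # option 2: no error
print("option 2: ok")
```

In this model option 2 also stays balanced if init_threads() runs between the ensure and release calls, because save_thread() releases exactly the lock that init_threads() took.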

Opinions?  Am I placing too much trust in PyGilState_Release()'s
existing choice of function?

Cheers,
mwh

[1] The issue of having almost-but-not-quite identical variations of
API functions -- here

 PyEval_AcquireThread/PyEval_ReleaseThread
 vs. PyEval_RestoreThread/PyEval_SaveThread

-- is something I can rant about at length, if anyone is
interested :)

--
  I located the link but haven't bothered to re-read the article,
  preferring to post nonsense to usenet before checking my facts.
  -- Ben Wolfson, comp.lang.python


Re: [Python-Dev] threading (GilState) question

2005-04-07 Thread Nick Coghlan
Michael Hudson wrote:
> Option 1) Call PyEval_ThreadsInitialized() in PyGilState_Release().
> Non-invasive, but bleh.

Tim rejected this option back when PyEval_ThreadsInitialized() was added to the
API [1]. Gustavo was having a similar problem with pygtk, and the end result was
to add the ThreadsInitialized API so that pygtk could make its own check without
slowing down the default case in the core.

> Option 2) Call PyEval_SaveThread() instead of
> PyEval_ReleaseThread()[1] in PyGilState_Release().  This is my
> favourite option (PyGilState_Ensure() calls PyEval_RestoreThread which
> is PyEval_SaveThread()'s "mate") and I guess you can distill this long
> mail into the question "why doesn't PyGilState_Release do this
> already?"

See above. Although I'm now wondering about the opposite question: Why doesn't
PyGilState_Ensure use PyEval_AcquireThread?

Cheers,
Nick.
[1] 
http://sourceforge.net/tracker/?func=detail&aid=1044089&group_id=5470&atid=305470
[2] http://mail.python.org/pipermail/python-dev/2004-August/047870.html

--
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
http://boredomandlaziness.skystorm.net


Re: [Python-Dev] threading (GilState) question

2005-04-07 Thread Michael Hudson
Nick Coghlan <[EMAIL PROTECTED]> writes:

> Michael Hudson wrote:
>> Option 1) Call PyEval_ThreadsInitialized() in PyGilState_Release().
>> Non-invasive, but bleh.
>
> Tim rejected this option back when PyEval_ThreadsInitialized() was
> added to the API [1].

Well, not really.  The patch that was rejected was much larger than
any proposal of mine.  My option 1) is this:

--- pystate.c   09 Feb 2005 10:56:18 +  2.39
+++ pystate.c   07 Apr 2005 13:19:55 +0100  
@@ -502,7 +502,8 @@
  PyThread_delete_key_value(autoTLSkey);
  }
  /* Release the lock if necessary */
- else if (oldstate == PyGILState_UNLOCKED)
-  PyEval_ReleaseThread(tcur);
+ else if (oldstate == PyGILState_UNLOCKED
+   && PyEval_ThreadsInitialized())
+  PyEval_ReleaseThread(tcur);
 }
 #endif /* WITH_THREAD */

> Gustavo was having a similar problem with pygtk, and the end result
> was to add the ThreadsInitialized API so that pygtk could make its
> own check without slowing down the default case in the core.

Well, Gustavo seemed to be complaining about the cost of the locking.
I'm complaining about crashes.

>> Option 2) Call PyEval_SaveThread() instead of
>> PyEval_ReleaseThread()[1] in PyGilState_Release().  This is my
>> favourite option (PyGilState_Ensure() calls PyEval_RestoreThread which
>> is PyEval_SaveThread()'s "mate") and I guess you can distill this long
>> mail into the question "why doesn't PyGilState_Release do this
>> already?"

This option corresponds to this patch:

--- pystate.c   09 Feb 2005 10:56:18 +  2.39
+++ pystate.c   07 Apr 2005 13:24:33 +0100  
@@ -503,6 +503,6 @@
   }
   /* Release the lock if necessary */
   else if (oldstate == PyGILState_UNLOCKED)
-  PyEval_ReleaseThread(tcur);
+  PyEval_SaveThread();
 }
 #endif /* WITH_THREAD */

> See above. Although I'm now wondering about the opposite question: Why
> doesn't PyGilState_Ensure use PyEval_AcquireThread?

Well, that would make more sense than what we have now.  OTOH, I'd
*much* rather make the PyGilState functions more tolerant -- I thought
being vaguely easy to use was part of their point.

I fail to believe the patch associated with option 2) has any
detectable performance cost.

Cheers,
mwh

-- 
  People think I'm a nice guy, and the fact is that I'm a scheming,
  conniving bastard who doesn't care for any hurt feelings or lost
  hours of work if it just results in what I consider to be a better
  system.
                                                -- Linus Torvalds


Re: [Python-Dev] Unicode byte order mark decoding

2005-04-07 Thread Nicholas Bastin
On Apr 7, 2005, at 5:07 AM, M.-A. Lemburg wrote:
>> The current implementation of the utf-16 codecs makes for some
>> irritating gymnastics to write the BOM into the file before reading it
>> if it contains no BOM, which seems quite like a bug in the codec.
> The codec writes a BOM in the first call to .write() - it
> doesn't write a BOM before reading from the file.

Yes, see, I read a *lot* of UTF-16 that comes from other sources.  It's 
not a matter of writing with python and reading with python.

--
Nick


Re: [Python-Dev] threading (GilState) question

2005-04-07 Thread Tim Peters
[Michael Hudson]
> ...
> Point the first is that I really think this is a bug in the GilState
> APIs: the readline API isn't inherently multi-threaded and so it would
> be insane to call PyEval_InitThreads() in initreadline, yet it has to
> cope with being called in a multithreaded situation.  If you can't use
> the GilState APIs in this situation, what are they for?

That's explained in the PEP -- of course :

http://www.python.org/peps/pep-0311.html

Under "Limitations and Exclusions" it specifically disowns
responsibility for worrying about whether Py_Initialize() and
PyEval_InitThreads() have been called:

This API will not perform automatic initialization of Python, or
initialize Python for multi-threaded operation.  Extension authors
must continue to call Py_Initialize(), and for multi-threaded
applications, PyEval_InitThreads().  The reason for this is that
the first thread to call PyEval_InitThreads() is nominated as the
"main thread" by Python, and so forcing the extension author to
specify the main thread (by forcing her to make this first call)
removes ambiguity.  As Py_Initialize() must be called before
PyEval_InitThreads(), and as both of these functions currently
support being called multiple times, the burden this places on
extension authors is considered reasonable.

That doesn't mean there isn't a clever way to get the same effect
anyway, but I don't have time to think about it, and reassigned the
bug report to Mark (who may or may not have time).


Re: [Python-Dev] Unicode byte order mark decoding

2005-04-07 Thread M.-A. Lemburg
Nicholas Bastin wrote:
> 
> On Apr 7, 2005, at 5:07 AM, M.-A. Lemburg wrote:
> 
>>> The current implementation of the utf-16 codecs makes for some
>>> irritating gymnastics to write the BOM into the file before reading it
>>> if it contains no BOM, which seems quite like a bug in the codec.
>>
>>
>> The codec writes a BOM in the first call to .write() - it
>> doesn't write a BOM before reading from the file.
> 
> 
> Yes, see, I read a *lot* of UTF-16 that comes from other sources.  It's
> not a matter of writing with python and reading with python.

Ok, but I don't really follow you here: you are suggesting to
relax the current UTF-16 behavior and to start defaulting to
UTF-16-BE if no BOM is present - that's most likely going to
cause more problems than it seems to solve: namely complete
garbage if the data turns out to be UTF-16-LE encoded and,
what's worse, enters the application undetected.
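A quick Python 3 illustration of the "undetected" case: plain ASCII text encoded as UTF-16-LE and then read as UTF-16-BE decodes without any error. Each byte-swapped code unit lands in the CJK Unified Ideographs block (U+4E00 to U+9FFF), so nothing looks invalid to the codec.

```python
# "hello" as little-endian UTF-16: 68 00 65 00 6c 00 6c 00 6f 00
le_bytes = "hello".encode("utf-16-le")

# Reading those bytes as big-endian raises nothing: every code unit is
# simply byte-swapped (0x0068 becomes 0x6800, and so on).
misread = le_bytes.decode("utf-16-be")
print([hex(ord(ch)) for ch in misread])
```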

If you do have UTF-16 without a BOM mark it's much better
to let a short function analyze the text by reading the first
few bytes of the file and then make an educated guess based
on the findings. You can then process the file using one
of the other codecs UTF-16-LE or -BE.
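Here is a minimal sketch of such a "short function", assuming the text is mostly Latin script, so that many code units have a zero high byte. decode_utf16_guess and its sample_size parameter are illustrative names. The zero-byte heuristic is one possible educated guess, not something the codecs provide.

```python
import codecs


def decode_utf16_guess(data: bytes, sample_size: int = 64) -> str:
    """Decode UTF-16 whose byte order may be unmarked (hypothetical helper).

    A BOM, if present, is trusted. Otherwise the first sample_size bytes
    are inspected: for mostly Latin-script text the high byte of each
    code unit is zero, and it comes first in big-endian data but second
    in little-endian data. Ties fall back to big-endian.
    """
    if data[:2] in (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE):
        # The plain 'utf-16' codec consumes the BOM and honours it.
        return data.decode("utf-16")
    sample = data[:sample_size]
    zeros_at_even = sample[0::2].count(0)  # high byte first: big-endian
    zeros_at_odd = sample[1::2].count(0)   # high byte second: little-endian
    encoding = "utf-16-le" if zeros_at_odd > zeros_at_even else "utf-16-be"
    return data.decode(encoding)


print(decode_utf16_guess("hello".encode("utf-16-le")))
print(decode_utf16_guess("hello".encode("utf-16-be")))
```

Text that is mostly in CJK or other non-Latin scripts defeats this heuristic. That is one reason an explicit BOM or protocol is preferable.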

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] threading (GilState) question

2005-04-07 Thread Michael Hudson
Tim Peters <[EMAIL PROTECTED]> writes:

> [Michael Hudson]
>> ...
>> Point the first is that I really think this is a bug in the GilState
>> APIs: the readline API isn't inherently multi-threaded and so it would
>> be insane to call PyEval_InitThreads() in initreadline, yet it has to
>> cope with being called in a multithreaded situation.  If you can't use
>> the GilState APIs in this situation, what are they for?
>
> That's explained in the PEP -- of course :
>
> http://www.python.org/peps/pep-0311.html

Gnarr.  Of course, I read this passage.  I think it's missing a use
case.

> Under "Limitations and Exclusions" it specifically disowns
> responsibility for worrying about whether Py_Initialize() and
> PyEval_InitThreads() have been called:
>
[snip quote]

This suggests that I should call PyEval_InitThreads() in
initreadline(), which seems daft.

> That doesn't mean there isn't a clever way to get the same effect
> anyway,

Pah.  There's a very simple way (see my reply to Nick).  It even works
in the case that PyEval_InitThreads() is called in between the call to
PyGilState_Ensure() and PyGilState_Release().

> but I don't have time to think about it, and reassigned the bug
> report to Mark (who may or may not have time).

He gets a week :)

Cheers,
mwh

-- 
  Or here's an even simpler indicator of how much C++ sucks: Print
  out the C++ Public Review Document.  Have someone hold it about
  three feet above your head and then drop it.  Thus you will be
  enlightened.
                                                -- Thant Tessman


Re: [Python-Dev] Unicode byte order mark decoding

2005-04-07 Thread Nicholas Bastin
On Apr 7, 2005, at 11:35 AM, M.-A. Lemburg wrote:
> Ok, but I don't really follow you here: you are suggesting to
> relax the current UTF-16 behavior and to start defaulting to
> UTF-16-BE if no BOM is present - that's most likely going to
> cause more problems than it seems to solve: namely complete
> garbage if the data turns out to be UTF-16-LE encoded and,
> what's worse, enters the application undetected.

The crux of my argument is that the spec declares that UTF-16 without a 
BOM is BE.  If the file is encoded in UTF-16LE and it doesn't have a 
BOM, it doesn't deserve to be processed correctly.  That being said, 
treating it as UTF-16BE if it's LE will result in a lot of invalid code 
points, so it should be obvious that something has gone wrong.

> If you do have UTF-16 without a BOM mark it's much better
> to let a short function analyze the text by reading the first
> few bytes of the file and then make an educated guess based
> on the findings. You can then process the file using one
> of the other codecs UTF-16-LE or -BE.

This is about what we do now - we catch UnicodeError and then add a BOM 
to the file, and read it again.  We know our files are UTF-16BE if they 
don't have a BOM, as the files are written by code which observes the 
spec.  We can't use UTF-16BE all the time, because sometimes they're 
UTF-16LE, and in those cases the BOM is set.
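The workaround described here can be sketched in Python 3 as follows. decode_utf16_default_be is a hypothetical helper name. It leaves BOM-marked input to the plain 'utf-16' codec and otherwise prepends a big-endian BOM, which gives the big-endian default described in the Unicode text quoted earlier in the thread.

```python
import codecs


def decode_utf16_default_be(data: bytes) -> str:
    """Decode UTF-16, treating BOM-less input as big-endian (hypothetical helper)."""
    if data[:2] in (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE):
        # A BOM is present: the 'utf-16' codec consumes it and honours it.
        return data.decode("utf-16")
    # No BOM: prepend a big-endian BOM, mirroring the workaround above,
    # so the 'utf-16' codec reads the data as UTF-16-BE.
    return (codecs.BOM_UTF16_BE + data).decode("utf-16")


print(decode_utf16_default_be(b"\x00h\x00i"))
```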

It would be nice if you could optionally specify that the codec would 
assume UTF-16BE if no BOM was present, and not raise UnicodeError in 
that case, which would preserve the current behaviour as well as allow 
users to ask for behaviour which conforms to the standard.

I'm not saying that you can't work around the issue now, what I'm 
saying is that you shouldn't *have* to - I think there is a reasonable 
expectation that the UTF-16 codec conforms to the spec, and if you 
wanted it to do something else, it is those users who should be forced 
to come up with a workaround.

--
Nick


Re: [Python-Dev] Unicode byte order mark decoding

2005-04-07 Thread Walter Dörwald
Nicholas Bastin sagte:

> On Apr 7, 2005, at 11:35 AM, M.-A. Lemburg wrote:
>
> [...]
>> If you do have UTF-16 without a BOM mark it's much better
>> to let a short function analyze the text by reading the first
>> few bytes of the file and then make an educated guess based
>> on the findings. You can then process the file using one
>> of the other codecs UTF-16-LE or -BE.
>
> This is about what we do now - we catch UnicodeError and
> then add a BOM  to the file, and read it again.  We know
> our files are UTF-16BE if they  don't have a BOM, as the
> files are written by code which observes the  spec.
> We can't use UTF-16BE all the time, because sometimes
> they're UTF-16LE, and in those cases the BOM is set.
>
> It would be nice if you could optionally specify that the
> codec would assume UTF-16BE if no BOM was present,
> and not raise UnicodeError in  that case, which would
> preserve the current behaviour as well as allow users
> to ask for behaviour which conforms to the standard.

It should be feasible to implement your own codec for that
based on Lib/encodings/utf_16.py. Simply replace the line
in StreamReader.decode():
   raise UnicodeError,"UTF-16 stream does not start with BOM"
with:
   self.decode = codecs.utf_16_be_decode
and you should be done.
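A Python 3 alternative to copying Lib/encodings/utf_16.py is to register a small stateless codec with codecs.register(). The codec name utf_16_be_default and the helper functions are hypothetical. This sketch provides no stream reader or incremental decoder, so it only covers bytes.decode()/str.encode() style use.

```python
import codecs

_NAME = "utf_16_be_default"  # hypothetical codec name


def _decode(data, errors="strict"):
    data = bytes(data)
    if data[:2] == codecs.BOM_UTF16_LE:
        return data[2:].decode("utf-16-le", errors), len(data)
    if data[:2] == codecs.BOM_UTF16_BE:
        return data[2:].decode("utf-16-be", errors), len(data)
    # No BOM: fall back to big-endian instead of raising.
    return data.decode("utf-16-be", errors), len(data)


def _encode(text, errors="strict"):
    # Always emit a big-endian BOM followed by big-endian code units.
    return codecs.BOM_UTF16_BE + text.encode("utf-16-be", errors), len(text)


def _search(name):
    # Newer Pythons turn hyphens into underscores before calling search
    # functions; normalise here too so both spellings match.
    if name.replace("-", "_") == _NAME:
        return codecs.CodecInfo(_encode, _decode, name=_NAME)
    return None


codecs.register(_search)

print(b"\x00h\x00i".decode("utf_16_be_default"))
```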

> [...]

Bye,
   Walter Dörwald





Re: [Python-Dev] Unicode byte order mark decoding

2005-04-07 Thread Martin v. Löwis
Nicholas Bastin wrote:
> It would be nice if you could optionally specify that the codec would
> assume UTF-16BE if no BOM was present, and not raise UnicodeError in
> that case, which would preserve the current behaviour as well as allow
> users to ask for behaviour which conforms to the standard.

Alternatively, the UTF-16BE codec could support the BOM, and do
UTF-16LE if the "other" BOM is found.

This would also support your usecase, and in a better way. The
Unicode assertion that UTF-16 is BE by default is void these
days - there is *always* a higher layer protocol, and it more
often than not specifies (perhaps not in English words, but
only in the source code of the generator) that the default should
be LE.

Regards,
Martin


Re: [Python-Dev] Unicode byte order mark decoding

2005-04-07 Thread Walter Dörwald
Walter Dörwald sagte:

> Nicholas Bastin sagte:
>
> It should be feasible to implement your own codec for that
> based on Lib/encodings/utf_16.py. Simply replace the line
> in StreamReader.decode():
>   raise UnicodeError,"UTF-16 stream does not start with BOM"
> with:
>   self.decode = codecs.utf_16_be_decode
> and you should be done.

Oops, this only works if you have a big endian system.
Otherwise you have to redecode the input with:
   codecs.utf_16_ex_decode(input, errors, 1, False)
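In Python 3 the same undocumented helper is still reachable as codecs.utf_16_ex_decode(data, errors, byteorder, final), and it returns a (text, consumed, byteorder) tuple. The sketch below shows what the byteorder argument does. Because the helper is undocumented, its signature should be treated as an implementation detail that could change.

```python
import codecs

bom_less = b"\x00h\x00i"  # "hi" as UTF-16-BE, no BOM

# byteorder=1 forces big-endian, independent of the host byte order.
text, consumed, byteorder = codecs.utf_16_ex_decode(bom_less, "strict", 1, True)
print(text, consumed, byteorder)

# byteorder=0 asks the decoder to look for a BOM; a little-endian BOM
# is consumed and reported back as -1.
le_with_bom = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")
text2, consumed2, byteorder2 = codecs.utf_16_ex_decode(le_with_bom, "strict", 0, True)
print(text2, consumed2, byteorder2)
```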

Bye,
   Walter Dörwald





Re: [Python-Dev] Unicode byte order mark decoding

2005-04-07 Thread M.-A. Lemburg
Martin v. Löwis wrote:
> Nicholas Bastin wrote:
> 
>>It would be nice if you could optionally specify that the codec would
>>assume UTF-16BE if no BOM was present, and not raise UnicodeError in
>>that case, which would preserve the current behaviour as well as allow
>>users to ask for behaviour which conforms to the standard.
> 
> 
> Alternatively, the UTF-16BE codec could support the BOM, and do
> UTF-16LE if the "other" BOM is found.

That would violate the Unicode standard - the BOM character
for UTF-16-LE and -BE must be interpreted as ZWNBSP.
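Python 3's codecs follow this rule, as a short check shows. The explicit-order codec keeps a leading U+FEFF as an ordinary character. The plain 'utf-16' codec treats it as a byte order mark and drops it.

```python
import codecs

data = codecs.BOM_UTF16_BE + "A".encode("utf-16-be")

# The explicit-order codec keeps the leading FE FF as U+FEFF (ZWNBSP).
explicit = data.decode("utf-16-be")

# The BOM-sniffing codec treats it as a byte order mark and drops it.
sniffed = data.decode("utf-16")
print(repr(explicit), repr(sniffed))
```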

> This would also support your usecase, and in a better way. The
> Unicode assertion that UTF-16 is BE by default is void these
> days - there is *always* a higher layer protocol, and it more
> often than not specifies (perhaps not in English words, but
> only in the source code of the generator) that the default should
> be LE.

I've checked the various versions of the Unicode standard
docs: it seems that the quote you have was silently introduced
between 3.0 and 4.0.

Python currently uses version 3.2.0 of the standard and I don't
think enough people are aware of the change in the standard to make
a case for no longer raising an exception when the UTF-16 codec
finds a stream without a BOM.

By the time we switch to 4.1 or later, we can then
make the change in the native UTF-16 codec as you
requested.

Personally, I think that the Unicode consortium should not
have introduced a default for the UTF-16 encoding byte
order. Using big endian as default in a world where most
Unicode data is created on little endian machines is not
very realistic either.

Note that the UTF-16 codec starts reading data in
the machine's native byte order and then learns a possibly
different byte order by looking for BOMs.
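This native-order starting point is still visible in current CPython 3, whose plain 'utf-16' decoder no longer rejects BOM-less input. The following sketch works on either host byte order.

```python
import codecs
import sys

# The native and the "other" UTF-16 variants for this host.
native = "utf-16-le" if sys.byteorder == "little" else "utf-16-be"
other = "utf-16-be" if native == "utf-16-le" else "utf-16-le"
other_bom = codecs.BOM_UTF16_BE if other == "utf-16-be" else codecs.BOM_UTF16_LE

# Encoding with plain 'utf-16' emits the native BOM and native-order units.
encoded = "A".encode("utf-16")

# Without a BOM the plain decoder reads native order...
decoded_native = "A".encode(native).decode("utf-16")

# ...and a BOM for the other order switches it over.
decoded_other = (other_bom + "A".encode(other)).decode("utf-16")

print(encoded, decoded_native, decoded_other)
```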

Implementing a codec which implements the 4.0 behavior
is easy, though.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Unicode byte order mark decoding

2005-04-07 Thread Stephen J. Turnbull
> "MvL" == "Martin v. Löwis" <[EMAIL PROTECTED]> writes:

MvL> This would also support your usecase, and in a better way.
MvL> The Unicode assertion that UTF-16 is BE by default is void
MvL> these days - there is *always* a higher layer protocol, and
MvL> it more often than not specifies (perhaps not in English
MvL> words, but only in the source code of the generator) that the
MvL> default should be LE.

That is _not_ a protocol.  A protocol is a published specification,
not merely a frequent accident of implementation.  Anyway, both ISO
10646 and the Unicode standard consider that "internal use" and there
is no requirement at all placed on those data.  And such generators
typically take great advantage of that freedom---have you looked in a
.doc file recently?  Have you noticed how many different options
(previous implementations) of .doc are offered in the Import menu?

> "MAL" == "M.-A. Lemburg" <[EMAIL PROTECTED]> writes:

MAL> I've checked the various versions of the Unicode standard
MAL> docs: it seems that the quote you have was silently
MAL> introduced between 3.0 and 4.0.

Probably because ISO 10646 was _always_ BE until the standards were
unified.  But note that ISO 10646 standardizes only use as a
communications medium.  Neither ISO 10646 nor Unicode makes any
specification about internal usage.  Conformance in internal
processing is a matter of the programmer's convenience in producing
conforming output.

MAL> Python currently uses version 3.2.0 of the standard and I
MAL> don't think enough people are aware of the change in the
MAL> standard

There's only one (corporate) person that matters: Microsoft.

MAL> By the time we switch to 4.1 or later, we can then make the
MAL> change in the native UTF-16 codec as you requested.

While in principle I sympathize with Nick, pragmatically Microsoft is
unlikely to conform.  They will take the position that files created
by Windows are "internal" to the Windows environment, except where
explicitly intended for exchange with arbitrary platforms, and only
then will they conform.  As Martin points out, that is what really
matters for these defaults.  I think you should look to see what
Microsoft does.

MAL> Personally, I think that the Unicode consortium should not
MAL> have introduced a default for the UTF-16 encoding byte
MAL> order. Using big endian as default in a world where most
MAL> Unicode data is created on little endian machines is not very
MAL> realistic either.

It's not a default for the UTF-16 encoding byte order.  It's a default
for the UTF-16 encoding byte order _when UTF-16 is a communications
medium_.  Given that the generic network byte order is bigendian, I
think it would be insane to specify littleendian as Unicode's default.

With Unicode same as network, you specify UTF-16 strings internally as
an array of uint16_t, and when you put them on the wire (including
saving them to a file that might be put on the wire as octet-stream)
you apply htons(3) to it.  On reading, you apply ntohs(3) to it.  The
source code is portable, the file is portable.  How can you beat that?
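The htons(3)/ntohs(3) approach maps onto Python's struct module, where the '>' prefix selects big-endian (network) order. to_wire and from_wire are hypothetical names. Packing the code units in network order gives exactly the UTF-16-BE bytes, with no BOM needed and no dependence on the host byte order.

```python
import struct


def to_wire(units):
    """Pack 16-bit code units in network (big-endian) order, like htons(3)."""
    return struct.pack(">%dH" % len(units), *units)


def from_wire(data):
    """Unpack network-order bytes into 16-bit code units, like ntohs(3)."""
    return list(struct.unpack(">%dH" % (len(data) // 2), data))


# Internal form: a list of UTF-16 code units (a supplementary character
# such as U+1F600 becomes a surrogate pair).
text = "A\u00e9\U0001F600"
le = text.encode("utf-16-le")
units = list(struct.unpack("<%dH" % (len(le) // 2), le))

wire = to_wire(units)
print(wire == text.encode("utf-16-be"), from_wire(wire) == units)
```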

-- 
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
   Ask not how you can "do" free software business;
  ask what your business can "do for" free software.


RE: [Python-Dev] Developer list update

2005-04-07 Thread Raymond Hettinger
Does anyone know what has become of the following developers and perhaps
have their current email addresses?  Are any of these folks still active
in Python development?

  Ben Gertzfield
  Charles G Waldman
  Eric Price
  Finn Bock
  Ken Manheimer
  Moshe Zadka


Raymond Hettinger


Re: [Python-Dev] Developer list update

2005-04-07 Thread Alex Martelli
On Apr 7, 2005, at 07:58, Raymond Hettinger wrote:
> Does anyone know what has become of the following developers and
> perhaps have their current email addresses?  Are any of these folks
> still active in Python development?
>
>   Ben Gertzfield
>   Charles G Waldman
>   Eric Price
>   Finn Bock
>   Ken Manheimer
>   Moshe Zadka

Moshe was at Pycon (sorry I didn't think of introducing you to each 
other!) so I do assume he's still active.

Alex


[Python-Dev] New style classes and operator methods

2005-04-07 Thread Greg Ewing
I think I've found a small flaw in the implementation
of binary operator methods for new-style Python classes.
If the left and right operands are of the same class,
and the class implements a right operand method but
not a left operand method, the right operand method
is not called. Instead, two attempts are made to call
the left operand method.

I'm surmising this is because both calls are funnelled
through the same C-level method, which is using the
types of the operands to decide whether to call the
left or right Python methods.

I suppose this isn't really a serious problem, since
it's easily worked around by always defining at least
a left operand method. But I thought I'd point it out
anyway.

The following example illustrates the problem:

class NewStyleSpam(object):

  def __add__(self, other):
    print "NewStyleSpam.__add__", self, other
    return NotImplemented

  def __radd__(self, other):
    print "NewStyleSpam.__radd__", self, other
    return 42

x1 = NewStyleSpam()
x2 = NewStyleSpam()
print x1 + x2

which produces:

NewStyleSpam.__add__ <__main__.NewStyleSpam object at 0x4019062c> <__main__.NewStyleSpam object at 0x4019056c>
NewStyleSpam.__add__ <__main__.NewStyleSpam object at 0x4019062c> <__main__.NewStyleSpam object at 0x4019056c>
Traceback (most recent call last):
  File "/home/cosc/staff/research/greg/tmp/foo.py", line 27, in ?
    print x1 + x2
TypeError: unsupported operand type(s) for +: 'NewStyleSpam' and 'NewStyleSpam'

Old-style classes, on the other hand, work as expected:

class OldStyleSpam:

  def __add__(self, other):
    print "OldStyleSpam.__add__", self, other
    return NotImplemented

  def __radd__(self, other):
    print "OldStyleSpam.__radd__", self, other
    return 42

y1 = OldStyleSpam()
y2 = OldStyleSpam()
print y1 + y2

produces:

OldStyleSpam.__add__ <__main__.OldStyleSpam instance at 0x4019054c> <__main__.OldStyleSpam instance at 0x401901ec>
OldStyleSpam.__radd__ <__main__.OldStyleSpam instance at 0x401901ec> <__main__.OldStyleSpam instance at 0x4019054c>
42
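For readers on Python 3, where every class is new-style, here is a sketch of the same experiment together with the workaround mentioned above: defining the left-operand method so that it handles the same-class case. The Python 3 data model documentation says reflected methods such as __radd__ are only tried when the operands are of different types. The sketch therefore expects a single __add__ call followed by TypeError. The class names are illustrative.

```python
class Spam:
    """Same shape as the example above, in Python 3 syntax."""

    def __init__(self):
        self.calls = []

    def __add__(self, other):
        self.calls.append("__add__")
        return NotImplemented

    def __radd__(self, other):
        self.calls.append("__radd__")
        return 42


class PatchedSpam:
    """Workaround: the left-operand method handles the same-class case."""

    def __add__(self, other):
        if isinstance(other, PatchedSpam):
            return 42
        return NotImplemented

    def __radd__(self, other):
        return 42


x1, x2 = Spam(), Spam()
try:
    x1 + x2
except TypeError:
    outcome = "TypeError"
else:
    outcome = "no error"
print(outcome, x1.calls, x2.calls)

print(PatchedSpam() + PatchedSpam())
```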

--
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
[EMAIL PROTECTED]                  +--------------------------------------+