date:20100107

Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-07 Thread Glyph Lefkowitz


On Jan 7, 2010, at 11:21 PM, Guido van Rossum wrote:

> On Thu, Jan 7, 2010 at 7:34 PM, Glyph Lefkowitz  
> wrote:
>> 
>> On Jan 7, 2010, at 7:52 PM, Guido van Rossum wrote:
>>> 
>>> I'm a little hesitant about this. First of all, UTF-8 + BOM is crazy
>>> talk. And for the other two, perhaps it would make more sense to have
>>> a separate encoding-guessing function that takes a binary stream and
>>> returns a text stream wrapping it with the proper encoding?
>> 
>> It *is* crazy, but unfortunately rather common.  Wikipedia has a good
>> description of the issues:
>> .  Basically, some
>> Windows text APIs will emit a UTF-8 "BOM" in order to identify the file as
>> being UTF-8, so it's become a convention to do that.  That's not good
>> enough, so you need to guess the encoding as well to make sure, but if there
>> is a BOM and you can otherwise verify that the file is probably UTF-8
>> encoded, you should discard it.
> 
> That doesn't make sense. If the file isn't UTF-8 you can't see the
> BOM, because the BOM itself is UTF-8-encoded.

I'm saying that the BOM itself isn't enough to detect that the file is actually 
UTF-8.  If (for whatever reason: explicitly specified, guessed in some other 
way) the file's encoding is determined to be something else, the bytes 
comprising the BOM should be decoded as normal.  It's just that the UTF-8 
decoding of the BOM at the start of a file should be "".
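A minimal illustration of that point using the standard `codecs` module (the sample payload is invented). The bytes EF BB BF only mean U+FEFF when they are decoded as UTF-8. The "utf-8-sig" codec turns a *leading* U+FEFF into "", while a non-UTF-8 codec such as latin-1 decodes the very same bytes as ordinary characters.

```python
import codecs

data = codecs.BOM_UTF8 + b"key=value"   # invented sample payload

as_utf8 = data.decode("utf-8")          # BOM kept as U+FEFF
as_utf8_sig = data.decode("utf-8-sig")  # leading BOM decodes to ""
as_latin1 = data.decode("latin-1")      # same bytes, three ordinary characters

# "utf-8-sig" only drops a BOM at the very start of the data.
mid_stream = (b"a" + codecs.BOM_UTF8).decode("utf-8-sig")

for label, text in [("utf-8", as_utf8), ("utf-8-sig", as_utf8_sig),
                    ("latin-1", as_latin1), ("mid-stream", mid_stream)]:
    print(f"{label:10} {text!r}")
```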

> (And yes, I know this happens. Doesn't mean we need to auto-guess by
> default; there are lots of issues e.g. what should happen after
> seeking to offset 0?)

I think it's pretty clear that the BOM should still be skipped in that case ...
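For what it's worth, the existing "utf-8-sig" codec already behaves this way when it is read through io.TextIOWrapper. Seeking back to offset 0 resets the decoder, so the BOM is skipped again on the next read. Here is a small sketch, with io.BytesIO standing in for a real file:

```python
import codecs
import io

raw = io.BytesIO(codecs.BOM_UTF8 + b"first line\nsecond line\n")
text = io.TextIOWrapper(raw, encoding="utf-8-sig")

first_pass = text.read()
text.seek(0)               # back to the start of the underlying bytes
second_pass = text.read()  # decoder was reset, so the BOM is dropped again

print(repr(first_pass), first_pass == second_pass)
```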

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-07 Thread Tres Seaver


Guido van Rossum wrote:
> On Thu, Jan 7, 2010 at 7:34 PM, Glyph Lefkowitz  
> wrote:
>>
>> On Jan 7, 2010, at 7:52 PM, Guido van Rossum wrote:
>>
>> On Thu, Jan 7, 2010 at 4:10 PM, Victor Stinner
>>  wrote:
>>
>> Hi,
>>
>> The builtin open() function is unable to open a UTF-16/32 file starting with a
>> BOM if the encoding is not specified (it raises a Unicode error). For a UTF-8
>> file starting with a BOM, read()/readline() also returns the BOM, whereas the
>> BOM should be "ignored".
>>
>> I'm a little hesitant about this. First of all, UTF-8 + BOM is crazy
>> talk. And for the other two, perhaps it would make more sense to have
>> a separate encoding-guessing function that takes a binary stream and
>> returns a text stream wrapping it with the proper encoding?
>>
>> It *is* crazy, but unfortunately rather common.  Wikipedia has a good
>> description of the issues:
>> .  Basically, some
>> Windows text APIs will emit a UTF-8 "BOM" in order to identify the file as
>> being UTF-8, so it's become a convention to do that.  That's not good
>> enough, so you need to guess the encoding as well to make sure, but if there
>> is a BOM and you can otherwise verify that the file is probably UTF-8
>> encoded, you should discard it.
> 
> That doesn't make sense. If the file isn't UTF-8 you can't see the
> BOM, because the BOM itself is UTF-8-encoded.
> 
> (And yes, I know this happens. Doesn't mean we need to auto-guess by
> default; there are lots of issues e.g. what should happen after
> seeking to offset 0?)

The BOM should not be seekable if the file is opened with the proposed
"guess encoding from BOM" mode:  it isn't properly part of the stream at
all in that case.

A UTF-8 BOM is an absurdity, but it exists *everywhere* in the wild:
Python would do well to make it as easy as possible to consume such
files, as well as the non-insane versions (UTF-16 / UTF-32 BOMs).  In
the best of all possible worlds, I would just try opening the file so:

  f = open('/path/to/file', 'r', encoding="DWIFM")

and any BOM present would set the encoding for the remainder of the stream.



Tres.
--
===
Tres Seaver  +1 540-429-0999  tsea...@palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com


Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-07 Thread Stephen J. Turnbull

Guido van Rossum writes:

 > I'm a little hesitant about this. First of all, UTF-8 + BOM is crazy
 > talk.

That doesn't stop many applications from doing it.  Python should
perhaps not produce UTF-8 + BOM without a disclaimer of
indemnification against all resulting damage, signed in blood, from
the user for each instance.

But it should do something sane when reading such files.  I can't
really see any harm in throwing it away, especially since use of
ZERO-WIDTH NO-BREAK SPACE as a joining character has been deprecated
IIRC.
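Stephen's point can be checked with the standard `unicodedata` module. U+FEFF is still named ZERO WIDTH NO-BREAK SPACE, while Unicode 3.2 added U+2060 WORD JOINER for the joining role. Discarding a leading U+FEFF from text that has already been decoded then takes one line. `strip_leading_bom` below is a hypothetical helper, not a library function.

```python
import unicodedata

BOM = "\ufeff"


def strip_leading_bom(text):
    """Hypothetical helper: drop one U+FEFF at the very start of text only."""
    return text[1:] if text.startswith(BOM) else text


print(unicodedata.name(BOM))       # ZERO WIDTH NO-BREAK SPACE
print(unicodedata.name("\u2060"))  # WORD JOINER
print(repr(strip_leading_bom("\ufeffheader")))
```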


Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-07 Thread Guido van Rossum

On Thu, Jan 7, 2010 at 7:34 PM, Glyph Lefkowitz  wrote:
>
>
> On Jan 7, 2010, at 7:52 PM, Guido van Rossum wrote:
>
> On Thu, Jan 7, 2010 at 4:10 PM, Victor Stinner
>  wrote:
>
> Hi,
>
> The builtin open() function is unable to open a UTF-16/32 file starting with a
> BOM if the encoding is not specified (it raises a Unicode error). For a UTF-8
> file starting with a BOM, read()/readline() also returns the BOM, whereas the
> BOM should be "ignored".
>
> I'm a little hesitant about this. First of all, UTF-8 + BOM is crazy
> talk. And for the other two, perhaps it would make more sense to have
> a separate encoding-guessing function that takes a binary stream and
> returns a text stream wrapping it with the proper encoding?
>
> It *is* crazy, but unfortunately rather common.  Wikipedia has a good
> description of the issues:
> .  Basically, some
> Windows text APIs will emit a UTF-8 "BOM" in order to identify the file as
> being UTF-8, so it's become a convention to do that.  That's not good
> enough, so you need to guess the encoding as well to make sure, but if there
> is a BOM and you can otherwise verify that the file is probably UTF-8
> encoded, you should discard it.

That doesn't make sense. If the file isn't UTF-8 you can't see the
BOM, because the BOM itself is UTF-8-encoded.
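One way to see what is being said here: a "BOM" is just U+FEFF encoded at the start of the text, so its byte pattern depends on the encoding. If you encode the character with codecs that add no BOM of their own, you reproduce the constants that the `codecs` module exposes:

```python
import codecs

ZWNBSP = "\ufeff"

# None of these codecs adds a BOM of its own, so each result is exactly
# the encoded form of U+FEFF.
forms = {
    "utf-8": ZWNBSP.encode("utf-8"),
    "utf-16-le": ZWNBSP.encode("utf-16-le"),
    "utf-16-be": ZWNBSP.encode("utf-16-be"),
    "utf-32-le": ZWNBSP.encode("utf-32-le"),
    "utf-32-be": ZWNBSP.encode("utf-32-be"),
}

for name, raw in forms.items():
    print(f"{name:10} {raw.hex(' ')}")
```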

(And yes, I know this happens. Doesn't mean we need to auto-guess by
default; there are lots of issues e.g. what should happen after
seeking to offset 0?)

-- 
--Guido van Rossum (python.org/~guido)

Re: [Python-Dev] --enabled-shared broken on freebsd5?

2010-01-07 Thread Tres Seaver


Nicholas Bastin wrote:
> I think this problem probably needs to move over to distutils-sig, as
> it doesn't seem to be specific to the way that Python itself uses
> distutils.  distutils.command.build_ext tests for Py_ENABLE_SHARED on
> linux and solaris and automatically adds '.' to the library_dirs, and
> I suspect it just needs to do this on FreeBSD as well (adding bsd to
> the list of platforms for which this is performed "solves" the
> problem, but I don't pretend to know enough about either distutils or
> freebsd to determine if this is the correct solution).

I wouldn't say it needed discussion on the SIG:  just create a bug
report, with the tentative patch you have worked out, and get it
assigned to Tarek.


Tres.
--
===
Tres Seaver  +1 540-429-0999  tsea...@palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com


Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-07 Thread Glyph Lefkowitz

On Jan 7, 2010, at 7:52 PM, Guido van Rossum wrote:

> On Thu, Jan 7, 2010 at 4:10 PM, Victor Stinner
>  wrote:
>> Hi,
>> 
>> The builtin open() function is unable to open a UTF-16/32 file starting with a
>> BOM if the encoding is not specified (it raises a Unicode error). For a UTF-8
>> file starting with a BOM, read()/readline() also returns the BOM, whereas the
>> BOM should be "ignored".

> I'm a little hesitant about this. First of all, UTF-8 + BOM is crazy
> talk. And for the other two, perhaps it would make more sense to have
> a separate encoding-guessing function that takes a binary stream and
> returns a text stream wrapping it with the proper encoding?

It *is* crazy, but unfortunately rather common.  Wikipedia has a good 
description of the issues: 
.  Basically, some Windows 
text APIs will emit a UTF-8 "BOM" in order to identify the file as being UTF-8, 
so it's become a convention to do that.  That's not good enough, so you need to 
guess the encoding as well to make sure, but if there is a BOM and you can 
otherwise verify that the file is probably UTF-8 encoded, you should discard it.


Re: [Python-Dev] --enabled-shared broken on freebsd5?

2010-01-07 Thread Nicholas Bastin

I think this problem probably needs to move over to distutils-sig, as
it doesn't seem to be specific to the way that Python itself uses
distutils.  distutils.command.build_ext tests for Py_ENABLE_SHARED on
linux and solaris and automatically adds '.' to the library_dirs, and
I suspect it just needs to do this on FreeBSD as well (adding bsd to
the list of platforms for which this is performed "solves" the
problem, but I don't pretend to know enough about either distutils or
freebsd to determine if this is the correct solution).
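The change Nick describes amounts to widening a platform test. Below is a hypothetical sketch of that decision written as a pure function. `extra_library_dirs` and its prefix list are illustrative only and are not the actual code in distutils.command.build_ext. It also assumes that "." is where the freshly built shared libpython sits during the build.

```python
def extra_library_dirs(platform, enable_shared,
                       prefixes=("linux", "sunos", "freebsd")):
    """Hypothetical model of the build_ext check discussed above.

    platform is a sys.platform-style string (e.g. "linux2", "freebsd5");
    enable_shared mirrors the Py_ENABLE_SHARED config variable.
    Returns the directories that would be appended to library_dirs.
    """
    if enable_shared and platform.startswith(prefixes):
        # Assumption: the shared libpython being built lives in ".".
        return ["."]
    return []


print(extra_library_dirs("freebsd5", True))
```

The only change from the Linux/Solaris-only behaviour would be the extra "freebsd" prefix. Whether that is the right fix is exactly the open question in the message.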

--
Nick

Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-07 Thread MRAB


Guido van Rossum wrote:
> I'm a little hesitant about this. First of all, UTF-8 + BOM is crazy
> talk. And for the other two, perhaps it would make more sense to have
> a separate encoding-guessing function that takes a binary stream and
> returns a text stream wrapping it with the proper encoding?
>
Alternatively, have a universal UTF-8/16/32 encoding, i.e. one that expects UTF-8,
with or without BOM, or UTF-16/32 with BOM.
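MRAB's idea could be prototyped without touching open() by registering a codec. What follows is a hypothetical sketch, not an existing Python codec. The name "utf-any" and every helper in it are invented. The sketch makes these assumptions:

- files without a BOM are UTF-8;
- writing simply produces plain UTF-8;
- tell()/seek() state is not tracked while the first bytes are buffered.

The incremental decoder holds back up to four bytes (the longest BOM). It then hands everything to the built-in "utf-32", "utf-8-sig" or "utf-16" decoder, each of which consumes the BOM itself.

```python
import codecs
import io

# Longest BOMs first: the UTF-32-LE BOM (FF FE 00 00) begins with the
# UTF-16-LE BOM (FF FE), so it must be tested before UTF-16.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def _pick_codec(head):
    """Name a BOM-consuming codec for these leading bytes (UTF-8 if no BOM)."""
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return "utf-8"


def _decode(data, errors="strict"):
    data = bytes(data)  # may arrive as a memoryview
    return codecs.decode(data, _pick_codec(data), errors), len(data)


class _UniversalDecoder(codecs.IncrementalDecoder):
    """Buffers up to 4 bytes (the longest BOM), then delegates."""

    def __init__(self, errors="strict"):
        super().__init__(errors)
        self.reset()

    def reset(self):
        self._head = b""    # bytes held back until the BOM can be judged
        self._inner = None  # real decoder, chosen once enough bytes are seen

    def decode(self, input, final=False):
        if self._inner is None:
            self._head += bytes(input)
            if len(self._head) < 4 and not final:
                return ""
            name = _pick_codec(self._head)
            self._inner = codecs.getincrementaldecoder(name)(self.errors)
            input, self._head = self._head, b""
        return self._inner.decode(input, final)


def _search(name):
    # Lookup names arrive lower-cased; Python 3.9+ also maps "-" to "_".
    if name in ("utf_any", "utf-any"):
        return codecs.CodecInfo(
            encode=codecs.getencoder("utf-8"),  # writing: plain UTF-8
            decode=_decode,
            incrementalencoder=codecs.getincrementalencoder("utf-8"),
            incrementaldecoder=_UniversalDecoder,
            name="utf-any",
        )
    return None


codecs.register(_search)

sample = io.BytesIO("x\ny".encode("utf-32"))
print(repr(io.TextIOWrapper(sample, encoding="utf-any").read()))
```

Once registered, open(path, encoding="utf-any") would use the same decoder. Making it a codec rather than an open() flag keeps the BOM logic available from bytes.decode() and codecs.getincrementaldecoder() as well.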


> On Thu, Jan 7, 2010 at 4:10 PM, Victor Stinner
>  wrote:
>> Hi,
>>
>> The builtin open() function is unable to open a UTF-16/32 file starting with a
>> BOM if the encoding is not specified (it raises a Unicode error). For a UTF-8
>> file starting with a BOM, read()/readline() also returns the BOM, whereas the
>> BOM should be "ignored".
>>
>> See recent issues related to reading a UTF-8 text file including a BOM: #7185
>> (csv) and #7519 (ConfigParser). Such a file can be opened in unicode mode with
>> the UTF-8-SIG encoding, but it's possible to do better.
>>
>> I propose to improve open() (TextIOWrapper) by using the BOM to choose the
>> right encoding. I think that only files opened in read-only mode should
>> support this new feature. *Reading* the BOM in a *write*-only file would cause
>> unexpected behaviour.
>>
>> Since my proposal changes the result of TextIOWrapper.read()/readline() for
>> files starting with a BOM, we might introduce an option to open() to enable
>> the new behaviour. But do we really need to keep backward compatibility?
>>
>> I wrote a proof of concept attached to issue #7651. My patch only changes
>> the behaviour of TextIOWrapper for reading files starting with a BOM. It
>> doesn't work yet if seek() is used before the first read.



Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-07 Thread Guido van Rossum

I'm a little hesitant about this. First of all, UTF-8 + BOM is crazy
talk. And for the other two, perhaps it would make more sense to have
a separate encoding-guessing function that takes a binary stream and
returns a text stream wrapping it with the proper encoding?
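A rough sketch of the kind of encoding-guessing function described here, under stated assumptions. `text_stream_from_bom` is a hypothetical name, not an io API. It assumes the binary stream is seekable and falls back to UTF-8 when there is no BOM.

```python
import codecs
import io

# UTF-32 BOMs first: the UTF-32-LE BOM (FF FE 00 00) starts with the
# UTF-16-LE BOM (FF FE). A UTF-16-LE text whose first character is U+0000
# is therefore ambiguous; this sketch treats it as UTF-32.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def text_stream_from_bom(binary, default="utf-8"):
    """Hypothetical helper: wrap a seekable binary stream in a text stream.

    The encoding is taken from a leading BOM if there is one, else `default`.
    The chosen codecs ("utf-32", "utf-8-sig", "utf-16") consume the BOM
    themselves, so the returned stream never yields it as U+FEFF.
    """
    start = binary.tell()
    head = binary.read(4)
    binary.seek(start)  # rewind so the codec sees (and strips) the BOM
    for bom, encoding in _BOMS:
        if head.startswith(bom):
            return io.TextIOWrapper(binary, encoding=encoding)
    return io.TextIOWrapper(binary, encoding=default)


stream = text_stream_from_bom(io.BytesIO("h\u00e9llo\n".encode("utf-16")))
print(stream.encoding, repr(stream.read()))
```

With a real file the call would be text_stream_from_bom(open(path, "rb")). The guessed encoding is then visible as the wrapper's `.encoding` attribute.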

--Guido

On Thu, Jan 7, 2010 at 4:10 PM, Victor Stinner
 wrote:
> Hi,
>
> The builtin open() function is unable to open a UTF-16/32 file starting with a
> BOM if the encoding is not specified (it raises a Unicode error). For a UTF-8
> file starting with a BOM, read()/readline() also returns the BOM, whereas the
> BOM should be "ignored".
>
> See recent issues related to reading a UTF-8 text file including a BOM: #7185
> (csv) and #7519 (ConfigParser). Such a file can be opened in unicode mode with
> the UTF-8-SIG encoding, but it's possible to do better.
>
> I propose to improve open() (TextIOWrapper) by using the BOM to choose the
> right encoding. I think that only files opened in read-only mode should
> support this new feature. *Reading* the BOM in a *write*-only file would cause
> unexpected behaviour.
>
> Since my proposal changes the result of TextIOWrapper.read()/readline() for
> files starting with a BOM, we might introduce an option to open() to enable
> the new behaviour. But do we really need to keep backward compatibility?
>
> I wrote a proof of concept attached to issue #7651. My patch only changes
> the behaviour of TextIOWrapper for reading files starting with a BOM. It
> doesn't work yet if seek() is used before the first read.
>
> --
> Victor Stinner
> http://www.haypocalc.com/



-- 
--Guido van Rossum (python.org/~guido)

[Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-07 Thread Victor Stinner

Hi,

The builtin open() function is unable to open a UTF-16/32 file starting with a
BOM if the encoding is not specified (it raises a Unicode error). For a UTF-8
file starting with a BOM, read()/readline() also returns the BOM, whereas the
BOM should be "ignored".

See recent issues related to reading a UTF-8 text file including a BOM: #7185
(csv) and #7519 (ConfigParser). Such a file can be opened in unicode mode with
the UTF-8-SIG encoding, but it's possible to do better.
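For the csv case (issue #7185), the existing workaround looks roughly like this. It is a minimal sketch that uses a temporary file; the column names are invented sample data.

```python
import csv
import os
import tempfile

# Write a small CSV file the way some Windows tools do: UTF-8 with a BOM.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w", encoding="utf-8-sig", newline="") as f:
    f.write("name,age\r\nalice,30\r\n")

with open(path, encoding="utf-8", newline="") as f:
    plain_header = next(csv.reader(f))   # first field still carries U+FEFF

with open(path, encoding="utf-8-sig", newline="") as f:
    sig_header = next(csv.reader(f))     # BOM consumed by the codec

os.remove(path)
print(plain_header, sig_header)
```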

I propose to improve open() (TextIOWrapper) by using the BOM to choose the
right encoding. I think that only files opened in read-only mode should
support this new feature. *Reading* the BOM in a *write*-only file would cause
unexpected behaviour.

Since my proposal changes the result of TextIOWrapper.read()/readline() for
files starting with a BOM, we might introduce an option to open() to enable
the new behaviour. But do we really need to keep backward compatibility?

I wrote a proof of concept attached to issue #7651. My patch only changes
the behaviour of TextIOWrapper for reading files starting with a BOM. It
doesn't work yet if seek() is used before the first read.

-- 
Victor Stinner
http://www.haypocalc.com/

Re: [Python-Dev] relation between Python.asdl and Tools/compiler/ast.txt

2010-01-07 Thread Martin v. Löwis

>> astgen.py is not used to process asdl files; ast.txt lives right
>> next to astgen.py. Instead, the asdl file is processed by
>> Parser/asdl_c.py.
> 
> Yes, I know that. That's why I asked about the relation between
> ast.txt and Python.asdl. If internally the parser uses the .asdl, but
> exposes as a reflection mechanism things that were generated from
> ast.txt, then there could be a mismatch. Where does ast.txt come
> from? Shouldn't it be generated itself from Python.asdl?

What you may not be aware of is that Tools/compiler (and the
compiler package that it builds on) are both unused and unmaintained.

If the package stops working correctly - tough luck.

> So we would have
> 
> Python.asdl ---> ast.txt ---> astgen.py ---> ast.py
> containing all the UnarySub, Expression, etc. classes that represent a
> Python AST.

No - what actually happens in Python 3.x is this: both the compiler
package and Tools/compiler are removed.
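For what Yoann was after, the maintained route is the standard `ast` module. Its node classes are generated from Python.asdl by Parser/asdl_c.py, not from Tools/compiler/ast.txt. The sketch below shows how unary minus is represented there, roughly the counterpart of the old compiler package's UnarySub:

```python
import ast

tree = ast.parse("-x", mode="eval")   # an Expression node wrapping the body
node = tree.body

print(type(tree).__name__)      # Expression
print(type(node).__name__)      # UnaryOp
print(type(node.op).__name__)   # USub
print(ast.UnaryOp._fields)      # ('op', 'operand'), as declared in the ASDL
print(ast.dump(tree))
```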

Regards,
Martin

Re: [Python-Dev] GIL required for _all_ Python calls?

2010-01-07 Thread Martin v. Löwis

>> I don't think that's possible. The regex engine can also operate on
>> objects whose representation may move in memory when you don't hold
>> the GIL (e.g. buffers that get mutated).
> 
> Why is it a problem? If we get a buffer through the new buffer API, the object
> should ensure that the representation isn't moved away until the buffer is 
> released.

In 2.7, we currently get the buffer with bf_getreadbuffer. In 3.x, we have

/* Release the buffer immediately --- possibly dangerous
   but doing something else would require some re-factoring
*/
PyBuffer_Release(&view);


Even if we do use the new API, and correctly, it still might be
confusing if the contents of the buffer change underneath.

Regards,
Martin

Re: [Python-Dev] relation between Python.asdl and Tools/compiler/ast.txt

2010-01-07 Thread Yoann Padioleau

On Jan 7, 2010, at 12:31 PM, Martin v. Löwis wrote:

>> I would like to use astgen.py to generate python classes corresponding to
>> the AST of something I have defined in a .asdl file, along the lines of what is
>> apparently done for the python AST itself. I thought astgen.py would
>> take as an argument a .asdl file, but apparently it instead processes a file
>> called ast.txt. Where does this file come from? Is it generated from
>> Python.asdl?
> 
> astgen.py is not used to process asdl files; ast.txt lives right next to
> astgen.py. Instead, the asdl file is processed by Parser/asdl_c.py.

Yes, I know that. That's why I asked about the relation between ast.txt and
Python.asdl. If internally the parser uses the .asdl, but exposes as a
reflection mechanism things that were generated from ast.txt, then there
could be a mismatch. Where does ast.txt come from? Shouldn't it be
generated itself from Python.asdl?

So we would have

Python.asdl ---> ast.txt ---> astgen.py ---> ast.py, containing all
the UnarySub, Expression, etc. classes that represent a Python AST.

> 
> HTH,
> Martin


Re: [Python-Dev] Backported faster RLock to Python 2.6.

2010-01-07 Thread Johan Gill


On 01/07/2010 01:23 PM, Nick Coghlan wrote:
> As Simon pointed out, while some organisations do work that way, the PSF
> isn't one of them.
>
> The PSF only requires that the code be contributed under a license that
> then allows us to turn around and redistribute it under a different open
> source license without requesting additional permission from the
> copyright holder. For corporate contributions, I believe the contributor
> agreement needs to be signed by an authorised agent of the company - the
> place to check that would probably be p...@python.org (that's the email
> address for the PSF board).
>
> Assuming the subject line relates to the code that you would like to
> contribute though, that particular change is unlikely to happen - 2.6 is
> in maintenance mode and changing RLock from a Python implementation to
> the faster C one is solidly in new feature territory. Although a
> backport of the 3.2 C RLock implementation to 2.7 could be useful, I
> doubt that backporting code provided by an existing committer would be
> the subject of this query :)
>
> Regards,
> Nick.

Yes, it is the new RLock implementation.
If I understood this correctly, we should make a patch against trunk if 
anything should be contributed.
Do you mean that we wouldn't need the paperwork for backporting the 
original patch committed to py3k?


Regards
Johan Gill
Agama Technologies


Re: [Python-Dev] GIL required for _all_ Python calls?

2010-01-07 Thread Martin v. Löwis

> I've been wondering whether it's possible to release the GIL in the
> regex engine during matching.

Ok, here is another problem: SRE_OP_REPEAT uses PyObject_MALLOC,
which requires the GIL (it then also may call PyErr_NoMemory,
which also requires the GIL).

Regards,
Martin

Re: [Python-Dev] GIL required for _all_ Python calls?

2010-01-07 Thread Antoine Pitrou

Martin v. Löwis writes:
> 
> I don't think that's possible. The regex engine can also operate on
> objects whose representation may move in memory when you don't hold
> the GIL (e.g. buffers that get mutated).

Why is it a problem? If we get a buffer through the new buffer API, the object
should ensure that the representation isn't moved away until the buffer is 
released.

Regards

Antoine.



Re: [Python-Dev] Backported faster RLock to Python 2.6.

2010-01-07 Thread Nick Coghlan

Johan Gill wrote:
> Yes, it is the new RLock implementation.
> If I understood this correctly, we should make a patch against trunk if
> anything should be contributed.

Yep.

> Do you mean that we wouldn't need the paperwork for backporting the
> original patch committed to py3k?

Whether or not a contributor agreement was essential for this particular
contribution would depend on how much new code was needed for the
backport, but the bulk of the copyright on the C RLock code would remain
with Antoine regardless.

However, sorting through the legalities of the contributor agreement
really is the best way to make sure everything is squared away nice and
neatly from a legal point of view.

After all, even if I was a lawyer (which I'm not, I'm just a developer
with an interest in licensing issues), I still wouldn't be *your* lawyer :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
---

Re: [Python-Dev] GIL required for _all_ Python calls?

2010-01-07 Thread Martin v. Löwis

>>> I've been wondering whether it's possible to release the GIL in the
>>> regex engine during matching.
>>
>> I don't think that's possible. The regex engine can also operate on
>> objects whose representation may move in memory when you don't hold
>> the GIL (e.g. buffers that get mutated). Even if they stay in place -
>> if their contents change, regex results may be confusing.
> 
> It seems probably worthwhile to optimize for the common case of using
> the regexp engine on an immutable object of type "str" or "bytes", and
> allow releasing the GIL in *that* case, even if you have to keep it for
> the general case.

Right. This problem was the one that I thought of first.

Thinking about these things is fairly difficult (to me, at least), so I
could only tell whether I would consider thread-safe a patch that
released the GIL around matching under selected circumstances if I had
the patch in front of me. I don't see any obvious reason why it couldn't
work (assuming Guido's list of conditions holds - i.e. you are holding
references to everything you access).

Regards,
Martin

Re: [Python-Dev] GIL required for _all_ Python calls?

2010-01-07 Thread James Y Knight


On Jan 7, 2010, at 3:27 PM, Martin v. Löwis wrote:

>> I've been wondering whether it's possible to release the GIL in the
>> regex engine during matching.
>
> I don't think that's possible. The regex engine can also operate on
> objects whose representation may move in memory when you don't hold
> the GIL (e.g. buffers that get mutated). Even if they stay in place -
> if their contents change, regex results may be confusing.

It seems probably worthwhile to optimize for the common case of using  
the regexp engine on an immutable object of type "str" or "bytes", and  
allow releasing the GIL in *that* case, even if you have to keep it  
for the general case.
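The distinction James draws can be seen from Python. The re module accepts mutable buffers such as bytearray as well as str and bytes, so any GIL-releasing fast path would need a type check first. Below is a hypothetical sketch of such a gate. The function name and the exact-type rule are assumptions, and a real check would live in C inside the regex engine.

```python
import re


def may_release_gil(subject):
    """Hypothetical gate: only exact str/bytes objects are treated as immutable."""
    # Exact type check: subclasses are excluded to stay conservative.
    return type(subject) in (str, bytes)


pattern = re.compile(rb"needle")
buf = bytearray(b"hay needle hay")

match = pattern.search(buf)          # works: re happily scans a mutable buffer
print(match.start())                 # 4
print(may_release_gil(buf))          # False: the buffer could change mid-match
print(may_release_gil(bytes(buf)))   # True
```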


James

Re: [Python-Dev] relation between Python.asdl and Tools/compiler/ast.txt

2010-01-07 Thread Martin v. Löwis

> I would like to use astgen.py to generate python classes corresponding to the
> AST of something I have defined in a .asdl file, along the lines of what is
> apparently done for the python AST itself. I thought astgen.py would
> take as an argument a .asdl file, but apparently it instead processes a file
> called ast.txt. Where does this file come from? Is it generated from
> Python.asdl?

astgen.py is not used to process asdl files; ast.txt lives right next to
astgen.py. Instead, the asdl file is processed by Parser/asdl_c.py.

HTH,
Martin

Re: [Python-Dev] GIL required for _all_ Python calls?

2010-01-07 Thread Martin v. Löwis

> I've been wondering whether it's possible to release the GIL in the
> regex engine during matching.

I don't think that's possible. The regex engine can also operate on
objects whose representation may move in memory when you don't hold
the GIL (e.g. buffers that get mutated). Even if they stay in place -
if their contents change, regex results may be confusing.

Regards,
Martin

Re: [Python-Dev] GIL required for _all_ Python calls?

2010-01-07 Thread Martin v. Löwis

>> A better rule would be "you may access the memory buffer in a PyString
>> or PyUnicode object with the GIL released as long as you own a
>> reference to the string object." Everything else is out of bounds (or
>> not worth the bother).
> 
> Is that a "yes" regarding the OP's original question about releasing the
> GIL during regexp searches?

No, because the regex engine may also operate on buffers that start
moving around when you release the GIL.

Regards,
Martin

Re: [Python-Dev] Question over splitting unittest into a package

2010-01-07 Thread Olemis Lang

On Mon, Jan 4, 2010 at 9:24 AM, Olemis Lang  wrote:
> On Thu, Dec 31, 2009 at 10:30 AM, Martin (gzlist)  
> wrote:
>> Thanks for the quick response.
>>
>> On 30/12/2009, Benjamin Peterson  wrote:
>>
>> but maybe a
>> discussion could start about a new, less hacky, way of doing the same
>>
>
> I am strongly -1 for modifying the classes in «traditional» unittest
> module [2]_ (except that I am strongly +1 for the package structure
> WITHOUT TOUCHING anything else ...), and the more I think about it, the
> more convinced I am ... but anyway, this is not a big deal (and in the end
> what I think is not that relevant either ... :o). So ...
>

IOW, if I had all the freedom to implement it, after the pkg structure
I'd do something like:

{{{
#!python

class TestResult:
    # Everything just the same
    def _is_relevant_tb_level(self, tb):
        return '__unittest' in tb.tb_frame.f_globals

class BetterTestResult(TestResult):
    # Further code ... maybe ;o)
    #
    def _is_relevant_tb_level(self, tb):
        # This or anything else you might want to do ;o)
        #
        globs = tb.tb_frame.f_globals
        is_relevant = ('__name__' in globs and
                       globs["__name__"].startswith("unittest"))
        del globs
        return is_relevant
}}}

that's what inheritance is for ;o) ... but quite probably that's not
gonna happen, just a comment.

-- 
Regards,

Olemis.

Blog ES: http://simelo-es.blogspot.com/
Blog EN: http://simelo-en.blogspot.com/

Featured article:
Ubuntu replaces GIMP with F-Spot  -
http://feedproxy.google.com/~r/simelo-es/~3/-g48D6T6Ojs/ubuntu-sustituye-gimp-por-f-spot.html

Re: [Python-Dev] GIL required for _all_ Python calls?

2010-01-07 Thread Antoine Pitrou

MRAB writes:
> 
> I know that it needs to have the GIL during memory-management calls, but
> does it for calls like Py_UNICODE_TOLOWER or PyErr_SetString? Is there
> an easy way to find out?

There is no "easy way" to do so. The only safe way is to examine all the
functions or macros you want to call with the GIL released, and assess whether
it is safe to call them. As already pointed out, no reference count should be
changed, and generally no mutable container should be accessed, except if that
container is known not to be referenced anywhere else (that would be the case
for e.g. a list that your function has created and is busy populating).

I agree that releasing the GIL when doing non-trivial regex searches is
worthwhile research, so please don't give up immediately :-)

Regards

Antoine Pitrou.


Re: [Python-Dev] GIL required for _all_ Python calls?

2010-01-07 Thread Stefan Behnel


MRAB, 07.01.2010 04:07:
> I've been wondering whether it's possible to release the GIL in the
> regex engine during matching.
>
> I know that it needs to have the GIL during memory-management calls, but
> does it for calls like Py_UNICODE_TOLOWER

Py_UNICODE_TOLOWER looks safe to me at first glance.

> or PyErr_SetString?

Certainly not safe.

> Is there an easy way to find out?

Release it and fix any crashes? Note that this isn't a safe solution,
though, as some GIL-requiring code may be platform-specific. So a better
approach might be to extract any obviously problematic stuff from the
existing code (such as any exception handling, explicit ref-counting or
object creation), and *then* try to release the GIL.


Stefan


Re: [Python-Dev] Backported faster RLock to Python 2.6.

2010-01-07 Thread Lennart Regebro

On Thu, Jan 7, 2010 at 14:15, Michael Foord  wrote:
> (i.e. copyright and ownership are legal terms that don't necessarily mean
> anything *practical* in these situations.)

OK, fair enough. :-)
-- 
Lennart Regebro: Python, Zope, Plone, Grok
http://regebro.wordpress.com/
+33 661 58 14 64

Re: [Python-Dev] GIL required for _all_ Python calls?

2010-01-07 Thread Stefan Behnel


Guido van Rossum, 07.01.2010 05:29:
> A better rule would be "you may access the memory buffer in a PyString
> or PyUnicode object with the GIL released as long as you own a
> reference to the string object." Everything else is out of bounds (or
> not worth the bother).

Is that a "yes" regarding the OP's original question about releasing the
GIL during regexp searches?


Stefan


Re: [Python-Dev] Backported faster RLock to Python 2.6.

2010-01-07 Thread Michael Foord


On 07/01/2010 13:11, Lennart Regebro wrote:
> On Thu, Jan 7, 2010 at 13:23, Nick Coghlan  wrote:
>> As Simon pointed out, while some organisations do work that way, the PSF
>> isn't one of them.
>>
>> The PSF only requires that the code be contributed under a license that
>> then allows us to turn around and redistribute it under a different open
>> source license without requesting additional permission from the
>> copyright holder.
>
> Even if the contributed code as in this case is a method in an
> existing file? How does that work, how do they keep ownership of one
> method in the threading.py method? :-)

When contributing code to Python all work remains copyright the original
author. The combined work is copyright *everyone*. The PSF has a license
to distribute it, which is all that is important.

How you meaningfully exercise your ownership over chunks of code is left
for the reader to determine...

(i.e. copyright and ownership are legal terms that don't necessarily
mean anything *practical* in these situations.)


Michael


--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog



Re: [Python-Dev] Backported faster RLock to Python 2.6.

2010-01-07 Thread Lennart Regebro

On Thu, Jan 7, 2010 at 13:23, Nick Coghlan  wrote:
> As Simon pointed out, while some organisations do work that way, the PSF
> isn't one of them.
>
> The PSF only requires that the code be contributed under a license that
> then allows us to turn around and redistribute it under a different open
> source license without requesting additional permission from the
> copyright holder.

Even if the contributed code as in this case is a method in an
existing file? How does that work, how do they keep ownership of one
method in the threading.py method? :-)

> Assuming the subject line relates to the code that you would like to
> contribute though, that particular change is unlikely to happen - 2.6 is
> in maintenance mode and changing RLock from a Python implementation to
> the faster C one is solidly in new feature territory. Although a
> backport of the 3.2 C RLock implementation to 2.7 could be useful, I
> doubt that backporting code provided by an existing committer would be
> the subject of this query :)

Ah. I probably misunderstood what the suggested contribution was.
Maybe it was a separate file, which I didn't get.

-- 
Lennart Regebro: Python, Zope, Plone, Grok
http://regebro.wordpress.com/
+33 661 58 14 64

Re: [Python-Dev] Backported faster RLock to Python 2.6.

2010-01-07 Thread Nick Coghlan

Lennart Regebro wrote:
> On Thu, Jan 7, 2010 at 10:46, Johan Gill  wrote:
>> Hi devs,
>> the company where I work has done some work on Python, and the question is
>> how this work, owned by the company, can be contributed to the community
>> properly. Are there any license issues or other pitfalls we need to think
>> about? I imagine that other companies have contributed before, so this is
>> probably an already solved problem.
> 
> I'm not a license lawyer, but typically your company needs to give the
> code to the community. Yes, it means it stops owning it.

As Simon pointed out, while some organisations do work that way, the PSF
isn't one of them.

The PSF only requires that the code be contributed under a license that
then allows us to turn around and redistribute it under a different open
source license without requesting additional permission from the
copyright holder. For corporate contributions, I believe the contributor
agreement needs to be signed by an authorised agent of the company - the
place to check that would probably be p...@python.org (that's the email
address for the PSF board).

Assuming the subject line relates to the code that you would like to
contribute though, that particular change is unlikely to happen - 2.6 is
in maintenance mode and changing RLock from a Python implementation to
the faster C one is solidly in new feature territory. Although a
backport of the 3.2 C RLock implementation to 2.7 could be useful, I
doubt that backporting code provided by an existing committer would be
the subject of this query :)
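
For readers on a modern CPython 3 (not the 2.6 backport under discussion), one rough way to see the difference between the two RLock implementations is to time them side by side. This sketch assumes that `threading.RLock()` returns the C implementation when `_thread` provides one, and that the pure-Python class is still reachable as the private, undocumented `threading._PyRLock`. The helper names are hypothetical, and timings will vary by machine and version.

```python
import threading
import timeit


def rlock_factories():
    """Map a label to a callable that creates a fresh re-entrant lock."""
    # threading.RLock() falls back to the Python class on builds without
    # a C RLock, hence the hedged label.
    factories = {"threading.RLock (C when available)": threading.RLock}
    # Private, undocumented name: may be missing or renamed in other versions.
    py_rlock = getattr(threading, "_PyRLock", None)
    if py_rlock is not None:
        factories["threading._PyRLock (pure Python)"] = py_rlock
    return factories


def exercise(lock, depth=3):
    """Acquire the same lock `depth` times from one thread, then release it
    the same number of times (the re-entrant pattern an RLock allows)."""
    for _ in range(depth):
        lock.acquire()
    for _ in range(depth):
        lock.release()


def time_rlocks(number=20000):
    """Return {label: seconds} for `number` calls of exercise() per lock."""
    results = {}
    for label, factory in rlock_factories().items():
        lock = factory()
        # Bind the current lock as a default argument to avoid late binding.
        results[label] = timeit.timeit(lambda lock=lock: exercise(lock),
                                       number=number)
    return results


if __name__ == "__main__":
    for label, seconds in sorted(time_rlocks().items()):
        print("%-40s %.4f s" % (label, seconds))
```

Both classes must behave identically from the caller's point of view. Only the cost of each acquire/release pair is expected to differ.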

Regards,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
---

[Python-Dev] test_ctypes failure on AIX 5.3 using python-2.6.2 and libffi-3.0.9

2010-01-07 Thread swamy sangamesh

Hi All,

I built python-2.6.2 with the latest libffi-3.0.9 on AIX 5.3 using the xlc
compiler.
When I run the ctypes test cases, two failures are seen in
test_bitfields:

test_ints (ctypes.test.test_bitfields.C_Test) ... FAIL
test_shorts (ctypes.test.test_bitfields.C_Test) ... FAIL

I have attached the full test case result.

If I change the types c_int and c_short to c_uint and c_ushort in class
"BITS(Structure)" in the file
test_bitfields.py, then no failures are seen.

Has anyone faced a similar issue? Any help is appreciated.
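
One possible explanation, offered as an assumption rather than something confirmed in this thread: the C standard leaves it implementation-defined whether a bit-field declared as plain `int` (or `short`) is signed. ctypes, on the other hand, treats `c_int` and `c_short` bit-fields as signed. If the C helpers the test compares against declare plain int/short bit-fields and the compiler treats them as unsigned, the two sides disagree whenever a field's top bit is set. The sketch below shows only the ctypes side of that difference. `SignedBits`, `UnsignedBits` and `read_back` are hypothetical names.

```python
import ctypes


class SignedBits(ctypes.Structure):
    # A 3-bit field declared with a signed type, as BITS does with c_int.
    _fields_ = [("a", ctypes.c_int, 3)]


class UnsignedBits(ctypes.Structure):
    # The same field declared unsigned, as in the c_uint workaround.
    _fields_ = [("a", ctypes.c_uint, 3)]


def read_back(struct_type, value):
    """Store `value` in the 3-bit field and return what ctypes reads back."""
    s = struct_type()
    s.a = value
    return s.a


if __name__ == "__main__":
    # All three bits set: a signed field reads -1, an unsigned field reads 7.
    print(read_back(SignedBits, -1))
    print(read_back(UnsignedBits, -1))
```

A C compiler that treats plain int bit-fields as unsigned would report 7 where ctypes reports -1, which is the kind of mismatch that switching BITS to unsigned types would hide.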


Thanks,
Sangamesh


ctype-testcases
Description: Binary data

Re: [Python-Dev] Backported faster RLock to Python 2.6.

2010-01-07 Thread Simon Cross

On Thu, Jan 7, 2010 at 1:12 PM, Lennart Regebro  wrote:
> I'm not a license lawyer, but typically your company needs to give the
> code to the community. Yes, it means it stops owning it.

This is incorrect.

The correct information is at http://www.python.org/psf/contrib/.

Schiavo
Simon

Re: [Python-Dev] Backported faster RLock to Python 2.6.

2010-01-07 Thread Lennart Regebro

On Thu, Jan 7, 2010 at 10:46, Johan Gill  wrote:
> Hi devs,
> the company where I work has done some work on Python, and the question is
> how this work, owned by the company, can be contributed to the community
> properly. Are there any license issues or other pitfalls we need to think
> about? I imagine that other companies have contributed before, so this is
> probably an already solved problem.

I'm not a license lawyer, but typically your company needs to give the
code to the community. Yes, it means it stops owning it.

-- 
Lennart Regebro: Python, Zope, Plone, Grok
http://regebro.wordpress.com/
+33 661 58 14 64

[Python-Dev] Backported faster RLock to Python 2.6.

2010-01-07 Thread Johan Gill


Hi devs,
the company where I work has done some work on Python, and the question 
is how this work, owned by the company, can be contributed to the 
community properly. Are there any license issues or other pitfalls we 
need to think about? I imagine that other companies have contributed 
before, so this is probably an already solved problem.


Regards
Johan Gill
Agama Technologies


[Python-Dev] relation between Python.asdl and Tools/compiler/ast.txt

2010-01-07 Thread Yoann Padioleau

Hi,

I would like to use astgen.py to generate Python classes corresponding to the
AST of something I have defined in a .asdl file, along the lines of what is
apparently done for the Python AST itself. I thought astgen.py would
take a .asdl file as an argument, but apparently it instead processes a file
called ast.txt. Where does this file come from? Is it generated from
Python.asdl?
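
For comparison, the stdlib `ast` module is itself generated from Python.asdl in CPython, and each node class lists its children in `_fields` (for example, `ast.BinOp._fields` is `('left', 'op', 'right')`). The following is a hedged sketch of the same idea. It is not astgen.py and not the real ASDL parser. It turns a hypothetical, heavily simplified constructor syntax into classes that follow the same `_fields` convention. `SPEC`, `Node` and `generate_classes` are illustrative names.

```python
import ast
import re

# Hypothetical, simplified input: one constructor per line, written as
# Name(type field, type field, ...). Real ASDL also has sum types ("|"),
# optional ("?") and sequence ("*") fields, which this sketch ignores.
SPEC = """
Num(int n)
Add(expr left, expr right)
"""

CTOR = re.compile(r"^\s*(\w+)\s*\((.*)\)\s*$")


class Node(object):
    """Base class; subclasses list their children in _fields, like ast.AST."""
    _fields = ()

    def __init__(self, *args):
        if len(args) != len(self._fields):
            raise TypeError("%s expects %d argument(s), got %d"
                            % (type(self).__name__, len(self._fields), len(args)))
        for field, value in zip(self._fields, args):
            setattr(self, field, value)

    def __repr__(self):
        inner = ", ".join("%s=%r" % (f, getattr(self, f)) for f in self._fields)
        return "%s(%s)" % (type(self).__name__, inner)


def generate_classes(spec):
    """Return {name: class} for every constructor line in spec."""
    classes = {}
    for line in spec.strip().splitlines():
        match = CTOR.match(line)
        if match is None:
            raise ValueError("cannot parse line: %r" % line)
        name, params = match.groups()
        # Keep only the field name from each "type field" pair.
        fields = tuple(p.split()[-1] for p in params.split(",") if p.strip())
        classes[name] = type(name, (Node,), {"_fields": fields})
    return classes


if __name__ == "__main__":
    # The stdlib ast node classes use the same _fields convention.
    print(ast.BinOp._fields)
    nodes = generate_classes(SPEC)
    print(nodes["Add"](nodes["Num"](1), nodes["Num"](2)))
```

Run as a script, this prints `('left', 'op', 'right')` and then `Add(left=Num(n=1), right=Num(n=2))`. Reading a real .asdl file would require a proper ASDL parser in place of the one-line regular expression.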

