Re: [Python-Dev] cpython: Implement PEP 393.

2011-09-28 Thread Georg Brandl
On 28.09.2011 08:35, martin.v.loewis wrote:
 http://hg.python.org/cpython/rev/8beaa9a37387
 changeset:   72475:8beaa9a37387
 user:        Martin v. Löwis <mar...@v.loewis.de>
 date:        Wed Sep 28 07:41:54 2011 +0200
 summary:
   Implement PEP 393.
 
[...]
 
 diff --git a/Doc/c-api/unicode.rst b/Doc/c-api/unicode.rst
 --- a/Doc/c-api/unicode.rst
 +++ b/Doc/c-api/unicode.rst
 @@ -1072,6 +1072,15 @@
 occurred and an exception has been set.
  
  
 +.. c:function:: Py_ssize_t PyUnicode_FindChar(PyObject *str, Py_UCS4 ch, Py_ssize_t start, Py_ssize_t end, int direction)
 +
 +   Return the first position of the character *ch* in ``str[start:end]`` using
 +   the given *direction* (*direction* == 1 means to do a forward search,
 +   *direction* == -1 a backward search).  The return value is the index of the
 +   first match; a value of ``-1`` indicates that no match was found, and ``-2``
 +   indicates that an error occurred and an exception has been set.
 +
 +
  .. c:function:: Py_ssize_t PyUnicode_Count(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end)
 
     Return the number of non-overlapping occurrences of *substr* in

This is the only doc change in this changeset (and it doesn't have a
versionadded).

Surely there must be more new APIs and changes that need documenting?

Georg



Re: [Python-Dev] cpython: Implement PEP 393.

2011-09-28 Thread Martin v. Löwis
 Surely there must be more new APIs and changes that need documenting?

Correct. All documentation still needs to be written.

Regards,
Martin


[Python-Dev] PEP 393 merged

2011-09-28 Thread Martin v. Löwis
I have now merged the PEP 393 implementation into default.
The main missing piece is the documentation; contributions are
welcome.

Regards,
Martin


Re: [Python-Dev] unittest missing assertNotRaises

2011-09-28 Thread Wilfred Hughes
On 27 September 2011 19:59, Laurens Van Houtven _...@lvh.cc wrote:
 Sure, you just *do* it. The only advantage I see in assertNotRaises is that 
 when that exception is raised, you should (and would) get a failure, not an 
 error.

It's a useful distinction. I have found myself writing code of the form:

def test_old_exception_no_longer_raised(self):
    try:
        do_something()
    except OldException:
        self.assertTrue(False)

in order to distinguish between a regression and something new
erroring. The limitation of this pattern is that the test failure
message is not as good.
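
For what it's worth, the same pattern with self.fail() keeps the
failure/error distinction and gives a readable message. A minimal sketch
(do_something and OldException are the placeholders from the snippet
above):

import unittest

class RegressionTest(unittest.TestCase):
    def test_old_exception_no_longer_raised(self):
        try:
            do_something()
        except OldException as e:
            # Report a *failure* (not an error) with a useful message,
            # instead of an opaque assertTrue(False).
            self.fail("OldException was raised again: %r" % e)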


Re: [Python-Dev] unittest missing assertNotRaises

2011-09-28 Thread Oleg Broytman
On Wed, Sep 28, 2011 at 09:43:13AM +1000, Steven D'Aprano wrote:
 Oleg Broytman wrote:
 On Tue, Sep 27, 2011 at 07:46:52PM +0100, Wilfred Hughes wrote:
 +def assertNotRaises(self, excClass, callableObj=None, *args, **kwargs):
 +    """Fail if an exception of class excClass is thrown by
 +    callableObj when invoked with arguments args and keyword
 +    arguments kwargs.
 +    """
 +    try:
 +        callableObj(*args, **kwargs)
 +    except excClass:
 +        raise self.failureException("%s was raised" % excClass)

 But I can't see this being a useful test.

   Me neither.

Oleg.
-- 
 Oleg Broytman            http://phdru.name/            p...@phdru.name
   Programmers don't die, they just GOSUB without RETURN.


Re: [Python-Dev] unittest missing assertNotRaises

2011-09-28 Thread Michael Foord

On 27/09/2011 19:46, Wilfred Hughes wrote:

Hi folks

I wasn't sure if this warranted a bug in the tracker, so I thought I'd 
raise it here first.


unittest has assertIn, assertNotIn, assertEqual, assertNotEqual and so 
on. So, it seems odd to me that there isn't assertNotRaises. Is there 
any particular motivation for not putting it in?


I've attached a simple patch against Python 3's trunk to give an idea 
of what I have in mind.




As others have said, the opposite of assertRaises is just calling the code!

I have several times needed regression tests that call code that *used* 
to raise an exception. It can look slightly odd to have a test without 
an assert, but the singular uselessness of assertNotRaises does not make 
it a better alternative. I usually add a comment:


def test_something_that_used_to_not_work(self):
# this used to raise an exception
do_something()

All the best,

Michael Foord


Thanks
Wilfred




Re: [Python-Dev] unittest missing assertNotRaises

2011-09-28 Thread Michael Foord

On 27/09/2011 19:59, Laurens Van Houtven wrote:
Sure, you just *do* it. The only advantage I see in assertNotRaises is 
that when that exception is raised, you should (and would) get a 
failure, not an error.
There are some who don't see the distinction between a failure and an 
error as a useful distinction... I'm becoming more sympathetic to that view.


All the best,

Michael






Re: [Python-Dev] range objects in 3.x

2011-09-28 Thread Greg Ewing

Ethan Furman wrote:


Well, actually, I'd be using it with dates.  ;)


Seems to me that one size isn't going to fit all.

Maybe we really want two functions:

    interpolate(start, end, count)
        Requires a type supporting addition and division,
        designed to work predictably and accurately with
        floats

    extrapolate(start, step, end)
        Works for any type supporting addition, not
        recommended for floats
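
A rough sketch of the two proposed semantics (assuming interpolate may
also use subtraction and multiplication to compute each point directly):

def interpolate(start, end, count):
    # Compute every point from the endpoints rather than by repeated
    # addition, so float error does not accumulate.
    for i in range(count):
        yield start + (end - start) * i / (count - 1)

def extrapolate(start, step, end):
    # Needs only addition and comparison, so it also works for dates,
    # Decimals, etc.; with floats, repeated addition accumulates error.
    value = start
    while value < end:
        yield value
        value += step

# e.g. weekly dates:
#   from datetime import date, timedelta
#   list(extrapolate(date(2011, 1, 1), timedelta(weeks=1), date(2011, 2, 1)))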

--
Greg


[Python-Dev] Heads up: Apple llvm gcc 4.2 miscompiles PEP 393

2011-09-28 Thread martin
The gcc that Apple ships with the Lion SDK (not sure what Xcode version
that is) miscompiles Python now. I've reported this to Apple as bug
10143715; not sure whether there is a public link to this bug report.

In essence, the code

typedef struct {
long length;
long hash;
int state;
int *wstr;
} PyASCIIObject;

typedef struct {
PyASCIIObject _base;
long utf8_length;

char *utf8;
long wstr_length;

} PyCompactUnicodeObject;

void *_PyUnicode_compact_data(void *unicode) {
    return ((((PyASCIIObject*)unicode)->state & 0x20) ?
        ((void*)((PyASCIIObject*)(unicode) + 1)) :
        ((void*)((PyCompactUnicodeObject*)(unicode) + 1)));
}

miscompiles (with -O2 -fomit-frame-pointer) to


__PyUnicode_compact_data:
Leh_func_begin1:
leaq    32(%rdi), %rax
ret

The compiler version is

gcc version 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)

This unconditionally assumes that sizeof(PyASCIIObject) needs to be
added to unicode, independent of whether the state bit is set or not.

I'm not aware of a work-around in the code. My work-around is to use gcc-4.0,
which is still available on my system from an earlier Xcode installation
(in /Developer-3.2.6).
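
One source-level rewrite that might be worth trying is an explicit branch
instead of the conditional expression -- an untested guess against this
particular llvm-gcc bug, not a confirmed fix:

/* Untested: same semantics as the ?: version above, but as an
 * explicit if/else, which optimizers sometimes treat differently. */
void *_PyUnicode_compact_data2(void *unicode) {
    if (((PyASCIIObject*)unicode)->state & 0x20)
        return (void*)((PyASCIIObject*)unicode + 1);
    return (void*)((PyCompactUnicodeObject*)unicode + 1);
}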

Regards,
Martin




Re: [Python-Dev] Heads up: Apple llvm gcc 4.2 miscompiles PEP 393

2011-09-28 Thread Xavier Morel
On 2011-09-28, at 13:24 , mar...@v.loewis.de wrote:
 The gcc that Apple ships with the Lion SDK (not sure what Xcode version that 
 is)
Xcode 4.1

 I'm not aware of a work-around in the code. My work-around is to use gcc-4.0,
 which is still available on my system from an earlier Xcode installation
 (in /Developer-3.2.6)
Does Clang also fail to compile this? Clang was updated from 1.6 to 2.0 with 
Xcode 4, worth a try.

Also, from your version listing it seems to be llvm-gcc (gcc frontend with llvm 
backend I think), is there no more straight gcc (with gcc frontend and backend)?

FWIW, on 10.6 the default gcc is a straight 4.2

 gcc --version
i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5664)

There is an llvm-gcc 4.2 but it uses a slightly different revision of llvm

 llvm-gcc --version
i686-apple-darwin10-llvm-gcc-4.2 (GCC) 4.2.1 (Based on Apple Inc. build
5658) (LLVM build 2333.4)




Re: [Python-Dev] unittest missing assertNotRaises

2011-09-28 Thread Laurens Van Houtven
Oops, I accidentally hit Reply instead of Reply to All...

On Wed, Sep 28, 2011 at 1:05 PM, Michael Foord fuzzy...@voidspace.org.uk wrote:

  On 27/09/2011 19:59, Laurens Van Houtven wrote:

 Sure, you just *do* it. The only advantage I see in assertNotRaises is that
 when that exception is raised, you should (and would) get a failure, not an
 error.

 There are some who don't see the distinction between a failure and an error
 as a useful distinction... I'm becoming more sympathetic to that view.


I agree. Maybe if there were fewer failures posing as errors and errors
posing as failures, I'd consider taking the distinction seriously.

The only use case I've personally encountered is with fuzzy tests. The
example that comes to mind is one where we had a fairly complex iterative
algorithm for learning things from huge amounts of test data and there were
certain criteria (goodness of result, time taken) that had to be satisfied.
In that case, "it blew up because someone messed up dependencies" and "it
took 3% longer than is allowable" are pretty obviously different...
Considering how exotic that use case is, like I said, I'm not really
convinced how generally useful it is :) especially since this isn't even a
unit test...



 All the best,

 Michael


cheers
lvh


Re: [Python-Dev] PEP 393 merged

2011-09-28 Thread Guido van Rossum
Congrats! Python 3.3 will be better because of this.

On Wed, Sep 28, 2011 at 12:48 AM, Martin v. Löwis mar...@v.loewis.de wrote:
 I have now merged the PEP 393 implementation into default.
 The main missing piece is the documentation; contributions are
 welcome.

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] PEP 393 close to pronouncement

2011-09-28 Thread M.-A. Lemburg
Guido van Rossum wrote:
 Given the feedback so far, I am happy to pronounce PEP 393 as
 accepted. Martin, congratulations! Go ahead and mark it as Accepted.
 (But please do fix up the small nits that Victor reported in his
 earlier message.)

I've been working on feedback for the last few days, but I guess it's
too late. Here goes anyway...

I've only read the PEP and not followed the discussion due to lack of
time, so if any of this is no longer valid, that's probably because
the PEP wasn't updated :-)

Resizing
--------

Codecs use resizing a lot. Given that PyCompactUnicodeObject
does not support resizing, most decoders will have to use
PyUnicodeObject and thus not benefit from the memory footprint
advantages of e.g. PyASCIIObject.


Data structure
--------------

The data structure description in the PEP appears to be wrong:

PyASCIIObject has a wchar_t *wstr pointer - I guess this should
be a char *str pointer, otherwise, where's the memory footprint
advantage (esp. on Linux where sizeof(wchar_t) == 4) ?

I also don't see a reason to limit the UCS1 storage version
to ASCII. Accordingly, the object should be called PyLatin1Object
or PyUCS1Object.

Here's the version from the PEP:


typedef struct {
  PyObject_HEAD
  Py_ssize_t length;
  Py_hash_t hash;
  struct {
  unsigned int interned:2;
  unsigned int kind:2;
  unsigned int compact:1;
  unsigned int ascii:1;
  unsigned int ready:1;
  } state;
  wchar_t *wstr;
} PyASCIIObject;

typedef struct {
  PyASCIIObject _base;
  Py_ssize_t utf8_length;
  char *utf8;
  Py_ssize_t wstr_length;
} PyCompactUnicodeObject;


Typedef'ing Py_UNICODE to wchar_t and using wchar_t in existing
code will cause problems on some systems where wchar_t is a
signed type.

Python assumes that Py_UNICODE is unsigned and thus doesn't
check for negative values or takes these into account when
doing range checks or code point arithmetic.

On platforms where wchar_t is signed, it is safer to
typedef Py_UNICODE to unsigned wchar_t.

Accordingly, and to prevent further breakage, Py_UNICODE
should not be deprecated and should be used instead of wchar_t
throughout the code.


Length information
------------------

Py_UNICODE access to the objects assumes that len(obj) ==
length of the Py_UNICODE buffer. The PEP suggests that length
should not take surrogates into account on UCS2 platforms
such as Windows. This causes len(obj) to not match len(wstr).

As a result, Py_UNICODE access to the Unicode objects breaks
when surrogate code points are present in the Unicode object
on UCS2 platforms.

The PEP also does not explain how lone surrogates will be
handled with respect to the length information.

Furthermore, determining len(obj) will require a loop over
the data, checking for surrogate code points. A simple memcpy()
is no longer enough.

I suggest to drop the idea of having len(obj) not count
wstr surrogate code points to maintain backwards compatibility
and allow for working with lone surrogates.

Note that the whole surrogate debate does not have much to
do with this PEP, since it's mainly about memory footprint
savings. I'd also urge to do a reality check with respect
to surrogates and non-BMP code points: in practice you only
very rarely see any non-BMP code points in your data. Making
all Python users pay for the needs of a tiny fraction is
not really fair. Remember: practicality beats purity.


API
---

Victor already described the needed changes.


Performance
-----------

The PEP only lists a few low-level benchmarks as basis for the
performance decrease. I'm missing some more adequate real-life
tests, e.g. using an application framework such as Django
(to the extent this is possible with Python3) or a server
like the Radicale calendar server (which is available for Python3).

I'd also like to see a performance comparison which specifically
uses the existing Unicode APIs to create and work with Unicode
objects. Most extensions will use this way of working with the
Unicode API, either because they want to support Python 2 and 3,
or because the effort it takes to port to the new APIs is
too high. The PEP makes some statements that this is slower,
but doesn't quantify those statements.


Memory savings
--------------

The table only lists string sizes up to 8 code points. The memory
savings for these are really only significant for ASCII
strings on 64-bit platforms, if you use the default UCS2
Python build as basis.

For larger strings, I expect the savings to be more significant.
OTOH, a single non-BMP code point in such a string would cause
the savings to drop significantly again.


Complexity
----------

In order to benefit from the new API, any code that has to
deal with low-level Py_UNICODE access to the Unicode objects
will have to be adapted.

For best performance, each algorithm will have to be implemented
for all three storage types.

Not doing so will result in a slow-down, if I read the PEP
correctly. It's difficult to say of what scale, since that
information 

Re: [Python-Dev] PEP 393 close to pronouncement

2011-09-28 Thread Benjamin Peterson
2011/9/28 M.-A. Lemburg m...@egenix.com:
 Guido van Rossum wrote:
 Given the feedback so far, I am happy to pronounce PEP 393 as
 accepted. Martin, congratulations! Go ahead and mark it as Accepted.
 (But please do fix up the small nits that Victor reported in his
 earlier message.)

 I've been working on feedback for the last few days, but I guess it's
 too late. Here goes anyway...

 I've only read the PEP and not followed the discussion due to lack of
 time, so if any of this is no longer valid, that's probably because
 the PEP wasn't updated :-)

 Resizing
 --------

 Codecs use resizing a lot. Given that PyCompactUnicodeObject
 does not support resizing, most decoders will have to use
 PyUnicodeObject and thus not benefit from the memory footprint
 advantages of e.g. PyASCIIObject.


 Data structure
 --------------

 The data structure description in the PEP appears to be wrong:

 PyASCIIObject has a wchar_t *wstr pointer - I guess this should
 be a char *str pointer, otherwise, where's the memory footprint
 advantage (esp. on Linux where sizeof(wchar_t) == 4) ?

 I also don't see a reason to limit the UCS1 storage version
 to ASCII. Accordingly, the object should be called PyLatin1Object
 or PyUCS1Object.

I think the purpose is that if it's only ASCII, no work is needed to
encode to UTF-8.


-- 
Regards,
Benjamin


Re: [Python-Dev] PEP 393 close to pronouncement

2011-09-28 Thread Martin v. Löwis
 Codecs use resizing a lot. Given that PyCompactUnicodeObject
 does not support resizing, most decoders will have to use
 PyUnicodeObject and thus not benefit from the memory footprint
 advantages of e.g. PyASCIIObject.

No, codecs have been rewritten to not use resizing.

 PyASCIIObject has a wchar_t *wstr pointer - I guess this should
 be a char *str pointer, otherwise, where's the memory footprint
 advantage (esp. on Linux where sizeof(wchar_t) == 4) ?

That's the Py_UNICODE representation for backwards compatibility.
It's normally NULL.

 I also don't see a reason to limit the UCS1 storage version
 to ASCII. Accordingly, the object should be called PyLatin1Object
 or PyUCS1Object.

No, in the ASCII case, the UTF-8 length can be shared with the regular
string length - not so for Latin-1 characters above 127.
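
The sharing argument is easy to see from Python (a quick illustration,
not from the original mail):

# Pure ASCII: the UTF-8 encoding is byte-for-byte identical, so the
# UTF-8 length always equals the string length and can be shared.
s = "hello"
assert len(s.encode("utf-8")) == len(s)

# A Latin-1 character above 127: U+00E9 is one character but two UTF-8
# bytes, so a separate utf8_length field would be needed.
t = "caf\xe9"
assert len(t.encode("utf-8")) == len(t) + 1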

 Typedef'ing Py_UNICODE to wchar_t and using wchar_t in existing
 code will cause problems on some systems where whcar_t is a
 signed type.
 
 Python assumes that Py_UNICODE is unsigned and thus doesn't
 check for negative values or takes these into account when
 doing range checks or code point arithmetic.
 
 On such platform where wchar_t is signed, it is safer to
 typedef Py_UNICODE to unsigned wchar_t.

No. Py_UNICODE values *must* be in the range 0..17*2**16.
Values larger than 17*2**16 are just as bad as negative
values, so having Py_UNICODE unsigned doesn't improve
anything.

 Py_UNICODE access to the objects assumes that len(obj) ==
 length of the Py_UNICODE buffer. The PEP suggests that length
 should not take surrogates into account on UCS2 platforms
 such as Windows. This causes len(obj) to not match len(wstr).

Correct.

 As a result, Py_UNICODE access to the Unicode objects breaks
 when surrogate code points are present in the Unicode object
 on UCS2 platforms.

Incorrect. What specifically do you think would break?

 The PEP also does not explain how lone surrogates will be
 handled with respect to the length information.

Just as any other code point. Python does not special-case
surrogate code points anymore.

 Furthermore, determining len(obj) will require a loop over
 the data, checking for surrogate code points. A simple memcpy()
 is no longer enough.

No, it won't. The length of the Unicode object is stored in
the length field.

 I suggest to drop the idea of having len(obj) not count
 wstr surrogate code points to maintain backwards compatibility
 and allow for working with lone surrogates.

Backwards-compatibility is fully preserved by PyUnicode_GET_SIZE
returning the size of the Py_UNICODE buffer. PyUnicode_GET_LENGTH
returns the true length of the Unicode object.
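
In extension code the difference looks roughly like this (a sketch
against the PEP 393 API as described; for a string holding one non-BMP
code point such as U+10400, the legacy macro reports 2 units on a
platform with 16-bit wchar_t, while the new one reports 1 code point):

#include <stdio.h>
#include <Python.h>

void show_lengths(PyObject *s)
{
    /* Legacy: size of the Py_UNICODE (wstr) buffer, in units; may
     * create that buffer on demand. */
    Py_ssize_t units = PyUnicode_GET_SIZE(s);
    /* New: the true number of code points. */
    Py_ssize_t points = PyUnicode_GET_LENGTH(s);
    printf("units=%zd code points=%zd\n", units, points);
}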

 Note that the whole surrogate debate does not have much to
 do with this PEP, since it's mainly about memory footprint
 savings. I'd also urge to do a reality check with respect
 to surrogates and non-BMP code points: in practice you only
 very rarely see any non-BMP code points in your data. Making
 all Python users pay for the needs of a tiny fraction is
 not really fair. Remember: practicality beats purity.

That's the whole point of the PEP. You only pay for what
you actually need, and in most cases, it's ASCII.

 For best performance, each algorithm will have to be implemented
 for all three storage types.

This will be a trade-off. I think most developers will be happy
with a single version covering all three cases, especially as it's
much more maintainable.

Kind regards,
Martin



Re: [Python-Dev] Heads up: Apple llvm gcc 4.2 miscompiles PEP 393

2011-09-28 Thread Martin v. Löwis
 Does Clang also fail to compile this? Clang was updated from 1.6 to 2.0 with 
 Xcode 4, worth a try.

clang indeed works fine.

 Also, from your version listing it seems to be llvm-gcc (gcc frontend with 
 llvm backend I think), 
 is there no more straight gcc (with gcc frontend and backend)?

/usr/bin/cc and /usr/bin/gcc both link to llvm-gcc-4.2. However, there
still is /usr/bin/gcc-4.2. Using that, Python also compiles correctly -
so I have changed the gcc link on my system.

Thanks for the advice - I didn't expect that Apple ships three compilers...

Regards,
Martin


Re: [Python-Dev] Heads up: Apple llvm gcc 4.2 miscompiles PEP 393

2011-09-28 Thread Xavier Morel
On 2011-09-28, at 19:49 , Martin v. Löwis wrote:
 
 Thanks for the advice - I didn't expect that Apple ships three compilers…
Yeah I can understand that, they're in the middle of the transition but Clang 
is not quite there yet so...


[Python-Dev] What it takes to change a single keyword.

2011-09-28 Thread Yaşar Arabacı
Hi,

First of all, I am sincerely sorry if this is the wrong mailing list to ask
this question. I checked out the descriptions of a couple of other mailing
lists, and this one seemed the most suitable. Here is my question:

Let's say I want to change a single keyword, say the import keyword, to be
spelled as something else, like its translation into my language. I guess it
would be more complicated than modifying Grammar/Grammar, but I can't be
sure which files would need to be edited.

I'm asking this because I am trying to figure out whether I could translate
keywords into another language without affecting the behaviour of the
language.


-- 
http://yasar.serveblog.net/


Re: [Python-Dev] range objects in 3.x

2011-09-28 Thread Fernando Perez
On Tue, 27 Sep 2011 11:25:48 +1000, Steven D'Aprano wrote:

 The audience for numpy is a small minority of Python users, and they

Certainly, though I'd like to mention that scientific computing is a major 
success story for Python, so hopefully it's a minority with something to 
contribute <wink>

 tend to be more sophisticated. I'm sure they can cope with two functions
 with different APIs <wink>

No problem with having different APIs, but in that case I'd hope the 
builtin wouldn't be named linspace, to avoid confusion.  In numpy/scipy we 
try hard to avoid collisions with existing builtin names, hopefully in 
this case we can prevent the reverse by having a dialogue.

 While continuity of API might be a good thing, we shouldn't accept a
 poor API just for the sake of continuity. I have some criticisms of the
 linspace API.
 
 numpy.linspace(start, stop, num=50, endpoint=True, retstep=False)
 
 http://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html
 
 * It returns a sequence, which is appropriate for numpy but in standard
 Python it should return an iterator or something like a range object.

Sure, no problem there.

 * Why does num have a default of 50? That seems to be an arbitrary
 choice.

Yup.  linspace was modeled after matlab's identically named command:

http://www.mathworks.com/help/techdoc/ref/linspace.html

but I have no idea why the author went with 50 instead of 100 as the 
default (not that 100 is any better, just that it was matlab's choice).  
Given how linspace is often used for plotting, 100 is arguably a more 
sensible choice to get reasonable graphs on normal-resolution displays at 
typical sizes, absent adaptive plotting algorithms.

 * It arbitrarily singles out the end point for special treatment. When
 integrating, it is just as common for the first point to be singular as
 the end point, and therefore needing to be excluded.

Numerical integration is *not* the focus of linspace(): in numerical 
integration, if an end point is singular you have an improper integral and 
*must* approach the singularity much more carefully than by simply 
dropping the last point and hoping for the best.  Whether you can get away 
by using (desired_end_point - very_small_number) --the dumb, naive 
approach-- or not depends a lot on the nature of the singularity.

Since numerical integration is a complex and specialized domain and the 
subject of an entire subcomponent of the (much bigger than numpy) scipy 
library, there's no point in arguing the linspace API based on numerical 
integration considerations.

Now, I *suspect* (but don't remember for sure) that the option to have it 
right-hand-open-ended was to match the mental model people have for range:

In [5]: linspace(0, 10, 10, endpoint=False)
Out[5]: array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])

In [6]: range(0, 10)
Out[6]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


I'm not arguing this was necessarily a good idea, just my theory on how it 
came to be.  Perhaps R. Kern or one of the numpy lurkers in here will 
pitch in with a better recollection.

 * If you exclude the end point, the stepsize, and hence the values
 returned, change:
 
   linspace(1, 2, 4)
 array([ 1.        ,  1.33333333,  1.66666667,  2.        ])
   linspace(1, 2, 4, endpoint=False)
 array([ 1.  ,  1.25,  1.5 ,  1.75])
 
 This surprises me. I expect that excluding the end point will just
 exclude the end point, i.e. return one fewer point. That is, I expect
 num to count the number of subdivisions, not the number of points.

I find it very natural.  It's important to remember that *the whole point* 
of linspace's existence is to provide arrays with a known, fixed number of 
points:

In [17]: npts = 10

In [18]: len(linspace(0, 5, npts))
Out[18]: 10

In [19]: len(linspace(0, 5, npts, endpoint=False))
Out[19]: 10

So the invariant to preserve is *precisely* the number of points, not the 
step size.  As Guido has pointed out several times, the value of this 
function is precisely to steer people *away* from thinking of step sizes 
in a context where they are more likely than not going to get it wrong.  
So linspace focuses on a guaranteed number of points, and lets the step 
size chips fall where they may.
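
A pure-Python sketch of that invariant, with naive float handling
(numpy is more careful about rounding):

def linspace(start, stop, num=50, endpoint=True):
    # num is the guaranteed number of points; the step is derived.
    div = (num - 1) if endpoint else num
    step = (stop - start) / div
    return [start + i * step for i in range(num)]

assert len(linspace(0, 5, 10)) == 10
assert len(linspace(0, 5, 10, endpoint=False)) == 10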


 * The retstep argument changes the return signature from => array to =>
 (array, number). I think that's a pretty ugly thing to do. If linspace
 returned a special iterator object, the step size could be exposed as an
 attribute.

Yup, it's not pretty but understandable in numpy's context, a library that 
has a very strong design focus around arrays, and numpy arrays don't have 
writable attributes:

In [20]: a = linspace(0, 10)

In [21]: a.stepsize = 0.1
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/home/fperez/<ipython-input-21-ded7f1198857> in <module>()
----> 1 a.stepsize = 0.1

AttributeError: 'numpy.ndarray' object has no attribute 'stepsize'


So 

Re: [Python-Dev] Heads up: Apple llvm gcc 4.2 miscompiles PEP 393

2011-09-28 Thread Ned Deily
In article 74f6adfa-874d-4bac-b304-ce8b12d80...@masklinn.net,
 Xavier Morel catch-...@masklinn.net wrote:

 On 2011-09-28, at 19:49 , Martin v. Löwis wrote:
  
  Thanks for the advice - I didn't expect that Apple ships three compilers…
 Yeah I can understand that, they're in the middle of the transition but Clang 
 is not quite there yet so...

BTW, at the moment, we are still using gcc-4.2 (not gcc-llvm nor clang) 
from Xcode 3 on OS X 10.6 for the 64-bit/32-bit installer builds and 
gcc-4.0 on 10.5 for the 32-bit-only installer builds.  We will probably 
revisit that as we get closer to 3.3 alphas and betas.

-- 
 Ned Deily,
 n...@acm.org



Re: [Python-Dev] range objects in 3.x

2011-09-28 Thread Greg Ewing

Fernando Perez wrote:

Now, I *suspect* (but don't remember for sure) that the option to have it 
right-hand-open-ended was to match the mental model people have for range:


In [5]: linspace(0, 10, 10, endpoint=False)
Out[5]: array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])

In [6]: range(0, 10)
Out[6]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


My guess would be it's so that you can concatenate two sequences
created with linspace covering adjacent ranges and get the same
result as a single linspace call covering the whole range.
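
That theory is easy to check (a quick illustration):

import numpy as np

a = np.linspace(0, 5, 5, endpoint=False)    # [0. 1. 2. 3. 4.]
b = np.linspace(5, 10, 5, endpoint=False)   # [5. 6. 7. 8. 9.]
# Adjacent half-open ranges concatenate into one linspace call over
# the whole range, with no duplicated boundary point.
assert (np.concatenate([a, b]) ==
        np.linspace(0, 10, 10, endpoint=False)).all()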


I do hope, though, that the chosen name is *not*:

- 'interval'

- 'interpolate' or similar


Would 'subdivide' be acceptable?

--
Greg


Re: [Python-Dev] [Python-checkins] cpython: Implement PEP 393.

2011-09-28 Thread Eric V. Smith
Is there some reason str.format had such major surgery done to it? It
appears parts of it were removed from stringlib. I had not even thought
to look at the code before it was merged, as it never occurred to me
anyone would do that.

I left it in stringlib even in 3.x because there's the occasional talk
of adding bytes.bformat, and since all of the code works well with
stringlib (since it was used by str and unicode in 2.x), it made sense
to leave it there.

In addition, there are outstanding patches that are now broken.

I'd prefer it return to how it used to be, and just the minimum changes
required for PEP 393 be made to it.

Thanks.
Eric.

On 9/28/2011 2:35 AM, martin.v.loewis wrote:
 http://hg.python.org/cpython/rev/8beaa9a37387
 changeset:   72475:8beaa9a37387
 user:        Martin v. Löwis <mar...@v.loewis.de>
 date:        Wed Sep 28 07:41:54 2011 +0200
 summary:
   Implement PEP 393.
 
 files:
   Doc/c-api/unicode.rst  | 9 +
   Include/Python.h   | 5 +
   Include/complexobject.h| 5 +-
   Include/floatobject.h  | 5 +-
   Include/longobject.h   | 6 +-
   Include/pyerrors.h | 6 +
   Include/pyport.h   | 3 +
   Include/unicodeobject.h|   783 +-
   Lib/json/decoder.py| 3 +-
   Lib/test/json_tests/test_scanstring.py |11 +-
   Lib/test/test_codeccallbacks.py| 7 +-
   Lib/test/test_codecs.py| 4 +
   Lib/test/test_peepholer.py | 4 -
   Lib/test/test_re.py| 7 +
   Lib/test/test_sys.py   |38 +-
   Lib/test/test_unicode.py   |41 +-
   Makefile.pre.in| 6 +-
   Misc/NEWS  | 2 +
   Modules/_codecsmodule.c| 8 +-
   Modules/_csv.c | 2 +-
   Modules/_ctypes/_ctypes.c  | 6 +-
   Modules/_ctypes/callproc.c | 8 -
   Modules/_ctypes/cfield.c   |64 +-
   Modules/_cursesmodule.c| 7 +-
   Modules/_datetimemodule.c  |13 +-
   Modules/_dbmmodule.c   |12 +-
   Modules/_elementtree.c |31 +-
   Modules/_io/_iomodule.h| 2 +-
   Modules/_io/stringio.c |69 +-
   Modules/_io/textio.c   |   352 +-
   Modules/_json.c|   252 +-
   Modules/_pickle.c  | 4 +-
   Modules/_sqlite/connection.c   |19 +-
   Modules/_sre.c |   382 +-
   Modules/_testcapimodule.c  | 2 +-
   Modules/_tkinter.c |70 +-
   Modules/arraymodule.c  | 8 +-
   Modules/md5module.c|10 +-
   Modules/operator.c |27 +-
   Modules/pyexpat.c  |11 +-
   Modules/sha1module.c   |10 +-
   Modules/sha256module.c |10 +-
   Modules/sha512module.c |10 +-
   Modules/sre.h  | 4 +-
   Modules/syslogmodule.c |14 +-
   Modules/unicodedata.c  |28 +-
   Modules/zipimport.c|   141 +-
   Objects/abstract.c | 4 +-
   Objects/bytearrayobject.c  |   147 +-
   Objects/bytesobject.c  |   127 +-
   Objects/codeobject.c   |15 +-
   Objects/complexobject.c|19 +-
   Objects/dictobject.c   |20 +-
   Objects/exceptions.c   |26 +-
   Objects/fileobject.c   |17 +-
   Objects/floatobject.c  |19 +-
   Objects/longobject.c   |84 +-
   Objects/moduleobject.c | 9 +-
   Objects/object.c   |10 +-
   Objects/setobject.c|40 +-
   Objects/stringlib/count.h  | 9 +-
   Objects/stringlib/eq.h |23 +-
   Objects/stringlib/fastsearch.h | 4 +-
   Objects/stringlib/find.h   |31 +-
   Objects/stringlib/formatter.h  |  1516 --
   Objects/stringlib/localeutil.h |27 +-
   Objects/stringlib/partition.h  |12 +-
   Objects/stringlib/split.h  |26 +-
   Objects/stringlib/string_format.h  |  1385 --
   Objects/stringlib/stringdefs.h | 2 +
   Objects/stringlib/ucs1lib.h|35 +
   Objects/stringlib/ucs2lib.h|34 +
   Objects/stringlib/ucs4lib.h|34 +
   Objects/stringlib/undef.h  |10 +
   Objects/stringlib/unicode_format.h |  1416 ++
   Objects/stringlib/unicodedefs.h| 2 +
   Objects/typeobject.c   |18 +-
   

Re: [Python-Dev] [Python-checkins] cpython: Enhance Py_ARRAY_LENGTH(): fail at build time if the argument is not an array

2011-09-28 Thread Benjamin Peterson
2011/9/28 victor.stinner python-check...@python.org:
 http://hg.python.org/cpython/rev/36fc514de7f0
 changeset:   72512:36fc514de7f0
 user:        Victor Stinner victor.stin...@haypocalc.com
 date:        Thu Sep 29 01:12:24 2011 +0200
 summary:
  Enhance Py_ARRAY_LENGTH(): fail at build time if the argument is not an array

 Move other various macros to pymacro.h

 Thanks Rusty Russell for having written these amazing C macros!

 files:
  Include/Python.h          |  19 +
  Include/pymacro.h         |  57 +++

Do we really need a new file? Why not pyport.h where other compiler stuff goes?


-- 
Regards,
Benjamin


Re: [Python-Dev] PEP 393 close to pronouncement

2011-09-28 Thread Victor Stinner
 Resizing
 --------

 Codecs use resizing a lot. Given that PyCompactUnicodeObject
 does not support resizing, most decoders will have to use
 PyUnicodeObject and thus not benefit from the memory footprint
 advantages of e.g. PyASCIIObject.

Wrong. Even if you create a string using the legacy API (e.g.
PyUnicode_FromUnicode), the string will be quickly compacted to use the most
efficient memory storage (depending on the maximum character). "Quickly": at
the first call to PyUnicode_READY. Python tries to make all strings ready as
early as possible.

 PyASCIIObject has a wchar_t *wstr pointer - I guess this should
 be a char *str pointer, otherwise, where's the memory footprint
 advantage (esp. on Linux where sizeof(wchar_t) == 4) ?

For pure ASCII strings, you don't have to store a pointer to the UTF-8 string, 
nor the length of the UTF-8 string (in bytes), nor the length of the wchar_t 
string (in wide characters): the length is always the length of the ASCII 
string, and the UTF-8 string is shared with the ASCII string. The structure is 
much smaller thanks to these optimizations, and so Python 3.3 uses less memory 
than 2.7 for ASCII strings, even for short strings.
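
This is easy to check with sys.getsizeof(); the byte counts in the
comment are rough values for one 64-bit build and will vary:

import sys

# Roughly 52, 76, 80 and 84 on a 64-bit CPython 3.3: 1 byte per
# character for ASCII, then progressively larger headers plus 1/2/4
# bytes per character for Latin-1, UCS-2 and UCS-4 strings.
for s in ('abc', 'ab\xe9', 'ab\u20ac', 'ab\U00010400'):
    print(repr(s), sys.getsizeof(s))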

 I also don't see a reason to limit the UCS1 storage version
 to ASCII. Accordingly, the object should be called PyLatin1Object
 or PyUCS1Object.

Latin1 is less interesting, you cannot share length/data fields with utf8 or 
wstr. We didn't add a special case for Latin1 strings (except using Py_UCS1* 
strings to store their characters).

 Furthermore, determining len(obj) will require a loop over
 the data, checking for surrogate code points. A simple memcpy()
 is no longer enough.

Wrong. len(obj) gives the right result (see the long discussion about "what
is the length of a string" in a previous thread...) in O(1) since it's
computed when the string is created.

 ... in practice you only
 very rarely see any non-BMP code points in your data. Making
 all Python users pay for the needs of a tiny fraction is
 not really fair. Remember: practicality beats purity.

The creation of the string is maybe a little bit slower (especially when you 
have to scan the string twice to first get the maximum character), but I think 
that this slow down is smaller than the speedup allowed by the PEP.

Because ASCII strings are now char*, I think that processing ASCII strings is 
faster because the CPU can cache more data (close to the CPU).

We can do better optimization on ASCII and Latin1 strings (it's faster to 
manipulate char* than uint16_t* or uint32_t*). For example, str.center(), 
str.ljust, str.rjust and str.zfill do now use the very fast memset() function 
for latin1 strings to pad the string.

Another example, duplicating a string (or create a substring) should be faster 
just because you have less data to copy (e.g. 10 bytes for a string of 10 
Latin1 characters vs 20 or 40 bytes with Python 3.2).

The two most common encodings in the world are ASCII and UTF-8. With the PEP 
393, encoding to ASCII or UTF-8 is free, you don't have to encode anything, 
you have directly the encoded char* buffer (whereas you have to convert 16/32 
bit wchar_t to char* in Python 3.2, even for pure ASCII). (It's also free to 
encode Latin1 Unicode string to Latin1.)

With the PEP 393, we never have to decode UTF-16 anymore when iterating on 
code points to support correctly non-BMP characters (which was required 
before in narrow builds, e.g. on Windows). Iterating on code points is just a 
dummy loop, no need to check if each character is in range U+D800-U+DFFF.

There are other funny tricks (optimizations). For example, text.replace(a, b) 
knows that there is nothing to do if maxchar(a) > maxchar(text), where 
maxchar(obj) just requires reading an attribute of the string. Think about 
ASCII and non-ASCII strings: pure_ascii.replace('\xe9', '') now just creates a 
new reference...
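
That shortcut is observable from pure Python -- a CPython implementation
detail rather than a language guarantee:

s = "pure ascii"
# maxchar('\xe9') is 0xE9, above the 0x7F maximum of s, so '\xe9'
# cannot occur in s and replace() can hand back the original object.
t = s.replace('\xe9', 'x')
assert t is s   # same object, just a new reference (CPython 3.3)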

I don't think that Martin wrote his PEP to be able to implement all these 
optimisations, but there are an interesting side effect of his PEP :-)

 The table only lists string sizes up to 8 code points. The memory
 savings for these are really only significant for ASCII
 strings on 64-bit platforms, if you use the default UCS2
 Python build as basis.

In the 32 different cases, the PEP 393 is better in 29 cases and just as good 
as Python 3.2 in 3 corner cases:

- 1 ASCII, 16-bit wchar, 32-bit
- 1 Latin1, 32-bit wchar, 32-bit
- 2 Latin1, 32-bit wchar, 32-bit

Do you really care of these corner cases? See the more the realistic benchmark 
in previous Martin's email (PEP 393 memory savings update): the PEP 393 not 
only uses 3x less memory than 3.2, but it uses also *less* memory than Python 
2.7, whereas Python 3 uses Unicode for everything!

 For larger strings, I expect the savings to be more significant.

Sure.

 OTOH, a single non-BMP code point in such a string would cause
 the savings to drop significantly again.

In this case, it's just as good as Python 3.2 in wide mode, but worse 

Re: [Python-Dev] [Python-checkins] cpython: Enhance Py_ARRAY_LENGTH(): fail at build time if the argument is not an array

2011-09-28 Thread Victor Stinner
On Thursday, September 29, 2011 at 02:07:02, Benjamin Peterson wrote:
 2011/9/28 victor.stinner python-check...@python.org:
  http://hg.python.org/cpython/rev/36fc514de7f0
  changeset:   72512:36fc514de7f0
  user:Victor Stinner victor.stin...@haypocalc.com
  date:Thu Sep 29 01:12:24 2011 +0200
  summary:
   Enhance Py_ARRAY_LENGTH(): fail at build time if the argument is not an
  array
  
  Move other various macros to pymacro.h
  
  Thanks Rusty Russell for having written these amazing C macros!
  
  files:
   Include/Python.h  |  19 +
   Include/pymacro.h |  57 +++
 
 Do we really need a new file? Why not pyport.h where other compiler stuff
 goes?

I'm not sure that pyport.h is the right place to add Py_MIN, Py_MAX, 
Py_ARRAY_LENGTH. pyport.h looks to be related to all things specific to the 
platform like INT_MAX, Py_VA_COPY, ... pymacro.h contains platform-independent 
macros.

I would like to suggest the opposite: move platform-independent macros from 
pyport.h to pymacro.h :-) Suggestions:
 - Py_ARITHMETIC_RIGHT_SHIFT
 - Py_FORCE_EXPANSION
 - Py_SAFE_DOWNCAST

Victor


Re: [Python-Dev] range objects in 3.x

2011-09-28 Thread Fernando Perez
On Thu, 29 Sep 2011 11:36:21 +1300, Greg Ewing wrote:


 I do hope, though, that the chosen name is *not*:
 
 - 'interval'
 
 - 'interpolate' or similar
 
 Would 'subdivide' be acceptable?

I'm not great at finding names, and I don't totally love it, but I 
certainly don't see any problems with it.  It is, after all, a subdivision 
of an interval :)

I think 'grid' has been mentioned, and I think it's reasonable, even 
though most people probably associate the word with a two-dimensional 
object.  But grids can have any desired dimensionality.

Now, in fact, numpy has a slightly demented (but extremely useful) ogrid 
object:

In [7]: ogrid[0:10:3]
Out[7]: array([0, 3, 6, 9])

In [8]: ogrid[0:10:3j]
Out[8]: array([  0.,   5.,  10.])

Yup, that's a complex slice :)

So if python named the builtin 'grid', I think it would go well with 
existing numpy habits.

Cheers,

f
