Re: [Python-Dev] Unicode charmap decoders slow

2005-10-05 Thread Walter Dörwald
On 05.10.2005, at 00:08, Martin v. Löwis wrote:

> Walter Dörwald wrote:
>
>>> This array would have to be sparse, of course.
>>>
>> For encoding yes, for decoding no.
>>
> [...]
>
>> For decoding it should be sufficient to use a unicode string of   
>> length 256. u"\ufffd" could be used for "maps to undefined". Or  
>> the  string might be shorter and byte values greater than the  
>> length of  the string are treated as "maps to undefined" too.
>
> Right. That's what I meant with "sparse": you somehow need to  
> represent
> "no value".

OK, but I don't think that we really need a sparse data structure for  
that. I used the following script to check that:
import sys, os.path, glob, encodings

has = 0
hasnt = 0

for enc in glob.glob("%s/*.py" % os.path.dirname(encodings.__file__)):
    enc = enc.rsplit(".")[-2].rsplit("/")[-1]
    try:
        __import__("encodings.%s" % enc)
        codec = sys.modules["encodings.%s" % enc]
    except:
        pass
    else:
        if hasattr(codec, "decoding_map"):
            print codec
            for i in xrange(0, 256):
                if codec.decoding_map.get(i, None) is not None:
                    has += 1
                else:
                    hasnt += 1
print "assigned values:", has, "unassigned values:", hasnt

It reports that across all the charmap codecs there are 15292 assigned
byte values and only 324 unassigned ones, i.e. only about 2% of the
byte values map to "undefined". Storing those codepoints in the array
as U+FFFD would only need 648 (or 1296 for wide builds) additional
bytes. I don't think a sparse data structure could beat that.
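As a concrete sketch (modern Python syntax; build_decoding_table is a hypothetical helper, not part of any codec), such a flat table can be built straight from a codec's decoding_map:

```python
# Hypothetical helper (a sketch, modern Python syntax): flatten a
# charmap codec's decoding_map dict into a 256-character table, with
# U+FFFD standing in for "maps to undefined".
def build_decoding_table(decoding_map):
    return "".join(
        chr(0xFFFD if decoding_map.get(i) is None else decoding_map[i])
        for i in range(256)
    )

# Toy map: byte 0x41 -> U+0041, byte 0x80 explicitly undefined.
table = build_decoding_table({0x41: 0x41, 0x80: None})
```

Decoding then becomes a plain index into the table, with u"\ufffd" signalling an unmapped byte.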

>> This might work, although nobody has complained about charmap   
>> encoding yet. Another option would be to generate a big switch   
>> statement in C and let the compiler decide about the best data   
>> structure.
> I would try to avoid generating C code at all costs. Maintaining  
> the build processes will just be a nightmare.

Sounds reasonable.

Bye,
Walter Dörwald

___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Unicode charmap decoders slow

2005-10-05 Thread M.-A. Lemburg
Martin v. Löwis wrote:
>>Another option would be to generate a big switch  statement in C 
>>and let the compiler decide about the best data  structure.
> 
> I would try to avoid generating C code at all costs. Maintaining the 
> build processes will just be a nightmare.

We could automate this using distutils; however I'm not sure
whether this would then also work on Windows.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Oct 05 2005)
>>> Python/Zope Consulting and Support ...http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/


::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! 


Re: [Python-Dev] Unicode charmap decoders slow

2005-10-05 Thread jepler
The function in the module below, xlate.xlate, doesn't quite do what
"".decode does.  (Mostly in that characters that don't exist are always
mapped to U+FFFD, instead of supporting the various error-handling
behaviors available to "".decode.)

It builds the fast decoding structure once per call, but when decoding
53 KB of data that overhead is small enough to make it much faster than
s.decode('mac-roman').  For smaller buffers (I tried 53 characters),
s.decode is about two times faster (43 usec vs. 21 usec).

$ timeit.py -s "s='a'*53*1024; import xlate" "s.decode('mac-roman')"
100 loops, best of 3: 12.8 msec per loop
$ timeit.py -s "s='a'*53*1024; import xlate, encodings.mac_roman" \
"xlate.xlate(s, encodings.mac_roman.decoding_map)"
1000 loops, best of 3: 573 usec per loop
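The same algorithm can be sketched in pure Python (xlate_py is a hypothetical stand-in, modern syntax, far slower than the C version, but it shows the build-table-then-translate structure):

```python
# Pure-Python sketch (hypothetical name xlate_py) of the same idea:
# build the 256-entry table once, then push every input byte through
# it; unmapped bytes come out as U+FFFD.
def xlate_py(data, decoding_map):
    table = [
        0xFFFD if decoding_map.get(i) is None else decoding_map[i]
        for i in range(256)
    ]
    return "".join(chr(table[b]) for b in data)
```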

Jeff
#include <Python.h>

/* Decode a byte string through a 256-entry table built from a
   decoding_map dict; unmapped bytes become U+FFFD. */
static PyObject *xlate(PyObject *self, PyObject *args) {
    unsigned char *inbuf;
    int i, length, pos = 0;
    PyObject *map, *key, *value, *ret;
    Py_UNICODE *u, *ru;

    if(!PyArg_ParseTuple(args, "s#O", (char **)&inbuf, &length, &map))
        return NULL;
    if(!PyDict_Check(map)) {
        PyErr_SetString(PyExc_TypeError, "Argument 2 must be a dictionary");
        return NULL;
    }

    /* Build the fast decoding table; 0xfffd marks "undefined". */
    u = PyMem_Malloc(sizeof(Py_UNICODE) * 256);
    if(!u)
        return PyErr_NoMemory();
    for(i = 0; i < 256; i++)
        u[i] = 0xfffd;

    while(PyDict_Next(map, &pos, &key, &value)) {
        int ki, vi;
        if(!PyInt_Check(key)) {
            PyErr_SetString(PyExc_TypeError, "Dictionary keys must be ints");
            PyMem_Free(u);
            return NULL;
        }
        ki = PyInt_AsLong(key);
        if(ki < 0 || ki > 255) {
            PyErr_Format(PyExc_TypeError,
                "Dictionary keys must be in the range 0..255 (saw %d)", ki);
            PyMem_Free(u);
            return NULL;
        }
        if(value == Py_None)
            continue;
        if(!PyInt_Check(value)) {
            PyErr_SetString(PyExc_TypeError,
                "Dictionary values must be ints or None");
            PyMem_Free(u);
            return NULL;
        }
        vi = PyInt_AsLong(value);
        u[ki] = vi;
    }

    ret = PyUnicode_FromUnicode(NULL, length);
    if(!ret) {
        PyMem_Free(u);
        return NULL;
    }
    ru = PyUnicode_AsUnicode(ret);
    for(i = 0; i < length; i++)
        ru[i] = u[inbuf[i]];
    PyMem_Free(u);
    return ret;
}

/* Module boilerplate (reconstructed; the archive mangled the tail of
   the attachment). */
static PyMethodDef xlate_methods[] = {
    {"xlate", xlate, METH_VARARGS, NULL},
    {NULL, NULL}
};

void initxlate(void) {
    Py_InitModule("xlate", xlate_methods);
}

import encodings.mac_roman
import xlate

def test(encname, decoding_map):
    s = ""
    for k, v in decoding_map.items():
        if v is not None:
            s += chr(k)

    u1 = s.decode(encname)
    u2 = xlate.xlate(s, decoding_map)
    assert u1 == u2

test("mac-roman", encodings.mac_roman.decoding_map)




[Python-Dev] Python 2.5 and ast-branch

2005-10-05 Thread Nick Coghlan
Guido van Rossum wrote:
> On 10/4/05, Nick Coghlan <[EMAIL PROTECTED]> wrote:
> 
>>I was planning on looking at your patch too, but I was waiting for an answer
>>from Guido about the fate of the ast-branch for Python 2.5. Given that we have
>>patches for PEP 342 and PEP 343 against the trunk, but ast-branch still isn't
>>even passing the Python 2.4 test suite, I'm wondering if it should be bumped
>>from the feature list again.
> 
> 
> What do you want me to say about the AST branch? It's not my branch, I
> haven't even checked it out, I'm just patiently waiting for the folks
> who started it to finally finish it.

It was a question I asked a few weeks back [1] that didn't get any response 
(even from Brett!), to do with the fact that for Python 2.4 there was a 
deadline for landing the ast-branch that was a month or two in advance of the 
deadline for 2.4a1. I thought you'd set that deadline, but now that I look for 
it, I can't actually find any evidence of that. The only thing I can find is 
Jeremy's email saying it wasn't ready in time [2] (Jeremy's concern about 
reference leaks in ast-branch when it encounters compile errors is one I 
share, btw).

Anyway, the question is: What do we want to do with ast-branch? Finish 
bringing it up to Python 2.4 equivalence, make it the HEAD, and only then 
implement the approved PEP's (308, 342, 343) that affect the compiler? Or 
implement the approved PEP's on the HEAD, and move the goalposts for 
ast-branch to include those features as well?

I believe the latter is the safe option in terms of making sure 2.5 is a solid 
release, but doing it that way suggests to me that the ast compiler would need 
to be held over until 2.6, which would be somewhat unfortunate.

Given that I don't particularly like that answer, I'd love for someone to 
convince me I'm wrong ;)

Cheers,
Nick.

[1] http://mail.python.org/pipermail/python-dev/2005-September/056449.html
[2] http://mail.python.org/pipermail/python-dev/2004-June/045121.html

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://boredomandlaziness.blogspot.com


Re: [Python-Dev] Python 2.5 and ast-branch

2005-10-05 Thread Guido van Rossum
On 10/5/05, Nick Coghlan <[EMAIL PROTECTED]> wrote:
> Anyway, the question is: What do we want to do with ast-branch? Finish
> bringing it up to Python 2.4 equivalence, make it the HEAD, and only then
> implement the approved PEP's (308, 342, 343) that affect the compiler? Or
> implement the approved PEP's on the HEAD, and move the goalposts for
> ast-branch to include those features as well?
>
> I believe the latter is the safe option in terms of making sure 2.5 is a solid
> release, but doing it that way suggests to me that the ast compiler would need
> to be held over until 2.6, which would be somewhat unfortunate.
>
> Given that I don't particularly like that answer, I'd love for someone to
> convince me I'm wrong ;)

Given the total lack of response, I have a different suggestion. Let's
*abandon* the AST-branch. We're fooling ourselves believing that we
can ever switch to that branch, no matter how theoretically better it
is.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] Unicode charmap decoders slow

2005-10-05 Thread Hye-Shik Chang
On 10/5/05, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
> Of course, a C version could use the same approach as
> the unicodedatabase module: that of compressed lookup
> tables...
>
> http://aggregate.org/TechPub/lcpc2002.pdf
>
> genccodec.py anyone ?
>

I had written a test codec for single-byte character sets once before,
to evaluate algorithms to use in CJKCodecs (it's not a direct
implementation of what you've mentioned, though).  I just ported it
to unicodeobject (as attached).  It showed a considerably better
result than the charmap codec:

% python ./Lib/timeit.py -s "s='a'*1024*1024; u=unicode(s)"
"s.decode('iso8859-1')"
10 loops, best of 3: 96.7 msec per loop
% ./python ./Lib/timeit.py -s "s='a'*1024*1024; u=unicode(s)"
"s.decode('iso8859_10_fc')"
10 loops, best of 3: 22.7 msec per loop
% ./python ./Lib/timeit.py -s "s='a'*1024*1024; u=unicode(s)"
"s.decode('utf-8')"
100 loops, best of 3: 18.9 msec per loop

(Note that it doesn't contain any documentation or good error
handling yet. :-)


Hye-Shik


fastmapcodec.diff
Description: Binary data


Re: [Python-Dev] Unicode charmap decoders slow

2005-10-05 Thread Walter Dörwald
Martin v. Löwis wrote:

> Tony Nelson wrote:
> 
>>> For decoding it should be sufficient to use a unicode string of
>>> length 256. u"\ufffd" could be used for "maps to undefined". Or the
>>> string might be shorter and byte values greater than the length of
>>> the string are treated as "maps to undefined" too.
>>
>> With Unicode using more than 64K codepoints now, it might be more forward
>> looking to use a table of 256 32-bit values, with no need for tricky
>> values.
> 
> You might be missing the point. \ufffd is REPLACEMENT CHARACTER,
> which would indicate that the byte with that index is really unused
> in that encoding.

OK, here's a patch that implements this enhancement to 
PyUnicode_DecodeCharmap(): http://www.python.org/sf/1313939

The mapping argument to PyUnicode_DecodeCharmap() can be a unicode 
string and is used as a decoding table.

Speed looks like this:

python2.4 -mtimeit "s='a'*53*1024; u=unicode(s)" "s.decode('utf-8')"
1000 loops, best of 3: 538 usec per loop
python2.4 -mtimeit "s='a'*53*1024; u=unicode(s)" "s.decode('mac-roman')"
100 loops, best of 3: 3.85 msec per loop
./python-cvs -mtimeit "s='a'*53*1024; u=unicode(s)" "s.decode('utf-8')"
1000 loops, best of 3: 539 usec per loop
./python-cvs -mtimeit "s='a'*53*1024; u=unicode(s)" "s.decode('mac-roman')"
1000 loops, best of 3: 623 usec per loop

Creating the decoding_map as a string should probably be done by 
gencodec.py directly. This way the first import of the codec would be 
faster too.

Bye,
Walter Dörwald


Re: [Python-Dev] Unicode charmap decoders slow

2005-10-05 Thread M.-A. Lemburg
Hye-Shik Chang wrote:
> On 10/5/05, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
> 
>>Of course, a C version could use the same approach as
>>the unicodedatabase module: that of compressed lookup
>>tables...
>>
>>http://aggregate.org/TechPub/lcpc2002.pdf
>>
>>genccodec.py anyone ?
>>
> 
> 
> I had written a test codec for single-byte character sets once before,
> to evaluate algorithms to use in CJKCodecs (it's not a direct
> implementation of what you've mentioned, though).  I just ported it
> to unicodeobject (as attached).

Thanks. Please upload the patch to SF.

Looks like we now have two competing patches: yours and the
one written by Walter.

So far you've only compared decoding strings into Unicode
and they seem to be similar in performance. Do they differ
in encoding performance ?

> It showed a considerably better result
> than the charmap codec:
> 
> % python ./Lib/timeit.py -s "s='a'*1024*1024; u=unicode(s)"
> "s.decode('iso8859-1')"
> 10 loops, best of 3: 96.7 msec per loop
> % ./python ./Lib/timeit.py -s "s='a'*1024*1024; u=unicode(s)"
> "s.decode('iso8859_10_fc')"
> 10 loops, best of 3: 22.7 msec per loop
> % ./python ./Lib/timeit.py -s "s='a'*1024*1024; u=unicode(s)"
> "s.decode('utf-8')"
> 100 loops, best of 3: 18.9 msec per loop
> 
> (Note that it doesn't contain any documentation nor good error
> handling yet. :-)
> 
> 
> Hye-Shik

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Unicode charmap decoders slow

2005-10-05 Thread Martin v. Löwis
M.-A. Lemburg wrote:
>>I would try to avoid generating C code at all costs. Maintaining the 
>>build processes will just be a nightmare.
> 
> 
> We could automate this using distutils; however I'm not sure
> whether this would then also work on Windows.

It wouldn't.

Regards,
Martin



Re: [Python-Dev] Unicode charmap decoders slow

2005-10-05 Thread Martin v. Löwis
Walter Dörwald wrote:
> OK, here's a patch that implements this enhancement to 
> PyUnicode_DecodeCharmap(): http://www.python.org/sf/1313939

Looks nice!

> Creating the decoding_map as a string should probably be done by 
> gencodec.py directly. This way the first import of the codec would be 
> faster too.

Hmm. How would you represent the string in source code? As a Unicode
literal? With \u escapes, or in a UTF-8 source file? Or as a UTF-8
string, with an explicit decode call?

I like the current dictionary style for being readable, as it also
adds the Unicode character names into comments.
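For comparison, the \u-escape representation could be generated mechanically (a hypothetical sketch in modern Python syntax — table_literal is not part of gencodec.py; the emitted line stays pure ASCII, though it does lose the per-character comments):

```python
# Hypothetical sketch of what gencodec.py could emit: the whole
# 256-entry decoding table as one ASCII line of \u escapes, with
# U+FFFD filling the unmapped slots.
def table_literal(decoding_map):
    escapes = "".join(
        "\\u%04x" % (0xFFFD if decoding_map.get(i) is None else decoding_map[i])
        for i in range(256)
    )
    return 'decoding_table = "%s"' % escapes

# Toy map: only byte 0x41 is assigned.
line = table_literal({0x41: 0x41})
```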

Regards,
Martin


Re: [Python-Dev] Unicode charmap decoders slow

2005-10-05 Thread M.-A. Lemburg
Martin v. Löwis wrote:
> M.-A. Lemburg wrote:
> 
>>> I would try to avoid generating C code at all costs. Maintaining the
>>> build processes will just be a nightmare.
>>
>>
>>
>> We could automate this using distutils; however I'm not sure
>> whether this would then also work on Windows.
> 
> 
> It wouldn't.

Could you elaborate why not ? Using distutils on Windows is really
easy...

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Unicode charmap decoders slow

2005-10-05 Thread M.-A. Lemburg
Martin v. Löwis wrote:
> Walter Dörwald wrote:
> 
>>OK, here's a patch that implements this enhancement to 
>>PyUnicode_DecodeCharmap(): http://www.python.org/sf/1313939
> 
> Looks nice!

Indeed (except for the choice of the "map this character
to undefined" code point).

Hye-Shik, could you please provide some timeit figures for
the fastmap encoding ?

>>Creating the decoding_map as a string should probably be done by 
>>gencodec.py directly. This way the first import of the codec would be 
>>faster too.
> 
> 
> Hmm. How would you represent the string in source code? As a Unicode
> literal? With \u escapes, or in a UTF-8 source file? Or as a UTF-8
> string, with an explicit decode call?
> 
> I like the current dictionary style for being readable, as it also
> adds the Unicode character names into comments.

Not only that: it also allows 1-n and 1-0 mappings, which was part
of the idea of using a mapping object (such as a dictionary) as the
basis for the codec.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Unicode charmap decoders slow

2005-10-05 Thread Martin v. Löwis
M.-A. Lemburg wrote:
>>It wouldn't.
> 
> 
> Could you elaborate why not ? Using distutils on Windows is really
> easy...

The current build process for Windows simply doesn't provide it.
You expect to select "Build/All" from the menu (or some such),
and expect all code to be compiled. The VC build process only
considers VC project files.

Maybe it is possible to hack up a project file to invoke distutils
as the build process, but no such project file is currently available,
nor is it known whether it is possible to create one. Whatever the
build process, it should work properly with debug and release builds,
with alternative compilers (such as the Itanium compiler), and place
the files so that debugging from the VStudio environment is possible.
None of this is the case today, and nobody has worked on
making it possible. I very much doubt distutils in its current form
could handle it.

Regards,
Martin



Re: [Python-Dev] Python 2.5 and ast-branch

2005-10-05 Thread Brett Cannon
To answer Nick's email here, I didn't respond to that initial email
because it seemed specifically directed at Guido and not me.

On 10/5/05, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> On 10/5/05, Nick Coghlan <[EMAIL PROTECTED]> wrote:
> > Anyway, the question is: What do we want to do with ast-branch? Finish
> > bringing it up to Python 2.4 equivalence, make it the HEAD, and only then
> > implement the approved PEP's (308, 342, 343) that affect the compiler? Or
> > implement the approved PEP's on the HEAD, and move the goalposts for
> > ast-branch to include those features as well?
> >
> > I believe the latter is the safe option in terms of making sure 2.5 is a 
> > solid
> > release, but doing it that way suggests to me that the ast compiler would 
> > need
> > to be held over until 2.6, which would be somewhat unfortunate.
> >
> > Given that I don't particularly like that answer, I'd love for someone to
> > convince me I'm wrong ;)
>
> Given the total lack of response, I have a different suggestion. Let's
> *abandon* the AST-branch. We're fooling ourselves believing that we
> can ever switch to that branch, no matter how theoretically better it
> is.
>

Since the original people who have done the majority of the work
(Jeremy, Tim, Neal, Nick, logistix, and myself) have fallen so far
behind, this probably is not a bad decision.  Obviously I would like to
see the work pan out, but since I personally just have not found the
time to shuttle the branch the rest of the way, I am really in no
position to object to its demise.

Maybe I can come up with a new design and get my dissertation out of it.  =)

-Brett


Re: [Python-Dev] Unicode charmap decoders slow

2005-10-05 Thread M.-A. Lemburg
Martin v. Löwis wrote:
> M.-A. Lemburg wrote:
> 
>>> It wouldn't.
>>
>>
>>
>> Could you elaborate why not ? Using distutils on Windows is really
>> easy...
> 
> 
> The current build process for Windows simply doesn't provide it.
> You expect to select "Build/All" from the menu (or some such),
> and expect all code to be compiled. The VC build process only
> considers VC project files.
> 
> Maybe it is possible to hack up a project file to invoke distutils
> as the build process, but no such project file is currently available,
> nor is it known whether it is possible to create one. Whatever the
> build process, it should work properly with debug and release builds,
> with alternative compilers (such as the Itanium compiler), and place
> the files so that debugging from the VStudio environment is possible.
> None of this is the case today, and nobody has worked on
> making it possible. I very much doubt distutils in its current form
> could handle it.

I see, so you have to create a VC project file for each codec -
that would be hard to maintain indeed.

For Unix platforms this would be no problem at all since there all
extension modules are built using distutils anyway.

Thanks for the explanation.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Unicode charmap decoders slow

2005-10-05 Thread Trent Mick
[Martin v. Loewis wrote]
> Maybe it is possible to hack up a project file to invoke distutils
> as the build process, but no such project file is currently available,
> nor is it known whether it is possible to create one. 

This is essentially what the "_ssl" project does, no? It defers to
"build_ssl.py" to do the build work. I didn't see what the full build
requirements were earlier in this thread though, so I may be missing
something.

Trent

-- 
Trent Mick
[EMAIL PROTECTED]


Re: [Python-Dev] Unicode charmap decoders slow

2005-10-05 Thread Martin v. Löwis
Trent Mick wrote:
> [Martin v. Loewis wrote]
> 
>>Maybe it is possible to hack up a project file to invoke distutils
>>as the build process, but no such project file is currently available,
>>nor is it known whether it is possible to create one. 
> 
> 
> This is essentially what the "_ssl" project does, no? 

More or less, yes. It does support both debug and release builds. It
does not support Itanium builds (at least not the way the other projects
do); as a result, the Itanium build currently just doesn't offer SSL.

More importantly, build_ssl.py is not based on distutils. Instead, it
is manually hacked up - a VBScript file would have worked as well. So
if you were to create many custom build scripts (one per codec), you
might just as well generate the VS project files directly.

Regards,
Martin


Re: [Python-Dev] Unicode charmap decoders slow

2005-10-05 Thread Hye-Shik Chang
On 10/6/05, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
> Hye-Shik, could you please provide some timeit figures for
> the fastmap encoding ?
>

(before applying Walter's patch, charmap decoder)

% ./python Lib/timeit.py -s "s='a'*53*1024; e='iso8859_10';
u=unicode(s, e)" "s.decode(e)"
100 loops, best of 3: 3.35 msec per loop

(applied the patch, improved charmap decoder)

% ./python Lib/timeit.py -s "s='a'*53*1024; e='iso8859_10';
u=unicode(s, e)" "s.decode(e)"
1000 loops, best of 3: 1.11 msec per loop

(the fastmap decoder)

% ./python Lib/timeit.py -s "s='a'*53*1024; e='iso8859_10_fc';
u=unicode(s, e)" "s.decode(e)"
1000 loops, best of 3: 1.04 msec per loop

(utf-8 decoder)

% ./python Lib/timeit.py -s "s='a'*53*1024; e='utf_8'; u=unicode(s,
e)" "s.decode(e)"
1000 loops, best of 3: 851 usec per loop

Walter's decoder and the fastmap decoder work in mostly the same way,
so the performance difference is quite minor.  Perhaps the minor
difference comes from the wrapper function around each codec: the
fastmap codec provides functions usable as Codecs.{en,de}code
directly.

(encoding, charmap codec)

% ./python Lib/timeit.py -s "s='a'*53*1024; e='iso8859_10';
u=unicode(s, e)" "u.encode(e)"
100 loops, best of 3: 3.51 msec per loop

(encoding, fastmap codec)

% ./python Lib/timeit.py -s "s='a'*53*1024; e='iso8859_10_fc';
u=unicode(s, e)" "u.encode(e)"
1000 loops, best of 3: 536 usec per loop

(encoding, utf-8 codec)

% ./python Lib/timeit.py -s "s='a'*53*1024; e='utf_8'; u=unicode(s,
e)" "u.encode(e)"
1000 loops, best of 3: 1.5 msec per loop

If the encoding optimization can be done easily in Walter's approach,
the fastmap codec would be too expensive a way to reach the objective,
because we would have to maintain not only fastmap but also charmap
for backward compatibility.
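For what it's worth, the encoding direction really is the sparse one: only a few code points map back to a byte, so a dict built by inverting the decoding table is the natural structure (a sketch in modern Python syntax; build_encoding_map is a hypothetical helper):

```python
# Sketch (modern syntax): invert a 256-entry decoding table into an
# encoding map from code point to byte value, skipping the U+FFFD
# "undefined" slots.
def build_encoding_map(decoding_table):
    return {
        ord(ch): byte
        for byte, ch in enumerate(decoding_table)
        if ch != "\ufffd"
    }

# Toy table: byte 0x00 decodes to "A", everything else is undefined.
demo_table = "A" + "\ufffd" * 255
encoding_map = build_encoding_map(demo_table)
```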

Hye-Shik


[Python-Dev] Removing the block stack (was Re: PEP 343 and __with__)

2005-10-05 Thread Phillip J. Eby
At 09:50 AM 10/4/2005 +0100, Michael Hudson wrote:
>(anyone still thinking about removing the block stack?).

I'm not any more.  My thought was that it would be good for performance, by 
reducing the memory allocation overhead for frames enough to allow pymalloc 
to be used instead of the platform malloc.  After more investigation, 
however, I realized that was a dumb idea, because for a typical application 
the amortized allocation cost of frames approaches zero as the program runs 
and allocates as many frames as it will ever use, as large as it will ever 
use them, and just recycles them on the free list.  And all of the ways I 
came up with for removing the block stack were a lot more complex than 
leaving it as-is.

Clearly, the cost of function calls in Python lies somewhere else, and I'd 
probably look next at parameter tuple allocation, and other frame 
initialization activities.  I seem to recall that Armin Rigo once supplied 
a patch that sped up calls at the cost of slowing down recursive or 
re-entrant ones, and I seem to recall that it was based on preinitializing 
frames, not just preallocating them:

 http://mail.python.org/pipermail/python-dev/2004-March/042871.html

However, the patch was never applied because of its increased memory usage 
as well as the slowdown for recursion.

Every so often, in blue-sky thinking about alternative Python VM designs, I 
think about making frames virtual, in the sense of not even having "real" 
frame objects except for generators, sys._getframe(), and tracebacks.  I 
suspect, however, that doing this in a way that doesn't mess with the 
current C API is non-trivial.  And for many "obvious" ways to simplify the 
various stacks, locals, etc., the downside could be more complexity for 
generators, and probably less speed as well.

For example, we could use a single "stack" arena in the heap for 
parameters, locals, cells, and blocks, rather than doing all the various 
sub-allocations within the frame.  But then creating a frame would involve 
copying data off the top of this pseudo-stack, and doing all the offset 
computations and perhaps some other trickery as well.  And resuming a 
generator would have to either copy it back, or have some sane way to make 
calls out to a new stack arena when calling other functions - thus making 
those operations slower.

The real problem, of course, with any of these ideas is that we are at best 
shaving a few percentage points here, a few points there, so it's 
comparatively speaking rather expensive to do the experiments to see if 
they help anything.
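One of the cheaper experiments is measuring the raw call overhead itself with the stdlib timeit module (a sketch; the absolute numbers are machine-dependent, so only the relative difference is meaningful):

```python
# Compare an empty statement against a no-op function call to isolate
# the per-call cost being discussed.
import timeit

t_pass = timeit.timeit("pass", number=100000)
t_call = timeit.timeit("f()", setup="def f(): pass", number=100000)
```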



Re: [Python-Dev] Removing the block stack (was Re: PEP 343 and __with__)

2005-10-05 Thread Neal Norwitz
On 10/5/05, Phillip J. Eby <[EMAIL PROTECTED]> wrote:
> At 09:50 AM 10/4/2005 +0100, Michael Hudson wrote:
> >(anyone still thinking about removing the block stack?).
>
> I'm not any more.  My thought was that it would be good for performance, by
> reducing the memory allocation overhead for frames enough to allow pymalloc
> to be used instead of the platform malloc.

I did something similar to reduce the frame size to under 256 bytes
(don't recall if I made a patch or not) and it had no overall effect
on perf.

> Clearly, the cost of function calls in Python lies somewhere else, and I'd
> probably look next at parameter tuple allocation, and other frame
> initialization activities.

I think that's a big part of it.  This patch shows C calls getting
sped up primarily by avoiding tuple creation:

http://python.org/sf/1107887

I hope to work on that and get it into 2.5.

I've also been thinking about avoiding tuple creation when calling
python functions.  The change I have in mind would probably have to
wait until p3k, but could yield some speed ups.

Warning:  half baked idea follows.

My thoughts are to dynamically allocate the Python stack memory (e.g.,
void *stack = malloc(128MB)).  Then all calls within a thread would use
that thread's own stack.  Things would be pushed onto the stack like
they are currently, but we wouldn't need to create a tuple to pass to
a method; the arguments could just be used directly.  Basically, this
more closely simulates the way calls currently work in hardware.

This would mean all the PyArg_ParseTuple()s would have to change.  It
may be possible to fake it out, but I'm not sure it's worth it which
is why it would be easier to do this for p3k.

The general idea is to allocate the stack in one big hunk and just
walk up/down it as functions are called/returned.  This only means
incrementing or decrementing pointers.  This should allow us to avoid
a bunch of copying and tuple creation/destruction.  Frames would
hopefully be the same size, which would help.  Note that even though
there is a free list for frames, there could still be frequent
PyObject_GC_Resize() calls (or unused memory).  With my idea, hopefully
there would be better memory locality, which could speed things up.
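The pointer-bump idea can be illustrated with a toy model (purely illustrative, not CPython code): arguments live in one contiguous stack, and a "call" just moves an index instead of packing a tuple:

```python
# Toy model of a contiguous call stack: push arguments, then "call" by
# slicing the top n slots instead of allocating an argument tuple.
class MiniStack:
    def __init__(self, size=1024):
        self.slots = [None] * size
        self.top = 0

    def push(self, value):
        self.slots[self.top] = value
        self.top += 1

    def call(self, func, nargs):
        base = self.top - nargs
        # A real VM would let the callee read the slots in place.
        result = func(*self.slots[base:self.top])
        self.top = base      # pop the arguments
        self.push(result)    # push the return value
        return result

stack = MiniStack()
stack.push(2)
stack.push(3)
```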

n