Re: [Python-Dev] Frame zombies
> Note that _only_ recursions will have more than 1 frame attached.

That's not true; in the presence of threads, the same method may also be invoked more than one time simultaneously.

> But removing freelist altogether will not work well with any type of recursion.

How do you know that? Did you measure the time? On what system? What were the results? Performance optimization without measuring is just unacceptable.

Regards,
Martin
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] zipfile and unicode filenames
> sys.setdefaultencoding() exists for a reason, wouldn't it be better if stdlib could cope with that at least with zipfile?

sys.setdefaultencoding just does not work. Many more things break when you call it. It only exists because people like you insisted that it exists.

> Also note that I'm trying to ask if zipfile should be improved, how it should be improved, and this possible improvement is not even for me (because now I know how zipfile behaves and I will work correctly with it, but someone else might stumble upon this very unexpectedly).

If you want to come up with a patch: sure. The zipfile module should handle Unicode strings, encoding them in the encoding that the ZIP specification defines (both the formal one, and the informal-defined-by-pkwares-implementation).

The tricky question is what to do when reading in zipfiles with non-ASCII characters (and yes, I understand that in your case there were only ASCII characters in the file names).

> The problem was that sourcedir was unicode, and on my machine everything went ok multiple times. zipfile.ZipInfo.FileHeader would return unicode, but then when it writes it to a file it gets back to str (because mappings back and forth were identical). The problem happened when on a different machine header suddenly got byte 0x98 in position 10 (seems to be compress_size), which cp1251 codec couldn't decode. You see, arcname didn't even have unicode characters, but the mere fact that it was unicode made header upgrade to unicode in return header + self.filename + self.extra.

Ok, now I understand. If filename is a Unicode string, header is converted using the system encoding; depending on the exact value of header and depending on the system encoding, this may cause a decoding error.

This bug has been reported as http://bugs.python.org/1170311

> Because that's not supposed to work sanely when self.filename is unicode I'm asking if the right behavior would be to a) disallow unicode filenames in zipfile.ZipInfo, b) automatically convert filename to str in zipfile.ZipInfo, c) leave everything as it is.

The correct behavior would be b); the difficult details are what encoding to use.

Regards,
Martin
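The failure mode described above can be reproduced in modern Python by making the implicit decode explicit. This is only a sketch: the 30-byte header below is made up (only the 0x98 at index 10 matters), `implicit_upgrade` is a hypothetical helper name, and cp1251 stands in for the "system encoding" of the failing machine.

```python
# Hypothetical header: 30 zero bytes with 0x98 at index 10, as in the report.
header = bytearray(30)
header[10] = 0x98
header = bytes(header)


def implicit_upgrade(data, encoding):
    """Mimic what Python 2 did for str + unicode: decode the byte string
    with the default encoding before concatenating."""
    return data.decode(encoding)


try:
    implicit_upgrade(header, "cp1251")
    failed = False
    bad_position = None
except UnicodeDecodeError as exc:
    failed = True
    bad_position = exc.start  # index of the byte the codec rejected

# With an encoding that maps every byte (latin-1) the same data "works",
# which is why the problem only showed up on some machines.
latin1_result = implicit_upgrade(header, "latin-1")
```

Whether the error appears therefore depends on both the header contents and the encoding in use, which matches the machine-dependent behaviour reported in the thread.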
Re: [Python-Dev] Fwd: Instance variable access and descriptors
I have to agree with you. If removing support for self.__dict__['propertyname'] (where propertyname is also the name of a descriptor) is the price to pay for significant speedup, so be it. People doing that are asking for trouble anyway!

On 10/06/07, Eyal Lotem [EMAIL PROTECTED] wrote:
> On 6/10/07, Phillip J. Eby [EMAIL PROTECTED] wrote:
> > At 12:23 AM 6/10/2007 +0300, Eyal Lotem wrote:
> > > A. It will break code that uses instance.__dict__['var'] directly, when 'var' exists as a property with a __set__ in the class. I believe this is not significant.
> > > B. It will simplify getattr's semantics. Python should _always_ give precedence to instance attributes over class ones, rather than have very weird special-cases (such as a property with a __set__).
> >
> > Actually, these are features that are both used and desirable; I've been using them both since Python 2.2 (i.e., for many years now). I'm -1 on removing these features from any version of Python, even 3.0.
>
> It is the same feature, actually, two sides of the same coin. Why do you use self.__dict__['propertyname'] when you can use self._propertyname? Why even call the first form, which is both longer and causes performance problems a feature?
>
> > > C. It will greatly speed up instance variable access, especially when the class has a large mro.
> >
> > ...at the cost of slowing down access to properties and __slots__, by adding an *extra* dictionary lookup there.
>
> It will slow down access to properties - but that slowdown is insignificant:
> A. The vast majority of lookups are *NOT* of properties. They are the rare case and should not be the focus of optimization.
> B. Property access involves calling Python functions - which is heavier than a single dict lookup.
> C. The dict lookup to find the property in the __mro__ can involve many dicts (so in those cases adding a single dict lookup is not heavy).
>
> > Note, by the way, that if you want to change attribute lookup semantics, you can always override __getattribute__ and make it work whatever way you like, without forcing everybody else to change *their* code.
>
> If I write my own __getattribute__ I lose the performance benefit that I am after. I do agree that code shouldn't be broken, that's why a transitional that requires using __fastlookup__ can be used (Unfortunately, from __future__ cannot be used as it is not local to a module, but to a class hierarchy - unless one imports a feature from __future__ into a class).

-- 
Gustavo J. A. M. Carneiro
INESC Porto
The universe is always one step beyond logic. -- Frank Herbert
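The __getattribute__ override mentioned in the quoted exchange can be sketched as follows. This is an illustration only, written for current Python: `InstanceFirst` and `Point` are hypothetical names, and a pure-Python override like this carries the per-lookup cost that Eyal objects to.

```python
class InstanceFirst(object):
    """Hypothetical base class: entries in the instance __dict__ win over
    anything found on the class, including properties with a setter."""

    def __getattribute__(self, name):
        instance_dict = object.__getattribute__(self, "__dict__")
        if name in instance_dict:
            return instance_dict[name]
        # Fall back to the normal lookup (descriptors, class attributes).
        return object.__getattribute__(self, name)


class Point(InstanceFirst):
    @property
    def x(self):
        return "from property"

    @x.setter
    def x(self, value):
        raise AttributeError("read-only in this sketch")


p = Point()
before = p.x                            # nothing in the instance dict yet
p.__dict__["x"] = "from instance dict"  # bypass the property's __set__
after = p.x                             # the instance dict now takes precedence
```

Only classes that opt in by inheriting from the base class see the changed semantics; everything else keeps the standard lookup order.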
Re: [Python-Dev] zipfile and unicode filenames
> > Also note that I'm trying to ask if zipfile should be improved, how it should be improved, and this possible improvement is not even for me (because now I know how zipfile behaves and I will work correctly with it, but someone else might stumble upon this very unexpectedly).
>
> If you want to come up with a patch: sure. The zipfile module should handle Unicode strings, encoding them in the encoding that the ZIP specification defines (both the formal one, and the informal-defined-by-pkwares-implementation).

I don't think always encoding them to utf-8 (and using bit 11 of flag_bits) is a good idea, since there's a chance to create archives that won't be correctly readable by programs not supporting this bit (it's no secret that currently some programs just assume that filenames are encoded using one of system encodings).

This is too complex and hazy to implement. Even if I know what is the situation on Windows (i.e. using OEM, also called DOS encoding, but I'm not sure how to determine its codec name from within python apart from calling GetConsoleCP), I'm totally unaware of the situation on other operating systems.

> The tricky question is what to do when reading in zipfiles with non-ASCII characters (and yes, I understand that in your case there were only ASCII characters in the file names).

I don't think it should be changed.

> Ok, now I understand. If filename is a Unicode string, header is converted using the system encoding; depending on the exact value of header and depending on the system encoding, this may cause a decoding error.
>
> This bug has been reported as http://bugs.python.org/1170311

I see. Well, that's all easier now then, as I can just create a patch for an already existing bug.

> > Because that's not supposed to work sanely when self.filename is unicode I'm asking if the right behavior would be to a) disallow unicode filenames in zipfile.ZipInfo, b) automatically convert filename to str in zipfile.ZipInfo, c) leave everything as it is.
>
> The correct behavior would be b); the difficult details are what encoding to use.

Current zipfile seems to officially support ascii filenames only anyway, so the patch can be as simple as this:

Index: Lib/zipfile.py
===================================================================
--- Lib/zipfile.py (revision 55850)
+++ Lib/zipfile.py (working copy)
@@ -252,12 +252,13 @@
             self.extract_version = max(45, self.extract_version)
             self.create_version = max(45, self.extract_version)
 
+        filename = str(self.filename)
         header = struct.pack(structFileHeader, stringFileHeader,
                  self.extract_version, self.reserved, self.flag_bits,
                  self.compress_type, dostime, dosdate, CRC,
                  compress_size, file_size,
-                 len(self.filename), len(extra))
-        return header + self.filename + extra
+                 len(filename), len(extra))
+        return header + filename + extra
 
     def _decodeExtra(self):
         # Try to decode the extra field.

This doesn't introduce new features, just enforces filenames to be ascii (or whatever default encoding is) encodable.
Re: [Python-Dev] zipfile and unicode filenames
> Current zipfile seems to officially support ascii filenames only anyway, so the patch can be as simple as this:

Submitted patch and test case as http://python.org/sf/1734346
Re: [Python-Dev] zipfile and unicode filenames
> I don't think always encoding them to utf-8 (and using bit 11 of flag_bits) is a good idea, since there's a chance to create archives that won't be correctly readable by programs not supporting this bit (it's no secret that currently some programs just assume that filenames are encoded using one of system encodings).

I think it is also fairly uniformly agreed that these programs are incorrect; the official encoding of file names in a zip file is Windows/DOS code page 437.

> This is too complex and hazy to implement. Even if I know what is the situation on Windows (i.e. using OEM, also called DOS encoding, but I'm not sure how to determine its codec name from within python apart from calling GetConsoleCP), I'm totally unaware of the situation on other operating systems.

I don't think that the situation on Windows is that the OEM code page should be used. Instead, CP 437 should be used, independent of the OEM code page.

> > The tricky question is what to do when reading in zipfiles with non-ASCII characters (and yes, I understand that in your case there were only ASCII characters in the file names).
>
> I don't think it should be changed.

In Python 3, it will certainly change, since the string type will be unicode-based. It probably should not change for the rest of 2.x.

> Current zipfile seems to officially support ascii filenames only anyway

That's not true. You can use any byte string as the file name that you want, including non-ASCII strings encoded in CP437.

> +filename = str(self.filename)

That would be incorrect, as it relies on the system encoding, which shouldn't be relied upon. Plus, it would allow arbitrary non-string things as filenames.

What it should do instead (IMO) is to encode in CP437. Bonus points if it falls back to the UTF-8 feature of zip files if encoding as CP437 fails.
Regards,
Martin
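The "CP437 first, UTF-8 as fallback" suggestion can be sketched in a few lines of current Python. This is not the zipfile module's code: `encode_zip_filename` and `UTF8_FLAG` are hypothetical names, and 0x800 is the bit 11 value discussed elsewhere in this thread.

```python
UTF8_FLAG = 0x800  # general purpose flag bit 11: file name is UTF-8


def encode_zip_filename(name, flag_bits=0):
    """Return (encoded_name, flag_bits) for a text file name.

    Try the historical ZIP encoding (code page 437) first and only fall
    back to UTF-8, setting bit 11, when the name cannot be represented.
    """
    try:
        return name.encode("cp437"), flag_bits
    except UnicodeEncodeError:
        return name.encode("utf-8"), flag_bits | UTF8_FLAG


plain = encode_zip_filename("readme.txt")    # pure ASCII
accented = encode_zip_filename("café.txt")   # representable in CP437
cyrillic = encode_zip_filename("файл.txt")   # not representable in CP437
```

Names that fit in CP437 keep the flag clear, so older tools that ignore bit 11 see no difference for them.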
Re: [Python-Dev] Fwd: Instance variable access and descriptors
At 04:14 AM 6/10/2007 +0300, Eyal Lotem wrote:
> On 6/10/07, Phillip J. Eby [EMAIL PROTECTED] wrote:
> > At 12:23 AM 6/10/2007 +0300, Eyal Lotem wrote:
> > > A. It will break code that uses instance.__dict__['var'] directly, when 'var' exists as a property with a __set__ in the class. I believe this is not significant.
> > > B. It will simplify getattr's semantics. Python should _always_ give precedence to instance attributes over class ones, rather than have very weird special-cases (such as a property with a __set__).
> >
> > Actually, these are features that are both used and desirable; I've been using them both since Python 2.2 (i.e., for many years now). I'm -1 on removing these features from any version of Python, even 3.0.
>
> It is the same feature, actually, two sides of the same coin. Why do you use self.__dict__['propertyname'] when you can use self._propertyname?

Because I'm *not writing this by hand*. I'm using descriptors that know what attribute name they're responsible for, and do the access directly.

> Why even call the first form, which is both longer and causes performance problems a feature?

If you don't understand that, IMO you don't yet understand enough about the descriptor architecture to be proposing changes to it.

> > Note, by the way, that if you want to change attribute lookup semantics, you can always override __getattribute__ and make it work whatever way you like, without forcing everybody else to change *their* code.
>
> If I write my own __getattribute__ I lose the performance benefit that I am after.

Not if you write it in C.

> I do agree that code shouldn't be broken, that's why a transitional that requires using __fastlookup__ can be used (Unfortunately, from __future__ cannot be used as it is not local to a module, but to a class hierarchy - unless one imports a feature from __future__ into a class).

I have no idea what you're talking about here.
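A descriptor that "knows what attribute name it is responsible for" and stores its value directly in the instance dictionary might look like the sketch below. `StoredAttribute` and `Account` are hypothetical names chosen for illustration; the actual descriptors Phillip refers to are not shown in the thread.

```python
class StoredAttribute(object):
    """Hypothetical data descriptor that is told which name it manages and
    keeps the value in instance.__dict__ under that same name."""

    def __init__(self, name, default=None):
        self.name = name
        self.default = default

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class itself
        return instance.__dict__.get(self.name, self.default)

    def __set__(self, instance, value):
        # Validation or conversion happens here, then the value is stored
        # directly under the public name.
        instance.__dict__[self.name] = float(value)


class Account(object):
    balance = StoredAttribute("balance", default=0.0)


acct = Account()
initial = acct.balance   # default, nothing stored yet
acct.balance = "12.5"    # goes through __set__, stored as a float
stored = acct.__dict__   # the value lives under the descriptor's own name
```

Because the descriptor defines __set__, it is consulted before the instance dictionary even though both use the name "balance", so the stored value and the descriptor never conflict.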
Re: [Python-Dev] zipfile and unicode filenames
On 6/10/07, Martin v. Löwis [EMAIL PROTECTED] wrote:
> > I don't think always encoding them to utf-8 (and using bit 11 of flag_bits) is a good idea, since there's a chance to create archives that won't be correctly readable by programs not supporting this bit (it's no secret that currently some programs just assume that filenames are encoded using one of system encodings).
>
> I think it is also fairly uniformly agreed that these programs are incorrect; the official encoding of file names in a zip file is Windows/DOS code page 437.

Before replying to you I actually did some quick tests. I packed a file with localized filename and then opened it using explorer and also viewed it using the hexeditor:

7-Zip: directory cp866, header cp866: explorer sees correct filename.
zipfile: directory cp1251, header cp1251: explorer sees incorrect filename.
pkzip25.exe: directory cp866, header cp1251: explorer sees correct filenames, zipfile complains that filenames differ.
zip.exe: directory cp1251, header cp1251: explorer sees incorrect filenames.

Also note, that modifying filename in directory with a hex editor to cp866 made explorer see correct filenames. Another experiment with pkzip25 showed that modifying filename in directory makes it extract files with that filename, i.e. it ignores header filename. The same behavior is shown by 7-Zip.

So the general idea is that at least directory filename has some sort of convention of using oem (dos, console) encoding on Windows, cp866 in my case. Header filenames have different encodings, and seem to be ignored.

> I don't think that the situation on Windows is that the OEM code page should be used. Instead, CP 437 should be used, independent of the OEM code page.

And on the contrary, pkzip25 made by PKWARE Inc. themselves behaves otherwise.

> > +filename = str(self.filename)
>
> That would be incorrect, as it relies on the system encoding, which shouldn't be relied upon.

Well, as I've seen in numerous examples above, system (or actually dos) encoding is actually what is used by at least three major programs: 7-zip, pkzip25 and explorer, at least on windows.

> Plus, it would allow arbitrary non-string things as filenames.

Hmm... why is that bad?

> What it should do instead (IMO) is to encode in CP437. Bonus points if it falls back to the UTF-8 feature of zip files if encoding as CP437 fails.

And encoding to cp437 would be incorrect, as no currently existing program would correctly work on non-english Windows OSes. I think that letting the user decide on the encoding is the right way to go here, as you can't know what user actually wants these days, it's all too hazy to me. And in case unicode is passed it just converts it using ascii (or default) codec. One can specify ascii codec there explicitly, if using system encoding is really an issue.
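The difference between the two code pages in these tests can be seen directly in current Python. The file name below is an arbitrary Cyrillic example chosen for illustration, not one taken from the tests above.

```python
name = "файл.txt"  # arbitrary Cyrillic example name

oem_bytes = name.encode("cp866")    # DOS/OEM code page on a Russian Windows
ansi_bytes = name.encode("cp1251")  # ANSI code page on the same system

# A reader that assumes the OEM code page, given bytes written in cp1251,
# decodes without error but produces different (wrong) characters.
misread = ansi_bytes.decode("cp866")
round_trip = oem_bytes.decode("cp866")
```

Only the non-ASCII part differs between the two byte strings; the ".txt" suffix is identical in both, which is why ASCII-only names never expose the problem.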
Re: [Python-Dev] Fwd: Instance variable access and descriptors
At 11:27 AM 6/10/2007 +0100, Gustavo Carneiro wrote:
> I have to agree with you. If removing support for self.__dict__['propertyname'] (where propertyname is also the name of a descriptor) is the price to pay for significant speedup, so be it. People doing that are asking for trouble anyway!

How so? This order of lookup is explicitly defined by the precedence rules of PEP 252:

    When a dynamic attribute (one defined in a regular object's __dict__) has the same name as a static attribute (one defined by a meta-object in the inheritance graph rooted at the regular object's __class__), the static attribute has precedence if it is a descriptor that defines a __set__ method (see below); otherwise (if there is no __set__ method) the dynamic attribute has precedence. In other words, for data attributes (those with a __set__ method), the static definition overrides the dynamic definition, but for other attributes, dynamic overrides static.

I fail to see how relying on explicitly-documented language behavior is asking for trouble.
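The precedence rule quoted from PEP 252 can be checked with a small experiment. The class and attribute names below are hypothetical and exist only to show the two cases side by side.

```python
class DataDescr(object):
    """Defines __set__, so it is a data descriptor (a "static" attribute
    that takes precedence over the instance dict)."""

    def __get__(self, instance, owner=None):
        return "data descriptor"

    def __set__(self, instance, value):
        raise AttributeError("read-only in this sketch")


class NonDataDescr(object):
    """Defines only __get__, so the instance dict takes precedence."""

    def __get__(self, instance, owner=None):
        return "non-data descriptor"


class Example(object):
    static_wins = DataDescr()
    dynamic_wins = NonDataDescr()


obj = Example()
# Write to the instance dict directly, bypassing any __set__.
obj.__dict__["static_wins"] = "instance value"
obj.__dict__["dynamic_wins"] = "instance value"

first = obj.static_wins    # descriptor has __set__: static overrides dynamic
second = obj.dynamic_wins  # no __set__: dynamic overrides static
```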
Re: [Python-Dev] zipfile and unicode filenames
> So the general idea is that at least directory filename has some sort of convention of using oem (dos, console) encoding on Windows, cp866 in my case. Header filenames have different encodings, and seem to be ignored.

Ok, then this is what the zipfile module should implement.

> > That would be incorrect, as it relies on the system encoding, which shouldn't be relied upon.
>
> Well, as I've seen in numerous examples above, system (or actually dos) encoding is actually what is used by at least three major programs: 7-zip, pkzip25 and explorer, at least on windows.

Please don't confuse Python's system encoding with the system's (or user's) standard encoding - they are not related at all. Using the OEM code page if everybody else does it is fine. Using the encoding that somebody hand-coded into the Python installation is not.

> > Plus, it would allow arbitrary non-string things as filenames.
>
> Hmm... why is that bad?

Errors should never pass silently.

> > What it should do instead (IMO) is to encode in CP437. Bonus points if it falls back to the UTF-8 feature of zip files if encoding as CP437 fails.
>
> And encoding to cp437 would be incorrect, as no currently existing program would correctly work on non-english Windows OSes. I think that letting the user decide on the encoding is the right way to go here, as you can't know what user actually wants these days, it's all too hazy to me.

Asking the user is not practical. If the user was aware of the problem, you would not have run into the problem in the first place - you would have known to encode all file names before passing them into the zipfile module.

The automatic mode should follow the standard or the conventions; the "user" (in quotes, because the end user is rarely bothered with that detail) can still override that explicitly.

Regards,
Martin
Re: [Python-Dev] zipfile and unicode filenames
On 6/10/07, Martin v. Löwis [EMAIL PROTECTED] wrote:
> > So the general idea is that at least directory filename has some sort of convention of using oem (dos, console) encoding on Windows, cp866 in my case. Header filenames have different encodings, and seem to be ignored.
>
> Ok, then this is what the zipfile module should implement.

But this is only on Windows! I have no clue what's the common situation on other OSes and don't even know how to sanely get OEM codepage on Windows (the obvious way with ctypes.kernel32.GetOEMCP() doesn't seem good to me).

So I guess that's bad idea anyway, maybe conforming to language bit is better (ascii will stay ascii anyway).

What about this?

Index: Lib/zipfile.py
===================================================================
--- Lib/zipfile.py (revision 55850)
+++ Lib/zipfile.py (working copy)
@@ -252,6 +252,7 @@
             self.extract_version = max(45, self.extract_version)
             self.create_version = max(45, self.extract_version)
 
+        self._encodeFilename()
         header = struct.pack(structFileHeader, stringFileHeader,
                  self.extract_version, self.reserved, self.flag_bits,
                  self.compress_type, dostime, dosdate, CRC,
@@ -259,6 +260,16 @@
                  len(self.filename), len(extra))
         return header + self.filename + extra
 
+    def _encodeFilename(self):
+        if isinstance(self.filename, unicode):
+            self.filename = self.filename.encode('utf-8')
+            self.flag_bits = self.flag_bits | 0x800
+
+    def _decodeFilename(self):
+        if self.flag_bits & 0x800:
+            self.filename = self.filename.decode('utf-8')
+            self.flag_bits = self.flag_bits & ~0x800
+
     def _decodeExtra(self):
         # Try to decode the extra field.
         extra = self.extra
@@ -683,6 +694,7 @@
                                      t>>11, (t>>5)&0x3F, (t&0x1F) * 2 )
 
             x._decodeExtra()
+            x._decodeFilename()
             x.header_offset = x.header_offset + concat
             self.filelist.append(x)
             self.NameToInfo[x.filename] = x
@@ -967,6 +979,7 @@
                 extract_version = zinfo.extract_version
                 create_version = zinfo.create_version
 
+            zinfo._encodeFilename()
             centdir = struct.pack(structCentralDir,
              stringCentralDir, create_version,
              zinfo.create_system, extract_version, zinfo.reserved,
Index: Lib/test/test_zipfile.py
===================================================================
--- Lib/test/test_zipfile.py (revision 55850)
+++ Lib/test/test_zipfile.py (working copy)
@@ -515,6 +515,11 @@
         # and report that the first file in the archive was corrupt.
         self.assertRaises(RuntimeError, zipf.testzip)
 
+    def testUnicodeFilenames(self):
+        zf = zipfile.ZipFile(TESTFN, "w")
+        zf.writestr(u"foo.txt", "Test for unicode filename")
+        zf.close()
+
     def tearDown(self):
         support.unlink(TESTFN)
         support.unlink(TESTFN2)

The problem is that I don't know if anything actually supports bit 11 at the time and can't even tell if I did this correctly or not. :(
Re: [Python-Dev] zipfile and unicode filenames
> But this is only on Windows! I have no clue what's the common situation on other OSes and don't even know how to sanely get OEM codepage on Windows (the obvious way with ctypes.kernel32.GetOEMCP() doesn't seem good to me).
>
> So I guess that's bad idea anyway, maybe conforming to language bit is better (ascii will stay ascii anyway).
>
> What about this?

I haven't checked (*) whether you got the right value for flag_bits; assuming you do, this looks good.

For compatibility, I would propose to use UTF-8 only if the file name is not ASCII. Even though the OEM code pages vary, they are (mostly) ASCII supersets. So if the string can be encoded in ASCII, there is no need to set the UTF-8 flag bit.

OTOH, I now wonder whether it would *hurt* to have the flag bit: if old zip software does not choke if the flag is set, then it can just as well be set, as ASCII strings automatically get encoded as ASCII in UTF-8.

Regards,
Martin

(*) I just now read http://www.pkware.com/documents/casestudies/APPNOTE.TXT and 0x800 seems to be the right value indeed. Notice, in appendix D, that the specification says that the historical encoding of file names is code page 437.
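The two observations above (0x800 is bit 11, and an ASCII name yields the same bytes under UTF-8 and the common code pages) are easy to check in current Python. The name and the list of encodings below are arbitrary choices made for this illustration.

```python
UTF8_FLAG = 1 << 11  # bit 11 of the general purpose flags

name = "docs/readme.txt"  # arbitrary ASCII-only example name
encodings = ["ascii", "utf-8", "cp437", "cp866", "cp1251"]

# Encode the same name under each encoding and collect the distinct results.
encoded = {enc: name.encode(enc) for enc in encodings}
distinct = set(encoded.values())
all_same = len(distinct) == 1
```

Since the bytes are identical in every case, setting or clearing the flag cannot change how an ASCII name is read; the open question in the message is only whether old tools reject archives that have the bit set.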
Re: [Python-Dev] Instance variable access and descriptors
On Sun, Jun 10, 2007, Eyal Lotem wrote:
> Python, probably through the valid assumption that most attribute lookups go to the class, tries to look for the attribute in the class first, and in the instance, second.
>
> What Python currently does is quite peculiar! Here's a short description of PyObject_GenericGetAttr:
>
> A. Python looks for a descriptor in the _entire_ mro hierarchy (len(mro) class/type check and dict lookups).
> B. If Python found a descriptor and it has both get and set functions - it uses it to get the value and returns, skipping the next stage.
> C. If Python either did not find a descriptor, or found one that has no setter, it will try a lookup in the instance dict.
> D. If Python failed to find it in the instance, it will use the descriptor's getter, and if it has no getter it will use the descriptor itself.

Guido, Ping, and I tried working on this at the sprint for PyCon 2003. We were unable to find any solution that did not affect critical-path timing. As other people have noted, the current semantics cannot be changed. I'll also echo other people and suggest that this discussion be moved to python-ideas if you want to continue pushing for a change in semantics.

I just did a Google for my notes from PyCon 2003 and it appears that I never sent them out (probably because they aren't particularly comprehensible). Here they are for the record (from 3/25/2003):

'''
CACHE_ATTR is the name used to describe a speedup (for new-style classes only) in attribute lookup by caching the location of attributes in the MRO. Some of the non-obvious bits of code:

* If a new-style class has any classic classes in its bases, we can't do attribute caching (we need weakrefs to the derived classes).

* If searching the MRO for an attribute discovers a data descriptor (has tp_descr_set), that overrides any attribute that might be in the instance; however, the existence of tp_descr_get still permits the instance to override its bases (but tp_descr_get is called if there is no instance attribute).

* We need to invalidate the cache for the updated attribute in all derived classes in the following cases:

    * an attribute is added or deleted to the class or its base classes

    * an attribute has its status changed to or from being a data descriptor

This file uses Python pseudocode to describe changes necessary to implement CACHE_ATTR at the C level. Except for class Meta, these are all exact descriptions of the work being done. Except for class Meta the changes go into object.c (Meta goes into typeobject.c). The pseudocode looks somewhat C-like to ease the transformation.
'''

NULL = object()

def getattr(inst, name):
    isdata, where = lookup(inst.__class__, name)
    if isdata:
        descr = where[name]
        if hasattr(descr, "__get__"):
            return descr.__get__(inst)
        else:
            return descr
    value = inst.__dict__.get(name, NULL)
    if value != NULL:
        return value
    if where == NULL:
        raise AttributeError
    descr = where[name]
    if hasattr(descr, "__get__"):
        value = descr.__get__(inst)
    else:
        value = descr
    return value

def setattr(inst, name, value):
    isdata, where = lookup(inst.__class__, name)
    if isdata:
        descr = where[name]
        descr.__set__(inst, value)
        return
    inst.__dict__[name] = value

def lookup(cls, name):
    if cls.__cache__ != NULL:
        pair = cls.__cache__.get(name)
    else:
        pair = NULL
    if pair:
        return pair
    else:
        for c in cls.__mro__:
            where = c.__dict__
            if name in where:
                descr = where[name]
                isdata = hasattr(descr, "__set__")
                pair = isdata, where
                break
        else:
            pair = False, NULL
        if cls.__cache__ != NULL:
            cls.__cache__[name] = pair
        return pair

'''
These changes go into typeobject.c; they are not a complete description of what happens during creation/updates, only the changes necessary to implement CACHE_ATTR.
'''

from types import ClassType

class Meta(type):
    def _invalidate(cls, name):
        if name in cls.__cache__:
            del cls.__cache__[name]
        for c in cls.__subclasses__():
            if name not in c.__dict__:
                self._invalidate(c, name)

    def _build_cache(cls, bases):
        for base in bases:
            if type(base.__class__) is ClassType:
                cls.__cache__ = NULL
                break
        else:
            cls.__cache__ = {}

    def __new__ (cls, bases):
        self._build_cache(cls, bases)

    def __setbases__(cls, bases):
        self._build_cache(cls, bases)

    def __setattr__(cls, name, value):
        if cls.__cache__ != NULL:
            old = cls.__dict__.get(name, NULL)
            wasdata = old != NULL and hasattr(old, "__set__")
            isdata = value != NULL and hasattr(value, "__set__")
            if wasdata != isdata or (old == NULL)
Re: [Python-Dev] Frame zombies
On 6/10/07, Martin v. Löwis [EMAIL PROTECTED] wrote:
> > Note that _only_ recursions will have more than 1 frame attached.
>
> That's not true; in the presence of threads, the same method may also be invoked more than one time simultaneously.

Yes, I have missed that, and realized that I missed that myself a bit later. I guess I can rationalize that with the fact that I myself tend to avoid threads.

> > But removing freelist altogether will not work well with any type of recursion.
>
> How do you know that? Did you measure the time? On what system? What were the results? Performance optimization without measuring is just unacceptable.

I agree, I may have used the wrong tone above. Removing the freelist will probably either not have a significant effect (at worst, it's adding very little work of maintaining it), or improve recursions and functions that tend to be running simultaneously in multiple threads (as in those cases the realloc will not actually resize the frame, and mallocs/free will indeed be saved).

But do note my corrected tone (I said probably :-) - and anyone is welcome to try removing it and see if they get a performance benefit.

The fact that threading also causes the same code object to be used in multiple frames makes everything a little less predictable and may mean that having a larger-than-1 number of frames associated with each code object may indeed yield a performance benefit.

I am not sure how to benchmark such modifications. Is there any benchmark that includes threaded use of the same functions in typical use cases?

> Regards,
> Martin
Re: [Python-Dev] Frame zombies
> I am not sure how to benchmark such modifications. Is there any benchmark that includes threaded use of the same functions in typical use cases?

I don't think it's necessary to benchmark that specific case - *any* kind of micro-benchmark would be better than none. If you want to introduce free lists per code object, you need to benchmark such code, and compare it to the status quo. While doing so, I'd ask to also measure the case that the free list is dropped without a replacement.

Regards,
Martin
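A micro-benchmark of the kind asked for here could start from something like the sketch below, written with the standard timeit module. The function names, depth, and repetition counts are arbitrary assumptions; the script does not measure the free list itself, and would have to be run on both a patched and an unpatched interpreter build to compare them.

```python
import timeit


def recurse(depth):
    """Recursive call chain: many live frames of the same code object."""
    if depth == 0:
        return 0
    return 1 + recurse(depth - 1)


def loop(depth):
    """Iterative baseline: a single frame doing comparable arithmetic."""
    total = 0
    for _ in range(depth):
        total += 1
    return total


def measure(func, depth=200, number=500, repeat=3):
    """Return the best wall-clock time (seconds) for `number` calls."""
    timer = timeit.Timer(lambda: func(depth))
    return min(timer.repeat(repeat=repeat, number=number))


recursive_time = measure(recurse)
iterative_time = measure(loop)
print("recursive: %.6f s, iterative: %.6f s" % (recursive_time, iterative_time))
```

Taking the minimum over several repeats is the usual way to reduce noise from other processes; the absolute numbers only matter relative to the same script run on another build.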
Re: [Python-Dev] zipfile and unicode filenames
On 6/11/07, Martin v. Löwis [EMAIL PROTECTED] wrote:
> For compatibility, I would propose to use UTF-8 only if the file name is not ASCII. Even though the OEM code pages vary, they are (mostly) ASCII supersets. So if the string can be encoded in ASCII, there is no need to set the UTF-8 flag bit.

Done:

Index: Lib/zipfile.py
===================================================================
--- Lib/zipfile.py (revision 55850)
+++ Lib/zipfile.py (working copy)
@@ -252,13 +252,29 @@
             self.extract_version = max(45, self.extract_version)
             self.create_version = max(45, self.extract_version)

+        filename, flag_bits = self._encodeFilenameFlags()
         header = struct.pack(structFileHeader, stringFileHeader,
-                 self.extract_version, self.reserved, self.flag_bits,
+                 self.extract_version, self.reserved, flag_bits,
                  self.compress_type, dostime, dosdate, CRC,
                  compress_size, file_size,
-                 len(self.filename), len(extra))
-        return header + self.filename + extra
+                 len(filename), len(extra))
+        return header + filename + extra

+    def _encodeFilenameFlags(self):
+        if isinstance(self.filename, unicode):
+            try:
+                return self.filename.encode('ascii'), self.flag_bits
+            except UnicodeEncodeError:
+                return self.filename.encode('utf-8'), self.flag_bits | 0x800
+        else:
+            return self.filename, self.flag_bits
+
+    def _decodeFilenameFlags(self):
+        if self.flag_bits & 0x800:
+            return self.filename.decode('utf-8'), self.flag_bits & ~0x800
+        else:
+            return self.filename, self.flag_bits
+
     def _decodeExtra(self):
         # Try to decode the extra field.
         extra = self.extra
@@ -684,6 +700,7 @@

             x._decodeExtra()
             x.header_offset = x.header_offset + concat
+            x.filename, x.flag_bits = x._decodeFilenameFlags()
             self.filelist.append(x)
             self.NameToInfo[x.filename] = x
             if self.debug > 2:
@@ -967,16 +984,17 @@
                 extract_version = zinfo.extract_version
                 create_version = zinfo.create_version

+            filename, flag_bits = zinfo._encodeFilenameFlags()
             centdir = struct.pack(structCentralDir,
              stringCentralDir, create_version,
              zinfo.create_system, extract_version, zinfo.reserved,
-             zinfo.flag_bits, zinfo.compress_type, dostime, dosdate,
+             flag_bits, zinfo.compress_type, dostime, dosdate,
              zinfo.CRC, compress_size, file_size,
-             len(zinfo.filename), len(extra_data), len(zinfo.comment),
+             len(filename), len(extra_data), len(zinfo.comment),
              0, zinfo.internal_attr, zinfo.external_attr,
              header_offset)
             self.fp.write(centdir)
-            self.fp.write(zinfo.filename)
+            self.fp.write(filename)
             self.fp.write(extra_data)
             self.fp.write(zinfo.comment)

Index: Lib/test/test_zipfile.py
===================================================================
--- Lib/test/test_zipfile.py (revision 55850)
+++ Lib/test/test_zipfile.py (working copy)
@@ -515,6 +515,12 @@
         # and report that the first file in the archive was corrupt.
         self.assertRaises(RuntimeError, zipf.testzip)

+    def testUnicodeFilenames(self):
+        zf = zipfile.ZipFile(TESTFN, "w")
+        zf.writestr(u"foo.txt", "Test for unicode filename")
+        assert isinstance(zf.infolist()[0].filename, unicode)
+        zf.close()
+
     def tearDown(self):
         support.unlink(TESTFN)
         support.unlink(TESTFN2)

What I also changed is to encode filenames only for writing to the target file, without damaging ZipInfo. The reason for this is that if user decides to enumerate infolist after she wrote files to ZipFile, she would expect ZipInfo.filename to be what she passed to ZipFile.write/ZipFile.writestr.
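The filename rule in the patch above (try ASCII first, fall back to UTF-8 and set bit 11) can be tried in isolation. The following is only a sketch, written for Python 3, where `str` and `bytes` take the roles that `unicode` and `str` have in the patch. The standalone functions `encode_filename_flags` and `decode_filename_flags` and the constant `UTF8_FLAG` are hypothetical names for illustration, not part of zipfile.

```python
UTF8_FLAG = 0x800  # general purpose bit 11: file name is stored as UTF-8


def encode_filename_flags(filename, flag_bits):
    """Return (name_bytes, flag_bits) as they would be written to the archive."""
    try:
        # ASCII names need no flag: the OEM code pages are (mostly) ASCII supersets.
        return filename.encode("ascii"), flag_bits
    except UnicodeEncodeError:
        return filename.encode("utf-8"), flag_bits | UTF8_FLAG


def decode_filename_flags(name_bytes, flag_bits):
    """Inverse step for reading: decode UTF-8 names and clear bit 11."""
    if flag_bits & UTF8_FLAG:
        return name_bytes.decode("utf-8"), flag_bits & ~UTF8_FLAG
    # Assumption carried over from the patch: unflagged names stay raw bytes.
    return name_bytes, flag_bits


if __name__ == "__main__":
    for name in ("foo.txt", "\u0444\u0430\u0439\u043b.txt"):
        raw, flags = encode_filename_flags(name, 0)
        print(ascii(name), raw, hex(flags), decode_filename_flags(raw, flags))
```

Because the encoding happens on the way out and returns new values, the original name and flags are left untouched, which is the same property the patch keeps for ZipInfo.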
[Python-Dev] Question about dictobject.c:lookdict_string
My question is specifically regarding the transition back from lookdict_string (the initial value) to the general lookdict. Currently, when a string-only dict is trying to look up any non-string, it reverts back to a general lookdict. Wouldn't it be better (especially in the more important case of a string-key-only dict) to revert to the generic lookdict when a non-string is inserted into the dict, rather than when one is being searched for?

It seems to me that this would shift the (admittedly very slight) performance cost of a type pointer comparison from read access to write access on all dicts. That means insertions of new keys in non-string-only dicts may pay for another check, or the lookdict funcptr will be replaced by two funcptrs so that a different insertion function is used on string-only dicts too (I was tempted to say vtable here, but that would add another dereference to lookups). It would also have the slight benefit of speeding up non-string lookups in string-only dicts.

This does not seem like a significant issue, but as I know a lot of effort went into optimizing dicts, I was wondering if I am missing something here.
Re: [Python-Dev] zipfile and unicode filenames
On 6/11/07, Alexey Borzenkov [EMAIL PROTECTED] wrote:
> The problem is that I don't know if anything actually supports bit 11 at the time and can't even tell if I did this correctly or not. :(

I downloaded the latest WinZip and can confirm that it parses utf-8 filenames correctly (although it seems to treat presence of bit 11 more like enabling autodetection mode, not strict utf-8, but it must be because it has to cope with lots of incorrect zip files), i.e. in the presence of bit 11 it understands filename to be utf-8, without presence of bit 11 it treats it just like oem. :)
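Whether an archive actually carries bit 11 can also be checked without a third-party tool. The sketch below assumes a current Python 3, whose zipfile module, as far as I know, follows the same ASCII-first rule and exposes the stored flags as `ZipInfo.flag_bits`. The helper name `utf8_flags` and the sample file names are made up for illustration.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11


def utf8_flags(names):
    """Write empty members with the given names to an in-memory archive,
    read it back, and report per name whether bit 11 was stored."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in names:
            archive.writestr(name, b"")
    buffer.seek(0)
    with zipfile.ZipFile(buffer) as archive:
        return {info.filename: bool(info.flag_bits & UTF8_FLAG)
                for info in archive.infolist()}


if __name__ == "__main__":
    for name, flagged in utf8_flags(["foo.txt", "f\u00f6\u00f6.txt"]).items():
        print(ascii(name), "bit 11 set:", flagged)
```

This only shows what one writer emits; how a given reader such as WinZip treats names without the flag still has to be tested with that reader.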