[issue14200] Idle shell crash on printing non-BMP unicode character
Vlastimil Brom vlastimil.b...@gmail.com added the comment: Sorry for mixing the different problems, these were somehow things I noticed at once in the new python version, but I should have noticed the different domains myself. I still might not understand the term crash properly - I just meant to distinguish between a single appropriate exception on an invalid operation (while the app is staying alive and works on next valid input) - as is the case with calling through python.exe, and - on the other hand - the immediate termination on encountering the invalid input, which happens with pythonw.exe. Now I see, that with pythonw a tk app terminates with the first exception (in general) in py 3.3 and also 3.2 (as opposed to py 2.7, where it just swallows the exception and stays alive, as one would probably expect). Should this be reported in a separate issue, or is this what remains relevant in *this* report? (Sorry for the confusion.) vbr -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14200 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14200] Idle shell crash on printing non-BMP unicode character
New submission from Vlastimil Brom vlastimil.b...@gmail.com:

Hi, while testing python 3.3a1 a bit, especially the new string handling of non-BMP characters, I noticed a problem in Idle in this regard:

    Python 3.3.0a1 (default, Mar 4 2012, 17:27:59) [MSC v.1500 32 bit (Intel)] on win32
    ...
    [using win XP SP3 Czech]
    >>> got_ahsa = "\N{GOTHIC LETTER AHSA}"
    >>> len(got_ahsa)
    1
    >>> got_ahsa.encode("unicode-escape")
    b'\\U00010330'
    >>> got_ahsa
    [crash - idle shell window closes immediately without any visible error message or traceback]

I realised later that tkinter probably won't be able to print wide-unicode characters anyway (according to http://bugs.python.org/issue12342 ), but Idle should probably just print the exception introduced there, e.g.

    ValueError: character U+10330 is above the range (U+0000-U+FFFF) allowed by Tcl

Regards
vbr

--
components: IDLE, Tkinter, Unicode
messages: 154944
nosy: ezio.melotti, vbr
priority: normal
severity: normal
status: open
title: Idle shell crash on printing non-BMP unicode character
versions: Python 3.3
[issue14200] Idle shell crash on printing non-BMP unicode character
Vlastimil Brom vlastimil.b...@gmail.com added the comment:

Hi, thanks for the pointer. After invoking idle using python.exe, I don't see the crash mentioned in the report:

    Python 3.3.0a1 (default, Mar 4 2012, 17:27:59) [MSC v.1500 32 bit (Intel)] on win32
    Type "copyright", "credits" or "license()" for more information.
    >>> got_ahsa = "\N{GOTHIC LETTER AHSA}"
    >>> len(got_ahsa)
    1
    >>> got_ahsa.encode("unicode-escape")
    b'\\U00010330'
    >>> got_ahsa

    >>> print(got_ahsa)

    >>>

I just get an empty line as the answer, but no crash. The console indeed contains the traceback with the error I expected.

vbr

    Microsoft Windows XP [Verze 5.1.2600]
    (C) Copyright 1985-2001 Microsoft Corp.

    C:\Python33>python.exe -m idlelib.idle

    *** Internal Error: rpc.py:SocketIO.localcall()

     Object: stdout
     Method: <bound method PseudoFile.write of <idlelib.PyShell.PseudoFile object at 0x01CDDB50>>
     Args: ('\U00010330',)

    Traceback (most recent call last):
      File "C:\Python33\lib\idlelib\rpc.py", line 188, in localcall
        ret = method(*args, **kwargs)
      File "C:\Python33\lib\idlelib\PyShell.py", line 1244, in write
        self.shell.write(s, self.tags)
      File "C:\Python33\lib\idlelib\PyShell.py", line 1226, in write
        OutputWindow.write(self, s, tags, "iomark")
      File "C:\Python33\lib\idlelib\OutputWindow.py", line 40, in write
        self.text.insert(mark, s, tags)
      File "C:\Python33\lib\idlelib\Percolator.py", line 25, in insert
        self.top.insert(index, chars, tags)
      File "C:\Python33\lib\idlelib\ColorDelegator.py", line 80, in insert
        self.delegate.insert(index, chars, tags)
      File "C:\Python33\lib\idlelib\PyShell.py", line 322, in insert
        UndoDelegator.insert(self, index, chars, tags)
      File "C:\Python33\lib\idlelib\UndoDelegator.py", line 81, in insert
        self.addcmd(InsertCommand(index, chars, tags))
      File "C:\Python33\lib\idlelib\UndoDelegator.py", line 116, in addcmd
        cmd.do(self.delegate)
      File "C:\Python33\lib\idlelib\UndoDelegator.py", line 219, in do
        text.insert(self.index1, self.chars, self.tags)
      File "C:\Python33\lib\idlelib\ColorDelegator.py", line 80, in insert
        self.delegate.insert(index, chars, tags)
      File "C:\Python33\lib\idlelib\WidgetRedirector.py", line 104, in __call__
        return self.tk_call(self.orig_and_operation + args)
    ValueError: character U+10330 is above the range (U+0000-U+FFFF) allowed by Tcl
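For illustration only, here is one way an application could avoid the ValueError above before handing text to a Tk widget. This is a minimal sketch, assuming a Tcl/Tk build limited to U+0000-U+FFFF as the error message states; the helper name replace_non_bmp is hypothetical and this is not what Idle itself does.

```python
def replace_non_bmp(text, replacement="\ufffd"):
    """Return text with every character above U+FFFF replaced.

    Assumption: the Tk build in use rejects anything outside the BMP.
    """
    return "".join(ch if ord(ch) <= 0xFFFF else replacement for ch in text)


got_ahsa = "\N{GOTHIC LETTER AHSA}"
# The non-BMP character becomes U+FFFD REPLACEMENT CHARACTER.
print(ascii(replace_non_bmp("a" + got_ahsa + "b")))  # 'a\ufffdb'
```

The replaced text loses information, so this would only suit display purposes; the original string should be kept for any further processing.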
[issue14200] Idle shell crash on printing non-BMP unicode character
Vlastimil Brom vlastimil.b...@gmail.com added the comment:

I'd like to add some further observations to the mentioned issue; it seems that the crash is indeed not specific to idle. In a sample tkinter app, where I just display e.g. chr(66352) in an Entry widget, I also get the same immediate crash via pythonw.exe and the previously mentioned proper ValueError without a crash with python.exe.

I also tried to explicitly display a surrogate pair, which was used automatically until python 3.2; these can be used in tkinter in 3.3, but there are limitations and discrepancies:

    >>> got_ahsa = "\N{GOTHIC LETTER AHSA}"
    >>> def wide_char_to_surrog_pair(char):
    ...     code_point = ord(char)
    ...     if code_point <= 0xFFFF:
    ...         return char
    ...     else:
    ...         high_surr = (code_point - 0x10000) // 0x400 + 0xD800
    ...         low_surr = (code_point - 0x10000) % 0x400 + 0xDC00
    ...         return chr(high_surr)+chr(low_surr)
    ...
    >>> ahsa_surrog = wide_char_to_surrog_pair(got_ahsa)
    >>> print(ahsa_surrog)
    ̰
    >>> repr(ahsa_surrog)
    '_ud800\x00udf30'
    >>> ahsa_surrog
    'Pud800 udf30'

[the space in the middle of the last item might be \x00, as it terminates the clipboard content; the rest is copied separately]

The printed square corresponds with the given character and can be used in other programs etc. (Whereas in py 3.2 the same value was used for repr and a direct display of the string in the interpreter, there are three different formats in py 3.3.)

I also noticed that a surrogate pair is not supported as input for unicodedata.name(...) anymore:

    >>> import unicodedata
    >>> unicodedata.name(ahsa_surrog)
    Traceback (most recent call last):
      File "<pyshell#60>", line 1, in <module>
        unicodedata.name(ahsa_surrog)
    TypeError: need a single Unicode character as parameter

(in 3.2 and probably others it returns the expected 'GOTHIC LETTER AHSA')

(I for my part would think that keeping somewhat liberal, but still non-ambiguous, input possibilities for unicodedata wouldn't hurt.) Also, if tkinter is not going to support wide unicode natively any time soon, the output conversion using surrogates, which are also understandable for other programs, seems the most usable option in this regard.

Hopefully, this is somehow relevant for the original issue - I am not sure whether some parts would be better posted as separate issues, or whether this is the planned and expected behaviour anyway.

regards,
vbr
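The surrogate conversion discussed above can also be obtained from the UTF-16 codec instead of explicit arithmetic. The following is only a sketch for Python 3.3 or later; the function name to_surrogate_pairs is made up for this example, and whether a given Tk build then renders the pair correctly is not something the code can guarantee.

```python
def to_surrogate_pairs(text):
    """Rewrite each non-BMP character as a UTF-16 surrogate pair.

    BMP characters are returned unchanged; "surrogatepass" lets lone
    surrogates already present in the input pass through untouched.
    """
    units = text.encode("utf-16-le", "surrogatepass")
    # Every UTF-16 code unit is two bytes; turn each back into a character.
    return "".join(
        chr(int.from_bytes(units[i:i + 2], "little"))
        for i in range(0, len(units), 2)
    )


pair = to_surrogate_pairs("\N{GOTHIC LETTER AHSA}")
print(len(pair), ascii(pair))  # 2 '\ud800\udf30'
```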
[issue2636] Adding a new regex module (compatible with re)
Vlastimil Brom vlastimil.b...@gmail.com added the comment:

Not that it matters in any way, but if the regex semantics have to be distinguished via non-standard custom flags, I would prefer even less wordy flags, possibly such that the short forms for the in-pattern flag setting would be one letter (like all the other flags), and preferably based on plain English words, to give them some mnemonic value (which I don't see in the numbered versions, as they require one to keep track of the rather internal library versioning). Unfortunately, it might be difficult to find suitable names, given the objections expressed against the ones already discussed. (For what it is worth, I thought e.g. of [t]raditional and [e]nhanced, but these also suffer from some of the mentioned disadvantages...)

vbr
[issue2636] Adding a new regex module (compatible with re)
Vlastimil Brom vlastimil.b...@gmail.com added the comment:

I'd agree with Steven ( msg143377 ) and others that there probably shouldn't be a large library-specific set of new flags just for housekeeping purposes between re and regex. I would personally prefer that these flags also be settable in the pattern (?...), which would probably be problematic with versioned flags.

Although I am trying to take advantage of the new additions where applicable, I agree that there should be a possibility to use regex without further thought, with the same behaviour as re (maybe except for the fixes of what will be agreed on to be a bug (enough)). On the other hand, it seems to me that the enhancements/additions can be enabled all at once, as a user consciously upgrading the regexes for the new library (or a new user not knowing re) can be supposed to know the new features and their implications. I guess it is mostly trivially possible to fix/disambiguate the problematic patterns, e.g. by escaping.

As for setting the new/old behaviour, would there be a possibility to distinguish it just by importing (possibly through some magic, without the need to duplicate the code)?

    import re_in_compat_mode as re

vs:

    import re_with_all_the_new_features as re

Unfortunately, I have no idea whether this is possible or viable... With this option, the (user) code update could be just the change of the imports instead of adding the flags in all relevant places (and taking them away as redundant, as the defaults evolve with the versions...). However, it is not clear how this aliasing would work out with regard to the transition; maybe the long differentiated module names could be kept and the meaning of import re would change, along with the previous warnings, in some future version.

just a few thoughts...
vbr
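The import-based switch described above can already be approximated in user code with a plain import alias. This is just a sketch of the idiom: the module names re_in_compat_mode and re_with_all_the_new_features from the message are hypothetical and do not exist, so the example falls back between the third-party regex module (assumed to be installed separately) and the standard re.

```python
# Prefer the third-party "regex" module when it is available and fall back
# to the standard library otherwise; the rest of the code only sees "re".
try:
    import regex as re
except ImportError:
    import re

# Code written against the common API works with either module.
print(re.sub(r"([xy])", r"-\1-", "abxc"))  # ab-x-c
```

With this layout only the import lines need to change when switching engines; patterns that rely on behaviour differing between the two modules would still have to be reviewed by hand.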
[issue11744] re.LOCALE doesn't reflect locale.setlocale(...)
Vlastimil Brom vlastimil.b...@gmail.com added the comment:

Thanks for the comment on string.letters and the further reference. Given that Mr. Barnett mentioned in his tracker for regex ( http://code.google.com/p/mrab-regex-hg/issues/detail?id=6 ) that he only supports the LOCALE flag because of the compatibility with re, and given my zero knowledge of C, I suppose we will live with the status quo.

I guess, if there were a well-defined source of letters for the given locales, the implementation wouldn't necessarily have to be that complex (in the context of the regex code), but as there is probably no agreement in this respect (if string.letters is questionable), it becomes pointless. After all, one can define a needed regex pattern manually, and mrab's regex library makes it much easier due to the support for unicode properties and others.

vbr
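As an illustration of defining the needed pattern manually, the letters of a single-byte code page can be collected with the standard codecs and turned into a character class. This is only a sketch under stated assumptions: it uses Python 3 strings, takes the code page name ("cp1250" for the Czech Windows locale shown in this issue) as an explicit argument instead of querying the locale, and the helper name letters_for_codepage is invented for the example.

```python
import re


def letters_for_codepage(codepage):
    """Return all letters representable in a single-byte code page."""
    # Bytes that are undefined in the code page are silently skipped.
    decoded = bytes(range(256)).decode(codepage, errors="ignore")
    return "".join(ch for ch in decoded if ch.isalpha())


czech_letters = letters_for_codepage("cp1250")
# Explicit character class standing in for a locale-aware "letters" set.
word = re.compile("[" + re.escape(czech_letters) + "]+")
print(word.findall("žluťoučký kůň 123"))
```

Note that str.isalpha() follows the Unicode database, so the resulting set may differ from what the C library reports for the same locale.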
[issue11744] re.LOCALE doesn't reflect locale.setlocale(...)
New submission from Vlastimil Brom vlastimil.b...@gmail.com:

Hi, I just noticed a behaviour of the re.LOCALE flag I can't understand; I first reported this to the new regex implementation, which, however, only mimics the standard lib re in this case: http://code.google.com/p/mrab-regex-hg/issues/detail?id=6
I also couldn't find anything relevant in the tracker, other than some older, already fixed issues; I'm sorry if I missed something.

I thought the search pattern (?L)\w would match any of the respective string.letters according to the current locale (and possibly additionally [0-9_]). However, the locale doesn't seem to be reflected in an expected way.

    >>> unicode_BMP = "" + "".join(unichr(i)for i in range(1, 0x10000))
    >>> import locale
    >>> locale.setlocale(locale.LC_ALL, "")
    'Czech_Czech Republic.1250'
    >>> import re
    >>> print("".join(re.findall(r"(?L)\w", unicode_BMP)))
    0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz£¥ª¯³µ¹º¼¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþ
    >>> locale.setlocale(locale.LC_ALL, "Greek")
    'Greek_Greece.1253'
    >>> print("".join(re.findall(r"(?L)\w", unicode_BMP)))
    0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz¢²³µ¸¹º¼¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþ

    >>> unicode_BMP = "" + "".join(unichr(i)for i in range(1, 0x10000))
    >>> locale.setlocale(locale.LC_ALL, "")
    'Czech_Czech Republic.1250'
    >>> print unicode(string.letters, "windows-1250")
    ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzŠŚŤŽŹšśťžźŁĄŞŻłµąşĽľżŔÁÂĂÄĹĆÇČÉĘËĚÍÎĎĐŃŇÓÔŐÖŘŮÚŰÜÝŢßŕáâăäĺćçčéęëěíîďđńňóôőöřůúűüýţ
    >>> locale.setlocale(locale.LC_ALL, "Greek")
    'Greek_Greece.1253'
    >>> print unicode(string.letters, "windows-1253")
    ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzƒΆµΈΉΊΌΎΏΐΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩΪΫάέήίΰαβγδεζηθικλμνξοπρςστυφχψωϊϋόύώ

It seems that the nearest letter set to the result of the re/regex LOCALE flags might be ascii or US locale:

    >>> locale.setlocale(locale.LC_ALL, "US")
    'English_United States.1252'
    >>> print unicode(string.letters, "windows-1252")
    ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzƒŠŒŽšœžŸªµºÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ

however, there are some differences too, namely between z and À

    re (?L)\w :
    Czech  z£¥ª¯³µ¹º¼¾¿À
    Greek  z¢²³µ¸¹º¼¾¿À

    string.letters -- US locale
    zƒŠŒŽšœžŸªµºÀ

(as displayed in tkinter Idle shell)
(in either case, there are some items one wouldn't consider usual word characters, cf. ¿)

I am not sure whether there are other issues involved (like some encoding/displaying peculiarities in Tkinter), but the re matching using the LOCALE flag doesn't reflect locale.setlocale(...) in a transparent way. Is it supposed to work this way, and is there another possibility to get the expected locale aware matching, as one might expect according to: http://docs.python.org/library/re.html#re.LOCALE
"Make \w, \W, \b, \B, \s and \S dependent on the current locale."

using Python 2.7.1, 32 bit; win 7 Home Premium 64-bit, Czech.

in Python 3.1.3 as well as 3.2 the result is the same (with the appropriately modified code):

    ...
    >>> import locale
    >>> locale.setlocale(locale.LC_ALL, "")
    'Czech_Czech Republic.1250'
    >>> import re
    >>> print("".join(re.findall(r"(?L)\w", unicode_BMP)))
    0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz£¥ª¯³µ¹º¼¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþ

However, in Python 3, there is no comparison with string.letters available anymore.

Regards,
Vlastimil Brom

--
components: Regular Expressions, Unicode
messages: 132826
nosy: vbr
priority: normal
severity: normal
status: open
title: re.LOCALE doesn't reflect locale.setlocale(...)
type: behavior
versions: Python 2.7, Python 3.1, Python 3.2
[issue10459] missing character names in unicodedata (CJK...)
New submission from Vlastimil Brom vlastimil.b...@gmail.com:

I just noticed an omission of some character names in the unicodedata module. These are some CJK Ideographs:

    龼 (0x9fbc) - 鿋 (0x9fcb) (CJK Unified Ideographs [19968-40959] [0x4e00-0x9fff])
    ꜀ (0x2a700) - 뜴 (0x2b734) (CJK Unified Ideographs Extension C [173824-177983] [0x2a700-0x2b73f])
    띀 (0x2b740) - 렝 (0x2b81d) (CJK Unified Ideographs Extension D [177984-178207] [0x2b740-0x2b81f])

The names are probably to be generated - e.g. CJK UNIFIED IDEOGRAPH-2A700 ... etc.

(Tested with the recompiled unicodedata - using unicode 6.0; with the py 2.7 builtin module (unidata_version: '5.2.0') only the first two ranges are relevant, as CJK Unified Ideographs Extension D is an addition of Unicode 6.)

(Also there are the unprintable ASCII controls, surrogates and private use areas, where the missing names are probably ok.)

I tested with the following rather clumsy code:

    # # # # # # # # # # # # # # # #
    # wide_unichr = custom unichr emulating unicode ranges beyond FFFF on narrow python build
    codepoints_missing_char_names = [[-2,-2],] # dummy
    for i in xrange(0x10FFFF+1):
        if unicodedata.category(wide_unichr(i))[:1] != 'C' and unicodedata.name(wide_unichr(i), u"??noname??") == u"??noname??":
            if codepoints_missing_char_names[-1][1] == i-1:
                codepoints_missing_char_names[-1][1] = i
            else:
                codepoints_missing_char_names.append([i, i])
    for first, last in codepoints_missing_char_names[1:]:
        print u"%s (%s) - %s (%s)" % (wide_unichr(first), hex(first), wide_unichr(last), hex(last),)
    # # # # # # # # # # # # # # # # # # # # # # # # # #

Unfortunately, I can't provide a fix, as unicodedata involves C code, where my knowledge is near zero.

vbr

--
messages: 121521
nosy: vbr
priority: normal
severity: normal
status: open
title: missing character names in unicodedata (CJK...)
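For readers on Python 3, where chr() covers the whole Unicode range and no custom wide_unichr is needed, the same scan might be sketched as follows. The function names group_ranges and unnamed_code_points are made up for this example, and which ranges (if any) are reported depends entirely on the Unicode database shipped with the interpreter in use.

```python
import unicodedata


def group_ranges(code_points):
    """Collapse ascending integers into inclusive (first, last) ranges."""
    ranges = []
    for cp in code_points:
        if ranges and ranges[-1][1] == cp - 1:
            ranges[-1][1] = cp          # extend the current run
        else:
            ranges.append([cp, cp])     # start a new run
    return [tuple(r) for r in ranges]


def unnamed_code_points(start=0, stop=0x110000):
    """Yield code points outside the 'C' categories that have no name."""
    for cp in range(start, stop):
        ch = chr(cp)
        if unicodedata.category(ch)[0] != "C" and unicodedata.name(ch, None) is None:
            yield cp


for first, last in group_ranges(unnamed_code_points()):
    print("%s - %s" % (hex(first), hex(last)))
```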
[issue10459] missing character names in unicodedata (CJK...)
Changes by Vlastimil Brom vlastimil.b...@gmail.com:

--
components: +Library (Lib), Unicode
type: -> behavior
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Vlastimil Brom vlastimil.b...@gmail.com added the comment:

I'd have liked to suggest updating the underlying unicode data to the latest standard 6.0, but it turns out it might be problematic with regard to cross-version compatibility; according to the clarification in http://bugs.python.org/issue10400 the 3.x versions are going to be updated, while this is not allowed in the 2.x series. I guess it would cause maintenance problems (as the needed properties are not available via unicodedata).

Anyway, while I'd like the recent unicode data to be supported (new characters, ranges, scripts, and corrected individual properties...), I'm much happier that there is support for the 2.x series in regex...

vbr
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Vlastimil Brom vlastimil.b...@gmail.com added the comment:

Thank you very much! A quick test with my custom unicodedata with 6.0 on py 2.7 seems ok. I hope there won't be problems with the cooperation of the more recent internal data with the original 5.2 database in python 2.x releases.

vbr
[issue10400] updating unicodedata to Unicode 6
New submission from Vlastimil Brom vlastimil.b...@gmail.com:

I'd like to suggest updating the unicodedata module according to the recent Unicode standard 6.0 http://www.unicode.org/versions/Unicode6.0.0/
I'm sorry to bother you in case this is planned automatically; I just wasn't able to find the respective information.

Would it be possible to apply such an update also for the upcoming python 2.7.1, or are there some showstoppers/incompatibilities with regard to the new unicode version?

regards,
vbr

--
components: Unicode
messages: 121070
nosy: vbr
priority: normal
severity: normal
status: open
title: updating unicodedata to Unicode 6
type: feature request
versions: Python 2.7, Python 3.1, Python 3.2, Python 3.3
[issue10400] updating unicodedata to Unicode 6
Vlastimil Brom vlastimil.b...@gmail.com added the comment:

Thanks for the clarification; I obviously looked in an inappropriate branch before. Sorry for the noise...

vbr
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Vlastimil Brom vlastimil.b...@gmail.com added the comment:

Maybe I am missing something, but the results in regex seem ok to me: \A is treated like A in a character set; when the test string is changed to A b c, or in the case insensitive search, the A is matched. [\A\s]\w doesn't match the starting a, as it is not followed by any word character:

    >>> for s in [r'\A\w', r'[\A]\w', r'[\A\s]\w']: print regex.findall(s, 'A b c')
    ...
    ['A']
    []
    [' b', ' c']
    >>> for s in [r'\A\w', r'(?i)[\A]\w', r'[\A\s]\w']: print regex.findall(s, 'a b c')
    ...
    ['a']
    []
    [' b', ' c']

In the original re there seems to be a bug/limitation in this regard (\A and also \Z in character sets aren't supported in some combinations...)

vbr
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Vlastimil Brom vlastimil.b...@gmail.com added the comment:

There seems to be a bug in the handling of numbered backreferences in sub() in issue2636-20101102.zip
I believe it would be a fairly new regression, as it would have been noticed rather soon. (tested on Python 2.7; winXP)

    >>> re.sub("([xy])", "-\\1-", "abxc")
    'ab-x-c'
    >>> regex.sub("([xy])", "-\\1-", "abxc")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Python27\lib\regex.py", line 176, in sub
        return _compile(pattern, flags).sub(repl, string, count, pos, endpos)
      File "C:\Python27\lib\regex.py", line 375, in _compile_replacement
        compiled.extend(items)
    TypeError: 'int' object is not iterable

vbr
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Vlastimil Brom vlastimil.b...@gmail.com added the comment:

Sorry for the noise; please forget my previous msg120215. I somehow managed to keep an older version of _regex_core.py along with the new regex.py in the Lib directory, and these are obviously incompatible. After updating the files correctly, the mentioned examples work correctly.

vbr
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Vlastimil Brom vlastimil.b...@gmail.com added the comment:

I tried to give the 64-bit version a try, but I might have encountered a more general difficulty. I tested this on Windows 7 Home Premium (Czech); the system is 64-bit (or so I've hoped so far :-), according to System info: x64-based PC.

I installed the Python 2.7 Windows X86-64 installer from http://www.python.org/download/ which ran ok, but the header in the python shell contains win32:

    Python 2.7 (r27:82525, Jul 4 2010, 07:43:08) [MSC v.1500 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.

Consequently, after copying the respective files from issue2636-20101009.zip I get an import error:

    >>> import regex
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Python_64bit_27\lib\regex.py", line 253, in <module>
        from _regex_core import *
      File "C:\Python_64bit_27\lib\_regex_core.py", line 53, in <module>
        import _regex
    ImportError: DLL load failed: %1 nenÝ platnß aplikace typu Win32.

(The last part of the message is in Czech with broken diacritics: %1 is not a valid Win32 type application.)

Is there something I can do in this case? I'd think the installer would refuse to install 64-bit software on a 32-bit OS or 32-bit architecture, or am I missing something obvious from the naming peculiarities x64, 64bit etc.?

That being said, I probably don't need to use the 64-bit version of python; obviously, it isn't a wide unicode build mentioned earlier, hence

    >>> len(u"\U00010333") # is still:
    2

And I currently don't have special memory requirements, which might be better addressed on a 64-bit system.

If there is something I can do to test regex in this environment, please let me know. On the same machine the 32-bit version is ok:

    Python 2.7 (r27:82525, Jul 4 2010, 09:01:59) [MSC v.1500 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import regex
    >>>

regards
vbr
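Regarding the win32 in the shell header: sys.platform is "win32" on Windows for both 32-bit and 64-bit builds, so it says nothing about the interpreter's bitness. A small sketch of how the actual build can be checked from within Python, using only standard-library calls (the variable name pointer_bits is just for this example):

```python
import platform
import struct
import sys

# Size of a C pointer in this interpreter build: 32 or 64 bits.
pointer_bits = struct.calcsize("P") * 8

print(sys.platform)                 # "win32" on Windows regardless of bitness
print(pointer_bits)                 # bitness of the running interpreter
print(sys.maxsize > 2**32)          # True on 64-bit builds
print(platform.architecture()[0])   # e.g. "32bit" or "64bit"
```

An extension module such as _regex has to be compiled for the same bitness as the interpreter that loads it; a mismatch is one common cause of "DLL load failed" errors of the kind shown above.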
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Vlastimil Brom vlastimil.b...@gmail.com added the comment:

Well, it seemed so to me too. I happened to read the last post from Matthew, msg118243, in the sense that he made some updates which need testing on a 64 bit system (I am unsure whether hardware architecture, OS type, python build or something else was meant); but then it would have been somehow separated as a new directory in the issue2636-20101009.zip, which is not the case. More generally, I was somehow confused about the win32 in the shell header in the mentioned install.

vbr
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Vlastimil Brom vlastimil.b...@gmail.com added the comment:

Sorry for the noise, it seems I can go back to the 32-bit python for now then...

vbr
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Vlastimil Brom vlastimil.b...@gmail.com added the comment:

Well, of course, the surrogates probably shouldn't be handled separately in one module independently of the rest of the standard library. (I actually don't know of any such narrow implementation, although it is mentioned in those unicode guidelines http://unicode.org/reports/tr18/#Supplementary_Characters )

The main surprise on my part was due to the compile error rather than the empty match, as was the case with re; but now I see that it is a consequence of the newly introduced wide unicode notation, and the matching behaviour changed consistently.

(For my part, the workarounds I found seem to be sufficient in the cases where I work with wide unicode; most likely I am not going to compile a wide unicode build on windows myself in the near future :-)

vbr
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Vlastimil Brom vlastimil.b...@gmail.com added the comment:

I like the idea of the general new flag introducing the reasonable, backwards incompatible behaviour; one doesn't have to remember a list of non-standard flags to get these features.

While I recognise that the module probably can't work correctly with wide unicode characters on a narrow python build (py 2.7, win XP in this case), I noticed a difference to re in this regard (it might be based on the absence of the wide unicode literal in the latter).

    >>> re.findall(u"\\U00010337", u"a\U00010337bc")
    []
    >>> re.findall(u"(?i)\\U00010337", u"a\U00010337bc")
    []
    >>> regex.findall(u"\\U00010337", u"a\U00010337bc")
    []
    >>> regex.findall(u"(?i)\\U00010337", u"a\U00010337bc")
    Traceback (most recent call last):
      File "<input>", line 1, in <module>
      File "C:\Python27\lib\regex.py", line 203, in findall
        return _compile(pattern, flags).findall(string, pos, endpos,
      File "C:\Python27\lib\regex.py", line 310, in _compile
        parsed = parsed.optimise(info)
      File "C:\Python27\lib\_regex_core.py", line 1735, in optimise
        if self.is_case_sensitive(info):
      File "C:\Python27\lib\_regex_core.py", line 1727, in is_case_sensitive
        return char_type(self.value).lower() != char_type(self.value).upper()
    ValueError: unichr() arg not in range(0x10000) (narrow Python build)

I.e. re fails to match this pattern (as it actually looks for U00010337 ); regex doesn't recognise the wide unicode as a surrogate pair either, but it also raises an error from narrow unichr. Not sure whether/how it should be fixed, but the difference based on the i-flag seems unusual.

Of course it would be nice if surrogate pairs were interpreted, but I can imagine that it would open a whole can of worms, as this is not thoroughly supported in the builtin unicode either (len, indices, slicing).

I am trying to make wide unicode characters somehow usable in my app, mainly with hacks like extended unichr

    ("\\U"+hex(67)[2:].zfill(8)).decode("unicode-escape")

or likewise for ord

    surrog_ord = (ord(first) - 0xD800) * 0x400 + (ord(second) - 0xDC00) + 0x10000

Actually, using regex, one can work around some of these limitations of len, index or slice using a list form of the string containing surrogates

    >>> regex.findall(ur"(?s)(?:\p{inHighSurrogates}\p{inLowSurrogates})|.", u"ab̷̸̹cd")
    [u'a', u'b', u'\U00010337', u'\U00010338', u'\U00010339', u'c', u'd']

but apparently things like wide unicode literals or character sets (even extending of the shorthands like \w etc.) are much more complicated.

regards,
vbr
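The surrog_ord arithmetic above, written out as a small Python 3 sketch together with a codec-based way of merging pairs inside a longer string. Both function names are invented for this example; on Python 3.3 and later the merge is only needed for strings that already contain explicit surrogate pairs, since non-BMP characters are single code points there.

```python
def surrogate_pair_to_code_point(high, low):
    """Combine a high and a low surrogate into one code point (an int)."""
    if not ("\ud800" <= high <= "\udbff" and "\udc00" <= low <= "\udfff"):
        raise ValueError("not a valid surrogate pair")
    return (ord(high) - 0xD800) * 0x400 + (ord(low) - 0xDC00) + 0x10000


def combine_surrogates(text):
    """Merge every surrogate pair in text into a single character."""
    # Round-trip through UTF-16; "surrogatepass" keeps lone surrogates as-is.
    raw = text.encode("utf-16-le", "surrogatepass")
    return raw.decode("utf-16-le", "surrogatepass")


print(hex(surrogate_pair_to_code_point("\ud800", "\udf37")))  # 0x10337
print(len(combine_surrogates("a\ud800\udf37bc")))             # 4
```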
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Vlastimil Brom vlastimil.b...@gmail.com added the comment:

Not that my opinion matters, but for what it is worth, I find it rather unusual to have to use special flags to get normal (for some definition of normal) behaviour, while keeping defaults that are buggy in some way (like ZEROWIDTH). I would think the backwards compatibility would not be needed under these circumstances - in such probably marginal cases (or is setting global flags at the end, or elsewhere than at the beginning of the pattern, that frequent?).

It seems that with many new features and enhancements for previously impossible patterns, chances are that code using regular expressions in a more advanced way might benefit from reviewing the patterns (where the flags for historical behaviour could also be adjusted if really needed).

Anyway, thanks for the further improvements! (although they broke my custom function, which previously misused the internal data of the regex module for getting the unicode script property (currently unavailable via unicodedata) :-).

Best regards,
vbr
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Vlastimil Brom vlastimil.b...@gmail.com added the comment:
Just some more rather marginal findings; differences between regex and re:

>>> regex.findall(r"[\B]", "aBc")
['B']
>>> re.findall(r"[\B]", "aBc")
[]

(Python 2.7 ... on win32; regex - issue2636-20100912.zip)
I believe regex is more correct here, as uppercase \B doesn't have a special meaning within a set (unlike backspace \b), hence it should be treated as B, but I wanted to mention it as a difference, just in case it would matter.

I also noticed another case where regex is more permissive:

>>> regex.findall(r"[\d-h]", "ab12c-h")
['1', '2', '-', 'h']
>>> re.findall(r"[\d-h]", "ab12c-h")
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "re.pyc", line 177, in findall
  File "re.pyc", line 245, in _compile
error: bad character range

However, there might be an issue in negated sets, where the negation seems to apply to the first shorthand literal only; the rest is taken positively:

>>> regex.findall(r"[^\d-h]", "a^b12c-h")
['-', 'h']

cf. also a simplified pattern, where re seems to work correctly:

>>> regex.findall(r"[^\dh]", "a^b12c-h")
['h']
>>> re.findall(r"[^\dh]", "a^b12c-h")
['a', '^', 'b', 'c', '-']

Or maybe, regardless of the order, in the presence of shorthand literals and normal characters in negated sets, the normal characters are matched positively:

>>> regex.findall(r"[^h\s\db]", "a^b 12c-h")
['b', 'h']
>>> re.findall(r"[^h\s\db]", "a^b 12c-h")
['a', '^', 'c', '-']

Also related to character sets, but possibly different: adding a (redundant) character that also belongs to the shorthand in a negated set seems to somehow confuse the parser:

>>> regex.findall(r"[^b\w]", "a b")
[]
>>> re.findall(r"[^b\w]", "a b")
[' ']
>>> regex.findall(r"[^b\S]", "a b")
[]
>>> re.findall(r"[^b\S]", "a b")
[' ']
>>> regex.findall(r"[^8\d]", "a 1b2")
[]
>>> re.findall(r"[^8\d]", "a 1b2")
['a', ' ', 'b']

I didn't find any relevant tracker issues, sorry if I missed some...
I initially wanted to provide test code additions, but as I am not sure about the intended output in all cases, I am leaving it in this form;
vbr
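The re results quoted in this message can be reproduced with the standard library alone. The snippet below is a small sketch for Python 3; it only exercises the stdlib `re` module, not the regex module under discussion, and the `try`/`except` simply shows how the installed version treats the range that re 2.7 rejected.

```python
import re

text = "a^b12c-h"

# A negated set excludes everything listed in it, shorthands and literals alike.
print(re.findall(r"[^\dh]", text))
print(re.findall(r"[^h\s\db]", "a^b 12c-h"))

# Adding a literal that the shorthand already covers changes nothing.
print(re.findall(r"[^b\w]", "a b"))
print(re.findall(r"[^8\d]", "a 1b2"))

# A shorthand as the start of a range was reported above as an error in re.
try:
    re.compile(r"[\d-h]")
except re.error as exc:
    print("re.error:", exc)
```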
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Vlastimil Brom vlastimil.b...@gmail.com added the comment:
Thanks for the update;
Just a small observation regarding some character ranges and ignorecase, probably irrelevant, but a difference from the current re anyway:

>>> zero2z = u"0123456789:;=?...@abcdefghijklmnopqrstuvwxyz[\]^_`abcdefghijklmnopqrstuvwxyz"
>>> re.findall("(?i)[X-d]", zero2z)
[]
>>> regex.findall("(?i)[X-d]", zero2z)
[u'A', u'B', u'C', u'D', u'X', u'Y', u'Z', u'[', u'\\', u']', u'^', u'_', u'`', u'a', u'b', u'c', u'd', u'x', u'y', u'z']
>>> re.findall("(?i)[B-d]", zero2z)
[u'B', u'C', u'D', u'b', u'c', u'd']
>>> regex.findall("(?i)[B-d]", zero2z)
[u'A', u'B', u'C', u'D', u'E', u'F', u'G', u'H', u'I', u'J', u'K', u'L', u'M', u'N', u'O', u'P', u'Q', u'R', u'S', u'T', u'U', u'V', u'W', u'X', u'Y', u'Z', u'[', u'\\', u']', u'^', u'_', u'`', u'a', u'b', u'c', u'd', u'e', u'f', u'g', u'h', u'i', u'j', u'k', u'l', u'm', u'n', u'o', u'p', u'q', u'r', u's', u't', u'u', u'v', u'w', u'x', u'y', u'z']

It seems that the re module builds the character set using a case-insensitive alphabet in some way. I guess the behaviour of re is buggy here, while regex is ok (tested on py 2.7, Win XPp).
vbr
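One way to reason about what a case-insensitive range "should" match is to model it in plain Python: a character matches if it, its lowercase form, or its uppercase form lies inside the range. The sketch below follows that reading; the helper name `ignorecase_range` is hypothetical, and the test string is assumed to run from "0" to "z" in ASCII order, as the variable name in the message suggests.

```python
def ignorecase_range(lo, hi, text):
    """Return the characters of text that fall in lo..hi once case is ignored."""
    def in_range(ch):
        # Try the character itself and both of its case variants.
        return any(lo <= variant <= hi for variant in (ch, ch.lower(), ch.upper()))
    return [ch for ch in text if in_range(ch)]


# Assumption: the sample string covers ASCII "0" through "z".
zero2z = "".join(chr(c) for c in range(ord("0"), ord("z") + 1))

print("".join(ignorecase_range("X", "d", zero2z)))
print("".join(ignorecase_range("B", "d", zero2z)))
```

Under this model the results agree with the regex output quoted in the message rather than with the re output.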
[issue2986] difflib.SequenceMatcher not matching long sequences
Vlastimil Brom vlastimil.b...@gmail.com added the comment:
I guess I am not supposed to post to python-dev, not being a python developer; hopefully it is appropriate to add a comment here, based only on my current usage of (a modified) difflib.SequenceMatcher.
It seems that the mentions of text comparison in that thread, e.g. http://mail.python.org/pipermail/python-dev/2010-July/101515.html etc., rather imply line-by-line comparison, and possibly character comparison of matched lines. For me the direct character-wise comparison is more useful in most cases. With the popular heuristics disabled the results look pretty good (the script only changes the background colour of the compared texts, based on SequenceMatcher.get_opcodes()).
Just now, I only need to disable the popular check; currently I use a monkey-patched subclass of SequenceMatcher with an extended signature and a modified __chain_b function. cf. http://mail.python.org/pipermail/python-list/2010-June/1247907.html
I would vote for extending the SequenceMatcher API to enable adjustments (leaving the default values as the current ones): enable/disable the popular check, set the thresholds for string length and popular frequency (and eventually other parameters which might be added).
Are there some restrictions on API changes in a library due to a moratorium, even if the default behaviour remains unchanged? Otherwise, what might be the disadvantages of this approach? If the current behaviour is considered appropriate for the original use cases, other uses would also be made possible or easier, only at the cost of learning the meaning of the added parameters (from the enhanced docs, of course).
vbr
--
Python tracker rep...@bugs.python.org http://bugs.python.org/issue2986
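For readers unfamiliar with the character-wise comparison described here, the following is a minimal sketch of driving a display from `get_opcodes()`. It is not the script from the message; the helper name `char_diff` is made up, and a real application would apply colours instead of printing.

```python
from difflib import SequenceMatcher


def char_diff(old, new):
    """Return (tag, old_substring, new_substring) triples for two strings."""
    matcher = SequenceMatcher(None, old, new)
    return [(tag, old[i1:i2], new[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()]


for tag, old_part, new_part in char_diff("abcd", "abXd"):
    # 'equal' spans would stay uncoloured; the other tags mark the differences.
    print(tag, repr(old_part), repr(new_part))
```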
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Vlastimil Brom vlastimil.b...@gmail.com added the comment: Thanks for the prompt fix! It would indeed be nice to see this enhanced re module in the standard library e.g. in 3.2, but I also really appreciate, that also multiple 2.x versions are supported (as my current main usage of this library involves py2-only wx gui). As for the usage statistics, I for one always downloaded the updates from here rather than pypi, but maybe it is not a regular case.
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Vlastimil Brom vlastimil.b...@gmail.com added the comment:
I just noticed some rather strange behaviour in matching character sets or alternations which contain some more advanced unicode characters, if they are in the search pattern together with some simpler ones. The former seem to be ignored and not matched (the original re engine matches all of them);
(win XPh SP3 Czech, Python 2.7; regex issue2636-20100414)

>>> print u"".join(regex.findall(u".", u"eèéêëēěė"))
eèéêëēěė
>>> print u"".join(regex.findall(u"[eèéêëēěė]", u"eèéêëēěė"))
eèéêëē
>>> print u"".join(regex.findall(u"e|è|é|ê|ë|ē|ě|ė", u"eèéêëēěė"))
eèéêëē
>>> print u"".join(re.findall(u"[eèéêëēěė]", u"eèéêëēěė"))
eèéêëēěė
>>> print u"".join(re.findall(u"e|è|é|ê|ë|ē|ě|ė", u"eèéêëēěė"))
eèéêëēěė

Even stranger, if the pattern contains only these higher unicode characters, everything works ok:

>>> print u"".join(regex.findall(u"ē|ě|ė", u"eèéêëēěė"))
ēěė
>>> print u"".join(regex.findall(u"[ēěė]", u"eèéêëēěė"))
ēěė

The characters in question are some accented latin letters (here in ascending codepoints), but it can be other scripts as well.

>>> print regex.findall(u".", u"eèéêëēěė")
[u'e', u'\xe8', u'\xe9', u'\xea', u'\xeb', u'\u0113', u'\u011b', u'\u0117']

The threshold isn't obvious to me; at first I thought that the characters represented as unicode escapes are problematic, whereas those with hexadecimal escapes are ok; however ē - u'\u0113' seems ok too.
(python 3.1 behaves identically:

>>> regex.findall("[eèéêëēěė]", "eèéêëēěė")
['e', 'è', 'é', 'ê', 'ë', 'ē']
>>> regex.findall("[ēěė]", "eèéêëēěė")
['ē', 'ě', 'ė']
)
vbr
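The expected result, as produced by the stdlib `re` module, can be written down as a small self-check. This is only a sketch for Python 3: the sample text is spelled with escapes so that the code points are unambiguous, and it says nothing about how any particular regex build behaves.

```python
import re

# e, è, é, ê, ë, ē, ě, ė in ascending code point order
text = "e\u00e8\u00e9\u00ea\u00eb\u0113\u011b\u0117"

as_set = re.findall("[" + text + "]", text)          # character set form
as_alternation = re.findall("|".join(text), text)    # e|è|é|... form

print([hex(ord(ch)) for ch in text])
print("".join(as_set), "".join(as_alternation))
```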
[issue2986] difflib.SequenceMatcher not matching long sequences
Vlastimil Brom vlastimil.b...@gmail.com added the comment:
I just stumbled on some seemingly different unexpected behaviour of difflib.SequenceMatcher, but it turns out it may have the same cause, i.e. the popular heuristics. I hopefully managed to replicate it on an illustrative sample text, as included in the attached file. (I also mentioned this issue on the python-list http://mail.python.org/pipermail/python-list/2010-April/1241951.html but as there were no replies, I eventually found that this might be a more appropriate place.)
Both strings differ in a minimal way, each having one extra character in a strategic position, which probably hits some pathological case for difflib. Instead of just reporting the insertion and deletion of these single characters (which works well for most cases, with most other positions of the differing characters), the SequenceMatcher decides to delete a large part of the string in between the differences and to insert almost the same text after that.
The attached code simply prints the results of the comparison with the respective tags and substrings. No junk function is used. I get the same results on Python 2.5.4, 2.6.5, 3.1.1 on windows XPp SP3.
I didn't find any plausible mentions of such cases in the documentation, but after some searching I found several reports in the bug tracker mentioning the erroneous output of SequenceMatcher on longer repetitive sequences, besides this http://bugs.python.org/issue2986 e.g.
http://bugs.python.org/issue1711800
http://bugs.python.org/issue4622
http://bugs.python.org/issue1528074
In my case, disabling the popular heuristics as mentioned by John Machin in http://bugs.python.org/issue1528074#msg29269 seems to have solved the problem; with a modified version of difflib containing:

if 0:  # disable popular heuristics: if n >= 200 and len(indices) * 100 > n:
    populardict[elt] = 1
    del indices[:]

the comparison catches the differences in the test strings as expected, i.e. one character addition and deletion only.
It is likely that some other use cases for difflib rely on the popular heuristics, but it also seems useful to have some control over this behaviour, which might not be appropriate in all cases. (The issue seems to be the same in python 2.5, 2.6 and 3.1.)
regards, vbr
--
nosy: +vbr
Added file: http://bugs.python.org/file17001/difflib_test_inq.py
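In Python 3.2 and later, `SequenceMatcher` accepts an `autojunk` argument that switches this popularity heuristic off, so no patched copy of difflib is needed there. The sketch below uses two artificial strings (not the attached test file) in which every repeated character counts as "popular", to show how much the heuristic can change the result.

```python
from difflib import SequenceMatcher

# Artificial input: both strings are longer than 200 characters and consist
# almost entirely of two characters that each occur far more than 1% of the time.
a = "X" + "ab" * 150
b = "ab" * 150 + "Y"

with_heuristic = SequenceMatcher(None, a, b)                      # default: autojunk=True
without_heuristic = SequenceMatcher(None, a, b, autojunk=False)   # heuristic disabled

print("autojunk=True :", with_heuristic.ratio())
print("autojunk=False:", without_heuristic.ratio())
print(without_heuristic.get_opcodes())
```

The default stays unchanged for existing callers; only code that asks for `autojunk=False` pays the extra cost of indexing the popular elements.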
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Vlastimil Brom vlastimil.b...@gmail.com added the comment:
I am not sure about the testsuite for this regex module, but it seems to me that many of the problems reported here probably don't apply to the current builtin re, as they are connected with the new features of regex. Following the suggestion in msg91462, I briefly checked the re testsuite and found it very comprehensive, given the feature set. Of course, most (all?) re tests should apply to regex, but probably not vice versa.
vbr
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Vlastimil Brom vlastimil.b...@gmail.com added the comment:
I just noticed a corner case with the newly introduced grapheme matcher \X, if it is used in a character set:

>>> regex.findall("\X", "abc")
['a', 'b', 'c']
>>> regex.findall("[\X]", "abc")
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "regex.pyc", line 218, in findall
  File "regex.pyc", line 1435, in _compile
  File "regex.pyc", line 2351, in optimise
  File "regex.pyc", line 2705, in optimise
  File "regex.pyc", line 2798, in optimise
  File "regex.pyc", line 2268, in __hash__
AttributeError: '_Sequence' object has no attribute '_key'

It obviously doesn't make much sense to use this universal literal in a character class (the same goes for . in its metacharacter role), and http://www.regular-expressions.info/refunicode.html doesn't mention this possibility either; but the error message might probably be more descriptive, or the pattern might match X or \ and \X (?)
I was originally thinking about the possibility of combining positive and negative character classes, where e.g. \X would be a kind of base; I am not aware of any re engine supporting this, but I eventually found the unicode guidelines for regular expressions, which also cover this: http://unicode.org/reports/tr18/#Subtraction_and_Intersection
It is also a bit surprising that these are all included in Basic Unicode Support: Level 1 (even with arbitrary unions, intersections, differences ...); this suggests that there is probably no implementation available (AFAIK), even on this basic level, according to this guideline.
Among other features on this level, the section http://unicode.org/reports/tr18/#Supplementary_Characters seems useful, especially the handling of the characters beyond \u, also in the form of surrogate pairs as single characters. This might be useful on narrow python builds, but it is possible that there would be an incompatibility with the handling of these data in narrow python itself.
Just some suggestions or rather remarks, as you already implemented many advanced features and are also considering some different approaches ...:-)
vbr
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Vlastimil Brom vlastimil.b...@gmail.com added the comment:
Actually I had that impression too, but I was mainly surprised that these requirements are on the lowest level of the unicode support. Anyway, maybe the relevance of these guidelines for real libraries is lower than I expected. Probably the simpler cases are adequately handled with lookarounds, e.g. (?:\w(?!\p{Greek}))+ and the complex examples like symmetric differences seem to be beyond the normal scope of re anyway.
Personally, I would find the surrogate handling more useful, but I see that it isn't actually the job of the re library, given that the narrow build of python doesn't support indexing, slicing or len of these characters either...
vbr
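The lookaround approach to set subtraction can be illustrated with the stdlib `re` module, which has no `\p{...}` support, by using plain ASCII classes instead. The sketch below matches runs of lowercase letters "minus" the vowels; the class choice is only an example and stands in for the Unicode properties discussed in the thread.

```python
import re

# [a-z] minus [aeiou]: the negative lookahead vetoes the subtracted set
# before the base class is allowed to consume a character.
consonant_run = re.compile(r"(?:(?![aeiou])[a-z])+")

print(consonant_run.findall("rhythm and blues"))
```

Intersection can be sketched the same way with a positive lookahead, e.g. `(?=[a-m])[g-z]` for the letters common to both ranges.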
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Vlastimil Brom vlastimil.b...@gmail.com added the comment:
Thanks, it's indeed a very nice addition to the library...
Just a marginal remark; it seems that the script names also cover some non-BMP characters, whereas the unicode ranges (blocks) contain only BMP ones. http://www.unicode.org/Public/UNIDATA/Blocks.txt
Am I missing something more complex, as to why the 1.. - ..10; ranges weren't included in _BLOCKS? Maybe building these ranges is expensive, in contrast to the rare uses of these properties?
(Not that I am able to reliably test it on my narrow python build on windows, but currently, obviously, e.g. \p{InGothic} gives "undefined property name" whereas \p{Gothic} is accepted.)
vbr
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Vlastimil Brom vlastimil.b...@gmail.com added the comment:
Is the issue2636-20100222.zip archive supposed to be complete? I can't find the rst or html features files in it, nor, more importantly, the py and pyd files for the particular versions.
Anyway, I just skimmed through the regular-expressions.info documentation and found that most features which I missed in the builtin re version seem to be present in the regex module; a few possibly notable exceptions being some unicode features: http://www.regular-expressions.info/unicode.html
Support for unicode script properties might be needlessly complex (maybe unless http://bugs.python.org/issue6331 is implemented).
On the other hand, \X for matching any single grapheme might be useful; according to the mentioned page, the currently working equivalent would be \P{M}\p{M}*
However, I am not sure about the compatibility concerns; it is possible that the modifier characters as part of graphemes might cause some discrepancies in the text indices etc.
A feature where I personally (currently) can't find a use case is \G and continuing matches (but no doubt there would be some cases for this).
regards vbr
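The `\P{M}\p{M}*` idea (a base character followed by any number of combining marks) can be approximated without any regex engine by using `unicodedata`. This is a rough sketch only: the helper name `simple_graphemes` is made up, it treats every character whose general category starts with "M" as a mark, and it does not implement the full grapheme cluster rules of the Unicode standard.

```python
import unicodedata


def simple_graphemes(text):
    """Split text into base characters with their trailing combining marks."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch      # attach the mark to the preceding base
        else:
            clusters.append(ch)     # start a new cluster
    return clusters


sample = "e\u0301a"   # "e" + COMBINING ACUTE ACCENT, then "a"
print(len(sample), len(simple_graphemes(sample)))
```

The difference between the two printed lengths is exactly the index discrepancy the message worries about: code point positions and grapheme positions no longer coincide once combining marks are present.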
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Vlastimil Brom vlastimil.b...@gmail.com added the comment:
Wow, that's what can be called rapid development :-), thanks very much!
I didn't notice before that \G had been implemented already. \X works fine for me; it also maintains the input string indices correctly.
We can use the unicode character properties \p{Letter} and the unicode block properties \p{inBasicLatin}; the script properties like \p{Latin} or \p{IsLatin} return "undefined property name". I guess this would require access to the respective information in unicodedata, where it isn't available now (there also seem to be many more scripts than those mentioned at regular-expressions.info, cf. http://www.unicode.org/Public/UNIDATA/Scripts.txt http://www.unicode.org/Public/UNIDATA/PropertyValueAliases.txt (under # Script (sc)).
vbr
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Vlastimil Brom vlastimil.b...@gmail.com added the comment:
Thanks for fixing the argument positions; unfortunately, it seems there might be some other problem that makes my code work differently than with the builtin re; it seems that in character classes the ignorecase flag is ignored somehow:

>>> regex.findall(r"[ab]", "aB", regex.I)
['a']
>>> re.findall(r"[ab]", "aB", re.I)
['a', 'B']

(The same with the flag set in the pattern.)
Outside of a character class the case seems to be handled normally, or am I missing something?
vbr
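For comparison, this is how the same check looks against the stdlib `re` module in Python 3, with the flag given once as a keyword argument and once inline. Passing `flags` by keyword is a defensive habit assumed here, since it keeps working even if a function gains extra positional parameters.

```python
import re

by_keyword = re.findall(r"[ab]", "aB", flags=re.IGNORECASE)
inline = re.findall(r"(?i)[ab]", "aB")

print(by_keyword, inline)
```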
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Vlastimil Brom vlastimil.b...@gmail.com added the comment:
I just tested the fix for unicode tracebacks and found some possibly weird results (not sure how/whether it should be fixed, as these inputs are indeed rather artificial...). (win XPp SP3 Czech, Python 2.6.4)
Using the cmd console, the output is fine (for the characters it can accept and display):

>>> regex.findall(ur"\p{InBasicLatinĚ}", u"aé")
Traceback (most recent call last):
  ...
  File "C:\Python26\lib\regex.py", line 1244, in _parse_property
    raise error("undefined property name '%s'" % name)
regex.error: undefined property name 'InBasicLatinĚ'

(same result for other distorted property names containing e.g. ěščřžýáíéúůßäëiöüîô ...)
However, in Idle the output differs depending on the characters present:

>>> regex.findall(ur"\p{InBasicLatinÉ}", u"ab c")

yields the expected

  ...
  File "C:\Python26\lib\regex.py", line 1244, in _parse_property
    raise error("undefined property name '%s'" % name)
error: undefined property name 'InBasicLatinÉ'

but

>>> regex.findall(ur"\p{InBasicLatinĚ}", u"ab c")
Traceback (most recent call last):
  ...
  File "C:\Python26\lib\regex.py", line 1244, in _parse_property
    raise error("undefined property name '%s'" % name)
  File "C:\Python26\lib\regex.py", line 167, in __init__
    message = message.encode(sys.stdout.encoding)
  File "C:\Python26\lib\encodings\cp1250.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode character u'\xcc' in position 37: character maps to <undefined>

which might be surprising, as cp1250 should be able to encode Ě; maybe there is some intermediate ascii step?
Using the wxpython pyShell I get its specific encoding error:

>>> regex.findall(ur"\p{InBasicLatinÉ}", u"ab c")
Traceback (most recent call last):
  ...
  File "C:\Python26\lib\regex.py", line 1102, in _parse_escape
    return _parse_property(source, info, in_set, ch)
  File "C:\Python26\lib\regex.py", line 1244, in _parse_property
    raise error("undefined property name '%s'" % name)
  File "C:\Python26\lib\regex.py", line 167, in __init__
    message = message.encode(sys.stdout.encoding)
AttributeError: PseudoFileOut instance has no attribute 'encoding'

(the same for \p{InBasicLatinĚ} etc.)
In python 3.1 in Idle, all of these exceptions are displayed correctly, also in other scripts or with special characters.
Maybe in python 2.x e.g. repr(...) of the unicode error messages could be used in order to avoid these problems, but I don't know what the conventions are in these cases.
Another issue I found here (unrelated to tracebacks) is backslashes or punctuation (except the handled -_) in the property names, which just lead to failed matches and no exceptions about unknown property names:

>>> regex.findall(u"\p{InBasic.Latin}", u"ab c")
[]

I was also surprised by the added pos/endpos parameters, as I used flags as a non-keyword third parameter for the re functions in my code (probably my fault ...)

re.findall(pattern, string, flags=0)
regex.findall(pattern, string, pos=None, endpos=None, flags=0, overlapped=False)

(is there a specific reason for this order, or could it be changed to maintain compatibility with the current re module?)
I hope at least some of these remarks make some sense; thanks for the continued work on this module!
vbr
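Both failures in these tracebacks (a stream without an `encoding` attribute, and a character the target encoding cannot represent) can be avoided by encoding defensively. The following is a sketch in Python 3, not the fix that went into the regex module; `encode_message` and the two stand-in stream classes are hypothetical names introduced for the illustration.

```python
import sys


def encode_message(message, stream=None):
    """Encode an error message for a stream without raising on odd input."""
    stream = sys.stdout if stream is None else stream
    # Some shell replacements expose no usable encoding; fall back to ASCII.
    encoding = getattr(stream, "encoding", None) or "ascii"
    # Unencodable characters become \uXXXX escapes instead of raising.
    return message.encode(encoding, errors="backslashreplace")


class NoEncodingStream:
    """Stand-in for a stream object that lacks an `encoding` attribute."""


class Cp1250Stream:
    """Stand-in for a console using the cp1250 code page."""
    encoding = "cp1250"


message = "undefined property name 'InBasicLatin\u011a'"
print(encode_message(message, NoEncodingStream()))
print(encode_message(message, Cp1250Stream()))
```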
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Vlastimil Brom vlastimil.b...@gmail.com added the comment:
Thanks for the quick update, I confirm the fix for both issues; just another finding (while testing the behaviour mentioned previously - msg91917).
The property name normalisation seems to be much more robust now; I just encountered an encoding error using a rather artificial input (in python 2.5, 2.6):

>>> regex.findall(ur"\p{UppercaseÄÄÄLetter}", u"QW\p{UppercaseÄÄÄLetter}as")
Traceback (most recent call last):
  File "<pyshell#4>", line 1, in <module>
    regex.findall(ur"\p{UppercaseÄÄÄLetter}", u"QW\p{UppercaseÄÄÄLetter}as")
  File "C:\Python25\lib\regex.py", line 213, in findall
    return _compile(pattern, flags).findall(string, overlapped=overlapped)
  File "C:\Python25\lib\regex.py", line 599, in _compile
    parsed = _parse_pattern(source, info)
  File "C:\Python25\lib\regex.py", line 690, in _parse_pattern
    branches = [_parse_sequence(source, info)]
  File "C:\Python25\lib\regex.py", line 702, in _parse_sequence
    item = _parse_item(source, info)
  File "C:\Python25\lib\regex.py", line 710, in _parse_item
    element = _parse_element(source, info)
  File "C:\Python25\lib\regex.py", line 837, in _parse_element
    return _parse_escape(source, info, False)
  File "C:\Python25\lib\regex.py", line 1098, in _parse_escape
    return _parse_property(source, info, in_set, ch)
  File "C:\Python25\lib\regex.py", line 1240, in _parse_property
    raise error("undefined property name '%s'" % name)
error: <unprintable error object>

Not sure how this would be fixed (i.e. whether the error message should be changed to unicode, if applicable).
Not surprisingly, in python 3.1 there is a correct message at the end:

regex.error: undefined property name 'UppercaseÄÄÄLetter'

vbr
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Vlastimil Brom vlastimil.b...@gmail.com added the comment:
I'd like to add another issue I encountered with the latest version of regex - issue2636-20100204.zip
It seems that there is an error in handling some quantifiers in python 2.5; on
Python 2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit (Intel)] on win32
I get e.g.:

>>> regex.findall(ur"q*", u"qqwe")
Traceback (most recent call last):
  File "<pyshell#35>", line 1, in <module>
    regex.findall(ur"q*", u"qqwe")
  File "C:\Python25\lib\regex.py", line 213, in findall
    return _compile(pattern, flags).findall(string, overlapped=overlapped)
  File "C:\Python25\lib\regex.py", line 633, in _compile
    p = _regex.compile(pattern, info.global_flags | info.local_flags, code, info.group_index, index_group)
RuntimeError: invalid RE code

There is the same error for other possibly infinite quantifiers like q+, q{0,} etc. with their non-greedy and possessive variants. On python 2.6 and 3.1 all these patterns work without errors.
vbr
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Vlastimil Brom vlastimil.b...@gmail.com added the comment:
Hi, thanks for the update!
Just for the unlikely case that it hasn't been noticed so far: using python 2.6.4 or 2.5.4 with the regexp build issue2636-20100204.zip I am getting the following easy-to-fix error:

Python 2.6.4 (r264:75708, Oct 26 2009, 08:23:19) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import regex
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python26\lib\regex.py", line 2003
    print "Header file written at %s\n" % os.path.abspath(header_file.name))
    ^
SyntaxError: invalid syntax

After removing the extra closing paren in regex.py, line 2003, everything seems ok.
vbr
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Vlastimil Brom vlastimil.b...@gmail.com added the comment:
I'd like to add some detail to the previous msg91473.
The current behaviour of the character properties looks a bit surprising sometimes:

>>> regex.findall(ur"\p{UppercaseLetter}", u"QW\p{UppercaseLetter}as")
[u'Q', u'W', u'U', u'L']
>>> regex.findall(ur"\p{Uppercase Letter}", u"QW\p{Uppercase Letter}as")
[u'\\p{Uppercase Letter}']
>>> regex.findall(ur"\p{UppercaseÄÄÄLetter}", u"QW\p{UppercaseÄÄÄLetter}as")
[u'\\p{Uppercase\xc4\xc4\xc4Letter}']
>>> regex.findall(ur"\p{UppercaseQQQLetter}", u"QW\p{UppercaseQQQLetter}as")
Traceback (most recent call last):
  File "<pyshell#34>", line 1, in <module>
    regex.findall(ur"\p{UppercaseQQQLetter}", u"QW\p{UppercaseQQQLetter}as")
  ...
  File "C:\Python26\lib\regex.py", line 1178, in _parse_property
    raise error("undefined property name '%s'" % name)
error: undefined property name 'UppercaseQQQLetter'

i.e. potential property names consisting only of ascii letters (+ _, -) are looked up and either used or an error is raised; other names (containing whitespace or non-ascii letters) aren't treated as a special expression, hence they either match their literal value or simply don't match (without errors).
Is this the intended behaviour? I am not sure whether it is maybe defined somewhere, or whether there are some de-facto standards for this...
I guess the space in the property names might be allowed (unless there are some implications for the parser...); otherwise the fallback handling of invalid property names as normal strings is probably the expected way.
vbr
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Vlastimil Brom vlastimil.b...@gmail.com added the comment:
Sorry for the dumb question, which may also suggest that I'm unfortunately unable to contribute at this level (with zero knowledge of C and only a working one of Python): where can I find the sources for tests etc., and how are they eventually to be submitted? Is some other account needed besides the one for bugs.python.org?
Anyway, the long character properties now work in the latest version issue2636-20090810#3.zip
In the mentioned overview http://www.regular-expressions.info/unicode.html there is a statement about the property names: "You may omit the underscores or use hyphens or spaces instead."
While I'm not sure that it is a good thing to have that many variations, they should probably be handled in the same way. Now, whitespace (and also non-ascii characters) in the property name seems to confuse the parser: such names pass silently (don't match anything) and don't throw an exception like "undefined property name". cf.

>>> regex.findall(ur"\p{Dummy Property}", u"abcDEF")
[]
>>> regex.findall(ur"\p{DümmýPrópërtý}", u"abcDEF")
[]
>>> regex.findall(ur"\p{DummyProperty}", u"abcDEF")
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "regex.pyc", line 195, in findall
  File "regex.pyc", line 563, in _compile
  File "regex.pyc", line 642, in _parse_pattern
  File "regex.pyc", line 654, in _parse_sequence
  File "regex.pyc", line 662, in _parse_item
  File "regex.pyc", line 787, in _parse_element
  File "regex.pyc", line 1021, in _parse_escape
  File "regex.pyc", line 1159, in _parse_property
error: undefined property name 'DummyProperty'

vbr
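Treating the spelling variants alike usually comes down to normalising the name before the lookup. The helper below is purely hypothetical (it is not how the regex module does it, and the tiny lookup table is invented for the example); it only illustrates the rule quoted from regular-expressions.info.

```python
def normalise_property_name(name):
    """Drop spaces, underscores and hyphens and ignore case."""
    return "".join(ch for ch in name if ch not in " _-").lower()


# Hypothetical lookup table keyed by normalised names.
KNOWN_PROPERTIES = {"lowercaseletter": "Ll", "uppercaseletter": "Lu"}


def lookup_property(name):
    """Return the short property code, or raise for unknown names."""
    try:
        return KNOWN_PROPERTIES[normalise_property_name(name)]
    except KeyError:
        raise ValueError("undefined property name %r" % name)


for spelling in ("Lowercase_Letter", "Lowercase Letter", "lowercase-letter"):
    print(spelling, "->", lookup_property(spelling))
```

With a lookup like this, a name containing unexpected characters fails loudly with an "undefined property name" error instead of silently matching nothing.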
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Vlastimil Brom vlastimil.b...@gmail.com added the comment:
First, many thanks for this contribution; it's great that the re module gets updated in such a comprehensive way!
I'd like to report an issue with the current version (issue2636-20090804.zip). Using an empty string as the search pattern ends up consuming system resources, and the function neither returns anything nor raises an exception or crashes (within the several minutes I tried). The current re engine simply returns the empty matches on all character boundaries in this case.
I use win XPh SP3; the behaviour is the same on python 2.5.4 and 2.6.2. It should be reproducible with the following simple code:

>>> import re
>>> import regex
>>> re.findall("", "abcde")
['', '', '', '', '', '']
>>> regex.findall("", "abcde")
_

regards vbr
--
nosy: +vbr
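The reference behaviour of the stdlib `re` module for an empty pattern is easy to pin down: one empty match at every position, including the end of the string. A short Python 3 sketch:

```python
import re

matches = re.findall("", "abcde")
positions = [m.start() for m in re.finditer("", "abcde")]

print(matches)     # one empty string per character boundary
print(positions)   # the boundaries themselves
```

An engine that loops forever here is most likely failing to advance past a zero-width match, which is why a check like this makes a useful regression test.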
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Vlastimil Brom vlastimil.b...@gmail.com added the comment:

I'd like to confirm that the error reported above is fixed in issue2636-20090810#2.zip.

While testing the new features a bit, I noticed some irregularity in the handling of Unicode character properties. I tried some of those mentioned at http://www.regular-expressions.info/unicode.html at random, using a simple findall like the one above. It seems that only the short, abbreviated forms of the properties are supported, while the long variants are handled in different ways. Namely, property names containing whitespace or other non-letter characters cause a probably unexpected exception:

    regex.findall(ur"\p{Ll}", u"abcDEF")
    [u'a', u'b', u'c']  # works ok

\p{LowercaseLetter} isn't supported, but seems to be handled, as it throws "error: undefined property name" at the end of the traceback.

The behaviour of

    \p{Lowercase Letter}
    \p{Lowercase_Letter}
    \p{Lowercase-Letter}

probably isn't expected; the traceback is:

    regex.findall(ur"\p{Lowercase_Letter}", u"abcDEF")
    Traceback (most recent call last):
      File "<input>", line 1, in <module>
      File "C:\Python25\lib\regex.py", line 194, in findall
        return _compile(pattern, flags).findall(string)
      File "C:\Python25\lib\regex.py", line 386, in _compile
        parsed = _parse_pattern(source, info)
      File "C:\Python25\lib\regex.py", line 465, in _parse_pattern
        branches = [_parse_sequence(source, info)]
      File "C:\Python25\lib\regex.py", line 477, in _parse_sequence
        item = _parse_item(source, info)
      File "C:\Python25\lib\regex.py", line 485, in _parse_item
        element = _parse_element(source, info)
      File "C:\Python25\lib\regex.py", line 610, in _parse_element
        return _parse_escape(source, info, False)
      File "C:\Python25\lib\regex.py", line 844, in _parse_escape
        return _parse_property(source, ch == "p", here, in_set)
      File "C:\Python25\lib\regex.py", line 983, in _parse_property
        if info.local_flags & IGNORECASE and not in_set:
    NameError: global name 'info' is not defined

Of course, arbitrary strings other than property names are handled identically.

Python 2.6.2 behaves the same as 2.5.4.

vbr
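For reference, the short form \p{Ll} used in the working call names the Unicode general category "Lowercase Letter". The following is a minimal sketch of the same selection using only the standard-library unicodedata module. It is written in Python 3 syntax, the helper name chars_with_category is hypothetical, and it is not meant to show how the regex module implements properties internally.

```python
import unicodedata


def chars_with_category(text, category):
    """Return the characters of text whose Unicode general category equals category."""
    return [ch for ch in text if unicodedata.category(ch) == category]


# "Ll" is the abbreviated name of the general category "Lowercase Letter".
lowercase = chars_with_category("abcDEF", "Ll")
# "Lu" is the abbreviated name of the general category "Uppercase Letter".
uppercase = chars_with_category("abcDEF", "Lu")

print(lowercase)
print(uppercase)
```

The long names discussed in the comment would need an explicit mapping to these two-letter codes; unicodedata itself only reports the abbreviated form.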
[issue5274] sys.exc_info()[1] - different handling from str() and unicode() - py 2.6
Vlastimil Brom vlastimil.b...@gmail.com added the comment: I just want to confirm that the reported issue is the same in Python 2.6.2. Is this really the intended behaviour in Python 2.6 (as opposed to 2.5)? vbr -- components: +Unicode
[issue4281] Idle - incorrectly displaying a character (Latin capital letter sharp s)
Vlastimil Brom vlastimil.b...@gmail.com added the comment: I just wanted to confirm that the bug is neither in Idle nor in Tk, but somewhere in my installed fonts. Now, while testing Python 3.1a1 with a font installed that contains ẞ LATIN CAPITAL LETTER SHARP S (DejaVu), this is clearer. When printing this character with the default font in Idle, I get the wrong glyph mentioned in the report; however, this is corrected immediately after changing the font to DejaVu. Some of the fonts on my system seem to shadow this newly added character with a wrong glyph (also preventing Tk from finding a font that really supports it). Sorry for the needless bug report. vbr
[issue4281] Idle - incorrectly displaying a character (Latin capital letter sharp s)
Vlastimil Brom [EMAIL PROTECTED] added the comment: I can confirm that Tcl displays the same character as Idle, hence it isn't a bug in Python (cf. the screenshot). Unfortunately, I couldn't identify the font used here. I'm not able to modify and recompile Tk, as suggested, but I tried to check the possible serif fonts visually. None of the fonts listed in Word is identical to the one used for capital sharp s in Tcl. (I created a simple app with Tkinter Labels showing the pairs of the characters in question in the potentially similar fonts; while some are really close, in all cases there are various differences in the glyphs.) In any case, I guess this isn't a problem in Python that would have to be examined further. I have quite a lot of fonts installed, and some of them probably behave in non-standard ways. Added file: http://bugs.python.org/file11968/capital-sharp-s-TCL-Idle.jpg
[issue4281] Idle - incorrectly displaying a character (Latin capital letter sharp s)
New submission from Vlastimil Brom [EMAIL PROTECTED]:

While experimenting with the new unicodedata for version 5.1 (many thanks for it!) I discovered some strange behaviour of Idle with regard to a character not available in any font on my system, namely Latin capital letter sharp s - U+1E9E. Cf. the following sessions:

    Python 3.0rc2 (r30rc2:67141, Nov 7 2008, 11:43:46) [MSC v.1500 32 bit (Intel)] on win32
    Type "copyright", "credits" or "license()" for more information.
    ...
    IDLE 3.0rc2
    print("\N{LATIN CAPITAL LETTER SHARP S}")
    ẞ
    print("\N{LATIN CAPITAL LETTER S WITH CEDILLA}")
    Ş
    print("\N{PHAGS-PA LETTER KA}")
    ꡀ
    print("\ufff0")
    hex(ord("ẞ"))
    '0x1e9e'
    hex(ord("Ş"))
    '0x15e'

Of course, the exact view cannot be copied, but basically I see very similar glyphs for the first two characters, while I had expected a square sign or something similar for the first one; this is what I get with another surely unavailable glyph as well as with a non-existent character. See the attached screenshot. However, the characters remain clearly distinguished, as can be seen e.g. after copying them as a parameter of ord(...).

Python 2.6 behaves the same way:

    Python 2.6 (r26:66721, Oct 2 2008, 11:35:03) [MSC v.1500 32 bit (Intel)] on win32
    Type "copyright", "credits" or "license()" for more information.
    ...
    IDLE 2.6
    print u"\N{LATIN CAPITAL LETTER SHARP S}"
    ẞ
    ...

Not that it is very important, but I found it a bit surprising. I'm using WinXPh SP3 Czech.

--
components: IDLE, Tkinter, Unicode
files: idle-capital-sharp-s.jpg
messages: 75613
nosy: vbr
severity: normal
status: open
title: Idle - incorrectly displaying a character (Latin capital letter sharp s)
versions: Python 2.6, Python 3.0
Added file: http://bugs.python.org/file11963/idle-capital-sharp-s.jpg
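The point that the two characters stay distinct regardless of the glyph shown can be checked without any display at all. A minimal sketch using the standard-library unicodedata module (Python 3 syntax; the variable names are illustrative only, and it requires a Python whose Unicode database includes U+1E9E, i.e. Unicode 5.1 or later):

```python
import unicodedata

sharp_s = "\N{LATIN CAPITAL LETTER SHARP S}"
s_cedilla = "\N{LATIN CAPITAL LETTER S WITH CEDILLA}"

# Map each character to its code point and official Unicode name,
# independent of whatever glyph a font happens to draw for it.
described = {ch: (hex(ord(ch)), unicodedata.name(ch)) for ch in (sharp_s, s_cedilla)}

for ch, (code_point, name) in described.items():
    print(code_point, name)
```

This only inspects the character data; which font Tk picks for rendering is a separate matter that the script cannot see.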
[issue1688] Incorrectly displayed non ascii characters in prompt using input() - Python 3.0a2
Vlastimil Brom [EMAIL PROTECTED] added the comment:

While I am not sure about the status of this somewhat older issue, I just wanted to mention that the behaviour remains the same in Python 3.0rc1 (XPh SP3, Czech):

    Python 3.0rc1 (r30rc1:66507, Sep 18 2008, 14:47:08) [MSC v.1500 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    input("ěšč: ")
    ─Ť┼í─Ź: řžý
    'řžý'
    print("ěšč: ")
    ěšč:

Is the patch above supposed to have been committed, or are there still other difficulties? (Not that it is a huge problem for me, as applications dealing with non-ASCII text would probably use a GUI rather than rely on a console, but it's kind of surprising.)
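The garbled prompt shown above has the shape one gets when UTF-8 bytes are interpreted in an OEM code page. A minimal sketch reproducing it, assuming the console used code page 852 (a common default on Czech Windows; the report itself does not state the code page):

```python
prompt = "ěšč: "

# Encode the prompt as UTF-8, then decode those bytes as if they were cp852.
# cp852 as the console code page is an assumption, not stated in the report.
garbled = prompt.encode("utf-8").decode("cp852")
print(garbled)

# Reversing the two steps recovers the original text.
restored = garbled.encode("cp852").decode("utf-8")
print(restored)
```

Under that assumption, the prompt was written to the console as UTF-8 bytes, whereas the text passed to print() went through the console encoding and displayed correctly.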
[issue3815] Python 3.0b3 - Idle doesn't start on win XPh
New submission from Vlastimil Brom [EMAIL PROTECTED]:

Using Python 3.0b3 on Windows XPh SP2 (installed from python-3.0b3.msi), Idle can't be started. When using a Windows shortcut, only an error prompt is shown:

    Subprocess Startup Error: IDLE's subprocess didn't make connection. Either IDLE can't start a subprocess or personal firewall is blocking the connection.

I'm aware of the warning about firewalls in IDLE, but the previous 3.0 betas didn't have that issue with the same settings of the Windows firewall. After directly calling

    C:\Python30\python.exe C:\Python30\Lib\idlelib\idle.py

the same error is shown, but before it another exception is written to the console:

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Python30\lib\idlelib\run.py", line 76, in main
        sockthread.set_daemon(True)
    AttributeError: 'Thread' object has no attribute 'set_daemon'

Regards, vbr

--
components: IDLE
messages: 72843
nosy: vbr
severity: normal
status: open
title: Python 3.0b3 - Idle doesn't start on win XPh
type: crash
versions: Python 3.0
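The AttributeError above says that the Thread object offers no set_daemon method. In current Python releases the daemon flag is exposed as the daemon attribute of threading.Thread. A minimal sketch of that form follows; the worker function is hypothetical, and whether run.py was changed in exactly this way is an assumption, since the report does not show the corrected line.

```python
import threading


def worker():
    """Hypothetical stand-in for the thread's real work."""
    pass


sockthread = threading.Thread(target=worker)
# The daemon flag must be set before the thread is started.
sockthread.daemon = True
sockthread.start()
sockthread.join()

# Check at runtime which spelling the running interpreter provides.
has_set_daemon = hasattr(sockthread, "set_daemon")
print(sockthread.daemon, has_set_daemon)
```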
[issue3815] Python 3.0b3 - Idle doesn't start on win XPh
Vlastimil Brom [EMAIL PROTECTED] added the comment: Sorry for the noise, somehow my search in the bug tracker didn't show this report; after fixing the mentioned line in run.py everything works ok.
[issue1688] Incorrectly displayed non ascii characters in prompt using input() - Python 3.0a2
Vlastimil Brom added the comment: First, sorry about the delayed response. Moreover, I fear that preparing a patch would be far beyond my programming competence; sorry about that.
[issue1110] Problems with the msi installer - python-3.0a1.msi
Vlastimil Brom added the comment: I just installed python-3.0a2 and it works fine for me (Win XPh SP2 Czech; python3 directory C:\Python30). So far I haven't found any problems other than those mentioned in the release notes. Thank you very much for fixing this!
[issue1110] Problems with the msi installer - python-3.0a1.msi
New submission from Vlastimil Brom:

I encountered problems installing Python 3.0 alpha 1 from the MSI installer supplied on the Python download page (python-3.0a1.msi). If the advanced option of the installer (compile .py files to bytecode after installation) is checked, the following message is shown:

    There is a problem with this Windows installer package. A program run as part of the setup did not finish as expected ...

If I don't choose the option to compile files, the installation finishes without any visible errors. The result is the same in both cases, however. After calling python.exe, it shows the version info etc. in the interactive prompt, but it doesn't respond in any useful way, e.g.:

    1+1
    object  : RuntimeError('lost sys.stdout',)
    type    : RuntimeError
    refcount: 4
    address : 00A65BD0
    lost sys.stderr

Running any .py file doesn't work either. My system is Win XPh SP2 Czech (the same happens on Win XPp SP2 Czech). Could the Czech Windows version, language setting, locale, timezone or something similar be the problem (as there were some problems reported with manual compilation on German or Polish Windows systems)? Or am I missing something trivial?

Thanks, Vlastimil Brom

--
components: Windows
messages: 55665
nosy: vbr
severity: normal
status: open
title: Problems with the msi installer - python-3.0a1.msi
versions: Python 3.0
[issue1110] Problems with the msi installer - python-3.0a1.msi
Vlastimil Brom added the comment: The path to the python executable on my system is C:\Python30\python.exe. The path to Program Files is C:\Program Files, but it doesn't matter in this case, I guess. And yes, I use the console window (i.e. the cmd window in Windows); IDLE doesn't run either, like all other .py files (using Python 3.0).