[issue1609] test_re.py fails

2007-12-20 Thread Ismail Donmez
Ismail Donmez added the comment: Hi Martin, actually the only remaining problem is how to get wctype functionality with 8-bit strings; any example is appreciated. This bug itself is invalid because --with-wctype-functions is deprecated. But as I said, I just hope removing that doesn't regress Turkish

[issue1609] test_re.py fails

2007-12-20 Thread Ismail Donmez
Ismail Donmez added the comment: Funnily, print .encode("iso-8859-9").decode("iso-8859-9").upper() works, but print .encode("iso-8859-9").upper().decode("iso-8859-9") does not. Tracker: http://bugs.python.org/issue1609

[issue1609] test_re.py fails

2007-12-20 Thread Guido van Rossum
Guido van Rossum added the comment:
> Funnily, print .encode("iso-8859-9").decode("iso-8859-9").upper() works, but
> print .encode("iso-8859-9").upper().decode("iso-8859-9") not.
You'll have to debug this yourself.

[issue1609] test_re.py fails

2007-12-20 Thread Ismail Donmez
Ismail Donmez added the comment: I guess so, I will no longer spam this bug. Thanks for the suggestions.

[issue1609] test_re.py fails

2007-12-20 Thread Guido van Rossum
Guido van Rossum added the comment: Two easy ways to get the functionality using 8-bit strings, assuming you've already set your locale properly:
(1) If your data is already an 8-bit string (i.e. isinstance(data, str)), simply use data.upper() or data.lower()
(2) If your data is Unicode (i.e.

[issue1609] test_re.py fails

2007-12-20 Thread Martin v. Löwis
Martin v. Löwis added the comment:
> print .encode("iso-8859-9").upper().decode("iso-8859-9") does not
Please get your types right. is a byte string (in Python 2.x).
encode: unicode -> string
decode: string -> unicode
That you still can apply .encode to the byte string is a bug/pitfall in
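The type rule described above (encode goes from text to bytes, decode goes from bytes to text) can be sketched as follows. This is an illustration written for Python 3, where the two types are str and bytes and the pitfall mentioned above no longer exists; it is not code from the thread, and the variable names are assumptions.

```python
# Python 3 sketch: encode is str -> bytes, decode is bytes -> str.
text = "\u0130stanbul"            # starts with LATIN CAPITAL LETTER I WITH DOT ABOVE

raw = text.encode("iso-8859-9")   # str -> bytes (ISO-8859-9 is the Turkish Latin-5 codec)
back = raw.decode("iso-8859-9")   # bytes -> str

# In Python 3 the directions cannot be mixed up: bytes has no .encode
# method and str has no .decode method.
bytes_can_encode = hasattr(raw, "encode")
str_can_decode = hasattr(text, "decode")
```

In Python 2.x, by contrast, calling .encode on a byte string triggered an implicit decode with the default encoding first, which is the pitfall the comment refers to.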

[issue1609] test_re.py fails

2007-12-20 Thread Ismail Donmez
Ismail Donmez added the comment: Tried like , unicode(iii).encode(iso-8859-9).upper() doesn't work, I'll ask on python users list. Thanks.

[issue1609] test_re.py fails

2007-12-19 Thread Ismail Donmez
Ismail Donmez added the comment: The Python README says --with-wctype-functions is deprecated and will be removed in Python 2.6, so I don't think it's worth fixing now. Also, test failures with --with-wctype-functions seem to be known, according to Google. What I wonder is if removing

[issue1609] test_re.py fails

2007-12-19 Thread Ismail Donmez
Ismail Donmez added the comment: Indeed there seem to be regressions:

Python 2.4:

[~] python
Python 2.4.4 (#1, Oct 23 2007, 11:25:50)
[GCC 3.4.6] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.setlocale(locale.LC_ALL, "")
'tr_TR.UTF-8'
>>> print

[issue1609] test_re.py fails

2007-12-19 Thread Ismail Donmez
Ismail Donmez added the comment: The situation is even more complicated. The following calls behave _correctly_ when wctypes is enabled:
print unicode(i).upper() İ
print unicode().lower()
The following doesn't work even if wctypes is enabled:
print unicode().upper()

[issue1609] test_re.py fails

2007-12-19 Thread Guido van Rossum
Guido van Rossum added the comment: Martin, can you have a look at this? Cartman, can you produce a unittest for the correct behavior that only uses ASCII input (using \u instead of just typing Turkish characters)?
--
assignee: -> loewis
nosy: +loewis

[issue1609] test_re.py fails

2007-12-19 Thread Ismail Donmez
Ismail Donmez added the comment: So in conclusion,
- Enabling wctypes makes Turkish support work with \u syntax, breaks unicode()
- Disabling wctypes breaks Turkish support with \u and/or unicode()
Attached test.py tests Turkish corner cases of lower()/upper(). Correct output is which python

[issue1609] test_re.py fails

2007-12-19 Thread Guido van Rossum
Guido van Rossum added the comment: Hm. The test2.py file, when I download it, contains the two bytes \xc4\xb1 in the first unicode() call, and \xc4\xb0 in the second one. This is *always* supposed to produce a UnicodeDecodeError, since it would use the default encoding which is ASCII. So I
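The decoding behaviour described above can be reproduced with a small sketch. It is written for Python 3 (explicit bytes.decode instead of the Python 2 unicode() call with its implicit default encoding), and the helper name decode_or_none is hypothetical, introduced only for illustration.

```python
# The two byte pairs mentioned above are the UTF-8 encodings of the
# Turkish dotless i (U+0131) and capital I with dot above (U+0130).
DOTLESS_I_UTF8 = b"\xc4\xb1"
DOTTED_CAPITAL_I_UTF8 = b"\xc4\xb0"


def decode_or_none(raw, encoding):
    """Decode raw bytes, returning None if they are invalid in that encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# With ASCII (the Python 2 default encoding) both decodes fail,
# because bytes >= 0x80 are not valid ASCII.
ascii_result = decode_or_none(DOTLESS_I_UTF8, "ascii")

# With UTF-8 the same bytes decode to the expected code points.
utf8_lower = decode_or_none(DOTLESS_I_UTF8, "utf-8")
utf8_upper = decode_or_none(DOTTED_CAPITAL_I_UTF8, "utf-8")
```

This matches the observation that the calls can only succeed when the default encoding has been changed away from ASCII.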

[issue1609] test_re.py fails

2007-12-19 Thread Ismail Donmez
Ismail Donmez added the comment: Replacing Turkish characters with hex versions in test2.py still results in UnicodeDecodeError and works with python 2.4.

[issue1609] test_re.py fails

2007-12-19 Thread Guido van Rossum
Guido van Rossum added the comment:
> Replacing Turkish characters with hex versions in test2.py still results in
> UnicodeDecodeError and works with python 2.4.
I'm hoping Martin can confirm this, but I suspect that this is due to a tightening of the rules for converting from 8-bit strings to

[issue1609] test_re.py fails

2007-12-19 Thread Ismail Donmez
Ismail Donmez added the comment: OK, that was because we had modified the default encoding in Lib/site.py to be utf-8. Sorry! The only problem left is that the last 2 conversions in test.py give wrong results when wctypes is disabled, that is: print u"\u0069".upper() should give \u0130 (LATIN CAPITAL

[issue1609] test_re.py fails

2007-12-19 Thread Guido van Rossum
Guido van Rossum added the comment:
> print u"\u0069".upper() should give \u0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE)
> print u"\u0049".lower() should give \u0131 (LATIN SMALL LETTER DOTLESS I)
> These transformations work fine with python2.5 when --with-wctype-functions is used.
I think

[issue1609] test_re.py fails

2007-12-19 Thread Ismail Donmez
Ismail Donmez added the comment: But it should be affected by locale, that's the point of the locale.setlocale call. This is how libc's wc functions behave.

[issue1609] test_re.py fails

2007-12-19 Thread Guido van Rossum
Guido van Rossum added the comment:
> But it should be affected by locale, thats the point of locale.setlocale call.
> This is how libc's wc functions behave.
No, the locale should only affect 8-bit string operations, never unicode operations.
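The rule stated above, that Unicode case mapping does not depend on the locale, can be checked with a short sketch. It is written for Python 3, where str is the Unicode type; the helper name upper_under_locale is hypothetical, and the Turkish locale may not be installed on a given system, which the code tolerates.

```python
import locale


def upper_under_locale(text, locale_name):
    """Return text.upper() after attempting to switch LC_CTYPE to locale_name.

    The previous LC_CTYPE setting is restored afterwards. If the requested
    locale is not available, the call proceeds under the current locale.
    """
    saved = locale.setlocale(locale.LC_CTYPE)
    try:
        try:
            locale.setlocale(locale.LC_CTYPE, locale_name)
        except locale.Error:
            pass  # locale not installed; the Unicode mapping is the same anyway
        return text.upper()
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


# Unicode case mapping gives the same answer in both cases:
# "i" maps to plain ASCII "I", not to U+0130.
in_turkish = upper_under_locale("i", "tr_TR.UTF-8")
in_c = upper_under_locale("i", "C")
```

Getting the Turkish mapping therefore needs an explicit, language-aware step rather than a locale switch.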

[issue1609] test_re.py fails

2007-12-19 Thread Ismail Donmez
Ismail Donmez added the comment: OK, then what is the suggested way to get back the Turkish way of doing upper/lower on i and I?

[issue1609] test_re.py fails

2007-12-19 Thread Guido van Rossum
Guido van Rossum added the comment:
> Ok then what is the suggested way to get back the Turkish way of doing
> upper/lower on i I ?
That's a question for Martin von Loewis. I suppose you could use 8-bit strings exclusively. Or you could use .translate() with a custom dict.
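The .translate() suggestion above could look roughly like the following sketch. It is written for Python 3, where str is Unicode (in the Python 2.5 of this thread the same idea would apply to unicode objects), and the names TR_UPPER, TR_LOWER, tr_upper and tr_lower are illustrative, not taken from the discussion.

```python
# Translation tables for the two Turkish letter pairs that differ from the
# default Unicode case mapping: i <-> U+0130 and U+0131 <-> I.
TR_UPPER = {ord("i"): "\u0130", ord("\u0131"): "I"}
TR_LOWER = {ord("I"): "\u0131", ord("\u0130"): "i"}


def tr_upper(text):
    """Uppercase text using Turkish rules for i and dotless i."""
    # Map the special letters first, then let the default mapping do the rest.
    return text.translate(TR_UPPER).upper()


def tr_lower(text):
    """Lowercase text using Turkish rules for I and dotted capital I."""
    return text.translate(TR_LOWER).lower()
```

For example, tr_upper("istanbul") starts with U+0130 and tr_lower("ISPARTA") starts with U+0131, while all other letters follow the default mapping. Translating before calling upper() or lower() keeps the special cases out of the default mapping's hands.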

[issue1609] test_re.py fails

2007-12-19 Thread Martin v. Löwis
Martin v. Löwis added the comment: I think too many issues get mixed in this report. I would like to ignore all but one issue, but I don't understand what the one issue is that this report should deal with. cartman, when you compare Python 2.4 and 2.5, could it be that the 2.4 Python was

[issue1609] test_re.py fails

2007-12-17 Thread Guido van Rossum
Guido van Rossum added the comment: Focus on how using --with-wctype-functions changes things and how this could affect the regex implementation. (I wouldn't be surprised if the other failing tests were due to the regex bugs.)

[issue1609] test_re.py fails

2007-12-14 Thread Ismail Donmez
Ismail Donmez added the comment: Any ideas/comments on how to move forward with this? Thanks, ismail

[issue1609] test_re.py fails

2007-12-13 Thread Ismail Donmez
New submission from Ismail Donmez: Using python 2.5 revision 59479 from release25-maint branch,
[~/python-2.5] LD_LIBRARY_PATH=/home/cartman/python-2.5: ./python ./Lib/test/test_re.py
test_anyall (__main__.ReTests) ... ok
test_basic_re_sub (__main__.ReTests) ... ok
test_bigcharset

[issue1609] test_re.py fails

2007-12-13 Thread Guido van Rossum
Guido van Rossum added the comment: Can't reproduce. Like before, what platform, compiler etc.? Does using ./configure --with-pydebug make a difference? What's the LD_LIBRARY_PATH for?
--
nosy: +gvanrossum

[issue1609] test_re.py fails

2007-12-13 Thread Ismail Donmez
Ismail Donmez added the comment: gcc 4.3, Linux 2.6.18, 32bit. Without LD_LIBRARY_PATH it would use the system libraries and not the compiled ones, which is not wanted anyway. Configure line used is (damn, I forgot to specify this before, sorry)
--with-fpectl \
--enable-shared \
--enable-ipv6

[issue1609] test_re.py fails

2007-12-13 Thread Amaury Forgeot d'Arc
Amaury Forgeot d'Arc added the comment:
>> Is GCC 4.3 released yet?
> Not yet but soon, it's less buggy compared to 4.1 and 4.2 at the moment.
Not quite yet, gcc 4.3 had a big inlining bug that was just corrected two weeks ago: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33434 You may have

[issue1609] test_re.py fails

2007-12-13 Thread Ismail Donmez
Ismail Donmez added the comment:
> Not quite yet, gcc 4.3 had a big inlining bug that was just corrected two weeks ago: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33434
> You may have encountered this bug, or another similar one...
Two weeks ago is too old for me, I am using SVN snapshot from

[issue1609] test_re.py fails

2007-12-13 Thread Ismail Donmez
Ismail Donmez added the comment: Removing --with-wctype-functions altogether fixes the following regression tests: test_codecs, test_re, test_ucn, test_unicodedata

[issue1609] test_re.py fails

2007-12-13 Thread Ismail Donmez
Ismail Donmez added the comment: Remove test_ucn from the list, it still fails but it's for another bug report.