Ismail Donmez added the comment:
Hi Martin,
Actually, the only problem is how to get wctype functionality with
8-bit strings; any example is appreciated.
This bug itself is invalid because --with-wctype-functions is
deprecated. But as I said I just hope removing that doesn't regress
Turkish
Ismail Donmez added the comment:
Funnily,
print .encode(iso-8859-9).decode(iso-8859-9).upper()
works, but
print .encode(iso-8859-9).upper().decode(iso-8859-9)
not.
__
Tracker [EMAIL PROTECTED]
http://bugs.python.org/issue1609
Guido van Rossum added the comment:
> Funnily,
> print .encode(iso-8859-9).decode(iso-8859-9).upper()
> works, but
> print .encode(iso-8859-9).upper().decode(iso-8859-9)
> not.
You'll have to debug this yourself.
Ismail Donmez added the comment:
I guess so, I will no longer spam this bug. Thanks for the suggestions.
Guido van Rossum added the comment:
Two easy ways to get the functionality using 8-bit strings, assuming
you've already set your locale properly:
(1) If your data is already an 8-bit string (i.e. isinstance(data,
str)), simply use data.upper() or data.lower()
(2) If your data is Unicode (i.e.
Martin v. Löwis added the comment:
> print .encode(iso-8859-9).upper().decode(iso-8859-9)
> does not
Please get your types right. is a byte string (in Python 2.x).
encode: unicode -> string
decode: string -> unicode
That you can still apply .encode to the byte string is a bug/pitfall in
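A rough sketch of the type distinction described above, written for Python 3 as an assumption (the thread is about Python 2, where the two types are called unicode and str and 8-bit case mapping follows the C locale; in Python 3 they are str and bytes, and bytes.upper() only touches ASCII letters). The sample character is picked for illustration, since the string literals in the quoted messages did not survive.

```python
# Python 3 sketch; the sample character is illustrative, not from the thread.
text = "\u015f"                      # LATIN SMALL LETTER S WITH CEDILLA

raw = text.encode("iso-8859-9")      # encode: text -> bytes
assert isinstance(raw, bytes)

# Decode first, then case-map: the mapping follows Unicode rules.
upper_after_decode = raw.decode("iso-8859-9").upper()

# Case-map the bytes first, then decode: only ASCII letters are changed,
# so the non-ASCII byte passes through untouched.
upper_before_decode = raw.upper().decode("iso-8859-9")

print(repr(upper_after_decode), repr(upper_before_decode))
```

The order of the two calls matters because each one operates on a different type.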
Ismail Donmez added the comment:
I tried
unicode(iii).encode(iso-8859-9).upper()
but it doesn't work. I'll ask on the Python users list. Thanks.
Ismail Donmez added the comment:
The Python README says --with-wctype-functions is deprecated and will be
removed in Python 2.6, so I don't think it's worth fixing now. Also, test
failures with --with-wctype-functions seem to be known, according to
Google.
What I wonder if removing
Ismail Donmez added the comment:
Indeed, there seem to be regressions:
Python 2.4:
[~] python
Python 2.4.4 (#1, Oct 23 2007, 11:25:50)
[GCC 3.4.6] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.setlocale(locale.LC_ALL,)
'tr_TR.UTF-8'
>>> print
Ismail Donmez added the comment:
The situation is even more complicated: the following calls behave
_correctly_ when wctypes is enabled:
print unicode(i).upper()
İ
print unicode().lower()
The following doesn't work even if wctypes is enabled:
print unicode().upper()
Guido van Rossum added the comment:
Martin, can you have a look at this?
Cartman, can you produce a unittest for the correct behavior that only
uses ASCII input (using \u instead of just typing Turkish characters)?
--
assignee: - loewis
nosy: +loewis
Ismail Donmez added the comment:
So in conclusion,
- Enabling wctypes makes Turkish support work with \u syntax, but breaks
unicode()
- Disabling wctypes breaks Turkish support with \u and/or unicode()
The attached test.py tests Turkish corner cases of lower()/upper(). Correct
output is which python
Guido van Rossum added the comment:
Hm. The test2.py file, when I download it, contains the two bytes
\xc4\xb1 in the first unicode() call, and \xc4\xb0 in the second
one. This is *always* supposed to produce a UnicodeDecodeError, since
it would use the default encoding which is ASCII. So I
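A small sketch of the decoding behavior described above, in Python 3 as an assumption (there the conversion is an explicit bytes.decode() rather than Python 2's implicit unicode() call with the default encoding). The byte values are the ones quoted in the comment; the variable names are made up for illustration.

```python
# The two byte pairs mentioned above; each is the UTF-8 form of one character.
dotless_i_bytes = b"\xc4\xb1"
dotted_capital_i_bytes = b"\xc4\xb0"

# Decoding non-ASCII bytes with the ASCII codec must fail.
try:
    dotless_i_bytes.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Naming the real encoding explicitly succeeds.
dotless_i = dotless_i_bytes.decode("utf-8")
dotted_capital_i = dotted_capital_i_bytes.decode("utf-8")

print(ascii_ok, repr(dotless_i), repr(dotted_capital_i))
```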
Ismail Donmez added the comment:
Replacing Turkish characters with hex versions in test2.py still results
in UnicodeDecodeError and works with python 2.4.
Guido van Rossum added the comment:
> Replacing Turkish characters with hex versions in test2.py still results
> in UnicodeDecodeError and works with python 2.4.
I'm hoping Martin can confirm this, but I suspect that this is due to
a tightening of the rules for converting from 8-bit strings to
Ismail Donmez added the comment:
OK, that was because we had modified the default encoding in Lib/site.py to
be utf-8. Sorry!
The only problem left is that the last two conversions in test.py give wrong
results when wctypes is disabled, that is:
print u"\u0069".upper()
should give \u0130 (LATIN CAPITAL
Guido van Rossum added the comment:
> print u"\u0069".upper()
> should give \u0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE)
> print u"\u0049".lower()
> should give \u0131 (LATIN SMALL LETTER DOTLESS I)
> These transformations work fine with python2.5 when
> --with-wctype-functions is used.
I think
Ismail Donmez added the comment:
But it should be affected by locale; that's the point of the
locale.setlocale call. This is how libc's wc functions behave.
Guido van Rossum added the comment:
> But it should be affected by locale, thats the point of locale.setlocale
> call. This is how libc's wc functions behave.
No, the locale should only affect 8-bit string operations, never
unicode operations.
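To illustrate the rule stated above, here is a hedged sketch in Python 3 (an assumption; the thread concerns Python 2). It tries to switch LC_CTYPE to a Turkish locale and then case-maps a text string. The helper name case_under_locale and the locale name passed to it are illustrative, and the locale may not be installed on a given system, in which case the current one is kept.

```python
import locale

def case_under_locale(text, locale_name):
    """Return (upper, lower) of text after trying to activate locale_name."""
    saved = locale.setlocale(locale.LC_CTYPE)   # query the current setting
    try:
        try:
            locale.setlocale(locale.LC_CTYPE, locale_name)
        except locale.Error:
            pass  # locale not available here; carry on with the current one
        return text.upper(), text.lower()
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # always restore

upper_i, _ = case_under_locale("\u0069", "tr_TR.UTF-8")
_, lower_I = case_under_locale("\u0049", "tr_TR.UTF-8")
print(repr(upper_i), repr(lower_I))
```

Text case mapping gives the plain ASCII pair either way, which is the locale-independent behavior described above.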
Ismail Donmez added the comment:
OK, then what is the suggested way to get back the Turkish way of doing
upper/lower on i and I?
Guido van Rossum added the comment:
> Ok then what is the suggested way to get back the Turkish way of doing
> upper/lower on i I ?
That's a question for Martin von Loewis. I suppose you could use 8-bit
strings exclusively. Or you could use .translate() with a custom dict.
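One possible shape of the .translate() suggestion, sketched in Python 3 as an assumption (in Python 2 the same idea would apply to unicode objects). The table and function names are hypothetical and not taken from the thread; the mappings are the Turkish i/I pairs discussed in this report.

```python
# Hypothetical helpers: remap the Turkish-specific letters first, then let
# the default case mapping handle everything else.
TR_UPPER = {ord("i"): "\u0130",      # i -> LATIN CAPITAL LETTER I WITH DOT ABOVE
            ord("\u0131"): "I"}      # dotless i -> I
TR_LOWER = {ord("I"): "\u0131",      # I -> LATIN SMALL LETTER DOTLESS I
            ord("\u0130"): "i"}      # dotted capital I -> i

def tr_upper(text):
    """Upper-case text using Turkish rules for i and dotless i."""
    return text.translate(TR_UPPER).upper()

def tr_lower(text):
    """Lower-case text using Turkish rules for I and dotted capital I."""
    return text.translate(TR_LOWER).lower()

print(repr(tr_upper("istanbul")), repr(tr_lower("ISPARTA")))
```

Translating before calling upper() or lower() keeps the default mapping from touching the four special letters.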
Martin v. Löwis added the comment:
I think too many issues get mixed in this report. I would like to ignore
all but one issue, but I don't understand what the one issue is that
this report should deal with.
cartman, when you compare Python 2.4 and 2.5, could it be that the 2.4
Python was
Guido van Rossum added the comment:
Focus on how using --with-wctype-functions changes things and how this
could affect the regex implementation. (I wouldn't be surprised if the
other failing tests were due to the regex bugs.)
Ismail Donmez added the comment:
Any ideas/comments on how to move forward with this?
Thanks,
ismail
New submission from Ismail Donmez:
Using python 2.5 revision 59479 from release25-maint branch,
[~/python-2.5] LD_LIBRARY_PATH=/home/cartman/python-2.5: ./python
./Lib/test/test_re.py
test_anyall (__main__.ReTests) ... ok
test_basic_re_sub (__main__.ReTests) ... ok
test_bigcharset
Guido van Rossum added the comment:
Can't reproduce.
Like before, what platform, compiler etc.? Does using ./configure
--with-pydebug make a difference? What's the LD_LIBRARY_PATH for?
--
nosy: +gvanrossum
Ismail Donmez added the comment:
gcc 4.3, Linux 2.6.18, 32bit.
Without LD_LIBRARY_PATH it would use the system libraries rather than the
freshly compiled ones, which is not what I want.
The configure line used is (damn, I forgot to specify this before, sorry):
--with-fpectl \
--enable-shared \
--enable-ipv6
Amaury Forgeot d'Arc added the comment:
>> Is GCC 4.3 released yet?
> Not yet but soon, its less buggy compared to 4.1 and 4.2
> at the moment.
Not quite yet, gcc 4.3 had a big inlining bug that was just corrected
two weeks ago:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33434
You may have
Ismail Donmez added the comment:
> Not quite yet, gcc 4.3 had a big inlining bug that was just corrected
> two weeks ago:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33434
> You may have encountered this bug, or another similar one...
Two weeks ago is too old for me, I am using SVN snapshot from
Ismail Donmez added the comment:
Removing --with-wctype-functions altogether fixes the following regression tests:
test_codecs
test_re
test_ucn
test_unicodedata
Ismail Donmez added the comment:
Remove test_ucn from the list; it still fails, but that's for another bug
report.