[issue12735] request full Unicode collation support in std python library

2020-04-01 Thread Ahmad Azizi


Ahmad Azizi  added the comment:

No, this is not an OS-dependent issue. Python does not use Unicode collation 
for sorting; it compares strings by code point (the same order as their UTF-8 
bytes).

--
versions:  -Python 3.4

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue12735] request full Unicode collation support in std python library

2020-04-01 Thread Matej Cepl

Matej Cepl  added the comment:

Isn’t this done by the system? It feels like barking up the wrong tree.

--




[issue12735] request full Unicode collation support in std python library

2020-03-31 Thread Ahmad Azizi

Ahmad Azizi  added the comment:

Remember, sorting standard Persian (Farsi) characters does not work properly 
with the current implementation of Python 3.x.
As a result, Python is probably unable to sort properly in some other 
languages as well.

Here is the correct order of the alphabet in Persian:

 "آ","ا","ب","پ","ت","ث","ج","چ","ح","خ",   
 "د","ذ","ص","ض","ط","ظ","ع","غ","ف","ق",   
 "ک","گ","ك","ل","م","ن","و","ه","ی","ي",

After sorting using sorted():

آ, ا, ب, ت, ث, ج, ح, خ, د, ذ, ص, ض, ط, ظ, ع, غ, ف, ق, ك, ل, م, ن, ه, و, ي, پ, 
چ, ک, گ, ی,
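
The result above is consistent with sorted() comparing strings by code point: 
the Persian-specific letters have higher code points than the basic Arabic 
letters, so they end up last. A minimal sketch, assuming a handful of letters 
taken from the list above; the explicit-order key is only an illustrative 
workaround with a hypothetical helper name, not Unicode collation:

```python
# A few letters from the alphabet listed above, deliberately shuffled.
letters = [
    "\u067e",  # ARABIC LETTER PEH
    "\u0628",  # ARABIC LETTER BEH
    "\u06cc",  # ARABIC LETTER FARSI YEH
    "\u064a",  # ARABIC LETTER YEH
    "\u06a9",  # ARABIC LETTER KEHEH
    "\u0643",  # ARABIC LETTER KAF
]

# sorted() on str compares code points, not linguistic order.
by_code_point = sorted(letters)
print([f"U+{ord(c):04X}" for c in by_code_point])

# Illustrative workaround: rank letters by their position in an explicit
# alphabet (only a fragment here, in the order given in the message above).
alphabet = ["\u0628", "\u067e", "\u06a9", "\u0643", "\u06cc", "\u064a"]
rank = {ch: i for i, ch in enumerate(alphabet)}

def explicit_order_key(word):
    # Characters missing from the table sort after known ones, by code point.
    return [(rank.get(ch, len(rank)), ch) for ch in word]

by_alphabet = sorted(letters, key=explicit_order_key)
print(by_alphabet)
```

A lookup table like this has to be maintained by hand for every language, 
which is the gap a real collation library is meant to fill.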

--
nosy: +Ahmad Azizi




[issue12735] request full Unicode collation support in std python library

2018-04-13 Thread Matej Cepl

Change by Matej Cepl :


--
nosy: +mcepl




[issue12735] request full Unicode collation support in std python library

2018-01-21 Thread Greg Lindahl

Change by Greg Lindahl :


--
nosy: +wumpus




[issue12735] request full Unicode collation support in std python library

2013-07-10 Thread Terry J. Reedy

Changes by Terry J. Reedy :


--
versions: +Python 3.4 -Python 3.3




[issue12735] request full Unicode collation support in std python library

2011-08-26 Thread Tom Christiansen

Tom Christiansen  added the comment:

Guido van Rossum  wrote
   on Fri, 26 Aug 2011 21:55:03 -: 

> I know I sound like NIH, but I'm always reluctant to add a big 3rd
> party lib like ICU to the permanent dependencies of all future Python
> distros.  If people want to use ICU they already can.  OTOH I don't
> have a better idea. :-(

I know exactly what you mean.  I would not want to push that on anyone,
being dependent on a gigantic 3rd-party module.  I just tried to answer
the question.  The only two full UCA implementations I know of are ICU's
and Perl's, which does not use ICU (since we're UTF-8, etc).

I just wish Python had Unicode collation, is all.

--tom

PS: (I haven't had good luck with the ICU bindings in 3.2.)

--




[issue12735] request full Unicode collation support in std python library

2011-08-26 Thread Tom Christiansen

Tom Christiansen  added the comment:

I should probably mention the importance in the design of a UCA module of
being able to specify which UCA version number you want it to behave like
in case you plan to override some of the DUCET entries.  That way if you
run under a later UCA with different DUCET weights, your own tailorings will
still make sense.  If you don't do this, your collation tailorings can break 
in a new release of the UCA.
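
To illustrate the versioning point, here is a toy sketch. The weight tables, 
the version labels "toy-1" and "toy-2", and the make_sort_key helper are all 
made up for this example; real DUCET data and a real UCA module would look 
quite different:

```python
# Toy base weight tables keyed by a made-up version label.
BASE_TABLES = {
    "toy-1": {"a": 10, "b": 20, "c": 30},
    "toy-2": {"a": 16, "b": 20, "c": 30},  # "a" was reweighted later
}

def make_sort_key(version, overrides=None):
    """Build a sort key from one base table plus caller-supplied overrides."""
    weights = dict(BASE_TABLES[version])
    weights.update(overrides or {})

    def sort_key(text):
        return [weights[ch] for ch in text]

    return sort_key

# A tailoring written against "toy-1": place "c" between "a" and "b".
tailoring = {"c": 15}

pinned = make_sort_key("toy-1", tailoring)
unpinned = make_sort_key("toy-2", tailoring)

print(sorted(["b", "c", "a"], key=pinned))    # the intended order
print(sorted(["b", "c", "a"], key=unpinned))  # "c" now lands before "a"
```

Pinning the base table keeps the override meaningful: the weight 15 only 
means "between a and b" relative to the table it was written against.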

--tom

--




[issue12735] request full Unicode collation support in std python library

2011-08-26 Thread Tom Christiansen

Tom Christiansen  added the comment:

Raymond Hettinger  added the comment:

> I would like to be involved in the design of the API for a UCA module
> and its routines for loading Unicode Collation Element Tables (not
> making the mistake of using global state like the locale module does).

Is this the problem where a locale is global to a process (or thread)?

The way I'm used to using the UCA module in Perl, that's never a problem,
because it's completely object-oriented.  There's no global state.  You 
instantiate a collator object with all the state it needs, like

collation_level
upper_before_lower
backwards_levels
normalization
override_CJK
override_Hangul
katakana_before_hiragana
variable
locale
preprocess

And then you use that object for all your collation needs, including
not just sorting but also string comparison and even searches.

For example, you could instantiate one collator object with its level set
to one, meaning it compares only base alphanumerics and ignores diacritics,
case, and nonletters, and a second with the defaults so that it uses all four
levels, or a different normalization.  I have on occasion had more than one
collator object around at once, each with its own locale, for instance to
compare how different locales order the same strings.
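
A rough Python sketch of that object-oriented shape follows. ToyCollator and 
its level parameter are hypothetical names, and the key it builds is a crude 
three-level approximation built on unicodedata, not an implementation of the 
UCA or of the Perl module's options:

```python
import unicodedata
from dataclasses import dataclass

@dataclass(frozen=True)
class ToyCollator:
    """Hypothetical collator object; all state lives on the instance."""
    level: int = 3  # 1 = base letters, 2 = + accents, 3 = + case

    def sort_key(self, text):
        decomposed = unicodedata.normalize("NFD", text)
        # Primary: base characters only, case folded.
        base = "".join(
            c for c in decomposed if not unicodedata.combining(c)
        ).casefold()
        key = [base]
        if self.level >= 2:
            # Secondary: the combining marks that were stripped above.
            key.append("".join(c for c in decomposed if unicodedata.combining(c)))
        if self.level >= 3:
            # Tertiary: swapcase makes lowercase sort before uppercase.
            key.append(text.swapcase())
        return tuple(key)

    def compare(self, a, b):
        ka, kb = self.sort_key(a), self.sort_key(b)
        return (ka > kb) - (ka < kb)

# Two independent collators coexist; neither touches global state.
primary = ToyCollator(level=1)
full = ToyCollator()

words = ["résumé", "Resume", "resume"]
print(primary.compare(words[0], words[1]))  # equal at level 1
print(sorted(words, key=full.sort_key))
```

The same object serves sorting (via sort_key) and pairwise comparison (via 
compare), which mirrors the usage described above.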

--tom

--




[issue12735] request full Unicode collation support in std python library

2011-08-26 Thread Raymond Hettinger

Raymond Hettinger  added the comment:

I would like to be involved in the design of the API for a UCA module and its 
routines for loading Unicode Collation Element Tables (not making the mistake 
of using global state like the locale module does).
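
For reference, a small sketch of the global-state behaviour being referred to, 
using only the standard locale module. The "C" locale and the ASCII sample 
words are chosen here just so the snippet runs anywhere:

```python
import locale

words = ["banana", "Apple", "cherry"]

# Remember the current process-wide collation setting so it can be restored.
previous = locale.setlocale(locale.LC_COLLATE)
try:
    # setlocale changes collation for the whole process, not just this call.
    locale.setlocale(locale.LC_COLLATE, "C")
    in_c_locale = sorted(words, key=locale.strxfrm)
finally:
    locale.setlocale(locale.LC_COLLATE, previous)

print(in_c_locale)
```

Because locale.strxfrm reads whatever LC_COLLATE is set at call time, any 
other code that calls setlocale in between can change the resulting order.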

--
nosy: +rhettinger




[issue12735] request full Unicode collation support in std python library

2011-08-26 Thread Guido van Rossum

Guido van Rossum  added the comment:

I know I sound like NIH, but I'm always reluctant to add a big 3rd party lib 
like ICU to the permanent dependencies of all future Python distros.  If people 
want to use ICU they already can.  OTOH I don't have a better idea. :-(

--




[issue12735] request full Unicode collation support in std python library

2011-08-26 Thread Tom Christiansen

Tom Christiansen  added the comment:

> Sounds like a fair feature request for Python 3.3, as long as the
> intention is that users must import some module from the standard
> library and use functions defined in that module.  The operations and
> methods defined for str instances (e.g. ==, <, etc.) should not change
> their behavior.

> Is there an existing 3rd party library that we could adopt (even if it isn't 
> perfect yet)?

I *think* you could use ICU's.  

I'm pretty sure the Parrot people use ICU libraries.

--tom

--




[issue12735] request full Unicode collation support in std python library

2011-08-26 Thread Guido van Rossum

Guido van Rossum  added the comment:

Sounds like a fair feature request for Python 3.3, as long as the intention is 
that users must import some module from the standard library and use functions 
defined in that module.  The operations and methods defined for str instances 
(e.g. ==, <, etc.) should not change their behavior.

Is there an existing 3rd party library that we could adopt (even if it isn't 
perfect yet)?

--
nosy: +gvanrossum




[issue12735] request full Unicode collation support in std python library

2011-08-12 Thread Matthew Barnett

Changes by Matthew Barnett :


--
nosy: +mrabarnett




[issue12735] request full Unicode collation support in std python library

2011-08-12 Thread Éric Araujo

Changes by Éric Araujo :


--
nosy: +eric.araujo
versions: +Python 3.3 -Python 3.2




[issue12735] request full Unicode collation support in std python library

2011-08-12 Thread Arfrever Frehtes Taifersar Arahesis

Changes by Arfrever Frehtes Taifersar Arahesis :


--
nosy: +Arfrever




[issue12735] request full Unicode collation support in std python library

2011-08-12 Thread Daniel Urban

Changes by Daniel Urban :


--
nosy: +durban




[issue12735] request full Unicode collation support in std python library

2011-08-11 Thread Ezio Melotti

Changes by Ezio Melotti :


--
nosy: +belopolsky, ezio.melotti




[issue12735] request full Unicode collation support in std python library

2011-08-11 Thread Tom Christiansen

New submission from Tom Christiansen :

Python has no standard support for the Unicode Collation Algorithm (UCA) as 
described in UTS #10.  This is a request that a UCA library be added to the 
standard Python distribution.

Collation underlies virtually everything we do with text, not just sorting but 
any sort of comparison. Furthermore, the UCA is tailorable for locales in a 
portable way that does not require dodgy vendor support. It is a very important 
step in making Python suitable for full Unicode text processing.

--
components: Library (Lib)
messages: 141926
nosy: tchrist
priority: normal
severity: normal
status: open
title: request full Unicode collation support in std python library
type: feature request
versions: Python 3.2
