Patches item #1734234 was opened at 2007-06-10 01:45
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1734234&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Modules
Group: Python 2.6
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Rauli Ruohonen (raulir)
Assigned to: Nobody/Anonymous (nobody)
Summary: Fast path for unicodedata.normalize()

Initial Comment:
Implements the quick check for strings that are already
normalized, as described in http://unicode.org/reports/tr15/#Annex8
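The Annex 8 quick-check loop can be sketched in Python 3 as follows. This is a minimal illustration under stated assumptions, not the patch's C code. The NFC_QC table is a hypothetical four-entry stand-in for the full table generated from the Unicode Character Database, and quick_check_nfc() and normalize_nfc_fast() are illustrative names. Iterating over a Python 3 str yields whole code points, so the sketch sidesteps the surrogate handling a UCS2 build has to do.

```python
import unicodedata

# Quick-check answers, as named in UAX #15 Annex 8.
YES, MAYBE, NO = "YES", "MAYBE", "NO"

# Hypothetical, deliberately tiny NFC_QC table for illustration only.
# A real table is generated from the Unicode Character Database;
# characters absent here are treated as YES.
NFC_QC = {
    "\u0300": MAYBE,  # COMBINING GRAVE ACCENT: may compose with a preceding base
    "\u0301": MAYBE,  # COMBINING ACUTE ACCENT
    "\u0340": NO,     # COMBINING GRAVE TONE MARK: singleton decomposition
    "\u212b": NO,     # ANGSTROM SIGN: canonically equivalent to U+00C5
}


def quick_check_nfc(s):
    """Return YES, MAYBE or NO following the Annex 8 quick-check loop."""
    last_ccc = 0
    result = YES
    for ch in s:
        ccc = unicodedata.combining(ch)  # canonical combining class
        # Non-starters out of canonical order: s cannot be normalized.
        if last_ccc > ccc and ccc != 0:
            return NO
        check = NFC_QC.get(ch, YES)
        if check == NO:
            return NO
        if check == MAYBE:
            result = MAYBE  # keep scanning; a later NO still wins
        last_ccc = ccc
    return result


def normalize_nfc_fast(s):
    """Hypothetical fast-path wrapper around unicodedata.normalize()."""
    if quick_check_nfc(s) == YES:
        return s  # already NFC: hand back the same object, no copy
    # MAYBE or NO: fall back to full normalization.
    return unicodedata.normalize("NFC", s)


print(quick_check_nfc("abc"))            # YES
print(quick_check_nfc("a\u0300"))        # MAYBE
print(quick_check_nfc("a\u0301\u0316"))  # NO: class 230 followed by class 220
```

Only a YES answer allows skipping normalization entirely; MAYBE still requires the full (slower) algorithm, which is why the fast path pays off mainly for text that is mostly plain starters.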

The patch is against the 2.6 SVN trunk. The normalization test
passes on both UCS2 and UCS4 builds on Ubuntu Edgy.

API affected:

Expressions such as unicodedata.normalize('NFC', u'a') is u'a'
become true, because the input unicode object is returned as-is
rather than copied when it is found to be already normalized.
The documentation does not specify either behavior.
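The identity change can be observed directly. The snippet below is a sketch written for Python 3 and assumes a CPython interpreter, whose unicodedata.normalize() returns its argument unchanged when the quick check answers YES. Other implementations need not behave this way, so portable code should compare by value. unicodedata.is_normalized() exists only in Python 3.8 and later.

```python
import unicodedata

s = "already normalized"

# On CPython the already-normalized input comes back as the same object.
print(unicodedata.normalize("NFC", s) is s)

# Portable code should compare by value: object identity is an
# implementation detail the documentation does not promise.
print(unicodedata.normalize("NFC", s) == s)

# Python 3.8+ exposes the check itself as a public function.
print(unicodedata.is_normalized("NFC", s))
```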

Added memory footprint:

A new 8-bit field is added to _PyUnicode_DatabaseRecord,
and the generated _PyUnicode_Database_Records
array grows from 219 records to 304 records. Each
record looks like this:

typedef struct {
   const unsigned char category;
   const unsigned char combining;
   const unsigned char bidirectional;
   const unsigned char mirrored;
   const unsigned char east_asian_width;
   const unsigned char normalization_quick_check;
} _PyUnicode_DatabaseRecord;

normalization_quick_check is the added field.
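As a rough check on the footprint figures above, the record layout can be mirrored with ctypes. This sketch assumes that a struct made only of unsigned char fields has no padding (true on common platforms) and counts only the _PyUnicode_Database_Records array, not any other generated tables. OldRecord and NewRecord are illustrative names for the layouts before and after the patch.

```python
import ctypes


class OldRecord(ctypes.Structure):
    # The five one-byte fields present before the patch.
    _fields_ = [
        ("category", ctypes.c_ubyte),
        ("combining", ctypes.c_ubyte),
        ("bidirectional", ctypes.c_ubyte),
        ("mirrored", ctypes.c_ubyte),
        ("east_asian_width", ctypes.c_ubyte),
    ]


class NewRecord(ctypes.Structure):
    # Same layout plus the added 8-bit normalization_quick_check field.
    _fields_ = OldRecord._fields_ + [
        ("normalization_quick_check", ctypes.c_ubyte),
    ]


old_bytes = 219 * ctypes.sizeof(OldRecord)  # 219 records * 5 bytes
new_bytes = 304 * ctypes.sizeof(NewRecord)  # 304 records * 6 bytes
print(old_bytes, new_bytes, new_bytes - old_bytes)  # 1095 1824 729
```

Under these assumptions the record array grows by 729 bytes.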


----------------------------------------------------------------------

_______________________________________________
Patches mailing list
Patches@python.org
http://mail.python.org/mailman/listinfo/patches
