Ah yes, I was just going by membership in the CJK Unified Ideographs Extension E block, not actual assignment.
So the lack of assignment means it should fail the Unified_Ideograph membership in http://unicode.org/reports/tr10/#Values_For_Base_Table

Got it! Thanks

James

On Wed, Sep 27, 2017 at 5:29 PM, Ken Whistler via Unicode <unicode@unicode.org> wrote:

> On 9/27/2017 2:19 PM, Markus Scherer via Unicode wrote:
>
> On Wed, Sep 27, 2017 at 1:49 PM, James Tauber via Unicode <
> unicode@unicode.org> wrote:
>
>> I recently updated pyuca[1], my pure Python implementation of the Unicode
>> Collation Algorithm to work with 8.0.0, 9.0.0, and 10.0.0 but to get all
>> the tests to work, I had to special case the implicit weight base for
>> U+2CEA2. The spec seems to suggest the base should be FB80 but I had to
>> override just that code point to have a base of FBC0 for the tests to pass.
>>
>> Is this a known issue with the spec or something I've missed?
>>
>
> 2CEA2..2CEAF are unassigned code points for which the UCA+DUCET uses a
> base of FBC0.
>
> markus
>
>
> And you may have a range error in Extension E to account for the test
> problem.
>
> The relevant section of CollationTest_SHIFTED_SHORT.txt has tests that
> will pass only if:
>
> 2B735 < 2B81E < 2CEA2 < 2EBE1 < 2FFFE
> Ext C < Ext D < Ext E < Ext F < non-character
>
> Those are *unassigned* characters just past the assigned ranges but still
> in the blocks in each of those CJK extensions. So if you have a range error
> for assigned characters in Extension E, you'd get a failure at that point
> in the text cases.
>
> --Ken
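For anyone hitting the same failure, here is a rough Python sketch of the base selection as I now understand it: test against the assigned Unified_Ideograph ranges rather than whole blocks. This is not pyuca's actual code. The names `implicit_base` and `implicit_weights` are made up for illustration, the range end points are my assumption for Unicode 10.0.0 and should be checked against the UCD, and the special bases for scripts such as Tangut are left out.

```python
# Assigned Unified_Ideograph ranges (inclusive), ASSUMED for Unicode 10.0.0.
# Verify against the UCD before relying on them.
CORE_RANGES = [(0x4E00, 0x9FEA)]  # CJK Unified Ideographs block
# Unified ideographs living in the CJK Compatibility Ideographs block.
CORE_COMPAT = {0xFA0E, 0xFA0F, 0xFA11, 0xFA13, 0xFA14, 0xFA1F,
               0xFA21, 0xFA23, 0xFA24, 0xFA27, 0xFA28, 0xFA29}
EXT_RANGES = [
    (0x3400, 0x4DB5),    # Extension A
    (0x20000, 0x2A6D6),  # Extension B
    (0x2A700, 0x2B734),  # Extension C
    (0x2B740, 0x2B81D),  # Extension D
    (0x2B820, 0x2CEA1),  # Extension E (2CEA2..2CEAF are unassigned)
    (0x2CEB0, 0x2EBE0),  # Extension F
]


def _in_ranges(cp, ranges):
    return any(lo <= cp <= hi for lo, hi in ranges)


def implicit_base(cp):
    """Pick the implicit weight base for a code point (hypothetical helper)."""
    if _in_ranges(cp, CORE_RANGES) or cp in CORE_COMPAT:
        return 0xFB40
    if _in_ranges(cp, EXT_RANGES):
        return 0xFB80
    # Everything else, including unassigned code points inside a CJK block.
    return 0xFBC0


def implicit_weights(cp):
    """Return the (AAAA, BBBB) primary weight pair for a code point."""
    aaaa = implicit_base(cp) + (cp >> 15)
    bbbb = (cp & 0x7FFF) | 0x8000
    return aaaa, bbbb


if __name__ == "__main__":
    for cp in (0x2CEA1, 0x2CEA2):
        aaaa, bbbb = implicit_weights(cp)
        print(f"U+{cp:X}: {aaaa:04X} {bbbb:04X}")
```

With these ranges, U+2CEA1 gets base FB80 and U+2CEA2 falls through to FBC0, and the five code points Ken lists come out in the expected order.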