Martin Sebor wrote:
Andrew Black wrote:
Greetings all.

When building the UTF-8 locales on Windows with the debug version of the localedef utility, localedef terminates with a failed assertion within the library (in __rw_debug_iter::operator*() in _iterbase.h). Within collate.cpp, the failure occurs on line 579.

A trace of the code

It might be helpful to see the stack trace.

The call stack when the assertion dialog is presented is as follows, but I don't find it to be very helpful:
        localedef.exe!_NMSG_WRITE(int rterrnum=10)  Line 195    C
        localedef.exe!abort()  Line 44 + 0x7    C
        localedef.exe!__rw::__rw_assert_fail(const char * expr=0x004bdc20, const char * file=0x004bdc3c, int line=436, const char * func=0x00540418)  Line 97   C++
        localedef.exe!__rw::__rw_debug_iter<__rw::__rb_tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,unsigned short>,__rw::__select1st<std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,unsigned short>,std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,unsigned short> > >,__rw::__rw_tree_iter<std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,unsigned short>,int,std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,unsigned short> const *,std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,unsigned short> const &,__rw::__rw_rb_tree_node<std::allocator<std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,unsigned short> >,std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,unsigned short>,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,__rw::__select1st<std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,unsigned short>,std::basic_string<char,std::char_traits<char>,std::allocator<char> > > > >,__rw::__rw_tree_iter<std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,unsigned short>,int,std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,unsigned short> *,std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,unsigned short> &,__rw::__rw_rb_tree_node<std::allocator<std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,unsigned short> >,std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,unsigned short>,std::bas()  Line 436 + 0x2a   C++
        localedef.exe!Def::add_missing_values(const std::vector<bool,std::allocator<bool> > & ordinal_weights={...}, const Def::Weights_t * weights_template=0x00000000, unsigned int & coll_value=1158, bool give_warning=true)  Line 579 + 0x1c   C++
        localedef.exe!Def::process_collate()  Line 840  C++
        localedef.exe!Def::process_input()  Line 499    C++
        localedef.exe!create_locale(std::basic_string<char,std::char_traits<char>,std::allocator<char> > std_src={...}, std::basic_string<char,std::char_traits<char>,std::allocator<char> > std_cmap={...}, std::basic_string<char,std::char_traits<char>,std::allocator<char> > outdir={...}, std::basic_string<char,std::char_traits<char>,std::allocator<char> > std_locale={...}, bool force_output=true, bool use_ucs=false, bool no_position=false, bool link_aliases=false)  Line 210 + 0xb     C++
        localedef.exe!main(int argc=8, char * * argv=0x00ab0f48)  Line 561 + 0xc7       C++
        localedef.exe!mainCRTStartup()  Line 259 + 0x19 C
        kernel32.dll!7c816fd7()         
        ntdll.dll!7c915b4f()    


indicates that the last good iteration across this line is iteration number 56677, for the token 'UFFFD'.

I assume this is on line 23337 of UTF-8.

I believe the UTF-8 encoding file has already been read in at this point, and the loop is iterating over the map the contents were stored in. However, my reading of the charmap file would also pinpoint that line.


The following token (<U00010300>) fails because __rw_debug_iter::_C_is_end() returns true. However, my reading of collate.cpp is that this condition shouldn't happen, as the loop containing the statement in question is supposed to terminate when this condition is reached.
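
For illustration only: one way an end iterator can be dereferenced inside a loop whose own termination test is correct is when the iterator being dereferenced is a different one, for example the result of a find() for a key that was never inserted. Whether that is what happens on line 579 is an assumption, not something established here. The sketch below uses hypothetical names (SymbolMap, lookup_or, self_check); only the element type, a pair of const std::string and unsigned short, is taken from the stack trace above.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical map type; the key/value types mirror the
// pair<const std::string, unsigned short> seen in the stack trace.
typedef std::map<std::string, unsigned short> SymbolMap;

// Hypothetical helper: return the value stored for `name`, or `fallback`
// when the key is absent. The iterator is compared against end() before
// it is dereferenced, so a missing key cannot trip a debug-iterator check.
unsigned short lookup_or(const SymbolMap& m, const std::string& name,
                         unsigned short fallback)
{
    const SymbolMap::const_iterator it = m.find(name);
    return it == m.end() ? fallback : it->second;
}

// Small self-check exercising both the "found" and "missing" paths.
void self_check()
{
    SymbolMap m;
    m["<UFFFD>"] = 7;

    assert(lookup_or(m, "<UFFFD>", 0) == 7);          // key present
    assert(lookup_or(m, "<U00010300>", 0xFFFF) == 0xFFFF);  // key absent
}
```

An unguarded `m.find(name)->second` in place of the conditional would dereference end() for the missing key, which is the kind of misuse a debug iterator is designed to catch.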

Does this indicate a flaw in std::map or something else?

More likely, in collate.cpp or somewhere in the rest of localedef.
I suspect it has to do with wchar_t being only 16 bits wide on
Windows and the character map containing characters (such as
<U00010300>) beyond that range. To fix this we'll either need to
replace wchar_t with a 32-bit type or ignore characters that do
not fit in 16 bits on Windows (and wherever else wchar_t isn't
32 bits, such as AIX).
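
A minimal sketch of the range problem, assuming a 16-bit unsigned wide character type as on Windows. It uses std::uint16_t as a stand-in for wchar_t so it behaves the same on any platform; fits_in, narrow, and self_check are hypothetical names, not localedef functions, and this does not claim to reproduce the actual failure path in collate.cpp.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: true if code point `cp` can be represented in the
// unsigned integer type WideT without loss.
template <class WideT>
bool fits_in(std::uint32_t cp)
{
    return cp <= static_cast<std::uint32_t>(std::numeric_limits<WideT>::max());
}

// Hypothetical helper: what a plain narrowing conversion produces.
// For unsigned targets the value is reduced modulo 2^N.
template <class WideT>
WideT narrow(std::uint32_t cp)
{
    return static_cast<WideT>(cp);
}

void self_check()
{
    typedef std::uint16_t wide16;  // stand-in for a 16-bit wchar_t
    typedef std::uint32_t wide32;  // stand-in for a 32-bit replacement type

    const std::uint32_t cp = 0x10300;  // <U00010300> from the charmap

    assert(fits_in<wide16>(0xFFFD));        // last token that worked
    assert(!fits_in<wide16>(cp));           // first token that failed
    assert(narrow<wide16>(cp) == 0x0300);   // silently aliases U+0300
    assert(fits_in<wide32>(cp));            // a 32-bit type holds it
}
```

Either option described above maps onto this: skipping characters corresponds to testing fits_in before storing a value, while switching to a 32-bit type makes the test always pass for valid code points.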

You would be in a better position to determine the correct course of action than I am. The drawback of using a 32-bit data type would be losing the ability to use the system wchar_t functions, while the drawback of staying with the native wchar_t would be the loss of the upper segment of the character map. My instinct is to use the wider data type, but I don't know how efficient it would be to replace the entire set of system wchar_t functions. (OS X 10.2 was lacking in wchar_t functions if I recall correctly, and we never really got the library compiling there.)

--Andrew Black
