Martin Sebor wrote:
Andrew Black wrote:
Greetings all.
When building the UTF-8 locales on Windows with the debug version of
the localedef utility, the utility terminates with a failed assertion
within the library (in __rw_debug_iter::operator*() in _iterbase.h).
Within collate.cpp, the failure occurs on line 579.
A trace of the code
It might be helpful to see the stack trace.
The call stack when the assertion dialog is presented is as follows, but
I don't find it to be very helpful:
localedef.exe!_NMSG_WRITE(int rterrnum=10) Line 195 C
localedef.exe!abort() Line 44 + 0x7 C
localedef.exe!__rw::__rw_assert_fail(const char * expr=0x004bdc20,
const char * file=0x004bdc3c, int line=436, const char * func=0x00540418) Line
97 C++
localedef.exe!__rw::__rw_debug_iter<__rw::__rb_tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,unsigned
short>,__rw::__select1st<std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,unsigned short>,std::basic_string<char,std::char_traits<char>,std::allocator<char> >
>,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,unsigned short> >
>,__rw::__rw_tree_iter<std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,unsigned short>,int,std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,unsigned short> const
*,std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,unsigned short> const &,
__rw::__rw_rb_tree_node<std::allocator<std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,unsigned short> >,std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,unsigned
short>,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,__rw::__select1st<std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,unsigned
short>,std::basic_string<char,std::char_traits<char>,std::allocator<char> > > > >,__rw::__rw_tree_iter<std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,unsigned
short>,int,std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,unsigned short> *,std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,unsigned short>
&,__rw::__rw_rb_tree_node<std::allocator<std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,unsigned short> >,
std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,unsigned
short>,std::bas() Line 436 + 0x2a C++
localedef.exe!Def::add_missing_values(const std::vector<bool,std::allocator<bool>
> & ordinal_weights={...}, const Def::Weights_t * weights_template=0x00000000, unsigned int
& coll_value=1158, bool give_warning=true) Line 579 + 0x1c C++
localedef.exe!Def::process_collate() Line 840 C++
localedef.exe!Def::process_input() Line 499 C++
localedef.exe!create_locale(std::basic_string<char,std::char_traits<char>,std::allocator<char> > std_src={...},
std::basic_string<char,std::char_traits<char>,std::allocator<char> > std_cmap={...},
std::basic_string<char,std::char_traits<char>,std::allocator<char> > outdir={...},
std::basic_string<char,std::char_traits<char>,std::allocator<char> > std_locale={...}, bool force_output=true, bool use_ucs=false,
bool no_position=false, bool link_aliases=false) Line 210 + 0xb C++
localedef.exe!main(int argc=8, char * * argv=0x00ab0f48) Line 561 +
0xc7 C++
localedef.exe!mainCRTStartup() Line 259 + 0x19 C
kernel32.dll!7c816fd7()
ntdll.dll!7c915b4f()
indicates that the last good iteration across this line is iteration
number 56677, for the token 'UFFFD'.
I assume this is on line 23337 of UTF-8.
I believe the UTF-8 encoding file has already been read in at this
point, and the loop is iterating over the map the contents were stored
in. However, my reading of the charmap file would also pinpoint that line.
The following token (<U00010300>) fails because
__rw_debug_iter::_C_is_end() returns true. However, my reading of
collate.cpp is that this condition shouldn't happen, as the loop
containing the statement in question is supposed to terminate when
this condition is reached.
Does this indicate a flaw in std::map or something else?
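
For illustration only: one common way to trip this kind of assertion is to
dereference the result of a map lookup without first comparing it to end().
Whether line 579 of collate.cpp actually does this is an assumption on my
part, not something the trace confirms. The sketch below uses the same
key/value types that appear in the stack trace; SymbolMap, lookup and
lookup_demo are hypothetical names, not localedef code.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical alias; the key and mapped types match the stack trace.
typedef std::map<std::string, unsigned short> SymbolMap;

// Store the value mapped to sym in value and return true, or return
// false (leaving value untouched) when sym is not in the map.
inline bool lookup (const SymbolMap &m, const std::string &sym,
                    unsigned short &value)
{
    const SymbolMap::const_iterator it = m.find (sym);

    if (it == m.end ())
        return false;   // dereferencing it here would be undefined

    value = it->second;
    return true;
}

// Small self-check exercising both outcomes.
inline void lookup_demo ()
{
    SymbolMap m;
    m ["<UFFFD>"] = 1;

    unsigned short v = 0;
    assert (lookup (m, "<UFFFD>", v) && v == 1);
    assert (!lookup (m, "<U00010300>", v));
}
```

A debug iterator turns the unguarded dereference into an assertion failure,
which would be consistent with _C_is_end() returning true even though the
enclosing loop's own iterator has not yet reached the end.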
More likely, in collate.cpp or somewhere in the rest of localedef.
I suspect it has to do with wchar_t being only 16 bits wide on
Windows and the character map containing characters (such as
<U00010300>) beyond that range. To fix this we'll either need to
replace wchar_t with a 32-bit type or ignore characters that do
not fit in 16 bits on Windows (and wherever else wchar_t isn't
32 bits, such as AIX).
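
To make the range problem concrete, here is a minimal sketch (hypothetical
helper names, not code from localedef) of a check for whether a code point
fits in a type of a given width, and of what a plain narrowing conversion
to 16 bits does to <U00010300>.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// True if code point cp is representable in an unsigned type of the
// given width in bits.
inline bool fits_in_bits (std::uint32_t cp, unsigned bits)
{
    assert (bits > 0);
    return bits >= 32 || cp < (std::uint32_t (1) << bits);
}

// True if cp fits in this platform's wchar_t, judged by width alone
// (the sign of wchar_t is ignored in this sketch).
inline bool fits_in_wchar (std::uint32_t cp)
{
    return fits_in_bits (cp, unsigned (sizeof (wchar_t) * CHAR_BIT));
}

// What a narrowing conversion to a 16-bit type yields: only the low
// 16 bits survive, so distinct code points can collide.
inline std::uint16_t truncate_to_16 (std::uint32_t cp)
{
    return static_cast<std::uint16_t>(cp);
}
```

With a 16-bit width, fits_in_bits (0xFFFD, 16) holds while
fits_in_bits (0x10300, 16) does not, and truncate_to_16 (0x10300) yields
0x0300, the value of an unrelated character. Skipping characters for which
fits_in_wchar() is false would correspond to the second option above.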
You would be in a better position to determine the correct course of
action than I am. The drawback of using a 32-bit data type would be
being unable to use the system wchar_t functions, while the drawback of
staying with the native wchar_t would be the loss of the upper segment
of the character map. My instinct is to use the wider data type, but I
don't know how efficient it would be to replace the entire set of system
wchar_t functions. (OS X 10.2 was lacking in wchar_t functions, if I
recall correctly, and we never really got the library compiling there.)
--Andrew Black