On Tue, Aug 17, 2010 at 4:06 PM, Nikola Smolenski <[email protected]> wrote:
> For some time now, I am thinking about a stupidly simple solution:
>
> php -r 'for($i = 0; $i < 65536; $i++) { echo pack("nx", $i); echo "\n"; }'|
> iconv -f ucs-2be -t utf8 | sort | php -r 'foreach(file("php://stdin") as $v)
> { echo var_export(substr($v, 0, -1)) . " => \"" . str_pad(base_convert($i,
> 10, 36), 4, 0, STR_PAD_LEFT) . "\",\n"; $i++; }'

This doesn't account for how complicated proper locale-specific
sorting is.  Strings cannot be sorted correctly just by splitting
them into characters and comparing those one by one.  The same
character can sort differently depending on context: in Czech, for
example, "ch" is a single letter that sorts after "h".  There are
well-established libraries for Unicode sorting, and we certainly
should not try to reinvent the wheel here.
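
To make the point concrete, here is a minimal sketch in Python (an
illustration only, not a proposal; real code should call an existing
collation library).  It assumes a toy, heavily simplified Czech-like
alphabet in which "ch" is one letter between "h" and "i".  The names
ALPHABET, WEIGHT, per_character_key and contraction_aware_key are
hypothetical and exist only for this example.

```python
# Toy illustration only: real collation should come from an established library.
# Assumption: a simplified Czech-like alphabet where "ch" is a single letter
# that sorts between "h" and "i". Diacritics and everything else are ignored.
ALPHABET = ["a", "b", "c", "d", "e", "f", "g", "h", "ch", "i", "j", "k", "l",
            "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"]
WEIGHT = {letter: rank for rank, letter in enumerate(ALPHABET)}


def per_character_key(word):
    """Sort key built from a per-character lookup table: one weight per character."""
    return [WEIGHT[char] for char in word]


def contraction_aware_key(word):
    """Sort key that treats the two-character sequence "ch" as one letter."""
    key = []
    i = 0
    while i < len(word):
        if word.startswith("ch", i):
            key.append(WEIGHT["ch"])
            i += 2
        else:
            key.append(WEIGHT[word[i]])
            i += 1
    return key


words = ["hrad", "chata", "cihla", "ihned"]
per_character_order = sorted(words, key=per_character_key)
contraction_aware_order = sorted(words, key=contraction_aware_key)

print(per_character_order)      # "chata" lands before "cihla"
print(contraction_aware_order)  # "chata" lands after "hrad"
```

The two orderings disagree about where "chata" goes, and no table
that maps single characters to sort keys can fix that, because the
right weight for "c" depends on the character that follows it.  This
toy covers one contraction in one language; a real collation library
also has to handle accents, case, expansions and per-locale
tailoring.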

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
