I hit on another idea later. If you don't have to use regular
expressions, you can also convert the string to UCS-4, break the
result into individual 4-octet units (one per character),
and finally convert each unit back into the original encoding.
I would expect this to be the fastest approach.

It is not a perfect solution by any means; however, it does seem to accomplish
the goal better than the current solution, which does not work at all. BTW, if
you know of any documentation about morphological analyzers, especially with
a focus on multibyte languages, I would be grateful if you could share that
information.

Please take a look through this list's archive; I have already posted a pointer to the resource there.

Moriyoshi

--
PHP Internationalization Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php


