Hi,

I seem to have problems with umlauts, as in words such as

   Präsentation

When a document is added with

   return AI::Categorizer::Document->new(name    => $filename,
                                         content => $content);

to the collection, then once loading and finish are done, the feature
vector contains only fragments of these words, such as

    pr         => 1
    sentation  => 1
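
These fragments are what one would expect if the text is split on \w
while the content is still a plain byte string. A minimal sketch,
assuming a hypothetical tokenize() helper that merely stands in for
whatever the module really does (it is not the module's code):

```perl
use strict;
use warnings;

# Hypothetical stand-in tokenizer: lower-case the text and collect
# runs of \w characters. Not taken from AI::Categorizer.
sub tokenize {
    my ($text) = @_;
    return [ lc($text) =~ /(\w+)/g ];
}

# "Präsentation" as Latin-1 bytes; \xE4 is the byte for "ä".
my $bytes = "Pr\xE4sentation";

# Without "use locale" (and without Unicode semantics), \w only
# matches ASCII word characters in a byte string, so the umlaut
# acts as a separator.
print join(", ", @{ tokenize($bytes) }), "\n";   # prints "pr, sentation"
```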

Setting the locale in the shell or in Perl with

    use locale;

has no effect, not even when de_AT is selected explicitly.

--

Aaaaaah, lib/AI/Categorizer/Document.pm is NOT using locale, and use locale
is very, uhm, local (lexically scoped, so it only affects the file or
block it appears in) %-)

Patching the file to use locale does not seem to break the test cases.
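
For the archives, a small sketch of why use locale in the calling script
never reached the module. The sub names are made up for illustration,
and the locale name de_AT.ISO-8859-1 is an assumption about what is
installed on the machine:

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# Assumption: a Latin-1 Austrian locale with this name is installed.
# setlocale returns undef if it is not, and then the comparison
# below says nothing useful.
my $locale = setlocale(LC_CTYPE, "de_AT.ISO-8859-1");
warn "locale de_AT.ISO-8859-1 not available\n" unless defined $locale;

# Compiled outside any "use locale": \w ignores LC_CTYPE here, no
# matter who calls this sub. Stands in for code inside a module.
sub words_plain { return [ $_[0] =~ /(\w+)/g ] }

{
    use locale;   # only affects code compiled inside this block

    # Compiled under "use locale": \w consults LC_CTYPE.
    sub words_locale { return [ $_[0] =~ /(\w+)/g ] }

    my $text = "Pr\xE4sentation";   # Latin-1 bytes, \xE4 is "ä"
    printf "plain:  %d token(s)\n", scalar @{ words_plain($text) };
    printf "locale: %d token(s)\n", scalar @{ words_locale($text) };
}
```

words_plain is called from inside the use locale block and still splits
the word in two, because the pragma applies where the regex is compiled,
not where the sub is called. What words_locale reports depends on the
locale actually in effect.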

\rho
