On 02/28/2013 04:54 PM, Stefan Matheis wrote:
Hey Guys
I know that may not be the normal use case, but my Java knowledge is limited
and a command line call would be the easiest way to integrate the OpenNLP
capabilities into the project, so bear with me :)
$ cat input.txt
Sein Song "Nightcall" hat den Film "Drive" mit Ryan Gosling erst so richtig
bekannt gemacht. Wir haben uns mit Vincent Belorgey, besser bekannt als Kavinsky, über sein
Debütalbum, seine Musik und die 80er Jahre unterhalten.
$ bin/opennlp SimpleTokenizer < input.txt
Sein Song " Nightcall " hat den Film " Drive " mit Ryan Gosling erst so richtig
bekannt gemacht . Wir haben uns mit Vincent Belorgey , besser bekannt als Kavinsky , ?? ber sein
Deb ?? talbum , seine Musik und die 80 er Jahre unterhalten .
Average: 166.7 sent/s
Total: 1 sent
Runtime: 0.006s
Is my console misconfigured? Is the input perhaps not encoded correctly? Or does
it just not work?™ Of course I can work around that and somehow create a matching
for those words originally containing umlauts .. but if it would be possible
to avoid that? (:
While searching the web I found
https://issues.apache.org/jira/browse/OPENNLP-172 but I'm not sure how that may
or may not be related to my problem.
OpenNLP uses the platform default encoding to read from the console.
That usually works as long as the platform default encoding
can represent the content that is passed to OpenNLP.
Which OS are you running on? What is your platform encoding?
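
One way to check the platform encoding is a small Java program like the sketch
below. It is only an illustration: the class name EncodingCheck and the helper
decodeAs are made up for this example and are not part of OpenNLP. It prints the
JVM's default charset and shows what happens when UTF-8 bytes are decoded with a
charset that cannot represent them, using US-ASCII as an assumed stand-in for a
mismatched platform default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration; not part of OpenNLP.
public class EncodingCheck {

    // Encode the text as UTF-8, then decode the bytes with another charset,
    // simulating a reader whose charset differs from the input's encoding.
    static String decodeAs(String text, Charset readerCharset) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, readerCharset);
    }

    public static void main(String[] args) {
        // The charset the JVM uses when no explicit charset is given.
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));

        // "\u00fcber" is "über"; the u-umlaut takes two bytes in UTF-8.
        String garbled = decodeAs("\u00fcber", StandardCharsets.US_ASCII);

        // Each undecodable byte becomes U+FFFD, so one umlaut turns into two
        // replacement characters followed by "ber".
        for (char c : garbled.toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println();
    }
}
```

If the default charset printed here is not UTF-8 while input.txt is stored as
UTF-8, that mismatch would be consistent with the two-character "??" in place of
each umlaut in the tokenizer output.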
Jörn