Hi Olivier,

Right now it is a small training set, but the curious thing is that with a little corpus (4 lines) it detects phrases like "call to ann" but not "call ann". So I suspect something goes wrong when training with a phrase that has two consecutive markers.
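One way to rule out a malformed training line is to check how a line with two back-to-back annotations splits into tokens and labelled spans. The sketch below uses a hypothetical `parse_annotated` helper built on Python's `re` module. It is not OpenNLP's own parser, only an independent sanity check that assumes the `<START:type> ... <END>` format with whitespace around every tag.

```python
import re

# One annotated span: "<START:type> some tokens <END>" (whitespace around tags assumed)
SPAN = re.compile(r"<START:(\w+)>\s+(.*?)\s+<END>")


def parse_annotated(line):
    """Hypothetical helper: return (tokens, spans) for one annotated line.

    spans is a list of (type, start, end) token offsets, end exclusive.
    """
    tokens = []
    spans = []
    pos = 0
    for m in SPAN.finditer(line):
        # Unannotated tokens between the previous span and this one
        tokens.extend(line[pos:m.start()].split())
        start = len(tokens)
        tokens.extend(m.group(2).split())
        spans.append((m.group(1), start, len(tokens)))
        pos = m.end()
    # Trailing unannotated tokens
    tokens.extend(line[pos:].split())
    return tokens, spans


if __name__ == "__main__":
    for line in [
        "<START:action> call <END> to <START:person> ann <END>",
        "<START:action> call <END> <START:person> joe smith <END> a tope",
    ]:
        print(parse_annotated(line))
```

If the two spans come back adjacent (the first ending at the offset where the second starts), the annotation itself is well-formed, and the problem more likely lies elsewhere, for example in how much and how varied the training data is.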
I have tried with a bigger corpus like:

<START:action> call <END> <START:person> mary <END> a tope
<START:action> call <END> <START:person> james <END> a tope
<START:action> call <END> <START:person> mary <END> a tope
<START:action> call <END> <START:person> joe smith <END> a tope
...
...

with about 20 lines, but no luck.

About the regex: it was my first option for this problem, and I even have a working solution, but I quickly found that I wanted something less rigid that I could train with several different phrases, hence I'm playing with OpenNLP. I'm looking for something that allows me to process phrases like:

weather in london
how is the weather in london
in london how is the weather right now
today how is the weather near london

As you can guess, using regexes to implement this was not very fun ;-)

And about capitalization: right now the input comes all in lowercase (it comes that way from a speech recognizer).

On Mon, Jan 2, 2012 at 7:31 PM, Olivier Grisel <olivier.gri...@ensta.org> wrote:
> How big is your training set? You don't have any uppercase letters in
> your phrases?
>
> You might need a larger and more diverse set of examples (including
> negative examples without any kind of annotations).
>
> Do your sentences always follow such simple patterns? If so, you should
> probably use a simple regular expression with a fixed / controlled
> list of action names.
>
> --
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel
>

--
Angel.