Thanks Angel,

There seems to be a bug with back-to-back spans... I'm trying to track the problem down.

James

On 1/2/2012 1:49 PM, Angel Luis Jimenez Martinez wrote:
> Hi Olivier,
>
> Right now it is a small training set, but the curious thing is that with a
> little corpus (4 lines) it detects phrases like "call to ann" but not
> "call ann". So I suspect there is something wrong when training with a
> phrase that has two consecutive markers.
>
> I have tried with a bigger corpus like:
>
> <START:action> call <END> <START:person> mary <END> a tope
> <START:action> call <END> <START:person> james <END> a tope
> <START:action> call <END> <START:person> mary <END> a tope
> <START:action> call <END> <START:person> joe smith <END> a tope
> ...
> ...
>
> With about 20 lines but no luck.
>
> And about the regex, it was my first option for this problem, and I even
> have a working solution... but I quickly found that I wanted to have
> something less rigid that I could train with several different phrases, so
> I'm playing with OpenNLP.
>
> I'm looking for something that allows me to process phrases like:
>
> weather in london
> how is the weather in london
> in london how is the weather right now
> today how is the weather near london
>
> As you can guess, using regexes to implement this was not very fun ;-)
>
> And about the capitalization, right now the input comes all in lowercase
> (it comes from a speech recognizer like that)
>
> On Mon, Jan 2, 2012 at 7:31 PM, Olivier Grisel
> <olivier.gri...@ensta.org> wrote:
>
>> How big is your training set? You don't have any uppercase letters in
>> your phrases?
>>
>> You might need a larger and more diverse set of examples (including
>> negative examples without any kind of annotations).
>>
>> Do your sentences always follow such simple patterns? If so, you should
>> probably use a simple regular expression with a fixed / controlled
>> list of action names.
>>
>> --
>> Olivier
>> http://twitter.com/ogrisel - http://github.com/ogrisel
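To check which corpus lines contain the back-to-back spans discussed in this thread, a minimal sketch along the following lines might help. It is plain Python, not OpenNLP code, and it assumes the whitespace-separated `<START:type> ... <END>` markup shown in the quoted corpus. The function names `parse_spans` and `back_to_back` are hypothetical, and the annotated form of "call to ann" in the usage lines is an assumption, since the thread only gives the raw phrase.

```python
import re

# Assumed marker shape, taken from the corpus lines quoted in this thread.
START = re.compile(r"<START:(\w+)>")


def parse_spans(line):
    """Split an annotated line into plain tokens and (type, start, end) spans.

    Offsets are token indices and `end` is exclusive.
    """
    tokens, spans = [], []
    open_type, open_start = None, None
    for piece in line.split():
        match = START.fullmatch(piece)
        if match:
            # Remember where the span begins, in token offsets.
            open_type, open_start = match.group(1), len(tokens)
        elif piece == "<END>":
            if open_type is None:
                raise ValueError("<END> without a matching <START:...>")
            spans.append((open_type, open_start, len(tokens)))
            open_type = None
        else:
            tokens.append(piece)
    return tokens, spans


def back_to_back(spans):
    """Return pairs of consecutive spans with no unannotated token between them."""
    return [(a, b) for a, b in zip(spans, spans[1:]) if a[2] == b[1]]


if __name__ == "__main__":
    lines = [
        "<START:action> call <END> <START:person> joe smith <END> a tope",
        # Assumed annotation of "call to ann"; not given in the thread.
        "<START:action> call <END> to <START:person> ann <END>",
    ]
    for line in lines:
        _, spans = parse_spans(line)
        print(line, "->", back_to_back(spans))
```

Running this over a training file would only list the lines where one span ends exactly where the next begins; it says nothing about why the trained model misses them.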