Thanks Angel,

There seems to be a bug with back-to-back spans... I'm trying to track
the problem down.

James

On 1/2/2012 1:49 PM, Angel Luis Jimenez Martinez wrote:
> Hi Olivier,
>
> Right now it is a small training set, but the curious thing is that with a
> small corpus (4 lines) it detects phrases like "call to ann" but not "call ann".
> So I suspect something goes wrong when training with a phrase that
> has two consecutive markers.
>
> I have tried with a bigger corpus like:
>
> <START:action> call <END> <START:person> mary <END> a tope
> <START:action> call <END> <START:person> james <END> a tope
> <START:action> call <END> <START:person> mary <END> a tope
> <START:action> call <END> <START:person> joe smith <END> a tope
> ...
> ...
>
> With about 20 lines but no luck.
>
> As for the regex, it was my first option for this problem, and I even have a
> working solution... but I quickly found that I wanted something
> less rigid that I could train with several different phrases, hence I'm
> playing with OpenNLP.
>
> I'm looking for something that allows me to process phrases like:
>
> weather in london
> how is the weather in london
> in london how is the weather right now
> today how is the weather near london
>
> As you can guess, using regexes to implement this was not much fun ;-)
>
> As for capitalization, right now the input comes all in lowercase (it
> comes that way from a speech recognizer).
>
> On Mon, Jan 2, 2012 at 7:31 PM, Olivier Grisel
> <olivier.gri...@ensta.org> wrote:
>
>> How big is your training set? You don't have any uppercase letters in
>> your phrases?
>>
>> You might need a larger and more diverse set of examples (including
>> negative examples without any kind of annotations).
>>
>> Do your sentences always follow such simple patterns? If so, you should
>> probably use a simple regular expression with a fixed / controlled
>> list of action names.
>>
>> --
>> Olivier
>> http://twitter.com/ogrisel - http://github.com/ogrisel
>>
>
>
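
For anyone who wants to check their own training data for the pattern
discussed in this thread, here is a minimal Python sketch. It is not part of
OpenNLP, and `parse_line` and `back_to_back` are hypothetical helper names
chosen for illustration. It assumes whitespace-separated tokens in the
`<START:type> ... <END>` format quoted above and reports annotated spans that
have no plain token between them.

```python
import re

# Matches an opening marker such as <START:person>
START_RE = re.compile(r"<START:([^>]+)>")


def parse_line(line):
    """Return (tokens, spans) for one annotated training line.

    Spans are (start, end, type) token offsets with an exclusive end.
    """
    tokens = []
    spans = []
    current_type = None
    start = None
    for piece in line.split():
        match = START_RE.fullmatch(piece)
        if match:
            current_type = match.group(1)
            start = len(tokens)
        elif piece == "<END>":
            if current_type is None:
                raise ValueError("<END> without a matching <START:...>")
            spans.append((start, len(tokens), current_type))
            current_type = None
        else:
            tokens.append(piece)
    return tokens, spans


def back_to_back(spans):
    """Return pairs of consecutive spans with no token between them."""
    return [(a, b) for a, b in zip(spans, spans[1:]) if a[1] == b[0]]


if __name__ == "__main__":
    line = "<START:action> call <END> <START:person> joe smith <END> a tope"
    tokens, spans = parse_line(line)
    print(tokens)
    print(spans)
    print(back_to_back(spans))
```

Running it over a corpus shows how many training lines contain adjacent
spans, which may help when putting together a small test case for the bug.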
