Re: Name finder training not working with consecutive tags

James Kosin Wed, 04 Jan 2012 21:00:56 -0800

Angel,

Thanks again.
It should be fixed in the next release, and it is in SVN trunk right now.


James

On 1/4/2012 7:14 PM, Angel Luis Jimenez Martinez wrote:
> Hi,
>
> I have poked the code and it seems I have found the problem.
>
> In NameFinderME.java method: public Span[] find(String[] tokens, String[][]
> additionalContext)
>
> There's a line like:
>
> spans.add(new Span(start, end, extractNameType(chunkTag)));
>
> that should be like this:
>
> spans.add(new Span(start, end, extractNameType(c.get(li - 1))));
>
> And there's a test that needs changing on NameFinderMETest,
> method: testOnlyWithEntitiesWithTypes
>
> These asserts:
>
>     assertEquals(new Span(0, 1, "location"), names1[0]);
>     assertEquals(new Span(1, 3, "person"), names1[1]);
>
> should be like this:
>
>     assertEquals(new Span(0, 1, "organization"), names1[0]);
>     assertEquals(new Span(1, 3, "location"), names1[1]);
>     assertEquals(new Span(3, 5, "person"), names1[2]);
>
> I have not thoroughly tested this, so if someone who knows this code
> better could take a look, that would be great :)
>
> On Wed, Jan 4, 2012 at 9:24 AM, Angel Luis Jimenez Martinez <
> soyan...@gmail.com> wrote:
>
>> Sounds great!
>>
>> Let me know if you need some testing when you commit the fix, I'll be
>> happy to help.
>>
>> Thanks, James!
>> On 04/01/2012 06:32, "James Kosin" <james.ko...@gmail.com> wrote:
>>
>>> I've narrowed the problem down to the output method that generates the
>>> output text from the tokenNameFinder....
>>> At least I think that is where the problem lies.
>>>
>>> James
>>>
>>> On 1/2/2012 1:49 PM, Angel Luis Jimenez Martinez wrote:
>>>> Hi Olivier,
>>>>
>>>> Right now it is a small training set, but the curious thing is that with a
>>>> little corpus (4 lines) it detects phrases like "call to ann" but not
>>>> "call ann". So I suspect there is something wrong when training with a
>>>> phrase that has two consecutive markers.
>>>>
>>>> I have tried with a bigger corpus like:
>>>>
>>>> <START:action> call <END> <START:person> mary <END> a tope
>>>> <START:action> call <END> <START:person> james <END> a tope
>>>> <START:action> call <END> <START:person> mary <END> a tope
>>>> <START:action> call <END> <START:person> joe smith <END> a tope
>>>> ...
>>>> ...
>>>>
>>>> With about 20 lines but no luck.
>>>>
>>>> As for the regex, it was my first option for this problem, and I even have
>>>> a working solution... but I quickly found that I wanted something less
>>>> rigid that I could train with several different phrases, hence I'm
>>>> playing with OpenNLP.
>>>>
>>>> I'm looking for something that allows me to process phrases like:
>>>>
>>>> weather in london
>>>> how is the weather in london
>>>> in london how is the weather right now
>>>> today how is the weather near london
>>>>
>>>> As you can guess, using regexes to implement this was not much fun ;-)
>>>>
>>>> As for capitalization, right now the input comes all in lowercase (it
>>>> comes that way from a speech recognizer).
>>>>
>>>> On Mon, Jan 2, 2012 at 7:31 PM, Olivier Grisel <olivier.gri...@ensta.org> wrote:
>>>>> How big is your training set? You don't have any uppercase letters in
>>>>> your phrases?
>>>>>
>>>>> You might need a larger and more diverse set of examples (including
>>>>> negative examples without any kind of annotations).
>>>>>
>>>>> Do your sentences always follow such simple patterns? If so, you should
>>>>> probably use a simple regular expression with a fixed / controlled
>>>>> list of action names.
>>>>>
>>>>> --
>>>>> Olivier
>>>>> http://twitter.com/ogrisel - http://github.com/ogrisel
>>>>>
>>>>
>>>
>
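
For anyone reading this thread later, the bug discussed above can be reproduced without OpenNLP. What follows is only a minimal standalone sketch, not the NameFinderME source: the class ConsecutiveSpanSketch, its methods, and the "-start" / "-cont" / "other" outcome names are assumptions made for illustration, and the tag sequence is made up to be consistent with the spans in the asserts quoted above. The point it shows is that when a new name starts right after another one, the span being closed must take its type from the previous token's tag (the c.get(li - 1) in the proposed fix), not from the tag that opens the next name.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch; this is NOT the OpenNLP source. The "-start" / "-cont"
// suffixes and the "other" outcome are assumptions made for illustration.
class ConsecutiveSpanSketch {

    // Returns the type prefix of an outcome, e.g. "person-start" -> "person".
    static String typeOf(String outcome) {
        int dash = outcome.lastIndexOf('-');
        return dash < 0 ? outcome : outcome.substring(0, dash);
    }

    static String format(int start, int end, String type) {
        return String.format("[%d..%d) %s", start, end, type);
    }

    // Decodes per-token outcomes into "[start..end) type" strings.
    // With usePreviousTag == false, a span closed by a following "-start"
    // is labelled with the NEW tag's type, which is the reported behaviour.
    static List<String> decode(String[] outcomes, boolean usePreviousTag) {
        List<String> spans = new ArrayList<>();
        int start = -1;
        int end = -1;
        for (int i = 0; i < outcomes.length; i++) {
            String tag = outcomes[i];
            if (tag.endsWith("-start")) {
                if (start != -1) {
                    // A name begins directly after another one: close the open span.
                    String closing = usePreviousTag ? outcomes[i - 1] : tag;
                    spans.add(format(start, end, typeOf(closing)));
                }
                start = i;
                end = i + 1;
            } else if (tag.endsWith("-cont")) {
                end = i + 1;
            } else if (start != -1) {
                // Any other outcome closes the open span, typed by the previous tag.
                spans.add(format(start, end, typeOf(outcomes[i - 1])));
                start = -1;
                end = -1;
            }
        }
        if (start != -1) {
            // The sentence ended while a name was still open.
            spans.add(format(start, end, typeOf(outcomes[outcomes.length - 1])));
        }
        return spans;
    }

    public static void main(String[] args) {
        String[] outcomes = {
            "organization-start", "location-start", "location-cont",
            "person-start", "person-cont", "other"
        };
        System.out.println("reported: " + decode(outcomes, false));
        System.out.println("fixed:    " + decode(outcomes, true));
    }
}
```

With usePreviousTag set to false the first two spans come out typed as location and person, which matches the old asserts; with it set to true the three spans are organization, location and person, which matches the corrected asserts.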
