Hi,

I have poked around in the code and I think I have found the problem.

In NameFinderME.java, method: public Span[] find(String[] tokens, String[][]
additionalContext)

There's a line like:

spans.add(new Span(start, end, extractNameType(chunkTag)));

that should be like this:

spans.add(new Span(start, end, extractNameType(c.get(li - 1))));
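
To show why this matters, here is a minimal standalone sketch of the decoding
loop. It is not the OpenNLP source: the class SpanTypeSketch and the helpers
typeOf, decode and format are made up for illustration, and I am assuming the
outcomes look like "type-start" / "type-cont" / "other". The point is that when
a new "start" outcome closes the span that is still open, the type has to come
from the previous outcome, not from the current one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: a simplified stand-in for the span-building loop.
class SpanTypeSketch {

    // Hypothetical helper: "person-start" -> "person"; null when there is no type prefix.
    static String typeOf(String outcome) {
        int dash = outcome.lastIndexOf('-');
        return dash == -1 ? null : outcome.substring(0, dash);
    }

    // Hypothetical helper: renders a span as "[start..end) type".
    static String format(int start, int end, String type) {
        return "[" + start + ".." + end + ") " + type;
    }

    // Turns one outcome per token into typed spans (end index is exclusive).
    static List<String> decode(List<String> outcomes) {
        List<String> spans = new ArrayList<String>();
        int start = -1;
        int end = -1;
        for (int i = 0; i < outcomes.size(); i++) {
            String tag = outcomes.get(i);
            if (tag.endsWith("start")) {
                if (start != -1) {
                    // A new name begins right after another one: close the open
                    // span with the type of the PREVIOUS outcome. Using `tag`
                    // here would give it the type of the name that follows.
                    spans.add(format(start, end, typeOf(outcomes.get(i - 1))));
                }
                start = i;
                end = i + 1;
            } else if (tag.endsWith("cont")) {
                end = i + 1;
            } else if (start != -1) {
                // Any other outcome ends the open span.
                spans.add(format(start, end, typeOf(outcomes.get(i - 1))));
                start = -1;
                end = -1;
            }
        }
        if (start != -1) {
            // The sequence ended while a span was still open.
            spans.add(format(start, end, typeOf(outcomes.get(outcomes.size() - 1))));
        }
        return spans;
    }

    public static void main(String[] args) {
        // Assumed outcomes for "call ann now": two consecutive names, then other.
        List<String> outcomes = Arrays.asList("action-start", "person-start", "other");
        System.out.println(decode(outcomes));
    }
}
```

With the current tag instead of the previous outcome in the first branch, the
span for "call" in this example would be typed "person" rather than "action".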

There's also a test that needs changing in NameFinderMETest,
method: testOnlyWithEntitiesWithTypes

These asserts:

    assertEquals(new Span(0, 1, "location"), names1[0]);
    assertEquals(new Span(1, 3, "person"), names1[1]);

should be like this:

    assertEquals(new Span(0, 1, "organization"), names1[0]);
    assertEquals(new Span(1, 3, "location"), names1[1]);
    assertEquals(new Span(3, 5, "person"), names1[2]);

I have not tested this thoroughly, so it would be great if someone who knows
this code better could take a look :)

On Wed, Jan 4, 2012 at 9:24 AM, Angel Luis Jimenez Martinez <
soyan...@gmail.com> wrote:

> Sounds great!
>
> Let me know if you need some testing when you commit the fix, I'll be
> happy to help.
>
> Thanks, James!
> On 04/01/2012 06:32, "James Kosin" <james.ko...@gmail.com> wrote:
>
>> I've narrowed the problem down to the output method that generates the
>> output text from the tokenNameFinder....
>> At least I think that is where the problem lies.
>>
>> James
>>
>> On 1/2/2012 1:49 PM, Angel Luis Jimenez Martinez wrote:
>> > Hi Olivier,
>> >
>> > Right now it is a small training set, but the curious thing is that with a
>> > little corpus (4 lines) it detects phrases like "call to ann" but not
>> > "call ann". So I suspect there is something wrong when training with a
>> > phrase that has two consecutive markers.
>> >
>> > I have tried with a bigger corpus like:
>> >
>> > <START:action> call <END> <START:person> mary <END> a tope
>> > <START:action> call <END> <START:person> james <END> a tope
>> > <START:action> call <END> <START:person> mary <END> a tope
>> > <START:action> call <END> <START:person> joe smith <END> a tope
>> > ...
>> > ...
>> >
>> > With about 20 lines but no luck.
>> >
>> > As for the regex, it was my first option for this problem, and I even have
>> > a working solution... but I quickly found that I wanted something less
>> > rigid that I could train with several different phrases, hence I'm
>> > playing with OpenNLP.
>> >
>> > I'm looking for something that allows me to process phrases like:
>> >
>> > weather in london
>> > how is the weather in london
>> > in london how is the weather right now
>> > today how is the weather near london
>> >
>> > As you can guess using regexes to implement this was not very fun ;-)
>> >
>> > As for the capitalization, right now the input comes all in lowercase
>> > (it comes from a speech recognizer like that).
>> >
>> > On Mon, Jan 2, 2012 at 7:31 PM, Olivier Grisel <olivier.gri...@ensta.org> wrote:
>> >
>> >> How big is your training set? You don't have any uppercase letters in
>> >> your phrases?
>> >>
>> >> You might need a larger and more diverse set of examples (including
>> >> negative examples without any kind of annotations).
>> >>
>> >> Do your sentences always follow such simple patterns? If so, you should
>> >> probably use a simple regular expression with a fixed / controlled
>> >> list of action names.
>> >>
>> >> --
>> >> Olivier
>> >> http://twitter.com/ogrisel - http://github.com/ogrisel
>> >>
>> >
>> >
>>
>>


-- 
Angel.
