Thanks, that will help my troubleshooting of the issue.  I looked and
found the problem wasn't with the training, then found the wrong output
only happens when the tags are back to back... so I started printing
the type passed into the Span, along with the constructor information,
and found that someone may have been constructing the Span with the
wrong type.
Thanks again; I'll check this and give you full credit.

James

On 1/4/2012 7:14 PM, Angel Luis Jimenez Martinez wrote:
> Hi,
>
> I have poked around in the code and it seems I have found the problem.
>
> In NameFinderME.java, method: public Span[] find(String[] tokens, String[][]
> additionalContext)
>
> There's a line like:
>
> spans.add(new Span(start, end, extractNameType(chunkTag)));
>
> that should be like this:
>
> spans.add(new Span(start, end, extractNameType(c.get(li - 1))));
>
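> For context, this is roughly how I read the surrounding loop (paraphrased
> from memory, so only chunkTag, c, li, start, end and extractNameType are
> taken from the real code; the rest may differ slightly):
>
>     for (int li = 0; li < c.size(); li++) {
>       String chunkTag = c.get(li);
>       if (chunkTag.endsWith(NameFinderME.START)) {
>         // a new entity starts here, so chunkTag already carries the type
>         // of the NEXT entity; the span being closed must take its type
>         // from the previous outcome, c.get(li - 1)
>         if (start != -1 && end != -1) {
>           spans.add(new Span(start, end, extractNameType(c.get(li - 1))));
>         }
>         start = li;
>         end = li + 1;
>       } else if (chunkTag.endsWith(NameFinderME.CONTINUE)) {
>         end = li + 1;
>       } else if (start != -1 && end != -1) { // "other" also closes a span
>         spans.add(new Span(start, end, extractNameType(c.get(li - 1))));
>         start = -1;
>         end = -1;
>       }
>     }
>
> That would also explain why the wrong type only shows up when two entities
> are back to back: only then is a span closed at a "-start" outcome, where
> chunkTag already belongs to the next entity.
>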
> And there's a test that needs changing on NameFinderMETest,
> method: testOnlyWithEntitiesWithTypes
>
> These asserts:
>
>     assertEquals(new Span(0, 1, "location"), names1[0]);
>     assertEquals(new Span(1, 3, "person"), names1[1]);
>
> should be like this:
>
>     assertEquals(new Span(0, 1, "organization"), names1[0]);
>     assertEquals(new Span(1, 3, "location"), names1[1]);
>     assertEquals(new Span(3, 5, "person"), names1[2]);
>
> I have not thoroughly tested this, so it would be great if someone who
> knows this code better could take a look :)
>
> On Wed, Jan 4, 2012 at 9:24 AM, Angel Luis Jimenez Martinez <
> soyan...@gmail.com> wrote:
>
>> Sounds great!
>>
>> Let me know if you need some testing when you commit the fix; I'll be
>> happy to help.
>>
>> Thanks, James!
>> On 04/01/2012 06:32, "James Kosin" <james.ko...@gmail.com> wrote:
>>
>>> I've narrowed the problem down to the output method that generates the
>>> output text from the TokenNameFinder...
>>> At least I think that is where the problem lies.
>>>
>>> James
>>>
>>> On 1/2/2012 1:49 PM, Angel Luis Jimenez Martinez wrote:
>>>> Hi Olivier,
>>>>
>>>> Right now it is a small training set, but the curious thing is that with
>>>> a little corpus (4 lines) it detects phrases like "call to ann" but not
>>>> "call ann".  So I suspect there is something wrong when training with a
>>>> phrase that has two consecutive markers.
>>>>
>>>> I have tried with a bigger corpus like:
>>>>
>>>> <START:action> call <END> <START:person> mary <END> a tope
>>>> <START:action> call <END> <START:person> james <END> a tope
>>>> <START:action> call <END> <START:person> mary <END> a tope
>>>> <START:action> call <END> <START:person> joe smith <END> a tope
>>>> ...
>>>> ...
>>>>
>>>> With about 20 lines but no luck.
>>>>
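>>>> In case it helps, this is roughly how I am checking the output after
>>>> training (the model file name and the tokens are just an example):
>>>>
>>>>     import java.io.FileInputStream;
>>>>     import opennlp.tools.namefind.NameFinderME;
>>>>     import opennlp.tools.namefind.TokenNameFinderModel;
>>>>     import opennlp.tools.util.Span;
>>>>
>>>>     TokenNameFinderModel model =
>>>>         new TokenNameFinderModel(new FileInputStream("en-actions.bin"));
>>>>     NameFinderME finder = new NameFinderME(model);
>>>>
>>>>     String[] tokens = { "call", "mary", "a", "tope" };
>>>>     Span[] names = finder.find(tokens);
>>>>     for (Span s : names) {
>>>>       StringBuilder text = new StringBuilder();
>>>>       for (int i = s.getStart(); i < s.getEnd(); i++) {
>>>>         text.append(tokens[i]).append(' ');
>>>>       }
>>>>       System.out.println(s.getType() + " -> " + text.toString().trim());
>>>>     }
>>>>
>>>> With the back-to-back tags I would expect both "action -> call" and
>>>> "person -> mary" to be printed.
>>>>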
>>>> And about the regex: it was my first option for this problem, and I even
>>>> have a working solution... but I quickly found that I wanted something
>>>> less rigid that I could train with several different phrases, hence I'm
>>>> playing with OpenNLP.
>>>>
>>>> I'm looking for something that allows me to process phrases like:
>>>>
>>>> weather in london
>>>> how is the weather in london
>>>> in london how is the weather right now
>>>> today how is the weather near london
>>>>
>>>> As you can guess, using regexes to implement this was not very fun ;-)
>>>>
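>>>> For those, I would annotate the training data in the same format as
>>>> above, something along these lines (the type names are just what I have
>>>> in mind, I have not trained on this yet):
>>>>
>>>> how is the <START:subject> weather <END> in <START:location> london <END>
>>>> in <START:location> london <END> how is the <START:subject> weather <END> right now
>>>> today how is the <START:subject> weather <END> near <START:location> london <END>
>>>>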
>>>> And about the capitalization: right now the input is all lowercase
>>>> (that is how it comes from the speech recognizer).
>>>>
>>>> On Mon, Jan 2, 2012 at 7:31 PM, Olivier Grisel <olivier.gri...@ensta.org>
>>>> wrote:
>>>>> How big is your training set? Don't you have any uppercase letters in
>>>>> your phrases?
>>>>>
>>>>> You might need a larger and more diverse set of examples (including
>>>>> negative examples without any kind of annotations).
>>>>>
>>>>> Do your sentences always follow such simple patterns? If so, you should
>>>>> probably use a simple regular expression with a fixed / controlled list
>>>>> of action names.
>>>>>
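>>>>> Something along these lines would already cover the "call <name>" case
>>>>> (the action list and pattern are only an illustration):
>>>>>
>>>>>     import java.util.regex.Matcher;
>>>>>     import java.util.regex.Pattern;
>>>>>
>>>>>     Pattern p = Pattern.compile("\\b(call|text|email)\\s+([a-z]+(?:\\s+[a-z]+)?)");
>>>>>     Matcher m = p.matcher("call joe smith a tope");
>>>>>     if (m.find()) {
>>>>>       // prints: action=call, target=joe smith
>>>>>       System.out.println("action=" + m.group(1) + ", target=" + m.group(2));
>>>>>     }
>>>>>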
>>>>> --
>>>>> Olivier
>>>>> http://twitter.com/ogrisel - http://github.com/ogrisel
>>>>>
>>>>
>>>
>
