Angel,

Thanks again. It should be fixed in the next release, or in SVN trunk right now.
James

On 1/4/2012 7:14 PM, Angel Luis Jimenez Martinez wrote:
> Hi,
>
> I have poked at the code and it seems I have found the problem.
>
> In NameFinderME.java, method:
>
>     public Span[] find(String[] tokens, String[][] additionalContext)
>
> there's a line like:
>
>     spans.add(new Span(start, end, extractNameType(chunkTag)));
>
> that should be like this:
>
>     spans.add(new Span(start, end, extractNameType(c.get(li - 1))));
>
> And there's a test that needs changing in NameFinderMETest,
> method testOnlyWithEntitiesWithTypes.
>
> These asserts:
>
>     assertEquals(new Span(0, 1, "location"), names1[0]);
>     assertEquals(new Span(1, 3, "person"), names1[1]);
>
> should be like this:
>
>     assertEquals(new Span(0, 1, "organization"), names1[0]);
>     assertEquals(new Span(1, 3, "location"), names1[1]);
>     assertEquals(new Span(3, 5, "person"), names1[2]);
>
> I have not thoroughly tested this, so if someone who knows this code
> better could take a look at it, that would be great :)
>
> On Wed, Jan 4, 2012 at 9:24 AM, Angel Luis Jimenez Martinez <
> soyan...@gmail.com> wrote:
>
>> Sounds great!
>>
>> Let me know if you need some testing when you commit the fix, I'll be
>> happy to help.
>>
>> Thanks, James!
>> On 04/01/2012 06:32, "James Kosin" <james.ko...@gmail.com> wrote:
>>
>>> I've narrowed the problem down to the output method that generates the
>>> output text from the tokenNameFinder...
>>> At least I think that is where the problem lies.
>>>
>>> James
>>>
>>> On 1/2/2012 1:49 PM, Angel Luis Jimenez Martinez wrote:
>>>> Hi Olivier,
>>>>
>>>> Right now it's a small training set, but the curious thing is that with
>>>> a little corpus (4 lines) it detects phrases like "call to ann" but not
>>>> "call ann". So I suspect there is something wrong when training with a
>>>> phrase that has two consecutive markers.
>>>>
>>>> I have tried with a bigger corpus like:
>>>>
>>>> <START:action> call <END> <START:person> mary <END> a tope
>>>> <START:action> call <END> <START:person> james <END> a tope
>>>> <START:action> call <END> <START:person> mary <END> a tope
>>>> <START:action> call <END> <START:person> joe smith <END> a tope
>>>> ...
>>>> ...
>>>>
>>>> With about 20 lines, but no luck.
>>>>
>>>> As for the regex, it was my first option for this problem, and I even
>>>> have a working solution... but I quickly found that I wanted something
>>>> less rigid that I could train with several different phrases, hence I'm
>>>> playing with OpenNLP.
>>>>
>>>> I'm looking for something that allows me to process phrases like:
>>>>
>>>> weather in london
>>>> how is the weather in london
>>>> in london how is the weather right now
>>>> today how is the weather near london
>>>>
>>>> As you can guess, using regexes to implement this was not very fun ;-)
>>>>
>>>> As for capitalization, right now the input comes all in lowercase (it
>>>> comes from a speech recognizer like that).
>>>>
>>>> On Mon, Jan 2, 2012 at 7:31 PM, Olivier Grisel <
>>>> olivier.gri...@ensta.org> wrote:
>>>>> How big is your training set? Don't you have any uppercase letters in
>>>>> your phrases?
>>>>>
>>>>> You might need a larger and more diverse set of examples (including
>>>>> negative examples without any kind of annotations).
>>>>>
>>>>> Do your sentences always follow such simple patterns? If so, you should
>>>>> probably use a simple regular expression with a fixed / controlled
>>>>> list of action names.
>>>>>
>>>>> --
>>>>> Olivier
>>>>> http://twitter.com/ogrisel - http://github.com/ogrisel
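The one-line fix Angel proposes for NameFinderME.find (typing a span from c.get(li - 1) rather than from chunkTag) can be illustrated outside OpenNLP. This is a minimal standalone sketch, not OpenNLP's actual code: the class SpanDecodingSketch, its TypedSpan type, its decode/typeOf helpers, and the tag format "type-start" / "type-cont" / "other" are all assumptions made for illustration. It walks per-token outcome tags and closes the open span whenever a new "-start" or an "other" tag appears, taking the span's type from the last tag inside the span. Typing it from the current tag instead would label a span that is immediately followed by another entity, as in "call mary", with the next entity's type, which matches the shifted types in the old test assertions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical standalone sketch; not OpenNLP's NameFinderME implementation.
class SpanDecodingSketch {

    /** Hypothetical span over token indices [start, end) with an entity type. */
    static final class TypedSpan {
        final int start;
        final int end;
        final String type;

        TypedSpan(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }

        @Override
        public String toString() {
            return "[" + start + ".." + end + ") " + type;
        }
    }

    /** Assumed tag format "type-start" / "type-cont"; returns null for "other". */
    static String typeOf(String tag) {
        int dash = tag.lastIndexOf('-');
        return dash < 0 ? null : tag.substring(0, dash);
    }

    /** Groups per-token outcome tags into typed spans. */
    static List<TypedSpan> decode(List<String> tags) {
        List<TypedSpan> spans = new ArrayList<>();
        int start = -1; // index where the currently open span began, or -1
        for (int i = 0; i < tags.size(); i++) {
            String tag = tags.get(i);
            boolean startsNew = tag.endsWith("-start");
            if ((startsNew || tag.equals("other")) && start != -1) {
                // Close the open span at i. Its type comes from the last tag
                // INSIDE the span (i - 1); the current tag may already belong
                // to the next entity.
                spans.add(new TypedSpan(start, i, typeOf(tags.get(i - 1))));
                start = -1;
            }
            if (startsNew) {
                start = i;
            }
            // A "-cont" tag extends the open span; a stray one is ignored.
        }
        if (start != -1) { // a span that runs to the end of the sentence
            spans.add(new TypedSpan(start, tags.size(), typeOf(tags.get(tags.size() - 1))));
        }
        return spans;
    }

    public static void main(String[] args) {
        // "call mary a tope": two adjacent entities, as in Angel's corpus.
        List<String> tags = Arrays.asList("action-start", "person-start", "other", "other");
        System.out.println(decode(tags)); // [[0..1) action, [1..2) person]
    }
}
```

With the tags "organization-start", "location-start", "location-cont", "person-start", "person-cont", the sketch yields organization [0..1), location [1..3) and person [3..5), the same spans as the corrected asserts in testOnlyWithEntitiesWithTypes.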
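For comparison, Olivier's suggestion of a regular expression over a controlled list of action names might look roughly like the following. This is a hypothetical sketch: the class name IntentRegexSketch, the action list (call, phone), the optional "to", and the "in"/"near" location rule are illustrative choices, not something proposed in the thread. Input is lowercased because, as Angel notes, the speech recognizer already produces lowercase text.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the regex approach; not part of OpenNLP.
class IntentRegexSketch {
    // Hypothetical controlled list of action names, with an optional "to".
    private static final Pattern CALL =
            Pattern.compile("(call|phone)\\s+(?:to\\s+)?(.+)");
    private static final Pattern WEATHER = Pattern.compile("\\bweather\\b");
    // Captures only a single-word location after "in" or "near".
    private static final Pattern LOCATION =
            Pattern.compile("\\b(?:in|near)\\s+(\\w+)");

    /** Returns {action, name} for a call command, or null if it does not match. */
    static String[] parseCall(String utterance) {
        Matcher m = CALL.matcher(utterance.trim().toLowerCase(Locale.ROOT));
        return m.matches() ? new String[] {m.group(1), m.group(2)} : null;
    }

    /** Returns the location of a weather query, or null if none is found. */
    static String parseWeatherLocation(String utterance) {
        String text = utterance.trim().toLowerCase(Locale.ROOT);
        if (!WEATHER.matcher(text).find()) {
            return null;
        }
        Matcher m = LOCATION.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] call = parseCall("call to ann");
        System.out.println(call[0] + " -> " + call[1]); // call -> ann
        System.out.println(parseWeatherLocation("in london how is the weather right now")); // london
    }
}
```

Each new phrasing needs its own pattern, and a multi-word location such as "new york" would come back as just "new", which illustrates the rigidity Angel wanted to avoid by training a model instead.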