Hi,

I have poked around in the code and I think I have found the problem.

In NameFinderME.java, in the method:

    public Span[] find(String[] tokens, String[][] additionalContext)

there is a line like this:

    spans.add(new Span(start, end, extractNameType(chunkTag)));

that should be like this:

    spans.add(new Span(start, end, extractNameType(c.get(li - 1))));

There is also a test that needs changing in NameFinderMETest, method
testOnlyWithEntitiesWithTypes. These asserts:

    assertEquals(new Span(0, 1, "location"), names1[0]);
    assertEquals(new Span(1, 3, "person"), names1[1]);

should be like this:

    assertEquals(new Span(0, 1, "organization"), names1[0]);
    assertEquals(new Span(1, 3, "location"), names1[1]);
    assertEquals(new Span(3, 5, "person"), names1[2]);

I have not tested this thoroughly, so it would be great if someone who
knows this code better could take a look at it :)

On Wed, Jan 4, 2012 at 9:24 AM, Angel Luis Jimenez Martinez <soyan...@gmail.com> wrote:

> Sounds great!
>
> Let me know if you need some testing when you commit the fix, I'll be
> happy to help.
>
> Thanks, James!
> On 04/01/2012 06:32, "James Kosin" <james.ko...@gmail.com> wrote:
>
>> I've narrowed the problem down to the output method that generates the
>> output text from the tokenNameFinder....
>> At least I think that is where the problem lies.
>>
>> James
>>
>> On 1/2/2012 1:49 PM, Angel Luis Jimenez Martinez wrote:
>> > Hi Olivier,
>> >
>> > Right now it is a small training set, but the curious thing is that
>> > with a little corpus (4 lines) it detects phrases like "call to ann"
>> > but not "call ann". So I suspect there is something wrong when
>> > training with a phrase that has two consecutive markers.
>> >
>> > I have tried with a bigger corpus like:
>> >
>> > <START:action> call <END> <START:person> mary <END> a tope
>> > <START:action> call <END> <START:person> james <END> a tope
>> > <START:action> call <END> <START:person> mary <END> a tope
>> > <START:action> call <END> <START:person> joe smith <END> a tope
>> > ...
>> > ...
>> >
>> > With about 20 lines but no luck.
>> >
>> > And about the regex: it was my first option for this problem, and I
>> > even have a working solution... but I quickly found that I wanted
>> > something less rigid that I could train with several different
>> > phrases, hence I'm playing with OpenNLP.
>> >
>> > I'm looking for something that allows me to process phrases like:
>> >
>> > weather in london
>> > how is the weather in london
>> > in london how is the weather right now
>> > today how is the weather near london
>> >
>> > As you can guess, using regexes to implement this was not very fun ;-)
>> >
>> > And about the capitalization: right now the input comes all in
>> > lowercase (it comes from a speech recognizer like that).
>> >
>> > On Mon, Jan 2, 2012 at 7:31 PM, Olivier Grisel <olivier.gri...@ensta.org> wrote:
>> >
>> >> How big is your training set? You don't have any uppercase letters in
>> >> your phrases?
>> >>
>> >> You might need a larger and more diverse set of examples (including
>> >> negative examples without any kind of annotations).
>> >>
>> >> Do your sentences always follow such simple patterns? If so, you
>> >> should probably use a simple regular expression with a fixed /
>> >> controlled list of action names.
>> >>
>> >> --
>> >> Olivier
>> >> http://twitter.com/ogrisel - http://github.com/ogrisel
>> >>
>> >
>>
>> -- Angel.
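
P.S. To make the change proposed at the top of this message easier to follow, here is a minimal, self-contained sketch of what I believe is going on. It is not the OpenNLP source: the class AdjacentSpanSketch, the usePreviousTag flag and the tag strings ("action-start", "person-start", "other") are made up for illustration, and I am only assuming the real loop has roughly this shape. The idea is that when a new name starts right after another one, the span being closed should take its type from the previous tag, not from the tag that opens the next name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-in for a tag-to-span decoding loop; NOT the OpenNLP code.
class AdjacentSpanSketch {

    // Assumed tag format: "<type>-start", "<type>-cont", or "other".
    static String typeOf(String tag) {
        int dash = tag.lastIndexOf('-');
        return dash < 0 ? null : tag.substring(0, dash);
    }

    static String format(int start, int end, String type) {
        return "[" + start + ".." + end + ") " + type;
    }

    // usePreviousTag = false mimics the reported behaviour,
    // usePreviousTag = true mimics the proposed change.
    static List<String> decode(List<String> tags, boolean usePreviousTag) {
        List<String> spans = new ArrayList<>();
        int start = -1;
        int end = -1;
        for (int i = 0; i < tags.size(); i++) {
            String tag = tags.get(i);
            if (tag.endsWith("start")) {
                if (start != -1) {
                    // A name begins right after an open one: close the open span.
                    String closing = usePreviousTag ? tags.get(i - 1) : tag;
                    spans.add(format(start, end, typeOf(closing)));
                }
                start = i;
                end = i + 1;
            } else if (tag.endsWith("cont")) {
                end = i + 1;
            } else if (start != -1) {
                // "other": the open span ended on the previous token.
                spans.add(format(start, end, typeOf(tags.get(i - 1))));
                start = -1;
                end = -1;
            }
        }
        if (start != -1) {
            // The sequence ended while a span was still open.
            spans.add(format(start, end, typeOf(tags.get(tags.size() - 1))));
        }
        return spans;
    }

    public static void main(String[] args) {
        // Hypothetical tags for the tokens: call mary a tope
        List<String> tags = Arrays.asList("action-start", "person-start", "other", "other");
        System.out.println(decode(tags, false)); // [[0..1) person, [1..2) person]
        System.out.println(decode(tags, true));  // [[0..1) action, [1..2) person]
    }
}
```

With usePreviousTag set to false, "call" is reported as a person, which matches the kind of shifted types seen in the old test expectations; with it set to true, "call" keeps its own type.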