Re: case sensitivity of the tagger
A newbie answer. Would a regex not help? [bB]en

regards

On 19 March 2014 21:10, Andriy Rysin ary...@gmail.com wrote:

> If I have two words in the dictionary that differ only in the capitalization of the first letter, then when the tagger finds the lowercase form in a sentence it correctly tags it only from the lowercase entry (and if it finds the capitalized form in a sentence it correctly tags it from both entries).
>
> But if I have a capitalized word (a proper noun) in the dictionary with no lowercase sibling, and the lowercase form is found in a sentence (where I would expect it to stay untagged), the tagger tags it from the capitalized entry, which is wrong, at least for Ukrainian.
>
> For example, I have Бен (Ben) defined as a man's name in the dictionary but not бен (ben), so when бен Ладен (bin Laden) is found, бен is tagged as a name.
>
> I can probably add a lowercase бен to the dictionary with some special/empty tag (it's not a separate word per se) to fix it, but I wanted to double-check that this is intentional and that there's no easy way to configure it the other way.
>
> Thanks
> Andriy
>
> --
> Learn Graph Databases - Download FREE O'Reilly Book
> Graph Databases is the definitive new guide to graph databases and their applications. Written by three acclaimed leaders in the field, this first edition is now available. Download your free book today!
> http://p.sf.net/sfu/13534_NeoTech
> ___
> Languagetool-devel mailing list
> Languagetool-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/languagetool-devel

--
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.
http://www.dpawson.co.uk
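For reference, a minimal Java sketch of the regex idea suggested in this message. It only shows how such a pattern behaves on its own; whether a regex is the right tool inside the tagger is a separate question. The class and field names are made up for the example. Note that inside square brackets a `|` is a literal character, so `[b|B]en` accepts one string more than `[bB]en` does.

```java
import java.util.regex.Pattern;

// Illustration only: names here are hypothetical, nothing is taken from LanguageTool.
class FirstLetterCaseRegex {
    // Character class: the first letter is either 'b' or 'B'.
    static final Pattern CHAR_CLASS = Pattern.compile("[bB]en");
    // Inside brackets '|' is a literal, so this class has three members: b, |, B.
    static final Pattern CLASS_WITH_PIPE = Pattern.compile("[b|B]en");
    // Alternation needs a group, not brackets.
    static final Pattern ALTERNATION = Pattern.compile("(?:b|B)en");

    // True if the whole string matches the pattern.
    static boolean matches(Pattern pattern, String text) {
        return pattern.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches(CHAR_CLASS, "ben"));       // true
        System.out.println(matches(CHAR_CLASS, "Ben"));       // true
        System.out.println(matches(CHAR_CLASS, "|en"));       // false
        System.out.println(matches(CLASS_WITH_PIPE, "|en"));  // true
        System.out.println(matches(ALTERNATION, "Ben"));      // true
    }
}
```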
Re: case sensitivity of the tagger
On 2014-03-20 08:46, Daniel Naber wrote:

> Have you debugged this? It seems strange, as e.g. Dog in English won't be tagged

Not sure how I tested this yesterday, but that claim is wrong. Both dog and Dog in English are tagged as NN.

Regards
Daniel
Re: case sensitivity of the tagger
Thanks Jaume,

I'll try it out a bit later (I've already added бен to the dictionary so that I could push my changes).

One more question around this: if I don't want to have бен as a separate word, how can I mark бен Ладен (bin Laden) as a single word/noun? I use multiwords.txt to mark similar phrases, but in this example Ладен can be inflected, and I think multiwords does not support that (without writing in all 7 forms of it). Or is there a better approach to treat these not-really-a-word-by-itself situations?

Thanks
Andriy

2014-03-20 4:19 GMT-04:00 Jaume Ortolà i Font jaumeort...@gmail.com:

> 2014-03-20 8:46 GMT+01:00 Daniel Naber daniel.na...@languagetool.org:
>
>> On 2014-03-19 22:10, Andriy Rysin wrote:
>>
>>> For example I have Бен (Ben) defined as man's name in the dictionary but not бен (ben) so when бен Ладен (bin Laden) is found the бен is tagged as name.
>>
>> Have you debugged this? It seems strange, as e.g. Dog in English won't be tagged, and the English tagger also extends BaseTagger so it should behave as the Ukrainian one. Or am I missing something?
>
> Hi,
>
> The BaseTagger, by default, tags lowercase words with capitalized word tags. To change this, you can add dontTagLowercaseWithUppercase(); to your UkrainianTagger constructor. I have done it for you.
>
> Regards,
> Jaume
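The behaviour discussed in this thread can be sketched with a toy model. This is not LanguageTool's BaseTagger: the classes, the dictionary, the tag names and the lookup logic are all assumptions written to mirror what the messages describe. Only the method name dontTagLowercaseWithUppercase() and the idea of calling it from the subclass constructor come from the thread. The ASCII entries Ben/ben stand in for Бен/бен, and Bill/bill stand in for a pair that differs only in capitalization.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model for illustration; NOT LanguageTool code.
class CaseFallbackDemo {

    // Hypothetical dictionary: surface form -> made-up tags.
    static final Map<String, List<String>> DICT = Map.of(
            "Ben", List.of("name"),
            "Bill", List.of("name"),
            "bill", List.of("noun"));

    static class ToyBaseTagger {
        private final Map<String, List<String>> dict;
        // Default described in the thread: lowercase tokens may borrow capitalized tags.
        private boolean tagLowercaseWithUppercase = true;

        ToyBaseTagger(Map<String, List<String>> dict) {
            this.dict = dict;
        }

        // Same name as the method mentioned in the thread; the body is an assumption.
        protected void dontTagLowercaseWithUppercase() {
            tagLowercaseWithUppercase = false;
        }

        // Returns the word with only its first letter upper- or lowercased.
        private static String withFirstLetter(String word, boolean upper) {
            String first = word.substring(0, 1);
            String changed = upper ? first.toUpperCase(Locale.ROOT) : first.toLowerCase(Locale.ROOT);
            return changed + word.substring(1);
        }

        List<String> tag(String token) {
            List<String> tags = new ArrayList<>(dict.getOrDefault(token, List.of()));
            if (token.isEmpty()) {
                return tags;
            }
            if (Character.isUpperCase(token.charAt(0))) {
                // A capitalized token is tagged from both entries.
                tags.addAll(dict.getOrDefault(withFirstLetter(token, false), List.of()));
            } else if (tags.isEmpty() && tagLowercaseWithUppercase) {
                // No lowercase entry: fall back to the capitalized one unless disabled.
                tags.addAll(dict.getOrDefault(withFirstLetter(token, true), List.of()));
            }
            return tags;
        }
    }

    // Mirrors the advice in the thread: switch the fallback off in the constructor.
    static class ToyUkrainianTagger extends ToyBaseTagger {
        ToyUkrainianTagger(Map<String, List<String>> dict) {
            super(dict);
            dontTagLowercaseWithUppercase();
        }
    }

    public static void main(String[] args) {
        ToyBaseTagger byDefault = new ToyBaseTagger(DICT);
        ToyBaseTagger strict = new ToyUkrainianTagger(DICT);
        System.out.println(byDefault.tag("ben"));  // [name]
        System.out.println(strict.tag("ben"));     // []
        System.out.println(strict.tag("Bill"));    // [name, noun]
        System.out.println(strict.tag("bill"));    // [noun]
    }
}
```

In this model the switch only affects lowercase tokens that have no dictionary entry of their own, which is the бен case; tokens with a lowercase entry and capitalized tokens are tagged the same way in both configurations.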