Re: case sensitivity of the tagger

2014-03-20 Thread Dave Pawson
A newbie answer.

Would a regex not help?   [bB]en
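
As a small illustration of the character-class idea, here is a Python sketch. It is not taken from LanguageTool, and the variable and function names are made up for the example. It also shows why a "|" should not be written inside the brackets: there it is a literal character, not alternation.

```python
import re

# Character class: exactly one of the listed characters, followed by "en".
latin_any_case = re.compile(r"[bB]en")
# The same idea for the Cyrillic spelling discussed in this thread.
cyrillic_any_case = re.compile(r"[бБ]ен")
# Inside brackets "|" is a literal character, so this class has three members.
with_pipe = re.compile(r"[b|B]en")


def matches(pattern, text):
    """Return True if the whole of `text` matches `pattern`."""
    return pattern.fullmatch(text) is not None


print(matches(latin_any_case, "ben"), matches(latin_any_case, "Ben"))  # True True
print(matches(cyrillic_any_case, "бен"))                               # True
print(matches(with_pipe, "|en"))                                       # True
print(matches(latin_any_case, "|en"))                                  # False
```

Whether a pattern like this helps depends on where the match is applied; the tagger question below is about dictionary lookup rather than pattern matching.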

regards

On 19 March 2014 21:10, Andriy Rysin ary...@gmail.com wrote:
 If I have two words in the dictionary that differ only in the
 capitalization of the first letter, then when the tagger finds the
 lowercase form in a sentence it correctly tags it only from the lowercase
 entry (and if it finds the capitalized form it correctly tags it from both).
 But if the dictionary has a capitalized word (a proper noun) with no
 lowercase sibling, and the lowercase form appears in a sentence (where I
 would expect it to stay untagged), the tagger tags it from the
 capitalized entry, which is wrong, at least for Ukrainian.

 For example, I have Бен (Ben) defined as a man's name in the dictionary
 but not бен (ben), so when бен Ладен (bin Laden) is found, the бен
 is tagged as a name.

 I can probably create a lowercase бен in the dictionary with some
 special/empty tag (it's not a separate word per se) to fix it, but I
 just wanted to double-check that this is intentional and that there's no
 easy way to configure it the other way.

 Thanks
 Andriy

 --
 Learn Graph Databases - Download FREE O'Reilly Book
 Graph Databases is the definitive new guide to graph databases and their
 applications. Written by three acclaimed leaders in the field,
 this first edition is now available. Download your free book today!
 http://p.sf.net/sfu/13534_NeoTech
 ___
 Languagetool-devel mailing list
 Languagetool-devel@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/languagetool-devel



-- 
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.
http://www.dpawson.co.uk



Re: case sensitivity of the tagger

2014-03-20 Thread Daniel Naber
On 2014-03-20 08:46, Daniel Naber wrote:

 Have you debugged this? It seems strange, as e.g. Dog in English won't
 be tagged

Not sure how I tested this yesterday, but that claim is wrong. Both 
dog and Dog in English are tagged as NN.

Regards
  Daniel




Re: case sensitivity of the tagger

2014-03-20 Thread Andriy Rysin
Thanks Jaume, I'll check how it works a bit later (I've already added
бен to the dictionary so I could push my changes).
One more question on this: if I don't want to have бен as a
separate word, how can I mark бен Ладен (bin Laden) as a single
word/noun? I use multiwords.txt to mark similar phrases, but in this
example Ладен can be inflected, and I think multiwords does not support
that (without writing out all 7 forms of it). Or is there a better
approach to treating these not-really-a-word-by-itself situations?

Thanks
Andriy

2014-03-20 4:19 GMT-04:00 Jaume Ortolà i Font jaumeort...@gmail.com:
 2014-03-20 8:46 GMT+01:00 Daniel Naber daniel.na...@languagetool.org:

 On 2014-03-19 22:10, Andriy Rysin wrote:

  For example, I have Бен (Ben) defined as a man's name in the dictionary
  but not бен (ben), so when бен Ладен (bin Laden) is found, the бен
  is tagged as a name.

 Have you debugged this? It seems strange, as e.g. Dog in English won't
 be tagged, and the English tagger also extends BaseTagger so it should
 behave as the Ukrainian one. Or am I missing something?



 Hi,

 By default, the BaseTagger tags a lowercase word with the tags of its
 capitalized form. To change this, you can add
 dontTagLowercaseWithUppercase(); to your UkrainianTagger constructor.
 I have done it for you.

 Regards,
 Jaume
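
The lookup behaviour discussed in this thread can be sketched with a toy model. This is a hedged illustration in Python, not LanguageTool's Java implementation: the function `tag`, the flag `tag_lowercase_with_uppercase`, the dictionary layout, and the tag label `NAME` are all hypothetical and only mirror what the messages above describe.

```python
# Toy model of the case handling described in this thread.
# All names here are made up for illustration; this is not LanguageTool code.

def tag(word, dictionary, tag_lowercase_with_uppercase=True):
    """Return the tags for `word` from a {word form: [tags]} dictionary."""
    tags = list(dictionary.get(word, []))
    lower = word.lower()
    is_lowercase = word == lower

    # A capitalized word is also tagged from its lowercase entry.
    if not is_lowercase:
        for t in dictionary.get(lower, []):
            if t not in tags:
                tags.append(t)

    # Fallback under discussion: a lowercase word with no tags of its own
    # borrows the tags of the entry with an uppercase first letter.
    if tag_lowercase_with_uppercase and is_lowercase and not tags:
        capitalized = word[:1].upper() + word[1:]
        tags.extend(dictionary.get(capitalized, []))

    return tags


toy_dictionary = {"Бен": ["NAME"], "dog": ["NN"]}

print(tag("Dog", toy_dictionary))   # ['NN']
print(tag("бен", toy_dictionary))   # ['NAME']
print(tag("бен", toy_dictionary, tag_lowercase_with_uppercase=False))  # []
```

With the fallback switched off, the lowercase бен stays untagged, which is the outcome asked for in the Ukrainian case, while Dog still picks up the tags of dog.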



