Re: SRX

2014-08-19 Thread Daniel Naber
On 2014-08-18 17:18, R.J. Baars wrote:
> I was able to test, and removed 2 of my additions to make it work.
Thanks, I have committed it. It will be part of the daily builds tonight: https://languagetool.org/download/snapshots/?C=M;O=D
Regards
Daniel

Re: SRX

2014-08-19 Thread R.J. Baars
I am currently checking the output of all the rules on the 20GB corpus; some rules are perfect, some less so (though they are hard to tweak). The result will be a major update, I guess...
Ruud
> On 2014-08-18 17:18, R.J. Baars wrote:
>> I was able to test, and removed 2 of my additions to make it work.
> Thanks,

SRX

2014-08-18 Thread R.J. Baars
There is an adjustment to make in the sentence splitter. But where did the .srx go? I detected a commonly used abbreviation that is currently treated as a sentence end: milj. Could this be added to the Dutch srx rules?
Ruud

Re: SRX

2014-08-18 Thread R.J. Baars
The same applies to [0-9]{1,2}[-]pers.
Ruud
> There is an adjustment to make in the sentence splitter. But where did the .srx go? I detected a commonly used abbreviation that is currently treated as a sentence end: milj. Could this be added to the Dutch srx rules?
> Ruud
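To illustrate what the two requested exceptions would do, here is a minimal Python sketch. It is not LanguageTool's SRX engine and not the content of segment.srx; the default break rule, the names `NO_BREAK_BEFORE`, `BREAK` and `split_sentences`, and the sample sentences are all assumptions made for this example.

```python
import re

# Hypothetical no-break patterns built from the two abbreviations in this
# thread. Each one is checked against the text just before a candidate break.
NO_BREAK_BEFORE = [
    re.compile(r"\bmilj\.$"),
    re.compile(r"\b[0-9]{1,2}-pers\.$"),
]

# Assumed default rule: break at whitespace that follows '.', '!' or '?'.
BREAK = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    """Split text into sentences, skipping breaks after listed abbreviations."""
    sentences, start = [], 0
    for match in BREAK.finditer(text):
        before = text[start:match.start()]
        if any(p.search(before) for p in NO_BREAK_BEFORE):
            continue  # abbreviation, not a sentence end
        sentences.append(before)
        start = match.end()
    tail = text[start:]
    if tail:
        sentences.append(tail)
    return sentences
```

With these assumed patterns, "Het kost 3 milj. euro. Dat is veel." stays in two sentences instead of three. A real SRX rule would express the same idea as a no-break rule placed ahead of the general break rules.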

Re: SRX

2014-08-18 Thread Daniel Naber
On 2014-08-18 16:16, R.J. Baars wrote:
> There is an adjustment to make in the sentence splitter. But where did the .srx go?
It's at languagetool-core/src/main/resources/org/languagetool/resource/segment.srx
> Could this be added to the Dutch srx rules?
Sure, could you send a patch?
Regards

Re: SRX

2014-08-18 Thread R.J. Baars
I am not qualified to edit sources; I am just not a programmer. Unfortunately, the srx is not split per language either. I found the source on GitHub (which I don't really understand), so I will be able to adjust it and send it to you. But how can I test it if it is not in the runtime version?
Ruud

Re: switching to SRX sentence tokenizer

2014-04-12 Thread Marcin Miłkowski
On 2014-04-12 09:55, Daniel Naber wrote: On 2014-04-12 09:34, Marcin Miłkowski wrote: SRX file can be easily edited and we will happily accept all patches, also for languages without complete support in LT. Where's the problem? Today, you can extend the Language class and have a Regex

Re: SRX Sentence Tokenizer

2013-05-02 Thread Daniel Naber
On 01.05.2013, 12:18:41 Andriy Rysin wrote:
> P.S. BTW wouldn't it make sense to split segment.srx by language modules?
Absolutely. This isn't very high on my personal TODO list though, so any help/patches are welcome.
Regards
Daniel
-- http://www.danielnaber.de

Re: SRX Sentence Tokenizer

2013-05-02 Thread Marcin Miłkowski
Most SRX-compliant software uses a single file for all languages, AFAIK.
Regards, Marcin
On 02-05-2013 09:08, Daniel Naber list2...@danielnaber.de wrote:
> On 01.05.2013, 12:18:41 Andriy Rysin wrote:
>> P.S. BTW wouldn't it make sense to split segment.srx by language modules

SRX Sentence Tokenizer

2013-05-01 Thread Andriy Rysin
Hi all,
I need a bit of help with the srx sentence tokenizer. I've added this rule to prevent a sentence split on name abbreviation + surname, e.g. Т.Шевченко, which is common in texts. The rule will need to be a bit more complex, but I am trying something simple first.
<rule break="no"> <beforebreak>\b[А-ЯІЇЄҐ

Re: SRX Sentence Tokenizer

2013-05-01 Thread Piotr
Maybe the part after the \. should be in the afterbreak element?
Regards, Piotr
On Wed, May 1, 2013 at 6:18 PM, Andriy Rysin ary...@gmail.com wrote:
> Hi all I need a bit of help with the srx sentence tokenizer, I've added this rule to prevent sentence split on Name abbreviation+Surname, e.g
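The suggestion above rests on how an SRX rule is evaluated: the beforebreak pattern must match the text ending at a candidate position, and the afterbreak pattern must match the text starting there. The Python sketch below simulates that under stated assumptions. The two rules in `RULES` are hypothetical stand-ins (not the truncated rule from the original message and not the rules in segment.srx), "first matching rule wins" is assumed as the cascade behaviour, and `break_positions` and `segment` are names invented for this example.

```python
import re

# (is_break, beforebreak, afterbreak): hypothetical rules, first match wins.
RULES = [
    # No break between an initial with its dot and the surname after it.
    (False, r"\b[А-ЯІЇЄҐ]\.", r"\s*[А-ЯІЇЄҐ][а-яіїєґ']+"),
    # Assumed default: break after '.', '!' or '?' when whitespace follows.
    (True, r"[.!?]", r"\s"),
]


def break_positions(text):
    """Return the character offsets at which a break rule fires."""
    positions = []
    for i in range(1, len(text)):
        for is_break, before, after in RULES:
            # beforebreak must end exactly at i, afterbreak must start at i.
            if re.search(f"(?:{before})$", text[:i]) and re.match(after, text[i:]):
                if is_break:
                    positions.append(i)
                break  # the first matching rule decides this position
    return positions


def segment(text):
    """Cut text at the break positions and trim surrounding whitespace."""
    parts, start = [], 0
    for pos in break_positions(text):
        parts.append(text[start:pos].strip())
        start = pos
    parts.append(text[start:].strip())
    return [p for p in parts if p]
```

In this sketch the surname pattern has to live in the afterbreak half, because at the position right after the dot the surname has not been consumed yet; a beforebreak pattern that also contains the surname can never match ending at that position.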

Re: SRX Sentence Tokenizer

2013-05-01 Thread Andriy Rysin
Thanks, that helped!
Andriy
On 05/01/2013 02:54 PM, Piotr wrote:
> Maybe the part after the \. should be in the afterbreak element?
> Regards, Piotr
> On Wed, May 1, 2013 at 6:18 PM, Andriy Rysin ary...@gmail.com wrote:
>> Hi all I need a bit of help with srx