No rules, but by example

2014-08-19 Thread R.J. Baars
I think a good addition for LT would be a general rule that acts only on tokens, a bit like SRX does with letters. bed: ok; bed english: not ok = bad english. A mechanism that lets the longer token list overrule the shorter one. This would create the option to add found errors empirically.
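The "longer token list overrules the shorter one" idea can be sketched in a few lines of Python. This is only an illustration of the proposal, not LanguageTool code: the `ENTRIES` table, the function names, and the choice of `None` to mean "correct as written" are all assumptions made for the example. The entries themselves are taken from the examples given in this thread.

```python
# Hypothetical entry table: token tuple -> suggestion.
# None means "this token sequence is correct as written".
ENTRIES = {
    ("bed",): None,
    ("bed", "english"): "bad english",
    ("hand", "some"): "handsome",
    ("i", "hand", "some", "tools", "to"): None,
    ("bene",): "been",
    ("nota", "bene"): None,
}


def find_matches(tokens, entries):
    """Return (start, end, suggestion) for every entry found in the token list."""
    lowered = [t.lower() for t in tokens]
    matches = []
    for pattern, suggestion in entries.items():
        n = len(pattern)
        for start in range(len(lowered) - n + 1):
            if tuple(lowered[start:start + n]) == pattern:
                matches.append((start, start + n, suggestion))
    return matches


def check(tokens, entries=ENTRIES):
    """Report 'wrong' matches unless a strictly longer match covers them."""
    matches = find_matches(tokens, entries)
    errors = []
    for start, end, suggestion in matches:
        if suggestion is None:
            continue  # entry marks the sequence as correct
        overruled = any(
            s <= start and end <= e and (e - s) > (end - start)
            for s, e, _ in matches
        )
        if not overruled:
            errors.append((start, end, suggestion))
    return sorted(errors)
```

For example, `check("He is hand some".split())` reports the span covering "hand some", while `check("I hand some tools to him".split())` reports nothing, because the longer "correct" entry covers the shorter "wrong" one.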

Re: SRX

2014-08-19 Thread Daniel Naber
On 2014-08-18 17:18, R.J. Baars wrote: I was able to test, and removed 2 of my additions to make it work. Thanks, I have committed it. It will be part of the daily builds tonight: https://languagetool.org/download/snapshots/?C=M;O=D Regards Daniel

Re: SRX

2014-08-19 Thread R.J. Baars
I am currently checking the output of all the rules on the 20GB corpus; some rules are perfect, some less so (though hard to tweak). The result will be a major update, I guess... Ruud On 2014-08-18 17:18, R.J. Baars wrote: I was able to test, and removed 2 of my additions to make it work. Thanks,

urls in grammar file

2014-08-19 Thread R.J. Baars
This generates an error. How do I add multiple urls to 1 rule?
<url>https://onzetaal.nl/taaladvies/advies/instandhouden-in-stand-houden</url>
<url>http://taaladvies.net/taal/advies/vraag/412/in_bedrijf_stelling_inbedrijfstelling/</url>
Ruud

Re: urls in grammar file

2014-08-19 Thread Daniel Naber
On 2014-08-19 12:41, R.J. Baars wrote: This generates an error. How do I add multiple urls to 1 rule? Only one URL per rule is currently supported. Regards Daniel

Re: No rules, but by example

2014-08-19 Thread Daniel Naber
On 2014-08-19 09:14, R.J. Baars wrote: bed: ok; bed english: not ok = bad english. For some types of errors, I think it works better than the current rule/exception type of check. I'm not sure I understand: do you suggest a different (more compact) way to write down simple rules, or do you

Re: urls in grammar file

2014-08-19 Thread R.J. Baars
This is a limitation; there are sometimes several texts to refer to ... Is it on the wishlist to change this? On 2014-08-19 12:41, R.J. Baars wrote: This generates an error. How do I add multiple urls to 1 rule? Only one URL per rule is currently supported. Regards Daniel

Re: No rules, but by example

2014-08-19 Thread R.J. Baars
What I mean is just making a list of token groups, good and bad. I'll try a different example:
hand some; wrong; handsome
I hand some tools to; correct
Another one:
bene; wrong; been
nota bene; correct
It is a very compact way of defining very simple rules. I encounter rules that work fine,

Re: urls in grammar file

2014-08-19 Thread Daniel Naber
On 2014-08-19 13:35, R.J. Baars wrote: This is a limitation; there are sometimes several texts to refer to ... Is it on the wishlist to change this? Please open an issue at https://github.com/languagetool-org/languagetool/issues Regards Daniel

Re: No rules, but by example

2014-08-19 Thread Daniel Naber
On 2014-08-19 13:43, R.J. Baars wrote: hand some; wrong; handsome I hand some tools to; correct It is a very compact way of defining very simple rules. I see, but the thing is that these rules probably won't stay simple for long. What if you want to add you hand some tools, we hand some

support for Persian

2014-08-19 Thread Daniel Naber
Hi, some people will already have noticed it: LT has just added support for Persian. This is an important step, as Persian is the first right-to-left language we support. There will probably be some bugs, but for now it's looking good and there have only been minor issues. Persian is already

Re: No rules, but by example

2014-08-19 Thread R.J. Baars
Postags are a challenge. So many words have that many postags that it will be hard to get them determined really well. I will spend some time on the disambiguator, assigning postags as well as deleting ambiguous ones where possible. First we have to establish a better postagging system

Re: support for Persian

2014-08-19 Thread R.J. Baars
Persian or Farsi? Hi, some people will already have noticed it: LT has just added support for Persian. This is an important step, as Persian is the first right-to-left language we support. There will probably be some bugs, but for now it's looking good and there have only been minor issues.

Re: support for Persian

2014-08-19 Thread Reza engyian
https://en.wikipedia.org/wiki/Persian_language On Tue, Aug 19, 2014 at 5:33 PM, R.J. Baars r.j.ba...@xs4all.nl wrote: Persian or Farsi? Hi, some people will already have noticed it: LT has just added support for Persian. This is an important step, as Persian is the first

Error message?

2014-08-19 Thread R.J. Baars
java -jar languagetool.jar
(process:14246): GLib-CRITICAL **: g_slice_set_config: assertion 'sys_page_size == 0' failed
This occurs after clicking on the reference url in the UI; it works though. Ruud

Re: support for Persian

2014-08-19 Thread R.J. Baars
Okay, if you need sentences to test on, or plain words lists with frequencies, or word groups and their frequencies, I have them. Ruud https://en.wikipedia.org/wiki/Persian_language On Tue, Aug 19, 2014 at 5:33 PM, R.J. Baars r.j.ba...@xs4all.nl wrote: Persian or Farsi? Hi, some

Re: support for Persian

2014-08-19 Thread Reza engyian
Thanks. I want to write the suggestion for the message of this rule:
<rule id="PluralFix" name="ZWNJ for Plural extension">
<pattern>
<token regexp='yes'>[ءآأؤإئابپةتثجحخچدذرزژسشصضطظعغفقكکگلمنهوىیيًٌٍَُِّْ]+</token>
<token regexp='yes'>ها(ی|یی|یم|یت|یش|مان|تان|شان|)</token>

Rule not working as expected

2014-08-19 Thread R.J. Baars
I discovered that the rule below is not working very well. It looks like 'skip' also skips over sentence boundaries. Is that intentional? Or is something else wrong? In case it is intentional, is there an option to forbid that? Ruud
<rule id="nr738" name="duur kost">
<pattern>
<token skip="4">duur</token>

looking for language maintainers

2014-08-19 Thread Daniel Naber
Hi, while we're adding support for languages (Tamil, Persian), we're less successful in finding maintainers for the unmaintained languages we support. Here's a list of languages that need a maintainer: Lithuanian, Belarusian, Malayalam, Swedish, Icelandic, Japanese, Danish, Galician, Romanian, Chinese

Re: Rule not working as expected

2014-08-19 Thread Daniel Naber
On 2014-08-19 15:39, R.J. Baars wrote: I discovered that the rule below is not working very well. It looks like 'skip' also skips over sentence boundaries. No, that shouldn't be possible. Maybe sentence detection is broken? What sentence does this match that it shouldn't? Regards Daniel

Re: support for Persian

2014-08-19 Thread Daniel Naber
On 2014-08-19 15:32, Reza engyian wrote: it should suggest first_word+ZWNJ+ها, would you please tell me how I should write the suggestion? I haven't tested it, but this should work: <message>Do you mean <suggestion>\1&#8204;ها</suggestion>?</message> Regards Daniel
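The effect of that suggestion (first word, then a zero-width non-joiner, then the plural suffix) can be tried outside LanguageTool with a small Python sketch. It is a simplified stand-in for the rule, not the rule itself: the regular expression below accepts any non-space run as the first word rather than the Persian letter class from the rule, and it keeps the whole second token (suffix endings included), which is an assumption that goes slightly beyond the suggestion as posted. The function name is hypothetical.

```python
import re

ZWNJ = "\u200c"  # U+200C ZERO WIDTH NON-JOINER, decimal 8204
SUFFIX = "ها"    # Persian plural suffix used in the rule

# Simplified stand-in for the two-token pattern:
# a word, whitespace, then a token that starts with the plural suffix.
PLURAL = re.compile(r"(\S+)\s+(" + SUFFIX + r"\S*)")


def join_plural(text):
    """Replace the space before the plural suffix with a ZWNJ."""
    return PLURAL.sub(lambda m: m.group(1) + ZWNJ + m.group(2), text)
```

Calling `join_plural` on a word followed by a separate "ها" returns the two joined by U+200C instead of a space; text without the suffix is returned unchanged.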

Re: Rule not working as expected

2014-08-19 Thread Dominique Pellé
R.J. Baars r.j.ba...@xs4all.nl wrote: I discovered that the rule below is not working very well. It looks like 'skip' also skips over sentence boundaries. Is that intentional? Or is something else wrong? In case it is intentional, is there an option to forbid that? Ruud <rule id="nr738"

Re: Rule not working as expected

2014-08-19 Thread R.J. Baars
Thanks, I will check it out. The rule is not functioning very well at all. I commented it out and put it on the list of items to do. Ruud R.J. Baars r.j.ba...@xs4all.nl wrote: I discovered that the rule below is not working very well. It looks like 'skip' also skips over sentence

Some Universal regex for punctuation

2014-08-19 Thread Reza engyian
Hi all, I found some regexes that help produce better punctuation. If LT does not already check these, please add them to the tool. The second line after each regex is the suggestion.
---
There should be a space between words and numbers:
(\d+)(\w+)
$1 $2
(\w+)(\d+)
$1 $2
space between
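A quick way to try the word/number patterns is a short Python sketch. One caveat, assuming Python's `re` semantics: `\w` also matches digits, so `(\d+)(\w+)` applied as written would split a plain number such as "2014" into "201 4". The sketch below therefore swaps in a letters-only class, which is a deviation from the posted patterns and not something taken from the post; the function name is hypothetical.

```python
import re

# A word character that is neither a digit nor an underscore, i.e. a letter.
LETTER = r"[^\W\d_]"

DIGIT_THEN_LETTER = re.compile(rf"(\d)({LETTER})")
LETTER_THEN_DIGIT = re.compile(rf"({LETTER})(\d)")


def space_words_and_numbers(text):
    """Insert a space wherever a digit touches a letter, in either order."""
    text = DIGIT_THEN_LETTER.sub(r"\1 \2", text)
    return LETTER_THEN_DIGIT.sub(r"\1 \2", text)
```

With this variant, "10apples" becomes "10 apples" and "page12" becomes "page 12", while "2014" is left alone. Whether such a rule is desirable in practice (think of identifiers like "B12") is a separate question for the rule author.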