Re: [Languagetool] Spelling rule in Polish and (possibly) other languages

2012-02-04 Thread R.J. Baars
I would be interested in such a construction for Dutch, just to warn about incorrect word split-ups. Ruud. That's it, thank you! However, what we have now in Language Tool is a long set of rules, one for each pair of words. I thought about a tool that would allow adding new pairs of words to

Re: [Languagetool] Hunspell support

2012-04-29 Thread R.J. Baars
Laszlo once told me suggestion generation takes a lot of time, and is programmatically limited in time. Several options in the .aff limit the options and the time too. Ruud On 2012-04-29 02:12, Daniel Naber wrote: On Saturday, 28 April 2012, Marcin Miłkowski wrote: My impression was that it's

Re: [Languagetool] Initial hunspell support

2012-05-21 Thread R.J. Baars
Stemming and analyzing, especially getting a postag for a compound not in the dic, would be great. Impact is quite large though; for Dutch, tuning of postags to make the hunspell and LT ones compatible requires revision of some tags. Good plan by the way... Ruud Hi again, I've just split the

Re: [Languagetool] added support for Greek

2012-05-25 Thread R.J. Baars
Might be useful rules for Dutch as well. But there is no Dutch contributor at the Java level. Hi, Hi, I have done the translations. I was surprised by the new text warning for identical words and adverbs at the start of the sentence. Seems to me that 'same word' overlaps 'same adverb'.

Re: [Languagetool] online rule creator

2012-05-30 Thread R.J. Baars
Daniel, I succeeded in creating prototype rules from the annotations file, a bit like your easy rules editor. 600 rules in one run. None has been tested yet though. That is the next step. Ruud

Re: [Languagetool] How to enable spellchecking?

2012-06-04 Thread R.J. Baars
The way Hunspell reacts to quotes and dashes is specified in the affix file. For Dutch, Marcin once adjusted the tokenizer to accept certain quotes. In other languages, this will result in two tokens from LT, which are by themselves incorrect. The dash is controlled in Hunspell as a default

Re: [Languagetool] How to enable spellchecking?

2012-06-05 Thread R.J. Baars
Marcin, in reaction to your quote below, I can only agree that the computational part is quite complex. There was some research in Tilburg, resulting in a quite simple computational algorithm: - reduce all characters to their simple form (drop accents) - get their ascii value, raise it to the

Re: [Languagetool] Wordlists for spellers?

2012-06-22 Thread R.J. Baars
There is a word list for Dutch at www.opentaal.org. Relatively small, just 450,000 words, including proper names, but it is the best open one there is. To be exact: http://opentaal.org/bestanden/doc_download/18-woordenlijst-v-210g-bronbestanden- Ruud Marcin Miłkowski

Re: [Languagetool] last chance for translation updates

2012-06-29 Thread R.J. Baars
What is the error the text 'Duplicate rule file!' stands for? What exactly is going on when this message applies? There is only one grammar.xml, isn't there? If not, what makes 2 files duplicates? Ruud I'm going to create the LanguageTool 1.8 release tomorrow, so today is the last chance to

Re: [Languagetool] vote about logo proposals

2012-09-19 Thread R.J. Baars
I like 1, because it links to the implementation perception of users. Maybe combined with 2, which has the advantage of already showing an applicable icon. Having an underline under languagetool suggests that the name is wrong. That is not intended of course. 7 shows the multi-lingual aspect quite

Re: [Languagetool] Postags for compunds with -

2012-10-15 Thread R.J. Baars
That is strange, that should not be the case. Could you send me the dump? Ruud On 14.10.2012, 21:44:59 Ruud Baars wrote: Though the input for fsa surely has 13-jarig as a base word for 13-jarige Well, it doesn't. If I export the dutch.dict with this command: ./dump.sh

Re: [Languagetool] Word form dictionary for German

2012-11-02 Thread R.J. Baars
You could dump it. Daniel did that for me lately. Think about the postagging, done by the routines in LT which supports compounds. Ruud Hi, I noticed today that the german.dict file in LanguageTool is a binary file, I suppose created with Morfologik. Is the original data and the conversion

Re: [Languagetool] word confusion

2012-11-18 Thread R.J. Baars
Could you send me the text string please, so I can translate them? Ruud Is there anyone out here good enough to add this rule to Dutch LT? We might move some rules from the xml to this rule, maintaining functionality, and expand it easier and faster. I've just added

Re: [Languagetool] Performance

2012-11-21 Thread R.J. Baars
Daniel, Maybe I am wrong, but suppose this speeds things up, wouldn't it also make things a lot more complex? Is it a real option? Wouldn't a faster server, faster compiler (if available) be easier solutions? Or could moving some processing to the client side be an option? Ruud On

Re: Firefox extension

2012-12-09 Thread R.J. Baars
Could you add the link to the translation location? The extension has been preliminarily reviewed[1] by Mozilla, and is now available at addons.mozilla.org (AMO)[2]. Please note that this version is not up to date, so if you want to test the extension, use the attached version. I plan to

Re: Firefox extension

2012-12-09 Thread R.J. Baars
I have translated Dutch as well as possible. Ruud. The extension has been preliminarily reviewed[1] by Mozilla, and is now available at addons.mozilla.org (AMO)[2]. Please note that this version is not up to date, so if you want to test the extension, use the attached version. I plan to

Re: adding suggestions to a pattern rule

2013-01-17 Thread R.J. Baars
Fine for me. Makes a big rewording for existing rules necessary though. Ruud 2013/1/16 Dominique Pellé dominique.pe...@gmail.com Do we really need to put suggestion inside suggestions? It would be less noisy like this: <message>yada yada yada</message> <suggestion>xxx</suggestion>

Re: adding suggestions to a pattern rule

2013-01-17 Thread R.J. Baars
That is redundant functionality imho, prone to introduce differences in replies. It is acceptable for migration purposes nonetheless. Ruud 2013/1/17 R.J. Baars r.j.ba...@xs4all.nl Fine for me. Makes a big rewording for existing rules necessary though. Ruud No need. It will be just another

Re: bug: 3607406 No space before colon

2013-04-05 Thread R.J. Baars
I guess this is language independent, so it should be fixed in the same rule as 'no space before comma'. Ruud All, I could fix this bug by coding the following rule within the Italian grammar.xml (I also added a check for semicolon… I know, I could have used a reg_exp :-) ). The question is, would this be

Re: bug: 3607406 No space before colon

2013-04-05 Thread R.J. Baars
There are more chars this might apply to :;,.)} The other way around: there should be a space before some other chars. Ruud
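A minimal sketch of the check described here, assuming the character set listed in the message (`:;,.)}`) and treating spaces and tabs as the offending whitespace. The function names are hypothetical and this is not LanguageTool's own rule; languages that do put a space before some of these marks (French, for instance) would need exceptions.

```python
import re

# Whitespace directly in front of one of the characters from the message.
SPACE_BEFORE_PUNCT = re.compile(r"[ \t]+([:;,.)}])")


def find_space_before_punct(text):
    """Return (offset, character) for every punctuation mark preceded by whitespace."""
    return [(m.start(), m.group(1)) for m in SPACE_BEFORE_PUNCT.finditer(text)]


def remove_space_before_punct(text):
    """Suggested correction: drop the whitespace, keep the punctuation mark."""
    return SPACE_BEFORE_PUNCT.sub(r"\1", text)
```

The reverse case mentioned in the message (characters that should have a space before them) would need a second, separate pattern.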

Re: Mixed case words

2013-04-08 Thread R.J. Baars
About case: In Dutch, DVD is an undesirable way to write dvd; Dvd at the start of a sentence. Camelwords are proper names most of the time. Nieuw-Zeeland is one word, and correct. As is auto-deur. As far as I am concerned, for any word, only a first-uppered form is acceptable as a variant of any

Re: Mixed case words

2013-04-08 Thread R.J. Baars
The firstUpper is a feature for spell checking, the other for grammar, as long as there is still a distinction. But why does the grammar not signal the 'start of sentence' to the spell checker? Ruud 2013/4/8 R.J. Baars r.j.ba...@xs4all.nl About case: In Dutch, DVD is an undesirable way to write

Re: More on spelling suggestions

2013-04-25 Thread R.J. Baars
I don't understand most of the technicalities, but I can state that for Dutch, multiple single and multiple character substitutions are necessary to get the proper word. Example: cocoskoekie is very informal and wrong for kokoskoekje ( twice the common c-k interchange, once the common kie-kje
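The cocoskoekie example can be illustrated with a small sketch that applies several substitutions at once. The two replacement pairs come from the message; everything else (function names, the tiny dictionary in the usage note) is hypothetical, and this is not how Hunspell or Morfologik actually generate suggestions.

```python
from itertools import product

# Substitution pairs named in the message: c -> k and kie -> kje.
REPLACEMENTS = [("c", "k"), ("kie", "kje")]


def candidates(word, replacements=REPLACEMENTS):
    """Return every variant of `word` where each occurrence of a pattern
    is independently either kept or replaced."""
    results = {word}
    for old, new in replacements:
        expanded = set()
        for w in results:
            parts = w.split(old)
            # One keep/replace decision per occurrence of `old`.
            for choice in product((old, new), repeat=len(parts) - 1):
                rebuilt = parts[0]
                for sep, part in zip(choice, parts[1:]):
                    rebuilt += sep + part
                expanded.add(rebuilt)
        results = expanded
    return results


def suggest(word, dictionary):
    """Keep only the candidates that are known words."""
    return sorted(candidates(word) & set(dictionary))
```

With a dictionary containing kokoskoekje, `suggest("cocoskoekie", dictionary)` finds it even though three separate substitutions are needed, which a single-edit suggestion strategy would miss.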

Re: More on spelling suggestions

2013-04-30 Thread R.J. Baars
I really do think it is not a good idea to expand a dic using an aff, unless it is just a munch compressed dic. Using compounding flags to generate all kinds of theoretical words will also result in unlikely or wrong words to appear, and will make the list unnecessarily long. Ruud

Re: Hunspell expansion to words list

2013-05-07 Thread R.J. Baars
Marcin, Hunspell is (almost) perfectly sound, assuming there is a reasonable text as input, and all features are used extensively. The issue is that compounding itself is not safe; it is a language issue. Doing 'expansion', the process is the other way around: all words, even words never

Re: LT as a Java spell checker

2013-05-07 Thread R.J. Baars
API-compatible too? Hi, maybe we could also have a command-line interface compatible with ispell/aspell/hunspell? This way, we would have a drop-in replacement for these spellers. Best, Marcin On 2013-05-05 17:33, Daniel Naber wrote: Hi, I created a page that describes how to use

Re: spelling abbreviated words with dots?

2013-05-16 Thread R.J. Baars
The handling of dots is mostly done in the calling app, and quite differently between apps. But when spelling would be called as part of one language support module, not 2 separate spelling and grammar modules, it would be possible to handle this correctly for both. Ruud What's the best

Re: buggy tagging in Slovak?

2013-06-25 Thread R.J. Baars
Some words have so many ways of being used that almost every postag is valid. These words require a lot of exceptions or, which is better, pre-treatment in the disambiguation. Ruud Hi, I have created my first rule. It somehow works, but its behaviour is weird. The rule should find a

Re: buggy tagging in Slovak?

2013-06-25 Thread R.J. Baars
So there is no construction like referring to an a ? Anyway, there are ways to work around it. Ruud On 25.06.2013 11:14, R.J. Baars wrote: Some words have so many ways of being used that almost every postag is valid. These words require a lot of exceptions, or, which

Re: buggy tagging in Slovak?

2013-06-26 Thread R.J. Baars
Disambiguation will help. Use words before and after the letter to rule noun out. Ruud On Tue, Jun 25, 2013 at 12:35 PM, Milos Sramek sramek.mi...@gmail.com wrote: On 25.06.2013 11:14, R.J. Baars wrote: Some words have so many ways of being used that almost every postag

Re: suggestions in Morfologik spelling rule

2013-07-16 Thread R.J. Baars
Coding word frequencies as a character is fine. I think it would be classes, logarithmic as far as I am concerned. Ruud On 2013-07-16 00:03, Jaume Ortolà i Font wrote: 2013/7/15 Marcin Miłkowski list-addr...@wp.pl: Hi Jaume, On 2013-07-15 21:16, Jaume Ortolà i Font wrote: Hi,
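To make the idea of logarithmic frequency classes concrete, here is a minimal sketch. The base-2 scale, the 26 classes and the letters A to Z are assumptions chosen for illustration only; the thread does not say which encoding was eventually used.

```python
def frequency_class(count, classes=26):
    """Map a raw occurrence count to a single letter on a base-2 log scale.

    'A' covers count 1, 'B' covers 2-3, 'C' covers 4-7, and so on;
    everything beyond the last class is clamped to the final letter.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    # bit_length() - 1 is floor(log2(count)) without floating-point error.
    level = count.bit_length() - 1
    return chr(ord("A") + min(level, classes - 1))
```

A single character per word keeps the dictionary small, and the logarithmic scale means the classes distinguish "rare" from "common" rather than 1,000,000 from 1,000,001 occurrences.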

Re: New Language Turkish

2013-07-19 Thread R.J. Baars
When you are going to need postags, I would be happy to co-operate on Turkish; I can provide a Turkish words list and word frequencies. Ruud (Dutch) Hi everyone, My name is Sinan Yolal. I do not know so much about Java and I am learning XML. If it is okay, I would like to add Turkish to

Re: New Language Turkish

2013-07-19 Thread R.J. Baars
You are right. It is right there for download. Ruud I think there was a complete dictionary and a tagger in the abandoned project we mentioned earlier... But I may be wrong. Best, Marcin On 2013-07-19 08:09, R.J. Baars wrote: When you are going to need postags, I would be happy to co

Re: SimpleReplaceRule improvements

2013-07-22 Thread R.J. Baars
This is really very useful! Jaume I am all for moving useful pieces into common code, please implement the necessary changes - I'll be on vacation starting tomorrow without permanent Internet connection so I may not get to do that myself for next couple of weeks. Thanks Andriy

Re: gaining new contributors by offering easy tasks

2013-09-16 Thread R.J. Baars
Daniel, for non-developers GitHub is indeed a blocking thing. Developer tools are just too complicated. A web interface to contribute to rules is the best way to go for xml rules, I guess. But then, there has to be some form of levels of access, a workflow from developing a rule to test,

Re: gaining new contributors by offering easy tasks

2013-09-16 Thread R.J. Baars
I have reported the low improvement rate for Dutch to the OpenTaal community, which planned to take over actions from myself. If that does not lead to a response soon, I will take over the LT Dutch again myself. I am quite busy nowadays, but the least I can do is remove the false alarms, which

Re: ignore occurrences in the beginning of a sentence

2013-09-26 Thread R.J. Baars
You could use the postag SENT_START on the first token. Ruud I thought this would ignore occurrences of actually|really in the beginning of a sentence: <pattern> <token/> <token regexp="yes">actually|really</token> <token/> </pattern> But it doesn't. How do I

Re: dump of LT command line

2013-09-30 Thread R.J. Baars
Yes, it is a snapshot of a few days ago, with edited disambiguation. Tests run perfectly. I can send the changed disambiguation, maybe it is in that. But tonight, I am on the train right now and then at work, without connections. Security... Ruud On 2013-09-29 22:32, Ruud Baars wrote:

Re: dump of LT command line

2013-10-01 Thread R.J. Baars
No, that is all I changed, but with a (recent) snapshot. I will check tonight with the new release. - without the altered disambig - with it It could be the platform, but I am not the only one on (K)ubuntu, using JDK. Ruud On 2013-09-30 18:17, Ruud Baars wrote: For reproducing exactly: remove

Re: dump of LT command line

2013-10-03 Thread R.J. Baars
No need for excuses. Imho this is already quite stable. I will download a snapshot and start again. I am glad to have helped getting rid of a bug. Ruud The BaseSynthesizer is now thread-safe. The PolishSynthesizer had the same bug and has been fixed, too. @Ruud: Sorry for the

Re: Intro

2013-10-09 Thread R.J. Baars
It might be possible to generate Hungarian postags from the info in the Hungarian Hunspell. Ruud On 2013-10-09 10:40, Logon wrote: Hi Karoly, I am new to LanguageTool, have just learned about it as the HU translator of OmegaT. I wonder if it would be possible to start specifying rules

LT stability OK

2013-10-13 Thread R.J. Baars
I just noticed the long process of having LT digging through the Dutch input files has successfully ended without any further crashes. Looks like the fixes applied were great! Ruud

Afrikaans spellchecker

2013-10-13 Thread R.J. Baars
Does anyone know the maker of the Afrikaans spell checker? It seems to have a lot of proper names without the required capital. Ruud

Re: renaming languagetool-standalone.jar?

2013-11-09 Thread R.J. Baars
languagetool-graphical ? languagetool-user-interface? Ruud Hi, a user was confused about the name languagetool-standalone.jar[1]. Does anybody see a problem with renaming the file to languagetool-desktop.jar? languagetool-gui.jar might even be more appropriate, but I assume not everybody

Re: New sanity check in grammar.xml when running ant check

2013-11-14 Thread R.J. Baars
Thanks for adding this check. I am currently just too busy, but will try to find the time to correct these rules. Ruud Dominique Pellé dominique.pe...@gmail.com wrote: Hi I've just added new checks when running ant check. It now reports the following new errors in Russian, Chinese and

Re: Creating complex rules for dozens/hundreds of possible second verbs

2013-11-19 Thread R.J. Baars
Sounds like regexp is applicable? Or are there just too many? Hello! There are some special verbs in Portuguese which affect the verb that comes next, for example: ter/haver. Variations of the verb ter to add to the rules: 1) tinha 2) tinhas 3) tinham 4) tenha 5) tenhas 6) tenham 7)
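As a rough illustration of the regexp suggestion, the six forms visible in the quoted list collapse into one short pattern. This sketch covers only those six forms (the original list continues beyond what is shown here), and the function name is hypothetical.

```python
import re

# Covers tinha, tinhas, tinham, tenha, tenhas, tenham and nothing else.
TER_FORMS = re.compile(r"t[ie]nha[sm]?")


def is_listed_ter_form(word):
    """True if `word` is one of the six forms of 'ter' listed in the message."""
    return TER_FORMS.fullmatch(word.lower()) is not None
```

Whether a regexp stays readable once all remaining forms are added is exactly the "too many?" question raised in the reply.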

Re: removing ANY code?

2013-11-19 Thread R.J. Baars
As discussed before, there is actually no real connection between language and country. These codes could (and should) not be handled as one setting at all. English is one of the languages spoken worldwide, with small variants for regions like Australia, New Zealand, Great Britain, USA etc. Ruud

Re: better Wikipedia match filtering

2013-11-26 Thread R.J. Baars
It is huge (10 GB). Since it is data captured from sites, it is not guaranteed to be as free as is needed for free publication of results. The format is plain text, utf8, 1 line per original paragraph. Ruud On 2013-11-26 11:49, R.J. Baars wrote: This is a big help if it also works on other

Re: Improving spelling suggestions with frequency dictionaries

2013-11-26 Thread R.J. Baars
The corpus data does not have to be free, as long as you don't reproduce the data, do your own process and add intelligence. Anyway, I collected rather reliable frequency data from Internet for lots of languages. I am willing to 'free' the counting data on some reasonable conditions. These

Re: showing example sentences in GUI

2013-12-02 Thread R.J. Baars
Why show sentences with errors to people that need help getting it right? It is not an objection, more a question of: is there a reason from the user perspective? Ruud Hi, we have example sentences for every error in grammar.xml, and we show them on community.lt.org, but not in the

Re: Compounding support in the speller

2014-01-08 Thread R.J. Baars
There are 2 mechanisms in Hunspell, the one using compound rules and the one using compound-start, -middle and -end. Dutch uses both, rules for the easy system of numbers etc, the other one for regular compounds. One cannot mix those two methods. Other languages don't compound at all, most

Re: Compounding support in the speller

2014-01-08 Thread R.J. Baars
Marcin, Is there a readable document that shows how prefixes and suffixes are defined and applied in the morfologik speller? Ruud Hi all, I want to introduce compounding support for MorfologikSpeller for the next LT release. I looked at hunspell dictionaries again and it seems that it

Hunspell and flags

2014-01-08 Thread R.J. Baars
My perception is that Hunspell suffers from the short flag notation. Longer flags would allow for more flags, thus reducing the need for conditions for applying the affixes (the condition is pre-applied by applying (longer) flags to the right base forms). Maybe complexity in the mechanism could

Re: Compounding support in the speller

2014-01-08 Thread R.J. Baars
Marcin, I assume most of the entries in the dictionary will be words, acceptable without any affix or other additions. In that case, a flag 'notaword' could be used to keep non-word parts to be accepted as words. A bit like FORBIDDENWORD in Hunspell. Ruud

Re: Compounding support in the speller

2014-01-08 Thread R.J. Baars
I read the part about configuring the speller dictionary of morfologik. Some aspects seem strange to me. Looks like some or all options allowed are for the entire list only (?) Case-specificness is important. Now, I notice some hunspell dictionaries accept compounds that are incorrect imho. I

Re: Compounding support in the speller

2014-01-08 Thread R.J. Baars
Weird. On 2014-01-08 15:26, R.J. Baars wrote: Case-specificness is important. Now, I notice some hunspell dictionaries accept compounds that are incorrect imho. I noticed German accepting BurgerInnen. That's a feature I think. BürgerInnen is sometimes used as a short form of B

Wiki

2014-01-09 Thread R.J. Baars
Could someone please this text from the wiki? Interactive testing of rules using a corpus My account does not allow editing. Ruud

Re: Lithuanian rules - inflection

2014-01-21 Thread R.J. Baars
Inflection can be done, using the 'inverse dictionary'. If there is a Lithuanian postag dictionary, in generating this from the data, the reversed dictionary can be made, and the 'inflected' can be used. There are examples in the wiki as well as some languages, like Dutch. Ruud Hello to all

Re: capitalizing Morfologik Spelling suggestions

2014-02-02 Thread R.J. Baars
I guess this depends on the case-specificity of the dictionary entries. Are they, or aren't they? hendrix should be in the dictionary as Hendrix, not hendrix, so the suggestion should always be uppercased, it being a proper name. DVD is a very common error (in Dutch) for dvd, so DVD should

Re: new rule: check space between sentences

2014-02-19 Thread R.J. Baars
In fact, I suggest only <lowercase><full stop><Uppercase> should be reported. Ruud Daniel Naber wrote thus at 06:18 PM 19-02-14: This rule will cause new false alarms for words like X.Org. If it's only when the first character after the dot is in cap, then it's okay. Otherwise, we've a lot to
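The restriction proposed here (report only a lowercase letter, a full stop, then an uppercase letter with no space) can be sketched as a regular expression. The sketch assumes plain ASCII letters for brevity, and the function name is hypothetical; it is not the rule that was committed to LanguageTool.

```python
import re

# lowercase letter + full stop + uppercase letter, with nothing in between.
MISSING_SPACE = re.compile(r"[a-z]\.[A-Z]")


def missing_sentence_spaces(text):
    """Return the three-character contexts where a sentence space seems to be missing."""
    return [m.group(0) for m in MISSING_SPACE.finditer(text)]
```

Because the character before the dot must be lowercase, `X.Org` from the quoted objection is not reported, while `it.Next` is.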

Re: Bug: More or _less_

2014-03-07 Thread R.J. Baars
But Marcin is right. There should not be a noun after this expression, but most likely an adjective. Ruud I understand that U/C part, but more or less in this case is an idiom meaning to a varying or undetermined extent or degree (Merriam-Webster's Collegiate® Dictionary). We can't have it

Re: Translation for LanguageToolFx

2014-03-18 Thread R.J. Baars
I would be glad to update translations, but there is only the English text to start from, no context of the text at all, or is there? The same text maybe should be translated differently for different contexts. Ruud Hi, Please update your translations for LanguageToolFx at transifex[1]

Re: please add POS tag set documentation

2014-03-19 Thread R.J. Baars
The .txt is in the LT distribution, isn't it? It is generated when the dictionary is tagged. (That has not been done for ages.) If it is there, I am willing to add explanations to use until the new tags are available. Ruud Hi, in order to link POS tag set descriptions from the online rule

Re: rule editor screencast - help needed

2014-05-07 Thread R.J. Baars
I downloaded audio and video. The video is in a bit strange resolution for video: 1120*832. This could already cause some problems. Should I place it in the middle of a full HD frame, or squeeze it in dvd format? Kdenlive is quite good at this, generating all sorts of open formats and keeping

Re: rule editor screencast - help needed

2014-05-07 Thread R.J. Baars
The .avi seems to be compressed using a quite lossy codec, so the screen is rather fluffy. It does not seem to be of the same size as the one Jan used. Is the raw file still available? Ruud

Re: rule editor screencast - help needed

2014-05-07 Thread R.J. Baars
21:38, R.J. Baars wrote: Is the raw file still available? Now at http://danielnaber.de/tmp/lt-rule-editor-screencast-no-sound.ogv Regards Daniel

Re: Some thoughts on grammar Nazis, LanguageTool, and the world in itself

2014-05-19 Thread R.J. Baars
Would it be possible to: - add a level of 'checking profiles' - that have the familiar categories - the categories just contain rule(set) id's (file_name+rule_id) - the rule-id's are stored in a number of rule files Of all profiles, just 1 can be active for the active language. This makes it

Dump

2014-05-27 Thread R.J. Baars
Hi. I am currently using languagetool-commandline checking billions of paragraphs. It works fine, except for some dumps, like the ones below. I need the tool to continue, for I need the data. When it has been processed, I might try to find the items it crashes on. It looks like it is all

Re: Dump

2014-05-27 Thread R.J. Baars
Unfortunately, the dump is still there in the nightly build of yesterday: I will try to find examples making it dump, though they are rare. Ruud Exception in thread main java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.StringIndexOutOfBoundsException: String index

Re: Dump

2014-05-27 Thread R.J. Baars
By the way, it would be helpful if in the dump there was some info of the input ... I don't know if that is possible, but it would surely help. Ruud On 2014-05-27 12:42, Daniel Naber wrote: On 2014-05-27 11:06, Marcin Miłkowski wrote: maybe it was because of a simple mistake in the

Re: Dump

2014-05-29 Thread R.J. Baars
It has been running for some hours now; no dump yet. Ruud On 2014-05-28 07:28, R.J. Baars wrote: Unfortunately, the dump is still there in the nightly build of yesterday: Are you sure you're using the latest build? The line numbers look like Marcin's fix is not included. I have added

Re: reminder: feature freeze for 2.6

2014-06-21 Thread R.J. Baars
I did Dutch as far as the core is concerned. I don't know the other features, so I had to leave those untranslated. Hi, this is a reminder that we're now in feature freeze for LT 2.6, to be released 2014-06-30. * Please help complete the translations at

Re: Morphologic Analyser to solve concordance issue for Portuguese

2014-07-08 Thread R.J. Baars
There are loads of them in the wiki, in the other languages files etc. It is quite simple, instead of checking the word itself, you check the 'type of word'. Ruud But, how? Is there any documentation around with examples? On 08/07/2014 07:15, R.J. Baars wrote: I think the best way

Re: Help needed with POS Tagged word

2014-07-26 Thread R.J. Baars
You're welcome. Kind regards, Ruud Isn't this the kind of thing On 2014-07-25 17:03, Elanjelian Venugopal wrote: I am trying to capture an error pattern where a letter is attached as a suffix to a POS tagged word. And the suggestion should strip off the suffix. I haven't tested it, but something

SRX

2014-08-18 Thread R.J. Baars
There is an adjustment to make in the sentence splitter. But where did the .srx go? I detected an abbreviation that is commonly used and is currently seen as a sentence end: milj. Could this be added to the Dutch srx rules? Ruud

Re: SRX

2014-08-18 Thread R.J. Baars
Same applies to [0-9]{1,2}[-]pers. Ruud There is an adjustment to make in the sentence splitter. But where did the .srx go? I detected an abbreviation that is commonly used and is currently seen as a sentence end: milj. Could this be added to the Dutch srx rules? Ruud
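To show what the two requested exceptions do, here is a small sketch of a sentence splitter with a no-break list. It only mimics the idea behind SRX break/no-break rules; the splitter itself, its names and the naive "full stop plus whitespace" boundary test are assumptions, not LanguageTool's segmenter.

```python
import re

# The two abbreviations from this thread that must not end a sentence:
# "milj." and a one- or two-digit number followed by "-pers."
NO_BREAK = re.compile(r"(?:\bmilj|\b[0-9]{1,2}-pers)\.$")


def split_sentences(text):
    """Split after a full stop followed by whitespace, unless the text up to
    that full stop ends in one of the no-break abbreviations."""
    sentences, start = [], 0
    for m in re.finditer(r"\.\s+", text):
        candidate = text[start:m.start() + 1]  # include the full stop
        if NO_BREAK.search(candidate):
            continue  # abbreviation: keep scanning within the same sentence
        sentences.append(candidate.strip())
        start = m.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

Without the no-break pattern, "Het kost 3 milj. euro." would be cut in two after "milj.", which is the behaviour reported in the message.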

Re: SRX

2014-08-18 Thread R.J. Baars
On 2014-08-18 16:16, R.J. Baars wrote: There is an adjustment to make in the sentence splitter. But where did the .srx go? It's at languagetool-core/src/main/resources/org/languagetool/resource/segment.srx Could this be added to the Dutch srx rules? Sure, could you send a patch? Regards

No rules, but by example

2014-08-19 Thread R.J. Baars
I think a good addition for LT would be to have a general rule, just acting on tokens, a bit like srx does with letters. bed : ok bed english : not ok = bad english A mechanism, that lets the longer token list overrule the shorter one. This would create the option to add found errors empirically.

Re: SRX

2014-08-19 Thread R.J. Baars
I am currently checking the output of all the rules on the 20GB corpus; some rules are perfect, some less (though hard to tweak). Result will be a major update, I guess... Ruud On 2014-08-18 17:18, R.J. Baars wrote: I was able to test, and removed 2 of my additions to make it work. Thanks

urls in grammar file

2014-08-19 Thread R.J. Baars
This generates an error. How do I add multiple urls to 1 rule? <url>https://onzetaal.nl/taaladvies/advies/instandhouden-in-stand-houden/</url> <url>http://taaladvies.net/taal/advies/vraag/412/in_bedrijf_stelling_inbedrijfstelling/</url> Ruud

Re: urls in grammar file

2014-08-19 Thread R.J. Baars
This is a limitation; there are several texts to refer to sometimes ... Is it on the wishlist to change this? On 2014-08-19 12:41, R.J. Baars wrote: This generates an error. How do I add multiple urls to 1 rule? Only one URL per rule is currently supported. Regards Daniel

Re: No rules, but by example

2014-08-19 Thread R.J. Baars
:14, R.J. Baars wrote: bed : ok bed english : not ok = bad english For some types of errors, I think it works better than current rule/exception type of check. I'm not sure I understand: do you suggest a different (more compact) way to write down simple rules, or do you suggest

Re: No rules, but by example

2014-08-19 Thread R.J. Baars
for Dutch. Ruud On 2014-08-19 13:43, R.J. Baars wrote: hand some; wrong; handsome I hand some tools to; correct It is a very compact way of defining very simple rules. I see, but the thing is that these rules probably won't stay simple for long. What if you want to add you hand some
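The "longer token list overrules the shorter one" idea from this thread can be sketched with a lookup table of example sequences. The two entries are the ones quoted in the messages; the table format, the function name and the greedy left-to-right scan are assumptions for illustration, not an existing LanguageTool feature.

```python
# Token sequence -> suggestion; None marks a sequence known to be correct.
EXAMPLES = {
    ("hand", "some"): "handsome",
    ("i", "hand", "some", "tools", "to"): None,
}


def check_by_example(tokens, examples=EXAMPLES):
    """Return (start, end, suggestion) for each flagged span.

    At every position the longest matching example wins, so a long
    'correct' example suppresses a shorter 'wrong' one inside it.
    """
    lowered = [t.lower() for t in tokens]
    max_len = max(len(key) for key in examples)
    hits, i = [], 0
    while i < len(lowered):
        for n in range(min(max_len, len(lowered) - i), 0, -1):
            key = tuple(lowered[i:i + n])
            if key in examples:
                if examples[key] is not None:
                    hits.append((i, i + n, examples[key]))
                i += n  # skip past the matched example
                break
        else:
            i += 1  # no example starts here
    return hits
```

This simple scan only handles a longer example that starts at or before the shorter one; as the reply in the thread points out, real cases tend to need more context than that, which is where such tables stop being simple.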

Re: support for Persian

2014-08-19 Thread R.J. Baars
Persian or Farsi? Hi, some people will already have noticed it: LT has just added support for Persian. This is an important step, as Persian is the first right-to-left language we support. There will probably be some bugs, but for now it's looking good and there have only been minor issues.

Error message?

2014-08-19 Thread R.J. Baars
java -jar languagetool.jar (process:14246): GLib-CRITICAL **: g_slice_set_config: assertion 'sys_page_size == 0' failed This occurs after clicking on the reference url in the UI; it works though. Ruud

Re: support for Persian

2014-08-19 Thread R.J. Baars
Okay, if you need sentences to test on, or plain words lists with frequencies, or word groups and their frequencies, I have them. Ruud https://en.wikipedia.org/wiki/Persian_language On Tue, Aug 19, 2014 at 5:33 PM, R.J. Baars r.j.ba...@xs4all.nl wrote: Persian or Farsi? Hi, some

Rule not working as expected

2014-08-19 Thread R.J. Baars
I discovered that the rule below is not working very well. It looks like 'skip' also skips over sentence boundaries. Is that intentional? Or is something else wrong? In case it is intentional, is there an option to forbid that? Ruud <rule id="nr738" name="duur kost"> <pattern> <token skip="4">duur</token>

Re: Rule not working as expected

2014-08-19 Thread R.J. Baars
Thanks, I will check it out. The rule is not functioning very well at all. I commented it out and put it on the list of items to do. Ruud R.J. Baars r.j.ba...@xs4all.nl wrote: I discovered that the rule below is not working very well. It looks like 'skip' also skips over sentence boundaries

Command line output

2014-08-20 Thread R.J. Baars
Is it possible to get the entire input in the output? Same for the matched part? Ruud

Re: Command line output

2014-08-20 Thread R.J. Baars
I would appreciate if it could be added. It makes it a lot easier to cycle testsets. I guess I should make a request for it? Ruud On 2014-08-20 09:47, R.J. Baars wrote: Is it possible to get the entire input in the output? Same for the matched part? I think that's not possible without

Re: Command line output

2014-08-20 Thread R.J. Baars
, but the context can be very long. Using the truncated context as new input triggers new errors and misses some) Ruud On 2014-08-20 10:15, R.J. Baars wrote: I would appreciate if it could be added. It makes it a lot easier to cycle testsets. I guess I should make a request for it? What exactly would

Re: Some Universal regex for puntuations

2014-08-20 Thread R.J. Baars
This one will generate false errors on the plane type F16 and mp3. Not that general after all... Ruud I mean these rules are useful for all languages for example --- should have space between word and numbers numbers (\d+)(\w+) $1 $2 will do this action foo123 foo 123 I checked
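The objection is easy to reproduce. The sketch below uses a letters-then-digits pattern, `([A-Za-z]+)(\d+)`, which is an assumption made for illustration: it is what the described "foo123 becomes foo 123" behaviour needs, and it is not the expression quoted in the message.

```python
import re

# Assumed pattern: a run of letters immediately followed by a run of digits.
LETTERS_THEN_DIGITS = re.compile(r"([A-Za-z]+)(\d+)")


def split_letters_digits(text):
    """Insert a space between a word and the number glued to it."""
    return LETTERS_THEN_DIGITS.sub(r"\1 \2", text)
```

`split_letters_digits("foo123")` gives `foo 123` as intended, but `F16` and `mp3` are rewritten too, so a universal rule of this kind would need a list of exceptions per language.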

Re: Command line output

2014-08-20 Thread R.J. Baars
I will try to build it using the web interface. Ruud On 2014-08-20 10:15, R.J. Baars wrote: I would appreciate if it could be added. It makes it a lot easier to cycle testsets. I guess I should make a request for it? What exactly would you need as output? Adding another output format

exception

2014-08-20 Thread R.J. Baars
I try to warn for longer words in uppercase (considered bad habit in Dutch), but this rule hits far too much. However, this generates an enormous amount of positives, so I want to make some exceptions. One exception should be UPPERCASE - words words words (newspaper start of line The other one

skip with exception

2014-08-20 Thread R.J. Baars
I want to catch a double negation (not . not), but want to forbid the match when there is a certain token in between (like ;). Is that possible? How? Ruud
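The logic being asked for (two negations with a few tokens in between, unless a blocking token such as `;` intervenes) can be sketched as follows. The window of at most four skipped tokens and all names are assumptions; this only illustrates the matching logic and is not LanguageTool rule syntax.

```python
def double_negations(tokens, negation="not", blockers=(";",), max_skip=4):
    """Return (first, second) index pairs of a negation followed by another
    negation within `max_skip` intervening tokens, with no blocker between."""
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() != negation:
            continue
        # Look at the next max_skip tokens plus the position right after them.
        for j in range(i + 1, min(i + max_skip + 2, len(tokens))):
            if tokens[j] in blockers:
                break  # forbidden token in between: give up on this start
            if tokens[j].lower() == negation:
                hits.append((i, j))
                break
    return hits
```

The blocker check runs before the negation check, so the scan stops at the first `;` and never pairs negations across it.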

Unification

2014-08-20 Thread R.J. Baars
The texts on unification on the wiki look pretty difficult. Is this the way to check: 'leger des heils' (salvation army) and correct it into 'Leger des Heils'? There are quite a lot of proper names that need this kind of correction, preferably in one rule. What are the options? Ruud

Re: Unification

2014-08-21 Thread R.J. Baars
I have found a solution, but it is a bit of a workaround: I changed the rule to just check 'leger des heils' case insensitive, so it will always hit, no matter the case. To prevent alarm for the correct 'Leger des Heils' I added a case sensitive rule to the disambiguator. Ruud The texts on

Explanation needed

2014-08-21 Thread R.J. Baars
Could anyone please elaborate on the layout of: wrongwordincontext?, Maybe using bad/bed as an example? I am not sure which of the fields is what. The German example is not clear to me (Not very good in German). word1=bad word2=bed match1=? match2=? context1=? context2=? explanation1 explanation2

Re: Unification

2014-08-22 Thread R.J. Baars
I stopped using the disambiguator, and used antipattern instead. This way, the rule is completely in one file. Thanks for the tip. Ruud I have found a solution, but it is a bit of a workaround: I changed the rule to just check 'leger des heils' case insensitive, so it will always hit, no
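The approach settled on in this thread (match the name case-insensitively, then exempt the one correctly cased form) can be illustrated outside the rule syntax. The function name is hypothetical and the sketch covers a single proper name; it mirrors the pattern-plus-antipattern logic rather than reproducing the actual Dutch rule.

```python
import re

PROPER_NAME = "Leger des Heils"
# Case-insensitive pattern: hits every spelling, including the correct one.
ANY_CASE = re.compile(re.escape(PROPER_NAME), re.IGNORECASE)


def miscased_names(text):
    """Return (offset, found) for every occurrence that is not cased exactly
    like PROPER_NAME; the exact form plays the role of the antipattern."""
    return [
        (m.start(), m.group(0))
        for m in ANY_CASE.finditer(text)
        if m.group(0) != PROPER_NAME
    ]
```

Extending this to many proper names would mean one table of correct spellings instead of one rule per name, which is the "preferably in one rule" wish from the original question.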

Re: Explanation needed

2014-08-22 Thread R.J. Baars
Thanks, I added it to the Dutch datafile as well. And one case to start with. That works fine. But it does not trigger when none of the context words is present. Is that intentional? Ruud On 2014-08-22 07:33, R.J. Baars wrote: Could anyone please elaborate on the layout of: wrongwordincontext
