Re: [en] postags for 'haven'

2016-07-29 Thread Marcin Miłkowski
On 29.07.2016 at 13:57, Mike Unwalla wrote: > Marcin, thanks. > > Except for the removal of VBP, I decided to make no changes to the > disambiguation at this time, for these reasons: > > 1. In disambiguation.xml, if I remove the readings MD and VBP from 'haven' > if it is not MD (that is, it is

Re: [en] postags for 'haven'

2016-07-27 Thread Marcin Miłkowski
On 25.07.2016 at 16:28, Mike Unwalla wrote: > Hello, > > The word 'haven' has these postags: NN, MD, VBP. > > When the word is not a verb (as in "the village is a haven of tranquility"), > I want to remove MD and VBP. Postag VBP causes 'haven' to appear as a > suggestion when I use

Names of rule groups are now required and non-empty

2016-04-23 Thread Marcin Miłkowski
Hi all, I made one change in rule syntax today. The name attribute was optional until today but it was inconsistent with configuration: neither the GUI, nor the command-line, nor even the API allows you to disable just one rule from a rule group. You disable the whole group at the same time. But

Re: Roadmap for Spanish

2016-04-07 Thread Marcin Miłkowski
On 06.04.2016 at 14:57, Juan Martorell wrote: > On 5 April 2016 at 16:29, Jaume Ortolà i Font > wrote: > > > 2014-06-06 20:45 GMT+02:00 Juan Martorell >: > > >

Re: Roadmap for Spanish

2016-04-06 Thread Marcin Miłkowski
On 06.04.2016 at 14:55, Juan Martorell wrote: > > > On 4 April 2016 at 19:28, Marcin Miłkowski <list-addr...@wp.pl> wrote: > > Hi, > > On 03.04.2016 at 12:46, Juan Martorell wrote: > > I realized this bec

Re: Preventing inflections in suggestions

2016-03-12 Thread Marcin Miłkowski
On 12.03.2016 at 05:02, Andriy Rysin wrote: > Hi all > > I have some word forms (colloquial forms of the verbs) I would like to > tag and recognize but I don't want them to show up in suggestions. I > found that I can remove the tags I don't want from the list when > building synthesizer but I

Re: Create rule - comma before word(s)

2016-03-03 Thread Marcin Miłkowski
On 03.03.2016 at 19:06, Marco A.G.Pinto wrote: > Hello! > > The following Portuguese words require a comma before them: > 1) Eu gosto muito de chocolate, *mas *não posso comer para não engordar. > 2) Eu gosto muito de chocolate, *porém *não posso comer para não engordar. > 3) Eu gosto muito de

Re: Valency dictionary and attribute [long mail]

2016-02-08 Thread Marcin Miłkowski
Hi Andriy, On 07.02.2016 at 21:35, Andriy Rysin wrote: > Hi Marcin > > I was actually thinking for something even more abstract. To adjust > your example: > >

Re: Valency dictionary and attribute [long mail]

2016-02-02 Thread Marcin Miłkowski
defined per valency lexicon in a language). There are free valency lexicons for many languages beside Polish. > > Thus if we add other semantic information into LT we can use this info > in the logic without changing the LT core. The core XML parsing will have to be changed anyway.

Re: MS Word add-in for LT

2016-02-02 Thread Marcin Miłkowski
On 30.01.2016 at 15:17, Daniel Naber wrote: > On 2016-01-30 12:07, Jaume Ortolà i Font wrote: > >> My preferred approach would be to write a "configuration program" (in >> HTML/JavaScript), with a custom form for every language. This program >> would generate output(s) that can be used

Re: MS Word add-in for LT

2016-01-30 Thread Marcin Miłkowski
On 29.01.2016 at 14:38, Jaume Ortolà i Font wrote: > 2016-01-29 14:07 GMT+01:00 Marcin Miłkowski <list-addr...@wp.pl>: > > On 29.01.2016 at 12:27, Jaume Ortolà i Font wrote: > Just tested and it works in MS Word 2007. >

Re: New Language - constraint grammar importing tool

2016-01-29 Thread Marcin Miłkowski
On 29.01.2016 at 13:58, Curon Wyn wrote: > Hi Marcin/Daniel > > On 29/01/2016 09:01, Marcin Miłkowski wrote: > >> I would try to get this working. The conversion was pretty good, only >> very special constructions of constraint grammar could not be

Re: MS Word add-in for LT

2016-01-29 Thread Marcin Miłkowski
On 29.01.2016 at 14:12, Marcin Miłkowski wrote: > On 29.01.2016 at 12:27, Jaume Ortolà i Font wrote: >> Hi, >> >> I have released a new version of the plug-in [1]. >> >> All messages are now in English by default. The translations into other >> languag

Re: MS Word add-in for LT

2016-01-29 Thread Marcin Miłkowski
On 29.01.2016 at 12:27, Jaume Ortolà i Font wrote: > Hi, > > I have released a new version of the plug-in [1]. > > All messages are now in English by default. The translations into other > languages have to be put in files like this [2]. I think almost all > needed strings can be extracted from

Re: New Language - constraint grammar importing tool

2016-01-29 Thread Marcin Miłkowski
Hi, On 29.01.2016 at 00:34, curon@wyn.cymru wrote: > Hi, > > First of all, I must thank you all for developing such a good grammar > correction tool under an open source license. > > A few years ago I started looking at developing An Gramadóir, as work had > already been done for the Welsh

Re: MS Word add-in for LT

2016-01-26 Thread Marcin Miłkowski
Hi Jaume, this is very good news! On 26.01.2016 at 10:47, Jaume Ortolà i Font wrote: > Hi, > > I have made a beta release of a MS Word add-in for LanguageTool [1]. > ("Add-in" is Microsoft terminology for "plug-in"). > > It has some limitations, but I think it can work fine and be useful. The

Re: Splitting segment.srx?

2016-01-25 Thread Marcin Miłkowski
been created. >> >> Sorry if it sounded negative, but really, it would make my life harder: >> I need several languages in the same file, so we would need to join XML >> files on the fly, and make sure that nobody tries to override the >> standard settings fo

Re: Splitting segment.srx?

2016-01-25 Thread Marcin Miłkowski
standard settings for all languages by mistake etc. Regards, Marcin > > Regards, > Andriy > > 2016-01-25 3:50 GMT-05:00 Marcin Miłkowski <list-addr...@wp.pl>: >> On 25.01.2016 at 03:29, Andriy Rysin wrote: >>> Currently 95% of the language handling is done in langua

Re: Splitting segment.srx?

2016-01-25 Thread Marcin Miłkowski
d complicated tools just to solve an issue that is a matter of taste. This is a waste of time. It would be much more productive to build more GUI tools. Regards, Marcin > > Regards, > Andriy > > > > 2016-01-24 17:13 GMT-05:00 Marcin Miłkowski <list-addr...@wp.pl>: >>

Re: Splitting segment.srx?

2016-01-24 Thread Marcin Miłkowski
On 24.01.2016 at 17:15, Andriy Rysin wrote: > Would it make sense to split segment.srx into language modules (and > assemble dynamically from available languages)? For now it seems to be > the only language-specific piece that belongs to core module. > Were there any attempts at this and if yes

Re: Java Webstart commented out

2015-12-29 Thread Marcin Miłkowski
On 29.12.2015 at 14:10, Daniel Naber wrote: > Hi, > > I've commented out the Webstart link at languagetool.org. After updating > to LT 3.2, Webstart complained that not all JARs are signed with the > same key. I could "fix" this by manually cleaning the Webstart cache > (not the browser cache).

Re: LanguageTool in 2015 + the future

2015-12-17 Thread Marcin Miłkowski
On 08.12.2015 at 23:11, Daniel Naber wrote: > On 2015-12-07 19:30, Marcin Miłkowski wrote: > >> I think there's a community that we haven't addressed at all: language >> professionals, be it proofreaders or translators (and translation >> agencies). Translators ar

Re: LanguageTool in 2015 + the future

2015-12-07 Thread Marcin Miłkowski
Hi, On 07.12.2015 at 14:56, Daniel Naber wrote: > Hi, > > the year is slowly coming to an end, so I thought I'd try to summarize > what we've achieved this year and how we can move LT forward in the > future. In 2015, we... > > * made three releases so far (2.9, 3.0, 3.1), another one is

Re: Terminology checking

2015-08-14 Thread Marcin Miłkowski
On 14.08.2015 at 08:56, Dmitri Gabinski wrote: For reference: you can use Okapi CheckMate for such purposes. CheckMate can also engage LanguageTool to check spelling/grammar. Well, for morphologically-rich languages, you cannot, as it would only check the base forms. Turning a simple CSV

Re: improvements in Morfologik speller

2015-06-09 Thread Marcin Miłkowski
On 2015-06-08 at 21:27, Jaume Ortolà i Font wrote: 2015-06-08 9:39 GMT+02:00 Daniel Naber daniel.na...@languagetool.org: On 2015-06-02 15:06, Jaume Ortolà i Font wrote: Hi Jaume, sorry for the late reply. There are some failures

Re: Problems with LibreOffice 4.2x on Windows

2015-05-08 Thread Marcin Miłkowski
On 2015-05-07 at 21:54, Daniel Naber wrote: On 2015-05-07 20:51, Marcin Miłkowski wrote: on Windows machines, LanguageTool 2.9 seems to cause crashes in LO 4.2x (and newer versions). See this comment: http://en.libreofficeforum.org/node/9867#comment-41082 Also see here (starting

Re: large file mode for command-line version

2015-04-13 Thread Marcin Miłkowski
On 2015-04-13 at 18:26, Daniel Naber wrote: Hi, is there any reason we still need the special mode in our command line version that gets activated if the input data is 64,000 characters long? It complicates the code and causes this bug:

Google Doc grammar check

2015-04-05 Thread Marcin Miłkowski
Hi all, there's a Proofread Bot extension that does some external grammar checks in Google Docs: https://chrome.google.com/webstore/detail/proofread-bot/djancbfmkanmnofhdfindoppiapcgnbf So basically, we can see it's doable. There's also VeritySpell extension and so on. Regards, Marcin

Re: Rule suggestion

2015-04-01 Thread Marcin Miłkowski
On 2015-03-31 at 23:19, Torsten Wagner wrote: Hi, as I read on the website you are looking for suggestions of new rules, even if they seem to be trivial. I found that LanguageTool returns a lot of false positives for abbreviations ending with an s. LanguageTool assumes that you might

Attribute-value pairs for POS tags [Was: Re: German tests]

2015-03-04 Thread Marcin Miłkowski
On 2015-03-04 at 13:42, Daniel Naber wrote: On 2015-03-04 08:52, Marcin Miłkowski wrote: If we could move the first part of the code to another class, which would analyze POS tags to get proper values of attributes, the code would be cleaner and faster. The basic attribute-value class

Re: German tests

2015-03-03 Thread Marcin Miłkowski
On 2015-03-02 at 23:08, Daniel Naber wrote: On 2015-03-02 22:44, Marcin Miłkowski wrote: Die[der/ART:DEF:AKK:PLU:FEM*,der/ART:DEF:NOM:PLU:FEM*,der/PRO:DEM:AKK:SIN:FEM*,der/PRO:DEM:NOM:SIN:FEM*,der/PRO:PER:AKK:SIN:FEM*,der/PRO:PER:NOM:SIN:FEM*] That's the difference, for me it gets unified

Re: German tests

2015-03-03 Thread Marcin Miłkowski
On 2015-03-03 at 11:05, Daniel Naber wrote: On 2015-03-03 09:19, Marcin Miłkowski wrote: No idea, frankly. Lowercase changes nothing. But I do get the same alarm when using the yesterday's snapshot I have downloaded from our website, so I guess you must be working on some other version

Re: German tests

2015-03-03 Thread Marcin Miłkowski
On 2015-03-03 at 22:59, Daniel Naber wrote: On 2015-03-03 14:36, Andriy Rysin wrote: I installed jdk1.7.0_75 and German tests pass with it so it's java 8 which makes it fail. I did some debugging and the problem is caused by the elements in Unifier that we iterate over but that have no

Re: German tests

2015-03-02 Thread Marcin Miłkowski
On 2015-03-02 at 21:20, Daniel Naber wrote: On 2015-03-02 18:03, Andriy Rysin wrote: I ran mvn clean test still has same issue, I tried en, ca, and pl and they all pass (and no extra output). I did mvn clean install in languagetool-core to make sure I get the latest core classes.

Re: chunker in disambiguator tests

2015-03-02 Thread Marcin Miłkowski
On 2015-03-02 at 03:26, Andriy Rysin wrote: It looks like when we run checks we do run chunker before we run disambiguator, but when we run disambiguator tests we don't run chunker so the rules/examples in the disambiguator don't see multiword chunks. Is this correct or am I missing

Re: Disambiguator tests run twice

2015-03-02 Thread Marcin Miłkowski
On 2015-03-02 at 19:00, Andriy Rysin wrote: It looks like it's because we have public void testRules() throws Exception { testDisambiguationRulesFromXML(); } in test subclass and also testDisambiguationRulesFromXML() from parent class is run as well. We probably should either

Re: Disabling disambiguator rules

2015-02-27 Thread Marcin Miłkowski
On 2015-02-26 at 21:10, Andriy Rysin wrote: Would it make sense to allow to disable disambiguator rules the same way we disable checking rules? They are cascaded, disabling them is like disabling random pieces of Java code. It might work but it's very risky due to complexity of

Re: MultiThreadedJLanguageTool

2015-02-22 Thread Marcin Miłkowski
On 2015-02-22 at 15:24, Andriy Rysin wrote: On 02/22/2015 04:45 AM, Marcin Miłkowski wrote: Hi, On 2015-02-21 at 19:22, Andriy Rysin wrote: So the main problem with this performance improvement is that we read across paragraphs. There are two problems with this: 1) error context

Re: Development tools

2015-02-11 Thread Marcin Miłkowski
On 2015-02-02 at 02:20, Andriy Rysin wrote: Sorry if this is obvious, but my friends asked me and I'm away from my computer. Is there a way to call parts of sentence analyzer of LT from command line? I.e. sentence tokenizer, tokenizer, tagger, disambiguator? Or currently using Java API is

Re: saving configuration in stand-alone

2015-01-28 Thread Marcin Miłkowski
On 2015-01-27 at 23:47, Daniel Naber wrote: Hi, can anyone confirm that saving the configuration in the stand-alone client is broken? If I disable a rule in the config dialog and restart, the rule is enabled again. Also, does anybody remember why we have both ~/languagetool.properties

Failing tests in Ukrainian

2015-01-25 Thread Marcin Miłkowski
Andriy, you seem to have failed to include one file in your commits. At least the tests fail for me: java.lang.RuntimeException: java.nio.file.AccessDeniedException: compounds-unknown.txt at org.languagetool.tagging.uk.UkrainianTagger.debugCompounds(UkrainianTagger.java:137) at

Re: another small XML cleanup idea

2015-01-14 Thread Marcin Miłkowski
On 2015-01-13 at 23:35, Daniel Naber wrote: Hi, here's another small XML syntax cleanup idea: Old syntax: match no=1 regexp_match=runter regexp_replace=herunter/ Proposed new syntax: match no=1 regexp=runter - herunter/ The new syntax is shorter and easier to read. Also, the old

Re: small XML syntax changes

2015-01-14 Thread Marcin Miłkowski
On 2015-01-13 at 09:37, Daniel Naber wrote: On 2015-01-13 09:10, Marcin Miłkowski wrote: I've removed about 1000 correct example sentences for German as they were redundant, i.e. they just repeated the incorrect example and its 'correction' attribute. Unless someone objects

Re: small XML syntax changes

2015-01-13 Thread Marcin Miłkowski
On 2015-01-12 at 22:01, Daniel Naber wrote: On 2015-01-12 15:28, Daniel Naber wrote: 2.) A rule can now have only one example sentence as long as there's a correction. I've removed about 1000 correct example sentences for German as they were redundant, i.e. they just repeated the

Re: English Rule Additions

2014-12-23 Thread Marcin Miłkowski
On 2014-12-23 at 00:02, Nick Hough wrote: I have devised some rules for common English mistakes for the letter ‘A’, which you can see here: https://gist.github.com/howlinghuffy/d25d3d6b43c7a9b485cb I plan on doing many more submissions like this over the coming months; let me know what

Re: Plain English rules

2014-12-22 Thread Marcin Miłkowski
On 2014-12-22 at 11:33, Daniel Naber wrote: On 2014-12-20 11:32, Heikki Lehvaslaiho wrote: Heikki, I've set up a gist with 80 English rules that (mostly) expand redundant/wordy rules in LanguageTool 2.7. Testrules script passes these, but it would be good for someone to go through them

Re: English rule: bet regards

2014-12-16 Thread Marcin Miłkowski
On 2014-12-16 at 12:43, Juan Martorell wrote: Hi all, I came across a rule for a common typo I fall in frequently: writing 'bet' instead of 'best' I created an initial rule using the LanguageTool Rule editor: <!-- English rule, 2014-12-16 --> <rule id="ID" name="bet_regards"> <pattern> <marker>

Re: discovering language tool

2014-12-13 Thread Marcin Miłkowski
Hi, On 2014-12-12 at 18:24, Elie Naulleau wrote: Hi all, I am just discovering LT and I am getting interested in its possibilities. I have been auditing/evaluating a correction software for a company looking for style correction. It is called LELIE, is based on the Dislog language, a

Re: Applying matched token's POS tag to another matched token

2014-10-31 Thread Marcin Miłkowski
Hm, are we talking about suggestions or references in the pattern? Because it's certainly possible to do this in suggestions by simply using an appropriate match number. Of course, there might be a limitation: you cannot use the tag of another token. If you can think of a clear syntax for

Re: They all means the same.

2014-10-21 Thread Marcin Miłkowski
Indeed. I just fixed this in the repository. Best, MM On 2014-10-19 at 12:57, Kumara Bhikkhu wrote: Means should be flagged here: They all means the same. kb

Re: English disambiguation issue

2014-10-16 Thread Marcin Miłkowski
On 2014-10-16 at 15:58, Jonathon Churchill wrote: I'd like to bring attention to a problem with one of the English disambiguation rules: Incorrect: /They are laugh loudly./ Correct: /They are laughing loudly./ / / here the writer may have intended to write /laughing/, however the word

Re: switching from Hunspell to Morfologik

2014-10-12 Thread Marcin Miłkowski
Hi, We have discussed this several times. Basically, I want to tag more words than I want to accept as spelled correctly. Keeping dictionaries separate helps with this. Also, the download size matters less and less, and morfologik dictionaries are fairly small anyway. Best Marcin 11 Oct 2014

Re: Morfologik speller

2014-10-03 Thread Marcin Miłkowski
On 2014-10-03 at 13:22, R.J. Baars wrote: Marcin, would it be possible to use the morfologik speller as a separate program, to throw a list of words at, and get the alternatives? No. It does not tokenize words, and you need a little bit of tooling to use the library anyway. Is there an

Re: unexpected ending of a sentence

2014-10-02 Thread Marcin Miłkowski
On 2014-10-02 at 08:25, R.J. Baars wrote: I produced a rule, signaling an unexpected end of a sentence, like a sentence not ending with a char like . ! or ? But this is quite common to happen inside table cells or in headings. LT is not aware of these things, is it? Has anyone found a way

Re: tokenizing numbers

2014-09-30 Thread Marcin Miłkowski
On 2014-09-24 at 21:03, R.J. Baars wrote: Maybe we agree to disagree.. Having them as one token makes detecting patterns easy using regular expressions.. But writing suggestions becomes a nightmare, as you have to use groups and it becomes complex very soon. Marcin Ruud For Polish,

Re: tokenizing numbers

2014-09-24 Thread Marcin Miłkowski
For Polish, I actually want to have numbers tokenized. It makes writing number format rules easier. For example, we use comma as a decimal separator, not a dot. Best Marcin 24 Sep 2014 17:12 Andriy Rysin ary...@gmail.com wrote: Hmm, so when you meet 1.001 in the document you would not know

Re: spell checker enhancement

2014-09-16 Thread Marcin Miłkowski
On 2014-09-16 at 09:03, R.J. Baars wrote: A word like 'Aviv' is not correct unless 'Tel' is before it. So it is best to leave Tel and Aviv out of the spell checker. That results in spell checking reporting errors for Aviv. In the disambiguator, there is the option to block that, by making

Re: spell checker enhancement

2014-09-16 Thread Marcin Miłkowski
On 2014-09-16 at 11:21, R.J. Baars wrote: Marcin, We don't agree. There is a spellchecker, but also a single word ignore list for it. Yes, but for multi-words, we'd have to use the disambiguator code internally anyway. You ask for yet another notation of the same thing. Notice also that

Re: new committer: Ebrahim Byagowi

2014-09-13 Thread Marcin Miłkowski
On 2014-08-27 at 09:24, Daniel Naber wrote: Hi, I'd like to welcome Ebrahim Byagowi (ebraminio on github) as a new committer. Ebrahim has recently helped to add Persian to LT, the first right-to-left language we support. We're looking forward to your contributions, Ebrahim!

Re: LT performance optimization

2014-09-11 Thread Marcin Miłkowski
On 2014-09-11 at 03:54, Andriy Rysin wrote: I tried to run my test under PerformanceTest and I had to cut down my shortest text to 27m characters and it barely made it with -Xmx6g :) I ran for about an hour under profiler after which I shut it down. The picture here is slightly different: 1)

Re: Suggestion: find POS tag of portion of a word in XML rules

2014-09-10 Thread Marcin Miłkowski
On 2014-09-09 at 23:10, Dominique Pellé wrote: Daniel Naber daniel.na...@languagetool.org wrote: On 2014-09-09 22:38, Dominique Pellé wrote: * why does your example give a message in the java rule. Why can't we use <message>…</message>

Re: Suggestion: find POS tag of portion of a word in XML rules

2014-09-10 Thread Marcin Miłkowski
On 2014-09-10 at 11:34, Dominique Pellé wrote: Marcin Miłkowski list-addr...@wp.pl wrote: On 2014-09-09 at 23:10, Dominique Pellé wrote: Daniel Naber daniel.na...@languagetool.org mailto:daniel.na

Re: Interpunction issue?

2014-09-05 Thread Marcin Miłkowski
On 2014-09-05 at 11:33, R.J. Baars wrote: When there is no space, it is reported. I just thought the , means continuation, and the ... does too. Yes, but in mathematical contexts, may … mean omission. See: i1, i2, …, in (imagine all numbers and 'n' in subscript). Marcin Ruud On

Re: MorfologikSpeller

2014-09-04 Thread Marcin Miłkowski
On 2014-09-04 at 07:41, R.J. Baars wrote: Checking the results for Dutch Morfologik-speller, I found this issue: 4459.) Line 36899, column 1, Rule ID: MORFOLOGIK_RULE_EN_GB Message: Possible spelling mistake found Suggestion: afbakening; afbakenings- afbakenings- ^^^ This word is

Re: MorfologikSpeller

2014-09-04 Thread Marcin Miłkowski
On 2014-09-03 at 20:06, R.J. Baars wrote: I replace the English dictionary with the newly generated Dutch one. Running the complete list of wrong and correct words through LT works. The output is less structured than I would like though. When there is no suggestion, the entire suggestion

Re: Bug is disambiguator?

2014-09-03 Thread Marcin Miłkowski
On 2014-09-03 at 06:22, Dominique Pellé wrote: Hi Have a look in the following debug output of LanguageTool where a token gets non-sensical POS tag N.* (multiple times) after a disambiguation rule is applied. Is it a bug in the disambiguator? Or am I writing an incorrect disambiguation

Re: Current limitations of MorfologikSpeller

2014-09-03 Thread Marcin Miłkowski
On 2014-09-03 at 07:40, R.J. Baars wrote: I could, If I were able to code. I only do things on the XML level. Actually, you don't have to. The current morfologik dictionary implementation supports the normalization via fsa.dict.input-conversion property. See:

Re: MorfologikSpeller

2014-09-03 Thread Marcin Miłkowski
On 2014-09-03 at 11:19, R.J. Baars wrote: The wiki states: LanguageTool's stand-alone version comes with a tool . and the .info file that's already part of LanguageTool. ... A bit further on, it says: Configuring the dictionary: The dictionary can be further configured using an

Re: MorfologikSpeller

2014-09-03 Thread Marcin Miłkowski
On 2014-09-03 at 10:58, R.J. Baars wrote: To add the word frequencies, I am directed by the wiki to an address where there is a frequency list indeed. But only 187000 words; while I have 1.2 million Dutch words and their frequency myself. Probably the probabilities of their occurrence is

Re: MorfologikSpeller

2014-09-03 Thread Marcin Miłkowski
On 2014-09-03 at 12:30, R.J. Baars wrote: Marcin, For English, there are .info files in /resource/ as well as in /resource/hunspell. First seems to be for the tagging dict, second for the speller. Ah, of course, there should be one .info file per one .dict file. I thought you were asking

Re: MorfologikSpeller

2014-09-03 Thread Marcin Miłkowski
On 2014-09-03 at 14:26, R.J. Baars wrote: Marcin, I filtered the frequencies for any word found more than 50 times; thus 800.000 frequencies, about 4 times the size of the internet file. It adds about 0,4 MB to the dictionary, now in total 9.7 MB. The dictionary still needs some

Re: Bug is disambiguator?

2014-09-03 Thread Marcin Miłkowski
Hi all, OK, Jaume fixed Catalan rules, so I could integrate the change. Now filter action works only when the filter matches the token in the pattern. We'll see if it has any impact on today's nightly diff. If not, we'll keep the change, and add some documentation. Regards, Marcin

Re: Questions about new date checking rule

2014-08-31 Thread Marcin Miłkowski
On 2014-08-30 at 23:35, Dominique Pellé wrote: Daniel Naber daniel.na...@languagetool.org wrote: On 2014-08-29 21:50, Dominique Pellé wrote: Message: The date 31 September 2014 is not a Monday, but a Wednesday. Monday, 31 September 2014

Re: locqualityissuetype

2014-08-27 Thread Marcin Miłkowski
On 2014-08-27 at 21:41, Jaume Ortolà i Font wrote: 2014-08-27 19:26 GMT+02:00 R.J. Baars r.j.ba...@xs4all.nl: I see. But don't understand. What I do understand is it meant to specify something, out of an issue list. Is there an issue list somewhere

Re: chunks in exceptions

2014-08-18 Thread Marcin Miłkowski
On 2014-08-18 at 04:11, Andriy Rysin wrote: On 08/16/2014 06:07 PM, Daniel Naber wrote: On 2014-08-11 01:47, Andriy Rysin wrote: I was writing a rule where I had to catch a phrase with last word being noun, but only if that noun is not part of adverb chunk (with another word following). The

Re: spell suggestions for irregular verbs

2014-07-30 Thread Marcin Miłkowski
On 2014-07-27 at 20:23, Daniel Naber wrote: On 2014-07-27 11:20, Marcin Miłkowski wrote: I think we should use simpleReplaceRule instead. I think I use it for contractions already. The problem with that is that incorrectly used irregular verbs are often already detected by the spelling

Re: spell suggestions for irregular verbs

2014-07-27 Thread Marcin Miłkowski
I think we should use simpleReplaceRule instead. I think I use it for contractions already. Regards Marcin 27 Jul 2014 10:14 Daniel Naber daniel.na...@languagetool.org wrote: Hi, what's the best way to provide good suggestions for misspelled irregular verbs, like buyed, beginned, or

Re: help with English style rules

2014-07-20 Thread Marcin Miłkowski
On 2014-07-17 at 06:12, Kumara Bhikkhu wrote: Excellent answers. I'm no native speaker, but hope you don't mind me adding. Perhaps the only place where LT could *suggest* a comma after for example is when it begins the sentence. I think there will be also genuine cases of a missing comma

Re: Is <exception>\2</exception> supposed to work?

2014-07-20 Thread Marcin Miłkowski
On 2014-07-17 at 15:06, Dominique Pellé wrote: On Thu, Jul 17, 2014 at 2:49 PM, Dominique Pellé dominique.pe...@gmail.com wrote: Daniel Naber daniel.na...@languagetool.org wrote: On 2014-07-17 10:52,

Re: English native speaker help

2014-07-20 Thread Marcin Miłkowski
On 2014-07-18 at 17:58, Mike Unwalla wrote: Hi Daniel, I am a native speaker of English. They are used to manage transfers through the PQR. LT: This verb is used with the gerund form: used to managing Possible false alarm, but only the writer knows. The verb 'used to' is used with the

Re: New Member to LT - for Tamil

2014-07-14 Thread Marcin Miłkowski
On 2014-07-14 at 09:12, Elanjelian Venugopal wrote: Hi, have installed JDK 1.8.0_05 and tested. No changes. :( And, BTW, how do I push my changes to grammar.xml back to you? It appears I don't have sufficient permission to push it to the master. -e. Use pull request. Regards, Marcin On

Re: Questions about creating a synthesizer dictionary

2014-07-12 Thread Marcin Miłkowski
On 2014-07-11 at 23:01, Daniel Naber wrote: On 2014-07-11 22:43, Dominique Pellé wrote: 1/ Why does the above command create files in /tmp rather than providing command line options to specify the outputs? There's no specific reason that I can remember. Feel free to change the command.

Re: Morphologic Analyser to solve concordance issue for Portuguese

2014-07-08 Thread Marcin Miłkowski
--- On 08/07/2014 10:08, Jaume Ortolà i Font wrote: 2014-07-08 9:37 GMT+02:00 Marcin Miłkowski list-addr...@wp.pl: The Portuguese dictionary is already built. We simply haven't included it yet because we usually start from a certain number of rules

Re: These|Those + Singular Noun

2014-06-01 Thread Marcin Miłkowski
On 2014-05-31 at 12:35, Kumara Bhikkhu wrote: Marcin Miłkowski wrote thus at 04:00 PM 31-05-14: On 2014-05-31 at 08:29, Kumara Bhikkhu wrote: Here's what it doesn't catch: *I find _those translation_ misleading. It does: 1.) Line 1, column 8, Rule ID: THIS_NNS[2] Message: Did you mean

Re: These|Those + Singular Noun

2014-05-31 Thread Marcin Miłkowski
On 2014-05-31 at 08:29, Kumara Bhikkhu wrote: Marcin Miłkowski wrote thus at 07:31 PM 30-05-14: Hm, there's already THIS_NNS[2] rule that finds these|those + singular noun. Is there any mistake that it does not find? It definitely detects the mistake as specified in your example above. You

Re: Found in a few grammar.xml files (en, de, ru)

2014-05-30 Thread Marcin Miłkowski
On 2014-05-29 at 10:01, Marcin Miłkowski wrote: On 2014-05-28 at 21:42, Dominique Pellé wrote: Hi Searching for in grammar.xml files, I see things that are wrong, or at least suspicious: $ ack-grep --xml '' languagetool-language-modules/*/src languagetool-language-modules/de/src

Re: Also without comma

2014-05-30 Thread Marcin Miłkowski
On 2014-05-30 at 05:28, Kumara Bhikkhu wrote: Current LT flags sentences beginning with Also without a comma, and suggests adding a comma. I think an exception should be made when the following word is a verb. E.g.: Also specify your gender. Thanks! I fixed this, and I spotted some further

Re: These|Those + Singular Noun

2014-05-30 Thread Marcin Miłkowski
On 2014-05-29 at 06:26, Kumara Bhikkhu wrote: I need help on this: <rule id="ID" name="These|Those + Singular Noun"> <pattern> <token regexp='yes'>these|those</token> <token postag='NN|NN:UN' postag_regexp='yes'><exception postag='IN|VBP' postag_regexp='yes'/></token> </pattern>

Re: Found in a few grammar.xml files (en, de, ru)

2014-05-29 Thread Marcin Miłkowski
On 2014-05-28 at 21:42, Dominique Pellé wrote: Hi Searching for in grammar.xml files, I see things that are wrong, or at least suspicious: $ ack-grep --xml '' languagetool-language-modules/*/src languagetool-language-modules/de/src/main/resources/org/languagetool/rules/de/grammar.xml

Re: possible new English rule

2014-05-28 Thread Marcin Miłkowski
On 2014-05-28 at 13:46, Jaume Ortolà i Font wrote: Could it be a useful rule? <!-- English rule, 2014-05-28 --> <rule id="ID" name="a compete/complete"> <pattern> <token regexp='yes'>a|an|the</token> <token postag='VB|VBP' postag_regexp='yes'><exception postag='VB|VBP' postag_regexp='yes'

Re: Dump

2014-05-27 Thread Marcin Miłkowski
Hi, maybe it was because of a simple mistake in the isNumberOrDot() method. I fixed it, so the today's build should run fine. Could you download the nightly and see whether you get crashes on your data? Best, Marcin On 2014-05-27 at 09:20, R.J. Baars wrote: Hi. I am currently using

Re: Dump

2014-05-27 Thread Marcin Miłkowski
On 2014-05-27 at 12:42, Daniel Naber wrote: On 2014-05-27 11:06, Marcin Miłkowski wrote: maybe it was because of a simple mistake in the isNumberOrDot() method. I fixed it, Are you sure you have pushed it? I cannot see it in the list of changes. Apparently, something weird happened after

Re: New rule for English

2014-05-19 Thread Marcin Miłkowski
On 2014-05-19 at 05:21, Kumara Bhikkhu wrote: Please consider adding this. I'm unable to test it due to the and. Well, I don't see any mistake being detected here. It was/is... that is a way to express stress on some facts in the statement. This is perfect English and you could probably find

Re: False flag: I had to say no to them

2014-05-13 Thread Marcin Miłkowski
Thanks, I just fixed this, m. On 2014-05-13 at 12:59, Kumara Bhikkhu wrote: Found a false flag. In this sentence: I had to say /no/ to them. LT flags no, saying it probably should be now. kb

Re: homophone detection

2014-05-07 Thread Marcin Miłkowski
On 2014-05-07 at 16:16, Daniel Naber wrote: Hi, as you may know, After the Deadline is an Open Source text checker, quite similar to LT. It's not maintained anymore, so why not use some of its ideas in LT? A paper describing AtD is available at [1], it's well-written and provides a good

Re: incorrect antipattern IDs (bug in XML parser?) + antipattern sanity check

2014-05-06 Thread Marcin Miłkowski
On 2014-05-06 at 00:30, Dominique Pellé wrote: Hi I've added antipattern sanity checks. It detects some problems in antipatterns for German and Polish. However, I have not checked-in yet because the antiPattern.getId() is incorrect. It seems to contain the ID of the previous rule,

Re: Possible bug in XML rule/disambiguation parsing

2014-05-05 Thread Marcin Miłkowski
Hi, On 2014-05-04 at 07:06, Dominique Pellé wrote: Hi I've added a new pattern rule checker (commit e26967dc4663283574a8d536308c13ad188b44a0) and it finds this issue: The Catalan rule: FORCA2:6, token [1], contains força that contains token separators, so

Re: Possible bug in XML rule/disambiguation parsing

2014-05-05 Thread Marcin Miłkowski
On 2014-05-05 at 11:21, Marcin Miłkowski wrote: Hi, On 2014-05-04 at 07:06, Dominique Pellé wrote: Hi I've added a new pattern rule checker (commit e26967dc4663283574a8d536308c13ad188b44a0) and it finds this issue: The Catalan rule: FORCA2:6, token [1], contains força

Re: Suggestion: find POS tag of portion of a word in XML rules

2014-04-29 Thread Marcin Miłkowski
On 2014-04-29 at 07:02, Dominique Pellé wrote: Daniel Naber daniel.na...@languagetool.org wrote: On 2014-04-27 22:18, Dominique Pellé wrote: token regexp=yes postag_group1=fooez-(.*)/token I'm not sure how this could be implemented in a

Re: infix vs. prefix in morfologik

2014-04-26 Thread Marcin Miłkowski
On 2014-04-26 at 20:07, Daniel Naber wrote: Hi Marcin and all, this is an older change, but I wonder: doesn't infix encoding imply prefix encoding? If so, shouldn't then the if .. else if be the other way round here in line 73 (DictionaryBuilder.java)?
