Re: added.txt activated for most languages

2014-12-23 Thread Daniel Naber
On 2014-12-22 22:51, Jaume Ortolà i Font wrote:

 I use the manual-tagger not only as a way to add new words and tags,
 but also as a means of fixing tags temporarily until the next
 dictionary update.  So if there is a manual tag, the dictionary tag is
 ignored.

For German, sometimes readings are missing, but the word itself is 
already known, i.e. its tags are incomplete. In those cases, I only add 
the missing readings to added.txt. So for German, I need the current 
approach. Maybe we could make that configurable so that the subclass 
decides which kind of combination it wants, and have a second combiner 
class?
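
A minimal sketch of the two combination strategies under discussion, for illustration only. The names TagCombiner, AddingCombiner and OverridingCombiner are hypothetical and are not existing LanguageTool classes, and the tag strings are made up. The idea is that a language's tagger would pick one of the two implementations.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class CombinerSketch {

  /** Hypothetical strategy: decides how dictionary tags and manual tags are merged. */
  interface TagCombiner {
    List<String> combine(List<String> dictionaryTags, List<String> manualTags);
  }

  /** "Add" behaviour: manual readings supplement the dictionary readings. */
  static final class AddingCombiner implements TagCombiner {
    @Override
    public List<String> combine(List<String> dictionaryTags, List<String> manualTags) {
      // LinkedHashSet keeps insertion order and drops readings listed twice.
      Set<String> merged = new LinkedHashSet<>(dictionaryTags);
      merged.addAll(manualTags);
      return new ArrayList<>(merged);
    }
  }

  /** "Override" behaviour: if there is a manual tag, the dictionary tags are ignored. */
  static final class OverridingCombiner implements TagCombiner {
    @Override
    public List<String> combine(List<String> dictionaryTags, List<String> manualTags) {
      return manualTags.isEmpty()
          ? new ArrayList<>(dictionaryTags)
          : new ArrayList<>(manualTags);
    }
  }

  public static void main(String[] args) {
    // Illustrative tag strings only, not taken from a real dictionary.
    List<String> dictionary = Arrays.asList("SUB:NOM:SIN:NEU");
    List<String> manual = Arrays.asList("SUB:AKK:SIN:NEU");
    System.out.println(new AddingCombiner().combine(dictionary, manual));
    System.out.println(new OverridingCombiner().combine(dictionary, manual));
  }
}
```

With this split, the German case (incomplete readings) would use the adding variant, and the temporary-fix case would use the overriding variant.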

Regards
  Daniel


--
Dive into the World of Parallel Programming! The Go Parallel Website,
sponsored by Intel and developed in partnership with Slashdot Media, is your
hub for all things parallel software development, from weekly thought
leadership blogs to news, videos, case studies, tutorials and more. Take a
look and join the conversation now. http://goparallel.sourceforge.net
___
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel


Re: English Rule Additions

2014-12-23 Thread Marcin Miłkowski
On 2014-12-23 at 00:02, Nick Hough wrote:
 I have devised some rules for common English mistakes for the letter
 ‘A’, which you can see here:
 https://gist.github.com/howlinghuffy/d25d3d6b43c7a9b485cb

 I plan on doing many more submissions like this over the coming months;
 let me know what you think.

Looks nice to me. Did you run this over a corpus?

Also, it would be very useful to include a url element so that end users 
get more documentation. Link to some publicly available information on 
the web about your rules (a good dictionary etc.).

Best,
Marcin



Re: Improving spelling suggestions with frequency dictionaries

2014-12-23 Thread Daniel Naber
On 2014-12-23 04:55, Andriy Rysin wrote:

 2) I would like to suggest adding -o outputFile option to
 *DictionaryBuilder, as writing output to random tmp file makes
 scripting harder; I already have local changes working (using
 apache.cli library) so if nobody objects I will clean up the code and
 will push it in (the option will be optional so old way will still
 work)

I'd even suggest making it non-optional; that will probably keep the 
code simpler.

Regards
  Daniel




Re: added.txt activated for most languages

2014-12-23 Thread R.J. Baars
Two options: override or add. Those could be in one file using an indicator,
or in two files.
It does not matter much.

Ruud

 On 2014-12-22 22:51, Jaume Ortolà i Font wrote:

 I use the manual-tagger not only as a way to add new words and tags,
 but also as a means of fixing tags temporarily until the next
 dictionary update.  So if there is a manual tag, the dictionary tag is
 ignored.

 For German, sometimes readings are missing, but the word itself is
 already known, i.e. its tags are incomplete. In those cases, I only add
 the missing readings to added.txt. So for German, I need the current
 approach. Maybe we could make that configurable so that the subclass
 decides which kind of combination it wants, and have a second combiner
 class?

 Regards
   Daniel




RE: Plain English rules

2014-12-23 Thread Mike Unwalla
Marcin wrote: This said, they might be useful for technical writing; in such
writing, linguistic variation is indeed to be limited. But Mike Unwalla
would know better.

Heikki's rules are good, but they are not always applicable. 

A style is not a standard. 'Plain English' is not a standard. Thus, to
create a set of 'plain English' rules about which everyone agrees is
difficult.

A better strategy is to create a set of rules for a specified style guide.
Refer to 'Enable using multiple rule sets' on
http://wiki.languagetool.org/missing-features. Although we can put a set of
rules in an external file
(http://wiki.languagetool.org/tips-and-tricks#toc9), we do not have an easy
way to select a set of rules.

Regards,

Mike Unwalla
Contact: www.techscribe.co.uk/techw/contact.htm 


-----Original Message-----
From: Marcin Milkowski [mailto:list-addr...@wp.pl] 
Sent: 22 December 2014 12:33
To: languagetool-devel@lists.sourceforge.net
Subject: Re: Plain English rules

On 2014-12-22 at 11:33, Daniel Naber wrote:
 On 2014-12-20 11:32, Heikki Lehvaslaiho wrote:

 Heikki,

 I've set up a gist with 80 English rules that (mostly) expand
 redundant/wordy rules in LanguageTool 2.7. The testrules script passes
 these, but it would be good for someone to go through them before
 inclusion in the main rules file.

 https://gist.github.com/heikkil/4efc378102037651f755 [1]

 thanks for those rules! Style rules can cause false alarms, or the
 messages could be considered to be false alarms, so I'm not sure whether
 we should activate these rules by default. What do others think?

I think these rules are following extreme prescriptivism.

I am strongly against the inclusion of such rules as turned on by 
default, because they raise false alarms for perfect English. My rough 
guide is this: if your rules tell that Jane Austen and Charles Dickens 
are bad writers, then your rules are simply wrong. And Dickens does use 
the words indicated in the rules; see for example 'accompany':

https://books.google.pl/books?id=INkAes9Y5AYC&pg=PA538&lpg=PA538&dq=accompany+%22charles+dickens%22&source=bl&ots=_lFgWHI48o&sig=X1vs7tIDaTPM9WSA7sGsXCPOwRo&hl=pl&sa=X&ei=zg6YVK6RCMWBU-XEgdAF&ved=0CE4Q6AEwBg#v=onepage&q=accompany%20%22charles%20dickens%22&f=false

(page 223).

This said, they might be useful for technical writing; in such writing, 
linguistic variation is indeed to be limited. But Mike Unwalla would 
know better.

Best regards,
Marcin




Tests fail: concurrency problem?

2014-12-23 Thread Jaume Ortolà i Font
Hi,

I get test errors in HTTPServerLoadTest with the current master branch (no
other changes), and the same or similar errors on different machines. Has
anyone else seen this error?

Regards,
Jaume Ortolà





Tests in error:
  HTTPServerLoadTest.testHTTPServer:61 » Execution java.lang.AssertionError: Res...

---
Test set: org.languagetool.server.HTTPServerLoadTest
---
Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 20.644 sec <<< FAILURE!
testHTTPServer(org.languagetool.server.HTTPServerLoadTest)  Time elapsed: 20.355 sec  <<< ERROR!
java.util.concurrent.ExecutionException: java.lang.AssertionError: Result:
<?xml version="1.0" encoding="UTF-8"?>
<matches software="LanguageTool" version="2.8-SNAPSHOT" buildDate="2014-12-23 09:54">
<language shortname="pl-PL" name="Polish" mothertongueshortname="en" mothertonguename="English"/>
<error fromy="1" fromx="0" toy="1" tox="39" ruleId="TRANSLATION_LENGTH" msg="Source and target translation lengths are very different" replacements="" context="To jest okropnie długi tekst, naprawdę!" contextoffset="0" offset="0" errorlength="39" locqualityissuetype="length"/>
</matches>


Re: English Rule Additions

2014-12-23 Thread Nick Hough

 On 2014-12-23 at 00:02, Nick Hough wrote:
 I have devised some rules for common English mistakes for the letter
 ‘A’, which you can see here:
 https://gist.github.com/howlinghuffy/d25d3d6b43c7a9b485cb
 
 I plan on doing many more submissions like this over the coming months;
 let me know what you think.
 
 Looks nice to me. Did you run this over a corpus?
 
 Also, it would be very useful to include url element to have more 
 documentation for the end users. Link to some publicly available 
 information on the web on your rules (a good dictionary etc.).
 
 Best,
 Marcin

Yes, I have run all of the rules over the Wikipedia and Tatoeba corpora.  I am 
getting most of the information for these rules from written literature that is 
not publicly available; however, I will search for appropriate URLs to add.  Is 
this mailing list the best place to post new rules in the future?

Regards,

Nick


Re: Tests fail: concurrency problem?

2014-12-23 Thread Yakov Reztsov


Hi,
I also got this error.


Tue, 23 Dec 2014 12:44:12 +0100 from Jaume Ortolà i Font:
Hi, 

I get test errors in HTTPServerLoadTest with the current master branch (no 
other changes), the same or similar errors in different machines. Has anyone 
else seen this error?

Regards,
Jaume Ortolà





Tests in error:
  HTTPServerLoadTest.testHTTPServer:61 » Execution java.lang.AssertionError: Res...

---
Test set: org.languagetool.server.HTTPServerLoadTest
---
Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 20.644 sec <<< FAILURE!
testHTTPServer(org.languagetool.server.HTTPServerLoadTest)  Time elapsed: 20.355 sec  <<< ERROR!
java.util.concurrent.ExecutionException: java.lang.AssertionError: Result:
<?xml version="1.0" encoding="UTF-8"?>
<matches software="LanguageTool" version="2.8-SNAPSHOT" buildDate="2014-12-23 09:54">
<language shortname="pl-PL" name="Polish" mothertongueshortname="en" mothertonguename="English"/>
<error fromy="1" fromx="0" toy="1" tox="39" ruleId="TRANSLATION_LENGTH" msg="Source and target translation lengths are very different" replacements="" context="To jest okropnie długi tekst, naprawdę!" contextoffset="0" offset="0" errorlength="39" locqualityissuetype="length"/>
</matches>
-- 

Yakov Reztsov


Re: Improving spelling suggestions with frequency dictionaries

2014-12-23 Thread Dominique Pellé
Hi

I have this script...

languagetool/languagetool-language-modules/fr/src/main/resources/org/languagetool/resource/fr/create-lexicon.sh

... which works by assuming that the SynthDictionaryBuilder Java
program creates its output files in /tmp/...

But it would be trivial to modify the script if the -o option is added,
which would make the script simpler and more robust.

Above script even has a comment saying...

# The Java program outputs temporary files in /tmp which is not
# convenient (it would be better to indicate the location of output files).


So I'm also in favor of making the -o option mandatory.

Regards
Dominique


Andriy Rysin ary...@gmail.com wrote:

 That may break existing scripts that depend on that; is everybody OK
 with such a change?

 Regards,
 Andriy

 2014-12-23 3:41 GMT-05:00 Daniel Naber daniel.na...@languagetool.org:
 On 2014-12-23 04:55, Andriy Rysin wrote:

 2) I would like to suggest adding -o outputFile option to
 *DictionaryBuilder, as writing output to random tmp file makes
 scripting harder; I already have local changes working (using
 apache.cli library) so if nobody objects I will clean up the code and
 will push it in (the option will be optional so old way will still
 work)

 I'd even suggest to make it non-optional, that will probably keep the
 code simpler.

 Regards
   Daniel




Re: Improving spelling suggestions with frequency dictionaries

2014-12-23 Thread Andriy Rysin
I have pushed a change to SpellDictionaryBuilder to require -o
outputFile option. I tried to make the changes small so there's
probably room for improvement.

If this approach works we can use the same for other dictionary builders.
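
A rough sketch of what a mandatory -o option can look like, for illustration only. It is not the code that was pushed: the message above mentions the apache.cli library, while this sketch uses plain argument parsing so that it stays self-contained. The class name OutputOptionSketch, the method parseOutputFile and the error messages are assumptions.

```java
import java.io.File;

public class OutputOptionSketch {

  /** Returns the file named after -o, or throws if the option is missing. */
  static File parseOutputFile(String[] args) {
    for (int i = 0; i < args.length; i++) {
      if ("-o".equals(args[i])) {
        if (i + 1 >= args.length) {
          throw new IllegalArgumentException("-o requires a file name");
        }
        return new File(args[i + 1]);
      }
    }
    // No fallback to a random temporary file: the option is required.
    throw new IllegalArgumentException("Missing required option: -o outputFile");
  }

  public static void main(String[] args) {
    try {
      File output = parseOutputFile(args);
      System.out.println("Would write the dictionary to " + output.getPath());
    } catch (IllegalArgumentException e) {
      System.err.println(e.getMessage());
      System.exit(1);
    }
  }
}
```

Failing early when -o is absent means a calling script always knows where the output is and no longer has to guess a path under /tmp.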

Andriy

2014-12-23 11:58 GMT-05:00 Dominique Pellé dominique.pe...@gmail.com:
 Hi

 I have this script...

 languagetool/languagetool-language-modules/fr/src/main/resources/org/languagetool/resource/fr/create-lexicon.sh

 ... which works by assuming that SynthDictionaryBuilder java
 program creates its output files in /tmp/...

 But it would be trivial to modify the script if -o option is added
 which will make the script simpler and more robust.

 Above script even has a comment saying...

 # The Java program outputs temporary files in /tmp which is not
 # convenient (it would be better to indicate the location of output files).


 So I'm also in favor of making the -o option mandatory.

 Regards
 Dominique


 Andriy Rysin ary...@gmail.com wrote:

 That may break existing scripts that depend on that; is everybody OK
 with such a change?

 Regards,
 Andriy

 2014-12-23 3:41 GMT-05:00 Daniel Naber daniel.na...@languagetool.org:
 On 2014-12-23 04:55, Andriy Rysin wrote:

 2) I would like to suggest adding -o outputFile option to
 *DictionaryBuilder, as writing output to random tmp file makes
 scripting harder; I already have local changes working (using
 apache.cli library) so if nobody objects I will clean up the code and
 will push it in (the option will be optional so old way will still
 work)

 I'd even suggest to make it non-optional, that will probably keep the
 code simpler.

 Regards
   Daniel

