Re: added.txt activated for most languages
On 2014-12-22 22:51, Jaume Ortolà i Font wrote: I use the manual-tagger not only as a way to add new words and tags, but also as a means of fixing tags temporarily until the next dictionary update. So if there is a manual tag, the dictionary tag is ignored. For German, sometimes readings are missing, but the word itself is already known, i.e. its tags are incomplete. In those cases, I only add the missing readings to added.txt. So for German, I need the current approach. Maybe we could make that configurable so that the sub class decided which kind of combination it wants, and have a second combiner class? Regards Daniel -- Dive into the World of Parallel Programming! The Go Parallel Website, sponsored by Intel and developed in partnership with Slashdot Media, is your hub for all things parallel software development, from weekly thought leadership blogs to news, videos, case studies, tutorials and more. Take a look and join the conversation now. http://goparallel.sourceforge.net ___ Languagetool-devel mailing list Languagetool-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/languagetool-devel
Re: English Rule Additions
On 2014-12-23 at 00:02, Nick Hough wrote: I have devised some rules for common English mistakes for the letter ‘A’, which you can see here: https://gist.github.com/howlinghuffy/d25d3d6b43c7a9b485cb I plan on doing many more submissions like this over the coming months; let me know what you think. Looks nice to me. Did you run this over a corpus? Also, it would be very useful to include a url element to give end users more documentation. Link to some publicly available information on the web about your rules (a good dictionary, etc.). Best, Marcin
Re: Improving spelling suggestions with frequency dictionaries
On 2014-12-23 04:55, Andriy Rysin wrote: 2) I would like to suggest adding a -o outputFile option to *DictionaryBuilder, as writing output to a random tmp file makes scripting harder; I already have local changes working (using apache.cli library), so if nobody objects I will clean up the code and push it in (the option will be optional so the old way will still work) I'd even suggest making it non-optional; that will probably keep the code simpler. Regards Daniel
Re: added.txt activated for most languages
Two options: override or add. Those could be in one file using an indicator, or in two files. It does not matter much. Ruud On 2014-12-22 22:51, Jaume Ortolà i Font wrote: I use the manual-tagger not only as a way to add new words and tags, but also as a means of fixing tags temporarily until the next dictionary update. So if there is a manual tag, the dictionary tag is ignored. For German, sometimes readings are missing, but the word itself is already known, i.e. its tags are incomplete. In those cases, I only add the missing readings to added.txt. So for German, I need the current approach. Maybe we could make that configurable so that the sub class decided which kind of combination it wants, and have a second combiner class? Regards Daniel
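The two behaviours discussed in this thread (manual tags replace the dictionary tags, or manual tags are added to them) could be sketched roughly as below. This is only an illustration under stated assumptions: TagCombinerSketch, Mode and combine are hypothetical names, tags are plain strings, and nothing here is taken from the actual LanguageTool tagger or combiner classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: these names are NOT part of the LanguageTool code base.
class TagCombinerSketch {

    /** How manual readings (e.g. from added.txt) are combined with dictionary readings. */
    enum Mode { OVERRIDE, ADD }

    static List<String> combine(Mode mode, List<String> dictionaryTags, List<String> manualTags) {
        if (manualTags.isEmpty()) {
            // No manual entry for this word: the dictionary readings are used unchanged.
            return new ArrayList<>(dictionaryTags);
        }
        if (mode == Mode.OVERRIDE) {
            // A manual tag exists, so the dictionary tags are ignored.
            return new ArrayList<>(manualTags);
        }
        // ADD: keep the dictionary readings and append the missing ones, without duplicates.
        Set<String> merged = new LinkedHashSet<>(dictionaryTags);
        merged.addAll(manualTags);
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("NOUN");
        List<String> manual = Arrays.asList("VERB");
        System.out.println(combine(Mode.OVERRIDE, dictionary, manual)); // prints [VERB]
        System.out.println(combine(Mode.ADD, dictionary, manual));      // prints [NOUN, VERB]
    }
}
```

A language module would then pick one mode, which is roughly what "the sub class decides which kind of combination it wants" amounts to; a per-entry indicator in a single file, as Ruud suggests, would instead select the mode per word.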
RE: Plain English rules
Marcin wrote: This said, they might be useful for technical writing; in such writing, linguistic variation is indeed to be limited. But Mike Unwalla would know better. Heikki's rules are good, but they are not always applicable. A style is not a standard. 'Plain English' is not a standard. Thus, to create a set of 'plain English' rules about which everyone agrees is difficult. A better strategy is to create a set of rules for a specified style guide. Refer to 'Enable using multiple rule sets' on http://wiki.languagetool.org/missing-features. Although we can put a set of rules in an external file (http://wiki.languagetool.org/tips-and-tricks#toc9), we do not have an easy way to select a set of rules. Regards, Mike Unwalla Contact: www.techscribe.co.uk/techw/contact.htm -----Original Message----- From: Marcin Milkowski [mailto:list-addr...@wp.pl] Sent: 22 December 2014 12:33 To: languagetool-devel@lists.sourceforge.net Subject: Re: Plain English rules On 2014-12-22 at 11:33, Daniel Naber wrote: On 2014-12-20 11:32, Heikki Lehvaslaiho wrote: Heikki, I've set up a gist with 80 English rules that (mostly) expand redundant/wordy rules in LanguageTool 2.7. Testrules script passes these, but it would be good for someone to go through them before inclusion to the main rules file. https://gist.github.com/heikkil/4efc378102037651f755 [1] thanks for those rules! Style rules can cause false alarms, or the messages could be considered to be false alarms, so I'm not sure whether we should activate these rules by default. What do others think? I think these rules are following extreme prescriptivism. I am strongly against the inclusion of such rules as turned on by default, because they raise false alarms for perfect English. My rough guide is this: if your rules tell that Jane Austen and Charles Dickens are bad writers, then your rules are simply wrong. 
And Dickens does use the words indicated in the rules; see for example 'accompany': https://books.google.pl/books?id=INkAes9Y5AYC&pg=PA538&lpg=PA538&dq=accompany+%22charles+dickens%22&source=bl&ots=_lFgWHI48o&sig=X1vs7tIDaTPM9WSA7sGsXCPOwRo&hl=pl&sa=X&ei=zg6YVK6RCMWBU-XEgdAF&ved=0CE4Q6AEwBg#v=onepage&q=accompany%20%22charles%20dickens%22&f=false (page 223). This said, they might be useful for technical writing; in such writing, linguistic variation is indeed to be limited. But Mike Unwalla would know better. Best regards, Marcin
Tests fail: concurrency problem?
Hi, I get test errors in HTTPServerLoadTest with the current master branch (no other changes), and the same or similar errors on different machines. Has anyone else seen this error? Regards, Jaume Ortolà Tests in error: HTTPServerLoadTest.testHTTPServer:61 » Execution java.lang.AssertionError: Res... --- Test set: org.languagetool.server.HTTPServerLoadTest --- Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 20.644 sec FAILURE! testHTTPServer(org.languagetool.server.HTTPServerLoadTest) Time elapsed: 20.355 sec ERROR! java.util.concurrent.ExecutionException: java.lang.AssertionError: Result: <?xml version="1.0" encoding="UTF-8"?> <matches software="LanguageTool" version="2.8-SNAPSHOT" buildDate="2014-12-23 09:54"> <language shortname="pl-PL" name="Polish" mothertongueshortname="en" mothertonguename="English"/> <error fromy="1" fromx="0" toy="1" tox="39" ruleId="TRANSLATION_LENGTH" msg="Source and target translation lengths are very different" replacements="" context="To jest okropnie długi tekst, naprawdę!" contextoffset="0" offset="0" errorlength="39" locqualityissuetype="length"/> </matches>
Re: English Rule Additions
On 2014-12-23 at 00:02, Nick Hough wrote: I have devised some rules for common English mistakes for the letter ‘A’, which you can see here: https://gist.github.com/howlinghuffy/d25d3d6b43c7a9b485cb I plan on doing many more submissions like this over the coming months; let me know what you think. Looks nice to me. Did you run this over a corpus? Also, it would be very useful to include a url element to give end users more documentation. Link to some publicly available information on the web about your rules (a good dictionary, etc.). Best, Marcin Yes, I have run all of the rules over the Wikipedia and Tatoeba corpora. I am getting most of the information for these rules from written literature that is not publicly available; however, I will search for appropriate URLs to add. Is this mailing list the best place to post new rules in the future? Regards, Nick
Re: Tests fail: concurrency problem?
Hi, I also got this error. On Tue, 23 Dec 2014 12:44:12 +0100, Jaume Ortolà i Font wrote: Hi, I get test errors in HTTPServerLoadTest with the current master branch (no other changes), the same or similar errors in different machines. Has anyone else seen this error? Regards, Jaume Ortolà Tests in error: HTTPServerLoadTest.testHTTPServer:61 » Execution java.lang.AssertionError: Res... --- Test set: org.languagetool.server.HTTPServerLoadTest --- Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 20.644 sec FAILURE! testHTTPServer(org.languagetool.server.HTTPServerLoadTest) Time elapsed: 20.355 sec ERROR! java.util.concurrent.ExecutionException: java.lang.AssertionError: Result: <?xml version="1.0" encoding="UTF-8"?> <matches software="LanguageTool" version="2.8-SNAPSHOT" buildDate="2014-12-23 09:54"> <language shortname="pl-PL" name="Polish" mothertongueshortname="en" mothertonguename="English"/> <error fromy="1" fromx="0" toy="1" tox="39" ruleId="TRANSLATION_LENGTH" msg="Source and target translation lengths are very different" replacements="" context="To jest okropnie długi tekst, naprawdę!" contextoffset="0" offset="0" errorlength="39" locqualityissuetype="length"/> </matches> -- Yakov Reztsov
Re: Improving spelling suggestions with frequency dictionaries
Hi, I have this script: languagetool/languagetool-language-modules/fr/src/main/resources/org/languagetool/resource/fr/create-lexicon.sh which works by assuming that the SynthDictionaryBuilder Java program creates its output files in /tmp/... But it would be trivial to modify the script if a -o option is added, which would make the script simpler and more robust. The script even has a comment saying: # The Java program outputs temporary files in /tmp which is not # convenient (it would be better to indicate the location of output files). So I'm also in favor of making the -o option mandatory. Regards Dominique Andriy Rysin ary...@gmail.com wrote: That may break existing scripts that depend on that; is everybody OK with such a change? Regards, Andriy 2014-12-23 3:41 GMT-05:00 Daniel Naber daniel.na...@languagetool.org: On 2014-12-23 04:55, Andriy Rysin wrote: 2) I would like to suggest adding -o outputFile option to *DictionaryBuilder, as writing output to random tmp file makes scripting harder; I already have local changes working (using apache.cli library) so if nobody objects I will clean up the code and will push it in (the option will be optional so old way will still work) I'd even suggest to make it non-optional, that will probably keep the code simpler. Regards Daniel
Re: Improving spelling suggestions with frequency dictionaries
I have pushed a change to SpellDictionaryBuilder to require the -o outputFile option. I tried to keep the changes small, so there's probably room for improvement. If this approach works, we can use the same for the other dictionary builders. Andriy 2014-12-23 11:58 GMT-05:00 Dominique Pellé dominique.pe...@gmail.com: Hi I have this script... languagetool/languagetool-language-modules/fr/src/main/resources/org/languagetool/resource/fr/create-lexicon.sh ... which works by assuming that SynthDictionaryBuilder java program creates its output files in /tmp/... But it would be trivial to modify the script if -o option is added which will make the script simpler and more robust. Above script even has a comment saying... # The Java program outputs temporary files in /tmp which is not # convenient (it would be better to indicate the location of output files). So I'm also in favor of making -o option mandatory. Regards Dominique Andriy Rysin ary...@gmail.com wrote: That may break existing scripts that depend on that, is everybody ok with such a change? Regards, Andriy 2014-12-23 3:41 GMT-05:00 Daniel Naber daniel.na...@languagetool.org: On 2014-12-23 04:55, Andriy Rysin wrote: 2) I would like to suggest adding -o outputFile option to *DictionaryBuilder, as writing output to random tmp file makes scripting harder; I already have local changes working (using apache.cli library) so if nobody objects I will clean up the code and will push it in (the option will be optional so old way will still work) I'd even suggest to make it non-optional, that will probably keep the code simpler. Regards Daniel
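For readers following the -o discussion, a required output-file option could look roughly like the sketch below. It is an assumption-laden illustration, not the change that was pushed: OutputOptionSketch and requireOutputFile are hypothetical names, the -i flag and file names in the example are invented, and plain argument scanning is used here instead of the apache.cli library mentioned in the thread so that the example stays self-contained.

```java
// Hypothetical sketch: this is NOT the actual SpellDictionaryBuilder code.
class OutputOptionSketch {

    /** Returns the value that follows "-o", or throws if the option or its value is missing. */
    static String requireOutputFile(String[] args) {
        // Stop one element early so that "-o" as the very last argument counts as missing.
        for (int i = 0; i < args.length - 1; i++) {
            if ("-o".equals(args[i])) {
                return args[i + 1];
            }
        }
        throw new IllegalArgumentException("Missing required option: -o <outputFile>");
    }

    public static void main(String[] args) {
        // "-i" and both file names are made up for this example.
        String[] example = {"-i", "words.txt", "-o", "out.dict"};
        System.out.println(requireOutputFile(example)); // prints out.dict
    }
}
```

Failing fast when -o is absent is what lets a wrapper script such as create-lexicon.sh stop guessing where the builder wrote its files in /tmp.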