words frequencies

2014-10-14 Thread R.J. Baars
I am currently exporting word frequencies for all the languages I have collected over the years. These frequency lists are 'dirty', meaning no check has been done on whether the words are correct. That will be handled by the speller anyway. Spell checker maintainers could also use them as input.

Re: words frequencies

2014-10-14 Thread R.J. Baars
The frequency lists are now available. You can find yours here: the 'gaia' format: www.spellonit.com/downloads/frequencies/language code_gaia.xml.zip the plain csv: www.spellonit.com/downloads/frequencies/language code_wordfreqs.csv.zip Ruud
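
For anyone who wants to try the plain CSV list, here is a minimal sketch of loading it in Python. The column layout (word first, then count, comma-separated) is an assumption, since the thread does not describe the file format, and `load_frequencies` is a hypothetical helper name. Because the lists are described as 'dirty', the sketch skips rows it cannot parse instead of failing.

```python
import csv
import io


def load_frequencies(text, delimiter=","):
    """Parse 'word,count' rows into a dict (assumed layout, not confirmed)."""
    freqs = {}
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if len(row) < 2:
            continue  # malformed row: fewer than two columns
        word = row[0].strip()
        try:
            count = int(row[1].strip())
        except ValueError:
            continue  # count column is not an integer
        if not word:
            continue
        # Sum the counts if the same word appears more than once.
        freqs[word] = freqs.get(word, 0) + count
    return freqs


# Made-up sample data, only to show the call.
sample = "de,1200\nhet,950\nbroken row\nde,5\n"
freqs = load_frequencies(sample)
```

If the real files use a different separator, pass it as `delimiter`.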

Re: switching from Hunspell to Morfologik

2014-10-14 Thread Daniel Naber
On 2014-10-13 23:09, Juan Martorell wrote: java.io.IOException: Cannot load or parse input stream of '/org/languagetool/rules/es/grammar.xml' You have local changes in your grammar.xml, don't you? This exception indicates the 'name' attribute isn't set for a rule/rulegroup. It's not related

Re: words frequencies

2014-10-14 Thread Daniel Naber
On 2014-10-14 08:26, R.J. Baars wrote: the 'gaia' format: www.spellonit.com/downloads/frequencies/language code_gaia.xml.zip Could you list which ones are available? (Or configure the server so that it lists the directory when www.spellonit.com/downloads/frequencies/ is opened?) As this is

Re: switching from Hunspell to Morfologik

2014-10-14 Thread Dominique Pellé
Daniel Naber daniel.na...@languagetool.org wrote: Hi, to provide LT as a 100% pure Java software, I'd like to switch from Hunspell (native code) to Morfologik (Java-based). For that, I think the following languages are easy to switch: Asturian Galician Khmer Spanish

Re: words frequencies

2014-10-14 Thread R.J. Baars
I could list them, but I don't want to yet. I first want to resolve the license. Just data It is a lot more work to collect data like this, than it is to make a little program. I don't see the difference. It is the effort and ingenuity that counts. It is not a plain collection, but picking

Wikicheck not working for some articles

2014-10-14 Thread Jaume Ortolà i Font
Hi, Wikicheck is not working now for articles with titles that include some diacritic. See, for example, [1]. It used to work well. Regards, Jaume Ortolà [1] http://tools.wmflabs.org/languagetool/pageCheck/index?lang=ca&url=Llista_dels_rius_m%C3%A9s_llargs

Re: words frequencies

2014-10-14 Thread Daniel Naber
On 2014-10-14 08:49, R.J. Baars wrote: I even would rather exclude commercial use without written consent of the owner (me). In fact, I would object to any use except for open and free purposes. Is there a license that fits that? Creative Commons has a non-commercial option, but then we

Re: switching from Hunspell to Morfologik

2014-10-14 Thread Juan Martorell
On 14 October 2014 08:35, Daniel Naber daniel.na...@languagetool.org wrote: On 2014-10-13 23:09, Juan Martorell wrote: java.io.IOException: Cannot load or parse input stream of '/org/languagetool/rules/es/grammar.xml' You have local changes in your grammar.xml, don't you? This exception

Re: Wikicheck not working for some articles

2014-10-14 Thread Daniel Naber
On 2014-10-14 08:56, Jaume Ortolà i Font wrote: Wikicheck is not working now for articles with titles that include some diacritic. See, for example, [1]. It used to work well. I know... I don't know how to solve this, everything works fine locally and I also don't see what has changed. It

Re: switching from Hunspell to Morfologik

2014-10-14 Thread Daniel Naber
On 2014-10-11 12:00, Daniel Naber wrote: to provide LT as a 100% pure Java software, I'd like to switch from Hunspell (native code) to Morfologik (Java-based). For that, I think the following languages are easy to switch: Asturian I've switched over Asturian now, would be nice if

pom.xml cleanup

2014-10-14 Thread Daniel Naber
Hi, I did some internal cleanup of the pom.xml files so that they contain less duplication. It shouldn't make a difference for anyone, but if you have problems building LT with Maven, let me know. Regards Daniel

frequency lists

2014-10-14 Thread R.J. Baars
After conferring a bit more with Daniel, I decided to have my company publish the top 30% of the frequency lists freely and openly under CC-BY. This should be enough for LT. If you want to add frequencies to the morfologik speller, the frequency list for your language could be in the complete set

Re: words frequencies

2014-10-14 Thread Xavi Ivars
2014-10-14 8:49 GMT+02:00 R.J. Baars r.j.ba...@xs4all.nl: I could list them, but I don't want to yet. I first want to resolve the license. Just data It is a lot more work to collect data like this, than it is to make a little program. I don't see the difference. It is the effort and