For the record, I respect that this type of work takes a *lot* of time and hard work, and that people do have the right to make their work copyleft.
There is, however, for practical purposes, a huge difference for us between MPL/LGPL (the French case) and GPL-only (the Italian case).

Pedro.

--- On Mon, 11/7/11, Olivier R. <[email protected]> wrote:

> Hello everyone,
>
> I don’t like mailing lists, so I have subscribed here just to explain a few things about dictionaries. Then I’ll vanish.
>
> Rob Weir wrote:
> > Just make sure that you explain what a spell checking dictionary is.
> > Otherwise any legal types will be confused. This is not a dictionary
> > like Webster's, with words and definitions, where the definitions are
> > creative content. A spell checking dictionary is more of a word list.
> > I'm not sure what the creative expression is in a list of all common
> > words in a language and how that could be copyrighted. Of course, I
> > am not a lawyer.
>
> Few dictionaries are just word lists. Most of them are lists of words tagged with flags, and those flags are described in an affixation file that specifies the rules for generating inflections. This affixation file can be quite simple or very complex, and it can be a difficult matter: it looks easy at first, but as you get deeper into it, there are often a lot of issues to handle. Creating a proper affixation file can be really hard work, and even when the difficulty is not high, it can be a very long job. So, no, Hunspell dictionaries are not just word lists.
>
> For example, it took me one year and countless hours of work to rewrite the affixation file of the French dictionaries from scratch. Even after that, there were still a lot of bugs (not spelling mistakes). For a year, I had to patch the affixation file regularly. Even after a few years, there is still occasionally something to fix. The French dictionaries contain approximately 13,000 rules.
> Here is an example of one of the most complex flags:
> http://www.dicollecte.org/affixes.php?prj=fr&flag=c2
>
> (AFAIK, only one dictionary has a more complex affixation file: the Hungarian one.)
>
> I also tagged the affixation file so that a script can generate 4 different dictionaries, giving users the means to write according to their preferences regarding the optional and controversial French spelling reform of 1990.
>
> Besides, 99% of the entries have been grammatically tagged by hand. Several contributors did a tremendous job: adding lexical tags, adding many words, moving entries into different subdictionaries according to our policy, handling special cases, and reporting mistakes and issues. Spelling matters are much more complex than you think, especially if you want to use your dictionary for grammar checking. We often have to handle old, new, or variant spellings of a single word, and there are decisions to make about special cases, which are actually very numerous. Managing dictionaries is not a trivial task.
>
> Here is the "bugtracker" where we work on the French dictionaries:
> http://www.dicollecte.org/propositions.php?prj=fr&tab=E [fr]
> (This bug tracker also allows us to commit changes to the dictionary in the database.)
> The changelog:
> http://www.dicollecte.org/log.php?prj=fr
>
> This dictionary is used by both French grammar checkers.
>
> What you said about copyright could be right for a list generated by a script from a corpus, but it is not true for dictionaries that are built by humans with their knowledge, their work, and their choices.
>
> > But we'll never resolve this on legal grounds. At Apache we would not
> > bundle a dictionary under a legal theory if the compiler of the
> > dictionary did not want us to. I think we should respect the
> > dictionary compiler's wishes and intent,
> > _even if legally we're not obligated to_.
>
> Wow...
> That’s really not encouraging for people who may be considering changing the license of their work... Does IBM think the same way?
> A few years ago, when I began contributing to FLOSS, I thought the less restrictive licenses were the better ones, only because I didn’t care and was ignorant about licensing and political matters. As time goes by, I increasingly think the opposite. And reading you, I’m beginning to think I was still too soft on that topic.
>
> > 3) We could contact the compilers of the dictionary and ask if they
> > would make them available under a different license. Generally
> > people make things available under an OSS license because they want to
> > see other projects use them. If we tell them that a leading
> > application like OpenOffice can no longer use their dictionary, this
> > might persuade them to change their license.
>
> Here is the situation for the French dictionaries:
>
> 1. The Hunspell spelling dictionaries
> Licenses: MPL/LGPL/GPL
>
> As I am the sole author of the affixation file, and as I personally tagged about 90% of all entries grammatically (without copying another lexicon with a script), I can say for sure that I do not intend to switch to the Apache license.
> When I built Dicollecte, my goal was to encourage people to contribute for everyone and to give back the improvements they made. Switching to the Apache license would contradict everything I did.
>
> By the way, these dictionaries _require_ Hunspell. They won’t work properly with Myspell. I have seen a lot of people assume that Hunspell dictionaries will work with Myspell. That’s a wrong assumption. Hunspell can use Myspell dictionaries, but Hunspell also offers a lot of new features that make it possible to improve the dictionaries’ structure.
> And Myspell does not recognize double suffixation or double prefixation, cannot handle duplicate lemmas, does not handle morphological tags, has a limited number of flags, does not recognize Hunspell compound commands, etc. (I am not even sure that Myspell can use UTF-8 files.)
>
> But, lucky for you, AFAIK many dictionaries still have a Myspell structure. Not the French ones and a few others, though.
>
>
> 2. The thesaurus
> The initial and main author released it under the LGPL.
> He has since passed away. AFAIK, there is no way to change the license before his work enters the public domain, and there have also been several improvements on the initial work.
> At the moment, I am working on transforming it into a list of "synsets" that could be used to generate a better thesaurus. A list of synsets would be a far better basis to work on. I don’t know if I will succeed. This is a difficult matter and it requires a lot of work.
>
>
> 3. Hyphenation rules
> License: LGPL.
> This is a dictionary converted from the TeX hyphenation rules, modified somewhat to handle several issues.
> I did no work on it. I’m just packaging it in the extensions for OOo/LibO. You’ll have to contact the people who created it.
>
> > 4) We could convert another word list or dictionary, one that has a
> > better license, into Hunspell format.
>
> Hmmm...
> You may generate affixation rules for Myspell with a script… but then, these dictionaries will probably be such a mess that you’ll be very lucky to find someone with enough dedication to improve them. The main issues of dictionaries are:
> - if you just create a list of words, you can only improve it with a text parser or other lexicons; improving it manually will be hard and annoying, as the list will be very, very long, and it will waste memory. And each time you regenerate it with your script, you’ll have to manually redo the fixes you made before.
> - if you create an affixation file with a script, your dictionary will be a mess with no easy way to improve it, as its structure will not be intuitive for a human being. And again, you cannot really mix scripted improvements with human ones.
> The best way is to find a good lexicon that is already tagged and available under a non-restrictive license. Even then, you’ll have to write a proper affixation file by hand… and then you may discover it is not as easy a task as you think, unless your language is very logical, with no exceptions and no weird stuff…
>
>
> Regards,
> Olivier R.
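Olivier's point that a Hunspell dictionary is "lists of words tagged with flags described in an affixation file" can be made concrete with a toy sketch. It assumes a drastically simplified subset of the `.aff` syntax: only `SFX` rules, single-character flags, and no prefixes, continuation classes, or morphological fields. The flag `A` and its three French-plural rules are invented for illustration and are not taken from the actual French dictionaries; `parse_sfx` and `expand` are hypothetical helper names.

```python
import re
from dataclasses import dataclass

# Toy affix data in a simplified .aff-like syntax (invented for illustration):
# "SFX <flag> <cross_product> <count>" is a header; each rule line is
# "SFX <flag> <strip> <add> <condition>", where "0" means "nothing".
TOY_AFF = """\
SFX A Y 3
SFX A 0 s [^sxzl]
SFX A 0 s [^a]l
SFX A al aux al
"""


@dataclass
class SuffixRule:
    strip: str      # characters removed from the end of the stem
    add: str        # characters appended after stripping
    condition: str  # pattern the end of the stem must match


def parse_sfx(aff_text: str) -> dict[str, list[SuffixRule]]:
    """Collect SFX rule lines into {flag: [rules]}; header lines are skipped."""
    rules: dict[str, list[SuffixRule]] = {}
    for line in aff_text.splitlines():
        parts = line.split()
        # In this toy format, rule lines have exactly 5 fields; headers have 4.
        if len(parts) == 5 and parts[0] == "SFX":
            _, flag, strip, add, cond = parts
            rules.setdefault(flag, []).append(
                SuffixRule("" if strip == "0" else strip,
                           "" if add == "0" else add,
                           cond))
    return rules


def expand(entry: str, rules: dict[str, list[SuffixRule]]) -> list[str]:
    """Expand a .dic-style entry such as 'cheval/A' into its surface forms."""
    stem, _, flags = entry.partition("/")
    forms = [stem]
    for flag in flags:  # one character per flag in this toy format
        for rule in rules.get(flag, []):
            # The condition is treated as a regex anchored at the stem's end.
            if re.search(rule.condition + "$", stem) and stem.endswith(rule.strip):
                forms.append(stem[: len(stem) - len(rule.strip)] + rule.add)
    return forms


RULES = parse_sfx(TOY_AFF)

if __name__ == "__main__":
    for entry in ["chat/A", "cheval/A", "nez/A"]:
        print(entry, "->", expand(entry, RULES))
```

Here `chat/A` yields `chat` and `chats`, `cheval/A` yields `cheval` and `chevaux`, and `nez/A` matches no rule, so only `nez` is produced. Real Hunspell rules can also carry their own flags (continuation classes), which is what enables the double suffixation Myspell lacks; this sketch deliberately leaves that out.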
