Good day, all! I finally had time to read and think a bit, especially about Jameson's design.
(For the record: Andrew kindly included me in your discussion because I have worked on a few multilingual projects in the past, and got to think about these issues somewhat.)

I like many aspects of the design a lot, especially the idea that the .py files should be in English as much as possible, with translation on load and save. This has two obvious advantages:

a) we do not have to store translations between a matrix of languages, but are allowed a simpler hub-and-spoke model with English as the hub;
b) the current interpreter can work without knowing about all this.

However, Jameson, if I may, I would take issue with a few assumptions of your model, most especially that of a "preferred language" for modules. I am referring to your point 3:

> 3-This dictionary ONLY contains translations for the "public
> interface" of somemodule.py, that is, those identifiers which are
> used in importer modules. It also defines a single, unchanging
> "preferred language" for that file, which is the assumed language
> for all non-translated identifiers in that file.

I am especially interested in collaborative work, and I believe it is not unreasonable to hope that children in schools in different countries will get to share some work. That would mean that a given module may have many editors, possibly introducing identifiers in more than one non-English language. From that point of view, "preferred language" is a feature of an editing environment, not of a module.

New identifiers should be individually tagged by language; I see that tagging as appropriate work for the editing environment. Basically, upon loading the file, all local identifiers would be read into memory; upon saving, new ones would be saved with a language tag. (Plausibly as a postfix, Identifier_i18n_2letterLanguageCode...) (I would otherwise follow Mike's suggestion to use a fixed transliteration table for non-latin scripts.)
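To make the load/save tagging concrete, here is a minimal sketch. It assumes the double-underscore spelling used in the une_fonction__i18n_fr example in this message; the function names are hypothetical, and transliteration of non-latin scripts is left out entirely.

```python
import re

# Assumed tag shape, modelled on the une_fonction__i18n_fr spelling.
_TAG = re.compile(r"^(?P<name>.+?)__i18n_(?P<lang>[a-z]{2})$")


def tag_identifier(name, lang):
    """Return the on-disk spelling for an identifier typed in language `lang`."""
    if not re.fullmatch(r"[a-z]{2}", lang):
        raise ValueError("expected a two-letter language code, got %r" % lang)
    return "%s__i18n_%s" % (name, lang)


def split_tag(identifier):
    """Return (name, lang); lang is None when the identifier carries no tag."""
    match = _TAG.match(identifier)
    if match is None:
        return identifier, None
    return match.group("name"), match.group("lang")


def tag_new_identifiers(identifiers_on_load, identifiers_on_save, editor_lang):
    """Map each untagged identifier added since load to its tagged spelling.

    The language comes from the editing environment, not from the module.
    """
    known = set(identifiers_on_load)
    return {
        name: tag_identifier(name, editor_lang)
        for name in identifiers_on_save
        if name not in known and split_tag(name)[1] is None
    }
```

For example, an editor set to French that finds "une_fonction" in the buffer but not among the identifiers read at load time would write it out as une_fonction__i18n_fr, while identifiers already present at load time are left alone.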
The language tag only applies until we have a valid English version of the identifier, of course; at that point, the English version will serve as the hub. But that raises another issue, which you tackle in points 5 and 6: what happens with imports in other modules that use the old generated identifier? You suggest keeping a separate history. It is a possibility, but I fear it goes counter to the goal of making the files usable by the existing interpreter. (Though you may have thought of a workaround for this that I have missed.)

My suggestion would be as follows. (I will use French for my example.)

in premier_module.py:

    def une_fonction__i18n_fr():
        ...
    EOF

in deuxieme_module.py:

    from premier_module import une_fonction__i18n_fr
    ...
    EOF

Then, the translation a_function is introduced for une_fonction... So premier_module.py becomes:

    def a_function():
        ...
    # -*- Translation history block -*-
    une_fonction__i18n_fr = a_function
    # -*- End translation history block -*-
    EOF

(N.B. 1: The translation history block could be hidden in a knowledgeable editor; but we should have access to it, so as to explain why that word is still reserved.)

(N.B. 2: Actually, it is likely that premier_module.py has been renamed to first_module.py, and the package's __init__.py has a similar equivalence in _its_ translation block!!!)

That way, the original import in deuxieme_module still works in an unmodified Python interpreter. (Until the knowledgeable editor gets to work on deuxieme_module.py again, of course.) Even if someone decides on a better translation later on, more than one version may be kept in the translation block. This has the disadvantage of polluting the code, but the advantage of polluting the filesystem less.

I am realizing a broader application of this mechanism: the translation block could be tagged with a revision number (if __revision__<540:), and the "import" command could mention the last known revision; so translation blocks would only be activated at need. But that's all another story.
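As a rough illustration of the translation history block, the sketch below renders such a block from a mapping of old tagged names to current names, then checks that the old name still resolves under a plain interpreter. The helper render_history_block and the body of a_function are hypothetical; only the marker comments and the alias line come from the example above.

```python
BLOCK_START = "# -*- Translation history block -*-"
BLOCK_END = "# -*- End translation history block -*-"


def render_history_block(aliases):
    """Render alias assignments (old tagged name -> current name) as source text."""
    lines = [BLOCK_START]
    for old_name, new_name in sorted(aliases.items()):
        # A plain assignment keeps the old name importable without any
        # interpreter support.
        lines.append("%s = %s" % (old_name, new_name))
    lines.append(BLOCK_END)
    return "\n".join(lines) + "\n"


# Stand-in for premier_module.py after the English name has been introduced.
module_source = (
    "def a_function():\n"
    "    return 42\n"
    "\n"
    + render_history_block({"une_fonction__i18n_fr": "a_function"})
)

# Executing the source plays the role of importing the module.
namespace = {}
exec(module_source, namespace)

# Both names are bound to the same function object.
assert namespace["une_fonction__i18n_fr"] is namespace["a_function"]
```

Because the alias is an ordinary assignment at the end of the file, an importer that still asks for une_fonction__i18n_fr gets the same function object as one asking for a_function.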
Another quick related note: what if someone adds a translation between two non-English languages? In your first email, you explicitly forbid it; I am not sure that is necessary. (I am not sure you think of it as necessary in your later design either.) Clearly, however, X to Y translations may have to refer to the history (as language X is replaced by English) so as to become English to Y.

To finish with your design points, you introduce what I see as a severe limitation in your point 4:

> 4-There is good UI support for creating a new translation for a
> word. However, the assumed user model is that words will be
> translated INTO a users preferred language; FROM the context of an
> importer module (you'd generally not add translations for a module
> from that module itself, since generally you wouldn't even have
> modules open whose preferred language is not your own); and
> therefore WITH an explicit user decision as to which module this
> translation belongs in (they want to use their language for
> identifier X which is in English, well, they must have had a reason
> to write it in English rather than their language so they
> presumably know what imported module it comes from.)

What really made me jump is the notion that "you wouldn't even have modules open whose preferred language is not your own". Again, this assumes a single preferred language per module, which is something I would rather avoid, and which I believe is not necessary if identifiers have a language mark.

However, I suspect your mention of "from the context of an importer module" comes from the issues you encountered with memorizing the import structure. I would like to hear more about the problems you ran into there, because I believe knowing the import structure is necessary (for reasons to be detailed below).

Now, a few suggestions and pitfalls of my own:

a) I believe there should be one translation file per language. More file pollution, less parsing.
I suspect that something akin to the gettext file structure would be appropriate: package/module1.py would be translated in

    package/_t9n_/fr/module1.pyt
    package/_t9n_/fr/module1.pyto (object, like a .mo file)
    package/_t9n_/es/module1.pyt
    package/_t9n_/es/module1.pyto

and so on.

b) A particularly fancy editor would color-code words in other languages instead of showing the _i18n_xx tag. Of course there would be a way to access online translation services to get suggestions (as has been suggested by many).

c) Sci-fi scenario: any new translation suggestion by a child or educator should be made available to others using a distributed database system... (they are likely to work on common projects, and hence on common modules.) Children or educators known to be knowledgeable about a given language pair should have a way to vet translations in that database. Oh, and let's send it to planet python so we have a basis to build the translation files to the standard library for very obscure languages ;-) (OK, that _is_ sci-fi. Still worth thinking about!)

d) Back to earth: I said we really had to know the import structure... here is a slew of related problems. Suppose we are editing a module that is importing something from the core library:

    from moduleX import f1
    from moduleY import f2
    f1()
    f2()

Now, suppose f1 and f2 both translate as "sigma" in the current editing language... Then, though the .py code is unambiguous, the translated on-screen code looks ambiguous; and worse, the un-translation process on save is not well-defined. The solution is to actually un-specify the imports in the source code, so that each call is qualified by its module:

    import moduleX
    import moduleY
    moduleX.sigma()
    moduleY.sigma()

This refactoring should be possible in most cases, unless two top-level modules have similar translations.
(Say moduleX and moduleY both translate as "modula".) This situation should be marked as an error; or alternately _display_ the following:

    import moduleX__i18n_en_ as modula_1
    import moduleY__i18n_en_ as modula_2
    modula_1.sigma()
    modula_2.sigma()

This is not an interpreter-level change, but a disguised display (or rather a refactoring which can be memorized, and reverted by the untranslation machinery). Note that display-only import disambiguation may also be necessary if the above code happens in a core library file (which we would never modify). In any case it is useful to flag as an error any translation that introduces ambiguity within the same namespace. Similar transformations may be made necessary by the "from moduleX import *" syntax. None of this is simple, as I said; but alas probably necessary.

e) Would we display numbers as the equivalent numerals in other writing systems?

f) Docstrings... are another issue entirely. I still like my idea of a distributed database, so children puzzling out a foreign (to them) docstring with online help can put their minds together.

OK, I am giving more problems than solutions here; and unfortunately, my spare time is otherwise quite occupied, so I doubt I can contribute to implementation. Still, I hope that spelling some of these things out is useful to others. I'll try to keep my thinking cap on as this discussion evolves.

Cheers,
Marc-Antoine Parent
http://maparent.ca/

P.S. I _love_ your idea of arrows in the margin to indicate flow!

_______________________________________________
Sugar mailing list
http://lists.laptop.org/listinfo/sugar

