https://bugzilla.wikimedia.org/show_bug.cgi?id=15161
--- Comment #6 from C. Scott Ananian <[email protected]> ---
Some discussion from IRC. Pig Latin would be a good English variant to explore some of the non-reversible language variant pairs (like Arabic/Latin).

(12:03:51 PM) cscott: James_F: i've been talking to liangent about language converter. it would be nice if VE could present the text to be edited in the proper language variant. the way that VE/parsoid selser works makes this feasible I think.
(12:04:40 PM) cscott: that is, we convert all the article text, but we only re-save the edited bits (in the converted variant). needs some thought wrt how diffs appear, etc.
(12:04:48 PM) James_F: cscott: That sounds totally feasible - you're talking about VE requesting zh-hans or zh-hant (or whatever) from Parsoid and showing that?
(12:05:23 PM) cscott: James_F: something like that. not sure where in the stack language conversion will live exactly. gwicke_away is talking about it as a post-processing pass.
(12:05:42 PM) cscott: this would also allow language converter to work on portuguese and even en-gb/en-us.
(12:06:32 PM) cscott: ie, you always see 'color' in VE even if the source text was 'colour', but it doesn't get re-saved as 'color' unless you edit the sentence containing the word. (or paragraph? or word?)
(12:06:34 PM) James_F: Like link target hinting.
(12:07:04 PM) James_F: Selser is paragraph-level right now, I think?
(12:07:39 PM) cscott: i'm not sure, but i think so. html element-level.
(12:08:39 PM) cscott: it might be that we want to be more precise for better variant support -- or maybe not. maybe element-level marking of lang= is right (it avoids adding spurious <span> tags just to record the language variant) and we just want to be smarter about how we present diffs.
(12:09:18 PM) cscott: ie, color->colour shouldn't appear as a diff. (or for serbian, the change from latin to cyrillic alphabet shouldn't be treated as a diff, if the underlying content is the same)
(12:10:36 PM) cscott: there are some tricky issues -- for some language pairs one encoding has strictly more information than the other. ie, in languages with arabic and latin orthographies, uppercase letters are specific to the latin script. so if the user writes the text natively in arabic, we won't necessarily know the correct capitalization (and the capitalization of the rest of the paragraph might be lost).
(12:11:05 PM) cscott: so lots of details. but we should be able to handle the 'easy' cases (where the languages convert w/o information loss) first.
(12:11:22 PM) ***cscott wonders if pig latin is a reversible transformation
(12:17:06 PM) MatmaRex: cscott: it's not, i'm afraid
(12:17:28 PM) MatmaRex: unless you rely on a dictionary
(12:17:39 PM) MatmaRex: as appleway might come from apple or wapple, i think
(12:17:41 PM) cscott: MatmaRex: well, i guess that makes it a great stand-in for the 'tricky' languages.
(12:18:20 PM) cscott: so much the better. ;)
(12:18:29 PM) MatmaRex: it's only the words starting with a vowel that are troublesome, though
(12:20:59 PM) cscott: i think the idea is that, if i edit in en-pig and type 'appleway' it should get saved as appleway and probably a default translation into en-us should be made? (ie, in the latin/arabic pairs, assume lowercase). There should be a specific UX affordance in VE to specify both sides of the variant, which serializes into -{en-pig:appleway,en-us:apple,en-gb:apple}-.
(12:24:45 PM) cscott: i guess when you edit text which was originally in en-us, it needs to be converted to -{en-us:apple,en-pig:appleway}- by the language converter so that information isn't lost when the edited en-pig text is saved back.
