https://bugzilla.wikimedia.org/show_bug.cgi?id=15161

--- Comment #6 from C. Scott Ananian <[email protected]> ---
Some discussion from IRC.  Pig Latin would be a good English variant to explore
some of the non-reversible language variant pairs (like Arabic/Latin).

(12:03:51 PM) cscott: James_F: i've been talking to liangent about language
converter.  it would be nice if VE could present the text to be edited in the
proper language variant.  the way that VE/parsoid selser works makes this
feasible I think.
(12:04:40 PM) cscott: that is, we convert all the article text, but we only
re-save the edited bits (in the converted variant).  needs some thought wrt how
diffs appear, etc.
(12:04:48 PM) James_F: cscott: That sounds totally feasible - you're talking
about VE requesting zh-hans or zh-hant (or whatever) from Parsoid and showing
that?
(12:05:23 PM) cscott: James_F: something like that.  not sure where in the
stack language conversion will live exactly.  gwicke_away is talking about it
as a post-processing pass.
(12:05:42 PM) cscott: this would also allow language converter to work on
portuguese and even en-gb/en-us.
(12:06:32 PM) cscott: ie, you always see 'color' in VE even if the source text
was 'colour', but it doesn't get re-saved as 'color' unless you edit the
sentence containing the word. (or paragraph?  or word?)
(12:06:34 PM) James_F: Like link target hinting.
(12:07:04 PM) James_F: Selser is paragraph-level right now, I think?
(12:07:39 PM) cscott: i'm not sure, but i think so.  html element-level.
(12:08:39 PM) cscott: it might be that we want to be more precise for better
variant support -- or maybe not.  maybe element-level marking of lang= is right
(it avoids adding spurious <span> tags just to record the language variant) and
we just want to be smarter about how we present diffs.
(12:09:18 PM) cscott: ie, color->colour shouldn't appear as a diff.  (or for
serbian, the change from latin to cyrillic alphabet shouldn't be treated as a
diff, if the underlying content is the same)
(12:10:36 PM) cscott: there are some tricky issues -- for some language pairs
one encoding has strictly more information than the other.  ie, in languages
with arabic and latin orthographies, uppercase letters are specific to the
latin script.  so if the user writes the text natively in arabic, we won't
necessarily know the correct capitalization (and the capitalization of the rest
of the paragraph might be lost).
(12:11:05 PM) cscott: so lots of details.  but we should be able to handle the
'easy' cases (where the languages convert w/o information loss) first.
(12:11:22 PM) ***cscott wonders if pig latin is a reversible transformation
(12:17:06 PM) MatmaRex: cscott: it's not, i'm afraid
(12:17:28 PM) MatmaRex: unless you rely on a dictionary
(12:17:39 PM) MatmaRex: as appleway might come from apple or wapple, i think
(12:17:41 PM) cscott: MatmaRex: well, i guess that makes it a great stand-in
for the 'tricky' languages.
(12:18:20 PM) cscott: so much the better. ;)
(12:18:29 PM) MatmaRex: it's only the words starting with a vowel that are
troublesome, though
(12:20:59 PM) cscott: i think the idea is that, if i edit in en-pig and type
'appleway' it should get saved as appleway and probably a default translation
into en-us should be made?  (ie, in the latin/arabic pairs, assume lowercase). 
There should be a specific UX affordance in VE to specify both sides of the
variant, which serializes into -{en-pig:appleway,en-us:apple,en-gb:apple}-.
(12:24:45 PM) cscott: i guess when you edit text which was originally in en-us,
it needs to be converted to -{en-us:apple,en-pig:appleway}- by the language
converter so that information isn't lost when the edited en-pig text is saved
back.
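
The pig latin collision MatmaRex points out, and the two-sided variant rule
discussed at the end of the log, can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the function names
(to_pig_latin, from_pig_latin_candidates, variant_rule) are hypothetical, the
pig latin rules are one common convention (vowel-initial words take "way";
otherwise the leading consonant cluster moves to the end and takes "ay"), and
the rule string copies the comma-separated notation written in the chat, not
a confirmed converter syntax.

```python
from typing import Dict, List

VOWELS = "aeiou"


def to_pig_latin(word: str) -> str:
    """Convert one lowercase word (assumed rule set, see lead-in)."""
    if not word:
        return word
    if word[0] in VOWELS:
        return word + "way"
    # Move the leading consonant cluster to the end, then append "ay".
    i = 0
    while i < len(word) and word[i] not in VOWELS:
        i += 1
    return word[i:] + word[:i] + "ay"


def from_pig_latin_candidates(pig: str) -> List[str]:
    """Return every string that to_pig_latin() maps onto `pig`.

    No dictionary is consulted, so non-words can show up as candidates.
    """
    guesses = set()
    if pig.endswith("way"):
        # Vowel-initial source word: just strip the "way" suffix.
        guesses.add(pig[:-3])
    if pig.endswith("ay"):
        # Consonant-initial source word: try every rotation of the stem.
        stem = pig[:-2]
        for k in range(len(stem) + 1):
            guesses.add(stem[k:] + stem[:k])
    # Keep only guesses that really round-trip to the input.
    return sorted(w for w in guesses if w and to_pig_latin(w) == pig)


def variant_rule(forms: Dict[str, str], sep: str = ",") -> str:
    """Serialize variant forms in the -{code:text,...}- notation from the chat."""
    return "-{" + sep.join(f"{code}:{text}" for code, text in forms.items()) + "}-"


print(to_pig_latin("apple"))                  # appleway
print(to_pig_latin("wapple"))                 # appleway
print(from_pig_latin_candidates("appleway"))  # ['apple', 'wapple']
print(variant_rule({"en-pig": "appleway", "en-us": "apple", "en-gb": "apple"}))
```

Because from_pig_latin_candidates("appleway") yields both apple and wapple, a
converter cannot pick the en-us form on its own; recording both sides in an
explicit rule, as proposed above, avoids the guess.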
