On 10/19/2011 01:40 PM, Andreas Prilop wrote:
There are three so-called "Yiddish digraphs" in Unicode:
   U+05F0   wawayim
   U+05F1   waw yod
   U+05F2   yodayim

What is specifically Yiddish about these digraphs?
They can be used in the same way in Hebrew.
But this isn't done. Why not?

http://he.wikipedia.org/wiki/%F8%E9%E9_%F7%E5%F8%F6%E5%E5%E9%E9%EC
http://he.wikipedia.org/wiki/%F8%D6_%F7%E5%F8%F6%D4%D6%EC

Why should Yiddish be written with special digraphs
but Hebrew with sequences of two letters?

But even in Yiddish, the digraphs are not really used:

http://yi.wikipedia.org/wiki/%F8%F2%F7%E9%E0%E5%E5%E9%F7
http://yi.wikipedia.org/wiki/%F8%F2%F7%E9%E0%D4%E9%F7


The Unicode Standard says:
| ... to distinguish the digraph double vav from an occurrence
| of a consonantal vav followed by a vocalic vav.

By that reasoning you would need an English digraph "sh"
to distinguish "sh" in "shit" from "s-h" in ***hole. ;-)

I think the issue here is probably a matter of legacy encodings, though someone else would need to confirm that. It is true that in Yiddish the double-vav, vav-yod, and double-yod digraphs are considered separate letters, but the same is true of Welsh "ch", which we know does not get its own code point. Similarly, U+FB2E HEBREW LETTER ALEF WITH PATAH is just the same thing as an ordinary ALEF with a PATAH vowel point, and indeed has exactly that as its canonical decomposition, so even Unicode considers the two codings to be equivalent (right? or mostly equivalent at least). The same goes for much of the rest of the Hebrew part of the Alphabetic Presentation Forms block, U+FB1D - U+FB4F. Modern Hebrew likely borrowed the special use (in unpointed text) of double-vav and double-yod from Yiddish, but they are not normally considered separate letters in Hebrew.

The only reason I can think of for these characters having their own code points is the same reason that U+00E1 LATIN SMALL LETTER A WITH ACUTE has its own code point, despite being just an "a" with a combining acute, or that U+0149 LATIN SMALL LETTER N PRECEDED BY APOSTROPHE has its own, or that U+01F3 LATIN SMALL LETTER DZ has its own: presumably, there was some earlier encoding that had them and was required for round-tripping (interesting that the last two Latin examples have *compatibility* decompositions, and the Hebrew/Yiddish digraphs don't even have that).

The case of U+FB1F HEBREW LIGATURE YIDDISH YOD YOD PATAH, which you did not mention, is a different situation, in that the patah is written under *both* yods, so it can't truly be said to decompose into ordinary Hebrew letters.
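For anyone who wants to check the decompositions mentioned above, here is a minimal sketch using Python's standard `unicodedata` module. The helper name `describe` is just for illustration, and the output depends on the Unicode version bundled with your Python, so treat it as a quick check rather than an authoritative reading of the standard.

```python
import unicodedata

# The three Yiddish digraphs, the two presentation forms discussed above,
# and the Latin comparison characters (the DZ digraph is U+01F3).
chars = ["\u05F0", "\u05F1", "\u05F2", "\uFB2E", "\uFB1F",
         "\u00E1", "\u0149", "\u01F3"]

def describe(ch):
    """Return (code point, character name, raw decomposition field)."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.decomposition(ch))

for ch in chars:
    cp, name, decomp = describe(ch)
    # An empty decomposition field means the character has no decomposition
    # at all; a "<compat>" prefix marks a compatibility decomposition.
    print(cp, name, "->", decomp or "(no decomposition)")
```

The decomposition field comes back empty for U+05F0 through U+05F2, which is the "don't even have that" case. Note also that the database decomposes U+FB1F into the double-yod digraph U+05F2 plus a patah, not into two separate yods.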

If there wasn't an earlier standard, I don't really have a good answer that isn't contradicted by other examples. I thought it was in ISO 8859-8 (Latin/Hebrew), but I don't see it when I look it up.
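The legacy-encoding guess can be probed with a small sketch like the one below. It assumes Python's bundled `iso8859_8` and `cp1255` codecs faithfully reflect ISO 8859-8 and Windows-1255; the helper name `encodable` is hypothetical. It only shows which of these encodings has a single-byte slot for each digraph, and does not by itself establish why Unicode gave them code points.

```python
# The three digraphs from the original question.
DIGRAPHS = ["\u05F0", "\u05F1", "\u05F2"]

def encodable(ch, codec):
    """Return the encoded byte(s) as hex, or None if the codec lacks the character."""
    try:
        return ch.encode(codec).hex()
    except UnicodeEncodeError:
        return None

for codec in ("iso8859_8", "cp1255"):
    # None means the encoding has no slot for that digraph.
    print(codec, [encodable(ch, codec) for ch in DIGRAPHS])
```

With these codecs, ISO 8859-8 has no slot for any of the three, while Windows-1255 maps them to single bytes, so a round-tripping requirement of that kind is at least plausible.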

~mark

