If QAMATS and SHEVA are to be split as suggested, I propose that the split 
variants be completely separate, i.e. new symbols for QAMATS QATAN and QAMATS 
GADOL, leaving the existing QAMATS as is for the undifferentiated QAMATS. This 
is necessary because there are cases of disagreement about whether the QAMATS is 
QATAN or GADOL, for example in the name נָעֳמִי (Naomi). See 
https://hebrew-academy.org.il/%d7%a6%d6%b8%d7%94%d6%b3%d7%a8%d6%b7%d7%99%d6%b4%d7%9d-%d7%a0%d6%b8%d7%a2%d6%b3%d7%9e%d6%b4%d7%99-%d7%94%d7%92%d7%99%d7%99%d7%aa-%d7%a7%d7%9e%d7%a5-%d7%9c%d7%a4%d7%a0%d7%99-%d7%97%d7%98/

The situation with SHEVA is similar in that there are cases where there is 
disagreement, for example in words like כִּתְבִי. 
 
Best Regards,

Jonathan Rosenne

-----Original Message-----
From: Unicode <[email protected]> On Behalf Of Mark E. Shoulson 
via Unicode
Sent: Friday, October 24, 2025 7:55 PM
To: [email protected]
Subject: Recent Hebrew Proposals

I wasn't watching the document registry carefully enough; Hebrew proposals are 
often things I feel I can help with.  Let's see if I can avoid weighing in just 
for the sake of talking.  I have my doubts.

WRT the sheva na/heavy sheva, there were indeed already *some* imprints that 
made that distinction back when we proposed QAMATS QATAN, but they were quite 
few.  There are more printers who want to make this distinction now; maybe 
typographic style has changed, and perhaps shva na does need encoding now.  The 
same for dagesh hazaq.  I have a scan (attached, if the list permits) from way 
back when from a source (Koren) that didn't and doesn't distinguish dagesh qal 
from dagesh hazaq... but nonetheless has a subtle but distinct difference 
between a VAV with a dagesh and a VAV with a shuruq-dot (see at the end of the 
second word from the left, two vavs, but the dot in the first one is just a bit 
higher than the second?  That's on purpose).

With regard to the "double duty" that these characters would involve, well, 
there is something to that.  It is indeed much the same as when I proposed 
QAMATS QATAN and HOLAM HASER FOR VAV: there is a long tradition of NOT 
distinguishing these symbols, they were long considered the same symbol (even 
if they had different semantic meanings), and many (most?) printers will carry 
on not distinguishing them... but we want to support a growing segment of 
publishers that are making the distinction.  That's sort of the situation we're 
in, and I suppose the most completely unambiguous approach would be to leave, 
say HEBREW POINT SHEVA for the lumpers and encode *both* HEBREW POINT SHEVA 
MOBILE and also HEBREW POINT QUIESCENT SHEVA for the splitters.  But I think 
most here would agree that that would be excessive, and since it's the mobile 
sheva and the heavy dagesh that are being given new emphasized shapes for the 
most part, it makes sense to split them off and leave the rest 
undistinguished.
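
For the two splits that already exist (QAMATS QATAN and HOLAM HASER FOR VAV), 
a "lumper" view of a text can be recovered by folding the split-off characters 
back onto the generic ones. What follows is only a minimal sketch: the code 
points are the encoded ones, but the names FOLD_SPLITS and lump are 
hypothetical, made up for illustration.

```python
# Split-off points mapped back to the generic, undifferentiated points.
QAMATS = "\u05B8"                # HEBREW POINT QAMATS
QAMATS_QATAN = "\u05C7"          # HEBREW POINT QAMATS QATAN
HOLAM = "\u05B9"                 # HEBREW POINT HOLAM
HOLAM_HASER_FOR_VAV = "\u05BA"   # HEBREW POINT HOLAM HASER FOR VAV

# Hypothetical folding table (illustration only).
FOLD_SPLITS = str.maketrans({
    QAMATS_QATAN: QAMATS,
    HOLAM_HASER_FOR_VAV: HOLAM,
})


def lump(text: str) -> str:
    """Return text with the split-off points replaced by the generic ones."""
    return text.translate(FOLD_SPLITS)
```

Folding in this direction is lossy but mechanical. Going the other way (from 
an undifferentiated text to a split one) needs linguistic knowledge, which is 
why the choice of which characters do "double duty" matters.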

Document L2/25-237 draws a distinction between QAMATS QATAN and the case of 
ATNAH HAFUKH, and there is an important difference.  A distinct shape for 
QAMATS QATAN is a recent innovation; it was never part of classical Hebrew 
orthography but has been introduced within the past century(?) and gained 
traction, sufficient to be worth considering.  ATNAH HAFUKH actually has the 
opposite problem.  As we showed in the proposal for ATNAH HAFUKH, it formerly 
*was* written distinctly from YERAH BEN YOMO in old MSS, including the Aleppo 
Codex, and only later (probably with the advent of printing) were the two 
symbols conflated.  So even if it did not become common in current Hebrew 
printing to show them distinctly, it would still have been a good idea to 
disunify them in order to transcribe such MSS accurately.

Regarding L2/25-242, proposing "helper" accents for preposed/postposed accents, 
I am opposed.  These "helpers" were never considered "different symbols" from 
the real ones, but only copies placed more conveniently to help the reader.  In 
fact, I would say that for Zarqa, after using U+05AE HEBREW ACCENT ZINOR for 
the "main" postposed accent, one should NOT use U+0598 HEBREW ACCENT ZARQA 
for the "helper" even though it has the 
right look and positioning (the names of these accents are a known anomaly, see 
https://www.unicode.org/notes/tn27/ appendix A).  Rather, one should use U+05AE 
HEBREW ACCENT ZINOR for both of them, and the font should know to position the 
non-final one differently.  Same for PASHTA; in my opinion, one should use 
U+0599 HEBREW ACCENT PASHTA for both the main and helper, and not use U+05A8 
HEBREW ACCENT QADMA.  After all, both symbols are pashtas!  Just one is written 
in the wrong place to help you out.
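
As a rough sketch of the encoding practice described above: the code below 
prints the four code points involved and rewrites a word so that a look-alike 
"helper" becomes the same character as the main accent. The names 
HELPER_TO_MAIN and unify_helpers are hypothetical, and the rule "both marks in 
the same word means the look-alike is a helper" is a naive assumption that 
could misfire on a word that genuinely carries both accents.

```python
import unicodedata

ZARQA, ZINOR = "\u0598", "\u05AE"    # names are a known anomaly, see UTN #27
PASHTA, QADMA = "\u0599", "\u05A8"

# Hypothetical table: look-alike used as a "helper" -> the main accent.
HELPER_TO_MAIN = {ZARQA: ZINOR, QADMA: PASHTA}


def unify_helpers(word: str) -> str:
    """Re-encode a helper as the main accent when both occur in one word.

    Assumption: the look-alike is a helper whenever the main accent is also
    present in the same word. Real texts may need a more careful rule.
    """
    for helper, main in HELPER_TO_MAIN.items():
        if main in word and helper in word:
            word = word.replace(helper, main)
    return word


for ch in (ZARQA, ZINOR, PASHTA, QADMA):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

After this, placing the non-final copy correctly is left to the font and the 
renderer.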

Using the font to reposition things might be "fragile", but that doesn't make 
it wrong.  That kind of positioning really is the font's (and font-renderer's) 
job to keep straight, not the encoding's.  And BTW, I don't 
think I've ever seen a "helper DEHI" anywhere, so that one is a solution in 
search of a problem.  I know that the MCE used different codings for those 
"helpers" (I regularly use MCE, still reading Hebrew texts encoded in plain 
ASCII letters, though I wouldn't recommend it to anyone); I'm not sure that 
argues much one way or the other.  MCE also, I think, encoded preposed accents 
*before* their letters, which is definitely contrary to Unicode's principles, 
as well as coding VAV + HOLAM as HOLAM + VAV.  I don't really see that this 
separate encoding really helps much.

I remember years ago someone was asking to encode the "MEAYLA" or "MAYELA" 
accent, on the grounds that it is considered a distinct cantillation by 
scholars, even though it is identical in appearance and placement to TIPEHA and 
can only be distinguished by the fact that it appears in the same (possibly 
hyphenated) word as a "siluq" (end-of-verse) or ETNAHTA.  (For that matter, 
the unification of "siluq" 
with METEG is a far, far nastier problem to deal with, were it not for the fact 
that you can tell the end-of-verse by the following SOF PASUQ).  But there was 
never any distinction between meayla and tipeha except for scholarly debate 
(the meayla even has the same effect on the sequence of cantillations that the 
tipeha has, even though it's technically a connective and not a disjunctive.)
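
The SOF PASUQ trick for telling "siluq" from METEG can be sketched roughly as 
follows. This is a simplified illustration, not a full cantillation parser: 
the function name siluq_positions is hypothetical, and it assumes that words 
are separated by whitespace and that the last U+05BD in the verse-final word 
is the siluq.

```python
METEG = "\u05BD"       # HEBREW POINT METEG (also used for siluq)
SOF_PASUQ = "\u05C3"   # HEBREW PUNCTUATION SOF PASUQ


def siluq_positions(text: str) -> list[int]:
    """Return indexes of U+05BD marks that are probably siluq.

    Assumption: the last U+05BD in the word immediately before a SOF PASUQ
    is the siluq; every other U+05BD is treated as an ordinary meteg.
    """
    positions = []
    search_from = 0
    while True:
        end = text.find(SOF_PASUQ, search_from)
        if end == -1:
            break
        # Skip any whitespace between the last word and the SOF PASUQ.
        word_end = end
        while word_end > 0 and text[word_end - 1].isspace():
            word_end -= 1
        # Walk back to the start of that word.
        word_start = word_end
        while (word_start > 0
               and not text[word_start - 1].isspace()
               and text[word_start - 1] != SOF_PASUQ):
            word_start -= 1
        idx = text.rfind(METEG, word_start, word_end)
        if idx != -1:
            positions.append(idx)
        search_from = end + 1
    return positions
```

A word joined by maqaf counts as one word here, which matches the "possibly 
hyphenated" case mentioned above.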

And indeed, I believe the same person also proposed disunifying PASEQ from the 
line used to make a legarmehh (or shalshelet gedola).  (I remember she once 
asked if there were people making Unicode decisions who were NOT font 
designers, as if the problem was that we were all mere grunts making fonts and 
not students of Hebrew.)  And again, there was really no reason: nobody 
(almost?) ever made that distinction in writing, and Unicode is here to encode 
things that are *written*, not things that we think about.

Now, perhaps the situation is different.  Maybe the paseq vs legarmehh line is 
starting to be recognized by printers, as in the examples shown.  Is it 
widespread enough to really matter?  That's another question.  I think the 
previous suggestion (not an actual proposal) was for there to be a "LEGARMEHH 
LINE" codepoint as distinct from PASEQ, and I guess the PASEQ line would do 
"double duty"; this proposal is the other way around, coding a "PASEQ NOT 
LEGARMEHH" point.  That feels more complicated and harder to understand, but 
makes sense numerically, since there are more LEGARMEHHs than PASEQs.  (The 
problem is that the symbol has always been commonly referred to as PASEQ, with 
people saying "and then a legarmehh has a line after it that looks like a 
PASEQ...") Again, this is a "qamats qatan" type problem, not an "atnah hafukh" 
type problem ("No manuscript distinguishes paseq from legarmeh in the way that 
some recent publications do", from the proposal).  And lest you 
think I am insensitive to the situation, I have indeed years ago written a 
program for parsing Biblical verses according to cantillations that runs up 
against this exact problem.  (I just haven't bothered to address the issue 
seriously.)

~mark
