Bram Moolenaar wrote:
Tony Mechelynck wrote:
In languages using accented letters, the Vim spell checker doesn't recognise
HTML entities (in HTML text): for example, the letters outside of the &...;
entities are highlighted as "SpellBad" (after ":set spell spelllang=fr") in
the following French words:
o&ugrave; meaning: où (where)
apr&egrave;s après (after)
c&eacute;r&eacute;monie cérémonie (ceremony)
courrou&ccedil;a courrouça ([he] angered)
d&eacute;sesp&eacute;r&eacute; désespéré (desperate)
n&eacute;cessaire nécessaire (necessary)
ann&eacute;e année (year)
etc.
They are perfectly valid French words, if one takes into account the following
equivalences:
&ugrave; = ù
&egrave; = è
&eacute; = é
&ccedil; = ç
etc.
I don't know how to solve the problem; maybe an "interpretation layer" to
resolve the entities between the HTML text and the French (or other
non-English language) dictionary?
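A minimal sketch of such an "interpretation layer", written in Python rather than Vim script and assuming nothing about how Vim's spell checker works internally: resolve the character references first, then look the decoded word up. `FRENCH_WORDS` and `is_known_word` are hypothetical names, and the tiny word set merely stands in for a real dictionary.

```python
import html

# Hypothetical stand-in for a French dictionary; a real checker would load
# the full word list rather than this handful of examples.
FRENCH_WORDS = {"où", "après", "cérémonie", "courrouça",
                "désespéré", "nécessaire", "année"}


def is_known_word(html_word: str, dictionary: set) -> bool:
    """Check a word as written in HTML source against a dictionary.

    html.unescape() resolves named (&eacute;), decimal (&#233;) and
    hexadecimal (&#xE9;) character references before the lookup.
    """
    return html.unescape(html_word) in dictionary


if __name__ == "__main__":
    for w in ["o&ugrave;", "apr&egrave;s", "n&eacute;cessaire", "ann&#233;e", "apres"]:
        print(w, is_known_word(w, FRENCH_WORDS))
```

Because decoding happens before the lookup, a single ordinary dictionary serves every spelling of the same word (named, decimal or hex references, or plain UTF-8). A real layer would also have to map any highlighting back onto the undecoded text.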
Well, words with HTML things in them are NOT French words. Why don't
you use utf-8 encoded HTML?
I started that particular site some years ago, in 7-bit ASCII plus entities.
I'm loath to change it now, and risk making it incompatible with some older
browsers. It already holds quite a bit of text.
I disagree with the statement that these words are not French words. In an
HTML file, where HTML syntax must be taken into account, they are.
If you really want to recognize these words, you could take the French
dictionary, do a global replace and build a spell file from that.
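The "global replace" could be sketched in Python as follows, assuming a plain word list with one word per line. `to_entities` and `convert_word_list` are hypothetical helpers. They use the HTML 4 named entity from the standard `html.entities.codepoint2name` table where one exists, and fall back to a decimal reference otherwise. The converted list could then be fed to Vim's `:mkspell`. This only helps if the spell checker does not split words at & and ;, and only for pages that use exactly the same entity forms.

```python
from html.entities import codepoint2name


def to_entities(word: str) -> str:
    """Rewrite the non-ASCII characters of word as HTML character references.

    Uses the HTML 4 named entity when one exists (é -> &eacute;) and a
    decimal reference otherwise (ł -> &#322;). ASCII is left untouched.
    """
    parts = []
    for ch in word:
        if ch.isascii():
            parts.append(ch)
        elif ord(ch) in codepoint2name:
            parts.append(f"&{codepoint2name[ord(ch)]};")
        else:
            parts.append(f"&#{ord(ch)};")
    return "".join(parts)


def convert_word_list(lines):
    """Yield the entity-encoded form of each line of a one-word-per-line list."""
    for line in lines:
        yield to_entities(line.rstrip("\n"))


if __name__ == "__main__":
    for w in ["où", "après", "cérémonie", "courrouça", "garçon"]:
        print(w, "->", to_entities(w))
```

The table lookup itself is language-independent, but the output file is still one extra spell file per language. A page that writes é as &#233; instead of &eacute; would also not match it.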
Actually, I don't use spell (I am blessed with a good sense of orthography);
but I wondered if there couldn't (someday) be a solution for people who don't
share the same blessing.
The proposed solution would mean creating an additional spell file, slightly
larger than the French dictionary, for use only with HTML text. I'm not
convinced of such a solution's viability, especially since it would have to be
repeated for German, Swedish, Turkish, Polish, etc., etc., etc. Maybe even for
words like risqué and garçon in English.
You'll have to check if using & and ; in the middle of a word is causing
trouble. Adding them to word characters will probably create different
problems.
The semicolon can also be an ordinary semicolon: a punctuation mark, not a
word character, which may follow a word directly with no intervening space
(or with &nbsp; preceding it, depending on typesetting conventions). The case
of the ampersand is simpler: to obtain a literal ampersand in the rendered text,
one must write &amp; (symbolic entity), &#38; (decimal entity) or &#x26;
(hex entity) in the HTML.
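The argument above suggests that & and ; can be told apart without making them general word characters. In HTML text, & always starts a character reference, so ; can only be part of a word when it closes one. Here is a hypothetical tokenizer in Python, not Vim's actual word splitting, that keeps a reference inside a word only if it decodes to a letter. It therefore ends a word at &nbsp;, &amp; or a bare ;.

```python
import html
import re

# One unit of HTML text: a complete character reference, or any single character.
UNIT_RE = re.compile(
    r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);|.", re.DOTALL
)


def html_words(text: str) -> list:
    """Split HTML text into words, keeping letter-producing references inside words."""
    words, current = [], []
    for match in UNIT_RE.finditer(text):
        unit = match.group()
        if html.unescape(unit).isalpha():
            # A plain letter, or a reference such as &eacute; that decodes to one.
            current.append(unit)
        elif current:
            # Anything else (space, bare ';', &nbsp;, &amp;, ...) ends the word.
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


if __name__ == "__main__":
    print(html_words("o&ugrave; est-il&nbsp;; apr&egrave;s &amp; ann&#233;e"))
```

Deciding per reference, rather than adding & and ; to the word characters, keeps "il&nbsp;;" from being read as one word, which is the kind of problem Bram anticipates.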
Best regards,
Tony.