Re: Bug report : Spell checking doesn't know about HTML entities

A.J.Mechelynck Thu, 22 Mar 2007 23:11:05 -0800

Bram Moolenaar wrote:

Tony Mechelynck wrote:
In languages using accented letters, the Vim spell checker doesn't recognise HTML entities (in HTML text): for example, the letters outside of the &...; entities are highlighted as "spellBad" (after ":set spell spelllang=fr") in the following French words:
o&ugrave;                       meaning: où             (where)
apr&egrave;s                             après          (after)
c&eacute;r&eacute;monie                  cérémonie      (ceremony)
courrou&ccedil;a                         courrouça      ([he] angered)
d&eacute;sesp&eacute;r&eacute;           désespéré      (desperate)
n&eacute;cessaire                        nécessaire     (necessary)
ann&eacute;e                             année          (year)

etc.
They are perfectly valid French words, if one takes into account the following equivalences:
&ugrave; = ù
&egrave; = è
&eacute; = é
&ccedil; = ç
etc.
I don't know how to solve the problem; maybe an "interpretation layer" to resolve the entities between the HTML text and the French (or other non-English language) dictionary?
Well, words with HTML things in them are NOT French words.  Why don't
you use utf-8 encoded HTML?

I started that particular site some years ago, in 7-bit ASCII plus entities. I'm loath to change it now, and risk making it incompatible with some older browsers. It already holds quite a bit of text.

I disagree with the statement that these words are not French words. In an HTML file, where HTML syntax must be taken into account, they are.


If you really want to recognize these words, you could take the French
dictionary, do a global replace and build a spell file from that.
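
A minimal sketch of what such a "global replace" might look like, assuming the dictionary is available as a plain list of words, one per entry. The function names to_entities and convert_word_list are hypothetical, and real .dic files carry affix flags after a slash, which this sketch does not handle. Whether Vim accepts "&" and ";" inside words when building the spell file (for example with :mkspell) is exactly the open question raised further down.

```python
from html.entities import codepoint2name


def to_entities(word):
    """Replace each non-ASCII character that has a named HTML entity by &name;."""
    out = []
    for ch in word:
        cp = ord(ch)
        if cp > 127 and cp in codepoint2name:
            out.append("&" + codepoint2name[cp] + ";")
        else:
            # ASCII characters, and characters without a named entity, stay as-is.
            out.append(ch)
    return "".join(out)


def convert_word_list(words):
    """Hypothetical helper: convert a whole word list for an HTML-only spell file."""
    return [to_entities(w) for w in words]


if __name__ == "__main__":
    for w in convert_word_list(["où", "après", "cérémonie", "courrouça"]):
        print(w)
```

The converted list would be an additional word list alongside the normal French one, not a replacement for it.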

Actually, I don't use spell (I am blessed with a good sense of orthography); but I wondered if there couldn't (someday) be a solution for people who don't share the same blessing.

The proposed solution would mean creating an additional spell file, slightly larger than the French dictionary, for use only with HTML text. I'm not convinced of such a solution's viability, especially since it would have to be repeated for German, Swedish, Turkish, Polish, etc., etc., etc. Maybe even for words like risqué and garçon in English.


You'll have to check if using & and ; in the middle of a word is causing
trouble.  Adding them to word characters will probably create different
problems.

The semicolon can also mean a semicolon, which is a punctuation mark and not a word character, and can be used as such after a word with no intervening space (or with &nbsp; preceding it, depending on typesetting conventions). The case of the ampersand is simpler: to obtain a true ampersand in the rendered text, one must use one of &amp; (symbolic entity), &#38; (decimal entity) or &#x26; (hex entity) in the HTML.
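
To illustrate the distinction, here is a small sketch of the "interpretation layer" idea done outside Vim, using Python's standard html.unescape. It is not something Vim does, and the name decode_entities is only a hypothetical wrapper. It shows that the symbolic, decimal and hex forms of the ampersand all resolve to the same character, and that a semicolon used as punctuation right after a word is left alone.

```python
import html


def decode_entities(text):
    """Resolve HTML character references so the result can be checked
    against an ordinary (non-HTML) dictionary."""
    return html.unescape(text)


# Words from the original report, written with entities.
samples = ["o&ugrave;", "apr&egrave;s", "c&eacute;r&eacute;monie", "courrou&ccedil;a"]
decoded = [decode_entities(s) for s in samples]

# Symbolic, decimal and hex references for a literal ampersand.
ampersands = [decode_entities(s) for s in ("&amp;", "&#38;", "&#x26;")]

# A semicolon that is punctuation, directly following a word.
with_punctuation = decode_entities("ann&eacute;e;")

if __name__ == "__main__":
    print(decoded)
    print(ampersands)
    print(with_punctuation)
```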



Best regards,
Tony.
