Re: a character for an unknown character

Martin Mueller Fri, 23 Dec 2016 13:25:50 -0800

That’s excellent advice. In our project somebody confused the bullet with the 
black circle. It didn’t matter, because 17th century texts don’t have bullet 
symbols—at least not the ones we’re dealing with. But following your advice 
would significantly reduce ambiguity.

From: <[email protected]> on behalf of Philippe Verdy <[email protected]>
Reply-To: Philippe Verdy <[email protected]>
Date: Friday, December 23, 2016 at 1:35 PM
To: Martin Mueller <[email protected]>
Cc: William_J_G Overington <[email protected]>, "[email protected]" 
<[email protected]>
Subject: Re: a character for an unknown character

If you want something that is very unlikely to be present in original texts, it 
would be preferable to avoid the black dot or any other bullets, which may be 
used as punctuation marks.

Consider using a geometric shape, notably one of those inherited from DOS code 
pages, such as the full block U+2588 (█). It is mapped in many common fonts, 
largely because it is part of legacy code page 437 (at position 0xDB = 219 
decimal) and most other MS-DOS code pages. It may occur in legacy encoded texts 
for MS-DOS, but only for presentation purposes (with monospaced fonts on 
text-only terminals), so it should not clash with any use for missing or damaged 
parts of an original printed or handwritten document on paper (those DOS texts 
have no original version on paper; they originally existed only as encoded files 
on computers).
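The code page position can be checked with Python's built-in `cp437` and `cp850` codecs. This is a minimal sketch that only confirms the byte value; it says nothing about how a given font renders the glyph. The helper name `dos_position` is illustrative, not part of any library.

```python
# Confirm where U+2588 FULL BLOCK sits in two legacy DOS code pages,
# using the cp437 and cp850 codecs that ship with Python.
FULL_BLOCK = "\u2588"  # █

def dos_position(char: str, codepage: str) -> int:
    """Return the single-byte value of `char` in the given DOS code page."""
    encoded = char.encode(codepage)  # raises UnicodeEncodeError if unmapped
    return encoded[0]

for cp in ("cp437", "cp850"):
    pos = dos_position(FULL_BLOCK, cp)
    print(f"{cp}: 0x{pos:02X} = {pos}")
```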

It is easily entered on Windows keyboards using Alt+219 (**not** Alt+0219), 
which works through the current OEM 8-bit code page (CP437, CP850 or 
similar).
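The leading zero matters because Windows interprets Alt+nnn through the OEM code page and Alt+0nnn through the ANSI code page. Assuming a typical Western setup (CP437 as OEM, Windows-1252 as ANSI), this sketch decodes the same byte value both ways; the function `alt_code_char` is hypothetical and only models that lookup, not the actual keyboard input path.

```python
# Model the two code pages Windows consults for Alt codes.
# Assumption: OEM = CP437 and ANSI = Windows-1252 (common Western defaults).
def alt_code_char(number: int, leading_zero: bool,
                  oem: str = "cp437", ansi: str = "cp1252") -> str:
    """Character produced by Alt+<number> (OEM) or Alt+0<number> (ANSI)."""
    codepage = ansi if leading_zero else oem
    return bytes([number]).decode(codepage)

print(alt_code_char(219, leading_zero=False))  # █ (U+2588 via CP437)
print(alt_code_char(219, leading_zero=True))   # Û (U+00DB via Windows-1252)
```

So typing Alt+0219 by mistake yields a Latin letter, which is exactly the kind of character a transcription placeholder must not be.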

There's also the lower half block U+2584 (▄), at position 0xDC = 220 decimal 
in CP437/CP850 (i.e. Alt+220 on Windows keyboards), if you want to avoid 
filling the full line height and to be able to distinguish multiple rows of 
text.

Or the dark shade U+2593 (▓), a square filled with a dark grey pattern, at 
position 0xB2 = 178 (i.e. Alt+178 on Windows keyboards), if you still want to 
see it within a text selection. Its grey pattern also intuitively suggests 
"missing part".
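The same check works for the other two block characters. This sketch looks up their CP437 byte values and Unicode names with the standard `unicodedata` module; `cp437_position` is an illustrative helper. Note that 0xDC is 220 in decimal, so the half block is typed as Alt+220.

```python
import unicodedata

def cp437_position(char: str) -> int:
    """Single-byte value of `char` in CP437 (raises if it has none)."""
    return char.encode("cp437")[0]

for char in ("\u2584", "\u2593"):  # ▄ lower half block, ▓ dark shade
    pos = cp437_position(char)
    print(f"U+{ord(char):04X} {unicodedata.name(char)}: "
          f"0x{pos:02X} = {pos}, typed as Alt+{pos}")
```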

All these geometric shapes are symbols, not punctuation; they are very unlikely 
to be used as bullets in documents and are not confusable with any other 
characters in actual text. They are also ignored in plain-text searches, i.e. 
not treated as variants of a significant dot, and there is a word break before 
and after them (so they won't merge with the surrounding words). They are also 
commonly used to replace words that have been deliberately deleted or hidden 
from an original document (because that information must be kept private).
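The symbol/punctuation distinction is visible in the Unicode general category, and the word-break behaviour can be approximated with a `\w`-based tokenizer. This is a simplification for illustration, not a full Unicode word-segmentation (UAX #29) implementation; the `words` helper and the sample sentence are hypothetical.

```python
import re
import unicodedata

# "So" = Symbol, other; "Po" = Punctuation, other.
for char in ("\u2588", "\u2593", "\u25cf", "\u2022"):  # █ ▓ ● •
    print(f"U+{ord(char):04X} {char} {unicodedata.category(char)}")

def words(line: str) -> list[str]:
    """Split on anything that is not a Unicode word character."""
    return re.findall(r"\w+", line)

# The block characters are not word characters, so a masked word
# never fuses with its neighbours.
print(words("sent to \u2588\u2588\u2588\u2588 in haste"))
```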

But note that input fields for entering passwords or secret codes in application 
forms and dialogs typically use black bullets U+2022 (•) or simply ASCII 
asterisks U+002A (*) to replace the entered characters: they cannot be read, 
but the user knows what he is typing on the keyboard.

2016-12-23 0:35 GMT+01:00 Martin Mueller 
<[email protected]>:
These are very handsome and interesting. But for the purposes of my project, 
which involves folks here, there, and everywhere working on editorial problems 
relating to digital transcriptions of Early Modern texts, the cardinal 
requirement is that the character can be found on and deployed from any 
Windows, Linux, or OS X machine. We have used the black dot (U+25CF, ●) as a 
kludge. Since it does not occur in the source data, there is no ambiguity. It 
is relatively easy to produce on a keyboard. Visually it is preferable to the 
diamond with a question mark (U+FFFD), although that is semantically more 
obvious. But the diamond is visually very disruptive, and it is much harder to 
find on a standard character map than the black dot, which is predictably 
located among the geometric shapes.
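The "no ambiguity" argument holds only as long as the placeholder really is absent from the source data. A minimal sketch of how a project might verify that before adopting the black dot, and then locate the marked characters; the function names and sample strings are hypothetical, not part of the project's actual tooling.

```python
PLACEHOLDER = "\u25cf"  # ● BLACK CIRCLE, used here as the "unknown character" mark

def placeholder_is_safe(source_texts, placeholder=PLACEHOLDER) -> bool:
    """True if the placeholder never occurs in the untouched source data."""
    return all(placeholder not in text for text in source_texts)

def lacuna_positions(transcription: str, placeholder=PLACEHOLDER) -> list[int]:
    """Character offsets of every position marked as unknown."""
    return [i for i, ch in enumerate(transcription) if ch == placeholder]

sources = ["To be, or not to be", "that is the question"]  # hypothetical sample
if placeholder_is_safe(sources):
    print(lacuna_positions("th\u25cft is the q\u25cfestion"))  # [2, 13]
```

Running the safety check once over the whole corpus is cheap, and it turns the informal claim into something that can be re-verified whenever new source files are added.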

It’s a kludge, but it works, and it looks to me superior to any of the 
alternatives. But I can be persuaded otherwise.

With thanks for the help of all of you

MM

On 12/22/16, 6:03 AM, "William_J_G Overington" 
<[email protected]> wrote:

    Martin Mueller wrote:

    > Is there a Unicode character that says “I represent an alphanumerical 
character, but I don’t know which”.  This is a very common problem in the 
transcription of historical texts where you have lacunas.

    I have been reading this thread with interest.

    I have produced nine designs for glyphs.

    If you so choose, you can assign specific meanings to one, some, or all of 
them. If you need more than nine designs, please say.

    Please find attached nine .png files, one glyph design in each file.

    The size of each image and the name of each file follow this 
specification:

http://www.unicode.org/emoji/selection.html#images

    However, the images do not conform exactly to those rules, as there is a 
one-pixel-wide transparent border: the designs were made using filled 
rectangles on a notional grid of seven rows by seven columns of blocks, each 
block ten pixels by ten pixels. I used the Serif PagePlus X7 desktop publishing 
program.

    The characters are not intended as emoji; I just applied the above 
specification because it is convenient to make the designs compatible with it 
as far as possible.

    I have assigned Private Use Area code points of U+EA60 through to U+EA68 to 
the glyphs. The specific code point for each glyph is indicated in the file 
name of the image of that glyph.

    I have chosen those code points because the Alt codes for U+EA60 through to 
U+EA68 are Alt 60000 through to Alt 60008 respectively. My thinking is that, if 
the designs are implemented in fonts, those easy-to-remember Alt codes might be 
helpful to someone using the Microsoft WordPad program.
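The easy-to-remember numbers fall out of plain hex-to-decimal conversion: 0xEA60 = 14·4096 + 10·256 + 6·16 + 0 = 60000. This sketch checks that the chosen code points sit in the BMP Private Use Area and prints the decimal value, assuming (as the message describes) that WordPad accepts the full decimal code point after Alt; `alt_code` is a hypothetical helper.

```python
import unicodedata

PUA_START, PUA_END = 0xE000, 0xF8FF  # Private Use Area in the BMP

def alt_code(code_point: int) -> int:
    """Decimal number to type after Alt for a BMP Private Use code point."""
    if not PUA_START <= code_point <= PUA_END:
        raise ValueError(f"U+{code_point:04X} is outside the BMP Private Use Area")
    return code_point  # 0xEA60 and 60000 are the same integer, just written differently

for cp in range(0xEA60, 0xEA69):
    # General category "Co" marks private-use characters.
    print(f"U+{cp:04X}  Alt {alt_code(cp)}  category {unicodedata.category(chr(cp))}")
```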

    I checked that those code points are not being used in the Medieval Unicode 
Font Initiative.

http://skaldic.abdn.ac.uk/db.php?cp=EA&if=mufi&table=mufi_char

    Readers who so choose are welcome to implement these glyphs in fonts.

    The http://www.unicode.org/emoji/selection.html#images specification 
mentions licensing. For the avoidance of doubt, these designs are free to share 
and use.

    A Private Use Area solution is not ideal, yet may be helpful in getting 
things started and could be helpful in establishing usage, which could help in 
getting the characters implemented into regular Unicode.

    I am attaching the images to this email. Because of the way the email 
system works, the images might not arrive in code point order, yet each image's 
file name indicates its code point, so that information should help to resolve 
any such problem in the transmission of the email attachments.

    William Overington

    Thursday 22 December 2016
