Re: Private Use areas

2018-08-28 Thread Asmus Freytag via Unicode

  
  
On 8/27/2018 2:20 PM, Rebecca Bettencourt via Unicode wrote:

>>> That sounds like a non-conformant use of characters in the U+24xx
>>> block.
>>
>> Well, you are an expert on these things and I do not understand as
>> to with what it would be non-conformant.
>
> A conformant process must interpret ⓅⓊⒶⒹⒶⓉⒶ as the characters
> ⓅⓊⒶⒹⒶⓉⒶ and not as a signal to process what follows as anything
> other than plain text.
  

  

Not correct.

If that were literally true, then all HTML, XML, CSS, C, C#, Java, and
Python source code files and their compilers would be non-conformant.

It's more like, "if a process treats a sequence of bytes as Unicode
plain text, then the bytes corresponding to the codes assigned to
ⓅⓊⒶⒹⒶⓉⒶ just stand for ⓅⓊⒶⒹⒶⓉⒶ. Any meaning is imparted by the (human)
reader."

However, if the process treats the file as a source file in a markup
language, there's nothing that prevents it from assigning particular
interpretations to ⓅⓊⒶⒹⒶⓉⒶ, including, but not limited to, not
displaying these code points as characters.

The interpretation of the remainder of the file may well be conformant
to the Unicode Standard, just as the display of the contents of many
HTML elements is usually conformant to the Unicode Standard.

  

  


> What you are proposing is a higher-level protocol, whether you
> realize it or not.
  

  

Correct, the rub here is that all these schemes that treat characters
as both syntax and text depending on context amount to mark-up
languages and are therefore ipso facto no longer plain text (except if
displayed as source code, but even applying syntax coloring would no
longer be purely treating the data as plain text).

In-band markup thus has a dual nature as plain text and rich text,
depending on how it is processed.



  

  
> Unfortunately your higher-level protocol has a serious flaw in that
> it cannot represent the string "ⓅⓊⒶⒹⒶⓉⒶ".
  

  

That could probably be remedied by the usual techniques.


  

  
> Also, seeing a bunch of circled alphanumeric characters in a document
> ⓘⓢ◯ⓕⓐⓡ◯ⓕⓡⓞⓜ◯ⓤⓝⓞⓑⓣⓡⓤⓢⓘⓥⓔ.
  

  

:)


  

  


> There are plenty of already-existing higher-level protocols (you
> mentioned one: XML) that could be used to provide information about
> PUA characters, and they are all much better suited to that purpose
> than what you are proposing.


  

  

There are situations where an ad-hoc markup language seems to fulfill
a need that is not well served by the existing full-fledged markup
languages. You find them in internet "bulletin boards" or services
like GitHub, where pure plain text is too restrictive but the required
text styles are purposefully limited, which makes the syntactic
overhead of a full-featured mark-up language burdensome.

Too bad that there's been no "winner" among these, and therefore no
universally accepted one. Had there been one, it might have presented
an obvious target for a PUA extension.
A./

  



Re: Line wrapping of mixed LTR/RTL text

2018-08-28 Thread Eli Zaretskii via Unicode
> Date: Tue, 28 Aug 2018 13:44:58 +0300
> From: Cosmin Apreutesei via Unicode 
> 
> There is this sentence in UAX#9 which provides a clue: "[...] trailing
> whitespace will appear at the visual end of the line (in the paragraph
> direction).". I'm not sure what that means, but by doing some tests
> with fribidi and libunibreak I noticed that the whitespace always
> sticks to the logical end of the word (so visually to the right for
> LTR runs and to the left for RTL runs), regardless of the base
> paragraph direction.

That is not so if the line ends after the whitespace: in that case the
whitespace is trailing, and will appear at the visual end of the
line.  Only if you add some character after the whitespace will the
whitespace "jump" to the other side of the word.

> Quick example showing the problem. The following text:
> 
> لمفاتيح ABC DEF
> 
> with RTL base direction would wrap (for a certain line width) as:
> 
> ABC  لمفاتيح
> DEF
> 
> with two spaces between the Latin and Arabic text, one from the Latin
> text and one from the Arabic text.

No, it should show the space after ABC to the left of ABC,
i.e. immediately before the line end.

What UAX#9 tells you is that you need to decide that the line will
wrap after the space that follows "ABC", then reorder the line as if it
ended after that space, which will produce this:

لمفاتيح ABC 

(with the trailing space to the left of "ABC").  Then you should
display "DEF" on the next line.

IOW, the correct order is:

  . find levels
  . wrap in logical order
  . reorder wrapped lines
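
The three steps above can be sketched in Python. This is a toy under
loud assumptions, not a UAX#9 implementation: toy_levels only knows
strong left-to-right letters, strong right-to-left letters and neutrals
(no numbers, no explicit embeddings or isolates), wrap_logical measures
width in characters and breaks only after spaces, and all function
names are made up for this illustration. The returned strings are in
visual order, left to right, so a bidi-aware terminal will reorder them
again when printing.

```python
import re
import unicodedata

def toy_levels(text, base_level):
    """Step 1: assign levels (greatly reduced stand-in for UAX#9)."""
    base_dir = "R" if base_level % 2 else "L"
    kinds = []
    for ch in text:
        bc = unicodedata.bidirectional(ch)
        kinds.append("L" if bc == "L" else "R" if bc in ("R", "AL") else None)
    resolved = list(kinds)
    i = 0
    while i < len(kinds):
        if kinds[i] is not None:
            i += 1
            continue
        j = i
        while j < len(kinds) and kinds[j] is None:
            j += 1
        # Neutrals between two runs of the same direction take that
        # direction; otherwise they take the paragraph direction.
        before = kinds[i - 1] if i > 0 else base_dir
        after = kinds[j] if j < len(kinds) else base_dir
        resolved[i:j] = [before if before == after else base_dir] * (j - i)
        i = j
    if base_level % 2 == 0:
        return [0 if d == "L" else 1 for d in resolved]
    return [1 if d == "R" else 2 for d in resolved]

def wrap_logical(text, max_width):
    """Step 2: greedy wrap in logical order; returns (start, end) ranges.
    Each line keeps its trailing space, which is not counted in the width."""
    ranges, start, end = [], 0, 0
    for m in re.finditer(r"\S+\s*", text):
        visible_end = m.start() + len(m.group().rstrip())
        if end > start and visible_end - start > max_width:
            ranges.append((start, end))
            start = m.start()
        end = m.end()
    if end > start:
        ranges.append((start, end))
    return ranges

def reorder_line(chars, levels, base_level):
    """Step 3: rule L1 (trailing whitespace only), then rule L2."""
    levels = list(levels)
    i = len(chars)
    while i > 0 and chars[i - 1].isspace():
        i -= 1
        levels[i] = base_level  # trailing whitespace gets the paragraph level
    order = list(range(len(chars)))
    if not order:
        return ""
    highest = max(levels)
    lowest_odd = min((lv for lv in levels if lv % 2), default=highest + 1)
    for level in range(highest, lowest_odd - 1, -1):
        i = 0
        while i < len(order):
            j = i
            while j < len(order) and levels[order[j]] >= level:
                j += 1
            order[i:j] = reversed(order[i:j])  # no-op when the run is empty
            i = max(j, i + 1)
    return "".join(chars[k] for k in order)

def layout(text, base_level, max_width):
    levels = toy_levels(text, base_level)
    return [reorder_line(text[a:b], levels[a:b], base_level)
            for a, b in wrap_logical(text, max_width)]

ARABIC = "\u0644\u0645\u0641\u0627\u062a\u064a\u062d"  # the Arabic word used above
lines = layout(ARABIC + " ABC DEF", base_level=1, max_width=11)
```

With an RTL base direction and a width of 11, the first line comes out
as a space, "ABC", a space and the Arabic word in visual order, so the
trailing space sits at the left edge, which is the visual end of an RTL
line; "DEF" goes on the second line.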



Line wrapping of mixed LTR/RTL text

2018-08-28 Thread Cosmin Apreutesei via Unicode
Hello everyone,

I'm having a bit of trouble implementing line wrapping with bidi and I
would like to ask for some advice or hints on what is the proper way
to do this.

UAX#9 section 3.4 says that bidi reordering should be done after line
wrapping. But in order to do line wrapping correctly I need to be able
to visually ignore some whitespace, and I'm not sure exactly which
whitespace must be ignored.

There is this sentence in UAX#9 which provides a clue: "[...] trailing
whitespace will appear at the visual end of the line (in the paragraph
direction).". I'm not sure what that means, but by doing some tests
with fribidi and libunibreak I noticed that the whitespace always
sticks to the logical end of the word (so visually to the right for
LTR runs and to the left for RTL runs), regardless of the base
paragraph direction. Is it safe to use this assumption and always
remove the whitespace at the logical end of the last word of the line?
Or is it more complicated than that?

Quick example showing the problem. The following text:

لمفاتيح ABC DEF

with RTL base direction would wrap (for a certain line width) as:

ABC  لمفاتيح
DEF

with two spaces between the Latin and Arabic text, one from the Latin
text and one from the Arabic text. Since the line logically ends with
the "C" and LTR direction, I should probably remove the space
after the "C" (and, as a rule, just remove the whitespace at the
logical end of the word, regardless of paragraph's direction or word's
direction). Is this the right way to do it?

Screenshots attached.

Thanks!


Re: Private Use areas

2018-08-28 Thread William_J_G Overington via Unicode
Asmus Freytag wrote:

> There are situations where an ad-hoc markup language seems to fulfill a need 
> that is not well served by the existing full-fledged markup languages. You 
> find them in internet "bulletin boards" or services like GitHub, where pure 
> plain text is too restrictive but the required text styles purposefully 
> limited - which makes the syntactic overhead of a full-featured mark-up 
> language burdensome.

I am thinking of such an ad-hoc special purpose markup language.

I am thinking of something like a special purpose version of the FORTH computer 
language being used but with no user definitions, no comparison operations and 
no loops and no compiler. Just a straight run through as if someone were typing 
commands into FORTH in interactive mode at a keyboard. Maybe no need for spaces 
between commands. For example, circled R might mean use Right-to-left text 
display.

I am thinking that there could be three stacks, one for code points and one for 
numbers and one for external reference strings such as for accessing a web page 
or a PDF (Portable Document Format) document or listing an International 
Standard Book Number and so on. Code points could be entered by circled H 
followed by circled hexadecimal characters followed by a circled character to 
indicate Push onto the code point stack. Numbers could be entered in base 10, 
followed by a circled character to mean Push onto the number stack. A later 
circled character could mean to take a certain number of code points (maybe 
just 1, or maybe 0) from the character stack and a certain number of numbers 
(maybe just 1, or maybe just 0) from the number stack and use them to set some 
property.

It could all be very lightweight software-wise, just reading the characters of 
the sequence of circled characters and obeying them one by one just one time 
only on a single run through, with just a few, such as the circled digits, each 
having its meaning dependent upon a state variable such as, for a circled 
digit, whether data entry is currently hexadecimal or base 10.
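
A minimal Python sketch of such a single-pass reader might look like
this. Only the circled H for hexadecimal entry comes from the
description above; using circled P to push onto the code point stack
and circled N to push onto the number stack are hypothetical choices
made for this illustration, and the third stack and the
property-setting commands are left out.

```python
import unicodedata

DEC = set("0123456789")
HEX = DEC | set("ABCDEF")

def run(sequence):
    """Single pass over circled characters; returns (code_points, numbers)."""
    code_points, numbers = [], []
    hex_mode, digits = False, ""  # state variable: hexadecimal or base 10
    for circled in sequence:
        # Circled letters and digits have compatibility decompositions,
        # so NFKC maps e.g. U+24BD to "H" and U+2460 to "1".
        ch = unicodedata.normalize("NFKC", circled)
        if hex_mode and ch in HEX:
            digits += ch
        elif not hex_mode and ch in DEC:
            digits += ch
        elif ch == "H" and not digits:
            hex_mode = True  # circled H: start hexadecimal entry
        elif ch == "P" and hex_mode and digits:
            code_points.append(int(digits, 16))  # hypothetical push command
            hex_mode, digits = False, ""
        elif ch == "N" and not hex_mode and digits:
            numbers.append(int(digits))  # hypothetical push command
            digits = ""
        else:
            raise ValueError(f"unexpected {circled!r} in sequence")
    return code_points, numbers

# Circled H, E, 0, 0, 0, P pushes U+E000; circled 1, N pushes the number 1.
stacks = run("\u24bd\u24ba\u24ea\u24ea\u24ea\u24c5\u2460\u24c3")
```

The circled letters A to F are digits only while hexadecimal entry is
active, which is the kind of state-dependent meaning described above.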

I am wondering how many PUA property variables there would need to be set for 
the system to be useful.

The sequence could start with all of those PUA property values set at their 
default values so only those that needed changing need be explicitly set, 
though others could be explicitly set to the default values if a record were 
desired. 

William Overington
 
Tuesday 28 August 2018



Re: Private Use areas

2018-08-28 Thread William_J_G Overington via Unicode
James Kass wrote:

> Non-conformant?  Well, it's probably overkill anyway.  A simpler method of
> identifying which PUA convention is being used for a file
> would be to either have the first line of the file being something like
> [PUA1] or to have the file name be something like MYFILE.TXTPUA1.
> Where "PUA1" equals the CSUR.  Other numbers (PUA2, PUA3, etc.) for
> other PUA conventions.
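
For illustration, detecting such a first-line tag takes only a few
lines of Python. The exact tag syntax (square brackets, "PUA" followed
by digits, alone on the first line) is an assumption based on the
example above, and split_pua_tag is a made-up name.

```python
import re

# Assumed tag syntax: "[PUA<digits>]" alone on the first line.
TAG = re.compile(r"^\[(PUA\d+)\]\s*$")

def split_pua_tag(text):
    """Return (convention, body); convention is None when no tag is present."""
    first, _, rest = text.partition("\n")
    match = TAG.match(first)
    if match:
        return match.group(1), rest
    return None, text

# Two arbitrary PUA code points stand in for real content.
tagged = split_pua_tag("[PUA1]\n\uf8d0\uf8d1\n")
untagged = split_pua_tag("plain text\n")
```

A file without the tag is returned unchanged, so ordinary plain text is
not affected.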

The problem that then arises is that a registry is needed for what those 
numbers mean, such as PUA01728. So what if someone writes explaining his 
designs for glyphs for the language of the people who live in the northern part 
of the fifth planet from the sun in the science fiction novel he is writing? Is 
registration granted instantly upon request or is there a threshold of some 
sort? What if lots of people do that, including some people wanting a registry 
code number for the various emoji that they want? If there is a threshold of 
proving usage and so on, or of showing that the designs have been produced AT a 
business or AT a college or whatever, then the system will only work for some 
users of the Private Use Areas.

My opinion is that the system needs to be free-standing, with each usage 
possibly self-contained or with an external reference to a document that is 
available. Care would need to be taken to send a copy of any such document to 
deposit libraries such as The British Library so as to ensure long-term 
conservation.

William Overington

Tuesday 28 August 2018



Re: Private Use areas

2018-08-28 Thread William_J_G Overington via Unicode
Hi
 
Mark E. Shoulson wrote:
 
> I'm not sure what the advantage is of using circled characters instead of 
> plain old ascii.
 
My thinking is that "plain old ascii" might be used in the text encoded in the 
file. Sometimes a file containing Private Use Area characters is a mix of 
regular Unicode Latin characters with just a few Private Use Area characters 
mixed in with them. So my suggestion of using circled characters is for 
disambiguation purposes. The circled characters in the PUAINFO sequence would 
not be displayed if a special software program were being used to read in the 
text file, then act upon the information that is encoded using the circled 
characters.
 
My thinking is that using this method just adds some encoded information at the 
start of the text file and does not require the whole document to become 
designated as a file conformant to a particular markup format.
 
William Overington
 
Tuesday 28 August 2018
 


Re: Line wrapping of mixed LTR/RTL text

2018-08-28 Thread Cosmin Apreutesei via Unicode
Hi Eli, thanks for answering! I think I'm getting closer. Just a few
more clarifications if you please.

> That is not so if the line ends after the whitespace: in that case the
> whitespace is trailing, and will appear at the visual end of the
> line.

So only if it's a soft break I should indeed remove the last logical
space, if it's before a hard break then leave it alone.

> Only if you add some character after the whitespace will the
> whitespace "jump" to the other side of the word.

... because the hard break just turned into a soft break and the newly
typed character will appear on the next line with a hard line break
after it, right?

> No, it should show the space after ABC to the left of ABC,
> i.e. immediately before the line end.

Just to make sure, this moving of the last space at the visual end of
the line can only be experienced with a moving cursor, right? I mean
as far as displaying goes (and as far as line width computation for
the purposes of line wrapping goes), that space is just removed,
right?  I'm trying to infer the purpose of moving that space to the
end of the line instead of just removing it: is the idea to always
provide a cursor at the visual end of the line so that typing can
continue there or is there more to it?

> What UAX#9 tells you is that you need to decide that the line will
> wrap after the space that follows "ABC"

... but when computing the line width I should not include the width
of that space, right? since it will not take space in the box in the
end.

>, then reorder the line as if it
> ended after that space, which will produce this:
>
> لمفاتيح ABC
>
> (with the trailing space to the left of "ABC").  Then you should
> display "DEF" on the next line.

You mean it will produce this:

" ABC لمفاتيح"



Re: Line wrapping of mixed LTR/RTL text

2018-08-28 Thread Cosmin Apreutesei via Unicode
Hi Philippe,

> The space encoded just before the logical end of line or linewrap (in the 
> middle of the displayed line) has to be moved at end of the physical line (in 
> the paragraph direction), it should not be kept in the middle.

Ok, that seems to confirm what Eli is saying and it clarifies that
sentence from UAX#9. Thanks!


Re: Private Use areas

2018-08-28 Thread Doug Ewell via Unicode
On August 23, 2011, Asmus Freytag wrote:

> On 8/23/2011 7:22 AM, Doug Ewell wrote:
>> Of all applications, a word processor or DTP application would want
>> to know more about the properties of characters than just whether
>> they are RTL. Line breaking, word breaking, and case mapping come to
>> mind.
>>
>> I would think the format used by standard UCD files, or the XML
>> equivalent, would be preferable to making one up:
>
> The right answer would follow the XML format of the UCD.
>
> That's the only format that allows all necessary information contained
> in one file, and it would leverage of any effort that users of the
> main UCD have made in parsing the XML format.
>
> An XML format should also be flexible in that you can add/remove not
> just characters, but properties as needed.
>
> The worst thing to do, other than designing something from scratch,
> would be to replicate the UnicodeData.txt layout with its random, but
> fixed collection of properties and insanely many semi-colons. None of
> the existing UCD txt files carries all the needed data in a single
> file.

I don't know if or how I responded 7 years ago, but at least today, I
think this is an excellent suggestion.

If the goal is to encourage vendors to support PUA assignments, using an
exceedingly well-defined format (UAX #42) sitting atop one of the most
widely used base formats ever (XML), with all property information in a
single repository (per PUA scheme), would be great encouragement. I've
devised lots of novel file formats and I think this is one use case
where that would be a real hindrance.
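
As a rough sketch of what consuming such a file could look like, the
Python below reads a tiny UCD-in-XML style fragment with the standard
library. The namespace URI, the element names (ucd, repertoire, char)
and the attribute names (cp, na, gc, bc, lb) are given as I understand
them from UAX #42 and should be checked against it; the two PUA
characters and their property values are invented for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"ucd": "http://www.unicode.org/ns/2003/ucd/1.0"}

# Invented sample data: a hypothetical PUA scheme with two characters.
SAMPLE = """\
<ucd xmlns="http://www.unicode.org/ns/2003/ucd/1.0">
  <description>Hypothetical PUA scheme, for illustration only</description>
  <repertoire>
    <char cp="E000" na="EXAMPLE LETTER A" gc="Lo" bc="R" lb="AL"/>
    <char cp="E001" na="EXAMPLE PUNCTUATION" gc="Po" bc="ON" lb="BA"/>
  </repertoire>
</ucd>
"""

def load_properties(xml_text):
    """Map each code point to a dict of its property attributes."""
    root = ET.fromstring(xml_text)
    table = {}
    for char in root.findall("ucd:repertoire/ucd:char", NS):
        props = dict(char.attrib)
        table[int(props.pop("cp"), 16)] = props
    return table

props = load_properties(SAMPLE)
```

A renderer could then look up props[0xE000]["bc"] to learn that this
private-use character is right-to-left, with no font involved.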

Storing this information in a font, by hook or crook, would lock users
of those PUA characters into that font. At that rate, you might as well
use ASCII-hacked fonts, as we did 25 years ago.
 
--
Doug Ewell | Thornton, CO, US | ewellic.org



Re: Line wrapping of mixed LTR/RTL text

2018-08-28 Thread Eli Zaretskii via Unicode
> From: Cosmin Apreutesei 
> Date: Tue, 28 Aug 2018 21:28:58 +0300
> Cc: unicode@unicode.org
> 
> > That is not so if the line ends after the whitespace: in that case the
> > whitespace is trailing, and will appear at the visual end of the
> > line.
> 
> So only if it's a soft break I should indeed remove the last logical
> space, if it's before a hard break then leave it alone.

Actually, you don't have to remove it, you could leave it.  It's only
an aesthetic issue.

> > No, it should show the space after ABC to the left of ABC,
> > i.e. immediately before the line end.
> 
> Just to make sure, this moving of the last space at the visual end of
> the line can only be experienced with a moving cursor, right? I mean
> as far as displaying goes (and as far as line width computation for
> the purposes of line wrapping goes), that space is just removed,
> right?

As I said, not necessarily.  But it is definitely there when you
reorder characters for display.

> I'm trying to infer the purpose of moving that space to the
> end of the line instead of just removing it

If you remove trailing space, then you need to see it being trailing
before you remove it.  That is the purpose of moving it.

> > What UAX#9 tells you is that you need to decide that the line will
> > wrap after the space that follows "ABC"
> 
> ... but when computing the line width I should not include the width
> of that space, right? since it will not take space in the box in the
> end.

If you will remove the space, then yes.

> You mean it will produce this:
> 
> " ABC لمفاتيح"

Yes.


RE: Private Use areas - Vertical Text

2018-08-28 Thread WORDINGHAM RICHARD via Unicode

> 
> On 27 August 2018 at 15:22 Peter Constable via Unicode 
>  wrote:
> 
> Layout engines that support CJK vertical layout do not rely on the 'vert' 
> feature to rotate glyphs for CJK ideographs, but rather rotate the glyph 90° 
> and switch to using vertical glyph metrics. The 'vert' feature is used to 
> substitute vertical alternate glyphs as needed, such as for punctuation that 
> isn't automatically rotated (and would probably need a differently-positioned 
> alternate in any case).
> 
> Cf. UAX 50.
> 

There have been some pretty confused statements. I believe the observed problem 
is that PUA characters for Zhuang CJK ideographs get rotated when displayed 
vertically rather than left-to-right.

Unicode is doing what it can in this matter:

(a) Zhuang PUA characters are being made individually obsolete.

(b) By default, PUA characters have the value of Vertical_orientation=upright 
as do CJK ideographs.

For CJK ideographs, it is not clear to me when the vert feature (if present) 
would be applied.  Is it only for some codepoints (vo=tu), or is it for all 
that the engine expects to be displayed ‘upright’ in vertical text?  The vrtr 
feature (if present) would be applied when glyphs are to be rotated.  Is it for 
all such glyphs, or only those for which rotation is expected to be inadequate 
(vo=tr)?  It seems that feature vrt2 is to be applied to all glyphs; perhaps 
rotation is the default behaviour when there is no look-up value for a glyph 
that the engine expects to be rotated.  The truly difficult case would be when 
there is no attempt to apply a look-up – possibly vrtr would not apply to 
\p{vo=r}.

I would expect that defining the lookup vrt2 or vrtr to map Zhuang glyphs to 
themselves (or something prerotated) would cure the problem.  This would not 
work for sequences of Zhuang ideographs treated as RTL text - but that is 
unlikely to happen.

Richard.


Re: Line wrapping of mixed LTR/RTL text

2018-08-28 Thread Philippe Verdy via Unicode
The space encoded just before the logical end of line or linewrap (in the
middle of the displayed line) has to be moved to the end of the physical
line (in the paragraph direction); it should not be kept in the middle.

If you need to force a linewrap on a non-breaking space (because there's no
other break opportunity to wrap the line elsewhere), then treat that
non-breaking space as a regular breaking space, which will also be moved to
the end of the row (after the margin on the ending side of the paragraph),
and choose the last non-breaking space on the row; usually, all spaces
present at linewraps (including non-breaking spaces) are compacted. But
there are other style policies that will force the linewrap preferably after
a trailing punctuation or a separator punctuation, or before a leading
punctuation, or just after the last unbreakable cluster that can fit the
row (including in the middle of words at an arbitrary position if there's no
hyphenation process or the script does not support hyphenation, such as
sinograms and kana).

Where to insert linewraps is very fuzzy and depends on the rendering
context and capabilities of the target device (you cannot scroll a piece of
printed paper, but you can scroll a display with a scrollbar or using
navigation cursors in a width-restricted input field).

Le mar. 28 août 2018 à 16:34, Cosmin Apreutesei via Unicode <
unicode@unicode.org> a écrit :

> Hello everyone,
>
> I'm having a bit of trouble implementing line wrapping with bidi and I
> would like to ask for some advice or hints on what is the proper way
> to do this.
>
> UAX#9 section 3.4 says that bidi reordering should be done after line
> wrapping. But in order to do line wrapping correctly I need to be able
> to visually ignore some whitespace, and I'm not sure exactly which
> whitespace must be ignored.
>
> There is this sentence in UAX#9 which provides a clue: "[...] trailing
> whitespace will appear at the visual end of the line (in the paragraph
> direction).". I'm not sure what that means, but by doing some tests
> with fribidi and libunibreak I noticed that the whitespace always
> sticks to the logical end of the word (so visually to the right for
> LTR runs and to the left for RTL runs), regardless of the base
> paragraph direction. Is it safe to use this assumption and always
> remove the whitespace at the logical end of the last word of the line?
> Or is it more complicated than that?
>
> Quick example showing the problem. The following text:
>
> لمفاتيح ABC DEF
>
> with RTL base direction would wrap (for a certain line width) as:
>
> ABC  لمفاتيح
> DEF
>
> with two spaces between the Latin and Arabic text, one from the Latin
> text and one from the Arabic text. Since the line logically ends with
> the "C" and LTR direction, I should have to probably remove the space
> after the "C" (and, as a rule, just remove the whitespace at the
> logical end of the word, regardless of paragraph's direction or word's
> direction). Is this the right way to do it?
>
> Screenshots attached.
>
> Thanks!
>


RE: Private Use areas - Vertical Text

2018-08-28 Thread via Unicode

Dear Richard and Peter,

apologies for the lack of clarity. Let me try to explain below.

On 2018-08-29 01:13, WORDINGHAM RICHARD via Unicode wrote:

>> On 27 August 2018 at 15:22 Peter Constable via Unicode wrote:
>>
>> Layout engines that support CJK vertical layout do not rely on the
>> 'vert' feature to rotate glyphs for CJK ideographs, but rather
>> rotate the glyph 90° and switch to using vertical glyph metrics.
>> The 'vert' feature is used to substitute vertical alternate glyphs
>> as needed, such as for punctuation that isn't automatically rotated
>> (and would probably need a differently-positioned alternate in any
>> case).
>>
>> Cf. UAX 50.
>
> There have been some pretty confused statements. I believe the
> observed problem is that PUA characters for Zhuang CJK ideographs get
> rotated when displayed vertically rather than left-to-right.



Yes, as Richard says: when CJK Zhuang text is displayed vertically, the
Zhuang characters encoded in Unicode remain upright, but those with PUA
code points are rotated 90°. This is because the PUA characters are
treated like English text, which is correctly rotated 90°. The
orientation of the CJK characters in this case appears to depend on
which block they belong to. As Peter points out, this does not seem to
match UAX 50.



> Unicode is doing what it can in this matter:
>
> (a) Zhuang PUA characters are being made individually obsolete.



Yes and no. Whilst a thousand Zhuang characters have been encoded and
two thousand have been submitted via IRG, the number of PUA Zhuang
characters is about the same or increasing. In 2006, when this work
started, just under 6k PUA points were used; presently there are over
8k, over 6k of which have not been submitted, and the earliest any
future submissions can be encoded is 2026. That being said, the number
of more common Zhuang characters needing PUA support is coming down. So
whilst individual characters are being resolved, the need for PUA Zhuang
characters remains, and will do so for decades to come.



> (b) By default, PUA characters have the value of
> Vertical_orientation=upright as do CJK ideographs.



Noted above.

Regards
John


> For CJK ideographs, it is not clear to me when the vert feature (if
> present) would be applied.  Is it only for some codepoints (vo=tu), or
> is it for all that the engine expects to be displayed 'upright' in
> vertical text?  The vrtr feature (if present) would be applied when
> glyphs are to be rotated.  Is it for all such glyphs, or only those
> for which rotation is expected to be inadequate (vo=tr)?  It seems
> that feature vrt2 is to be applied to all glyphs; perhaps rotation is
> the default behaviour when there is no look-up value for a glyph that
> the engine expects to be rotated.  The truly difficult case would be
> when there is no attempt to apply a look-up - possibly vrtr would not
> apply to \p{vo=r}.
>
> I would expect that defining the lookup vrt2 or vrtr to map Zhuang
> glyphs to themselves (or something prerotated) would cure the problem.
> This would not work for sequences of Zhuang ideographs treated as RTL
> text - but that is unlikely to happen.
>
> Richard.




Re: Private Use areas

2018-08-28 Thread Janusz S. Bień via Unicode
On Tue, Aug 28 2018 at  9:43 -0700, unicode@unicode.org writes:
> On August 23, 2011, Asmus Freytag wrote:
>
>> On 8/23/2011 7:22 AM, Doug Ewell wrote:
>>> Of all applications, a word processor or DTP application would want
>>> to know more about the properties of characters than just whether
>>> they are RTL. Line breaking, word breaking, and case mapping come to
>>> mind.
>>>
>>> I would think the format used by standard UCD files, or the XML
>>> equivalent, would be preferable to making one up:

Right. I was not so quick to state this so early, but 2 years ago I
wrote to the MUFI list:


--8<---cut here---start->8---
On Sat, Jan 02 2016 at 12:35 CET, odd.hau...@uib.no writes:

[...]

> Note the permanent URI at the University Library in Bergen. This will
> in all likelihood be the last recommendation of its kind (and
> certainly the last edited by the undersigned), so please look out for
> new solutions (databases or the like) on the MUFI web site!

I think that one of the forms, perhaps even the primary one, should
follow the original Unicode Character Database and the
output of Unibook (http://www.unicode.org/unibook/).

The idea can be tested by converting the present recommendation to this
form. Unfortunately I'm unable to contribute myself to this task.

One of the advantages would be that the various character browsers can
be adapted relatively easily to provide info about the MUFI characters.

A simpler variant of this idea is to use a Unibook-like format to
document fonts. Quick-and-dirty tools for this purpose have been
prepared by a student of mine:

https://bitbucket.org/jsbien/fntsample-fork-with-ucd-comments/
https://bitbucket.org/jsbien/unicode-ucd-parser

A sample output of the tools is available at

https://bitbucket.org/jsbien/parkosz-font/downloads/Parkosz1907draft.pdf

(the font is also quick-and-dirty and unfinished work).

--8<---cut here---end--->8---

Unfortunately there was no reaction.

>>
>> The right answer would follow the XML format of the UCD.
>>
>> That's the only format that allows all necessary information contained
>> in one file,

For me, the comments and cross-references contained in NamesList.txt
are also necessary. Do I understand correctly that only "ISO Comment
properties" are included in the file?

>> and it would leverage of any effort that users of the
>> main UCD have made in parsing the XML format.
>>
>> An XML format should also be flexible in that you can add/remove not
>> just characters, but properties as needed.
>>
>> The worst thing to do, other than designing something from scratch,
>> would be to replicate the UnicodeData.txt layout with its random, but
>> fixed collection of properties and insanely many semi-colons. None of
>> the existing UCD txt files carries all the needed data in a single
>> file.
>
> I don't know if or how I responded 7 years ago, but at least today, I
> think this is an excellent suggestion.
>
> If the goal is to encourage vendors to support PUA assignments, using an
> exceedingly well-defined format (UAX #42) sitting atop one of the most
> widely used base formats ever (XML), with all property information in a
> single repository (per PUA scheme), would be great encouragement.

I think we also need the data in a format acceptable to UniBook.

> I've devised lots of novel file formats and I think this is one use
> case where that would be a real hindrance.

> Storing this information in a font, by hook or crook, would lock users
> of those PUA characters into that font. At that rate, you might as well
> use ASCII-hacked fonts, as we did 25 years ago.

Storing the information in a font is inappropriate not only for the
technical reasons, as I wrote recently (on Thu, Aug 23 2018)

> Fonts are for *rendering*, new characters and variants are more and
> more often needed for *input* of real life old texts with sufficient
> precision.

Best regards

Janusz

-- 
 ,   
Janusz S. Bien
emeryt (emeritus)
https://sites.google.com/view/jsbien