|
On 3/12/2016 10:55 PM, Janusz S. Bień wrote:
In fact, the possibility of reuse in this context is probably among the unstated rationales for making the information and syntax available in the first place.

I understand there is no intention to make an official XML version of the file as it would require changes in Unibook?

In principle, the tooling that the editorial committee maintains could be modified to write out some XML version of the information. It's only software. By the same token, someone could write a new parser for Unibook that can read the XML. Both would consume a significant amount of resources, for absolutely no gain when it comes to the core purpose: the production of the code charts.

In fact, the work would not be done at that point, because the code chart process requires the use of some nameslist-aware tools for draft preparation. All of these would have to be translated into a new format as well. Finally, Unibook relies on auxiliary files that provide font selection and configuration data. Logically, the smart thing would be to convert all of them to XML, or JSON, or whatever the structured data description format du jour is.

Looked at from a practical perspective, by those involved in doing the work of creating the code charts and issuing new versions of the Standard, it's a non-starter.

There are explanations about character use that are only maintained in the PDF of the core specification, where this information is packaged in a way that can be understood by a human reader, but is not amenable to extraction by machine.

While the annotations, comments, cross references etc. in NamesList.txt appear, formally, to be machine extractable, the way they are created and managed makes them just as much "human-accessible only" as the core specification.

This comment referred to the facts that (a) the nameslist is not exhaustive and that (b) it is perfectly OK to have information that's not intended for machine-parsing.
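
To make the distinction concrete, here is a minimal Python sketch of what "formally machine extractable" amounts to. It rests on several assumptions: the sample text is hand-written in the style of NamesList.txt rather than copied from the file, the marker table covers only a few line kinds, and `parse_entries` is a hypothetical helper, not part of any Unicode tooling. The standard `unicodedata` module stands in for a formal property lookup.

```python
import unicodedata

# Hand-written sample in the style of NamesList.txt (an assumption for
# illustration, not a verbatim excerpt of the real file).
SAMPLE = (
    "00C5\tLATIN CAPITAL LETTER A WITH RING ABOVE\n"
    "\t* Danish, Norwegian, Swedish, Walloon\n"
    "\tx (angstrom sign - 212B)\n"
    "\t: 0041 030A\n"
)

# Assumed, partial mapping from leading marker to line kind.
KINDS = {
    "=": "alias",
    "*": "comment",
    "x": "cross_reference",
    ":": "decomposition",
    "#": "compat_mapping",
}


def parse_entries(text):
    """Hypothetical helper: split nameslist-style text into per-character notes."""
    entries = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line.startswith("\t"):
            code, _, name = line.partition("\t")
            if code.startswith(("@", ";")):
                # Headers and comments are skipped in this sketch.
                current = None
                continue
            current = {"name": name, "notes": []}
            entries[int(code, 16)] = current
        elif current is not None:
            marker, _, rest = line.strip().partition(" ")
            current["notes"].append((KINDS.get(marker, "other"), rest))
    return entries


entries = parse_entries(SAMPLE)
notes = entries[0x00C5]["notes"]

# A formal property: defined for every code point, with a fixed syntax.
formal_decomposition = unicodedata.decomposition("\u00C5")

for kind, text in notes:
    print(kind, "->", text)
print("formal decomposition ->", formal_decomposition)
```

The extraction itself is trivial, but the "comment" field that comes out is still free text: nothing says whether it lists languages, usage notes, or something else, and nothing guarantees that a comparable line exists for any other character. The decomposition, by contrast, is available systematically from the property data.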
Information intended for machine-parsing has a certain amount of structure and consistency, so that when a data table is built from it, the consuming program can rely on the fact that it will cover some aspect of character identity or behavior in a systematic way.

Well, not all possible information is systematized that way. Some information requires being interpreted by a human reader; the fact that the information is not buried in running text, but shows up in "fields" in a list, doesn't make it systematized in the same way as case mapping, decomposition or other property data.

You might as well have a tool that extracts snippets from the core specification. All fine, if your goal is, for example, to present all bits of text mentioning a certain code point (search engines will do some of that extraction for you). However, even after extraction, the data is still just as unstructured as before, and, while useful to a human reader, doesn't constitute a formal character property. That's the whole reason why we go to the trouble of defining so clearly what is and isn't a character property (see UAX#44).

A./ |

