But CSV is only fine for purely tabular data, and the UCD or CLDR data has a more complex structure than a simple 2D table. In addition, the schema is evolving, with new kinds of data added all the time; you cannot keep compatibility by adding more empty columns to a single table, and adding new semicolons or other separators to a CSV file makes the format much less readable and, in fact, introduces a lot of redundancy.
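To make the "more complex structure" point concrete: even the main UCD file packs sub-structure into a single column, since the decomposition field holds an optional tag followed by a list of code points. Here is a minimal sketch, assuming a sample line in the UnicodeData.txt layout; the `Decomposition` class and the `parse_decomposition` helper are hypothetical names made up for illustration, not part of any UCD or ICU tooling.

```python
from dataclasses import dataclass
from typing import List, Optional

# Sample line in the UnicodeData.txt layout (semicolon-separated, no header).
SAMPLE = "00A8;DIAERESIS;Sk;0;ON;<compat> 0020 0308;;;;N;SPACING DIAERESIS;;;;"


@dataclass
class Decomposition:
    """Hypothetical application-defined record for one decomposition field."""
    tag: Optional[str]       # e.g. "compat"; None when the field has no <tag>
    code_points: List[int]   # the mapping, as integers


def parse_decomposition(field: str) -> Optional[Decomposition]:
    """Split one column into its nested parts: optional <tag> plus hex code points."""
    if not field:
        return None
    parts = field.split()
    tag = None
    if parts[0].startswith("<") and parts[0].endswith(">"):
        tag = parts[0][1:-1]
        parts = parts[1:]
    return Decomposition(tag, [int(p, 16) for p in parts])


# The first level is trivial to split on ';' ...
fields = SAMPLE.split(";")
# ... but column 5 needs a second, different parser of its own.
decomp = parse_decomposition(fields[5])
print(fields[0], decomp)
```

Splitting on `;` only gets you the outer table; the value in column 5 has its own grammar, which is exactly the kind of nesting that JSON or XML can express directly.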
Like traditional relational databases, these projects need a schema and structure. But if we had to use an RDBMS API, we would lose the ability to use various other tools. So these Unicode databases use collections of tables, and in some cases you need to split a value into multiple ones with different scoping rules: for that job JSON or XML is fine. Nothing prevents you from loading the existing UCD/CLDR data files into a relational database and exposing the data in different views.

But most applications are in fact built by first loading this data with an application-specific parser that converts it to an application-defined schema; the data can then be recompiled into a new form that is exposed through an application API. XML is then fine! It has no cost for end users, who just use the generated applications. It is only up to the projects that compile applications to parse the data, generate their code, and integrate the data into their API (there are more useful tools than just "grep"ing the UCD/CLDR data files).

Also, the UCD and CLDR files are checked by other automated tools that already parse them and load them to perform consistency checks and generate multiple presentations: the important ICU project is built and maintained for that; it has all the tools needed, plus a reduced API that can be used directly by final applications.

Some UCD files are now even generated automatically from other source files and contain automatically generated reports. Only the initial main UCD file has kept its original pure CSV form: it was no longer possible to keep extending this single file, but compatibility has been preserved, and that is a good thing. All the others contain comment lines and basic report lines.

On Mon, 3 Sep 2018 at 12:16, Adam Borowski via Unicode <unicode@unicode.org> wrote:

> On Mon, Sep 03, 2018 at 08:24:06AM +0200, Janusz S.
> Bień via Unicode wrote:
> > For a non-programmer like me CVS is much more convenient form than XML -
> > I can use it not only with a spreadsheet, but also import directly into
> > a database and analyse with various queries. XML is politically correct,
> > but practically almost unusable without a specialised parser.
>
> And for a programmer, XML is outright insane. You need a complex library to
> do so, and those fail KISS so badly that you have a CVE roughly yearly.
> On the other hand, writing a parser for current headerless ;-separated data
> completely from scratch is just:
>
>     cut -d';' -f 1,6 </usr/share/unicode/UnicodeData.txt
> or:
>     (split/;/)[0,5]
>
> JSON is somewhat better, but still needs drastically more effort.
> CSV (especially with no escapes) is trivial to handle.
>
>
> ᛗᛖᛟᚹ!
> --
> ⢀⣴⠾⠻⢶⣦⠀ What Would Jesus Do, MUD/MMORPG edition:
> ⣾⠁⢰⠒⠀⣿⡁ • multiplay with an admin char to benefit your mortal [Mt3:16-17]
> ⢿⡄⠘⠷⠚⠋⠀ • abuse item cloning bugs [Mt14:17-20, Mt15:34-37]
> ⠈⠳⣄⠀⠀⠀⠀ • use glitches to walk on water [Mt14:25-26]
>