Michael, I don't care what you choose. Whatever works is fine for internal use.
But is the data scheme you shared representative of your actual application?

From what I see below, unless the number of "point" variables is not always exactly four, the application might be handled well by any format that handles rectangular data, perhaps even CSV. I mean anything like a data.frame can contain data columns like p1, p2, p3, p4 and a categorical one like IHRcurve_name.

Or do you have a need for more variability, such as an undetermined number of similar units, in ways that might require more flexibility or be done more efficiently another way?

MOST of the discussion I am seeing here seems peripheral to getting you what you need for your situation, and may require a learning curve to use properly. Are you planning on worrying about how to ship your data encrypted, for example? Any file format you use for storage can presumably be encrypted, sent, and decrypted if that matters.

So, yes, from an abstract standpoint we can discuss the merits of various approaches. If it matters that humans can deal with your data in a file, or that it can be imported into a program like EXCEL, those are considerations. But if not, there are quite a few binary formats where your program can save a snapshot of the data into a file and read it back in next time. I often do that in another language that lets me share variables, including nested components such as the complex structures that come out of a statistical analysis or the components needed to make one or more graphs later.

If you write the program that creates the darn things as well as the one that later reads them back in, you can do what you want. Or did I miss something, and others have already produced the data using other tools, in which case you have to read it in at least once?

-----Original Message-----
From: Python-list <python-list-bounces+avigross=verizon....@python.org> On Behalf Of Michael F.
Stemper
Sent: Saturday, September 25, 2021 4:20 PM
To: python-list@python.org
Subject: Re: XML Considered Harmful

On 21/09/2021 13.12, Michael F. Stemper wrote:

> If XML is not the way to package data, what is the recommended
> approach?

Well, there have been a lot of ideas put forth on this thread, many more than I expected. I'd like to thank everyone who took the time to contribute.

Most of the reasons given for avoiding XML appear to be along the lines of "XML has all of these different options that it supports." However, it seems that I could ignore 99% of those things and just use a teeny subset of its capabilities. For instance, if I modeled a fuel like this:

   <Fuel name="Montana Sub-Bituminous">
     <uom>ton</uom>
     <price>21.96</price>
     <heat_content>18.2</heat_content>
   </Fuel>

and a generating unit like this:

   <Generator name="Skunk Creek 1">
     <IHRcurve name="normal">
       <point P="63" IHR="8.513"/>
       <point P="105" IHR="8.907"/>
       <point P="241" IHR="9.411"/>
       <point P="455" IHR="10.202"/>
     </IHRcurve>
     <IHRcurve name="constrained">
       <point P="63" IHR="8.514"/>
       <point P="103" IHR="9.022"/>
       <point P="223" IHR="9.511"/>
       <point P="415" IHR="10.102"/>
     </IHRcurve>
   </Generator>

why would the fact that I could have chosen, instead, to model the unit of measure as an attribute of the fuel, or its name as a sub-element matter? Once the modeling decision has been made, all of the decisions that might have been would seem to be irrelevant.

Some years back, IEC's TC57 came up with CIM[1]. This nailed down a lot of decisions. The fact that other decisions could have been made doesn't seem to keep utilities from going forward with it as an enterprise-wide data model.

My current interests are not anywhere so expansive, but it seems that the situations are at least similar:

1. Look at an endless range of options for a data model.
2. Pick one.
3. Run with it.

To clearly state my (revised) question:

Why does the existence of XML's many options cause a problem for my use case?
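For readers following along, here is a minimal sketch of what reading that "teeny subset" might look like with the standard library's xml.etree.ElementTree. The XML text is copied from the message above; the function names parse_fuel and parse_generator and the dict layout they return are illustrative assumptions, not anything proposed in the thread.

```python
import xml.etree.ElementTree as ET

# The two documents sketched in the message above.
FUEL_XML = """
<Fuel name="Montana Sub-Bituminous">
  <uom>ton</uom>
  <price>21.96</price>
  <heat_content>18.2</heat_content>
</Fuel>
""".strip()

GENERATOR_XML = """
<Generator name="Skunk Creek 1">
  <IHRcurve name="normal">
    <point P="63" IHR="8.513"/>
    <point P="105" IHR="8.907"/>
    <point P="241" IHR="9.411"/>
    <point P="455" IHR="10.202"/>
  </IHRcurve>
  <IHRcurve name="constrained">
    <point P="63" IHR="8.514"/>
    <point P="103" IHR="9.022"/>
    <point P="223" IHR="9.511"/>
    <point P="415" IHR="10.102"/>
  </IHRcurve>
</Generator>
""".strip()


def parse_fuel(text):
    """Hypothetical helper: turn a <Fuel> element into a plain dict."""
    root = ET.fromstring(text)
    return {
        "name": root.get("name"),
        "uom": root.findtext("uom"),
        # XML carries only text, so numeric conversion is explicit.
        "price": float(root.findtext("price")),
        "heat_content": float(root.findtext("heat_content")),
    }


def parse_generator(text):
    """Hypothetical helper: map each curve name to a list of (P, IHR) pairs."""
    root = ET.fromstring(text)
    curves = {}
    for curve in root.findall("IHRcurve"):
        curves[curve.get("name")] = [
            (float(point.get("P")), float(point.get("IHR")))
            for point in curve.findall("point")
        ]
    return {"name": root.get("name"), "curves": curves}


fuel = parse_fuel(FUEL_XML)
unit = parse_generator(GENERATOR_XML)
print(fuel["name"], fuel["price"])
print(unit["name"], sorted(unit["curves"]))
```

Once the element-versus-attribute decision is fixed, the reader only touches get, findtext, and findall; none of XML's other options enter into it. The number of points per curve is not hard-coded, so a curve with more or fewer than four points would parse the same way.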
Other reactions:

Somebody pointed out that some approaches would require that I climb a learning curve. That's appreciated, although learning new things is always good.

NestedText looks cool, and a lot like YAML. Having not gotten around to playing with YAML yet, I was surprised to learn that it tries to guess data types. This sounds as if it could lead to the same type of problems that led to the names of some genes being turned into dates.

It was suggested that I use an RDBMS, such as sqlite3, for the input data. I've used sqlite3 for real-time data exchange between concurrently-running programs. However, I don't see syntax like:

   sqlite> INSERT INTO Fuels
      ...> (name,uom,price,heat_content)
      ...> VALUES ("Montana Sub-Bituminous", "ton", 21.96, 13.65);

as being nearly as readable as the XML that I've sketched above. Yeah, I could write a program to do this, but that doesn't really change anything, since I'd still need to get the data into the program. (Changing a value would be even worse, requiring the dreaded UPDATE statement, instead of five seconds in vi.)

Many of the problems listed for CSV, which come from its lack of standardization, seem similar to those given for XML. "Commas or tabs?" "How are new-lines represented?" If I were to use CSV, I'd be able to just pick answers. However, fitting hierarchical data into rows/columns just seems wrong, so I doubt that I'll end up going that way.

As far as disambiguating authors, I believe that most journals are now expecting an ORCID[2] (which doesn't help with papers published before that came around).

As far as use of XML to store program state, I wouldn't ever consider that. As noted above, I've used an RDBMS to do so. It handles all of the concurrency issues for me. The current use case is specifically for raw, static input.

Fascinating to find out that XML was originally designed to mark up text, especially legal text.

It was nice to be reminded of what Matt Parker looked like when he had hair.
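On the rows/columns question raised above, a rough sketch of one way the curve data could be flattened for CSV, using only the standard csv module. The "long" layout (one row per point, with columns named generator, IHRcurve_name, P, IHR) and the helper names are assumptions made for illustration; the point values are the ones from the XML earlier in the message.

```python
import csv
import io

# Curve data for one unit, taken from the XML sketch in the message.
CURVES = {
    "normal": [(63, 8.513), (105, 8.907), (241, 9.411), (455, 10.202)],
    "constrained": [(63, 8.514), (103, 9.022), (223, 9.511), (415, 10.102)],
}


def curves_to_csv(generator, curves):
    """Hypothetical helper: write one row per point, repeating the parent names."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["generator", "IHRcurve_name", "P", "IHR"])
    for curve_name, points in curves.items():
        for p, ihr in points:
            writer.writerow([generator, curve_name, p, ihr])
    return buf.getvalue()


def csv_to_curves(text):
    """Hypothetical helper: rebuild the generator -> curve -> points nesting."""
    result = {}
    for row in csv.DictReader(io.StringIO(text)):
        unit = result.setdefault(row["generator"], {})
        points = unit.setdefault(row["IHRcurve_name"], [])
        # CSV stores everything as text, so convert back explicitly.
        points.append((float(row["P"]), float(row["IHR"])))
    return result


text = curves_to_csv("Skunk Creek 1", CURVES)
restored = csv_to_curves(text)
print(text)
```

The hierarchy survives the round trip, but only because every row repeats the generator and curve names. That repetition is the cost of the rectangular form, and it is arguably what makes the fit feel wrong for hand-edited input. A wide layout with columns p1 through p4 would avoid it only as long as every curve has exactly four points.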
[1] <https://en.wikipedia.org/wiki/Common_Information_Model_(electricity)>
[2] <https://orcid.org/>

--
Michael F. Stemper
Psalm 82:3-4
--
https://mail.python.org/mailman/listinfo/python-list