Re: Why self describing data formats:
On Fri, 01 Jun 2007 20:59:55 +1000 "James A. Donald" <[EMAIL PROTECTED]> wrote:
> Many protocols use some form of self describing data format, for
> example ASN.1, XML, S expressions, and bencoding.
>
> Why?
>
> Presumably both ends of the conversation have negotiated what
> protocol version they are using (and if they have not, you have big
> problems) and when they receive data, they need to get the data they
> expect. If they are looking for a list of integer pairs and they get
> integer-string pairs, then having them correctly identified as
> strings is not going to help much.

The most important reason is application flexibility -- very often, complex data structures are being passed around, and having some format like those makes life easier.

There is some security benefit, though -- see Section 7 of Abadi and Needham's "Prudent Engineering Practice for Cryptographic Protocols" (1995). (Yes, they're calling for a lot less than full-blown ASN.1.)

--Steve Bellovin, http://www.cs.columbia.edu/~smb

- The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]
Re: Why self describing data formats:
On Mon, Jun 11, 2007 at 11:28:37AM -0400, Richard Salz wrote:
> > Many protocols use some form of self describing data format, for
> > example ASN.1, XML, S expressions, and bencoding.
>
> I'm not sure what you're getting at. All XML and S expressions really
> get you is that you know how to skip past something you don't
> understand. This is also true for many (XER, DER, BER) but not all
> (PER) encodings for ASN.1.

If only it were so easy. As we discovered in the IETF KRB WG, you can't expect that just because the protocol uses a TLV encoding (DER) you can add items to sequences (structures) or choices (discriminated unions) willy-nilly: code generated by a compiler might choke, because formally the protocol didn't allow extensibility and the compiler did the Right Thing. Extensibility of this sort requires that one be explicit about it in the original spec.

> Are you saying why publish a schema?

I doubt it: you can have schemas without self-describing encodings (again, PER and XDR are examples of non-self-describing encodings for ASN.1 and XDR, respectively). Schemas can be good while self-describing encodings can be bad...

Nico
--
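[Editor's sketch] Nico's point is that a TLV encoding lets a decoder skip unknown elements, but whether a given decoder actually tolerates them depends on the schema, not the encoding. A toy illustration in Python; this is not DER, just a hypothetical one-byte-tag/one-byte-length format with made-up tag numbers:

```python
# Toy TLV (tag-length-value) decoder. A lenient decoder skips unknown
# tags; a strict decoder -- like compiler output for a schema with no
# extensibility markers -- chokes on them.

def parse_tlv(data):
    """Split a byte string into (tag, value) pairs."""
    items, i = [], 0
    while i < len(data):
        tag, length = data[i], data[i + 1]
        value = data[i + 2 : i + 2 + length]
        if len(value) != length:
            raise ValueError("truncated value")
        items.append((tag, value))
        i += 2 + length
    return items

def decode(data, known_tags, strict):
    """Keep known tags; skip or reject unknown ones per `strict`."""
    out = {}
    for tag, value in parse_tlv(data):
        if tag in known_tags:
            out[tag] = value
        elif strict:
            # Formally, the schema didn't allow this element.
            raise ValueError("unexpected tag %d" % tag)
    return out

msg = bytes([1, 3, ord("b"), ord("o"), ord("b"),   # tag 1: known field
             9, 1, 0xFF])                          # tag 9: newly added field

print(decode(msg, known_tags={1}, strict=False))   # lenient: skips tag 9
try:
    decode(msg, known_tags={1}, strict=True)       # strict: chokes
except ValueError as e:
    print(e)
```

The encoding made skipping *possible*; only the spec can make it *permitted*.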
Re: Why self describing data formats:
James A. Donald:
> > In the case of XML, yes there is a parsing engine, and if the
> > structure of the DTD reflects the structure of the algorithm, then
> > indeed it makes things much easier. But usually the committee have
> > not thought about the algorithm, or have unresolved disagreements
> > about what the algorithm should be, leaving the engineer with
> > problems that are at best extremely difficult to solve, and are at
> > worst impossible to solve. Ideally the DTD should be developed in
> > parallel with the program that processes the XML. In that case, you
> > get the parsing engine doing a lot of work for free, so the
> > engineers do not have to reinvent the wheel. But if the DTD is
> > written first by one group, and the program second, by another
> > group, the second group is usually hosed good.

Will Morton:
> The situation is improved slightly with XML schemas, as one can use
> frameworks like XMLBeans (http://xmlbeans.apache.org/) to get the
> protocol much closer to the code. This can help a bit, but doesn't
> change the fundamentals.
>
> You're still right in that if you have one group developing the code
> and another the protocol, you're probably screwed, but isn't this
> just as true (perhaps moreso) if you're rolling your own protocol
> structure instead of using XML?

With XML, alarmingly great flexibility in the protocol is easy and less work for the people designing the protocol - the protocol may be inordinately flexible because of laziness, carelessness, unresolved disagreement, or papered-over disagreement, resulting in tag soup. With a protocol that is not self describing, the committee devising the protocol have to actually agree on what the protocol actually is.
Re: Why self describing data formats:
On Mon, Jun 11, 2007 at 09:28:02AM -0400, Bowness, Piers wrote:
> But what it does help is allowing a protocol to be expanded and
> enhanced while maintaining backward compatibility for both client and
> server.

Nonsense. ASN.1's PER encoding does not prevent extensibility.
Re: Why self describing data formats:
> > But the main motivation (imho) is that it's trendy. And once anyone
> > proposes a heavyweight "standard" encoding, anyone who opposes it is
> > labeled a Luddite.

Maybe. But there's quite a lot to be said for standards which lead to widespread availability of tools implementing them, both open source and otherwise.

One of the arguments we've heard for why ASN.1 sucks is the lack of tools, particularly open source ones, for ASN.1 and its encodings. Nowadays there is one GPL ASN.1 compiler and libraries: SNACC. (I'm not sure if its output is unencumbered, like bison's, or what, but that's important to a large number of developers who don't want to be forced to license under the GPL, and there aren't any full-featured ASN.1 compilers and libraries licensed under the BSD or BSD-like licenses.)

The situation is markedly different with XML. Even if you don't like XML, or its redundancy (as an encoding; but then, see FastInfoSet, a PER-based encoding of XML), it has that going for it: tool availability.

Nico
--
Re: Why self describing data formats:
On Fri, Jun 01, 2007 at 08:59:55PM +1000, James A. Donald wrote:
> Many protocols use some form of self describing data format, for
> example ASN.1, XML, S expressions, and bencoding.

ASN.1 is not an encoding, and not all its encodings are self-describing. Specifically, PER is a compact encoding such that a PER encoding of some data cannot be decoded without access to the ASN.1 module(s) that describe the data types in question. Yes, it's a nit.

Then there's XDR -- which can be thought of as a subset of ASN.1 and a four-octet-aligned version of PER (XDR being both a syntax and an encoding).

> Why?

Supposedly it is (or was thought to be) easier to write encoders/decoders for TLV encodings (BER, DER, CER) and S-expressions, but I don't believe it (though I certainly believe that it was thought to be easier): rpcgen is a simple enough program, for example.

TLV encodings tend to be quite redundant, in a way that seems dangerous: a lazy programmer can (and many have) write code that fails to validate parts of an encoding and mostly get away with it (until the then-inevitable subsequent buffer overflow, of course).

Of course, code generators and libraries for self-describing and non-self-describing encodings alike are not necessarily bug free (have any been?), but at least they have the virtue that they are automatic tools that consume a formal language, thus limiting the number of lazy programmers involved and the number of different ways in which they can screw up (and they let their consumers off the hook, to a point).

> Presumably both ends of the conversation have negotiated what
> protocol version they are using (and if they have not, you have big
> problems) and when they receive data, they need to get the data they
> expect. If they are looking for a list of integer pairs and they get
> integer-string pairs, then having them correctly identified as
> strings is not going to help much.

I agree. The redundancy of TLV encodings, XML, etcetera, is unnecessary.

Note though that I'm only talking about serialization formats for data in protocols; XML, I understand, was intended for _documents_, and it does seem quite appropriate for that, so it can be expected that there should be a place for it in Internet protocols for transferring pieces of documents.

Nico
--

- The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]
Re: Why self describing data formats:
James A. Donald wrote:
> In the case of XML, yes there is a parsing engine, and if the
> structure of the DTD reflects the structure of the algorithm, then
> indeed it makes things much easier. But usually the committee have
> not thought about the algorithm, or have unresolved disagreements
> about what the algorithm should be, leaving the engineer with
> problems that are at best extremely difficult to solve, and are at
> worst impossible to solve. Ideally the DTD should be developed in
> parallel with the program that processes the XML. In that case, you
> get the parsing engine doing a lot of work for free, so the engineers
> do not have to reinvent the wheel. But if the DTD is written first by
> one group, and the program second, by another group, the second group
> is usually hosed good.

The situation is improved slightly with XML schemas, as one can use frameworks like XMLBeans (http://xmlbeans.apache.org/) to get the protocol much closer to the code. This can help a bit, but doesn't change the fundamentals.

You're still right in that if you have one group developing the code and another the protocol, you're probably screwed, but isn't this just as true (perhaps moreso) if you're rolling your own protocol structure instead of using XML?

W
Re: Why self describing data formats:
> Many protocols use some form of self describing data format, for
> example ASN.1, XML, S expressions, and bencoding.

I'm not sure what you're getting at. All XML and S expressions really get you is that you know how to skip past something you don't understand. This is also true for many (XER, DER, BER) but not all (PER) encodings for ASN.1.

Are you saying why publish a schema?

/r$

--
STSM, Senior Security Architect
DataPower SOA Appliances
http://www.ibm.com/software/integration/datapower/
Re: Why self describing data formats:
| Many protocols use some form of self describing data format, for
| example ASN.1, XML, S expressions, and bencoding.
|
| Why?
|
| Presumably both ends of the conversation have negotiated what protocol
| version they are using (and if they have not, you have big problems)
| and when they receive data, they need to get the data they expect. If
| they are looking for a list of integer pairs, and they get
| integer-string pairs, then having them correctly identified as strings
| is not going to help much.

I suspect the main reason designers use self-describing formats is the same reason Unix designers tend to go with all-ASCII formats: it's much easier to debug "by eye". Whether this is really of significance at any technical level is debatable. At the social level, it's very important. We're right into "worse is better" territory: self-describing and, especially, ASCII-based protocols and formats are much easier to hack with. It's much easier to recover from errors in a self-describing format; it's much easier to make "reasonable" interpretations of incorrect data (for better or worse). Network lore makes this a virtue: "Be conservative in what you send, liberal in what you accept." (The first part gets honored in the breach all too often, and of course the second is a horrible prescription for cryptography or security in general.) So software to use such protocols and formats gets developed faster, spreads more widely, and eventually you have an accepted standard that's too expensive to replace.

The examples are rife. HTML is a wonderful one: it's a complex but human-readable protocol that a large fraction (probably a majority) of generators get wrong - so there's a history of HTML readers ignoring errors and "doing the best they can". Again, this is a mixed bag - on the one hand, the web would clearly have grown much more slowly without it; on the other, the lack of standardization can cause, and has caused, problems. (IE6-only sites, raise your hands.)

Looked at objectively, it's hard to see why XML is even a reasonable choice for many of its current uses. (A markup language is supposed to add semantic information over an existing body of data. If most of the content of a document is within the markup - true of probably the majority of uses of XML today - something is very wrong.) But it's there, there are tons of ancillary programs, so ... the question that gets asked is not "why use XML?" but "why *not* use XML?"

(Now, if I could only learn to relax and stop tearing my hair every time I read some XML paper in which they use "semantics" to mean what everyone else uses "syntax" for.)

-- Jerry
RE: Why self describing data formats:
On Friday, June 01, 2007 7:00 AM James A. Donald wrote:
> Many protocols use some form of self describing data format, for
> example ASN.1, XML, S expressions, and bencoding.
>
> Why?
>
> Presumably both ends of the conversation have negotiated what protocol
> version they are using (and if they have not, you have big problems)
> and when they receive data, they need to get the data they expect. If
> they are looking for a list of integer pairs and they get
> integer-string pairs, then having them correctly identified as strings
> is not going to help much.

But what it does help is allowing a protocol to be expanded and enhanced while maintaining backward compatibility for both client and server. Provided care is taken to have the protocol contain the previously required items, consumers (clients) can examine the version information and continue based on a minimum required version (i.e., the client *must* receive version X.Y or higher). Clients can safely ignore new, unrecognized protocol elements, while server code is greatly simplified (it just emits the high-version protocol).

I would generally reserve the term "protocol" for wire transmissions (where presumably client and server could negotiate an appropriate version). Many of the self-describing "protocols" you mention become static file formats. This can have its drawbacks. An interesting workaround to this is the use of "critical" key-usage extensions in X.509 (forcing the client to reject the certificate if there are key usage restrictions that a specific client cannot recognize). There are also overhead issues (especially for XML).

-Piers
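[Editor's sketch] The compatibility scheme Piers describes -- server always emits the newest layout, client checks a minimum version and ignores fields it does not recognize -- can be sketched in a few lines. All field names and version numbers here are hypothetical:

```python
# A v1.0 client consuming messages from servers of any later version.

KNOWN_FIELDS = {"version", "user", "quota"}   # what this client shipped with
MIN_VERSION = (1, 0)                          # minimum acceptable server

def accept(message):
    """Validate the version, keep known fields, silently drop the rest."""
    if tuple(message["version"]) < MIN_VERSION:
        raise ValueError("server too old")
    return {k: v for k, v in message.items() if k in KNOWN_FIELDS}

# A v1.2 server adds a "theme" field the v1.0 client never heard of:
msg = {"version": (1, 2), "user": "jane", "quota": 100, "theme": "dark"}
print(accept(msg))   # the unknown "theme" element is safely ignored
```

The caveat from earlier in the thread applies: this only works if the spec said from the start that unknown elements may appear and must be ignored.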
Re: Why self describing data formats:
James A. Donald wrote:
> > Many protocols use some form of self describing data format, for
> > example ASN.1, XML, S expressions, and bencoding.
> >
> > Why?
> >
> > Presumably both ends of the conversation have negotiated what
> > protocol version they are using (and if they have not, you have big
> > problems) and when they receive data, they need to get the data
> > they expect. If they are looking for a list of integer pairs, and
> > they get an integer string pair, then having the string correctly
> > identified as a string is not going to help much.

Charlie Kaufman wrote:
> You are correct that such encodings don't help with any
> interoperability issues. Sometimes, they make reading and writing the
> spec easier, since silly issues like big endian vs. little endian
> encoding of integers get specified elsewhere.

They also make it easy to write specs that do not in fact work. The spec writers do not in fact agree, and then leave the problem of implementing an under-defined spec to the engineer.

> More rarely, it makes coding easier (if there is some parsing and
> encoding engine readily available to the implementers). If the
> protocol is being designed by a committee, it can reduce the number
> of debates over minutia.

In the case of XML, yes there is a parsing engine, and if the structure of the DTD reflects the structure of the algorithm, then indeed it makes things much easier. But usually the committee have not thought about the algorithm, or have unresolved disagreements about what the algorithm should be, leaving the engineer with problems that are at best extremely difficult to solve, and are at worst impossible to solve. Ideally the DTD should be developed in parallel with the program that processes the XML. In that case, you get the parsing engine doing a lot of work for free, so the engineers do not have to reinvent the wheel. But if the DTD is written first by one group, and the program second, by another group, the second group is usually hosed good.

> But the main motivation (imho) is that it's trendy. And once anyone
> proposes a heavyweight "standard" encoding, anyone who opposes it is
> labeled a Luddite.

Sounds true.
Re: Why self describing data formats:
re: http://www.garlic.com/~lynn/aadsm27.htm#24 Why self describing data formats:

for other archaeological trivia ... later i transferred from the science center to SJR and got to do some of the work on the original relational/sql implementation, System/R. a few years later, the "L" in GML also transferred to SJR and worked on relational, including being involved in the development of BLOBs (Binary Large OBjectS) for relational.

roll forward a few yrs to the acm (database) sigmod conference in san jose in the early 90s. In one of the sessions, somebody raised the question about what was all this X.500 and X.509 stuff going on in ISO ... and there was somebody from the audience that explained how it was a bunch of networking engineers trying to re-invent 1960s database technology.

today ... you can periodically find heated online discussion about XML "databases" and whether they compromise the purity of information integrity that you get from the relational paradigm.

lots of past posts mentioning various things about system/r, relational database technology, etc
http://www.garlic.com/~lynn/subtopic.html#systemr
Re: Why self describing data formats:
James A. Donald wrote:
> Many protocols use some form of self describing data format, for
> example ASN.1, XML, S expressions, and bencoding.
>
> Why?

gml (precursor to sgml, html, xml, etc)
http://www.garlic.com/~lynn/subtopic.html#sgml
was invented at the science center in 1969
http://www.garlic.com/~lynn/subtopic.html#545tech
... some recent (science center) topic drift/references in this post
http://www.garlic.com/~lynn/2007l.html#65 mainframe = superserver

"G", "M", & "L" were individuals at the science center ... so the requirement was to come up with an acronym from the inventors' initials.

so some of the historical justification for the original "markup language" paradigm can be found ... originally CMS had the script command for document formatting ... using "dot" format commands ... i.e. science center on 4th flr of 545 tech sq doing virtual machines, cp67, cms, the internal network, etc ... and multics on 5th flr of 545 tech sq ... both drawing from some common heritage to CTSS (and some of the unix heritage traces back thru multics also to CTSS).

the original GML was sort of a combination of "self-describing" data (somewhat for legal documents)
http://www.sgmlsource.com/history/roots.htm
http://xml.coverpages.org//sgmlhist0.html
and document formatting ... when GML tag formatting was added to the CMS script processing command.

later you find a big CMS installation at CERN ... and HTML drawing heritage from the "waterloo" clone of the CMS script command.
http://infomesh.net/html/history/early

first webserver in the states was at slac (a CERN "sister" location) ... another big vm/cms installation:
http://www.slac.stanford.edu/history/earlyweb/history.shtml
recent historical post/reference
http://www.garlic.com/~lynn/2007d.html#29 old tapes

last time i checked, w3c hdqtrs was around the corner from the old science center location at 545 tech sq.

before GML, the science center had an activity involving "performance" data from the time-sharing service (originally using the virtual machine cp67 service and then transitioning to vm370) ... lots of system activity data was captured every 5-10 minutes and then archived to tape ... starting in the mid-60s ... by the mid-70s there was a decade of data spanning lots of different configurations, workloads, etc. The original intention when the system activity data was being archived was to include enuf self-describing information that the data could be interpreted many yrs later.

lots of past posts about using cp67&vm370 for time-sharing services (both for internal corporate use and customers offering commercial, online time-sharing services using the platform)
http://www.garlic.com/~lynn/subtopic.html#timeshare
lots of past posts about long term performance monitoring, workload profiling, benchmarking and stuff leading up to things like capacity planning
http://www.garlic.com/~lynn/subtopic.html#benchmark

much later, you find things like ASN.1 encoding for handling interoperability of network-transmitted data between platforms that might have different information representation conventions (like the whole little/big endian stuff).

one of the things swirling around digital signature activity in the mid-90s was an almost religious belief that digital certificate encoding mandated ASN.1. other digital signature operations that were less religious about PKI, x.509 identity digital certificates, etc ... were much less strict about encoding technique for digitally signed operations ... including certificateless digital signature infrastructures
http://www.garlic.com/~lynn/subpubkey.html#certless

One of the battles between XML and ASN.1 proponents during the period was that XML didn't provide for a deterministic encoding. It really was somewhat a red herring on the digital certificate ... ASN.1 side ... since they were looking at always keeping things ASN.1-encoded (not just for transmission) ... and only decoding when some specific information needed extraction.

On the other side were places like FSTC, which was defining a digitally signed electronic check convention (with transmission over ACH or ISO8583). There was already a transmission standard ... which ASN.1 encoding would severely bloat ... not to mention the horrible payload bloat that was the result of any certificate-based infrastructure needing to append redundant and superfluous digital certificates. FSTC just defined appending a digital signature to the existing payload. The issue then became a deterministic encoding of the information for when the digital signature was generated and verified. If you temporarily encoded the payload as XML, generated the digital signature ... and then appended the digital signature to the standard (ACH or ISO8583) payload ... the problem was that at the other end, XML didn't provide a deterministic encoding methodology so that the recipient could re-encode the payload and verify the digital signature. So FSTC eventually defined some additional rule
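[Editor's sketch] The FSTC problem Lynn describes is that a signature is computed over some encoding of the payload, so signer and verifier must produce byte-identical encodings. A minimal illustration of deterministic encoding before signing, using canonical JSON (sorted keys, no whitespace) and HMAC standing in for a public-key signature; the key and field names are illustrative, not FSTC's actual rules:

```python
# Sign over a canonical serialization so the verifier can re-encode the
# payload and get the exact same bytes the signer hashed.
import hashlib
import hmac
import json

KEY = b"shared-demo-key"

def canonical(payload):
    """Deterministic encoding: same dict -> same bytes, always."""
    return json.dumps(payload, sort_keys=True,
                      separators=(",", ":")).encode()

def sign(payload):
    return hmac.new(KEY, canonical(payload), hashlib.sha256).hexdigest()

def verify(payload, sig):
    # Re-encode at the receiver and compare. This only works because
    # canonical() is deterministic -- the exact property plain XML lacked.
    return hmac.compare_digest(sign(payload), sig)

check = {"amount": 1000, "payee": "alice", "currency": "USD"}
sig = sign(check)
print(verify(check, sig))                       # signature matches
print(verify({**check, "amount": 9999}, sig))   # altered payload fails
```

Without the canonicalization rule, two semantically identical payloads could serialize to different bytes and the verification would spuriously fail.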
Why self describing data formats:
Many protocols use some form of self describing data format, for example ASN.1, XML, S expressions, and bencoding.

Why?

Presumably both ends of the conversation have negotiated what protocol version they are using (and if they have not, you have big problems) and when they receive data, they need to get the data they expect. If they are looking for a list of integer pairs and they get integer-string pairs, then having them correctly identified as strings is not going to help much.
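[Editor's sketch] Of the formats the post names, bencoding is the simplest to show what "self describing" means on the wire: every value carries its own type marker (`i` for integer, `l` for list, a length prefix for strings), so a decoder can walk the structure with no schema at all. A minimal bencode encoder:

```python
# Minimal bencode encoder (encoding only, for illustration).

def bencode(value):
    if isinstance(value, int):
        return b"i%de" % value                 # i<digits>e
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)  # <length>:<bytes>
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    raise TypeError("unsupported type")

# A list of integer pairs vs. integer-string pairs: the wire bytes
# themselves reveal which one arrived -- the post's point is that this
# self-description doesn't help if your code expected the other one.
print(bencode([[1, 2], [3, 4]]))        # b'lli1ei2eeli3ei4eee'
print(bencode([[1, b"x"], [3, b"y"]]))  # b'lli1e1:xeli3e1:yee'
```

The type markers answer "what is this?" but not "is this what I wanted?", which is exactly the question the post raises.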