Re: Why self describing data formats:

2007-06-25 Thread Steven M. Bellovin
On Fri, 01 Jun 2007 20:59:55 +1000
"James A. Donald" <[EMAIL PROTECTED]> wrote:

> Many protocols use some form of self describing data format, for
> example ASN.1, XML, S expressions, and bencoding.
> 
> Why?
> 
> Presumably both ends of the conversation have negotiated what
> protocol version they are using (and if they have not, you have big
> problems) and when they receive data, they need to get the data they
> expect.  If they are looking for a list of integer pairs, and they get
> integer-string pairs, then having them correctly identified as strings
> is not going to help much.
> 
The most important reason is application flexibility -- very often,
complex data structures are being passed around, and having some
format like those makes life easier.

There is some security benefit, though -- see Section 7 of Abadi
and Needham's "Prudent Engineering Practice for Cryptographic
Protocols" (1995).  (Yes, they're calling for a lot less than
full-blown ASN.1.)
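
For a concrete (if toy) illustration of the kind of ambiguity that explicit
structure rules out -- this is not the paper's example, and the key and field
values below are invented -- compare MACing two fields by plain concatenation
with MACing a minimally self-describing, length-prefixed encoding:

    import hmac, hashlib

    KEY = b"shared-secret"  # invented key, for illustration only

    def mac(data: bytes) -> bytes:
        return hmac.new(KEY, data, hashlib.sha256).digest()

    # Plain concatenation: two different field splits yield identical bytes,
    # so they carry the same MAC -- the receiver cannot tell them apart.
    assert mac(b"user" + b"admin") == mac(b"usera" + b"dmin")

    # A length-prefixed encoding makes the field boundaries part of what
    # gets authenticated, so the ambiguity disappears.
    def lp(*fields: bytes) -> bytes:
        return b"".join(len(f).to_bytes(4, "big") + f for f in fields)

    assert mac(lp(b"user", b"admin")) != mac(lp(b"usera", b"dmin"))

As the parenthetical above notes, this calls for far less machinery than
full ASN.1; the point is only that the authenticated bytes pin down their
own interpretation.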


--Steve Bellovin, http://www.cs.columbia.edu/~smb

-
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]


Re: Why self describing data formats:

2007-06-23 Thread Nicolas Williams
On Mon, Jun 11, 2007 at 11:28:37AM -0400, Richard Salz wrote:
> >Many protocols use some form of self describing data format, for example
> > ASN.1, XML, S expressions, and bencoding.
> 
> I'm not sure what you're getting at.  All XML and S expressions really get 
> you is that you know how to skip past something you don't understand. This 
> is also true for many (XER, DER, BER) but not all (PER) encodings for 
> ASN.1.

If only it were so easy.  As we discovered in the IETF KRB WG, you can't
expect that, just because the protocol uses a TLV encoding (DER), you can
just add items to sequences (structures) or choices (discriminated
unions) willy-nilly: code generated by an ASN.1 compiler might choke,
because formally the protocol didn't allow extensibility and the compiler
did the Right Thing.  Extensibility of this sort requires that one be
explicit about it in the original spec.
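
A rough sketch of that failure mode, using an invented two-field TLV format
rather than real DER (so purely illustrative): a decoder generated from a
spec with no extensibility marker rejects the extra element a newer peer
sends, while one told to expect extensions simply skips it.

    # Toy TLV: one-byte tag, one-byte length, then the value.
    def parse_tlvs(body: bytes):
        items, i = [], 0
        while i < len(body):
            tag, length = body[i], body[i + 1]
            items.append((tag, body[i + 2:i + 2 + length]))
            i += 2 + length
        return items

    def decode_v1(body: bytes, extensible: bool):
        # The (invented) v1 spec defines exactly two fields: 0x02 and 0x04.
        items = parse_tlvs(body)
        known = [(t, v) for t, v in items if t in (0x02, 0x04)]
        if not extensible and len(known) != len(items):
            # What a strictly generated decoder does when the module had no "...".
            raise ValueError("unexpected element: spec declared no extensibility")
        return dict(known)

    v2_msg = bytes([0x02, 1, 7,                  # field 0x02 = 7
                    0x04, 3, 0x61, 0x62, 0x63,   # field 0x04 = "abc"
                    0x0A, 1, 1])                 # new field added in "v2"

    print(decode_v1(v2_msg, extensible=True))    # skips the unknown element
    try:
        decode_v1(v2_msg, extensible=False)      # strict, like the KRB WG case
    except ValueError as e:
        print("strict decoder:", e)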

> Are you saying why publish a schema?

I doubt it: you can have schemas without self-describing encodings
(again, PER and XDR are examples of non-self-describing encodings, for
ASN.1 and XDR respectively).  Schemas can be good while self-describing
encodings can be bad...

Nico
-- 

-
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]


Re: Why self describing data formats:

2007-06-23 Thread James A. Donald

James A. Donald:
> > In the case of XML, yes there is a parsing engine,
> > and if the structure of the DTD reflects the
> > structure of the algorithm, then indeed it makes
> > things much easier.  But usually the committee have
> > not thought about the algorithm, or have unresolved
> > disagreements about what the algorithm should be,
> > leaving the engineer with problems that are at best
> > extremely difficult to solve, and are at worst
> > impossible to solve.  Ideally the DTD should be
> > developed in parallel with the program that
> > processes the XML.  In that case, you get the
> > parsing engine doing a lot of work for free, so the
> > engineers do not have to reinvent the wheel.  But if
> > the DTD is written first by one group, and the
> > program second, by another group, the second group
> > is usually hosed good.

Will Morton:
> The situation is improved slightly with XML schemas,
> as one can use frameworks like XMLBeans
> (http://xmlbeans.apache.org/) to get the protocol much
> closer to the code.  This can help a bit, but doesn't
> change the fundamentals.
>
> You're still right in that if you have one group
> developing the code and another the protocol, you're
> probably screwed, but isn't this just as true (perhaps
> moreso) if you're rolling your own protocol structure
> instead of using XML?

With XML, alarmingly great flexibility in the protocol
is easy and means less work for the people designing the
protocol - the protocol may end up inordinately flexible
because of laziness, carelessness, unresolved
disagreement, or papered-over disagreement,
resulting in tag soup.

With a protocol that is not self describing, the
committee devising the protocol have to actually agree
on what the protocol is.

-
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]


Re: Why self describing data formats:

2007-06-21 Thread Nicolas Williams
On Mon, Jun 11, 2007 at 09:28:02AM -0400, Bowness, Piers wrote:
> But what it does help with is allowing a protocol to be expanded and enhanced
> while maintaining backward compatibility for both client and server.

Nonsense.  ASN.1's PER encoding does not prevent extensibility.

-
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]


Re: Why self describing data formats:

2007-06-21 Thread Nicolas Williams
> >But the main motivation (imho) is that it's trendy. And once anyone
> >proposes a heavyweight "standard" encoding, anyone who opposes it is
> >labeled a Luddite.

Maybe.  But there's quite a lot to be said for standards which lead to
widespread availability of tools implementing them, both open source
and otherwise.

One of the arguments we've heard for why ASN.1 sucks is the lack of
tools, particularly open source ones, for ASN.1 and its encodings.

Nowadays there is one GPL ASN.1 compiler and set of libraries: SNACC.  (I'm
not sure whether its output is unencumbered, like bison's, or not, but that
matters to a large number of developers who don't want to be forced to
license under the GPL, and there aren't any full-featured ASN.1 compilers
and libraries licensed under BSD or BSD-like licenses.)

The situation is markedly different with XML.  Even if you don't like
XML, or its redundancy (as an encoding, but then, see FastInfoSet, a
PER-based encoding of XML), it has that going for it: tool availability.

Nico
-- 

-
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]


Re: Why self describing data formats:

2007-06-21 Thread Nicolas Williams
On Fri, Jun 01, 2007 at 08:59:55PM +1000, James A. Donald wrote:
> Many protocols use some form of self describing data format, for example 
> ASN.1, XML, S expressions, and bencoding.

ASN.1 is not an encoding, and not all its encodings are self-describing.

Specifically, PER is a compact encoding such that a PER encoding of some
data cannot be decoded without access to the ASN.1 module(s) that
describes the data types in question.

Yes, it's a nit.

Then there's XDR -- which can be thought of as a subset of ASN.1 and a
four-octet-aligned version of PER (XDR being both a syntax and an
encoding).
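
To make the nit concrete, here is the same pair (5, "hi") in a tagged TLV
form (which happens to match the DER bytes for INTEGER 5 and UTF8String
"hi") and in an XDR-style positional form; only the schema tells a receiver
of the second stream where the fields are and what they mean.  A small
Python sketch:

    import struct

    # Tagged, self-describing: tag 0x02 = INTEGER, tag 0x0C = UTF8String.
    tlv = bytes([0x02, 0x01, 0x05, 0x0C, 0x02]) + b"hi"
    # 02 01 05 0c 02 68 69 -- a receiver can walk this without the schema.

    # XDR-style, positional: a 4-byte integer, then a length-counted string
    # padded to a 4-byte boundary.  No type information on the wire at all.
    xdr = struct.pack(">I", 5) + struct.pack(">I", 2) + b"hi" + b"\x00\x00"
    # 00000005 00000002 6869 0000 -- meaningless without the agreed schema.

    print(tlv.hex(), "-", len(tlv), "bytes")
    print(xdr.hex(), "-", len(xdr), "bytes")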

> Why?

Supposedly it is (or was thought to be) easier to write encoders/
decoders for TLV encodings (BER, DER, CER) and S-expressions, but I
don't believe it (though I certainly believe that it was thought to be
easier): rpcgen is a simple enough program, for example.

TLV encodings tend to be quite redundant, in a way that seems dangerous: a
lazy programmer can write (and many have written) code that fails to
validate parts of an encoding and mostly get away with it (until the
inevitable buffer overflow, of course).
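
As a sketch of what that redundancy looks like when nobody checks it (an
invented nested TLV, not real BER/DER): the inner length field can
contradict the enclosing one, and a lazy decoder that trusts it reads past
the element -- harmless in Python, a memory-safety bug in C.

    # Outer element 0x30 claims 4 content bytes; the inner element claims 10.
    msg = bytes([0x30, 4, 0x04, 10, 0xAA, 0xBB]) + b"trailing garbage"

    def lazy_decode(buf: bytes) -> bytes:
        inner_len = buf[3]
        return buf[4:4 + inner_len]      # trusts the inner length blindly

    def careful_decode(buf: bytes) -> bytes:
        outer_len, inner_len = buf[1], buf[3]
        if 2 + inner_len > outer_len:
            raise ValueError("inner length exceeds the enclosing element")
        return buf[4:4 + inner_len]

    print(lazy_decode(msg))              # happily returns bytes outside the element
    try:
        careful_decode(msg)
    except ValueError as e:
        print("careful decoder:", e)     # the redundant lengths disagree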

Of course, code generators and libraries for self-describing and non-
self-describing encodings alike are not necessarily bug-free (have any
been?), but at least they have the virtue that they are automatic tools
that consume a formal language, thus limiting the number of lazy
programmers involved and the number of different ways in which they can
screw up (and they let their consumers off the hook, to a point).

> Presumably both ends of the conversation have negotiated what protocol
> version they are using (and if they have not, you have big problems) and
> when they receive data, they need to get the data they expect.  If they
> are looking for a list of integer pairs, and they get integer-string
> pairs, then having them correctly identified as strings is not going to
> help much.

I agree.  The redundancy of TLV encodings, XML, etcetera, is
unnecessary.  Note though that I'm only talking about serialization
formats for data in protocols; XML, I understand, was intended for
_documents_, and it does seem quite appropriate for that, so it can be
expected to have a place in Internet protocols that transfer pieces of
documents.

Nico
-- 

-
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]


Re: Why self describing data formats:

2007-06-21 Thread Will Morton

James A. Donald wrote:



> In the case of XML, yes there is a parsing engine, and if the structure
> of the DTD reflects the structure of the algorithm, then indeed it makes
> things much easier.  But usually the committee have not thought about
> the algorithm, or have unresolved disagreements about what the algorithm
> should be, leaving the engineer with problems that are at best extremely
> difficult to solve, and are at worst impossible to solve.  Ideally the
> DTD should be developed in parallel with the program that processes the
> XML.  In that case, you get the parsing engine doing a lot of work for
> free, so the engineers do not have to reinvent the wheel.  But if the
> DTD is written first by one group, and the program second, by another
> group, the second group is usually hosed good.




The situation is improved slightly with XML schemas, as one can use 
frameworks like XMLBeans (http://xmlbeans.apache.org/) to get the 
protocol much closer to the code.  This can help a bit, but doesn't 
change the fundamentals.


You're still right in that if you have one group developing the code and 
another the protocol, you're probably screwed, but isn't this just as 
true (perhaps moreso) if you're rolling your own protocol structure 
instead of using XML?


W

-
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]


Re: Why self describing data formats:

2007-06-21 Thread Richard Salz
>Many protocols use some form of self describing data format, for example
> ASN.1, XML, S expressions, and bencoding.

I'm not sure what you're getting at.  All XML and S expressions really get 
you is that you know how to skip past something you don't understand. This 
is also true for many (XER, DER, BER) but not all (PER) encodings for 
ASN.1.

Are you saying why publish a schema?

/r$

--
STSM, Senior Security Architect
DataPower SOA Appliances
http://www.ibm.com/software/integration/datapower/

-
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]


Re: Why self describing data formats:

2007-06-21 Thread Leichter, Jerry
| Many protocols use some form of self describing data format, for
| example ASN.1, XML, S expressions, and bencoding.
| 
| Why?
| 
| Presumably both ends of the conversation have negotiated what protocol
| version they are using (and if they have not, you have big problems)
| and when they receive data, they need to get the data they expect.  If
| they are looking for a list of integer pairs, and they get integer-
| string pairs, then having them correctly identified as strings is not
| going to help much.
I suspect the main reason designers use self-describing formats is the
same reason Unix designers tend to go with all-ASCII formats:  It's
much easier to debug "by eye".  Whether this is really of significance
at any technical level is debatable.  At the social level, it's very
important.  We're right into "worse is better" territory:  Self-
describing and, especially, ASCII-based protocols and formats are much
easier to hack with.  It's much easier to recover from errors in a
self-describing format; it's much easier to make "reasonable"
interpretations of incorrect data (for better or worse).  Network lore
makes this a virtue:  "Be conservative in what you send, liberal in what
you accept."  (The first part gets honored in the breach all too often,
and of course, the second is a horrible prescription for cryptography or
security in general.)  So software to use such protocols and formats
gets developed faster, spreads more widely, and eventually you have an
accepted standard that's too expensive to replace.

The examples are rife.  HTML is a wonderful one:  It's a complex but
human-readable protocol that a large fraction (probably a majority) of
generators get wrong - so there's a history of HTML readers ignoring
errors and "doing the best they can".  Again, this is a mixed bag - on
the one hand, the web would clearly have grown much more slowly without
it; on the other, the lack of standardization can cause, and has caused,
problems.  (IE6-only sites, raise your hands.)

Looked at objectively, it's hard to see why XML is even a reasonable
choice for many of its current uses.  (A markup language is supposed to
add semantic information over an existing body of data.  If most of
the content of a document is within the markup - true of probably the
majority of uses of XML today - something is very wrong.)  But it's
there, there are tons of ancillary programs, so ... the question that
gets asked is not "why use XML?" but "why *not* use XML?"  (Now, if I
could only learn to relax and stop tearing my hair every time I read
some XML paper in which they use "semantics" to mean what everyone
else uses "syntax" for.)
-- Jerry

-
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]


RE: Why self describing data formats:

2007-06-21 Thread Bowness, Piers

> On Friday, June 01, 2007 7:00 AM James A. Donald wrote:
> 
> Many protocols use some form of self describing data format, for
> example ASN.1, XML, S expressions, and bencoding.
> 
> Why?
> 
> Presumably both ends of the conversation have negotiated what protocol
> version they are using (and if they have not, you have big problems)
> and when they receive data, they need to get the data they expect.  If
> they are looking for a list of integer pairs, and they get integer-
> string pairs, then having them correctly identified as strings is not
> going to help much.
> 

But what it does help with is allowing a protocol to be expanded and
enhanced while maintaining backward compatibility for both client and
server.  Provided care is taken to have the protocol contain the
previously required items, consumers (clients) can examine the version
information and continue based on a minimum required version (i.e., the
client *must* receive version X.Y or higher).  Clients can safely ignore
new, unrecognized protocol elements while greatly simplifying server
code (which just emits the high-version protocol).
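
A small sketch of that pattern (the field names and version numbers are
invented; a dict stands in for any self-describing encoding): the client
checks a minimum version, reads the fields it knows, and ignores the rest,
so the server can simply emit the newest form.

    MIN_VERSION = (1, 2)   # oldest server output this client can cope with

    def client_handle(msg: dict):
        if tuple(msg["version"]) < MIN_VERSION:
            raise ValueError("server speaks too old a version")
        user = msg["user"]               # fields this client was built to know
        quota = msg.get("quota", 0)      # optional in older versions
        # Anything else in msg -- fields added by newer servers -- is ignored.
        return user, quota

    # A newer server adds a field this client has never heard of; it still works.
    print(client_handle({"version": (1, 3), "user": "piers",
                         "quota": 10, "colour": "blue"}))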

I would generally reserve the term "protocol" for wire transmissions
(where presumably client and server could negotiate an appropriate
version). Many of the self-describing "protocols" you mention become
static file formats.

This can have its drawbacks. An interesting workaround to this is the
use of "critical" key usage extensions in X.509 (forcing the client to
reject the certificate if there are key usage restrictions that a
specific client cannot recognize). There are also overhead issues
(especially for XML).

-Piers

-
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]


Re: Why self describing data formats:

2007-06-21 Thread James A. Donald

James A. Donald wrote:

> Many protocols use some form of self describing data format, for example
> ASN.1, XML, S expressions, and bencoding.
> 
> Why?
> 
> Presumably both ends of the conversation have negotiated what protocol
> version they are using (and if they have not, you have big problems) and
> when they receive data, they need to get the data they expect.  If they
> are looking for a list of integer pairs, and they get an integer-string
> pair, then having the string correctly identified as a string is not
> going to help much.


Charlie Kaufman wrote:

> You are correct that such encodings don't help with any interoperability
> issues.  Sometimes, they make reading and writing the spec easier, since
> silly issues like big-endian vs. little-endian encoding of integers get
> specified elsewhere.


They also make it easy to write specs that do not in fact work.  The
spec writers do not in fact agree, and then leave the problem of
implementing an under-defined spec to the engineer.



> More rarely, it makes coding easier (if there is some
> parsing and encoding engine readily available to the implementers).  If
> the protocol is being designed by a committee, it can reduce the number
> of debates over minutiae.


In the case of XML, yes there is a parsing engine, and if the structure 
of the DTD reflects the structure of the algorithm, then indeed it makes 
things much easier.  But usually the committee have not thought about 
the algorithm, or have unresolved disagreements about what the algorithm 
should be, leaving the engineer with problems that are at best extremely 
difficult to solve, and are at worst impossible to solve.  Ideally the 
DTD should be developed in parallel with the program that processes the 
XML.  In that case, you get the parsing engine doing a lot of work for 
free, so the engineers do not have to reinvent the wheel.  But if the 
DTD is written first by one group, and the program second, by another 
group, the second group is usually hosed good.



> But the main motivation (imho) is that it's trendy. And once anyone
> proposes a heavyweight "standard" encoding, anyone who opposes it is
> labeled a Luddite.


Sounds true.

-
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]


Re: Why self describing data formats:

2007-06-11 Thread Anne & Lynn Wheeler


re:
http://www.garlic.com/~lynn/aadsm27.htm#24 Why self describing data formats:

for other archaeological trivia ... later i transferred from the science center
to SJR and got to do some of the work on the original relational/sql 
implementation,
System/R.

a few years later, the "L" in GML also transferred to SJR and worked on
relational, including being involved in the development of BLOBs (Binary
Large OBjectS) for relational.


roll forward a few yrs to the acm (database) sigmod conference in san jose in
the early 90s. In one of the sessions, somebody raised the question about what
was all this X.500 and X.509 stuff going on in ISO ... and somebody from the
audience explained how it was a bunch of networking engineers trying to
re-invent 1960s database technology.


today ... you can periodically find heated online discussion about XML 
"databases"
and whether they compromise the purity of information integrity that you get
from the relational paradigm. lots of past posts mentioning various things about
system/r, relational database technology, etc
http://www.garlic.com/~lynn/subtopic.html#systemr


-
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]


Re: Why self describing data formats:

2007-06-11 Thread Anne & Lynn Wheeler

James A. Donald wrote:
Many protocols use some form of self describing data format, for example 
ASN.1, XML, S expressions, and bencoding.


Why?


gml (precursor to sgml, html, xml, etc) 
http://www.garlic.com/~lynn/subtopic.html#sgml


was invented at the science center in 1969 
http://www.garlic.com/~lynn/subtopic.html#545tech


... some recent (science center) topic drift/references in this post
http://www.garlic.com/~lynn/2007l.html#65 mainframe = superserver

"G", "M", & "L" were individuals at the science center ... so the
requirement was to come up with an acronym from the inventors initials

so some of the historical justification for the original "markup language" 
paradigm
can be found 


originally CMS had the script command for document formatting ... using
"dot" format commands ... i.e. science center on 4th flr of 545 tech sq
doing virtual machines, cp67, cms, the internal network, etc ... and multics
on 5th flr of 545 tech sq ... drew from some common heritage in CTSS (and some
of the unix heritage traces back thru multics also to CTSS).

the original GML was sort of a combination of "self-describing" data (somewhat 
for
legal documents) 
http://www.sgmlsource.com/history/roots.htm

http://xml.coverpages.org//sgmlhist0.html

and document formatting ... when GML tag formatting was added to the CMS script
processing command. Later you find a big CMS installation at CERN ... and HTML
drawing heritage from the "waterloo" clone of the CMS script command.

http://infomesh.net/html/history/early

first webserver in the states was at slac (a CERN "sister" location) ... another
big vm/cms installation:

http://www.slac.stanford.edu/history/earlyweb/history.shtml

recent historical post/reference
http://www.garlic.com/~lynn/2007d.html#29 old tapes

last time i checked, w3c hdqtrs was around the corner from the old
science center location at 545 tech. sq.

before GML, the science center had an activity involving "performance" data
from the time-sharing service (originally using virtual machine cp67 service
and then transitioning to vm370) ... lots of system activity data was captured
every 5-10 minutes and then archived to tape ... starting in the mid-60s ...
by the mid-70s there was a decade of data spanning lots of different 
configurations,
workloads, etc. The original intention when the system activity data was being
archived was to include enuf self-describing information that the data could
be interpreted many yrs later. lots of past posts about using cp67&vm370
for time-sharing services (both for internal corporate use and customers 
offering
commercial, online time-sharing services using the platform)
http://www.garlic.com/~lynn/subtopic.html#timeshare

lots of past posts about long term performance monitoring, workload profiling,
benchmarking and stuff leading up to things like capacity planning
http://www.garlic.com/~lynn/subtopic.html#benchmark

much later, you find things like ASN.1 encoding for handling interoperability
of network transmitted data between platforms that might have different
information representation conventions (like the whole little/big endian stuff).

one of the things swirling around digital signature activity in the mid-90s
was an almost religious belief that digital certificate encoding mandated
ASN.1.

other digital signature operations that were less religious about PKI,
x.509 identity digital certificates, etc ... were much less strict
about encoding technique for digitally signed operations ... including
certificateless digital signature infrastructures
http://www.garlic.com/~lynn/subpubkey.html#certless

One of the battles between XML and ASN.1 proponents during the period
was that XML didn't provide for a deterministic encoding.
It really was somewhat of a red herring on the digital certificate ... ASN.1
side ... since they were looking at always keeping things ASN.1 encoded
(not just for transmission) ... and only decoding when some specific
information needed extraction.


On the other side were places like FSTC, which was defining a digitally
signed electronic check convention (with transmission over ACH or ISO8583).
There was already a transmission standard ... which ASN.1 encoding would
severely bloat ... not to mention the horrible payload bloat that was
the result of any certificate-based infrastructure needing to append
redundant and superfluous digital certificates.

FSTC just defined appending a digital signature to existing payload.
The issue then became a deterministic encoding of the information
for when the digital signature was generated and verified. If you
temporarily encoded the payload as XML, generated the digital signature
... and then appended the digital signature to the standard (ACH or
ISO8583) payload ... the problem was that at the other end,
XML didn't provide a deterministic encoding methodology so that
the recipient could re-encode the payload and verify the digital
signature. So FSTC eventually defined some additional rule
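
A minimal sketch of the re-encoding problem described above, with JSON
standing in for XML and a hash standing in for the signature (the payload
fields are invented): unless both ends follow a deterministic serialization
rule, the verifier's re-encoding need not reproduce the signed bytes.

    import hashlib, json

    def sig(data: bytes) -> str:
        # stand-in for a real digital signature over the encoded payload
        return hashlib.sha256(data).hexdigest()

    payload = {"amount": 150, "payee": "acme"}

    # Sender encodes, signs, then ships only the signature plus the fields in
    # the existing wire format (ACH or ISO8583 in the FSTC case).
    sent_sig = sig(json.dumps(payload).encode())

    # Receiver rebuilds the payload from the wire format and re-encodes it --
    # but nothing forces the same byte string to come out:
    rebuilt = {"payee": "acme", "amount": 150}
    print(sig(json.dumps(rebuilt).encode()) == sent_sig)       # False

    # A deterministic ("canonical") encoding rule fixes that:
    def canon(d: dict) -> bytes:
        return json.dumps(d, sort_keys=True, separators=(",", ":")).encode()

    print(sig(canon(rebuilt)) == sig(canon(payload)))          # True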

Why self describing data formats:

2007-06-09 Thread James A. Donald
Many protocols use some form of self describing data format, for example 
ASN.1, XML, S expressions, and bencoding.


Why?

Presumably both ends of the conversation have negotiated what protocol
version they are using (and if they have not, you have big problems) and
when they receive data, they need to get the data they expect.  If they
are looking for a list of integer pairs, and they get integer-string
pairs, then having them correctly identified as strings is not going to
help much.


-
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]