leadingSkip/trailingSkip are just about the way some formats are expressed.

You are correct that these are about formats with unused "holes" in them.

Sometimes the spec document tells you how long the fields are and, where there 
are unused areas, the distance to the next element. That's the style that 
leadingSkip/trailingSkip are really about.
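To make the skip style concrete, here is a minimal Python sketch (not DFDL itself) of how a reader might walk a record whose spec gives field lengths plus the sizes of the holes between them. The layout, field names, and the use of bytes as the unit are assumptions for illustration; in DFDL the skips are counted in alignment units, which may be bits or bytes.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Field:
    name: str
    length: int             # field length in bytes
    leading_skip: int = 0   # unused bytes before the field
    trailing_skip: int = 0  # unused bytes after the field

def parse_with_skips(data: bytes, layout: List[Field]) -> Dict[str, bytes]:
    """Read each field by length; skipped bytes are stepped over and never stored."""
    result = {}
    pos = 0
    for f in layout:
        start = pos + f.leading_skip
        end = start + f.length
        pos = end + f.trailing_skip
        if pos > len(data):
            raise ValueError(f"not enough data for field {f.name!r}")
        result[f.name] = data[start:end]
    return result

# Hypothetical layout: a 2-byte hole before "code" and a 3-byte hole after "name".
layout = [Field("id", 4), Field("code", 2, leading_skip=2), Field("name", 5, trailing_skip=3)]
print(parse_with_skips(b"ABCD__XYhello___", layout))
# {'id': b'ABCD', 'code': b'XY', 'name': b'hello'}
```

The holes never show up in the result; they exist only as distances in the layout description.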

Other formats define "elements" with names like "spare" which consume the 
'unused' data.
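For contrast, a sketch of the "spare" style over the same hypothetical bytes: every region, holes included, is an ordinary named field, so the unused data ends up in the parsed result like anything else. The names spare1/spare2 and the layout are hypothetical.

```python
def parse_with_spares(data: bytes, layout: list) -> dict:
    """Split data into consecutive named fields; 'spare' fields are ordinary fields."""
    fields = {}
    pos = 0
    for name, length in layout:  # layout: (name, length in bytes) pairs
        end = pos + length
        if end > len(data):
            raise ValueError(f"not enough data for field {name!r}")
        fields[name] = data[pos:end]  # spares are kept; a consumer may simply ignore them
        pos = end
    return fields

# Hypothetical layout: the holes are explicit elements named spare1 and spare2.
layout = [("id", 4), ("spare1", 2), ("code", 2), ("name", 5), ("spare2", 3)]
print(parse_with_spares(b"ABCD__XYhello___", layout))
```

Both styles consume exactly the same bytes; they differ only in whether the unused areas are described as distances or as elements.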

Other formats tell you the starting location of each field. This style ("offset 
oriented") isn't currently supported directly by DFDL; you have to convert it 
into length information. Direct support for it is planned as a DFDL v2.0 
feature.
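The conversion from offsets to lengths is mechanical when fields don't overlap. A sketch, assuming each field is given as a hypothetical (name, starting offset, length) triple in bytes: sort by offset, turn each gap before a field into a leading skip, and, if the total record length is known, turn any gap after the last field into a trailing skip. Overlapping fields are not handled here and are reported as errors.

```python
from typing import List, NamedTuple, Optional, Tuple

class OffsetField(NamedTuple):
    name: str
    offset: int  # starting byte offset, as an offset-oriented spec would give it
    length: int  # field length in bytes

def offsets_to_skips(fields: List[OffsetField],
                     record_length: Optional[int] = None
                     ) -> Tuple[List[Tuple[str, int, int]], int]:
    """Return ([(name, length, leading_skip), ...], trailing_skip_after_last_field)."""
    converted = []
    pos = 0  # end of the previous field
    for f in sorted(fields, key=lambda fld: fld.offset):
        gap = f.offset - pos
        if gap < 0:
            raise ValueError(f"field {f.name!r} overlaps the previous field")
        converted.append((f.name, f.length, gap))
        pos = f.offset + f.length
    trailing = 0
    if record_length is not None:
        trailing = record_length - pos
        if trailing < 0:
            raise ValueError("fields extend past the record length")
    return converted, trailing

spec = [OffsetField("id", 0, 4), OffsetField("code", 6, 2), OffsetField("name", 8, 5)]
print(offsets_to_skips(spec, record_length=16))
# ([('id', 4, 0), ('code', 2, 2), ('name', 5, 0)], 3)
```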

I would agree with you that I've only ever seen these leadingSkip/trailingSkip 
notions used in binary data formats. But the difference between binary and 
text can be subtle.

An important property for DFDL is an A+B composition rule. If you can 
describe A, and you can describe B, then you can also describe the 
concatenation of A-described data with B-described data. So if A is textual 
and B is binary, then A+B is a blend. One can of course then surround that 
mixture with yet more layers of other formats. I have seen commercial data sets 
that truly looked like a COBOL-style mainframe data record concatenated onto 
the log output of some Perl-based web application. The two were directly 
juxtaposed in the data, and that pattern repeated for every record: each record 
was this composition of two utterly different formats. This was a commercial 
data set you had to pay a supplier of such data for.
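The A+B rule behaves much like parser composition: given a parser for A and a parser for B, running B from exactly where A stopped gives a parser for A+B, and repeating that gives the per-record structure described above. A Python sketch, where both formats are hypothetical stand-ins (a newline-terminated ASCII line for the log text, two big-endian integers for the binary record), not the actual data set:

```python
import struct
from typing import Any, Callable, List, Tuple

# A parser takes (data, position) and returns (value, new position).
Parser = Callable[[bytes, int], Tuple[Any, int]]

def text_line(data: bytes, pos: int) -> Tuple[str, int]:
    """Format A (hypothetical): ASCII text terminated by a newline."""
    end = data.index(b"\n", pos)  # raises ValueError if no terminator is found
    return data[pos:end].decode("ascii"), end + 1

def binary_fields(data: bytes, pos: int) -> Tuple[Tuple[int, int], int]:
    """Format B (hypothetical): big-endian unsigned 32-bit int, then unsigned 16-bit int."""
    fmt = ">IH"
    size = struct.calcsize(fmt)
    if pos + size > len(data):
        raise ValueError("truncated binary part")
    return struct.unpack_from(fmt, data, pos), pos + size

def concat(a: Parser, b: Parser) -> Parser:
    """A+B: parse A, then parse B starting exactly where A stopped."""
    def parse(data: bytes, pos: int) -> Tuple[Any, int]:
        value_a, pos = a(data, pos)
        value_b, pos = b(data, pos)
        return (value_a, value_b), pos
    return parse

def repeat_to_end(p: Parser) -> Callable[[bytes], List[Any]]:
    """Apply p record after record until the data is used up."""
    def parse(data: bytes) -> List[Any]:
        pos, out = 0, []
        while pos < len(data):
            value, pos = p(data, pos)
            out.append(value)
        return out
    return parse

# Each record is text immediately followed by binary; records sit back to back.
records = repeat_to_end(concat(text_line, binary_fields))
data = (b"GET /index\n" + struct.pack(">IH", 42, 7)
        + b"POST /form\n" + struct.pack(">IH", 1000, 2))
print(records(data))  # [('GET /index', (42, 7)), ('POST /form', (1000, 2))]
```

Because concat returns another parser, the combined A+B parser can itself be wrapped in further layers, which is why no text-only or binary-only boundary survives composition.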

The A+B composition property prevents DFDL from segregating the world into 
text, with properties exclusive to text, and binary, with properties exclusive 
to binary data. The world of data formats is much too messy for such clean 
distinctions.
________________________________
From: Costello, Roger L. <[email protected]>
Sent: Monday, April 13, 2020 7:28 AM
To: [email protected] <[email protected]>
Subject: Is leadingSkip and trailingSkip used (exclusively) with binary data 
formats that contain an island of text?

Hi Folks,

I can't imagine a text data format in which the text doesn't start until after 
n bytes (or n bits). That is, I can't see the need for leadingSkip in a text 
data format. Is there a use case for leadingSkip in text data formats? If 
leadingSkip doesn't apply to text data formats, then why do I have to specify 
it?

However, I can imagine an island of text embedded in a binary data format: the 
text doesn't start until after n bytes (or n bits). Yes?

Likewise, I can't imagine a text data format in which there is some text and 
then the next text doesn't start until after n bytes (or n bits). That is, I can't see the need 
for trailingSkip in a text data format. Is there a use case for trailingSkip in 
text data formats? If trailingSkip doesn't apply to text data formats, then why 
do I have to specify it?

However, I can imagine islands of text embedded in a binary data format: there 
is an island of text and then the next island of text doesn't start until after 
n bytes (or n bits). Yes?

/Roger

