The leadingSkip/trailingSkip properties are just about the way some formats are
described.
You are correct that these are about formats with unused "holes" in them.
Sometimes the spec document tells you how long each field is and, where there
are unused areas, the distance to the next element. That's the style that
leadingSkip/trailingSkip are really about.
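As a rough illustration of that style, here is a minimal Python sketch of what
a parser does with such a spec. It is not DFDL itself: in a DFDL schema these
are the dfdl:leadingSkip and dfdl:trailingSkip properties, counted in
dfdl:alignmentUnits (bits or bytes). The Field type, parse_fields, and the
two-field layout are hypothetical and invented for illustration. Everything is
measured in bytes.

```python
from dataclasses import dataclass


@dataclass
class Field:
    """Hypothetical field description: name, length, and unused bytes around it."""
    name: str
    length: int             # bytes of actual content
    leading_skip: int = 0   # unused bytes before the content
    trailing_skip: int = 0  # unused bytes after the content


def parse_fields(data: bytes, fields: list[Field]) -> dict[str, bytes]:
    """Walk the fields in order, jumping over the unused 'holes' around each one."""
    pos = 0
    result: dict[str, bytes] = {}
    for f in fields:
        pos += f.leading_skip                 # skip the hole before the field
        end = pos + f.length
        if end > len(data):
            raise ValueError(f"not enough data for field {f.name!r}")
        result[f.name] = data[pos:end]
        pos = end + f.trailing_skip           # skip the hole after the field
    return result


layout = [
    Field("id", 2, trailing_skip=2),   # 2 content bytes, then 2 unused bytes
    Field("code", 3, leading_skip=1),  # 1 unused byte, then 3 content bytes
]
print(parse_fields(b"\x00\x07\xff\xffXABC", layout))
# {'id': b'\x00\x07', 'code': b'ABC'}
```

The content of the holes (the \xff bytes and the X) is never examined. That is
the whole point of a skip, as opposed to a named "spare" element.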
Other formats define elements with names like "spare" that consume the unused
data.
Still other formats tell you the starting location of each field. This style
("offset oriented") isn't currently supported directly by DFDL; you have to
convert it into length information. Direct support for it is planned as a DFDL
v2.0 feature.
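For fields listed in order, that conversion is mechanical. Each field's leading
skip is the gap between where the previous field ended and where this one
starts. The Python sketch below assumes a hypothetical spec given as (name,
start offset, length) triples in bytes, sorted by offset and non-overlapping.
The function name and sample spec are invented for illustration.

```python
def offsets_to_skips(spec: list[tuple[str, int, int]]) -> list[tuple[str, int, int]]:
    """Turn (name, start_offset, length) into (name, leading_skip, length).

    Assumes the spec is sorted by start offset and fields do not overlap.
    """
    result = []
    pos = 0  # where the previous field ended
    for name, start, length in spec:
        if start < pos:
            raise ValueError(f"field {name!r} overlaps the previous field")
        result.append((name, start - pos, length))  # the gap becomes a leading skip
        pos = start + length
    return result


spec = [("id", 0, 2), ("code", 5, 3), ("flags", 8, 1)]
print(offsets_to_skips(spec))
# [('id', 0, 2), ('code', 3, 3), ('flags', 0, 1)]
```

The same gaps could equally be recorded as trailing skips on the preceding
field, or as explicit "spare" filler elements. Which one reads best depends on
how the spec document phrases things.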
I agree with you that I've only ever seen these leadingSkip/trailingSkip
notions used in binary data formats. But the difference between binary and
text can be subtle.
An important property for DFDL is the A+B composition rule: if you can describe
A and you can describe B, then you can also describe A-described data
concatenated with B-described data. So if A is textual and B is binary, then
A+B is a blend. One can of course then surround that mixture with yet more
layers of other formats. I have seen commercial data sets that truly looked
like a COBOL-style mainframe data record concatenated onto the log output of
some Perl-based web application. The two were directly juxtaposed in the data,
and that pair was repeated for every record, so each record was a composition
of two utterly different formats. And this was a data set you had to pay a
supplier of such data to buy.
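To make the A+B idea concrete, here is a small Python sketch using parser
combinators. It is an analogy, not how DFDL or any DFDL implementation is
built. The record layout is invented: a newline-terminated ASCII line followed
by a big-endian 4-byte signed int and 2-byte unsigned int. The function names
text_line, binary_fields, concat, and repeat are hypothetical.

```python
import struct

# Hypothetical layout for the binary half: big-endian 4-byte signed int, 2-byte unsigned int.
BINARY_FMT = ">iH"


def text_line(data: bytes, pos: int) -> tuple[str, int]:
    """Textual part A: ASCII characters up to a newline (newline consumed)."""
    end = data.index(b"\n", pos)  # raises ValueError if no newline follows
    return data[pos:end].decode("ascii"), end + 1


def binary_fields(data: bytes, pos: int) -> tuple[tuple[int, ...], int]:
    """Binary part B: fixed-size fields unpacked with the struct module."""
    values = struct.unpack_from(BINARY_FMT, data, pos)
    return values, pos + struct.calcsize(BINARY_FMT)


def concat(parse_a, parse_b):
    """A+B: if we can parse A and we can parse B, we can parse A followed by B."""
    def parse(data: bytes, pos: int):
        a, pos = parse_a(data, pos)
        b, pos = parse_b(data, pos)
        return (a, b), pos
    return parse


def repeat(parse_one):
    """Apply a record parser over and over until the data is used up."""
    def parse(data: bytes) -> list:
        pos, out = 0, []
        while pos < len(data):
            item, pos = parse_one(data, pos)
            out.append(item)
        return out
    return parse


record = concat(text_line, binary_fields)  # one text+binary record
data = (b"GET /index\n" + struct.pack(BINARY_FMT, 42, 7)
        + b"POST /x\n" + struct.pack(BINARY_FMT, -1, 65535))
print(repeat(record)(data))
# [('GET /index', (42, 7)), ('POST /x', (-1, 65535))]
```

Note that concat never asks whether its parts are textual or binary. It only
needs each part to report where it stopped.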
The A+B composition property prevents DFDL from dividing the world into text,
with properties exclusive to text, and binary, with properties exclusive to
binary data. The world of data formats is much too messy for such clean
distinctions.
________________________________
From: Costello, Roger L. <[email protected]>
Sent: Monday, April 13, 2020 7:28 AM
To: [email protected] <[email protected]>
Subject: Is leadingSkip and trailingSkip used (exclusively) with binary data
formats that contain an island of text?
Hi Folks,
I can't imagine a text data format in which the text doesn't start until after
n bytes (or n bits). That is, I can't see the need for leadingSkip in a text
data format. Is there a use case for leadingSkip in text data formats? If
leadingSkip doesn't apply to text data formats, then why do I have to specify
it?
However, I can imagine an island of text embedded in a binary data format: the
text doesn't start until after n bytes (or n bits). Yes?
Likewise, I can't imagine a text data format in which there is some text and
then the next text doesn't start until n bytes (or n bits) later. That is, I
can't see the need
for trailingSkip in a text data format. Is there a use case for trailingSkip in
text data formats? If trailingSkip doesn't apply to text data formats, then why
do I have to specify it?
However, I can imagine islands of text embedded in a binary data format: there
is an island of text and then the next island of text doesn't start until after
n bytes (or n bits). Yes?
/Roger