Hi Mike,
Thank you! Very interesting. So the stuff at the end of the message is not
padding; rather, it's terminator characters. Neat!
In my original post, I made a typo. There is no null character in the (text)
input file.
Okay, I tried the approach you recommended:
<xs:element name="label-message">
  <xs:complexType>
    <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="infix">
      <xs:element name="row" maxOccurs="unbounded">
        <xs:complexType>
          <xs:sequence dfdl:separator=":"
                       dfdl:separatorPosition="infix">
            <xs:element name="label" type="xs:string" />
            <xs:element name="message" type="xs:string"
                        dfdl:terminator="%WSP;%NL;"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>
Unfortunately, the XML output just contains the first row:
<label-message>
  <row>
    <label>Dear Sir</label>
    <message> Thank you for your response.</message>
  </row>
</label-message>
Is my DFDL schema faithfully implementing your suggestion? If yes, does this
mean there is a bug in Daffodil?
/Roger
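For reference when comparing the schema above with the suggestion quoted below: the two differ in the terminator (%WSP;%NL; versus %WSP*;%NUL;) and in whether dfdl:lengthKind="delimited" is set on message. In DFDL, %WSP; matches exactly one whitespace character, %WSP*; matches zero or more, and %WSP+; matches one or more. A minimal sketch of a closer adaptation, swapping only %NUL; for %NL;, might look like the fragment below. This is an untested assumption about what the data needs, not a confirmed fix. Because the terminator now ends in a newline, it competes with the enclosing sequence's %NL; separator, so whether the rows still split correctly depends on the actual input.

```xml
<!-- Sketch only: the suggestion quoted below, with %NUL; swapped for %NL;.
     Assumes encoding and other required properties come from a default
     format that is not shown here. -->
<xs:element name="message" type="xs:string"
            dfdl:lengthKind="delimited"
            dfdl:terminator="%WSP*;%NL;"/>
```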
From: Beckerle, Mike <[email protected]>
Sent: Tuesday, January 8, 2019 9:33 AM
To: [email protected]
Subject: [EXT] Re: How to trim a string of whitespace padding?
Hmmmm. You forgot to mention....
* Rows are separated by exactly one newline.
* Within a row, there is a label and message, separated by a colon.
* The message is a string padded by zero or more whitespace characters
(newlines, spaces).
The message is NUL terminated. This is what makes it unambiguous.
Padding isn't going to do what you need here. It's not a way to edit the string
content.
I think lengthKind 'pattern' here isn't helping you anymore.
What about
<element name="message" dfdl:lengthKind="delimited"
dfdl:terminator="%WSP*;%NUL;" .../>
This is terminated by any amount of whitespace followed by a NUL.
This *should* work. I think.
I am a little worried about a possible bug.....that the stack of delimiters
Daffodil sets up is going to have the surrounding sequence's NL separator on
it, and that when scanning for the terminating delimiter for message, Daffodil
is going to stop at the first NL and claim that's the end of the row. This
would be a bug. This shouldn't be the case. The fact that message has its own
terminator should take the separator out of scope until the message element has
been isolated. But I'm a little worried that won't happen.
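
To make the suggestion concrete, here is a sketch of the message element placed inside the row sequence from the original post. The type attribute and the dfdl:lengthKind on label are assumptions filled in for illustration; other required properties (encoding, representation, and so on) are assumed to come from a default format that is not shown.

```xml
<xs:element name="row" maxOccurs="unbounded">
  <xs:complexType>
    <xs:sequence dfdl:separator=":" dfdl:separatorPosition="infix">
      <!-- label: assumed to be delimited by the ":" separator -->
      <xs:element name="label" type="xs:string"
                  dfdl:lengthKind="delimited"/>
      <!-- message: ends at zero or more whitespace characters
           followed by a NUL -->
      <xs:element name="message" type="xs:string"
                  dfdl:lengthKind="delimited"
                  dfdl:terminator="%WSP*;%NUL;"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```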
________________________________
From: Costello, Roger L. <[email protected]>
Sent: Monday, January 7, 2019 8:43:58 AM
To: [email protected]
Subject: How to trim a string of whitespace padding?
Hello DFDL community,
My input looks like this:
[inline image showing the input; not preserved in this text]
One way to characterize the input is this:
* Rows are separated by exactly one newline.
* Within a row, there is a label and message, separated by a colon.
* The message is a string padded by zero or more whitespace characters
(newlines, spaces).
I tried this DFDL schema:
<xs:element name="label-message">
  <xs:complexType>
    <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="infix">
      <xs:element name="row" maxOccurs="unbounded">
        <xs:complexType>
          <xs:sequence dfdl:separator=":"
                       dfdl:separatorPosition="infix">
            <xs:element name="label" type="xs:string" />
            <xs:element name="message" type="xs:string"
                        dfdl:lengthKind="pattern"
                        dfdl:lengthPattern="[\x20-\x7F]+?(?=(\x0D\x0A|\x0A)([\x20-\x7F]|$))"
                        dfdl:representation="text"
                        dfdl:encoding="ASCII"
                        dfdl:textTrimKind="padChar"
                        dfdl:textStringPadCharacter="%WSP;"
                        dfdl:textStringJustification="left"
                        dfdl:terminator="%NUL;"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>
Running that I discovered that %WSP; can't be a padding character because it's
not a single character.
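For comparison, here is a sketch of the same trim properties with a single-character pad value. %SP; is the DFDL entity for the space character; using it here is an assumption for illustration. It would trim only trailing spaces, not newlines, so it does not cover the full characterization above.

```xml
<!-- Sketch only: lengthKind and the other properties from the schema
     above are omitted. With left justification, pad characters are
     trimmed from the right-hand end of the string when parsing. -->
<xs:element name="message" type="xs:string"
            dfdl:textTrimKind="padChar"
            dfdl:textStringPadCharacter="%SP;"
            dfdl:textStringJustification="left"/>
```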
Is there a way to express this characterization in DFDL?
Note: I know that there are other ways to characterize the input, but I want to
see if it's possible to express this characterization in DFDL.
/Roger