Ah, I get it. My suggestion can't work because it only trims away characters 
that match the pad char, and you want it to simply skip the junk on parse and 
fill with 0x00 on unparse.


How about this:


<element name="fieldName" dfdl:length="32" dfdl:lengthKind="explicit"
   dfdl:fillByte="%#r00;">
   <complexType>
       <sequence>
           <element name="value" type="xs:string"
                  dfdl:lengthKind="pattern"
                  dfdl:lengthPattern="[^\x00]{0,31}(?=\x00)|.{32}" />
       </sequence>
   </complexType>
</element>


This requires you to model this seemingly simple type string as a complex 
element, but that's a common thing in DFDL when a seemingly simple thing has a 
subtle/complex representation. The shape of the DFDL schema must be rich enough 
to express the complexity of the representation. You have to model what the 
data is, not what you want it to appear like.
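
As a rough sanity check outside DFDL, the regular expression 
`[^\x00]{0,31}(?=\x00)|.{32}` can be tried against the sample bytes from this 
thread. The sketch below uses Python's `re` module, on the assumption that it 
treats these constructs the same way the regex engine behind a DFDL processor 
does. The function name `parse_field_name` is made up for illustration.

```python
import re

# Text up to (not including) the first NUL, or all 32 bytes when the
# field contains no NUL at all.
# Assumption: re.DOTALL is added so "." would also match a newline byte.
FIELD_NAME = re.compile(rb"[^\x00]{0,31}(?=\x00)|.{32}", re.DOTALL)


def parse_field_name(field: bytes) -> str:
    """Return the name stored in a 32-byte, zero-filled field (hypothetical helper)."""
    match = FIELD_NAME.match(field)
    if match is None:
        raise ValueError("field does not match the length pattern")
    return match.group(0).decode("ascii")


# The 32 bytes from the sample dBase file quoted in this thread.
sample = bytes.fromhex(
    "6D 61 72 6B 65 72 2D 63 6F 6C 00 43 00 00 00 00 "
    "FE 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00"
)

print(parse_field_name(sample))     # marker-col
print(parse_field_name(b"x" * 32))  # a name that fills the whole field
```

The first alternative stops at the first NUL, so the junk after it never 
reaches the value. The second alternative only applies when no NUL is present 
in the 32 bytes.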



________________________________
From: Costello, Roger L. <[email protected]>
Sent: Monday, October 1, 2018 11:48:55 AM
To: [email protected]
Subject: RE: How to declare a string element where the string stops at the 
first null (hex 0) symbol?


Hi Mike,



Thank you! That is very helpful.



I added all the things you suggested. See below. Unfortunately, that just 
resulted in stripping off the rightmost null (hex 0) symbols, leaving this:



<field-name>marker-col?C????xFE</field-name>



Someone on StackOverflow says that C indicates “character field” and xFE 
indicates 254 bytes. I’m not sure that that is true, however.



What I desire (I think) is this:



<field-name>marker-col</field-name>



Suggestions?



<xs:element name="field-name"
            type="xs:string"
            dfdl:length="32"
            dfdl:lengthKind="explicit"
            dfdl:lengthUnits="characters"
            dfdl:textTrimKind="padChar"
            dfdl:textStringPadCharacter="%NUL;"
            dfdl:textStringJustification="left"/>





From: Mike Beckerle <[email protected]>
Sent: Monday, October 1, 2018 11:02 AM
To: [email protected]
Subject: Re: How to declare a string element where the string stops at the 
first null (hex 0) symbol?



Hi Roger,



Looks like you are looking to create a 32-byte long element with NUL "padding".



Question: Is there always at least one NUL at the end, or can a field name use 
up all 32 bytes with non-NUL characters? I'm going to guess here (because it's 
more common in data I've seen), that a field name that occupies all 32 bytes 
would not have a NUL at all.



In that case this is fixed-length data (dfdl:lengthKind="explicit"), and the 
properties that do what you want are for "padding/trimming", in section 13.2 
and, as this is a string element (not a number or boolean), section 13.4.



textTrimKind (used for parsing)

textPadKind (used for unparsing)



textStringJustification (which side the text is padded/trimmed on, or "center" 
justified)

textStringPadCharacter="%NUL;" (note: must use DFDL Entity to represent this.)

truncateSpecifiedLengthString (if string is too long on unparse - chop it, or 
is it an error?)



These names seem bulky, but they let a single format have left-justified text 
strings and right-justified text numbers at the same time, which is common 
because different element types often need different justification directions.



A note when you are testing: the DFDL spec requires that the padding/filling 
area after the data gets filled with the pad character. So data like in your 
example will not "round trip", as it won't preserve the junk that is there.



If you create a TDML test, you will need to set roundTrip="twoPass" to get it 
to compare the infoset after re-parsing the data it unparsed.
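
To illustrate why a byte-for-byte comparison fails while a two-pass comparison 
succeeds, here is a small Python model of the intended behaviour (skip the junk 
after the first NUL on parse, zero-fill on unparse). It is only a sketch of the 
idea, not how any DFDL processor is implemented, and the names `parse` and 
`unparse` are made up for illustration.

```python
FIELD_LENGTH = 32


def parse(field: bytes) -> bytes:
    """Keep the bytes before the first NUL (the whole field if there is none)."""
    return field.split(b"\x00", 1)[0]


def unparse(value: bytes) -> bytes:
    """Write the value left-justified and zero-fill the rest of the field."""
    if len(value) > FIELD_LENGTH:
        raise ValueError("value does not fit in the field")
    return value.ljust(FIELD_LENGTH, b"\x00")


# The 32 bytes from the sample dBase file quoted in this thread.
original = bytes.fromhex(
    "6D 61 72 6B 65 72 2D 63 6F 6C 00 43 00 00 00 00 "
    "FE 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00"
)

infoset_1 = parse(original)     # first pass: parse the original data
rewritten = unparse(infoset_1)  # unparse: the junk bytes are gone
infoset_2 = parse(rewritten)    # second pass: re-parse the unparsed data

print(rewritten == original)    # False
print(infoset_1 == infoset_2)   # True
```

The unparsed bytes differ from the original because the 0x43 and 0xFE bytes 
are replaced by zeros, but both parses yield the same value.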





________________________________

From: Costello, Roger L. <[email protected]>
Sent: Monday, October 1, 2018 10:27 AM
To: [email protected]
Subject: How to declare a string element where the string stops at the first 
null (hex 0) symbol?



Hi Folks,

I am working on a DFDL schema for parsing dBase files.

One of its fields is "Field Name". The dBase specification says this about that 
field:

Field name in ASCII, zero-filled, 32 bytes.

I have a sample dBase file with this hex value for field name:

6D 61 72 6B 65 72 2D 63 6F 6C 00 43 00 00 00 00
FE 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

My hex editor also displays the hex as text:

marker-col.C....þ...............

I believe the actual field name is "marker-col" and the rest is garbage. (I 
have this belief because I have a dBase tool and it displays "marker-col")

How do I declare, in DFDL, that the element's value is, "The text up to, but 
not including, the first null (hex 0) symbol; discard the null symbol and all 
the following hex digits"?

/Roger
