Hello,
This issue was originally posted on Stack Overflow here:
https://stackoverflow.com/questions/58168427/dfdl-decoding-of-enumerated-binary-data
Since then, I've realised that this mailing list is probably the better
audience :-)
Here is my original post (with some edits to keep things a bit shorter):
I'm currently working on a DFDL schema for a legacy (custom) binary file format
used in a system, so that it can be translated to either XML or JSON. Some of
the binary data consists of enumerated values, i.e. the C data type looks like
this (and is stored as a byte):
typedef enum _SomeEnum
{
ENUM_1 = 0x00,
ENUM_2 = 0x01,
ENUM_3 = 0x02
} SomeEnum;
I can decode the enumeration to a numerical value just fine with this DFDL
schema code (including checks for speculative parsing):
<xs:element name="SomeEnum" type="xs:unsignedByte">
<xs:annotation>
<xs:appinfo source="http://www.ogf.org/dfdl/">
<dfdl:assert><![CDATA[{ . lt 3 }]]></dfdl:assert>
</xs:appinfo>
</xs:annotation>
</xs:element>
which translates to this XML with the enum field equal to 1 in this instance:
<SomeEnum>1</SomeEnum>
What I would like is to have the ability to translate the decoded enumeration
value to a string so that the XML result looks like this:
<SomeEnum>ENUM_2</SomeEnum>
Brandon Sloane (Daffodil dev) then responded to the post (also edited, just to
highlight the preferred solution):
The newest release of Daffodil (2.4.0) includes a DFDL extension designed
specifically for this problem. Some documentation is available on the Daffodil
wiki:
https://cwiki.apache.org/confluence/display/DAFFODIL/Proposal%3A+Feature+to+support+enumerations+and+typeValueCalc
The theory here is that you can define a simple type that is a restriction on
xs:string as an xsd enumeration; then supply the corresponding binary values as
a DFDL annotation:
<xs:simpleType name="uint8" dfdl:length="1">
<xs:restriction base="xs:unsignedInt"/>
</xs:simpleType>
<xs:simpleType name="SomeEnumType" dfdlx:repType="tns:uint8">
<xs:restriction base="xs:string">
<xs:enumeration value="ENUM_1" dfdlx:repValues="0" />
<xs:enumeration value="ENUM_2" dfdlx:repValues="1" />
<xs:enumeration value="ENUM_3" dfdlx:repValues="2" />
</xs:restriction>
</xs:simpleType>
<xs:element name="SomeEnum" type="tns:SomeEnumType" />
The benefit here is that the schema is much more maintainable, and Daffodil
will perform the lookup using a direct hash-table lookup, instead of needing to
walk through an if-else tree.
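For comparison, the if-else approach mentioned above can be written in plain
DFDL 1.0 without any extension. The following is only a rough sketch under a
few assumptions: the names SomeRecord, SomeEnumRaw and SomeEnumRawGroup are
made up for illustration, the xs, dfdl and tns prefixes are assumed to be bound
as in the other snippets, and only parsing is considered (for unparsing, the
hidden element would presumably also need a dfdl:outputValueCalc).

```xml
<!-- Hypothetical hidden group holding the raw one-byte value from the file -->
<xs:group name="SomeEnumRawGroup">
  <xs:sequence>
    <xs:element name="SomeEnumRaw" type="xs:unsignedByte">
      <xs:annotation>
        <xs:appinfo source="http://www.ogf.org/dfdl/">
          <!-- Same range check as in the original element -->
          <dfdl:assert><![CDATA[{ . lt 3 }]]></dfdl:assert>
        </xs:appinfo>
      </xs:annotation>
    </xs:element>
  </xs:sequence>
</xs:group>

<!-- Hypothetical enclosing element; only SomeEnum appears in the infoset -->
<xs:element name="SomeRecord">
  <xs:complexType>
    <xs:sequence>
      <!-- Parses SomeEnumRaw but keeps it out of the XML/JSON result -->
      <xs:sequence dfdl:hiddenGroupRef="tns:SomeEnumRawGroup" />
      <!-- Computed string; the final else is only reached for 2
           because the assert above rejects values of 3 or more -->
      <xs:element name="SomeEnum" type="xs:string"
                  dfdl:inputValueCalc="{
                    if (../SomeEnumRaw eq 0) then 'ENUM_1'
                    else if (../SomeEnumRaw eq 1) then 'ENUM_2'
                    else 'ENUM_3' }" />
    </xs:sequence>
  </xs:complexType>
</xs:element>
```

Every additional enumeration value adds another branch to the expression, which
is the maintainability cost the repType/repValues extension is meant to avoid.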
I then ran into some issues with the above recommendation:
Daffodil produces the following error for the above schema:
[error] Schema Definition Error: When lengthKind='implicit', both minLength and
maxLength facets must be specified.
When I add the xs:minLength and xs:maxLength facets, the parser complains that
they need to be the same value. When I set them to the same value, the parser
crashes. I'm not sure what these need to be.
I found this JIRA issue:
https://issues.apache.org/jira/browse/DAFFODIL-2146
It uses the inputTypeCalcString function in an inputValueCalc expression, but
that just throws an error saying that inputTypeCalcString is an unsupported
function. It seems these functions are deprecated in version 2.4.0, even though
the fix version for the issue is 2.4.0.
What I have realised is that the translation from one type to another only
works if both types have exactly the same length. So this works:
<xs:simpleType name="SomeEnumType" dfdlx:repType="xs:unsignedByte">
<xs:restriction base="xs:unsignedByte">
<xs:enumeration value="55" dfdlx:repValues="0" />
<xs:enumeration value="56" dfdlx:repValues="1" />
<xs:enumeration value="57" dfdlx:repValues="2" />
</xs:restriction>
</xs:simpleType>
The value 0 is translated to 55, 1 to 56 and 2 to 57. But the moment I change
the base of the translated type to something else, Daffodil rejects it, e.g.:
<xs:simpleType name="SomeEnumType" dfdlx:repType="xs:unsignedByte">
<xs:restriction base="xs:unsignedShort">
<xs:enumeration value="55" dfdlx:repValues="0" />
<xs:enumeration value="56" dfdlx:repValues="1" />
<xs:enumeration value="57" dfdlx:repValues="2" />
</xs:restriction>
</xs:simpleType>
It complains that shorts are 16 bits in length (whereas the repType is 8 bits).
Any ideas/help?
Thanks
Pirow Engelbrecht | Senior Design Engineer