Hi Roger,

Your description sounds fine, although there are some subtleties to consider when choosing the schema type. The schema type "unsignedInt" implies a 32-bit unsigned integer, but you specified a length of 1 byte (8 bits) for the binary number, which suggests it should be "unsignedByte". The type also matters for the text number, because it constrains the range of values the text number can have as well. My point is that the schema type's range shouldn't be smaller than the binary or text number's range, but you can use a bigger type if you prefer to constrain the number explicitly. Still, picking a matching type lets you constrain the number's length and range of values implicitly, without needing explicit Daffodil properties.
John

From: Roger L Costello <coste...@mitre.org>
Sent: Tuesday, November 16, 2021 3:02 PM
To: users@daffodil.apache.org
Subject: EXT: Understanding text numbers versus binary numbers

Hi Folks,

I am writing a description of text numbers versus binary numbers. Below is what I wrote. Does it make sense? I feel like I kind of ran out of steam at the end and probably ended too abruptly. What do you think? Should I add more to the end? Any comments you have would be appreciated.

/Roger

Subject: Understanding text numbers versus binary numbers

I have two files. Each contains one number, the number 30, which represents the number of students in a classroom.

I opened the first file using a hex editor. Here's what the hex editor displayed:

[Image: hex editor view of the first file, showing the bytes 33 30]

The number 30 is stored in the first file as two ASCII characters: the character '3' (hex 33 in the ASCII encoding), followed by the character '0' (hex 30 in the ASCII encoding). The number in the first file is stored as a text number.

Then I opened the second file using the hex editor, and here's what I saw:

[Image: hex editor view of the second file, showing the single byte 1E]

The number 30 is stored in the second file as a binary, 2's complement number. Hex 1E corresponds to the bit sequence 0001 1110, which has the value 2^4 + 2^3 + 2^2 + 2^1 = 16 + 8 + 4 + 2 = 30. The number in the second file is stored as a binary number.

Suppose that you are designing a new binary data format, say, a new image format. The format will contain image data plus metadata about the image, such as its size, date of creation, and so forth. The metadata contains several numbers (such as the size). Should the data format require that the numbers be stored as text numbers or as binary numbers?
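As a rough illustration of the two files described above, this Python sketch (standard library only, assuming Python 3.8+ for `bytes.hex` with a separator; it is not part of DFDL or Daffodil) builds both byte layouts for the value 30 and shows the place-value arithmetic behind hex 1E:

```python
import struct

num_students = 30

# Text number: the decimal digits '3' and '0' stored as ASCII characters.
text_bytes = str(num_students).encode("ascii")

# Binary number: the value stored in a single unsigned byte
# (struct format "B" = unsigned char, 1 byte).
binary_bytes = struct.pack("B", num_students)

print(text_bytes.hex(" "))          # 33 30
print(binary_bytes.hex(" "))        # 1e
print(format(num_students, "08b"))  # 00011110

# Place values of the 1 bits in 0001 1110.
print(2**4 + 2**3 + 2**2 + 2**1)    # 30
```

Python prints hex digits in lowercase, so `1e` is the same byte the hex editor shows as 1E.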
There are some advantages to storing the numbers as text numbers:

* They can be easily read by humans.
* "Numbers represented by characters eliminate problems caused by word length and machine internal representation differences." [1]

Binary numbers have an advantage:

* They might use fewer bytes. (In the above example, the text number used two bytes, whereas the binary number used one byte.)

DFDL (Data Format Description Language) can be used to specify both kinds of numbers (text numbers and binary numbers). DFDL builds on top of XML Schema (XML Schema hosts DFDL). Using DFDL, we specify the number by giving it a name (NumStudents) and a type (xs:unsignedInt) using an XML Schema element declaration:

    <xs:element name="NumStudents" type="xs:unsignedInt"/>

That specifies that instances of the data format contain an unsigned integer, but it does not specify whether the unsigned integer is stored as a text number or a binary number. The properties of a text number are different from the properties of a binary number. To specify that the number is stored as a text number, we use these "text number properties":

    <xs:element name="NumStudents" type="xs:unsignedInt"
                dfdl:representation="text"
                dfdl:encoding="ASCII"
                dfdl:textNumberPattern="#"
                dfdl:textNumberRep="standard"
                dfdl:textStandardBase="10"
                dfdl:textStandardExponentRep="E"
                dfdl:textStandardZeroRep="0" />

To specify that the number is stored as a binary number, we use these "binary number properties":

    <xs:element name="NumStudents" type="xs:unsignedInt"
                dfdl:representation="binary"
                dfdl:length="1"
                dfdl:lengthKind="explicit"
                dfdl:lengthUnits="bytes"
                dfdl:byteOrder="littleEndian"
                dfdl:binaryNumberRep="binary"/>

[1] Quote from section 5.1.7 of the NITF specification.
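To make the difference between the two declarations concrete, here is a minimal Python sketch of what a parser does with each representation. It assumes only the simple cases in those declarations (ASCII base-10 digits for the text number; one unsigned little-endian byte for the binary number). The function and constant names are hypothetical, and this is not how Daffodil is implemented. The last lines illustrate John's point about ranges: a one-byte field tops out at 255, the maximum of xs:unsignedByte, far below the xs:unsignedInt maximum.

```python
# Hypothetical helpers that mimic, in simplified form, the two DFDL
# declarations. An illustration only, not Daffodil's implementation.

# Value ranges of the XML Schema types mentioned in the thread.
UNSIGNED_BYTE_MAX = 2**8 - 1   # xs:unsignedByte: 0..255
UNSIGNED_INT_MAX = 2**32 - 1   # xs:unsignedInt: 0..4294967295


def parse_text_number(data: bytes) -> int:
    """Text number: ASCII characters holding base-10 digits."""
    text = data.decode("ascii")  # raises UnicodeDecodeError on non-ASCII bytes
    if not text.isdigit():       # rejects "", signs, and non-digit characters
        raise ValueError(f"not an unsigned decimal number: {text!r}")
    return int(text, 10)


def parse_binary_number(data: bytes) -> int:
    """Binary number: the bytes read as an unsigned little-endian integer."""
    return int.from_bytes(data, byteorder="little", signed=False)


print(parse_text_number(b"30"))      # 30 (two bytes: 33 30)
print(parse_binary_number(b"\x1e"))  # 30 (one byte: 1E)

# One byte can never exceed 255, so it covers exactly the xs:unsignedByte
# range and only a small part of the xs:unsignedInt range.
largest = parse_binary_number(b"\xff")
print(largest, largest == UNSIGNED_BYTE_MAX)  # 255 True
```

With a single byte, byte order makes no difference; it only starts to matter once the binary number is two or more bytes long.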