Thank you Steve!
Your last suggestion is very intriguing:
It might make sense to use separate elements for each type of identifier. For
example, maybe something like this would be more useful:
<Storage_Class>0</Storage_Class>
<Storage_Class_Desc_Short>
IMAGE_SYM_CLASS_NULL
</Storage_Class_Desc_Short>
<Storage_Class_Desc_Long>
No assigned storage class
</Storage_Class_Desc_Long>
How would that be implemented? One field in the input file results in 3
elements in the XML output. I didn't know that was possible. How do I do that?
/Roger
-----Original Message-----
From: Steve Lawrence <[email protected]>
Sent: Monday, February 4, 2019 9:52 AM
To: [email protected]; Costello, Roger L. <[email protected]>
Subject: [EXT] Re: Best practice for replacing a numeric value with a symbolic
value?
I think it really depends on who will use the XML infoset and how they plan to
use it. If everyone in the world knows that 0 means IMAGE_SYM_CLASS_NULL, then
maybe you don't need the converted value or description. Or if no one would
ever know what 0 means maybe it makes sense to only have the description. We've
come across formats where people actually do care about the raw value because
that's what they know and what they are used to, but the converted value is more
useful in certain calculations, so we end up including both the raw and logical
values.
Some things to consider:
- Certain field types may be easier to filter on than others. For example,
numeric values can be compared in ranges. Maybe someone only cares about fields
greater than 2--maintaining the numeric values helps with that.
- Sometimes multiple numeric values map to a single logical value. For example,
maybe 0-4 have unique meanings, but 5-15 just mean "ILLEGAL_VALUE". If you hide
the numeric value, unparsing might be difficult because you've lost that
information--you now need to use outputValueCalc to determine which numeric
value to unparse when the logical value is ILLEGAL_VALUE. Maybe there's an
obvious answer, but maybe not.
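To make the many-to-one problem concrete, here is a minimal sketch in Python
(not DFDL). The mapping is the hypothetical one from the bullet above: 0-4 get
unique names and 5-15 all become ILLEGAL_VALUE. The MEANING_n names and the
function names are made up for illustration only.

```python
# Hypothetical mapping: 0-4 have unique names, 5-15 all mean ILLEGAL_VALUE.
NUMERIC_TO_LOGICAL = {n: f"MEANING_{n}" for n in range(5)}
NUMERIC_TO_LOGICAL.update({n: "ILLEGAL_VALUE" for n in range(5, 16)})


def parse(numeric):
    """Numeric value from the data -> logical value for the infoset."""
    return NUMERIC_TO_LOGICAL[numeric]


def candidates(logical):
    """Every numeric value that parses to the given logical value."""
    return sorted(n for n, name in NUMERIC_TO_LOGICAL.items() if name == logical)


def unparse(logical):
    """Logical value -> numeric value; fails when the answer is ambiguous."""
    found = candidates(logical)
    if len(found) != 1:
        raise ValueError(f"{logical!r} maps back to {len(found)} values: {found}")
    return found[0]
```

Round-tripping works for 0-4, but unparse("ILLEGAL_VALUE") has eleven
candidates and no way to pick one unless the numeric value was kept somewhere.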
It might make sense to use separate elements for each type of identifier. For
example, maybe something like this would be more useful:
<Storage_Class>0</Storage_Class>
<Storage_Class_Desc_Short>
IMAGE_SYM_CLASS_NULL
</Storage_Class_Desc_Short>
<Storage_Class_Desc_Long>
No assigned storage class
</Storage_Class_Desc_Long>
The benefit to this is that a user could query and extract exactly what they
want without having to do any string processing.
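As a rough illustration of that benefit, here is a small Python sketch that
reads each piece directly by element name. The <Symbol> wrapper element and the
field() helper are assumptions made for the example; only the three child
elements come from the layout above.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper element <Symbol>; the three children follow the layout above.
XML = """
<Symbol>
  <Storage_Class>0</Storage_Class>
  <Storage_Class_Desc_Short>
    IMAGE_SYM_CLASS_NULL
  </Storage_Class_Desc_Short>
  <Storage_Class_Desc_Long>
    No assigned storage class
  </Storage_Class_Desc_Long>
</Symbol>
"""

symbol = ET.fromstring(XML)


def field(name):
    """Text of the named child element, with surrounding whitespace removed."""
    return symbol.findtext(name).strip()


storage_class = int(field("Storage_Class"))      # numeric, so range filters work
short_desc = field("Storage_Class_Desc_Short")   # symbolic name
long_desc = field("Storage_Class_Desc_Long")     # human-readable description
```

With a combined value such as "IMAGE_SYM_CLASS_NULL (0) No assigned storage
class", the same three lookups would each need custom string splitting.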
- Steve
On 2/4/19 8:03 AM, Costello, Roger L. wrote:
> Hello DFDL community,
>
> I am working on a DFDL schema for Windows EXE files.
>
> There are many places in my DFDL schema where I replace a numeric value with
> a symbolic value.
>
> For example, there is one field called "Storage Class". The specification
> enumerates a couple dozen numeric values for this field and what each value
> means:
>
> IMAGE_SYM_CLASS_NULL (0) No assigned storage class
>
> IMAGE_SYM_CLASS_AUTOMATIC (1) The automatic (stack) variable. The Value field
> specifies the stack frame offset.
>
> IMAGE_SYM_CLASS_EXTERNAL (2) The Value field indicates the size if the
> section number is IMAGE_SYM_UNDEFINED (0). If the section number is not zero,
> then the Value field specifies the offset within the section.
>
> IMAGE_SYM_CLASS_STATIC (3) The offset of the symbol within the section. If
> the Value field is zero, then the symbol represents a section name.
> ...
>
> Do you have a recommendation for the generated XML? Which of the following is
> best practice?
>
> (a) <Storage_Class>0</Storage_Class>
>
> (b) <Storage_Class>IMAGE_SYM_CLASS_NULL</Storage_Class>
>
> (c) <Storage_Class>IMAGE_SYM_CLASS_NULL (0)</Storage_Class>
>
> (d) <Storage_Class>No assigned storage class</Storage_Class>
>
> (e) <Storage_Class>IMAGE_SYM_CLASS_NULL (0) No assigned storage
> class</Storage_Class>
>
> (f) Other (what?)
>
> I am eager to hear your thoughts on this.
>
> /Roger
>