On Jun 19, 2012, at 12:01 PM, Jakub Zawadzki wrote:
> Hi,
>
> String from tvb_get_ephemeral_string() still needs escaping with
> format_text(),
> cause it doesn't check encoding.
>
> When you use:
> tvb_get_ephemeral_string_enc(tvb, offset, length, ENC_UTF_8 | ENC_NA);
>
> It guarantees result encoded in UTF-8:
> * string as converted from the appropriate encoding to UTF-8 ...
>
> (Code to do it is still in XXX's but this is bug in libwireshark and no one
> can blame you that you used wrong function :))
Or, rather, there is no code to do it and there's an XXX comment noting that:
case ENC_UTF_8:
/*
* XXX - should map all invalid UTF-8 sequences
* to a "substitute" UTF-8 character.
*/
strbuf = tvb_get_ephemeral_string(tvb, offset, length);
break;
*That's* why escaping is still needed - we don't handle malformed UTF-8 strings.
If we were to:
store strings in the protocol tree as a combination of an encoding
value and a raw bucket of bytes;
in the display filter code:
implement comparison of string fields with text strings by
attempting to convert the string field value to UTF-8 at that point and having
all comparisons other than != fail and have != succeed if the string can't be
converted;
also support comparison of string fields with *byte* strings,
e.g. a sequence of hex-encoded octets, and have that compare the raw bucket of
bytes;
have the "construct a filter from this field" UI generate, for string
fields, a comparison with a UTF-8 string if the value can be converted to UTF-8
and a comparison with a byte string if it can't;
in the code that generates displayable text for the protocol tree,
replace anything in the string that can't be mapped to UTF-8 to a "substitute"
UTF-8 character, including mapping invalid UTF-8 sequences if the string's
encoding is UTF-8;
do similar stuff if a string field's value is used to generate anything
in the packet summary.
that might handle this, and might also defer conversion to UTF-8 until it's
absolutely necessary, which might cut CPU usage.
(A further optimization might be to, in the display filter code implement
comparison of string fields with text strings by attempting to convert the
*text string* to the field's encoding at that point, with the result cached so
you only have to do it once, and having all comparisons other than != fail and
have != succeed if the string can't be converted.)
As for PDML:
http://www.nbee.org/doku.php?id=netpdl:visualization_expression#data_associated_to_each_netpdl_field
says
value (always present)
It contains the field value as an hex string. In other words,
the content of the value attribute in case of a 16 bit field whose value is
0800 (hex) is the string 0800. This value is exactly the same as written in the
hex file, i.e. it does not change in case of little-endian or big-endian
fields. Please note that for bitmasked fields, this value corresponds to the
'unmasked' value (i.e. the real field value must be derived from a bitwise AND
operation between the mask and the value).
showvalue(always present)
It contains the field value in a “printable” form. For
instance, field value ffffffffffff (related to a MAC address) can be associated
to value ffffff-ffffff and assigned to this attribute.
which seems to indicate that, for a string field, "value" should be the raw hex
bytes of the string (except perhaps for byte-swapping in the case of UCS-2,
UCS-4, or UTF-16), *NOT* anything that looks like the text, and suggests to me
that "showvalue" would be the same as what's generated by the code that
generates displayable text for the protocol tree, i.e. it'd have invalid UTF-8
sequences replaced by a "substitute" character.
However, I don't know whether that's what programs out there that read PDML
would expect - and it doesn't include the encoding for the string, so that code
wouldn't know what to do with the string's value.
___________________________________________________________________________
Sent via: Wireshark-dev mailing list <[email protected]>
Archives: http://www.wireshark.org/lists/wireshark-dev
Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
mailto:[email protected]?subject=unsubscribe