Re: [Wireshark-dev] No tvb_get for string-encoded numbers?

Hadriel Kaplan Fri, 04 Apr 2014 14:04:31 -0700

On Apr 4, 2014, at 4:04 PM, Guy Harris <[email protected]> wrote:

> On Apr 4, 2014, at 7:30 AM, Hadriel Kaplan <[email protected]> wrote:
> 
>> I might be overlooking something, but I don’t see a tvb_get_* function to 
>> get a uint8/16/32/64 that was encoded as a ascii or utf-8 string in the 
>> packet. Is there such a thing?
> 
> No.
> I've occasionally also thought there should be such a routine.


I've started coding it today but my real day-job is getting in the way. :)


> Note, though, that, whilst tvb_get_guint8() and tvb_get_{n,le}tohXXX() can 
> never fail, because every possible sequence of octets is a valid 2's 
> complement integral value, routines to get a number encoded as a string *can* 
> fail, e.g. 0123xyzw is not a valid number in bases 8, 10, or 16.

I'm thinking the tvb_get_* routines should just execute strtol() internally 
(and the other flavors depending on number type)... and thus return 0 if it 
couldn't be read. And it would internally handle out-of-range errors, but still 
return 0.

But I'm also thinking the tvb_get_* would take the same optional char** endptr 
param that strtol() takes, so it can pass back the ending offset - that way you 
could catch things like "0123xyzw", because if I recall correctly it actually 
*is* successfully converted to a number: the long int 123 for the "0123" 
portion. (though I could be mis-remembering)

Fwiw, looking at some of the consumers of such a routine, they don't seem 
overly concerned with errors currently when they call atoi/strtol/etc.

Also, I'm only planning to do this for base 10. A proto needing something else 
can do it the long way.


> There are other cases where a tvb_get_ routine can return "you lose", e.g. 
> tvb_get_string_enc() can fail if there are invalid octet sequences (about the 
> only encodings I know of where *every* octet sequence is a valid string are 
> some of the ISO 8859-n encodings), and at least some floating-point formats 
> probably have invalid values (I guess an IEEE NaN is "valid", at least to the 
> extent that if we try to format it it'll show up as "NaN", but if we try to 
> do calculations with it we might get a floating-point exception.

As an aside, I'm *only* thinking of having this for ASCII strings; anything 
else needs to do it the long way.

For protocols which are actually truly UTF-8, I'm planning to just assume 
treating them as ASCII is ok, because as far as I know the atoi/strtol/etc. 
functions don't actually care: if they see the ASCII characters for digits (and 
+/-/etc.) they'll parse it, else not. So any non-ASCII UTF-8 character in the 
sequence is meaningless to them and they stop parsing at that character.

And since the current potential consumers of these routines only call 
atoi/strtol/etc. right now, they're really only doing the conversion for ASCII 
anyway, afaik.  But I'll test it to verify, in case I'm wrong.

So what that means is I don't plan to have an encoding ENC_* param for the 
tvb_get_* routines, nor need the tvb_get_* routines to internally call 
tvb_get_string_enc(). In fact I'm hoping to do without any temp string being 
created, even inside the tvb_get_* functions.


> And I'd like to see proto_tree_add_XXX_item() routines that add an item with 
> a particular type *and* take a pointer argument and return the value for the 
> item through that pointer;

Good idea.


>> And if we had common functions handle ascii and utf-8 string-encoded 
>> numbers, they could avoid creating temporary strings as well.
> 
> The only real encoding issues are "ASCII superset" (so that "0123456789", for 
> example, is encoded the same as in ASCII) vs. "2 or more bytes per ASCII 
> character" (e.g., UCS-2, UTF-16, and UCS-4) vs. "one of those 7-bit GSM 
> character encodings" vs. "EBCDIC".

See above.  The number of proto's that need such a thing for anything but ASCII 
and UTF-8 is pretty small I think. And it appears the ones that are truly UTF-8 
do it for ASCII only anyway.

-hadriel

___________________________________________________________________________
Sent via:    Wireshark-dev mailing list <[email protected]>
Archives:    http://www.wireshark.org/lists/wireshark-dev
Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
             mailto:[email protected]?subject=unsubscribe

Re: [Wireshark-dev] No tvb_get for string-encoded numbers?

Reply via email to