[PHP-DEV] is_numeric_string an hexadecimal numbers (123 == 0x7B)
Hi internals!

The internal is_numeric_string [1] function is used to check whether a string contains a number (and to extract that number).

Currently is_numeric_string also accepts hexadecimal strings [2] (apart from the normal decimal integers and doubles). This can cause some quite odd behavior at times. E.g. string comparisons also use is_numeric_string, resulting in the behavior:

    var_dump('123' == '0x7b'); // true

In all other parts of the engine hexadecimal strings are not recognized [3]:

    var_dump((int) '0x7b'); // int(0)

This also causes minor problems in other parts of the engine where is_numeric_string is used. E.g.

    $string = 'abc';
    var_dump($string['0xabc']); // string(a)
    // 0xabc is first accepted as a number by is_numeric_string,
    // but then cast to 0 by convert_to_long

But:

    $string = 'abc';
    var_dump($string['0abc']);
    // outputs (as expected):
    Notice: A non well formed numeric value encountered in /code/8KXrYZ on line 9
    NULL

In my eyes accepting hex strings in is_numeric_string leads to a quite big WTF effect and causes problems and as such should be dropped. I don't think this has much BC impact, so it should be possible to change it.

Nikita

[1]: http://lxr.php.net/xref/PHP_TRUNK/Zend/zend_operators.h#is_numeric_string
[2]: http://lxr.php.net/xref/PHP_TRUNK/Zend/zend_operators.h#131
[3]: http://www.php.net/manual/en/language.types.string.php#language.types.string.conversion

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] is_numeric_string an hexadecimal numbers (123 == 0x7B)
hi!

On Tue, Apr 17, 2012 at 1:20 PM, Nikita Popov nikita@googlemail.com wrote:
> [3]: http://www.php.net/manual/en/language.types.string.php#language.types.string.conversion

From the manual:

    If the string starts with valid numeric data, this will be the value used. Otherwise, the value will be 0 (zero). Valid numeric data is an optional sign, followed by one or more digits (optionally containing a decimal point), followed by an optional exponent. The exponent is an 'e' or 'E' followed by one or more digits.

So there is no problem with changing this confusing behavior.

Cheers,
--
Pierre

@pierrejoye | http://blog.thepimp.net | http://www.libgd.org
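The grammar quoted from the manual can be sketched as a regular expression. This is only an illustration: the helper name leading_decimal_number is hypothetical, and the pattern follows the quoted wording literally, so it is not the engine's actual scanner (which may accept more, such as leading whitespace or a signed exponent).

```php
<?php
// Hypothetical helper, not part of PHP. Returns the leading decimal number
// of $s following the manual's wording (optional sign, digits with an
// optional decimal point, optional unsigned exponent), or null if the
// string does not start with numeric data.
function leading_decimal_number(string $s): ?string
{
    $pattern = '/^[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE]\d+)?/';
    if (preg_match($pattern, $s, $m) === 1) {
        return $m[0];
    }
    return null;
}

var_dump(leading_decimal_number('123abc')); // string(3) "123"
var_dump(leading_decimal_number('0x7b'));   // string(1) "0"
var_dump(leading_decimal_number('abc'));    // NULL
```

Under this reading '0x7b' contributes only its leading "0", which matches the int(0) cast result reported at the start of the thread.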
Re: [PHP-DEV] is_numeric_string an hexadecimal numbers (123 == 0x7B)
2012/4/17 Nikita Popov nikita@googlemail.com
> var_dump('123' == '0x7b'); // true
>
> In all other parts of the engine hexadecimal strings are not recognized [3]:
>
> var_dump((int) '0x7b'); // int(0)

Hi, Nikita

I personally would rather change the type conversion from string to integer instead ... at least when you force a type cast (in other words: when you explicitly ask for whatever meaningful integer the string holds) ...

Bye
Simon
Re: [PHP-DEV] is_numeric_string an hexadecimal numbers (123 == 0x7B)
On Tue, 17 Apr 2012 13:20:33 +0200, Nikita Popov nikita@googlemail.com wrote:
> The internal is_numeric_string [1] function is used to check whether a string contains a number (and to extract that number).
>
> Currently is_numeric_string also accepts hexadecimal strings [2] (apart from the normal decimal integers and doubles).
>
> [...]
>
> In my eyes accepting hex strings in is_numeric_string leads to a quite big WTF effect and causes problems and as such should be dropped. I don't think this has much BC impact, so it should be possible to change it.

I think this definitely has a larger BC impact than you're portraying; I can see some people making comparisons against '0xA' instead of 0xA.

Besides, this is part of the Zend API. It's already used in many extensions (though possibly some of these should be using a stricter function) and changing its behavior in a stable branch is not wise:

http://lxr.php.net/opengrok/search?q=project=PHP_TRUNKdefs=refs=is_numeric_string

But in any case, if there are no graver BC impacts, +1 for master.

--
Gustavo Lopes
Re: [PHP-DEV] is_numeric_string an hexadecimal numbers (123 == 0x7B)
On Tue, 17 Apr 2012 13:35:48 +0200, Simon Schick simonsimc...@googlemail.com wrote:
> 2012/4/17 Nikita Popov nikita@googlemail.com
>> var_dump('123' == '0x7b'); // true
>>
>> In all other parts of the engine hexadecimal strings are not recognized [3]:
>>
>> var_dump((int) '0x7b'); // int(0)
>
> I personally would rather change the type-conversion for strings to integer ... At least if you force it to do a type-cast (in other words: forcing to get any valuable integer of that string) ...

I think that would be an error. As was mentioned a few months ago when 0b was introduced, no other number format has this behavior. You can't do 123 == 0b10 or 123 == 0876. Extending this hexadecimal oddity instead of eliminating it is inconsistent with the treatment given to those other formats.

--
Gustavo Lopes
Re: [PHP-DEV] is_numeric_string an hexadecimal numbers (123 == 0x7B)
2012/4/17 Gustavo Lopes glo...@nebm.ist.utl.pt:
> I think that would be an error. As was mentioned a few months ago when 0b was introduced, no other number format has this behavior. You can't do 123 == 0b10 or 123 == 0876. Extending this hexadecimal oddity instead of eliminating it is inconsistent with the treatment given to those other formats.
>
> --
> Gustavo Lopes

Hi, Gustavo

That's something I didn't know about ... If we're doing that, it should, of course, also be done for the binary system.

The only thing I wonder about is the code examples you're giving ... I would expect this to work if we start to change something here:

    var_dump((int) '0x7b'); // int(123)
    var_dump((int) '0b011'); // int(123)
    var_dump((int) '0123'); // int(123)

The last example was not mentioned here before, but since you used it in an example, I included it here as well ...

Bye
Simon
Re: [PHP-DEV] is_numeric_string an hexadecimal numbers (123 == 0x7B)
On 04/17/2012 01:20 PM, Nikita Popov wrote:
> I don't think this has much BC impact, so it should be possible to change it.

Same here, I never even knew that this worked in a string context until recently. Autocast/comparison rules are already complicated enough as they are documented now, and I failed to find anything in the manual that would actually say that hex in a string context is supposed to work at all ...

I can't really judge the BC implications though, so the best way would be to start throwing E_DEPRECATED warnings for now ... or maybe go the X11 way of deliberately breaking an obscure feature and seeing how many complaints we get ;)

--
hartmut
Re: [PHP-DEV] is_numeric_string an hexadecimal numbers (123 == 0x7B)
On Apr 17, 2012, at 5:39, Hartmut Holzgraefe hartmut.holzgra...@gmail.com wrote:
> Same here, i never even knew that this worked in a string context until recently. Autocast/comparison rules are already complicated enough as they are documented now, and i failed to find anything in the manual that would actually say that hex in a string context is support to work at all ...

Would this end up changing the behavior of the userland is_numeric() function? The behavior actually is documented under that function:

    Finds whether the given variable is numeric. Numeric strings consist of [...]. Hexadecimal notation (0xFF) is allowed too but only without sign, decimal and exponential part.

If so, although this does technically break BC in that case, I for one will not miss it. The only effect this will have on our code is to make validation of numeric input much easier and less error-prone.

--
Bob Williams
Re: [PHP-DEV] is_numeric_string an hexadecimal numbers (123 == 0x7B)
On Tue, Apr 17, 2012 at 1:44 PM, Gustavo Lopes glo...@nebm.ist.utl.pt wrote:
> On Tue, 17 Apr 2012 13:20:33 +0200, Nikita Popov nikita@googlemail.com wrote:
>> The internal is_numeric_string [1] function is used to check whether a string contains a number (and to extract that number).
>>
>> Currently is_numeric_string also accepts hexadecimal strings [2] (apart from the normal decimal integers and doubles).
>>
>> [...]
>>
>> In my eyes accepting hex strings in is_numeric_string leads to a quite big WTF effect and causes problems and as such should be dropped. I don't think this has much BC impact, so it should be possible to change it.
>
> I think definitely has a larger BC impact than you're portraying, I can see some people making comparisons against '0xA' instead of 0xA.

Yes, this definitely does have BC impact, but I don't think it is particularly large. The affected areas mainly would be:

 * String comparisons using ==
 * Strings passed to internal functions which accept the value through an 'l' zend_parse_parameters specifier (functions doing manual type handling via Z_TYPE and convert_to_long already do not accept hex)
 * The userland function is_numeric

The first two would mainly be a problem if somebody - as you already mention - has written '0xA' == $foo style comparisons or did stuff like round($number, '0xA'). Both cases - in my eyes - aren't particularly probable, as anyone who knows what a hex number is probably also knows the difference between a string literal and a number literal.

The last one is more problematic. It is explicitly documented as accepting hexadecimal numbers. In my eyes it too should not accept them, but I could imagine that people rely on this.

> Besides, this is part of the Zend API. It's already used in many extensions (though possibly some of these should be using a stricter function) and changing its behavior is a stable branch is not wise:
>
> http://lxr.php.net/opengrok/search?q=project=PHP_TRUNKdefs=refs=is_numeric_string

I've already looked at some of these and in most (all?) cases the intended behavior seems to be to not allow hex (passing hex in those situations actually creates some kind of broken behavior).

Nikita
Re: [PHP-DEV] is_numeric_string an hexadecimal numbers (123 == 0x7B)
On 4/17/12 08:17, Nikita Popov nikita@googlemail.com wrote:
> The last one is more problematic. It is explicitly documented as accepting hexadecimal numbers. In my eyes it too should not accept them, but I could imagine that people rely on this.

This always struck me as mistaken design. Why accept hex or decimal, but not the other bases that PHP knows about?

I can see a small number of scenarios where having it accept hex input is definitely useful, but I suspect that the vast majority of cases out there where it's used are validation routines expecting straightforward, base-10 numbers. And I know that, of all such cases I've seen (and I've seen quite a few, since one of our interview test questions implicitly covers it), most programmers are blissfully ignorant of the hex support and unwittingly allow bad user data to slip into their applications to become trusted data. Not good.

As I mentioned in my last message, I wouldn't be bothered if this behavior were simply removed. I think it would affect a small number of people knowingly relying on the feature, while it would fix probably many thousands of bugs out there lurking in less-aware programmers' code.

Even better, though, might be to add a flag parameter that would give the programmer explicit control over its behavior, including which bases to allow (and including the bases currently MIA).

-Bob

--
Robert E. Williams, Jr.
Associate Vice President of Software Development
Newtek Business Services, Inc. -- The Small Business Authority
https://www.newtekreferrals.com/rewjr
http://www.thesba.com/
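The flag parameter suggested above might look roughly like the following userland sketch. Everything in it is hypothetical: the NUMERIC_* constants and the is_numeric_in function do not exist in PHP, and the decimal pattern is deliberately simplified (no exponent) to keep the example short.

```php
<?php
// Hypothetical flags and function; nothing here exists in PHP itself.
const NUMERIC_DECIMAL = 1; // plain base-10 integers and decimals (no exponent)
const NUMERIC_HEX     = 2; // 0x / 0X prefix
const NUMERIC_OCTAL   = 4; // leading 0 followed by octal digits
const NUMERIC_BINARY  = 8; // 0b / 0B prefix

// Returns true if $s is a complete number in one of the notations
// selected by $flags. By default only decimal notation is accepted.
function is_numeric_in(string $s, int $flags = NUMERIC_DECIMAL): bool
{
    $patterns = [
        NUMERIC_DECIMAL => '/^[+-]?\d+(?:\.\d+)?$/D',
        NUMERIC_HEX     => '/^0[xX][0-9a-fA-F]+$/D',
        NUMERIC_OCTAL   => '/^0[0-7]+$/D',
        NUMERIC_BINARY  => '/^0[bB][01]+$/D',
    ];
    foreach ($patterns as $flag => $pattern) {
        if (($flags & $flag) !== 0 && preg_match($pattern, $s) === 1) {
            return true;
        }
    }
    return false;
}

var_dump(is_numeric_in('123'));                                 // bool(true)
var_dump(is_numeric_in('0x7b'));                                // bool(false)
var_dump(is_numeric_in('0x7b', NUMERIC_DECIMAL | NUMERIC_HEX)); // bool(true)
```

With a decimal-only default, a validation routine that never asks for hex would reject '0x7b' without the programmer having to know about the notation at all.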
Re: [PHP-DEV] is_numeric_string an hexadecimal numbers (123 == 0x7B)
2012/4/17 Simon Schick simonsimc...@googlemail.com
> Hi, Gustavo
>
> That's something I didn't know of ... if we're doing that, it should, of course, be also be done for the dual system.
>
> The only thing I wonder about is the code examples you're giving ... I would expect this to work if we start to change something here:
>
> var_dump((int) '0x7b'); // int(123)
> var_dump((int) '0b011'); // int(123)
> var_dump((int) '0123'); // int(123)
>
> The last example was not mentioned here before but as you set in an example, I did it here as well ...
>
> Bye
> Simon

Hi, all

As I just saw in another thread, I forgot the octal number system, which takes 0 as its prefix ... and this would change the result of my last example:

    var_dump((int) '0173'); // int(123)

This makes me quite unsure whether this should be done the way I proposed ... Here I would not expect it to happen like this.

Bye
Simon
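The octal ambiguity described above can be made concrete with a small userland sketch. The parse_prefixed_int helper and its $leadingZeroIsOctal switch are hypothetical names used only for illustration; the sketch relies on intval() with an explicit base and ignores integer overflow.

```php
<?php
// Hypothetical helper, not part of PHP: converts a string to an integer
// based on an explicit prefix. A leading zero is treated as octal only
// when the caller asks for it. Returns null for anything else.
function parse_prefixed_int(string $s, bool $leadingZeroIsOctal = false): ?int
{
    if (preg_match('/^0[xX]([0-9a-fA-F]+)$/D', $s, $m) === 1) {
        return intval($m[1], 16);
    }
    if (preg_match('/^0[bB]([01]+)$/D', $s, $m) === 1) {
        return intval($m[1], 2);
    }
    if ($leadingZeroIsOctal && preg_match('/^0([0-7]+)$/D', $s, $m) === 1) {
        return intval($m[1], 8);
    }
    if (preg_match('/^\d+$/D', $s) === 1) {
        return intval($s, 10);
    }
    return null;
}

var_dump(parse_prefixed_int('0x7b'));       // int(123)
var_dump(parse_prefixed_int('0b1111011'));  // int(123)
var_dump(parse_prefixed_int('0123'));       // int(123)
var_dump(parse_prefixed_int('0123', true)); // int(83)
var_dump(parse_prefixed_int('0173', true)); // int(123)
```

The same input '0123' yields 123 or 83 depending on whether a leading zero is read as octal, which is the kind of surprise the message above is worried about.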