[PHP-DEV] is_numeric_string an hexadecimal numbers (123 == 0x7B)

2012-04-17 Thread Nikita Popov
Hi internals!

The internal is_numeric_string [1] function is used to check whether a
string contains a number (and to extract that number).

Currently is_numeric_string also accepts hexadecimal strings [2]
(apart from the normal decimal integers and doubles).

This can cause some quite odd behavior at times. E.g. string
comparisons also use is_numeric_string, resulting in the behavior:

var_dump('123' == '0x7b'); // true

In all other parts of the engine hexadecimal strings are not recognized [3]:

var_dump((int) '0x7b'); // int(0)

This also causes minor problems in other parts of the engine where
is_numeric_string is used. E.g.

$string = 'abc';
var_dump($string['0xabc']); // string(1) "a"
// '0xabc' is first accepted as a number by is_numeric_string, but then
// cast to 0 by convert_to_long

But:

$string = 'abc';
var_dump($string['0abc']);
// outputs (as expected):
Notice: A non well formed numeric value encountered in /code/8KXrYZ on line 9
NULL

In my eyes accepting hex strings in is_numeric_string leads to a quite
big WTF effect and causes problems and as such should be dropped.

I don't think this has much BC impact, so it should be possible to change it.

Nikita

 [1]: http://lxr.php.net/xref/PHP_TRUNK/Zend/zend_operators.h#is_numeric_string
 [2]: http://lxr.php.net/xref/PHP_TRUNK/Zend/zend_operators.h#131
 [3]: http://www.php.net/manual/en/language.types.string.php#language.types.string.conversion

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] is_numeric_string an hexadecimal numbers (123 == 0x7B)

2012-04-17 Thread Pierre Joye
hi!


On Tue, Apr 17, 2012 at 1:20 PM, Nikita Popov nikita@googlemail.com wrote:
> [3]: http://www.php.net/manual/en/language.types.string.php#language.types.string.conversion

From the manual:

If the string starts with valid numeric data, this will be the value
used. Otherwise, the value will be 0 (zero). Valid numeric data is an
optional sign, followed by one or more digits (optionally containing a
decimal point), followed by an optional exponent. The exponent is an
'e' or 'E' followed by one or more digits. 

So there is no problem with changing this confusing behavior.

Cheers,
-- 
Pierre

@pierrejoye | http://blog.thepimp.net | http://www.libgd.org




Re: [PHP-DEV] is_numeric_string an hexadecimal numbers (123 == 0x7B)

2012-04-17 Thread Simon Schick
2012/4/17 Nikita Popov nikita@googlemail.com

> var_dump('123' == '0x7b'); // true
>
> In all other parts of the engine hexadecimal strings are not recognized
> [3]:
>
> var_dump((int) '0x7b'); // int(0)


Hi, Nikita

I personally would rather change the type-conversion for strings to integer ...
At least if you force it to do a type-cast (in other words: forcing to
get any valuable integer of that string) ...

Bye
Simon




Re: [PHP-DEV] is_numeric_string an hexadecimal numbers (123 == 0x7B)

2012-04-17 Thread Gustavo Lopes
On Tue, 17 Apr 2012 13:20:33 +0200, Nikita Popov  
nikita@googlemail.com wrote:



> The internal is_numeric_string [1] function is used to check whether a
> string contains a number (and to extract that number).
>
> Currently is_numeric_string also accepts hexadecimal strings [2]
> (apart from the normal decimal integers and doubles).
>
> [...]
> In my eyes accepting hex strings in is_numeric_string leads to a quite
> big WTF effect and causes problems and as such should be dropped.
>
> I don't think this has much BC impact, so it should be possible to
> change it.




I think it definitely has a larger BC impact than you're portraying; I can
see some people making comparisons against '0xA' instead of 0xA.


Besides, this is part of the Zend API. It's already used in many
extensions (though possibly some of these should be using a stricter
function) and changing its behavior in a stable branch is not wise:


http://lxr.php.net/opengrok/search?q=&project=PHP_TRUNK&defs=&refs=is_numeric_string

But in any case, if there are no graver BC impacts, +1 for master.

--
Gustavo Lopes




Re: [PHP-DEV] is_numeric_string an hexadecimal numbers (123 == 0x7B)

2012-04-17 Thread Gustavo Lopes
On Tue, 17 Apr 2012 13:35:48 +0200, Simon Schick  
simonsimc...@googlemail.com wrote:



> 2012/4/17 Nikita Popov nikita@googlemail.com
>
>> var_dump('123' == '0x7b'); // true
>>
>> In all other parts of the engine hexadecimal strings are not recognized
>> [3]:
>>
>> var_dump((int) '0x7b'); // int(0)
>
> I personally would rather change the type-conversion for strings to
> integer ...
>
> At least if you force it to do a type-cast (in other words: forcing to
> get any valuable integer of that string) ...



I think that would be an error. As was mentioned a few months ago when 0b  
was introduced, no other number format has this behavior. You can't do  
123 == 0b10 or 123 == 0876. Extending this hexadecimal oddity  
instead of eliminating it is inconsistent with the treatment given to  
those other formats.
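
As a hedged illustration of the alternative: when a prefixed string really does need to become a number, the explicit conversion functions do it without relying on comparison or cast rules. A minimal sketch; the variable names are assumptions made up for this example:

```php
<?php
// Explicit base conversions; none of these depend on implicit
// string-to-number rules.
$fromHex    = hexdec('7b');       // hexadecimal digits, without the 0x prefix
$fromBinary = bindec('1111011');  // binary digits, without the 0b prefix
$fromOctal  = octdec('173');      // octal digits, without the leading 0

var_dump($fromHex, $fromBinary, $fromOctal); // int(123) three times
```

Each call names its base, so the reader never has to guess how the string will be interpreted.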


--
Gustavo Lopes




Re: [PHP-DEV] is_numeric_string an hexadecimal numbers (123 == 0x7B)

2012-04-17 Thread Simon Schick
2012/4/17 Gustavo Lopes glo...@nebm.ist.utl.pt:

> I think that would be an error. As was mentioned a few months ago when 0b
> was introduced, no other number format has this behavior. You can't do 123
> == 0b10 or 123 == 0876. Extending this hexadecimal oddity instead of
> eliminating it is inconsistent with the treatment given to those other
> formats.
>
> --
> Gustavo Lopes


Hi, Gustavo

That's something I didn't know of ... if we're doing that, it should,
of course, also be done for the binary system.
The only thing I wonder about is the code examples you're giving ...

I would expect this to work if we start to change something here:

var_dump((int) '0x7b'); // int(123)
var_dump((int) '0b1111011'); // int(123)
var_dump((int) '0123'); // int(123)

The last example was not mentioned here before, but since you used one
like it in your example, I included it here as well ...

Bye
Simon




Re: [PHP-DEV] is_numeric_string an hexadecimal numbers (123 == 0x7B)

2012-04-17 Thread Hartmut Holzgraefe
On 04/17/2012 01:20 PM, Nikita Popov wrote:

> I don't think this has much BC impact, so it should be possible to change it.

Same here, I never even knew that this worked in a string context
until recently. Autocast/comparison rules are already complicated
enough as they are documented now, and I failed to find anything
in the manual that would actually say that hex in a string
context is supposed to work at all ...

I can't really judge the BC implications though, so the best way would
be to start throwing E_DEPRECATED warnings for now ... or maybe go the
X11 way of deliberately breaking an obscure feature and seeing how many
complaints we get ;)

-- 
hartmut




Re: [PHP-DEV] is_numeric_string an hexadecimal numbers (123 == 0x7B)

2012-04-17 Thread Robert Williams
On Apr 17, 2012, at 5:39, Hartmut Holzgraefe
hartmut.holzgra...@gmail.com wrote:

> Same here, I never even knew that this worked in a string context
> until recently. Autocast/comparison rules are already complicated
> enough as they are documented now, and I failed to find anything
> in the manual that would actually say that hex in a string
> context is supposed to work at all ...

Would this end up changing the behavior of the user land is_numeric() function? 
The behavior actually is documented under that function:

Finds whether the given variable is numeric. Numeric strings consist of [...]. 
Hexadecimal notation (0xFF) is allowed too but only without sign, decimal and 
exponential part.

If so, although this does technically break BC in that case, I for one will not 
miss it. The only effect this will have on our code is to make validation of 
numeric input much easier and less error-prone.
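
For validation that should accept base-10 input only, one option that does not depend on how is_numeric() treats hex is an explicit pattern check. The following is only a sketch: `is_decimal_numeric()` is a hypothetical helper, not a PHP function, and its grammar (optional sign, digits with an optional decimal point, optional signed exponent) is an assumption modelled on the manual's description of numeric strings.

```php
<?php
// Hypothetical helper, not part of PHP: accepts only base-10 numeric strings.
function is_decimal_numeric($value)
{
    if (!is_string($value)) {
        return false;
    }
    // Optional sign, digits with an optional decimal point, optional exponent.
    // The D modifier stops "$" from matching before a trailing newline.
    $pattern = '/^[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?$/D';
    return preg_match($pattern, $value) === 1;
}

var_dump(is_decimal_numeric('123'));   // bool(true)
var_dump(is_decimal_numeric('1.5e3')); // bool(true)
var_dump(is_decimal_numeric('0x7b'));  // bool(false)
```

Because the pattern is anchored at both ends, prefixed forms such as '0x7b' and strings with trailing garbage are rejected outright rather than partially parsed.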

--
Bob Williams

Sent from my iPad


Notice: This communication, including attachments, may contain information that 
is confidential. It constitutes non-public information intended to be conveyed 
only to the designated recipient(s). If the reader or recipient of this 
communication is not the intended recipient, an employee or agent of the 
intended recipient who is responsible for delivering it to the intended 
recipient, or if you believe that you have received this communication in 
error, please notify the sender immediately by return e-mail and promptly 
delete this e-mail, including attachments without reading or saving them in any 
manner. The unauthorized use, dissemination, distribution, or reproduction of 
this e-mail, including attachments, is prohibited and may be unlawful. If you 
have received this email in error, please notify us immediately by e-mail or 
telephone and delete the e-mail and the attachments (if any).


Re: [PHP-DEV] is_numeric_string an hexadecimal numbers (123 == 0x7B)

2012-04-17 Thread Nikita Popov
On Tue, Apr 17, 2012 at 1:44 PM, Gustavo Lopes glo...@nebm.ist.utl.pt wrote:
> On Tue, 17 Apr 2012 13:20:33 +0200, Nikita Popov nikita@googlemail.com
> wrote:
>
>> The internal is_numeric_string [1] function is used to check whether a
>> string contains a number (and to extract that number).
>>
>> Currently is_numeric_string also accepts hexadecimal strings [2]
>> (apart from the normal decimal integers and doubles).
>>
>> [...]
>>
>> In my eyes accepting hex strings in is_numeric_string leads to a quite
>> big WTF effect and causes problems and as such should be dropped.
>>
>> I don't think this has much BC impact, so it should be possible to change
>> it.
>
> I think it definitely has a larger BC impact than you're portraying; I can see
> some people making comparisons against '0xA' instead of 0xA.

Yes, this definitely does have BC impact, but I don't think it is
particularly large.

The affected areas mainly would be:
 * String comparisons using ==
 * Strings passed to internal functions which accept the value through
an "l" zend_parse_parameters specifier (functions doing manual type
handling via Z_TYPE and convert_to_long already reject hex)
 * The userland function is_numeric

The first two would mainly be a problem if somebody - as you already
mention - has written '0xA' == $foo style comparisons or did stuff
like round($number, '0xA'). Both cases - in my eyes - aren't
particularly probable, as anyone who knows what a hex number is
probably also knows the difference between a string literal and a
number literal.

The last one is more problematic. It is explicitly documented as
accepting hexadecimal numbers. In my eyes it too should not accept
them, but I could imagine that people rely on this.

> Besides, this is part of the Zend API. It's already used in many extensions
> (though possibly some of these should be using a stricter function) and
> changing its behavior in a stable branch is not wise:
>
> http://lxr.php.net/opengrok/search?q=&project=PHP_TRUNK&defs=&refs=is_numeric_string

I've already looked at some of these and in most (all?) cases the
intended behavior seems to be to not allow hex (passing hex in those
situations actually creates some kind of broken behavior).

Nikita




Re: [PHP-DEV] is_numeric_string an hexadecimal numbers (123 == 0x7B)

2012-04-17 Thread Robert Williams
On 4/17/12 08:17, Nikita Popov nikita@googlemail.com wrote:


> The last one is more problematic. It is explicitly documented as
> accepting hexadecimal numbers. In my eyes it too should not accept
> them, but I could imagine that people rely on this.

This always struck me as mistaken design. Why accept hex or decimal, but
not the other bases that PHP knows about? I can see a small number of
scenarios where having it accept hex input is definitely useful, but I
suspect that the vast majority of cases out there where it's used are in
validation routines expecting straightforward, base-10 numbers. And I know
that, of all such cases I've seen (and I've seen quite a few, since one of
our interview test questions implicitly covers it), most programmers are
blissfully ignorant of the hex support and unwittingly allow bad user data
to slip into their applications to become trusted data. Not good.

As I mentioned in my last message, I wouldn't be bothered if this behavior
were simply removed. I think it would affect a small number of people
knowingly relying on the feature, while it would fix probably many
thousands of bugs out there lurking in less-aware programmers' code. Even
better, though, might be to add a flag parameter that would give the
programmer explicit control over its behavior, including which bases to
allow (and including the bases currently MIA).
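
To make the flag idea concrete, here is a rough userland sketch. Everything in it is hypothetical: the `NUM_ALLOW_*` constants and `parse_integer_string()` do not exist in PHP, the sketch handles unsigned integers only, and values beyond PHP_INT_MAX are not dealt with.

```php
<?php
// Hypothetical flags and function; nothing here exists in PHP itself.
const NUM_ALLOW_DEC = 1;
const NUM_ALLOW_HEX = 2;
const NUM_ALLOW_BIN = 4;
const NUM_ALLOW_OCT = 8;

// Returns the integer value of $s if it matches an allowed base, else null.
// Prefixed forms are checked first, so '0173' is read as octal whenever
// NUM_ALLOW_OCT is set, even if NUM_ALLOW_DEC is set too.
function parse_integer_string($s, $flags = NUM_ALLOW_DEC)
{
    if (($flags & NUM_ALLOW_HEX) && preg_match('/^0[xX]([0-9a-fA-F]+)$/D', $s, $m)) {
        return hexdec($m[1]);
    }
    if (($flags & NUM_ALLOW_BIN) && preg_match('/^0[bB]([01]+)$/D', $s, $m)) {
        return bindec($m[1]);
    }
    if (($flags & NUM_ALLOW_OCT) && preg_match('/^0([0-7]+)$/D', $s, $m)) {
        return octdec($m[1]);
    }
    if (($flags & NUM_ALLOW_DEC) && preg_match('/^[0-9]+$/D', $s)) {
        return (int) $s;
    }
    return null;
}

var_dump(parse_integer_string('0x7b'));                                // NULL
var_dump(parse_integer_string('0x7b', NUM_ALLOW_DEC | NUM_ALLOW_HEX)); // int(123)
```

With decimal as the default, a caller has to opt in to every other base, which is the opposite of the current situation where hex is accepted silently.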

-Bob

--
Robert E. Williams, Jr.
Associate Vice President of Software Development
Newtek Business Services, Inc. -- The Small Business Authority
https://www.newtekreferrals.com/rewjr
http://www.thesba.com/










Re: [PHP-DEV] is_numeric_string an hexadecimal numbers (123 == 0x7B)

2012-04-17 Thread Simon Schick
2012/4/17 Simon Schick simonsimc...@googlemail.com

> Hi, Gustavo
>
> That's something I didn't know of ... if we're doing that, it should,
> of course, also be done for the binary system.
> The only thing I wonder about is the code examples you're giving ...
>
> I would expect this to work if we start to change something here:
>
> var_dump((int) '0x7b'); // int(123)
> var_dump((int) '0b1111011'); // int(123)
> var_dump((int) '0123'); // int(123)
>
> The last example was not mentioned here before, but since you used one
> like it in your example, I included it here as well ...
>
> Bye
> Simon

Hi, all

As I saw now in another thread - I forgot the octal number-system
which takes 0 as prefix ... and this would change the result of my
last example:

var_dump((int) '0173'); // int(123)

This makes me quite unsure if this should be done the way I proposed ...
Here I would not expect it to happen like this.

Bye
Simon
