Re: [PHP-DEV] 5.NEXT Integer and String type modifications

2013-09-24 Thread Martin Keckeis
2013/6/4 Ivan Enderlin @ Hoa ivan.ender...@hoa-project.net


 On 04/06/13 12:08, Pierre Joye wrote:

 On Tue, Jun 4, 2013 at 10:41 AM, Ivan Enderlin @ Hoa
 ivan.ender...@hoa-project.net wrote:

 Hey :-),


 On 02/06/13 08:52, Johannes Schlüter wrote:

 It would be a *gigantic* patch, but the userland effects should be
 minimal (the only changes would be supporting longer strings, and
 consistent 64 bit int support). The performance considerations should be
 minimal for non-legacy code (as both would still be using native data
 types)...

 History shows that such gigantic patches are often not finished and done
 as people underestimate the size of PHP and the fact that all extensions
 have to be checked which for this case means checking each external lib
 for their correct type for all their functions etc ... but I don't want
 to stop you, I'm happy if you do this :-) (while I'm also happy about
 everybody spending time on fixing bugs instead of adding such high-risk
 changes ;-))

 Is it possible to use a static C analyzer here? It could help a lot. I
 think about Frama-C [1], Pork [2] (now included in Oink [3]) or Clang
 Static Analyser [4] to name a few. A more complete list can be found in [5].

 We do it using Visual C++ static analyzer, which is an excellent tool
 for this kind of issue, almost on all commits. As soon as we have a
 fork for these changes I will add it so we can get regular updates.

 Excellent!


Since Anthony is sadly gone, is there still something going on in this
direction?
I just ran into a bug related to this with future DateTimes after
2038-01-19 03:14:07.
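
That date is exactly where a signed 32-bit count of seconds since the Unix
epoch runs out (2^31 - 1 seconds). A minimal C sketch of that arithmetic
follows. The helper names format_utc and last_32bit_timestamp are
hypothetical, and the sketch assumes a platform whose time_t is wide enough
for gmtime() to convert the value.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: format a Unix timestamp as UTC text into buf.
 * Returns 0 on success, -1 if the value cannot be converted. */
static int format_utc(int64_t ts, char *buf, size_t buflen)
{
    time_t t = (time_t)ts;
    struct tm *tm = gmtime(&t);
    if (tm == NULL) {
        return -1;
    }
    return strftime(buf, buflen, "%Y-%m-%d %H:%M:%S", tm) > 0 ? 0 : -1;
}

/* Largest timestamp a signed 32-bit integer can hold: 2^31 - 1. */
static int64_t last_32bit_timestamp(void)
{
    return (int64_t)INT32_MAX;
}
```

Formatting last_32bit_timestamp() gives the date quoted above; one second
later no longer fits in a 32-bit integer.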


Re: [PHP-DEV] 5.NEXT Integer and String type modifications

2013-06-04 Thread Pierre Joye
On Tue, Jun 4, 2013 at 6:59 AM, Michael Wallner m...@php.net wrote:

 +1 for the idea
 +1 for Z_STRSIZE

at least Z_STRSIZET for the reason explained earlier :)


--
Pierre

@pierrejoye | http://www.libgd.org

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] 5.NEXT Integer and String type modifications

2013-06-04 Thread Julien Pauli
+1, that will make a big difference.

I'm here to help others move this forward.

Julien.P


On Tue, Jun 4, 2013 at 8:33 AM, Pierre Joye pierre@gmail.com wrote:

 On Tue, Jun 4, 2013 at 6:59 AM, Michael Wallner m...@php.net wrote:

  +1 for the idea
  +1 for Z_STRSIZE

 at least Z_STRSIZET for the reason explained earlier :)


 --
 Pierre

 @pierrejoye | http://www.libgd.org





Re: [PHP-DEV] 5.NEXT Integer and String type modifications

2013-06-04 Thread Ivan Enderlin @ Hoa

Hey :-),

On 02/06/13 08:52, Johannes Schlüter wrote:

It would be a *gigantic* patch, but the userland effects should be
minimal (the only changes would be supporting longer strings, and consistent 64
bit int support). The performance considerations should be minimal for
non-legacy code (as both would still be using native data types)...

History shows that such gigantic patches are often not finished and done as
people underestimate the size of PHP and the fact that all extensions have to be
checked which for this case means checking each external lib for their correct
type for all their functions etc ... but I don't want to stop you, I'm happy if
you do this :-) (while I'm also happy about everybody spending time on fixing
bugs instead of adding such high-risk changes ;-))

Is it possible to use a static C analyzer here? It could help a lot. I
think about Frama-C [1], Pork [2] (now included in Oink [3]) or Clang
Static Analyser [4] to name a few. A more complete list can be found in [5].


The idea is excellent by the way :-).
Cheers.


[1] http://frama-c.com/
[2] https://developer.mozilla.org/en/docs/Pork
[3] http://daniel-wilkerson.appspot.com/oink/index.html
[4] http://clang-analyzer.llvm.org/
[5] http://en.wikipedia.org/wiki/List_of_tools_for_static_code_analysis#C.2FC.2B.2B


--
Ivan Enderlin
Developer of Hoa
http://hoa-project.net/

PhD. student at DISC/Femto-ST (Vesontio) and INRIA (Cassis)
http://disc.univ-fcomte.fr/ and http://www.inria.fr/

Member of HTML and WebApps Working Group of W3C
http://w3.org/






Re: [PHP-DEV] 5.NEXT Integer and String type modifications

2013-06-04 Thread Johannes Schlüter
On Tue, 2013-06-04 at 10:41 +0200, Ivan Enderlin @ Hoa wrote:
  History shows that such gigantic patches are often not finished and
  done as people underestimate the size of PHP and the fact that all
  extensions have to be checked which for this case means checking
  each external lib for their correct type for all their functions
  etc ... but I don't want to stop you, I'm happy if you do this :-)
  (while I'm also happy about everybody spending time on fixing bugs
  instead of adding such high-risk changes ;-))

 Is it possible to use a static C analyzer here? It could help a lot. I
 think about Frama-C [1], Pork [2] (now included in Oink [3]) or Clang
 Static Analyser [4] to name a few. A more complete list can be found
 in [5].

To some degree, but there are enough cases which are technically correct
but logically wrong, so the code still has to be reviewed.

johannes






Re: [PHP-DEV] 5.NEXT Integer and String type modifications

2013-06-04 Thread Pierre Joye
On Tue, Jun 4, 2013 at 10:41 AM, Ivan Enderlin @ Hoa
ivan.ender...@hoa-project.net wrote:
 Hey :-),


 On 02/06/13 08:52, Johannes Schlüter wrote:

 It would be a *gigantic* patch, but the userland effects should be
 minimal (the only changes would be supporting longer strings, and
 consistent 64
 bit int support). The performance considerations should be minimal for
 non-legacy code (as both would still be using native data types)...

 History shows that such gigantic patches are often not finished and done
 as people underestimate the size of PHP and the fact that all extensions have
 to be checked which for this case means checking each external lib for their
 correct type for all their functions etc ... but I don't want to stop you,
 I'm happy if you do this :-) (while I'm also happy about everybody spending
 time on fixing bugs instead of adding such high-risk changes ;-))

 Is it possible to use a static C analyzer here? It could help a lot. I think
 about Frama-C [1], Pork [2] (now included in Oink [3]) or Clang Static
 Analyser [4] to name a few. A more complete list can be found in [5].

We do it using Visual C++ static analyzer, which is an excellent tool
for this kind of issue, almost on all commits. As soon as we have a
fork for these changes I will add it so we can get regular updates.


--
Pierre

@pierrejoye |  http://www.libgd.org




Re: [PHP-DEV] 5.NEXT Integer and String type modifications

2013-06-04 Thread Ivan Enderlin @ Hoa


On 04/06/13 12:08, Pierre Joye wrote:

On Tue, Jun 4, 2013 at 10:41 AM, Ivan Enderlin @ Hoa
ivan.ender...@hoa-project.net wrote:

Hey :-),


On 02/06/13 08:52, Johannes Schlüter wrote:

It would be a *gigantic* patch, but the userland effects should be
minimal (the only changes would be supporting longer strings, and
consistent 64
bit int support). The performance considerations should be minimal for
non-legacy code (as both would still be using native data types)...

History shows that such gigantic patches are often not finished and done
as people underestimate the size of PHP and the fact that all extensions have
to be checked which for this case means checking each external lib for their
correct type for all their functions etc ... but I don't want to stop you,
I'm happy if you do this :-) (while I'm also happy about everybody spending
time on fixing bugs instead of adding such high-risk changes ;-))

Is it possible to use a static C analyzer here? It could help a lot. I think
about Frama-C [1], Pork [2] (now included in Oink [3]) or Clang Static
Analyser [4] to name a few. A more complete list can be found in [5].

We do it using Visual C++ static analyzer, which is an excellent tool
for this kind of issue, almost on all commits. As soon as we have a
fork for these changes I will add it so we can get regular updates.

Excellent!

--
Ivan Enderlin
Developer of Hoa
http://hoa-project.net/

PhD. student at DISC/Femto-ST (Vesontio) and INRIA (Cassis)
http://disc.univ-fcomte.fr/ and http://www.inria.fr/

Member of HTML and WebApps Working Group of W3C
http://w3.org/






Re: [PHP-DEV] 5.NEXT Integer and String type modifications

2013-06-03 Thread Michael Wallner
On 2 June 2013 11:11, Johannes Schlüter johan...@schlueters.de wrote:


 On Jun 2, 2013, at 8:34, Pierre Joye pierre@gmail.com wrote:

 Obviously there's a pretty significant ABI break here. I propose a tweak
 of the Z_* macros to fix that. Basically, Z_STRLEN() will cast the result
 to an int. This is the same behavior as today, and will mean that existing
 extensions continue to function exactly as today. But new extensions (and
 elsewhere in core) can use a new macro Z_STRSIZE() which will return the
 native size_t.

 A new macro will be a good solution, but I would name it what it
 actually is, Z_SIZE_T.

 That's not what it is. It is the length of the string aka. 
 var.value.str.length as such it should indicate its relation to a string. So 
 something like Z_STRSIZE is  correct (and the name is nice thinking about 
 Unicode strings where length (characters) != size (bytes))

+1 for the idea
+1 for Z_STRSIZE
+1 for volunteering, as far as time permits!


--
Regards,
Mike




Re: [PHP-DEV] 5.NEXT Integer and String type modifications

2013-06-02 Thread Pierre Joye
On Fri, May 31, 2013 at 9:11 PM, Anthony Ferrara ircmax...@gmail.com wrote:
 Hello all,

 I want to start an idea thread (or at least get a conversation going) about
 cleaning up the core integer data type and string lengths. Here's my ideas:

 1. Change string length in the ZVAL from int to size_t
  - http://lxr.php.net/xref/PHP_5_5/Zend/zend.h#321

Huge +1, as well as for any (allocated) random buffer we use/allocate.

 2. Change long in the ZVAL  (lval) to a system-determined 64bit fixed size

 There are two reasons for this. First, on VS compiles (windows), the
 current long size is always 32 bit. So that means even 64 bit compiles may
 or may not have 64 bit ints.

To do it as transparently as possible, and as a one-time change (though
we can't avoid #ifdef), we could add a php_int type or, my preferred
solution, go with int64_t for the zval int type. One open question
is whether we keep the architecture-dependent integer size, which is
rather annoying.
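
To make the portability point concrete, here is a small C sketch. The
php_int_t alias is only the illustrative name floated in this thread, not an
existing PHP type, and BITS_OF and long_is_narrower_than_64 are hypothetical
helpers. On LLP64 platforms such as 64-bit Windows, long is 32 bits wide; on
LP64 platforms such as 64-bit Linux it is 64 bits, whereas int64_t is 64 bits
wherever it exists.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical alias for the zval integer; not an existing PHP type. */
typedef int64_t php_int_t;

/* Width of a type in bits. */
#define BITS_OF(type) ((int)(sizeof(type) * CHAR_BIT))

/* Returns 1 if the platform's long is narrower than 64 bits
 * (e.g. LLP64), 0 otherwise (e.g. LP64). */
static int long_is_narrower_than_64(void)
{
    return BITS_OF(long) < 64;
}
```

BITS_OF(php_int_t) is 64 on every platform that provides int64_t, while
BITS_OF(long) depends on the compiler's data model.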

 The second reason is that right now PHP can't really handle strings >= 2^31
 characters even on 64 bit compiles. The problem gets pretty comical:

 $ php -d memory_limit=499g -r "\$string = str_repeat('x',pow(2, 32)) .
 str_repeat('x', pow(2,4)); var_dump(strlen(\$string));"
 int(16)
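
The 16 is what remains of 2^32 + 16 once only the low 32 bits of the length
are kept. A minimal C sketch of that arithmetic, using a hypothetical helper
low_32_bits (this models the effect, it is not the engine's code):

```c
#include <stdint.h>

/* Models storing a byte count in a 32-bit length field: only the low
 * 32 bits survive. Conversion to uint32_t is well-defined in C
 * (the value is reduced modulo 2^32). */
static uint32_t low_32_bits(uint64_t real_length)
{
    return (uint32_t)real_length;
}
```

Passing (UINT64_C(1) << 32) + 16 yields 16, matching the output above.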

 Obviously there's a pretty significant ABI break here. I propose a tweak
 of the Z_* macros to fix that. Basically, Z_STRLEN() will cast the result
 to an int. This is the same behavior as today, and will mean that existing
 extensions continue to function exactly as today. But new extensions (and
 elsewhere in core) can use a new macro Z_STRSIZE() which will return the
 native size_t.

A new macro will be a good solution, but I would name it what it
actually is, Z_SIZE_T.
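
Whatever name is chosen, the two-accessor idea can be sketched in C as
follows. Everything here is a stand-in: sketch_str is NOT the real Zend
layout, and SKETCH_STRLEN / SKETCH_STRSIZE only mimic the behaviour described
for Z_STRLEN() and the proposed new macro.

```c
#include <stddef.h>

/* Stand-in for the string part of a zval; NOT the real Zend layout. */
typedef struct {
    const char *val;
    size_t      len;  /* length in bytes, stored as size_t as proposed */
} sketch_str;

/* Legacy accessor: keeps returning int, as existing extensions expect. */
#define SKETCH_STRLEN(s)  ((int)(s).len)

/* New accessor: exposes the native size_t. */
#define SKETCH_STRSIZE(s) ((s).len)
```

Existing code keeps compiling against the int-returning accessor, while new
code can read the full length through the size_t one.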

 Likewise we can do the same for the long data type (Z_LVAL() returns a
 long, and Z_PHPLVAL() returns a php_long (which is a typedef of a 64 bit
 compiler specific type).

I'm not a fan of adding a php_long type; I would rather move to the int*_t
types, or php_int*_t types, for easy understanding of what is actually used.


 It'll also require 2 new zend_parse_parameters types (one for php_long and
 one for the string len using size_t instead)...

 Additionally, I'd propose a set of central helpers to cast back and forth
 between php_long and long, as well as int to size_t (with overflow checks,
 allowing us to do errors on detected overflows instead of silently ignoring
 them as today).

Same as before, stop using long which has been proven to be not really
portable and can be confusing.
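
For the overflow-checked helpers mentioned in the quoted proposal, a minimal
C sketch of one direction (size_t to int) might look like this. The name
size_to_int_checked and the return convention are assumptions made for
illustration only.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: convert size_t to int, reporting overflow
 * instead of silently truncating.
 * Returns 0 on success, -1 on overflow (*out is left untouched). */
static int size_to_int_checked(size_t in, int *out)
{
    if (in > (size_t)INT_MAX) {
        return -1;
    }
    *out = (int)in;
    return 0;
}
```

A caller can then turn the -1 case into an error rather than continuing with
a truncated length.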

 It would be a *gigantic* patch, but the userland effects should be minimal
 (the only changes would be supporting longer strings, and consistent 64 bit
 int support). The performance considerations should be minimal for
 non-legacy code (as both would still be using native data types)...

 What do you think? What am I missing from this? Or is this just a horrific
 idea (given the current implementation details)...?

It is a very good idea and we have discussed it many times, for far
too long. I'm not sure it can be done in 5.x tho'. But no matter when
it will be done, we can already begin to do it in a fork and write
up an RFC. I'll be very happy to help here; it is on my to-do list for
full win64 support. Also we will need to patch libraries as well to keep
the same issues from happening there. A first discussion I had with many
of the developers working on these libraries shows that they have
(almost) no problem with cleaning this up as well.

Cheers,
--
Pierre

@pierrejoye |  http://www.libgd.org




Re: [PHP-DEV] 5.NEXT Integer and String type modifications

2013-06-02 Thread Johannes Schlüter


Andrey Hristov p...@hristov.com wrote:
What about a new type IS_LONG64, a new field in the union and a new set of
macros for this type? New extensions or rewritten extensions will use the new
macros. In 2-3 major versions, 5.8 for example, the old macros will be
dropped. Enough time for extensions to be ported to the new macros.

Unfortunately a lot of code makes assumptions about the type system, so adding
(or changing) a type can cause issues which are hard to find without going
through the code line by line with a lot of concentration. Mistakes there will
lead to evil bugs ... and all that where the compiler won't help unless we
change names of structure elements and macros ... so forcing us to touch any
line using a zval bool/long/...

But if people are volunteering I'd be happy about such improvements.

johannes




Re: [PHP-DEV] 5.NEXT Integer and String type modifications

2013-06-02 Thread Johannes Schlüter


Anthony Ferrara ircmax...@gmail.com wrote:
1. Change string length in the ZVAL from int to size_t
 - http://lxr.php.net/xref/PHP_5_5/Zend/zend.h#321

This would be good, but it is a lot of work and a hard-to-track engine change ...

2. Change long in the ZVAL  (lval) to a system-determined 64bit fixed
size

Didn't somebody do a great chunk of the work to add arbitrary integer support? 
64bit is nice, arbitrary would be nicer (and both have issues in situations 
where we pass the PHP int to an external library expecting an int or long or 
such ...)

Obviously there's a pretty significant ABI break here. I propose a
tweak of the Z_* macros to fix that. Basically, Z_STRLEN() will cast the
result to an int. This is the same behavior as today, and will mean that
existing extensions continue to function exactly as today. But new extensions
(and elsewhere in core) can use a new macro Z_STRSIZE() which will return
the native size_t.

This will give strange results and potential bugs with strings on systems where
MAX_SIZE_T > MAXINT when a user passes a string longer than MAXINT (luckily
this, on all relevant systems, means more than 2GB of data, which usually should
be hard to do for an external attacker and be prevented by memory_limit etc.).

It would be a *gigantic* patch, but the userland effects should be
minimal (the only changes would be supporting longer strings, and consistent 64
bit int support). The performance considerations should be minimal for
non-legacy code (as both would still be using native data types)...

History shows that such gigantic patches are often not finished and done as
people underestimate the size of PHP and the fact that all extensions have to be
checked which for this case means checking each external lib for their correct
type for all their functions etc ... but I don't want to stop you, I'm happy if
you do this :-) (while I'm also happy about everybody spending time on fixing
bugs instead of adding such high-risk changes ;-))

johannes




Re: [PHP-DEV] 5.NEXT Integer and String type modifications

2013-06-02 Thread Johannes Schlüter


On Jun 2, 2013, at 8:34, Pierre Joye pierre@gmail.com wrote:

 Obviously there's a pretty significant ABI break here. I propose a tweak
 of the Z_* macros to fix that. Basically, Z_STRLEN() will cast the result
 to an int. This is the same behavior as today, and will mean that existing
 extensions continue to function exactly as today. But new extensions (and
 elsewhere in core) can use a new macro Z_STRSIZE() which will return the
 native size_t.
 
 A new macro will be a good solution, but I would name it what it
 actually is, Z_SIZE_T.

That's not what it is. It is the length of the string aka. var.value.str.length 
as such it should indicate its relation to a string. So something like 
Z_STRSIZE is  correct (and the name is nice thinking about Unicode strings 
where length (characters) != size (bytes))

johannes



Re: [PHP-DEV] 5.NEXT Integer and String type modifications

2013-06-02 Thread Pierre Joye
On Sun, Jun 2, 2013 at 11:11 AM, Johannes Schlüter
johan...@schlueters.de wrote:


 On Jun 2, 2013, at 8:34, Pierre Joye pierre@gmail.com wrote:

 Obviously there's a pretty significant ABI break here. I propose a tweak
 of the Z_* macros to fix that. Basically, Z_STRLEN() will cast the result
 to an int. This is the same behavior as today, and will mean that existing
 extensions continue to function exactly as today. But new extensions (and
 elsewhere in core) can use a new macro Z_STRSIZE() which will return the
 native size_t.

 A new macro will be a good solution, but I would name it what it
 actually is, Z_SIZE_T.

 That's not what it is. It is the length of the string aka. 
 var.value.str.length as such it should indicate its relation to a string. So 
 something like Z_STRSIZE is  correct (and the name is nice thinking about 
 Unicode strings where length (characters) != size (bytes))

It is size_t. There is no such thing as a unicode or multibyte string
length in php outside of mbstring, intl or iconv, to name a few. php
strings are buffers, and buffer lengths use size_t.

This macro (and others) is for extension developers, working in C,
not about its representation in userland. Even if they are
closely related, obviously.

The day we have actual multibyte/unicode strings, we will need
a separate length to represent the size in characters (which may be
multibyte).

Cheers,
--
Pierre

@pierrejoye | http://www.libgd.org




Re: [PHP-DEV] 5.NEXT Integer and String type modifications

2013-06-01 Thread Andrey Hristov

 Hi,
On 05/31/2013 10:03 PM, Anthony Ferrara wrote:

Derick,


In principle I think this is a great thing to do. Not having a 64 bit type is
annoying. I'm a bit curious about how this is going to work with all sorts of
object wrappers that are now in place as workarounds. And casting int64 to
int32 needs to be very well looked at as well.



As far as the casting, my first reaction would be to raise an
E_ENGINE_NOTICE on data loss (casting from int64 to int32 with ints > 32
bit) and then adjusting the value to the nearest representable value
(LONG_MAX or LONG_MIN). In other words, it may need to be more than a
simple cast (an inline function perhaps)...

As far as object wrappers, any particular examples that you're thinking of?

Thanks for the thoughts

Anthony

What about a new type IS_LONG64, a new field in the union and a new set of
macros for this type? New extensions or rewritten extensions will use the new
macros. In 2-3 major versions, 5.8 for example, the old macros will be
dropped. Enough time for extensions to be ported to the new macros.
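
As a rough illustration of this suggestion, a simplified tagged union in C
could look like the following. sketch_zval is NOT the real Zend structure,
and every SKETCH_* name is hypothetical; only the idea of a new type tag, a
new union field and a new macro set comes from the suggestion above.

```c
#include <stdint.h>

/* Type tags; SKETCH_IS_LONG64 models the suggested new type. */
enum sketch_type { SKETCH_IS_LONG, SKETCH_IS_LONG64 };

/* Simplified stand-in for a zval; NOT the real Zend structure. */
typedef struct {
    union {
        long    lval;    /* existing field, platform-dependent width */
        int64_t lval64;  /* suggested new field, always 64 bits */
    } value;
    enum sketch_type type;
} sketch_zval;

/* Old-style macro, unchanged for existing extensions. */
#define SKETCH_LVAL(z)   ((z).value.lval)
/* New macro that new or ported extensions would use. */
#define SKETCH_LVAL64(z) ((z).value.lval64)

/* Builds a value of the new 64-bit type. */
static sketch_zval sketch_make_long64(int64_t v)
{
    sketch_zval z;
    z.type = SKETCH_IS_LONG64;
    SKETCH_LVAL64(z) = v;
    return z;
}
```

Code that switches on the type tag would need a new case for the 64-bit
type, which is the kind of line-by-line review cost raised elsewhere in the
thread.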


Best,
Andrey




Re: [PHP-DEV] 5.NEXT Integer and String type modifications

2013-05-31 Thread Derick Rethans
Anthony Ferrara ircmax...@gmail.com wrote:

 I want to start an idea thread (or at least get a conversation going)
 about
 cleaning up the core integer data type and string lengths. Here's my
 ideas:
 
 1. Change string length in the ZVAL from int to size_t
  - http://lxr.php.net/xref/PHP_5_5/Zend/zend.h#321
 2. Change long in the ZVAL  (lval) to a system-determined 64bit fixed
 size

In principle I think this is a great thing to do. Not having a 64 bit type is
annoying. I'm a bit curious about how this is going to work with all sorts of
object wrappers that are now in place as workarounds. And casting int64 to int32
needs to be very well looked at as well.

Derick 





Re: [PHP-DEV] 5.NEXT Integer and String type modifications

2013-05-31 Thread Anthony Ferrara
Derick,


 In principle I think this is a great thing to do. Not having a 64 bit type is
 annoying. I'm a bit curious about how this is going to work with all sorts of
 object wrappers that are now in place as workarounds. And casting int64 to
 int32 needs to be very well looked at as well.


As far as the casting, my first reaction would be to raise an
E_ENGINE_NOTICE on data loss (casting from int64 to int32 with ints > 32
bit) and then adjusting the value to the nearest representable value
(LONG_MAX or LONG_MIN). In other words, it may need to be more than a
simple cast (an inline function perhaps)...
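
A minimal C sketch of such an inline helper, under stated assumptions: the
name narrow_saturating is hypothetical, the target is fixed at 32 bits
(INT32_MAX / INT32_MIN rather than LONG_MAX / LONG_MIN), and the notice is
modelled as a flag the caller can act on rather than a real engine error.

```c
#include <stdint.h>

/* Hypothetical helper: narrow a 64-bit value to 32 bits, clamping to the
 * nearest representable value. Sets *lost to 1 when clamping happened, so
 * the caller can raise a notice; sets it to 0 otherwise. */
static inline int32_t narrow_saturating(int64_t in, int *lost)
{
    if (in > INT32_MAX) {
        *lost = 1;
        return INT32_MAX;
    }
    if (in < INT32_MIN) {
        *lost = 1;
        return INT32_MIN;
    }
    *lost = 0;
    return (int32_t)in;
}
```

Values already inside the 32-bit range pass through unchanged with the flag
cleared, so the extra cost is two comparisons per conversion.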

As far as object wrappers, any particular examples that you're thinking of?

Thanks for the thoughts

Anthony


Re: [PHP-DEV] 5.NEXT Integer and String type modifications

2013-05-31 Thread Derick Rethans
Anthony Ferrara ircmax...@gmail.com wrote:

 Derick,
 
 In principle I think this is a great thing to do. Not having a 64 bit
 type is annoying. I'm a bit curious about how this is going to work with
 all sorts of object wrappers that are now in place as workarounds. And
 casting int64 to int32 needs to be very well looked at as well.

 As far as the casting, my first reaction would be to raise an
 E_ENGINE_NOTICE on data loss (casting from int64 to int32 with ints > 32
 bit) and then adjusting the value to the nearest representable value
 (LONG_MAX or LONG_MIN). In other words, it may need to be more than a
 simple cast (an inline function perhaps)

That can't be handled in applications though ...

 As far as object wrappers, any particular examples that you're
 thinking of?

It happens on at least two extensions that I've written, dbus and mongodb, so I
was thinking there must be a few more.

Cheers,
Derick  






Re: [PHP-DEV] 5.NEXT Integer and String type modifications

2013-05-31 Thread Sherif Ramadan
On Fri, May 31, 2013 at 4:21 PM, Derick Rethans der...@php.net wrote:

 Anthony Ferrara ircmax...@gmail.com wrote:

  Derick,
 
  In principle I think this is a great thing to do. Not having a 64 bit
  type is annoying. I'm a bit curious about how this is going to work with
  all sorts of object wrappers that are now in place as workarounds. And
  casting int64 to int32 needs to be very well looked at as well.

  As far as the casting, my first reaction would be to raise an
  E_ENGINE_NOTICE on data loss (casting from int64 to int32 with ints > 32
  bit) and then adjusting the value to the nearest representable value
  (LONG_MAX or LONG_MIN). In other words, it may need to be more than a
  simple cast (an inline function perhaps)

 That can't be handled in applications though ...

  As far as object wrappers, any particular examples that you're
  thinking of?

 It happens on at least two extensions that I've written, dbus and mongodb,
 so I was thinking there must be a few more.


PDO does a lot of that in various places.


 Cheers,
 Derick


