Re: [PHP-DEV] String Size Refactor Progress

2013-07-10 Thread Anatol Belski
Hi,

On Wed, July 10, 2013 07:51, Pierre Joye wrote:
 Substr needs to be refactored to use size_t. Right now, I just raise an
  error if Z_STRSIZE  INT_MAX (or an overflow would happen). I'd love
 to see that cleaned up more.

 We should consider moving to int64_t too as well while being at it.
 For many reasons or platforms it makes sense to drop (unsigned)
 long/int usage as well (and all derived types) and uses (u)int*_t instead.
 I'd to see this task as strongly related to the size_t move.
 It will also solve many other related issues (LFS support, 64bit
 integer portability issues, etc.). What do you think?

Agree 101%, while working on this I've already seen a couple of places
where having int64 in zval instead of long were the right choice while
porting the string stuff. Like what Anthony says about working with the
string size in userspace, but the other parts of code will need it
simultaneously, too. For that we've already zend_intptr_t defined in
zend_types.h . Lets tweak zval definition and do it parallel.


 Here's the branch:
 https://github.com/ircmaxell/php-src/tree/string_size_refactor_take_2
 And the diff:
 https://github.com/ircmaxell/php-src/compare/string_size_refactor_take_2


 If you want to help out, please let me know and let's try to coordinate
 so we don't step on each other's toes...

 It would be awesome to add this fork to lxr, it can be amazingly
 helpful. From our side, we are testing and fixing (Weltling) it already, we
 will try to do it on regular basis to do not stay behind or do to do not
 have to debug a huge amount of changes to fix one bug or another (can be
 painful with this kind of changes).

It were awesome to put this work into git.php.net space and use lxr.

Best regards

Anatol

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] String Size Refactor Progress

2013-07-09 Thread Pierre Joye
hi Anthony!

On Wed, Jul 3, 2013 at 4:12 PM, Anthony Ferrara ircmax...@gmail.com wrote:
 Hey all,

 So I've started the refactor to change the stored string size from int to
 size_t.


Great work so far! Thanks! :)

 I've got it compiling and the tests mostly passing (not all), when run with
 --disable-all and --disable-cgi.

 There are definitely still issues with the patch (there are some weird
 segfaults in certain times, which are caught by the tests), but it's
 progressing really nicely.

 Here's what I did:

 I created a new build option: --enable-zstrlen. This turns off the new
 match, and type-defs and defines everything back to how it was before. This
 is really useful for testing changes to ensure that they still work.

 Then, I defined a series of new types:

 #ifdef ZEND_USE_LEGACY_STRING_TYPES

 #define zend_str_size_int int
 #define zend_str_size_uint unsigned int
 #define zend_str_size_size_t size_t
 #define zend_str_size_long long
 typedef int zend_str_size;

 #else

 #define zend_str_size_int zend_str_size
 #define zend_str_size_uint zend_str_size
 #define zend_str_size_size_t zend_str_size
 #define zend_str_size_long zend_str_size
 typedef size_t zend_str_size;

 #endif

 Any API that accepted a string size parameter, I replace with one of the
 zend_str_size_* definitions. I chose to do this instead of just changing it
 directly to zend_str_size, as it should make extension developer's lives
 easier by supporting the intermediate types (with their own define lines
 for older versions of the API).

 These are intended to be removed after 1 or 2 releases, replacing
 everything with just zend_str_size.

 Due to a problem with zend_parse_parameters, I added two new identifiers: S
 and P. They act just like s and p, except that they return zend_str_size
 instead of int.

 When `--enable-zstrlen` is not enabled, I disable s and p, and changed ZPP
 to rase an E_ERROR on unknown parameter types. The E_ERROR change is not
 intended to go into production, but instead just makes life A LOT easier
 refactoring modules one at a time.

Like it, easy way to switch back and forth in case something is not
running smoothly or not as it should.

 Here's what's left to do:

 I've only really got the basic system working (for limited definitions of
 working). There's a ton of extensions that need migrating, and tons of
 parts of the core that i haven't fully migrated yet.

 I've migrated php_pcre.c over, but pcrelib still uses int for string sizes.
 This is going to be a much larger refactor, and I wanted to see people's
 thoughts prior to digging into it.

This is something we have to care about for all other external libs,
as we discussed earlier on IRC, this task will be as huge as the one
in PHP, if not more time consuming (convincing upstream projects to
change some APIs, f.e.).

 Substr needs to be refactored to use size_t. Right now, I just raise an
 error if Z_STRSIZE  INT_MAX (or an overflow would happen). I'd love to see
 that cleaned up more.

We should consider moving to int64_t too as well while being at it.
For many reasons or platforms it makes sense to drop (unsigned)
long/int usage as well (and all derived types) and uses (u)int*_t
instead. I'd to see this task as strongly related to the size_t move.
It will also solve many other related issues (LFS support, 64bit
integer portability issues, etc.). What do you think?


 Here's the branch:
 https://github.com/ircmaxell/php-src/tree/string_size_refactor_take_2
 And the diff:
 https://github.com/ircmaxell/php-src/compare/string_size_refactor_take_2

 If you want to help out, please let me know and let's try to coordinate so
 we don't step on each other's toes...

It would be awesome to add this fork to lxr, it can be amazingly
helpful. From our side, we are testing and fixing (Weltling) it
already, we will try to do it on regular basis to do not stay behind
or do to do not have to debug a huge amount of changes to fix one bug
or another (can be painful with this kind of changes).

Cheers,
--
Pierre

@pierrejoye | http://www.libgd.org

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



[PHP-DEV] String Size Refactor Progress

2013-07-03 Thread Anthony Ferrara
Hey all,

So I've started the refactor to change the stored string size from int to
size_t.

I've got it compiling and the tests mostly passing (not all), when run with
--disable-all and --disable-cgi.

There are definitely still issues with the patch (there are some weird
segfaults in certain times, which are caught by the tests), but it's
progressing really nicely.

Here's what I did:

I created a new build option: --enable-zstrlen. This turns off the new
match, and type-defs and defines everything back to how it was before. This
is really useful for testing changes to ensure that they still work.

Then, I defined a series of new types:

#ifdef ZEND_USE_LEGACY_STRING_TYPES

#define zend_str_size_int int
#define zend_str_size_uint unsigned int
#define zend_str_size_size_t size_t
#define zend_str_size_long long
typedef int zend_str_size;

#else

#define zend_str_size_int zend_str_size
#define zend_str_size_uint zend_str_size
#define zend_str_size_size_t zend_str_size
#define zend_str_size_long zend_str_size
typedef size_t zend_str_size;

#endif

Any API that accepted a string size parameter, I replace with one of the
zend_str_size_* definitions. I chose to do this instead of just changing it
directly to zend_str_size, as it should make extension developer's lives
easier by supporting the intermediate types (with their own define lines
for older versions of the API).

These are intended to be removed after 1 or 2 releases, replacing
everything with just zend_str_size.

Due to a problem with zend_parse_parameters, I added two new identifiers: S
and P. They act just like s and p, except that they return zend_str_size
instead of int.

When `--enable-zstrlen` is not enabled, I disable s and p, and changed ZPP
to rase an E_ERROR on unknown parameter types. The E_ERROR change is not
intended to go into production, but instead just makes life A LOT easier
refactoring modules one at a time.



Here's what's left to do:

I've only really got the basic system working (for limited definitions of
working). There's a ton of extensions that need migrating, and tons of
parts of the core that i haven't fully migrated yet.

I've migrated php_pcre.c over, but pcrelib still uses int for string sizes.
This is going to be a much larger refactor, and I wanted to see people's
thoughts prior to digging into it.

Substr needs to be refactored to use size_t. Right now, I just raise an
error if Z_STRSIZE  INT_MAX (or an overflow would happen). I'd love to see
that cleaned up more.

My general process has been to enable an extension, fix the compile errors
(typically due to removing Z_STRLEN*). Then run through the extension,
searching for int and replacing where appropriate with zend_str_size
(within a function) or zend_str_size_* in an API. Then run the tests for
that extension, and fix the issues as they come up. Finally, recompile with
-Werror and fix all of the warnings (yay!)...



Lessons Learned So Far

How this system is working today, I have no idea. There are SOOO many
issues in string handling just due to types. I've seen int, unsigned int,
size_t, long, unsigned long and others, silently cast back and forth
(implicit casts too). Some really weird things going on...





Here's the branch:
https://github.com/ircmaxell/php-src/tree/string_size_refactor_take_2
And the diff:
https://github.com/ircmaxell/php-src/compare/string_size_refactor_take_2

If you want to help out, please let me know and let's try to coordinate so
we don't step on each other's toes...

Thanks!

Anthony


Re: [PHP-DEV] String Size Refactor Progress

2013-07-03 Thread Michael Wallner
On Jul 3, 2013 4:12 PM, Anthony Ferrara ircmax...@gmail.com wrote:

 If you want to help out, please let me know and let's try to coordinate so
 we don't step on each other's toes...


I'm with you from August 1st, at the latest!

 Thanks!

 Anthony


Re: [PHP-DEV] String Size Refactor Progress

2013-07-03 Thread Christopher Jones



On 07/03/2013 07:12 AM, Anthony Ferrara wrote:


Then, I defined a series of new types:

#ifdef ZEND_USE_LEGACY_STRING_TYPES



Due to a problem with zend_parse_parameters, I added two new identifiers: S
and P. They act just like s and p, except that they return zend_str_size
instead of int.

When `--enable-zstrlen` is not enabled, I disable s and p, and changed ZPP
to rase an E_ERROR on unknown parameter types. The E_ERROR change is not
intended to go into production, but instead just makes life A LOT easier
refactoring modules one at a time.


Can you elaborate on the problem?  Ideally the return type of s  p would
be toggled by ZEND_USE_LEGACY_STRING_TYPES.

Chris

--
christopher.jo...@oracle.com  http://twitter.com/ghrd
Free PHP  Oracle book:
http://www.oracle.com/technetwork/topics/php/underground-php-oracle-manual-098250.html

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] String Size Refactor Progress

2013-07-03 Thread Anthony Ferrara
Chris,


When `--enable-zstrlen` is not enabled, I disable s and p, and changed ZPP
 to rase an E_ERROR on unknown parameter types. The E_ERROR change is not
 intended to go into production, but instead just makes life A LOT easier
 refactoring modules one at a time.


 Can you elaborate on the problem?  Ideally the return type of s  p would
 be toggled by ZEND_USE_LEGACY_STRING_TYPES


The problem is that changing 's' from int to size_t can (and does) cause
segfaults.

We can have the discussion if 's' and 'p' should remain there (and if the
compile time option should be there) at merge time, but for now the switch
just makes life easier making the change. It wasn't designed for a
long-term switch...

Anthony


Re: [PHP-DEV] String Size Refactor Progress

2013-07-03 Thread Stas Malyshev
Hi!


 The problem is that changing 's' from int to size_t can (and does) cause
 segfaults.

But if we refactor string lengths to be size_t, we can not allow code
that still relies on it being int, if sizeof(size_t) != sizeof(int). It
would just produce wrong length. I'd actually prefer segfault there at
least for testing phase because segfault is obvious while wrong length
would be much harder to catch.

 We can have the discussion if 's' and 'p' should remain there (and if the
 compile time option should be there) at merge time, but for now the switch
 just makes life easier making the change. It wasn't designed for a
 long-term switch...

OK, if it's a temp measure, that's fine, but if we're taking strings to
use size_t, then in the final code it should use size_t in the API
functions everywhere, otherwise there would be a lot of weird bugs.
-- 
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] String Size Refactor Progress

2013-07-03 Thread Yasuo Ohgaki
Hi Anthony,

2013/7/3 Anthony Ferrara ircmax...@gmail.com

 I've migrated php_pcre.c over, but pcrelib still uses int for string sizes.
 This is going to be a much larger refactor, and I wanted to see people's
 thoughts prior to digging into it.


For upgradability and compatibility, bundled pcrelib should be left as
it is. Instead, checking possible over/under flow before passes string is
better.

Most pre-complied systems use system's preclib, I suppose.

Regards,

--
Yasuo Ohgaki
yohg...@ohgaki.net


Re: [PHP-DEV] String Size Refactor Progress

2013-07-03 Thread Pierre Joye
Hi Yasuo,

On Jul 4, 2013 5:54 AM, Yasuo Ohgaki yohg...@ohgaki.net wrote:

 Hi Anthony,

 2013/7/3 Anthony Ferrara ircmax...@gmail.com

  I've migrated php_pcre.c over, but pcrelib still uses int for string
sizes.
  This is going to be a much larger refactor, and I wanted to see people's
  thoughts prior to digging into it.
 

 For upgradability and compatibility, bundled pcrelib should be left as
 it is. Instead, checking possible over/under flow before passes string is
 better.

Or ask pcre to use size_t as well for their next major version.

Cheers,
Pierre