Re: [PHP-DEV] concatenation operator

2012-07-02 Thread Christopher Jones


On 06/30/2012 04:51 PM, Johannes Schlüter wrote:
> On Sat, 2012-06-30 at 03:53 -0700, Adi Mutu wrote:
>>> Only thing that helps is learning the code structure and digging
>>> through it.
>>
>> Any hint/documentation to learn that?
>
> Use the source. ;-)
>
> A bit more seriously: No, there's no good single place to look at, there
> are different blogs etc. looking at specific pieces in detail, but the
> best thing to do is to look at the code (the filenames in Zend/ give a
> good idea what they are for ...), take a question and time and start
> digging. For some things it's also good to look into xdebug, vld,
> runkit, ... and see where they hook in to do their magic. And well, the
> path from main() in sapi/cli/php_cli.c to execute() is not that long,
> what then happens is a bit more complicated though (while then again,
> once you're in, quite easy for most parts, too)
>
> johannes


There is a wiki page linking to some useful resources: 
https://wiki.php.net/internals/references

Chris

--
christopher.jo...@oracle.com
http://twitter.com/#!/ghrd



--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] concatenation operator

2012-06-30 Thread Adi Mutu


By initialization I mean the latest point possible where I can set a
breakpoint, but right before my script starts executing its emalloc or
efree calls.

> but even then you will see many things you're probably not interested in

such as?..

> Only thing that helps is learning the code structure and digging through it.

Any hint/documentation to learn that?

Thanks.





Re: [PHP-DEV] concatenation operator

2012-06-30 Thread Johannes Schlüter
On Sat, 2012-06-30 at 03:53 -0700, Adi Mutu wrote:
> By initialization I mean the latest point possible where I can set a
> breakpoint, but right before my script starts executing its
> emalloc or efree calls.

Does executing include compilation? Does it include creating a stack
frame etc. for the main routine? ...

>> but even then you will see many things you're probably not
>> interested in
>
> such as?..

Well, PHP is complex, it does quite a few things in order to run a
seemingly small script.

>> Only thing that helps is learning the code structure and digging
>> through it.
>
> Any hint/documentation to learn that?

Use the source. ;-)

A bit more seriously: No, there's no good single place to look at, there
are different blogs etc. looking at specific pieces in detail, but the
best thing to do is to look at the code (the filenames in Zend/ give a
good idea what they are for ...), take a question and time and start
digging. For some things it's also good to look into xdebug, vld,
runkit, ... and see where they hook in to do their magic. And well, the
path from main() in sapi/cli/php_cli.c to execute() is not that long,
what then happens is a bit more complicated though (while then again,
once you're in, quite easy for most parts, too)

johannes






Re: [PHP-DEV] concatenation operator

2012-06-29 Thread Adi Mutu
Hello,

Sorry for the late reply, I was away for a while.
I don't think I have DTrace because I'm on Fedora, but I'll research.

If I wanted to set a breakpoint after PHP's initialization process, but
right before the script's execution, so that after that I can set breakpoints on
emalloc and efree which are hit only during my script's execution, where
should I set it? Hope the question was clear enough.

DTrace related:
Why have you used 'execute:return' and not concat_function:return? What's with
the execute function?


Thanks,
A.




Re: [PHP-DEV] concatenation operator

2012-06-29 Thread Johannes Schlüter
On Fri, 2012-06-29 at 11:47 -0700, Adi Mutu wrote:
> Sorry for the late reply, I was away for a while.
> I don't think I have DTrace because I'm on Fedora, but I'll
> research.

As said: Currently only on Solaris, MacOS and BSD. Oracle is porting
DTrace to Oracle Linux. Red Hat created SystemTap which is similar, but I
have never used it.

> If I wanted to set a breakpoint after PHP's initialization
> process, but right before the script's execution, so that after that I
> can set breakpoints on emalloc and efree which are hit only
> during my script's execution, where should I set it? Hope the question
> was clear enough.

Depends on your view of what initialization is. But execute() might be a
place which helps ... but even then you will see many things you're
probably not interested in. Only thing that helps is learning the code
structure and digging through it.

> DTrace related:
> Why have you used 'execute:return' and not concat_function:return?
> What's with the execute function?

That was a bug, since I quickly edited an older script. In this case it
doesn't change the result.

johannes






Re: [PHP-DEV] concatenation operator

2012-06-14 Thread Ángel González
On 13/06/12 05:26, Morgan L. Owens wrote:
> After reading the performance improvements RFC about interned strings,
> and its passing mention of a special data structure (e.g.
> zend_string) instead of char*, I've been thinking a little bit about
> this and what such a structure could be.
>
> But rather than interned strings, I thought that _implicit_
> concatenation would be a bigger win in the long term. Like interning,
> it relies on strings being immutable.
>
> This zend_string is a composite type. Leaves are _almost_ identical to
> existing string zvals - char* val, int len - but also an additional
> child_count field. For leaves, child_count is zero (not incidentally
> indicating that it _is_ a leaf). For internal nodes, val is a list
> of zend_strings (child_count of them). len still refers to the total
> string length (the sum of the len fields of its children).
>
> So a string that has been built up through concatenation is
> represented by a tree (actually a dag) of zend_strings. The edges in
> this dag are all properly reference-counted; discarding a string
> decrements the reference counts of its children.

How do you list them? As a singly linked list?
That would prevent reuse of the component strings in different
superstrings, except where their ends match...







Re: Re: [PHP-DEV] concatenation operator

2012-06-14 Thread Morgan L. Owens

On 2012-06-15 04:00, Ángel González wrote:
> On 13/06/12 05:26, Morgan L. Owens wrote:
>> After reading the performance improvements RFC about interned strings,
>> and its passing mention of a special data structure (e.g.
>> zend_string) instead of char*, I've been thinking a little bit about
>> this and what such a structure could be.
>>
>> But rather than interned strings, I thought that _implicit_
>> concatenation would be a bigger win in the long term. Like interning,
>> it relies on strings being immutable.
>>
>> This zend_string is a composite type. Leaves are _almost_ identical to
>> existing string zvals - char* val, int len - but also an additional
>> child_count field. For leaves, child_count is zero (not incidentally
>> indicating that it _is_ a leaf). For internal nodes, val is a list
>> of zend_strings (child_count of them). len still refers to the total
>> string length (the sum of the len fields of its children).
>>
>> So a string that has been built up through concatenation is
>> represented by a tree (actually a dag) of zend_strings. The edges in
>> this dag are all properly reference-counted; discarding a string
>> decrements the reference counts of its children.
>
> How do you list them? As a singly linked list?
> That would prevent reuse of the component strings in different
> superstrings, except where their ends match...

I was thinking just in terms of an array (the composite would be 
pointing either to an array of characters or an array of strings). 
Mainly just because that's how I pictured it (and haven't thought of a 
reason not to, since the number of children is known when the 
concatenated string is created, and fixed due to immutability).


Component strings aren't copied as such, only referenced. In that sense 
the choice of array vs. list comes down to where that reference is kept 
- in the parent string or the elder sibling. Sharing common suffixes 
would save a number of references, but when concatenating two existing 
strings, the list of component references in the _prefix_ would need to 
be copied for the sake of whatever else is using it at the time 
(otherwise they would end up with the concatenated string as well).


Speaking of concatenation, unless potentially scary stuff is done, 
concatenating three strings is done by concatenating two of them, then 
concatenating the result with the third, giving a binary tree; so why am 
I suggesting an array of arbitrary length? Think of an implementation of 
PHP's join()/implode() that exploits this structure.
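
To make the join()/implode() remark concrete, here is a minimal sketch of a single internal node that references all the pieces and the glue from one fixed-size array. The names node, leaf, join, write_chars and to_cstring are hypothetical and invented for this illustration; none of this is the engine's actual zend_string or implode() code, and reference release is left out to keep it short.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical composite string node (not the engine's real zend_string). */
typedef struct node node;
struct node {
    unsigned refcount;
    size_t   len;          /* total bytes represented by this node */
    size_t   child_count;  /* 0 => leaf holding character data */
    char    *chars;        /* leaf only */
    node   **children;     /* internal node only: fixed-size array */
};

/* Create a leaf holding a private copy of s. */
static node *leaf(const char *s)
{
    node *n = calloc(1, sizeof *n);
    if (n == NULL) abort();
    n->refcount = 1;
    n->len = strlen(s);
    n->chars = malloc(n->len + 1);
    if (n->chars == NULL) abort();
    memcpy(n->chars, s, n->len + 1);
    return n;
}

/*
 * Join count pieces with glue: one internal node whose array holds
 * piece0, glue, piece1, glue, ... The glue is referenced count-1 times
 * and never copied. Requires count >= 1.
 */
static node *join(node *glue, node **pieces, size_t count)
{
    node *n = calloc(1, sizeof *n);
    if (n == NULL) abort();
    n->refcount = 1;
    n->child_count = 2 * count - 1;
    n->children = malloc(n->child_count * sizeof *n->children);
    if (n->children == NULL) abort();
    for (size_t i = 0; i < n->child_count; i++) {
        node *c = (i % 2 == 0) ? pieces[i / 2] : glue;
        c->refcount++;          /* the new node holds a reference */
        n->len += c->len;
        n->children[i] = c;
    }
    return n;
}

/* Copy the character data of n into dst; returns the end of the copy. */
static char *write_chars(const node *n, char *dst)
{
    if (n->child_count == 0) {
        memcpy(dst, n->chars, n->len);
        return dst + n->len;
    }
    for (size_t i = 0; i < n->child_count; i++)
        dst = write_chars(n->children[i], dst);
    return dst;
}

/* Materialise n as a freshly allocated NUL-terminated C string. */
static char *to_cstring(const node *n)
{
    char *buf = malloc(n->len + 1);
    if (buf == NULL) abort();
    *write_chars(n, buf) = '\0';
    return buf;
}
```

Joining the three leaves "a", "b" and "c" with the glue ", " gives one node with five children and a len of 7; the character data is only copied when to_cstring() is called. A chain of binary concatenations would need four intermediate nodes for the same result.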






Re: Re: [PHP-DEV] concatenation operator

2012-06-12 Thread Morgan L. Owens

On 2012-06-08 08:18, Johannes Schlüter wrote:
> On Thu, 2012-06-07 at 12:53 -0700, Adi Mutu wrote:
>> Ok Johannes, thanks for the answer. I'll try to look deeper.
>> I basically just wanted to know what happens when you concatenate two
>> strings? what emalloc/efree happens.
>
> This depends. As always. As said what has to be done is one allocation
> for the result value ... and then the zval magic, which depends on
> refcount, references, ...
>
> So, when having two constant strings there's a single malloc, in this
> case allocating 7 bytes (strlen("foo")+strlen("bar")+1), if you have a
> different type it has to be converted first ...



After reading the performance improvements RFC about interned strings, 
and its passing mention of a special data structure (e.g. zend_string) 
instead of char*, I've been thinking a little bit about this and what 
such a structure could be.


But rather than interned strings, I thought that _implicit_ 
concatenation would be a bigger win in the long term. Like interning, it 
relies on strings being immutable.


This zend_string is a composite type. Leaves are _almost_ identical to 
existing string zvals - char* val, int len - but also an additional 
child_count field. For leaves, child_count is zero (not incidentally 
indicating that it _is_ a leaf). For internal nodes, val is a list of 
zend_strings (child_count of them). len still refers to the total 
string length (the sum of the len fields of its children).


So a string that has been built up through concatenation is represented 
by a tree (actually a dag) of zend_strings. The edges in this dag are 
all properly reference-counted; discarding a string decrements the 
reference counts of its children.


Only when the character data is needed for something does it need to be 
allocated for and copied into one place (the internal node can then 
become a leaf).
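
The proposal above could be sketched roughly as follows. This is only an illustration under stated assumptions: the type is called rope here so it is not mistaken for any real zend_string, all function names are hypothetical, and allocation failure simply aborts.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical composite string; not the engine's real zend_string. */
typedef struct rope rope;
struct rope {
    unsigned refcount;
    size_t   len;            /* total length in bytes */
    size_t   child_count;    /* 0 => leaf */
    union {
        char  *chars;        /* leaf: len bytes plus NUL */
        rope **children;     /* internal node: child_count entries */
    } val;
};

static void *xmalloc(size_t size)
{
    void *p = malloc(size);
    if (p == NULL) abort();
    return p;
}

/* Create a leaf holding a private copy of s. */
static rope *rope_leaf(const char *s)
{
    rope *r = xmalloc(sizeof *r);
    r->refcount = 1;
    r->len = strlen(s);
    r->child_count = 0;
    r->val.chars = xmalloc(r->len + 1);
    memcpy(r->val.chars, s, r->len + 1);
    return r;
}

/* Implicit concatenation: no character data is copied, only references. */
static rope *rope_concat(rope *a, rope *b)
{
    rope *r = xmalloc(sizeof *r);
    r->refcount = 1;
    r->len = a->len + b->len;
    r->child_count = 2;
    r->val.children = xmalloc(2 * sizeof(rope *));
    r->val.children[0] = a; a->refcount++;
    r->val.children[1] = b; b->refcount++;
    return r;
}

/* Drop one reference; discarding a node releases its children. */
static void rope_release(rope *r)
{
    if (--r->refcount > 0) return;
    if (r->child_count == 0) {
        free(r->val.chars);
    } else {
        for (size_t i = 0; i < r->child_count; i++)
            rope_release(r->val.children[i]);
        free(r->val.children);
    }
    free(r);
}

/* Copy the bytes of r into dst; returns a pointer past the copy. */
static char *rope_write(const rope *r, char *dst)
{
    if (r->child_count == 0) {
        memcpy(dst, r->val.chars, r->len);
        return dst + r->len;
    }
    for (size_t i = 0; i < r->child_count; i++)
        dst = rope_write(r->val.children[i], dst);
    return dst;
}

/* Materialise the characters on demand; the node then becomes a leaf. */
static const char *rope_flatten(rope *r)
{
    if (r->child_count > 0) {
        char *buf = xmalloc(r->len + 1);
        *rope_write(r, buf) = '\0';
        for (size_t i = 0; i < r->child_count; i++)
            rope_release(r->val.children[i]);
        free(r->val.children);
        r->val.chars = buf;
        r->child_count = 0;
    }
    return r->val.chars;
}
```

Concatenating a node with itself, for example rope_concat(ab, ab), gives the dag case: the same child is referenced twice and its refcount goes up by two. Calling rope_flatten() on the result copies the bytes once and drops both references.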






Re: [PHP-DEV] concatenation operator

2012-06-07 Thread Adi Mutu


That's nice, but I haven't understood a thing... I know something about PHP core
and PHP extensions, but nothing specific about the Zend Engine.
Can you point me to some resources on this topic?

Thanks,




Re: [PHP-DEV] concatenation operator

2012-06-07 Thread Johannes Schlüter
On Thu, 2012-06-07 at 11:50 -0700, Adi Mutu wrote:
> That's nice, but I haven't understood a thing... I know something about
> PHP core and PHP extensions, but nothing specific about the Zend
> Engine.

The mentioned place is directly in the VM, which in general is harder to
understand, but well, it leads to concat_function at
http://lxr.php.net/xref/PHP_TRUNK/Zend/zend_operators.c#1234

Knowing basic C should be enough to understand the code there. The
actual algorithm can also easily be guessed: allocate a buffer which
can hold both strings at once and copy them over. The code is a tiny bit
more complex, as it tries to reuse an existing buffer rather than
allocating something completely new.
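
A minimal sketch of that algorithm in plain C, not the actual concat_function: str_t, xrealloc, str_from and concat are hypothetical names made up for this example, and realloc stands in for the engine's allocator. The assumption that reuse happens when the result is the same value as the first operand is mine, for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a string value: an owned buffer plus its length. */
typedef struct {
    char  *val;
    size_t len;
} str_t;

/* Allocate or grow a buffer, aborting on failure to keep the sketch short. */
static char *xrealloc(char *p, size_t size)
{
    char *q = realloc(p, size);
    if (q == NULL) {
        abort();
    }
    return q;
}

/* Build a str_t holding a private copy of the C string s. */
static str_t str_from(const char *s)
{
    str_t r;
    r.len = strlen(s);
    r.val = xrealloc(NULL, r.len + 1);
    memcpy(r.val, s, r.len + 1);
    return r;
}

/*
 * Store op1 . op2 in *result.
 * result must either be op1 itself (the "$a .= $b" case) or a value
 * that does not currently own a buffer.
 */
static void concat(str_t *result, const str_t *op1, const str_t *op2)
{
    size_t len1 = op1->len;
    size_t len2 = op2->len;
    size_t total = len1 + len2;

    if (result == op1) {
        /* Reuse path: grow the existing buffer, then append op2. */
        result->val = xrealloc(result->val, total + 1);
        memcpy(result->val + len1, op2->val, len2);
    } else {
        /* Fresh path: one buffer big enough for both strings plus NUL. */
        char *buf = xrealloc(NULL, total + 1);
        memcpy(buf, op1->val, len1);
        memcpy(buf + len1, op2->val, len2);
        result->val = buf;
    }
    result->val[total] = '\0';
    result->len = total;
}
```

With a = "foo" and b = "bar", concat(&r, &a, &b) takes the fresh path and makes one allocation, while concat(&a, &a, &b) takes the reuse path and grows a's buffer in place.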

The question is: What do you actually want to know?

> Can you point me to some resources on this topic?

Unfortunately not. The source is the best documentation we have for
that.

johannes






Re: [PHP-DEV] concatenation operator

2012-06-07 Thread Adi Mutu
Ok Johannes, thanks for the answer. I'll try to look deeper.
I basically just wanted to know what happens when you concatenate two strings?
Which emalloc/efree calls happen?

Also, can you tell me if possible how to put a breakpoint on emalloc/efree that
triggers only after all core functions are registered? Because it takes
like a million years like this and a million F8 presses...

Thanks.




Re: [PHP-DEV] concatenation operator

2012-06-07 Thread Johannes Schlüter
On Thu, 2012-06-07 at 12:53 -0700, Adi Mutu wrote:
> Ok Johannes, thanks for the answer. I'll try to look deeper.
> I basically just wanted to know what happens when you concatenate two
> strings? Which emalloc/efree calls happen?

This depends. As always. As said, what has to be done is one allocation
for the result value ... and then the zval magic, which depends on
refcount, references, ...

> Also, can you tell me if possible how to put a breakpoint on
> emalloc/efree that triggers only after all core functions are
> registered? Because it takes like a million years like this and a
> million F8 presses...

Depends on your debugger. Most allow conditional breakpoints, or you can
stop at one breakpoint and, while holding there, add a few more ...

For such a question my preference is using DTrace (on Solaris, Mac or
BSD), something like this session:

    $ cat test.d
    #!/sbin/dtrace

    /* set a thread-local flag while inside concat_function */
    pid$target::concat_function:entry {
        self->in_concat = 1;
    }

    /* clear the flag when execute() returns */
    pid$target::execute:return {
        self->in_concat = 0;
    }

    /* arg0 is the requested size */
    pid$target::_emalloc:entry
    / self->in_concat /
    {
        trace(arg0);
        ustack();
    }

    /* arg0 is the old pointer, arg1 the new size */
    pid$target::_erealloc:entry
    / self->in_concat /
    {
        trace(arg0);
        trace(arg1);
        ustack();
    }

    $ cat test1.php
    <?php
    $a = "foo"; $b = "bar"; $a.$b;

    $ dtrace -s test.d -c 'php test1.php'
    dtrace: script 'test.d' matched 4 probes
    dtrace: pid 16406 has exited
    CPU     ID                    FUNCTION:NAME
      3 100372                   _emalloc:entry                 7
                  php`_emalloc
                  php`concat_function+0x270
                  php`ZEND_CONCAT_SPEC_CV_CV_HANDLER+0xcd
                  php`execute+0x3d9
                  php`dtrace_execute+0xe7
                  php`zend_execute_scripts+0xf5
                  php`php_execute_script+0x2e8
                  php`do_cli+0x864
                  php`main+0x6e2
                  php`_start+0x83

    $ cat test2.php
    <?php
    $a = 23; $b = "bar"; $a.$b;

    $ dtrace -s test.d -c 'php test2.php'
    dtrace: script 'test.d' matched 4 probes
    dtrace: pid 16425 has exited
    CPU     ID                    FUNCTION:NAME
      1 100373                  _erealloc:entry                 0               79
                  php`_erealloc
                  php`xbuf_format_converter+0x11ee
                  php`vspprintf+0x34
                  php`zend_spprintf+0x2f
                  php`_convert_to_string+0x174
                  php`zend_make_printable_zval+0x5ec
                  php`concat_function+0x3c
                  php`ZEND_CONCAT_SPEC_CV_CV_HANDLER+0xcd
                  php`execute+0x3d9
                  php`dtrace_execute+0xe7
                  php`zend_execute_scripts+0xf5
                  php`php_execute_script+0x2e8
                  php`do_cli+0x864
                  php`main+0x6e2
                  php`_start+0x83

      1 100372                   _emalloc:entry                 6
                  php`_emalloc
                  php`concat_function+0x270
                  php`ZEND_CONCAT_SPEC_CV_CV_HANDLER+0xcd
                  php`execute+0x3d9
                  php`dtrace_execute+0xe7
                  php`zend_execute_scripts+0xf5
                  php`php_execute_script+0x2e8
                  php`do_cli+0x864
                  php`main+0x6e2
                  php`_start+0x83

So, when having two constant strings there's a single malloc, in this
case allocating 7 bytes (strlen("foo")+strlen("bar")+1), if you have a
different type it has to be converted first ...
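
The byte counts in the two traces can be reproduced with a small C sketch. The helper names concat_alloc_size and concat_long_str are hypothetical, and the conversion here uses snprintf into a stack buffer, whereas the trace shows the engine going through zend_spprintf with its own erealloc of 79 bytes.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Bytes needed for the result buffer of a two-string concatenation:
 * both lengths plus one for the terminating NUL.
 */
static size_t concat_alloc_size(const char *a, const char *b)
{
    return strlen(a) + strlen(b) + 1;
}

/*
 * Concatenate an integer with a string. The integer is first formatted
 * into a temporary buffer (the conversion step), then the result is
 * built with a single allocation of concat_alloc_size() bytes.
 * Returns a malloc'd string the caller must free, or NULL on failure.
 */
static char *concat_long_str(long n, const char *s)
{
    char tmp[32];                         /* enough for any 64-bit long */
    snprintf(tmp, sizeof tmp, "%ld", n);

    size_t tmp_len = strlen(tmp);
    char *out = malloc(concat_alloc_size(tmp, s));
    if (out == NULL) {
        return NULL;
    }
    memcpy(out, tmp, tmp_len);
    memcpy(out + tmp_len, s, strlen(s) + 1);  /* copies the NUL as well */
    return out;
}
```

For "foo" and "bar" the size is 3 + 3 + 1 = 7, matching the first trace. For 23 and "bar" the integer becomes the two-character string "23", so the final allocation is 2 + 3 + 1 = 6, matching the second _emalloc in the test2.php trace.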


johannes






Re: [PHP-DEV] concatenation operator

2012-06-05 Thread Felipe Pena
Hi,

2012/6/5 Adi Mutu adi_mut...@yahoo.com:
> Hello,
>
> Can somebody point me to where the concatenation operator (the .
> operator) is implemented?
>
> Thanks,

See http://lxr.php.net/xref/PHP_TRUNK/Zend/zend_vm_def.h#133

-- 
Regards,
Felipe Pena
