Re: [PHP-DEV] [RFC] Exceptions in the engine
Pierre Joye wrote in message news:CAEZPtU7vt=ppk4p3vfzflaepzi_wfr2hr_av+dtzvd6d2dz...@mail.gmail.com...

On Feb 21, 2015 2:08 AM, Tony Marston tonymars...@hotmail.com wrote:

Nikita Nefedov wrote in message news:op.xuco5eutc9evq2@nikita-pc...

On Fri, 20 Feb 2015 12:39:33 +0300, Tony Marston tonymars...@hotmail.com wrote:

I disagree. Exceptions were originally invented to solve the semipredicate problem, which only exists with procedural functions, not object methods. Many OO purists would like exceptions to be thrown everywhere, but this would present a huge BC break. If it were possible to get these functions to throw an exception ONLY when they are called inside a try ... catch block, then this would not break BC at all.

Tony, first of all - this still breaks BC, because an exception is being thrown in a place where it used not to be...

I disagree. The following function calls would not throw exceptions:

    fopen(...);
    fwrite(...);
    fclose(...);

while the following code would:

    try {
        fopen(...);
        fwrite(...);
        fclose(...);
    } catch () {
    }

When some function's result heavily depends on the context, it makes said function much harder to reason about. It also creates mental overhead for those who have to read the code that uses this function. And again, if you need exceptions for fopen, please consider using SplFileObject.

For file usage, yes. But are there any other procedural functions without an Spl* alternative which would benefit from this technique?

Expected failures should not raise exceptions. For example, IO is expected to fail (be it network, filesystem, etc.), and I would really not be in favor of adding exceptions for similar cases. This is normal control flow.

Then why do SplFileInfo::openFile and SplFileObject::__construct throw exceptions if the file cannot be opened?

--
Tony Marston

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
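To make the contrast in this exchange concrete, here is a minimal sketch of the two failure styles being compared. The path is a made-up one that is assumed not to exist, and the variable names are illustrative only; the `@` operator is used purely to keep the warning out of the output.

```php
<?php
// Hypothetical path, assumed not to exist on the machine running this sketch.
$path = sys_get_temp_dir() . '/no-such-dir-' . uniqid() . '/missing.txt';

// Procedural style: failure is reported through the return value (false)
// together with an E_WARNING, which @ silences here.
$handle = @fopen($path, 'r');
$fopenFailed = ($handle === false);

// Object style: the SplFileObject constructor throws a RuntimeException
// when the file cannot be opened.
$splThrew = false;
try {
    $file = new SplFileObject($path, 'r');
} catch (RuntimeException $e) {
    $splThrew = true;
}

var_dump($fopenFailed, $splThrew);
```

Both calls fail on the same missing file; the only difference is whether the caller learns about it from a return value or from an exception, which is the distinction the thread is arguing over.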
Re: AW: [PHP-DEV] Coercive Scalar Type Hints RFC
On 22/02/2015 13:09, Robert Stoll wrote:

[snip] ...

PHP 7.1: necessary bug fixes for changes introduced with PHP 7.0

PHP 7.x: deprecate even more if required

PHP 8:
- introduce scalar type hints which reflect the conversion rules as defined (adding strict type hints as well is possible of course; whether with an ini-setting, a declare statement or individually with a modifier, something like strict int for a single parameter, strict function for all parameters incl. return type, or strict class for every type defined in the class, is up for discussion)
- exchange the behaviour of (bool), (int) etc. - use the new conversion rules instead
- change internal functions which do not yet obey the new conversion rules
- change the operators which do not yet obey the new conversion rules (for instance, + would also emit an E_RECOVERABLE_ERROR for "a" + 1)
- (change the control structures so that they obey the new conversion rules as well) => as mentioned above, probably too strict for PHP

Back to this RFC. I think this RFC goes in the right direction with the specified conversion rules. The only thing to get rid of is the implicit conversions to bool from string, float and int, IMO. Moreover, I like that the RFC already has different steps for adding the new behaviour. Yet, I think it should slow down a little bit, as shown. I think we need more time to come up with a very good strategic solution.

Thoughts?

+1 - good analysis - a single-mode approach with consistent type coercion rules across the board makes absolute sense, even if STH are put back until PHP 8.x
Re: [PHP-DEV] Coercive Scalar Type Hints RFC
On 02/22/2015 09:00 AM, Etienne Kneuss wrote:

There have been several attempts: for JS: http://users-cs.au.dk/simonhj/tajs2009.pdf or similar techniques applied to PHP, quite outdated though: https://github.com/colder/phantm

You are right that the lack of static information about types is (one of) the main issue(s). Recovering the types typically has a huge performance cost, or is unreliable.

But seriously, time is getting wasted on this argument; it's actually a no-brainer: more static information helps tools that rely on static information.

Yes. Absolutely. 100%. The question is rather: at what weight should we take (potential/future) external tools into account when developing language features?

Previously on the list, the nodejs JIT engine was mentioned as a working example of a JIT for a language without any sort of type information. While this is true, I think we should also consider the amount of checks and resources the generated machine code requires to achieve this. In these tests you can see that in most situations the JavaScript V8 engine used by nodejs uses much more memory than current PHP (compare it to C as well): benchmarksgame.alioth.debian.org/u64/compare.php?lang=v8&lang2=php

Yes, it is faster, but it consumes much more CPU and RAM in most situations, and I'm sure that this is related to the dynamic nature of the language. IMHO a JIT or AOT machine code generator will never make decent use of system resources without some sort of strong/strict typing rules; somebody please explain if that's not the case.
As I see it, here is an example, supposing the JIT generated C++ code from which the machine code is then generated:

    function calc(int $val1, int $val2) : int { return $val1 + $val2; }

In weak mode I imagine the generated code would be something like this:

    Variant* calc(Variant val1, Variant val2) {
        if (val1.isInt() && val2.isInt())
            return new Variant(val1.toInt() + val2.toInt());
        else if (val1.isFloat() && val2.isFloat())
            return new Variant(val1.toFloat() + val2.toFloat());
        else
            throw new RuntimeError();
    }

while in strict mode the generated code could be:

    int calc(int val1, int val2) {
        return val1 + val2;
    }

So in this scenario it is clear that strict mode performance and memory usage would be better. Conversion code would be required only in some cases, for example:

    calc(1, 5)                     // No need for casting
    calc((int) "12", (int) "15")   // Needs casting, depending on how the parser deals with it

If my example is right, it means strict would be better for achieving good performance than weak, which is almost the same situation we have now with zvals. I also think it is wrong to say that casting will always take place in strict mode.

So I have some questions floating in my mind about the coercive RFC.

1. Could weak mode provide the required rules to implement a JIT with a sane level of memory and CPU usage?

2. I see that the proponents of dual weak/strict modes are offering to write an AOT implementation if strict makes it, and the impressive work of Joe (JITFU) and Anthony on recki-ct could, with strict mode, be taken to another level of integration with PHP and of performance. IMHO it is harder and more resource hungry to implement a JIT/AOT using weak mode. With that said, if a JIT implementation is developed, will the story of the ZendOptimizer being a commercial solution be repeated, or would this JIT implementation be part of the core?
That's all that comes to mind for now. While many people don't care about performance, IMHO a programming language mainly targeted at the web should show some care in this department.
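For readers who want to try the calc() example from this message, the following is a small sketch that assumes PHP 7.0 or later, where scalar parameter types eventually shipped. It runs in the default (weak, coercive) mode; the variable names are illustrative and not from the thread.

```php
<?php
// Sketch only: default (weak/coercive) mode as shipped in PHP 7.0+.
function calc(int $val1, int $val2): int
{
    return $val1 + $val2;
}

// Both arguments are already ints, so no conversion is needed.
$fromInts = calc(1, 5);

// Numeric strings are coerced to int before the function body runs.
$fromStrings = calc('12', '15');

// A non-numeric string cannot be coerced and is rejected with a TypeError.
$rejected = false;
try {
    calc('abc', 5);
} catch (TypeError $e) {
    $rejected = true;
}

var_dump($fromInts, $fromStrings, $rejected);
```

If the calling file started with declare(strict_types=1), the call with '12' and '15' would also raise a TypeError, which is the case where the caller has to cast explicitly, as in the message's second calc() call.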
Re: [PHP-DEV] Coercive Scalar Type Hints RFC
On Sat, Feb 21, 2015 at 2:39 PM, Pavel Kouřil pajou...@gmail.com wrote:

On Sat, Feb 21, 2015 at 11:25 PM, Zeev Suraski z...@zend.com wrote:

There's a fundamental difference between the two RFCs that goes beyond one using a global INI setting and the other a per-file setting. The fundamental difference is that the endgame of the Dual Mode RFC is having two modes, and whatever syntax we add will be with us forever; in the Coercive STH RFC, the endgame is a single mode with no INI entries, which opens the door to changing the rest of PHP to be consistent with the same rule-set as well (implicit casts). The challenge with the Coercive STH RFC is figuring out the best transition strategy, but the endgame is superior IMHO.

Hello,

the two modes were something that I didn't like at all as a userland developer. It seems really scary that a decision to add 2 modes would mean that any PHP code could be written in either of these 2 ways, and that this would stick with PHP forever, because removing it again if it proved to be a bad feature would IMHO be really painful. So a single mode is infinitely better than 2 modes.

Also, personally, I would prefer version #1 or #2 for internal functions, but definitely without an INI switch. Not being able to change it on some hostings could make development during the transition period kind of painful.

Regards
Pavel Kouril

As a userland developer I will chime in and say that I like this RFC better than the dual mode one. In my opinion, having dual mode will put an unnecessary cognitive burden on me, especially when reading other people's code and libraries. The conversion rules in the current RFC are also different from what exists in PHP, but I can get used to them, and they are more in line with what should ideally be there (except contentious ones like bool).

I would certainly love to see the type coercion rules being unified and slightly tightened up over time, and this is a good first step towards that. I am not a language expert, but of the languages I have learnt and developed in, I can't think of any that allows dual-mode type coercion rules in such an explicit manner as the one proposed by Andrea/Anthony.

Thanks
Shashank
Re: [PHP-DEV] Unnecessary extensions ...
On 22 02 2015, at 11:31, Lester Caine les...@lsces.co.uk wrote:

With the discussion on adding http extension by default and not now having other key extensions in a normal build I'm looking at what I NEED and what I can get away without. On the current PHP7 test build I do not have mysqlnd installed as I don't use mysql, but I can't make the mysql section available in a second php-fpm instance because I can't add mysqlnd as a shared module.

Just to clarify that bit: enabling ext/http by default wouldn't mean that it's not possible to disable it. An extension being enabled by default does not imply that it cannot be disabled, as is the case for standard or spl.
Re: [PHP-DEV] PDO_DBLIB type handling
On 21/02/2015 23:12, Yasuo Ohgaki wrote:

Hi Adam,

On Sat, Feb 21, 2015 at 2:22 AM, Adam Baratz adam.bar...@gmail.com wrote:

This driver returns all column data as a string, regardless of how it's represented in the DB. I created a patch for my own use that syncs up the type handling with the behavior of the MSSQL extension. This seems like it would be of general use. Does anyone have any feedback before I put together an RFC? My main question would be whether people would rather have this be the default/only behavior, or whether it should be opted into via PDO::ATTR_STRINGIFY_FETCHES.

Databases return string data in order to return the data exactly as it is in the DB. The most obvious case is the NUMERIC data type, which can have arbitrary precision. We may also have 128 bit INT in the near future. So it should return strings by default, and PHP may optionally convert types into PHP native types. Not the other way around. IMHO.

The default behaviour of the mysql/pgsql drivers is to convert to the matching PHP type, if possible. That can be turned off via PDO::ATTR_STRINGIFY_FETCHES = true. If PDO_DBLIB doesn't behave like that, I'd say it's a bug that needs to be fixed, but the fix should possibly only be applied in a major/minor release due to the BC break.

Cheers
--
Matteo Beccati
Development Consulting - http://www.beccati.com/
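The attribute discussed here can be toggled per connection. The sketch below assumes the pdo_sqlite extension is available and uses an in-memory SQLite database as a stand-in, since PDO_DBLIB needs a live server; the column alias n is made up. With the attribute off, what comes back depends on the driver and PHP version, so no output is claimed for that case.

```php
<?php
// In-memory SQLite is only a stand-in for a real server (an assumption of this sketch).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Ask PDO to hand every fetched value back as a string.
$pdo->setAttribute(PDO::ATTR_STRINGIFY_FETCHES, true);
$asString = $pdo->query('SELECT 42 AS n')->fetch(PDO::FETCH_ASSOC);

// Let the driver decide; the resulting PHP type is driver and version dependent.
$pdo->setAttribute(PDO::ATTR_STRINGIFY_FETCHES, false);
$native = $pdo->query('SELECT 42 AS n')->fetch(PDO::FETCH_ASSOC);

var_dump($asString['n'], $native['n']);
```

The question raised in the thread is which of these two settings PDO_DBLIB should treat as its default.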
Re: [PHP-DEV] Coercive Scalar Type Hints RFC
On Sat Feb 21 2015 at 21:08:39 Anthony Ferrara ircmax...@gmail.com wrote:

Zeev, I won't nit-pick every point, but there are a few I think need to be clarified.

Proponents of Dynamic STH bring up consistency with the rest of the language, including some fundamental type-juggling aspects that have been key tenets of PHP since its inception. Strict STH, in their view, is inconsistent with these tenets.

Dynamic STH is apparently consistent with the rest of the language's treatment of scalar types. It's inconsistent with the rest of the language's treatment of parameters.

Not in the way Andrea proposed it, IIRC. She opted to go for consistency with internal functions. Either way, at the risk of being shot for talking about spiritual things, Dynamic STH is consistent with the dynamic spirit of PHP, even if there are some discrepancies between its rule-set and the implicit typing rules that govern expressions. Note that in this RFC I'm actually suggesting a possible way forward that will align *all* aspects of PHP, including implicit casting - and have them all governed by a single set of rules.

The point I was making up to there is that we currently have 2 type systems: user-land object and ZPP-scalar. So in any given function you have 2 type systems interacting. The current ZPP scalar type system is dynamic, and the user-land object one is static. With the proposal here, you'd unify user-land scalar to behave as ZPP-scalar. So you'd have two type systems in any given function: scalar and object (which behave differently). My proposal gives you the same two by default (scalar and object) and a strict switch to collapse them into a single, unified type system. This is even more apparent with the int-float acceptance, because we can mentally model Float as an object that extends Int. Then it makes perfect sense why you'd accept ints where you see floats, but not the opposite.
However there's an important point to make here: a lot of best practice has been pushing against the way PHP treats scalar types in certain cases. Specifically around == vs === and using strict comparison mode in in_array, etc.

I think you're correct on comparisons, but not so much on the rest. Dynamic use of scalars in expressions is still exceptionally common in PHP code. Even with comparisons, == is still very common - and you'd use == vs. === depending on what you need.

So while it appears consistent with the rest of PHP, it only does so if you ignore a large part of both the language and the way it's commonly used.

Let's agree to disagree.

That's one thing we can always agree on! :) I'm talking about the object system. I don't think you're disagreeing that it's static. Hence coercive scalars are consistent only if you look at 1/2 the type system. That was the point I was making there.

3. Just Do It but give users an option to not - This has the problems that E_DEPRECATED has, but it also gets us back to having fundamental code behavior controlled by an INI setting, which for a very long time this community has generally seen as a bad thing (especially for portability and code re-use).

I do too, and I was upfront about their cons, not just pros. And yet, they all bring us to a much better outcome within a relatively short period of time (in the lifetime of a language) than the Dual Mode will.

Let's agree to disagree that an ini setting will be better than a per-file setting. In fact, I personally think this is major enough of an issue that I will vote no for this reason alone (type behavior depending on an ini setting in any way, shape or form).

Further, the two sets can cause the same functions to behave differently depending on where they're being called

I think that's misleading. The functions will always behave the same. The difference is how you get data into the function. The behavior difference is in your code, not the end function.
I'll be happy to get a suggestion from you on how to reword that. Ultimately, from the layman user's point of view, she'd be calling foo() from one place and have it accept her arguments, and foo() from another place and have it reject the very same arguments.

Let me think on it and I will come up with something.

With strict mode, you'd have to embed a cast (smart or explicit) to convert to an integer at the point the data comes in.

First, I'm not aware of smart/safe casts being available or proposed at this point. Secondly, why at the point the data comes in? That would be ideal for static analyzers, but it's probably a lot more common that it will be done at the first point in time where it gets rejected.

By smart cast I was referring to a function which checked is_numeric(). Not a new language construct.

I have a hard time connecting to the 'power' approach. I think developers want their code to work, with
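The "smart cast" mentioned above is described only as a function that checks is_numeric(), so the following is a guess at its general shape rather than anything proposed in the thread. The name smart_int and the choice of InvalidArgumentException are assumptions made for this sketch.

```php
<?php
// Hypothetical helper: the name and the exception type are not from the thread.
function smart_int($value): int
{
    if (is_int($value)) {
        return $value;
    }
    // is_numeric() accepts ints, floats and numeric strings, and rejects the rest.
    if (!is_numeric($value)) {
        throw new InvalidArgumentException('Value is not numeric');
    }
    // Note: this cast still truncates fractional input such as "1.5".
    return (int) $value;
}

$ok = smart_int('42');

$refused = false;
try {
    smart_int('abc');
} catch (InvalidArgumentException $e) {
    $refused = true;
}

var_dump($ok, $refused);
```

A helper like this would be called at whichever point the data is converted, which is exactly the placement question the two sides disagree on.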
[PHP-DEV] Unnecessary extensions ...
With the discussion on adding the http extension by default, and other key extensions no longer being in a normal build, I'm looking at what I NEED and what I can get away without. On the current PHP7 test build I do not have mysqlnd installed as I don't use mysql, but I can't make the mysql section available in a second php-fpm instance because I can't add mysqlnd as a shared module.

Just what is the current state of what is 'required' and what is still optional? I will return to banging on about breaking up php-src so that one CAN get away with building individual modules, and I see no reason why those who want 'strict' can't have that as a pecl module to replace 'lax' operations.

--
Lester Caine - G8HFL
-
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
Re: [PHP-DEV] Unnecessary extensions ...
I see no reason why those who want 'strict' can't have that as a pecl module to replace 'lax' operations.

Simple: the most robust implementation is inferior to internal support. Making a call to this:

    function (int $some, double $other) { }

behave as if Zend is strict is quite easy. What is difficult is:

    class Foo {
        public function bar(int $some) { }
    }

    class Qux extends Foo {
        public function bar(double $some) { }
    }

Enforcing our current rules is so hard you might as well call it impossible. Even if you managed it, it would not be robust, in any reasonable opinion.

TL;DR: because internal support is much, much, much better, in every possible way.

Cheers
Joe

On Sun, Feb 22, 2015 at 10:31 AM, Lester Caine les...@lsces.co.uk wrote:

[snip]
Re: [PHP-DEV] Unnecessary extensions ...
Apologies for the terribly formatted communication there.

Cheers
Joe

On Sun, Feb 22, 2015 at 10:38 AM, Michael Wallner m...@php.net wrote:

[snip]
RE: [PHP-DEV] Coercive Scalar Type Hints RFC
From: Etienne Kneuss [mailto:col...@php.net]
Sent: Sunday, February 22, 2015 3:00 PM
To: Anthony Ferrara; Zeev Suraski
Cc: PHP internals
Subject: Re: [PHP-DEV] Coercive Scalar Type Hints RFC

The question is rather: at what weight should we take (potential/future) external tools into account when developing language features?

Agreed! My answer - Static Analyzers need to be designed for Languages, rather than Languages being designed for Static Analyzers.

I will send additional thoughts on Static Analysis in a separate, off-list email, to put this argument to rest as both Anthony and I agreed to.

Zeev
AW: [PHP-DEV] Coercive Scalar Type Hints RFC
Hi Zeev,

-Original Message-
From: Zeev Suraski [mailto:z...@zend.com]
Sent: Saturday, February 21, 2015 18:22
To: PHP internals
Subject: [PHP-DEV] Coercive Scalar Type Hints RFC

All, I've been working with François and several other people from internals@ and the PHP community to create a single-mode Scalar Type Hints proposal. I think the RFC is a bit premature and could benefit from a bit more time, but given the time pressure, as well as the fact that a not fully compatible subset of that RFC was published and has people already discussing it, it made the most sense to publish it sooner rather than later. The RFC is available here: wiki.php.net/rfc/coercive_sth

Comments welcome!

Zeev

First of all, thank you and all the others working on this RFC, and also the people working on the other RFC related to scalar type hints. It is good that PHP will get scalar type hints eventually. Although I think the strict mode as proposed in the v0.5 RFC is nice as such, I prefer this RFC simply because it does not introduce different modes. I genuinely believe that different modes would be very harmful for PHP.

Yet, this RFC is not perfect either. IMO PHP is not ready for scalar type hints, at least not for PHP 7.0, and we should instead focus on clearing the way for the introduction of scalar type hints in PHP 7.x, and hence introduce in PHP 7.0 all the BC breaks that are necessary so that scalar type hints can be added to PHP 7.x later on. I am being provocative on purpose, of course. But rightly so, because I think that in all the debate about strict/weak scalar type hints we lost focus on what really matters in language growth, namely its maturation.

For instance, Pierre and others complained that string -> bool and float -> bool are accepted by this RFC. While I agree that it is a bad idea to apply implicit conversions to such input (I would not even allow int -> bool, to be honest), it makes total sense for PHP to behave like this at the moment.
I would even claim that null, array, object, literally everything should be accepted as well, since the explicit (bool) accepts everything and some implicit casts, such as the one in an if statement, also accept everything. From questions like these:

Boolean STH (bool): this is by far too weak. How strings could be considered as valid, how? true Boolean true? I suppose then false will be boolean false? What is the boolean value of float 0.5? At the very least only integer should be accepted, 0 false, anything =1 true

I get the impression that even internals are starting to get confused about the conversion rules PHP has. Implicitly converting something to bool should be exactly the same as an explicit conversion (and is thus straightforward to verify: http://3v4l.org/nVgbG ). We should start to eliminate the different behaviour of implicit/explicit casts [1], in order to have consistent and predictable/obvious behaviour in the long run. Or in other words, and that is what I meant above, PHP's type system needs to mature.

While I can understand that it looks beneficial to have all kinds of relief for the beginner, it is rather harmful in the long run. PHP has so many inconsistencies and requires a user to be aware of so many edge cases that I think bugs are introduced more frequently than necessary. We already have different conversion mechanisms in PHP, and I guess the reason why https://wiki.php.net/rfc/safe_cast was declined is that most people did not want to see yet another group of conversion rules.

There were people claiming that PHP follows the philosophy that a user does not need to know anything about scalar types. PHP will deal with it via type juggling. A function/operator requires an int? Just pass a scalar and PHP will convert it automatically to int via type juggling. That is long gone (and probably never was there), because the user has had to know exactly what type, or rather what values, can be passed; otherwise bugs are inevitable.
Consider the following:

    "a" % 1;
    fmod("a", 0.5);

Kind of logical that % accepts any kind of scalar whereas fmod does not, right? I do not want to exaggerate too much on this, but I think you get my position: PHP needs to get rid of these inconsistencies rather than adding yet another obstacle which impedes reaching consistency. Once scalar type hints are in place they should follow the conversion rules which we want to have in PHP in the long run; otherwise the BC impact of changing them would be too big and we would at least need to wait till PHP 8, if not even PHP 9.

So what does that mean for scalar types? IMO it means that agreeing on a new set of conversion rules for the long run is way more important than adding scalar type hints to PHP 7.0. PHP should strive to have one consistent set of conversion rules which apply in all places where implicit or explicit conversion are
Re: [PHP-DEV] Coercive Scalar Type Hints RFC
On Sun, Feb 22, 2015 at 2:09 PM, Robert Stoll p...@tutteli.ch wrote:

I see the migration plan roughly as follows:

PHP 7.0:
- reserve keywords: bool, int, float including alternatives
- deprecate alternative type names such as boolean, integer etc.
- introduce new conversion functions which reflect the current behaviour of (bool), (int) etc.
  -- as mentioned above, they could be named oldSchoolBoolConversion etc.
  -- encourage users to use these functions instead of (bool), (int) etc., since (bool) etc. will change with PHP 8.0. Also mention that these functions should only be used if the weakness is really required; otherwise use the new conversion functions from below
- introduce new conversion functions which reflect the newly defined conversion rule set (which shall be the only one encouraged in the future). Those functions shall trigger an E_RECOVERABLE_ERROR
  -- encourage users to use these functions instead of (bool), (int) and oldSchoolBoolConversion etc. (unless the weakness is really required, then use oldSchoolBoolConversion)
- update the docs in order to reflect the new encouraged way. Also mention that:
  - (bool), (int) etc. will change their behaviour in PHP 8.0
  - internal functions will use the new conversion rules in PHP 8.0, if they do not already (for instance, strstr will no longer accept a scalar as third parameter in the case where we do not support implicit casts to bool)
  - operators will use the new conversion rules in PHP 8.0, if they do not already
  - (control structures will use the new conversion rules in PHP 8.0, if they do not already) => Maybe this is too strict for most of you and goes against the spirit of PHP (I suppose some of you will say that - fair enough, I guess you are right). In this case, I would at least use the term loose comparison as mentioned here: http://php.net/manual/en/types.comparisons.php#types.comparisions-loose instead of using the term conversion; then it is compatible with the changes introduced in PHP 8.0

PHP 7.1: necessary bug fixes for changes introduced with PHP 7.0

PHP 7.x: deprecate even more if required

PHP 8:
- introduce scalar type hints which reflect the conversion rules as defined (adding strict type hints as well is possible of course; whether with an ini-setting, a declare statement or individually with a modifier, something like strict int for a single parameter, strict function for all parameters incl. return type, or strict class for every type defined in the class, is up for discussion)
- exchange the behaviour of (bool), (int) etc. - use the new conversion rules instead
- change internal functions which do not yet obey the new conversion rules
- change the operators which do not yet obey the new conversion rules (for instance, + would also emit an E_RECOVERABLE_ERROR for "a" + 1)
- (change the control structures so that they obey the new conversion rules as well) => as mentioned above, probably too strict for PHP

Back to this RFC. I think this RFC goes in the right direction with the specified conversion rules. The only thing to get rid of is the implicit conversions to bool from string, float and int, IMO. Moreover, I like that the RFC already has different steps for adding the new behaviour. Yet, I think it should slow down a little bit, as shown. I think we need more time to come up with a very good strategic solution.

Hello,

Am I understanding correctly that you are suggesting changes to type casting? This seems like a bad idea. Explicit and implicit conversions are really different things. Generally, implicit conversions are OK only when no data is lost, and explicit conversions (casts) are used when you realize some information can get lost and you still want to proceed with the conversion.

Having only one type of conversion is IMHO weird. Also, I'm not a fan of having to wait a few more years for scalar type hints. :(

Regards
Pavel Kouril
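The distinction drawn in this reply, between an explicit cast that knowingly discards data and an implicit conversion that should only succeed when nothing is lost, can be sketched as follows. The helper converts_losslessly is hypothetical and is neither part of PHP nor of either RFC; it uses a simple round-trip check as one possible definition of "no data lost".

```php
<?php
// Hypothetical helper, not part of PHP or of either RFC.
function converts_losslessly(string $value): bool
{
    // Round-trip check: the string must survive string -> int -> string unchanged.
    return (string) (int) $value === $value;
}

// Explicit cast: the caller accepts that the trailing "abc" is thrown away.
$explicit = (int) '12abc';

// An implicit conversion would, on this view, only be acceptable in the first case.
$clean = converts_losslessly('12');
$dirty = converts_losslessly('12abc');

var_dump($explicit, $clean, $dirty);
```

Under this reading, the explicit cast and the implicit rule are allowed to give different answers for '12abc', which is why merging them into a single conversion rule set is questioned here.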
Re: [PHP-DEV] Coercive Scalar Type Hints RFC
1. Does weak mode could provide the required rules to implement a JIT with a sane level of memory and CPU usage? There is no objective answer to the question while it has the clause with a sane level of The assertion in the RFC that says there is no difference between strict and weak types, in the context of a JIT/AOT compiler, is wrong. function add(int $l, int $r) { return $l + $r; } The first instruction in the interpreted code is not ZEND_ADD, first, parameters must be received from the stack. If that's a strict function, then un-stacking parameters is relatively easy, if it's a dynamic function then you have to generate code that is considerably more complicated. This is an inescapable difference, of the the kind that definitely does have a negative impact on implementation complexity, runtime, and maintainability. To me, it only makes sense to compile strict code AOT or JIT; If you want dynamic behaviour, we have an extremely mature platform for that. 2. ... With that said, if a JIT implementation is developed will the story of the ZendOptimizer being a commercial solution will be repeated or would this JIT implementation would be part of the core? There should hopefully be no need to complicate the core with the implementation, the number of people that are capable of maintaining Zend is low enough already, the number of people able to maintain something as new (for us) and complex as a JIT/AOT engine is even less, I fear. I think it's likely that Anthony and I, and Dmitry want different things for a JIT/AOT engine. I think Anthony and I are preferring an engine that requires minimal inference because type information is present (or implicit), while Dmitry probably favours the kind that can infer at runtime, the dynamic kind, like Zend is today. They are a world apart, I think, I'll be happy to be proven wrong about that. 
I like to think that even if Dmitry wrote it all by himself, it would be opensource from the start, in fact I don't think that will happen. I'm hoping we'll all work on the same solution together. Cheers Joe On Sun, Feb 22, 2015 at 2:24 PM, Jefferson Gonzalez jgm...@gmail.com wrote: On 02/22/2015 09:00 AM, Etienne Kneuss wrote: There have been several attempts: for JS: http://users-cs.au.dk/simonhj/tajs2009.pdf or similar techniques applied to PHP, quite outdated though: https://github.com/colder/phantm You are right that the lack of static information about types is (one of the) a main issue. Recovering the types has typically a huge performance cost, or is unreliable But seriously, time is getting wasted on this argument; it's actually a no-brainer: more static information helps tools that rely on static information. Yes. Absolutely. 100%. The question is rather: at what weight should we take (potential/future) external tools into account when developping language features? Previous on the list nodejs JIT engine was mentioned as a working example of a JIT without the language having any sort of type information. While this is true I think it should also be considered the amount of checks and resources required for the generated machine code to achieve this. On this tests you can see that in most situations the javascript v8 engine used on nodejs uses much more memory than that of current PHP (compare it to C also) (benchmarksgame.alioth.debian.org/u64/compare.php?lang=v8lang2=php) Yes, it is faster, but it consumes much more CPU and RAM in most situations, and I'm sure that it is related to the dynamic nature of the language. A JIT or AOT machine code generator IMHO will never have a decent use of system resources without some sort of strong/strict typed rules, somebody explain if thats not the case. 
As I see it, an example, if the JIT generated C++ code to then generate the machine code:

    function calc(int $val1, int $val2) : int {return $val1 + $val2;}

In weak mode I see the generated code would be something like this:

    Variant* calc(Variant val1, Variant val2) {
        if(val1.isInt() && val2.isInt())
            return new Variant(val1.toInt() + val2.toInt());
        else if(val1.isFloat() && val2.isFloat())
            return new Variant(val1.toFloat() + val2.toFloat());
        else
            throw new RuntimeError();
    }

while in strict mode the generated code could be:

    int calc(int val1, int val2) { return val1 + val2; }

So in this scenario it is clear that strict mode performance and memory usage would be better. Conversion code would be required only in some cases, for example:

    calc(1, 5)                     // No need for casting
    calc((int) "12", (int) "15")   // Needs casting depending on how the parser deals with it

If my example is right it means strict would be better to achieve good performance rather than weak, which is almost the same situation we have now with zvals. Also I think it is wrong to say that casting will always take place in strict mode. So I have some questions floating in my mind for the coercive RFC.
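For reference, this is how the calc() example behaves in PHP itself, assuming scalar type declarations as they later shipped in PHP 7.0 and a file without declare(strict_types=1) (the default, coercive mode). It is only a behavioural sketch and says nothing about what a JIT would actually generate; the helper describe() is made up for illustration.

```php
<?php
// Default (coercive) mode: this file has no declare(strict_types=1).
function calc(int $val1, int $val2): int
{
    // By the time the body runs, both parameters are real ints,
    // whatever the caller passed in.
    return $val1 + $val2;
}

// Hypothetical helper: reports the type seen inside a typed function.
function describe(int $val): string
{
    return gettype($val);
}

var_dump(calc(1, 5));       // int(6)
var_dump(calc("12", "15")); // int(27): numeric strings are coerced at the call boundary
var_dump(describe("12"));   // string(7) "integer"
```

Inside the function the value is an int in either mode; the modes differ only in whether the second call is accepted or rejected at the boundary.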
AW: [PHP-DEV] Coercive Scalar Type Hints RFC
Hi Pavel, -Ursprüngliche Nachricht- Von: Pavel Kouřil [mailto:pajou...@gmail.com] Gesendet: Sonntag, 22. Februar 2015 15:54 An: Robert Stoll Cc: Zeev Suraski; PHP internals Betreff: Re: [PHP-DEV] Coercive Scalar Type Hints RFC On Sun, Feb 22, 2015 at 2:09 PM, Robert Stoll p...@tutteli.ch wrote: I see the migration plan roughly as follows: PHP 7.0: - reserve keywords: bool, int, float including alternatives - deprecate alternative type names such as boolean, integer etc. - introduce new conversion functions which reflect the current behaviour of (bool), (int) etc. -- as mentioned above, they could be named oldSchoolBoolConversion etc. -- Encourage users to use this function instead of (bool), (int) etc since (bool) etc. will change with PHP 8.0. Also mention, that this function should only be used if the weakness is really required otherwise use the new conversion functions from below - introduce new conversion functions which reflect the new defined conversion rule set (which shall be the only one encouraged in the future) Those functions shall trigger an E_RECOVERABLE_ERROR -- encourage users to use this functions instead of (bool), (int) and oldSchoolBoolConversion etc. (unless the weakness is really required, then use oldSchoolBoolConversion) - update the docs in order to reflect the new encouraged way. Also mention that: - (bool), (int) etc. will change their behaviour in PHP 8.0 - internal functions will use the new conversion rules if not already done this way in PHP 8.0 (for instance, strstr will no longer accept a scalar as third parameter in the case where we do not support implicit casts to bool) - operators will use the new conversion rules if not already done this way in PHP 8.0 - (control structures will use the new conversion rules if not already done this way in PHP 8.0) =Maybe this is too strict for most of you and goes against the spirit of PHP (I suppose some of you will say that - fair enough, I guess you are right). 
In this case, I would at least use the term loose comparison as mentioned here: http://php.net/manual/en/types.comparisons.php#types.comparisions-loos e instead of using the term conversion, then it is compatible with the changes introduced in PHP 8.0 PHP 7.1: necessary bug-fixes introduced with PHP 7.0 PHP 7.x: deprecate even more if required PHP 8: - introduce scalar type hints which reflect the conversion rules as defined (adding strict type hints as well is possible of course, whether with an ini-setting, a declare statement or individually with a modifier something like strict int for a single parameter or strict function for all parameters incl. return type or strict class for every type defined in the class is up to discussion) - exchange the behaviour of (bool), (int) etc. - use the new conversion rules instead - change internal functions which do not yet obey to the new conversion rules - change the operators which do not yet obey to the new conversion rules (for instance, + would also emit an E_RECOVERABLE_ERROR for a + 1) - (change the control structures in order that they obey the new conversion rules as well) = as mentioned above, probably too strict for PHP Back to this RFC. think this RFC goes in the right direction with the specified conversion rules. Only thing to get rid of are the implicit conversions to bool from string, float and int IMO. Moreover, I like that the RFC already has different steps for adding the new behaviour. Yet, I think it should slow down a little bit as shown. I think we need more time to come up with a very good strategic solution. Hello, Am I understanding correctly that you are suggesting changes to type casting? This seems like a bad idea. Explicit and implicit conversions are something really different. Generally, implicit conversions are OK only when no data is lost and explicit conversions (casts) are used when you realize some information can get lost and you still want to proceed with the conversion. 
Having only one type of conversion is IMHO weird. Yes, I am suggesting to make conversions behave the same regardless if it is implicit or explicit. The only difference between the two should be that one is stated explicitly by the user where the other is applied implicitly. Other programming languages behave like this and are more predictable for users as well as developers because one does not need to learn two sets of conversion rules. Also, I'm not a fan of having to wait for scalar type hints for few more years. :( Regards Pavel Kouril
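One of the "new conversion functions" from the migration plan could look roughly like the sketch below. The name to_int_checked() and the exact set of accepted inputs are assumptions made for illustration; they are not taken from the RFC or from the plan. The plan also speaks of E_RECOVERABLE_ERROR, whereas this sketch throws an exception so that the failure is easy to observe.

```php
<?php
// Hypothetical helper; neither the name nor the rule set comes from the RFC.
function to_int_checked($value): int
{
    if (is_int($value)) {
        return $value;
    }
    if (is_string($value)) {
        // FILTER_VALIDATE_INT rejects strings such as "abc" or "1.5".
        $result = filter_var($value, FILTER_VALIDATE_INT);
        if ($result !== false) {
            return $result;
        }
    }
    // Everything else (floats, bools, arrays, ...) is refused in this sketch.
    throw new InvalidArgumentException('Value cannot be converted to int');
}

var_dump(to_int_checked("12")); // int(12)
var_dump((int) "abc");          // int(0): the existing cast never fails
```

The contrast in the last two lines is the point of the proposal: the existing cast quietly produces a value, while the checked conversion refuses input it cannot represent.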
Re: [PHP-DEV] Coercive Scalar Type Hints RFC
On 02/22/2015 11:06 AM, Joe Watkins wrote: This is an inescapable difference, of the the kind that definitely does have a negative impact on implementation complexity, runtime, and maintainability. To me, it only makes sense to compile strict code AOT or JIT; If you want dynamic behaviour, we have an extremely mature platform for that. So basically weak mode coupled with a JIT will be almost the same thing we have today, the only difference is the opcache would be replaced with machine code (for a bit more of performance), but the same logic and code used on the zend engine will also be used on the generated code of the JIT (more bloat). On the other hand a strict type mode would allow the generation of machine code that is much cleaner and similar in respect of C to machine code translation, meaning it would be more efficient and less resource hungry to the point that functions code generated by the AOT or JIT would be more efficient than those functions provided by the zend engine, which does lots of type checking/parsing (less bloat). There should hopefully be no need to complicate the core with the implementation, the number of people that are capable of maintaining Zend is low enough already, the number of people able to maintain something as new (for us) and complex as a JIT/AOT engine is even less, I fear. And thats why I asked about the commercial stuff, because, like things are looking, from a technical perspective the strict mode opens the doors for an easier implementation of AOT or JIT while a weak mode would only make it harder for others in the community to work in such things, which again rises the question, does this whole idea of favoring a weak model by the minority serve as an impediment/complication for others so they (those who favor weak) can force a commercial solution? I think it's likely that Anthony and I, and Dmitry want different things for a JIT/AOT engine. 
I think Anthony and I are preferring an engine that requires minimal inference because type information is present (or implicit), while Dmitry probably favours the kind that can infer at runtime, the dynamic kind, like Zend is today. They are a world apart, I think, I'll be happy to be proven wrong about that. I like to think that even if Dmitry wrote it all by himself, it would be opensource from the start, in fact I don't think that will happen. I'm hoping we'll all work on the same solution together. And it would be ideal to have the most capable people to develop this solution to work in a single team from a community point of view. IMHO a dual weak/strict mode is the best way of getting people to work together in a way that benefits the community. Otherwise, a single handed man working on a solution can serve as a justification to commercialize something that is being currently offered by others (HHVM). -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
[PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints RFC)
-Original Message- From: Jefferson Gonzalez [mailto:jgm...@gmail.com] Sent: Sunday, February 22, 2015 4:25 PM To: Etienne Kneuss; Anthony Ferrara; Zeev Suraski Cc: PHP internals Subject: Re: [PHP-DEV] Coercive Scalar Type Hints RFC Jefferson, Please note that Anthony, the lead of the Dual Mode RFC, said this earlier on this thread, referring to the claim that Strict STH can improve JIT/AOT compilation: A statement which I said on the side, and I said should not impact RFC or voting in any way. And is in no part in my RFC at all. Please also see: marc.info/?l=php-internalsm=142439750614527w=2 So while Anthony and I don't agree on whether there are performance gains to be had from Strict STH, both of us agree that it's not at a level that should influence our decision regarding the RFCs on the table. I wholeheartedly agree with that stance, which is why I also listed the apparently extremely widespread misconception (IMHO) that Strict STH can meaningfully help JIT/AOT in my RFC. Despite that, as your email suggests, there are still (presumably a lot) of people out there that assume that there are, in fact, substantial gains to be had from JIT/AOT if we introduce Strict STH. I'm going to take another stab at explaining why that's not the case. A JIT or AOT machine code generator IMHO will never have a decent use of system resources without some sort of strong/strict typed rules, somebody explain if thats not the case. It kind of is and kind of isn't. There's consensus, I think, that if PHP was completely strongly typed - i.e., all variables need to be declared and typed ahead of time, cannot change types, etc. - we'd clearly be able to create a lot of optimizations in AOT that we can't do today. That the part that 'is the case'. But nobody is suggesting that we do that. 
The discussion on the table is very, very narrow: -- Can the code generated for a strict type hint can somehow be optimized significantly better than the code generated for a dynamic/coercive type hint. And here, I (as well as Dmitry, who actually wrote a JIT compiler for PHP) claim that it isn't the case. To be fair, there's no consensus on this point. Let me attempt, again, to explain why we don't believe there are any gains associated with Strict STH, be them with the regular engine, JIT or AOT. Consider the following code snippet: function strict_foo($x) { if (!is_int($x)) { trigger_error(); } .inner_code. } function very_lax_foo($x) { $x = (int) $x; .inner_code. } function test_strict() { .outer_code. strict_foo($x); } function test_lax() { .outer_code. very_lax_foo($x); } test_strict(); test_lax(); strict_foo() implements a pretty much identical check to the one that a Strict integer STH would perform. very_lax_foo() implements an explicit type conversion to int, that can pretty much never fail - which is significantly more lax than what is proposed for weak types in the Dual Mode RFC, and even more so compared to the Coercive STH RFC. .inner_code. is identical between the two foo() functions, and .outer_code. is identical between the two tester functions. The claim that strict types can be more efficiently optimized than more lax types, suggests it should be possible to optimize the code flow for test_strict()/strict_foo() significantly better than for very_lax_foo() using JIT/AOT. Let's dive in. Beginning with the easy part, that's been mentioned countless times - it's clear that it's just as easy (or hard) to optimize the .inner_code. block in the two implementations of foo(). It can bank on the exact same assumptions - $x would be an integer. So we can optimize the two function bases to exactly the same level. For example, if we're sure that $x inside the function never changes type - it can be optimized down to a C-level long. 
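The two functions from the snippet can be made runnable as follows. This is a sketch only: the body (doubling $x) is a made-up stand-in for .inner_code., and an exception replaces the bare trigger_error() call so that the failure is easy to observe.

```php
<?php
function strict_foo($x)
{
    if (!is_int($x)) {
        // Stand-in for trigger_error() in the original snippet.
        throw new InvalidArgumentException('strict_foo() expects an int');
    }
    // .inner_code. (hypothetical): $x is known to be an int from here on.
    return $x * 2;
}

function very_lax_foo($x)
{
    $x = (int) $x; // explicit cast of whatever scalar was passed
    // .inner_code. (hypothetical): identical to strict_foo(), $x is an int here too.
    return $x * 2;
}

var_dump(strict_foo(21));     // int(42)
var_dump(very_lax_foo("21")); // int(42)
```

Both bodies start from the same guarantee about $x; the functions differ only in how they treat a caller that did not supply an int.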
That's oversimplifying things a bit, but the important thing here is that it can be easily proven that the two function bodies can be optimized to the exact same level, for better or worse. The only difference between them is how they handle non-integer inputs; The strict implementation errors out if it gets a non-integer typed value, while the lax version happily accepts everything. But that's a functionality difference, not a performance one (i.e., if you want the value to be accepted in the strict case, you'd manually conduct the conversion before the call is made, or sooner - resulting in roughly the same behavior and performance). Now the slightly trickier part - the .outer_code. block. What can we say about the type of $x, without knowing what code is in there? Not a whole lot. We know that if $x isn't going to be of type int at the end of this block, test_strict() is going to fail, but that doesn't mean $x will truly be an int. The fact I want to be young and healthy doesn't mean I'm going to magically become young and healthy :) Let's dive further. Assuming we don't have strong variable type declarations (i.e., int
Re: [PHP-DEV] Coercive Scalar Type Hints RFC
On Sun, Feb 22, 2015 at 7:30 PM, Robert Stoll p...@tutteli.ch wrote: Hi Pavel, Yes, I am suggesting to make conversions behave the same regardless if it is implicit or explicit. The only difference between the two should be that one is stated explicitly by the user where the other is applied implicitly. Other programming languages behave like this and are more predictable for users as well as developers because one does not need to learn two sets of conversion rules. Actually this is not true. Other languages have differences between explicit conversions (aka casting) and implicit conversions as well. C# is the language I use the most after PHP, so I'll bring that one up (see https://msdn.microsoft.com/en-us/library/ms173105.aspx), but I believe other languages (probably Java?) act the same way. Regards Pavel Kouril
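The same split between explicit and implicit conversion can be shown in PHP, assuming scalar type declarations as they later shipped in PHP 7.0 and a file without declare(strict_types=1). The function takes_int() is a made-up name for illustration.

```php
<?php
function takes_int(int $n): int
{
    return $n;
}

// Explicit cast: never fails, silently yields 0 for a non-numeric string.
$explicit = (int) "abc";

// Implicit conversion at a typed parameter: the same input is rejected.
try {
    takes_int("abc");
    $implicit = 'accepted';
} catch (TypeError $e) {
    $implicit = 'TypeError';
}

var_dump($explicit); // int(0)
var_dump($implicit); // string(9) "TypeError"
```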
Re: [PHP-DEV] new json, push generated file?
Hi Jakub, On Tue, February 17, 2015 17:53, Anatol Belski wrote: Hi Jakub, On Sun, February 15, 2015 21:18, Jakub Zelenka wrote: On Wed, Feb 11, 2015 at 11:56 AM, Jakub Zelenka bu...@php.net wrote: I would like to push the the bison tab files shortly as the majority of people in this thread (including me) are for having them in the repo. The only thing that I would like is to have a specific version in the repo to prevent big diffs for small changes in the source files. For now I would like to have there re2c 0.13.6 (thanks for regenerating it back ;) ) and bison 2.7.1 gen files. I will update Readme at some point as well and add there that info. Hey just a quick update. I bumped the version in the repo for re2c 0.13.7.5 (latest - no changes in generated states ) and bison to 3.0.4 . I updated Readme as well. I have pushed the bison files in http://git.php.net/?p=php-src.git;a=commit;h=911f7b10b1f4c9529bc01580d29 8a 93a5cd6bbd2 . There is an explanation why the C preprocessor guard macro names are file system dependent. It means why there is YY_PHP_JSON_YY_HOME_JAKUB_PROG_PHP_MASTER_EXT_JSON_JSON_PARSER_TAB_H_IN CL UDED . It's due to bison algorithm for creating such name. As I noted the only solution that works for me is using different yacc.c skeleton. I have done it in jsond in https://github.com/bukka/php-jsond/commit/583619d7962fa57bf97dcdac4147d 8b 599a42672 where I have optional bison generation which means that I can stick with one bison version and use custom skeleton file. This is however not possible in the core where we allow range of bison versions which doesn't play well with skeletons that are version specific. thanks for pushing. I'm using re2c 0.13.7.5 now for master as well. With bison, it'll be however hard to move from 2.4.1 on Windows (and it's not that critical), but file generated with it seems to work. Anyway, nothing prevents you or anyone to regenerate it and push over, just in case. 
Most important is that fixes land in the *.re/*.y sources. And one knows who to ping for verifications :) FYI I had to downgrade re2c to 0.13.6 as the latest randomly crashes. Regards Anatol
Re: [PHP-DEV] Re: [RFC] Exceptions in the engine
On Thu, Feb 19, 2015 at 4:13 PM, Dmitry Stogov dmi...@zend.com wrote:

Hi Nikita, I refactored your implementation: https://github.com/php/php-src/pull/1095 I introduced a class hierarchy to minimize the effect on existing code. catch (Exception $e) won't catch the new types of exceptions.

BaseException (abstract)
 +- EngineException
 +- ParseException
 +- Exception
     +- ErrorException
     +- all other exceptions

In case of an uncaught ParseException or EngineException PHP writes the compatible error message. I made it mainly to keep thousands of PHPT tests unchanged. If you like we may introduce an ini option that will lead to emitting of Uncaught Exception ... with backtrace instead. The internal API was changed a bit, e.g. EngineExceptions are thrown from zend_error() if the error_code has the E_EXCEPTION bit set. So to change some error into an exception we should now change zend_error(E_ERROR, ...) into zend_error(E_EXCEPTION|E_ERROR, ...). All tests pass. I'm not sure about sapi/cli/tests/bug43177.phpt, because it started to fail in master as well. We may need to replace E_RECOVERABLE_ERROR with E_ERROR and fix the corresponding tests. Despite this, I think the patch is good enough to be merged into master. We may decide to convert more fatal errors later, but that shouldn't prevent us from putting the RFC to a vote. Thoughts?

I've updated some minor points in the RFC to be consistent with the new patch. The BaseException based hierarchy will be a separate vote [1]. If there are no further concerns I'd start voting on this RFC tomorrow. One point wrt the patch: Do you think it would be hard to use a clean shutdown on uncaught exceptions? This would make sure we don't forget to free anything when throwing exceptions. Nikita

[1]: https://wiki.php.net/rfc/engine_exceptions_for_php7#hierarchy
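The claim that catch (Exception $e) does not catch the new engine-level exceptions can be checked with the sketch below. Note the assumption: it uses the names that PHP 7.0 eventually shipped with (Error under the Throwable interface) rather than the BaseException/EngineException names discussed in this thread, because those are what a current PHP will run. The helper classify() and the undefined function name are made up for illustration.

```php
<?php
// Hypothetical helper: reports which catch branch handled the failure.
function classify(callable $code): string
{
    try {
        $code();
        return 'no error';
    } catch (Exception $e) {
        // Ordinary userland exceptions end up here.
        return 'Exception';
    } catch (Error $e) {
        // Engine-level failures (formerly fatal errors) end up here.
        return 'Error';
    }
}

$engineError = classify(function () {
    undefined_function_for_demo(); // deliberately not defined anywhere
});

$ordinary = classify(function () {
    throw new RuntimeException('ordinary exception');
});

var_dump($engineError); // string(5) "Error"
var_dump($ordinary);    // string(9) "Exception"
```

Existing catch (Exception $e) blocks therefore keep their old meaning, which is the backward-compatibility goal stated above.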
Re: [PHP-DEV] [RFC] Coercive Scalar Type Hints
On 02/21/2015 09:10 PM, Pádraic Brady wrote:

On the RFC rules themselves, a few comments:
1. Happy to see leading/trailing spaces excluded.
2. Rules don't make mention of leading zeroes, e.g. 0003
3. 1E07 might be construed as overly generous assuming we are excluding stringy integers like hex, oct and binary
4. I'm assuming the stringy ints are rejected?
5. Is .32 coerced to float or only 0.32? Merely for clarification.
6. Boolean coercion from other types... Not entirely sure myself. Completely off the cuff: =0: false, 0:true, floats and strings need not apply.
7. In string to float, only capital E or also small e?
8. I'll never stop calling them stringy ints.

In my mind, certainly a better proposition than those introducing dual mode.

Agree with the comments above, except that I am entirely sure that boolean coercion from other types should not be allowed. Cheers, Ole Markus
RE: [PHP-DEV] Coercive Scalar Type Hints RFC
-Original Message- From: Jefferson González [mailto:jgm...@gmail.com] Sent: Sunday, February 22, 2015 11:59 PM To: Stanislav Malyshev Cc: Etienne Kneuss; Anthony Ferrara; Zeev Suraski; PHP internals Subject: Re: [PHP-DEV] Coercive Scalar Type Hints RFC 2015-02-22 16:38 GMT-04:00 Stanislav Malyshev smalys...@gmail.com: Yes, that's not the case, at least nobody ever showed that to be the case. In general, as JS example (among many others) shows, it is completely possible to have JIT without strict typing. In particular, coercive typing provides as much information as strict typing about variable type after passing the function boundary - the only difference is what happens _at_ the boundary and how the engine behaves when the types do not match, but I do not see where big performance difference would come from - the only possibility for different behavior would be if your app requires constant type juggling (checks are needed in strict mode anyway, since variables are not typed) - but in this case in strict mode you'd have to do manual type conversions, which aren't in any way faster than engine type conversions. So the case for JIT being somehow better with strict typing so far remains a myth without any substantiation. Well, strict on a JIT environment may haven't been proved, but it surely has been proved on statically compiled languages like C. Jefferson, Strict type hints will not make PHP even remotely similar or otherwise comparable to a statically typed language like C. It will take totally different facilities being added to PHP, which are not being considered. Currently, a JIT in the most cases can't compete to the bare performance of a static compiled language, both in resources and CPU, so how is non strict better in that sense? Nobody is saying it's better in that sense. Nobody is also suggesting that a statically compiled language like C isn't a lot easier to optimize than a dynamic language with JIT. 
What we are saying is that Strict type hints are of no help making JIT any better or easier, compared to Dynamic/Coercive hints. I provided what I believe to be detailed proof for that in my email titled 'JIT'. I thought those checks could be optional if generated at call time, that's why I gave these 2 examples: calc(1, 5) - no need for type checking or conversion, do a direct call calc("12", "15") - calc(strToInt(value1), strToInt(value2)) calc($var1, $var2) - needs type checking and conversion if required That's the wrong comparison to make. We should be comparing the same calls with the two systems, rather than one call in one system and a different one in a different system. Taking your example: calc(1, 5) - no need for type checking or conversion *in either Strict type hints or Coercive/Dynamic type hints*, do a direct call. Identical performance. calc("1", "5") - fails in strict type hints, succeeds in Dynamic/Coercive type hints (cannot be optimized) Again, this illustrates that the difference between the two is that of *functionality*, not performance. If you're saying that calc("1", "5") is slower than calc(1, 5) when using dynamic type hints - then that would be correct, but also pretty meaningless from a performance standpoint, if what you have are string values. And if you have integer values, well then, we've already established there's no difference between the two type hinting systems. Typically, you obtain the data you need in a type that's not under your control. You're getting data from the browser, database, filesystem, web service or some algorithm - the type of the values you get is determined by the API functions you're using to get the data from. So what are your options if what you have in your hand is "1" and "5", because that's how the APIs provided the data to you, as opposed to 1 and 5?
Before they can be added, they need to be converted to integer format, whether this is done by explicitly casting them (likely outcome in case of a strict type hint), casting them through a safe coercive STH, or letting PHP's + operator implementation do it for you. The data needs to be converted somewhere. So what you are saying is that there is no way of determining the type of a variable (only at runtime), as Zeev explained on the previous messages, since variables aren't typed, checks are mandatory either way. There are ways to infer typing information both during compile time and also create 'educated guess' as to what the data type is going to be based on runtime information, but: 1. No, it's absolutely not possible to always determine the type of variables during compile-time, you'd often (perhaps more often than not) only know the data type with absolute certainty only at runtime 2. Whatever you CAN infer, you can infer equally regardless of whether a piece of code uses strict type hints or dynamic ones. Please
Re: [PHP-DEV] [VOTE] Expectations
Hi! I do not see much gain today to improve them while I do not see why we should not. It does not hurt. The gain is simple - today, assertions have costs so people that are performance-conscious (rightly or wrongly) use them less than they could. We can make them cost-less in production, while preserving their advantages in the test environments (also makes code easier to follow btw if asserts show which invariants are being enforced). Unit tests only provide for assertions in test code, but cost-less asserts can help you ensure that code works the way you intended all the way through, without paying for that with performance. -- Stas Malyshev smalys...@gmail.com
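A small sketch of the kind of in-code invariant meant here, assuming the assert() behaviour that shipped with PHP 7: with zend.assertions set to 1 the check runs, and with -1 (the usual production setting) no code is generated for it at all. The function mean() is a made-up example.

```php
<?php
function mean(array $values): float
{
    // Invariant documented and enforced where it matters; this line is
    // compiled out entirely when zend.assertions = -1.
    assert(count($values) > 0, 'mean() needs at least one value');

    return array_sum($values) / count($values);
}

var_dump(mean([1, 2])); // float(1.5)
```

What happens on a failed assertion (warning or exception) depends on the PHP version and ini settings, so the sketch only exercises the passing case.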
[PHP-DEV] Re: JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints RFC)
2015-02-22 14:37 GMT-04:00 Zeev Suraski z...@zend.com: I think it's fair to say that Dmitry - who led the PHPNG effort - cares *a lot* about performance. I'm sure you'd agree. I tend to think that I also care a lot about performance, and so does Xinchen. We all spent substantial parts of our lives working to speed PHP up. It's not whether we think performance is important - it is (although we do believe we should build optimizers for languages, more so than languages for optimizers). It's just that we all fail to see how the flavor of STH can have any meaningful influence on performance. Thanks for the feedback! Zeev Thanks for the insightful response! Now it would be nice to also see the opinions of the other camp.
Re: [PHP-DEV] Coercive Scalar Type Hints RFC
Hi! A JIT or AOT machine code generator IMHO will never have a decent use of system resources without some sort of strong/strict typed rules, somebody explain if thats not the case. Yes, that's not the case, at least nobody ever showed that to be the case. In general, as JS example (among many others) shows, it is completely possible to have JIT without strict typing. In particular, coercive typing provides as much information as strict typing about variable type after passing the function boundary - the only difference is what happens _at_ the boundary and how the engine behaves when the types do not match, but I do not see where big performance difference would come from - the only possibility for different behavior would be if your app requires constant type juggling (checks are needed in strict mode anyway, since variables are not typed) - but in this case in strict mode you'd have to do manual type conversions, which aren't in any way faster than engine type conversions. So the case for JIT being somehow better with strict typing so far remains a myth without any substantiation. while on strict mode the generated code could be: int calc(int val1, int val2) { return val1 + val2; } No, it can't be (at least it can't be the _entire_ code of this function), since the user still can pass non-int into this function - nothing introducing strict typing in functions, as it is proposed now, prevents it. What strict typing does is to ensure the error in this case, but to generate the error you still need the checks! BTW, your weak mode code is wrong too - there's no need to generate Variants if you typed the variables as int. You know once coercion is done they are ints. At least in the model that was now proposed. If my example is right it means strict would be better to achieve good Unfortunately, your example is not right. to another level of integration with PHP and performance. IMHO is harder and more resource hungry to implement a JIT/AOT using weak mode. 
With Please provide a substantiation for this opinion. So far what was provided was not correct. That's all that comes to mind now, and while many people don't care for performance, IMHO a programming language mainly targeted for the web should have some caring on this department. Please do not strawman. A lot of people here care about performance, and you have not yet made the case that strict typing has any benefit on performance, so implying that opponents of strict typing somehow don't care about performance while you champion it does not match the real situation. -- Stas Malyshev smalys...@gmail.com -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
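The point that strict mode still needs a check at the function boundary can be illustrated with a minimal sketch. It assumes PHP 7.0 or later, where scalar type declarations eventually shipped with a per-file `declare(strict_types=1)` switch; that is the released behaviour, not necessarily the exact proposal debated here. `calc()` mirrors the name in the quoted example, and `tryCalc()` is a hypothetical helper added only for illustration.

```php
<?php
declare(strict_types=1);

// Same shape as the calc() discussed above, written in PHP.
function calc(int $val1, int $val2): int
{
    return $val1 + $val2;
}

// Hypothetical helper: its parameters are untyped, so any value can
// reach the call to calc(), and the engine has to check at that boundary.
function tryCalc($a, $b): string
{
    try {
        return (string) calc($a, $b);
    } catch (TypeError $e) {
        return 'TypeError';
    }
}

echo tryCalc(1, 5), "\n";   // prints 6
echo tryCalc('1', 5), "\n"; // prints TypeError
```

The string argument is rejected at run time, which is only possible because a type check is still executed on entry to `calc()`.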
Re: [PHP-DEV] [RFC][DISCUSSION] Context Sensitive lexer
Hi!

> RFC: https://wiki.php.net/rfc/context_sensitive_lexer
> TL;DR commit: https://github.com/marcioAlmada/php-src/commit/c01014f9
> PR: https://github.com/php/php-src/pull/1054

I like the idea. But we need to examine the cases carefully so we don't block some future routes, especially with regard to things such as type names, which we wanted to reserve. I.e. method name resolution is probably clear, since method names appear after -> or ::, but for class names the context may be much more varied.

-- Stas Malyshev smalys...@gmail.com
Re: [PHP-DEV] [RFC] [VOTE] Parameter skipping RFC
Hi!

> I see a LOT of no votes against this RFC but can't find the thread outlining the reasoning for such resistance.

I think my attempts to explain that this was a step towards named params, not a contradiction with them, failed: people read it, say "we understood it", and then say "no, we don't want it, we want named params instead!". Well, let's hope somebody (not me) writes a patch for named params instead. In the meantime, 7.0 will have neither.

-- Stas Malyshev smalys...@gmail.com
Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints RFC)
Zeev, On Sun, Feb 22, 2015 at 1:37 PM, Zeev Suraski z...@zend.com wrote: -Original Message- From: Jefferson Gonzalez [mailto:jgm...@gmail.com] Sent: Sunday, February 22, 2015 4:25 PM To: Etienne Kneuss; Anthony Ferrara; Zeev Suraski Cc: PHP internals Subject: Re: [PHP-DEV] Coercive Scalar Type Hints RFC Jefferson, Please note that Anthony, the lead of the Dual Mode RFC, said this earlier on this thread, referring to the claim that Strict STH can improve JIT/AOT compilation: A statement which I said on the side, and I said should not impact RFC or voting in any way. And is in no part in my RFC at all. Please also see: marc.info/?l=php-internalsm=142439750614527w=2 So while Anthony and I don't agree on whether there are performance gains to be had from Strict STH, both of us agree that it's not at a level that should influence our decision regarding the RFCs on the table. I wholeheartedly agree with that stance, which is why I also listed the apparently extremely widespread misconception (IMHO) that Strict STH can meaningfully help JIT/AOT in my RFC. So you agree we shouldn't discuss it, then you go ahead and discuss it. I guess that shouldn't surprise me. Despite that, as your email suggests, there are still (presumably a lot) of people out there that assume that there are, in fact, substantial gains to be had from JIT/AOT if we introduce Strict STH. I'm going to take another stab at explaining why that's not the case. A JIT or AOT machine code generator IMHO will never have a decent use of system resources without some sort of strong/strict typed rules, somebody explain if thats not the case. It kind of is and kind of isn't. There's consensus, I think, that if PHP was completely strongly typed - i.e., all variables need to be declared and typed ahead of time, cannot change types, etc. - we'd clearly be able to create a lot of optimizations in AOT that we can't do today. That the part that 'is the case'. But nobody is suggesting that we do that. 
The discussion on the table is very, very narrow: There's no consensus there. As I've pointed out to you more than once, plenty of other languages manage this through type inference or reconstruction. Many (like Go) only requiring explicit types on the parameters, not on variables. Heck, I did **exactly** that in Recki-CT. So please don't dismiss something that's being done **IN THE PHP WORLD** just because you don't think it's possible. -- Can the code generated for a strict type hint can somehow be optimized significantly better than the code generated for a dynamic/coercive type hint. And here, I (as well as Dmitry, who actually wrote a JIT compiler for PHP) claim that it isn't the case. To be fair, there's no consensus on this point. And me, who wrote an AOT compiler that does **exactly** this, claim that it is the case. Along with other people who've worked on compilers. See the reply in a private thread you started that shows the tradeoffs, specifically in generated code efficiency and memory usage. You can keep ignoring the arguments, but PLEASE don't keep spreading them as fact. Also: if Dmitry worked on a JIT compiler, why isn't that code out in the open? And if the code isn't out, why isn't the knowledge open? Are we just supposed to rely on a single person's experience (especially when more than one other person's shared experiences differ)? Let me attempt, again, to explain why we don't believe there are any gains associated with Strict STH, be them with the regular engine, JIT or AOT. Consider the following code snippet: function strict_foo($x) { if (!is_int($x)) { trigger_error(); } .inner_code. } function very_lax_foo($x) { $x = (int) $x; .inner_code. } function test_strict() { .outer_code. strict_foo($x); } function test_lax() { .outer_code. very_lax_foo($x); } test_strict(); test_lax(); strict_foo() implements a pretty much identical check to the one that a Strict integer STH would perform. 
very_lax_foo() implements an explicit type conversion to int, that can pretty much never fail - which is significantly more lax than what is proposed for weak types in the Dual Mode RFC, and even more so compared to the Coercive STH RFC. .inner_code. is identical between the two foo() functions, and .outer_code. is identical between the two tester functions. The claim that strict types can be more efficiently optimized than more lax types, suggests it should be possible to optimize the code flow for test_strict()/strict_foo() significantly better than for very_lax_foo() using JIT/AOT. Assuming that they were split in the files appropriately, you are missing **THE** key thing we've been trying to tell you this entire time. Looking at a single function, yes there is no difference if it's strict or not (well, you can save some time on the next function call inside, but it's small). However we're not talking about
Re: [PHP-DEV] Coercive Scalar Type Hints RFC
On Sun, Feb 22, 2015 at 9:42 PM, Robert Stoll p...@tutteli.ch wrote: Probably it is a philosophical question how to look at it. IMO the only difference in C# (as well as in Java) lies in the way the conversions are applied. Implicit conversions are applied automatically by the compiler where explicit conversions are applied by the user. The difference lies in the fact that C# is statically typed and implicit conversions are only applied when it is certainly safe to apply one. However, Implicit conversions in C# behave the same as explicit conversion since implicit conversion which fail simply do not exist (there is no implicit conversion from double to int for instance). That is the way I look at it. You probably look at it from another point of view and would claim an implicit conversion from double to int in C# exists but just fails all the time = ergo implicit and explicit are different (that is my interpretation of your statement above). In this sense I would agree. But even when you think in this terms then you have to admit, they are fundamentally different in the way that implicit conversion which are different than explicit conversion always fail, in all cases - pretty much as if they do not exist. There are no cases, neither in C# nor in Java which I am aware of, where an implicit cast succeeds in certain cases but not in all and an explicit conversion succeeds in at least more cases than the implicit conversion. Hence, something like a should also not work in an explicit conversion in PHP IMO if it is not supported by the implicit conversion (otherwise strict mode is useless btw.) Try out the following C# code: dynamic d1 = 1.0; int d = d1; You will get the error Cannot implicitly convert type `double` to `int` at runtime. We see a fundamental difference between C# and PHP here. PHP is dynamically typed an relies on values rather than types (in contrast to C#). 
Therefore, the above code emits a runtime error even though the data could be converted to int without precision loss. This shall be different in PHP according to this RFC and I think that is perfectly fine. Yet, even more important it seems to me that implicit/explicit conversions behave the same way. At first it might seem strange to have just one conversion rule set in PHP since PHP is not known to be a language which shines due to its consistency... OK, I am serious again. If you think about it from the following point of view: A user writes an explicit conversion in order to state explicitly that some value will be converted (this is something which will be necessary in a strict mode). Why should this explicit conversion be different from the implicit one? There should not be any difference between explicit knowledge and implicit one. That is my opinion. If you really do not care about data loss and just want to squeeze a float/string into an int no matter what the value really is then you can use the @ in conjunction with ?? and provide the desired default value to fall back on if the conversion fails. If conversions like a to int really matters that much to the users of PHP then we could keep the oldSchoolIntConversion function (as propose in my first email) even in PHP 10 (I would probably get rid of them at some point). Cheers, Robert Well, I look at it this way (in a simplified manner). Hopefully this will make you understand my point of view more. 
- Implicit conversions work only when you are sure you won't lose stuff - Explicit conversions are for forcing (casting) variable to become another type, and when you are explicitely as user calling it, you are aware you can lose values Sure, the literal meaning in C# and PHP differs a little bit (because of static and dynamic typed language differences and stuff), but the *intent* is IMHO the same; implicit conversions can happen in the background safely, while for dangerous conversions, you have to cast by hand. And I see use cases for both of these types of conversions. Also, you are assuming that there will be a strict mode; I sincerely hope there won't. Ssince introduction of 2 modes, I was always saying that there should be only one mode - I don't really care whether it would be strict or weak, but just only one. Regards Pavel Kouril -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
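The distinction drawn above between safe implicit conversions and forced explicit casts can be sketched in PHP. This assumes PHP 7.0 or later in its default (weak) mode and shows the behaviour that was eventually released, which is not necessarily the rule set of either RFC discussed in this thread. `takesInt()` and `implicitly()` are hypothetical names used only for illustration.

```php
<?php
// No strict_types declaration: this file uses the default (weak) mode.

function takesInt(int $x): int
{
    return $x;
}

// Hypothetical helper that reports what happens to a value passed implicitly.
function implicitly($value): string
{
    try {
        return var_export(takesInt($value), true);
    } catch (TypeError $e) {
        return 'TypeError';
    }
}

echo var_export((int) 'abc', true), "\n"; // explicit cast: prints 0
echo implicitly('12'), "\n";              // implicit conversion: prints 12
echo implicitly('abc'), "\n";             // implicit conversion: prints TypeError
```

The explicit cast accepts a lossy conversion the caller asked for, while the implicit conversion at the parameter refuses the non-numeric string.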
Re: [PHP-DEV] [RFC] [FINAL DISCUSSION] Script only include/require
Hi!

> I think this will be the final discussion before vote. This RFC is to make PHP stronger against script inclusion attacks just like other languages.
> https://wiki.php.net/rfc/script_only_include

I still think this RFC takes a wrong road for the following reasons:

1. Having any code in your app that allows running include on user-controlled files (I'm not talking about filtered cases but about user data controlling the path) is insecure and cannot be made secure. It should just never be done. Trying to find workarounds for this is like safe_mode: a good idea in theory that leads to worse security in practice.

2. The default configuration would break tons of PHP scripts with extensions other than .php (a very frequent case). The BC break potential of this is very big, as it modifies core functionality.

3. Prohibiting phar uploads would also be a BC break, but more importantly, there are probably still ways to work around this by using phar files with an extension other than .phar and then asking to include files within that phar file. As long as the eventual path ends in .php, your code would allow it.

Also, the claim that move_uploaded_file() is obsolete is not based on anything as far as I can see. Why is it obsolete?

-- Stas Malyshev smalys...@gmail.com
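One common way to follow the advice in point 1 is to keep user data out of the path entirely and use it only as a lookup key. The sketch below assumes PHP 7.1 or later; the page names, the `pages/` directory, the `PAGES` constant and `resolvePage()` are all hypothetical and are not taken from the RFC or its patch.

```php
<?php
// Hypothetical page names and file names, used only for illustration.
const PAGES = [
    'home'  => 'home.php',
    'about' => 'about.php',
];

// Returns the script path for a known page, or null for anything else.
// The user-supplied value is only ever used as an array key.
function resolvePage(string $requested): ?string
{
    if (!array_key_exists($requested, PAGES)) {
        return null;
    }
    return __DIR__ . '/pages/' . PAGES[$requested];
}

$path = resolvePage('../../etc/passwd');
echo $path === null ? "rejected\n" : "would include $path\n"; // prints rejected
```

Because the included path is always built from a fixed table, no extension check on user input is needed in this design.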
Re: [PHP-DEV] Coercive Scalar Type Hints RFC
-Original Message- From: Pavel Kouřil [mailto:pajou...@gmail.com] Sent: Sunday, 22 February 2015 22:18 To: Robert Stoll Cc: Zeev Suraski; PHP internals Subject: Re: [PHP-DEV] Coercive Scalar Type Hints RFC On Sun, Feb 22, 2015 at 9:42 PM, Robert Stoll p...@tutteli.ch wrote: Probably it is a philosophical question how to look at it. IMO the only difference in C# (as well as in Java) lies in the way the conversions are applied. Implicit conversions are applied automatically by the compiler where explicit conversions are applied by the user. The difference lies in the fact that C# is statically typed and implicit conversions are only applied when it is certainly safe to apply one. However, Implicit conversions in C# behave the same as explicit conversion since implicit conversion which fail simply do not exist (there is no implicit conversion from double to int for instance). That is the way I look at it. You probably look at it from another point of view and would claim an implicit conversion from double to int in C# exists but just fails all the time = ergo implicit and explicit are different (that is my interpretation of your statement above). In this sense I would agree. But even when you think in this terms then you have to admit, they are fundamentally different in the way that implicit conversion which are different than explicit conversion always fail, in all cases - pretty much as if they do not exist. There are no cases, neither in C# nor in Java which I am aware of, where an implicit cast succeeds in certain cases but not in all and an explicit conversion succeeds in at least more cases than the implicit conversion. Hence, something like a should also not work in an explicit conversion in PHP IMO if it is not supported by the implicit conversion (otherwise strict mode is useless btw.) Try out the following C# code: dynamic d1 = 1.0; int d = d1; You will get the error Cannot implicitly convert type `double` to `int` at runtime.
We see a fundamental difference between C# and PHP here. PHP is dynamically typed an relies on values rather than types (in contrast to C#). Therefore, the above code emits a runtime error even though the data could be converted to int without precision loss. This shall be different in PHP according to this RFC and I think that is perfectly fine. Yet, even more important it seems to me that implicit/explicit conversions behave the same way. At first it might seem strange to have just one conversion rule set in PHP since PHP is not known to be a language which shines due to its consistency... OK, I am serious again. If you think about it from the following point of view: A user writes an explicit conversion in order to state explicitly that some value will be converted (this is something which will be necessary in a strict mode). Why should this explicit conversion be different from the implicit one? There should not be any difference between explicit knowledge and implicit one. That is my opinion. If you really do not care about data loss and just want to squeeze a float/string into an int no matter what the value really is then you can use the @ in conjunction with ?? and provide the desired default value to fall back on if the conversion fails. If conversions like a to int really matters that much to the users of PHP then we could keep the oldSchoolIntConversion function (as propose in my first email) even in PHP 10 (I would probably get rid of them at some point). Cheers, Robert Well, I look at it this way (in a simplified manner). Hopefully this will make you understand my point of view more. - Implicit conversions work only when you are sure you won't lose stuff - Explicit conversions are for forcing (casting) variable to become another type, and when you are explicitely as user calling it, you are aware you can lose values I see. I see and think you are not alone with this opinion. 
I give you another example and hope you reconsider your position (up to you what position you take afterwards of course). Consider the following in C#:

class A{}
class B : A{}
class C : A{}
A a = new B();
B b = a; // will fail, needs a conversion
C c1 = a; // will fail, needs a conversion
C c2 = (C) a; // will fail at runtime

And now imagine C# would not be based on types but on values. Then the following would be perfectly legal as well:

B b = a; // is fine since a is of type B
C c1 = a; // will fail since a is not of type C
C c2 = (C) a; // still fails since a is not of type C

Or to illustrate it differently. Imagine you have a shop and your main currency is $. However, you accept € as well as long as they are banknotes. In this case the customer can insert the banknotes in a currency exchange machine at the till. Now imagine the following four use cases:

1. A customer buys something with $ - everything is fine
RE: [PHP-DEV] Coercive Scalar Type Hints RFC
Hi Robert,

> So what does that mean for scalar types? IMO it means that way more important than adding scalar type hints to PHP 7.0 is to agree on a new set of conversion rules for the long run. PHP should strive to have one consistent set of conversion rules which apply in all places where implicit or explicit conversion are used.

That's exactly what I mean. I think people should keep in mind, when talking about enabling/disabling a given conversion, that the implicit scope is every explicit or implicit conversion implemented in PHP.

In an ideal world, we would proceed in reverse order. We wouldn't start considering modifying the ZPP ruleset before having aligned every implicit/explicit conversion existing in PHP on a single ruleset. Unfortunately, if we want to keep a chance of getting STH into 7.0, we cannot do that. So we will probably evaluate potential BC breaks on ZPP ruleset modifications only, meaning we'll make the decision without a good evaluation of the BC breaks introduced by aligning other PHP conversions on the newly proposed ruleset. So we'll need to extrapolate from ZPP-only results.

Regards

François
RE: [PHP-DEV] Coercive Scalar Type Hints RFC
Hi Stas, It seems the actual problem is that we have too many compiler / code analysis experts in the community ;) (don't get me wrong, I am not saying that about you, I just admire your patience explaining the same again and again to people who never read one line from PHP core source). Regards François -Original Message- From: Stanislav Malyshev [mailto:smalys...@gmail.com] Sent: Sunday, 22 February 2015 21:39 To: Jefferson Gonzalez; Etienne Kneuss; Anthony Ferrara; Zeev Suraski Cc: PHP internals Subject: Re: [PHP-DEV] Coercive Scalar Type Hints RFC Hi! A JIT or AOT machine code generator IMHO will never have a decent use of system resources without some sort of strong/strict typed rules, somebody explain if thats not the case. Yes, that's not the case, at least nobody ever showed that to be the case. In general, as JS example (among many others) shows, it is completely possible to have JIT without strict typing. In particular, coercive typing provides as much information as strict typing about variable type after passing the function boundary - the only difference is what happens _at_ the boundary and how the engine behaves when the types do not match, but I do not see where big performance difference would come from - the only possibility for different behavior would be if your app requires constant type juggling (checks are needed in strict mode anyway, since variables are not typed) - but in this case in strict mode you'd have to do manual type conversions, which aren't in any way faster than engine type conversions. So the case for JIT being somehow better with strict typing so far remains a myth without any substantiation. while on strict mode the generated code could be: int calc(int val1, int val2) { return val1 + val2; } No, it can't be (at least it can't be the _entire_ code of this function), since the user still can pass non-int into this function - nothing introducing strict typing in functions, as it is proposed now, prevents it.
What strict typing does is to ensure the error in this case, but to generate the error you still need the checks! BTW, your weak mode code is wrong too - there's no need to generate Variants if you typed the variables as int. You know once coercion is done they are ints. At least in the model that was now proposed. If my example is right it means strict would be better to achieve good Unfortunately, your example is not right. to another level of integration with PHP and performance. IMHO is harder and more resource hungry to implement a JIT/AOT using weak mode. With Please provide a substantiation for this opinion. So far what was provided was not correct. Thats all that comes to mind now, and while many people doesn't care for performance, IMHO a programming language mainly targeted for the web should have some caring on this department. Please do not strawman. A lot of people here care about performance, and you have not yet made case that strict typing has any benefit on performance, so implying that opponents of strict typing somehow don't care about performance while you champion it does not match the real situation. -- Stas Malyshev smalys...@gmail.com -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] Coercive Scalar Type Hints RFC
Hi! Well, strict on a JIT environment may haven't been proved, but it surely has been proved on statically compiled languages like C. Currently, a I understand that using the same concept of typing in both cases can be confusing, but that's pretty much where the similarity ends. Strict typing in C has very little to do with what is proposed as strict typing in PHP, and so far nobody is considering making PHP strictly typed in the way C is (let alone more strict languages than C are). So bringing C into the discussion is misleading. JIT in the most cases can't compete to the bare performance of a static compiled language, both in resources and CPU, so how is non strict better in that sense? Dynamic typing is not better in that sense. That's my whole point - from the JIT perspective, they are the same, so the claim that strict typing, as proposed, provides performance benefits, is incorrect. previous message, at runtime it consumes more memory and cpu and this is mostly due to all the type checking it requires. In that sense if the As I already mentioned, current strict proposal requires type checking too. The only one that doesn't is complete strict typing at compile-time - which nobody is proposing. strict proposal could improve that situation it would be a benefit. You keep repeating that, but that claim does not become more true because it is repeated more times. It still is as unsubstantiated and lacking base as it was the first time it was introduced. Please provide some proof (logical or experimental) as to why it must happen (yes, this includes the if too since it is pointless to bring it as a possibility if we do not have any way for this possibility to be realized). I thought those checks could be optional if generated at call time, thats why I gave these 2 examples: I don't see how they can be optional with strict typing. 
> calc(1, 5) - no need for type checking or conversion, do a direct call
> calc("12", "15") - calc(strToInt(value1), strToInt(value2))
> calc($var1, $var2) - needs type checking and conversion if required

The same can be said about dynamic typing, with exactly the same words. The only difference is what happens *after* checking, but this is only relevant if the code relies on conversions, in which case in the strict case it just won't work. That is hardly a performance improvement worth considering.

> I was thinking on the sense that before calling a function, type checking could take place and conversion if required, but may be thats even more complicated...

This can be done in the dynamic case too, provided the type information is present (e.g. constants). No current proposal does this, though, AFAIK.

> Static typed languages - Direct conversion to machine code
> Dynamic typed languages with JIT - Intermediate representation - Checks - Conversion to machine code with checks.

We're not talking about making PHP a statically typed language, are we? So this advantage, while without any doubt real, does not apply to PHP.

-- Stas Malyshev smalys...@gmail.com
RE: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints RFC)
Anthony,

I started writing this long response, but instead, I want to localize the whole discussion to the one true root difference. Your position on that difference is the basis for your entire case, and my position on this argument is the base for my entire case. There we go:

> And note that this can only work with strict types since you can do the necessary type inference and reconstruction (both forward from a function call, and backwards before it).

Please do explain how strict type hints help you do inference that you couldn't do with dynamic type hints. Ultimately, your whole argument hinges on that, but you mention it in parentheses almost as an afterthought. I claim the opposite - you cannot infer ANYTHING from Strict STH that you cannot infer from Coercive STH. Consequently, everything you've shown, down to the C-optimized version of strict_foo() can be implemented in the exact same way for very_lax_foo(). Being able to optimize away the value containers is not unique to languages with strict type hints. It's done in JavaScript JIT engines, and it was done in our JIT POC.

> With lax (weak, coercive) types, the ability to do type reconstruction drops significantly. Because you can no longer do any backwards inference from other function calls. Which means you can't prove if a type is stable in most cases (won't change). Therefore, you'll always have to allocate a ZVAL, and then the optimizations I showed above would stop working.

Again, using the scientific method I'm familiar with, that would be a theory, and it would require proof. So far I haven't seen any proof, and I believe I pretty much proved the opposite with my example.

Zeev
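The forward half of this argument, that the type inside the function body is known under coercive hints too, can be sketched as follows. This assumes PHP 7.0 or later in its default (weak) mode, i.e. the released behaviour rather than either RFC text. The name `very_lax_foo()` is borrowed from the thread, but here the conversion is expressed as a parameter type instead of the `(int)` cast used in the original example. The sketch says nothing about backward inference at the call site, which is the part the two sides disagree on.

```php
<?php
// No strict_types declaration, so calls from this file are coercive.

// Name borrowed from the example discussed in this thread; here the
// conversion is expressed as a parameter type instead of a cast.
function very_lax_foo(int $x): string
{
    // Whatever the caller passed, $x is an int from here on.
    return gettype($x);
}

echo very_lax_foo(5), "\n";   // prints integer
echo very_lax_foo('5'), "\n"; // prints integer
```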
[PHP-DEV] [RFC] Script only include/require
Hi all,

I wrote a patch and adjusted the RFC.

https://wiki.php.net/rfc/script_only_include
https://github.com/php/php-src/pull/

Where the filename extension is checked is subject to change. At first, I thought implementing this as PHP code was good, but I've changed my mind. It seems better to do it in Zend code. Opinions are appreciated.

This RFC aims to make PHP as secure as other languages with respect to script inclusion attacks.

Note: File inclusion is out of scope for this RFC.

INI Changes: - php_script - zend.script_extensions - Allow all files: * - NULL or

Open Issues:
- Error type - Is it OK to raise E_ERROR/E_RECOVERABLE_ERROR in zend_language_scanner.c?
- Vote type - 50%+1 or 2/3

If anyone would like to vote no on this RFC, I would like to know the reason and try to address the issue you have.

Thank you.

Regards,

-- Yasuo Ohgaki yohg...@ohgaki.net
Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints RFC)
On 22/02/15 18:37, Zeev Suraski wrote:

> Variant* calc(Variant val1, Variant val2) {
>     if(val1.isInt() ) { // type checking
>     if (!val1.coerceToInt()) { throw new RuntimeError() }
>     If (!val2.coerceToInt()) { throw new RuntimeError(); }
>     // function body begins here
>     int result = Variant(val1.intValue() + val2.intValue());
>     return result;
> }

A more practical example would be to replace coerceToInt() with inRange() which includes an int check/'coerce' as part of the range check, and produce a number of errors based on the result.

-- Lester Caine - G8HFL
-
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
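A rough idea of what such an inRange() helper might look like in userland PHP is sketched below. It assumes PHP 7.1 or later; `inRange()` and `describe()` are hypothetical functions, and the choice of exception classes and messages is an assumption made for illustration, not something proposed in the thread.

```php
<?php
// Hypothetical helper: validates that $value is an integer (or an
// integer-like string) and that it lies within [$min, $max].
function inRange($value, int $min, int $max): int
{
    $int = filter_var($value, FILTER_VALIDATE_INT);
    if ($int === false) {
        throw new InvalidArgumentException('not an integer');
    }
    if ($int < $min) {
        throw new RangeException('below minimum');
    }
    if ($int > $max) {
        throw new RangeException('above maximum');
    }
    return $int;
}

// Hypothetical wrapper that turns the outcome into a printable string.
function describe($value): string
{
    try {
        return (string) inRange($value, 1, 10);
    } catch (InvalidArgumentException | RangeException $e) {
        return $e->getMessage();
    }
}

echo describe('7'), "\n";   // prints 7
echo describe('abc'), "\n"; // prints not an integer
echo describe(42), "\n";    // prints above maximum
```

Each failure mode gets its own error, which is the "number of errors based on the result" idea from the message above.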
[PHP-DEV] Coercive Scalar Type Hints RFC
Adding in a thread that was started in private, but absolutely is worth sharing with the group: -- Forwarded message -- From: Etienne Kneuss col...@php.net Date: Sun, Feb 22, 2015 at 8:42 AM Subject: Re: [PHP-DEV] Coercive Scalar Type Hints RFC To: Zeev Suraski z...@zend.com Cc: Anthony Ferrara ircmax...@gmail.com, Dmitry Stogov dmi...@zend.com On Sun Feb 22 2015 at 14:23:58 Zeev Suraski z...@zend.com wrote: There have been several attempts: for JS: http://users-cs.au.dk/simonhj/tajs2009.pdf or similar techniques applied to PHP, quite outdated though: https://github.com/colder/phantm Looks like WebKit's type inference is doing some pretty good job at analyzing code, although I'm not sure how much of it is static vs. dynamic. My guess is that a lot of it is static: twitter.com/kangax/status/558974257724940288 My guess would be that it's almost entirely dynamic, or probabilistic (e.g. this nice recent work done at ETH: http://www.jsnice.org/). I think you underestimate the difficulty of statically recovering precise types from no-annotations without runtime witnesses ;) You don't want webkit anlysing the JS for 10 minutes until it renders the page. It is much more profitable to JIT these. You are right that the lack of static information about types is (one of the) a main issue. Recovering the types has typically a huge performance cost, or is unreliable We're not really talking about performance issue here, as static analysis is a separate activity that is unrelated to runtime performance. What I meant was: it is a performance issue for the static analyzer, not PHP itself. But seriously, time is getting wasted on this argument; it's actually a no-brainer: more static information helps tools that rely on static information. Yes. Absolutely. 100%. 
There's still disagreement between us on whether the different behavior of Strict STH constitutes additional static information or not, as it doesn't give you any extra information on the value being fed to the function, and it doesn't give you any extra information on what the function will receive. It only gives you information about how the function would behave if it gets a wrongly-typed value. 1) for forward analyses (which are the most common for these applications): it gives you precious information from the beginning of the function and forward. You can consider it similarly to a cast: You don't necessarily know what the value coming in is, but you know which type you are having from that point forward. 2) backward analyses could piggy-back the type constraints from the functions (strict or no strict) and check that they are met when constructing the value fed to the function. Having worked several years on static analysis tools for languages such as PHP, I can guarantee you that this information would help a lot. However, the other dynamic feature of PHP would still make analyses slow/unreliable/imprecise. Let's not imagine that this is the only thing missing for PHP to be static-analysis-wonderland, far from it. But my the bottom line is exactly the bottom line you ended with, and what I answered you on-list - how much weight should Static Analysis improvements have on our decision to introduce new language features? My answer is not that much, if they have downsides. Static Analyzers should be designed for languages and not vice versa. I fully agree in general that the flow should be this way. But it remains a bonus if a certain feature, as a plus, would help external tools. I believe it is worth mentionning.. Thanks, Zeev -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints RFC)
Hi!

> You can tell because you know the function foo expects an integer. So you can infer that $x will have to have the type integer due to the future requirement. Which means the expression $something / 2 must also be an integer. We know that's not the case, so we can raise an error here.

OK, so your claim is that a compiler with strict typing can detect some situations which a dynamic one cannot, and reject some of the code. Without going too much into detail, I agree with this; it is an obvious difference between strict and dynamic. However, this is obviously not a performance advantage: you are comparing running code with non-running code, and your model just accepts less code. This works if the rejected code was wrong, and doesn't work if it was not. But I thought we were talking about running code.

> At that point the developer has the choice to explicitly cast or put in a floor() or one of a number of options.

That's exactly what I claim would be the defect of the strict model - people would start putting in excessive casts, ensuring there would be cases where information is lost. For example, assume we knew $something is even:

    function bar(int $something): int {
        assert($something % 2 == 0);
        $x = $something / 2;
        return foo($x);
    }

Now everything is fine (ignoring the typing for a second), right? We're dealing with integers, /2 always divides evenly, all is great. Now we introduce strictness, so we'd need to say something like:

    function bar(int $something): int {
        assert($something % 2 == 0);
        $x = $something / 2;
        return foo((int)$x);
    }

Now assume somebody messed up a routine code-reformatting merge and the code somehow ended up like:

    function bar(int $something): int {
        $x = $something / 2;
        return foo((int)$x);
    }

Do you see what the problem is? Now we have lost the check for $something being even, but we would never know about it, since the type system forced us to insert (int) (which we didn't need) and thus disabled the control that would have caught the bug of $something not being even (which we did need).

But the more important question is - with (int) the coercive model can use this information too, so what's the difference from the strict model on that code? There seems to be none.

> Without strict typing this code is always stable, but you still need to generate full type assertions in a compiled version of foo() and use ZVALs for $x, hence reducing the effect of the optimization significantly.

Wait, you said this code is invalid so no code will be generated. Did you mean the code after introducing (int)? Then strict has no advantage anymore, as we can derive the info from (int) anyway. Otherwise, I can't see how you can avoid generating type checks in foo() unless the only place it can ever be called from is bar() - but I don't see how you can ensure that in PHP, and if you could, I don't see why the weak model could not reach the same conclusions on the same code.

So far the only advantage I've seen seems to be that your compiler would reject code that looks suspicious to it and thus force the programmer to coerce the variables into the types manually - by (int) or floor() - something that the coercive model would do for you automatically. Once coerced, the same code would have the same type info (and thus the same potential optimizations) in both models. I don't think it is a gain in general, and I don't think forcing people to modify their code qualifies as a JIT performance gain.

--
Stas Malyshev
smalys...@gmail.com
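The scenario described in this message can be tried out as a minimal sketch. It assumes the strict mode as it later shipped in PHP 7 (`declare(strict_types=1)`, enforced at run time rather than by the compile-time analysis debated in this thread), and the names `bar_uncast` and `bar_cast` are hypothetical, introduced only to place the two variants side by side.

```php
<?php
declare(strict_types=1);

function foo(int $int): int
{
    return $int + 1;
}

// No cast: for an odd argument `/` yields a float, and foo() rejects it.
function bar_uncast(int $something): int
{
    $x = $something / 2;
    return foo($x);
}

// With the cast: the float is truncated before foo() ever sees it.
function bar_cast(int $something): int
{
    $x = $something / 2;
    return foo((int)$x);
}

$even = bar_uncast(8);      // 8 / 2 is int(4), so foo() accepts it

try {
    bar_uncast(7);          // 7 / 2 is float(3.5)
    $caught = false;
} catch (TypeError $e) {
    $caught = true;
}

$masked = bar_cast(7);      // (int)3.5 is 3, so the odd input goes unnoticed

var_dump($even, $caught, $masked);
```

Under these assumptions the uncast version fails loudly on an odd input, while the version with the cast returns a result computed from a truncated value, which is the loss of information the message warns about.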
Re: [PHP-DEV] PDO_DBLIB type handling
Hi Matteo,

On Sun, Feb 22, 2015 at 7:45 PM, Matteo Beccati p...@beccati.com wrote:

> The default behaviour of mysql/pgsql drivers is to convert to the matching PHP type, if possible. That can be turned off via PDO::ATTR_STRINGIFY_FETCHES = true. If PDO_DBLIB doesn't behave like that, I'd say it's a bug that needs to be fixed, but possibly only applied to a major/minor release due to the BC break.

Please write up an RFC now. It's time :)

Regards,

--
Yasuo Ohgaki
yohg...@ohgaki.net
Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints RFC)
Stas,

>> -- Can the code generated for a strict type hint somehow be optimized significantly better than the code generated for a dynamic/coercive type hint.
>> And me, who wrote an AOT compiler that does **exactly** this, claim

> Sorry, did exactly what? Here a bit more explanation would help.

Optimized statically typed PHP functions. Or more specifically, function calls inside of compiled code are treated strictly (so trying to pass a float to an int typed function would error at compile time). The outer function call (from non-compiled PHP) is parsed using ZPP rules, but once it's inside it's strict.

https://github.com/google/recki-ct/blob/master/doc/0_introduction.md
https://github.com/google/recki-ct/blob/master/doc/2_basic_operation.md

>> However, since test_strict() is compiled, there's no reason to dispatch back up to PHP functions for strict_foo(). In fact, that would be exceedingly slow. So instead, we'd compile strict_foo() as a C function, and do a native function call to it. Never having to check types because they are passed on the C stack.

> Doesn't that assume strict_foo() is always called with the right type of arguments? What exactly ensures that it does in fact happen? Shouldn't you have the type check _somewhere_ to be able to claim this happens? test_foo() doesn't do any checks, so what ensures $x is of the right type for C? And if the check is there, how is it better?

Yes, it does check the types, but at compile time. My AOT compiler backend has no concept of a mixed or ZVAL type. All types are determined at compile time, and in the very few cases where it can't determine them, it will raise an error. The type inference engine uses all available information (prior context, current context, future context) to determine what the type is. It also detects type changes (via assignment) and is able to correctly generate code based on that as well.

>> And note that this can only work with strict types since you can do the necessary type inference and reconstruction (both forward from a function call, and backwards before it).

> I don't get the backwards part - I think you claimed it last time we discussed it but I haven't seen your answer explaining why it's OK to just ignore cases when the variable is of the wrong type. Right now, it looks like you claim that if somebody has a call strict_foo($x) and strict_foo() accepts integers, that magically makes $x integer and you can generate code everywhere (not only inside strict_foo but outside) assuming $x is integer without actually needing a check. I don't see how this can work.

Ok, let's take another example:

    <?php
    declare(strict_types=1);

    function foo(int $int): int {
        return $int + 1;
    }

    function bar(int $something): int {
        $x = $something / 2;
        return foo($x);
    }
    ^^

In that case, without strict types, you'd have to generate code for both integer and float paths. With strict types, this code is invalid. You can tell because you know the function foo expects an integer. So you can infer that $x will have to have the type integer due to the future requirement. Which means the expression $something / 2 must also be an integer. We know that's not the case, so we can raise an error here. At that point the developer has the choice to explicitly cast or put in a floor() or one of a number of options.

The function bar itself didn't give us that information. We needed to use the type information from foo() to infer the type of $x prior to foo()'s call. Or more specifically, we inferred the only stable type that it could be. Which let us determine that $x's assignment was where the error was (since it wasn't a stable assignment).

Without strict typing this code is always stable, but you still need to generate full type assertions in a compiled version of foo() and use ZVALs for $x, hence reducing the effect of the optimization significantly.

Anthony
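The "explicitly cast or put in a floor()" options mentioned above are not interchangeable, which a short sketch can make concrete. It assumes PHP 7 or later (`intdiv()` and `declare(strict_types=1)` as shipped); `bar_intdiv` and `bar_floor` are hypothetical names for two possible rewrites of bar(), not anything proposed in the thread.

```php
<?php
declare(strict_types=1);

function foo(int $int): int
{
    return $int + 1;
}

// Option 1: integer division, which truncates toward zero.
function bar_intdiv(int $something): int
{
    return foo(intdiv($something, 2));
}

// Option 2: floor() rounds toward negative infinity and returns a float,
// so an explicit cast back to int is still required in strict mode.
function bar_floor(int $something): int
{
    return foo((int) floor($something / 2));
}

$a = bar_intdiv(7);     // intdiv(7, 2)  is 3
$b = bar_floor(7);      // floor(3.5)    is 3.0
$c = bar_intdiv(-7);    // intdiv(-7, 2) is -3
$d = bar_floor(-7);     // floor(-3.5)   is -4.0

var_dump($a, $b, $c, $d);
```

For positive inputs both rewrites agree; for negative odd inputs they differ by one, so whichever conversion the developer picks becomes part of the function's behaviour.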
Re: [PHP-DEV] Reclassify E_STRICT notices
On 22/02/15 22:30, Nikita Popov wrote:

> I would like to propose reclassifying our few existing E_STRICT notices and removing this error category: https://wiki.php.net/rfc/reclassify_e_strict As we don't really have good guidelines on when which type of error should be thrown, I'm mainly going by what category other similar errors use. I'm open to suggestions, but hope this will not deteriorate into total bikeshed.

At last, something which fits my roadmap ... The only problem is 'Redefining a constructor', which I certainly accept while upgrading code, but I also appreciate that 'Remove PHP4 Constructors' might not be accepted, which leaves a difficulty. I think one still ends up needing a 'mode' switch if the legacy constructors are retained?

--
Lester Caine - G8HFL
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints RFC)
Zeev,

>> And note that this can only work with strict types since you can do the necessary type inference and reconstruction (both forward from a function call, and backwards before it).

> Please do explain how strict type hints help you do inference that you couldn't do with dynamic type hints. Ultimately, your whole argument hinges on that, but you mention it in parentheses almost as an afterthought. I claim the opposite - you cannot infer ANYTHING from Strict STH that you cannot infer from Coercive STH. Consequently, everything you've shown, down to the C-optimized version of strict_foo() can be implemented in the exact same way for very_lax_foo(). Being able to optimize away the value containers is not unique to languages with strict type hints. It's done in JavaScript JIT engines, and it was done in our JIT POC.

I do here: http://news.php.net/php.internals/83504

I'll re-state the specific part in this mail:

    <?php
    declare(strict_types=1);

    function foo(int $int): int {
        return $int + 1;
    }

    function bar(int $something): int {
        $x = $something / 2;
        return foo($x);
    }
    ^^

In that case, without strict types, you'd have to generate code for both integer and float paths. With strict types, this code is invalid. You can tell because you know the function foo expects an integer. So you can infer that $x will have to have the type integer due to the future requirement. Which means the expression $something / 2 must also be an integer. We know that's not the case, so we can raise an error here. At that point the developer has the choice to explicitly cast or put in a floor() or one of a number of options.

The function bar itself didn't give us that information. We needed to use the type information from foo() to infer the type of $x prior to foo()'s call. Or more specifically, we inferred the only stable type that it could be. Which let us determine that $x's assignment was where the error was (since it wasn't a stable assignment).

Without strict typing this code is always stable, but you still need to generate full type assertions in a compiled version of foo() and use ZVALs for $x, hence reducing the effect of the optimization significantly.

Anthony
[PHP-DEV] RE: JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints RFC)
[I hope I managed to get the threading right although I think it's more than likely I didn't; apologies in advance if that's the case]

-----Original Message-----
From: Joe Watkins [mailto:pthre...@pthreads.org]
Sent: Sunday, February 22, 2015 5:07 PM
To: Jefferson Gonzalez
Cc: Etienne Kneuss; Anthony Ferrara; Zeev Suraski; PHP internals
Subject: Re: [PHP-DEV] Coercive Scalar Type Hints RFC

>> 1. Does weak mode could provide the required rules to implement a JIT with a sane level of memory and CPU usage?

> There is no objective answer to the question while it has the clause with a sane level of
> The assertion in the RFC that says there is no difference between strict and weak types, in the context of a JIT/AOT compiler, is wrong.

Can we take a hint from youtu.be/9MV1ot9HkPg please? :)

>     function add(int $l, int $r) {
>         return $l + $r;
>     }
>
> The first instruction in the interpreted code is not ZEND_ADD, first, parameters must be received from the stack. If that's a strict function, then un-stacking parameters is relatively easy, if it's a dynamic function then you have to generate code that is considerably more complicated.

Can you explain how so, assuming you know the same about what's waiting on the stack in both cases? If you know they're already ints, you can optimize the function to take this assumption into account - identically - in both the case of a strict or dynamic hint (and as a matter of fact, even if there's no hint at all). If, however, you don't know what's waiting on the stack - you would have to conduct checks in both cases. Sure, the checks are different - but the difference between them is behavioral, semantic - not performance related.

> This is an inescapable difference, of the kind that definitely does have a negative impact on implementation complexity, runtime, and maintainability.

This inescapable difference somehow manages to escape a bunch of people here, myself included...

> To me, it only makes sense to compile strict code AOT or JIT; If you want dynamic behaviour, we have an extremely mature platform for that.

I'll have to disagree with that statement. I'm not yet sure what we can achieve with JIT in real world PHP web apps, but I've seen enough to know that with JIT, certain use cases which are impractical today due to performance (mainly data crunching) suddenly become viable. And that's with no STH at all, strict or otherwise. Our JIT POC runs bench.php 25 times faster than PHP 7, without a single byte changed in the source (no typo, 25 times faster, not 25%). Unfortunately, most real world app code doesn't look anything like what bench.php does - but it should illustrate the point about the viability of JIT/AOT for dynamic platforms.

To take another example, JavaScript is extremely lax; are you suggesting that it doesn't make sense to use JIT with JavaScript? JIT made JS performance literally explode, without changing the language to be strict.

If what you can already discern from dynamic type inference (which is quite a lot, as JS proves) isn't enough for you - what you need isn't some specific kind of type hinting (strict or otherwise) - but more tools to tell what the type is in situations where today you can't (or find it hard to) infer it. Strictly typed variable declarations would be one such thing, changing how our operators work to be more strict would be another (for the record, I'm absolutely not suggesting we do that!). In short, changing PHP to be a much more strongly typed language than it is today, with or without Strict STH.

>> 2. ... With that said, if a JIT implementation is developed will the story of the ZendOptimizer being a commercial solution will be repeated or would this JIT implementation would be part of the core?

> I think it's likely that Anthony and I, and Dmitry want different things for a JIT/AOT engine. I think Anthony and I are preferring an engine that requires minimal inference because type information is present (or implicit),

I don't see how that's possible, unless you add facilities such as the ones I mentioned above. If that's the goal, I think it should be clearly stated. Without that, you need the exact same type inference with strict and weak types in order to develop useful JIT/AOT.

> while Dmitry probably favours the kind that can infer at runtime, the dynamic kind, like Zend is today. They are a world apart, I think, I'll be happy to be proven wrong about that.

The difference is really not about what kind of JIT implementation anyone prefers to have, but what kind of language behavior it's going to have to address. Trust me, I can tell you with absolute confidence that anybody who writes a JIT engine - Dmitry and JS engine devs included - would prefer to have the types just handed to him, rather than have to infer them. Type inference is hard. But that would be the task at hand.

Thanks,
Zeev
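The claim that the two kinds of hint differ only at the function boundary can be illustrated with the add() function quoted in this message. This is a minimal sketch assuming the coercive (default) mode as it later shipped in PHP 7, which differs in detail from the ruleset proposed in the RFC under discussion; the variable names are illustrative only.

```php
<?php
// No declare(strict_types=1) here, so calls made from this file
// follow the coercive rules.

function add(int $l, int $r) {
    // Whatever the caller passed, both parameters are ints at this point.
    return $l + $r;
}

$fromInts    = add(2, 3);
$fromStrings = add("2", "3");   // numeric strings are coerced at the boundary

try {
    add("two", 3);              // non-numeric string: rejected in either mode
    $rejected = false;
} catch (TypeError $e) {
    $rejected = true;
}

var_dump($fromInts, $fromStrings, $rejected);
```

In both modes the body of add() only ever sees integers; what changes is which calls get through the boundary check, not what is known once the check has passed.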
AW: [PHP-DEV] Coercive Scalar Type Hints RFC
-----Original Message-----
From: Pavel Kouřil [mailto:pajou...@gmail.com]
Sent: Sunday, February 22, 2015 20:02
To: Robert Stoll
Cc: Zeev Suraski; PHP internals
Subject: Re: [PHP-DEV] Coercive Scalar Type Hints RFC

> On Sun, Feb 22, 2015 at 7:30 PM, Robert Stoll p...@tutteli.ch wrote:
>
>> Hi Pavel, Yes, I am suggesting to make conversions behave the same regardless if it is implicit or explicit. The only difference between the two should be that one is stated explicitly by the user where the other is applied implicitly. Other programming languages behave like this and are more predictable for users as well as developers because one does not need to learn two sets of conversion rules.
>
> Actually this is not true. Other languages have differences between explicit conversions (aka casting) and implicit conversions as well. C# is the language I use the most after PHP, so I'll bring that one up (see https://msdn.microsoft.com/en-us/library/ms173105.aspx), but I believe other languages (probably Java?) act the same way.
>
> Regards
> Pavel Kouril

Hm... I reconsidered my statements and that is a good thing :)

I am not sure if I got your viewpoint. I will try to elaborate more on mine and explain how I interpret your statement. It is probably a philosophical question how to look at it. IMO the only difference in C# (as well as in Java) lies in the way the conversions are applied. Implicit conversions are applied automatically by the compiler, whereas explicit conversions are applied by the user. The difference lies in the fact that C# is statically typed and implicit conversions are only applied when it is certainly safe to apply one. However, implicit conversions in C# behave the same as explicit conversions, since implicit conversions which fail simply do not exist (there is no implicit conversion from double to int, for instance). That is the way I look at it.

You probably look at it from another point of view and would claim that an implicit conversion from double to int in C# exists but just fails all the time, ergo implicit and explicit are different (that is my interpretation of your statement above). In this sense I would agree. But even when you think in these terms, you have to admit that they are fundamentally different in the way that implicit conversions which differ from explicit conversions always fail, in all cases - pretty much as if they did not exist. There are no cases, neither in C# nor in Java, which I am aware of, where an implicit cast succeeds in certain cases but not in all and an explicit conversion succeeds in at least more cases than the implicit conversion. Hence, something like "a" should also not work in an explicit conversion in PHP IMO if it is not supported by the implicit conversion (otherwise strict mode is useless, btw).

Try out the following C# code:

    dynamic d1 = 1.0;
    int d = d1;

You will get the error Cannot implicitly convert type `double` to `int` at runtime. We see a fundamental difference between C# and PHP here. PHP is dynamically typed and relies on values rather than types (in contrast to C#). Therefore, the above code emits a runtime error even though the data could be converted to int without precision loss. This shall be different in PHP according to this RFC and I think that is perfectly fine. Yet, it seems even more important to me that implicit and explicit conversions behave the same way.

At first it might seem strange to have just one conversion rule set in PHP, since PHP is not known to be a language which shines due to its consistency... OK, I am serious again. Think about it from the following point of view: a user writes an explicit conversion in order to state explicitly that some value will be converted (this is something which will be necessary in a strict mode). Why should this explicit conversion be different from the implicit one? There should not be any difference between explicit knowledge and implicit knowledge. That is my opinion.

If you really do not care about data loss and just want to squeeze a float/string into an int no matter what the value really is, then you can use the @ in conjunction with ?? and provide the desired default value to fall back on if the conversion fails. If conversions like "a" to int really matter that much to the users of PHP, then we could keep the oldSchoolIntConversion function (as proposed in my first email) even in PHP 10 (I would probably get rid of them at some point).

Cheers,
Robert
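The contrast between a conversion that always succeeds and one that can fail and fall back to a default can be shown with functions that exist in PHP today. This is only a rough sketch: it uses `filter_var()` with `FILTER_VALIDATE_INT` as a stand-in and does not implement the `@` plus `??` syntax or the oldSchoolIntConversion function mentioned above; `to_int_or` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: a validating conversion with a caller-supplied default.
function to_int_or(string $value, int $default): int
{
    return filter_var(
        $value,
        FILTER_VALIDATE_INT,
        ['options' => ['default' => $default]]
    );
}

// Explicit casts always succeed, even when information is lost.
$castLetter  = (int) "a";       // 0
$castPartial = (int) "12abc";   // 12, the trailing text is dropped

// The validating conversion rejects both inputs and uses the default instead.
$safeLetter  = to_int_or("a", -1);
$safePartial = to_int_or("12abc", -1);
$safeNumber  = to_int_or("42", -1);

var_dump($castLetter, $castPartial, $safeLetter, $safePartial, $safeNumber);
```

The cast and the validating helper disagree on exactly the inputs the message is concerned with, which is the gap between explicit and implicit rules that a single conversion rule set would close.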
Re: [PHP-DEV] Coercive Scalar Type Hints RFC
2015-02-22 16:38 GMT-04:00 Stanislav Malyshev smalys...@gmail.com:

> Yes, that's not the case, at least nobody ever showed that to be the case. In general, as JS example (among many others) shows, it is completely possible to have JIT without strict typing. In particular, coercive typing provides as much information as strict typing about variable type after passing the function boundary - the only difference is what happens _at_ the boundary and how the engine behaves when the types do not match, but I do not see where big performance difference would come from - the only possibility for different behavior would be if your app requires constant type juggling (checks are needed in strict mode anyway, since variables are not typed) - but in this case in strict mode you'd have to do manual type conversions, which aren't in any way faster than engine type conversions. So the case for JIT being somehow better with strict typing so far remains a myth without any substantiation.

Well, the benefit of strict typing in a JIT environment may not have been proven, but it surely has been proven for statically compiled languages like C. Currently, a JIT in most cases can't compete with the bare performance of a statically compiled language, both in resources and CPU, so how is non-strict better in that sense? You can argue a lot about nodejs, but as I said in a previous message, at runtime it consumes more memory and CPU, and this is mostly due to all the type checking it requires. In that sense, if the strict proposal could improve that situation it would be a benefit.

> No, it can't be (at least it can't be the _entire_ code of this function), since the user still can pass non-int into this function - nothing introducing strict typing in functions, as it is proposed now, prevents it. What strict typing does is to ensure the error in this case, but to generate the error you still need the checks! BTW, your weak mode code is wrong too - there's no need to generate Variants if you typed the variables as int. You know once coercion is done they are ints. At least in the model that was now proposed.

I thought those checks could be optional if generated at call time; that's why I gave these examples:

    calc(1, 5) - no need for type checking or conversion, do a direct call
    calc("12", "15") - calc(strToInt(value1), strToInt(value2))
    calc($var1, $var2) - needs type checking and conversion if required

I was thinking in the sense that type checking, and conversion if required, could take place before calling a function, but maybe that's even more complicated... So what you are saying is that there is no way of determining the type of a variable (only at runtime), as Zeev explained in the previous messages: since variables aren't typed, checks are mandatory either way.

> Please provide a substantiation for this opinion. So far what was provided was not correct.

    Statically typed languages - Direct conversion to machine code
    Dynamically typed languages with JIT - Intermediate representation - Checks - Conversion to machine code with checks.

> Please do not strawman. A lot of people here care about performance, and you have not yet made case that strict typing has any benefit on performance, so implying that opponents of strict typing somehow don't care about performance while you champion it does not match the real situation.

My intention is just that, to clear the doubts. I thought, and may still think, that strict has some advantages, but I have been proven wrong, and so might many other people, thanks to all this insightful information.
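The three call-site cases listed in this message can be mimicked in userland to show where the checks would sit. This is only a sketch under loud assumptions: `calc()` is given an arbitrary body (addition), `to_int()` is a hypothetical stand-in for the strToInt() step, and nothing here reflects how an actual JIT or AOT compiler for PHP is implemented.

```php
<?php
declare(strict_types=1);

// Hypothetical body; the thread never says what calc() computes.
function calc(int $a, int $b): int
{
    return $a + $b;
}

// Hypothetical check-and-convert step placed at the call site.
function to_int($value): int
{
    if (is_int($value)) {
        return $value;                  // already an int, nothing to convert
    }
    $converted = filter_var($value, FILTER_VALIDATE_INT);
    if ($converted === false) {
        throw new TypeError('expected an integer or an integer string');
    }
    return $converted;
}

// Case 1: integer literals, so the call can be made directly.
$direct = calc(1, 5);

// Case 2: string literals, so the conversion is known to be needed.
$converted = calc(to_int("12"), to_int("15"));

// Case 3: variables of unknown type, so the check cannot be skipped.
$var1 = "7";
$var2 = 8;
$checked = calc(to_int($var1), to_int($var2));

var_dump($direct, $converted, $checked);
```

Only the first case avoids the check entirely, and only because the argument types are visible in the source; for the third case the check remains in one form or another, which matches the conclusion reached in the message.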
Re: [PHP-DEV] new json, push generated file?
Hi Anatol,

On Sun, Feb 22, 2015 at 6:09 PM, Anatol Belski anatol@belski.net wrote:

> FYI I had to downgrade re2c to 0.13.6 as the latest randomly crashes.

Ok. :) There are no differences in the generated DFA, so it's not a problem for me to use 0.13.6 too. The preferred versions are more about nicer diffs when regenerating files. So it's not a big issue if it gets regenerated with another supported version. I test all supported versions when I make changes to the parser or scanner, and I can always regenerate it back if someone else needs to make some urgent changes ;)

Cheers
Jakub
[PHP-DEV] Reclassify E_STRICT notices
Hi internals! I would like to propose reclassifying our few existing E_STRICT notices and removing this error category: https://wiki.php.net/rfc/reclassify_e_strict As we don't really have good guidelines on when which type of error should be thrown, I'm mainly going by what category other similar errors use. I'm open to suggestions, but hope this will not deteriorate into total bikeshed. Thanks, Nikita
Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints RFC)
Hi!

> -- Can the code generated for a strict type hint somehow be optimized significantly better than the code generated for a dynamic/coercive type hint.
> And me, who wrote an AOT compiler that does **exactly** this, claim

Sorry, did exactly what? Here a bit more explanation would help.

> However, since test_strict() is compiled, there's no reason to dispatch back up to PHP functions for strict_foo(). In fact, that would be exceedingly slow. So instead, we'd compile strict_foo() as a C function, and do a native function call to it. Never having to check types because they are passed on the C stack.

Doesn't that assume strict_foo() is always called with the right type of arguments? What exactly ensures that it does in fact happen? Shouldn't you have the type check _somewhere_ to be able to claim this happens? test_foo() doesn't do any checks, so what ensures $x is of the right type for C? And if the check is there, how is it better?

> And note that this can only work with strict types since you can do the necessary type inference and reconstruction (both forward from a function call, and backwards before it).

I don't get the backwards part - I think you claimed it last time we discussed it but I haven't seen your answer explaining why it's OK to just ignore cases when the variable is of the wrong type. Right now, it looks like you claim that if somebody has a call strict_foo($x) and strict_foo() accepts integers, that magically makes $x integer and you can generate code everywhere (not only inside strict_foo but outside) assuming $x is integer without actually needing a check. I don't see how this can work.

--
Stas Malyshev
smalys...@gmail.com
Re: [PHP-DEV] [VOTE] Remove PHP 4 Constructors
On Sun, Feb 22, 2015 at 11:33 PM, Yasuo Ohgaki yohg...@ohgaki.net wrote:

> Hi Levi,
>
> On Mon, Feb 23, 2015 at 1:39 PM, Levi Morrison le...@php.net wrote:
>
>> I have moved the RFC for removing PHP 4 constructors[1] into voting phase. As there are a lot of RFCs in discussion and voting right now I will leave this RFC in voting phase until the evening (UTC-7) of March 6th which is 12 days away; this will hopefully allow everyone to be able to review this RFC and vote on it without being rushed.
>
> This may be a bit off topic. During this RFC discussion, I mentioned Trait method name issue that trait method which has PHP4 constructor name for the class is treated as class constructor.
>
> http://3v4l.org/gHbdq (Trait method is called as constructor)
>
> If there is __construct() in the class
>
> http://3v4l.org/HEBMl (Fatal error: A has colliding constructor definitions coming from traits)
>
> Is this bug fixed also? Or we have to wait until PHP8?

Aside from the new warning that is emitted and the old one that is removed, no other behavior is changed. This means the bug will remain through the PHP 7 lifecycle even if this RFC passes. If the RFC passes, the colliding constructor issue will be removed in PHP 8 when the rest of the old-style constructor support is removed.
[PHP-DEV] Re: [RFC] Script only include/require
Hi all, Zend engine experts especially,

On Mon, Feb 23, 2015 at 6:23 AM, Yasuo Ohgaki yohg...@ohgaki.net wrote:

> I wrote a patch and adjusted the RFC.
>
> https://wiki.php.net/rfc/script_only_include
> https://github.com/php/php-src/pull/
>
> Where the filename extension is checked is subject to change. At first, I thought implementing this as PHP code would be good, but I've changed my mind. It seems better to do it in Zend code. Opinions are appreciated.

I noticed very strange behavior under a ZTS build with this patch. It turned out that compiler_globals is not accessible under a ZTS build, according to gdb. Is this intended? If so, where should I put the script_extensions char array?

Thank you.

--
Yasuo Ohgaki
yohg...@ohgaki.net
RE: [PHP-DEV] Coercive Scalar Type Hints RFC
Hi,

For those interested in evaluating the impact of ZPP ruleset modifications on internal and userland code, a pull request is now available: https://github.com/php/php-src/pull/1110

Please note that this is not a mere implementation of the RFC ruleset, although it comes preconfigured this way. It contains a set of 12 configurable options, each one enabling/disabling a particular ruleset modification. This allows for a much more powerful exploration of potential modifications and BC breaks against the existing codebase. Every combination of individual behaviors is possible, providing a theoretical number of about 3,000 potential rulesets. Of course, a lot of these are not consistent, but it still allows for creative thinking.

Given the time I had to write it, I didn't perform extensive testing. I just ensured that the ruleset described in the RFC and the one you get when activating every possible change both compile and seem to work as expected. I'll test more cases tomorrow. So, code review is a key priority and every error (compile or runtime) you may get should be reported as fast as possible.

Overall configuration possibilities include and go beyond the STH RFC, with the exception of numeric strings, whose proposed restrictions are not implemented yet, but will be soon.

So, I hope you'll enjoy the new toy. And thoughts are welcome, as usual.

Regards
François
Re: [PHP-DEV] Reclassify E_STRICT notices
Hi Nikita,

On Mon, Feb 23, 2015 at 7:30 AM, Nikita Popov nikita@gmail.com wrote:

> I would like to propose reclassifying our few existing E_STRICT notices and removing this error category: https://wiki.php.net/rfc/reclassify_e_strict As we don't really have good guidelines on when which type of error should be thrown, I'm mainly going by what category other similar errors use. I'm open to suggestions, but hope this will not deteriorate into total bikeshed.

+1 overall.

Regarding "Only variables should be assigned by reference": most of the errors are appropriate, but some of them may be removed. For example, passing literals does not make sense, so the current behavior is good.

    $ php -r 'array_pop([1,2,3]);'
    PHP Fatal error: Only variables can be passed by reference in Command line code on line 1

However, emitting "Only variables should be assigned by reference" for this

    $top = array_pop(some_func_returns_array()); // Code needs only top element

seems too strict, for example. I would rather PHP behaved like HHVM.

  - http://3v4l.org/5AIrb
  - http://3v4l.org/O0SXE

Is it possible to relax the error for tmp variables?

Regards,

--
Yasuo Ohgaki
yohg...@ohgaki.net
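Until such a relaxation exists, the usual way to avoid the message is to give array_pop() a real variable, or to use a function that takes its array by value. A minimal sketch, assuming some_func_returns_array() simply returns a list (its body here is hypothetical):

```php
<?php
// Hypothetical body for the function named in the message above.
function some_func_returns_array(): array
{
    return [1, 2, 3];
}

// array_pop() takes its argument by reference, so store the result first.
$tmp = some_func_returns_array();
$top = array_pop($tmp);         // $tmp is now [1, 2]

// array_slice() takes the array by value, so no temporary is needed
// when only the last element is wanted.
$last = array_slice(some_func_returns_array(), -1)[0];

var_dump($top, $tmp, $last);
```

The second form reads the last element without modifying anything, which fits the "code needs only top element" case from the message.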
Re: [PHP-DEV] [VOTE] Remove PHP 4 Constructors
Hi Levi, On Mon, Feb 23, 2015 at 1:39 PM, Levi Morrison le...@php.net wrote: I have moved the RFC for removing PHP 4 constructors[1] into voting phase. As there are a lot of RFCs in discussion and voting right now I will leave this RFC in voting phase until the evening (UTC-7) of March 6th which is 12 days away; this will hopefully allow everyone to be able to review this RFC and vote on it without being rushed. This may be a bit off topic. During the discussion of this RFC, I mentioned the trait method name issue: a trait method whose name matches the PHP 4 constructor name for the class is treated as the class constructor. http://3v4l.org/gHbdq (the trait method is called as the constructor) If the class also has __construct(): http://3v4l.org/HEBMl (Fatal error: A has colliding constructor definitions coming from traits) Is this bug fixed as well? Or do we have to wait until PHP 8? Regards, P.S. I voted yes, of course. -- Yasuo Ohgaki yohg...@ohgaki.net
Re: [PHP-DEV] [VOTE] Remove PHP 4 Constructors
Hi Levi, On Mon, Feb 23, 2015 at 3:40 PM, Levi Morrison le...@php.net wrote: Aside from the new warning that is emitted and the old one that is removed no other behavior is changed. This means the bug will remain through the PHP 7 lifecycle even if this RFC passes. If the RFC passes the colliding constructor issue will be removed in PHP 8 when the rest of the old-style constructor support is removed. Thank you for the answer. Everyone should vote yes for this RFC. Regards, -- Yasuo Ohgaki yohg...@ohgaki.net
Re: [PHP-DEV] Coercive Scalar Type Hints RFC
On 02/22/2015 06:28 PM, François Laupretre wrote: Hi Stas, It seems the actual problem is that we have too many compiler / code analysis experts in the community ;) (don't get me wrong, I am not saying that for you, I just admire your patience explaining the same again and again to people who never read one line from PHP core source). Well, I have never worked on a JIT/AOT compiler, and I have to admit I haven't made any contributions to the PHP engine (and it seems I do not have any right to write a couple of messages expressing concerns/views because of that). On the other hand, I took on the wxWidgets extension in an effort to revive it (because I believe PHP can have other use cases). I improved its code generator (and other parts that involve the PHP source code), which now generates more than 905941 lines of code that constitute the extension (github.com/wxphp/wxphp/tree/master/src). So I have indeed read source from PHP core. In any case, sorry if I have annoyed anyone, that was never my intention. We as humans can't possess all the knowledge of the world, which is why we always learn from somebody else. What's the purpose of a community without participation? :) Cheers!
Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints RFC)
Stas, On Sun, Feb 22, 2015 at 6:47 PM, Stanislav Malyshev smalys...@gmail.com wrote: Hi! You can tell because you know the function foo expects an integer. So you can infer that $x will have to have the type integer due to the future requirement. Which means the expression $something / 2 must also be an integer. We know that's not the case, so we can raise an error here. OK, so your claim is that the compiler with strict typing can detect some situations which the dynamic one can not and reject some of the code. Without going too much into details, I agree with this, this is an obvious difference between strict and dynamic. However, this is not a Alright, we're getting somewhere. performance advantage, obviously - since you are comparing running code with non-running one - your model just accepts less code. Obviously, this works if non-accepted code was wrong - and doesn't work if it was not. But we talked about running code, I thought. It is still a performance advantage, because since we know the types are stable at compile time, we can generate far more optimized code (no variant types, native function calls, etc). And yes, it accepts less code. It refuses to accept code that is not type stable. More on that in a second: At that point the developer has the choice to explicitly cast or put in a floor() or one of a number of options. That's exactly what I claim would be the defect of the strict model - people would start putting excessive casts ensuring there would be cases where information is lost. For example, assume we knew $something is even: function bar(int $something): int { assert($something %2 == 0); $x = $something / 2; return foo($x); } Now everything is fine (ignoring the typing for a second), right? We're dealing with integers, /2 always divides evenly, all is great. 
Now we introduce strictness, so we'd need to say something like: function bar(int $something): int { assert($something %2 == 0); $x = $something / 2; return foo((int)$x); } Now assume somebody messed up on the routine code reformatting merge and the code somehow ended up like: function bar(int $something): int { $x = $something / 2; return foo((int)$x); } Do you see what the problem is? Now we lost the check for $something being even, but we would never know about it since type system forced us to insert (int) (which we didn't need) and thus disabled the controls for the bug of $something not being even (which we did need). Actually, in this case, the int cast does tell us something. It says that the result (truncation) is explicitly wanted. Not to the compiler (tho that happens), but to the developer. With coercive typing as proposed in Ze'ev's RFC, that would need to happen anyway. In both proposals that would generate a runtime error. The difference is, with strict types, we can detect the error ahead of time and warn about it. But more important question is - with (int) the coercive model can use this information too, so what's the difference from strict model on that code? There seems to be none. In this precise example there is none, because division is not type stable (it depends on the values of its arguments). Let's take a different example function foo(float $something): int { return $something + 0.5; } With coercive types, you can't tell ahead of time if that will error or not. With static types, you can. Without strict typing this code is always stable, but you still need to generate full type assertions in a compiled version of foo() and use ZVALs for $x, hence reducing the effect of the optimization significantly. Wait, you said this code is invalid so no code will be generated. Did you mean code after introducing (int)? Then strict has no advantage anymore as we can derive the info from (int) anyway. 
Otherwise, I can't see how you can avoid generating typechecks in foo() unless the only place it can ever be called from is bar() - but I don't see how you can ensure that in PHP, and if you could, I don't see why weak model could not make the same conclusions on the same code. No, I was talking about trying to do the same trick (using native function calls) with coercive types. So far the only advantage I've seen seems to be that your compiler would reject code that looks suspicious to it and thus force the programmer to coerce the variables into the types manually - by (int) or floor() - something that the coercive model would do for you automatically. Once coerced, the same code would have the same type info Actually, no. Coercive as proposed by Ze'ev would cause 8.5 to error if passed to an int type hint. So you'd need the cast there as well. Either that, or error at runtime as well. Hence, in both cases casts would be required. One could tell you ahead of time where you forgot a cast, the other would wait until runtime (when the edge-case was hit). (and thus same potential optimizations)
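The remark that division is not type stable can be checked directly. The sketch below only uses standard PHP behavior: the "/" operator returns an int when both operands are ints and divide evenly, and a float otherwise, while intdiv() always returns an int.

```php
<?php
// The result type of "/" depends on the operand values, not only on their types.
$even = 4 / 2;   // divides evenly
$odd  = 3 / 2;   // does not divide evenly

var_dump(is_int($even));   // the evenly divisible case yields an int
var_dump(is_float($odd));  // the other case yields a float

// intdiv() always returns an int, so its result type is stable.
$stable = intdiv(3, 2);
var_dump(is_int($stable));
```

This is why an analyzer looking only at types cannot decide whether $something / 2 fits an int parameter: the answer depends on the runtime value of $something.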
Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints RFC)
Hi! It is still a performance advantage, because since we know the types are stable at compile time, we can generate far more optimized code (no variant types, native function calls, etc). I don't see where it comes from. So far you said that your compiler would reject some code. That doesn't generate any code, optimized or otherwise. For the code your compiler does not reject, still no advantage over dynamic model. Actually, in this case, the int cast does tell us something. It says that the result (truncation) is explicitly wanted. Not to the compiler (tho that happens), but to the developer. No, it doesn't say that in this case. The developer didn't actually want truncation. They just wanted to call foo(). You forced them to use truncation because that's the only way to call foo() in your compiler. They said it's ok since truncation is over value that is int anyway, and they are true - except when it stops to be true in the future. That generates brittle code because it forces the developer to take risks they otherwise wouldn't take - such as use much stronger forced conversions instead of more appropriate dynamic ones. With coercive typing as proposed in Ze'ev's RFC, that would need to happen anyway. In both proposals that would generate a runtime error. No, it wouldn't need to happen since no-DL conversion is allowed. The difference is, with strict types, we can detect the error ahead of time and warn about it. Static analyzer can warn about it regardless of type model. The only difference in strict model is that when compiling - not ahead of time, but in runtime - it would produce hard error even in case of even number, which can work just fine without it. In this precise example there is none, because division is not type That's what I am saying - if the code runs, there's no difference. 
The only difference is that your model runs less code, and forces (or, rather, strongly incentivizes) people to write more dangerous code because some of the non-dangerous code is not allowed. stable (it depends on the values of its arguments). Let's take a different example function foo(float $something): int { return $something + 0.5; } With coercive types, you can't tell ahead of time if that will error or not. With static types, you can. I'm not sure what this proves. Yes, of course there are cases where strict typing (please let's not confuse it with static typing - these are different things, static typing is when everything's type is known in advance and this is not happening in PHP, that's kind of the whole point) would disallow some code that dynamic typing allows. Nobody argues with that. What I am arguing against is the claim that this difference is somehow useful - especially for JIT optimizations. No, I was talking about trying to do the same trick (using native function calls) with coercive types. I'm not sure what you are comparing to what. You provide some code and say "in my compiler, this code A would not work, while in the dynamic model it would. Instead, you should write code B. This code B would run faster in my compiler." But that is not a proof your compiler is better! Because code B would also run faster in the dynamic model, and in addition, code A would also run (though indeed not faster than B). Actually, no. Coercive as proposed by Ze'ev would cause 8.5 to error if passed to an int type hint. So you'd need the cast there as well. Either that, or error at runtime as well. We were talking about the case where the argument was even, you must have missed that part. If the argument is not even, indeed both models would produce the same error, no difference there. The only difference in your model vs. the dynamic model so far is that you forced the developer to do a manual (int) instead of doing a much smarter coercive check on entrance of foo().
There's no performance improvement in that and there's reliability decrease. Hence, in both cases casts would be required. One could tell you ahead of time where you forgot a cast, the other would wait until runtime (when the edge-case was hit). You imply it's always the case of forgot and the casts always should be there, which is not the case - actually, as I already said, I think this is the main defect of your model, forcing manual casts everywhere. Otherwise, I agree - that's the only difference. Still struggle to see any JIT gain. So far only one advantage demonstrated was the obvious one - if you obviously pass obvious non-int to int parameter in strict model, this can be detected statically. It would be stupid to deny that as it is pretty much immediately follows from the definition of strict model. But that's the only difference I see and not much of an advantage in my eyes as a) patently obvious cases would be pretty rare and b) in many cases would also not be what developer wanted, leading to manual casts and c) last but not least, static analyzer doing that can be as easily written without having these strict rules in core PHP!
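A minimal sketch of the contrast being argued here, assuming a userland helper named coerce_to_int_lossless() that is invented for illustration. It only approximates the "no data loss" float-to-int rule mentioned in the thread and is not the RFC's actual rule set or an engine function.

```php
<?php
// Hypothetical helper: accept a float as an int only when nothing is lost.
function coerce_to_int_lossless($value): int
{
    if (is_int($value)) {
        return $value;
    }
    if (is_float($value)
        && is_finite($value)
        && floor($value) === $value          // no fractional part
        && $value >= PHP_INT_MIN
        && $value < PHP_INT_MAX              // stays inside the int range
    ) {
        return (int) $value;
    }
    throw new InvalidArgumentException('value cannot be converted to int without data loss');
}

// The explicit cast never complains: 3 / 2 is 1.5 and is silently truncated.
$truncated = (int) (3 / 2);

// The lossless check accepts the even case and rejects the odd one.
$accepted = coerce_to_int_lossless(4 / 2);
try {
    coerce_to_int_lossless(3 / 2);
    $rejected = false;
} catch (InvalidArgumentException $e) {
    $rejected = true;
}

var_dump($truncated, $accepted, $rejected);
```

With the cast, the odd input produces a wrong but valid-looking value; with the lossless check, the same input surfaces as an error at the call boundary.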
Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints RFC)
On 23/02/15 00:25, Anthony Ferrara wrote: And as the static analyzer traces back, if it finds possibilities that don't match (for example, if you assigned it directly from $_POST), it's able to say that either the original assignment or the function call is an error. Why would using an integer I've passed in a URL be a 'fault'? All of the data navigation functions pass their state via the URL and one simply protects against hackers by filtering the state to a default value if it does not return the correct integer data. -- Lester Caine - G8HFL - Contact - http://lsces.co.uk/wiki/?page=contact L.S.Caine Electronic Services - http://lsces.co.uk EnquirySolve - http://enquirysolve.com/ Model Engineers Digital Workshop - http://medw.co.uk Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints RFC)
Zeev, Partially. The static analysis and compilation would be pure AOT. So the errors would be told to the user when they try to analyze the program, not run it. Similar to HHVM's hh_client. How about that then: 1. The developers runs a static analyzer on the program. 2. It fails because the static analyzer detects float being fed to an int. 3. The user changes the code to convert the input to int. 4. You can now optimize the whole flow better, since you know for a fact it's an int. Is that an accurate flow? Yes. At least for what I was talking about in this thread. However, there could be a runtime compiler which compiles in PHP's compile flow (leveraging opcache, etc). In that case, if the type assertion isn't stable, the function wouldn't be compiled (the external analyzer would error, here it just doesn't compile). Then the code would be run under the Zend engine (and error when called). Got you. Is it fair to say that if we got to that case, it no longer matters what type of type hints we have? Once you get to the end, no. Recki-CT proves that. The difference though is the journey. The static analyzer can reason about far more code with strict types than it can without (due to the limited number of possibilities presented at each call). So this leaves the dilema: compiled code that behaves slightly differently (what Recki does) or whether it always behaves the same. So think of it as a graph. When you start the type analysis, there's one edge between $input and foo() with type mixed. Looking at foo's argument, you can say that the type of that graph edge must be int. Therefore it becomes an int. Then, when you look at $input, you see that it can be other things, and therefore there are unstable states which can error at runtime. So when you say it 'must be an int', what you mean is that you assume it needs to be an int, and attempt to either prove that or refute that. Is that correct? If you manage to prove it - you can generate optimal code. 
If you manage to refute that - the static analyzer will emit an error. If you can't determine - you defer to runtime. Is that correct? Basically yes. Anthony -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
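The three outcomes agreed on above (prove, refute, defer) can be sketched as a small classification over one call-graph edge. The function name, the string type labels and the exact-match rule are all assumptions made for illustration; a real analyzer such as the one discussed would be considerably more involved.

```php
<?php
// Hypothetical sketch: given the set of types the analyzer believes an
// argument may have, and the type the parameter requires, pick an outcome.
function classify_edge(array $possibleTypes, string $requiredType): string
{
    $matching = array_filter(
        $possibleTypes,
        function (string $type) use ($requiredType): bool {
            return $type === $requiredType;
        }
    );

    if (count($possibleTypes) > 0 && count($matching) === count($possibleTypes)) {
        return 'proved';   // every possibility matches: optimal code can be generated
    }
    if (count($possibleTypes) > 0 && count($matching) === 0) {
        return 'refuted';  // no possibility matches: the analyzer reports an error
    }
    return 'deferred';     // mixed or unknown: the check is left to runtime
}

var_dump(classify_edge(['int'], 'int'));
var_dump(classify_edge(['float'], 'int'));
var_dump(classify_edge(['int', 'float'], 'int'));
```

In the bar()/foo() example from the thread, $something / 2 may be an int or a float, which puts that edge in the third bucket unless the code is changed.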
Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints RFC)
Stas, On Sun, Feb 22, 2015 at 8:35 PM, Stanislav Malyshev smalys...@gmail.com wrote: Hi! The difference though is the journey. The static analyzer can reason about far more code with strict types than it can without (due to the limited number of possibilities presented at each call). So this leaves the dilemma: compiled code that behaves slightly differently (what Recki does) or whether it always behaves the same. Wait, so are you saying that the advantage of having strict typing in PHP core is that some analyzer - which does not share code with PHP core, AFAIU - if it interpreted PHP types in a strict manner and provided warnings where the types it can statically deduce do not match, and the authors of the code agreed with its suggestions and rewrote their code so that the analyzer would not complain, would in some cases result in code that might be JIT-optimized more efficiently? That is not a claim about strict typing in PHP core having any benefit at all. I'm not sure even this claim is true (as adding (int) doesn't actually improve performance - it just shifts around the place where the conversion is done, and once the conversion is done, you can do the same optimizations as before) - but even if there's some situation where it is true, I don't see how it makes a difference for PHP core (even in the situation of PHP core + JIT extension or a non-Zend PHP runtime with AOT/JIT). Please don't twist my words. Look at everything I said, don't take one statement from one very specific topic out of context as some sort of proof that there are no benefits. Anthony
Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints RFC)
On 02/22/2015 09:15 PM, Stanislav Malyshev wrote: We were talking about the case where the argument was even, you must have missed that part. If the argument is not even, indeed both models would produce the same error, no difference there. The only difference in your model vs. dynamic model so far is that you forced the developer to do manual (int) instead of doing much smarter coercive check on entrance of foo(). There's no performance improvement in that and there's reliability decrease. How is coercive much smarter? Basically, what coercive typing would do is similar to what the intval(), floatval(), etc. set of functions do, with some type checking in the mix to ensure a value matches some set of rules. How could casting with (int) be such a dangerous thing? Take this code, for example: echo (int) "whats cooking!"; echo intval("whats cooking"); Both statements print 0, so how is casting unsafe?
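One way to see the difference being debated, as a sketch using only standard PHP functions: a cast always produces some integer, whereas a validating conversion such as filter_var() with FILTER_VALIDATE_INT reports that the input was not an integer. The validating call stands in here for the kind of check a coercive type hint would perform; it is not the RFC's implementation.

```php
<?php
$input = "whats cooking!";

// An explicit cast never fails: a non-numeric string simply becomes 0.
$cast   = (int) $input;
$intval = intval($input);

// A validating conversion reports the failure instead of inventing a value.
$checked = filter_var($input, FILTER_VALIDATE_INT);
$valid   = filter_var("42", FILTER_VALIDATE_INT);

var_dump($cast, $intval, $checked, $valid);
```

The cast is "safe" in the sense that it never errors, and that is exactly the concern raised earlier in the thread: a 0 produced from garbage input is indistinguishable from a genuine 0.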
RE: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints RFC)
-----Original Message----- From: Anthony Ferrara [mailto:ircmax...@gmail.com] Sent: Monday, February 23, 2015 3:21 AM To: Zeev Suraski Cc: PHP internals Subject: Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints RFC) Zeev, Partially. The static analysis and compilation would be pure AOT. So the errors would be told to the user when they try to analyze the program, not run it. Similar to HHVM's hh_client. How about that then: 1. The developer runs a static analyzer on the program. 2. It fails because the static analyzer detects float being fed to an int. 3. The user changes the code to convert the input to int. 4. You can now optimize the whole flow better, since you know for a fact it's an int. Is that an accurate flow? Yes. At least for what I was talking about in this thread. OK. So the code after the fix would look like this: <?php declare(strict_types=1); function foo(int $int): int { return $int + 1; } function bar(int $something): int { $x = (int) ($something / 2); /* (int) or whatever else makes it clear it's an int */ return foo($x); } ?> Let me explain how this could play out with coercive type hints: <?php function foo(int $int): int { return $int + 1; } function bar(int $something): int { $x = $something / 2; return foo($x); } We can all agree that determining the types of just about anything here is ultra-easy, so easy you could do it with a static analyzer, as you suggested. $int and $something are integers, while $x is either an integer or a float. We also know that both foo() and bar() expect integers. What's the optimal code we could generate here? First, on the function body of foo(), we can clearly and easily translate the whole into machine code, as we know we'll get a long and need to return a long. Moving to the caller scope in bar(), given we know $x is either a float or an integer, we could either generate code that calls coerce_to_int($x), or even some optimized machine code that checks zval.type and either uses the lval or converts the dval.
This can be done in AOT, no need to wait for runtime. Once we know for a fact we have an integer in our hands - we can make the call directly to the optimized foo(), a C level call without the overhead of a PHP function call. If you look at the generated code, it's going to be remarkably similar between the two cases. If the developer chooses to pick the casting route, it will look almost identical - except it will be convert_to_long() that is called instead of coerce_to_int(), the former being more aggressive than the latter. Can you see anything impossible or otherwise wrong with my description of how the AOT compiler would work in this case, with coercive type hints? If not, there are no performance benefits for the Strict typed version after the user alters his code to behave similarly to what coercive type hints would bring. Based on our Twitter discussion, I think I may have not made my position clear regarding where our differences are. I'm not claiming that you can't do the optimizations you say you can do. Not at all. My point is that we can do the very same optimizations with coercive types as well - basically, that there is no delta. However, there could be a runtime compiler which compiles in PHP's compile flow (leveraging opcache, etc). In that case, if the type assertion isn't stable, the function wouldn't be compiled (the external analyzer would error, here it just doesn't compile). Then the code would be run under the Zend engine (and error when called). Got you. Is it fair to say that if we got to that case, it no longer matters what type of type hints we have? Once you get to the end, no. Recki-CT proves that. Do you mean that the statement is unfair or that it no longer matters? If it's the former, can you elaborate as to why? The difference though is the journey. The static analyzer can reason about far more code with strict types than it can without (due to the limited number of possibilities presented at each call). 
So this leaves the dilema: compiled code that behaves slightly differently (what Recki does) or whether it always behaves the same. So think of it as a graph. When you start the type analysis, there's one edge between $input and foo() with type mixed. Looking at foo's argument, you can say that the type of that graph edge must be int. Therefore it becomes an int. Then, when you look at $input, you see that it can be other things, and therefore there are unstable states which can error at runtime. So when you say it 'must be an int', what you mean is that you assume it needs to be an int, and attempt to either prove that or refute that. Is that correct? If you manage to prove it - you can generate optimal code. If you manage to refute that - the static analyzer will emit an error. If you can't determine - you defer to runtime. Is that correct? Basically
Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints RFC)
Hi! It rejects code because doing code generation on the dynamic case is significantly harder and more resource intensive. Could that be built in? Sure. But it's a very significant difference from generating the static code. I can appreciate that. Dynamic typing is hard to translate into statically typed code efficiently. But I don't see how that is related to PHP having strict types - surely even strict types do not make PHP statically typed, in fact, I don't see how they improve much - so far you've shown me code examples that you compiler *wouldn't* handle. I don't see not being able to handle code is an advantage. Could I see examples of code that strict model *can* handle and that work better in that model? And even if we generated native code for the dynamic code, it would still need variants, and hence ZPP at runtime. Hence the static code has a significant performance benefit in that we can indeed bypass type checks as shown in the PECL example a few messages up (more than a few). I don't see how you can bypass type checks unless you know the variable types at the time of the call, from some external source or some information you collected about the code. If you know that, you could as well generate the same check-less code for weak/dynamic model. Passing a float to an integer parameter would result in a runtime E_RECOVERABLE_ERROR if the float has dataloss. So in the case I cited: foo($someint / 2), that will generate an E_RECOVERABLE_ERROR in Zeev's proposal, as well as in the static typing mode of mine. It sounds like you've missed the part of my reply where I was saying that I am considering the case of even numbers. With coercive typing as proposed in Ze'ev's RFC, that would need to happen anyway. In both proposals that would generate a runtime error. No, it wouldn't need to happen since no-DL conversion is allowed. Sure it would. 3/2 is 1.5. Which would fatal if I passed it to foo(int) under Zeev's RFC. Because of data loss. 
Again, you seem to miss the part where I said that we're considering a non-DL case. For DL case, both behave the same so there's indeed no difference (while you claimed there's some advantage for strict model?) This very particular case, yes, because of the simplicity of the types involved. But with strict typing you only need to look at 1 success case, but with coercive typing you need to look at many more. I do not see why you can ignore the fact that your assumptions about the variable types could be wrong with strict typing. PHP is not a static typed language, so unless you can prove definitely that the variable absolutely can not be anything other than the prescribed type (prior to the call), you still need to have code that accounts for the other possibility. If you can, however, prove that, both strict and dynamic typing would behave exactly the same! You could, of course, build your static analyzer in a way that would reject every code where it can not prove all types - however I hope you understand it is not an option for PHP core? Also, in many (I'd argue most) cases coercive has to either issue a warning (it doesn't know) or error on valid and functioning code. Example: function isdivisibleby2(string $foo): bool { if (preg_match('(\D)', $foo)) { return false; } return 0 == ($int % 2); } function something2(string $foo): int { if (!isdivisibleby2($foo)) { return 10; } return foo($foo / 2); } This code would never raise a runtime error in Zeev's coercive proposal. However, when looking at it statically, you cant tell (unless you've got a regex decompiler). So static analysis on dynamic types will either error on valid code, or not error on invalid code (and I'm not even talking about the halting problem here). True, but PHP is built on dynamic types, and neither proposal changes that. 
So you either propose to make PHP fully statically typed (which I hope you do not) or say static analysis is not perfect - which I wholeheartedly agree, but then again CS is full of unsolvable problems, and static analysis is, unfortunately, reduceable ultimately to one of them, so no wonder here. The same case would, of course, be true with strict and non-strict runtime typing - simply because PHP is not statically typed. Whereas with strict typing, the error would appear in both cases (static and runtime). And you could fix it. If you are saying that you can construct code, containing an error, which will be missed by coercive typing but would fail (not necessarily because of this specific error, but because of type mismatch) with strict typing, it is of course trivially true. But so what? This in no way proves strict typing caught the error - to prove that, the type failure should be causally connected to the error, in your examples it is not. Moreover, you somehow bring example of the code that is actually not wrong, practically speaking (as it divides by 2 the number
RE: [PHP-DEV] Type hints ...
Hi Lester, I am not sure I understand well, but the extended type syntax partially described in https://wiki.php.net/rfc/dbc may correspond to what you describe. Such extended syntax will be part of 'Design by Contract', meaning it's potentially too slow to run in production and checks can be turned on and off globally. When it is available, PHP argument type hints will become simplified fast checks that run every time, even in production. Extended types will support nested syntax as complex as 'object(Iterable)|array('id' = int(]0:), * = string|array(string))'. No limit to the syntax you may support here. It will also be available as a dynamic feature which will allow to check a variable against a dynamically-defined type. *This* will bring dramatic performance improvement in data validation. I don't imagine type hints will bring much in terms of overall performance. I guess that's what you mean but please confirm. I think this will be my next project for PHP, after STH if it passes. Regards François De : Lester Caine [mailto:les...@lsces.co.uk] Currently I have an array of variables and the docblock annotation tells me just what each element is intended to be. I process the variables on that basis and while it may be helpful to have some higher level of 'restraint', I have a working flexible system. As a variable is processed it is constrained by the appropriate rules. If PHP adds 'Type Hints' they will only apply to where I am passing an array variable, and the type hint adds additional processing to that which I already maintain myself. How will that improve performance? It won't, except if you remove some redundant checks from your PHP code. Type checks performed by STH are faster than the equivalent PHP code, that's the only possible performance improvement I imagine. Add to this equation that the type and constraints of a variable may well vary from one record set to another. 
It may well be that a fixed set of types can be defined, but these are not the types currently being defined and would include date types in parallel with a group of numeric types. Passing 'strict' types in some cases just does not compute in my book, and even 'coercive' types only addresses a subset of the types needed so that it adds another layer of 'checking' over what we already have in much of the existing user code base. People keep going on about different rule sets but this just adds another set of 'rules' rather than a single solution. -- Lester Caine - G8HFL - Contact - http://lsces.co.uk/wiki/?page=contact L.S.Caine Electronic Services - http://lsces.co.uk EnquirySolve - http://enquirysolve.com/ Model Engineers Digital Workshop - http://medw.co.uk Rainbow Digital Media - http://rainbowdigitalmedia.co.uk -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
[PHP-DEV] Getting function namespace at runtime
Hi internals, I have come very close to the final state of my to-be-proposed private class, interface and trait support. However, I have a bug under this circumstance: https://gist.github.com/guilhermeblanco/3392925014c9f8374acc I'd love it if someone could give me a hand with how I could get the currently active namespace (which does not exist at runtime, only at compile time) in order to do the checks inside the VM. The only way I thought could work was through something like EX(called_scope) or EX(func) or EX(call). I would love it if someone could point me in the right direction so I can finish the patch and put the finalized RFC up for voting. =) []s, -- Guilherme Blanco MSN: guilhermebla...@hotmail.com GTalk: guilhermeblanco Toronto - ON/Canada
RE: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints RFC)
-Original Message- From: Anthony Ferrara [mailto:ircmax...@gmail.com] Sent: Monday, February 23, 2015 1:35 AM To: Zeev Suraski Cc: PHP internals Subject: Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints RFC) Zeev, And note that this can only work with strict types since you can do the necessary type inference and reconstruction (both forward from a function call, and backwards before it). Please do explain how strict type hints help you do inference that you couldn't do with dynamic type hints. Ultimately, your whole argument hinges on that, but you mention it in parentheses almost as an afterthought. I claim the opposite - you cannot infer ANYTHING from Strict STH that you cannot infer from Coercive STH. Consequently, everything you've shown, down to the C-optimized version of strict_foo() can be implemented in the exact same way for very_lax_foo(). Being able to optimize away the value containers is not unique to languages with strict type hints. It's done in JavaScript JIT engines, and it was done in our JIT POC. I do here: http://news.php.net/php.internals/83504 I'll re-state the specific part in this mail: ?php declare(strict_types=1); function foo(int $int): int { return $int + 1; } function bar(int $something): int { $x = $something / 2; return foo($x); } ^^ In that case, without strict types, you'd have to generate code for both integer and float paths. With strict types, this code is invalid. Ok, but how does that support your case? That alludes to the functionality difference between strict STH and dynamic STH, and perhaps your Static Analysis argument. How does it help you generate better code code? Suggesting that the nature of a type hint can help you determine what's the value that's going to be passed to it is akin to saying that the size and shape of a door can tell you something about the person or beast that's standing on the other side. It just can't. Let me illustrate it in a less colorful way. Snippet 1: ... 
code that deals with $input ... foo($input); function foo(int $x) { ... } Snippet 2: ... code that deals with $input ... foo($input); function foo(float $x) { ... } Question: What can you learn from the signatures of foo() in snippet 1 and 2 about the type of $input? Does the fact I changed the function signature from snippet 1 to 2 somehow affects the type of $input? In what way? If I understood you correctly, you're assuming that $input will too come over using a strict type hint, which would tell you that it's an int and therefore safe. But a coercive type hint will do the exact same job. You can tell because you know the function foo expects an integer. So you can infer that $x will have to have the type integer due to the future requirement. Which means the expression $something / 2 must also be an integer. We know that's not the case, so we can raise an error here. This is static analysis, not better code generation. And it boils down to a functionality difference, not performance difference. Zeev -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
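To make the foo()/bar() example quoted in this message concrete, here is a minimal sketch. It assumes the strict_types declare as it later shipped in PHP 7, which is an assumption relative to this thread since the semantics were still being debated. The function names come from the quoted snippet; the try/catch harness and the variables $even and $oddResult are added for illustration only.

```php
<?php
declare(strict_types=1);

function foo(int $int): int
{
    return $int + 1;
}

function bar(int $something): int
{
    // "/" yields an int only when the division is exact; otherwise a float.
    $x = $something / 2;
    return foo($x);
}

// 4 / 2 is an int, so foo() accepts it.
$even = bar(4);

// 3 / 2 is a float; in strict mode foo() rejects it with a TypeError.
try {
    bar(3);
    $oddResult = 'no error';
} catch (TypeError $e) {
    $oddResult = 'TypeError';
}

echo $even, ' ', $oddResult, PHP_EOL;
```

Whether the call succeeds depends on the runtime value of $something, which is the value-dependence both sides of the thread are arguing about.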
Re: [PHP-DEV] Coercive Scalar Type Hints RFC
Can you all of you stop this madness with moving discussions off list? It is detestable, against almost all openness and principles behind an oss project like php. If we can't discuss anymore design, plans, ideas etc on the list then we are doomed, for good. On Feb 22, 2015 3:49 PM, Anthony Ferrara ircmax...@gmail.com wrote: Adding in a thread that was started in private, but absolutely is worth sharing with the group: -- Forwarded message -- From: Etienne Kneuss col...@php.net Date: Sun, Feb 22, 2015 at 8:42 AM Subject: Re: [PHP-DEV] Coercive Scalar Type Hints RFC To: Zeev Suraski z...@zend.com Cc: Anthony Ferrara ircmax...@gmail.com, Dmitry Stogov dmi...@zend.com On Sun Feb 22 2015 at 14:23:58 Zeev Suraski z...@zend.com wrote: There have been several attempts: for JS: http://users-cs.au.dk/simonhj/tajs2009.pdf or similar techniques applied to PHP, quite outdated though: https://github.com/colder/phantm Looks like WebKit's type inference is doing some pretty good job at analyzing code, although I'm not sure how much of it is static vs. dynamic. My guess is that a lot of it is static: twitter.com/kangax/status/558974257724940288 My guess would be that it's almost entirely dynamic, or probabilistic (e.g. this nice recent work done at ETH: http://www.jsnice.org/). I think you underestimate the difficulty of statically recovering precise types from no-annotations without runtime witnesses ;) You don't want webkit anlysing the JS for 10 minutes until it renders the page. It is much more profitable to JIT these. You are right that the lack of static information about types is (one of the) a main issue. Recovering the types has typically a huge performance cost, or is unreliable We're not really talking about performance issue here, as static analysis is a separate activity that is unrelated to runtime performance. What I meant was: it is a performance issue for the static analyzer, not PHP itself. 
But seriously, time is getting wasted on this argument; it's actually a no-brainer: more static information helps tools that rely on static information. Yes. Absolutely. 100%. There's still disagreement between us on whether the different behavior of Strict STH constitutes additional static information or not, as it doesn't give you any extra information on the value being fed to the function, and it doesn't give you any extra information on what the function will receive. It only gives you information about how the function would behave if it gets a wrongly-typed value. 1) for forward analyses (which are the most common for these applications): it gives you precious information from the beginning of the function and forward. You can consider it similarly to a cast: You don't necessarily know what the value coming in is, but you know which type you are having from that point forward. 2) backward analyses could piggy-back the type constraints from the functions (strict or no strict) and check that they are met when constructing the value fed to the function. Having worked several years on static analysis tools for languages such as PHP, I can guarantee you that this information would help a lot. However, the other dynamic features of PHP would still make analyses slow/unreliable/imprecise. Let's not imagine that this is the only thing missing for PHP to be static-analysis-wonderland, far from it. But my bottom line is exactly the bottom line you ended with, and what I answered you on-list - how much weight should Static Analysis improvements have on our decision to introduce new language features? My answer is not that much, if they have downsides. Static Analyzers should be designed for languages and not vice versa. I fully agree in general that the flow should be this way. But it remains a bonus if a certain feature, as a plus, would help external tools. I believe it is worth mentioning.
Thanks, Zeev -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] [RFC][DISCUSSION] Context Sensitive lexer
Hi, Stas 2015-02-22 19:20 GMT-03:00 Stanislav Malyshev smalys...@gmail.com: Hi! I like the idea. But we need to examine the cases carefully so we don't block some future routes - especially this is with regards to such things as type names which we wanted to reserve. I.e. method names resolution is probably clear, since they appear after - or ::, but for class names the context may be much more varied. -- Stas Malyshev smalys...@gmail.com I agree. You and Nikita are right. Doing more than that with a pure lexical approach, without migrating to another lexer generator (which was already attempted before) or using some form of lexer feedback (which at current state breaks ext tokenizer) would be inadequate and create future issues. I'll probably work on a more ambitious and adequate solution for PHP 7.1~7.2. For now, as said before, I'll revert the RFC, and proposed patch, to version 0.2 aiming only class|object members declaration and access. This is perfectly achievable, has no drawbacks and brings many benefits. The RFC will probably be ready for discussion again in ~2 days. Thanks, Márcio
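As a rough illustration of what the reduced scope (class and object member declaration and access) would permit, the sketch below uses reserved words as method and constant names. It assumes the behaviour that later shipped in PHP 7.0, which goes beyond what this message states; the class Collection and all of its members are hypothetical names invented for the example.

```php
<?php

class Collection
{
    // Reserved words as member names: unambiguous because members are
    // only ever reached through "->" or "::".
    const FOREACH = 'each';

    private $items;

    public function __construct(array $items)
    {
        $this->items = $items;
    }

    public static function new(array $items)
    {
        return new self($items);
    }

    public function list()
    {
        return $this->items;
    }
}

$items = Collection::new([1, 2, 3])->list();
echo count($items), ' ', Collection::FOREACH, PHP_EOL;
```

Class names are deliberately left out of the sketch, since that is the context Stas flags as too varied to resolve lexically.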
Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints RFC)
Zeev, I think we are indeed getting somewhere, I hope. If I understand correctly, effectively the flow you're talking about in your example is this: 1. The developers tries to run the program. 2. It fails because the static analyzer detects float being fed to an int. 3. The user changes the code to convert the input to int. 4. You can now optimize the whole flow better, since you know for a fact it's an int. Did I describe that correctly? Partially. The static analysis and compilation would be pure AOT. So the errors would be told to the user when they try to analyze the program, not run it. Similar to HHVM's hh_client. However, there could be a runtime compiler which compiles in PHP's compile flow (leveraging opcache, etc). In that case, if the type assertion isn't stable, the function wouldn't be compiled (the external analyzer would error, here it just doesn't compile). Then the code would be run under the Zend engine (and error when called). With strict typing at the foo() call site, it tells you that $input has to be an int or float (respectively between the snippets). I'm not following. Are you saying that because foo() expects an int or float respectively, $input has to be int or float? What if $input is really a string? Or a MySQL connection? So think of it as a graph. When you start the type analysis, there's one edge between $input and foo() with type mixed. Looking at foo's argument, you can say that the type of that graph edge must be int. Therefore it becomes an int. Then, when you look at $input, you see that it can be other things, and therefore there are unstable states which can error at runtime. Or are you saying that there was a strict type hint in the function that contains the call to foo(), so we know it's an int/float respectively? If so, how would it be any different with a coercive type hint? 
Not all data gets into a function from a parameter: function bar() { $x = $_POST['data']; foo($x); } in that case, we know $x can only be a string or an array (unless we find where that variable was written to in the program). So we know for a fact that there's a type error, even though it wasn't a parameter. Going deeper, we can look at other cases: function x() { if (time() % 360 0) { return 123; } } function bar() { $x = x(); foo($x); } In this case, we know that x() has two possible types: int/null. That doesn't satisfy the valid possibilities for foo (int), hence there's a possible type error. The key difference is this: Forward analysis (typing $x by assignment) can tell you valid modes for your program. Backward analysis (determining $x's type by its usages) can tell you invalid modes for your program. Combining them gives you more flexibility in hard-to-infer/reconstruct situations. Anthony -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
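The int/null case described in this message can be made deterministic for experimentation. In the sketch below, the $flag parameter is an assumption standing in for the time() condition in the quoted code, and the behaviour shown is that of PHP 7 and later, where a non-nullable int parameter of a user function rejects null in both coercive and strict mode.

```php
<?php

function foo(int $x): int
{
    return $x + 1;
}

// Stand-in for x() from the message: returns an int on one path and
// falls off the end (returning null) on the other.
function x(bool $flag)
{
    if ($flag) {
        return 123;
    }
}

function bar(bool $flag): string
{
    try {
        return (string) foo(x($flag));
    } catch (TypeError $e) {
        return 'TypeError';
    }
}

echo bar(true), ' ', bar(false), PHP_EOL;
```

A backward analysis starting from foo(int) would flag the null path of x() without running either branch; the runtime check only fires when that path is actually taken.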
RE: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints RFC)
-Original Message- From: Anthony Ferrara [mailto:ircmax...@gmail.com] Sent: Monday, February 23, 2015 3:02 AM To: Zeev Suraski Cc: PHP internals Subject: Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints RFC) Zeev, I think we are indeed getting somewhere, I hope. If I understand correctly, effectively the flow you're talking about in your example is this: 1. The developers tries to run the program. 2. It fails because the static analyzer detects float being fed to an int. 3. The user changes the code to convert the input to int. 4. You can now optimize the whole flow better, since you know for a fact it's an int. Did I describe that correctly? Partially. The static analysis and compilation would be pure AOT. So the errors would be told to the user when they try to analyze the program, not run it. Similar to HHVM's hh_client. How about that then: 1. The developers runs a static analyzer on the program. 2. It fails because the static analyzer detects float being fed to an int. 3. The user changes the code to convert the input to int. 4. You can now optimize the whole flow better, since you know for a fact it's an int. Is that an accurate flow? However, there could be a runtime compiler which compiles in PHP's compile flow (leveraging opcache, etc). In that case, if the type assertion isn't stable, the function wouldn't be compiled (the external analyzer would error, here it just doesn't compile). Then the code would be run under the Zend engine (and error when called). Got you. Is it fair to say that if we got to that case, it no longer matters what type of type hints we have? With strict typing at the foo() call site, it tells you that $input has to be an int or float (respectively between the snippets). I'm not following. Are you saying that because foo() expects an int or float respectively, $input has to be int or float? What if $input is really a string? Or a MySQL connection? So think of it as a graph. 
When you start the type analysis, there's one edge between $input and foo() with type mixed. Looking at foo's argument, you can say that the type of that graph edge must be int. Therefore it becomes an int. Then, when you look at $input, you see that it can be other things, and therefore there are unstable states which can error at runtime. So when you say it 'must be an int', what you mean is that you assume it needs to be an int, and attempt to either prove that or refute that. Is that correct? If you manage to prove it - you can generate optimal code. If you manage to refute that - the static analyzer will emit an error. If you can't determine - you defer to runtime. Is that correct? For now only focusing on these two parts so that we can make some progress; May come back to others later... Thanks, Zeev -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints RFC)
Stas, It is still a performance advantage, because since we know the types are stable at compile time, we can generate far more optimized code (no variant types, native function calls, etc). I don't see where it comes from. So far you said that your compiler would reject some code. That doesn't generate any code, optimized or otherwise. For the code your compiler does not reject, still no advantage over dynamic model. It rejects code because doing code generation on the dynamic case is significantly harder and more resource intensive. Could that be built in? Sure. But it's a very significant difference from generating the static code. And even if we generated native code for the dynamic code, it would still need variants, and hence ZPP at runtime. Hence the static code has a significant performance benefit in that we can indeed bypass type checks as shown in the PECL example a few messages up (more than a few). Actually, in this case, the int cast does tell us something. It says that the result (truncation) is explicitly wanted. Not to the compiler (tho that happens), but to the developer. No, it doesn't say that in this case. The developer didn't actually want truncation. They just wanted to call foo(). You forced them to use truncation because that's the only way to call foo() in your compiler. They said it's ok since truncation is over value that is int anyway, and they are true - except when it stops to be true in the future. That generates brittle code because it forces the developer to take risks they otherwise wouldn't take - such as use much stronger forced conversions instead of more appropriate dynamic ones. Look at the RFC that Zeev proposed: https://wiki.php.net/rfc/coercive_sth#user-land_additions Passing a float to an integer parameter would result in a runtime E_RECOVERABLE_ERROR if the float has dataloss. 
So in the case I cited: foo($someint / 2), that will generate an E_RECOVERABLE_ERROR in Zeev's proposal, as well as in the static typing mode of mine. Hence to say casts are needed is a bit over-stating this proposal... With coercive typing as proposed in Ze'ev's RFC, that would need to happen anyway. In both proposals that would generate a runtime error. No, it wouldn't need to happen since no-DL conversion is allowed. Sure it would. 3/2 is 1.5. Which would fatal if I passed it to foo(int) under Zeev's RFC. Because of data loss. The difference is, with strict types, we can detect the error ahead of time and warn about it. Static analyzer can warn about it regardless of type model. The only difference in strict model is that when compiling - not ahead of time, but in runtime - it would produce hard error even in case of even number, which can work just fine without it. This very particular case, yes, because of the simplicity of the types involved. But with strict typing you only need to look at 1 success case, but with coercive typing you need to look at many more. Also, in many (I'd argue most) cases coercive has to either issue a warning (it doesn't know) or error on valid and functioning code. Example: function isdivisibleby2(string $foo): bool { if (preg_match('(\D)', $foo)) { return false; } return 0 == ($int % 2); } function something2(string $foo): int { if (!isdivisibleby2($foo)) { return 10; } return foo($foo / 2); } This code would never raise a runtime error in Zeev's coercive proposal. However, when looking at it statically, you cant tell (unless you've got a regex decompiler). So static analysis on dynamic types will either error on valid code, or not error on invalid code (and I'm not even talking about the halting problem here). Whereas with strict typing, the error would appear in both cases (static and runtime). And you could fix it. 
In this precise example there is none, because division is not type That's what I am saying - if the code runs, there's no difference. The only difference your model runs less code, and forces (or, rather, strongly incentivizes) people to wrote more dangerous one because some of the non-dangerous one is not allowed. More dangerous? stable (it depends on the values of its arguments). Let's take a different example function foo(float $something): int { return $something + 0.5; } With coercive types, you can't tell ahead of time if that will error or not. With static types, you can. I'm not sure what this proves. Yes, of course there are cases where strict typing (please let's not confuse it with static typing - these are different things, static typing is when everything's type is known in advance and this is not happening in PHP, that's kind of the whole point) would disallow some code that dynamic typing allows. Nobody argues with that. What I am arguing with is that this difference is somehow useful - especially for JIT optimizations. I've shown it a few times in this thread. So far nobody has said not possible to the code
Re: [PHP-DEV] [RFC] [FINAL DISCUSSION] Script only include/require
Hi Stas, On Mon, Feb 23, 2015 at 7:00 AM, Stanislav Malyshev smalys...@gmail.com wrote: I think this will be the final discussion before vote. This RFC is to make PHP stronger against script inclusion attacks just like other languages. https://wiki.php.net/rfc/script_only_include I still think this RFC takes a wrong road for the following reasons: 1. Having any code in your app that allows to run include on user-controlled files (I'm not talking about filtered cases but user data controlling the path) is insecure and can not be made secure. It should just never be done. Trying to find workarounds for this is like safe_mode - good idea in theory, leads to worse security in practice. This is mitigation proposal against script inclusions. The difference is clear by statistics. Because this is mitigation, it does not aims to be a perfect solution. It aims to make PHP as secure as other languages. I think system admins feel more comfortable with this change, too. They know PHP programs are very weak against script inclusion attacks compare to other languages. 2. Default configuration would break tons of PHP scripts with extensions other than .php (very frequent case). The BC break potential of this is very big as it modifies core functionality. Compatibility can be provided by one liner. ini_set('zend.script_extensions', '.php .phar .inc .phtml .php4 .php5'); ini_set() does not emit any errors for non existing INIs. 3. Prohibiting phar uploads would also be a bc break, but more importantly, there still probably are ways to work around this by using phar files with extension different than .phar and then asking to include files within that phar file. As long as the eventual path would end in .php, your code would allow it. Security is trade off relation, so I think this change acceptable trade off to disable script inclusion (executing attacker programs). Users can move uploaded files safely without move_uploaded_file() now. 
I just made use of it to provide another mitigation, since script-only include cannot be a mitigation for uploading script files under the docroot. Also, the claim that move_uploaded_file() is obsolete is not based on anything as far as I can see. Why is it obsolete? move_uploaded_file() was needed for register_globals. An attacker could specify source files (i.e. in $_FILES) other than uploaded files with register_globals. The current move_uploaded_file() checks that the source filename is really an uploaded file's filename. It prevents moving other files, so it's not completely useless, but it offers no real protection now because the values in $_FILES are safe now. I know your point of view, but I hope you like this RFC. Thank you for your comment. Your comments were very helpful in coming up with this RFC. Regards, -- Yasuo Ohgaki yohg...@ohgaki.net
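For readers who want to experiment with the idea, the extension check discussed in this thread can be approximated in userland. The sketch below is not the RFC's implementation: is_allowed_script() and include_script_only() are hypothetical helpers, and the default allow-list is an assumption.

```php
<?php

// Hypothetical helper: true only if $path ends in an allowed extension.
function is_allowed_script(string $path, array $extensions = ['php', 'phar']): bool
{
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return $ext !== '' && in_array($ext, $extensions, true);
}

// Hypothetical wrapper: refuse to include anything outside the allow-list.
function include_script_only(string $path, array $extensions = ['php', 'phar'])
{
    if (!is_allowed_script($path, $extensions)) {
        throw new InvalidArgumentException("Refusing to include non-script file: $path");
    }
    return include $path;
}

var_dump(is_allowed_script('lib/util.php'));          // allowed extension
var_dump(is_allowed_script('uploads/avatar.jpg'));    // rejected
var_dump(is_allowed_script('uploads/evil.php.jpg'));  // rejected: final extension is jpg
```

A check on the final extension alone still accepts a path such as phar://upload.jpg/evil.php, which is the bypass Stas raises in point 3.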
[PHP-DEV] Re: [RFC] Script only include/require
Hi Dmitry and Nikita, On Mon, Feb 23, 2015 at 6:23 AM, Yasuo Ohgaki yohg...@ohgaki.net wrote: I wrote a patch and made adjustments to the RFC https://wiki.php.net/rfc/script_only_include https://github.com/php/php-src/pull/ Where the filename extension is checked is subject to change. At first, I thought implementing this as PHP code was good, but I've changed my mind. It seems better to do it in Zend code. Opinions are appreciated. This RFC aims to make PHP as secure as other languages with respect to script inclusion attacks. Note: File inclusion is not in the scope of this RFC. INI Changes: - php_script - zend.script_extensions - Allow all files: * - NULL or Open Issues: - Error type - Is it OK to raise E_ERROR/E_RECOVERABLE_ERROR in zend_language_scanner.c? - Vote type - 50%+1 or 2/3 If there is anyone who would like to vote no for this RFC, I would like to know the reason and try to address/resolve the issue you have. Thank you. We don't have to care much about which error is raised from the Zend engine, since there will be engine exceptions. My question is: is it OK to raise E_ERROR or E_RECOVERABLE_ERROR from zend_language_scanner.c? https://github.com/php/php-src/pull//files#diff-93ad74868f98ff7232ebea7c8b7fR624 Do engine exceptions catch errors from zend_error_noreturn()? Thank you. Regards, -- Yasuo Ohgaki yohg...@ohgaki.net
[PHP-DEV] add me
add me Gopal Sharma -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
[PHP-DEV] [VOTE] Remove PHP 4 Constructors
Dear Internals, I have moved the RFC for removing PHP 4 constructors[1] into voting phase. As there are a lot of RFCs in discussion and voting right now I will leave this RFC in voting phase until the evening (UTC-7) of March 6th which is 12 days away; this will hopefully allow everyone to be able to review this RFC and vote on it without being rushed. Cheers, Levi Morrison [1]: https://wiki.php.net/rfc/remove_php4_constructors -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
[PHP-DEV] Type hints ...
Silly question time again ... Currently I have an array of variables and the docblock annotation tells me just what each element is intended to be. I process the variables on that basis and while it may be helpful to have some higher level of 'restraint', I have a working flexible system. As a variable is processed it is constrained by the appropriate rules. If PHP adds 'Type Hints' they will only apply to where I am passing an array variable, and the type hint adds additional processing to that which I already maintain myself. How will that improve performance? Add to this equation that the type and constraints of a variable may well vary from one record set to another. It may well be that a fixed set of types can be defined, but these are not the types currently being defined and would include date types in parallel with a group of numeric types. Passing 'strict' types in some cases just does not compute in my book, and even 'coercive' types only addresses a subset of the types needed so that it adds another layer of 'checking' over what we already have in much of the existing user code base. People keep going on about different rule sets but this just adds another set of 'rules' rather than a single solution. -- Lester Caine - G8HFL - Contact - http://lsces.co.uk/wiki/?page=contact L.S.Caine Electronic Services - http://lsces.co.uk EnquirySolve - http://enquirysolve.com/ Model Engineers Digital Workshop - http://medw.co.uk Rainbow Digital Media - http://rainbowdigitalmedia.co.uk -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints RFC)
Hi! The difference though is the journey. The static analyzer can reason about far more code with strict types than it can without (due to the limited number of possibilities presented at each call). So this leaves the dilemma: compiled code that behaves slightly differently (what Recki does) or whether it always behaves the same. Wait, so are you saying that the advantage of having strict typing in PHP core is that some analyzer - which does not share code with PHP core, AFAIU - if it interpreted PHP types in a strict manner and provided warnings where types it can statically deduce do not match, and the authors of the code agreed with its suggestions and rewrote their code so that the analyzer would not complain, would in some cases result in code that might be JIT-optimized more efficiently? That is not a claim about strict typing in PHP core having any benefit at all. I'm not sure even this claim is true (as adding (int) doesn't actually improve performance - it just shifts around the place where the conversion is done, and once the conversion is done, you can do the same optimizations as before) - but even if there's some situation where it is true, I don't see how it makes a difference for PHP core (even in the situation of PHP core + JIT extension or a non-Zend PHP runtime with AOT/JIT). -- Stas Malyshev smalys...@gmail.com -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] Allow to use argument unpacking at any place in arguments list
Hi! This makes it impossible to use this feature with some of the ext/std functions (array_udiff, array_intersect_ukey, etc.) and just feels a bit incomplete... I see how it can be useful with crazy functions like array_udiff, but these are a tiny minority. What I am concerned about is that besides those functions - which are weird anyway - code like foo($a, ...$b, $c) would be completely unmanageable, as it would be impossible to know where $c is actually going. I think the case for weird array functions is pretty narrow and can be handled in an ad-hoc manner without introducing this construct. I'm not sure if this change requires an RFC because this is a pretty small, advancement of already existing feature that doesn't contain any It's a new syntax (yes, looking a lot like an old one, but still new), so I think it requires an RFC. -- Stas Malyshev smalys...@gmail.com -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
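For context, the existing unpacking feature rejects a positional argument after "...", which is why callback-last functions such as array_udiff() are awkward. One ad-hoc workaround of the kind Stas alludes to is to build the full argument list before unpacking. This is only a sketch; $arrays, $compare and $args are illustrative names, and the spaceship operator assumes PHP 7 or later.

```php
<?php

$arrays  = [[1, 5, 3], [3, 4]];
$compare = function ($a, $b) {
    return $a <=> $b;
};

// array_udiff(...$arrays, $compare) does not compile today, so the
// callback is appended to the list before unpacking.
$args   = array_merge($arrays, [$compare]);
$result = array_udiff(...$args);

print_r($result);
```

Because the argument list is assembled explicitly, the position of the callback stays visible at the call site, which sidesteps the readability concern about foo($a, ...$b, $c).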
RE: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints RFC)
-Original Message- From: Jefferson Gonzalez [mailto:jgm...@gmail.com] Sent: Monday, February 23, 2015 3:58 AM To: Stanislav Malyshev; Anthony Ferrara Cc: Zeev Suraski; Jefferson Gonzalez; PHP internals Subject: Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints RFC) How casting (int) could be such dangerous thing? Lets take for example this code: echo (int) "whats cooking!"; echo intval("whats cooking"); Both statements print 0, so how is casting unsafe??? One key premise behind both strict type hinting and coercive type hinting is that conversions that lose data, or that 'invent' data, are typically indicators of a bug in the code. You're right that there's no risk of a segfault or buffer overflow from the snippets you listed. But there are fair chances that if you fed $x into round() and it contains "whats cooking" (string), your code contains a bug. Coercive typing allows 'sensible' conversions to take place, so that if you pass "35.7" (string) to round() it will be accepted without a problem. Strict typing will disallow any input that is not of the exact type that the function expects, so in strict mode, round() will reject it. The point that was raised by Stas and others is that this is likely to push the user to explicitly cast the string to float, which from that point onwards will happily accept "whats cooking", keeping the likely bug undetected. Zeev -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
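The concern in this message can be reproduced in a few lines. The sketch assumes PHP 7 or later in its default coercive mode, which is an assumption relative to this thread since the final semantics were still under discussion; half() is a hypothetical function used in place of round().

```php
<?php

function half(float $x): float
{
    return $x / 2;
}

// A well-formed numeric string is coerced without complaint.
$ok = half("35.7");

// A non-numeric string is rejected at the call boundary...
try {
    half("whats cooking");
    $unchecked = 'accepted';
} catch (TypeError $e) {
    $unchecked = 'TypeError';
}

// ...but an explicit cast silently turns it into 0.0, hiding the bug.
$cast = half((float) "whats cooking");

var_dump($ok, $unchecked, $cast);
```

The cast version never reaches the type check with the original string, so the likely bug goes unreported.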
Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints RFC)
Hi! How is coercive much smarter? Basically what coercive would do is It can accept 2.0 but not 2.5. Explicit cast is a sledgehammer - it would convert both to 2. How casting (int) could be such dangerous thing? Lets take for example this code: echo (int) "whats cooking!"; echo intval("whats cooking"); Both statements print 0, so how is casting unsafe??? Casting by itself is not dangerous. What is dangerous is using casting to work around the type system - since in this case it could hide an error (such as passing the string "whats cooking!" to a function requiring an integer). Of course, you can say such errors are of no importance to you - in which case you should never use typed parameters at all and you'll be fine :) (mostly) -- Stas Malyshev smalys...@gmail.com -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints RFC)
On 02/22/2015 10:06 PM, Zeev Suraski wrote:
> One key premise behind both strict type hinting and coercive type hinting is that conversions that lose data, or that 'invent' data, are typically indicators of a bug in the code. You're right that there's no risk of a segfault or buffer overflow from the snippets you listed. But there are fair chances that if you fed $x into round() and it contains "whats cooking" (string), your code contains a bug.
>
> Coercive typing allows 'sensible' conversions to take place, so that if you pass "35.7" (string) to round() it will be accepted without a problem. Strict typing will disallow any input that is not of the exact type that the function expects, so in strict mode, round() will reject it. The point that was raised by Stas and others is that this is likely to push the user to explicitly cast the string to float; Which from that point onwards, happily accept "whats cooking", keeping the likely bug undetected.

That's true, but I think most problems will arise when dealing with user input. For example:

    Good URL: myurl.com/?id=10
    Bad URL:  myurl.com/?id=something+else

So in the URL example neither coercive nor strict is safe. IMHO you as a developer should analyze the input and decide what to do if the value isn't of an expected type. With strict, you as a developer decide whether casting is acceptable behavior, like when dealing with database output which may return values as strings, or reading from config files, where you know the value is (int) compatible, so the casting is safe.

Besides, in the v0.5 STH RFC the strict mode is optional. I think both RFCs should join, dual mode coercive/strict :), but I guess that will not be possible until Anthony convinces the coercive camp how strict could be used to do better optimizations. Unless it happens the other way around and it is proved with code/patches that the same level of optimizations can be reached with coercive.

Anyway, I just hope for scalar type hints, not just to improve code reliability, but also to gain some performance out of it. In the end I hope the best option is implemented, since this is a really impactful feature for the future of the language.
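One way to "analyze the input and decide" for the id example is to validate before converting, sketched here with PHP's filter extension. parse_id() is a hypothetical helper, the raw strings stand in for what a query string such as ?id=10 would deliver, and the nullable return type assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper: return the id as an int, or null when the
// raw input is not a valid integer string.
function parse_id($raw): ?int
{
    if (!is_string($raw)) {
        return null;
    }
    $id = filter_var($raw, FILTER_VALIDATE_INT);
    return $id === false ? null : $id;
}

var_dump(parse_id("10"));             // int(10)
var_dump(parse_id("something else")); // NULL

// A bare cast would have turned the bad input into a plausible-looking id.
var_dump((int) "something else");     // int(0)
```

With the validated value in hand, the later cast or typed call is safe under either the strict or the coercive proposal.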
Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints RFC)
Ze'ev,

On Sun, Feb 22, 2015 at 6:57 PM, Zeev Suraski z...@zend.com wrote:
> -----Original Message-----
> From: Anthony Ferrara [mailto:ircmax...@gmail.com]
> Sent: Monday, February 23, 2015 1:35 AM
> To: Zeev Suraski
> Cc: PHP internals
> Subject: Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints RFC)
>
>> Zeev,
>>
>>>> And note that this can only work with strict types since you can do the necessary type inference and reconstruction (both forward from a function call, and backwards before it).
>>>
>>> Please do explain how strict type hints help you do inference that you couldn't do with dynamic type hints. Ultimately, your whole argument hinges on that, but you mention it in parentheses almost as an afterthought.
>>>
>>> I claim the opposite - you cannot infer ANYTHING from Strict STH that you cannot infer from Coercive STH. Consequently, everything you've shown, down to the C-optimized version of strict_foo() can be implemented in the exact same way for very_lax_foo().
>>>
>>> Being able to optimize away the value containers is not unique to languages with strict type hints. It's done in JavaScript JIT engines, and it was done in our JIT POC.
>>
>> I do here: http://news.php.net/php.internals/83504
>>
>> I'll re-state the specific part in this mail:
>>
>>     <?php
>>     declare(strict_types=1);
>>
>>     function foo(int $int): int {
>>         return $int + 1;
>>     }
>>
>>     function bar(int $something): int {
>>         $x = $something / 2;
>>         return foo($x);
>>     }
>>     ^^
>>
>> In that case, without strict types, you'd have to generate code for both integer and float paths. With strict types, this code is invalid.
>
> Ok, but how does that support your case? That alludes to the functionality difference between strict STH and dynamic STH, and perhaps your Static Analysis argument. How does it help you generate better code?

Because strict types make that an error case. So I can then tell the user to fix it. Once they do (via cast, logic change, etc), I know the types of every variable the entire way through. So I can generate native code for both calls without using variants.

> Suggesting that the nature of a type hint can help you determine what's the value that's going to be passed to it is akin to saying that the size and shape of a door can tell you something about the person or beast that's standing on the other side. It just can't.

"It just can't", yet it's done all the time. There is working code in the wild that does exactly that. It doesn't tell you what's on the other side (which you seem to be suggesting), but gives you the possibilities **that won't cause error**. So then if you find a possibility from the other direction that isn't in the set of stable possibilities, you can tell the user (because that would be a runtime error). The division case in the example shows that.

> Let me illustrate it in a less colorful way.
>
> Snippet 1:
>
>     ... code that deals with $input ...
>     foo($input);
>
>     function foo(int $x) { ... }
>
> Snippet 2:
>
>     ... code that deals with $input ...
>     foo($input);
>
>     function foo(float $x) { ... }
>
> Question: What can you learn from the signatures of foo() in snippet 1 and 2 about the type of $input? Does the fact I changed the function signature from snippet 1 to 2 somehow affect the type of $input? In what way?

With strict typing at the foo() call site, it tells you that $input has to be an int or float (respectively between the snippets). And as the static analyzer traces back, if it finds possibilities that don't match (for example, if you assigned it directly from $_POST), it's able to say that either the original assignment or the function call is an error.

So yes, it does affect the stable-state types that $input can have. And if we detect an error, we can tell the dev ahead of time about it. And hence they can make the appropriate fix.

> If I understood you correctly, you're assuming that $input will too come over using a strict type hint, which would tell you that it's an int and therefore safe. But a coercive type hint will do the exact same job.

No.
I'm assuming that $input came from something that we can infer a type set from. Which is basically anything in the language.

>> You can tell because you know the function foo expects an integer. So you can infer that $x will have to have the type integer due to the future requirement. Which means the expression $something / 2 must also be an integer. We know that's not the case, so we can raise an error here.
>
> This is static analysis, not better code generation. And it boils down to a functionality difference, not performance difference.

That static analysis enables better code generation. Which is precisely what I said in an earlier post: http://news.php.net/php.internals/83501

And I showed an example of the better code generation.

I hope that makes my point a little clearer,

Anthony
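For reference, here is how the division example from this message behaves at runtime, assuming PHP 7.0 or later as eventually released (the thread predates that release). The static analysis discussed above would report the same problem ahead of time; this sketch only shows that the type of $x depends on the input value.

```php
<?php
declare(strict_types=1);

function foo(int $int): int
{
    return $int + 1;
}

function bar(int $something): int
{
    // The / operator yields an int when the division is exact, a float otherwise.
    $x = $something / 2;
    return foo($x);
}

var_dump(bar(4)); // int(3): 4 / 2 is int(2)

try {
    bar(3);       // 3 / 2 is float(1.5), which strict mode rejects
} catch (TypeError $e) {
    echo "TypeError: foo() received a float\n";
}
```

Because the result type of $something / 2 cannot be pinned to int, a strict analyzer can flag the foo($x) call without running the program.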
RE: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints RFC)
-----Original Message-----
From: Anthony Ferrara [mailto:ircmax...@gmail.com]
Sent: Monday, February 23, 2015 2:25 AM
To: Zeev Suraski
Cc: PHP internals
Subject: Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints RFC)

> Ze'ev,

It's Zeev, thanks :)

> Because strict types makes that an error case. So I can then tell the user to fix it. Once they do (via cast, logic change, etc), I know the types of every variable the entire way through. So I can generate native code for both calls without using variants.

I think we are indeed getting somewhere, I hope.

If I understand correctly, effectively the flow you're talking about in your example is this:

1. The developer tries to run the program.
2. It fails because the static analyzer detects a float being fed to an int.
3. The user changes the code to convert the input to int.
4. You can now optimize the whole flow better, since you know for a fact it's an int.

Did I describe that correctly?

> With strict typing at the foo() call site, it tells you that $input has to be an int or float (respectively between the snippets).

I'm not following. Are you saying that because foo() expects an int or float respectively, $input has to be int or float? What if $input is really a string? Or a MySQL connection?

Or are you saying that there was a strict type hint in the function that contains the call to foo(), so we know it's an int/float respectively? If so, how would it be any different with a coercive type hint?

> I hope that makes my point a little clearer,

It actually does, I hope. I think we are getting somewhere, but we're not quite there yet.

Thanks,

Zeev
RE: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints RFC)
-----Original Message-----
From: Anthony Ferrara [mailto:ircmax...@gmail.com]
Sent: Monday, February 23, 2015 3:43 AM
To: Stanislav Malyshev
Cc: Zeev Suraski; Jefferson Gonzalez; PHP internals
Subject: Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints RFC)

> Stas,
>
>>> It is still a performance advantage, because since we know the types are stable at compile time, we can generate far more optimized code (no variant types, native function calls, etc).
>>
>> I don't see where it comes from. So far you said that your compiler would reject some code. That doesn't generate any code, optimized or otherwise. For the code your compiler does not reject, still no advantage over dynamic model.
>
> It rejects code because doing code generation on the dynamic case is significantly harder and more resource intensive. Could that be built in? Sure. But it's a very significant difference from generating the static code.

I hope I demonstrated, in the other email that lists how two use cases would look with coercive type hints, that the strategies and implementation of the optimizations for those cases where we can infer the type at compile time would be similar in cost, complexity and resource consumption to the optimizations you're talking about. Even if we keep the handling of all other types as AOTless/JITless, it would still have performance equivalent to the strict case, given inputs that the strict case accepts.

Either way, I'm happy we all agree that equally-efficient code can be generated for the dynamic case, which is the point I was making in the Coercive typing RFC. We still have the gap on whether it's truly a lot harder and more resource intensive - I don't think it is, as we can do the very same things at compile time - but that's a smaller gap that I personally care less about. I wanted it to be clear to everyone that we can reach the same level of optimizations for Coercive type hints as we can for Strict.

> And even if we generated native code for the dynamic code, it would still need variants, and hence ZPP at runtime. Hence the static code has a significant performance benefit in that we can indeed bypass type checks as shown in the PECL example a few messages up (more than a few).

We can only eliminate the ZPP structure at compile time if we know with certainty what the type is. If we do, we know that for both strict type hints and coercive type hints (i.e. we either managed to prove it's an int in the static analyzer in the strict case, or we managed to deduce what the type is in the coercive case). If we don't, we keep the ZPP structure in exactly the same way in both cases.

Thanks,

Zeev
Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints RFC)
Zeev,

> So the code after the fix would look like this:
>
>     <?php
>     declare(strict_types=1);
>
>     function foo(int $int): int {
>         return $int + 1;
>     }
>
>     function bar(int $something): int {
>         $x = (int) ($something / 2); // (int) or whatever else makes it clear it's an int
>         return foo($x);
>     }
>     ?>
>
> Let me explain how this could play out with coercive type hints:
>
>     <?php
>     function foo(int $int): int {
>         return $int + 1;
>     }
>
>     function bar(int $something): int {
>         $x = $something / 2;
>         return foo($x);
>     }
>
> We can all agree that determining the types of just about anything here is ultra-easy, so easy you could do it with a static analyzer, as you suggested. $int and $something are integers, while $x is either an integer or a float. We also know that both foo() and bar() expect integers.
>
> What's the optimal code we could generate here? First, on the function body of foo(), we can clearly and easily translate the whole into machine code, as we know we'll get a long and need to return a long. Moving to the caller scope in bar(), given we know $x is either a float or an integer, we could either generate code that calls coerce_to_int($x), or even some optimized machine code that checks zval.type and either uses the lval or converts the dval. This can be done in AOT, no need to wait for runtime. Once we know for a fact we have an integer in our hands - we can make the call directly to the optimized foo(), a C level call without the overhead of a PHP function call.

Well, yes and no. In this simple example, you could generate the division as float division, then check the mantissa to determine if it's an int.

    long bar(long something) {
        double x = something / 2.0; /* float division */
        if (x != (double)(long)x) {
            raise_error();
        }
        return foo((long) x);
    }

You're still doubling the number of CPU ops and adding at least one branch at runtime, but it's not a massive difference.

However, in general you'd have to use something like div_function and use a union type of some sort. You mention this (about checking zval.type at runtime).
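A note on the cast in the fixed bar(): in PHP a cast binds more tightly than the / operator, so where the parentheses go decides whether $x really ends up as an int. A small sketch, assuming PHP 7.0 or later (intdiv() was introduced in 7.0):

```php
<?php
$something = 3;

// The cast applies to $something alone; the division still yields a float.
var_dump((int) $something / 2);    // float(1.5)

// Casting the result of the division gives an int (truncated toward zero).
var_dump((int) ($something / 2));  // int(1)

// intdiv() performs integer division directly.
var_dump(intdiv($something, 2));   // int(1)
```

Either of the last two forms gives an analyzer or compiler a value whose type is int on every path.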

My goal would be to avoid using unions at all (and hence no zval). Because that drastically simplifies both compiler and code generator design. Especially for a JIT compiler (local, not tracing), a simplified design generally translates to significantly faster runtime. Compare LLVM to libjit: 50x difference in compile time.

> If you look at the generated code, it's going to be remarkably similar between the two cases. If the developer chooses to pick the casting route, it will look almost identical - except it will be convert_to_long() that is called instead of coerce_to_int(), the former being more aggressive than the latter.

I wouldn't even bother with that, I'd just use a C cast (well, the ASM equivalent). Saves function calls, zval representation, etc.

> Can you see anything impossible or otherwise wrong with my description of how the AOT compiler would work in this case, with coercive type hints? If not, there are no performance benefits for the Strict typed version after the user alters his code to behave similarly to what coercive type hints would bring.

It's very much not about impossible. It's about complexity. Strict code is easier to reason about, it's easier to analyze and it's easier to code-generate because of the reduced amount that you need to support.

And we're not talking about making users change their code drastically. We're talking about - in many cases - minor tweaks. Minor tweaks that would need to be done with your proposal as well. So if we're going to require users to change their code, why not make it opt-in and give them the predictability that we can?

>>> Got you. Is it fair to say that if we got to that case, it no longer matters what type of type hints we have?
>>
>> Once you get to the end, no. Recki-CT proves that.
>
> Do you mean that the statement is unfair or that it no longer matters? If it's the former, can you elaborate as to why?

No, I meant that Recki proves what you said (once you get to a stable type analysis of even untyped code, it doesn't matter whether the hints exist or not).

> So when you say it 'must be an int', what you mean is that you assume it needs to be an int, and attempt to either prove that or refute that. Is that correct? If you manage to prove it - you can generate optimal code. If you manage to refute that - the static analyzer will emit an error. If you can't determine - you defer to runtime. Is that correct?

Basically yes.

> Let me describe here too how it may look with coercive hints. Instead of beginning with the assertion that it must be an int, we make no guess as to what it may be(*). We would use the very same methods you would use to prove or refute that it's an int, to determine whether it's an int. Our ability to deduce that it's an int is going to be identical to your ability to prove that it's an int. If we see that it comes from an int type hint, from an int typed