Re: [PHP-DEV] [RFC][Discussion] Parser extension API
On 4/26/2016 10:38 AM, Alexander Lisachenko wrote: > Hello, Nikita! > > It's a great news! Having this stuff in the 7.1 core is awesome! > > But I have several petitions for this API: > 1) Could it avoid using of raw functions for parsing API, like > ast\parse_file or ast\parse_code > 2) Could you please put all the parsing stuff into the `Parser` class and > put it into the `Php\` namespace, as well with node classes? > The current coding standard still prescribes to do so. However, these requests keep popping up and I think that this should be discussed at some point. However, putting everything into a static class just to mimic some OO that you might like makes absolutely no sense. We have multi-paradigm support for a reason: best tool for the job? That being said, the current implementation violates more rules of the coding standard (camelCase vs. snake_case). Currently we have the following: namespace ast { parse_file() parse_code() get_kind_name() kind_uses_flags() class Node { public $kind; public $flags; public $lineno; public $children; } } namespace ast\Node { class Decl extends ast\Node { public $endLineno; public $name; public $docComment; } } From a pure coding standards view and current conventions in core at least the following has to change: ast_parse_file() ast_parse_code() ast_get_kind_name() // ast_kind_name? ast_kind_uses_flags() // ast_kind_has_flags? class AstNode { public $kind; public $flags; public $lineno; public $children; } class AstDecl extends AstNode { public $end_lineno; public $name; public $doc_comment; } Of course usage of the \Php\Ast or \PHP\AST or \Php\AST or \PHP\Ast namespaces is possible but you already see from the four variations that this opens a huge can of worms that would require discussion. Personally I would design the API as follows, given that it is not a goal to have an anemic model in order to support direct changes of the public properties without any kind of validation. If that is an intended functionality things might need to be implemented a bit differently (and the anemic DTO approach could even make sense). Also note that I am modelling it in an OO way because it fits the actual data structure (tree). enum AstKind { /*API = [ ORDINAL, NAME ]*/ FUNCTION = [ 66, 'AST_FUNC_DECL' ]; CLASS = [ 67, 'AST_CLASS' ]; // ... /** @see \AstKind::getName() */ public function __toString(): string; /** bitfield */ public function getFlags(): int; public function getName(): string; public function getOrdinal(): int; public function hasFlags(): bool; } final class AstNode implements Countable, Traversable { private AstKind $kind; private int $line_number; private array $children = []; public function getKind(): AstKind; public function getLineNumber(): int; } final class AstDeclarationNode extends AstNode { private string $doc_comment; private int $end_line_number; private string $name; /** @see \AstDeclarationNode::getName() */ public function __toString(): string; public function getDocComment(): string; public function getEndLineNumber(): int; public function getName(): string; } final class Ast implements Countable, Traversable { private AstNode $root; private function __construct(); public static function fromFile(string $path): Ast; public static function fromCode( string $code, string $filename = 'string code' ): Ast; /** @see \Ast::getDump() */ public function __toString(): string; public function getDump(bool $line_numbers = false): string; } I do not think that it is necessary to use the exact same constant names as are used in the C source (e.g. AST_FUNC_DECL vs. AST_CLASS). The userland API could be more consistent here. You might disagree thought but in the end only the ordinal must be an absolute perfect match. -- Richard "Fleshgrinder" Fussenegger signature.asc Description: OpenPGP digital signature
Re: [PHP-DEV] [RFC][Discussion] Parser extension API
2016-04-26 11:15 GMT+03:00 Nikita Popov: > As an update here, I plan to create an RFC for bundling the php-ast > extension (maybe with minor modifications) with php-src. This was planned > for 7.1 anyway, and with the annotations RFC under discussion, this seems > like a good time. > ... > I'll try to post an RFC for review this week. > Hello, Nikita! It's a great news! Having this stuff in the 7.1 core is awesome! But I have several petitions for this API: 1) Could it avoid using of raw functions for parsing API, like ast\parse_file or ast\parse_code 2) Could you please put all the parsing stuff into the `Parser` class and put it into the `Php\` namespace, as well with node classes?
Re: [PHP-DEV] [RFC][Discussion] Parser extension API
On Tue, Apr 26, 2016 at 9:42 AM, Alexander Lisachenko < lisachenko...@gmail.com> wrote: > Hello, internals! > > I'd like to bring this topic to the discussion again, because now we have > a shiny PHP7 engine and keep moving. PHP grammar becomes more complex and > internal AST representation isn't available for the userland developers, so > all static analysis tools suffering from the lack of native AST API for > that. Only possible way for that is to perform tokenization of source code > and then manually reconstruct an AST (thanks to the PHP-Parser and Nikita > for doing this job for us, developers) > > Several days ago, Dmitry published a RFC for native attributes, which can > be a great tool for building more complex stuff on top of this metadata. > However, all attribute expressions will be stored as an AST nodes, so we > need an AST API again to analyse, parse or compile AST back into the source > code for evaling, etc. > > It would be nice to push php-ast extension (or similar one) into the core, > providing an API via static class, for example "Php\Parser". > As an update here, I plan to create an RFC for bundling the php-ast extension (maybe with minor modifications) with php-src. This was planned for 7.1 anyway, and with the annotations RFC under discussion, this seems like a good time. However, I will limit this RFC to the current state of the extension, i.e. I will not include the ability to inject the AST back into the PHP compiler -- as already mentioned this is very complicated (due to validation concerns) and should be left for future scope. Some concerns like pretty printing (compiling AST back to PHP code) will also be left for userland (e.g. tpunt has a library for this), as doing this in core (on the exported AST) is unnecessarily complicated, fragile and not sufficiently flexible (e.g. with regard to formatting). I'll try to post an RFC for review this week. Regards, Nikita
Re: [PHP-DEV] [RFC][Discussion] Parser extension API
Hello, internals! I'd like to bring this topic to the discussion again, because now we have a shiny PHP7 engine and keep moving. PHP grammar becomes more complex and internal AST representation isn't available for the userland developers, so all static analysis tools suffering from the lack of native AST API for that. Only possible way for that is to perform tokenization of source code and then manually reconstruct an AST (thanks to the PHP-Parser and Nikita for doing this job for us, developers) Several days ago, Dmitry published a RFC for native attributes, which can be a great tool for building more complex stuff on top of this metadata. However, all attribute expressions will be stored as an AST nodes, so we need an AST API again to analyse, parse or compile AST back into the source code for evaling, etc. It would be nice to push php-ast extension (or similar one) into the core, providing an API via static class, for example "Php\Parser". 2015-03-03 19:12 GMT+03:00 Leigh: > On 3 March 2015 at 11:56, Alexander Lisachenko > wrote: > > Good morning! > > > > I have cleaned https://wiki.php.net/rfc/parser-extension-api and > restricted > > it's scope only to the parsing API. Extension API can be implemented > later > > +1 > > > on top of > > > https://github.com/php/php-src/commit/1010b0ea4f4b9f96ae744f04c1191ac228580e48 > > and current implementation, because it requires a lot of discussion and > can > > not be implemented right now. > > I had no idea that zend_ast_process was such a recent addition, and > part of your proposal. I've actually started using it already > completely independently in one of my extensions! > > > 1. Should each node type be represented as personal class? > > There are two possible ways: single node class for everything (current > > proposal) and separate class for every node kind. I have made a quick > > research of AST API in several languages and noticed, that most of them > > generates AST class nodes automatically. E.g. UnaryOperationNode, > > StatementNode... This way is cool, but it requires a lot of classes to be > > loaded and available as API. Pros: SRP of each node class, AST validators > > (can be described with XML and DTD), more clear code analysis (checks > with > > $node instaceof StatementNode), typehints for concrete nodes for > visitors. > > However, PHP is dynamic language and we can use only one class for > > everything, adjusting `kind` property for each node. Pros: single class, > > easier to maintain and to use. Cons: bad UX (how to validate nodes, how > to > > determine number of available children, how to check is node using flags > or > > not, etc) > > I think we need to at least represent all of the current node structures. > > A common base class, and then classes to represent lists, zvals and > decls that extend from this base. > > > 2. Where metadata should be stored (flags, names of kind nodes, relation > > between node types)? This information will be needed later for validation > > of AST > > > > Nikita have some thoughts for the future :) So he asked about the storage > > of metadata to validate an AST and to perform some analysis on it. > Metadata > > should include the following: name of each node kind (this can be just a > > class name of node or constants in the class), node restrictions (which > > kind of node types can be used as children for concrete node; number of > > children nodes), node flag names (it's PUBLIC, PROTECTED, PRIVATE, etc) > > Thinking for the future is fine, but do we need this metadata for the > current proposal? Is the AST returned by the parser in a read-only > state, or can users create their own nodes + children and get a pretty > printed output? If it's the latter then we obviously need to know the > restrictions. > > I think we need a mechanism that keeps names/numbers in sync > automatically, maybe we can use some macros to automatically generate > enums and userland facing details at the same time, so we don't have > to keep several places in sync if/when new AST nodes are added. >
Re: [PHP-DEV] [RFC][Discussion] Parser extension API
On 3 March 2015 at 11:56, Alexander Lisachenko lisachenko...@gmail.com wrote: Good morning! I have cleaned https://wiki.php.net/rfc/parser-extension-api and restricted it's scope only to the parsing API. Extension API can be implemented later +1 on top of https://github.com/php/php-src/commit/1010b0ea4f4b9f96ae744f04c1191ac228580e48 and current implementation, because it requires a lot of discussion and can not be implemented right now. I had no idea that zend_ast_process was such a recent addition, and part of your proposal. I've actually started using it already completely independently in one of my extensions! 1. Should each node type be represented as personal class? There are two possible ways: single node class for everything (current proposal) and separate class for every node kind. I have made a quick research of AST API in several languages and noticed, that most of them generates AST class nodes automatically. E.g. UnaryOperationNode, StatementNode... This way is cool, but it requires a lot of classes to be loaded and available as API. Pros: SRP of each node class, AST validators (can be described with XML and DTD), more clear code analysis (checks with $node instaceof StatementNode), typehints for concrete nodes for visitors. However, PHP is dynamic language and we can use only one class for everything, adjusting `kind` property for each node. Pros: single class, easier to maintain and to use. Cons: bad UX (how to validate nodes, how to determine number of available children, how to check is node using flags or not, etc) I think we need to at least represent all of the current node structures. A common base class, and then classes to represent lists, zvals and decls that extend from this base. 2. Where metadata should be stored (flags, names of kind nodes, relation between node types)? This information will be needed later for validation of AST Nikita have some thoughts for the future :) So he asked about the storage of metadata to validate an AST and to perform some analysis on it. Metadata should include the following: name of each node kind (this can be just a class name of node or constants in the class), node restrictions (which kind of node types can be used as children for concrete node; number of children nodes), node flag names (it's PUBLIC, PROTECTED, PRIVATE, etc) Thinking for the future is fine, but do we need this metadata for the current proposal? Is the AST returned by the parser in a read-only state, or can users create their own nodes + children and get a pretty printed output? If it's the latter then we obviously need to know the restrictions. I think we need a mechanism that keeps names/numbers in sync automatically, maybe we can use some macros to automatically generate enums and userland facing details at the same time, so we don't have to keep several places in sync if/when new AST nodes are added. -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] [RFC][Discussion] Parser extension API
Good morning! I have cleaned https://wiki.php.net/rfc/parser-extension-api and restricted it's scope only to the parsing API. Extension API can be implemented later on top of https://github.com/php/php-src/commit/1010b0ea4f4b9f96ae744f04c1191ac228580e48 and current implementation, because it requires a lot of discussion and can not be implemented right now. I want to move this RFC to the vote phase soon to decide if it will be included into PHP7.0 or not. So let's discuss open questions before moving it to the vote phase (from SO chat) 1. Should each node type be represented as personal class? There are two possible ways: single node class for everything (current proposal) and separate class for every node kind. I have made a quick research of AST API in several languages and noticed, that most of them generates AST class nodes automatically. E.g. UnaryOperationNode, StatementNode... This way is cool, but it requires a lot of classes to be loaded and available as API. Pros: SRP of each node class, AST validators (can be described with XML and DTD), more clear code analysis (checks with $node instaceof StatementNode), typehints for concrete nodes for visitors. However, PHP is dynamic language and we can use only one class for everything, adjusting `kind` property for each node. Pros: single class, easier to maintain and to use. Cons: bad UX (how to validate nodes, how to determine number of available children, how to check is node using flags or not, etc) 2. Where metadata should be stored (flags, names of kind nodes, relation between node types)? This information will be needed later for validation of AST Nikita have some thoughts for the future :) So he asked about the storage of metadata to validate an AST and to perform some analysis on it. Metadata should include the following: name of each node kind (this can be just a class name of node or constants in the class), node restrictions (which kind of node types can be used as children for concrete node; number of children nodes), node flag names (it's PUBLIC, PROTECTED, PRIVATE, etc) 2015-02-19 10:22 GMT+03:00 Dmitry Stogov dmi...@zend.com: On Thu, Feb 19, 2015 at 8:42 AM, Sara Golemon poll...@php.net wrote: On Wed, Feb 18, 2015 at 11:22 AM, Dmitry Stogov dmi...@zend.com wrote: I think the AST API shouldn't use public properties. Using it, we will have to construct the whole tree of objects, duplicating information from AST. I would propose SimpleXML approach instead - construct object only for node(s) we currently access. So at first we will construct just single object referring to AST root. Then traversing it we will create and destroy objects for necessary nodes. I'm not sure if you've seen my astkit extension, but it does this. https://github.com/sgolemon/astkit I didn't see it before, and took just a quick look, but I like it. Any special reason, why you didn't implement ArrayAccess interface for children access? For me it would look natural. Thanks. Dmitry. -Sara
Re: [PHP-DEV] [RFC][Discussion] Parser extension API
On Wed, 2015-02-18 at 16:17 +0100, Nikita Popov wrote: On Wed, Feb 18, 2015 at 4:06 PM, Cesar Rodas ce...@rodas.me wrote: On 18/02/15 15:59, Nikita Popov wrote: On Wed, Feb 18, 2015 at 7:22 AM, Dmitry Stogov dmi...@zend.com wrote: Hi, On Tue, Feb 17, 2015 at 2:46 PM, Alexander Lisachenko lisachenko...@gmail.com wrote: Hello, internals! I want to introduce a RFC for providing a userland API for accessing an Abstract Syntax Tree of the source code and to provide userland parser hooks for source code modification: https://wiki.php.net/rfc/parser-extension-api Thanks! The first part, describing https://github.com/nikic/php-ast , looks fine. I see no problems including this extension into PHP-7.0 core distribution. May be even making it required (like ext/reflection). Nikita, what do you think? The second part looks very interesting, however it has some uncovered questions. - API for AST modification - AST validation (someone may insert break node in parameter-list). If you have enough experience, I would suggest you to try writing an external extension that would implement this idea. If you'll need some modification in PHP core (e.g adding callbacks), we may consider including them in PHP-7.0. Thanks. Dmitry. I agree with Dmitry. Exporting the AST to userland in PHP 7.0 would be nice. With Dmitry's work on assertions we even have a pretty printer for it, which allows us to convert the AST back to PHP code. In this matter, how would it be? It'd be awesome if function expects an AST tree object instead of a value: function doQuery($table, *$where) { $where = convert_ast_toSqlWhere($where); } doQuery(foobar, $col1 = something AND $col2 == $col3); Or at least `ast(expr)`, although both would be nice to have. The problem here is that in most cases we do not actually know what function will be called at compile time. A very simple example would be to replace doQuery() with $db-query() and already we don't know what it is that we're actually calling and whether or not it needs an AST. However this does not apply to just methods, it's an issue with nearly all calls. As such PHP cannot know at compile time whether it should issue a normal call with evaluated arguments or whether it needs to preserve the AST and pass that. Just keeping the AST around for all calls would be too expensive (it's a pretty big memory hog). So rather than having this as a modifier in the function signature, it would have to be a modifier in the call syntax. E.g. rust uses foo!() syntax for macros. Nikita Wouldn't the trick be to have $col1 be an object instance that uses operator overloading? So instead of solving the expression, operators would return the AST. Was possible with the operator extension Some old experimental code from 2006 echo query($connection, array($catalog-Person-id, $catalog-Titles-title. ' ' . $catalog-Person-name), new SqlJoin(SqlJoin::TYPE_INNER, $catalog-Person, $catalog-Titles, $catalog-Person-titleId == $catalog-Titles-id), $catalog-Person-id == 1); outputs SELECT Person.id, Titles.title || ' ' || Person.name FROM Person INNER JOIN Titles ON Person.titleId = Titles.id WHERE Person.id = 1 Jared -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] [RFC][Discussion] Parser extension API
2015-02-18 22:22 GMT+03:00 Dmitry Stogov dmi...@zend.com: I think the AST API shouldn't use public properties. Using it, we will have to construct the whole tree of objects, duplicating information from AST. I would propose SimpleXML approach instead - construct object only for node(s) we currently access. What about Generators? They can be used nicely for handling AST data. And each node can be constructed as needed during the traversal. Probably this way is better than ArrayAccess
Re: [PHP-DEV] [RFC][Discussion] Parser extension API
I think the AST API shouldn't use public properties. Using it, we will have to construct the whole tree of objects, duplicating information from AST. I would propose SimpleXML approach instead - construct object only for node(s) we currently access. So at first we will construct just single object referring to AST root. Then traversing it we will create and destroy objects for necessary nodes. To access children I would propose to implementing ArrayAccess interface. $ast = \php\ast\parse($string); foreach ($ast as $child) { echo \t . $child-getKindName() . \n; foreach ($child as $grandchild) { echo \t . $child-getKindName() . \n; } } Thanks. Dmitry. On Wed, Feb 18, 2015 at 10:06 PM, Nikita Popov nikita@gmail.com wrote: On Wed, Feb 18, 2015 at 4:22 PM, Alexander Lisachenko lisachenko...@gmail.com wrote: 2015-02-18 17:59 GMT+03:00 Nikita Popov nikita@gmail.com: Alexander, I would recommend you to split this into two RFCs, one dealing only with AST export (and maybe pretty printing) and the second one with the compilation hooks. There's probably a few questions about the export API that should be discussed that will be forgotten if everyone focuses on the more complicated, and probably not yet relevant, question of compilation hooks. Hello, Nikita! Thanks for you thoughts! Sounds reasonable for me, because of short timeframe for PHP7. Let's go with the first part of RFC, because second part is really cumbersome and can be applied later, eg. in 7.1 I will split this RFC into two parts and mark first part for 7.0. Great! About first part, do you agree with proposed classes and namespace `Php\Parser`? Yeah, I'm okay with that namespace. I prefer php\ast as being more distinctive from existing projects, but that's not really important. I want to propose to use classes for parser API instead of pure functions because it can give more usable OO-API for future needs. I'm okay with having stuff like -getKindName() on the nodes, however I'd still keep around the free-standing functions, because they can be used without a node (i.e. you only need the kind AST_FOO and not an instantiated node). I also don't see why we need ParserEngine::parse() instead of just a (namespaced) parse() function. Not a fan of unnecessary wrapper classes. There is one more notice regarding to the zval nodes, they should be also exported as objects, not as values. I think we should export coherent nodes for everything. What's the advantage of this? Most leaf nodes will be zval nodes and I feel it would be rather inconvenient if they'd all have extra wrappers. Is there some specific problem with including the values directly? Nikita
Re: [PHP-DEV] [RFC][Discussion] Parser extension API
I'm okay with having stuff like -getKindName() on the nodes, however I'd still keep around the free-standing functions, because they can be used without a node (i.e. you only need the kind AST_FOO and not an instantiated node). I think that getting name of kind is useless without value of kind. According to the RFC, there is only one way from user land to get a kind of node: ask a parser for an AST, that will contain nodes. Is there any reason to do getKindName(1), where 1 - is just integer value. I can't see how it can be useful, this information is fully related to the node. I also don't see why we need ParserEngine::parse() instead of just a (namespaced) parse() function. Not a fan of unnecessary wrapper classes. I think that PHP is ugly enough :) We should try to use namespaces and classes with methods to provide a better experience for user land developers. Classes allow for logical composition of common functionality, provides autocompletion in IDE and can be considered as step to standardization of core API. (Today I discussed this question with colleagues that will be great to have all system classes in Php namespace for PHP=7.0) I know, that on engine level, classes aren't native, but you should think about how this feature will be used by developers. Because, creation of API is one-time process, but imagine how many times it will be used by developers. I think, it will be more polite to provide OO-code for modern API. What's the advantage of this? Most leaf nodes will be zval nodes and I feel it would be rather inconvenient if they'd all have extra wrappers. Is there some specific problem with including the values directly? Yes, reason here is the same as for `token_get_all()`. I hear so many WTFs when value can be either simple character or array. Much harder to work with inconsistent data and many libraries performs additional steps to provide better usability (see https://github.com/Andrewsville/PHP-Token-Reflection/blob/1c5ff91ee6877d1fb0f60fab3e3fc562bdb02684/TokenReflection/Stream/StreamBase.php#L100-L108 or https://github.com/lisachenko/go-aop-php/blob/3c381e243367096d47dc9676b3ad7445d7720ba9/src/Instrument/Transformer/MagicConstantTransformer.php#L117 ) So, each element should be represented as common node, this will simplify usage of API.
Re: [PHP-DEV] [RFC][Discussion] Parser extension API
On Wed, Feb 18, 2015 at 8:22 PM, Dmitry Stogov dmi...@zend.com wrote: I think the AST API shouldn't use public properties. Using it, we will have to construct the whole tree of objects, duplicating information from AST. I would propose SimpleXML approach instead - construct object only for node(s) we currently access. So at first we will construct just single object referring to AST root. Then traversing it we will create and destroy objects for necessary nodes. To access children I would propose to implementing ArrayAccess interface. $ast = \php\ast\parse($string); foreach ($ast as $child) { echo \t . $child-getKindName() . \n; foreach ($child as $grandchild) { echo \t . $child-getKindName() . \n; } } Thanks. Dmitry. I've considered using this approach, but decided against it because it introduces a lot of magic for unclear gains. Lazily creating the objects means we have to keep around the entire AST arena while at least one node referencing it is alive. So if you're analyzing a large project with a few thousand files and keep around a few nodes in some cases, you end up not just with the memory usage of those nodes, but with the memory usage of the entire ASTs. Furthermore in many cases you'll just traverse the entire AST, in which case you'll need to instantiate all nodes anyway and the lazy instantiation will only hurt. In any case, performance of constructing a full AST is pretty good - I don't remember the exact numbers, but ast\parse_code() is only slightly slower than doing token_get_all() on the same code. Nikita
Re: [PHP-DEV] [RFC][Discussion] Parser extension API
On Wed, Feb 18, 2015 at 10:34 PM, Alexander Lisachenko lisachenko...@gmail.com wrote: 2015-02-18 22:22 GMT+03:00 Dmitry Stogov dmi...@zend.com: I think the AST API shouldn't use public properties. Using it, we will have to construct the whole tree of objects, duplicating information from AST. I would propose SimpleXML approach instead - construct object only for node(s) we currently access. What about Generators? They can be used nicely for handling AST data. And each node can be constructed as needed during the traversal. Probably this way is better than ArrayAccess No. Generators allows only sequential access, while ArrayAccess - random. Thanks. Dmitry.
Re: [PHP-DEV] [RFC][Discussion] Parser extension API
On Wed, Feb 18, 2015 at 11:00 PM, Nikita Popov nikita@gmail.com wrote: On Wed, Feb 18, 2015 at 8:22 PM, Dmitry Stogov dmi...@zend.com wrote: I think the AST API shouldn't use public properties. Using it, we will have to construct the whole tree of objects, duplicating information from AST. I would propose SimpleXML approach instead - construct object only for node(s) we currently access. So at first we will construct just single object referring to AST root. Then traversing it we will create and destroy objects for necessary nodes. To access children I would propose to implementing ArrayAccess interface. $ast = \php\ast\parse($string); foreach ($ast as $child) { echo \t . $child-getKindName() . \n; foreach ($child as $grandchild) { echo \t . $child-getKindName() . \n; } } Thanks. Dmitry. I've considered using this approach, but decided against it because it introduces a lot of magic for unclear gains. Lazily creating the objects means we have to keep around the entire AST arena while at least one node referencing it is alive. So if you're analyzing a large project with a few thousand files and keep around a few nodes in some cases, you end up not just with the memory usage of those nodes, but with the memory usage of the entire ASTs. Furthermore in many cases you'll just traverse the entire AST, in which case you'll need to instantiate all nodes anyway and the lazy instantiation will only hurt. In any case, performance of constructing a full AST is pretty good - I don't remember the exact numbers, but ast\parse_code() is only slightly slower than doing token_get_all() on the same code. I think the whole memory usage should be less, because at each point you'll most probably keep only few AST objects, but I understood your arguments. They may make sense, but it's hard to say what is better without comparison of implementations. Thanks. Dmitry. Nikita
Re: [PHP-DEV] [RFC][Discussion] Parser extension API
On Wed, Feb 18, 2015 at 11:22 AM, Dmitry Stogov dmi...@zend.com wrote: I think the AST API shouldn't use public properties. Using it, we will have to construct the whole tree of objects, duplicating information from AST. I would propose SimpleXML approach instead - construct object only for node(s) we currently access. So at first we will construct just single object referring to AST root. Then traversing it we will create and destroy objects for necessary nodes. I'm not sure if you've seen my astkit extension, but it does this. https://github.com/sgolemon/astkit -Sara -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] [RFC][Discussion] Parser extension API
On Thu, Feb 19, 2015 at 8:42 AM, Sara Golemon poll...@php.net wrote: On Wed, Feb 18, 2015 at 11:22 AM, Dmitry Stogov dmi...@zend.com wrote: I think the AST API shouldn't use public properties. Using it, we will have to construct the whole tree of objects, duplicating information from AST. I would propose SimpleXML approach instead - construct object only for node(s) we currently access. So at first we will construct just single object referring to AST root. Then traversing it we will create and destroy objects for necessary nodes. I'm not sure if you've seen my astkit extension, but it does this. https://github.com/sgolemon/astkit I didn't see it before, and took just a quick look, but I like it. Any special reason, why you didn't implement ArrayAccess interface for children access? For me it would look natural. Thanks. Dmitry. -Sara
Re: [PHP-DEV] [RFC][Discussion] Parser extension API
On Wed, Feb 18, 2015 at 7:22 AM, Dmitry Stogov dmi...@zend.com wrote: Hi, On Tue, Feb 17, 2015 at 2:46 PM, Alexander Lisachenko lisachenko...@gmail.com wrote: Hello, internals! I want to introduce a RFC for providing a userland API for accessing an Abstract Syntax Tree of the source code and to provide userland parser hooks for source code modification: https://wiki.php.net/rfc/parser-extension-api Thanks! The first part, describing https://github.com/nikic/php-ast , looks fine. I see no problems including this extension into PHP-7.0 core distribution. May be even making it required (like ext/reflection). Nikita, what do you think? The second part looks very interesting, however it has some uncovered questions. - API for AST modification - AST validation (someone may insert break node in parameter-list). If you have enough experience, I would suggest you to try writing an external extension that would implement this idea. If you'll need some modification in PHP core (e.g adding callbacks), we may consider including them in PHP-7.0. Thanks. Dmitry. I agree with Dmitry. Exporting the AST to userland in PHP 7.0 would be nice. With Dmitry's work on assertions we even have a pretty printer for it, which allows us to convert the AST back to PHP code. The bits about intercepting compilation, modifying the AST and importing it back into the compiler are a LOT more complicated. Apart from many open design questions, this will also require quite a bit of implementation effort because we must be very careful with the many assumptions the compiler makes about the structure of the AST (otherwise - segfaults). I'd prefer to delay this to a later PHP version (there's no problem to add it in a minor), preferably with a PoC extension being available beforehand. Alexander, I would recommend you to split this into two RFCs, one dealing only with AST export (and maybe pretty printing) and the second one with the compilation hooks. There's probably a few questions about the export API that should be discussed that will be forgotten if everyone focuses on the more complicated, and probably not yet relevant, question of compilation hooks. Nikita
Re: [PHP-DEV] [RFC][Discussion] Parser extension API
On 18/02/15 15:59, Nikita Popov wrote: On Wed, Feb 18, 2015 at 7:22 AM, Dmitry Stogov dmi...@zend.com wrote: Hi, On Tue, Feb 17, 2015 at 2:46 PM, Alexander Lisachenko lisachenko...@gmail.com wrote: Hello, internals! I want to introduce a RFC for providing a userland API for accessing an Abstract Syntax Tree of the source code and to provide userland parser hooks for source code modification: https://wiki.php.net/rfc/parser-extension-api Thanks! The first part, describing https://github.com/nikic/php-ast , looks fine. I see no problems including this extension into PHP-7.0 core distribution. May be even making it required (like ext/reflection). Nikita, what do you think? The second part looks very interesting, however it has some uncovered questions. - API for AST modification - AST validation (someone may insert break node in parameter-list). If you have enough experience, I would suggest you to try writing an external extension that would implement this idea. If you'll need some modification in PHP core (e.g adding callbacks), we may consider including them in PHP-7.0. Thanks. Dmitry. I agree with Dmitry. Exporting the AST to userland in PHP 7.0 would be nice. With Dmitry's work on assertions we even have a pretty printer for it, which allows us to convert the AST back to PHP code. In this matter, how would it be? It'd be awesome if function expects an AST tree object instead of a value: function doQuery($table, *$where) { $where = convert_ast_toSqlWhere($where); } doQuery(foobar, $col1 = something AND $col2 == $col3); Or at least `ast(expr)`, although both would be nice to have. The bits about intercepting compilation, modifying the AST and importing it back into the compiler are a LOT more complicated. Apart from many open design questions, this will also require quite a bit of implementation effort because we must be very careful with the many assumptions the compiler makes about the structure of the AST (otherwise - segfaults). I'd prefer to delay this to a later PHP version (there's no problem to add it in a minor), preferably with a PoC extension being available beforehand. Alexander, I would recommend you to split this into two RFCs, one dealing only with AST export (and maybe pretty printing) and the second one with the compilation hooks. There's probably a few questions about the export API that should be discussed that will be forgotten if everyone focuses on the more complicated, and probably not yet relevant, question of compilation hooks. Nikita -- César D. Rodas Open Source developer +595-983-161124 PGP: F9ED A265 A3AB C8A1 D145 7368 158A 0336 C707 0AA6 signature.asc Description: OpenPGP digital signature
Re: [PHP-DEV] [RFC][Discussion] Parser extension API
2015-02-18 17:59 GMT+03:00 Nikita Popov nikita@gmail.com: Alexander, I would recommend you to split this into two RFCs, one dealing only with AST export (and maybe pretty printing) and the second one with the compilation hooks. There's probably a few questions about the export API that should be discussed that will be forgotten if everyone focuses on the more complicated, and probably not yet relevant, question of compilation hooks. Hello, Nikita! Thanks for you thoughts! Sounds reasonable for me, because of short timeframe for PHP7. Let's go with the first part of RFC, because second part is really cumbersome and can be applied later, eg. in 7.1 I will split this RFC into two parts and mark first part for 7.0. About first part, do you agree with proposed classes and namespace `Php\Parser`? I want to propose to use classes for parser API instead of pure functions because it can give more usable OO-API for future needs. There is one more notice regarding to the zval nodes, they should be also exported as objects, not as values. I think we should export coherent nodes for everything.
Re: [PHP-DEV] [RFC][Discussion] Parser extension API
Hi, On Tue, Feb 17, 2015 at 2:46 PM, Alexander Lisachenko lisachenko...@gmail.com wrote: Hello, internals! I want to introduce a RFC for providing a userland API for accessing an Abstract Syntax Tree of the source code and to provide userland parser hooks for source code modification: https://wiki.php.net/rfc/parser-extension-api Thanks! The first part, describing https://github.com/nikic/php-ast , looks fine. I see no problems including this extension into PHP-7.0 core distribution. May be even making it required (like ext/reflection). Nikita, what do you think? The second part looks very interesting, however it has some uncovered questions. - API for AST modification - AST validation (someone may insert break node in parameter-list). If you have enough experience, I would suggest you to try writing an external extension that would implement this idea. If you'll need some modification in PHP core (e.g adding callbacks), we may consider including them in PHP-7.0. Thanks. Dmitry.
Re: [PHP-DEV] [RFC][Discussion] Parser extension API
2015-02-17 16:35 GMT+03:00 Rowan Collins rowan.coll...@gmail.com: I think should is OK for extensions, but not for userland hooks. There needs to be a very firm definition of the right and wrong way to implement these hooks, which can at the very least warn the user when they are slipping into undefined behaviour. Thanks for this point. Added this question to the list of open issues in RFC. My vision is to update a documentation with clear description, why parser extensions should be registered statically without conditions. Parsing process and limitations should be also described in the RFC. 2015-02-17 16:35 GMT+03:00 Rowan Collins rowan.coll...@gmail.com: The other alternative is to go to the other extreme, and have the extensions scoped to a particular file, more like Perl pragmas. Dare I say we could use the declare() syntax that everyone seems to have an irrational hatred of? declare(pragma=Example\DbcParserExtension); Ouch ) Don't like this way at all, because extensions should process all files, even if it's not marked with some hints. What I want to see is PHP analog of Java bytecode compilers, they can transform an original source code into specific tokens and provide custom extensions on top of that. So, I want to choose between php.ini option and explicit methods. Let's see another opinions. Thanks!
Re: [PHP-DEV] [RFC][Discussion] Parser extension API
Alexander Lisachenko wrote on 17/02/2015 14:21: 2015-02-17 16:35 GMT+03:00 Rowan Collins rowan.coll...@gmail.com mailto:rowan.coll...@gmail.com: The other alternative is to go to the other extreme, and have the extensions scoped to a particular file, more like Perl pragmas. Dare I say we could use the declare() syntax that everyone seems to have an irrational hatred of? declare(pragma=Example\DbcParserExtension); Ouch ) Don't like this way at all, because extensions should process all files, even if it's not marked with some hints. What I want to see is PHP analog of Java bytecode compilers, they can transform an original source code into specific tokens and provide custom extensions on top of that. So, I want to choose between php.ini option and explicit methods. Let's see another opinions. It's interesting that you used the word extension there. If the hook can only be registered for in php.ini (and it would need to be PHP_INI_SYSTEM if you want to avoid OpCache needing to vary its cache on different settings) then these are basically extensions which happen to be written in PHP - you install them globally, they execute unconditionally, and they do things which can't be done any other way. In that case, it would maybe make more sense to make this part of the simplified extension API discussed in a previous thread, rather than exposing it to userland at all. The only abilities you'd lose are ones which should never be used anyway, e.g. statically registering an extension, then dynamically changing its behaviour at runtime by altering global/static variables. I think the same argument applies to per-file pragmas over per-install settings, as to Dependency Injection over global state - you should be able to reason about each module of code in isolation, reuse it in different combinations, etc, etc. What if you want to use two different DbC extensions on the same server, and they collide with each other? What would the error look like if you failed to install an extension that was required to process your source code as intended? A declare line (or Perl's use, or Python's import) gives a clear point where that requirement is defined. Regards, -- Rowan Collins [IMSoP]
Re: [PHP-DEV] [RFC][Discussion] Parser extension API
Alexander Lisachenko wrote on 17/02/2015 12:41: 2015-02-17 15:29 GMT+03:00 Benjamin Eberlei kont...@beberlei.de: Well not really, depending on the path towards a require a parser extension is already registered or not. Or you have conditional registration of extensions. So you could load a file with an extension registered, but it still serves the old opcodes for that file. Technically, this will be possible, however, Parser Extensions should be registered ASAP during bootstrap process, near spl_autoload_register(), stream_wrapper_register(), stream_filter_register(), etc.. I think should is OK for extensions, but not for userland hooks. There needs to be a very firm definition of the right and wrong way to implement these hooks, which can at the very least warn the user when they are slipping into undefined behaviour. Note that spl_autoload_register doesn't affect OpCache because it only changes the mapping of a class to a parsed file (keyed on the file system path to the source), not the compiled contents of that file. If AST hooks can be registered on a global scope at any time, then compiling the same file can lead to different op codes at different times (e.g. page loads which take a different path through the code), which would mean the OpCache would need extra information in its keys to cache each possible version. One more possible way for this RFC is to remove these register/unregister methods from the `Php\Parser\Engine` class and add an option to the php.ini with list of classes to load. This option can be adjusted then per project/directory. Is this more suitable? The other alternative is to go to the other extreme, and have the extensions scoped to a particular file, more like Perl pragmas. Dare I say we could use the declare() syntax that everyone seems to have an irrational hatred of? declare(pragma=Example\DbcParserExtension); Regards, -- Rowan Collins [IMSoP] -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] [RFC][Discussion] Parser extension API
Hello, Lars! 2015-02-17 22:09 GMT+03:00 Lars Strojny l...@strojny.net: Looks cool and I could see a couple of interesting possibilities arising. One thing: any particular reason ExtensionInterface is static? I could see a couple of benefits having extensions carry state and registerExtension() taking an instance of ExtensionInterface, not a class name. What do you think? Thanks for the positive feedback! My first thought was to use instances for extensions: https://gist.github.com/lisachenko/ffcfdec4c46e01864b33. However, I have a doubts that this will introduce additional possibilities for side-effects, when extensions can be configured differently/loaded/unloaded. In this thread some people noticed this fact too and suggested to add a restrictions for preventing conditional parsing of the source code. So, static method and static registration was chosen to reduce a number of possible conditional extensions. This has some disadvantages (testing, configuration, etc), but should give better experience with parsing behavior.
Re: [PHP-DEV] [RFC][Discussion] Parser extension API
Hi Alexander, On 17 Feb 2015, at 12:46, Alexander Lisachenko lisachenko...@gmail.com wrote: Hello, internals! I want to introduce a RFC for providing a userland API for accessing an Abstract Syntax Tree of the source code and to provide userland parser hooks for source code modification: https://wiki.php.net/rfc/parser-extension-api Looks cool and I could see a couple of interesting possibilities arising. One thing: any particular reason ExtensionInterface is static? I could see a couple of benefits having extensions carry state and registerExtension() taking an instance of ExtensionInterface, not a class name. What do you think? cu, Lars signature.asc Description: Message signed with OpenPGP using GPGMail
Re: [PHP-DEV] [RFC][Discussion] Parser extension API
On 17 February 2015 at 11:46, Alexander Lisachenko lisachenko...@gmail.com wrote: Hello, internals! I want to introduce a RFC for providing a userland API for accessing an Abstract Syntax Tree of the source code and to provide userland parser hooks for source code modification: https://wiki.php.net/rfc/parser-extension-api Thanks! Re: Userland representation of AST. I use Nikita's PHP-Parser for static analysis already, so my first reaction is that I like the idea of being able to obtain the canonical AST from the engine, that is automatically updated as new features are added. Does the AST provided by the internal parser provide all of the information required to be able to turn it back into source code? Re: Extending the parser from PHP userland. This I don't like so much. To be honest I hoped this would be an API to extend the parser from PHP extensions :) I think letting userland fiddle directly with the compilation process is just asking for trouble. -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] [RFC][Discussion] Parser extension API
2015-02-17 15:06 GMT+03:00 Leigh lei...@gmail.com: Does the AST provided by the internal parser provide all of the information required to be able to turn it back into source code? Yes, this should be possible, via zend_emit_op*() and zend_compile*() 2015-02-17 15:06 GMT+03:00 Leigh lei...@gmail.com: Re: Extending the parser from PHP userland. This I don't like so much. To be honest I hoped this would be an API to extend the parser from PHP extensions :) I think letting userland fiddle directly with the compilation process is just asking for trouble. Expected reaction ) However, it would be nice to have such API on userland too (on engine level this can be a hook too). This will give an instrument for building custom things on user side, but will keep PHP engine clear from a lot of stuff, which can be implemented by users. Nice example is Design-By-Contract validation, it can be easily done via parser hooks.
Re: [PHP-DEV] [RFC][Discussion] Parser extension API
2015-02-17 15:09 GMT+03:00 Benjamin Eberlei kont...@beberlei.de: The visitor API is the essential part here and you left it out. Yes, I decided not to put Visitor in the RFC (this was added as open question to the RFC). But this part can be discussed in the case of general acceptance. 2015-02-17 15:09 GMT+03:00 Benjamin Eberlei kont...@beberlei.de: How does this work with Engine Extensions, are they considered for all require/include's *after* the registrtion? How does this work with opcache? Yes, parser extensions will be called for all require/include/evals after registration. This part is transparent for opcache, because opcache just stores an opcodes for the file. AST is parsed only once for each file, then hooks can transform the AST and after that compiler will produce a final opcodes that can be stored for that file in the opcache. So, I expect no impact on opcache logic.
Re: [PHP-DEV] [RFC][Discussion] Parser extension API
2015-02-17 15:29 GMT+03:00 Benjamin Eberlei kont...@beberlei.de: Well not really, depending on the path towards a require a parser extension is already registered or not. Or you have conditional registration of extensions. So you could load a file with an extension registered, but it still serves the old opcodes for that file. Technically, this will be possible, however, Parser Extensions should be registered ASAP during bootstrap process, near spl_autoload_register(), stream_wrapper_register(), stream_filter_register(), etc.. So, no conditional code hooks should be used for that. But this can be the consistent with stream wrappers, filters and autoloaders. Developer can decide what he need to enable some features. One more possible way for this RFC is to remove these register/unregister methods from the `Php\Parser\Engine` class and add an option to the php.ini with list of classes to load. This option can be adjusted then per project/directory. Is this more suitable? Thanks!
Re: [PHP-DEV] [RFC][Discussion] Parser extension API
On 17 February 2015 at 12:22, Alexander Lisachenko lisachenko...@gmail.com wrote: Yes, parser extensions will be called for all require/include/evals after registration. This part is transparent for opcache, because opcache just stores an opcodes for the file. AST is parsed only once for each file, then hooks can transform the AST and after that compiler will produce a final opcodes that can be stored for that file in the opcache. So, I expect no impact on opcache logic. So: Internal Parse AST - Extension Parse AST - Generate OpCodes - OpCache? So if the parser extension does anything dynamic, then OpCache will have the wrong version cached. -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] [RFC][Discussion] Parser extension API
On Tue, Feb 17, 2015 at 12:46 PM, Alexander Lisachenko lisachenko...@gmail.com wrote: Hello, internals! I want to introduce a RFC for providing a userland API for accessing an Abstract Syntax Tree of the source code and to provide userland parser hooks for source code modification: https://wiki.php.net/rfc/parser-extension-api 1. The visitor API is the essential part here and you left it out. 2. How does this work with Engine Extensions, are they considered for all require/include's *after* the registrtion? How does this work with opcache? Thanks!
Re: [PHP-DEV] [RFC][Discussion] Parser extension API
On Tue, Feb 17, 2015 at 1:22 PM, Alexander Lisachenko lisachenko...@gmail.com wrote: 2015-02-17 15:09 GMT+03:00 Benjamin Eberlei kont...@beberlei.de: The visitor API is the essential part here and you left it out. Yes, I decided not to put Visitor in the RFC (this was added as open question to the RFC). But this part can be discussed in the case of general acceptance. 2015-02-17 15:09 GMT+03:00 Benjamin Eberlei kont...@beberlei.de: How does this work with Engine Extensions, are they considered for all require/include's *after* the registrtion? How does this work with opcache? Yes, parser extensions will be called for all require/include/evals after registration. This part is transparent for opcache, because opcache just stores an opcodes for the file. AST is parsed only once for each file, then hooks can transform the AST and after that compiler will produce a final opcodes that can be stored for that file in the opcache. So, I expect no impact on opcache logic. Well not really, depending on the path towards a require a parser extension is already registered or not. Or you have conditional registration of extensions. So you could load a file with an extension registered, but it still serves the old opcodes for that file.
Re: [PHP-DEV] [RFC][Discussion] Parser extension API
2015-02-17 15:48 GMT+03:00 Leigh lei...@gmail.com: So: Internal Parse AST - Extension Parse AST - Generate OpCodes - OpCache? Yes, it's correct flow. 2015-02-17 15:48 GMT+03:00 Leigh lei...@gmail.com: So if the parser extension does anything dynamic, then OpCache will have the wrong version cached. This restriction is intended by design, no dynamic changes are allowed. However, if you need, then you can disable an opcode cacher and do anything you want. Your example is actual for production mode of PHP too, when opcache.validate_timestamps=0 or apc.stat=off. Parsing will be performed only once, then you need to clear cache manually to reload changes from the file system. So, parser extension works transparently and used only when needed to process an AST.