[PHP-DEV] Modify language grammar to allow trailing commas in function/method calls

2008-07-21 Thread Evan Priestley
This was floated in 2003 but had weak advocacy and didn't seem to
come to a decisive resolution:


http://marc.info/?l=php-internals&m=106685833011253&w=2

Basically, the proposal is to modify the grammar to allow trailing  
commas in function and method calls, so this becomes a parseable PHP  
construct:


f(1, 2, 3,);

This patch applies only to function and method calls; it does not  
apply to function or method definitions. It also does not allow the  
degenerate case of f(,).


The real value of relaxing this rule is in nontrivial cases that span  
across multiple lines:


sprintf(
'long example pattern with %d conversions: %s',
$several,
$conversions
);

This sort of construction makes the code more readable but also  
exposes you to trailing-comma errors. You can easily introduce an  
error either by removing the last parameter or by changing parameter  
order. At least in my experience, these are relatively common syntax  
errors, and ones that are easy to make when making edits that are  
apparently minor. In the above example, I can remove the %s  
conversion, remove the $conversions parameter, and inadvertently  
introduce a syntax error because I have neglected to remove the comma  
after $several. Similarly, I can change the parameter order by  
modifying the string and swapping $several for $conversions, but  
introduce a syntax error by neglecting to move the comma.


In the current grammar, if a function or method call has been written  
with parameters broken across multiple lines, adding or removing a  
trailing parameter means you need to modify two lines of code instead  
of one. This muddies and reduces the value of blame features in  
revision control. It also makes diffs slightly larger and noisier,  
which can make code reviews take a little longer since you have to  
scan the line and make sure the only change is to the trailing comma.
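
As a rough illustration (a sketch, not part of the patch): with the relaxed grammar, the sprintf() call from above can carry a comma after its last argument, so appending or dropping an argument touches exactly one line. This assumes a PHP build that accepts the proposed grammar (stock PHP does from 7.3 on); the values assigned to $several and $conversions are made up for the example.

```php
<?php
// Illustrative values only; they do not come from the original message.
$several = 2;
$conversions = 'ok';

// Every argument line ends in a comma, so adding, removing, or reordering
// arguments is a whole-line edit and a diff shows only the changed line.
$message = sprintf(
    'long example pattern with %d conversions: %s',
    $several,
    $conversions,
);

echo $message, "\n";
```

Removing the `$conversions,` line (and the %s conversion) leaves a call that still parses, which is the editing case described earlier.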


The looser grammar is easier to use and more consistent: it's easier  
to add, remove, or move parameters in an editor since you only need  
to use line-oriented editor operations, and you don't need to  
mentally distinguish between an array context and a parameter context.


In the original thread, Andi Gutmans explained that the decision to
allow trailing commas in array() literals is supported by the
argument that it makes code generation easier. It seems like this  
argument applies to trailing commas in function calls just as
easily, and that the general ease-of-use arguments laid out above  
provide at least as much value as this (particularly since this  
specific code generation problem is solvable with implode()).


Andi also argued that this reduces readability and prevents assigning  
semantics to a trailing comma in the future. While I agree that it  
reduces readability in the f(1, 2,) case, I disagree that it
reduces readability in the less trivial multi-line case, and I think it
greatly enhances writability. The possibility that the language
would ever benefit from assigning semantics here, while worth  
considering, seems small. While my language architect credentials are  
pretty weak, I can't think of any reasonable meaning. Many languages  
accept trailing commas in data definitions (apparently including Java 
(!) now[1]), and some (such as Python[2]) accept them in calls; in  
all cases, the behavior is to ignore them. If PHP provided a  
different semantic, this would be somewhat startling.


In general, trailing commas are increasingly an accepted part of the  
grammars of modern languages. Beyond Java, Ruby and Python, Firefox  
now accepts them in JavaScript object definitions (which is a surefire
way to tell that someone didn't test in IE). While they are more
often accepted in data definitions than calls, I don't see a strong  
reason to distinguish between the cases.


This change has no impact on backward compatibility. It does make it
slightly more difficult to write code which runs across multiple  
versions of PHP. However, because it fails fast and explicitly, it's  
an error which is easy to detect and resolve when you decide you want  
to support more versions of PHP with your project. It's also  
straightforward to write a script that uses the tokenizer to safely  
and unambiguously remove trailing commas (I'd be happy to furnish  
such a script if people think there's value in it and there's a  
reasonable place to put it).
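
A minimal sketch of what such a script might look like, assuming the tokenizer extension's token_get_all(); the function name strip_trailing_commas is hypothetical and this is not the script offered above. It drops any comma whose next significant token is a closing parenthesis. It does not try to tell calls apart from array() literals, so it also strips the (already legal) trailing comma there, which leaves the array unchanged.

```php
<?php
// Hypothetical helper: removes commas that directly precede a ')' once
// whitespace and comments are skipped. Commas inside string literals are
// part of larger tokens and are never touched.
function strip_trailing_commas(string $source): string
{
    $tokens = token_get_all($source);
    $count = count($tokens);
    $skippable = [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT];
    $out = '';

    for ($i = 0; $i < $count; $i++) {
        $token = $tokens[$i];

        if ($token === ',') {
            // Look ahead to the next token that is not whitespace or a comment.
            $j = $i + 1;
            while ($j < $count
                && is_array($tokens[$j])
                && in_array($tokens[$j][0], $skippable, true)) {
                $j++;
            }
            if ($j < $count && $tokens[$j] === ')') {
                continue; // trailing comma: omit it from the output
            }
        }

        // Array tokens carry their text at index 1; others are plain strings.
        $out .= is_array($token) ? $token[1] : $token;
    }

    return $out;
}

echo strip_trailing_commas("<?php\nf(1, 2, 3,);\n");
```

The input has to include the opening <?php tag, since the tokenizer treats anything outside PHP tags as inline HTML.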


The diff in the original thread still seems correct, at least against  
a relatively recent release -- I applied it to PHP 5.2.5 and ran the  
tests, as well as verifying that the build could parse and execute  
code which used trailing commas in calls.


So, what's the feeling on this? We're trying to weigh the merits of  
rolling it into our stack at Facebook, but we'd feel a lot more  
comfortable if it was present upstream.


Thanks,
Evan Priestley

[1] http://java.sun.com/docs

Re: [PHP-DEV] Pushing PHP Into The Web 2.0 Generation

2006-04-18 Thread Evan Priestley
I'm not sure if this whole thing is supposed to be tongue-in-cheek or
not, but making the language "more human" seems, to me, to be a
fairly weak argument for making numeric and string literals like `5'
and `smile' into objects, adding closures, and altering array
syntax. The given examples seem particularly unconvincing -- how does
`5->times()' differ from `5'? Does Socket.new know that port 80 is
HTTP, and implement a full server for the protocol? How do you set  
blocking or timeouts on the socket, or respond with anything but a  
static string, or describe whether or not the connection is closed  
afterward? What about when you need a new protocol?


It's easy to transform trivial concepts like this port 80 responder  
into English (or something near to it), but English is poorly suited  
for describing the solutions to many programming problems. To give an  
example, RFC822 is 47 pages[1] long, and a good portion of it deals  
just with what an e-mail address looks like. In fact, the document  
opts /not/ to use English to specify its information, but to  
reference /another/ document which uses English to define BNF[2], a  
grammar notation. RFC822 then uses a mixture of BNF and English to  
describe the format of an e-mail address.


English is quite poorly suited to many of the problem domains  
commonly encountered in software development. Languages like PHP  
(and, particularly, regular expressions) are terse because they need  
to be -- English will never be able to differentiate between valid  
and invalid e-mail addresses in much less than 47 pages. And  
describing, say, the matrix algebra behind a 3D graphics engine in  
English would be a daunting task.


I also disagree that `do { smile() } ( 5->times() );' (or whatever
you're suggesting) is more human than `for( $ii = 0; $ii < 5; $ii++ )
{ smile(); }'. It may be more immediately understandable to an
untrained English speaker, but PHP developers aren't untrained  
English speakers -- they have invested time in learning a language  
which is more terse, powerful, and capable than English in a specific  
set of problem domains (that is, PHP is probably better for serving  
web pages, while English is probably better for talking about your  
feelings or writing poetry or ordering a beer). To a PHP developer, a  
for loop is a familiar idiom, even if its meaning isn't obvious to an  
English speaker.


The onus to push PHP into the web 2.0 generation is on the PHP  
application developer, not the PHP language developer. As an example,  
take 30boxes.com, an Ajax calendar written in PHP. You can type "eat
pie every thursday until june 22" into the box and it will schedule
the event properly. In my mind, this is very user-friendly and  
human, but it's at the interface level, not at the implementation  
level.


In my mind, PHP *language* developers should focus on making PHP the  
best language it can be for /developers/ (who work in PHP), while PHP  
*application* developers should focus on making /their applications/  
the best applications they can be for /users/ (who work in English).  
The time required to learn PHP is relatively small compared to the  
time someone may spend developing in it, so changes which lower the  
barrier to entry (which is already very, very low) but are  
detrimental to the long-term user need much stronger justification  
than is present here.


There may be good arguments for making strings into objects, but any  
argument for such a drastic change to the language needs to be  
circumspect and pragmatic, and consider performance impact (probably  
excruciating), how it would work with loose typing (`$a = 'pie';
$a->times();'), how much work is required (a lot) and who is actually
going to do the work (you), etc., and offer concrete advantages if it  
endeavors to be convincing. In my mind, metaphors about icebergs fail  
to do this.


Anyway, I laughed, but I wasn't sure if I was supposed to.

Evan

[1] http://www.ietf.org/rfc/rfc0822.txt?number=822
[2] http://cui.unige.ch/db-research/Enseignement/analyseinfo/AboutBNF.html


On Apr 18, 2006, at 6:25 AM, James Crane wrote:


I've written a short paper on the future of PHP and I'd appreciate it
if you folks would take a look at it and exchange your thoughts with
me.

http://www.maraby.com/papers/pushing_php_into_the_web_20_generation

Don't be too skeptical. ;)

Cheers!

M.T.

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php






Re: [PHP-DEV] 'odd' handling of $moo['bar'] where $moo is a string

2006-02-09 Thread Evan Priestley

When a string is used as a string offset, it is cast to an integer:
(int)'file' == 0

When in a conditional expression, the string is cast to a boolean:
(bool)'file' == true

It isn't necessarily a fatal error; consider:
$keys[ '3' ]
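
A small sketch of the two casts and of the numeric-string case, assuming $keys holds the string from the report below. Behavior for non-numeric offsets such as 'file' has varied between PHP versions, so the sketch leaves that case out and only exercises the casts and the '3' offset.

```php
<?php
// Same value as in the quoted var_dump() output.
$keys = 'myConsoleController';

var_dump((int) 'file');   // int(0): a non-numeric string casts to integer 0
var_dump((bool) 'file');  // bool(true): a non-empty string other than '0' is true
var_dump($keys['3']);     // string(1) "o": the numeric string '3' acts as offset 3
```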

- Evan


On Feb 9, 2006, at 4:54 PM, Ian P. Christian wrote:


Can someone explain the following to me?

echo "DEBUG:\n";
var_dump($keys);
var_dump($keys['file']);
var_dump(isset($keys['NOTEXISTING']));
var_dump(isset($keys['file']));
exit;


DEBUG:
string(19) "myConsoleController"
string(1) "m"
bool(true)
bool(true)


Now, it seems that $keys['file'] == $keys[0], which explains why the
issets return true. However...
if ('file') echo 'true';
will print true, therefore 'file' == 1, not 0. Why is this different
when using it as a string offset?
IMO, when a string is used as a string offset, a fatal error should be
raised.


Kind Regards,

--
Ian P. Christian
http://pookey.co.uk
