Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)

2024-04-01 Thread Alexander Pravdin
On Tue, Apr 2, 2024 at 9:18 AM Ilija Tovilo wrote:
>
> Hi everyone!
>
> I'd like to introduce an idea I've played around with for a couple of
> weeks: Data classes, sometimes called structs in other languages (e.g.
> Swift and C#).
>
> ```php
> data class Vector {
>     private $values;
>
>     public function __construct(...$values) {
>         $this->values = $values;
>     }
>
>     public mutating function append($value) {
>         $this->values[] = $value;
>     }
> }
>
> $a = new Vector(1, 2, 3);
> $b = $a;
> $b->append!(4);
> var_dump($a); // Vector(1, 2, 3)
> var_dump($b); // Vector(1, 2, 3, 4)
> ```
>


While I like the idea, I would like to suggest something else in
addition or as a separate feature. As an active user of readonly
classes with all promoted properties for data-holding purposes, I
would be happy to see the possibility of cloning them with passing
some properties to modify:

readonly class Data {
    function __construct(
        public string $foo,
        public string $bar,
        public string $baz,
    ) {}
}

$data = new Data(foo: 'A', bar: 'B', baz: 'C');

$data2 = clone $data with (bar: 'X', baz: 'Y');

Under the hood, this "clone" would copy the values of all promoted
properties as is, except for the ones the user overrides. Implementing
this in userland destroys the beauty of readonly classes with promoted
properties: a manual implementation takes many lines of code that tell
the reader nothing. The cloning methods end up bigger than the
meaningful part of the class (the constructor with its property
declarations), because I have to redeclare all the properties as
method arguments and then initialize each property with the
corresponding value. I love readonly classes with promoted properties
for data-holding purposes, and the above feature is the only one I'm
missing to be completely happy.
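
As a rough sketch of the userland workaround described above, assuming
PHP 8.2+ for `readonly class`: the `with()` method name and its
"null means keep the old value" convention are illustrative assumptions,
not an existing API. Note how the method has to repeat every property
already declared in the constructor.

```php
<?php
// Hypothetical userland stand-in for "clone ... with (...)".
readonly class Data {
    public function __construct(
        public string $foo,
        public string $bar,
        public string $baz,
    ) {}

    // Every property is redeclared as an argument and re-passed by hand.
    public function with(
        ?string $foo = null,
        ?string $bar = null,
        ?string $baz = null,
    ): static {
        return new static(
            $foo ?? $this->foo,
            $bar ?? $this->bar,
            $baz ?? $this->baz,
        );
    }
}

$data = new Data(foo: 'A', bar: 'B', baz: 'C');
$data2 = $data->with(bar: 'X', baz: 'Y');

var_dump($data2->foo, $data2->bar, $data2->baz);
```

This convention only works while no property is nullable; a real
implementation would need a different way to say "unchanged".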

In my personal experience, I have never needed data classes to be
copied the way arrays are; immutability protects against unwanted
changes well enough. Copying references, on the other hand, helps to
save memory, and some datasets I work with can be very big.

--
Best,
Alex


Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)

2024-04-01 Thread Deleu
On Mon, Apr 1, 2024 at 9:20 PM Ilija Tovilo wrote:

> Hi everyone!
>
> I'd like to introduce an idea I've played around with for a couple of
> weeks: Data classes, sometimes called structs in other languages (e.g.
> Swift and C#).
>
> In a nutshell, data classes are classes with value semantics.
> Instances of data classes are implicitly copied when assigned to a
> variable, or when passed to a function. When the new instance is
> modified, the original instance remains untouched. This might sound
> familiar: It's exactly how arrays work in PHP.
>
> ```php
> $a = [1, 2, 3];
> $b = $a;
> $b[] = 4;
> var_dump($a); // [1, 2, 3]
> var_dump($b); // [1, 2, 3, 4]
> ```
>
> You may think that copying the array on each assignment is expensive,
> and you would be right. PHP uses a trick called copy-on-write, or CoW
> for short. `$a` and `$b` actually share the same array until `$b[] =
> 4;` modifies it. It's only at this point that the array is copied and
> replaced in `$b`, so that the modification doesn't affect `$a`. As
> long as a variable is the sole owner of a value, or none of the
> variables modify the value, no copy is needed. Data classes use the
> same mechanism.
>
> But why value semantics in the first place? There are two major flaws
> with by-reference semantics for data structures:
>
> 1. It's very easy to forget cloning data that is referenced somewhere
> else before modifying it. This will lead to "spooky actions at a
> distance". Having recently used JavaScript (where all data structures
> have by-reference semantics) for an educational IR optimizer,
> accidental mutations of shared arrays/maps/sets were my primary source
> of bugs.
> 2. Defensive cloning (to avoid issue 1) will lead to useless work when
> the value is not referenced anywhere else.
>
> PHP offers readonly properties and classes to address issue 1.
> However, they further promote issue 2 by making it impossible to
> modify values without cloning them first, even if we know they are not
> referenced anywhere else. Some APIs further exacerbate the issue by
> requiring multiple copies for multiple modifications (e.g.
> `$response->withStatus(200)->withHeader('X-foo', 'foo');`).
>
> As you may have noticed, arrays already solve both of these issues
> through CoW. Data classes allow implementing arbitrary data structures
> with the same value semantics in core, extensions or userland. For
> example, a `Vector` data class may look something like the following:
>
> ```php
> data class Vector {
>     private $values;
>
>     public function __construct(...$values) {
>         $this->values = $values;
>     }
>
>     public mutating function append($value) {
>         $this->values[] = $value;
>     }
> }
>
> $a = new Vector(1, 2, 3);
> $b = $a;
> $b->append!(4);
> var_dump($a); // Vector(1, 2, 3)
> var_dump($b); // Vector(1, 2, 3, 4)
> ```
>
> An internal Vector implementation might offer a faster and stricter
> alternative to arrays (e.g. Vector from php-ds).
>
>
Exciting times to be a PHP Developer!


> Some other things to note about data classes:
>
> * Data classes are ordinary classes, and as such may implement
> interfaces, methods and more. I have not decided whether they should
> support inheritance.
>

I'd argue in favor of not including inheritance in the first version.
Taking inheritance out later would be an impossible BC break. Not
introducing it in the first stable release gives users a chance to evaluate
whether it's something we will drastically miss.


> * Mutating method calls on data classes use a slightly different
> syntax: `$vector->append!(42)`. All methods mutating `$this` must be
> marked as `mutating`. The reason for this is twofold: 1. It signals to
> the caller that the value is modified. 2. It allows `$vector` to be
> cloned before knowing whether the method `append` is modifying, which
> hugely reduces implementation complexity in the engine.
>

I'm not sure if I understood this one. Do you mean that the `!` modifier
here (at call-site) is helping the engine clone the variable before even
diving into whether `append()` has been tagged as mutating? From outside it
looks odd that a clone would happen ahead-of-time while talking about
copy-on-write. Would this syntax break for non-mutating methods?


> * Data classes customize identity (`===`) comparison, in the same way
> arrays do. Two data objects are identical if all their properties are
> identical (including order for dynamic properties).
> * Sharing data classes by-reference is possible using references, as
> you would for arrays.
> * We may decide to auto-implement `__toString` for data classes,
> amongst other things. I am still undecided whether this is useful for
> PHP.
> * Data classes protect from interior mutability. More concretely,
> mutating nested data objects stored in a `readonly` property is not
> legal, whereas it would be if they were ordinary objects.
> * In the future, it should be possible to allow using data classes in
> `SplObjectStorage`. 

[PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)

2024-04-01 Thread Ilija Tovilo
Hi everyone!

I'd like to introduce an idea I've played around with for a couple of
weeks: Data classes, sometimes called structs in other languages (e.g.
Swift and C#).

In a nutshell, data classes are classes with value semantics.
Instances of data classes are implicitly copied when assigned to a
variable, or when passed to a function. When the new instance is
modified, the original instance remains untouched. This might sound
familiar: It's exactly how arrays work in PHP.

```php
$a = [1, 2, 3];
$b = $a;
$b[] = 4;
var_dump($a); // [1, 2, 3]
var_dump($b); // [1, 2, 3, 4]
```

You may think that copying the array on each assignment is expensive,
and you would be right. PHP uses a trick called copy-on-write, or CoW
for short. `$a` and `$b` actually share the same array until `$b[] =
4;` modifies it. It's only at this point that the array is copied and
replaced in `$b`, so that the modification doesn't affect `$a`. As
long as a variable is the sole owner of a value, or none of the
variables modify the value, no copy is needed. Data classes use the
same mechanism.
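
A minimal sketch of how copy-on-write can be observed for arrays in
current PHP. The `measureCow()` helper is a hypothetical name, and the
exact byte counts depend on the PHP version and platform, so only the
comparison between the two numbers is meaningful.

```php
<?php
// Compare memory growth on assignment vs. on the first write.
function measureCow(): array {
    $a = range(1, 100000);

    $start = memory_get_usage();
    $b = $a;                        // no copy yet: $a and $b share one array
    $afterAssign = memory_get_usage();
    $b[] = 100001;                  // first write: the array is separated here
    $afterWrite = memory_get_usage();

    return [
        'assign' => $afterAssign - $start,
        'write'  => $afterWrite - $afterAssign,
        'a'      => count($a),      // the original is untouched
        'b'      => count($b),
    ];
}

$cost = measureCow();
printf("assign: %d bytes, first write: %d bytes\n", $cost['assign'], $cost['write']);
printf("count(a) = %d, count(b) = %d\n", $cost['a'], $cost['b']);
```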

But why value semantics in the first place? There are two major flaws
with by-reference semantics for data structures:

1. It's very easy to forget cloning data that is referenced somewhere
else before modifying it. This will lead to "spooky actions at a
distance". Having recently used JavaScript (where all data structures
have by-reference semantics) for an educational IR optimizer,
accidental mutations of shared arrays/maps/sets were my primary source
of bugs.
2. Defensive cloning (to avoid issue 1) will lead to useless work when
the value is not referenced anywhere else.

PHP offers readonly properties and classes to address issue 1.
However, they further promote issue 2 by making it impossible to
modify values without cloning them first, even if we know they are not
referenced anywhere else. Some APIs further exacerbate the issue by
requiring multiple copies for multiple modifications (e.g.
`$response->withStatus(200)->withHeader('X-foo', 'foo');`).
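
To make the "multiple copies" point concrete, here is a sketch of an
immutable response in the style of such APIs, assuming PHP 8.1+. The
`Response` class and its `$copies` counter are illustrative assumptions,
not taken from any real library.

```php
<?php
// Hypothetical immutable value object with "wither" methods.
final class Response {
    public static int $copies = 0;  // counts objects created by withers

    public function __construct(
        public readonly int $status = 200,
        public readonly array $headers = [],
    ) {}

    public function withStatus(int $status): self {
        self::$copies++;            // every modification allocates a new object
        return new self($status, $this->headers);
    }

    public function withHeader(string $name, string $value): self {
        self::$copies++;
        $headers = $this->headers;
        $headers[$name] = $value;
        return new self($this->status, $headers);
    }
}

$response = (new Response())
    ->withStatus(404)
    ->withHeader('X-foo', 'foo');

echo Response::$copies, "\n"; // 2
```

The intermediate object produced by `withStatus()` is discarded
immediately, even though nothing else ever referenced it.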

As you may have noticed, arrays already solve both of these issues
through CoW. Data classes allow implementing arbitrary data structures
with the same value semantics in core, extensions or userland. For
example, a `Vector` data class may look something like the following:

```php
data class Vector {
    private $values;

    public function __construct(...$values) {
        $this->values = $values;
    }

    public mutating function append($value) {
        $this->values[] = $value;
    }
}

$a = new Vector(1, 2, 3);
$b = $a;
$b->append!(4);
var_dump($a); // Vector(1, 2, 3)
var_dump($b); // Vector(1, 2, 3, 4)
```

An internal Vector implementation might offer a faster and stricter
alternative to arrays (e.g. Vector from php-ds).

Some other things to note about data classes:

* Data classes are ordinary classes, and as such may implement
interfaces, methods and more. I have not decided whether they should
support inheritance.
* Mutating method calls on data classes use a slightly different
syntax: `$vector->append!(42)`. All methods mutating `$this` must be
marked as `mutating`. The reason for this is twofold: 1. It signals to
the caller that the value is modified. 2. It allows `$vector` to be
cloned before knowing whether the method `append` is modifying, which
hugely reduces implementation complexity in the engine.
* Data classes customize identity (`===`) comparison, in the same way
arrays do. Two data objects are identical if all their properties are
identical (including order for dynamic properties).
* Sharing data classes by-reference is possible using references, as
you would for arrays.
* We may decide to auto-implement `__toString` for data classes,
amongst other things. I am still undecided whether this is useful for
PHP.
* Data classes protect from interior mutability. More concretely,
mutating nested data objects stored in a `readonly` property is not
legal, whereas it would be if they were ordinary objects.
* In the future, it should be possible to allow using data classes in
`SplObjectStorage`. However, because hashing is complex, this will be
postponed to a separate RFC.
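
For contrast with the identity and interior-mutability points above,
this sketch shows how arrays and ordinary objects behave in current PHP
(8.1+ for `readonly`). `Point` and `Holder` are illustrative names; none
of this uses the proposed `data class` syntax.

```php
<?php
final class Point {
    public function __construct(public int $x = 0, public int $y = 0) {}
}

// Arrays: identity compares contents.
var_dump([1, 2] === [1, 2]);                    // bool(true)

// Ordinary objects: == compares properties, === compares the instance.
var_dump(new Point(1, 2) == new Point(1, 2));   // bool(true)
var_dump(new Point(1, 2) === new Point(1, 2));  // bool(false)

final class Holder {
    public function __construct(public readonly Point $point) {}
}

// Interior mutability: readonly protects the property slot,
// not the ordinary object stored in it.
$holder = new Holder(new Point(1, 2));
$holder->point->x = 99;
var_dump($holder->point->x);                    // int(99)
```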

One known gotcha is that we cannot trivially enforce placement of
`mutating` on methods without a performance hit. It is the
responsibility of the user to correctly mark such methods.

Here's a fully functional PoC, excluding JIT:
https://github.com/php/php-src/pull/13800

Let me know what you think. I will start working on an RFC draft once
work on property hooks concludes.

Ilija


Re: [PHP-DEV] [RFC] Invoke __callStatic when non-static public methods are called statically

2024-04-01 Thread Rowan Tommins [IMSoP]

On 29/03/2024 18:14, Robert Landers wrote:

> When generating proxies for existing types, you often need to share
> some state between the proxies. To do that, you put static
> methods/properties on the proxy class and hope to the PHP Gods that
> nobody will ever accidentally name something in their concrete class
> with the name you chose for things. To help with that, you create some
> kind of insane prefix.



Separating static and non-static methods wouldn't solve this - the 
concrete class could equally add a static method with the same name but 
a different signature, and your generated proxy would fail to compile.


In fact, exactly the same thing happens with instance methods in testing 
libraries: test doubles have a mixture of methods for configuring mock / 
spy behaviour, and methods mimicking or forwarding calls to the real 
interface / class. Those names could collide, and require awkward 
workarounds.


In a statically typed language, a concrete class can have two methods 
with the same name, but different static types, e.g. when explicitly 
implementing interfaces. In a "duck typing" system like PHP's, that's 
much trickier, because a call to $foo->bar() doesn't have a natural way 
to choose which "bar" is meant.




> I'd much rather see static and non-static methods being able to
> have the same name


Allowing this would lead to ambiguous calls, because as others have 
pointed out, :: doesn't always denote a static call. Consider this code:


class Test {
  public function test() { echo 'instance test'; }
  public static function test() { echo 'static test'; }
}

class Test2 extends Test {
  public function runTest() { parent::test(); }
}

(new Test2)->runTest();

Currently, this can call either of the test() methods if you comment the 
other out: https://3v4l.org/5HlPE https://3v4l.org/LBALm


If both are defined, which should it call? And if you wanted the other, 
how would you specify that? We would need some new syntax to remove the 
ambiguity.



Regards,

--
Rowan Tommins
[IMSoP]


Re: [PHP-DEV] [RFC] Invoke __callStatic when non-static public methods are called statically

2024-04-01 Thread Rowan Tommins [IMSoP]

On 29/03/2024 02:39, 하늘아부지 wrote:

> I created a wiki for __callStatic related issues.
> Please see:
> https://wiki.php.net/rfc/complete_callstatc_magic



Hi,

Several times in the discussion you have said (in different words) 
"__callStatic is called for instance methods which are private or 
protected", but that is not how it is generally interpreted.


If you are calling a method from outside the class, as far as you're 
concerned only public methods exist; private methods are, by definition, 
hidden implementation details. This is more obvious in languages with 
static typing, where if you have an instance of some interface, only the 
methods on that interface exist; the concrete object might actually have 
other methods, but you can't access them.


That is what is meant by "inaccessible": __call and __callStatic are 
called for methods which, as seen from the current scope, *do not exist*.
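
A small sketch of that scope-dependent view in current PHP. The
`Account` class and its method names are made up for illustration.

```php
<?php
class Account {
    private function audit(): string {
        return 'real private method';
    }

    public function __call(string $name, array $args): string {
        return "__call saw '$name'";
    }

    public function runAudit(): string {
        return $this->audit();   // inside the class, audit() exists
    }
}

$account = new Account();
echo $account->audit(), "\n";    // __call saw 'audit'
echo $account->runAudit(), "\n"; // real private method
```

From outside the class, `audit()` is treated exactly like a method that
was never declared.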



You could still argue that static context is like a different scope, or 
a different statically typed interface - as far as that context is 
concerned, only static methods exist. But that's also not a common 
interpretation, for (at least) two reasons:


Firstly, there is no syntax in PHP which specifically marks a static 
call - Foo::bar() is used for both static calls, and for forwarding 
instance calls, most obviously in the case of parent::foo().
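
The following sketch shows the same `::` syntax reaching different magic
methods depending on context. `Base`, `Child` and `missing()` are
illustrative names only.

```php
<?php
class Base {
    public function __call(string $name, array $args): string {
        return "__call:$name";
    }

    public static function __callStatic(string $name, array $args): string {
        return "__callStatic:$name";
    }
}

class Child extends Base {
    public function forward(): string {
        // Looks static, but $this is available, so it is an instance call.
        return parent::missing();
    }
}

echo Base::missing(), "\n";           // __callStatic:missing
echo (new Child())->forward(), "\n";  // __call:missing
```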


Secondly, until PHP 8, marking a method as static was optional; an error 
was only raised once you tried to access $this in a context where it 
wasn't defined. In PHP 4, this was correct code; in PHP 5 and 7, it 
raised diagnostics (first E_STRICT, later E_DEPRECATED) but still ran 
the method:


class Foo {
    function bar() {
        echo 'Hello, World!';
    }
}
Foo::bar();


I think that's part of the reason you're getting negative feedback: to 
you, the feature seems like an obvious extension, even a bug fix; but to 
others, it seems like a complete change to how static calls are interpreted.


Regards,

--
Rowan Tommins
[IMSoP]


Re: [PHP-DEV] Consider removing autogenerated files from tarballs

2024-04-01 Thread Robert Landers
On Mon, Apr 1, 2024 at 1:53 AM Ben Ramsey wrote:
>
> > On Mar 31, 2024, at 11:08, Robert Landers wrote:
> >
> > There are probably multiple parties that require trust: the people
> > hosting the CI servers, the people with access to the CI servers, the
> > RM, and maybe more that I can't think of right now.
> >
> > One option would be to have
> >
> > - CI push the code + generated files to a git-branch `php-8.3-built`
> > (or something) so that changes can be reviewed, along with the
> > tarball.
> > - CI signs the commit and tarball.
> > - RM checks out commit and, also signs the tarball, then does a git
> > commit --amend --signoff and "blesses" the commit
> > - RM releases tarball
>
>
> When I was considering this and created a PR that followed these steps, I 
> discussed the process with folks from other open source communities, notably 
> the Apache Software Foundation community, since some of their projects follow 
> similar processes. The notion of automating the build and signing it on a 
> remote machine, only to be inspected and signed again on the release 
> manager’s machine was outright rejected by everyone. The machine where it is 
> signed by the RM should be the machine where it is built, according to 
> everyone I spoke with.
>
> As it stands right now, if we build the tarball on a remote machine (in CI), 
> and then the RM wants to compare it and build it locally, the hashes on those 
> tarballs will be different because we can’t guarantee reproducible builds. If 
> we could guarantee reproducible builds, then maybe this process could work, 
> but it would still require the RM to build it locally from the source tag in 
> order to trust and verify that nothing sneaked in on the CI machine.
>
> Cheers,
> Ben
>

I think the big point is to store the generated files in git for CI
builds. To verify that the tarball matches that commit, check out the
branch and untar the file over it: git should report no changes, git
clean should remove no files, and so on. This would make injected
malicious code visible, at the very least. Whether someone actually
reviews the generated files and catches it is a different question.
But if we wanted something that is better than nothing... it's a
pretty simple solution.

Reproducible builds are an orthogonal but related problem.