Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)

2024-04-06 Thread Ilija Tovilo
Hi Rowan

On Fri, Apr 5, 2024 at 12:28 AM Rowan Tommins [IMSoP] wrote:
>
> On 03/04/2024 00:01, Ilija Tovilo wrote:
>
> Regardless of the implementation, there are a lot of interactions we will 
> want to consider; and we will have to keep considering new ones as we add to 
> the language. For instance, the Property Hooks RFC would probably have needed 
> a section on "Interaction with Data Classes".

That remark was implying that data classes really are just classes
with some additional tweaks. That gives us the ability to handle them
differently when desired. However, they will otherwise behave just
like classes, which makes it not so different from your suggestion.

> On a practical note, a few things I've already thought of to consider:
>
> - Can a data class have readonly properties (or be marked "readonly data 
> class")? If so, how will they behave?

Yes. The CoW semantics become irrelevant, given that nothing may
trigger a separation. However, data classes also include value
equality, and hashing in the future. These may still be useful for
immutable data.

> - Can you explicitly use the "clone" keyword with an instance of a data 
> class? Does it make any difference?

Manual cloning is not useful, but it's also not harmful. So I'm
leaning towards allowing this. This way, data classes may be handled
generically, along with other non-data classes.

> - Tied into that: can you implement __clone(), and when will it be called?

Yes. `__clone` will be called when the object is separated, as you would expect.

> - If you implement __set(), will copy-on-write be triggered before it's 
> called?

Yes. Separation happens as part of the property fetching, rather than
the assignment itself. Hence, for `$foo->bar->baz = 'baz';`, once
`Bar::__set('baz', 'baz')` is called, `$foo` and `$foo->bar` will
already have been separated.

> - Can you implement __destruct()? Will it ever be called?

Yes. As with any other object, this will be called once the last
reference to the object goes away. There's nothing special going on.

It's worth noting that CoW makes `__clone` and `__destruct` somewhat
nondeterministic, or at least non-obvious.
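
As a baseline for comparison, here is a minimal sketch of how `__clone` and `__destruct` behave for ordinary objects in current PHP. The `Tracked` class is hypothetical and exists only to log lifecycle calls. With data classes, an implicit separation would presumably add further `clone` entries at points that are not visible in the source.

```php
<?php
// Hypothetical class, used only to observe lifecycle calls.
final class Tracked
{
    public static array $log = [];

    public function __clone()
    {
        self::$log[] = 'clone';
    }

    public function __destruct()
    {
        self::$log[] = 'destruct';
    }
}

$a = new Tracked();
$b = $a;        // Second handle to the same object; nothing is logged.
unset($a);      // One reference remains, so no destructor runs yet.
$c = clone $b;  // Explicit clone: __clone runs on the copy.
unset($b);      // Last reference to the original: __destruct runs.
unset($c);      // Last reference to the copy: __destruct runs again.
```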

> > Consider this example, which would work with the current approach:
> >
> > $shapes[0]->position->zero!();
>
> I find this concise example confusing, and I think there's a few things to 
> unpack here...

I think you're putting too much focus on CoW. CoW should really be
considered an implementation detail. It's not _fully_ transparent,
given that it is observable through `__clone` and `__destruct` as
mentioned above. But it is _mostly_ transparent.

Conceptually, the copy happens not when the method is called, but when
the variable is assigned. For your example:

```php
$shape = new Shape(new Position(42,42));
$copy = $shape; // Conceptually, a recursive copy happens here.
$copy->position->zero!(); // $shape is already detached from $copy.
// The ! merely indicates that the value is modified.
```

> The array access doesn't need any special marker, because there's no 
> ambiguity.

This is only true if you ignore ArrayAccess. `$foo['bar']` does not
necessarily indicate that `$foo` is an array. If it were a `Vector`,
then we would absolutely need an indication to separate it.
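
A minimal sketch of that ambiguity in current PHP, using the built-in `ArrayObject` as a stand-in for a reference-type collection (the proposed `Vector` does not exist yet):

```php
<?php
$array = [1, 2, 3];
$arrayCopy = $array;
$arrayCopy[] = 4;        // Separates: $array keeps three elements.

$object = new ArrayObject([1, 2, 3]);
$objectAlias = $object;  // Copies the handle, not the contents.
$objectAlias[] = 4;      // Same syntax, but $object sees the change.
```

The two writes look identical at the call site, yet only the first one leaves the original untouched.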

It's true that `$foo->bar` currently indicates that `$foo` is a
reference type. This assumption would break with this RFC, but that's
also kind of the whole point.

> What is going to be CoW cloned, and what is going to be modified in place? I 
> can't actually know without knowing the definition behind both $item and 
> $item->shape. It might even vary depending on input.

For the most part, data classes should consist of other value types,
or immutable reference types (e.g. DateTimeImmutable). This actually
makes the rules quite simple: If you assign a value type, the entire
data structure is copied recursively. The fact that PHP delays this
step for performance is unimportant. The fact that immutable reference
types aren't cloned is also unimportant, given that they don't change.
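
Arrays already follow these rules today, so they can serve as a rough model. The following is only a sketch with made-up keys and values, not data class code:

```php
<?php
$event = [
    'title' => 'Release',
    'when'  => new DateTimeImmutable('2024-04-06'),
    'tags'  => ['php', 'rfc'],
];

$copy = $event;              // Conceptually a full recursive copy.
$copy['tags'][] = 'draft';   // Only $copy changes.

// The immutable object is shared rather than cloned...
$sameInstance = $copy['when'] === $event['when'];

// ...which is harmless, because "modifying" it yields a new instance.
$copy['when'] = $copy['when']->modify('+1 day');
```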

Ilija


Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)

2024-04-04 Thread Rowan Tommins [IMSoP]

On 03/04/2024 00:01, Ilija Tovilo wrote:
> Data classes are classes with a single additional
> zend_class_entry.ce_flags flag. So unless customized, they behave as
> classes. This way, we have the option to tweak any behavior we would
> like, but we don't need to.
>
> Of course, this will still require an analysis of what behavior we
> might want to tweak.


Regardless of the implementation, there are a lot of interactions we 
will want to consider; and we will have to keep considering new ones as 
we add to the language. For instance, the Property Hooks RFC would 
probably have needed a section on "Interaction with Data Classes".


On the other hand, maybe having two types of objects to consider each 
time is better than having to consider combinations of lots of small 
features.



On a practical note, a few things I've already thought of to consider:

- Can a data class have readonly properties (or be marked "readonly data 
class")? If so, how will they behave?
- Can you explicitly use the "clone" keyword with an instance of a data 
class? Does it make any difference?

- Tied into that: can you implement __clone(), and when will it be called?
- If you implement __set(), will copy-on-write be triggered before it's 
called?

- Can you implement __destruct()? Will it ever be called?




> Consider this example, which would work with the current approach:
>
> $shapes[0]->position->zero!();


I find this concise example confusing, and I think there's a few things 
to unpack here...



Firstly, there's putting a data object in an array:

$numbers = [ new Number(42) ];
$cow = $numbers;
$cow[0]->increment!();
assert($numbers !== $cow);

This is fairly clearly equivalent to this:

$numbers = [ 42 ];
$cow = $numbers;
$cow[0]++;
assert($numbers !== $cow);

CoW is triggered on the array for both, because ++ and ->increment!() 
are both clearly modifications.



Second, there's putting a data object into another data object:

$shape = new Shape(new Position(42,42));
$cow = $shape;
$cow->position->zero!();
assert($shape !== $cow);

This is slightly less obvious, because it presumably depends on the 
definition of Shape. Assuming Position is a data class:


- If Shape is a normal class, changing the value of $cow->position just 
happens in place, and the assertion fails


- If Shape is a readonly class (or position is a readonly property on a 
normal class), changing the value of $cow->position shouldn't be 
allowed, so this will presumably give an error


- If Shape is a data class, changing the value of $shape->position 
implies a "mutation" of $shape itself, so we get a separation before 
anything is modified, and the assertion passes


Unlike in the array case, this behaviour can't be resolved until you 
know the run-time type of $shape.



Now, back to your example:

$shapes = [ new Shape(new Position(42,42)) ];
$cow = $shapes;
$shapes[0]->position->zero!();
assert($cow !== $shapes);

This combines the two, meaning that now we can't know whether to 
separate the array until we know (at run-time) whether Shape is a normal 
class or a data class.


But once that is known, the whole of "->position->zero!()" is a 
modification to $shapes[0], so we need to separate $shapes.




> Without such a class-wide marker, you'll need to remember to add the
> special syntax exactly where applicable.
>
> $shapes![0]!->position!->zero();



The array access doesn't need any special marker, because there's no 
ambiguity. The ambiguous call is the reference to ->position: in your 
current proposal, this represents a modification *if Shape is a data 
class, and is itself being modified*. My suggestion (or really, thought 
experiment) was that it would represent a modification *if it has a ! in 
the call*.


So if Shape is a readonly class:

$shapes[0]->position->!zero();
// Error: attempting to modify readonly property Shape::$position

$shapes[0]->!position->!zero();
// OK; an optimised version of:
$shapes[0] = clone $shapes[0] with [
    'position' =>  (clone $shapes[0]->position with ['x'=>0,'y'=>0])
];

If ->! is only allowed if the RHS is either a readonly property or a 
mutating method, then this can be reasoned about statically: it will 
either error, or cause a CoW separation of $shapes. It also allows 
classes to mix aspects of "data class" and "normal class" behaviour, 
which might or might not be a good idea.



This is mostly just a thought experiment, but I am a bit concerned that 
code like this is going to be confusingly ambiguous:


$item->shape->position->zero!();

What is going to be CoW cloned, and what is going to be modified in 
place? I can't actually know without knowing the definition behind both 
$item and $item->shape. It might even vary depending on input.



Regards,

--
Rowan Tommins
[IMSoP]


Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)

2024-04-04 Thread Kévin Dunglas
Data classes will be a very useful addition to "API Platform".

API Platform is a "resource-oriented" framework that strongly encourages
the use of "data-only" classes:
we use PHP classes both as a specification language to document the public
shape of web APIs (like an OpenAPI specification, but written in PHP
instead of JSON or YAML),
and as Data Transfer Objects containing the data to be serialized into JSON
(read), or the JSON payload deserialized into PHP objects (write).

Being able to encourage users to use structs (that's what we already call
this type of behavior-less class in our workshops) for these objects will
help us a lot.

Kévin


Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)

2024-04-03 Thread Ilija Tovilo
Hi Larry

On Wed, Apr 3, 2024 at 12:03 AM Larry Garfield  wrote:
>
> On Tue, Apr 2, 2024, at 6:04 PM, Ilija Tovilo wrote:
>
> > I think you misunderstood. The intention is to mark both call-site and
> > declaration. Call-site is marked with ->method!(), while declaration
> > is marked with "public mutating function". Call-site is required to
> > avoid the engine complexity, as previously mentioned. But
> > declaration-site is required so that the user (and IDEs) even know
> > that you need to use the special syntax at the call-site.
>
> Ah, OK.  That's... unfortunate, but I defer to you on the implementation 
> complexity.

As I've argued, I believe the different syntax is a positive. This
way, data classes are known to stay unmodified unless:

1. You're explicitly modifying it yourself.
2. You're calling a mutating method, with its associated syntax.
3. You're creating a reference from the value, either explicitly or by
passing it to a by-reference parameter.

By-reference argument passing is the only way that mutations of data
classes can be hidden (given that they look exactly like normal
by-value arguments), and it's arguably a flaw of by-reference passing
itself. In all other cases, you can expect your value _not_ to
unexpectedly change. For this reason, I consider it as an alternative
approach to readonly classes.
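
The by-reference caveat already applies to arrays in current PHP. A small sketch (the function names are made up for illustration):

```php
<?php
function normalise(array $items): array
{
    sort($items);   // Sorts the local copy only.
    return $items;
}

function normaliseInPlace(array &$items): void
{
    sort($items);   // Sorts the caller's array.
}

$original = [3, 1, 2];
$sorted = normalise($original);
$afterByValue = $original;     // Still [3, 1, 2].

normaliseInPlace($original);   // Same call syntax, but $original is now sorted.
```

Nothing at either call site reveals which of the two functions may modify `$original`. Note that `sort()` itself takes its argument by reference.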

> > Disallowing ordinary by-ref objects is not trivial without additional
> > performance penalties, and I don't see a good reason for it. Can you
> > provide an example on when that would be problematic?
>
> There's two aspects to it, that I see.
>
> data class A {
>   public function __construct(public string $name) {}
> }
>
> data class B {
>   public function __construct(
> public A $a,
> public PDO $conn,
>   ) {}
> }
>
> $b = new B(new A(), $pdoConnection);
>
> function stuff(B $b2) {
>   $b2->a->name = 'Larry';
>   // This triggers a CoW on $b2, separating it from $b, and also creating a
>   // new instance of A.  What about $conn?
>   // Does it get cloned?  That would be bad.  Does it not get cloned?  That
>   // seems weird that it's still the same on a data object.
>
>   $b2->conn->beginTransaction();
>   // This I would say is technically a modification, since the state of the
>   // connection is changing.  But then should this trigger $b2 cloning from
>   // $b1?  Neither answer is obvious to me.
> }

IMO, the answer is relatively straight-forward: PDO is a reference
type. For all intents and purposes, when you're passing B to stuff(),
B is copied. Since B::$conn is a "reference" (read pointer), copying B
doesn't copy the connection, only the reference to it. B::$a, however,
is a value type, so copying B also copies A. The fact that this isn't
_exactly_ what happens under the hood due to CoW is an implementation
detail, it doesn't need to change how you think about it. From the
user's standpoint, $b and $b2 are already separate values once stuff()
is called.

This is really no different from arrays:

```php
$b = ['a' => ['name' => 'Larry'], 'conn' => $pdoConnection];
$b2 = $b; // $b is detached from $b2, $b['conn'] remains a shared object.
```
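
A runnable version of the same idea, with a plain `stdClass` standing in for the PDO connection (an assumption, to avoid needing a database):

```php
<?php
$conn = new stdClass();             // Stand-in for a PDO connection.
$conn->inTransaction = false;

$b = ['a' => ['name' => 'Larry'], 'conn' => $conn];
$b2 = $b;

$b2['a']['name'] = 'Ilija';         // Separates the arrays; $b is untouched.
$b2['conn']->inTransaction = true;  // Mutates the one shared object.
```

After this, `$b['a']['name']` is still `'Larry'`, while the change to the connection stand-in is visible through both `$b` and `$b2`.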

> The other aspect is, eg, serialization.  People will come to expect 
> (reasonably) that a data class will have certain properties (in the abstract 
> sense, not lexical sense).  For instance, most classes are serializable, but 
> a few are not.  (Eg, if they have a reference to PDO or a file handle or 
> something unserializable.)  Data classes seem like they should be safe to 
> serialize always, as they're "just data".  If data classes are limited to 
> primitives and data classes internally, that means we can effectively 
> guarantee that they will be serializable, always.  If one of the properties 
> could be a non-serializable object, that assumption breaks.

I'm not sure that's a convincing argument to fully disallow reference
types, especially since it would prevent you from storing
DateTimeImmutables and other immutable values in data classes and thus
break many valid use-cases. That would arguably be very limiting.

> There's probably other similar examples besides serialization where "think of 
> this as data" and "think of this as logic" is how you'd want to think, which 
> leads to different assumptions, which we shouldn't stealthily break.

I think your assumption here is that non-data classes cannot contain
data. This doesn't hold, and especially will not until data classes
become more common. Readonly classes can be considered strict versions
of data classes in terms of mutability, minus some of the other
semantic changes (e.g. identity).

Ilija


Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)

2024-04-02 Thread Ilija Tovilo
Hi Rowan

On Tue, Apr 2, 2024 at 10:10 PM Rowan Tommins [IMSoP] wrote:
>
> On 02/04/2024 01:17, Ilija Tovilo wrote:
>
> I'd like to introduce an idea I've played around with for a couple of
> weeks: Data classes, sometimes called structs in other languages (e.g.
> Swift and C#).
>
> I'm not sure if you've considered it already, but mutating methods should 
> probably be constrained to be void (or maybe "mutating" could occupy the 
> return type slot). Otherwise, someone is bound to write this:
>
> $start = new Location('Here');
> $end = $start->move!('There');
>
> Expecting it to mean this:
>
> $start = new Location('Here');
> $end = $start;
> $end->move!('There');
>
> When it would actually mean this:
>
> $start = new Location('Here');
> $start->move!('There');
> $end = $start;

I think there are some valid patterns for mutating methods with a
return value. For example, Set::add() might return a bool to indicate
whether the value was already present in the set.
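
A rough sketch of that pattern as an ordinary class in current PHP. The `IntSet` class is hypothetical, and under the proposal `add()` would presumably be declared `mutating` and called as `add!()`:

```php
<?php
// Hypothetical set of integers, used only to illustrate the return value.
final class IntSet
{
    /** @var array<int, true> */
    private array $items = [];

    // Adds the value and reports whether it was already present.
    public function add(int $value): bool
    {
        $alreadyPresent = isset($this->items[$value]);
        $this->items[$value] = true;
        return $alreadyPresent;
    }
}

$set = new IntSet();
$first = $set->add(42);   // false: 42 was not in the set yet.
$second = $set->add(42);  // true: 42 was already present.
```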

> I seem to remember when this was discussed before, the argument being made 
> that separating value objects completely means you have to spend time 
> deciding how they interact with every feature of the language.

Data classes are classes with a single additional
zend_class_entry.ce_flags flag. So unless customized, they behave as
classes. This way, we have the option to tweak any behavior we would
like, but we don't need to.

Of course, this will still require an analysis of what behavior we
might want to tweak.

> Does the copy-on-write optimisation actually require the entire class to be 
> special, or could it be triggered by a mutating method on any object? To 
> allow direct modification of properties as well, we could move the call-site 
> marker slightly to a ->! operator:
>
> $foo->!mutate();
> $foo->!bar = 42;

I suppose this is possible, but it puts the burden for figuring out
what to separate onto the user. Consider this example, which would
work with the current approach:

$shapes[0]->position->zero!();

The left-hand-side of the mutating method call is fetched by
"read+write". Essentially, this ensures that any array or data class
is separated (copied if RC >1).

Without such a class-wide marker, you'll need to remember to add the
special syntax exactly where applicable.

$shapes![0]!->position!->zero();

In this case, $shapes, $shapes[0], and $shapes[0]->position must all
be separated. This seems very easy to mess up, especially since only
zero() is actually known to be separating and can thus be verified at
runtime.

> The main drawback I can see (outside of the implementation, which I can't 
> comment on) is that we couldn't overload the === operator to use value 
> semantics. In exchange, a lot of decisions would simply be made for us: they 
> would just be objects, with all the same behaviour around inheritance, 
> serialization, and so on.

Right, this would either require some other marker that switches to
this mode of comparison, or operator overloading.

Ilija


Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)

2024-04-02 Thread Ilija Tovilo
Hi Niels

On Tue, Apr 2, 2024 at 8:16 PM Niels Dossche  wrote:
>
> On 02/04/2024 02:17, Ilija Tovilo wrote:
> > Hi everyone!
> >
> > I'd like to introduce an idea I've played around with for a couple of
> > weeks: Data classes, sometimes called structs in other languages (e.g.
> > Swift and C#).
>
> As already hinted in the thread, I also think inheritance may be dangerous in 
> a first version.
> I want to add to that: if you extend a data-class with a non-data-class, the 
> data-class behaviour gets lost, which is logical in a sense but also 
> surprised me in a way.

Yes, that's definitely not intended. I haven't implemented any
inheritance checks yet. But if inheritance is allowed, then it should
be restricted to classes of the same kind (by-ref or by-val).

> Also, FWIW, I'm not sure about the name "data" class, perhaps "value" class 
> or something alike is what people may be more familiar with wrt semantics, 
> although dataclass is also a known term.

I'm happy with value class, struct, record, data class, what have you.
I'll accept whatever the majority prefers.

> I do have a question about iterator behaviour. Consider this code:
> ```
> data class Test {
>     public $a = 1;
>     public $b = 2;
> }
>
> $test = new Test;
> foreach ($test as $k => &$v) {
>     if ($k === "b")
>         $test->a = $test;
>     var_dump($k);
> }
> ```
>
> This will reset the iterator of the object on separation, so we will get an 
> infinite loop.
> Is this intended?
> If so, is it because the right hand side is the original object while the 
> left hand side gets the clone?
> Is this consistent with how arrays separate?

That's a good question. I have not really thought about iterators yet.
Modification of an array iterated by-reference does not restart the
iterator. Actually, by-reference capturing of the value also captures
the array by-reference, which is not completely intuitive.

My initial gut feeling is to handle data classes the same, i.e.
capture them by-reference when iterating the value by reference, so
that iteration is not restarted.
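
For reference, this is how arrays behave in current PHP (7.0 and later). It is only a sketch of the existing array semantics, not of the data class prototype:

```php
<?php
// By-value iteration works on the array as it was when the loop started.
$values = [1, 2];
$seenByValue = [];
foreach ($values as $k => $v) {
    if ($k === 0) {
        $values[] = 3;       // Not visited by this loop.
    }
    $seenByValue[] = $v;
}

// By-reference iteration captures the array itself by reference:
// the append is observed, and the loop is not restarted.
$values = [1, 2];
$seenByRef = [];
foreach ($values as $k => &$v) {
    if ($k === 0) {
        $values[] = 3;       // Visited later in the same loop.
    }
    $seenByRef[] = $v;
}
unset($v);                   // Break the lingering reference.
```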

Ilija


Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)

2024-04-02 Thread Larry Garfield
On Tue, Apr 2, 2024, at 6:04 PM, Ilija Tovilo wrote:

>> What would be the reason not to?  As you indicated in another reply, the 
>> main reason some languages don't is to avoid large stack copies, but PHP 
>> doesn't have large stack copies for objects anyway so that's a non-issue.
>>
>> I've long argued that the fewer differences there are between service 
>> classes and data classes, the better, so I'm not sure what advantage this 
>> would have other than "ugh, inheritance is such a mess" (which is true, but 
>> that ship sailed long ago).
>
> One issue that just came to mind is object identity. For example:
>
> class Person {
>     public function __construct(
>         public string $firstname,
>         public string $lastname,
>     ) {}
> }
>
> class Manager extends Person {
>     public function bossAround() {}
> }
>
> $person = new Person('Boss', 'Man');
> $manager = new Manager('Boss', 'Man');
> var_dump($person === $manager); // ???
>
> Equality for data objects is based on data, rather than the object
> handle. How does this interact with inheritance? Technically, Person
> and Manager represent the same data. Manager contains additional
> behavior, but does that change identity?
>
> I'm not sure what the answer is. That's just the first thing that came
> to mind. I'm confident we'll discover more such edge cases. Of course,
> I can invest the time to find the questions before deciding to
> disallow inheritance.

As Bruce already demonstrated, equality should include type, not just 
properties.  Even without inheritance that is necessary.

There may be good reason to omit inheritance, as we did on enums, but that 
shouldn't be the starting point.  (I'd have to research and see what other 
languages do. I think it's a mixed bag.)  We should try to ferret out those 
edge cases and see if there's reasonable solutions to them.

>> > * Mutating method calls on data classes use a slightly different
>> > syntax: `$vector->append!(42)`. All methods mutating `$this` must be
>> > marked as `mutating`. The reason for this is twofold: 1. It signals to
>> > the caller that the value is modified. 2. It allows `$vector` to be
>> > cloned before knowing whether the method `append` is modifying, which
>> > hugely reduces implementation complexity in the engine.
>>
>> As discussed in R11, it would be very beneficial if this marker could be on 
>> the method definition, not the method invocation.  You indicated that would 
>> be Hard(tm), but I think it's worth some effort to see if it's surmountably 
>> hard.  (Or at least less hard than just auto-detecting it, which you 
>> indicated is Extremely Hard(tm).)
>
> I think you misunderstood. The intention is to mark both call-site and
> declaration. Call-site is marked with ->method!(), while declaration
> is marked with "public mutating function". Call-site is required to
> avoid the engine complexity, as previously mentioned. But
> declaration-site is required so that the user (and IDEs) even know
> that you need to use the special syntax at the call-site.

Ah, OK.  That's... unfortunate, but I defer to you on the implementation 
complexity.

>> So to the extent there is a consensus, equality, stringifying, and a 
>> hashcode (which we don't have yet, but will need in the future for some 
>> things I suspect) seem to be the rough expected defaults.
>
> I'm just skeptical whether the default __toString() is ever useful. I
> can see an argument for it for quick debugging in languages that don't
> provide something like var_dump(). In PHP this seems much less useful.
> It's impossible to provide a default implementation that works
> everywhere (or pretty much anywhere, even).
>
> Equality is already included. Hashing should be added separately, and
> probably not just to data classes.

The equivalent of Python's __repr__ (which it auto-generates) would be 
__debugInfo().  Arguably its current output is what the default would likely be 
anyway, though.  I believe the typical auto-toString output is the same data, 
but presented in a more human-friendly way.  (So yes, mainly useful for 
debugging.)

Equality, well, we've already debated whether or not we should make that a 
general feature. :-)  Of note, though, in languages with equals(), it's also 
user-overridable.

>> > * In the future, it should be possible to allow using data classes in
>> > `SplObjectStorage`. However, because hashing is complex, this will be
>> > postponed to a separate RFC.

I believe this is where we would want/need a __hash() method or similar; Derick 
and I encountered that while researching collections in other languages.  
Leaving it out for now is fine, but it would be important for any future 
list-of functionality.

>> Would data class properties only be allowed to be other data classes, or 
>> could they hold a non-data class?  My knee jerk response is they should be 
>> data classes all the way down; the only counter-argument I can think of it 
>> would be how much existing code 

Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)

2024-04-02 Thread Deleu
On Tue, Apr 2, 2024 at 1:47 PM Larry Garfield wrote:

> > * Data classes protect from interior mutability. More concretely,
> > mutating nested data objects stored in a `readonly` property is not
> > legal, whereas it would be if they were ordinary objects.
> > * In the future, it should be possible to allow using data classes in
> > `SplObjectStorage`. However, because hashing is complex, this will be
> > postponed to a separate RFC.
>
> Would data class properties only be allowed to be other data classes, or
> could they hold a non-data class?  My knee jerk response is they should be
> data classes all the way down; the only counter-argument I can think of it
> would be how much existing code is out there that is a "data class" in all
> but name.  I still fear someone adding a DB connection object to a data
> class and everything going to hell, though. :-)
>

If there is a class made up of 90% data struct and 10% non-data struct, the
90% could be extracted into a true data struct and be referenced in the
existing regular class, making it even more organized in terms of
establishing what's "data" and what's "service". I would really favor
making it "data class" all the way down.

I understand you disagree with the argument against inheritance, but to me
the same logic applies here. Making it data class only allows for lifting
the restriction in the future, if necessary (requiring another RFC vote).
Making it mixed on version 1 means that support for the mixture of them can
never be undone.


-- 
Marco Deleu


Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)

2024-04-02 Thread Rowan Tommins [IMSoP]

On 02/04/2024 01:17, Ilija Tovilo wrote:

I'd like to introduce an idea I've played around with for a couple of
weeks: Data classes, sometimes called structs in other languages (e.g.
Swift and C#).



Hi Ilija,

I'm really interested to see how this develops. A couple of thoughts 
that immediately occurred to me...



I'm not sure if you've considered it already, but mutating methods 
should probably be constrained to be void (or maybe "mutating" could 
occupy the return type slot). Otherwise, someone is bound to write this:


$start = new Location('Here');
$end = $start->move!('There');

Expecting it to mean this:

$start = new Location('Here');
$end = $start;
$end->move!('There');

When it would actually mean this:

$start = new Location('Here');
$start->move!('There');
$end = $start;


I seem to remember when this was discussed before, the argument being 
made that separating value objects completely means you have to spend 
time deciding how they interact with every feature of the language.


Does the copy-on-write optimisation actually require the entire class to 
be special, or could it be triggered by a mutating method on any object? 
To allow direct modification of properties as well, we could move the 
call-site marker slightly to a ->! operator:


$foo->!mutate();
$foo->!bar = 42;

The first would be the same as your current version: it would perform a 
CoW reference separation / clone, then call the method, which would 
require a "mutating" marker. The second would essentially be an 
optimised version of $foo = clone $foo with [ 'bar' => 42 ]


During the method call or write operation, readonly properties would 
allow an additional write, as is the case in __clone and the "clone 
with" proposal. So a "pure" data object would simply be declared with 
the existing "readonly class" syntax.
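
For comparison, the manual pattern that such an operator would optimise can be written in current PHP (8.1 or later) roughly like this. The `Location` class and its `withName()` method are hypothetical:

```php
<?php
final class Location
{
    public function __construct(public readonly string $name) {}

    // Hypothetical "wither": returns a modified copy instead of mutating.
    public function withName(string $name): static
    {
        return new static($name);
    }
}

$start = new Location('Here');
$end = $start->withName('There');  // $start is left unchanged.
```

Here the copy is always made, even when `$start` has no other users, which is the cost that a CoW-based `->!` would avoid.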


The main drawback I can see (outside of the implementation, which I 
can't comment on) is that we couldn't overload the === operator to use 
value semantics. In exchange, a lot of decisions would simply be made 
for us: they would just be objects, with all the same behaviour around 
inheritance, serialization, and so on.



Regards,

--
Rowan Tommins
[IMSoP]


Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)

2024-04-02 Thread Rob Landers
On Tue, Apr 2, 2024, at 20:51, Bruce Weirdan wrote:
> On Tue, Apr 2, 2024 at 8:05 PM Ilija Tovilo  wrote:
> 
> > Equality for data objects is based on data, rather than the object
> > handle.
> 
> I believe equality should always consider the type of the object.
> 
> ```php
> new Problem(size:'big') === new Universe(size:'big')
> && new Problem(size:'big') === new Shoe(size:'big');
> ```
> 
> If the above can ever be true then I'm not sure how big is the problem
> (but probably very big).
> Also see the examples of non-comparable ids - `new CompanyId(1)`
> should not be equal to `new PersonId(1)`
> 
> And I'd find it very confusing if the following crashed
> 
> ```php
> function f(Universe $_u): void {}
> $universe = new Universe(size:'big');
> $shoe = new Shoe(size:'big');
> 
> if ($shoe === $universe) {
>     // shoe is *identical* to the universe, so it should be
>     // accepted wherever the universe is
>     f($shoe);
> }
> ```
> 
> -- 
>   Best regards,
>   Bruce Weirdan 
> mailto:weir...@gmail.com
> 

I'd love to see it so that equality was more like == for regular objects. If 
the type matches and the data matches, it's true. It'd be really helpful to be 
able to downcast types though. Such as in my user id example I gave earlier. 
Once it reaches a certain point in the code, it doesn't matter that it was once 
a UserId, it just matters that it is currently an Id.
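
For reference, a short sketch of how `==` and `===` treat regular objects in current PHP (8.0 or later for the promoted constructor properties). The class names are only illustrative:

```php
<?php
final class UserId
{
    public function __construct(public string $id) {}
}

final class CompanyId
{
    public function __construct(public string $id) {}
}

$a = new UserId('1');
$b = new UserId('1');
$c = new CompanyId('1');

$sameTypeSameData = ($a == $b);   // true: same class, equal properties.
$sameInstance     = ($a === $b);  // false: two distinct instances.
$differentType    = ($a == $c);   // false: the classes differ.
```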

Now that I think about it, decoration might be better than inheritance here and 
inheritance might make more sense to be banned. In other words, this might be 
just as simple and easy to use:

data class Id {
  public function __construct(public string $id) {}
}

data class UserId {
  public function __construct(public Id $id) {}
}

Though it would be really interesting to use them as "traits" for each other to 
say "this data class can be converted to another type, but information will be 
lost" where they are 100% separate types but can be "cast" to specified types.

// "use" has all the same rules as extends, but,
// UserId is not an Id; it can be converted to an Id
data class UserId use Id {
  public function __construct(public string $id, public string $name) {}
}

$user = new UserId('123', 'rob');

$id = (Id) $user;

var_dump($user !== $id); // true

$id is 100% Id and lost all its "userness." Hmm. Interesting indeed. Probably 
not practical, but interesting.

— Rob

Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)

2024-04-02 Thread Bruce Weirdan
On Tue, Apr 2, 2024 at 8:05 PM Ilija Tovilo  wrote:

> Equality for data objects is based on data, rather than the object
> handle.

I believe equality should always consider the type of the object.

```php
new Problem(size:'big') === new Universe(size:'big')
&& new Problem(size:'big') === new Shoe(size:'big');
```

If the above can ever be true then I'm not sure how big is the problem
(but probably very big).
Also see the examples of non-comparable ids - `new CompanyId(1)`
should not be equal to `new PersonId(1)`

And I'd find it very confusing if the following crashed

```php
function f(Universe $_u): void {}
$universe = new Universe(size:'big');
$shoe = new Shoe(size:'big');

if ($shoe === $universe) {
   // shoe is *identical* to the universe, so it should be
   // accepted wherever the universe is
   f($shoe);
}
```
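
For comparison, here is a minimal sketch of how loose comparison (`==`) already treats ordinary objects in current PHP: two instances are only equal if they have equal properties and are instances of the same class. The constructors of `CompanyId` and `PersonId` are assumptions made up for this illustration, and the snippet says nothing about how the PoC itself implements `===`.

```php
<?php
// Hypothetical id classes, modelled on the CompanyId/PersonId example above.
final class CompanyId {
    public function __construct(public int $id) {}
}

final class PersonId {
    public function __construct(public int $id) {}
}

// Loose comparison on ordinary objects: same class and equal properties.
$sameClass = new CompanyId(1) == new CompanyId(1);
// Different classes never compare equal, even with identical property values.
$otherClass = new CompanyId(1) == new PersonId(1);

var_dump($sameClass);  // bool(true)
var_dump($otherClass); // bool(false)
```

If data-class identity followed the same rule, the `Problem`/`Universe`/`Shoe` comparison could never be true.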

-- 
  Best regards,
  Bruce Weirdan mailto:weir...@gmail.com


Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)

2024-04-02 Thread Niels Dossche
On 02/04/2024 02:17, Ilija Tovilo wrote:
> Hi everyone!
> 
> I'd like to introduce an idea I've played around with for a couple of
> weeks: Data classes, sometimes called structs in other languages (e.g.
> Swift and C#).
> 
> In a nutshell, data classes are classes with value semantics.
> Instances of data classes are implicitly copied when assigned to a
> variable, or when passed to a function. When the new instance is
> modified, the original instance remains untouched. This might sound
> familiar: It's exactly how arrays work in PHP.
> 
> ```php
> $a = [1, 2, 3];
> $b = $a;
> $b[] = 4;
> var_dump($a); // [1, 2, 3]
> var_dump($b); // [1, 2, 3, 4]
> ```
> 
> You may think that copying the array on each assignment is expensive,
> and you would be right. PHP uses a trick called copy-on-write, or CoW
> for short. `$a` and `$b` actually share the same array until `$b[] =
> 4;` modifies it. It's only at this point that the array is copied and
> replaced in `$b`, so that the modification doesn't affect `$a`. As
> long as a variable is the sole owner of a value, or none of the
> variables modify the value, no copy is needed. Data classes use the
> same mechanism.
> 
> But why value semantics in the first place? There are two major flaws
> with by-reference semantics for data structures:
> 
> 1. It's very easy to forget cloning data that is referenced somewhere
> else before modifying it. This will lead to "spooky actions at a
> distance". Having recently used JavaScript (where all data structures
> have by-reference semantics) for an educational IR optimizer,
> accidental mutations of shared arrays/maps/sets were my primary source
> of bugs.
> 2. Defensive cloning (to avoid issue 1) will lead to useless work when
> the value is not referenced anywhere else.
> 
> PHP offers readonly properties and classes to address issue 1.
> However, they further promote issue 2 by making it impossible to
> modify values without cloning them first, even if we know they are not
> referenced anywhere else. Some APIs further exacerbate the issue by
> requiring multiple copies for multiple modifications (e.g.
> `$response->withStatus(200)->withHeader('X-foo', 'foo');`).
> 
> As you may have noticed, arrays already solve both of these issues
> through CoW. Data classes allow implementing arbitrary data structures
> with the same value semantics in core, extensions or userland. For
> example, a `Vector` data class may look something like the following:
> 
> ```php
> data class Vector {
> private $values;
> 
> public function __construct(...$values) {
> $this->values = $values;
> }
> 
> public mutating function append($value) {
> $this->values[] = $value;
> }
> }
> 
> $a = new Vector(1, 2, 3);
> $b = $a;
> $b->append!(4);
> var_dump($a); // Vector(1, 2, 3)
> var_dump($b); // Vector(1, 2, 3, 4)
> ```
> 
> An internal Vector implementation might offer a faster and stricter
> alternative to arrays (e.g. Vector from php-ds).
> 
> Some other things to note about data classes:
> 
> * Data classes are ordinary classes, and as such may implement
> interfaces, methods and more. I have not decided whether they should
> support inheritance.
> * Mutating method calls on data classes use a slightly different
> syntax: `$vector->append!(42)`. All methods mutating `$this` must be
> marked as `mutating`. The reason for this is twofold: 1. It signals to
> the caller that the value is modified. 2. It allows `$vector` to be
> cloned before knowing whether the method `append` is modifying, which
> hugely reduces implementation complexity in the engine.
> * Data classes customize identity (`===`) comparison, in the same way
> arrays do. Two data objects are identical if all their properties are
> identical (including order for dynamic properties).
> * Sharing data classes by-reference is possible using references, as
> you would for arrays.
> * We may decide to auto-implement `__toString` for data classes,
> amongst other things. I am still undecided whether this is useful for
> PHP.
> * Data classes protect from interior mutability. More concretely,
> mutating nested data objects stored in a `readonly` property is not
> legal, whereas it would be if they were ordinary objects.
> * In the future, it should be possible to allow using data classes in
> `SplObjectStorage`. However, because hashing is complex, this will be
> postponed to a separate RFC.
> 
> One known gotcha is that we cannot trivially enforce placement of
> `mutating` on methods without a performance hit. It is the
> responsibility of the user to correctly mark such methods.
> 
> Here's a fully functional PoC, excluding JIT:
> https://github.com/php/php-src/pull/13800
> 
> Let me know what you think. I will start working on an RFC draft once
> work on property hooks concludes.
> 
> Ilija

Hi Ilija

Thank you for this proposal, I like the idea of having value semantic objects 
available.
I pulled your branch and played with it a bit.

As already hinted in 

Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)

2024-04-02 Thread Ilija Tovilo
Hi Larry

On Tue, Apr 2, 2024 at 5:31 PM Larry Garfield  wrote:
>
> On Tue, Apr 2, 2024, at 12:17 AM, Ilija Tovilo wrote:
> > Hi everyone!
> >
> > I'd like to introduce an idea I've played around with for a couple of
> > weeks: Data classes, sometimes called structs in other languages (e.g.
> > Swift and C#).
> >
> > * Data classes are ordinary classes, and as such may implement
> > interfaces, methods and more. I have not decided whether they should
> > support inheritance.
>
> What would be the reason not to?  As you indicated in another reply, the main 
> reason some languages don't is to avoid large stack copies, but PHP doesn't 
> have large stack copies for objects anyway so that's a non-issue.
>
> I've long argued that the fewer differences there are between service classes 
> and data classes, the better, so I'm not sure what advantage this would have 
> other than "ugh, inheritance is such a mess" (which is true, but that ship 
> sailed long ago).

One issue that just came to mind is object identity. For example:

class Person {
public function __construct(
public string $firstname,
public string $lastname,
) {}
}

class Manager extends Person {
public function bossAround() {}
}

$person = new Person('Boss', 'Man');
$manager = new Manager('Boss', 'Man');
var_dump($person === $manager); // ???

Equality for data objects is based on data, rather than the object
handle. How does this interact with inheritance? Technically, Person
and Manager represent the same data. Manager contains additional
behavior, but does that change identity?

I'm not sure what the answer is. That's just the first thing that came
to mind. I'm confident we'll discover more such edge cases. Of course,
I can invest the time to find the questions before deciding to
disallow inheritance.
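
To make the two possible answers concrete, here is a rough userland sketch using ordinary classes in current PHP. The helper functions `sameData()` and `sameDataAndClass()` are hypothetical names invented for this illustration; neither is part of the PoC, and they only look at public properties.

```php
<?php
class Person {
    public function __construct(
        public string $firstname,
        public string $lastname,
    ) {}
}

class Manager extends Person {
    public function bossAround(): void {}
}

// Policy A (hypothetical): identity depends on the property values only.
function sameData(object $a, object $b): bool {
    // Called from outside the classes, so only public properties are compared.
    return get_object_vars($a) === get_object_vars($b);
}

// Policy B (hypothetical): the class has to match as well.
function sameDataAndClass(object $a, object $b): bool {
    return $a::class === $b::class && sameData($a, $b);
}

$person = new Person('Boss', 'Man');
$manager = new Manager('Boss', 'Man');

var_dump(sameData($person, $manager));         // bool(true)
var_dump(sameDataAndClass($person, $manager)); // bool(false)
```

Under policy A a `Person` and a `Manager` with the same names are indistinguishable; under policy B they never are, which is what `==` does for ordinary objects today.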

> > * Mutating method calls on data classes use a slightly different
> > syntax: `$vector->append!(42)`. All methods mutating `$this` must be
> > marked as `mutating`. The reason for this is twofold: 1. It signals to
> > the caller that the value is modified. 2. It allows `$vector` to be
> > cloned before knowing whether the method `append` is modifying, which
> > hugely reduces implementation complexity in the engine.
>
> As discussed in R11, it would be very beneficial if this marker could be on 
> the method definition, not the method invocation.  You indicated that would 
> be Hard(tm), but I think it's worth some effort to see if it's surmountably 
> hard.  (Or at least less hard than just auto-detecting it, which you 
> indicated is Extremely Hard(tm).)

I think you misunderstood. The intention is to mark both call-site and
declaration. Call-site is marked with ->method!(), while declaration
is marked with "public mutating function". Call-site is required to
avoid the engine complexity, as previously mentioned. But
declaration-site is required so that the user (and IDEs) even know
that you need to use the special syntax at the call-site.

> So to the extent there is a consensus, equality, stringifying, and a hashcode 
> (which we don't have yet, but will need in the future for some things I 
> suspect) seem to be the rough expected defaults.

I'm just skeptical whether the default __toString() is ever useful. I
can see an argument for it for quick debugging in languages that don't
provide something like var_dump(). In PHP this seems much less useful.
It's impossible to provide a default implementation that works
everywhere (or pretty much anywhere, even).

Equality is already included. Hashing should be added separately, and
probably not just to data classes.

> > * In the future, it should be possible to allow using data classes in
> > `SplObjectStorage`. However, because hashing is complex, this will be
> > postponed to a separate RFC.
>
> Would data class properties only be allowed to be other data classes, or 
> could they hold a non-data class?  My knee jerk response is they should be 
> data classes all the way down; the only counter-argument I can think of 
> would be how much existing code is out there that is a "data class" in all 
> but name.  I still fear someone adding a DB connection object to a data class 
> and everything going to hell, though. :-)

Disallowing ordinary by-ref objects is not trivial without additional
performance penalties, and I don't see a good reason for it. Can you
provide an example on when that would be problematic?

Ilija


Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)

2024-04-02 Thread Robert Landers
On Tue, Apr 2, 2024 at 2:20 AM Ilija Tovilo  wrote:
>
> Hi everyone!
>
> I'd like to introduce an idea I've played around with for a couple of
> weeks: Data classes, sometimes called structs in other languages (e.g.
> Swift and C#).
>
> In a nutshell, data classes are classes with value semantics.
> Instances of data classes are implicitly copied when assigned to a
> variable, or when passed to a function. When the new instance is
> modified, the original instance remains untouched. This might sound
> familiar: It's exactly how arrays work in PHP.
>
> ```php
> $a = [1, 2, 3];
> $b = $a;
> $b[] = 4;
> var_dump($a); // [1, 2, 3]
> var_dump($b); // [1, 2, 3, 4]
> ```
>
> You may think that copying the array on each assignment is expensive,
> and you would be right. PHP uses a trick called copy-on-write, or CoW
> for short. `$a` and `$b` actually share the same array until `$b[] =
> 4;` modifies it. It's only at this point that the array is copied and
> replaced in `$b`, so that the modification doesn't affect `$a`. As
> long as a variable is the sole owner of a value, or none of the
> variables modify the value, no copy is needed. Data classes use the
> same mechanism.
>
> But why value semantics in the first place? There are two major flaws
> with by-reference semantics for data structures:
>
> 1. It's very easy to forget cloning data that is referenced somewhere
> else before modifying it. This will lead to "spooky actions at a
> distance". Having recently used JavaScript (where all data structures
> have by-reference semantics) for an educational IR optimizer,
> accidental mutations of shared arrays/maps/sets were my primary source
> of bugs.
> 2. Defensive cloning (to avoid issue 1) will lead to useless work when
> the value is not referenced anywhere else.
>
> PHP offers readonly properties and classes to address issue 1.
> However, they further promote issue 2 by making it impossible to
> modify values without cloning them first, even if we know they are not
> referenced anywhere else. Some APIs further exacerbate the issue by
> requiring multiple copies for multiple modifications (e.g.
> `$response->withStatus(200)->withHeader('X-foo', 'foo');`).
>
> As you may have noticed, arrays already solve both of these issues
> through CoW. Data classes allow implementing arbitrary data structures
> with the same value semantics in core, extensions or userland. For
> example, a `Vector` data class may look something like the following:
>
> ```php
> data class Vector {
> private $values;
>
> public function __construct(...$values) {
> $this->values = $values;
> }
>
> public mutating function append($value) {
> $this->values[] = $value;
> }
> }
>
> $a = new Vector(1, 2, 3);
> $b = $a;
> $b->append!(4);
> var_dump($a); // Vector(1, 2, 3)
> var_dump($b); // Vector(1, 2, 3, 4)
> ```
>
> An internal Vector implementation might offer a faster and stricter
> alternative to arrays (e.g. Vector from php-ds).
>
> Some other things to note about data classes:
>
> * Data classes are ordinary classes, and as such may implement
> interfaces, methods and more. I have not decided whether they should
> support inheritance.
> * Mutating method calls on data classes use a slightly different
> syntax: `$vector->append!(42)`. All methods mutating `$this` must be
> marked as `mutating`. The reason for this is twofold: 1. It signals to
> the caller that the value is modified. 2. It allows `$vector` to be
> cloned before knowing whether the method `append` is modifying, which
> hugely reduces implementation complexity in the engine.
> * Data classes customize identity (`===`) comparison, in the same way
> arrays do. Two data objects are identical if all their properties are
> identical (including order for dynamic properties).
> * Sharing data classes by-reference is possible using references, as
> you would for arrays.
> * We may decide to auto-implement `__toString` for data classes,
> amongst other things. I am still undecided whether this is useful for
> PHP.
> * Data classes protect from interior mutability. More concretely,
> mutating nested data objects stored in a `readonly` property is not
> legal, whereas it would be if they were ordinary objects.
> * In the future, it should be possible to allow using data classes in
> `SplObjectStorage`. However, because hashing is complex, this will be
> postponed to a separate RFC.
>
> One known gotcha is that we cannot trivially enforce placement of
> `mutating` on methods without a performance hit. It is the
> responsibility of the user to correctly mark such methods.
>
> Here's a fully functional PoC, excluding JIT:
> https://github.com/php/php-src/pull/13800
>
> Let me know what you think. I will start working on an RFC draft once
> work on property hooks concludes.
>
> Ilija

Neat! I've been playing around with "value-like" objects for a while now:

https://github.com/withinboredom/time

Having inheritance supported would be useful, for example, 

Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)

2024-04-02 Thread Larry Garfield
On Tue, Apr 2, 2024, at 12:17 AM, Ilija Tovilo wrote:
> Hi everyone!
>
> I'd like to introduce an idea I've played around with for a couple of
> weeks: Data classes, sometimes called structs in other languages (e.g.
> Swift and C#).

*gets popcorn*

> In a nutshell, data classes are classes with value semantics.
> Instances of data classes are implicitly copied when assigned to a
> variable, or when passed to a function. When the new instance is
> modified, the original instance remains untouched. This might sound
> familiar: It's exactly how arrays work in PHP.
>
> ```php
> $a = [1, 2, 3];
> $b = $a;
> $b[] = 4;
> var_dump($a); // [1, 2, 3]
> var_dump($b); // [1, 2, 3, 4]
> ```
>
> You may think that copying the array on each assignment is expensive,
> and you would be right. PHP uses a trick called copy-on-write, or CoW
> for short. `$a` and `$b` actually share the same array until `$b[] =
> 4;` modifies it. It's only at this point that the array is copied and
> replaced in `$b`, so that the modification doesn't affect `$a`. As
> long as a variable is the sole owner of a value, or none of the
> variables modify the value, no copy is needed. Data classes use the
> same mechanism.
>
> But why value semantics in the first place? There are two major flaws
> with by-reference semantics for data structures:
>
> 1. It's very easy to forget cloning data that is referenced somewhere
> else before modifying it. This will lead to "spooky actions at a
> distance". Having recently used JavaScript (where all data structures
> have by-reference semantics) for an educational IR optimizer,
> accidental mutations of shared arrays/maps/sets were my primary source
> of bugs.
> 2. Defensive cloning (to avoid issue 1) will lead to useless work when
> the value is not referenced anywhere else.
>
> PHP offers readonly properties and classes to address issue 1.
> However, they further promote issue 2 by making it impossible to
> modify values without cloning them first, even if we know they are not
> referenced anywhere else. Some APIs further exacerbate the issue by
> requiring multiple copies for multiple modifications (e.g.
> `$response->withStatus(200)->withHeader('X-foo', 'foo');`).
>
> As you may have noticed, arrays already solve both of these issues
> through CoW. Data classes allow implementing arbitrary data structures
> with the same value semantics in core, extensions or userland. For
> example, a `Vector` data class may look something like the following:
>
> ```php
> data class Vector {
> private $values;
>
> public function __construct(...$values) {
> $this->values = $values;
> }
>
> public mutating function append($value) {
> $this->values[] = $value;
> }
> }
>
> $a = new Vector(1, 2, 3);
> $b = $a;
> $b->append!(4);
> var_dump($a); // Vector(1, 2, 3)
> var_dump($b); // Vector(1, 2, 3, 4)
> ```
>
> An internal Vector implementation might offer a faster and stricter
> alternative to arrays (e.g. Vector from php-ds).
>
> Some other things to note about data classes:
>
> * Data classes are ordinary classes, and as such may implement
> interfaces, methods and more. I have not decided whether they should
> support inheritance.

What would be the reason not to?  As you indicated in another reply, the main 
reason some languages don't is to avoid large stack copies, but PHP doesn't 
have large stack copies for objects anyway so that's a non-issue.

I've long argued that the fewer differences there are between service classes 
and data classes, the better, so I'm not sure what advantage this would have 
other than "ugh, inheritance is such a mess" (which is true, but that ship 
sailed long ago).

> * Mutating method calls on data classes use a slightly different
> syntax: `$vector->append!(42)`. All methods mutating `$this` must be
> marked as `mutating`. The reason for this is twofold: 1. It signals to
> the caller that the value is modified. 2. It allows `$vector` to be
> cloned before knowing whether the method `append` is modifying, which
> hugely reduces implementation complexity in the engine.

As discussed in R11, it would be very beneficial if this marker could be on the 
method definition, not the method invocation.  You indicated that would be 
Hard(tm), but I think it's worth some effort to see if it's surmountably hard.  
(Or at least less hard than just auto-detecting it, which you indicated is 
Extremely Hard(tm).)

> * Data classes customize identity (`===`) comparison, in the same way
> arrays do. Two data objects are identical if all their properties are
> identical (including order for dynamic properties).
> * Sharing data classes by-reference is possible using references, as
> you would for arrays.
>
> * We may decide to auto-implement `__toString` for data classes,
> amongst other things. I am still undecided whether this is useful for
> PHP.

For reference:

Java record classes auto-generate equals(), toString(), hashCode(), and 
same-name methods (we don't need 

Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)

2024-04-02 Thread Ilija Tovilo
Hi Alexander

On Tue, Apr 2, 2024 at 4:53 AM Alexander Pravdin  wrote:
>
> On Tue, Apr 2, 2024 at 9:18 AM Ilija Tovilo  wrote:
> >
> > I'd like to introduce an idea I've played around with for a couple of
> > weeks: Data classes, sometimes called structs in other languages (e.g.
> > Swift and C#).
>
> While I like the idea, I would like to suggest something else in
> addition or as a separate feature. As an active user of readonly
> classes with all promoted properties for data-holding purposes, I
> would be happy to see the possibility of cloning them with passing
> some properties to modify:
>
> readonly class Data {
> function __construct(
> public string $foo,
> public string $bar,
> public string $baz,
> ) {}
> }
>
> $data = new Data(foo: 'A', bar: 'B', baz: 'C');
>
> $data2 = clone $data with (bar: 'X', baz: 'Y');

What you're asking for is part of the "Clone with" RFC:
https://wiki.php.net/rfc/clone_with

This issue is valid and the RFC would improve the ergonomics of
readonly classes.

However, note that it really only addresses a small part of what this
RFC tries to achieve:

> Some APIs further exacerbate the issue by
> requiring multiple copies for multiple modifications (e.g.
> `$response->withStatus(200)->withHeader('X-foo', 'foo');`).

Readonly works fine for compact data structures, even if they are copied
more often than necessary. For large data structures, like large lists, a
copy for each modification would be detrimental.

https://3v4l.org/GR6On

See how the performance of an insert into an array tanks if a copy of
the array is performed in each iteration (due to an additional
reference to it). Readonly is just not viable for data structures such
as lists, maps, sets, etc.
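
The effect can be reproduced with a small sketch along these lines. This is an independent illustration, not the contents of the snippet linked above; the function names and the value of `$n` are arbitrary, and the timings will vary between machines.

```php
<?php
// Appends $n integers. The array has a single owner, so PHP appends in place.
function appendInPlace(int $n): array {
    $list = [];
    for ($i = 0; $i < $n; $i++) {
        $list[] = $i;
    }
    return $list;
}

// Same loop, but $snapshot is a second owner of the array at the moment of
// the append, so each append first has to copy the whole array.
function appendWithCopy(int $n): array {
    $list = [];
    $snapshotSizes = 0;
    for ($i = 0; $i < $n; $i++) {
        $snapshot = $list;
        $list[] = $i;
        // Use the snapshot so the extra owner is not optimized away.
        $snapshotSizes += count($snapshot);
    }
    return [$list, $snapshotSizes];
}

$n = 10000;

$start = hrtime(true);
$a = appendInPlace($n);
$inPlaceMs = (hrtime(true) - $start) / 1e6;

$start = hrtime(true);
[$b, $snapshotSizes] = appendWithCopy($n);
$withCopyMs = (hrtime(true) - $start) / 1e6;

printf("in place: %.2f ms, with copy: %.2f ms\n", $inPlaceMs, $withCopyMs);
```

Both functions build the same list, but the second one does work proportional to the current list size on every iteration, which is the cost a readonly list would pay for each modification.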

Ilija


Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)

2024-04-02 Thread Ilija Tovilo
Hi Marco

On Tue, Apr 2, 2024 at 2:56 AM Deleu  wrote:
>
>
>
> On Mon, Apr 1, 2024 at 9:20 PM Ilija Tovilo  wrote:
>>
>> I'd like to introduce an idea I've played around with for a couple of
>> weeks: Data classes, sometimes called structs in other languages (e.g.
>> Swift and C#).
>>
>> snip
>>
>> Some other things to note about data classes:
>>
>> * Data classes are ordinary classes, and as such may implement
>> interfaces, methods and more. I have not decided whether they should
>> support inheritance.
>
> I'd argue in favor of not including inheritance in the first version. Taking 
> inheritance out is an impossible BC Break. Not introducing it in the first 
> stable release gives users a chance to evaluate whether it's something we 
> will drastically miss.

I would probably agree. I believe the reason some languages don't
support inheritance for value types is that they are stored on the
stack. Inheritance encourages large structures, but copying very large
structures over and over on the stack may be slow.

In PHP, objects always live on the heap, and due to CoW we don't have
this problem. Still, it may be beneficial to disallow inheritance
first, and relax this restriction if it is necessary.

>> * Mutating method calls on data classes use a slightly different
>> syntax: `$vector->append!(42)`. All methods mutating `$this` must be
>> marked as `mutating`. The reason for this is twofold: 1. It signals to
>> the caller that the value is modified. 2. It allows `$vector` to be
>> cloned before knowing whether the method `append` is modifying, which
>> hugely reduces implementation complexity in the engine.
>
> I'm not sure if I understood this one. Do you mean that the `!` modifier here 
> (at call-site) is helping the engine clone the variable before even diving 
> into whether `append()` has been tagged as mutating?

Precisely. The issue comes from deeper nested values:

$circle->position->zero();

Imagine that Circle is a data class with a Position, which is also a
data class. Position::zero() is a mutating method that sets the
coordinates to 0:0. For this to work, not only the position needs to
be copied, but also $circle. However, the engine doesn't yet know
ahead of time whether zero() is mutating, and as such needs to perform
a copy.

One idea was to evaluate the left-hand-side of the method call, and
repeat it with a copy if the method is mutating. However, this is not
trivially possible, because opcodes consume their operands. So, for an
expression like `getCircle()->position->zero()`, the return value of
`getCircle()` is already gone. `!` explicitly distinguishes the call
from non-mutating calls, and knows that a copy will be needed.
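
To illustrate why both levels need to be copied, here is a rough emulation with ordinary classes in current PHP, where the separation is done by hand. The `Circle` and `Position` definitions (property names, `radius`, the `__clone` body) are assumptions for this sketch only; with data classes the engine would perform the copies implicitly.

```php
<?php
// Ordinary classes standing in for the Circle/Position data classes.
final class Position {
    public function __construct(public int $x, public int $y) {}

    // Would be declared "mutating" in the proposed syntax.
    public function zero(): void {
        $this->x = 0;
        $this->y = 0;
    }
}

final class Circle {
    public function __construct(public Position $position, public int $radius) {}

    // Copy the nested position too, so the two circles share nothing.
    public function __clone() {
        $this->position = clone $this->position;
    }
}

$a = new Circle(new Position(3, 4), 1);
$b = $a;               // ordinary objects: $a and $b are the same instance

$b = clone $b;         // manual stand-in for separating $circle and its position
$b->position->zero();  // only the copy is modified

var_dump($a->position->x, $a->position->y); // int(3) int(4)
var_dump($b->position->x, $b->position->y); // int(0) int(0)
```

Forgetting either clone (the circle or its position) would make `zero()` visible through `$a`, which is exactly the kind of accidental sharing value semantics are meant to rule out.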

But as mentioned previously, I think a different syntax offers
additional benefits for readability.

> From outside it looks odd that a clone would happen ahead-of-time while 
> talking about copy-on-write. Would this syntax break for non-mutating methods?

If by break you mean the engine would error, then yes. Only mutating
methods may (and must) be called with the $foo->bar!() syntax.

Ilija


Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)

2024-04-01 Thread Alexander Pravdin
On Tue, Apr 2, 2024 at 9:18 AM Ilija Tovilo  wrote:
>
> Hi everyone!
>
> I'd like to introduce an idea I've played around with for a couple of
> weeks: Data classes, sometimes called structs in other languages (e.g.
> Swift and C#).
>
> ```php
> data class Vector {
> private $values;
>
> public function __construct(...$values) {
> $this->values = $values;
> }
>
> public mutating function append($value) {
> $this->values[] = $value;
> }
> }
>
> $a = new Vector(1, 2, 3);
> $b = $a;
> $b->append!(4);
> var_dump($a); // Vector(1, 2, 3)
> var_dump($b); // Vector(1, 2, 3, 4)
> ```
>


While I like the idea, I would like to suggest something else in
addition or as a separate feature. As an active user of readonly
classes with all promoted properties for data-holding purposes, I
would be happy to see the possibility of cloning them with passing
some properties to modify:

readonly class Data {
function __construct(
public string $foo,
public string $bar,
public string $baz,
) {}
}

$data = new Data(foo: 'A', bar: 'B', baz: 'C');

$data2 = clone $data with (bar: 'X', baz: 'Y');

Under the hood, this "clone" would copy all values of promoted
properties as is, but replace some of them with the values specified by
the user. Implementing this functionality in userland destroys the
beauty of readonly classes with promoted properties. A manual
implementation takes many lines of code that tell the reader nothing:
the cloning methods end up bigger than the meaningful part of the
class (the constructor with its property declarations), because I have
to redeclare all the properties as method arguments and then
initialize each property with the
corresponding value. I love readonly classes with promoted properties
for data-holding purposes and the above feature is the only one I'm
missing to be completely happy.
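
For reference, one possible userland workaround in current PHP (8.2 or later for `readonly class`) is sketched below. The `with()` method is a hypothetical helper, not an existing language feature, and it only works under the assumption that every property is a promoted constructor parameter with the same name.

```php
<?php
readonly class Data {
    public function __construct(
        public string $foo,
        public string $bar,
        public string $baz,
    ) {}

    // Hypothetical helper: take all current values, override the given ones,
    // and pass the result to the constructor as named arguments.
    public function with(mixed ...$changes): static {
        return new static(...array_merge(get_object_vars($this), $changes));
    }
}

$data = new Data(foo: 'A', bar: 'B', baz: 'C');
$data2 = $data->with(bar: 'X', baz: 'Y');

var_dump($data2->foo, $data2->bar, $data2->baz); // "A", "X", "Y"
var_dump($data->bar);                            // "B", the original is untouched
```

This avoids redeclaring every property, but it re-runs the constructor and breaks as soon as the class has properties that are not constructor parameters, so it is not a full substitute for the syntax suggested above.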

In my personal experience, I have never needed to copy data classes
like arrays; immutability protects well enough against unwanted
changes. But copying references helps to save memory, as some datasets
I work with can be very big.

--
Best,
Alex


Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)

2024-04-01 Thread Deleu
On Mon, Apr 1, 2024 at 9:20 PM Ilija Tovilo  wrote:

> Hi everyone!
>
> I'd like to introduce an idea I've played around with for a couple of
> weeks: Data classes, sometimes called structs in other languages (e.g.
> Swift and C#).
>
> In a nutshell, data classes are classes with value semantics.
> Instances of data classes are implicitly copied when assigned to a
> variable, or when passed to a function. When the new instance is
> modified, the original instance remains untouched. This might sound
> familiar: It's exactly how arrays work in PHP.
>
> ```php
> $a = [1, 2, 3];
> $b = $a;
> $b[] = 4;
> var_dump($a); // [1, 2, 3]
> var_dump($b); // [1, 2, 3, 4]
> ```
>
> You may think that copying the array on each assignment is expensive,
> and you would be right. PHP uses a trick called copy-on-write, or CoW
> for short. `$a` and `$b` actually share the same array until `$b[] =
> 4;` modifies it. It's only at this point that the array is copied and
> replaced in `$b`, so that the modification doesn't affect `$a`. As
> long as a variable is the sole owner of a value, or none of the
> variables modify the value, no copy is needed. Data classes use the
> same mechanism.
>
> But why value semantics in the first place? There are two major flaws
> with by-reference semantics for data structures:
>
> 1. It's very easy to forget cloning data that is referenced somewhere
> else before modifying it. This will lead to "spooky actions at a
> distance". Having recently used JavaScript (where all data structures
> have by-reference semantics) for an educational IR optimizer,
> accidental mutations of shared arrays/maps/sets were my primary source
> of bugs.
> 2. Defensive cloning (to avoid issue 1) will lead to useless work when
> the value is not referenced anywhere else.
>
> PHP offers readonly properties and classes to address issue 1.
> However, they further promote issue 2 by making it impossible to
> modify values without cloning them first, even if we know they are not
> referenced anywhere else. Some APIs further exacerbate the issue by
> requiring multiple copies for multiple modifications (e.g.
> `$response->withStatus(200)->withHeader('X-foo', 'foo');`).
>
> As you may have noticed, arrays already solve both of these issues
> through CoW. Data classes allow implementing arbitrary data structures
> with the same value semantics in core, extensions or userland. For
> example, a `Vector` data class may look something like the following:
>
> ```php
> data class Vector {
> private $values;
>
> public function __construct(...$values) {
> $this->values = $values;
> }
>
> public mutating function append($value) {
> $this->values[] = $value;
> }
> }
>
> $a = new Vector(1, 2, 3);
> $b = $a;
> $b->append!(4);
> var_dump($a); // Vector(1, 2, 3)
> var_dump($b); // Vector(1, 2, 3, 4)
> ```
>
> An internal Vector implementation might offer a faster and stricter
> alternative to arrays (e.g. Vector from php-ds).
>
>
Exciting times to be a PHP Developer!


> Some other things to note about data classes:
>
> * Data classes are ordinary classes, and as such may implement
> interfaces, methods and more. I have not decided whether they should
> support inheritance.
>

I'd argue in favor of not including inheritance in the first version.
Taking inheritance out is an impossible BC Break. Not introducing it in the
first stable release gives users a chance to evaluate whether it's
something we will drastically miss.


> * Mutating method calls on data classes use a slightly different
> syntax: `$vector->append!(42)`. All methods mutating `$this` must be
> marked as `mutating`. The reason for this is twofold: 1. It signals to
> the caller that the value is modified. 2. It allows `$vector` to be
> cloned before knowing whether the method `append` is modifying, which
> hugely reduces implementation complexity in the engine.
>

I'm not sure if I understood this one. Do you mean that the `!` modifier
here (at call-site) is helping the engine clone the variable before even
diving into whether `append()` has been tagged as mutating? From outside it
looks odd that a clone would happen ahead-of-time while talking about
copy-on-write. Would this syntax break for non-mutating methods?


> * Data classes customize identity (`===`) comparison, in the same way
> arrays do. Two data objects are identical if all their properties are
> identical (including order for dynamic properties).
> * Sharing data classes by-reference is possible using references, as
> you would for arrays.
> * We may decide to auto-implement `__toString` for data classes,
> amongst other things. I am still undecided whether this is useful for
> PHP.
> * Data classes protect from interior mutability. More concretely,
> mutating nested data objects stored in a `readonly` property is not
> legal, whereas it would be if they were ordinary objects.
> * In the future, it should be possible to allow using data classes in
> `SplObjectStorage`.