Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)
Hi Rowan On Fri, Apr 5, 2024 at 12:28 AM Rowan Tommins [IMSoP] wrote: > > On 03/04/2024 00:01, Ilija Tovilo wrote: > > Regardless of the implementation, there are a lot of interactions we will > want to consider; and we will have to keep considering new ones as we add to > the language. For instance, the Property Hooks RFC would probably have needed > a section on "Interaction with Data Classes". That remark was implying that data classes really are just classes with some additional tweaks. That gives us the ability to handle them differently when desired. However, they will otherwise behave just like classes, which makes it not so different from your suggestion. > On a practical note, a few things I've already thought of to consider: > > - Can a data class have readonly properties (or be marked "readonly data > class")? If so, how will they behave? Yes. The CoW semantics become irrelevant, given that nothing may trigger a separation. However, data classes also include value equality, and hashing in the future. These may still be useful for immutable data. > - Can you explicitly use the "clone" keyword with an instance of a data > class? Does it make any difference? Manual cloning is not useful, but it's also not harmful. So I'm leaning towards allowing this. This way, data classes may be handled generically, along with other non-data classes. > - Tied into that: can you implement __clone(), and when will it be called? Yes. `__clone` will be called when the object is separated, as you would expect. > - If you implement __set(), will copy-on-write be triggered before it's > called? Yes. Separation happens as part of the property fetching, rather than the assignment itself. Hence, for `$foo->bar->baz = 'baz';`, once `Bar::__set('baz', 'baz')` is called, `$foo` and `$foo->bar` will already have been separated. > - Can you implement __destruct()? Will it ever be called? Yes. As with any other object, this will be called once the last reference to the object goes away. 
There's nothing special going on. It's worth noting that CoW makes `__clone` and `__destruct` somewhat nondeterministic, or at least non-obvious. > > Consider this example, which would > work with the current approach: > > > > $shapes[0]->position->zero!(); > > I find this concise example confusing, and I think there's a few things to > unpack here... I think you're putting too much focus on CoW. CoW should really be considered an implementation detail. It's not _fully_ transparent, given that it is observable through `__clone` and `__destruct` as mentioned above. But it is _mostly_ transparent. Conceptually, the copy happens not when the method is called, but when the variable is assigned. For your example: ```php $shape = new Shape(new Position(42,42)); $copy = $shape; // Conceptually, a recursive copy happens here. $copy->position->zero!(); // $shape is already detached from $copy. The ! merely indicates that the value is modified. ``` > The array access doesn't need any special marker, because there's no > ambiguity. This is only true if you ignore ArrayAccess. `$foo['bar']` does not necessarily indicate that `$foo` is an array. If it were a `Vector`, then we would absolutely need an indication to separate it. It's true that `$foo->bar` currently indicates that `$foo` is a reference type. This assumption would break with this RFC, but that's also kind of the whole point. > What is going to be CoW cloned, and what is going to be modified in place? I > can't actually know without knowing the definition behind both $item and > $item->shape. It might even vary depending on input. For the most part, data classes should consist of other value types, or immutable reference types (e.g. DateTimeImmutable). This actually makes the rules quite simple: If you assign a value type, the entire data structure is copied recursively. The fact that PHP delays this step for performance is unimportant. 
The fact that immutable reference types aren't cloned is also unimportant, given that they don't change.

Ilija
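A minimal sketch of the mental model described above, written in today's PHP. Since `data class` is only a proposal, the eager copy is emulated with `clone` and `__clone()`; the `Shape` and `Position` classes and the `createdAt` property are assumptions made for illustration, not part of the PoC.

```php
<?php
// Emulation only: value semantics are approximated with an eager deep copy.

final class Position
{
    public function __construct(public int $x, public int $y) {}

    public function zero(): void
    {
        $this->x = 0;
        $this->y = 0;
    }
}

final class Shape
{
    public function __construct(
        public Position $position,           // stands in for a nested value type
        public DateTimeImmutable $createdAt, // immutable reference type
    ) {}

    public function __clone()
    {
        // Nested value types are copied; immutable objects can stay shared.
        $this->position = clone $this->position;
    }
}

$shape = new Shape(new Position(42, 42), new DateTimeImmutable('2024-04-05'));
$copy = clone $shape;     // under the proposal, `$copy = $shape;` would mean this
$copy->position->zero();  // the proposal would spell this zero!()

var_dump($shape->position->x);                    // int(42)
var_dump($copy->position->x);                     // int(0)
var_dump($shape->createdAt === $copy->createdAt); // bool(true)
```

The copy here is eager, whereas the proposal would delay it until the first write. The observable result is meant to be the same, apart from when `__clone` and `__destruct` run.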
Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)
On 03/04/2024 00:01, Ilija Tovilo wrote: Data classes are classes with a single additional > zend_class_entry.ce_flags flag. So unless customized, they behave as > classes. This way, we have the option to tweak any behavior we would > like, but we don't need to. > > Of course, this will still require an analysis of what behavior we > might want to tweak. Regardless of the implementation, there are a lot of interactions we will want to consider; and we will have to keep considering new ones as we add to the language. For instance, the Property Hooks RFC would probably have needed a section on "Interaction with Data Classes". On the other hand, maybe having two types of objects to consider each time is better than having to consider combinations of lots of small features. On a practical note, a few things I've already thought of to consider: - Can a data class have readonly properties (or be marked "readonly data class")? If so, how will they behave? - Can you explicitly use the "clone" keyword with an instance of a data class? Does it make any difference? - Tied into that: can you implement __clone(), and when will it be called? - If you implement __set(), will copy-on-write be triggered before it's called? - Can you implement __destruct()? Will it ever be called? Consider this example, which would > work with the current approach: > > $shapes[0]->position->zero!(); I find this concise example confusing, and I think there's a few things to unpack here... Firstly, there's putting a data object in an array: $numbers = [ new Number(42) ]; $cow = $numbers; $cow[0]->increment!(); assert($numbers !== $cow); This is fairly clearly equivalent to this: $numbers = [ 42 ]; $cow = $numbers; $cow[0]++; assert($numbers !== $cow); CoW is triggered on the array for both, because ++ and ->increment!() are both clearly modifications. 
Second, there's putting a data object into another data object: $shape = new Shape(new Position(42,42)); $cow = $shape; $cow->position->zero!(); assert($shape !== $cow); This is slightly less obvious, because it presumably depends on the definition of Shape. Assuming Position is a data class: - If Shape is a normal class, changing the value of $cow->position just happens in place, and the assertion fails - If Shape is a readonly class (or position is a readonly property on a normal class), changing the value of $cow->position shouldn't be allowed, so this will presumably give an error - If Shape is a data class, changing the value of $shape->position implies a "mutation" of $shape itself, so we get a separation before anything is modified, and the assertion passes Unlike in the array case, this behaviour can't be resolved until you know the run-time type of $shape. Now, back to your example: $shapes = [ new Shape(new Position(42,42)) ]; $cow = $shapes; $shapes[0]->position->zero!(); assert($cow !== $shapes); This combines the two, meaning that now we can't know whether to separate the array until we know (at run-time) whether Shape is a normal class or a data class. But once that is known, the whole of "->position->zero!()" is a modification to $shapes[0], so we need to separate $shapes. Without such a class-wide marker, you'll need to remember to add the special syntax exactly where applicable. $shapes![0]!->position!->zero(); The array access doesn't need any special marker, because there's no ambiguity. The ambiguous call is the reference to ->position: in your current proposal, this represents a modification *if Shape is a data class, and is itself being modified*. My suggestion (or really, thought experiment) was that it would represent a modification *if it has a ! in the call*. 
So if Shape is a readonly class: $shapes[0]->position->!zero(); // Error: attempting to modify readonly property Shape::$position $shapes[0]->!position->!zero(); // OK; an optimised version of: $shapes[0] = clone $shapes[0] with [ 'position' => (clone $shapes[0]->position with ['x'=>0,'y'=>0]) ]; If ->! is only allowed if the RHS is either a readonly property or a mutating method, then this can be reasoned about statically: it will either error, or cause a CoW separation of $shapes. It also allows classes to mix aspects of "data class" and "normal class" behaviour, which might or might not be a good idea. This is mostly just a thought experiment, but I am a bit concerned that code like this is going to be confusingly ambiguous: $item->shape->position->zero!(); What is going to be CoW cloned, and what is going to be modified in place? I can't actually know without knowing the definition behind both $item and $item->shape. It might even vary depending on input. Regards, -- Rowan Tommins [IMSoP]
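For comparison, here is a sketch of how the "normal class" and "readonly property" cases behave in current PHP (8.1+), where `Position` is necessarily an ordinary object. `FrozenShape` is a hypothetical name used only for this illustration.

```php
<?php
final class Position
{
    public function __construct(public int $x, public int $y) {}

    public function zero(): void
    {
        $this->x = 0;
        $this->y = 0;
    }
}

final class Shape
{
    public function __construct(public Position $position) {}
}

final class FrozenShape
{
    public function __construct(public readonly Position $position) {}
}

// Case 1: normal class. Assignment copies the handle, not the object.
$shape = new Shape(new Position(42, 42));
$cow = $shape;
$cow->position->zero();
var_dump($shape === $cow);     // bool(true)
var_dump($shape->position->x); // int(0)

// Case 2: readonly property. Replacing the property value is rejected...
$frozen = new FrozenShape(new Position(42, 42));
$error = null;
try {
    $frozen->position = new Position(0, 0);
} catch (Error $e) {
    $error = $e->getMessage();
}
var_dump($error !== null);     // bool(true)

// ...but the object stored in it can still be mutated in place.
$frozen->position->zero();
var_dump($frozen->position->x); // int(0)
```

The last two lines show the interior mutability that the proposal says would not be legal once `Position` is a data class.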
Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)
Data classes will be a very useful addition to "API Platform". API Platform is a "resource-oriented" framework that strongly encourages the use of "data-only" classes: we use PHP classes both as a specification language to document the public shape of web APIs (like an OpenAPI specification, but written in PHP instead of JSON or YAML), and as Data Transfer Objects containing the data to be serialized into JSON (read), or the JSON payload deserialized into PHP objects (write).

Being able to encourage users to use structs (that's what we already call this type of behavior-less class in our workshops) for these objects will help us a lot.

Kévin
Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)
Hi Larry

On Wed, Apr 3, 2024 at 12:03 AM Larry Garfield wrote:
>
> On Tue, Apr 2, 2024, at 6:04 PM, Ilija Tovilo wrote:
>
> > I think you misunderstood. The intention is to mark both call-site and
> > declaration. Call-site is marked with ->method!(), while declaration
> > is marked with "public mutating function". Call-site is required to
> > avoid the engine complexity, as previously mentioned. But
> > declaration-site is required so that the user (and IDEs) even know
> > that you need to use the special syntax at the call-site.
>
> Ah, OK. That's... unfortunate, but I defer to you on the implementation
> complexity.

As I've argued, I believe the different syntax is a positive. This way, data classes are known to stay unmodified unless:

1. You're explicitly modifying it yourself.
2. You're calling a mutating method, with its associated syntax.
3. You're creating a reference from the value, either explicitly or by passing it to a by-reference parameter.

By-reference argument passing is the only way that mutations of data classes can be hidden (given that they look exactly like normal by-value arguments), and it's arguably a flaw of by-reference passing itself. In all other cases, you can expect your value _not_ to unexpectedly change. For this reason, I consider it as an alternative approach to readonly classes.

> > Disallowing ordinary by-ref objects is not trivial without additional
> > performance penalties, and I don't see a good reason for it. Can you
> > provide an example on when that would be problematic?
>
> There's two aspects to it, that I see.
>
> data class A {
>     public function __construct(public string $name) {}
> }
>
> data class B {
>     public function __construct(
>         public A $a,
>         public PDO $conn,
>     ) {}
> }
>
> $b = new B(new A(), $pdoConnection);
>
> function stuff(B $b2) {
>     $b2->a->name = 'Larry';
>     // This triggers a CoW on $b2, separating it from $b, and also creating a
>     // new instance of A. What about $conn?
>     // Does it get cloned? That would be bad. Does it not get cloned? That
>     // seems weird that it's still the same on a data object.
>
>     $b2->conn->beginTransaction();
>     // This I would say is technically a modification, since the state of the
>     // connection is changing. But then should this trigger $b2 cloning
>     // from $b? Neither answer is obvious to me.
> }

IMO, the answer is relatively straight-forward: PDO is a reference type. For all intents and purposes, when you're passing B to stuff(), B is copied. Since B::$conn is a "reference" (read pointer), copying B doesn't copy the connection, only the reference to it. B::$a, however, is a value type, so copying B also copies A. The fact that this isn't _exactly_ what happens under the hood due to CoW is an implementation detail; it doesn't need to change how you think about it. From the user's standpoint, $b and $b2 are already separate values once stuff() is called.

This is really no different from arrays:

```php
$b = ['a' => ['name' => 'Larry'], 'conn' => $pdoConnection];
$b2 = $b;
// $b is detached from $b2, $b['conn'] remains a shared object.
```

> The other aspect is, eg, serialization. People will come to expect
> (reasonably) that a data class will have certain properties (in the abstract
> sense, not lexical sense). For instance, most classes are serializable, but
> a few are not. (Eg, if they have a reference to PDO or a file handle or
> something unserializable.) Data classes seem like they should be safe to
> serialize always, as they're "just data". If data classes are limited to
> primitives and data classes internally, that means we can effectively
> guarantee that they will be serializable, always. If one of the properties
> could be a non-serializable object, that assumption breaks.

I'm not sure that's a convincing argument to fully disallow reference types, especially since it would prevent you from storing DateTimeImmutables and other immutable values in data classes and thus break many valid use-cases. That would arguably be very limiting.

> There's probably other similar examples besides serialization where "think of
> this as data" and "think of this as logic" is how you'd want to think, which
> leads to different assumptions, which we shouldn't stealthily break.

I think your assumption here is that non-data classes cannot contain data. This doesn't hold, and especially will not until data classes become more common. Readonly classes can be considered strict versions of data classes in terms of mutability, minus some of the other semantic changes (e.g. identity).

Ilija
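The array analogy can be run as-is in current PHP. This is a small sketch in which a plain `stdClass` object is an assumed stand-in for the PDO connection, and the `inTransaction` property is invented for the illustration:

```php
<?php
// Hypothetical stand-in for a connection object (a reference type).
$conn = new stdClass();
$conn->inTransaction = false;

$b = ['a' => ['name' => 'Rob'], 'conn' => $conn];
$b2 = $b;                           // conceptually a copy; physically shared until written

$b2['a']['name'] = 'Larry';         // separates $b2 and its nested array from $b
$b2['conn']->inTransaction = true;  // mutates the one object both arrays point to

var_dump($b['a']['name']);            // string(3) "Rob"
var_dump($b2['a']['name']);           // string(5) "Larry"
var_dump($b['conn'] === $b2['conn']); // bool(true)
var_dump($b['conn']->inTransaction);  // bool(true)
```

The nested array behaves like the value-type property `B::$a`, while the object behaves like `B::$conn`: the handle is copied, the object is not.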
Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)
Hi Rowan On Tue, Apr 2, 2024 at 10:10 PM Rowan Tommins [IMSoP] wrote: > > On 02/04/2024 01:17, Ilija Tovilo wrote: > > I'd like to introduce an idea I've played around with for a couple of > weeks: Data classes, sometimes called structs in other languages (e.g. > Swift and C#). > > I'm not sure if you've considered it already, but mutating methods should > probably be constrained to be void (or maybe "mutating" could occupy the > return type slot). Otherwise, someone is bound to write this: > > $start = new Location('Here'); > $end = $start->move!('There'); > > Expecting it to mean this: > > $start = new Location('Here'); > $end = $start; > $end->move!('There'); > > When it would actually mean this: > > $start = new Location('Here'); > $start->move!('There'); > $end = $start; I think there are some valid patterns for mutating methods with a return value. For example, Set::add() might return a bool to indicate whether the value was already present in the set. > I seem to remember when this was discussed before, the argument being made > that separating value objects completely means you have to spend time > deciding how they interact with every feature of the language. Data classes are classes with a single additional zend_class_entry.ce_flags flag. So unless customized, they behave as classes. This way, we have the option to tweak any behavior we would like, but we don't need to. Of course, this will still require an analysis of what behavior we might want to tweak. > Does the copy-on-write optimisation actually require the entire class to be > special, or could it be triggered by a mutating method on any object? To > allow direct modification of properties as well, we could move the call-site > marker slightly to a ->! operator: > > $foo->!mutate(); > $foo->!bar = 42; I suppose this is possible, but it puts the burden for figuring out what to separate onto the user. 
Consider this example, which would work with the current approach: $shapes[0]->position->zero!(); The left-hand-side of the mutating method call is fetched by "read+write". Essentially, this ensures that any array or data class is separated (copied if RC >1). Without such a class-wide marker, you'll need to remember to add the special syntax exactly where applicable. $shapes![0]!->position!->zero(); In this case, $shapes, $shapes[0], and $shapes[0]->position must all be separated. This seems very easy to mess up, especially since only zero() is actually known to be separating and can thus be verified at runtime. > The main drawback I can see (outside of the implementation, which I can't > comment on) is that we couldn't overload the === operator to use value > semantics. In exchange, a lot of decisions would simply be made for us: they > would just be objects, with all the same behaviour around inheritance, > serialization, and so on. Right, this would either require some other marker that switches to this mode of comparison, or operator overloading. Ilija
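To illustrate the point about mutating methods with a useful return value, here is a sketch of a set whose `add()` reports whether anything changed. It is an ordinary class in current PHP; `IntSet` is a hypothetical name, and returning `true` for "newly inserted" is a choice made for this example. Under the proposal it would presumably be declared as a data class with a `mutating` method and called as `add!()`.

```php
<?php
final class IntSet
{
    /** @var array<int, true> */
    private array $items = [];

    /** Returns true if the value was newly inserted, false if it was already present. */
    public function add(int $value): bool
    {
        if (isset($this->items[$value])) {
            return false;
        }
        $this->items[$value] = true;
        return true;
    }

    public function count(): int
    {
        return count($this->items);
    }
}

$set = new IntSet();
$first = $set->add(42);   // value is new
$second = $set->add(42);  // value already present

var_dump($first);         // bool(true)
var_dump($second);        // bool(false)
var_dump($set->count());  // int(1)
```

A `void`-only rule for mutating methods would rule out this pattern, which is the trade-off being discussed.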
Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)
Hi Niels

On Tue, Apr 2, 2024 at 8:16 PM Niels Dossche wrote:
>
> On 02/04/2024 02:17, Ilija Tovilo wrote:
> > Hi everyone!
> >
> > I'd like to introduce an idea I've played around with for a couple of
> > weeks: Data classes, sometimes called structs in other languages (e.g.
> > Swift and C#).
>
> As already hinted in the thread, I also think inheritance may be dangerous in
> a first version.
> I want to add to that: if you extend a data-class with a non-data-class, the
> data-class behaviour gets lost, which is logical in a sense but also
> surprised me in a way.

Yes, that's definitely not intended. I haven't implemented any inheritance checks yet. But if inheritance is allowed, then it should be restricted to classes of the same kind (by-ref or by-val).

> Also, FWIW, I'm not sure about the name "data" class, perhaps "value" class
> or something alike is what people may be more familiar with wrt semantics,
> although dataclass is also a known term.

I'm happy with value class, struct, record, data class, what have you. I'll accept whatever the majority prefers.

> I do have a question about iterator behaviour. Consider this code:
> ```
> data class Test {
>     public $a = 1;
>     public $b = 2;
> }
>
> $test = new Test;
> foreach ($test as $k => &$v) {
>     if ($k === "b")
>         $test->a = $test;
>     var_dump($k);
> }
> ```
>
> This will reset the iterator of the object on separation, so we will get an
> infinite loop.
> Is this intended?
> If so, is it because the right hand side is the original object while the
> left hand side gets the clone?
> Is this consistent with how arrays separate?

That's a good question. I have not really thought about iterators yet. Modification of an array iterated by-reference does not restart the iterator. Actually, by-reference capturing of the value also captures the array by-reference, which is not completely intuitive. My initial gut feeling is to handle data classes the same, i.e. capture them by-reference when iterating the value by reference, so that iteration is not restarted.

Ilija
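The array behaviour referred to here can be checked in current PHP. This sketch writes to an array while it is being iterated by reference; the value `99` and the `$seen` bookkeeping are only for illustration:

```php
<?php
$arr = ['a' => 1, 'b' => 2];
$seen = [];

foreach ($arr as $k => &$v) {
    if ($k === 'b') {
        $arr['a'] = 99; // write to the array that is being iterated
    }
    $seen[] = $k;
}
unset($v); // break the reference that foreach leaves behind

var_dump($seen);     // keys "a" and "b", each visited once
var_dump($arr['a']); // int(99)
```

The loop visits each key once and then ends, so the write neither restarts the iteration nor gets lost. That is the behaviour the reply suggests data classes should mirror.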
Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)
On Tue, Apr 2, 2024, at 6:04 PM, Ilija Tovilo wrote: >> What would be the reason not to? As you indicated in another reply, the >> main reason some languages don't is to avoid large stack copies, but PHP >> doesn't have large stack copies for objects anyway so that's a non-issue. >> >> I've long argued that the fewer differences there are between service >> classes and data classes, the better, so I'm not sure what advantage this >> would have other than "ugh, inheritance is such a mess" (which is true, but >> that ship sailed long ago). > > One issue that just came to mind is object identity. For example: > > class Person { > public function __construct( > public string $firstname, > public string $lastname, > ) {} > } > > class Manager extends Person { > public function bossAround() {} > } > > $person = new Person('Boss', 'Man'); > $manager = new Manager('Boss', 'Man'); > var_dump($person === $manager); // ??? > > Equality for data objects is based on data, rather than the object > handle. How does this interact with inheritance? Technically, Person > and Manager represent the same data. Manager contains additional > behavior, but does that change identity? > > I'm not sure what the answer is. That's just the first thing that came > to mind. I'm confident we'll discover more such edge cases. Of course, > I can invest the time to find the questions before deciding to > disallow inheritance. As Bruce already demonstrated, equality should include type, not just properties. Even without inheritance that is necessary. There may be good reason to omit inheritance, as we did on enums, but that shouldn't be the starting point. (I'd have to research and see what other languages do. I think it's a mixed bag.) We should try to ferret out those edge cases and see if there's reasonable solutions to them. >> > * Mutating method calls on data classes use a slightly different >> > syntax: `$vector->append!(42)`. All methods mutating `$this` must be >> > marked as `mutating`. 
The reason for this is twofold: 1. It signals to >> > the caller that the value is modified. 2. It allows `$vector` to be >> > cloned before knowing whether the method `append` is modifying, which >> > hugely reduces implementation complexity in the engine. >> >> As discussed in R11, it would be very beneficial if this marker could be on >> the method definition, not the method invocation. You indicated that would >> be Hard(tm), but I think it's worth some effort to see if it's surmountably >> hard. (Or at least less hard than just auto-detecting it, which you >> indicated is Extremely Hard(tm).) > > I think you misunderstood. The intention is to mark both call-site and > declaration. Call-site is marked with ->method!(), while declaration > is marked with "public mutating function". Call-site is required to > avoid the engine complexity, as previously mentioned. But > declaration-site is required so that the user (and IDEs) even know > that you need to use the special syntax at the call-site. Ah, OK. That's... unfortunate, but I defer to you on the implementation complexity. >> So to the extent there is a consensus, equality, stringifying, and a >> hashcode (which we don't have yet, but will need in the future for some >> things I suspect) seem to be the rough expected defaults. > > I'm just skeptical whether the default __toString() is ever useful. I > can see an argument for it for quick debugging in languages that don't > provide something like var_dump(). In PHP this seems much less useful. > It's impossible to provide a default implementation that works > everywhere (or pretty much anywhere, even). > > Equality is already included. Hashing should be added separately, and > probably not just to data classes. The equivalent of Python's __repr__ (which it auto-generates) would be __debugInfo(). Arguably its current output is what the default would likely be anyway, though. 
I believe the typical auto-toString output is the same data, but presented in a more human-friendly way. (So yes, mainly useful for debugging.) Equality, well, we've already debated whether or not we should make that a general feature. :-) Of note, though, in languages with equals(), it's also user-overridable. >> > * In the future, it should be possible to allow using data classes in >> > `SplObjectStorage`. However, because hashing is complex, this will be >> > postponed to a separate RFC. I believe this is where we would want/need a __hash() method or similar; Derick and I encountered that while researching collections in other languages. Leaving it out for now is fine, but it would be important for any future list-of functionality. >> Would data class properties only be allowed to be other data classes, or >> could they hold a non-data class? My knee jerk response is they should be >> data classes all the way down; the only counter-argument I can think of it >> would be how much existing code
Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)
On Tue, Apr 2, 2024 at 1:47 PM Larry Garfield wrote:

> > * Data classes protect from interior mutability. More concretely,
> > mutating nested data objects stored in a `readonly` property is not
> > legal, whereas it would be if they were ordinary objects.
> > * In the future, it should be possible to allow using data classes in
> > `SplObjectStorage`. However, because hashing is complex, this will be
> > postponed to a separate RFC.
>
> Would data class properties only be allowed to be other data classes, or
> could they hold a non-data class? My knee jerk response is they should be
> data classes all the way down; the only counter-argument I can think of it
> would be how much existing code is out there that is a "data class" in all
> but name. I still fear someone adding a DB connection object to a data
> class and everything going to hell, though. :-)
>

If there is a class made up of 90% data struct and 10% non-data struct, the 90% could be extracted into a true data struct and be referenced in the existing regular class, making it even more organized in terms of establishing what's "data" and what's "service".

I would really favor making it "data class" all the way down. I understand you disagree with the argument against inheritance, but to me the same logic applies here. Making it data class only allows for lifting the restriction in the future, if necessary (requiring another RFC vote). Making it mixed on version 1 means that support for the mixture of them can never be undone.

--
Marco Deleu
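A rough sketch of the extraction described above, in current PHP (8.1+). All names (`CustomerData`, `CustomerService`, the `mailer` closure) are hypothetical. A class with readonly properties stands in for what would be a data class under the proposal.

```php
<?php
// The "90% data" part: only scalar state, no service dependencies.
final class CustomerData
{
    public function __construct(
        public readonly string $name,
        public readonly string $email,
    ) {}
}

// The "10% service" part: a regular class that references the data part.
final class CustomerService
{
    public function __construct(
        private CustomerData $data,
        private Closure $mailer, // stands in for a real service dependency
    ) {}

    public function notify(string $message): string
    {
        return ($this->mailer)($this->data->email, $message);
    }
}

$service = new CustomerService(
    new CustomerData('Ada', 'ada@example.com'),
    fn (string $to, string $body): string => "to={$to}; body={$body}",
);

$result = $service->notify('hello');
echo $result, PHP_EOL; // to=ada@example.com; body=hello
```

With this split, only `CustomerService` holds a reference-type dependency, so the data part could stay "data all the way down".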
Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)
On 02/04/2024 01:17, Ilija Tovilo wrote: I'd like to introduce an idea I've played around with for a couple of weeks: Data classes, sometimes called structs in other languages (e.g. Swift and C#). Hi Ilija, I'm really interested to see how this develops. A couple of thoughts that immediately occurred to me... I'm not sure if you've considered it already, but mutating methods should probably be constrained to be void (or maybe "mutating" could occupy the return type slot). Otherwise, someone is bound to write this: $start = new Location('Here'); $end = $start->move!('There'); Expecting it to mean this: $start = new Location('Here'); $end = $start; $end->move!('There'); When it would actually mean this: $start = new Location('Here'); $start->move!('There'); $end = $start; I seem to remember when this was discussed before, the argument being made that separating value objects completely means you have to spend time deciding how they interact with every feature of the language. Does the copy-on-write optimisation actually require the entire class to be special, or could it be triggered by a mutating method on any object? To allow direct modification of properties as well, we could move the call-site marker slightly to a ->! operator: $foo->!mutate(); $foo->!bar = 42; The first would be the same as your current version: it would perform a CoW reference separation / clone, then call the method, which would require a "mutating" marker. The second would essentially be an optimised version of $foo = clone $foo with [ 'bar' => 42 ] During the method call or write operation, readonly properties would allow an additional write, as is the case in __clone and the "clone with" proposal. So a "pure" data object would simply be declared with the existing "readonly class" syntax. The main drawback I can see (outside of the implementation, which I can't comment on) is that we couldn't overload the === operator to use value semantics. 
In exchange, a lot of decisions would simply be made for us: they would just be objects, with all the same behaviour around inheritance, serialization, and so on. Regards, -- Rowan Tommins [IMSoP]
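Neither `->!` nor `clone ... with` exists in PHP today, so the following is only an approximation of what `$foo->!bar = 42` would expand to. The `with()` helper is hypothetical and rebuilds the object through its constructor (PHP 8.1+ assumed):

```php
<?php
final class Location
{
    public function __construct(public readonly string $name) {}

    /** Returns a modified copy; the receiver is left untouched. */
    public function with(string ...$changes): self
    {
        // Merge current property values with the named overrides,
        // then pass them to the constructor as named arguments.
        return new self(...array_merge(get_object_vars($this), $changes));
    }
}

$start = new Location('Here');
$end = $start;                      // both variables hold the same object
$end = $end->with(name: 'There');   // roughly: $end->!name = 'There';

var_dump($start->name); // string(4) "Here"
var_dump($end->name);   // string(5) "There"
```

Unlike the proposed operator, this always allocates a new object, even when nothing else references the original. Avoiding that copy is what the copy-on-write optimisation is for.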
Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)
On Tue, Apr 2, 2024, at 20:51, Bruce Weirdan wrote:
> On Tue, Apr 2, 2024 at 8:05 PM Ilija Tovilo wrote:
>
> > Equality for data objects is based on data, rather than the object
> > handle.
>
> I believe equality should always consider the type of the object.
>
> ```php
> new Problem(size:'big') === new Universe(size:'big')
> && new Problem(size:'big') === new Shoe(size:'big');
> ```
>
> If the above can ever be true then I'm not sure how big is the problem
> (but probably very big).
> Also see the examples of non-comparable ids - `new CompanyId(1)`
> should not be equal to `new PersonId(1)`
>
> And I'd find it very confusing if the following crashed
>
> ```php
> function f(Universe $_u): void {}
> $universe = new Universe(size:'big');
> $shoe = new Shoe(size:'big');
>
> if ($shoe === $universe) {
>     f($shoe); // shoe is *identical* to the universe, so it should be
>               // accepted wherever the universe is
> }
> ```
>
> --
> Best regards,
> Bruce Weirdan
> mailto:weir...@gmail.com
>

I'd love to see it so that equality was more like == for regular objects. If the type matches and the data matches, it's true.

It'd be really helpful to be able to downcast types though. Such as in my user id example I gave earlier. Once it reaches a certain point in the code, it doesn't matter that it was once a UserId, it just matters that it is currently an Id.

Now that I think about it, decoration might be better than inheritance here and inheritance might make more sense to be banned. In other words, this might be just as simple and easy to use:

data class Id {
    public function __construct(public string $id) {}
}

data class UserId {
    public function __construct(public Id $id) {}
}

Though it would be really interesting to use them as "traits" for each other to say "this data class can be converted to another type, but information will be lost" where they are 100% separate types but can be "cast" to specified types.

// "use" has all the same rules as extends, but,
// UserId is not an Id; it can be converted to an Id
data class UserId use Id {
    public function __construct(public string $id, public string $name) {}
}

$user = new UserId('123', 'rob');
$id = (Id) $user;
($user !== $id) === true;

$id is 100% Id and lost all its "userness."

Hmm. Interesting indeed. Probably not practical, but interesting.

--
Rob
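The decoration variant can be approximated in current PHP (8.1+). In this sketch the hypothetical `toId()` method plays the role of the `(Id)` cast, and giving `UserId` both an `Id` and a `name` is an assumption that combines the two snippets above:

```php
<?php
final class Id
{
    public function __construct(public readonly string $id) {}
}

final class UserId
{
    public function __construct(
        public readonly Id $id,
        public readonly string $name,
    ) {}

    /** Explicit, lossy conversion: the name is dropped. */
    public function toId(): Id
    {
        return $this->id;
    }
}

$user = new UserId(new Id('123'), 'rob');
$id = $user->toId();

var_dump($id instanceof Id);     // bool(true)
var_dump($id instanceof UserId); // bool(false)
var_dump($id == new Id('123'));  // bool(true): same class, equal properties
```

Because `UserId` wraps an `Id` rather than extending it, no inheritance is needed and the conversion is visible at the call site.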
Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)
On Tue, Apr 2, 2024 at 8:05 PM Ilija Tovilo wrote:

> Equality for data objects is based on data, rather than the object
> handle.

I believe equality should always consider the type of the object.

```php
new Problem(size:'big') === new Universe(size:'big')
&& new Problem(size:'big') === new Shoe(size:'big');
```

If the above can ever be true then I'm not sure how big is the problem (but probably very big).
Also see the examples of non-comparable ids - `new CompanyId(1)` should not be equal to `new PersonId(1)`

And I'd find it very confusing if the following crashed

```php
function f(Universe $_u): void {}
$universe = new Universe(size:'big');
$shoe = new Shoe(size:'big');

if ($shoe === $universe) {
    f($shoe); // shoe is *identical* to the universe, so it should be accepted wherever the universe is
}
```

--
Best regards,
Bruce Weirdan
mailto:weir...@gmail.com
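As a point of reference, PHP's existing `==` on objects already takes the class into account. The sketch below shows that, plus a hypothetical `valueEquals()` helper spelling out the rule being argued for. The helper is an illustration only: it sees just the properties accessible from its scope, and it compares nested objects by identity.

```php
<?php
final class CompanyId
{
    public function __construct(public readonly int $value) {}
}

final class PersonId
{
    public function __construct(public readonly int $value) {}
}

/** Hypothetical helper: value equality that also requires the same class. */
function valueEquals(object $a, object $b): bool
{
    return $a::class === $b::class
        && get_object_vars($a) === get_object_vars($b);
}

var_dump(new CompanyId(1) == new CompanyId(1));            // bool(true)
var_dump(new CompanyId(1) == new PersonId(1));             // bool(false): different classes
var_dump(new CompanyId(1) === new CompanyId(1));           // bool(false): different instances
var_dump(valueEquals(new CompanyId(1), new CompanyId(1))); // bool(true)
var_dump(valueEquals(new CompanyId(1), new PersonId(1)));  // bool(false)
```

A data-class `===` that behaved like `valueEquals()` would keep `CompanyId(1)` and `PersonId(1)` distinct while treating two `CompanyId(1)` values as identical.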
Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)
On 02/04/2024 02:17, Ilija Tovilo wrote: > Hi everyone! > > I'd like to introduce an idea I've played around with for a couple of > weeks: Data classes, sometimes called structs in other languages (e.g. > Swift and C#). > > In a nutshell, data classes are classes with value semantics. > Instances of data classes are implicitly copied when assigned to a > variable, or when passed to a function. When the new instance is > modified, the original instance remains untouched. This might sound > familiar: It's exactly how arrays work in PHP. > > ```php > $a = [1, 2, 3]; > $b = $a; > $b[] = 4; > var_dump($a); // [1, 2, 3] > var_dump($b); // [1, 2, 3, 4] > ``` > > You may think that copying the array on each assignment is expensive, > and you would be right. PHP uses a trick called copy-on-write, or CoW > for short. `$a` and `$b` actually share the same array until `$b[] = > 4;` modifies it. It's only at this point that the array is copied and > replaced in `$b`, so that the modification doesn't affect `$a`. As > long as a variable is the sole owner of a value, or none of the > variables modify the value, no copy is needed. Data classes use the > same mechanism. > > But why value semantics in the first place? There are two major flaws > with by-reference semantics for data structures: > > 1. It's very easy to forget cloning data that is referenced somewhere > else before modifying it. This will lead to "spooky actions at a > distance". Having recently used JavaScript (where all data structures > have by-reference semantics) for an educational IR optimizer, > accidental mutations of shared arrays/maps/sets were my primary source > of bugs. > 2. Defensive cloning (to avoid issue 1) will lead to useless work when > the value is not referenced anywhere else. > > PHP offers readonly properties and classes to address issue 1. 
> However, they further promote issue 2 by making it impossible to > modify values without cloning them first, even if we know they are not > referenced anywhere else. Some APIs further exacerbate the issue by > requiring multiple copies for multiple modifications (e.g. > `$response->withStatus(200)->withHeader('X-foo', 'foo');`). > > As you may have noticed, arrays already solve both of these issues > through CoW. Data classes allow implementing arbitrary data structures > with the same value semantics in core, extensions or userland. For > example, a `Vector` data class may look something like the following: > > ```php > data class Vector { > private $values; > > public function __construct(...$values) { > $this->values = $values; > } > > public mutating function append($value) { > $this->values[] = $value; > } > } > > $a = new Vector(1, 2, 3); > $b = $a; > $b->append!(4); > var_dump($a); // Vector(1, 2, 3) > var_dump($b); // Vector(1, 2, 3, 4) > ``` > > An internal Vector implementation might offer a faster and stricter > alternative to arrays (e.g. Vector from php-ds). > > Some other things to note about data classes: > > * Data classes are ordinary classes, and as such may implement > interfaces, methods and more. I have not decided whether they should > support inheritance. > * Mutating method calls on data classes use a slightly different > syntax: `$vector->append!(42)`. All methods mutating `$this` must be > marked as `mutating`. The reason for this is twofold: 1. It signals to > the caller that the value is modified. 2. It allows `$vector` to be > cloned before knowing whether the method `append` is modifying, which > hugely reduces implementation complexity in the engine. > * Data classes customize identity (`===`) comparison, in the same way > arrays do. Two data objects are identical if all their properties are > identical (including order for dynamic properties). 
> * Sharing data classes by-reference is possible using references, as > you would for arrays. > * We may decide to auto-implement `__toString` for data classes, > amongst other things. I am still undecided whether this is useful for > PHP. > * Data classes protect from interior mutability. More concretely, > mutating nested data objects stored in a `readonly` property is not > legal, whereas it would be if they were ordinary objects. > * In the future, it should be possible to allow using data classes in > `SplObjectStorage`. However, because hashing is complex, this will be > postponed to a separate RFC. > > One known gotcha is that we cannot trivially enforce placement of > `mutating` on methods without a performance hit. It is the > responsibility of the user to correctly mark such methods. > > Here's a fully functional PoC, excluding JIT: > https://github.com/php/php-src/pull/13800 > > Let me know what you think. I will start working on an RFC draft once > work on property hooks concludes. > > Ilija Hi Ilija Thank you for this proposal, I like the idea of having value semantic objects available. I pulled your branch and played with it a bit. As already hinted in
Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)
Hi Larry

On Tue, Apr 2, 2024 at 5:31 PM Larry Garfield wrote:
>
> On Tue, Apr 2, 2024, at 12:17 AM, Ilija Tovilo wrote:
> > Hi everyone!
> >
> > I'd like to introduce an idea I've played around with for a couple of
> > weeks: Data classes, sometimes called structs in other languages (e.g.
> > Swift and C#).
> >
> > * Data classes are ordinary classes, and as such may implement
> > interfaces, methods and more. I have not decided whether they should
> > support inheritance.
>
> What would be the reason not to? As you indicated in another reply, the main
> reason some languages don't is to avoid large stack copies, but PHP doesn't
> have large stack copies for objects anyway so that's a non-issue.
>
> I've long argued that the fewer differences there are between service classes
> and data classes, the better, so I'm not sure what advantage this would have
> other than "ugh, inheritance is such a mess" (which is true, but that ship
> sailed long ago).

One issue that just came to mind is object identity. For example:

    class Person {
        public function __construct(
            public string $firstname,
            public string $lastname,
        ) {}
    }

    class Manager extends Person {
        public function bossAround() {}
    }

    $person = new Person('Boss', 'Man');
    $manager = new Manager('Boss', 'Man');
    var_dump($person === $manager); // ???

Equality for data objects is based on data, rather than the object handle. How does this interact with inheritance? Technically, Person and Manager represent the same data. Manager contains additional behavior, but does that change identity?

I'm not sure what the answer is. That's just the first thing that came to mind. I'm confident we'll discover more such edge cases. Of course, I can invest the time to find the questions before deciding to disallow inheritance.

> > * Mutating method calls on data classes use a slightly different
> > syntax: `$vector->append!(42)`. All methods mutating `$this` must be
> > marked as `mutating`. The reason for this is twofold: 1. It signals to
> > the caller that the value is modified. 2. It allows `$vector` to be
> > cloned before knowing whether the method `append` is modifying, which
> > hugely reduces implementation complexity in the engine.
>
> As discussed in R11, it would be very beneficial if this marker could be on
> the method definition, not the method invocation. You indicated that would
> be Hard(tm), but I think it's worth some effort to see if it's surmountably
> hard. (Or at least less hard than just auto-detecting it, which you
> indicated is Extremely Hard(tm).)

I think you misunderstood. The intention is to mark both call-site and declaration. Call-site is marked with ->method!(), while declaration is marked with "public mutating function". Call-site is required to avoid the engine complexity, as previously mentioned. But declaration-site is required so that the user (and IDEs) even know that you need to use the special syntax at the call-site.

> So to the extent there is a consensus, equality, stringifying, and a hashcode
> (which we don't have yet, but will need in the future for some things I
> suspect) seem to be the rough expected defaults.

I'm just skeptical whether the default __toString() is ever useful. I can see an argument for it for quick debugging in languages that don't provide something like var_dump(). In PHP this seems much less useful. It's impossible to provide a default implementation that works everywhere (or pretty much anywhere, even).

Equality is already included. Hashing should be added separately, and probably not just to data classes.

> > * In the future, it should be possible to allow using data classes in
> > `SplObjectStorage`. However, because hashing is complex, this will be
> > postponed to a separate RFC.
>
> Would data class properties only be allowed to be other data classes, or
> could they hold a non-data class? My knee jerk response is they should be
> data classes all the way down; the only counter-argument I can think of
> would be how much existing code is out there that is a "data class" in all
> but name. I still fear someone adding a DB connection object to a data class
> and everything going to hell, though. :-)

Disallowing ordinary by-ref objects is not trivial without additional performance penalties, and I don't see a good reason for it. Can you provide an example of when that would be problematic?

Ilija
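To make the concern about ordinary objects inside value-like objects concrete, here is a minimal sketch in current PHP, using an explicit `clone` as a stand-in for the implicit copy a data class would perform. `Connection` and `Config` are hypothetical names, and the assumption that a data class copy would share nested ordinary objects by handle (as `clone` does today) is mine; the thread does not specify it.

```php
<?php
// Hypothetical ordinary (by-reference) service object.
final class Connection {
    public bool $open = true;
}

// Stand-in for a value-like object; today it is copied with an explicit clone.
final class Config {
    public function __construct(
        public string $name,
        public Connection $connection,
    ) {}
}

$a = new Config('primary', new Connection());
$b = clone $a;                 // shallow copy: scalars are copied, object handles are shared
$b->name = 'replica';          // affects only $b
$b->connection->open = false;  // mutates the Connection that $a also points to

var_dump($a->name);             // string(7) "primary"
var_dump($a->connection->open); // bool(false)
```

The scalar property is isolated by the copy, while the nested ordinary object is still shared. That is the behaviour one would have to be aware of if by-ref objects remain allowed as data class properties.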
Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)
On Tue, Apr 2, 2024 at 2:20 AM Ilija Tovilo wrote: > > Hi everyone! > > I'd like to introduce an idea I've played around with for a couple of > weeks: Data classes, sometimes called structs in other languages (e.g. > Swift and C#). > > In a nutshell, data classes are classes with value semantics. > Instances of data classes are implicitly copied when assigned to a > variable, or when passed to a function. When the new instance is > modified, the original instance remains untouched. This might sound > familiar: It's exactly how arrays work in PHP. > > ```php > $a = [1, 2, 3]; > $b = $a; > $b[] = 4; > var_dump($a); // [1, 2, 3] > var_dump($b); // [1, 2, 3, 4] > ``` > > You may think that copying the array on each assignment is expensive, > and you would be right. PHP uses a trick called copy-on-write, or CoW > for short. `$a` and `$b` actually share the same array until `$b[] = > 4;` modifies it. It's only at this point that the array is copied and > replaced in `$b`, so that the modification doesn't affect `$a`. As > long as a variable is the sole owner of a value, or none of the > variables modify the value, no copy is needed. Data classes use the > same mechanism. > > But why value semantics in the first place? There are two major flaws > with by-reference semantics for data structures: > > 1. It's very easy to forget cloning data that is referenced somewhere > else before modifying it. This will lead to "spooky actions at a > distance". Having recently used JavaScript (where all data structures > have by-reference semantics) for an educational IR optimizer, > accidental mutations of shared arrays/maps/sets were my primary source > of bugs. > 2. Defensive cloning (to avoid issue 1) will lead to useless work when > the value is not referenced anywhere else. > > PHP offers readonly properties and classes to address issue 1. 
> However, they further promote issue 2 by making it impossible to > modify values without cloning them first, even if we know they are not > referenced anywhere else. Some APIs further exacerbate the issue by > requiring multiple copies for multiple modifications (e.g. > `$response->withStatus(200)->withHeader('X-foo', 'foo');`). > > As you may have noticed, arrays already solve both of these issues > through CoW. Data classes allow implementing arbitrary data structures > with the same value semantics in core, extensions or userland. For > example, a `Vector` data class may look something like the following: > > ```php > data class Vector { > private $values; > > public function __construct(...$values) { > $this->values = $values; > } > > public mutating function append($value) { > $this->values[] = $value; > } > } > > $a = new Vector(1, 2, 3); > $b = $a; > $b->append!(4); > var_dump($a); // Vector(1, 2, 3) > var_dump($b); // Vector(1, 2, 3, 4) > ``` > > An internal Vector implementation might offer a faster and stricter > alternative to arrays (e.g. Vector from php-ds). > > Some other things to note about data classes: > > * Data classes are ordinary classes, and as such may implement > interfaces, methods and more. I have not decided whether they should > support inheritance. > * Mutating method calls on data classes use a slightly different > syntax: `$vector->append!(42)`. All methods mutating `$this` must be > marked as `mutating`. The reason for this is twofold: 1. It signals to > the caller that the value is modified. 2. It allows `$vector` to be > cloned before knowing whether the method `append` is modifying, which > hugely reduces implementation complexity in the engine. > * Data classes customize identity (`===`) comparison, in the same way > arrays do. Two data objects are identical if all their properties are > identical (including order for dynamic properties). 
> * Sharing data classes by-reference is possible using references, as > you would for arrays. > * We may decide to auto-implement `__toString` for data classes, > amongst other things. I am still undecided whether this is useful for > PHP. > * Data classes protect from interior mutability. More concretely, > mutating nested data objects stored in a `readonly` property is not > legal, whereas it would be if they were ordinary objects. > * In the future, it should be possible to allow using data classes in > `SplObjectStorage`. However, because hashing is complex, this will be > postponed to a separate RFC. > > One known gotcha is that we cannot trivially enforce placement of > `mutating` on methods without a performance hit. It is the > responsibility of the user to correctly mark such methods. > > Here's a fully functional PoC, excluding JIT: > https://github.com/php/php-src/pull/13800 > > Let me know what you think. I will start working on an RFC draft once > work on property hooks concludes. > > Ilija Neat! I've been playing around with "value-like" objects for a while now: https://github.com/withinboredom/time Having inheritance supported would be useful, for example,
Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)
On Tue, Apr 2, 2024, at 12:17 AM, Ilija Tovilo wrote: > Hi everyone! > > I'd like to introduce an idea I've played around with for a couple of > weeks: Data classes, sometimes called structs in other languages (e.g. > Swift and C#). *gets popcorn* > In a nutshell, data classes are classes with value semantics. > Instances of data classes are implicitly copied when assigned to a > variable, or when passed to a function. When the new instance is > modified, the original instance remains untouched. This might sound > familiar: It's exactly how arrays work in PHP. > > ```php > $a = [1, 2, 3]; > $b = $a; > $b[] = 4; > var_dump($a); // [1, 2, 3] > var_dump($b); // [1, 2, 3, 4] > ``` > > You may think that copying the array on each assignment is expensive, > and you would be right. PHP uses a trick called copy-on-write, or CoW > for short. `$a` and `$b` actually share the same array until `$b[] = > 4;` modifies it. It's only at this point that the array is copied and > replaced in `$b`, so that the modification doesn't affect `$a`. As > long as a variable is the sole owner of a value, or none of the > variables modify the value, no copy is needed. Data classes use the > same mechanism. > > But why value semantics in the first place? There are two major flaws > with by-reference semantics for data structures: > > 1. It's very easy to forget cloning data that is referenced somewhere > else before modifying it. This will lead to "spooky actions at a > distance". Having recently used JavaScript (where all data structures > have by-reference semantics) for an educational IR optimizer, > accidental mutations of shared arrays/maps/sets were my primary source > of bugs. > 2. Defensive cloning (to avoid issue 1) will lead to useless work when > the value is not referenced anywhere else. > > PHP offers readonly properties and classes to address issue 1. 
> However, they further promote issue 2 by making it impossible to > modify values without cloning them first, even if we know they are not > referenced anywhere else. Some APIs further exacerbate the issue by > requiring multiple copies for multiple modifications (e.g. > `$response->withStatus(200)->withHeader('X-foo', 'foo');`). > > As you may have noticed, arrays already solve both of these issues > through CoW. Data classes allow implementing arbitrary data structures > with the same value semantics in core, extensions or userland. For > example, a `Vector` data class may look something like the following: > > ```php > data class Vector { > private $values; > > public function __construct(...$values) { > $this->values = $values; > } > > public mutating function append($value) { > $this->values[] = $value; > } > } > > $a = new Vector(1, 2, 3); > $b = $a; > $b->append!(4); > var_dump($a); // Vector(1, 2, 3) > var_dump($b); // Vector(1, 2, 3, 4) > ``` > > An internal Vector implementation might offer a faster and stricter > alternative to arrays (e.g. Vector from php-ds). > > Some other things to note about data classes: > > * Data classes are ordinary classes, and as such may implement > interfaces, methods and more. I have not decided whether they should > support inheritance. What would be the reason not to? As you indicated in another reply, the main reason some languages don't is to avoid large stack copies, but PHP doesn't have large stack copies for objects anyway so that's a non-issue. I've long argued that the fewer differences there are between service classes and data classes, the better, so I'm not sure what advantage this would have other than "ugh, inheritance is such a mess" (which is true, but that ship sailed long ago). > * Mutating method calls on data classes use a slightly different > syntax: `$vector->append!(42)`. All methods mutating `$this` must be > marked as `mutating`. The reason for this is twofold: 1. 
It signals to > the caller that the value is modified. 2. It allows `$vector` to be > cloned before knowing whether the method `append` is modifying, which > hugely reduces implementation complexity in the engine. As discussed in R11, it would be very beneficial if this marker could be on the method definition, not the method invocation. You indicated that would be Hard(tm), but I think it's worth some effort to see if it's surmountably hard. (Or at least less hard than just auto-detecting it, which you indicated is Extremely Hard(tm).) > * Data classes customize identity (`===`) comparison, in the same way > arrays do. Two data objects are identical if all their properties are > identical (including order for dynamic properties). > * Sharing data classes by-reference is possible using references, as > you would for arrays. > > * We may decide to auto-implement `__toString` for data classes, > amongst other things. I am still undecided whether this is useful for > PHP. For reference: Java record classes auto-generate equals(), toString(), hashCode(), and same-name methods (we don't need
Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)
Hi Alexander

On Tue, Apr 2, 2024 at 4:53 AM Alexander Pravdin wrote:
>
> On Tue, Apr 2, 2024 at 9:18 AM Ilija Tovilo wrote:
> >
> > I'd like to introduce an idea I've played around with for a couple of
> > weeks: Data classes, sometimes called structs in other languages (e.g.
> > Swift and C#).
>
> While I like the idea, I would like to suggest something else in
> addition or as a separate feature. As an active user of readonly
> classes with all promoted properties for data-holding purposes, I
> would be happy to see the possibility of cloning them with passing
> some properties to modify:
>
> readonly class Data {
>     function __construct(
>         public string $foo,
>         public string $bar,
>         public string $baz,
>     ) {}
> }
>
> $data = new Data(foo: 'A', bar: 'B', baz: 'C');
>
> $data2 = clone $data with (bar: 'X', baz: 'Y');

What you're asking for is part of the "Clone with" RFC: https://wiki.php.net/rfc/clone_with

This issue is valid and the RFC would improve the ergonomics of readonly classes. However, note that it really only addresses a small part of what this RFC tries to achieve:

> Some APIs further exacerbate the issue by requiring multiple copies for multiple modifications (e.g. `$response->withStatus(200)->withHeader('X-foo', 'foo');`).

Readonly works fine for compact data structures, even if they are copied more often than necessary. For large data structures, like large lists, a copy for each modification would be detrimental.

https://3v4l.org/GR6On

See how the performance of an insert into an array tanks if a copy of the array is performed in each iteration (due to an additional reference to it). Readonly is just not viable for data structures such as lists, maps, sets, etc.

Ilija
Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)
Hi Marco

On Tue, Apr 2, 2024 at 2:56 AM Deleu wrote:
>
> On Mon, Apr 1, 2024 at 9:20 PM Ilija Tovilo wrote:
>>
>> I'd like to introduce an idea I've played around with for a couple of
>> weeks: Data classes, sometimes called structs in other languages (e.g.
>> Swift and C#).
>>
>> snip
>>
>> Some other things to note about data classes:
>>
>> * Data classes are ordinary classes, and as such may implement
>> interfaces, methods and more. I have not decided whether they should
>> support inheritance.
>
> I'd argue in favor of not including inheritance in the first version. Taking
> inheritance out is an impossible BC Break. Not introducing it in the first
> stable release gives users a chance to evaluate whether it's something we
> will drastically miss.

I would probably agree. I believe the reason some languages don't support inheritance for value types is that they are stored on the stack. Inheritance encourages large structures, but copying very large structures over and over on the stack may be slow.

In PHP, objects always live on the heap, and due to CoW we don't have this problem. Still, it may be beneficial to disallow inheritance first, and relax this restriction if it is necessary.

>> * Mutating method calls on data classes use a slightly different
>> syntax: `$vector->append!(42)`. All methods mutating `$this` must be
>> marked as `mutating`. The reason for this is twofold: 1. It signals to
>> the caller that the value is modified. 2. It allows `$vector` to be
>> cloned before knowing whether the method `append` is modifying, which
>> hugely reduces implementation complexity in the engine.
>
> I'm not sure if I understood this one. Do you mean that the `!` modifier here
> (at call-site) is helping the engine clone the variable before even diving
> into whether `append()` has been tagged as mutating?

Precisely. The issue comes from deeper nested values:

$circle->position->zero();

Imagine that Circle is a data class with a Position, which is also a data class. Position::zero() is a mutating method that sets the coordinates to 0:0. For this to work, not only the position needs to be copied, but also $circle. However, the engine doesn't yet know ahead of time whether zero() is mutating, and as such needs to perform a copy.

One idea was to evaluate the left-hand-side of the method call, and repeat it with a copy if the method is mutating. However, this is not trivially possible, because opcodes consume their operands. So, for an expression like `getCircle()->position->zero()`, the return value of `getCircle()` is already gone.

`!` explicitly distinguishes the call from non-mutating calls, so the engine knows that a copy will be needed. But as mentioned previously, I think a different syntax offers additional benefits for readability.

> From outside it looks odd that a clone would happen ahead-of-time while
> talking about copy-on-write. Would this syntax break for non-mutating methods?

If by break you mean the engine would error, then yes. Only mutating methods may (and must) be called with the $foo->bar!() syntax.

Ilija
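For readers who want to see the nested case in today's PHP, the following is a rough emulation using ordinary classes and an explicit deep `clone`. It is only a sketch of the observable result: under the proposal the separation of both `$circle` and its position would be implicit and lazy, triggered by `$circle->position->zero!()`. The constructor shapes of `Circle` and `Position` and the `__clone()` implementation are assumptions for illustration.

```php
<?php
final class Position {
    public function __construct(public int $x, public int $y) {}

    // Would be declared "public mutating function zero()" under the proposal.
    public function zero(): void {
        $this->x = 0;
        $this->y = 0;
    }
}

final class Circle {
    public function __construct(public Position $position) {}

    // Emulate value semantics for the nested value by copying it as well.
    public function __clone() {
        $this->position = clone $this->position;
    }
}

$circle = new Circle(new Position(3, 4));
$copy = clone $circle;    // manual separation of the outer AND the inner value
$copy->position->zero();  // modifies only the copy

var_dump($circle->position->x); // int(3)
var_dump($copy->position->x);   // int(0)
```

Without the `__clone()` hook, both circles would share one `Position` and the call to `zero()` would affect both, which is why the outer value has to be separated before the nested mutating call runs.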
Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)
On Tue, Apr 2, 2024 at 9:18 AM Ilija Tovilo wrote:
>
> Hi everyone!
>
> I'd like to introduce an idea I've played around with for a couple of
> weeks: Data classes, sometimes called structs in other languages (e.g.
> Swift and C#).
>
> ```php
> data class Vector {
>     private $values;
>
>     public function __construct(...$values) {
>         $this->values = $values;
>     }
>
>     public mutating function append($value) {
>         $this->values[] = $value;
>     }
> }
>
> $a = new Vector(1, 2, 3);
> $b = $a;
> $b->append!(4);
> var_dump($a); // Vector(1, 2, 3)
> var_dump($b); // Vector(1, 2, 3, 4)
> ```
>

While I like the idea, I would like to suggest something else in addition or as a separate feature. As an active user of readonly classes with all promoted properties for data-holding purposes, I would be happy to see the possibility of cloning them with passing some properties to modify:

readonly class Data {
    function __construct(
        public string $foo,
        public string $bar,
        public string $baz,
    ) {}
}

$data = new Data(foo: 'A', bar: 'B', baz: 'C');

$data2 = clone $data with (bar: 'X', baz: 'Y');

Under the hood, this "clone" will copy all values of promoted properties as is but modify some of them to custom values specified by the user. Implementing this functionality in userland destroys the beauty of readonly classes with promoted properties. A manual implementation requires a lot of lines of code that mean nothing to the people reading it. The cloning methods end up bigger than the meaningful part of the class (the constructor with the property declarations), because I have to redeclare all the properties in the method arguments and then initialize each property with the corresponding value.

I love readonly classes with promoted properties for data-holding purposes and the above feature is the only one I'm missing to be completely happy. In my personal experience, I never needed to copy data classes like arrays; the immutability protects against unwanted changes well enough. But copying references helps to save memory, and some datasets I work with can be very big.

--
Best,
Alex
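As a point of reference for the userland cost described above, one compact workaround in current PHP (8.2 or later) is a generic `with()` method built on named arguments. This is only a sketch: `with()` is a hypothetical method name, and it assumes every property is a promoted constructor parameter with the same name, so that the property array can be spread back into the constructor.

```php
<?php
readonly class Data {
    public function __construct(
        public string $foo,
        public string $bar,
        public string $baz,
    ) {}

    // Hypothetical helper: rebuilds the object from its current properties,
    // overriding the ones passed as named arguments.
    public function with(string ...$changes): static {
        return new static(...array_merge(get_object_vars($this), $changes));
    }
}

$data = new Data(foo: 'A', bar: 'B', baz: 'C');
$data2 = $data->with(bar: 'X', baz: 'Y');

var_dump($data2->foo); // string(1) "A"
var_dump($data2->bar); // string(1) "X"
var_dump($data->bar);  // string(1) "B"
```

Passing a name that is not a constructor parameter makes the constructor call throw an `Error`, so misspelled property names are caught at runtime. The approach still creates a full copy for every modification, which is the cost the data class proposal is trying to avoid for large structures.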
Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)
On Mon, Apr 1, 2024 at 9:20 PM Ilija Tovilo wrote: > Hi everyone! > > I'd like to introduce an idea I've played around with for a couple of > weeks: Data classes, sometimes called structs in other languages (e.g. > Swift and C#). > > In a nutshell, data classes are classes with value semantics. > Instances of data classes are implicitly copied when assigned to a > variable, or when passed to a function. When the new instance is > modified, the original instance remains untouched. This might sound > familiar: It's exactly how arrays work in PHP. > > ```php > $a = [1, 2, 3]; > $b = $a; > $b[] = 4; > var_dump($a); // [1, 2, 3] > var_dump($b); // [1, 2, 3, 4] > ``` > > You may think that copying the array on each assignment is expensive, > and you would be right. PHP uses a trick called copy-on-write, or CoW > for short. `$a` and `$b` actually share the same array until `$b[] = > 4;` modifies it. It's only at this point that the array is copied and > replaced in `$b`, so that the modification doesn't affect `$a`. As > long as a variable is the sole owner of a value, or none of the > variables modify the value, no copy is needed. Data classes use the > same mechanism. > > But why value semantics in the first place? There are two major flaws > with by-reference semantics for data structures: > > 1. It's very easy to forget cloning data that is referenced somewhere > else before modifying it. This will lead to "spooky actions at a > distance". Having recently used JavaScript (where all data structures > have by-reference semantics) for an educational IR optimizer, > accidental mutations of shared arrays/maps/sets were my primary source > of bugs. > 2. Defensive cloning (to avoid issue 1) will lead to useless work when > the value is not referenced anywhere else. > > PHP offers readonly properties and classes to address issue 1. 
> However, they further promote issue 2 by making it impossible to > modify values without cloning them first, even if we know they are not > referenced anywhere else. Some APIs further exacerbate the issue by > requiring multiple copies for multiple modifications (e.g. > `$response->withStatus(200)->withHeader('X-foo', 'foo');`). > > As you may have noticed, arrays already solve both of these issues > through CoW. Data classes allow implementing arbitrary data structures > with the same value semantics in core, extensions or userland. For > example, a `Vector` data class may look something like the following: > > ```php > data class Vector { > private $values; > > public function __construct(...$values) { > $this->values = $values; > } > > public mutating function append($value) { > $this->values[] = $value; > } > } > > $a = new Vector(1, 2, 3); > $b = $a; > $b->append!(4); > var_dump($a); // Vector(1, 2, 3) > var_dump($b); // Vector(1, 2, 3, 4) > ``` > > An internal Vector implementation might offer a faster and stricter > alternative to arrays (e.g. Vector from php-ds). > > Exciting times to be a PHP Developer! > Some other things to note about data classes: > > * Data classes are ordinary classes, and as such may implement > interfaces, methods and more. I have not decided whether they should > support inheritance. > I'd argue in favor of not including inheritance in the first version. Taking inheritance out is an impossible BC Break. Not introducing it in the first stable release gives users a chance to evaluate whether it's something we will drastically miss. > * Mutating method calls on data classes use a slightly different > syntax: `$vector->append!(42)`. All methods mutating `$this` must be > marked as `mutating`. The reason for this is twofold: 1. It signals to > the caller that the value is modified. 2. 
It allows `$vector` to be > cloned before knowing whether the method `append` is modifying, which > hugely reduces implementation complexity in the engine. > I'm not sure if I understood this one. Do you mean that the `!` modifier here (at call-site) is helping the engine clone the variable before even diving into whether `append()` has been tagged as mutating? From outside it looks odd that a clone would happen ahead-of-time while talking about copy-on-write. Would this syntax break for non-mutating methods? > * Data classes customize identity (`===`) comparison, in the same way > arrays do. Two data objects are identical if all their properties are > identical (including order for dynamic properties). > * Sharing data classes by-reference is possible using references, as > you would for arrays. > * We may decide to auto-implement `__toString` for data classes, > amongst other things. I am still undecided whether this is useful for > PHP. > * Data classes protect from interior mutability. More concretely, > mutating nested data objects stored in a `readonly` property is not > legal, whereas it would be if they were ordinary objects. > * In the future, it should be possible to allow using data classes in > `SplObjectStorage`.