Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-21 Thread tyson andre
Hi Mike Schinkel,

> >> Hmm. I must have missed that thread as I was definitely following the list 
> >> at that time. 
> >> 
> >> But I found the thread, which only had three (3) comments from others:
> >> 
> >> https://externals.io/message/112639
> >> 
> >> From Levi Morrison it seems his objection was to adding `push()` and 
> >> `pop()` to a class including the name "Fixed."  Levi suggested 
> >> soft-deprecating `SplStack` because it was implemented as a linked-list, 
> >> but he proposed adding `Spl\ArrayStack` or similar, so it seems he was 
> >> open to iterating on the `Spl` classes in general (no pun intended.) 
> >> 
> >> From Nikita it seemed that he did not object so much as comment on Levi's 
> >> suggestion of adding `Spl\ArrayStack` and suggested instead an `SplDeque` 
> >> that would handle queue usage more efficiently than plain PHP arrays.
> >> 
> >> So I think those responses were promising, but that you did not follow 
> >> up on them. I mean no disrespect — we all get busy, our priorities change, 
> >> and things fall off our radar

I said that **in response to you suggesting adding functionality to 
`SplFixedArray`** as the reason why I am not adding functionality to 
`SplFixedArray`.
Those responses were promising for functionality that is not about 
`SplFixedArray`.

`Vector` is a naming choice. `SplArrayStack` and a `Vector` would have very 
similar performance characteristics and probably identical internal 
representations.
However, a more expansive set of methods such as `contains`, `map`, `filter`, 
`reduce`, etc. makes more sense on a List/Vector
than on a `Stack` if you're going solely by the name: when you hear 
`Stack`, you mostly think of pushing to or popping from it.

As you also said below in your follow-up response, I am following up on the 
suggestion for a `Deque`.

>  — but it feels to me like you might have more success pursuing your use-cases 
> related to the `Spl` classes than via a pure `Vector` class.

It's hard to know which approach (namespaces such as Collection\, SplXyz, or 
short names) will succeed without actually creating an RFC.
I'd mentioned my personal reasons for expecting Spl not to be the best choice.
Any email discussion only has comments from a handful of people with different 
arguments and preferences,
and many times more people might vote on the final RFC.

> > Experience in past RFCs gave me the impression that if:
> > 
> > 1. All of the responses are suggesting using a different approach(php-ds, 
> > arrays),
> > 2. Other comments are negative or uninterested.
> > 3. None of the feedback on the original idea is positive or interested in 
> > it.
> > 
> > When feedback was like that, voting would typically have mostly "no" 
> > results.
> 
> Understood, but for clarity I was implying that wanting to change 
> `SplFixedArray` was an "XY problem" and that maybe the way to address your 
> actual use-cases was to pursue other approaches that people were 
> suggesting, which _is_ what you did yesterday.  :-)
>
> >> Maybe propose an `SplVector` class that extends `SplFixedArray`, or 
> >> something similar that addresses the use-case and with a name that people 
> >> can agree on?
> > 
> > I'd be stuck with all of the features in `SplFixedArray` that get 
> > introduced later and its design decisions.
> 
> You wouldn't be stuck with all the features of `SplFixedArray` if you did 
> "something similar." 

> (I make this point only as it seems you have dismissed one aspect of my 
> suggestion while not acknowledging the alternatives I present. Twice now, at 
> least.)

I'm not sure which of the multiple suggestions you brought up you're 
referring to as "something similar".
Your original suggestion I responded to was to modify `SplFixedArray`.
I assumed you were suggesting that I change my RFC to focus on SplFixedArray.
I had the impression after feedback that those changes to SplFixedArray would 
overwhelmingly fail, especially due to it being named "Fixed".

I don't consider making it a subclass of SplFixedArray a good design decision.
It's possible to invoke methods on base classes with `ReflectionMethod` so I 
can't make `Vector` a subclass of `SplFixedArray` with an entirely different 
implementation.
So any memory SplFixedArray wastes (e.g. to mitigate bugs already found or 
found in the future) is also wasted in any subclass of SplFixedArray.
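
A minimal sketch of the `ReflectionMethod` point, assuming a hypothetical 
subclass named `CountingArray` (it is not part of any RFC). It only shows that 
an override in a subclass does not hide the base class implementation from 
callers that use reflection:

```php
<?php
// Hypothetical subclass, used only for illustration.
class CountingArray extends SplFixedArray
{
    // Override that reports a size unrelated to the inherited storage.
    public function getSize(): int
    {
        return 42;
    }
}

$a = new CountingArray(3);

// Normal dispatch goes through the override.
$viaOverride = $a->getSize();

// Reflection can target SplFixedArray's own implementation directly,
// so the inherited storage stays reachable on every subclass instance.
$baseMethod = new ReflectionMethod(SplFixedArray::class, 'getSize');
$viaBase = $baseMethod->invoke($a);

var_dump($viaOverride, $viaBase);
```

Because the base implementation stays callable, a subclass has to keep the 
inherited storage valid and cannot swap in an unrelated representation.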


- Additionally, this has the same problem as `SplDoublyLinkedList` and its 
subclasses.
  It prevents changing the internal representation or adding certain types of 
functionality if that wouldn't work with the base class.
- Additionally, this would make deprecating and removing `SplFixedArray` 
significantly harder or impractical,
  if it ever seemed appropriate in the future due to lack of use.

Additionally, I'm proposing a final class: SplFixedArray already exists and 
can't be converted to a final class because code already extends it.
See https://wiki.php.net/rfc/deque#final_class for the 

Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-21 Thread Mike Schinkel
> On Sep 19, 2021, at 8:55 AM, tyson andre  wrote:
> 
> Hi Mike Schinkel,
>> 
>> Hmm. I must have missed that thread as I was definitely following the list 
>> at that time. 
>> 
>> But I found the thread, which only had three (3) comments from others:
>> 
>> https://externals.io/message/112639
>> 
>> From Levi Morrison it seems his objection was to adding `push()` and `pop()` 
>> to a class including the name "Fixed."  Levi suggested soft-deprecating 
>> `SplStack` because it was implemented as a linked-list, but he proposed 
>> adding `Spl\ArrayStack` or similar, so it seems he was open to iterating on 
>> the `Spl` classes in general (no pun intended.) 
>> 
>> From Nikita it seemed that he did not object so much as comment on Levi's 
>> suggestion of adding `Spl\ArrayStack` and suggested instead an `SplDeque` 
>> that would handle queue usage more efficiently than plain PHP arrays.
>> 
>> So I think those responses were promising, but that you did not follow up 
>> on them. I mean no disrespect — we all get busy, our priorities change, and 
>> things fall off our radar — but it feels to me like you might have more 
>> success pursuing your use-cases related to the `Spl` classes than via a pure 
>> `Vector` class.
> 
> Experience in past RFCs gave me the impression that if:
> 
> 1. All of the responses are suggesting using a different approach(php-ds, 
> arrays),
> 2. Other comments are negative or uninterested.
> 3. None of the feedback on the original idea is positive or interested in it.
> 
> When feedback was like that, voting would typically have mostly "no" results.

Understood, but for clarity I was implying that wanting to change 
`SplFixedArray` was an "XY problem" and that maybe the way to address your 
actual use-cases was to pursue other approaches that people were suggesting, 
which _is_ what you did yesterday.  :-)

>> Maybe propose an `SplVector` class that extends `SplFixedArray`, or 
>> something similar that addresses the use-case and with a name that people 
>> can agree on?
> 
> I'd be stuck with all of the features in `SplFixedArray` that get introduced 
> later and its design decisions.

You wouldn't be stuck with all the features of `SplFixedArray` if you did 
"something similar." 

(I make this point only as it seems you have dismissed one aspect of my 
suggestion while not acknowledging the alternatives I present. Twice now, at 
least.)

>> I wavered on whether or not to propose a configurable growth factor, but 
>> ironically I did so to head off the potential complaint from anyone who 
>> cares deeply about memory usage (isn't that you?) that not allowing the 
>> growth factor to be configurable would mean that either the class would use 
>> too much memory for some use-cases, or would need to reallocate more memory 
>> too frequently for other use-cases, depending on what the default growth 
>> factor would be.
>> 
>> That said, I don't see how a configurable growth factor should be 
>> problematic for PHP? For those who don't need/care to optimize memory usage 
>> or reallocation frequency they can simply ignore it; no harm done. But for 
>> those who do care, it would give them the ability to fine tune their memory 
>> usage, which for selected use-cases could mean the difference between being 
>> able to implement something in PHP, or not.
>> 
>> Note that someone could easily argue that adding a memory-optimized data 
>> structure when we already have a perfectly flexible data structure with PHP 
>> arrays that can be used for the same algorithms is "excessive for a 
>> high-level language."  But then I don't think you would make that argument, 
>> so why make it for a configurable growth factor? #honestquestion
> 
> The growth factor is even lower level than shrinkToFit/reserve, and requires 
> extra memory to store the float,
> extra cpu time to do floating point multiplication rather than doubling,
> and additional API methods for something that 99% of applications wouldn't 
> use.
> I consider it more suitable for a low level language.

I respect your points here, but disagree.

> And if we discover a different resizing strategy is better, it prevents us 
> from changing it.

This is not true. 

We could easily no-op the GrowthFactor method and it would not break anything 
in 99.9...% of use-cases.

The relevant question here should be, what is the likelihood of us discovering 
a better resizing strategy that would not benefit at all from a growth factor?  
Is there evidence of one anywhere else?  I know that Go — designed to be 
performant to the extent it does not add complexity — uses a growth factor.
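
To make the trade-off under discussion concrete, here is a rough simulation 
that counts how many reallocations a run of appends would trigger for a given 
growth factor. The function name `simulateGrowth` and the rounding rule are 
assumptions made for this sketch; they are not taken from the RFC or from any 
existing implementation:

```php
<?php
// Simulate appending $pushes elements to a buffer that grows by $factor
// whenever it is full. Returns the reallocation count and final capacity.
function simulateGrowth(int $pushes, float $factor, int $initialCapacity = 1): array
{
    $capacity = $initialCapacity;
    $reallocations = 0;
    for ($size = 0; $size < $pushes; $size++) {
        if ($size === $capacity) {
            // Always grow by at least one slot so small factors still progress.
            $capacity = max($capacity + 1, (int) ceil($capacity * $factor));
            $reallocations++;
        }
    }
    return ['reallocations' => $reallocations, 'capacity' => $capacity];
}

$doubling = simulateGrowth(1000, 2.0);
$slower   = simulateGrowth(1000, 1.5);

var_dump($doubling, $slower);
```

A smaller factor reallocates more often, while a larger one can leave more 
unused capacity at the end; which matters more depends on the workload.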

>> And finally I think when you conveyed the intent of the author of `ext-ds` 
>> you omitted part of his full statement. When seen in full I believe his 
>> statement conveys a different interest than the partial one implies:
>> 
>> https://github.com/php-ds/ext-ds/issues/156
>> 
>> While he did say "My long-term intention has been to not merge this 
>> 

Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-20 Thread tyson andre
Hi Peter Bowyer,

> That is a fair point. Vector is an overloaded and common word. For me a
> vector will always default to an entity characterized by a magnitude and a
> direction, because that's what I learned and used for years. The next
> definition I learned was the Numpy one.
> 
> That for me is the sticking point if this Vector allows mixed types which
> include arrays or vectors. Store them inside a Vector and then you end up
> with a matrix, a tensor and so-on in something identified as a Vector,
> which is nonsense. Yes C++ does that [1]. Yes with generics it sort-of
> makes sense. Numpy gets round it by calling the type `ndarray` and a vector
> is a specialised one-dimensional array.
> 
> If it's a high-performance array and that's the goal, call it hparray. Call
> it a tuple. Call it a dictionary.

- `hparray`: I think putting high performance in any class name in core is a 
mistake,
  and a generally poor naming choice, and will mislead users now or in the future.
  (unless it is literally an API client for a database or server that includes 
high performance in the server software's name)

  Benchmarks currently show it using less memory but some more time than 
`array`,
  and those benchmarks will change as opcache's internals or PHP's 
representation 
  of `object`s or `array`s change.

  Which choice of data structure is highest performance would depend on the 
benchmark or needs of the application/library.
- `tuple`: In mathematics, most references to tuples that I've seen are to 
  fixed-size tuples (n-tuples). In programming, Python, C++, and various other 
  languages
  use tuple to refer to a fixed-size (and often immutable) data structure,
  making this naming choice extremely confusing.
  https://docs.python.org/3/tutorial/datastructures.html#tuples-and-sequences
  https://en.cppreference.com/w/cpp/utility/tuple

  > (In C++)Class template std::tuple is a fixed-size collection of 
heterogeneous values.
- `dictionary` - Wikipedia refers to this as an associative array 
https://en.wikipedia.org/wiki/Associative_array
  which is the exact opposite of what my Vector RFC is proposing.
 
So I don't consider any of those proposed names appropriate alternatives, 
and expect much, much stronger opposition to an RFC using that naming choice 
for this functionality.

I expect opposition to any naming choice I propose; `Vector` is what I expect 
to have the least opposition.

Thanks,
Tyson

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-20 Thread Peter Bowyer
Hi Tyson,

On Sat, 18 Sept 2021 at 16:46, tyson andre 
wrote:

> Many of php's names are based on the naming choices in libraries made in
> C/C++.
> So using https://cplusplus.com/reference/vector/vector/ for my RFC
> https://wiki.php.net/rfc/vector
> seems like the most natural naming choice,
> and would make it easier for people with backgrounds in that family of
> languages to find the functionality they're looking for.
> PHP already has a SplStack, SplQueue, etc, like C++'s `stack`, `queue`,
> etc.
>

That is a fair point. Vector is an overloaded and common word. For me a
vector will always default to an entity characterized by a magnitude and a
direction, because that's what I learned and used for years. The next
definition I learned was the Numpy one.

That for me is the sticking point if this Vector allows mixed types which
include arrays or vectors. Store them inside a Vector and then you end up
with a matrix, a tensor and so-on in something identified as a Vector,
which is nonsense. Yes C++ does that [1]. Yes with generics it sort-of
makes sense. Numpy gets round it by calling the type `ndarray` and a vector
is a specialised one-dimensional array.

If it's a high-performance array and that's the goal, call it hparray. Call
it a tuple. Call it a dictionary.


> Also, your comment is ambiguous. Are you saying that you personally object
> to the name,
> or that you're fine with the name but think that the comments by
> Larry/Chris/Pierre in this email thread are representative of voters.
>

Both.

I object to the name for what's being proposed, but am not necessarily
against what's being proposed if it looks more useful than the Spl* stuff.

I'm fine with the name but for something other than what's being proposed.

HTH
Peter

1. https://www.geeksforgeeks.org/vector-of-vectors-in-c-stl-with-examples/


Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-19 Thread tyson andre
Hi Mike Schinkel,
 
> >> Given there seems to be a lot of concern about the approach the RFC 
> >> proposes would it not address the concerns about memory usage and 
> >> performance if several methods were added to SplFixedArray instead (as 
> >> well as functions like indexOf(), contains(), map(), filter(), 
> >> JSONSerialize(), etc., or similar):
> >> 
> >> ===
> >> 
> >> setCapacity(int) — Sets the Capacity, i.e. the maximum Size before resize
> >> getCapacity():int — Gets the current Capacity.
> >> 
> >> setGrowthFactor(float) — Sets the Growth Factor for push(). Defaults to 2
> >> getGrowthFactor():float — Gets the current Growth Factor
> >> 
> >> pop([shrink]):mixed — Returns [Size] then subtracts 1 from Size. If 
> >> (bool)shrink passed then call shrink().
> >> push(mixed) — Sets [Size]=mixed, then Size++, unless Size=Capacity then 
> >> setSize(n) where n=round(Size*GrowthFactor,0) before Size++.
> >> 
> >> grow([new_capacity]) — Increases memory allocated. Sets Capacity to 
> >> Size*GrowthFactor or new_capacity.
> >> shrink([new_capacity]) — Reduces memory allocated. Sets Capacity to 
> >> current Size or new_capacity.
> >> 
> >> ===
> >> 
> >> If you had these methods then I think you would get the memory and 
> >> performance improvements you want, and if you really want a final Vector 
> >> class for your own uses you could roll your own using inheritance or 
> >> containment.
> > 
> > I asked 8 months ago about `push`/`pop` in SplFixedArray. The few responses 
> > were unanimously opposed to SplFixedArray being repurposed like a vector, 
> > the setSize functionality was treated more like an escape hatch and it was 
> > conceptually for fixed-size data.
> 
> Hmm. I must have missed that thread as I was definitely following the list at 
> that time. 
> 
> But I found the thread, which only had three (3) comments from others:
> 
> https://externals.io/message/112639
> 
> From Levi Morrison it seems his objection was to adding `push()` and `pop()` 
> to a class including the name "Fixed."  Levi suggested soft-deprecating 
> `SplStack` because it was implemented as a linked-list, but he proposed 
> adding `Spl\ArrayStack` or similar, so it seems he was open to iterating on 
> the `Spl` classes in general (no pun intended.) 
> 
> From Nikita it seemed that he did not object so much as comment on Levi's 
> suggestion of adding `Spl\ArrayStack` and suggested instead an `SplDeque` 
> that would handle queue usage more efficiently than plain PHP arrays.
> 
> So I think those responses were promising, but that you did not follow up 
> on them. I mean no disrespect — we all get busy, our priorities change, and 
> things fall off our radar — but it feels to me like you might have more 
> success pursuing your use-cases related to the `Spl` classes than via a pure 
> `Vector` class.

Experience in past RFCs gave me the impression that if:

1. All of the responses are suggesting using a different approach(php-ds, 
arrays),
2. Other comments are negative or uninterested.
3. None of the feedback on the original idea is positive or interested in it.

When feedback was like that, voting would typically have mostly "no" results.

Some of the feedback such as `*Deque` was interesting, but not related to 
extending SplFixedArray.

> Maybe propose an `SplVector` class that extends `SplFixedArray`, or something 
> similar that addresses the use-case and with a name that people can agree on?

I'd be stuck with all of the features in `SplFixedArray` that get introduced 
later and its design decisions.

> BTW, here are two other somewhat-related threads:
> 
> - https://externals.io/message/110731
> - https://externals.io/message/113141
> 
> > I also believe adding a configurable growth factor would be excessive for a 
> > high level language.
> 
> I wavered on whether or not to propose a configurable growth factor, but 
> ironically I did so to head off the potential complaint from anyone who cares 
> deeply about memory usage (isn't that you?) that not allowing the growth 
> factor to be configurable would mean that either the class would use too much 
> memory for some use-cases, or would need to reallocate more memory too 
> frequently for other use-cases, depending on what the default growth factor 
> would be.
> 
> That said, I don't see how a configurable growth factor should be problematic 
> for PHP? For those who don't need/care to optimize memory usage or 
> reallocation frequency they can simply ignore it; no harm done. But for those 
> who do care, it would give them the ability to fine tune their memory usage, 
> which for selected use-cases could mean the difference between being able to 
> implement something in PHP, or not.
> 
> Note that someone could easily argue that adding a memory-optimized data 
> structure when we already have a perfectly flexible data structure with PHP 
> arrays that can be used for the same algorithms is "excessive for a 
> high-level language."  

Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-18 Thread Mike Schinkel
Hi Tyson,

Thanks for the reply.

> On Sep 18, 2021, at 7:26 PM, tyson andre  wrote:
> 
> Hi Mike Schinkel,
> 
>> Given there seems to be a lot of concern about the approach the RFC proposes 
>> would it not address the concerns about memory usage and performance if 
>> several methods were added to SplFixedArray instead (as well as functions 
>> like indexOf(), contains(), map(), filter(), JSONSerialize(), etc., or 
>> similar):
>> 
>> ===
>> 
>> setCapacity(int) — Sets the Capacity, i.e. the maximum Size before resize
>> getCapacity():int — Gets the current Capacity.
>> 
>> setGrowthFactor(float) — Sets the Growth Factor for push(). Defaults to 2
>> getGrowthFactor():float — Gets the current Growth Factor
>> 
>> pop([shrink]):mixed — Returns [Size] then subtracts 1 from Size. If 
>> (bool)shrink passed then call shrink().
>> push(mixed) — Sets [Size]=mixed, then Size++, unless Size=Capacity then 
>> setSize(n) where n=round(Size*GrowthFactor,0) before Size++.
>> 
>> grow([new_capacity]) — Increases memory allocated. Sets Capacity to 
>> Size*GrowthFactor or new_capacity.
>> shrink([new_capacity]) — Reduces memory allocated. Sets Capacity to current 
>> Size or new_capacity.
>> 
>> ===
>> 
>> If you had these methods then I think you would get the memory and 
>> performance improvements you want, and if you really want a final Vector 
>> class for your own uses you could roll your own using inheritance or 
>> containment.
> 
> I asked 8 months ago about `push`/`pop` in SplFixedArray. The few responses 
> were unanimously opposed to SplFixedArray being repurposed like a vector, the 
> setSize functionality was treated more like an escape hatch and it was 
> conceptually for fixed-size data.

Hmm. I must have missed that thread as I was definitely following the list at 
that time. 

But I found the thread, which only had three (3) comments from others:

https://externals.io/message/112639

From Levi Morrison it seems his objection was to adding `push()` and `pop()` to 
a class including the name "Fixed."  Levi suggested soft-deprecating `SplStack` 
because it was implemented as a linked-list, but he proposed adding 
`Spl\ArrayStack` or similar, so it seems he was open to iterating on the `Spl` 
classes in general (no pun intended.) 

From Nikita it seemed that he did not object so much as comment on Levi's 
suggestion of adding `Spl\ArrayStack` and suggested instead an `SplDeque` that 
would handle queue usage more efficiently than plain PHP arrays.

So I think those responses were promising, but that you did not follow up on 
them. I mean no disrespect — we all get busy, our priorities change, and things 
fall off our radar — but it feels to me like you might have more success 
pursuing your use-cases related to the `Spl` classes than via a pure `Vector` 
class. Maybe propose an `SplVector` class that extends `SplFixedArray`, or 
something similar that addresses the use-case and with a name that people can 
agree on?

BTW, here are two other somewhat-related threads:

- https://externals.io/message/110731
- https://externals.io/message/113141

> I also believe adding a configurable growth factor would be excessive for a 
> high level language.

I wavered on whether or not to propose a configurable growth factor, but 
ironically I did so to head off the potential complaint from anyone who cares 
deeply about memory usage (isn't that you?) that not allowing the growth factor 
to be configurable would mean that either the class would use too much memory 
for some use-cases, or would need to reallocate more memory too frequently for 
other use-cases, depending on what the default growth factor would be.

That said, I don't see how a configurable growth factor should be problematic 
for PHP? For those who don't need/care to optimize memory usage or reallocation 
frequency they can simply ignore it; no harm done. But for those who do care, 
it would give them the ability to fine tune their memory usage, which for 
selected use-cases could mean the difference between being able to implement 
something in PHP, or not.

Note that someone could easily argue that adding a memory-optimized data 
structure when we already have a perfectly flexible data structure with PHP 
arrays that can be used for the same algorithms is "excessive for a high-level 
language."  But then I don't think you would make that argument, so why make it 
for a configurable growth factor? #honestquestion

> This has been asked about multiple times in threads on unrelated proposals 
> (https://externals.io/message/112639#112641 and 
> https://externals.io/message/93301#93301 years ago) throughout the years, but 
> the maintainer of php-ds had a long term goal of developing it separately 
> from php's release cycle (and was still focusing on the PECL when I'd asked 
> on the GitHub issue in the link almost a year ago).

And finally I think when you conveyed the intent of the author of `ext-ds` you 
omitted part of his 

Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-18 Thread Mike Schinkel
Hi Larry,

> On Sep 18, 2021, at 12:03 PM, Larry Garfield  wrote:
> 
> Rather than go point by point, I'm going to respond globally here.
> 
> I am frequently on-record hating on PHP arrays, and stating that I want 
> something better.  The problems with PHP arrays include:
> 
> 1. They're badly performing (because they cannot be optimized)
> 2. They're not type safe
> 3. They're mutable
> 4. They mix sequences (true arrays) with dictionaries/hashmaps, making 
> everything uglier
> 5. People keep using them as structs, when they're not
> 6. The API around them is procedural, inconsistent, and overall gross
> 7. They lack a lot of native shorthand operations found in other languages 
> (eg, slicing)
> 8. Their error handling is crap

Would you mind elaborating on points #3 and #8?  

It is not clear to me what you are getting at with those points.

-Mike



Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-18 Thread tyson andre
Hi Mike Schinkel,

> Given there seems to be a lot of concern about the approach the RFC proposes 
> would it not address the concerns about memory usage and performance if 
> several methods were added to SplFixedArray instead (as well as functions 
> like indexOf(), contains(), map(), filter(), JSONSerialize(), etc., or 
> similar):
> 
> ===
> 
> setCapacity(int) — Sets the Capacity, i.e. the maximum Size before resize
> getCapacity():int — Gets the current Capacity.
> 
> setGrowthFactor(float) — Sets the Growth Factor for push(). Defaults to 2
> getGrowthFactor():float — Gets the current Growth Factor
> 
> pop([shrink]):mixed — Returns [Size] then subtracts 1 from Size. If 
> (bool)shrink passed then call shrink().
> push(mixed) — Sets [Size]=mixed, then Size++, unless Size=Capacity then 
> setSize(n) where n=round(Size*GrowthFactor,0) before Size++.
> 
> grow([new_capacity]) — Increases memory allocated. Sets Capacity to 
> Size*GrowthFactor or new_capacity.
> shrink([new_capacity]) — Reduces memory allocated. Sets Capacity to current 
> Size or new_capacity.
> 
> ===
> 
> If you had these methods then I think you would get the memory and 
> performance improvements you want, and if you really want a final Vector 
> class for your own uses you could roll your own using inheritance or 
> containment.

I asked 8 months ago about `push`/`pop` in SplFixedArray. The few responses 
were unanimously opposed to SplFixedArray being repurposed like a vector;
the setSize functionality was treated more like an escape hatch, and the class 
was conceptually for fixed-size data.

I also believe adding a configurable growth factor would be excessive for a 
high level language.
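
For readers following the quoted method list, this is a rough userland sketch 
of the "containment" variant, built on the existing `SplFixedArray::setSize()`. 
The class name `GrowableArray` and its exact behaviour are assumptions for 
illustration only; it is neither the API proposed in the RFC nor a change to 
`SplFixedArray` itself:

```php
<?php
// Hypothetical wrapper; names and rounding rules are illustrative only.
final class GrowableArray
{
    private SplFixedArray $storage;
    private int $size = 0;

    public function __construct(int $capacity = 4, private float $growthFactor = 2.0)
    {
        $this->storage = new SplFixedArray(max(1, $capacity));
    }

    // Capacity is the number of slots currently allocated.
    public function getCapacity(): int
    {
        return $this->storage->getSize();
    }

    // Number of elements actually in use.
    public function count(): int
    {
        return $this->size;
    }

    public function push(mixed $value): void
    {
        if ($this->size === $this->getCapacity()) {
            // Grow by the factor, but always by at least one slot.
            $grown = (int) round($this->size * $this->growthFactor);
            $this->storage->setSize(max($this->size + 1, $grown));
        }
        $this->storage[$this->size++] = $value;
    }

    public function pop(bool $shrink = false): mixed
    {
        if ($this->size === 0) {
            throw new UnderflowException('Cannot pop from an empty GrowableArray');
        }
        $value = $this->storage[--$this->size];
        $this->storage[$this->size] = null; // drop the stored reference
        if ($shrink) {
            $this->storage->setSize(max(1, $this->size));
        }
        return $value;
    }
}

$list = new GrowableArray(2);
foreach ([10, 20, 30] as $n) {
    $list->push($n);
}
$capacityAfterPushes = $list->getCapacity();
$last = $list->pop();
```

A wrapper like this adds a second object and a userland method call per 
operation on top of `SplFixedArray`.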

Thanks,
Tyson




Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-18 Thread Matthew Brown
On Sat, 18 Sept 2021 at 12:04, Larry Garfield 
wrote:

>
> I am frequently on-record hating on PHP arrays, and stating that I want
> something better.  The problems with PHP arrays include:
>
> 1. They're badly performing (because they cannot be optimized)
> 2. They're not type safe
> 3. They're mutable
> 4. They mix sequences (true arrays) with dictionaries/hashmaps, making
> everything uglier
> 5. People keep using them as structs, when they're not
> 6. The API around them is procedural, inconsistent, and overall gross
> 7. They lack a lot of native shorthand operations found in other languages
> (eg, slicing)
> 8. Their error handling is crap
>
> Any new native/stdlib alternative to arrays needs to address at least half
> of those issues, preferably most/all.
>

Hey Larry,

I believe 1. and 2. are an impossible standard for any PHP-based proposal
to meet. If you want it to be (runtime) type-safe, that assumes the
existence of runtime type checks which can quickly become a performance
bottleneck.

For 3, having explored immutability in depth with Psalm, arrays don't
present any sort of challenge due to their copy-on-write behavior. There's
a chunk of Psalm's codebase that makes heavy use of arrays, and it's
still provably pure.
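
A small illustration of the copy-on-write behaviour referred to here. The 
function name `withExtra` is made up for the example:

```php
<?php
// Arrays are passed by value; the engine defers the copy until a write happens.
function withExtra(array $items): array
{
    $items[] = 'extra'; // this write separates the callee's array from the caller's
    return $items;
}

$original = ['a', 'b'];
$extended = withExtra($original);

var_dump($original, $extended);
```

The caller's array is unchanged after the call, which is why a function that 
only takes and returns arrays can still be pure.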

5: they're used as makeshift structs, but there's nothing preventing people
using constructor property promotion and named parameters to model the same
data. I think this is effectively a solved problem.
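
As a sketch of that alternative, assuming PHP 8.0 or later and a hypothetical 
`Point` class standing in for an ad-hoc `['x' => ..., 'y' => ...]` array:

```php
<?php
// Hypothetical value object; constructor property promotion declares
// and assigns the properties in one place.
final class Point
{
    public function __construct(
        public int $x = 0,
        public int $y = 0,
    ) {}
}

// Named arguments make call sites read like array literals with keys.
$p = new Point(y: 5, x: 2);

var_dump($p->x, $p->y);
```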

No real debates about 4, 6, 7 and 8, but I'm radically opposed to
throwing out the baby with the bathwater here — languages that have solved
these problems exist, but demanding a proposal pass a purity test means the
PHP project is more likely to stay the way it is.

Best wishes,

Matt


Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-18 Thread tyson andre
Hi Larry Garfield,

> Rather than go point by point, I'm going to respond globally here.
> 
> I am frequently on-record hating on PHP arrays, and stating that I want 
> something better.  The problems with PHP arrays include:
> 
> 1. They're badly performing (because they cannot be optimized)
> 2. They're not type safe
> 3. They're mutable
> 4. They mix sequences (true arrays) with dictionaries/hashmaps, making 
> everything uglier
> 5. People keep using them as structs, when they're not
> 6. The API around them is procedural, inconsistent, and overall gross
> 7. They lack a lot of native shorthand operations found in other languages 
> (eg, slicing)
> 8. Their error handling is crap
> 
> Any new native/stdlib alternative to arrays needs to address at least half of 
> those issues, preferably most/all.
> 
> This proposal addresses the first point and... that's it.  Point 5 is sort of 
> covered by virtue of being out of scope, so maybe this covers 1.5 out of 8.  
> That's insufficient to be worth the effort to support and deal with in code.  
> That makes this approach a strong -1 for me.
> 
> "Fancy algorithms are slow when n is small, and n is usually small." -- Rob 
> Pike
> 
> That some of the design choices here mirror existing poor implementations is 
> not an endorsement of them.  I don't think I've seen anyone on this list say 
> anything good about SPL beyond iterators and autoloading, so it's not really 
> a good model to emulate.
> 
> Additionally, please don't play into the trope about procedural/mutable code 
> being more beginner friendly.  That's not the case, beyond being a 
> self-fulfilling prophecy.  (If we teach procedural/mutable code first, then 
> most beginners will be most proficient in procedural/mutable code.)  I would 
> argue that, on the whole, immutable values make code easier to reason about 
> and write once you get past trivially small sizes.  We do new developers a 
> gross disservice by treating immutability as an "advanced" technique, when it 
> should really be the default, beginner technique taught from day one.
> 
> I am not aware of any PECL implementations of lists that have type safety, 
> because I don't use many PECL packages.  However, in user space it's quite 
> simple to do:
> 
> https://presentations.garfieldtech.com/slides-never-use-arrays/phpkonf2021/#/5/2
> 
> See that slide and scroll down for additional examples.  Every one of those 
> examples took me less than 5 minutes to write.  If we want to have a better 
> alternative in core, it needs to be *at least* as capable as what I can throw 
> together in 5 minutes.  The proposal as-is is not even as capable as those 
> examples.

Yes, you can implement those immutable and typed data structures in userland.
You are doing that by adding userland code hiding the internal implementations 
of the mutable `array` to solve the needs of your library/application (e.g. 
those 8).
Adding a mutable `Vector` gives another way to internally represent those 
userland data structures when you need those userland data structures to share 
data internally without using PHP references (not as part of the public api), 
e.g. appending to a list of error objects, performing a depth-first search or 
breadth-first search, etc.
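
To illustrate the sharing point, here is a minimal sketch. The proposed `Vector` is not available, so `ArrayObject` stands in for any mutable object passed by handle, and the two `collectErrors*` functions are hypothetical names made up for this example.

```php
<?php
// Arrays are values: without a reference, the callee appends to its own copy.
function collectErrorsIntoArray(array $errors): void
{
    $errors[] = 'lost: appended to a copy';
}

// Objects are passed by handle: the callee appends to the caller's instance.
// ArrayObject is only a stand-in for a mutable list object such as the proposed Vector.
function collectErrorsIntoObject(ArrayObject $errors): void
{
    $errors->append('kept: appended to the shared object');
}

$arrayErrors = [];
collectErrorsIntoArray($arrayErrors);

$objectErrors = new ArrayObject();
collectErrorsIntoObject($objectErrors);

var_dump(count($arrayErrors));  // int(0)
var_dump(count($objectErrors)); // int(1)
```

The array version would need `array &$errors` to behave like the object version, which is the use of PHP references that a mutable container avoids.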

As for your example, it's impossible to type hint without generics, and 
nobody's working on generics.
If I have your userland `TypedArray::forType(MyClass::class);`,
I can pass it to any parameter/return value/property expecting a `TypedArray`,
but that will then throw an Error at runtime with no warning ahead of time if I 
pass it to a function expecting a `TypedArray` of `OtherClass`.
Static analyzers exist separately from php that could analyze that, but 

1. Many developers wouldn't have static analyzers set up.
2. The TypedArrays may be created from unserialization from apcu/memcache/redis 
and be impractical to analyze (e.g. from an older release of a library or 
application)
3. Voters may object to this additional way to write PHP code that could error 
at runtime.
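
A rough sketch of that failure mode follows. The `TypedArray` class here is a hypothetical reimplementation written for this example, not the code from the linked slides, and it assumes PHP 8.0 or later.

```php
<?php
// Hypothetical userland typed list; names and behavior are assumptions.
final class TypedArray
{
    private array $values = [];

    private function __construct(private string $type) {}

    public static function forType(string $type): self
    {
        return new self($type);
    }

    public function add(mixed $value): void
    {
        // The element type is only known, and only checked, at runtime.
        if (!$value instanceof $this->type) {
            throw new TypeError("Expected {$this->type}, got " . get_debug_type($value));
        }
        $this->values[] = $value;
    }
}

final class MyClass {}
final class OtherClass {}

// The signature can only say "TypedArray", not "TypedArray of OtherClass".
function addOther(TypedArray $list): void
{
    $list->add(new OtherClass());
}

$list = TypedArray::forType(MyClass::class);

try {
    addOther($list); // passes the parameter type check, fails inside add()
    $failedAtRuntime = false;
} catch (TypeError $e) {
    $failedAtRuntime = true;
}

var_dump($failedAtRuntime); // bool(true)
```

Nothing in the declaration of `addOther()` tells the engine which element type it expects, so the mismatch only surfaces when `add()` runs.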

**What data structures do you want in core? Do you want them to eagerly 
evaluate generators or lazily evaluate them? Is `TypedArray` or `TypedSequence` 
something you think should have an RFC or plan to create an RFC for?**

Even if immutable data structures are proposed, there's a further division 
between programmers who want lazy or eager immutables (e.g. their constructors 
or factory methods to eagerly evaluate iterable values or lazily evaluate 
values),
and there may be enough objections to either choice (for the specific data 
structure proposed) when it was time to actually vote to result in the vote 
failing.
(in addition to other objections that come up in any new proposal for core 
datastructures)
This discourages me from proposing immutable data structures.

I'd agree on the utility of Set/Map/sorted data structures (though the hashable 
vs not hashable, comparator vs no comparator, how to hash, etc. is a discussion 

Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-18 Thread Pierre Joye
Hi Tyson,

On Sat, Sep 18, 2021, 10:46 PM tyson andre 
wrote:

> Hi Peter Bowyer,
>
>
> Many of php's names are based on the naming choices in libraries made in
> C/C++.
> So using https://cplusplus.com/reference/vector/vector/ for my RFC
> https://wiki.php.net/rfc/vector
> seems like the most natural naming choice,
> and would make it easier for people with backgrounds in that family of
> languages to find the functionality they're looking for.
>

I do, and as mentioned before, it makes things confusing and harder because of
the use of zval rather than a specific type as in C++. A zval is not a PHP
userland type.

C++ initializes a vector using the type:

vector<int> myvalues;


then all elements must be of type int.

This proposal breaks this already.

> I expect having a second `Stack` would be confusing and make it hard to
> remember which is the efficient one.
> (Especially since stacks typically don't include specialized resizing
> methods)
>

Yes, as it isn't a stack either ;)

I have no idea what you could name it, sorry, but definitely not a vector.



> Also, your comment is ambiguous. Are you saying that you personally object
> to the name,
> or that you're fine with the name but think that the comments by
> Larry/Chris/Pierre in this email thread are representative of voters.
>


Maybe representative of what vectors are and how they are used.

Vectorization is key to programming now, like multithreading or
parallelism a few years back. Vectorization is an old principle, and as
processors have gained power in recent years, it is the way raw performance
scales, even more so when adding more cores.

Now, having a fixed array named Vector in php would be a big mistake and
actually very confusing.

best,
Pierre



Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-18 Thread Larry Garfield
On Fri, Sep 17, 2021, at 8:49 PM, tyson andre wrote:
> 
> > Improving collection/set operations in PHP is something near and dear to my 
> > heart,
> > so I'm in favor of adding a Vector class or similar to the stdlib.
> > 
> > However, I am not a fan of this particular design.
> > 
> > As Levi noted, this being a mutable object that passes by handle is asking 
> > for trouble.
> > It should either be some by-value internal type, or an immutable object 
> > with evolver methods on it.
> > (E.g., add($val): Vector). Making it a mutable object is creating spooky 
> > action at a distance problems.
> > An immutable object seems likely easier to implement than a new type,
> > but both are beyond my capabilities so I defer to those who could do so.
> 
> https://wiki.php.net/rfc/vector#adding_a_native_type_instead_is_vec 
> discusses why I'm doubtful of `is_vec` getting implemented or passing.
> Especially with `add()` taking linear time to copy all elements of the 
> existing value if you mean an array rather than a linked list-like 
> structure, and any referenced copies taking a lot more memory than an 
> imperative version would.
> 
> 
> PHP's end users and internals members come from a wide variety of 
> backgrounds,
> and I assume most beginning or experienced PHP programmers would tend 
> towards imperative programming rather than functional 
> programming.
> 
> PHP provides tools such as `clone`, private visibility, etc to deal with that.
> 
> The lack of any immutable object datastructures in core and the lack of 
> immutable focused extensions in 
> PECL https://pecl.php.net/package-search.php?pkg_name=immutable
> https://www.php.net/manual-lookup.php?pattern=immutable=quickref
> (other than DateTimeImmutable)
> heavily discourage me from proposing anything immutable.
> 
> (Technically, https://github.com/TysonAndre/pecl-teds has minimal 
> implementations of immutable data structures, but the api is still 
> being revised and Vector is the primary focus, followed by iterable 
> functions. e.g. there's no `ImmutableSequence::add($value): 
> ImmutableSequence` method.)
> 
> 
> > The methods around size control are seemingly pointless from a user POV.
> 
> setSize is useful in allocating exactly the variable amount of memory 
> needed while using less memory than a PHP array.
> `setSize($newSize, 0)` would be much more efficient and concise in 
> initializing the value.
> 
> - Or in quickly reducing the size of the array rather than repeatedly 
> calling pop in a loop.
> 
> And while methods around capacity control exist in many other 
> programming languages, they aren't used by most users and most users 
> are fine with functionality they don't use existing.
> The applications or libraries that do have a good use case to reduce 
> memory will take advantage of them and end users of those 
> applications/libraries would benefit from the memory usage reduction.
> 
> > I understand the memory optimization value they have, but that's not 
> > something PHP developers are at all used to dealing with.
> > That makes it less of a convenient drop-in replacement for array and more 
> > just another user-space collection object, but in C with internals 
> > endorsement.
> > If such logic needs to be included, it should be kept as minimalist as 
> > possible for usability,
> > even at the cost of a little memory usage in some cases.
> 
> If the functionality was just a drop-in replacement for array, others 
> may say "why not just use array and the array libraries?" (or Vector).
> With the strategy of doubling capacity, it can be up to 99% more memory 
> than needed in some cases (Even more wastage after shrinking from the 
> maximum size).
> 
> > There is no reason to preserve keys.
> > A Vector or list type should not have user-defined keys.
> > It should just be a linear list. If you populate it from an existing 
> > array/iterable, the keys should be entirely ignored.
> > If you care about keys you want a HashMap or Dictionary or similar (which 
> > we also desperately need in the stdlib, but that's a separate thing).
> 
> The behavior is similar to 
> https://www.php.net/manual/en/splfixedarray.fromarray.php 
> It tries to preserve the keys, and fills in gaps with null.
> 
> 1. There's the consistency with existing functionality such as 
> SplFixedArray::fromArray, or existing constructors preserving keys.
> 2. And I'd imagined that a last minute objection of "Wait, `new 
> SplFixedArray([1 => 'second', 0 => 'first'])` does what by default? 
> Isn't this using the keys 0 and 1?", and the same for gaps
> 
>    I was considering only having the no-param constructor, and adding 
> the static method fromValues(iterable $it) to make it clearer keys are 
> ignored.
> 
> > Whether or not contains() needs a comparison callback in my mind depends 
> > mainly on whether or not the operator overloading RFC passes. 
> > If it does, then contains() can/should use the __compareTo() method on 
> > objects.
> > If it 

Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-18 Thread tyson andre
Hi Peter Bowyer,

> > > To echo Pierre, a Vector needs to be of a single guaranteed type.
> > > Yes, this gets us back to the generics conversation again, but I presume
> > (perhaps naively?) there are ways to address this question without getting
> > into full-blown generics.
> >
> > Yep, as you said, this type is mixed, just like the SplFixedArray,
> > ArrayObject, values of SplObjectStorage/WeakMap, etc.
> >
> 
> Please rename your proposal as the use of the term "Vector" is confusing
> for people who use them in other languages. Much of the discussion so far
> has been around whether it's a Vector or what it should be; changing the
> proposed name will allow the discussion to focus on what you're proposing
> to add, not what others (myself included) would like to see added to PHP :)

Many of php's names are based on the naming choices in libraries made in C/C++.
So using https://cplusplus.com/reference/vector/vector/ for my RFC 
https://wiki.php.net/rfc/vector
seems like the most natural naming choice,
and would make it easier for people with backgrounds in that family of 
languages to find the functionality they're looking for.
PHP already has a SplStack, SplQueue, etc, like C++'s `stack`, `queue`, etc.

I expect having a second `Stack` would be confusing and make it hard to 
remember which is the efficient one.
(Especially since stacks typically don't include specialized resizing methods)

No alternative names have been suggested by you or them so far, as far as I 
remember, and 2 of those responders seem to be saying they would vote no 
regardless of the choice of name (for reasons such as wanting generic-like 
functionality, wanting immutability or built-in types, etc.).
PHP's already using List to refer to linked lists, and `array` in PHP already 
refers to a hash table (including in ArrayObject).
So I expect a stronger objection to alternative names that I can think of.

Also, your comment is ambiguous. Are you saying that you personally object to 
the name,
or that you're fine with the name but think that the comments by 
Larry/Chris/Pierre in this email thread are representative of voters?

- People who wouldn't find the name surprising wouldn't bother writing an email 
to indicate a lack of surprise.

Thanks,
-Tyson

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-18 Thread Peter Bowyer
On Sat, 18 Sept 2021 at 02:49, tyson andre 
wrote:

> > To echo Pierre, a Vector needs to be of a single guaranteed type.
> > Yes, this gets us back to the generics conversation again, but I presume
> (perhaps naively?) there are ways to address this question without getting
> into full-blown generics.
>
> Yep, as you said, this type is mixed, just like the SplFixedArray,
> ArrayObject, values of SplObjectStorage/WeakMap, etc.
>

Please rename your proposal as the use of the term "Vector" is confusing
for people who use them in other languages. Much of the discussion so far
has been around whether it's a Vector or what it should be; changing the
proposed name will allow the discussion to focus on what you're proposing
to add, not what others (myself included) would like to see added to PHP :)

Peter


Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-17 Thread Mike Schinkel
> On Sep 16, 2021, at 10:09 PM, tyson andre  wrote:
> 
> Hi internals,
> 
> I've created a new RFC https://wiki.php.net/rfc/vector proposing to add 
> `final class Vector` to PHP.
> 
> PHP's native `array` type is rare among programming languages in that it is 
> used as an associative map of values, but also needs to support lists of 
> values.
> In order to support both use cases while also providing a consistent internal 
> array HashTable API to the PHP's internals and PECLs, additional memory is 
> needed to track keys 
> (https://www.npopov.com/2014/12/22/PHPs-new-hashtable-implementation.html - 
> around twice as much as is needed to just store the values due to needing 
> space both for the string pointer and int key in a Bucket, for non-reference 
> counted values).
> Additionally, creating non-constant arrays will allocate space for at least 8 
> elements to make the initial resizing more efficient, potentially wasting 
> memory.
> 
> It would be useful to have an efficient variable-length container in the 
> standard library for the following reasons: 
> 
> 1. To save memory in applications or libraries that may need to store many 
> lists of values and/or run as a CLI or embedded process for long periods of 
> time 
>   (in modules identified as using the most memory or potentially exceeding 
> memory limits in the worst case)
>   (both in userland and in native code written in php-src/PECLs)
> 2. To provide a better alternative to `ArrayObject` and `SplFixedArray` for 
> use cases 
>   where objects are easier to use than arrays - e.g. variable sized 
> collections (For lists of values) that can be passed by value to be read and 
> modified.
> 3. To give users the option of stronger runtime guarantees that property, 
> parameter, or return values really contain a list of values without gaps, 
> that array modifications don't introduce gaps or unexpected indexes, etc.
> 
> Thoughts on Vector?

Given there seems to be a lot of concern about the approach the RFC proposes, 
would it not address the concerns about memory usage and performance if several 
methods were added to SplFixedArray instead (as well as functions like 
indexOf(), contains(), map(), filter(), JSONSerialize(), etc., or similar):

===

setCapacity(int) — Sets the Capacity, i.e. the maximum Size before resize
getCapacity():int — Gets the current Capacity.

setGrowthFactor(float) — Sets the Growth Factor for push(). Defaults to 2
getGrowthFactor():float — Gets the current Growth Factor

pop([shrink]):mixed — Returns [Size] then subtracts 1 from Size. If 
(bool)shrink passed then call shrink().
push(mixed) — Sets [Size]=mixed, then Size++, unless Size=Capacity then 
setSize(n) where n=round(Size*GrowthFactor,0) before Size++.

grow([new_capacity]) — Increases memory allocated. Sets Capacity to 
Size*GrowthFactor or new_capacity.
shrink([new_capacity]) — Reduces memory allocated. Sets Capacity to current 
Size or new_capacity.

===
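
For illustration only, part of the list above can be sketched in userland today by wrapping `SplFixedArray`. The `GrowableFixedArray` class is hypothetical and nothing in it exists in SPL. It assumes the fixed array's size plays the role of Capacity while a separate counter tracks Size, and it targets PHP 8.0 or later.

```php
<?php
// Hypothetical wrapper; method names follow the list above.
final class GrowableFixedArray
{
    private SplFixedArray $storage; // its size plays the role of Capacity
    private int $size = 0;          // number of slots actually in use

    public function __construct(int $capacity = 8, private float $growthFactor = 2.0)
    {
        $this->storage = new SplFixedArray($capacity);
    }

    public function getCapacity(): int
    {
        return $this->storage->getSize();
    }

    public function getSize(): int
    {
        return $this->size;
    }

    public function push(mixed $value): void
    {
        if ($this->size === $this->getCapacity()) {
            // Grow by the growth factor, always by at least one slot.
            $newCapacity = max($this->size + 1, (int) round($this->size * $this->growthFactor));
            $this->storage->setSize($newCapacity);
        }
        $this->storage[$this->size++] = $value;
    }

    public function pop(bool $shrink = false): mixed
    {
        if ($this->size === 0) {
            throw new UnderflowException('Cannot pop from an empty list');
        }
        $value = $this->storage[--$this->size];
        $this->storage[$this->size] = null; // release the stored value
        if ($shrink) {
            $this->shrink();
        }
        return $value;
    }

    public function shrink(): void
    {
        // Reduce the allocation to the number of slots in use.
        $this->storage->setSize($this->size);
    }
}

$list = new GrowableFixedArray(2);
foreach ([10, 20, 30] as $v) {
    $list->push($v);
}
$capacityAfterPush = $list->getCapacity(); // 4: the initial 2 doubled once
$popped = $list->pop(true);                // 30
$capacityAfterPop = $list->getCapacity();  // 2: shrunk to the current size
```

A wrapper like this pays for a userland method call on every operation, so it shows the intended semantics rather than the performance a native implementation would have.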

If you had these methods then I think you would get the memory and performance 
improvements you want, and if you really want a final Vector class for your own 
uses you could roll your own using inheritance or containment.

Would this not work?

-Mike

P.S. I also think asking for new methods on SplFixedArray has a much greater 
chance of success than an RFC for Vector. #jmtcw




Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-17 Thread Pierre Joye
Hi Tyson,

On Sat, Sep 18, 2021, 10:21 AM tyson andre 
wrote:

>
> This proposal already has a fixed-sized type - that type is `mixed` (or
> `zval` internally), like ArrayObject, WeakMap, etc. already have in their
> values.
> (Similar to how basic Java collections (e.g. ArrayList) are all
> collections of `Object` after generic type erasure.)
>


Thanks for the clarification.

So the name of this proposal is misleading. They are not vectors.

I am not sure php needs another type of fixed array at this stage. So -1
here overall.

best,
Pierre



Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-17 Thread tyson andre

Hi Pierre Joye,

> Not sure you care or read my reply but I had to jump in one more time here :)
> 
> On Sat, Sep 18, 2021 at 8:49 AM tyson andre  wrote:
> 
> > setSize is useful in allocating exactly the variable amount of memory 
> > needed while using less memory than a PHP array.
> > `setSize($newSize, 0)` would be much more efficient and concise in 
> > initializing the value.
> >
> > - Or in quickly reducing the size of the array rather than repeatedly 
> > calling pop in a loop.
> 
> I would rather not reduce it at all, but use the vector_size and keep
> it. User land set its max size but a realloc/free should not be
> necessary and counter productive from a perf point of view. If one
> uses it in a daemon, it can always be destroyed as needed.
> 
> > > To echo Pierre, a Vector needs to be of a single guaranteed type.
> > > Yes, this gets us back to the generics conversation again, but I presume 
> > > (perhaps naively?) there are ways to address this question without 
> > > getting into full-blown generics.
> >
> > Yep, as you said, this type is mixed, just like the SplFixedArray, 
> > ArrayObject, values of SplObjectStorage/WeakMap, etc.
> > Generic support is something that's been brought up before, investigated, 
> > then abandoned.
> >
> > My concerns with adding StringVector, MixedVector, IntVector, FloatVector, 
> > BoolVector, ArrayVector (confusing), ObjectVector, etc is that
> >
> > - I doubt many people would agree that there's a wide use case for any
> >   specific one of them compared to a vector of any type.
> 
> I am lost here. This is the main usage of Vector. For linear
> arithmetic like dot product, masking, add/sub/mul/div of vector etc. I
> do not see any other usage per see for all the things I have
> implemented or saw out there. Additionally, f.e., a string is a vector
> already on its own, I am not sure a vector of vectors makes sense ;).
> 
> >   This would be even harder to argue for than just a single Vector type.
> > - Mixes of null and type `T` might make sense in many cases (e.g. optional 
> > objects, statistics that failed to get computed, etc) but would be 
> > forbidden by that
> > - It would be a bad choice if generic support did get added in the future.
> 
> These are special cases for general purposes of vectors. Implementing
> vectors focusing on these special cases rather than the general
> purpose (vectorization) would be a strategic mistake. I mentioned it
> before, but please take a look at the numpy's Vector f.e., with
> python's operator overload, what has been done there is simply
> amazing, bringing vector processing/arithmetic a huge boost in
> performance, even with millions of entries (14 to 400x speed boost
> compared to classic array, even fixed).
> 
> > > But really, a non-type-guaranteed Vector/List construct is of fairly 
> > > little use to me in practice, and that's before we even get into the 
> > > potential performance optimizations for map() and filter() from type 
> > > guarantees.
> >
> > See earlier comments on `vec`/Generics not being actively worked on right 
> > now and probably being a far way away from an implementation that would 
> > pass a vote.
> 
> Generics!=Vector. But I hope that's not the way we are heading here :)
> 
> > As for optimizations, opcache currently doesn't optimize individual global 
> > functions (let alone methods), it optimizes opcodes.
> > Even array_map()/array_filter() aren't optimized, they call callbacks in an 
> > ordinary way.
> > E.g. https://github.com/php/php-src/pull/5588 or 
> > https://externals.io/message/109847 regarding ordinary methods.
> >
> > Aside: In the long term, I think the opcache core team had a long-term plan 
> > of changing the intermediate representation to make these types of 
> > optimizations feasible without workarounds like the one I proposed in 5588
> 
> You are fully correct here, I see a lack of the engine devs
> involvement (not complaining, just a state of the affairs :) in such
> RFC where this kind of feature could greatly benefit PHP. Well
> planned, this is a huge addition to PHP.
> 
> It is also why I am convinced that doing it right for Vectors (as a
> start) and thinking forwards to JIT and ops overloading (internally or
> userland) to allow smooth and nice vectorization (as some parts use
> them already internally f.e.) will bring PHP up to speed with the
> competition. If we don't, we just have something that would be similar
> to what anyone could do in userland with more flexibility.

I have no plans to take this RFC in those directions and no 
personal interest in working on generics (where others have attempted and 
failed) or operator overloading for array operations.

**Adding anything like numpy's operator overloading or generics is entirely out 
of the scope of my proposal and not the goal of my proposal.**
Both of those are massive projects compared to adding a small number of data 
structures.
**See 

Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-17 Thread Pierre Joye
Good morning,

Not sure you care or read my reply but I had to jump in one more time here :)

On Sat, Sep 18, 2021 at 8:49 AM tyson andre  wrote:

> setSize is useful in allocating exactly the variable amount of memory needed 
> while using less memory than a PHP array.
> `setSize($newSize, 0)` would be much more efficient and concise in 
> initializing the value.
>
> - Or in quickly reducing the size of the array rather than repeatedly calling 
> pop in a loop.

I would rather not reduce it at all, but use the vector_size and keep
it. Userland sets its max size, but a realloc/free should not be
necessary and is counterproductive from a perf point of view. If one
uses it in a daemon, it can always be destroyed as needed.

> > To echo Pierre, a Vector needs to be of a single guaranteed type.
> > Yes, this gets us back to the generics conversation again, but I presume 
> > (perhaps naively?) there are ways to address this question without getting 
> > into full-blown generics.
>
> Yep, as you said, this type is mixed, just like the SplFixedArray, 
> ArrayObject, values of SplObjectStorage/WeakMap, etc.
> Generic support is something that's been brought up before, investigated, 
> then abandoned.
>
> My concerns with adding StringVector, MixedVector, IntVector, FloatVector, 
> BoolVector, ArrayVector (confusing), ObjectVector, etc is that
>
> - I doubt many people would agree that there's a wide use case for any
>   specific one of them compared to a vector of any type.

I am lost here. This is the main usage of Vector: linear
arithmetic like dot product, masking, add/sub/mul/div of vectors, etc. I
do not see any other usage per se in all the things I have
implemented or seen out there. Additionally, e.g., a string is a vector
already on its own; I am not sure a vector of vectors makes sense ;).

>   This would be even harder to argue for than just a single Vector type.
> - Mixes of null and type `T` might make sense in many cases (e.g. optional 
> objects, statistics that failed to get computed, etc) but would be forbidden 
> by that
> - It would be a bad choice if generic support did get added in the future.

These are special cases for general purposes of vectors. Implementing
vectors focusing on these special cases rather than the general
purpose (vectorization) would be a strategic mistake. I mentioned it
before, but please take a look at the numpy's Vector f.e., with
python's operator overload, what has been done there is simply
amazing, bringing vector processing/arithmetic a huge boost in
performance, even with millions of entries (14 to 400x speed boost
compared to classic array, even fixed).

> > But really, a non-type-guaranteed Vector/List construct is of fairly little 
> > use to me in practice, and that's before we even get into the potential 
> > performance optimizations for map() and filter() from type guarantees.
>
> See earlier comments on `vec`/Generics not being actively worked on right now 
> and probably being a far way away from an implementation that would pass a 
> vote.

Generics!=Vector. But I hope that's not the way we are heading here :)

> As for optimizations, opcache currently doesn't optimize individual global 
> functions (let alone methods), it optimizes opcodes.
> Even array_map()/array_filter() aren't optimized, they call callbacks in an 
> ordinary way.
> E.g. https://github.com/php/php-src/pull/5588 or 
> https://externals.io/message/109847 regarding ordinary methods.
>
> Aside: In the long term, I think the opcache core team had a long-term plan 
> of changing the intermediate representation to make these types of 
> optimizations feasible without workarounds like the one I proposed in 5588

You are fully correct here, I see a lack of the engine devs
involvement (not complaining, just a state of the affairs :) in such
RFC where this kind of feature could greatly benefit PHP. Well
planned, this is a huge addition to PHP.

It is also why I am convinced that doing it right for Vectors (as a
start) and thinking forwards to JIT and ops overloading (internally or
userland) to allow smooth and nice vectorization (as some parts use
them already internally f.e.) will bring PHP up to speed with the
competition. If we don't, we just have something that would be similar
to what anyone could do in userland with more flexibility.

Best,
-- 
Pierre

@pierrejoye | http://www.libgd.org




Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-17 Thread tyson andre

> Improving collection/set operations in PHP is something near and dear to my 
> heart,
> so I'm in favor of adding a Vector class or similar to the stdlib.
> 
> However, I am not a fan of this particular design.
> 
> As Levi noted, this being a mutable object that passes by handle is asking 
> for trouble.
> It should either be some by-value internal type, or an immutable object with 
> evolver methods on it.
> (E.g., add($val): Vector). Making it a mutable object is creating spooky 
> action at a distance problems.
> An immutable object seems likely easier to implement than a new type,
> but both are beyond my capabilities so I defer to those who could do so.

https://wiki.php.net/rfc/vector#adding_a_native_type_instead_is_vec discusses 
why I'm doubtful of `is_vec` getting implemented or passing.
Especially with `add()` taking linear time to copy all elements of the existing 
value if you mean an array rather than a linked list-like structure, and any 
referenced copies taking a lot more memory than an imperative version would.


PHP's end users and internals members come from a wide variety of backgrounds,
and I assume most beginning or experienced PHP programmers would tend towards 
imperative programming rather than functional programming.

PHP provides tools such as `clone`, private visibility, etc to deal with that.

The lack of any immutable object datastructures in core and the lack of 
immutable focused extensions in PECL 
https://pecl.php.net/package-search.php?pkg_name=immutable
https://www.php.net/manual-lookup.php?pattern=immutable=quickref
(other than DateTimeImmutable)
heavily discourage me from proposing anything immutable.

(Technically, https://github.com/TysonAndre/pecl-teds has minimal 
implementations of immutable data structures, but the api is still being 
revised and Vector is the primary focus, followed by iterable functions. e.g. 
there's no `ImmutableSequence::add($value): ImmutableSequence` method.)


> The methods around size control are seemingly pointless from a user POV.

setSize is useful in allocating exactly the variable amount of memory needed 
while using less memory than a PHP array.
`setSize($newSize, 0)` would be much more efficient and concise in initializing 
the value.

- Or in quickly reducing the size of the array rather than repeatedly calling 
pop in a loop.

And while methods around capacity control exist in many other programming 
languages, they aren't used by most users and most users are fine with 
functionality they don't use existing.
The applications or libraries that do have a good use case to reduce memory 
will take advantage of them and end users of those applications/libraries would 
benefit from the memory usage reduction.

> I understand the memory optimization value they have, but that's not 
> something PHP developers are at all used to dealing with.
> That makes it less of a convenient drop-in replacement for array and more 
> just another user-space collection object, but in C with internals 
> endorsement.
> If such logic needs to be included, it should be kept as minimalist as 
> possible for usability,
> even at the cost of a little memory usage in some cases.

If the functionality was just a drop-in replacement for array, others may say 
"why not just use array and the array libraries?" (or Vector).
With the strategy of doubling capacity, it can be up to 99% more memory than 
needed in some cases (Even more wastage after shrinking from the maximum size).
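
As a worked example of that overhead, assuming a pure doubling policy with an initial capacity of 8 (the helper name and both values are assumptions for this sketch, not taken from any implementation):

```php
<?php
// Capacity reached by starting at $initial and doubling until $n elements fit.
function capacityAfterDoubling(int $n, int $initial = 8): int
{
    $capacity = $initial;
    while ($capacity < $n) {
        $capacity *= 2;
    }
    return $capacity;
}

$n = 1025; // one element past a power of two
$capacity = capacityAfterDoubling($n);
$extraPercent = 100 * ($capacity - $n) / $n;

printf("n=%d capacity=%d extra=%.1f%%\n", $n, $capacity, $extraPercent);
// n=1025 capacity=2048 extra=99.8%
```

The worst case is a list that has just crossed a power of two: nearly half of the allocated slots are unused, so the allocation is just under twice what the values need.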

> There is no reason to preserve keys.
> A Vector or list type should not have user-defined keys.
> It should just be a linear list. If you populate it from an existing 
> array/iterable, the keys should be entirely ignored.
> If you care about keys you want a HashMap or Dictionary or similar (which we 
> also desperately need in the stdlib, but that's a separate thing).

The behavior is similar to 
https://www.php.net/manual/en/splfixedarray.fromarray.php 
It tries to preserve the keys, and fills in gaps with null.

1. There's the consistency with existing functionality such as 
SplFixedArray::fromArray, or existing constructors preserving keys.
2. And I'd imagined a last-minute objection along the lines of "Wait, `new 
SplFixedArray([1 => 'second', 0 => 'first'])` does what by default? Isn't this 
using the keys 0 and 1?", and the same for gaps.

   I was considering only having the no-param constructor, and adding the 
static method fromValues(iterable $it) to make it clearer keys are ignored.
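
For reference, the existing `SplFixedArray::fromArray()` behavior referred to above can be checked with a short script. This shows only the existing SPL method, not the proposed `Vector` constructor.

```php
<?php
// Keys are preserved by default, so values are placed by key, not by input order.
$byKey = SplFixedArray::fromArray([1 => 'second', 0 => 'first']);
var_dump($byKey->toArray()); // 'first' at index 0, 'second' at index 1

// Gaps between integer keys are filled with null.
$withGap = SplFixedArray::fromArray([0 => 'a', 2 => 'c']);
var_dump($withGap->getSize()); // int(3)
var_dump($withGap[1]);         // NULL

// Passing false as the second argument ignores the keys and keeps input order.
$byOrder = SplFixedArray::fromArray([1 => 'second', 0 => 'first'], false);
var_dump($byOrder->toArray()); // 'second' at index 0, 'first' at index 1
```

A `fromValues(iterable $it)` factory would correspond to the third call, where the keys are ignored.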

> Whether or not contains() needs a comparison callback in my mind depends 
> mainly on whether or not the operator overloading RFC passes. 
> If it does, then contains() can/should use the __compareTo() method on 
> objects.
> If it doesn't, then there needs to be some other way to compare non-identical 
> objects or else that method becomes mostly useless.

There's a distinction between needs and very nice to have - a contains check 
for some predicate on a Vector can be done with a userland helper 

Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-17 Thread tyson andre
Hi Max Semenik,

> Since Ds was mentioned, I've added it to your benchmark (code and complete 
> results at https://gist.github.com/MaxSem/d0ea0755d6deabaf88c9ef26039b2f27):
> 
> Appending to array:         n= 1048576 iterations=      20 memory=33558608 
> bytes, create+destroy time=0.369 read time = 0.210 result=10995105792000
> Appending to Vector:        n= 1048576 iterations=      20 memory=16777304 
> bytes, create+destroy time=0.270 read time = 0.270 result=10995105792000
> Appending to SplStack:      n= 1048576 iterations=      20 memory=33554584 
> bytes, create+destroy time=0.893 read time = 0.397 result=10995105792000
> Appending to SplFixedArray: n= 1048576 iterations=      20 memory=16777304 
> bytes, create+destroy time=2.475 read time = 0.340 result=10995105792000
> Appending to Ds\Vector:     n= 1048576 iterations=      20 memory=24129632 
> bytes, create+destroy time=0.389 read time = 0.305 result=10995105792000
> 
> Another comparison with Ds, I wonder if an interface akin to Ds\Sequence[1] 
> could be added, to have something in common with other future containers.

It's worth noting that the first 4 data structures all start with initial sizes 
that are powers of 2 and continue doubling (this doesn't matter for SplStack, 
which is a doubly linked list),
but according to Ds\Vector's documentation,
it starts with a minimum capacity of 10. So it's an unfair comparison. 
http://docs.php.net/manual/en/class.ds-vector.php#ds-vector.constants.min-capacity
So there are probably larger copies done in Ds\Vector - Ds\Vector might do 
better for other sizes or use less memory under other circumstances.

(for reasons mentioned in https://externals.io/message/116048#116054 , I 
haven't checked the resizing strategy used by Ds\Vector - doubling is a common 
choice in vector implementations in other languages, others use other multiples 
of old capacity, etc)
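
As a rough illustration of why n = 1048576 favours power-of-two doubling, the sketch below simulates capacity growth. The function name and the 1.5 growth factor are assumptions made for illustration only; as noted above, the actual resizing strategy of Ds\Vector was not checked. The 16 bytes per element is the size of a zval on 64-bit builds.

```php
<?php
/**
 * Smallest capacity, reached from $initial by repeated growth,
 * that can hold $n elements. Purely illustrative.
 */
function capacity_after_appends(int $n, int $initial, float $factor): int
{
    $capacity = $initial;
    while ($capacity < $n) {
        $capacity = (int) ceil($capacity * $factor);
    }
    return $capacity;
}

$n = 1048576; // 2**20, the size used in the benchmark

// Doubling from a power of two lands exactly on n: no slack at all.
$doubling = capacity_after_appends($n, 8, 2.0);
$bytes = $doubling * 16; // 16777216, i.e. 88 bytes less than the reported 16777304

// One more element doubles the allocation: close to 100% overhead.
$worst = capacity_after_appends($n + 1, 8, 2.0);

// Hypothetical alternative: start at 10 and grow by 1.5x (an assumption).
$hypothetical = capacity_after_appends($n, 10, 1.5);
```

With these assumptions, the hypothetical 1.5x strategy overshoots n, while doubling from 8 hits it exactly, so a benchmark at a power-of-two size shows doubling at its best.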

Regards,
- Tyson
--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-17 Thread Levi Morrison via internals
> * Whether or not contains() needs a comparison callback in my mind depends 
> mainly on whether or not the operator overloading RFC passes.  If it does, 
> then contains() can/should use the __compareTo() method on objects.  If it 
> doesn't, then there needs to be some other way to compare non-identical 
> objects or else that method becomes mostly useless.

This is only partly true. Let's say we have a vector of some complex
type A. There are legitimate reasons for using different ways of
comparing As, such as when projecting sub-fields (for example, sorting
by each member's name this time, but next time sorting by each
member's location).

Of course, if it passes, then using a type's built-in comparison
overloading is a sensible default, but it doesn't remove the need of
having a custom comparator.

-

I was tired when I originally pointed out the comparator/equatable
stuff, but Tyson was right to say that `any` solves this need, e.g.

<?php
if ($vec->contains($value, $eq)) {/**/}
// translates to
if ($vec->any(fn ($x) => $eq($x, $value))) {/**/}
?>

However, it's not as clear what to do for `indexOf` where you care
about the index it was found at.
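
One possible shape for the `indexOf` case is a helper that takes the comparator explicitly and returns the position or null. This is only a userland sketch, assuming PHP 8.0+; `iterable_any()` and `index_of_by()` are hypothetical names, not methods proposed in the RFC.

```php
<?php
// Returns true as soon as the predicate accepts a value.
function iterable_any(iterable $values, callable $predicate): bool
{
    foreach ($values as $value) {
        if ($predicate($value)) {
            return true;
        }
    }
    return false;
}

// Returns the zero-based position of the first value that $eq considers
// equal to $needle, or null when there is none.
function index_of_by(iterable $values, mixed $needle, callable $eq): ?int
{
    $index = 0;
    foreach ($values as $value) {
        if ($eq($value, $needle)) {
            return $index;
        }
        $index++;
    }
    return null;
}

// Compare people by name only, ignoring the other fields.
$byName = fn (array $a, array $b): bool => $a['name'] === $b['name'];

$people = [
    ['name' => 'Ada', 'city' => 'London'],
    ['name' => 'Grace', 'city' => 'New York'],
];
$needle = ['name' => 'Grace', 'city' => 'unknown'];

$found = iterable_any($people, fn ($x) => $byName($x, $needle)); // true
$position = index_of_by($people, $needle, $byName);               // 1
```

The same comparator can be swapped for one that projects a different field, which is the use case described above.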




Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-17 Thread Larry Garfield
On Thu, Sep 16, 2021, at 9:09 PM, tyson andre wrote:
> Hi internals,
> 
> I've created a new RFC https://wiki.php.net/rfc/vector proposing to add 
> `final class Vector` to PHP.
> 
> PHP's native `array` type is rare among programming language in that it 
> is used as an associative map of values, but also needs to support 
> lists of values.
> In order to support both use cases while also providing a consistent 
> internal array HashTable API to the PHP's internals and PECLs, 
> additional memory is needed to track keys 
> (https://www.npopov.com/2014/12/22/PHPs-new-hashtable-implementation.html - 
> around twice as much as is needed to just store the values due to needing 
> space both for the string pointer and int key in a Bucket, for non-reference 
> counted values)).
> Additionally, creating non-constant arrays will allocate space for at 
> least 8 elements to make the initial resizing more efficient, 
> potentially wasting memory.
> 
> It would be useful to have an efficient variable-length container in 
> the standard library for the following reasons: 
> 
> 1. To save memory in applications or libraries that may need to store 
> many lists of values and/or run as a CLI or embedded process for long 
> periods of time 
>    (in modules identified as using the most memory or potentially 
> exceeding memory limits in the worst case)
>    (both in userland and in native code written in php-src/PECLs)
> 2. To provide a better alternative to `ArrayObject` and `SplFixedArray` 
> for use cases 
>    where objects are easier to use than arrays - e.g. variable sized 
> collections (For lists of values) that can be passed by value to be 
> read and modified.
> 3. To give users the option of stronger runtime guarantees that 
> property, parameter, or return values really contain a list of values 
> without gaps, that array modifications don't introduce gaps or 
> unexpected indexes, etc.
> 
> Thoughts on Vector?
> 
> P.S. The functionality in this proposal can be tested/tried out at 
> https://pecl.php.net/teds (under the class name `\Teds\Vector` instead 
> of `\Vector`).
> (That is a PECL I created earlier this year for future versions of 
> iterable proposals, common data structures such as Vector/Deque, and 
> less commonly used data structures that may be of use in future work on 
> implementing other data structures)

Improving collection/set operations in PHP is something near and dear to my 
heart, so I'm in favor of adding a Vector class or similar to the stdlib.

However, I am not a fan of this particular design.

* As Levi noted, this being a mutable object that passes by handle is asking 
for trouble.  It should either be some by-value internal type, or an immutable 
object with evolver methods on it.  (Eg, add($val): Vector).  Making it a 
mutable object is creating spooky action at a distance problems.  An immutable 
object seems likely easier to implement than a new type, but both are beyond my 
capabilities so I defer to those who could do so.

* The methods around size control are seemingly pointless from a user POV.  I 
understand the memory optimization value they have, but that's not something 
PHP developers are at all used to dealing with.  That makes it less of a 
convenient drop-in replacement for array and more just another user-space 
collection object, but in C with internals endorsement.  If such logic needs to 
be included, it should be kept as minimalist as possible for usability, even at 
the cost of a little memory usage in some cases.

* There is no reason to preserve keys.  A Vector or list type should not have 
user-defined keys.  It should just be a linear list.  If you populate it from 
an existing array/iterable, the keys should be entirely ignored.  If you care 
about keys you want a HashMap or Dictionary or similar (which we also 
desperately need in the stdlib, but that's a separate thing).

* Whether or not contains() needs a comparison callback in my mind depends 
mainly on whether or not the operator overloading RFC passes.  If it does, then 
contains() can/should use the __compareTo() method on objects.  If it doesn't, 
then there needs to be some other way to compare non-identical objects or else 
that method becomes mostly useless.

* To echo Pierre, a Vector needs to be of a single guaranteed type.  Yes, this 
gets us back to the generics conversation again, but I presume (perhaps 
naively?) there are ways to address this question without getting into 
full-blown generics.  But really, a non-type-guaranteed Vector/List construct 
is of fairly little use to me in practice, and that's before we even get into 
the potential performance optimizations for map() and filter() from type 
guarantees.  I can write a type-guaranteed user-space class that does what I 
need in under 10 minutes, and for most low cardinality sets that's adequately 
performant.  A built-in needs to be better than that.
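
The "spooky action at a distance" concern in the first point can be shown with types PHP already has: arrays are passed by value, while objects such as `SplFixedArray` are passed by handle. The `ImmutableList` class below is a hypothetical sketch of the "immutable object with evolver methods" alternative, assuming PHP 8.0+; it is not the RFC's design.

```php
<?php
function append_to_array(array $list): void
{
    $list[] = 'x'; // modifies a local copy only
}

function overwrite_first(SplFixedArray $list): void
{
    $list[0] = 'changed'; // modifies the caller's object
}

$array = ['a', 'b'];
append_to_array($array); // $array is still ['a', 'b']

$fixed = SplFixedArray::fromArray(['a', 'b']);
overwrite_first($fixed); // $fixed[0] is now 'changed'

// Hypothetical immutable list: add() returns a new instance.
final class ImmutableList implements IteratorAggregate, Countable
{
    private function __construct(private array $values)
    {
    }

    public static function of(mixed ...$values): self
    {
        return new self(array_values($values));
    }

    public function add(mixed $value): self
    {
        $copy = $this->values;
        $copy[] = $value;
        return new self($copy);
    }

    public function toArray(): array
    {
        return $this->values;
    }

    public function count(): int
    {
        return count($this->values);
    }

    public function getIterator(): Iterator
    {
        return new ArrayIterator($this->values);
    }
}

$base = ImmutableList::of(1, 2);
$extended = $base->add(3); // $base is unchanged
```

The trade-off is that every evolver call allocates a new object, which is part of why a by-value engine type was raised as the other option.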

I very much appreciate the chicken-and-egg challenge of wanting to get 

Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-17 Thread Max Semenik
On Fri, Sep 17, 2021 at 5:10 AM tyson andre wrote:

> Hi internals,
>
> I've created a new RFC https://wiki.php.net/rfc/vector proposing to add
> `final class Vector` to PHP.


Thank you so much, Tyson - I love your proposal. Since Ds was mentioned,
I've added it to your benchmark (code and complete results at
https://gist.github.com/MaxSem/d0ea0755d6deabaf88c9ef26039b2f27):

Appending to array:         n= 1048576 iterations=      20 memory=33558608
bytes, create+destroy time=0.369 read time = 0.210 result=10995105792000
Appending to Vector:        n= 1048576 iterations=      20 memory=16777304
bytes, create+destroy time=0.270 read time = 0.270 result=10995105792000
Appending to SplStack:      n= 1048576 iterations=      20 memory=33554584
bytes, create+destroy time=0.893 read time = 0.397 result=10995105792000
Appending to SplFixedArray: n= 1048576 iterations=      20 memory=16777304
bytes, create+destroy time=2.475 read time = 0.340 result=10995105792000
Appending to Ds\Vector:     n= 1048576 iterations=      20 memory=24129632
bytes, create+destroy time=0.389 read time = 0.305 result=10995105792000

Another comparison with Ds, I wonder if an interface akin to Ds\Sequence[1]
could be added, to have something in common with other future containers.

-
[1] http://docs.php.net/manual/en/class.ds-sequence.php

-- 
Best regards,
Max Semenik


Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-17 Thread Pierre Joye
Hi Tyson,

Back on my laptop, so I will answer my own question now that I have read the
source code. Please, really, that should be part of the RFC content.
Half of the questions here are about APIs, goals, etc. RFCs should be
specifications as much as possible.

On Fri, Sep 17, 2021 at 12:43 PM Pierre Joye wrote:
>
> Hello Tyson,
>
> Vector support would be very good. JIT can do a lot with them if we
> have a clean Vector implementation, or even without JIT.

Teds\Vector is named Vector, but I am afraid it is not one; it is a
fixed array implementation. A vector, as in all other languages, is by
definition a sequence, of fixed or variable size, of elements of the same
type. The same type is absolutely key here. The reason to require the
same type is the core principle of vectors (and vectorization):
structure of arrays rather than array of structs. The latter are hard
(or pointless) to parallelize and hard to optimize. An easy way to
play with vectors would be to try out numpy's Vector, which is by far
one of the best (and fastest) scripting language implementations of
Vector.

I did not spend enough time on the code, but I would start by deconstructing:

    typedef struct _teds_vector_entries {
        size_t size;
        size_t capacity;
        zval *entries;
    } teds_vector_entries;

into different types (or using multiple entries with a zval_type entry),
i.e. for a zval float Vector:

    typedef struct _teds_vector_entries {
        size_t size;
        size_t capacity;
        double *entries; /* raw doubles instead of zvals */
    } teds_vector_entries;

Alternatively, common C ports of the C++ vector do something along these lines:

    typedef struct _cVector {
        unsigned int size;
        unsigned int cnt_elements;
        unsigned int element_size;
        void *elements;
    } cVector;

where the initialization is:

    void cVectorInit(cVector *array, unsigned int element_size);

where element_size is, for example, sizeof(double),
so any append, truncate, etc. are aware of the size to be
(re)allocated, if needed.

The only addition to handle zval would be:

    typedef struct _cVector {
        unsigned int size;
        unsigned int cnt_elements;
        unsigned char zval_type; /* element type tag, e.g. IS_LONG or IS_DOUBLE */
        unsigned int element_size;
        void *elements;
    } cVector;

Doing so would go a long way toward finally having a way to implement
vectorization in PHP that is simple in both usage and implementation/API.

I would also like to suggest having it in the engine somehow, if the
JIT cannot take advantage of it unless it is an actual type in
the engine. While it is possible to have intrinsics implementations in
any extension, it won't be as good or efficient as an engine type with
JIT support.

Also, as it stands, I do not think it can be called a Vector. So I am
not too keen on it, as I don't think we need another SplFixed*Array
implementation, as simple as it could be. The RFC needs some work
anyway before any vote can be taken.

In any case, if the above is something you would consider (implementing
an actual vector), I can help, and would be happy to, if you
need or would like that.

Best,
-- 
Pierre

@pierrejoye | http://www.libgd.org




Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-17 Thread Levi Morrison via internals
> I'd considered using a signature of `setSize(int $size, mixed $value = null)` 
> to allow using something other than null
> but decided to leave that to a followup proposal if it passed.

Rust and C++ both accept a value to pad, there's no reason to restrict
this to only null. Go ahead and make the change now.




Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-17 Thread tyson andre
Hi Christian Schneider,

> First of all: I don't have a strong opinion on a Vector class being useful or 
> necessary.
> 
> But I have two comments about this RFC:
> 
> 1. Using the very generic name Vector without any prefix/namespace seems 
> dangerous and asking for BC breaks.

I downloaded the top 400 composer packages with 
https://github.com/nikic/popular-package-analysis/ and didn't find any classes 
named Vector.

- Only php-cs-fixer extends SplFixedArray in one class. It can continue to do so.
- I don't see other classes called Vector. Just stubs for `\Ds\Vector`.

There are tradeoffs and objections to any possible choice of name I could make, 
including this or alternates.

- Too likely to have conflicts
- Excessively long
- Open to adopting namespace but objecting to migrating existing classes (or 
not doing so)
- Objecting to a specific choice 

> 2. I don't like that this class is final. The reasons given in 
> https://wiki.php.net/rfc/vector#final_class seem unconvincing to me and 
> restrict the usage of Vector in a way which makes me question the usefulness 
> to a big enough part of the PHP community.
> These two reasons combined would make me reject the RFC at the current stage.

There are alternatives such as making all/almost all of the methods 
final (especially those for reading and modifying array elements or basic 
properties of the vector), but allowing extending the class.

- Still, I don't think that'd be very useful, and it would make future final 
method additions to Vector backwards incompatible.
- Trying to do everything (e.g. be extensible and handle all edge cases of 
extension) has often resulted in many spl data structures not doing anything 
very well (efficiently, correctly, or in a way that makes it possible to make 
universal assumptions about them or optimize them in the future with 
opcache/the jit).

While it is possible to extend ArrayObject and SplFixedArray, very few things 
do that, and it'd generally lead to worse API design except in a few cases.
(E.g. `UserList extends \Vector` wouldn't be able to enforce that inserted 
values are actually users with final methods)
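
For the `UserList` case, one common alternative to inheritance is composition: a small final class that owns its storage and only exposes typed methods. The sketch below is an assumption-laden illustration for PHP 8.0+; `User` and `UserList` are hypothetical classes, and a plain array stands in for the storage that could be a `Vector` if the RFC is accepted.

```php
<?php
final class User
{
    public function __construct(public string $name)
    {
    }
}

final class UserList implements IteratorAggregate, Countable
{
    /** @var list<User> */
    private array $users;

    public function __construct(User ...$users)
    {
        $this->users = array_values($users);
    }

    // The parameter type is the enforcement: non-User values raise a TypeError.
    public function push(User $user): void
    {
        $this->users[] = $user;
    }

    public function get(int $offset): User
    {
        if (!array_key_exists($offset, $this->users)) {
            throw new OutOfBoundsException("No user at offset $offset");
        }
        return $this->users[$offset];
    }

    public function count(): int
    {
        return count($this->users);
    }

    public function getIterator(): Iterator
    {
        return new ArrayIterator($this->users);
    }
}

$list = new UserList(new User('ada'));
$list->push(new User('grace'));
```

Because the wrapper never exposes the underlying storage, the element type cannot be bypassed, which is the guarantee that subclassing with final mutators could not give.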

Thanks,
Tyson




Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-17 Thread tyson andre
Hi Levi Morrison,

> I mean that there isn't a way to provide a custom way to compare for
> equality. One way to accomplish this is to have a signature like:
> 
>    function contains(T $value, ?callable(T, T):bool $comparator = null): bool
> 
> The same goes for `indexOf`.

It'd make much more sense to have `->any(static fn($other): bool => 
$comparator($value, $other)): ?:int`
Overloading contains to do two different things (identity check or test the 
result of a callable)
seems like it's unintuitive to users.

Since there is plenty of time to add more functionality,
and I still haven't created the extended iterable library proposal,
this currently only adds operations that are significantly more efficient 
inside the Vector
(or have a return type of Vector) rather than going through the generic 
Iterator methods.

> > > - I don't know what `setSize(int $size)` does. What does it do if the
> > > current size is less than `$size`? What about if its current size is
> > > greater? I suspect this is about capacity, not size, but without docs
> > > I am just guessing.
> >
> > It's the same behavior as 
> > https://www.php.net/manual/en/splfixedarray.setsize.php . It's about size, 
> > not capacity.
> >
> > > Change the size of an array to the new size of size.
> > > If size is less than the current array size, any values after the new 
> > > size will be discarded.
> > > If size is greater than the current array size, the array will be padded 
> > > with null values.
> >
> > I'd planned to add phpdoc documentation and examples before starting a vote 
> > to document the behavior and thrown exceptions of the proposed methods.
> 
> I would rather see multiple methods like:
>     function truncateTo(int $size)
>     function padEnd(int $length, $value) // allows more than just null
>     function padBeginning(int $length, $value)

I'd consider this unfriendly to users (and personally consider it a poor 
design) if we start with 3 or 4 different ways to change the size of the Vector.
(Especially if English is a second language)

A wide variety of programming languages use a single resize-style method 
rather than truncateTo/padEnd (e.g. `setSize` on Java's `java.util.Vector`, 
`Vec::resize` in Rust, `std::vector::resize` in C++),
after what I assume is considerable discussion among language design experts in 
those languages.
In the vast majority of cases, users know the exact size they want and don't 
care about the mechanism to set that.
(And if the size is set larger or smaller in an `if{...}else{...}`, the 
existence of setSize is still needed.
Or if the user intends to reuse the allocated memory while overwriting all 
values.)

- Diverging from what end users are familiar with (without a strong reason to) 
would also make it harder to start using `Vector`.

I'd considered using a signature of `setSize(int $size, mixed $value = null)` 
to allow using something other than null
but decided to leave that to a followup proposal if it passed.
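
For reference, the `SplFixedArray::setSize()` behaviour that the proposal mirrors, together with a userland sketch of the padded variant, looks like this. The `resized()` helper is hypothetical and operates on plain arrays, assuming PHP 8.0+; it only illustrates the semantics a `setSize(int $size, mixed $value = null)` signature might have.

```php
<?php
$fixed = SplFixedArray::fromArray([1, 2, 3]);

$fixed->setSize(5);          // grows: new slots are null
$grown = $fixed->toArray();  // [1, 2, 3, null, null]

$fixed->setSize(2);          // shrinks: trailing values are discarded
$shrunk = $fixed->toArray(); // [1, 2]

// Hypothetical equivalent for lists, with a configurable padding value.
function resized(array $list, int $size, mixed $value = null): array
{
    if ($size < 0) {
        throw new InvalidArgumentException('Size must be non-negative');
    }
    return $size <= count($list)
        ? array_slice($list, 0, $size)
        : array_pad($list, $size, $value);
}

$padded = resized([1, 2, 3], 5, 0); // [1, 2, 3, 0, 0]
$cut = resized([1, 2, 3], 1);       // [1]
```

A single method covers both directions, which is the argument made above against splitting it into truncateTo/padEnd.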

For now, I'd omitted ways to add to the start of the array because the linear 
time taken would be potentially objectionable,
if people didn't imagine using it themselves or thought it'd be more 
appropriate for end users to use a Deque.

> And one or more for increasing/ensuring capacity without changing size.

setCapacity seems useful to me for reserving exactly the amount of memory 
needed when the final size was known (e.g. setCapacity(2) to avoid 
over-allocating) but I was waiting to see if anyone else wanted that.

Thanks,
Tyson




Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-17 Thread Pierre

On 17/09/2021 at 14:54, tyson andre wrote:

> Aside: https://github.com/TysonAndre/pecl-teds#iterable-functions
> starts doing that, but evaluating eagerly instead of using generators.
> I still don't think there's enough functionality yet to re-propose that.

Nice to know this, I wasn't aware it even existed, I'll have a look.

> > I know this vector proposal is not about that, but nevertheless, in my
> > opinion, it must start preparing the terrain for this, or all other RFC
> > in the future will only create new isolated data structures and make the
> > SPL even more inconsistent.
>
> It's possible, but I don't know what others think.
>
> 1. https://www.php.net/manual/en/class.ds-collection.php actually seems fairly
> universal, but out of scope, and I don't know if people would json encode a
> SplMaxHeap. Right now that isn't implemented and the value is always `{}`
> 2. `add($value)/remove($value)/contains[Value]($value)` is limited to some
> structures - Only containsValue() would apply to ArrayObject/SplObjectStorage.
> The others wouldn't work since you'd need to know the keys as well.

That's true, vector is a bit apart from what we'd expect from a full-blown
collection API; it's a very basic structure in the end, so it can
probably live on its own.

> Also,
>
> - Union type/intersection type support exists, so allowing any generic
> collection interface is less urgent.

That's right, but I don't think that union/intersection types solve the
generic collection problem: you'd still have to match for specific class
names or interfaces if methods are not rationalised in a single API.

> - equals() may work, though infinite recursion (or the way it is or isn't
> detected) in circular data structures is a potential objection, especially with
> lack of stack overflow detection - php just crashes/segfaults without a useful
> method when it runs out of stack space.
>
> For the ones that are universal,
> Countable/ArrayAccess/IteratorAggregate/Traversable already exist.

Yes, they exist, but I wouldn't count IteratorAggregate as being part of
the interface; it's about implementation. Anyway, altogether
they form a very poor API covering a very small surface, and I'd imagine
those becoming a legacy thing if a new API was introduced.

> Also, as you said, this RFC is not about that.
> Requiring that anyone systematically overhaul existing data structures before
> adding any new data structures
> seems like it would significantly delay or discourage any future additions of
> data structures.
>
> In the immediate future, an RFC only doing that would not have much short-term
> benefit to users - it would also have short-term drawbacks for what I consider
> not enough benefit,
> if adopting that interface made libraries drop support for older php versions.


I think your point is legit, and a part of me agrees with you: having 
some data structures in place before thinking about rationalisation 
would probably help people move forward. Nevertheless, it's always 
very difficult to change things once they're here, and that's the whole problem.

I crave so deeply for a complete, easy to use, well documented and 
standard collection API that I always jump on such RFCs to tell people: 
"stop using DS, stop using Doctrine Collections, stop using '[name it 
here] collection'; please, everyone, let's design, implement and use one 
and the same API, so that we never have to support them all 
(them being the 1,000 existing duplicated libraries in userland) in our 
framework or business code."

Thanks a lot for your answer and your time. Although I still 
think that designing a collection API first could be done, having the 
vector type/class in core is a great idea.


Regards,

--

Pierre




Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-17 Thread tyson andre

Hi Pierre,

> That's nice, and I like it, but like many people I will argue about the
> API itself.
> 
> One thing is that there's many methods in there that would totally fit
> generic collection common interfaces, and in that regard, I'd be very
> sad that it would be merged as is.

It isn't an interface, but my previous attempts at introducing common 
functionality for working with iterables have failed,
with the reasons given including a preference for userland implementations 
and the proposals being too small in scope.
https://wiki.php.net/rfc/any_all_on_iterable#straw_poll

Until there's a Set type or a Map type, adding generic functionality such as 
contains()
to all spl data structures is harder.

I haven't seen any recent additions of utility methods to existing spl 
datastructures in years other than when filling an urgent need,
(e.g. SplHeap->isCorrupted())
and have been pessimistic about that succeeding, but may be mistaken.

> I think it's taking the problem backwards, I would personally prefer that:
> 
>  - This RFC introduces the vector into a new Collection namespace, or
> any other collection/iterable/enumerable related namespace, that'd
> probably become the birth of a later to be standard collection API.
> 
>  - Start thinking about a common API even if it's for one or two
> methods, and propose something that later would give the impulsion for
> adding new collection types and extending this in order to be become
> something that looks like a really coherent collection API.
> 
> If this goes in without regarding the greater plan, it will induce
> inconsistencies in the future, when people will try to make something
> greater. I'd love having something like DS and nikic/iter fused
> altogether into PHP core, as a whole, in a consistent, performant, with
> a nice and comprehensive API (and that doesn't require to install
> userland dependencies).

Aside: https://github.com/TysonAndre/pecl-teds#iterable-functions
starts doing that, but evaluating eagerly instead of using generators.
I still don't think there's enough functionality yet to re-propose that.

> I know this vector proposal is not about that, but nevertheless, in my
> opinion, it must start preparing the terrain for this, or all other RFC
> in the future will only create new isolated data structures and make the
> SPL even more inconsistent.

It's possible, but I don't know what others think.

1. https://www.php.net/manual/en/class.ds-collection.php actually seems fairly 
universal, but out of scope, and I don't know if people would json encode a 
SplMaxHeap. Right now that isn't implemented and the value is always `{}`
2. `add($value)/remove($value)/contains[Value]($value)` is limited to some 
structures - Only containsValue() would apply to ArrayObject/SplObjectStorage. 
The others wouldn't work since you'd need to know the keys as well.

Also,

- Union type/intersection type support exists, so allowing any generic 
collection interface is less urgent.
- equals() may work, though infinite recursion (or the way it is or isn't 
detected) in circular data structures is a potential objection, especially with 
the lack of stack overflow detection - php just crashes/segfaults without a useful 
message when it runs out of stack space.

For the ones that are universal, 
Countable/ArrayAccess/IteratorAggregate/Traversable already exist.

Also, as you said, this RFC is not about that.
Requiring that anyone systematically overhaul existing data structures before 
adding any new data structures
seems like it would significantly delay or discourage any future additions of 
data structures.

In the immediate future, an RFC only doing that would not have much short-term 
benefit to users - it would also have short-term drawbacks for what I consider 
not enough benefit,
if adopting that interface made libraries drop support for older php versions.

Thanks,
Tyson




Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-17 Thread Olle Härstedt
On Thu, Sep 16, 2021 at 8:10 PM tyson andre wrote:
> Hi internals,
> 
> I've created a new RFC https://wiki.php.net/rfc/vector proposing to add 
> `final class Vector` to PHP. 
> 
> PHP's native `array` type is rare among programming language in that it is 
> used as an associative map of values, but also needs to support lists of 
> values. 
> In order to support both use cases while also providing a consistent internal 
> array HashTable API to the PHP's internals and PECLs, additional memory is 
> needed to track keys 
> (https://www.npopov.com/2014/12/22/PHPs-new-hashtable-implementation.html - 
> around twice as much as is needed to just store the values due to needing 
> space both for the string pointer and int key in a Bucket, for non-reference 
> counted values)). 
> Additionally, creating non-constant arrays will allocate space for at least 8 
> elements to make the initial resizing more efficient, potentially wasting 
> memory. 
> 
> It would be useful to have an efficient variable-length container in the 
> standard library for the following reasons: 
> 
> 1. To save memory in applications or libraries that may need to store many 
> lists of values and/or run as a CLI or embedded process for long periods of 
> time 
>    (in modules identified as using the most memory or potentially exceeding 
> memory limits in the worst case) 
>    (both in userland and in native code written in php-src/PECLs) 
> 2. To provide a better alternative to `ArrayObject` and `SplFixedArray` for 
> use cases 
>    where objects are easier to use than arrays - e.g. variable sized 
> collections (For lists of values) that can be passed by value to be read and 
> modified. 
> 3. To give users the option of stronger runtime guarantees that property, 
> parameter, or return values really contain a list of values without gaps, 
> that array modifications don't introduce gaps or unexpected indexes, etc. 
> 
> Thoughts on Vector? 
> 
> P.S. The functionality in this proposal can be tested/tried out at 
> https://pecl.php.net/teds (under the class name `\Teds\Vector` instead of 
> `\Vector`). 
> (That is a PECL I created earlier this year for future versions of iterable 
> proposals, common data structures such as Vector/Deque, and less commonly 
> used data structures that may be of use in future work on implementing other 
> data structures) 
> 
> Thanks, 
> Tyson 
> 
 
I'm okay with a final Vector class in general. I don't love the 
proposed API but don't hate it either. Feedback on that at the end. 
 
What I would _love_ is a `vec` type from hacklang, which is similar to 
this but pass-by-value, copy-on-write like an array. Of course, this 
would require engine work and I understand it isn't as simple to add. 
 
Feedback on API: 
 
-  `indexOf` returning `false` instead of `null` when it cannot be 
found. If we are keeping this method (which I don't like, because 
there's no comparator), please return `null` instead of false. The 
language has facilities for working with null like `??`, so please 
prefer that when it isn't needed for BC (like this, this is a new 
API). 
- `contains` also doesn't have a comparator. 
-  Similarly but less strongly, I don't like the filter callable 
returning `mixed` -- please just make it `bool`. 
- I don't know what `setSize(int $size)` does. What does it do if the 
current size is less than `$size`? What about if its current size is 
greater? I suspect this is about capacity, not size, but without docs 
I am just guessing. 
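
The `indexOf` point about `null` versus `false` can be illustrated with the existing `array_search()`, which returns `false` on a miss. The `index_of()` wrapper below is a hypothetical helper for lists, assuming PHP 8.0+; it is not the RFC's method.

```php
<?php
// Hypothetical wrapper: position of $value in the list, or null when absent.
function index_of(array $list, mixed $value): ?int
{
    // array_values() guarantees integer positions even for keyed input.
    $key = array_search($value, array_values($list), true);
    return $key === false ? null : $key;
}

$list = ['a', 'b'];

$first = index_of($list, 'a');         // 0
$missing = index_of($list, 'z') ?? -1; // -1, thanks to the ?? operator

// The pitfall with false as the "not found" value: a hit at position 0
// is also falsy, so a loose check cannot tell the two apart.
$raw = array_search('a', $list, true); // 0
$looksMissing = !$raw;                 // true, although 'a' was found
```

With a nullable return, the `??` operator supplies a default without any risk of confusing index 0 with a miss.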
 
use SplFixedArray as vec;

Done. ;)

Olle

Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-17 Thread Pierre

On 17/09/2021 at 04:09, tyson andre wrote:

Hi internals,

I've created a new RFC https://wiki.php.net/rfc/vector proposing to add `final 
class Vector` to PHP.


Hello,

That's nice, and I like it, but like many people I will argue about the 
API itself.


One thing is that there's many methods in there that would totally fit 
generic collection common interfaces, and in that regard, I'd be very 
sad that it would be merged as is.


I think it's taking the problem backwards, I would personally prefer that:

 - This RFC introduces the vector into a new Collection namespace, or 
any other collection/iterable/enumerable related namespace, that'd 
probably become the birth of a later to be standard collection API.


 - Start thinking about a common API even if it's for one or two 
methods, and propose something that later would give the impetus for 
adding new collection types and extending this in order to become 
something that looks like a really coherent collection API.


If this goes in without regard for the greater plan, it will induce 
inconsistencies in the future, when people try to make something 
greater. I'd love having something like DS and nikic/iter fused 
together into PHP core, as a whole, in a consistent and performant way, with 
a nice and comprehensive API (and without requiring userland 
dependencies to be installed).


I know this vector proposal is not about that, but nevertheless, in my 
opinion, it must start preparing the terrain for this, or all other RFCs 
in the future will only create new isolated data structures and make the 
SPL even more inconsistent.


Regards,

--

Pierre




Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-17 Thread Pierre Joye
Good afternoon Christian,

On Fri, Sep 17, 2021 at 3:07 PM Christian Schneider
 wrote:
>
> On 17.09.2021 at 04:09, tyson andre wrote:
> > I've created a new RFC https://wiki.php.net/rfc/vector proposing to add 
> > `final class Vector` to PHP.
>
>
> First of all: I don't have a strong opinion on a Vector class being useful or 
> necessary.
>
> But I have two comments about this RFC:
> 1) Using the very generic name Vector without any prefix/namespace seems 
> dangerous and asking for BC breaks.
> 2) I don't like that this class is final. The reasons given in 
> https://wiki.php.net/rfc/vector#final_class 
>  seem unconvincing to me and 
> restrict the usage of Vector in a way which makes me question the usefulness 
> to a big enough part of the PHP community.
>
> These two reasons combined would make me reject the RFC at the current stage.

I think it is more in a draft stage for discussions.

To be more precise with my earlier reply, I only see such additions as
useful if it is an actual Vector as known in other languages and
widely used in recent years in ML and other similar areas like data or
image processing.

To me a vector is useful if it allows vectorized operations, as in
SIMD, AltiVec, CUDA etc. Some refs:

https://users.ece.cmu.edu/~franzf/teaching/slides-18-645-simd.pdf
https://indico.cern.ch/event/238763/attachments/401939/558861/HP-intel_mic_optimization.pdf

These two refer to Intel architecture, but ARM (especially v9 with Neon),
MIPS, PPC and maybe soon RISC-V support such operations as well.
It is amazingly well suited for raw performance increases. I can
imagine having annotations and/or specific optimization for vectors of
scalar processing. It requires a bit of (re)thinking, but it is
totally worth it.

Generic multiple data types vectors are less useful for such things
and SplFixedArray does it already, if I understand the RFC, as it
stands now, correctly.

best,
-- 
Pierre

@pierrejoye | http://www.libgd.org




Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-17 Thread Christian Schneider
On 17.09.2021 at 04:09, tyson andre wrote:
> I've created a new RFC https://wiki.php.net/rfc/vector proposing to add 
> `final class Vector` to PHP.


First of all: I don't have a strong opinion on a Vector class being useful or 
necessary.

But I have two comments about this RFC:
1) Using the very generic name Vector without any prefix/namespace seems 
dangerous and asking for BC breaks.
2) I don't like that this class is final. The reasons given in 
https://wiki.php.net/rfc/vector#final_class seem unconvincing to me and 
restrict the usage of Vector in a way which makes me question the usefulness to 
a big enough part of the PHP community.

These two reasons combined would make me reject the RFC at the current stage.

- Chris



Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-16 Thread Pierre Joye
Hello Tyson,

Vector support would be very good. JIT can do a lot with them if we
have a clean Vector implementation, or even without JIT.

What is your base inspiration for Vector? I do like the pretty
standard C++ Vector implementation:

https://www.cplusplus.com/reference/vector/vector/

Where a Vector is initialized with:

$myIntVector = vector<int>;

What is key for performance is also the alloc/realloc/free strategy.
C++ (like most custom implementations in C or other languages) gives
creators control over max size, capacity, etc.

If a non typed Vector is the goal, then I am less in need of it, still
good to have but not as good as a clear pure Vector support :).

Also I think it will be very good to have more details about what this
RFC proposes in the RFC. It is kind of hard to follow right now, with
all external links. RFCs are better if they act as a real
specification :)

Best,
Pierre

On Fri, Sep 17, 2021 at 9:10 AM tyson andre  wrote:
>
> Hi internals,
>
> I've created a new RFC https://wiki.php.net/rfc/vector proposing to add 
> `final class Vector` to PHP.
>
> PHP's native `array` type is rare among programming language in that it is 
> used as an associative map of values, but also needs to support lists of 
> values.
> In order to support both use cases while also providing a consistent internal 
> array HashTable API to the PHP's internals and PECLs, additional memory is 
> needed to track keys 
> (https://www.npopov.com/2014/12/22/PHPs-new-hashtable-implementation.html - 
> around twice as much as is needed to just store the values due to needing 
> space both for the string pointer and int key in a Bucket, for non-reference 
> counted values)).
> Additionally, creating non-constant arrays will allocate space for at least 8 
> elements to make the initial resizing more efficient, potentially wasting 
> memory.
>
> It would be useful to have an efficient variable-length container in the 
> standard library for the following reasons:
>
> 1. To save memory in applications or libraries that may need to store many 
> lists of values and/or run as a CLI or embedded process for long periods of 
> time
>(in modules identified as using the most memory or potentially exceeding 
> memory limits in the worst case)
>(both in userland and in native code written in php-src/PECLs)
> 2. To provide a better alternative to `ArrayObject` and `SplFixedArray` for 
> use cases
>where objects are easier to use than arrays - e.g. variable sized 
> collections (For lists of values) that can be passed by value to be read and 
> modified.
> 3. To give users the option of stronger runtime guarantees that property, 
> parameter, or return values really contain a list of values without gaps, 
> that array modifications don't introduce gaps or unexpected indexes, etc.
>
> Thoughts on Vector?
>
> P.S. The functionality in this proposal can be tested/tried out at 
> https://pecl.php.net/teds (under the class name `\Teds\Vector` instead of 
> `\Vector`).
> (That is a PECL I created earlier this year for future versions of iterable 
> proposals, common data structures such as Vector/Deque, and less commonly 
> used data structures that may be of use in future work on implementing other 
> data structures)
>
> Thanks,
> Tyson
> --
> PHP Internals - PHP Runtime Development Mailing List
> To unsubscribe, visit: https://www.php.net/unsub.php
>


-- 
Pierre

@pierrejoye | http://www.libgd.org



Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-16 Thread Matthew Brown
I can also give some in-the-trenches perspective of vec's utility, having
spent the last month and a half writing Hack. vec is a really useful data
structure to be able to use explicitly in code. It makes code that uses it
easier to understand.

The main benefit over Vector is that it could be used as a straightforward
replacement for array in many cases, with the same copy-on-write semantics
helping developers avoid spooky action-at-a-distance. I'm confident that by
the time that such a feature would be ready for primetime, there would be
tools in place to assist migrations. We could also presumably allow for
casts between vec and array.

And once opcache is sorted out, I'm also confident that the advice "using
vec will make your code a bit faster" would be a successful recruitment
tool.

Best wishes,

Matt


Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-16 Thread Matthew Brown
On Thu, 16 Sept 2021 at 23:33, tyson andre 
wrote:

> Yeah, as mentioned in
> https://wiki.php.net/rfc/vector#adding_a_native_type_instead_is_vec , it
> would require a massive amount of work.
>
> - A standard library for dealing with `vec`, filtering it, etc
> - Userland libraries and PECLs would need to deal with a third complex
> type different from array/object that probably couldn't be implicitly
> - Extensive familiarity with opcache and the JIT for x86 and other
> platforms beyond what I have
> - Willingness to do that with the uncertainty the final implementation
> would get 2/3 votes with backwards compatibility objections, etc.
>

I feel like the standard library could be added in userland first, and then
corresponding faster implementations could arrive in the std lib.

But the last point is a really important one, and feels like a weakness in
the RFC process.

I know RFCs without implementations are generally frowned upon, but if
2/3rds of the community agreed that they wanted some sort of vec[] support
in theory, it might then free up the implementer(s) to take a more granular
approach to supporting vec. It could, for example, be an experimental
feature for a minor version.

Best wishes,

Matt

On Thu, 16 Sept 2021 at 23:33, tyson andre 
wrote:

> Hi Levi Morrison,
>
> > I'm okay with a final Vector class in general. I don't love the
> > proposed API but don't hate it either. Feedback on that at the end.
> >
> > What I would _love_ is a `vec` type from hacklang, which is similar to
> > this but pass-by-value, copy-on-write like an array. Of course, this
> > would require engine work and I understand it isn't as simple to add.
>
> Yeah, as mentioned in
> https://wiki.php.net/rfc/vector#adding_a_native_type_instead_is_vec , it
> would require a massive amount of work.
>
> - A standard library for dealing with `vec`, filtering it, etc
> - Userland libraries and PECLs would need to deal with a third complex
> type different from array/object that probably couldn't be implicitly
> - Extensive familiarity with opcache and the JIT for x86 and other
> platforms beyond what I have
> - Willingness to do that with the uncertainty the final implementation
> would get 2/3 votes with backwards compatibility objections, etc.
>
> > Feedback on API:
> >
> > -  `indexOf` returning `false` instead of `null` when it cannot be
> > found. If we are keeping this method (which I don't like, because
> > there's no comparator), please return `null` instead of false. The
> > language has facilities for working with null like `??`, so please
> > prefer that when it isn't needed for BC (like this, this is a new
> > API).
>
> I hadn't thought about that - that seems reasonable since I don't remember
> anything else adding indexOf as a method name.
>
> > - `contains` also doesn't have a comparator.
>
> I was considering proposing `->any(callable)` and `->all(callable)`
> extensions if this passed.
> I'm not quite sure what you mean by a comparator for contains. There'd
> have to be a way to check if a raw closure is contained.
>
> > -  Similarly but less strongly, I don't like the filter callable
> > returning `mixed` -- please just make it `bool`.
>
> The filter callable is something that would be passed into the filter
> function. The return value would be checked for truthiness.
> The phpdoc in the documentation could be changed, but that wouldn't change
> the implementation.
>
> > - I don't know what `setSize(int $size)` does. What does it do if the
> > current size is less than `$size`? What about if its current size is
> > greater? I suspect this is about capacity, not size, but without docs
> > I am just guessing.
>
> It's the same behavior as
> https://www.php.net/manual/en/splfixedarray.setsize.php . It's about
> size, not capacity.
>
> > Change the size of an array to the new size of size.
> > If size is less than the current array size, any values after the new
> size will be discarded.
> > If size is greater than the current array size, the array will be padded
> with null values.
>
> I'd planned to add phpdoc documentation and examples before starting a
> vote to document the behavior and thrown exceptions of the proposed methods.
>
> Thanks,
> Tyson
> --
> PHP Internals - PHP Runtime Development Mailing List
> To unsubscribe, visit: https://www.php.net/unsub.php
>
>


Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-16 Thread Levi Morrison via internals
> > - `contains` also doesn't have a comparator.
>
> I was considering proposing `->any(callable)` and `->all(callable)` 
> extensions if this passed.
> I'm not quite sure what you mean by a comparator for contains. There'd have 
> to be a way to check if a raw closure is contained.

I mean that there isn't a way to provide a custom way to compare for
equality. One way to accomplish this is to have a signature like:
function contains(T $value, ?callable(T, T):bool $comparator = null): bool

The same goes for `indexOf`.
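
To make the comparator idea concrete, here is a minimal userland sketch. The class name `ComparableList` and its methods are hypothetical and are not the RFC's `Vector` API; PHP also has no `callable(T, T): bool` type syntax, so the comparator is typed as a plain `?callable`. It assumes PHP 8.0+ for the `mixed` type.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical userland list wrapper, for illustration only.
 * It is NOT the Vector class proposed in the RFC.
 */
final class ComparableList
{
    /** @var list<mixed> */
    private array $values;

    public function __construct(array $values = [])
    {
        // Reindex so the stored values always form a gapless list.
        $this->values = array_values($values);
    }

    /** Returns the first matching index, or null when nothing matches. */
    public function indexOf(mixed $value, ?callable $comparator = null): ?int
    {
        // Fall back to strict identity when no comparator is supplied.
        $comparator ??= static fn (mixed $a, mixed $b): bool => $a === $b;

        foreach ($this->values as $index => $candidate) {
            if ($comparator($candidate, $value)) {
                return $index;
            }
        }

        return null;
    }

    public function contains(mixed $value, ?callable $comparator = null): bool
    {
        return $this->indexOf($value, $comparator) !== null;
    }
}

$list = new ComparableList(['Apple', 'banana']);
$caseInsensitive = static fn (string $a, string $b): bool => strcasecmp($a, $b) === 0;

var_dump($list->contains('apple'));                   // bool(false): strict comparison
var_dump($list->contains('apple', $caseInsensitive)); // bool(true)
var_dump($list->indexOf('banana'));                   // int(1)
var_dump($list->indexOf('cherry'));                   // NULL
```

Routing `contains` through `indexOf` keeps the two methods consistent for any comparator the caller passes.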

> > - I don't know what `setSize(int $size)` does. What does it do if the
> > current size is less than `$size`? What about if its current size is
> > greater? I suspect this is about capacity, not size, but without docs
> > I am just guessing.
>
> It's the same behavior as 
> https://www.php.net/manual/en/splfixedarray.setsize.php . It's about size, 
> not capacity.
>
> > Change the size of an array to the new size of size.
> > If size is less than the current array size, any values after the new size 
> > will be discarded.
> > If size is greater than the current array size, the array will be padded 
> > with null values.
>
> I'd planned to add phpdoc documentation and examples before starting a vote 
> to document the behavior and thrown exceptions of the proposed methods.

I would rather see multiple methods like:
function truncateTo(int $size)
function padEnd(int $length, $value) // allows more than just null
function padBeginning(int $length, $value)
And one or more for increasing/ensuring capacity without changing size.
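
A rough sketch of how these three operations could behave, written as free functions over plain PHP lists rather than as methods on the proposed class. The function bodies are an assumption; in particular, `$length` is interpreted here as the target total length, not the number of elements to add. It assumes PHP 8.0+ for `mixed` and `ValueError`.

```php
<?php
declare(strict_types=1);

// Hypothetical helpers mirroring the suggested method names.
// They operate on plain lists and return new arrays.

function truncateTo(array $list, int $size): array
{
    if ($size < 0) {
        throw new ValueError('size must be >= 0');
    }
    // Keep only the first $size elements; shorter lists are returned unchanged.
    return array_slice($list, 0, $size);
}

function padEnd(array $list, int $length, mixed $value): array
{
    // array_pad() never truncates: it only appends until $length is reached.
    return array_pad($list, $length, $value);
}

function padBeginning(array $list, int $length, mixed $value): array
{
    // A negative size tells array_pad() to pad on the left.
    return array_pad($list, -$length, $value);
}

$list = [1, 2, 3];

print_r(truncateTo($list, 2));      // [1, 2]
print_r(padEnd($list, 5, 0));       // [1, 2, 3, 0, 0]
print_r(padBeginning($list, 5, 0)); // [0, 0, 1, 2, 3]
```

Splitting the operations this way means no single call both discards data and fills in `null`s, which is the ambiguity raised about `setSize`.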




Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-16 Thread tyson andre
Hey Marco Pivetta,

> Would it perhaps make sense to drag in php-ds, which has matured quite a bit 
> over the years? I'm referring to: 
> https://www.php.net/manual/en/class.ds-sequence.php
> 
> Is what you are suggesting with `Vector` different from it?
> 
> Note: For some reason, I can't quote your post and then reply, so it will be 
> a top-post 🤷‍♀️

This was outlined in the section 
https://wiki.php.net/rfc/vector#why_not_use_php-ds_instead before I sent out 
the announcement. To expand on that,

This has been asked about multiple times in threads on unrelated proposals 
(https://externals.io/message/112639#112641 and 
https://externals.io/message/93301#93301 years ago) throughout the years,
but the maintainer of php-ds had a long term goal of developing it separately 
from php's release cycle (and was still focusing on the PECL when I'd asked on 
the GitHub issue in the link almost a year ago).

- There have been no proposals from the maintainer to do that so far, that was 
what the maintainer mentioned as a long term plan.
- I personally doubt having it developed separately from php's release cycle 
would be accepted by voters (e.g. if unpopular decisions couldn't be voted 
against), or how backwards compatibility would be handled in that model, and 
had other concerns. (e.g. API debates such as 
https://externals.io/message/93301#93301)
- With php-ds itself seeming unlikely to me to get merged anytime soon, I 
decided to start independently working on efficient data structure 
implementations.

I don't see dragging it in (against the maintainer's wishes) as a viable option 
for many, many, many reasons.
But having efficient datastructures in PHP's core is still useful.

- While PECL development outside of php has its benefits for development and 
  the ability to make new features available in older php releases,
  it's less likely that application and library authors will start making use 
  of those data structures because many users won't have any given PECL 
  already installed 
  (though php-ds also publishes a polyfill, it would not have the cpu and 
  memory savings, and would add its own overhead).

- Additionally, users (and organizations using PHP) can often make stronger 
  assumptions on backwards compatibility and long-term availability of 
  functionality that is merged into PHP's core.

So the choice of feature set, some names, signatures, and internal 
implementation details are different, because this is reimplementing a common 
datastructure found in different forms in many languages.
It's definitely a mature project, but I personally feel like reimplementing 
this (without referring to the php-ds source code and without copying the 
entire api as-is) is the best choice to add efficient data structures to core 
while respecting the maintainer's work on the php-ds project and their wish to 
maintain control over the php-ds project.

As a result, I've been working on implementing data structures such as Vector 
based on php-src's data structure implementations (mostly SplFixedArray and 
ArrayObject) instead (and based on my past PECL/RFC experience, e.g. with 
runkit7/igbinary).

Regards,
Tyson

Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-16 Thread tyson andre
Hi Levi Morrison,

> I'm okay with a final Vector class in general. I don't love the
> proposed API but don't hate it either. Feedback on that at the end.
> 
> What I would _love_ is a `vec` type from hacklang, which is similar to
> this but pass-by-value, copy-on-write like an array. Of course, this
> would require engine work and I understand it isn't as simple to add.

Yeah, as mentioned in 
https://wiki.php.net/rfc/vector#adding_a_native_type_instead_is_vec , it would 
require a massive amount of work.

- A standard library for dealing with `vec`, filtering it, etc
- Userland libraries and PECLs would need to deal with a third complex type 
different from array/object that probably couldn't be implicitly 
- Extensive familiarity with opcache and the JIT for x86 and other platforms 
beyond what I have
- Willingness to do that with the uncertainty the final implementation would 
get 2/3 votes with backwards compatibility objections, etc.

> Feedback on API:
> 
> -  `indexOf` returning `false` instead of `null` when it cannot be
> found. If we are keeping this method (which I don't like, because
> there's no comparator), please return `null` instead of false. The
> language has facilities for working with null like `??`, so please
> prefer that when it isn't needed for BC (like this, this is a new
> API).

I hadn't thought about that - that seems reasonable since I don't remember 
anything else adding indexOf as a method name.

> - `contains` also doesn't have a comparator.

I was considering proposing `->any(callable)` and `->all(callable)` extensions 
if this passed.
I'm not quite sure what you mean by a comparator for contains. There'd have to 
be a way to check if a raw closure is contained.

> -  Similarly but less strongly, I don't like the filter callable
> returning `mixed` -- please just make it `bool`.

The filter callable is something that would be passed into the filter function. 
The return value would be checked for truthiness.
The phpdoc in the documentation could be changed, but that wouldn't change the 
implementation.

> - I don't know what `setSize(int $size)` does. What does it do if the
> current size is less than `$size`? What about if its current size is
> greater? I suspect this is about capacity, not size, but without docs
> I am just guessing.

It's the same behavior as 
https://www.php.net/manual/en/splfixedarray.setsize.php . It's about size, not 
capacity.

> Change the size of an array to the new size of size.
> If size is less than the current array size, any values after the new size 
> will be discarded.
> If size is greater than the current array size, the array will be padded with 
> null values.

I'd planned to add phpdoc documentation and examples before starting a vote to 
document the behavior and thrown exceptions of the proposed methods.
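
As a quick illustration of the referenced behaviour, this sketch runs the existing `SplFixedArray::setSize()` (not the proposed `Vector::setSize()`, which is only stated above to behave the same way) through a grow and a shrink:

```php
<?php
declare(strict_types=1);

$fixed = SplFixedArray::fromArray([1, 2, 3]);

// Growing: the new trailing slots are padded with null.
$fixed->setSize(5);
var_dump($fixed->toArray()); // [1, 2, 3, null, null]

// Shrinking: values after the new size are discarded.
$fixed->setSize(2);
var_dump($fixed->toArray()); // [1, 2]
```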

Thanks,
Tyson



Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-16 Thread Levi Morrison via internals
On Thu, Sep 16, 2021 at 8:10 PM tyson andre  wrote:
>
> Hi internals,
>
> I've created a new RFC https://wiki.php.net/rfc/vector proposing to add 
> `final class Vector` to PHP.
>
> PHP's native `array` type is rare among programming language in that it is 
> used as an associative map of values, but also needs to support lists of 
> values.
> In order to support both use cases while also providing a consistent internal 
> array HashTable API to the PHP's internals and PECLs, additional memory is 
> needed to track keys 
> (https://www.npopov.com/2014/12/22/PHPs-new-hashtable-implementation.html - 
> around twice as much as is needed to just store the values due to needing 
> space both for the string pointer and int key in a Bucket, for non-reference 
> counted values)).
> Additionally, creating non-constant arrays will allocate space for at least 8 
> elements to make the initial resizing more efficient, potentially wasting 
> memory.
>
> It would be useful to have an efficient variable-length container in the 
> standard library for the following reasons:
>
> 1. To save memory in applications or libraries that may need to store many 
> lists of values and/or run as a CLI or embedded process for long periods of 
> time
>(in modules identified as using the most memory or potentially exceeding 
> memory limits in the worst case)
>(both in userland and in native code written in php-src/PECLs)
> 2. To provide a better alternative to `ArrayObject` and `SplFixedArray` for 
> use cases
>where objects are easier to use than arrays - e.g. variable sized 
> collections (For lists of values) that can be passed by value to be read and 
> modified.
> 3. To give users the option of stronger runtime guarantees that property, 
> parameter, or return values really contain a list of values without gaps, 
> that array modifications don't introduce gaps or unexpected indexes, etc.
>
> Thoughts on Vector?
>
> P.S. The functionality in this proposal can be tested/tried out at 
> https://pecl.php.net/teds (under the class name `\Teds\Vector` instead of 
> `\Vector`).
> (That is a PECL I created earlier this year for future versions of iterable 
> proposals, common data structures such as Vector/Deque, and less commonly 
> used data structures that may be of use in future work on implementing other 
> data structures)
>
> Thanks,
> Tyson
> --
> PHP Internals - PHP Runtime Development Mailing List
> To unsubscribe, visit: https://www.php.net/unsub.php
>

I'm okay with a final Vector class in general. I don't love the
proposed API but don't hate it either. Feedback on that at the end.

What I would _love_ is a `vec` type from hacklang, which is similar to
this but pass-by-value, copy-on-write like an array. Of course, this
would require engine work and I understand it isn't as simple to add.

Feedback on API:

-  `indexOf` returning `false` instead of `null` when it cannot be
found. If we are keeping this method (which I don't like, because
there's no comparator), please return `null` instead of false. The
language has facilities for working with null like `??`, so please
prefer that when it isn't needed for BC (like this, this is a new
API).
- `contains` also doesn't have a comparator.
-  Similarly but less strongly, I don't like the filter callable
returning `mixed` -- please just make it `bool`.
- I don't know what `setSize(int $size)` does. What does it do if the
current size is less than `$size`? What about if its current size is
greater? I suspect this is about capacity, not size, but without docs
I am just guessing.
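
The `false` versus `null` point can be shown with the existing `array_search()`, which reports "not found" as `false`. The wrapper `indexOfOrNull` below is a hypothetical name used only for this sketch, and it assumes the input is a list with integer keys.

```php
<?php
declare(strict_types=1);

$haystack = ['a', 'b'];

// array_search() signals "not found" with false, which ?? does not catch,
// because ?? only falls back on null (or unset).
$viaFalse = array_search('z', $haystack, true) ?? -1;
var_dump($viaFalse); // bool(false)

// Hypothetical null-returning variant, in the spirit of the feedback above.
function indexOfOrNull(array $list, mixed $needle): ?int
{
    $key = array_search($needle, $list, true);
    return $key === false ? null : $key;
}

$viaNull = indexOfOrNull($haystack, 'z') ?? -1;
var_dump($viaNull); // int(-1)
```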




Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-16 Thread Marco Pivetta
Hey Tyson,

Would it perhaps make sense to drag in php-ds, which has matured quite a
bit over the years? I'm referring to:
https://www.php.net/manual/en/class.ds-sequence.php

Is what you are suggesting with `Vector` different from it?

Note: For some reason, I can't quote your post and then reply, so it will
be a top-post 🤷‍♀️

On Fri, 17 Sep 2021, 04:10 tyson andre,  wrote:

> Hi internals,
>
> I've created a new RFC https://wiki.php.net/rfc/vector proposing to add
> `final class Vector` to PHP.
>
> PHP's native `array` type is rare among programming language in that it is
> used as an associative map of values, but also needs to support lists of
> values.
> In order to support both use cases while also providing a consistent
> internal array HashTable API to the PHP's internals and PECLs, additional
> memory is needed to track keys (
> https://www.npopov.com/2014/12/22/PHPs-new-hashtable-implementation.html
> - around twice as much as is needed to just store the values due to needing
> space both for the string pointer and int key in a Bucket, for
> non-reference counted values)).
> Additionally, creating non-constant arrays will allocate space for at
> least 8 elements to make the initial resizing more efficient, potentially
> wasting memory.
>
> It would be useful to have an efficient variable-length container in the
> standard library for the following reasons:
>
> 1. To save memory in applications or libraries that may need to store many
> lists of values and/or run as a CLI or embedded process for long periods of
> time
>(in modules identified as using the most memory or potentially
> exceeding memory limits in the worst case)
>(both in userland and in native code written in php-src/PECLs)
> 2. To provide a better alternative to `ArrayObject` and `SplFixedArray`
> for use cases
>where objects are easier to use than arrays - e.g. variable sized
> collections (For lists of values) that can be passed by value to be read
> and modified.
> 3. To give users the option of stronger runtime guarantees that property,
> parameter, or return values really contain a list of values without gaps,
> that array modifications don't introduce gaps or unexpected indexes, etc.
>
> Thoughts on Vector?
>
> P.S. The functionality in this proposal can be tested/tried out at
> https://pecl.php.net/teds (under the class name `\Teds\Vector` instead of
> `\Vector`).
> (That is a PECL I created earlier this year for future versions of
> iterable proposals, common data structures such as Vector/Deque, and less
> commonly used data structures that may be of use in future work on
> implementing other data structures)
>
> Thanks,
> Tyson


[PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-16 Thread tyson andre
Hi internals,

I've created a new RFC https://wiki.php.net/rfc/vector proposing to add `final 
class Vector` to PHP.

PHP's native `array` type is rare among programming languages in that it is used 
as an associative map of values, but also needs to support lists of values.
In order to support both use cases while also providing a consistent internal 
array HashTable API to PHP's internals and PECLs, additional memory is 
needed to track keys 
(https://www.npopov.com/2014/12/22/PHPs-new-hashtable-implementation.html - 
around twice as much as is needed to just store the values due to needing space 
both for the string pointer and int key in a Bucket, for non-reference counted 
values).
Additionally, creating non-constant arrays will allocate space for at least 8 
elements to make the initial resizing more efficient, potentially wasting 
memory.
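
One rough way to observe the per-element overhead described above is to compare `memory_get_usage()` before and after building a plain array and an `SplFixedArray` of the same length. This is only a sketch: the element count is arbitrary, `SplFixedArray` stands in for the proposed `Vector`, and the measured numbers depend heavily on the PHP version and build, so no expected figures are given.

```php
<?php
declare(strict_types=1);

$n = 100000; // arbitrary element count chosen for this sketch

// Measure a plain PHP array holding $n integers.
$before = memory_get_usage();
$array = range(1, $n);
$arrayBytes = memory_get_usage() - $before;

// Measure an SplFixedArray holding the same integers.
$before = memory_get_usage();
$fixed = new SplFixedArray($n);
for ($i = 0; $i < $n; $i++) {
    $fixed[$i] = $i + 1;
}
$fixedBytes = memory_get_usage() - $before;

printf("array:         %d bytes (%.1f per element)\n", $arrayBytes, $arrayBytes / $n);
printf("SplFixedArray: %d bytes (%.1f per element)\n", $fixedBytes, $fixedBytes / $n);
```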

It would be useful to have an efficient variable-length container in the 
standard library for the following reasons: 

1. To save memory in applications or libraries that may need to store many 
lists of values and/or run as a CLI or embedded process for long periods of 
time 
   (in modules identified as using the most memory or potentially exceeding 
memory limits in the worst case)
   (both in userland and in native code written in php-src/PECLs)
2. To provide a better alternative to `ArrayObject` and `SplFixedArray` for use 
cases 
   where objects are easier to use than arrays - e.g. variable sized 
collections (For lists of values) that can be passed by value to be read and 
modified.
3. To give users the option of stronger runtime guarantees that property, 
parameter, or return values really contain a list of values without gaps, that 
array modifications don't introduce gaps or unexpected indexes, etc.
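
Point 3 can be illustrated with a plain array: removing an element leaves a gap in the keys, so the value is no longer a list. The sketch assumes PHP 8.1+ for `array_is_list()`, which postdates this thread; on older versions the same check can be written by comparing against `array_values()`.

```php
<?php
declare(strict_types=1);

$values = ['a', 'b', 'c'];

// Removing the middle element leaves keys 0 and 2: a gap.
unset($values[1]);
var_dump(array_keys($values));    // [0, 2]
var_dump(array_is_list($values)); // bool(false)

// Callers have to remember to reindex to get a list back.
$reindexed = array_values($values);
var_dump(array_is_list($reindexed)); // bool(true)
```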

Thoughts on Vector?

P.S. The functionality in this proposal can be tested/tried out at 
https://pecl.php.net/teds (under the class name `\Teds\Vector` instead of 
`\Vector`).
(That is a PECL I created earlier this year for future versions of iterable 
proposals, common data structures such as Vector/Deque, and less commonly used 
data structures that may be of use in future work on implementing other data 
structures)

Thanks,
Tyson