Re: [PHP-DEV] Re: [RFC] Under Discussion: Add Random class and RandomNumberGenerator interface

2021-06-14 Thread Guilliam Xavier
> Indeed, the current implementation appears to have several problems. So,
> how about the following implementation?
>
> ```php
> interface RandomNumberGenerator
> {
> public function generate(): int;
> }
>
> final class Random
> {
> private RandomNumberGenerator $rng;
> public function __construct(?RandomNumberGenerator $rng = null)
> {
> $this->rng = $rng ?? new XorShift128Plus(random_int(PHP_INT_MIN,
> PHP_INT_MAX));
> }
> public function nextInt(): int {}
> public function getInt(int $min, int $max): int {}
> public function getBytes(int $length): string {}
> public function shuffleArray(array $array): array {}
> public function shuffleString(string $string): string {}
> public function __serialize(): array {}
> public function __unserialize(array $data): void {}
> }
>
> class XorShift128PlusNumberGenerator implements RandomNumberGenerator
> {
> public function generate(): int {}
> public function __serialize(): array {}
> public function __unserialize(array $data): void {}
> }
>
> class MT19937NumberGenerator implements RandomNumberGenerator
> {
> public function generate(): int {}
> public function __serialize(): array {}
> public function __unserialize(array $data): void {}
> }
>
> class SecureNumberGenerator implements RandomNumberGenerator
> {
> public function generate(): int {}
> }
> ```
>
> This is an approach similar to Nikita's `createDefault`. If the
> constructor has a null argument, it uses the default, XorShift128+,
> internally.
>
> Also, whether the Random class is serializable or clonable depends on the
> instance of RandomNumberGenerator being used. This means that when the
> Random class clone is called, the `$rng` member will be implicitly cloned.
>
> How about this?
>

Hello, thanks for thinking again.

A few editorial notes:
- I guess that would be "new XorShift128PlusNumberGenerator" (like the
class name)?
- The non-Secure implementations' stubs are probably missing `public
function __construct(?int $seed = null) {}`?
- The Random class' and non-Secure implementations' stubs are probably
missing `public function __clone(): void {}` (like `__serialize()`)?

That would let us use the API like this:

1. Construction: one of:
- default RNG and seed: `$random = new Random();`
- chosen RNG, default seed: e.g. `$random = new Random(new
MT19937NumberGenerator());`
- chosen RNG and seed: e.g. `$random = new Random(new
MT19937NumberGenerator(1234));`
(no "default RNG, chosen seed", but the default RNG can be documented,
and one could argue that a chosen seed only makes sense with a chosen RNG
anyway)

2. Usage: `$int = $random->nextInt();`, `$percent = $random->getInt(0,
100);`, `$dword = $random->getBytes(4);`, `$shuffledList =
$random->shuffleArray($list);` etc.

I think this is well-consistent with the "pure design" described by Nikita,
and I personally find it both flexible/extensible and easy-to-use =)
(Just beware that the namespace question will probably pop up again.)

PS: I feel like my numerous questions/suggestions (in this thread and the
previous ones) may also have caused some deviations, so I hope that I won't
need more and that other participants will reach a consensus...

Best regards,

-- 
Guilliam Xavier


Re: [PHP-DEV] Re: [RFC] Under Discussion: Add Random class and RandomNumberGenerator interface

2021-06-12 Thread Go Kudo
Sorry, I'm late to reply.

> How much more "costly" would it be to define a class (implementing
RandomNumberGenerator) and use its (full) name as the algo identifier?

If the Random class always accepts an instance of the
RandomNumberGenerator, it will be necessary to provide a class that
implements the RandomNumberGenerator whenever you try to implement a new
RNG.
The current approach of specifying the algorithm by string does not require
the implementation of a class and is more developer friendly.

However, I may be the only one who writes an extension that adds RNG in the
first place, so the additional cost is well worth it.

> (already pointed out by Jordi) new Random(new CustomRNG(/*custom args*/),
$seed) is "illogical" (the passed $seed won't be used) but
"possible" (even if your current implementation throws a ValueError when
$seed is non-null);

I thought about it carefully after that, and this is indeed a problem.
It is difficult to find the problem easily by static analysis or other
methods.

> 64-bit problems

I feel like I'm making too big a deal out of this. This kind of
incompatibility happens some time in userland.
In conclusion, I see no problem with the implicit truncation behavior.



Indeed, the current implementation appears to have several problems. So,
how about the following implementation?

```php
interface RandomNumberGenerator
{
public function generate(): int;
}

final class Random
{
private RandomNumberGenerator $rng;
public function __construct(?RandomNumberGenerator $rng = null)
{
$this->rng = $rng ?? new XorShift128Plus(random_int(PHP_INT_MIN,
PHP_INT_MAX));
}
public function nextInt(): int {}
public function getInt(int $min, int $max): int {}
public function getBytes(int $length): string {}
public function shuffleArray(array $array): array {}
public function shuffleString(string $string): string {}
public function __serialize(): array {}
public function __unserialize(array $data): void {}
}

class XorShift128PlusNumberGenerator implements RandomNumberGenerator
{
public function generate(): int {}
public function __serialize(): array {}
public function __unserialize(array $data): void {}
}

class MT19937NumberGenerator implements RandomNumberGenerator
{
public function generate(): int {}
public function __serialize(): array {}
public function __unserialize(array $data): void {}
}

class SecureNumberGenerator implements RandomNumberGenerator
{
public function generate(): int {}
}
```

This is an approach similar to Nikita's `createDefault`. If the constructor
has a null argument, it uses the default, XorShift128+, internally.

Also, whether the Random class is serializable or clonable depends on the
instance of RandomNumberGenerator being used. This means that when the
Random class clone is called, the `$rng` member will be implicitly cloned.

How about this?

Regards,
Go Kudo

2021年6月10日(木) 1:11 Guilliam Xavier :

>
> On Wed, Jun 9, 2021 at 3:16 PM Go Kudo  wrote:
>
>> Thanks Guilliam.
>>
>> > I'm not sure you really addressed
>> https://externals.io/message/114561#114680 ?
>>
>> I thought I had solved this problem by implementing the
>> `RandomNumberGenerator` interface.
>>
>> Accepting strings and objects as arguments is certainly complicated, but
>> I thought it met the necessary requirements.
>> From the aspect of static analysis, there seems to be no problem.
>>
>> Reverting to a full object-based approach would increase the cost of
>> providing new RNGs from extensions. I think the string approach is superior
>> in this respect.
>>
>
> How much more "costly" would it be to define a class (implementing
> RandomNumberGenerator) and use its (full) name as the algo identifier?
>
>
>> Nikita's `createDefault()` approach is certainly good, but I think it is
>> still difficult for beginners to understand.
>>
>> I also thought about it for a while after that, and I think that the
>> current implementation is the most appropriate for now. What do you think?
>>
>
> I still agree with Nikita that there's a discrepancy between e.g. `new
> Random(RANDOM_XXX, $seed)` and `new Random(new CustomRNG(/*custom
> args*/))`, which also poses additional questions, e.g.:
>
> - (already pointed out by Jordi) `new Random(new CustomRNG(/*custom
> args*/), $seed)` is "illogical" (the passed $seed won't be used) but
> "possible" (even if your current implementation throws a ValueError when
> $seed is non-null);
>
> - This opens the possibility of "leaking" the RNG "source" (even if no
> `public function getRNG()`), e.g.:
>
> ```
> $rng = new CustomRNG(/*custom args*/);
> $r1 = new Random($rng);
> $r2 = new Random($rng);
> var_dump($r1->nextInt() === $r2->nextInt()); // false (not true), I
> suppose?
> ```
>
> or:
>
> ```
> $rng = new CustomRNG(/*custom args*/);
> $r = new Random($rng);
> var_dump($r->nextInt()); // 1st of sequence
> $rng->generate();
> var_dump($r->nextInt()); // 3rd of sequence (not 2nd), I 

Re: [PHP-DEV] Re: [RFC] Under Discussion: Add Random class and RandomNumberGenerator interface

2021-06-09 Thread Guilliam Xavier
On Wed, Jun 9, 2021 at 3:16 PM Go Kudo  wrote:

> Thanks Guilliam.
>
> > I'm not sure you really addressed
> https://externals.io/message/114561#114680 ?
>
> I thought I had solved this problem by implementing the
> `RandomNumberGenerator` interface.
>
> Accepting strings and objects as arguments is certainly complicated, but I
> thought it met the necessary requirements.
> From the aspect of static analysis, there seems to be no problem.
>
> Reverting to a full object-based approach would increase the cost of
> providing new RNGs from extensions. I think the string approach is superior
> in this respect.
>

How much more "costly" would it be to define a class (implementing
RandomNumberGenerator) and use its (full) name as the algo identifier?


> Nikita's `createDefault()` approach is certainly good, but I think it is
> still difficult for beginners to understand.
>
> I also thought about it for a while after that, and I think that the
> current implementation is the most appropriate for now. What do you think?
>

I still agree with Nikita that there's a discrepancy between e.g. `new
Random(RANDOM_XXX, $seed)` and `new Random(new CustomRNG(/*custom
args*/))`, which also poses additional questions, e.g.:

- (already pointed out by Jordi) `new Random(new CustomRNG(/*custom
args*/), $seed)` is "illogical" (the passed $seed won't be used) but
"possible" (even if your current implementation throws a ValueError when
$seed is non-null);

- This opens the possibility of "leaking" the RNG "source" (even if no
`public function getRNG()`), e.g.:

```
$rng = new CustomRNG(/*custom args*/);
$r1 = new Random($rng);
$r2 = new Random($rng);
var_dump($r1->nextInt() === $r2->nextInt()); // false (not true), I suppose?
```

or:

```
$rng = new CustomRNG(/*custom args*/);
$r = new Random($rng);
var_dump($r->nextInt()); // 1st of sequence
$rng->generate();
var_dump($r->nextInt()); // 3rd of sequence (not 2nd), I suppose?
```

(possibly in less obvious ways), which may be considered bad or not (but
still a discrepancy vs extension-provided algos).

Again, taking a class name and optional extra args (or an $options array,
like password_hash(), maybe even including 'seed' only for the algos that
need one) to construct, rather than an already constructed object, might be
better?

But that would also "split" the constructor args from the class (increasing
the risk of "mismatch"), and make using anonymous classes less easy, and
maybe we haven't considered all the uses cases (e.g. what about
`Random::getAlgoInfo(CustomRNG::class)`?)... So that might also be worse :/

It would be great to have more insights! (if nobody else speaks up then
let's keep it as is of course)


> > Couldn't the Random class simply implement `public function __clone():
> void` (that would internally do like `$this->rng = clone $this->rng;`)?
>
> Implicitly cloning in areas where the user cannot interfere may cause
> problems when using objects that cannot be cloned.
> However, this may be overthinking it. The reason is that a
> `RandomNumberGenerator` that cannot be cloned should be implemented as algo
> in the first place.
> If there are no objections, I will change the implementation.
>

Ah, true, I hadn't thought about non-clonable implementations... But that's
already the case for `__serialize()` and non-serializable implementations,
right? (But let's wait a bit for possible objections, indeed)


> > Indeed, a decision has to be made. If you throw an exception, we
> wouldn't have the "issue" of different results on 32- vs 64-bit machines,
> but XorShift128+ and Secure would be unusable on a 32-bit machine, right?
>
> I don't see any problem with the current implicit truncation behavior for
> this. Even if a user switches to a 64-bit environment, compatibility can be
> maintained by explicitly truncating the generated values. In addition, most
> of the other methods only use 32bit internally.
> If you are concerned about this, you should probably be concerned about
> the incompatibility with MT_RAND_PHP.
> That problem can also be compensated for with an extension that adds algo.
>

I thought you were "struggling with [this implicit truncation] behavior
when calling nextInt() in a 32-bit environment using a 64-bit RNG [...]
which means that the code loses compatibility with the result of running on
a 64-bit machine"? And you asked if throwing an exception would be
preferable?
Anyway, I personally don't care about 32-bit (but other people may).


> Regards,
> Go Kudo
>

Regards,

-- 
Guilliam Xavier


Re: [PHP-DEV] Re: [RFC] Under Discussion: Add Random class and RandomNumberGenerator interface

2021-06-09 Thread Go Kudo
Thanks Guilliam.

> I'm not sure you really addressed
https://externals.io/message/114561#114680 ?

I thought I had solved this problem by implementing the
`RandomNumberGenerator` interface.

Accepting strings and objects as arguments is certainly complicated, but I
thought it met the necessary requirements.
>From the aspect of static analysis, there seems to be no problem.

Reverting to a full object-based approach would increase the cost of
providing new RNGs from extensions. I think the string approach is superior
in this respect.

Nikita's `createDefault()` approach is certainly good, but I think it is
still difficult for beginners to understand.

I also thought about it for a while after that, and I think that the
current implementation is the most appropriate for now. What do you think?

> Couldn't the Random class simply implement `public function __clone():
void` (that would internally do like `$this->algo = clone $this->algo;`)?

Implicitly cloning in areas where the user cannot interfere may cause
problems when using objects that cannot be cloned.
However, this may be overthinking it. The reason is that a
`RandomNumberGenerator` that cannot be cloned should be implemented as algo
in the first place.
If there are no objections, I will change the implementation.

> Indeed, a decision has to be made. If you throw an exception, we wouldn't
have the "issue" of different results on 32- vs 64-bit machines, but
XorShift128+ and Secure would be unusable on a 32-bit machine, right?

I don't see any problem with the current implicit truncation behavior for
this. Even if a user switches to a 64-bit environment, compatibility can be
maintained by explicitly truncating the generated values. In addition, most
of the other methods only use 32bit internally.
If you are concerned about this, you should probably be concerned about the
incompatibility with MT_RAND_PHP.
That problem can also be compensated for with an extension that adds algo.

Regards,
Go Kudo

2021年6月9日(水) 19:07 Guilliam Xavier :

> Hello Go Kudo,
>
> On Tue, Jun 8, 2021 at 2:33 PM Go Kudo  wrote:
>
>> Hello iinternals.
>>
>> There doesn't seem to be much mention of it.
>> But, that may be because it has been discussed well in advance.
>> Thank you for participating in the discussion.
>>
>
> I'm not sure you really addressed
> https://externals.io/message/114561#114680 ? especially this part:
>
> """
> If you pick the second option, then you should consistently separate the
> source for both extension-provided RNGs and user-provided ones. I don't
> think it makes sense to have extension provided RNGs use a `new
> Random(RANDOM_XORSHIFT128PLUS)` interface, while user-provided ones use
> `new Random(new SomeRandomSource)`. Going down this route, it should be
> `new Random(new XorShift128Plus)`. This is a pure design, which also means
> that it's more complicated, and may make simple usages harder (though there
> is certainly room for a Random::createDefault() here.)
> """
>
> Now, that might sound like it would take us back to the "full OO" version
> (with multiple classes), which was deemed too "user-*un*friendly" by
> several people (including me :s),
> *but* there are some new ideas to consider, e.g. (for the Random class):
>
> - add a `public static function createDefault(): self` as suggested by
> Nikita, or
>
> - change `public function __construct` parameters to e.g.
>   `(string $algo = XorShift128Plus::class, ?int $seed = null, mixed
> ...$extra_args)`
>   (that would internally do a `new $algo($seed, ...$extra_args)` to store),
>   to be called not as
>   `new Random(new UserDefinedRNG($seed_or_null, $foo, $bar))` but as
>   `new Random(UserDefinedRNG::class, $seed_or_null, $foo, $bar)`.
>
> Any new opinions (or other ideas)?
>
>
>> Now, if there is no particular discussion on this, I will try to start the
>> voting phase next week.
>> Of course, I will contact you separately.
>>
>> However, I was looking back at the implementation and found only one point
>> of concern.
>>
>> With the current implementation, the results of the following example will
>> match.
>>
>> ```php
>> $one = new Random();
>> $one->nextInt();
>> $two = clone $one;
>>
>> var_dump($one->nextInt() === $two->nextInt()); // true
>> ```
>>
>> But, this is not true for user implementations.
>>
>> ```php
>> class UserRNG implements RandomNumberGenerator
>> {
>> protected int $current = 0;
>>
>> public function generate(): int
>> {
>> return ++$this->current;
>> }
>> }
>>
>> $rng = new UserRNG();
>> $one = new Random($rng);
>> $one->nextInt();
>> $two = clone $one;
>>
>> var_dump($one->nextInt() === $two->nextInt()); // false
>> ```
>>
>> This is because `$rng` is kept as a normal property and is managed by the
>> standard PHP mechanism. In other words, it is passed by reference.
>>
>> This is not consistent with the built-in RNG behavior. However, I don't
>> see
>> a problem with this behavior.
>> I feel that the standard PHP behavior is preferable 

Re: [PHP-DEV] Re: [RFC] Under Discussion: Add Random class and RandomNumberGenerator interface

2021-06-09 Thread Guilliam Xavier
>
> Couldn't the Random class simply implement `public function __clone():
> void` (that would internally do like `$this->algo = clone $this->algo;`)?
>

Sorry I meant like `$this->rng = clone $this->rng;`.

PS: Please don't reply to this erratum, but rather to the previous message
;)


Re: [PHP-DEV] Re: [RFC] Under Discussion: Add Random class and RandomNumberGenerator interface

2021-06-09 Thread Guilliam Xavier
Hello Go Kudo,

On Tue, Jun 8, 2021 at 2:33 PM Go Kudo  wrote:

> Hello iinternals.
>
> There doesn't seem to be much mention of it.
> But, that may be because it has been discussed well in advance.
> Thank you for participating in the discussion.
>

I'm not sure you really addressed https://externals.io/message/114561#114680
? especially this part:

"""
If you pick the second option, then you should consistently separate the
source for both extension-provided RNGs and user-provided ones. I don't
think it makes sense to have extension provided RNGs use a `new
Random(RANDOM_XORSHIFT128PLUS)` interface, while user-provided ones use
`new Random(new SomeRandomSource)`. Going down this route, it should be
`new Random(new XorShift128Plus)`. This is a pure design, which also means
that it's more complicated, and may make simple usages harder (though there
is certainly room for a Random::createDefault() here.)
"""

Now, that might sound like it would take us back to the "full OO" version
(with multiple classes), which was deemed too "user-*un*friendly" by
several people (including me :s),
*but* there are some new ideas to consider, e.g. (for the Random class):

- add a `public static function createDefault(): self` as suggested by
Nikita, or

- change `public function __construct` parameters to e.g.
  `(string $algo = XorShift128Plus::class, ?int $seed = null, mixed
...$extra_args)`
  (that would internally do a `new $algo($seed, ...$extra_args)` to store),
  to be called not as
  `new Random(new UserDefinedRNG($seed_or_null, $foo, $bar))` but as
  `new Random(UserDefinedRNG::class, $seed_or_null, $foo, $bar)`.

Any new opinions (or other ideas)?


> Now, if there is no particular discussion on this, I will try to start the
> voting phase next week.
> Of course, I will contact you separately.
>
> However, I was looking back at the implementation and found only one point
> of concern.
>
> With the current implementation, the results of the following example will
> match.
>
> ```php
> $one = new Random();
> $one->nextInt();
> $two = clone $one;
>
> var_dump($one->nextInt() === $two->nextInt()); // true
> ```
>
> But, this is not true for user implementations.
>
> ```php
> class UserRNG implements RandomNumberGenerator
> {
> protected int $current = 0;
>
> public function generate(): int
> {
> return ++$this->current;
> }
> }
>
> $rng = new UserRNG();
> $one = new Random($rng);
> $one->nextInt();
> $two = clone $one;
>
> var_dump($one->nextInt() === $two->nextInt()); // false
> ```
>
> This is because `$rng` is kept as a normal property and is managed by the
> standard PHP mechanism. In other words, it is passed by reference.
>
> This is not consistent with the built-in RNG behavior. However, I don't see
> a problem with this behavior.
> I feel that the standard PHP behavior is preferable to changing the
> userland behavior in a specific way.
>
> I would like to solicit opinions.
>

Couldn't the Random class simply implement `public function __clone():
void` (that would internally do like `$this->algo = clone $this->algo;`)?


>
> Regards,
> Go Kudo
>
> 2021年6月1日(火) 23:28 Go Kudo :
>
> > Hello internals.
> > Thanks for continuing to participate in the discussion.
> >
> > I've sorted out the proposal, and finished writing and implementing the
> > RFC.
> > (Funny, I know.) I think it's time to start a discussion.
> >
> > https://wiki.php.net/rfc/rng_extension
> > https://github.com/php/php-src/pull/7079
> >
> > The main changes since last time are as follows:
> >
> > - The ugly RANDOM_USER has been removed.
> > - RandomNumberGenerator interface has been added for user-defined RNGs.
> > - Random class is now final.
> > - Random class now accepts a RandomNumberGenerator interface other than
> > string as the first argument to the constructor.
> > - INI directive has been removed. In 32-bit environments, the result is
> > always truncated.
> >
> > What I'm struggling with now is the behavior when calling nextInt() in a
> > 32-bit environment using a 64-bit RNG. It currently returns a truncated
> > result, which means that the code loses compatibility with the result of
> > running on a 64-bit machine.
> > I was also considering throwing an exception, but which would be
> > preferable?
>

Indeed, a decision has to be made. If you throw an exception, we wouldn't
have the "issue" of different results on 32- vs 64-bit machines, but
XorShift128+ and Secure would be unusable on a 32-bit machine, right?
How do other existing implementations handle this question?

>
> > I would like to answer a few unanswered questions.
> >
> > > What is bias?
> >
> > I' ve confirmed that the bias issue in mt_rand has already been fixed and
> > is no longer a problem.
> >
> > This implementation copies most of it from mt_rand. Therefore, this
> > problem does not exist.
> >
> > Therefore, an equivalent method to mt_getrandmax() is no longer provided.
>

Great!


> >
> > > uint64 & right-shift
> >
> > This is a 

[PHP-DEV] Re: [RFC] Under Discussion: Add Random class and RandomNumberGenerator interface

2021-06-08 Thread Go Kudo
Hello iinternals.

There doesn't seem to be much mention of it.
But, that may be because it has been discussed well in advance.
Thank you for participating in the discussion.

Now, if there is no particular discussion on this, I will try to start the
voting phase next week.
Of course, I will contact you separately.

However, I was looking back at the implementation and found only one point
of concern.

With the current implementation, the results of the following example will
match.

```php
$one = new Random();
$one->nextInt();
$two = clone $one;

var_dump($one->nextInt() === $two->nextInt()); // true
```

But, this is not true for user implementations.

```php
class UserRNG implements RandomNumberGenerator
{
protected int $current = 0;

public function generate(): int
{
return ++$this->current;
}
}

$rng = new UserRNG();
$one = new Random($rng);
$one->nextInt();
$two = clone $one;

var_dump($one->nextInt() === $two->nextInt()); // false
```

This is because `$rng` is kept as a normal property and is managed by the
standard PHP mechanism. In other words, it is passed by reference.

This is not consistent with the built-in RNG behavior. However, I don't see
a problem with this behavior.
I feel that the standard PHP behavior is preferable to changing the
userland behavior in a specific way.

I would like to solicit opinions.

Regards,
Go Kudo

2021年6月1日(火) 23:28 Go Kudo :

> Hello internals.
> Thanks for continuing to participate in the discussion.
>
> I've sorted out the proposal, and finished writing and implementing the
> RFC.
> (Funny, I know.) I think it's time to start a discussion.
>
> https://wiki.php.net/rfc/rng_extension
> https://github.com/php/php-src/pull/7079
>
> The main changes since last time are as follows:
>
> - The ugly RANDOM_USER has been removed.
> - RandomNumberGenerator interface has been added for user-defined RNGs.
> - Random class is now final.
> - Random class now accepts a RandomNumberGenerator interface other than
> string as the first argument to the constructor.
> - INI directive has been removed. In 32-bit environments, the result is
> always truncated.
>
> What I'm struggling with now is the behavior when calling nextInt() in a
> 32-bit environment using a 64-bit RNG. It currently returns a truncated
> result, which means that the code loses compatibility with the result of
> running on a 64-bit machine.
> I was also considering throwing an exception, but which would be
> preferable?
>
> I would like to answer a few unanswered questions.
>
> > What is bias?
>
> I' ve confirmed that the bias issue in mt_rand has already been fixed and
> is no longer a problem.
>
> This implementation copies most of it from mt_rand. Therefore, this
> problem does not exist.
>
> Therefore, an equivalent method to mt_getrandmax() is no longer provided.
>
> > uint64 & right-shift
>
> This is a specification of the RNG implementation. PHP does not have
> unsigned integers, but RNG handles unsigned integers (to be precise, even
> the sign bit is random).
>
> As mentioned above, PHP does not have unsigned integers, which means that
> using the results generated by RNGs may cause compatibility problems in the
> future. To avoid this, a 1-bit right shift is always required (similar to
> mt_rand()).
>
>
>
>