use cpu sensor for cpuspeed
The cpu hz sensor is more accurate and updates faster than than the value currently used for hw.cpuspeed. So return that value (scaled). This doesn't set cpuspeed directly because the acpi does that and it's hard to create a whole system of priority overrides. I think it's simpler and maybe even better to track every value, and then return the best one available. Index: identcpu.c === RCS file: /home/cvs/src/sys/arch/amd64/amd64/identcpu.c,v retrieving revision 1.124 diff -u -p -r1.124 identcpu.c --- identcpu.c 26 Apr 2022 10:48:20 - 1.124 +++ identcpu.c 15 May 2022 00:15:39 - @@ -64,6 +64,7 @@ void cpu_check_vmm_cap(struct cpu_info * /* sysctl wants this. */ char cpu_model[48]; int cpuspeed; +uint64_t sensorspeed; int amd64_has_xcrypt; #ifdef CRYPTO @@ -244,6 +245,8 @@ int cpu_amd64speed(int *freq) { *freq = cpuspeed; + if (sensorspeed) + *freq = sensorspeed / 1ULL; return (0); } @@ -337,6 +340,8 @@ cpu_hz_update_sensor(void *args) val = (adelta * 100) / mdelta * tsc_frequency; val = ((val + FREQ_50MHZ / 2) / FREQ_50MHZ) * FREQ_50MHZ; ci->ci_hz_sensor.value = val; + if (CPU_IS_PRIMARY(ci)) + sensorspeed = val; } atomic_clearbits_int(>p_flag, P_CPUPEG);
Re: Picky, but much more efficient arc4random_uniform!
On Sun, 15 May 2022, Steffen Nurpmeso wrote: > Stuart Henderson wrote in ... > |what's the perceived problem you're wanting to solve? and does that > |problem actually exist in the first place? > > The problem is that if have a low upper bound then modulo will "remove a > lot of randomization". For example if you have a program which > generates Lotto numbers (1..49), then using _uniform() as it is will > generate many duplicates. Wut. The *WHOLE POINT* of arc4random_uniform() is that it has uniform distribution. Says right so in the manpage. If an implementation of that API fails to do that, it's a broken implementation. The OpenBSD implementation achieves that. NetBSD's implementation has the same core logic. Here's my quick lotto pick demo program, doing 490 picks, so they should all be somewhere near 10: #include #include #define LIMIT 49 int main(void) { unsigned counts[LIMIT] = { 0 }; int i; for (i = 0; i < 10 * LIMIT; i++) counts[arc4random_uniform(LIMIT)]++; for (i = 0; i < LIMIT; i++) printf("%2d\t%7u\n", i+1, counts[i]); return 0; } And a sample run: : bleys; ./a.out 1 100100 2 100639 399965 499729 599641 699650 7 100299 8 100164 999791 10 100101 11 100657 12 100042 1399661 1499927 1599426 1699491 1799646 18 100133 19 100013 2099942 2199873 2299924 2399567 24 100152 25 100688 26 100011 27 100481 2899980 29 100406 3099726 3199808 3299929 33 100050 3499983 35 100048 3699771 3799906 38 100215 39 100261 40 100426 4199847 4299533 43 100368 4499695 45 100041 46 100465 4799875 48 100034 4999920 : bleys; Looks pretty good to me, with repeated runs showing the values bouncing around. ... > I was a bit interested when i saw Luke's message, but i am no > mathematician and, like i said, i in fact never needed the > functionality. And the question i would have is how small > upper_bound has to be for the modulo problem to really kick in. If the implementation isn't broken, it's not a problem, period. > And even if, whether it matters. For example, if you have > a bitset of 1024 "in-flight things", and the small upper bound > hits a used slot, you could simply search forward until you find > a free one. I mean, with 1024 possible values the number of > possibilities is anyhow restricted. Well hey, let's give Luke's a try and see how uniform it is: #include #include #define LIMIT 49 int main(void) { unsigned counts[LIMIT] = { 0 }; unsigned counts2[LIMIT] = { 0 }; int i; for (i = 0; i < 10 * LIMIT; i++) { counts[arc4random_uniform(LIMIT)]++; counts2[arc4random_uniform_fast_simple(LIMIT)]++; } for (i = 0; i < LIMIT; i++) printf("%2d\t%7u\t%7u\n", i+1, counts[i], counts2[i]); return 0; } : bleys; ./a.out 1 100188 76502 299983 76602 3 100521 76522 4 100416 76682 5 100171 76387 6 100163 76759 7 100024 76336 8 19 76703 999769 76237 1099892 76532 11 100197 76730 12 100483 76398 1399769 76310 14 100075 76474 1599781 76599 1699846 76439 1799814 76430 18 100313 76648 19 100259 76813 2099885 77068 21 100302 76546 228 76698 2399491 76678 24 100340 76324 2599763 115263 2699872 153008 27 100022 152979 2899481 153793 29 100018 210714 3099617 229286 31 100167 297003 32 100270 449664 33 100468 76790 3499115 76452 3599921 76392 3699862 76140 37 100485 76607 38 100029 75885 3999577 76498 4099479 76727 41 100139 76746 42 100883 76698 43 100102 76474 4499801 76592 45 100117 76124 4699678 76417 4799770 76639 4899524 77034 49 100151 76658 : bleys; Wow, that last column is *bad*. Repeated runs show 25-32 are consistently high, 32 being *outrageously* off, and the others are low. Luke's implementation does not correctly implement the API. Doesn't matter if it's a million times faster when it doesn't deliver the goal. Philip Guenther
Re: Picky, but much more efficient arc4random_uniform!
Stuart Henderson wrote in : |On 2022/05/14 06:56, Luke Small wrote: |> If I use arc4random_uniform() repeatedly to create a random distribution \ |> of |> say numbers less than 0x1000 or even something weird like 0x1300 will the |> random distribution be better with arc4random_uniform() or with mine? | |there's no point to have a choice of different arc4random_uniform_xyz |with different characteristics, how is somebody going to pick the |"right" one? | |the bottom of netbsd's version of the arc4random(3) manual says it well: | | One may be tempted to create new APIs to accommodate different \ | security | models and performance constraints without unpleasant surprises \ | on dif- | ferent operating systems. This should not be done lightly, though, | because there are already too many different choices, and too \ | many oppor- | tunities for programmers to reach for one and pick the wrong one. | |what's the perceived problem you're wanting to solve? and does that |problem actually exist in the first place? The problem is that if have a low upper bound then modulo will "remove a lot of randomization". For example if you have a program which generates Lotto numbers (1..49), then using _uniform() as it is will generate many duplicates. I used the approach i found in a TeX file over twenty years ago ("RANDOM.TEX, v.1 (Donald Arseneau) from TeTeX, texmf/tex/generic/misc/random.tex"; and now that i look, he published a .v2 not too long ago), it was named _clip() as it produces something in between a minimum and a maximum: _max -= _min; ++_max; _min = Maximum::sir - 2; // m - 2 = 2^(32|64) - 3 if (_max != 0) _min /= _max; for(;;) { uir i; i = _random; i += M1::uir; if(_min != 0) i /= _min; if(i >= _max && (i != 0 || _max != 0)) { _random *= 16807; // just MUL if(_random == 0) _random = 1; continue; } _random = i; break; } _random += origmin; _Assert(1 && _random >= origmin && _random <= origmax); That is the random number is multiplied with a, eh, non-prime, as long as we do not fit. However i found this became a terrible mess with 64-bit numbers, so i removed the function, aka did not "port it over" when i implemented a random number generator for my own use cases. Maybe i should have increased the multiplicator. It is really, really bad. :-) I only ever used it for a Lotto number program anyhow, i think, and so i thought it would be better to simply read a one byte random and skip those that do not fit, instead of making them always fit with modulo 50. That is, application specific workaround. Things are less harmful for large numbers, anyway. In fact i have to say i was astonished to see someone using this function, not too long ago i saw a FreeBSD commit fly by which uses it! I was a bit interested when i saw Luke's message, but i am no mathematician and, like i said, i in fact never needed the functionality. And the question i would have is how small upper_bound has to be for the modulo problem to really kick in. And even if, whether it matters. For example, if you have a bitset of 1024 "in-flight things", and the small upper bound hits a used slot, you could simply search forward until you find a free one. I mean, with 1024 possible values the number of possibilities is anyhow restricted. --steffen | |Der Kragenbaer,The moon bear, |der holt sich munter he cheerfully and one by one |einen nach dem anderen runter wa.ks himself off |(By Robert Gernhardt)
Re: Picky, but much more efficient arc4random_uniform!
int main() { int results[3] = { 0, 0, 0 }; for (int i = 0; i < 10; i++) { results[arc4random_uniform_fast_simple(3)]++; } for (int i = 0; i < 3; i++) printf("%d: %d\n", i, results[i]); return 0; } % ./a.out 0: 24809 1: 50011 2: 25180 You can't reuse bits because they'll be biased.
Re: Picky, but much more efficient arc4random_uniform!
On 2022/05/14 06:56, Luke Small wrote: > If I use arc4random_uniform() repeatedly to create a random distribution of > say numbers less than 0x1000 or even something weird like 0x1300 will the > random distribution be better with arc4random_uniform() or with mine? there's no point to have a choice of different arc4random_uniform_xyz with different characteristics, how is somebody going to pick the "right" one? the bottom of netbsd's version of the arc4random(3) manual says it well: One may be tempted to create new APIs to accommodate different security models and performance constraints without unpleasant surprises on dif- ferent operating systems. This should not be done lightly, though, because there are already too many different choices, and too many oppor- tunities for programmers to reach for one and pick the wrong one. what's the perceived problem you're wanting to solve? and does that problem actually exist in the first place?
Re: stop using mquery(2) inside realloc(3)
On Sat, 14 May 2022, Philip Guenther wrote: > On Sat, 14 May 2022, Theo de Raadt wrote: > > I worry a little about having libc use an undocumented mmap(2) flag. > > About as much as using mquery, which is non-standard. > > > > Maybe __MAP_NOREPLACE should get documentation? __MAP_NOFAULT is in the > > same situation. The behaviour of these flags should be documented > > (set in stone), which may also discourage accidental behaviour changes > > by kernel developers in the future? > > To throw some spaghetti at the wall... Fix the grammar of the __MAP_NOFAULT description and mention signals there. Index: sys/mmap.2 === RCS file: /data/src/openbsd/src/lib/libc/sys/mmap.2,v retrieving revision 1.68 diff -u -p -r1.68 mmap.2 --- sys/mmap.2 31 Mar 2022 17:27:16 - 1.68 +++ sys/mmap.2 14 May 2022 19:31:28 - @@ -58,8 +58,19 @@ The argument describes the address where the system should place the mapping. If the .Dv MAP_FIXED -flag is specified, the allocation will happen at the specified address, +flag is specified and the +.Dv __MAP_NOREPLACE +flag is not specified, +the allocation will happen at the specified address, replacing any previously established mappings in its range. +If both the +.Dv MAP_FIXED +and +.Dv __MAP_NOREPLACE +flags are specified, +the allocation will happen at the specified address, +failing with no effect if any previously established mappings are +in its range. Otherwise, the mapping will be placed at the available spot at .Fa addr ; failing that it will be placed "close by". @@ -153,6 +164,18 @@ mappings) must be multiples of the page size. Existing mappings in the address range will be replaced. Use of this option is discouraged. +.It Dv __MAP_NOFAULT +Indicate that access to pages that are not backed by the mapped +file or device will result in zero-filled anonymous pages being +provided instead of a signal being delivered. +This flag must not be used with an anonymous mapping. +.It Dv __MAP_NOREPLACE +Indicates that a +.Dv MAP_FIXED +mapping should fail if it would require replacing any existing +mappings. +This flag must be used in combination with +.Dv MAP_FIXED . .It Dv MAP_STACK Indicate that the mapping is used as a stack. This flag must be used in combination with @@ -278,6 +301,12 @@ device does not support memory mapping. The allocation .Fa len was 0. +.It Bq Er EINVAL +.Dv __MAP_NOFAULT +was specified for an anonymous mapping or +.Dv __MAP_NOREPLACE +was specified without +.Dv MAP_FIXED . .It Bq Er ENOMEM .Dv MAP_FIXED was specified and the @@ -323,6 +352,14 @@ A fully functional system call first appeared in SunOS 4.0 and has been available since .Bx 4.3 Net/2 . +The +.Dv __MAP_NOFAULT +flag appeared in +.Ox 5.7 . +The +.Dv __MAP_NOREPLACE +flag appeared in +.Ox 5.3 . .Sh CAVEATS .St -p1003.1-2008 specifies that references to pages beyond the end of a mapped object
Re: stop using mquery(2) inside realloc(3)
On Sat, 14 May 2022, Theo de Raadt wrote: > I worry a little about having libc use an undocumented mmap(2) flag. > About as much as using mquery, which is non-standard. > > Maybe __MAP_NOREPLACE should get documentation? __MAP_NOFAULT is in the > same situation. The behaviour of these flags should be documented > (set in stone), which may also discourage accidental behaviour changes > by kernel developers in the future? To throw some spaghetti at the wall... Index: sys/mmap.2 === RCS file: /data/src/openbsd/src/lib/libc/sys/mmap.2,v retrieving revision 1.68 diff -u -p -r1.68 mmap.2 --- sys/mmap.2 31 Mar 2022 17:27:16 - 1.68 +++ sys/mmap.2 14 May 2022 19:06:00 - @@ -58,8 +58,19 @@ The argument describes the address where the system should place the mapping. If the .Dv MAP_FIXED -flag is specified, the allocation will happen at the specified address, +flag is specified and the +.Dv __MAP_NOREPLACE +flag is not specified, +the allocation will happen at the specified address, replacing any previously established mappings in its range. +If both the +.Dv MAP_FIXED +and +.Dv __MAP_NOREPLACE +flags are specified, +the allocation will happen at the specified address, +failing with no effect if any previously established mappings are +in its range. Otherwise, the mapping will be placed at the available spot at .Fa addr ; failing that it will be placed "close by". @@ -153,6 +164,17 @@ mappings) must be multiples of the page size. Existing mappings in the address range will be replaced. Use of this option is discouraged. +.It Dv __MAP_NOFAULT +Indicate that access to pages that are not backed by the mapped +file or device will be replaced by zero-filled anonymous pages. +This flag must not be used with an anonymous mapping. +.It Dv __MAP_NOREPLACE +Indicates that a +.Dv MAP_FIXED +mapping should fail if it would require replacing any existing +mappings. +This flag must be used in combination with +.Dv MAP_FIXED . .It Dv MAP_STACK Indicate that the mapping is used as a stack. This flag must be used in combination with @@ -278,6 +300,12 @@ device does not support memory mapping. The allocation .Fa len was 0. +.It Bq Er EINVAL +.Dv __MAP_NOFAULT +was specified for an anonymous mapping or +.Dv __MAP_NOREPLACE +was specified without +.Dv MAP_FIXED . .It Bq Er ENOMEM .Dv MAP_FIXED was specified and the @@ -323,6 +351,14 @@ A fully functional system call first appeared in SunOS 4.0 and has been available since .Bx 4.3 Net/2 . +The +.Dv __MAP_NOFAULT +flag appeared in +.Ox 5.7 . +The +.Dv __MAP_NOREPLACE +flag appeared in +.Ox 5.3 . .Sh CAVEATS .St -p1003.1-2008 specifies that references to pages beyond the end of a mapped object
Re: Picky, but much more efficient arc4random_uniform!
Look at my code. I don’t even use a modulus operator. I perform hit and miss with a random bitstream. How can I have a bias of something I don’t do? I return a bitstream which meets the parameters of being a value less than the upper bound. Much like arc4random_buf(). If I use arc4random_uniform() repeatedly to create a random distribution of say numbers less than 0x1000 or even something weird like 0x1300 will the random distribution be better with arc4random_uniform() or with mine? For 0x1000 mine will simply pluck 12 bits of random data straight from the arc4random() (and preserve the remaining 20 bits for later) on the first try, just like it’s arc4random_buf(). arc4random_uniform() will perform a modulus of a 32 bit number which adds data to the bitstream. Does it make it better? Perhaps it makes it harder to guess the source bits. I don’t know; and I’m not going to pretend to be a cryptologist. But I’m looking at modulo bias. I didn’t know what it was, before, but I basically “rejection sample”: https://research.kudelskisecurity.com/2020/07/28/the-definitive-guide-to-modulo-bias-and-how-to-avoid-it/ On Sat, May 14, 2022 at 6:14 AM Otto Moerbeek wrote: > On Sat, May 14, 2022 at 05:48:10AM -0500, Luke Small wrote: > > > arc4random_uniform_fast2 that I made, streams in data from arc4random() > and > > uses the datastream directly and uses it as a bit by bit right "sliding > > window" in the last loop. arc4random_uniform() uses a modulus which I is > > simple to implement, but I wonder how cryptographically sound or even how > > evenly it distributes. Adding a modulus seems sloppy without something > > better. I did make arc4random_fast_simple() which merely takes an > > upperbound. I integrated arc4random_uniform_fast_bitsearch() or whatever > > the top function was into it which binary searches to find the correct > size > > bitfield (return value) needed to barely fit the upperbound while also > > being able to discover every possible value below the upperbound. It > isn't > > as fast as arc4random_uniform_fast2 if it were used repeatedly after a > > single use of arc4random_uniform_fast_bitsearch() , but it does exactly > the > > same thing and appears faster than repeatedly using arc4random_uniform() > > and it's wasteful use of arc4random() and calling the expensive rekeying > > function more often. > > > > It may be interesting to determine even without looking at performance, > > whether arc4random_fast_simple() creates a more superior, secure use of > the > > chacha20 stream than arc4random_uniform() with the modulus. what exactly > > does all that extra data from the modulus do to the random distribution? > > > > -Luke > > I don't follow you at all. Your blabbering does not even use the terms > "uniform" and "modulo bias". I wonder even if you realize what they > mean in this context. > > -Otto > > -- -Luke
Re: stop using mquery(2) inside realloc(3)
I worry a little about having libc use an undocumented mmap(2) flag. About as much as using mquery, which is non-standard. Maybe __MAP_NOREPLACE should get documentation? __MAP_NOFAULT is in the same situation. The behaviour of these flags should be documented (set in stone), which may also discourage accidental behaviour changes by kernel developers in the future?
Re: librthread: validate timespec inputs with timespecisvalid(3)
On Sat, 14 May 2022 09:31:26 -0500, Scott Cheloha wrote: > ok? OK millert@ - todd
librthread: validate timespec inputs with timespecisvalid(3)
ok? Index: rthread_rwlock_compat.c === RCS file: /cvs/src/lib/librthread/rthread_rwlock_compat.c,v retrieving revision 1.1 diff -u -p -r1.1 rthread_rwlock_compat.c --- rthread_rwlock_compat.c 13 Feb 2019 13:15:39 - 1.1 +++ rthread_rwlock_compat.c 14 May 2022 14:29:27 - @@ -143,8 +143,7 @@ int pthread_rwlock_timedrdlock(pthread_rwlock_t *lockp, const struct timespec *abstime) { - if (abstime == NULL || abstime->tv_nsec < 0 || - abstime->tv_nsec >= 10) + if (abstime == NULL || !timespecisvalid(abstime)) return (EINVAL); return (_rthread_rwlock_rdlock(lockp, abstime, 0)); } @@ -210,8 +209,7 @@ int pthread_rwlock_timedwrlock(pthread_rwlock_t *lockp, const struct timespec *abstime) { - if (abstime == NULL || abstime->tv_nsec < 0 || - abstime->tv_nsec >= 10) + if (abstime == NULL || !timespecisvalid(abstime)) return (EINVAL); return (_rthread_rwlock_wrlock(lockp, abstime, 0)); } Index: rthread_sem.c === RCS file: /cvs/src/lib/librthread/rthread_sem.c,v retrieving revision 1.32 diff -u -p -r1.32 rthread_sem.c --- rthread_sem.c 6 Apr 2020 00:01:08 - 1.32 +++ rthread_sem.c 14 May 2022 14:29:27 - @@ -254,8 +254,7 @@ sem_timedwait(sem_t *semp, const struct int error; PREP_CANCEL_POINT(tib); - if (!semp || !(sem = *semp) || abstime == NULL || - abstime->tv_nsec < 0 || abstime->tv_nsec >= 10) { + if (!semp || !(sem = *semp) || !abstime || !timespecisvalid(abstime)) { errno = EINVAL; return (-1); } Index: rthread_sem_compat.c === RCS file: /cvs/src/lib/librthread/rthread_sem_compat.c,v retrieving revision 1.1 diff -u -p -r1.1 rthread_sem_compat.c --- rthread_sem_compat.c8 Jun 2018 13:53:01 - 1.1 +++ rthread_sem_compat.c14 May 2022 14:29:27 - @@ -19,6 +19,7 @@ #include #include #include +#include #include #include @@ -266,8 +267,7 @@ sem_timedwait(sem_t *semp, const struct int r; PREP_CANCEL_POINT(tib); - if (!semp || !(sem = *semp) || abstime == NULL || - abstime->tv_nsec < 0 || abstime->tv_nsec >= 10) { + if (!semp || !(sem = *semp) || !abstime || !timespecisvalid(abstime)) { errno = EINVAL; return (-1); }
Re: Picky, but much more efficient arc4random_uniform!
On Sat, May 14, 2022 at 05:48:10AM -0500, Luke Small wrote: > arc4random_uniform_fast2 that I made, streams in data from arc4random() and > uses the datastream directly and uses it as a bit by bit right "sliding > window" in the last loop. arc4random_uniform() uses a modulus which I is > simple to implement, but I wonder how cryptographically sound or even how > evenly it distributes. Adding a modulus seems sloppy without something > better. I did make arc4random_fast_simple() which merely takes an > upperbound. I integrated arc4random_uniform_fast_bitsearch() or whatever > the top function was into it which binary searches to find the correct size > bitfield (return value) needed to barely fit the upperbound while also > being able to discover every possible value below the upperbound. It isn't > as fast as arc4random_uniform_fast2 if it were used repeatedly after a > single use of arc4random_uniform_fast_bitsearch() , but it does exactly the > same thing and appears faster than repeatedly using arc4random_uniform() > and it's wasteful use of arc4random() and calling the expensive rekeying > function more often. > > It may be interesting to determine even without looking at performance, > whether arc4random_fast_simple() creates a more superior, secure use of the > chacha20 stream than arc4random_uniform() with the modulus. what exactly > does all that extra data from the modulus do to the random distribution? > > -Luke I don't follow you at all. Your blabbering does not even use the terms "uniform" and "modulo bias". I wonder even if you realize what they mean in this context. -Otto
Re: Picky, but much more efficient arc4random_uniform!
arc4random_uniform_fast2 that I made, streams in data from arc4random() and uses the datastream directly and uses it as a bit by bit right "sliding window" in the last loop. arc4random_uniform() uses a modulus which I is simple to implement, but I wonder how cryptographically sound or even how evenly it distributes. Adding a modulus seems sloppy without something better. I did make arc4random_fast_simple() which merely takes an upperbound. I integrated arc4random_uniform_fast_bitsearch() or whatever the top function was into it which binary searches to find the correct size bitfield (return value) needed to barely fit the upperbound while also being able to discover every possible value below the upperbound. It isn't as fast as arc4random_uniform_fast2 if it were used repeatedly after a single use of arc4random_uniform_fast_bitsearch() , but it does exactly the same thing and appears faster than repeatedly using arc4random_uniform() and it's wasteful use of arc4random() and calling the expensive rekeying function more often. It may be interesting to determine even without looking at performance, whether arc4random_fast_simple() creates a more superior, secure use of the chacha20 stream than arc4random_uniform() with the modulus. what exactly does all that extra data from the modulus do to the random distribution? -Luke
Re: Picky, but much more efficient arc4random_uniform!
In general, wasting some arc4random bits is not a problem at all. I see a lot of code with no demonstration, explanation or proof why it would be correct (i.e. produces uniform results). You only talk about speed. The current code might not be optimal, but at least it is very clear it's correct. -Otto On Fri, May 13, 2022 at 09:43:26AM -0500, Luke Small wrote: > I made a couple new versions of a new kind of arc4random_uniform-like > function and some example functions which use them. Instead of having a > sufficiently large random number greater than the modulus, I pick a random > number using arc4random() from a bitfield where the length of the bitfield > is just below or slightly beyond the value of the modulus and returns the > bitfield it if it is less than the value of the modulus. > > In both versions, I retrieve a bitfield from a static uint64_t which is > refreshed from periodic arc4random() calls. The functions demand a bit > length. but I provide a convenient bitsize search function: > arc4random_uniform_fast_bitsearch() > > in the first version, if the chosen return value isn't less than the > modulus, the entire bitfield is trashed and a completely new bitfield is > refreshed from the cache. This can be very inefficient and for certain > upperbounds where the functions demand that all returned values less than > the upperbound are possible. This can perform worse than > arc4random_uniform() even with more random number generation churn. > > in the second version, if the chosen return value isn't less than the > modulus, the bitfield is shifted right, losing the least significant bit > and a new random-value bit from the least significant bit of the cache is > put into the most significant bit of the bitfield and it tries again. For > significant expenditures of effort, this version is always more efficient > than arc4random_uniform() and slightly to much more efficient than my first > version. > > > The performance boost comes from less pseudorandom number generation by not > trashing perfectly good random data and preserving it so that rekeying is > performed less. > > > I suspect that the first version may be secure, but I'm now sure what you > think of the second version, for being secure. Is there a way to test how > well distributed and secure it is?! > > Perhaps it's even better than the typical use case of arc4random_uniform > since it feeds it a bitstream instead of performing a modulous operation on > it! > > I pledge("stdio") and unveil(NULL, NULL) at the beginning, so you know it > doesn't do anything but malloc() an array, do stuff on the array and print > stuff; if you don't profile, anyway. > > What do you think of having such a thing in OpenBSD? > > > > > newdata() generates random a-z characters performing arc4random_uniform(26); > > newdataTypableFilename() generates random filename typable characters > performing arc4random_uniform(92); > > I perform other tests including presumed worst-cases on large numbers and > numbers which will have a lot of misses. > > > > > -Luke
Re: stop using mquery(2) inside realloc(3)
On Fri, May 13, 2022 at 08:21:19PM -0700, Philip Guenther wrote: > If you try to grow a 'large' (at least half a page) allocation with > realloc(3), it'll try to extend the memory mapping for it and if that > works it won't need to move it. (on a side note: it does not do anything at all if the # of pages does not grow). > > Currently, it does that by using mquery(2) to see if that area is > available and if so then trying to mmap it, munmaping it if mmap(2) didn't > return the desired address (perhaps due to a race with another thread). > Instead of doing mquery/mmap/munmap, this path can use the MAP_FIXED and > __MAP_NOREPLACE flags to directly request the specific address but not > change anything if it's not available. > > > (We still use mquery in ld.so on i386 as a performance optimization, but > that use case differs in needing to find a base address for *multiple* > mappings, where unrolling partial hits grows to be more expensive than > probing with mquery and only trying to do all the mapping on a successful > set of probes: recall that on x86 munmap() is more expensive than mmap() > due to TLB shootdowns. This case in realloc() only has a single mapping to > probe/extend so it avoids those costs. Indeed, this diff eliminates the > current "munmap on failed attempt" path/cost.) > > > The regress/lib/libc/malloc tests continue to pass on my amd64 box, with > ktrace confirming the change in calls. > > oks? I was looking through the commit log to see when this flag was introduced. It was in sys/mman.h rev 1.21, but a bit hidden becuase the commit message talks about __MAP_NOREMAP. Anyway, I cannot oversee if it is relevant if all pmap implementations implement this or if this is done in a higher layer. Also a tiny bit worried the __MAP_NOREPLACE path in uvm isn't exercised as much, so it might reveal bugs. But thats no reason not to do this now. So if my first question can be answered with: it's in a higher layer, OK. -Otto > > > Philip > > > Index: stdlib/malloc.c > === > RCS file: /data/src/openbsd/src/lib/libc/stdlib/malloc.c,v > retrieving revision 1.273 > diff -u -p -r1.273 malloc.c > --- stdlib/malloc.c 26 Feb 2022 16:14:42 - 1.273 > +++ stdlib/malloc.c 14 May 2022 02:35:04 - > @@ -100,9 +100,6 @@ > #define MMAPA(a,sz,f)mmap((a), (sz), PROT_READ | PROT_WRITE, \ > MAP_ANON | MAP_PRIVATE | (f), -1, 0) > > -#define MQUERY(a,sz,f) mquery((a), (sz), PROT_READ | PROT_WRITE, \ > -MAP_ANON | MAP_PRIVATE | MAP_FIXED | (f), -1, 0) > - > struct region_info { > void *p;/* page; low bits used to mark chunks */ > uintptr_t size; /* size for pages, or chunk_info pointer */ > @@ -1687,11 +1684,7 @@ orealloc(struct dir_info **argpool, void > size_t needed = rnewsz - roldsz; > > STATS_INC(pool->cheap_realloc_tries); > - q = MQUERY(hint, needed, pool->mmap_flag); > - if (q == hint) > - q = MMAPA(hint, needed, > pool->mmap_flag); > - else > - q = MAP_FAILED; > + q = MMAPA(hint, needed, MAP_FIXED | > __MAP_NOREPLACE | pool->mmap_flag); > if (q == hint) { > STATS_ADD(pool->malloc_used, needed); > if (pool->malloc_junk == 2) > @@ -1709,9 +1702,6 @@ orealloc(struct dir_info **argpool, void > STATS_INC(pool->cheap_reallocs); > ret = p; > goto done; > - } else if (q != MAP_FAILED) { > - if (munmap(q, needed)) > - wrterror(pool, "munmap %p", q); > } > } > } else if (rnewsz < roldsz) { >