Re: CVS commit: src/sys/uvm

2022-06-19 Thread Rin Okuyama

On 2022/06/09 1:55, Michael Lorenz wrote:

Module Name:src
Committed By:   macallan
Date:   Wed Jun  8 16:55:00 UTC 2022

Modified Files:
src/sys/uvm: uvm_map.c

Log Message:
initialize a variable to appease clang


My bug! Thanks for fixing it.

rin


Re: CVS commit: src/sys/uvm

2020-09-22 Thread Ryo ONODERA
Hi,

Chuck Silvers  writes:

> On Tue, Sep 22, 2020 at 01:51:33AM +0900, Ryo ONODERA wrote:
>> Hi,
>> 
>> It seems that r1.124 of uvm_amap.c causes random userland segfaults
>> on my NetBSD/amd64.
>
> this should be fixed now, could you try again?
>
> this was another case of my usual trick "test a change well, modify
> the change at the last minute, re-test only minimally, commit".
> sorry about that.

Your commit fixes the random segfaults.

Thank you very much for your quick fix.

> -Chuck

-- 
Ryo ONODERA // r...@tetera.org
PGP fingerprint = 82A2 DC91 76E0 A10A 8ABB  FD1B F404 27FA C7D1 15F3


Re: CVS commit: src/sys/uvm

2020-09-21 Thread Chuck Silvers
On Tue, Sep 22, 2020 at 01:51:33AM +0900, Ryo ONODERA wrote:
> Hi,
> 
> It seems that r1.124 of uvm_amap.c causes random userland segfaults
> on my NetBSD/amd64.

this should be fixed now, could you try again?

this was another case of my usual trick "test a change well, modify
the change at the last minute, re-test only minimally, commit".
sorry about that.

-Chuck


Re: CVS commit: src/sys/uvm

2020-09-21 Thread Ryo ONODERA
Hi,

It seems that r1.124 of uvm_amap.c causes random userland segfaults
on my NetBSD/amd64.

If you cannot reproduce the random segfaults, I will send my backtraces.

"Chuck Silvers"  writes:

> Module Name:  src
> Committed By: chs
> Date: Sun Sep 20 23:03:01 UTC 2020
>
> Modified Files:
>   src/sys/uvm: uvm_amap.c
>
> Log Message:
> Effectively disable the AMAP_REFALL flag because it is unsafe.
> This flag tells the amap code that it does not need to allocate ppref
> as part of adding or removing a reference, but that is only correct
> if the range of the reference being added or removed is the same
> as the range of all other references to the amap, and the point of
> this flag is exactly to try to optimize the case where the range is
> different and thus this flag would not be correct to use.
> Fixes PR 55366.
>
>
> To generate a diff of this commit:
> cvs rdiff -u -r1.123 -r1.124 src/sys/uvm/uvm_amap.c
>
> Please note that diffs are not public domain; they are subject to the
> copyright notices on the relevant files.
>
> Modified files:
>
> Index: src/sys/uvm/uvm_amap.c
> diff -u src/sys/uvm/uvm_amap.c:1.123 src/sys/uvm/uvm_amap.c:1.124
> --- src/sys/uvm/uvm_amap.c:1.123  Tue Aug 18 10:40:20 2020
> +++ src/sys/uvm/uvm_amap.cSun Sep 20 23:03:01 2020
> @@ -1,4 +1,4 @@
> -/*   $NetBSD: uvm_amap.c,v 1.123 2020/08/18 10:40:20 chs Exp $   */
> +/*   $NetBSD: uvm_amap.c,v 1.124 2020/09/20 23:03:01 chs Exp $   */
>  
>  /*
>   * Copyright (c) 1997 Charles D. Cranor and Washington University.
> @@ -35,7 +35,7 @@
>   */
>  
>  #include 
> -__KERNEL_RCSID(0, "$NetBSD: uvm_amap.c,v 1.123 2020/08/18 10:40:20 chs Exp $");
> +__KERNEL_RCSID(0, "$NetBSD: uvm_amap.c,v 1.124 2020/09/20 23:03:01 chs Exp $");
>  
>  #include "opt_uvmhist.h"
>  
> @@ -1593,7 +1593,7 @@ amap_adjref_anons(struct vm_amap *amap, 
>* so that the ppref values match the current amap refcount.
>*/
>  
> - if (amap->am_ppref == NULL && !all && len != amap->am_nslot) {
> + if (amap->am_ppref == NULL) {
>   amap_pp_establish(amap, offset);
>   }
>  #endif
>

-- 
Ryo ONODERA // r...@tetera.org
PGP fingerprint = 82A2 DC91 76E0 A10A 8ABB  FD1B F404 27FA C7D1 15F3


re: CVS commit: src/sys/uvm/pmap

2020-08-23 Thread matthew green
> Modified Files:
>   src/sys/uvm/pmap: pmap_segtab.c
> 
> Log Message:
> Remove the #if defined(__mips_n64) && PAGE_SIZE == 8192 and make the
> check MI - all PTs are PAGE_SIZE aligned

thanks!  that is a much better way of doing it.


.mrg.


Re: CVS commit: src/sys/uvm

2020-07-18 Thread Rin Okuyama

On 2020/07/18 17:30, Jukka Ruohonen wrote:

On Sat, Jul 18, 2020 at 05:19:07PM +0900, Rin Okuyama wrote:

For most (all?) ports, these specifiers are exposed only for
_KERNEL and friends. So, inttypes(*3*) would not be the best
place for them. Currently, I'm not sure where they should be.
/usr/share/misc/style?


Perhaps a new inttypes(9)?  Another option would be a split akin to
types(3).


I prefer types(3); there are already types there that exist only in
the _KERNEL namespace. But no strong opinion from me.

Thanks,
rin


Re: CVS commit: src/sys/uvm

2020-07-18 Thread Jukka Ruohonen
On Sat, Jul 18, 2020 at 05:19:07PM +0900, Rin Okuyama wrote:
> For most (all?) ports, these specifiers are exposed only for
> _KERNEL and friends. So, inttypes(*3*) would not be the best
> place for them. Currently, I'm not sure where they should be.
> /usr/share/misc/style?

Perhaps a new inttypes(9)?  Another option would be a split akin to
types(3).

- Jukka


Re: CVS commit: src/sys/uvm

2020-07-18 Thread Rin Okuyama

On 2020/07/16 16:02, matthew green wrote:

thanks!  i'll try to remember we have PRIxPADDR because i
considered looking for it and thought we didn't have it...


My pleasure!

On 2020/07/16 16:10, Jukka Ruohonen wrote:

The whole { PRIxPADDR, PRIxPSIZE, ..., PRIxREGISTER } family
should probably be documented in inttypes(3)?


For most (all?) ports, these specifiers are exposed only for
_KERNEL and friends. So, inttypes(*3*) would not be the best
place for them. Currently, I'm not sure where they should be.
/usr/share/misc/style?

Thanks,
rin


Re: CVS commit: src/sys/uvm

2020-07-16 Thread Jukka Ruohonen
On Thu, Jul 16, 2020 at 05:02:18PM +1000, matthew green wrote:
> thanks!  i'll try to remember we have PRIxPADDR because i
> considered looking for it and thought we didn't have it...

The whole { PRIxPADDR, PRIxPSIZE, ..., PRIxREGISTER } family
should probably be documented in inttypes(3)?

- Jukka


re: CVS commit: src/sys/uvm

2020-07-16 Thread matthew green
"Rin Okuyama" writes:
> Module Name:  src
> Committed By: rin
> Date: Wed Jul 15 15:08:26 UTC 2020
> 
> Modified Files:
>   src/sys/uvm: uvm_physseg.c
> 
> Log Message:
> Fix typo. Use PRIxPADDR rather than casting.

thanks!  i'll try to remember we have PRIxPADDR because i
considered looking for it and thought we didn't have it...


.mrg.


Re: CVS commit: src/sys/uvm

2020-05-11 Thread Joerg Sonnenberger
On Sun, May 10, 2020 at 11:53:00PM +0100, Alexander Nasonov wrote:
> Taylor R Campbell wrote:
> > Log Message:
> > Implement swap encryption.
> > 
> > Enabled by sysctl -w vm.swap_encrypt=1.
> 
> If secmodel_securelevel(9) is still a thing, locking down this sysctl
> at high securelevel may improve our security. Prior to this change,
> swap devices were readable (even if encrypted with cgd).  With this
> sysctl set to 1, all new swap devices will be encrypted, the only
> thing to worry about is if it's set back to 0 on a compromised host.

Well, the ability to turn it off should be locked down. Enabling it for
securelevel>0 seems fine?

Joerg


Re: CVS commit: src/sys/uvm

2020-05-11 Thread Alexander Nasonov
Taylor R Campbell wrote:
> This sounds entirely reasonable.  Would you like to draft an
> implementation of that?

Sure, I can look into this on the weekend.

> Presumably it would require writing a sysctl callback function for
> vm.swap_encrypt, and would somehow involve kauth, but I'm not sure
> offhand what needs to happen beyond that.  Perhaps vm.user_va0_disable
> can be a source of inspiration.

I implemented a similar behaviour for SVS sysctl but I later removed it
because SVS sysctl was removed.

-- 
Alex


Re: CVS commit: src/sys/uvm

2020-05-10 Thread Alistair Crooks
On Sat, 9 May 2020 at 14:50, Taylor R Campbell  wrote:

> Module Name:src
> Committed By:   riastradh
> Date:   Sat May  9 21:50:39 UTC 2020
>
> Modified Files:
> src/sys/uvm: uvm_swap.c
>
> Log Message:
> Implement swap encryption.
>
> Enabled by sysctl -w vm.swap_encrypt=1.  Key is generated lazily when
> we first need to swap a page.  Key is chosen independently for each
> swap device.  The ith swap page is encrypted with AES256-CBC using
> AES256_k(le32enc(i) || 0^96) as the initialization vector.  Can be
> changed at any time; no need for compatibility with on-disk formats.
> Costs one bit of memory per page in each swapdev, plus a few hundred
> bytes per swapdev to store the expanded AES key.
>
> Shoulda done this decades ago!  Plan to enable this by default;
> performance impact is unlikely to matter because it only happens when
> you're already swapping anyway.  Much easier to set up than cgd, so
> we can rip out all the documentation about carefully setting up
> random-keyed cgd at the right time.
>
Thanks, this is great - looking forward to it being default!

Best,
Alistair


Re: CVS commit: src/sys/uvm

2020-05-10 Thread Taylor R Campbell
> Date: Sun, 10 May 2020 23:53:00 +0100
> From: Alexander Nasonov 
> 
> Taylor R Campbell wrote:
> > Log Message:
> > Implement swap encryption.
> > 
> > Enabled by sysctl -w vm.swap_encrypt=1.
> 
> If secmodel_securelevel(9) is still a thing, locking down this sysctl
> at high securelevel may improve our security. Prior to this change,
> > swap devices were readable (even if encrypted with cgd).  With this
> sysctl set to 1, all new swap devices will be encrypted, the only
> thing to worry about is if it's set back to 0 on a compromised host.

This sounds entirely reasonable.  Would you like to draft an
implementation of that?

Presumably it would require writing a sysctl callback function for
vm.swap_encrypt, and would somehow involve kauth, but I'm not sure
offhand what needs to happen beyond that.  Perhaps vm.user_va0_disable
can be a source of inspiration.


Re: CVS commit: src/sys/uvm

2020-05-10 Thread Alexander Nasonov
Taylor R Campbell wrote:
> Log Message:
> Implement swap encryption.
> 
> Enabled by sysctl -w vm.swap_encrypt=1.

If secmodel_securelevel(9) is still a thing, locking down this sysctl
at high securelevel may improve our security. Prior to this change,
swap devices were readable (even if encrypted with cgd).  With this
sysctl set to 1, all new swap devices will be encrypted, the only
thing to worry about is if it's set back to 0 on a compromised host.

Not sure if this makes sense, because all files on a compromised
host can be read and processes' memory can probably be dumped.

Alex


Re: CVS commit: src/sys/uvm

2020-04-26 Thread Rin Okuyama

On 2020/04/27 11:47, Rin Okuyama wrote:

Module Name:src
Committed By:   rin
Date:   Mon Apr 27 02:47:26 UTC 2020

Modified Files:
src/sys/uvm: uvm_extern.h

Log Message:
Add missing \ to fix build for PMAP_CACHE_VIVT, i.e., ARMv4 and prior.


s/v4/v5/


re: CVS commit: src/sys/uvm

2020-01-22 Thread matthew green
Andrew Doran writes:
> On Wed, Jan 22, 2020 at 10:08:16AM +1100, matthew green wrote:
> 
> > Andrew Doran writes:
> > > I also recommend disabling ACPI idle, at least until it can be made less
> > > aggressive by default.  It causes a significant slowdown.  It can be done
> > > with detaching all acpicpu devices using "drvctl -d" on each.
> > 
> > this disables cpufreq support, doesn't it?  that seems like
> > an unfortunate side effect..
> 
> I dunno, I assume so.  It seems like a tricky one to solve.
> 
> I think we could do with a heuristic that uses a "fast idle" when the
> system is busy and a "low power idle" when things have gotten quiet.  And
> the heuristic likely needs to be controllable with sysctl to suit
> preferences (power consumption, performance).
> 
> We probably also want to take a CPU in low power idle out of
> kcpuset_running, but that imposes its own penalty on x86 because it then
> goes from using broadcast IPIs to sending directed IPIs to every CPU;
> needs to be tried.
> 
> Also I don't know what to do about HyperThreading; it'd be nice if ACPI had
> the smarts to handle it.
> 
> My problem with this one is I don't know what I'm missing here...  Maybe
> time for a mail.

if machdep.idle-mechanism had a user-settable component,
like the way that cpufreq provides 'available' and
'current' nodes, it would probably be easy.


.mrg.


Re: CVS commit: src/sys/uvm

2020-01-22 Thread Andrew Doran
On Wed, Jan 22, 2020 at 10:08:16AM +1100, matthew green wrote:

> Andrew Doran writes:
> > I also recommend disabling ACPI idle, at least until it can be made less
> > aggressive by default.  It causes a significant slowdown.  It can be done
> > with detaching all acpicpu devices using "drvctl -d" on each.
> 
> this disables cpufreq support, doesn't it?  that seems like
> an unfortunate side effect..

I dunno, I assume so.  It seems like a tricky one to solve.

I think we could do with a heuristic that uses a "fast idle" when the
system is busy and a "low power idle" when things have gotten quiet.  And
the heuristic likely needs to be controllable with sysctl to suit
preferences (power consumption, performance).

We probably also want to take a CPU in low power idle out of
kcpuset_running, but that imposes its own penalty on x86 because it then
goes from using broadcast IPIs to sending directed IPIs to every CPU;
needs to be tried.

Also I don't know what to do about HyperThreading; it'd be nice if ACPI had
the smarts to handle it.

My problem with this one is I don't know what I'm missing here...  Maybe
time for a mail.

Andrew


re: CVS commit: src/sys/uvm

2020-01-21 Thread matthew green
Andrew Doran writes:
> I also recommend disabling ACPI idle, at least until it can be made less
> aggressive by default.  It causes a significant slowdown.  It can be done
> with detaching all acpicpu devices using "drvctl -d" on each.

this disables cpufreq support, doesn't it?  that seems like
an unfortunate side effect..


.mrg.


Re: CVS commit: src/sys/uvm

2020-01-21 Thread Andrew Doran
On Fri, Jan 10, 2020 at 10:21:25PM +, Andrew Doran wrote:
> Hi Frank,
> 
> On Fri, Jan 10, 2020 at 01:10:02PM +0100, Frank Kardel wrote:
> 
> > Hi !
> > 
> > With this state of January 2nd we ran some tests for robustness and timing
> > with our database setup:
> > 
> > Machine:
> > 
> > Mainboard: S2600WFT
> > 
> > CPU: 2 x Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
> > 
> > machdep.spectre_v1.mitigated = 0
> > machdep.spectre_v2.hwmitigated = 1
> > machdep.spectre_v2.swmitigated = 1
> > machdep.spectre_v2.method = [GCC retpoline] + [Intel IBRS]
> > machdep.spectre_v4.mitigated = 0
> > machdep.spectre_v4.method = (none)
> > machdep.mds.mitigated = 0
> > machdep.mds.method = (none)
> > machdep.taa.mitigated = 0
> > machdep.taa.method = [MDS]
> > 
> > Memory:
> > 
> > hw.physmem64 = 549446447104
> > hw.usermem64 = 549438365696
> > 
> > This machine is/has been a challenge to NetBSD as it has 0.5 TB of memory
> > and 32 cores.
> > 
> > Testcase is restoring a 1 TB PostgreSQL 11 database with varying degrees of
> > PostgreSQL pg_restore parallelism.
> > 
> > Why did we do the tests? The machine was installed with 8.99.24 as that
> > supported the memory setup.
> > 
> > The machine was not able to reliably cope with many db/restore processes and
> > large memory - see
> > 
> >  PR kern/54209: NetBSD 8 large memory performance extremely low
> >  PR kern/54210: NetBSD-8 processes presumably not exiting
> > 
> > for details.
> > 
> > With Andrew Doran's work on the vm system we restarted the tests.
> > 
> > The baseline is 8.99.24 from around  Sep  3 04:10:20 UTC 2018:
> > TEST 1
> > FRESH BOOT
> > time pg_restore -Upgsql -p5433 -Fd -d db -j5 20200103-db.dmpdir
> > 1826.599u 1752.878s 10:36:03.83 9.3%0+0k 397+0io 1789pf+0w
> > 
> > Higher levels of parallelism lead to a higher probability for catatonic
> > systems with increasing restore parallelism.
> > Trouble starts around -j8 and gets worse at higher levels.
> > 
> > TEST 2
> > 9.99.33 from around Fri Jan  3 16:14:02 CET 2020
> > FRESH BOOT
> > time pg_restore -Upgsql -p5433 -Fd -d db -j28 20200103-db.dmpdir
> > 2047.925u 1191.878s 14:24:15.23 6.2%0+0k 0+0io 5784pf+0w
> > 
> > This survived a -j28 run that was not possible with 8.99.24 - this is a big
> > step forward, but ~4h slower real time.
> > 
> > TEST 3
> > FRESH BOOT
> > 9.99.34 from around Mon Jan  6 14:43:01
> > time pg_restore -Upgsql -p5433 -Fd -d db -j28 20200103-db.dmpdir
> > 1816.348u 1792.530s 10:56:02.56 9.1%0+0k 395+0io 5620pf+0w
> > 
> > -j5 run to compare to 9.99.33 - big improvement in real run time though
> > system time went up.
> > 
> > TEST 4
> > State after TEST 3 run to compare to 8.99.24
> > time pg_restore -Upgsql -p5433 -Fd -d db -j5 20200103-db.dmpdir
> > 1706.548u 1748.623s 11:26:38.87 8.3%0+0k 0+0io 1420pf+0w
> > 
> > This ran faster than -j28 - probably due to less contention, but 50 min
> > slower than 8.99.24 after fresh boot.
> > 
> > TEST 5:
> > re-run TEST 4 with fresh boot for 8.99.24 comparison
> > time pg_restore -Upgsql -p5433 -Fd -d db -j5 20200103-db.dmpdir
> > 1710.665u 1611.083s 9:14:56.86 9.9% 0+0k 398+0io 1504pf+0w
> > 
> > better than 8.99.24 in real time.
> > 
> > There seems no big difference in system time between 8.99.24 and 9.99.34,
> > but a big improvement in robustness.
> > The lockups don't seem to happen any more and there are fewer short-term
> > system freezes and the system remains
> > responsive with 9.99.34.
> > 
> > The big differences in real time are interesting but the cause for that may
> > not be easy to pinpoint. The database
> > runs on an nvme:
> > nvme0 at pci10 dev 0 function 0: Intel SSD DC P4500 (rev. 0x00)
> > nvme0: NVMe 1.2
> > nvme0: for admin queue interrupting at msix4 vec 0
> > nvme0: INTEL SSDPE2KX040T8, firmware VDV10131, serial ...
> > nvme0: for io queue 1 interrupting at msix4 vec 1 affinity to cpu0
> > [...]
> > nvme0: for io queue 32 interrupting at msix4 vec 32 affinity to cpu31
> > ld0 at nvme0 nsid 1
> > ld0: 3726 GB, 486401 cyl, 255 head, 63 sec, 512 bytes/sect x 7814037168
> > sectors
> > 
> > And we are seeing transfer rates up to 300 MB/s and up to 80% busy on the
> > complex I/O (load) and CPU (build index) workload.
> > 
> > So in summary we see a big step forward in robustness.
> > 
> > Thanks to Andrew for the big improvements here.
> 
> Thank you for the detailed testing, and report.
> 
> Many of the changes to the VM system came from Takashi Yamamoto's
> yamt-pagecache branch, so it's not all my work.
> 
> I'm glad to hear that this has worked well for you.  There are a couple of
> things that, time permitting, I would like to get in place over the next few
> weeks which should help a little with this workload (and then I am done, for
> now).
> 
> The first is enabling Jaromir Dolecek's vm.ubc_direct by default, which may
> help with such a high I/O rate.  There is a possible deadlock condition with
> this that needs to be fixed first.

ubc_direct didn't make it in yet.  As an interim measure I 

Re: CVS commit: src/sys/uvm

2020-01-17 Thread Kamil Rytarowski
On 10.01.2020 23:21, Andrew Doran wrote:
> The second is pulling in efficient tracking of page clean/dirty status from
> the yamt-pagecache branch.  This reduces the amount of work fsync() needs to
> do, which should be of benefit to the DBMS.
> 

We probably should adapt DBMSes to use fsync_range(), at least where
Linux uses sync_file_range(). We might need to implement more semantics
if needed, but someone with expertise in DBMSes should speak up.





Re: CVS commit: src/sys/uvm

2020-01-10 Thread Andrew Doran
Hi Frank,

On Fri, Jan 10, 2020 at 01:10:02PM +0100, Frank Kardel wrote:

> Hi !
> 
> With this state of January 2nd we ran some tests for robustness and timing
> with our database setup:
> 
> Machine:
> 
> Mainboard: S2600WFT
> 
> CPU: 2 x Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
> 
> machdep.spectre_v1.mitigated = 0
> machdep.spectre_v2.hwmitigated = 1
> machdep.spectre_v2.swmitigated = 1
> machdep.spectre_v2.method = [GCC retpoline] + [Intel IBRS]
> machdep.spectre_v4.mitigated = 0
> machdep.spectre_v4.method = (none)
> machdep.mds.mitigated = 0
> machdep.mds.method = (none)
> machdep.taa.mitigated = 0
> machdep.taa.method = [MDS]
> 
> Memory:
> 
> hw.physmem64 = 549446447104
> hw.usermem64 = 549438365696
> 
> This machine is/has been a challenge to NetBSD as it has 0.5 TB of memory
> and 32 cores.
> 
> Testcase is restoring a 1 TB PostgreSQL 11 database with varying degrees of
> PostgreSQL pg_restore parallelism.
> 
> Why did we do the tests? The machine was installed with 8.99.24 as that
> supported the memory setup.
> 
> The machine was not able to reliably cope with many db/restore processes and
> large memory - see
> 
>  PR kern/54209: NetBSD 8 large memory performance extremely low
>  PR kern/54210: NetBSD-8 processes presumably not exiting
> 
> for details.
> 
> With Andrew Doran's work on the vm system we restarted the tests.
> 
> The baseline is 8.99.24 from around  Sep  3 04:10:20 UTC 2018:
> TEST 1
> FRESH BOOT
> time pg_restore -Upgsql -p5433 -Fd -d db -j5 20200103-db.dmpdir
> 1826.599u 1752.878s 10:36:03.83 9.3%0+0k 397+0io 1789pf+0w
> 
> Higher levels of parallelism lead to a higher probability for catatonic
> systems with increasing restore parallelism.
> Trouble starts around -j8 and gets worse at higher levels.
> 
> TEST 2
> 9.99.33 from around Fri Jan  3 16:14:02 CET 2020
> FRESH BOOT
> time pg_restore -Upgsql -p5433 -Fd -d db -j28 20200103-db.dmpdir
> 2047.925u 1191.878s 14:24:15.23 6.2%0+0k 0+0io 5784pf+0w
> 
> This survived a -j28 run that was not possible with 8.99.24 - this is a big
> step forward, but ~4h slower real time.
> 
> TEST 3
> FRESH BOOT
> 9.99.34 from around Mon Jan  6 14:43:01
> time pg_restore -Upgsql -p5433 -Fd -d db -j28 20200103-db.dmpdir
> 1816.348u 1792.530s 10:56:02.56 9.1%0+0k 395+0io 5620pf+0w
> 
> -j5 run to compare to 9.99.33 - big improvement in real run time though
> system time went up.
> 
> TEST 4
> State after TEST 3 run to compare to 8.99.24
> time pg_restore -Upgsql -p5433 -Fd -d db -j5 20200103-db.dmpdir
> 1706.548u 1748.623s 11:26:38.87 8.3%0+0k 0+0io 1420pf+0w
> 
> This ran faster than -j28 - probably due to less contention, but 50 min
> slower than 8.99.24 after fresh boot.
> 
> TEST 5:
> re-run TEST 4 with fresh boot for 8.99.24 comparison
> time pg_restore -Upgsql -p5433 -Fd -d db -j5 20200103-db.dmpdir
> 1710.665u 1611.083s 9:14:56.86 9.9% 0+0k 398+0io 1504pf+0w
> 
> better than 8.99.24 in real time.
> 
> There seems no big difference in system time between 8.99.24 and 9.99.34,
> but a big improvement in robustness.
> The lockups don't seem to happen any more and there are fewer short-term
> system freezes and the system remains
> responsive with 9.99.34.
> 
> The big differences in real time are interesting but the cause for that may
> not be easy to pinpoint. The database
> runs on an nvme:
> nvme0 at pci10 dev 0 function 0: Intel SSD DC P4500 (rev. 0x00)
> nvme0: NVMe 1.2
> nvme0: for admin queue interrupting at msix4 vec 0
> nvme0: INTEL SSDPE2KX040T8, firmware VDV10131, serial ...
> nvme0: for io queue 1 interrupting at msix4 vec 1 affinity to cpu0
> [...]
> nvme0: for io queue 32 interrupting at msix4 vec 32 affinity to cpu31
> ld0 at nvme0 nsid 1
> ld0: 3726 GB, 486401 cyl, 255 head, 63 sec, 512 bytes/sect x 7814037168
> sectors
> 
> And we are seeing transfer rates up to 300 MB/s and up to 80% busy on the
> complex I/O (load) and CPU (build index) workload.
> 
> So in summary we see a big step forward in robustness.
> 
> Thanks to Andrew for the big improvements here.

Thank you for the detailed testing, and report.

Many of the changes to the VM system came from Takashi Yamamoto's
yamt-pagecache branch, so it's not all my work.

I'm glad to hear that this has worked well for you.  There are a couple of
things that, time permitting, I would like to get in place over the next few
weeks which should help a little with this workload (and then I am done, for
now).

The first is enabling Jaromir Dolecek's vm.ubc_direct by default, which may
help with such a high I/O rate.  There is a possible deadlock condition with
this that needs to be fixed first.

The second is pulling in efficient tracking of page clean/dirty status from
the yamt-pagecache branch.  This reduces the amount of work fsync() needs to
do, which should be of benefit to the DBMS.

Cheers,
Andrew


Re: CVS commit: src/sys/uvm

2020-01-10 Thread Frank Kardel

Hi !

With this state of January 2nd we ran some tests for robustness and 
timing with our database setup:


Machine:

Mainboard: S2600WFT

CPU: 2 x Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz

machdep.spectre_v1.mitigated = 0
machdep.spectre_v2.hwmitigated = 1
machdep.spectre_v2.swmitigated = 1
machdep.spectre_v2.method = [GCC retpoline] + [Intel IBRS]
machdep.spectre_v4.mitigated = 0
machdep.spectre_v4.method = (none)
machdep.mds.mitigated = 0
machdep.mds.method = (none)
machdep.taa.mitigated = 0
machdep.taa.method = [MDS]

Memory:

hw.physmem64 = 549446447104
hw.usermem64 = 549438365696

This machine is/has been a challenge to NetBSD as it has 0.5 TB of memory
and 32 cores.


Testcase is restoring a 1 TB PostgreSQL 11 database with varying degrees
of PostgreSQL pg_restore parallelism.


Why did we do the tests? The machine was installed with 8.99.24 as that 
supported the memory setup.


The machine was not able to reliably cope with many db/restore processes
and large memory - see


 PR kern/54209: NetBSD 8 large memory performance extremely low
 PR kern/54210: NetBSD-8 processes presumably not exiting

for details.

With Andrew Doran's work on the vm system we restarted the tests.

The baseline is 8.99.24 from around  Sep  3 04:10:20 UTC 2018:
TEST 1
FRESH BOOT
time pg_restore -Upgsql -p5433 -Fd -d db -j5 20200103-db.dmpdir
1826.599u 1752.878s 10:36:03.83 9.3%0+0k 397+0io 1789pf+0w

Higher levels of parallelism lead to a higher probability for catatonic 
systems with increasing restore parallelism.

Trouble starts around -j8 and gets worse at higher levels.

TEST 2
9.99.33 from around Fri Jan  3 16:14:02 CET 2020
FRESH BOOT
time pg_restore -Upgsql -p5433 -Fd -d db -j28 20200103-db.dmpdir
2047.925u 1191.878s 14:24:15.23 6.2%0+0k 0+0io 5784pf+0w

This survived a -j28 run that was not possible with 8.99.24 - this is a
big step forward, but ~4h slower real time.


TEST 3
FRESH BOOT
9.99.34 from around Mon Jan  6 14:43:01
time pg_restore -Upgsql -p5433 -Fd -d db -j28 20200103-db.dmpdir
1816.348u 1792.530s 10:56:02.56 9.1%0+0k 395+0io 5620pf+0w

-j5 run to compare to 9.99.33 - big improvement in real run time though 
system time went up.


TEST 4
State after TEST 3 run to compare to 8.99.24
time pg_restore -Upgsql -p5433 -Fd -d db -j5 20200103-db.dmpdir
1706.548u 1748.623s 11:26:38.87 8.3%0+0k 0+0io 1420pf+0w

This ran faster than -j28 - probably due to less contention, but 50 min
slower than 8.99.24 after fresh boot.


TEST 5:
re-run TEST 4 with fresh boot for 8.99.24 comparison
time pg_restore -Upgsql -p5433 -Fd -d db -j5 20200103-db.dmpdir
1710.665u 1611.083s 9:14:56.86 9.9% 0+0k 398+0io 1504pf+0w

better than 8.99.24 in real time.

There seems to be no big difference in system time between 8.99.24 and
9.99.34, but a big improvement in robustness.
The lockups don't seem to happen any more, there are fewer short-term
system freezes, and the system remains responsive with 9.99.34.

The big differences in real time are interesting but the cause for that 
may not be easy to pinpoint. The database

runs on an nvme:
nvme0 at pci10 dev 0 function 0: Intel SSD DC P4500 (rev. 0x00)
nvme0: NVMe 1.2
nvme0: for admin queue interrupting at msix4 vec 0
nvme0: INTEL SSDPE2KX040T8, firmware VDV10131, serial ...
nvme0: for io queue 1 interrupting at msix4 vec 1 affinity to cpu0
[...]
nvme0: for io queue 32 interrupting at msix4 vec 32 affinity to cpu31
ld0 at nvme0 nsid 1
ld0: 3726 GB, 486401 cyl, 255 head, 63 sec, 512 bytes/sect x 7814037168 
sectors


And we are seeing transfer rates up to 300 MB/s and up to 80% busy on the
complex I/O (load) and CPU (build index) workload.


So in summary we see a big step forward in robustness.

Thanks to Andrew for the big improvements here.

Frank


On 01/02/20 03:00, Andrew Doran wrote:
> Module Name:src
> Committed By:ad
> Date:Thu Jan  2 02:00:35 UTC 2020
>
> Modified Files:
> src/sys/uvm: uvm_amap.c uvm_amap.h
>
> Log Message:
> Back out the amap allocation changes from earlier today - have seen a panic
> with them.  Retain the lock changes.
>
>
> To generate a diff of this commit:
> cvs rdiff -u -r1.113 -r1.114 src/sys/uvm/uvm_amap.c
> cvs rdiff -u -r1.38 -r1.39 src/sys/uvm/uvm_amap.h
>
> Please note that diffs are not public domain; they are subject to the
> copyright notices on the relevant files.
>



re: CVS commit: src/sys/uvm

2020-01-02 Thread matthew green
"Andrew Doran" writes:
> Module Name:  src
> Committed By: ad
> Date: Fri Dec 27 13:19:25 UTC 2019
> 
> Modified Files:
>   src/sys/uvm: uvm.h uvm_page.c
> 
> Log Message:
> Nothing uses uvm.cpus any more, and we can do the same with cpu_lookup(),
> so get rid of it.

this has been useful to me in the past for diag / core dumps.


.mrg.


re: CVS commit: src/sys/uvm

2020-01-01 Thread matthew green
> > So I guess we won't be switching pg->phys_addr from paddr to pfn?
> > Speaking of which, any plans for expanding to >32-bit (or >31-bit, if
> > signed) pfns in the rest of uvm?
> 
> That's not part of my current plans for UVM, which right now extend only as
> far as breaking the back of the performance problems with builds.  31 bit
> pfns get us into the terabyte range I think.  I have a couple of thoughts
> here, firstly SVR4 or Solaris has a pfn_t and pgcnt_t or something like that
> > and it would be nice to have an analogue.  Kind of surprised we didn't
> inherit something like that from Mach.

FWIW, the numbers are: we break at 8 TiB of RAM; that's where
the 32-bit signed overflow happens.

the other number of note: HP has been selling a machine with 48 TiB of
RAM since most of last year, if not longer.

another method would be to lie to UVM and provide it with
>4KiB pages to manage, like VAX already does.


.mrg.


Re: CVS commit: src/sys/uvm

2019-12-24 Thread Andrew Doran
On Tue, Dec 24, 2019 at 03:22:54AM +, Taylor R Campbell wrote:

> > Module Name:src
> > Committed By:   ad
> > Date:   Sat Dec 21 14:41:44 UTC 2019
> > 
> > - Add inlines to set/get locator values in the unused lower bits of
> >   pg->phys_addr.  Begin by using it to cache the freelist index, because
> >   computing it is expensive and that shows up during profiling.  Discussed
> >   on tech-kern.
> 
> So I guess we won't be switching pg->phys_addr from paddr to pfn?
> Speaking of which, any plans for expanding to >32-bit (or >31-bit, if
> signed) pfns in the rest of uvm?

That's not part of my current plans for UVM, which right now extend only as
far as breaking the back of the performance problems with builds.  31 bit
pfns get us into the terabyte range I think.  I have a couple of thoughts
here, firstly SVR4 or Solaris has a pfn_t and pgcnt_t or something like that
and it would be nice to have an analogue.  Kind of surprised we didn't
inherit something like that from Mach.  Secondly in the comments above those
functions:

 * None of this is set in stone; it can be adjusted as needed.

What I have put in there is only a means to an end, to get some extra fields
without putting too much pressure on old systems with small memory.

> Can you use __BITS/__SHIFTIN/__SHIFTOUT for this instead of magic hex
> masks?

Good suggestion, I'll make that change.

Cheers,
Andrew


Re: CVS commit: src/sys/uvm

2019-12-23 Thread Jason Thorpe



> On Dec 23, 2019, at 7:22 PM, Taylor R Campbell 
>  wrote:
> 
> So I guess we won't be switching pg->phys_addr from paddr to pfn?

If that's the case, we should rename the field.

-- thorpej



Re: CVS commit: src/sys/uvm

2019-12-23 Thread Taylor R Campbell
> Module Name:src
> Committed By:   ad
> Date:   Sat Dec 21 14:41:44 UTC 2019
> 
> - Add inlines to set/get locator values in the unused lower bits of
>   pg->phys_addr.  Begin by using it to cache the freelist index, because
>   computing it is expensive and that shows up during profiling.  Discussed
>   on tech-kern.

So I guess we won't be switching pg->phys_addr from paddr to pfn?
Speaking of which, any plans for expanding to >32-bit (or >31-bit, if
signed) pfns in the rest of uvm?

> +static inline unsigned
> +uvm_page_get_bucket(struct vm_page *pg)
> +{
> +   return (pg->phys_addr & 0x3e0) >> 5;
> +}

Can you use __BITS/__SHIFTIN/__SHIFTOUT for this instead of magic hex
masks?

#define PHYSADDR_FREELIST   __BITS(0,4)
#define PHYSADDR_BUCKET __BITS(5,9)

static inline unsigned
uvm_page_get_bucket(struct vm_page *pg)
{
return __SHIFTOUT(pg->phys_addr, PHYSADDR_BUCKET);
}

static inline void
uvm_page_set_bucket(struct vm_page *pg, unsigned b)
{
pg->phys_addr &= ~PHYSADDR_BUCKET;
pg->phys_addr |= __SHIFTIN(b, PHYSADDR_BUCKET);
}


Re: CVS commit: src/sys/uvm

2019-12-21 Thread Andrew Doran
On Sat, Dec 21, 2019 at 03:08:18PM +0100, Christoph Badura wrote:

> On Sat, Dec 21, 2019 at 12:58:26PM +, Andrew Doran wrote:
> > Modified Files:
> > src/sys/uvm: uvm_extern.h uvm_page.c
> > Log Message:
> > Add uvm_free(): returns number of free pages in system.
> 
> Can you rename this to a more descriptive name? Say uvm_free_pages() or
> something.
> 
> Also, we traditionally use xxx_free() to actually free some object and not
> return statistics.  It would be nice to spare future readers the
> confusion.

Yes that's fair enough, I see your point.  May not be today but I'll come
back to it soon.

Andrew


Re: CVS commit: src/sys/uvm

2019-12-21 Thread Christoph Badura
On Sat, Dec 21, 2019 at 12:58:26PM +, Andrew Doran wrote:
> Modified Files:
>   src/sys/uvm: uvm_extern.h uvm_page.c
> Log Message:
> Add uvm_free(): returns number of free pages in system.

Can you rename this to a more descriptive name? Say uvm_free_pages() or
something.

Also, we traditionally use xxx_free() to actually free some object and not
return statistics.  It would be nice to spare future readers the
confusion.

--chris


Re: CVS commit: src/sys/uvm

2019-12-01 Thread Taylor R Campbell
> Date: Sun, 1 Dec 2019 11:54:24 +
> From: Andrew Doran 
> 
> On Sun, Dec 01, 2019 at 08:19:09AM +, Maxime Villard wrote:
> 
> > Modified Files:
> > src/sys/uvm: uvm_fault.c
> > 
> > Log Message:
> > Use atomic_{load,store}_relaxed() on global counters.
> 
> If you would be so kind, please don't do any more of the UVM counters.  I
> have a patch to make these CPU local which will be going out for review in
> the near future.

Sounds good -- using atomic_load/store for these counters was intended
to be a stop-gap measure until someone did the work to make them
CPU-local, as discussed in the past couple weeks on tech-kern!


Re: CVS commit: src/sys/uvm

2019-12-01 Thread Andrew Doran
Hi,

On Sun, Dec 01, 2019 at 08:19:09AM +, Maxime Villard wrote:

> Modified Files:
>   src/sys/uvm: uvm_fault.c
> 
> Log Message:
> Use atomic_{load,store}_relaxed() on global counters.

If you would be so kind, please don't do any more of the UVM counters.  I
have a patch to make these CPU local which will be going out for review in
the near future.

Cheers,
Andrew

<<< uvm_fault.c
uvm_stat_inc(UVM_STAT_FLTANGET);
===
atomic_store_relaxed(&uvmexp.fltanget,
atomic_load_relaxed(&uvmexp.fltanget) + 1);
>>> 1.210


re: CVS commit: src/sys/uvm

2019-08-06 Thread matthew green
"Maxime Villard" writes:
> Module Name:  src
> Committed By: maxv
> Date: Tue Aug  6 08:10:27 UTC 2019
> 
> Modified Files:
>   src/sys/uvm: uvm_mmap.c
> 
> Log Message:
> Change 'npgs' from int to size_t. Otherwise the 64bit->32bit conversion
> could lead to npgs=0, which is not expected. It later triggers a panic
> in uvm_vsunlock().
> 
> Found by TriforceAFL (Akul Pillai).

ah, and so we begin the task of making UVM handle more than 
31 bits of page number :-)


.mrg.


Re: CVS commit: src/sys/uvm

2018-12-10 Thread Jaromír Doleček
Good point, added asserts in genfs_io.c for this.
On Sun, Dec 9, 2018 at 23:56, matthew green wrote:
>
> "Jaromir Dolecek" writes:
> > Module Name:  src
> > Committed By: jdolecek
> > Date: Sun Dec  9 20:45:37 UTC 2018
> >
> > Modified Files:
> >   src/sys/uvm: uvm_bio.c
> >
> > Log Message:
> > for direct map case, avoid PGO_NOBLOCKALLOC when writing, it makes
> > genfs_getpages() return unallocated pages using the zero page and
> > PG_RDONLY; the old code relied on fault logic to get it allocated, which
> > the direct case can't rely on
> >
> > instead just allocate the blocks right away; pass PGO_JOURNALLOCKED
> > so that code wouldn't try to take wapbl lock, this code path is called
> > with it already held
>
> can you assert this if not already done?  that the locking is OK,
> i mean...
>
> thanks.
>
>
> .mrg.


re: CVS commit: src/sys/uvm

2018-12-09 Thread matthew green
"Jaromir Dolecek" writes:
> Module Name:  src
> Committed By: jdolecek
> Date: Sun Dec  9 20:45:37 UTC 2018
> 
> Modified Files:
>   src/sys/uvm: uvm_bio.c
> 
> Log Message:
> for direct map case, avoid PGO_NOBLOCKALLOC when writing, it makes
> genfs_getpages() return unallocated pages using the zero page and
> PG_RDONLY; the old code relied on fault logic to get it allocated, which
> the direct case can't rely on
> 
> instead just allocate the blocks right away; pass PGO_JOURNALLOCKED
> so that code wouldn't try to take wapbl lock, this code path is called
> with it already held

can you assert this if not already done?  that the locking is OK,
i mean...

thanks.


.mrg.


Re: CVS commit: src/sys/uvm

2017-10-27 Thread Christos Zoulas
On Oct 27, 10:14pm, campbell+netbsd-source-change...@mumble.net (Taylor R 
Campbell) wrote:
-- Subject: Re: CVS commit: src/sys/uvm

| > Due to incorrect error recovery mmap requests that were denied due to
| > PaX/MPROTECT ended up not cleaning up, which made processes stuck. I
| > could fix the commit message but that would break the git conversion.
| 
| I thought we had concluded that fixes within about 24h were annoying
| but OK.

I had not heard that :-)

christos


Re: CVS commit: src/sys/uvm

2017-10-27 Thread Kamil Rytarowski
On 28.10.2017 00:14, Taylor R Campbell wrote:
>> Date: Fri, 27 Oct 2017 21:46:48 + (UTC)
>> This needs to be pulled up to -8.
> 
> Yes.
> 

I've verified that the reported bug is gone.





Re: CVS commit: src/sys/uvm

2017-10-27 Thread Taylor R Campbell
> Date: Fri, 27 Oct 2017 21:46:48 + (UTC)
> From: chris...@astron.com (Christos Zoulas)
> 
> In article ,
> Paul Goyette   wrote:
> >Can you please PLEASE provide an actual description of the problem 
> >you're fixing.
> 
> Due to incorrect error recovery mmap requests that were denied due to
> PaX/MPROTECT ended up not cleaning up, which made processes stuck. I
> could fix the commit message but that would break the git conversion.

I thought we had concluded that fixes within about 24h were annoying
but OK.

> This needs to be pulled up to -8.

Yes.


Re: CVS commit: src/sys/uvm

2017-10-27 Thread Christos Zoulas
In article ,
Paul Goyette   wrote:
>On Fri, 27 Oct 2017, Utkarsh Anand wrote:
>
>> Module Name: src
>> Committed By:utkarsh009
>> Date:Fri Oct 27 12:01:08 UTC 2017
>>
>> Modified Files:
>>  src/sys/uvm: uvm_mmap.c
>>
>> Log Message:
>> [syzkaller] Fix for PR #52658 as suggested by riastradh@
>>
>> The bug was found by Dmitry Vyukov (dvyu...@google.com)
>> using syzkaller and was tested by me on a VM running
>> 8.99.5
>
>Can you please PLEASE provide an actual description of the problem 
>you're fixing.

Due to incorrect error recovery mmap requests that were denied due to
PaX/MPROTECT ended up not cleaning up, which made processes stuck. I
could fix the commit message but that would break the git conversion.

This needs to be pulled up to -8.

christos



Re: CVS commit: src/sys/uvm

2017-10-27 Thread Paul Goyette

On Fri, 27 Oct 2017, Utkarsh Anand wrote:


Module Name:src
Committed By:   utkarsh009
Date:   Fri Oct 27 12:01:08 UTC 2017

Modified Files:
src/sys/uvm: uvm_mmap.c

Log Message:
[syzkaller] Fix for PR #52658 as suggested by riastradh@

The bug was found by Dmitry Vyukov (dvyu...@google.com)
using syzkaller and was tested by me on a VM running
8.99.5


Can you please PLEASE provide an actual description of the problem 
you're fixing.




To generate a diff of this commit:
cvs rdiff -u -r1.166 -r1.167 src/sys/uvm/uvm_mmap.c

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.






+------------------+--------------------------+----------------------------+
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:          |
| (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee dot com   |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd dot org |
+------------------+--------------------------+----------------------------+


Re: CVS commit: src/sys/uvm

2017-05-21 Thread Kamil Rytarowski
On 20.05.2017 02:02, Kamil Rytarowski wrote:
> On 19.05.2017 17:30, Chuck Silvers wrote:
>> Module Name: src
>> Committed By:chs
>> Date:Fri May 19 15:30:19 UTC 2017
>>
>> Modified Files:
>>  src/sys/uvm: uvm_map.c uvm_mmap.c
>>
>> Log Message:
>> make MAP_FIXED mapping operations atomic. fixes PR 52239.
>> previously, unmapping any entries being replaced was done separately
>> from entering the new mapping, which allowed another thread doing
>> a non-MAP_FIXED mapping to allocate the range out from under the
>> MAP_FIXED thread.
>>
>>
>> To generate a diff of this commit:
>> cvs rdiff -u -r1.346 -r1.347 src/sys/uvm/uvm_map.c
>> cvs rdiff -u -r1.164 -r1.165 src/sys/uvm/uvm_mmap.c
>>
> 
> UVM broke after this commit. I cannot build packages due to random
> memory corruptions. Processes die / files (at least executables) contain
> trash.
> 
> There are also users on IRC reporting the same behavior.
> 

After recent changes the issues are gone.
 - src/sys/uvm/uvm_extern.h r.1.206
 - src/sys/uvm/uvm_map.c r.1.349
 - src/sys/uvm/uvm_mmap.c r.1.166

Thanks Chuck Silvers!





Re: CVS commit: src/sys/uvm

2017-05-19 Thread Kamil Rytarowski
On 19.05.2017 17:30, Chuck Silvers wrote:
> Module Name:  src
> Committed By: chs
> Date: Fri May 19 15:30:19 UTC 2017
> 
> Modified Files:
>   src/sys/uvm: uvm_map.c uvm_mmap.c
> 
> Log Message:
> make MAP_FIXED mapping operations atomic. fixes PR 52239.
> previously, unmapping any entries being replaced was done separately
> from entering the new mapping, which allowed another thread doing
> a non-MAP_FIXED mapping to allocate the range out from under the
> MAP_FIXED thread.
> 
> 
> To generate a diff of this commit:
> cvs rdiff -u -r1.346 -r1.347 src/sys/uvm/uvm_map.c
> cvs rdiff -u -r1.164 -r1.165 src/sys/uvm/uvm_mmap.c
> 

UVM broke after this commit. I cannot build packages due to random
memory corruptions. Processes die / files (at least executables) contain
trash.

There are also users on IRC reporting the same behavior.





Re: CVS commit: src/sys/uvm

2017-05-19 Thread Christos Zoulas
In article <20170519162231.ga3...@britannica.bec.de>,
Joerg Sonnenberger   wrote:
>On Fri, May 19, 2017 at 04:14:03PM +, co...@sdf.org wrote:
>> On Fri, May 19, 2017 at 03:30:19PM +, Chuck Silvers wrote:
>> > Module Name:   src
>> > Committed By:  chs
>> > Date:  Fri May 19 15:30:19 UTC 2017
>> > 
>> > Modified Files:
>> >src/sys/uvm: uvm_map.c uvm_mmap.c
>> > 
>> > Log Message:
>> > make MAP_FIXED mapping operations atomic. fixes PR 52239.
>> > previously, unmapping any entries being replaced was done separately
>> > from entering the new mapping, which allowed another thread doing
>> > a non-MAP_FIXED mapping to allocate the range out from under the
>> > MAP_FIXED thread.
>> 
>> Does that have security ramifications?
>
>It's a form of memory corruption under races. We should issue a SN for
>it, but I don't think MAP_FIXED is that popular in general. The case in
>jemalloc is now better served by using mprotect and PROT_MPROTECT, btw.

We should pull it up to -7 at least...

christos



Re: CVS commit: src/sys/uvm

2017-05-19 Thread Joerg Sonnenberger
On Fri, May 19, 2017 at 04:14:03PM +, co...@sdf.org wrote:
> On Fri, May 19, 2017 at 03:30:19PM +, Chuck Silvers wrote:
> > Module Name:src
> > Committed By:   chs
> > Date:   Fri May 19 15:30:19 UTC 2017
> > 
> > Modified Files:
> > src/sys/uvm: uvm_map.c uvm_mmap.c
> > 
> > Log Message:
> > make MAP_FIXED mapping operations atomic. fixes PR 52239.
> > previously, unmapping any entries being replaced was done separately
> > from entering the new mapping, which allowed another thread doing
> > a non-MAP_FIXED mapping to allocate the range out from under the
> > MAP_FIXED thread.
> 
> Does that have security ramifications?

It's a form of memory corruption under races. We should issue a SN for
it, but I don't think MAP_FIXED is that popular in general. The case in
jemalloc is now better served by using mprotect and PROT_MPROTECT, btw.

Joerg


Re: CVS commit: src/sys/uvm

2017-05-19 Thread coypu
On Fri, May 19, 2017 at 03:30:19PM +, Chuck Silvers wrote:
> Module Name:  src
> Committed By: chs
> Date: Fri May 19 15:30:19 UTC 2017
> 
> Modified Files:
>   src/sys/uvm: uvm_map.c uvm_mmap.c
> 
> Log Message:
> make MAP_FIXED mapping operations atomic. fixes PR 52239.
> previously, unmapping any entries being replaced was done separately
> from entering the new mapping, which allowed another thread doing
> a non-MAP_FIXED mapping to allocate the range out from under the
> MAP_FIXED thread.

Does that have security ramifications? it sounds like something that
won't be good if not atomic, but I don't know enough to tell if it can
be abused.

Thanks.


re: CVS commit: src/sys/uvm

2017-03-20 Thread matthew green
> Log Message:
> Ugh.   This stuff is disgusting.   We really need an arch dependent
> PRIxOFF (and PRIdOFF) to print off_t's in a way that matches the
> arch's definition of off_t.

off_t is supposed to be more-MI.  it comes down to:

sys/types.h:177:typedef __off_t off_t;  /* file offset */
sys/ansi.h:42:typedef __int64_t __off_t;  /* file offset */

so it just needs to be whatever int64_t resolves to.  this should
not need MD code - but aliases to existing PRI*.


.mrg.


Re: CVS commit: src/sys/uvm

2017-03-20 Thread Robert Elz
Date:Mon, 20 Mar 2017 13:03:31 +0100
From:Joerg Sonnenberger 
Message-ID:  <20170320120331.ga19...@britannica.bec.de>

  | On Mon, Mar 20, 2017 at 10:44:24AM +, Robert Elz wrote:
  | > Why is there no PRI[xd]OFF ?   How are off_t's intended to be printed?
  | 
  | %jd + cast to intmax_t.

Yes, that one I knew - that's a generic solution to printing any
random unknown integral type - and it's ugly - useful as a fallback though.

If that was to be the general recommended solution, we'd have no need for
almost any of the gazillion PRI[xd]TYPE #defines that exist now (PRIxPSIZE,
PRIxVADDR, ...)

off_t's are a common enough data type that we should support printing them
without casts (IMO.)

kre




Re: CVS commit: src/sys/uvm

2017-03-20 Thread Joerg Sonnenberger
On Mon, Mar 20, 2017 at 10:44:24AM +, Robert Elz wrote:
> Why is there no PRI[xd]OFF ?   How are off_t's intended to be printed?

%jd + cast to intmax_t.

Joerg


Re: CVS commit: src/sys/uvm

2016-12-24 Thread coypu
On Fri, Dec 23, 2016 at 09:22:22AM +1100, matthew green wrote:
> > Log Message:
> > Use uvm_physseg.h:uvm_page_physload() instead of uvm_extern.h
> > 
> > For this, include uvm_physseg.h in the build and include tree, make a
> > cosmetic modification to the prototype for uvm_page_physload().
> 
> why does this need to be installed?
> 
> this header seems kernel-only, it should be not installed
> and it should not be included if !_KERNEL.
> 
> note that  should work in user or kernel
> sanely to include from anywhere outside of sys/uvm itself,
> but we shouldn't export things to userland we don't need.
> 

I've made it kernel-only as a quick fix because it uses paddr_t
which isn't available to userland, and it's breaking pkgsrc
packages, to keep -current usable.

see: http://mail-index.netbsd.org/pkgsrc-users/2016/12/24/msg024178.html
(devel/libuv).


re: CVS commit: src/sys/uvm

2016-12-22 Thread matthew green
"Cherry G. Mathew" writes:
> Module Name:  src
> Committed By: cherry
> Date: Thu Dec 22 13:26:25 UTC 2016
> 
> Modified Files:
>   src/sys/uvm: Makefile uvm.h uvm_extern.h uvm_page.c
> 
> Log Message:
> Use uvm_physseg.h:uvm_page_physload() instead of uvm_extern.h
> 
> For this, include uvm_physseg.h in the build and include tree, make a
> cosmetic modification to the prototype for uvm_page_physload().

why does this need to be installed?

this header seems kernel-only, it should be not installed
and it should not be included if !_KERNEL.

note that  should work in user or kernel
sanely to include from anywhere outside of sys/uvm itself,
but we shouldn't export things to userland we don't need.


.mrg.


Re: CVS commit: src/sys/uvm

2016-07-21 Thread Maxime Villard

On 20/07/2016 at 20:07, matthew green wrote:

"Maxime Villard" writes:

Module Name:src
Committed By:   maxv
Date:   Wed Jul 20 12:38:44 UTC 2016

Modified Files:
src/sys/uvm: uvm_extern.h uvm_km.c

Log Message:
Introduce uvm_km_protect.


can you bring this up on tech-kern and discuss what it is for?
this commit message is entirely useless today, let alone in the
future when someone tries to understand what is happening here.



[1], and then [2]. It's just a simple function to mprotect an area allocated
with uvm_km_alloc. It could also be used here [3].


[1] http://mail-index.netbsd.org/source-changes/2016/07/20/msg076249.html
[2] http://mail-index.netbsd.org/source-changes/2016/07/20/msg076250.html
[3] https://nxr.netbsd.org/xref/src/sys/dev/ic/sti.c#349


re: CVS commit: src/sys/uvm

2016-07-20 Thread matthew green
"Maxime Villard" writes:
> Module Name:  src
> Committed By: maxv
> Date: Wed Jul 20 12:38:44 UTC 2016
> 
> Modified Files:
>   src/sys/uvm: uvm_extern.h uvm_km.c
> 
> Log Message:
> Introduce uvm_km_protect.

can you bring this up on tech-kern and discuss what it is for?
this commit message is entirely useless today, let alone in the
future when someone tries to understand what is happening here.

changing UVM should always be discussed with UVM maintainers.

thanks.


.mrg.


Re: CVS commit: src/sys/uvm

2014-02-28 Thread Matt Thomas

On Feb 27, 2014, at 11:07 PM, Alan Barrett a...@cequrux.com wrote:

 On Wed, 26 Feb 2014, Matt Thomas wrote:
 Modified Files:
  src/sys/uvm: uvm_meter.c uvm_param.h
 
 Log Message:
 Add vm.min_address and vm.max_address which return VM_MIN_ADDRESS and
 VM_MAXUSER_ADDRESS.
 
 Do these need to use the old style #define constants
 instead of the new style (since 2005) CTL_CREATE idiom?

Don't need to but I think it's lower overhead if they use
old style definitions.



Re: CVS commit: src/sys/uvm

2014-02-28 Thread David Laight
On Fri, Feb 28, 2014 at 09:07:40AM +0200, Alan Barrett wrote:
 On Wed, 26 Feb 2014, Matt Thomas wrote:
 Modified Files:
  src/sys/uvm: uvm_meter.c uvm_param.h
 
 Log Message:
 Add vm.min_address and vm.max_address which return VM_MIN_ADDRESS and
 VM_MAXUSER_ADDRESS.
 
 Do these need to use the old style #define constants
 instead of the new style (since 2005) CTL_CREATE idiom?

I certainly thought that no new 'fixed number' sysctls were supposed to
be added.

David

-- 
David Laight: da...@l8s.co.uk


Re: CVS commit: src/sys/uvm

2014-02-27 Thread Alan Barrett

On Wed, 26 Feb 2014, Matt Thomas wrote:

Modified Files:
src/sys/uvm: uvm_meter.c uvm_param.h

Log Message:
Add vm.min_address and vm.max_address which return VM_MIN_ADDRESS and
VM_MAXUSER_ADDRESS.


Do these need to use the old style #define constants
instead of the new style (since 2005) CTL_CREATE idiom?

--apb (Alan Barrett)


Re: CVS commit: src/sys/uvm

2012-09-05 Thread Cherry G. Mathew
On 4 September 2012 12:32, Mindaugas Rasiukevicius rm...@netbsd.org wrote:
 Matt Thomas m...@3am-software.com wrote:

 On Sep 3, 2012, at 3:33 PM, Mindaugas Rasiukevicius wrote:

  Matt Thomas m...@netbsd.org wrote:
  Module Name:   src
  Committed By:  matt
  Date:  Mon Sep  3 19:53:43 UTC 2012
 
  Modified Files:
 src/sys/uvm: uvm_km.c uvm_map.c
 
  Log Message:
  Switch to a spin lock (uvm_kentry_lock) which, fortunately, was sitting
  there unused.
 
  - pmap_growkernel() may use adaptive locks, which cannot be acquired
  with the spin lock held; so the change breaks at least x86 and alpha.
 
  - Why in the caller?  I think it would be better do leave it for the
  pmaps, e.g. they may re-use the locks which already provide the
  necessary protection and which need to be taken anyway (like in x86
  pmap).

 uvm_maxkaddr need a lock for its updating

 growkernel can be called uvm_km_mem_alloc which might be called
 at interrupt level.

 The second point stands, but I see you already fixed it - thanks!

 As for pmap_growkernel() being called from interrupt context - right, then
 it seems Xen is broken, as its path in pmap_growkernel() acquires adaptive
 pmaps_lock and might call pool_cache_invalidate() which can block..


I don't see a xen specific lock being taken in pmap_growkernel()

-- 
~Cherry


Re: CVS commit: src/sys/uvm

2012-09-04 Thread Mindaugas Rasiukevicius
Matt Thomas m...@3am-software.com wrote:
 
 On Sep 3, 2012, at 3:33 PM, Mindaugas Rasiukevicius wrote:
 
  Matt Thomas m...@netbsd.org wrote:
  Module Name:   src
  Committed By:  matt
  Date:  Mon Sep  3 19:53:43 UTC 2012
  
  Modified Files:
 src/sys/uvm: uvm_km.c uvm_map.c
  
  Log Message:
  Switch to a spin lock (uvm_kentry_lock) which, fortunately, was sitting
  there unused.
  
  - pmap_growkernel() may use adaptive locks, which cannot be acquired
  with the spin lock held; so the change breaks at least x86 and alpha.
  
  - Why in the caller?  I think it would be better do leave it for the
  pmaps, e.g. they may re-use the locks which already provide the
  necessary protection and which need to be taken anyway (like in x86
  pmap).
 
 uvm_maxkaddr need a lock for its updating
 
 growkernel can be called uvm_km_mem_alloc which might be called
 at interrupt level.

The second point stands, but I see you already fixed it - thanks!

As for pmap_growkernel() being called from interrupt context - right, then
it seems Xen is broken, as its path in pmap_growkernel() acquires adaptive
pmaps_lock and might call pool_cache_invalidate() which can block..

-- 
Mindaugas


Re: CVS commit: src/sys/uvm

2012-09-03 Thread Mindaugas Rasiukevicius
Matt Thomas m...@netbsd.org wrote:
 Module Name:  src
 Committed By: matt
 Date: Mon Sep  3 19:53:43 UTC 2012
 
 Modified Files:
   src/sys/uvm: uvm_km.c uvm_map.c
 
 Log Message:
 Switch to a spin lock (uvm_kentry_lock) which, fortunately, was sitting
 there unused.

- pmap_growkernel() may use adaptive locks, which cannot be acquired with
  the spin lock held; so the change breaks at least x86 and alpha.

- Why in the caller?  I think it would be better to leave it for the pmaps,
  e.g. they may re-use the locks which already provide the necessary
  protection and which need to be taken anyway (like in x86 pmap).

-- 
Mindaugas


Re: CVS commit: src/sys/uvm

2012-09-03 Thread Matt Thomas

On Sep 3, 2012, at 3:33 PM, Mindaugas Rasiukevicius wrote:

 Matt Thomas m...@netbsd.org wrote:
 Module Name: src
 Committed By:matt
 Date:Mon Sep  3 19:53:43 UTC 2012
 
 Modified Files:
  src/sys/uvm: uvm_km.c uvm_map.c
 
 Log Message:
 Switch to a spin lock (uvm_kentry_lock) which, fortunately, was sitting
 there unused.
 
 - pmap_growkernel() may use adaptive locks, which cannot be acquired with
  the spin lock held; so the change breaks at least x86 and alpha.
 
 - Why in the caller?  I think it would be better do leave it for the pmaps,
  e.g. they may re-use the locks which already provide the necessary
  protection and which need to be taken anyway (like in x86 pmap).

uvm_maxkaddr needs a lock for its updating.

growkernel can be called by uvm_km_mem_alloc, which might be called
at interrupt level.


Re: CVS commit: src/sys/uvm

2011-09-29 Thread J. Hannken-Illjes

On Sep 29, 2011, at 12:52 AM, Matt Thomas wrote:

 Module Name:  src
 Committed By: matt
 Date: Wed Sep 28 22:52:16 UTC 2011
 
 Modified Files:
   src/sys/uvm: uvm_page.c uvm_pager.c uvm_pager.h
 
 Log Message:
 Reallocate emergency pager va when ncolors is increased. (modification of
 patch from mrg).

This panics LOCKDEBUG kernels as the call to uvm_pager_realloc_emerg()
in function uvm_page_recolor(), file uvm_page.c:1048 with spin
lock uvm_fpageqlock held leads to:

Mutex error: lockdebug_barrier: spin lock held
db{0}> bt
mutex_enter() at netbsd:mutex_enter+0x4a3
uvm_km_check_empty() at netbsd:uvm_km_check_empty+0x73
uvm_map() at netbsd:uvm_map+0x1a2
uvm_km_alloc() at netbsd:uvm_km_alloc+0xe4
uvm_pager_realloc_emerg() at netbsd:uvm_pager_realloc_emerg+0x5b
uvm_page_recolor() at netbsd:uvm_page_recolor+0x35b
cpu_attach() at netbsd:cpu_attach+0x333

--
Juergen Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)



re: CVS commit: src/sys/uvm

2011-09-29 Thread matthew green

 
 On Sep 29, 2011, at 12:52 AM, Matt Thomas wrote:
 
  Module Name:src
  Committed By:   matt
  Date:   Wed Sep 28 22:52:16 UTC 2011
  
  Modified Files:
  src/sys/uvm: uvm_page.c uvm_pager.c uvm_pager.h
  
  Log Message:
  Reallocate emergency pager va when ncolors is increased. (modification of
  patch from mrg).
 
 This panics LOCKDEBUG kernels as the call to uvm_pager_realloc_emerg()
 in function uvm_page_recolor(), file uvm_page.c:1048 with spin
 lock uvm_fpageqlock held leads to:
 
 Mutex error: lockdebug_barrier: spin lock held
 db{0}> bt
 mutex_enter() at netbsd:mutex_enter+0x4a3
 uvm_km_check_empty() at netbsd:uvm_km_check_empty+0x73
 uvm_map() at netbsd:uvm_map+0x1a2
 uvm_km_alloc() at netbsd:uvm_km_alloc+0xe4
 uvm_pager_realloc_emerg() at netbsd:uvm_pager_realloc_emerg+0x5b
 uvm_page_recolor() at netbsd:uvm_page_recolor+0x35b
 cpu_attach() at netbsd:cpu_attach+0x333

ah... this is the crash i saw that made me not commit my patch :-)


.mrg.


re: CVS commit: src/sys/uvm

2011-09-29 Thread matthew green

  Module Name:src
  Committed By:   matt
  Date:   Wed Sep 28 22:52:16 UTC 2011
  
  Modified Files:
  src/sys/uvm: uvm_page.c uvm_pager.c uvm_pager.h
  
  Log Message:
  Reallocate emergency pager va when ncolors is increased. (modification of
  patch from mrg).
 
 This panics LOCKDEBUG kernels as the call to uvm_pager_realloc_emerg()
 in function uvm_page_recolor(), file uvm_page.c:1048 with spin
 lock uvm_fpageqlock held leads to:
 
 Mutex error: lockdebug_barrier: spin lock held
 db{0}> bt
 mutex_enter() at netbsd:mutex_enter+0x4a3
 uvm_km_check_empty() at netbsd:uvm_km_check_empty+0x73
 uvm_map() at netbsd:uvm_map+0x1a2
 uvm_km_alloc() at netbsd:uvm_km_alloc+0xe4
 uvm_pager_realloc_emerg() at netbsd:uvm_pager_realloc_emerg+0x5b
 uvm_page_recolor() at netbsd:uvm_page_recolor+0x35b
 cpu_attach() at netbsd:cpu_attach+0x333

this was the uvm_fpageqlock.  it should be fixed now with
uvm_page.c 1.177.


.mrg.


Re: CVS commit: src/sys/uvm

2011-08-10 Thread YAMAMOTO Takashi
[ moving to tech-kern ]

hi,

 y...@mwd.biglobe.ne.jp (YAMAMOTO Takashi) wrote:
  
  Here is the updated patch after your changes:
  
  http://www.netbsd.org/~rmind/uvm_anon_freelst2.diff
  
  As you noted, uvm_anfree() can temporarily release the amap lock - that
  can happen in amap_copy().  Patch closes the race by moving
  uvm_anfree() further, and changes the semantics of the function, now
  called uvm_anon_freelst(), to return with amap lock released (plus
  free anons without lock held).
 
 the temporary release of the amap lock is only for O->A loan
 which you disabled, isn't it?
 
 Right, the uvm_anon_lockloanpg() dance can happen only in the O->A case.  However,
 having uvm_anfree() able to release the lock by its interface definition
 is potentially defective.  It is the main motivation why I want to slightly
 rework the code into uvm_anon_freelst() which would always drop the lock
 and move freeing of anons to the end point.  Cleaner, less error prone.

the committed change seems broken in case uvm_anon_dispose sets PG_RELEASED.
in that case, uvm_anon_freelst should leave the anon as it will be freed by
uvm_anon_release later.

YAMAMOTO Takashi

 
 -- 
 Mindaugas


Re: CVS commit: src/sys/uvm

2011-08-09 Thread Jukka Ruohonen
On Sat, Aug 06, 2011 at 05:25:04PM +, Mindaugas Rasiukevicius wrote:
 Module Name:  src
 Committed By: rmind
 Date: Sat Aug  6 17:25:04 UTC 2011
 
 Modified Files:
   src/sys/uvm: uvm_amap.c uvm_anon.c uvm_anon.h uvm_fault.c uvm_loan.c
   uvm_map.c
 
 Log Message:
 - Rework uvm_anfree() into uvm_anon_freelst(), which always drops the lock.
 - Free anons in uvm_anon_freelst() without lock held.
 - Mechanic sync to unused loaning code.

These changes caused a regression:

rump/rumpkern/t_sp (393/449): 10 test cases
basic: Passed.
fork_fakeauth: Passed.
fork_pipecomm: Passed.
fork_simple: Passed.
reconnect: Passed.
signal: Passed.
sigsafe: Passed.
stress_killer: panic: kernel diagnostic assertion anon->an_page ==
NULL failed: file
/bracket/i386/work/2011.08.06.19.23.38/src/sys/uvm/uvm_anon.c, line
202
cpu0: Begin traceback...
?(c0e86374,dc6,c5548940,1,c0ea46e4,0,c4ecbf80,1,c0ea3914,0) at 0
cpu0: End traceback...

dumping to dev 0,1 offset 1984
dump 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
succeeded

The same assertion is catched with other tests too (see [1]).

- Jukka.

[1] Examples:

http://releng.netbsd.org/b5reports/i386/build/2011.08.06.19.23.38/test.log
http://releng.netbsd.org/b5reports/i386/build/2011.08.06.19.47.54/test.log
http://releng.netbsd.org/b5reports/i386/build/2011.08.06.19.53.24/test.log
http://releng.netbsd.org/b5reports/i386/build/2011.08.07.02.18.56/test.log



Re: CVS commit: src/sys/uvm

2011-07-25 Thread Mindaugas Rasiukevicius
y...@mwd.biglobe.ne.jp (YAMAMOTO Takashi) wrote:
   Log Message:
   amap_copy(): Keep the source amap locked until its lock has been
   copied.
  
  btw, this code seems to assume that uvm_anfree does not release the
  lock even temporarily while the comment on uvm_anfree1 says the
  opposite.
  
  http://www.netbsd.org/~rmind/uvm_anon_freelst.diff
  
  Looks good?
 
 i don't understand what it solves.  can you explain a little?

Here is the updated patch after your changes:

http://www.netbsd.org/~rmind/uvm_anon_freelst2.diff

As you noted, uvm_anfree() can temporarily release the amap lock - that can
happen in amap_copy().  Patch closes the race by moving uvm_anfree() further,
and changes the semantics of the function, now called uvm_anon_freelst(), to
return with amap lock released (plus free anons without lock held).

-- 
Mindaugas


Re: CVS commit: src/sys/uvm

2011-07-25 Thread YAMAMOTO Takashi
hi,

 y...@mwd.biglobe.ne.jp (YAMAMOTO Takashi) wrote:
   Log Message:
   amap_copy(): Keep the source amap locked until its lock has been
   copied.
  
  btw, this code seems to assume that uvm_anfree does not release the
  lock even temporarily while the comment on uvm_anfree1 says the
  opposite.
  
  http://www.netbsd.org/~rmind/uvm_anon_freelst.diff
  
  Looks good?
 
 i don't understand what it solves.  can you explain a little?
 
 Here is the updated patch after your changes:
 
 http://www.netbsd.org/~rmind/uvm_anon_freelst2.diff
 
 As you noted, uvm_anfree() can temporarily release the amap lock - that can
 happen in amap_copy().  Patch closes the race by moving uvm_anfree() further,
 and changes the semantics of the function, now called uvm_anon_freelst(), to
 return with amap lock released (plus free anons without lock held).

 the temporary release of the amap lock is only for O->A loan
which you disabled, isn't it?

YAMAMOTO Takashi

 
 -- 
 Mindaugas


Re: CVS commit: src/sys/uvm

2011-07-05 Thread YAMAMOTO Takashi
hi,

 y...@mwd.biglobe.ne.jp (YAMAMOTO Takashi) wrote:
  Module Name:   src
  Committed By:  hannken
  Date:  Mon Jun 27 15:56:37 UTC 2011
  
  Modified Files:
 src/sys/uvm: uvm_amap.c
  
  Log Message:
  amap_copy(): Keep the source amap locked until its lock has been copied.
 
 btw, this code seems to assume that uvm_anfree does not release the lock
 even temporarily while the comment on uvm_anfree1 says the opposite.
 
 http://www.netbsd.org/~rmind/uvm_anon_freelst.diff
 
 Looks good?

i don't understand what it solves.  can you explain a little?

YAMAMOTO Takashi

 
 
 YAMAMOTO Takashi
 
 -- 
 Mindaugas


Re: CVS commit: src/sys/uvm

2011-06-29 Thread Mindaugas Rasiukevicius
y...@mwd.biglobe.ne.jp (YAMAMOTO Takashi) wrote:
 hi,
 
  Module Name:src
  Committed By:   hannken
  Date:   Mon Jun 27 15:56:37 UTC 2011
  
  Modified Files:
  src/sys/uvm: uvm_amap.c
  
  Log Message:
  amap_copy(): Keep the source amap locked until its lock has been copied.
 
 btw, this code seems to assume that uvm_anfree does not release the lock
 even temporarily while the comment on uvm_anfree1 says the opposite.

Right.  Fix would be to always drop the lock in uvm_anfree() and call it
in the last point.  I will write a patch tomorrow.

 
 YAMAMOTO Takashi

-- 
Mindaugas


Re: CVS commit: src/sys/uvm

2011-06-27 Thread YAMAMOTO Takashi
hi,

 Module Name:  src
 Committed By: hannken
 Date: Mon Jun 27 15:56:37 UTC 2011
 
 Modified Files:
   src/sys/uvm: uvm_amap.c
 
 Log Message:
 amap_copy(): Keep the source amap locked until its lock has been copied.

btw, this code seems to assume that uvm_anfree does not release the lock
even temporarily while the comment on uvm_anfree1 says the opposite.

YAMAMOTO Takashi

 
 Kernel assertion anon->an_lock == amap->am_lock no longer fails.
 
 Ok: Mindaugas Rasiukevicius rm...@netbsd.org
 
 
 To generate a diff of this commit:
 cvs rdiff -u -r1.99 -r1.100 src/sys/uvm/uvm_amap.c
 
 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.


Re: CVS commit: src/sys/uvm

2011-06-27 Thread Masao Uebayashi
On Tue, Jun 28, 2011 at 9:39 AM, YAMAMOTO Takashi
y...@mwd.biglobe.ne.jp wrote:
 hi,

 Module Name:  src
 Committed By: hannken
 Date:         Mon Jun 27 15:56:37 UTC 2011

 Modified Files:
       src/sys/uvm: uvm_amap.c

 Log Message:
 amap_copy(): Keep the source amap locked until its lock has been copied.

 btw, this code seems to assume that uvm_anfree does not release the lock
 even temporarily while the comment on uvm_anfree1 says the opposite.

This locking protocol looks odd to me.  Mutex obj is alloc'ed in amap,
why not free'ed in amap too?  uobj is doing so.


 YAMAMOTO Takashi


  Kernel assertion anon->an_lock == amap->am_lock no longer fails.

 Ok: Mindaugas Rasiukevicius rm...@netbsd.org


 To generate a diff of this commit:
 cvs rdiff -u -r1.99 -r1.100 src/sys/uvm/uvm_amap.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.



Re: CVS commit: src/sys/uvm

2011-06-23 Thread tsugutomo . enami
Mindaugas Rasiukevicius rm...@netbsd.org writes:

 Module Name:  src
 Committed By: rmind
 Date: Thu Jun 23 18:15:30 UTC 2011

 Modified Files:
   src/sys/uvm: uvm_amap.c

 Log Message:
 Clean-up, add asserts, slightly simplify.


 To generate a diff of this commit:
 cvs rdiff -u -r1.95 -r1.96 src/sys/uvm/uvm_amap.c

In amap_copy(), I guess map entry may be clipped and need to recompute
entry->end - entry->start after that.

enami.


Re: CVS commit: src/sys/uvm

2011-06-23 Thread Mindaugas Rasiukevicius
tsugutomo.en...@jp.sony.com wrote:
  Modified Files:
  src/sys/uvm: uvm_amap.c
 
  Log Message:
  Clean-up, add asserts, slightly simplify.
 
 
  To generate a diff of this commit:
  cvs rdiff -u -r1.95 -r1.96 src/sys/uvm/uvm_amap.c
 
 In amap_copy(), I guess map entry may be clipped and need to recompute
 entry->end - entry->start after that.

Yes.. fixed.  Thanks!

-- 
Mindaugas
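
enami's point (the entry's bounds change when it is clipped, so a length
computed before clipping goes stale) can be seen in a tiny userland sketch;
the structures and clip_start() below are hypothetical trimmed-down versions,
not the real uvm_map code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed-down map entry. */
struct map_entry {
    uintptr_t start, end;
};

/*
 * Clip an entry so it begins exactly at addr (cf. UVM_MAP_CLIP_START):
 * the entry's bounds change, so any entry->end - entry->start length
 * computed before the clip must be recomputed afterwards.
 */
static void
clip_start(struct map_entry *e, uintptr_t addr)
{
    assert(addr > e->start && addr < e->end);
    /* (a real implementation would split off [start, addr) here) */
    e->start = addr;
}
```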


Re: CVS commit: src/sys/uvm

2011-06-23 Thread Mindaugas Rasiukevicius
YAMAMOTO Takashi y...@netbsd.org wrote:
 Module Name:  src
 Committed By: yamt
 Date: Fri Jun 24 01:23:05 UTC 2011
 
 Modified Files:
   src/sys/uvm: uvm_anon.c
 
 Log Message:
 uvm_anon_release: fix a locking error after the rmind-uvmplock merge
 

Thanks a lot!  I have been hunting this for a while without positive
result (apparently, just adding more regressions).

-- 
Mindaugas


Re: CVS commit: src/sys/uvm

2011-06-17 Thread Mindaugas Rasiukevicius
Hello,

Juergen Hannken-Illjes hann...@netbsd.org wrote:
 Module Name:  src
 Committed By: hannken
 Date: Fri Jun 17 09:50:52 UTC 2011
 
 Modified Files:
   src/sys/uvm: uvm_bio.c
 
 Log Message:
 When ubc_alloc() reuses a cached mapping window remove the object from
 the lists AFTER clearing its mapping.
 
 Removes a race where uvm_obj_destroy() sees an empty uo_ubc list and
 destroys the object before ubc_alloc() gets the objects lock to clear
 the mapping.

How is this relevant, since the entries are protected by ubc_object's
lock, which is held across ubc_alloc() and ubc_purge() parts?

-- 
Mindaugas


Re: CVS commit: src/sys/uvm

2011-01-20 Thread YAMAMOTO Takashi
hi,

do you have any plan to use pg->offset of anon pages,
or is it an unnecessary side-effect?

YAMAMOTO Takashi

 Module Name:  src
 Committed By: matt
 Date: Tue Jan  4 08:26:33 UTC 2011
 
 Modified Files:
   src/sys/uvm: uvm_extern.h uvm_fault.c uvm_km.c uvm_page.c
 
 Log Message:
 Add better color matching selecting free pages.  KM pages will now allocated
 so that VA and PA have the same color.  On a page fault, choose a physical
 page that has the same color as the virtual address.
 
 When allocating kernel memory pages, allow the MD to specify a preferred
 VM_FREELIST from which to choose pages.  For machines with large amounts
 of memory (> 4GB), all kernel memory to come from <4GB to reduce the amount
 of bounce buffering needed with 32bit DMA devices.
 
 
 To generate a diff of this commit:
 cvs rdiff -u -r1.167 -r1.168 src/sys/uvm/uvm_extern.h
 cvs rdiff -u -r1.178 -r1.179 src/sys/uvm/uvm_fault.c
 cvs rdiff -u -r1.106 -r1.107 src/sys/uvm/uvm_km.c
 cvs rdiff -u -r1.168 -r1.169 src/sys/uvm/uvm_page.c
 
 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.


Re: CVS commit: src/sys/uvm

2011-01-16 Thread YAMAMOTO Takashi
hi,

  (Just in case, have you read my paper?)
 
 which paper?  i guess no.
 
 http://uebayasi.dyndns.org/~uebayasi/tmp/xip.pdf

i will put it on my todo list.  thanks.

YAMAMOTO Takashi


Re: CVS commit: src/sys/uvm

2011-01-06 Thread Antti Kantee
On Thu Jan 06 2011 at 05:51:57 +, enami tsugutomo wrote:
 Module Name:  src
 Committed By: enami
 Date: Thu Jan  6 05:51:57 UTC 2011
 
 Modified Files:
   src/sys/uvm: uvm_fault.c
 
 Log Message:
 Fix bugs introduced by previous commit; allocated page needs to be bound
 with the anon, and uvmfault_anonget may be called with ufi NULL.

Thanks.  This makes automated tests work again.

-- 
älä karot toivorikkauttas, kyl rätei ja lumpui piisaa
(Finnish: don't lose your hopefulness; there will be plenty of rags and scraps)


Re: CVS commit: src/sys/uvm

2011-01-05 Thread Toru Nishimura

Masao Uebayashi uebay...@tombi.co.jp responds as;


The VIPT rule is simple;  just make sure to match colour between
VPN and PFN. 


Hmmm.  I've thought that the page colouring constraint is *should*,
not *must*.


Colour match between VPN and PFN is *must*

If the rule is not obeyed, it's possible that two or more cache indices are
selected for a given physical address.

Consider the reason why VIPT alias issue vanishes when
PAGE_SIZE is increased.   It would make the number of
colour into one  _by_matching_ low order bits (= colour
selector field) of VPN and PFN.

Toru Nishimura / ALKYL Technology
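
The colour rule above (bind a PFN to a VPN only when the low-order page-number
bits that index the cache agree) can be sketched as follows; PAGE_SHIFT and the
ncolors value in the test are illustrative, not taken from any particular port.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT  12      /* 4 KiB pages (illustrative) */

/*
 * With a VIPT cache, the cache index is taken from virtual address
 * bits above PAGE_SHIFT, so two mappings of one physical page can
 * land on different cache indices unless the "colour" bits of the
 * VPN and PFN agree.  ncolors is way_size / page_size and must be a
 * power of two here.
 */
static unsigned
va_color(uintptr_t va, unsigned ncolors)
{
    return (unsigned)((va >> PAGE_SHIFT) & (ncolors - 1));
}

static unsigned
pa_color(uint64_t pa, unsigned ncolors)
{
    return (unsigned)((pa >> PAGE_SHIFT) & (ncolors - 1));
}

/* The rule: only bind a PFN to a VPN when their colours match. */
static int
colors_match(uintptr_t va, uint64_t pa, unsigned ncolors)
{
    return va_color(va, ncolors) == pa_color(pa, ncolors);
}
```

With ncolors == 1 (the PIPT case mentioned above) every pair matches, so the
same logic degenerates to the old behaviour.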



Re: CVS commit: src/sys/uvm

2011-01-05 Thread Masao Uebayashi
On Wed, Jan 05, 2011 at 05:58:34AM +, YAMAMOTO Takashi wrote:
 hi,
 
  I take silence as no objection.
 
 the silence in this case means was-busy-for-other-things-and-forgot.
 sorry.
 
  I have no real code for this big picture at this moment.  Making
  vm_physseg available as reference is the first step.  This only
  changes uvm_page_physload() to return a pointer:
  
 -void uvm_page_physload();
 +void *uvm_page_physload();
  
  But this makes XIP pager MUCH cleaner.  The reason has been explained
  many times.
 
 because the separate uvm_page_physload_device is no longer necessary,
 you mean?  i have no problem with the step.
 
  Making fault handlers and pagers to use vm_physseg * + off_t is
  the next step, and I don't intend to work on it now.  I just want
  to explain the big picture.
  
   

Keep vm_physseg * + off_t array on stack.  If UVM objects uses
vm_page (e.g. vnode), its pager looks up vm_page -> vm_physseg *
+ off_t *once* and cache it on stack.
   
   do you mean something like this?
struct {
vm_physseg *hoge;
off_t fuga;
} foo [16];
  
  Yes.
  
  Or cache vm_page * with it, like:
  
 struct vm_id {
 vm_physseg *seg;
 off_t off;
 vm_page *pg;
 };
  
 uvm_fault()
 {
 vm_id pgs[];
 :
 }
  
  Vnode pager (genfs_getpages) takes vm_page's by looking up
  vnode::v_uobj's list, or uvn_findpages().
  
  When it returns back to fault handler, we have to lookup vm_physseg
  for each page.  Then fill the seg slot above (assuming we'll
  remove vm_page::phys_addr soon).
  
  Fault handler calls per-vm_page operations iff vm_page slot is filled.
  XIP pages are not pageq'ed.
 
 pgo_get returns either seg+off or pg for each vm_id slots?

vm_id[] is passed to pgo_get, each pgo_get (uvn_get, udv_get, ...)
will fill seg + off and return the array back to uvm_fault.

Pagers that use vm_page (e.g. vnode) will get pg from vnode, then
lookup seg+off, fill them into the vm_id slot.

Those that don't use vm_page (e.g. cdev) will get seg+off from
underlying objects, fill them into the vm_id slot, and return.

 
  XIP pages don't need vm_page, but
  cached because it's vnode.
 
 can you explain this sentence?

Sorry, it was so unclear.

XIP pages don't need vm_page (as a paging state variable),
because the pages are read-only and never involve paging.

XIP pages have to be mapped as cacheable at MMU level,
because it's vnode.

I'm allocating the full vm_page for XIP'able pages, only because
chs@ wanted to support loaning, which needs vm_page::loan_count.

 
  (Just in case, have you read my paper?)
 
 which paper?  i guess no.

http://uebayasi.dyndns.org/~uebayasi/tmp/xip.pdf

 
 YAMAMOTO Takashi

-- 
Masao Uebayashi / Tombi Inc. / Tel: +81-90-9141-4635
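
The vm_physseg * + off_t lookup sketched in this thread might look roughly
like the following in userland C.  Every structure here is a hypothetical
simplification, not the real UVM definition, and the lookup relies on the
assumption stated above: each segment's vm_page[] is allocated linearly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical, simplified versions of the structures being discussed. */
struct vm_page { int dummy; };

struct vm_physseg {
    uint64_t start, end;          /* PFN range [start, end) */
    struct vm_page *pgs;          /* vm_page[] for this segment, linear */
};

/* The proposed fault-handler cache entry: segment + offset (+ page). */
struct vm_id {
    struct vm_physseg *seg;
    off_t off;
    struct vm_page *pg;
};

/*
 * vm_page * -> (vm_physseg *, off_t) lookup: a page belongs to the
 * segment whose pgs array contains it, and its offset is simply the
 * array index.  Done once, this gives both the paddr (seg->start +
 * off) and the "managed or not" answer without a reverse lookup in
 * pmap_enter().
 */
static int
page_to_id(struct vm_physseg *segs, size_t nsegs,
    struct vm_page *pg, struct vm_id *id)
{
    for (size_t i = 0; i < nsegs; i++) {
        struct vm_physseg *seg = &segs[i];
        size_t npages = (size_t)(seg->end - seg->start);
        if (pg >= seg->pgs && pg < seg->pgs + npages) {
            id->seg = seg;
            id->off = (off_t)(pg - seg->pgs);
            id->pg = pg;
            return 1;             /* managed: found its segment */
        }
    }
    return 0;                     /* not backed by any segment */
}
```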


Re: CVS commit: src/sys/uvm

2011-01-05 Thread Toru Nishimura

Some more rambling commments;


 Colour match between VPN and PFN is *must*


There are two cases when colour match rule does matter;

1. new physical page is assigned for a given va.
   - just make sure to choose matched PFN with VPN.
2. kernel double maps another va, which may be KVA or UVA.
   - just make sure to choose a matched va.
   - for example, vmapbuf() is responsible to do it right.

Infamous SOSEND_NO_LOAN option falls into the second
case.  uipc_socket.c is apparently ignorant about KVA
colour matching.  It ends up with potential VIPT alias by
choosing conflicting KVA to make double map.

The most common usage of mmap(2) is like;

   va = mmap((void *)0, ... );

Kernel is free to choose and return va which fits at colour
boundary.   If conflicting va is given in the 1st argument,
mmap neglects the hint value and chooses a matched va
which may or may not be close to the hint.

For each port, it's just ok to tell UVM the number of colour
at boot time.  In PIPT system, the number is one, which
makes whole VM logic works just as before.   In VIPT
system, determine the number of colour from cache size
and way count to tell.  No pmap fixup code is necessary.

I would emphasize here that the whole VIPT issue lies in the MI part
of VM, which is unaware of the colour match scheme and makes
VIPT aliases by itself, not in the VM MD part.

Toru Nishimura / ALKYL Technology


Re: CVS commit: src/sys/uvm

2011-01-04 Thread Toru Nishimura

Matt Thomas modified UVM colour matching scheme;


Modified Files:
   src/sys/uvm: uvm_extern.h uvm_fault.c uvm_km.c uvm_page.c

Log Message:
Add better color matching selecting free pages.  KM pages will now allocated
so that VA and PA have the same color.  On a page fault, choose a physical
page that has the same color as the virtual address.


This change is a big forward-leap to have VIPT-safe NetBSD VM for R4000 and
modern ARM.  Combined with matched KVA selection against UVA, found in
vm_machdep.c::vmapbuf(), there remains little occasion where VM mistakenly
chooses the wrong colour combination to bind VPN and PFN.

The entire effect is to eliminate the necessity of VIPT fixup efforts in
port-specific pmap.c and ends up improving the cache efficiency to a large
degree.  This is _the intent_ behind VIPT design.  So far OS virtual memory
strategy has paid little attention to making VIPT caches work correctly.

Toru Nishimura / ALKYL Technology


Re: CVS commit: src/sys/uvm

2011-01-04 Thread Michael Graff
On 1/4/11 2:26 AM, Matt Thomas wrote:
 Module Name:  src
 Committed By: matt
 Date: Tue Jan  4 08:26:33 UTC 2011
 
 Modified Files:
   src/sys/uvm: uvm_extern.h uvm_fault.c uvm_km.c uvm_page.c
 
 Log Message:
...
 When allocating kernel memory pages, allow the MD to specify a preferred
 VM_FREELIST from which to choose pages.  For machines with large amounts
 of memory (> 4GB), all kernel memory to come from <4GB to reduce the amount
 of bounce buffering needed with 32bit DMA devices.

Sorry to be clueless here, but does this mean I lose my large, automatic
disk cache on my 32GB machine, or is this just for other types of memory
purposes?

--Michael


Re: CVS commit: src/sys/uvm

2011-01-04 Thread Matt Thomas

On Jan 4, 2011, at 2:19 AM, Michael Graff wrote:

 On 1/4/11 2:26 AM, Matt Thomas wrote:
 Module Name: src
 Committed By:matt
 Date:Tue Jan  4 08:26:33 UTC 2011
 
 Modified Files:
  src/sys/uvm: uvm_extern.h uvm_fault.c uvm_km.c uvm_page.c
 
 Log Message:
 ...
 When allocating kernel memory pages, allow the MD to specify a preferred
 VM_FREELIST from which to choose pages.  For machines with large amounts
 of memory (> 4GB), all kernel memory to come from <4GB to reduce the amount
 of bounce buffering needed with 32bit DMA devices.
 
 Sorry to be clueless here, but does this mean I lose my large, automatic
 disk cache on my 32GB machine, or is this just for other types of memory
 purposes?

Not really.  A lot of devices can only do 32bit DMA transfers, so without some
type of assistance (like the alpha has) you are restricted to DMA to the first
4GB of RAM.  If you are doing DMA to an address >= 4GB, the system will stage
the data in a buffer < 4GB for the DMA and then move the data to its ultimate
destination (for a read).  For a write it copies to the bounce buffer, and
then does a DMA.

The point is if you are allocating a mbuf or a usb buffer, it makes sense to 
allocate from the first 4GB if you can to avoid using bounce buffers.

It's just a preference.
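
The "preference" can be sketched as a freelist chooser; the names below are
hypothetical and only illustrate the idea, not the actual uvm_page.c freelist
logic.

```c
#include <assert.h>

/* Illustrative freelists, in preference order for DMA-able memory. */
enum freelist { FL_BELOW_4G, FL_DEFAULT, FL_COUNT };

struct freelist_state {
    unsigned free_pages[FL_COUNT];
};

/*
 * Pick a freelist for an allocation.  If the caller prefers DMA-able
 * memory (e.g. an mbuf or a USB buffer), take it from the <4GB list
 * while pages remain there; otherwise fall back to the default list.
 * It is only a preference: nothing here reserves the <4GB list
 * exclusively for DMA use.
 */
static enum freelist
choose_freelist(const struct freelist_state *st, int want_dmaable)
{
    if (want_dmaable && st->free_pages[FL_BELOW_4G] > 0)
        return FL_BELOW_4G;
    return FL_DEFAULT;
}
```

This also shows Manuel's concern below: non-DMA allocations that pass
want_dmaable can still drain the <4GB list, since it is never reserved.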

Re: CVS commit: src/sys/uvm

2011-01-04 Thread Manuel Bouyer
On Tue, Jan 04, 2011 at 02:32:02AM -0800, Matt Thomas wrote:
 Not really.  A lot of device can only do 32bit DMA transfers so without some 
 type assistance (like the alpha has) you are restricted to DMA to the first 
 4GB of RAM.  If you are doing DMA to an address >= 4GB, the system will stage
 the data in a buffer < 4GB for the DMA and then move the data to its ultimate
 destination (for a read).  For a write it copies to the bounce buffer, and 
 then does a DMA.
 
 The point is if you are allocating a mbuf or a usb buffer, it makes sense to 
 allocate from the first 4GB if you can to avoid using bounce buffers.
 
 It's just a preference.

Is there some provision to always keep some <4GB pages free (with some being
quite large for e.g. USB descriptors) so there is always some bounce buffers
available ? AFAIK, if you allocate from the <4Gb free list, you can eat all
DMA-capable RAM.
I fixed this issue in arch/x86/x86/x86_machdep.c 1.37, I wonder if your change
has reintroduced this problem ...

-- 
Manuel Bouyer bou...@antioche.eu.org
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: CVS commit: src/sys/uvm

2011-01-04 Thread Masao Uebayashi
On Tue, Jan 04, 2011 at 06:25:12PM +0900, Toru Nishimura wrote:
 Matt Thomas modified UVM colour matching scheme;
 
 Modified Files:
src/sys/uvm: uvm_extern.h uvm_fault.c uvm_km.c uvm_page.c
 
 Log Message:
 Add better color matching selecting free pages.  KM pages will now allocated
 so that VA and PA have the same color.  On a page fault, choose a physical
 page that has the same color as the virtual address.
 
 This change is a big forward-leap to have VIPT-safe NetBSD VM for R4000 and
 modern ARM.  Combined with matched KVA selection against UVA, found in
 vm_machdep.c::vmapbuf(), there remains little occasion where VM chooses
 mistakenly wrong colour combination to bind VPN and PFN.
 
 The entire effect is to eliminate the necessity of VIPT fixup efforts in 
 port-specific
 pmap.c and ends up with improving the cache efficiency in large degree.  This
 is _the intent_ behind VIPT design.  So far OS virtual memory strategy paid 
 little
 attention to make VIPT cache work correctly.

It'd be nice if we can really eliminate VIPT fixup *code* in pmaps.

And I *think* this is possible if we don't remap pages using
pmap_kenter_pa().


Re: CVS commit: src/sys/uvm

2011-01-04 Thread Matt Thomas

On Jan 4, 2011, at 4:57 AM, Manuel Bouyer wrote:

 On Tue, Jan 04, 2011 at 02:32:02AM -0800, Matt Thomas wrote:
 Not really.  A lot of device can only do 32bit DMA transfers so without some 
 type assistance (like the alpha has) you are restricted to DMA to the first 
  4GB of RAM.  If you are doing DMA to an address >= 4GB, the system will
  stage the data in a buffer < 4GB for the DMA and then move the data to its
 ultimate destination (for a read).  For a write it copies to the bounce 
 buffer, and then does a DMA.
 
 The point is if you are allocating a mbuf or a usb buffer, it makes sense to 
 allocate from the first 4GB if you can to avoid using bounce buffers.
 
 It's just a preference.
 
  Is there some provision to always keep some <4GB pages free (with some being
  quite large for e.g. USB descriptors) so there is always some bounce buffers
  available ? AFAIK, if you allocate from the <4Gb free list, you can eat all
 DMA-capable RAM.
 I fixed this issue in arch/x86/x86/x86_machdep.c 1.37, I wonder if your change
 has reintroduced this problem ...

Since it's not turned on by default, I doubt it.



Re: CVS commit: src/sys/uvm

2011-01-04 Thread Toru Nishimura

Masao Uebayashi uebay...@tombi.co.jp responds as;


The entire effect is to eliminate the necessity of VIPT fixup efforts in
port-specific pmap.c and ends up with improving the cache effeciency
in large degree.  This is _the intent_ behind VIPT design.  So far OS
virtual memory strategy paid little attention to make VIPT cache work
correctly.


It'd be nice if we can really eliminate VIPT fixup *code* in pmaps.

And I *think* this is possible if we don't remap pages using
pmap_kenter_pa().


The VIPT rule is simple;  just make sure to match colour between
VPN and PFN.  It's the mandatory programming practice.  This rule
is clearly documented in _many_ CPU architecture documents with
VIPT cache.

- mmap(2) must refuse to map mixed colour VA.
- dynamic linker should pay attention to map share library at
 colour boundary.  This ensure cache lines shared between
 processes.

Toru Nishimura / ALKYL Technology


Re: CVS commit: src/sys/uvm

2011-01-04 Thread Matt Thomas

On Jan 4, 2011, at 8:33 AM, Manuel Bouyer wrote:

 On Tue, Jan 04, 2011 at 08:03:33AM -0800, Matt Thomas wrote:
 I fixed this issue in arch/x86/x86/x86_machdep.c 1.37, I wonder if your 
 change
 has reintroduced this problem ...
 
 Since it's not turned on by default, I doubt it.
 
 I guess if you wrote it, you expect to use it for something :)
 in which cases do you intend to turn it on ? A kernel config option,
 or for some ports ?

I turn it on for MIPS.

 BTW, MIPS has separate freelists for <512MB, <4GB, and then everything else.

Re: CVS commit: src/sys/uvm

2011-01-04 Thread Manuel Bouyer
On Tue, Jan 04, 2011 at 08:35:48AM -0800, Matt Thomas wrote:
 
 On Jan 4, 2011, at 8:33 AM, Manuel Bouyer wrote:
 
  On Tue, Jan 04, 2011 at 08:03:33AM -0800, Matt Thomas wrote:
  I fixed this issue in arch/x86/x86/x86_machdep.c 1.37, I wonder if your 
  change
  has reintroduced this problem ...
  
  Since it's not turned on by default, I doubt it.
  
  I guess if you wrote it, you expect to use it for something :)
  in which cases do you intend to turn it on ? A kernel config option,
  or for some ports ?
 
 I turn it on for MIPS.
 
  BTW, MIPS has separate freelists for <512MB, <4GB, and then everything else.

OK, so the question remains for mips :) how do you prevent kernel memory
from stealing all DMA-capable memory ?
Or do maybe you expect to always have something free in the 512MB list ?
BTW, x86 has <16MB, <4GB and everything else. the <16MB list was not enough to
cover the bus_dma(9) needs. Maybe the <512MB list will, provided it's not
already starved by other needs.

-- 
Manuel Bouyer bou...@antioche.eu.org
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: CVS commit: src/sys/uvm

2011-01-04 Thread YAMAMOTO Takashi
hi,

 I take silence as no objection.

the silence in this case means was-busy-for-other-things-and-forgot.
sorry.

 I have no real code for this big picture at this moment.  Making
 vm_physseg available as reference is the first step.  This only
 changes uvm_page_physload() to return a pointer:
 
  -void uvm_page_physload();
  +void *uvm_page_physload();
 
 But this makes XIP pager MUCH cleaner.  The reason has been explained
 many times.

because the separate uvm_page_physload_device is no longer necessary,
you mean?  i have no problem with the step.

 Making fault handlers and pagers to use vm_physseg * + off_t is
 the next step, and I don't intend to work on it now.  I just want
 to explain the big picture.
 
  
   
   Keep vm_physseg * + off_t array on stack.  If UVM objects uses
    vm_page (e.g. vnode), its pager looks up vm_page -> vm_physseg *
   + off_t *once* and cache it on stack.
  
  do you mean something like this?
 struct {
 vm_physseg *hoge;
 off_t fuga;
 } foo [16];
 
 Yes.
 
 Or cache vm_page * with it, like:
 
  struct vm_id {
  vm_physseg *seg;
  off_t off;
  vm_page *pg;
  };
 
  uvm_fault()
  {
  vm_id pgs[];
  :
  }
 
 Vnode pager (genfs_getpages) takes vm_page's by looking up
 vnode::v_uobj's list, or uvn_findpages().
 
 When it returns back to fault handler, we have to lookup vm_physseg
 for each page.  Then fill the seg slot above (assuming we'll
 remove vm_page::phys_addr soon).
 
 Fault handler calls per-vm_page operations iff vm_page slot is filled.
 XIP pages are not pageq'ed.

pgo_get returns either seg+off or pg for each vm_id slots?

 XIP pages don't need vm_page, but
 cached because it's vnode.

can you explain this sentence?

 (Just in case, have you read my paper?)

which paper?  i guess no.

YAMAMOTO Takashi


Re: CVS commit: src/sys/uvm

2011-01-03 Thread Masao Uebayashi
I take silence as no objection.

On Thu, Dec 23, 2010 at 12:48:04PM +0900, Masao Uebayashi wrote:
 On Wed, Dec 22, 2010 at 05:37:57AM +, YAMAMOTO Takashi wrote:
  hi,
  
   Could you ack this discussion?
  
  sorry for dropping a ball.
  
   
   On Tue, Dec 07, 2010 at 01:19:46AM +0900, Masao Uebayashi wrote:
   On Thu, Nov 25, 2010 at 11:32:39PM +, YAMAMOTO Takashi wrote:
[ adding cc: tech-kern@ ]

hi,

 On Wed, Nov 24, 2010 at 11:26:39PM -0800, Matt Thomas wrote:
 
 On Nov 24, 2010, at 10:47 PM, Masao Uebayashi wrote:
 
  On Thu, Nov 25, 2010 at 05:44:21AM +, YAMAMOTO Takashi wrote:
  hi,
  
  On Thu, Nov 25, 2010 at 04:18:25AM +, YAMAMOTO Takashi 
  wrote:
  hi,
  
  Hi, thanks for review.
  
  On Thu, Nov 25, 2010 at 01:58:04AM +, YAMAMOTO Takashi 
  wrote:
  hi,
  
  - what's VM_PHYSSEG_OP_PG?
  
  It's to lookup vm_physseg by struct vm_page *, relying on 
  that
  struct vm_page *[] is allocated linearly.  It'll be used to 
  remove
  vm_page::phys_addr as we talked some time ago.
  
  i'm not sure if commiting this unused uncommented code now 
  helps it.
  some try-and-benchmark cycles might be necessary given that
  vm_page -> paddr conversion could be performance critical.
  
  If you really care performance, we can directly pass struct 
  vm_page
  * to pmap_enter().
  
   We're doing struct vm_page * -> paddr_t just before pmap_enter(),
   then doing paddr_t -> vm_physseg reverse lookup again in
   pmap_enter() to check if a given PA is managed.  What is really
   needed here is, to lookup struct vm_page * -> vm_physseg once
  and you'll know both paddr_t and managed or not.
  
  i agree that the current code is not ideal in that respect.
  otoh, i'm not sure if passing vm_physseg around is a good idea.
  
  It's great you share the interest.
  
  I chose vm_physseg, because it was there.  I'm open to 
  alternatives,
  but I don't think you have many options...
 
 Passing vm_page * doesn't work if the page isn't managed since there
 won't be a vm_page for the paddr_t.
 
 Now passing both paddr_t and vm_page * would work and if the pointer
 to the vm_page, it would be an unmanaged mapping.  This also gives 
 the
 access to mdpg without another lookup.
 
 What if XIP'ed md(4), where physical pages are in .data (or .rodata)?
 
 And don't forget that you're the one who first pointed out that
 allocating vm_pages for XIP is a pure waste of memory. ;)

i guess matt meant "if the pointer to the vm_page is NULL".

 
 I'm allocating vm_pages, only because of phys_addr and loan_count.
 I believe vm_pages is unnecessary for read-only XIP segments.
 Because they're read-only, and stateless.
 
 I've already concluded that the current managed or not model
 doesn't work for XIP.  I'm pretty sure that my vm_physseg + off_t
 model can explain everything.  I'm rather waiting for a counter
 example how vm_physseg doesn't work...

i guess your suggestion is too vague.
where do you want to use vm_physseg * + off_t instead of vm_page * ?
getpages, pmap_enter, and?  how their function prototypes would be?
   
   The basic idea is straightforward; always allocate vm_physseg for
   memories/devices.  If a vm_physseg is used as general purpose
   memory, you allocate vm_page[] (as vm_physseg::pgs).  If it's
   potentially mapped as cached, you allocate pvh (as vm_physseg:pvh).
  
  can you provide function prototypes?
 
 I have no real code for this big picture at this moment.  Making
 vm_physseg available as reference is the first step.  This only
 changes uvm_page_physload() to return a pointer:
 
   -void uvm_page_physload();
   +void *uvm_page_physload();
 
 But this makes XIP pager MUCH cleaner.  The reason has been explained
 many times.
 
 Making fault handlers and pagers to use vm_physseg * + off_t is
 the next step, and I don't intend to work on it now.  I just want
 to explain the big picture.
 
  
   
   Keep vm_physseg * + off_t array on stack.  If UVM objects uses
    vm_page (e.g. vnode), its pager looks up vm_page -> vm_physseg *
   + off_t *once* and cache it on stack.
  
  do you mean something like this?
  struct {
  vm_physseg *hoge;
  off_t fuga;
  } foo [16];
 
 Yes.
 
 Or cache vm_page * with it, like:
 
   struct vm_id {
   vm_physseg *seg;
   off_t off;
   vm_page *pg;
   };
 
   uvm_fault()
   {
   vm_id pgs[];
   :
   }
 
 Vnode pager (genfs_getpages) takes vm_page's by looking up
 vnode::v_uobj's list, or uvn_findpages().
 
 When it returns back to fault handler, we have to lookup vm_physseg
 for each page.  Then fill the seg slot 

Re: CVS commit: src/sys/uvm

2010-12-22 Thread Masao Uebayashi
On Wed, Dec 22, 2010 at 05:37:57AM +, YAMAMOTO Takashi wrote:
 hi,
 
  Could you ack this discussion?
 
 sorry for dropping a ball.
 
  
  On Tue, Dec 07, 2010 at 01:19:46AM +0900, Masao Uebayashi wrote:
  On Thu, Nov 25, 2010 at 11:32:39PM +, YAMAMOTO Takashi wrote:
   [ adding cc: tech-kern@ ]
   
   hi,
   
On Wed, Nov 24, 2010 at 11:26:39PM -0800, Matt Thomas wrote:

On Nov 24, 2010, at 10:47 PM, Masao Uebayashi wrote:

 On Thu, Nov 25, 2010 at 05:44:21AM +, YAMAMOTO Takashi wrote:
 hi,
 
 On Thu, Nov 25, 2010 at 04:18:25AM +, YAMAMOTO Takashi wrote:
 hi,
 
 Hi, thanks for review.
 
 On Thu, Nov 25, 2010 at 01:58:04AM +, YAMAMOTO Takashi 
 wrote:
 hi,
 
 - what's VM_PHYSSEG_OP_PG?
 
 It's to lookup vm_physseg by struct vm_page *, relying on that
 struct vm_page *[] is allocated linearly.  It'll be used to 
 remove
 vm_page::phys_addr as we talked some time ago.
 
 i'm not sure if commiting this unused uncommented code now helps 
 it.
 some try-and-benchmark cycles might be necessary given that
  vm_page -> paddr conversion could be performance critical.
 
 If you really care performance, we can directly pass struct 
 vm_page
 * to pmap_enter().
 
  We're doing struct vm_page * -> paddr_t just before pmap_enter(),
  then doing paddr_t -> vm_physseg reverse lookup again in
  pmap_enter() to check if a given PA is managed.  What is really
  needed here is, to lookup struct vm_page * -> vm_physseg once
 and you'll know both paddr_t and managed or not.
 
 i agree that the current code is not ideal in that respect.
 otoh, i'm not sure if passing vm_physseg around is a good idea.
 
 It's great you share the interest.
 
 I chose vm_physseg, because it was there.  I'm open to alternatives,
 but I don't think you have many options...

Passing vm_page * doesn't work if the page isn't managed since there
won't be a vm_page for the paddr_t.

Now passing both paddr_t and vm_page * would work and if the pointer
to the vm_page, it would be an unmanaged mapping.  This also gives the
access to mdpg without another lookup.

What if XIP'ed md(4), where physical pages are in .data (or .rodata)?

And don't forget that you're the one who first pointed out that
allocating vm_pages for XIP is a pure waste of memory. ;)
   
    i guess matt meant "if the pointer to the vm_page is NULL".
   

I'm allocating vm_pages, only because of phys_addr and loan_count.
I believe vm_pages is unnecessary for read-only XIP segments.
Because they're read-only, and stateless.

I've already concluded that the current managed or not model
doesn't work for XIP.  I'm pretty sure that my vm_physseg + off_t
model can explain everything.  I'm rather waiting for a counter
example how vm_physseg doesn't work...
   
   i guess your suggestion is too vague.
   where do you want to use vm_physseg * + off_t instead of vm_page * ?
   getpages, pmap_enter, and?  how their function prototypes would be?
  
  The basic idea is straightforward; always allocate vm_physseg for
  memories/devices.  If a vm_physseg is used as general purpose
  memory, you allocate vm_page[] (as vm_physseg::pgs).  If it's
  potentially mapped as cached, you allocate pvh (as vm_physseg:pvh).
 
 can you provide function prototypes?

I have no real code for this big picture at this moment.  Making
vm_physseg available as reference is the first step.  This only
changes uvm_page_physload() to return a pointer:

-void uvm_page_physload();
+void *uvm_page_physload();

But this makes XIP pager MUCH cleaner.  The reason has been explained
many times.

Making fault handlers and pagers to use vm_physseg * + off_t is
the next step, and I don't intend to work on it now.  I just want
to explain the big picture.

 
  
  Keep vm_physseg * + off_t array on stack.  If UVM objects uses
  vm_page (e.g. vnode), its pager looks up vm_page -> vm_physseg *
  + off_t *once* and cache it on stack.
 
 do you mean something like this?
   struct {
   vm_physseg *hoge;
   off_t fuga;
   } foo [16];

Yes.

Or cache vm_page * with it, like:

struct vm_id {
vm_physseg *seg;
off_t off;
vm_page *pg;
};

uvm_fault()
{
vm_id pgs[];
:
}

Vnode pager (genfs_getpages) takes vm_page's by looking up
vnode::v_uobj's list, or uvn_findpages().

When it returns back to fault handler, we have to lookup vm_physseg
for each page.  Then fill the seg slot above (assuming we'll
remove vm_page::phys_addr soon).

Fault handler calls per-vm_page operations iff vm_page slot is filled.
XIP pages are not pageq'ed.  XIP pages don't need vm_page, but
cached because it's vnode.

(Just in case, have you read 

Re: CVS commit: src/sys/uvm

2010-12-21 Thread Masao Uebayashi
Could you ack this discussion?

On Tue, Dec 07, 2010 at 01:19:46AM +0900, Masao Uebayashi wrote:
 On Thu, Nov 25, 2010 at 11:32:39PM +, YAMAMOTO Takashi wrote:
  [ adding cc: tech-kern@ ]
  
  hi,
  
   On Wed, Nov 24, 2010 at 11:26:39PM -0800, Matt Thomas wrote:
   
   On Nov 24, 2010, at 10:47 PM, Masao Uebayashi wrote:
   
On Thu, Nov 25, 2010 at 05:44:21AM +, YAMAMOTO Takashi wrote:
hi,

On Thu, Nov 25, 2010 at 04:18:25AM +, YAMAMOTO Takashi wrote:
hi,

Hi, thanks for review.

On Thu, Nov 25, 2010 at 01:58:04AM +, YAMAMOTO Takashi wrote:
hi,

- what's VM_PHYSSEG_OP_PG?

It's to lookup vm_physseg by struct vm_page *, relying on that
struct vm_page *[] is allocated linearly.  It'll be used to 
remove
vm_page::phys_addr as we talked some time ago.

i'm not sure if committing this unused uncommented code now helps it.
some try-and-benchmark cycles might be necessary given that
vm_page -> paddr conversion could be performance critical.

If you really care performance, we can directly pass struct vm_page
* to pmap_enter().

We're doing struct vm_page * -> paddr_t just before pmap_enter(),
then doing paddr_t -> vm_physseg reverse lookup again in
pmap_enter() to check if a given PA is managed.  What is really
needed here is, to lookup struct vm_page * -> vm_physseg once
and you'll know both paddr_t and managed or not.

i agree that the current code is not ideal in that respect.
otoh, i'm not sure if passing vm_physseg around is a good idea.

It's great you share the interest.

I chose vm_physseg, because it was there.  I'm open to alternatives,
but I don't think you have many options...
   
   Passing vm_page * doesn't work if the page isn't managed since there
   won't be a vm_page for the paddr_t.
   
   Now passing both paddr_t and vm_page * would work and if the pointer
   to the vm_page, it would be an unmanaged mapping.  This also gives the
   access to mdpg without another lookup.
   
   What if XIP'ed md(4), where physical pages are in .data (or .rodata)?
   
   And don't forget that you're the one who first pointed out that
   allocating vm_pages for XIP is a pure waste of memory. ;)
  
  i guess matt meant if the pointer to the vm_page is NULL,.
  
   
   I'm allocating vm_pages, only because of phys_addr and loan_count.
   I believe vm_pages is unnecessary for read-only XIP segments.
   Because they're read-only, and stateless.
   
   I've already concluded that the current "managed or not" model
   doesn't work for XIP.  I'm pretty sure that my vm_physseg + off_t
   model can explain everything.  I'm rather waiting for a counter
   example how vm_physseg doesn't work...
  
  i guess your suggestion is too vague.
  where do you want to use vm_physseg * + off_t instead of vm_page * ?
  getpages, pmap_enter, and?  how their function prototypes would be?
 
 The basic idea is straightforward; always allocate vm_physseg for
 memories/devices.  If a vm_physseg is used as general purpose
 memory, you allocate vm_page[] (as vm_physseg::pgs).  If it's
 potentially mapped as cached, you allocate pvh (as vm_physseg::pvh).
 
 Keep a vm_physseg * + off_t array on the stack.  If a UVM object uses
 vm_page (e.g. vnode), its pager looks up vm_page -> vm_physseg *
 + off_t *once* and caches it on the stack.
 
  any valid paddr_t value will belong to exactly one vm_physseg?
 
 That's the idea.  This would clarify mem(4) backend too.
 
 Note that allocating vm_physseg for device segments is cheap.


Re: CVS commit: src/sys/uvm

2010-12-21 Thread YAMAMOTO Takashi
hi,

 Could you ack this discussion?

sorry for dropping a ball.

 
 On Tue, Dec 07, 2010 at 01:19:46AM +0900, Masao Uebayashi wrote:
 On Thu, Nov 25, 2010 at 11:32:39PM +, YAMAMOTO Takashi wrote:
  [ adding cc: tech-kern@ ]
  
  hi,
  
   On Wed, Nov 24, 2010 at 11:26:39PM -0800, Matt Thomas wrote:
   
   On Nov 24, 2010, at 10:47 PM, Masao Uebayashi wrote:
   
On Thu, Nov 25, 2010 at 05:44:21AM +, YAMAMOTO Takashi wrote:
hi,

On Thu, Nov 25, 2010 at 04:18:25AM +, YAMAMOTO Takashi wrote:
hi,

Hi, thanks for review.

On Thu, Nov 25, 2010 at 01:58:04AM +, YAMAMOTO Takashi wrote:
hi,

- what's VM_PHYSSEG_OP_PG?

It's to lookup vm_physseg by struct vm_page *, relying on that
struct vm_page *[] is allocated linearly.  It'll be used to 
remove
vm_page::phys_addr as we talked some time ago.

i'm not sure if committing this unused uncommented code now helps
it.
some try-and-benchmark cycles might be necessary given that
vm_page -> paddr conversion could be performance critical.

If you really care performance, we can directly pass struct vm_page
* to pmap_enter().

We're doing struct vm_page * -> paddr_t just before
pmap_enter(),
then doing paddr_t -> vm_physseg reverse lookup again in
pmap_enter() to check if a given PA is managed.  What is really
needed here is, to lookup struct vm_page * -> vm_physseg once
and you'll know both paddr_t and managed or not.

i agree that the current code is not ideal in that respect.
otoh, i'm not sure if passing vm_physseg around is a good idea.

It's great you share the interest.

I chose vm_physseg, because it was there.  I'm open to alternatives,
but I don't think you have many options...
   
   Passing vm_page * doesn't work if the page isn't managed since there
   won't be a vm_page for the paddr_t.
   
   Now passing both paddr_t and vm_page * would work and if the pointer
   to the vm_page, it would be an unmanaged mapping.  This also gives the
   access to mdpg without another lookup.
   
   What if XIP'ed md(4), where physical pages are in .data (or .rodata)?
   
   And don't forget that you're the one who first pointed out that
   allocating vm_pages for XIP is a pure waste of memory. ;)
  
  i guess matt meant if the pointer to the vm_page is NULL,.
  
   
   I'm allocating vm_pages, only because of phys_addr and loan_count.
   I believe vm_pages is unnecessary for read-only XIP segments.
   Because they're read-only, and stateless.
   
   I've already concluded that the current "managed or not" model
   doesn't work for XIP.  I'm pretty sure that my vm_physseg + off_t
   model can explain everything.  I'm rather waiting for a counter
   example how vm_physseg doesn't work...
  
  i guess your suggestion is too vague.
  where do you want to use vm_physseg * + off_t instead of vm_page * ?
  getpages, pmap_enter, and?  how their function prototypes would be?
 
 The basic idea is straightforward; always allocate vm_physseg for
 memories/devices.  If a vm_physseg is used as general purpose
 memory, you allocate vm_page[] (as vm_physseg::pgs).  If it's
 potentially mapped as cached, you allocate pvh (as vm_physseg::pvh).

can you provide function prototypes?

 
 Keep a vm_physseg * + off_t array on the stack.  If a UVM object uses
 vm_page (e.g. vnode), its pager looks up vm_page -> vm_physseg *
 + off_t *once* and caches it on the stack.

do you mean something like this?
struct {
vm_physseg *hoge;
off_t fuga;
} foo [16];

YAMAMOTO Takashi

 
  any valid paddr_t value will belong to exactly one vm_physseg?
 
 That's the idea.  This would clarify mem(4) backend too.
 
 Note that allocating vm_physseg for device segments is cheap.


Re: CVS commit: src/sys/uvm

2010-12-06 Thread Masao Uebayashi
On Thu, Nov 25, 2010 at 11:32:39PM +, YAMAMOTO Takashi wrote:
 [ adding cc: tech-kern@ ]
 
 hi,
 
  On Wed, Nov 24, 2010 at 11:26:39PM -0800, Matt Thomas wrote:
  
  On Nov 24, 2010, at 10:47 PM, Masao Uebayashi wrote:
  
   On Thu, Nov 25, 2010 at 05:44:21AM +, YAMAMOTO Takashi wrote:
   hi,
   
   On Thu, Nov 25, 2010 at 04:18:25AM +, YAMAMOTO Takashi wrote:
   hi,
   
   Hi, thanks for review.
   
   On Thu, Nov 25, 2010 at 01:58:04AM +, YAMAMOTO Takashi wrote:
   hi,
   
   - what's VM_PHYSSEG_OP_PG?
   
   It's to lookup vm_physseg by struct vm_page *, relying on that
   struct vm_page *[] is allocated linearly.  It'll be used to remove
   vm_page::phys_addr as we talked some time ago.
   
   i'm not sure if committing this unused uncommented code now helps it.
   some try-and-benchmark cycles might be necessary given that
   vm_page -> paddr conversion could be performance critical.
   
   If you really care performance, we can directly pass struct vm_page
   * to pmap_enter().
   
   We're doing struct vm_page * -> paddr_t just before pmap_enter(),
   then doing paddr_t -> vm_physseg reverse lookup again in
   pmap_enter() to check if a given PA is managed.  What is really
   needed here is, to lookup struct vm_page * -> vm_physseg once
   and you'll know both paddr_t and managed or not.
   
   i agree that the current code is not ideal in that respect.
   otoh, i'm not sure if passing vm_physseg around is a good idea.
   
   It's great you share the interest.
   
   I chose vm_physseg, because it was there.  I'm open to alternatives,
   but I don't think you have many options...
  
  Passing vm_page * doesn't work if the page isn't managed since there
  won't be a vm_page for the paddr_t.
  
  Now passing both paddr_t and vm_page * would work and if the pointer
  to the vm_page, it would be an unmanaged mapping.  This also gives the
  access to mdpg without another lookup.
  
  What if XIP'ed md(4), where physical pages are in .data (or .rodata)?
  
  And don't forget that you're the one who first pointed out that
  allocating vm_pages for XIP is a pure waste of memory. ;)
 
 i guess matt meant if the pointer to the vm_page is NULL,.
 
  
  I'm allocating vm_pages, only because of phys_addr and loan_count.
  I believe vm_pages is unnecessary for read-only XIP segments.
  Because they're read-only, and stateless.
  
  I've already concluded that the current "managed or not" model
  doesn't work for XIP.  I'm pretty sure that my vm_physseg + off_t
  model can explain everything.  I'm rather waiting for a counter
  example how vm_physseg doesn't work...
 
 i guess your suggestion is too vague.
 where do you want to use vm_physseg * + off_t instead of vm_page * ?
 getpages, pmap_enter, and?  how their function prototypes would be?

The basic idea is straightforward; always allocate vm_physseg for
memories/devices.  If a vm_physseg is used as general purpose
memory, you allocate vm_page[] (as vm_physseg::pgs).  If it's
potentially mapped as cached, you allocate pvh (as vm_physseg::pvh).

Keep a vm_physseg * + off_t array on the stack.  If a UVM object uses
vm_page (e.g. vnode), its pager looks up vm_page -> vm_physseg *
+ off_t *once* and caches it on the stack.

 any valid paddr_t value will belong to exactly one vm_physseg?

That's the idea.  This would clarify mem(4) backend too.

Note that allocating vm_physseg for device segments is cheap.


Re: CVS commit: src/sys/uvm

2010-11-25 Thread Masao Uebayashi
On Wed, Nov 24, 2010 at 11:26:39PM -0800, Matt Thomas wrote:
 
 On Nov 24, 2010, at 10:47 PM, Masao Uebayashi wrote:
 
  On Thu, Nov 25, 2010 at 05:44:21AM +, YAMAMOTO Takashi wrote:
  hi,
  
  On Thu, Nov 25, 2010 at 04:18:25AM +, YAMAMOTO Takashi wrote:
  hi,
  
  Hi, thanks for review.
  
  On Thu, Nov 25, 2010 at 01:58:04AM +, YAMAMOTO Takashi wrote:
  hi,
  
  - what's VM_PHYSSEG_OP_PG?
  
  It's to lookup vm_physseg by struct vm_page *, relying on that
  struct vm_page *[] is allocated linearly.  It'll be used to remove
  vm_page::phys_addr as we talked some time ago.
  
  i'm not sure if committing this unused uncommented code now helps it.
  some try-and-benchmark cycles might be necessary given that
  vm_page -> paddr conversion could be performance critical.
  
  If you really care performance, we can directly pass struct vm_page
  * to pmap_enter().
  
  We're doing struct vm_page * -> paddr_t just before pmap_enter(),
  then doing paddr_t -> vm_physseg reverse lookup again in
  pmap_enter() to check if a given PA is managed.  What is really
  needed here is, to lookup struct vm_page * -> vm_physseg once
  and you'll know both paddr_t and managed or not.
  
  i agree that the current code is not ideal in that respect.
  otoh, i'm not sure if passing vm_physseg around is a good idea.
  
  It's great you share the interest.
  
  I chose vm_physseg, because it was there.  I'm open to alternatives,
  but I don't think you have many options...
 
 Passing vm_page * doesn't work if the page isn't managed since there
 won't be a vm_page for the paddr_t.
 
 Now passing both paddr_t and vm_page * would work and if the pointer
 to the vm_page, it would be an unmanaged mapping.  This also gives the
 access to mdpg without another lookup.

What if XIP'ed md(4), where physical pages are in .data (or .rodata)?

And don't forget that you're the one who first pointed out that
allocating vm_pages for XIP is a pure waste of memory. ;)

I'm allocating vm_pages, only because of phys_addr and loan_count.
I believe vm_pages is unnecessary for read-only XIP segments.
Because they're read-only, and stateless.

I've already concluded that the current "managed or not" model
doesn't work for XIP.  I'm pretty sure that my vm_physseg + off_t
model can explain everything.  I'm rather waiting for a counter
example how vm_physseg doesn't work...


Re: CVS commit: src/sys/uvm

2010-11-25 Thread Nick Hudson
On Thursday 25 November 2010 04:45:31 Masao Uebayashi wrote:
 Module Name:  src
 Committed By: uebayasi
 Date: Thu Nov 25 04:45:30 UTC 2010
 
 Modified Files:
   src/sys/uvm: uvm_page.c uvm_page.h uvm_pglist.c
 
 Log Message:
 Revert vm_physseg allocation changes.  A report says that it causes
 panics when used with mplayer in heavy load.

I guess you didn't mean to delete VM_PAGE_TO_MD

Nick


Re: CVS commit: src/sys/uvm

2010-11-25 Thread YAMAMOTO Takashi
[ adding cc: tech-kern@ ]

hi,

 On Wed, Nov 24, 2010 at 11:26:39PM -0800, Matt Thomas wrote:
 
 On Nov 24, 2010, at 10:47 PM, Masao Uebayashi wrote:
 
  On Thu, Nov 25, 2010 at 05:44:21AM +, YAMAMOTO Takashi wrote:
  hi,
  
  On Thu, Nov 25, 2010 at 04:18:25AM +, YAMAMOTO Takashi wrote:
  hi,
  
  Hi, thanks for review.
  
  On Thu, Nov 25, 2010 at 01:58:04AM +, YAMAMOTO Takashi wrote:
  hi,
  
  - what's VM_PHYSSEG_OP_PG?
  
  It's to lookup vm_physseg by struct vm_page *, relying on that
  struct vm_page *[] is allocated linearly.  It'll be used to remove
  vm_page::phys_addr as we talked some time ago.
  
  i'm not sure if committing this unused uncommented code now helps it.
  some try-and-benchmark cycles might be necessary given that
  vm_page -> paddr conversion could be performance critical.
  
  If you really care performance, we can directly pass struct vm_page
  * to pmap_enter().
  
  We're doing struct vm_page * -> paddr_t just before pmap_enter(),
  then doing paddr_t -> vm_physseg reverse lookup again in
  pmap_enter() to check if a given PA is managed.  What is really
  needed here is, to lookup struct vm_page * -> vm_physseg once
  and you'll know both paddr_t and managed or not.
  
  i agree that the current code is not ideal in that respect.
  otoh, i'm not sure if passing vm_physseg around is a good idea.
  
  It's great you share the interest.
  
  I chose vm_physseg, because it was there.  I'm open to alternatives,
  but I don't think you have many options...
 
 Passing vm_page * doesn't work if the page isn't managed since there
 won't be a vm_page for the paddr_t.
 
 Now passing both paddr_t and vm_page * would work and if the pointer
 to the vm_page, it would be an unmanaged mapping.  This also gives the
 access to mdpg without another lookup.
 
 What if XIP'ed md(4), where physical pages are in .data (or .rodata)?
 
 And don't forget that you're the one who first pointed out that
 allocating vm_pages for XIP is a pure waste of memory. ;)

i guess matt meant if the pointer to the vm_page is NULL,.

 
 I'm allocating vm_pages, only because of phys_addr and loan_count.
 I believe vm_pages is unnecessary for read-only XIP segments.
 Because they're read-only, and stateless.
 
 I've already concluded that the current "managed or not" model
 doesn't work for XIP.  I'm pretty sure that my vm_physseg + off_t
 model can explain everything.  I'm rather waiting for a counter
 example how vm_physseg doesn't work...

i guess your suggestion is too vague.
where do you want to use vm_physseg * + off_t instead of vm_page * ?
getpages, pmap_enter, and?  how their function prototypes would be?
any valid paddr_t value will belong to exactly one vm_physseg?

YAMAMOTO Takashi


Re: CVS commit: src/sys/uvm

2010-11-24 Thread YAMAMOTO Takashi
hi,

- what's VM_PHYSSEG_OP_PG?
- why *offp calculation is done in _p functions, rather than the caller?

YAMAMOTO Takashi

 Module Name:  src
 Committed By: uebayasi
 Date: Sun Nov 14 15:06:34 UTC 2010
 
 Modified Files:
   src/sys/uvm: uvm_page.c uvm_page.h uvm_pglist.c
 
 Log Message:
 Be a little more friendly to dynamic physical segment registration.
 
 Maintain an array of pointer to struct vm_physseg, instead of struct
 array.  So that VM subsystem can take its pointer safely.  Pointer
 to this struct will replace raw paddr_t usage in the future.
 
 Dynamic removal is not supported yet.
 
 Only MD data structure changes, no kernel bump needed.
 
 Tested on i386, amd64, powerpc/ibm40x, arm11.
 
 
 To generate a diff of this commit:
 cvs rdiff -u -r1.163 -r1.164 src/sys/uvm/uvm_page.c
 cvs rdiff -u -r1.66 -r1.67 src/sys/uvm/uvm_page.h
 cvs rdiff -u -r1.46 -r1.47 src/sys/uvm/uvm_pglist.c
 
 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.


Re: CVS commit: src/sys/uvm

2010-11-24 Thread Masao Uebayashi
Hi, thanks for review.

On Thu, Nov 25, 2010 at 01:58:04AM +, YAMAMOTO Takashi wrote:
 hi,
 
 - what's VM_PHYSSEG_OP_PG?

It's to lookup vm_physseg by struct vm_page *, relying on that
struct vm_page *[] is allocated linearly.  It'll be used to remove
vm_page::phys_addr as we talked some time ago.

 - why *offp calculation is done in _p functions, rather than the caller?

Hmmm.  I can't remember any reason, except it's similar to the
original code.  It's better to do in callers, yes.

Masao

 
 YAMAMOTO Takashi
 
  Module Name:src
  Committed By:   uebayasi
  Date:   Sun Nov 14 15:06:34 UTC 2010
  
  Modified Files:
  src/sys/uvm: uvm_page.c uvm_page.h uvm_pglist.c
  
  Log Message:
  Be a little more friendly to dynamic physical segment registration.
  
  Maintain an array of pointer to struct vm_physseg, instead of struct
  array.  So that VM subsystem can take its pointer safely.  Pointer
  to this struct will replace raw paddr_t usage in the future.
  
  Dynamic removal is not supported yet.
  
  Only MD data structure changes, no kernel bump needed.
  
  Tested on i386, amd64, powerpc/ibm40x, arm11.
  
  
  To generate a diff of this commit:
  cvs rdiff -u -r1.163 -r1.164 src/sys/uvm/uvm_page.c
  cvs rdiff -u -r1.66 -r1.67 src/sys/uvm/uvm_page.h
  cvs rdiff -u -r1.46 -r1.47 src/sys/uvm/uvm_pglist.c
  
  Please note that diffs are not public domain; they are subject to the
  copyright notices on the relevant files.

-- 
Masao Uebayashi / Tombi Inc. / Tel: +81-90-9141-4635

