Re: Buffer cache made to use >32bit mem addresses (i.e. >~3GB support for the buffer cache) nowadays or planned soon?

2016-02-14 Thread Tinker

On 2016-02-15 10:15, Constantine A. Murenin wrote:
..

I think it got reverted by:

..

but I'm not an expert so would wait on confirmation by Bob Beck.



Yes, I think you are correct, and it was indeed reverted.

..

But it looks like the functions that were introduced in the above
commit are still WIP and don't actually flip anything yet:

http://bxr.su/o/sys/kern/vfs_bio.c#buf_flip_high

buf_flip_high(struct buf *bp)

..

/* XXX does nothing to buffer for now */

..

buf_flip_dma(struct buf *bp)

..

/* XXX does not flip buffer for now */



Thank you for clarifying. This is quite a big deal for anyone with lots
of disk I/O and RAM. This will be #2 on my OpenBSD wishlist for the year.



How complex is this to implement, and who would be able to do it?


May donor powers come to me or someone else this year to contribute. 
OpenBSD is the finest OS out there.




Re: Buffer cache made to use >32bit mem addresses (i.e. >~3GB support for the buffer cache) nowadays or planned soon?

2016-02-14 Thread Constantine A. Murenin
On 14 February 2016 at 10:29, Karel Gardas wrote:
> On Sat, Feb 13, 2016 at 9:39 PM, Stuart Henderson wrote:
>> There was this commit, I don't *think* it got reverted.
>>
>>
>>
>> CVSROOT:        /cvs
>> Module name:    src
>> Changes by:     b...@cvs.openbsd.org    2013/06/11 13:01:20
>>
>> Modified files:
>>         sys/kern       : kern_sysctl.c spec_vnops.c vfs_bio.c
>>                          vfs_biomem.c vfs_vops.c
>>         sys/sys        : buf.h mount.h
>>         sys/uvm        : uvm_extern.h uvm_page.c
>>         usr.bin/systat : iostat.c
>>
>> Log message:
>> High memory page flipping for the buffer cache.
>>
>> This change splits the buffer cache free lists into lists of dma reachable
>> buffers and high memory buffers based on the ranges returned by pmemrange.
>> Buffers move from dma to high memory as they age, but are flipped to dma
>> reachable memory if IO is needed to/from and high mem buffer. The total
>> amount of buffers  allocated is now bufcachepercent of both the dma and
>> the high memory region.
>>
>> This change allows the use of large buffer caches on amd64 using more than
>> 4 GB of memory
>>
>> ok tedu@ krw@ - testing by many.
>
> I think it got reverted by:
>
> commit ac77fb26761065b7f6031098e6a182cacfaf7437
> Author: beck 
> Date:   Tue Jul 9 15:37:43 2013 +
>
> back out the cache flipper temporarily to work out of tree.
> will come back soon.
> ok deraadt@
>
>
> but I'm not an expert so would wait on confirmation by Bob Beck.


Yes, I think you are correct, and it was indeed reverted.


Some parts have since been reimplemented and brought back by
http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/kern/vfs_bio.c#rev1.170
on 2015/07/19:

http://marc.info/?l=openbsd-cvs&m=143732292523715&w=2

> CVSROOT:        /cvs
> Module name:    src
> Changes by:     b...@cvs.openbsd.org    2015/07/19 10:21:11
>
> Modified files:
>         sys/kern       : vfs_bio.c vfs_vops.c
>         sys/sys        : buf.h
>
> Log message:
> Use two 2q caches for the buffer cache, moving previously warm buffers from
> the first queue to the second.
> Mark the first queue as DMA in preparation for being able to use more memory
> by flipping. Flipper code currently only sets and clears the flag.
> ok tedu@ guenther@


But it looks like the functions that were introduced in the above
commit are still WIP and don't actually flip anything yet:

http://bxr.su/o/sys/kern/vfs_bio.c#buf_flip_high

buf_flip_high(struct buf *bp)
{
	KASSERT(ISSET(bp->b_flags, B_BC));
	KASSERT(ISSET(bp->b_flags, B_DMA));
	KASSERT(bp->cache == DMA_CACHE);
	CLR(bp->b_flags, B_DMA);
	/* XXX does nothing to buffer for now */
}

http://bxr.su/o/sys/kern/vfs_bio.c#buf_flip_dma

buf_flip_dma(struct buf *bp)
{
	KASSERT(ISSET(bp->b_flags, B_BC));
	KASSERT(ISSET(bp->b_flags, B_BUSY));
	if (!ISSET(bp->b_flags, B_DMA)) {
		KASSERT(bp->cache > DMA_CACHE);
		KASSERT(bp->cache < NUM_CACHES);
		/* XXX does not flip buffer for now */
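
To make the "only sets and clears the flag" behaviour concrete, here is a
minimal user-space sketch in C. It is not the kernel code: the flag values,
the HIGH_CACHE name, the model_ function names and the two assignments at
the end of model_flip_dma() are assumptions made for illustration (the
quoted source is cut off before that point), and assert() stands in for
KASSERT().

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel definitions; values are illustrative. */
#define B_BUSY 0x01
#define B_BC   0x02
#define B_DMA  0x04

enum { DMA_CACHE = 0, HIGH_CACHE = 1, NUM_CACHES = 2 };

struct buf {
	int b_flags;	/* B_* flag bits */
	int cache;	/* index of the queue the buffer lives on */
};

/* Mark a buffer as no longer DMA reachable. The data itself is not moved. */
static void
model_flip_high(struct buf *bp)
{
	assert(bp->b_flags & B_BC);
	assert(bp->b_flags & B_DMA);
	assert(bp->cache == DMA_CACHE);
	bp->b_flags &= ~B_DMA;
}

/* Mark a busy buffer as DMA reachable again before I/O. Again flag only. */
static void
model_flip_dma(struct buf *bp)
{
	assert(bp->b_flags & B_BC);
	assert(bp->b_flags & B_BUSY);
	if (!(bp->b_flags & B_DMA)) {
		assert(bp->cache > DMA_CACHE);
		assert(bp->cache < NUM_CACHES);
		/* Assumed tail: set the flag and requeue on the DMA cache. */
		bp->b_flags |= B_DMA;
		bp->cache = DMA_CACHE;
	}
}
```

A caller would start with a buffer that has B_BC, B_DMA and B_BUSY set and
sits on DMA_CACHE, call model_flip_high() when it ages out, requeue it
(cache = HIGH_CACHE), and call model_flip_dma() before doing I/O. No page
is copied in either direction, which is the point made above.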

Cheers,
Constantine.



Re: Buffer cache made to use >32bit mem addresses (i.e. >~3GB support for the buffer cache) nowadays or planned soon?

2016-02-14 Thread Karel Gardas
On Sat, Feb 13, 2016 at 9:39 PM, Stuart Henderson  wrote:
> There was this commit, I don't *think* it got reverted.
>
>
>
> CVSROOT:        /cvs
> Module name:    src
> Changes by:     b...@cvs.openbsd.org    2013/06/11 13:01:20
>
> Modified files:
>         sys/kern       : kern_sysctl.c spec_vnops.c vfs_bio.c
>                          vfs_biomem.c vfs_vops.c
>         sys/sys        : buf.h mount.h
>         sys/uvm        : uvm_extern.h uvm_page.c
>         usr.bin/systat : iostat.c
>
> Log message:
> High memory page flipping for the buffer cache.
>
> This change splits the buffer cache free lists into lists of dma reachable
> buffers and high memory buffers based on the ranges returned by pmemrange.
> Buffers move from dma to high memory as they age, but are flipped to dma
> reachable memory if IO is needed to/from and high mem buffer. The total
> amount of buffers  allocated is now bufcachepercent of both the dma and
> the high memory region.
>
> This change allows the use of large buffer caches on amd64 using more than
> 4 GB of memory
>
> ok tedu@ krw@ - testing by many.

I think it got reverted by:

commit ac77fb26761065b7f6031098e6a182cacfaf7437
Author: beck 
Date:   Tue Jul 9 15:37:43 2013 +

back out the cache flipper temporarily to work out of tree.
will come back soon.
ok deraadt@


but I'm not an expert so would wait on confirmation by Bob Beck.



Re: Buffer cache made to use >32bit mem addresses (i.e. >~3GB support for the buffer cache) nowadays or planned soon?

2016-02-13 Thread Karel Gardas
I think you would also like to investigate this one:
http://www.undeadly.org/cgi?action=article=2006061416

> Some quite deep reading [1] taught me that at least quite recently, there
> was a ~3GB cap on the buffer cache, independent of architecture and system
> RAM size.



Re: Buffer cache made to use >32bit mem addresses (i.e. >~3GB support for the buffer cache) nowadays or planned soon?

2016-02-13 Thread Tinker

Dear Karel,

Thanks - wait - this post from 2006 that you mention, is it saying that
a >32bit/>~3GB buffer cache actually IS SUPPORTED/WORKS on any AMD64 *with
IOMMU* support in the CPU, and has been working all along??


(That would mean that I misunderstood the references I posted in the
previous email, because in actuality the 32bit/~3GB constraint applies only
in certain use cases, namely on non-IOMMU CPUs.)


Please clarify!



If so, does any Intel processor with the "Intel® Virtualization Technology
for Directed I/O (VT-d)" feature, for example this one
http://ark.intel.com/products/81061/ , or any of the CPUs listed on
https://en.wikipedia.org/wiki/List_of_IOMMU-supporting_hardware , have
that support?


Does the motherboard need specific support too, as Wikipedia indicates?
Surely at least any Xeon server motherboard from the last 3-4 years must
have it, right?


Thank you everyone for your excellent work with OpenBSD!

Best regards,
Tinker

[1]
David Mazieres linked to two docs in there, those are only on 
archive.org now:

https://web.archive.org/web/20150814051509/http://www.intel.com/content/dam/www/public/us/en/documents/product-specifications/vt-directed-io-spec.pdf
https://web.archive.org/web/20081218031805/http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/34434.pdf


On 2016-02-14 01:15, Karel Gardas wrote:

> I think you would also like to investigate this one:
> http://www.undeadly.org/cgi?action=article=2006061416
>
>> Some quite deep reading [1] taught me that at least quite recently, there
>> was a ~3GB cap on the buffer cache, independent of architecture and system
>> RAM size.




Re: Buffer cache made to use >32bit mem addresses (i.e. >~3GB support for the buffer cache) nowadays or planned soon?

2016-02-13 Thread Karel Gardas
I'm afraid you read too quickly and w/o attention to detail, please
reread and pay special attention to the last paragraph. Especially to:

"IOMMU is present in all "real" AMD64 machines, but not the Intel
clones. Unfortunately, OpenBSD support for IOMMU on the AMD machines
is not quite ready for primetime (code exists, but "real life" has
consorted against me finishing it)."

On Sat, Feb 13, 2016 at 7:35 PM, Tinker  wrote:
> Dear Karel,
>
> Thanks - wait - this post from 2006 you mentioned now, is it saying that
> actually >32bit/>~3GB buffer cache IS SUPPORTED/WORKS on any AMD64 *with
> IOMMU* support in the CPU, and was working all the time??
>
> (That would mean that I misunderstood those references I posted in the
> previous email because in actuality the 32bit/~3GB constraint only is in
> certain usecases that is on non-IOMMU CPU:s.)
>
> Please clarify!
>
>
>
> So if so, any Intel processor with the "Intel® Virtualization Technology for
> Directed I/O (VT-d)" feature such as for example this one
> http://ark.intel.com/products/81061/ , or any of the CPU:s listed on
> https://en.wikipedia.org/wiki/List_of_IOMMU-supporting_hardware , has that
> support?
>
> Does the motherboard need specific support too as Wikipedia indicates -
> though at least any Xeon server motherboard from the last 3-4 years must for
> sure have it right?
>
> Thank you everyone for your excellent work with OpenBSD!
>
> Best regards,
> Tinker
>
> [1]
> David Mazieres linked to two docs in there, those are only on archive.org
> now:
>
> https://web.archive.org/web/20150814051509/http://www.intel.com/content/dam/www/public/us/en/documents/product-specifications/vt-directed-io-spec.pdf
>
> https://web.archive.org/web/20081218031805/http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/34434.pdf
>
>
>
> On 2016-02-14 01:15, Karel Gardas wrote:
>>
>> I think you would also like to investigate this one:
>> http://www.undeadly.org/cgi?action=article=2006061416
>>
>>> Some quite deep reading [1] taught me that at least quite recently, there
>>> was a ~3GB cap on the buffer cache, independent of architecture and
>>> system
>>> RAM size.



Re: Buffer cache made to use >32bit mem addresses (i.e. >~3GB support for the buffer cache) nowadays or planned soon?

2016-02-13 Thread Tinker
Aha. So the article is saying that full IOMMU support is still pending on all
AMD64 machines (which would mean any Intel- or AMD-manufactured
processor with VT-d etc.), and you're saying that this is what needs to
be implemented for the buffer cache to finally get >32bit/>~3GB support?


Are there plans today to implement this and then get the >32bit/>~3GB 
support going?


(I don't understand exactly how these things fit together, this is why I 
asked tentatively.)


Thanks, Tinker

On 2016-02-14 02:03, Karel Gardas wrote:

> I'm afraid you read too quickly and w/o attention to detail, please
> reread and pay special attention to the last paragraph. Especially to:
>
> "IOMMU is present in all "real" AMD64 machines, but not the Intel
> clones. Unfortunately, OpenBSD support for IOMMU on the AMD machines
> is not quite ready for primetime (code exists, but "real life" has
> consorted against me finishing it)."
>
> On Sat, Feb 13, 2016 at 7:35 PM, Tinker wrote:
>> Dear Karel,
>>
>> Thanks - wait - this post from 2006 you mentioned now, is it saying that
>> actually >32bit/>~3GB buffer cache IS SUPPORTED/WORKS on any AMD64 *with
>> IOMMU* support in the CPU, and was working all the time??
>>
>> (That would mean that I misunderstood those references I posted in the
>> previous email because in actuality the 32bit/~3GB constraint only is in
>> certain usecases that is on non-IOMMU CPU:s.)
>>
>> Please clarify!
>>
>> So if so, any Intel processor with the "Intel® Virtualization Technology for
>> Directed I/O (VT-d)" feature such as for example this one
>> http://ark.intel.com/products/81061/ , or any of the CPU:s listed on
>> https://en.wikipedia.org/wiki/List_of_IOMMU-supporting_hardware , has that
>> support?
>>
>> Does the motherboard need specific support too as Wikipedia indicates -
>> though at least any Xeon server motherboard from the last 3-4 years must for
>> sure have it right?
>>
>> Thank you everyone for your excellent work with OpenBSD!
>>
>> Best regards,
>> Tinker
>>
>> [1]
>> David Mazieres linked to two docs in there, those are only on archive.org
>> now:
>>
>> https://web.archive.org/web/20150814051509/http://www.intel.com/content/dam/www/public/us/en/documents/product-specifications/vt-directed-io-spec.pdf
>>
>> https://web.archive.org/web/20081218031805/http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/34434.pdf
>>
>> On 2016-02-14 01:15, Karel Gardas wrote:
>>> I think you would also like to investigate this one:
>>> http://www.undeadly.org/cgi?action=article=2006061416
>>>
>>>> Some quite deep reading [1] taught me that at least quite recently, there
>>>> was a ~3GB cap on the buffer cache, independent of architecture and
>>>> system RAM size.




Re: Buffer cache made to use >32bit mem addresses (i.e. >~3GB support for the buffer cache) nowadays or planned soon?

2016-02-13 Thread Stuart Henderson
On 2016-02-13, Tinker  wrote:
> Hi,
>
> Some quite deep reading [1] taught me that at least quite recently, 
> there was a ~3GB cap on the buffer cache, independent of architecture 
> and system RAM size.
>
> Reading the source history of vfs_bio.c [2] gives me a vague impression 
> that this cap is there also today.
>
> Just wanted to check, has this cap been removed, or is there any plan to 
> remove it next months from now?

There was this commit, I don't *think* it got reverted.



CVSROOT:        /cvs
Module name:    src
Changes by:     b...@cvs.openbsd.org    2013/06/11 13:01:20

Modified files:
        sys/kern       : kern_sysctl.c spec_vnops.c vfs_bio.c
                         vfs_biomem.c vfs_vops.c
        sys/sys        : buf.h mount.h
        sys/uvm        : uvm_extern.h uvm_page.c
        usr.bin/systat : iostat.c

Log message:
High memory page flipping for the buffer cache.

This change splits the buffer cache free lists into lists of dma reachable
buffers and high memory buffers based on the ranges returned by pmemrange.
Buffers move from dma to high memory as they age, but are flipped to dma
reachable memory if IO is needed to/from and high mem buffer. The total
amount of buffers  allocated is now bufcachepercent of both the dma and
the high memory region.

This change allows the use of large buffer caches on amd64 using more than
4 GB of memory

ok tedu@ krw@ - testing by many.
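
The sizing rule in the log message ("bufcachepercent of both the dma and
the high memory region") and the notion of "dma reachable" memory can be
sketched in a few lines of C. This is only an illustration under stated
assumptions: the helper names are hypothetical, the DMA limit is assumed
to be exactly 4 GB, and none of this is taken from the OpenBSD source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pages the cache may use from one memory region. */
static uint64_t
region_budget(uint64_t region_pages, unsigned percent)
{
	return region_pages * percent / 100;
}

/* Hypothetical helper: total budget when both regions are counted. */
static uint64_t
cache_budget(uint64_t dma_pages, uint64_t high_pages, unsigned percent)
{
	return region_budget(dma_pages, percent) +
	    region_budget(high_pages, percent);
}

/*
 * Nonzero if the physical range [paddr, paddr + len) lies wholly below
 * 4 GB, the limit assumed here for a device with 32-bit DMA addressing.
 */
static int
dma_reachable(uint64_t paddr, uint64_t len)
{
	const uint64_t limit = 1ULL << 32;

	return len <= limit && paddr <= limit - len;
}
```

With 4 KB pages, a 16 GB machine split into 1048576 DMA pages and 3145728
high pages gets a budget of 838860 pages at 20 percent, against 209715
pages if only the DMA region were counted. A buffer whose pages fail
dma_reachable() is the kind that would have to be flipped back before I/O.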



Re: Buffer cache made to use >32bit mem addresses (i.e. >~3GB support for the buffer cache) nowadays or planned soon?

2016-02-13 Thread Tinker

On 2016-02-14 03:39, Stuart Henderson wrote:

> On 2016-02-13, Tinker wrote:
>> Hi,
>>
>> Some quite deep reading [1] taught me that at least quite recently,
>> there was a ~3GB cap on the buffer cache, independent of architecture
>> and system RAM size.
>>
>> Reading the source history of vfs_bio.c [2] gives me a vague impression
>> that this cap is there also today.
>>
>> Just wanted to check, has this cap been removed, or is there any plan to
>> remove it next months from now?
>
> There was this commit, I don't *think* it got reverted.
>
> CVSROOT:        /cvs
> Module name:    src
> Changes by:     b...@cvs.openbsd.org    2013/06/11 13:01:20
>
> Modified files:
>         sys/kern       : kern_sysctl.c spec_vnops.c vfs_bio.c
>                          vfs_biomem.c vfs_vops.c
>         sys/sys        : buf.h mount.h
>         sys/uvm        : uvm_extern.h uvm_page.c
>         usr.bin/systat : iostat.c
>
> Log message:
> High memory page flipping for the buffer cache.
>
> This change splits the buffer cache free lists into lists of dma reachable
> buffers and high memory buffers based on the ranges returned by pmemrange.
> Buffers move from dma to high memory as they age, but are flipped to dma
> reachable memory if IO is needed to/from and high mem buffer. The total
> amount of buffers  allocated is now bufcachepercent of both the dma and
> the high memory region.
>
> This change allows the use of large buffer caches on amd64 using more than
> 4 GB of memory
>
> ok tedu@ krw@ - testing by many.



Indeed, it's in the box, see e.g.
http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/kern/vfs_bio.c etc.


So >32bit/>~3GB support has been in the box since OpenBSD 5.6 or 5.7.

Awesome, thank you so much for clarifying!

Tinker