Re: [PATCH v12 10/20] dax: Replace XIP documentation with DAX documentation

2016-01-27 Thread Jared Hulbert
On Mon, Jan 25, 2016 at 1:18 PM, Jared Hulbert  wrote:
> On Mon, Jan 25, 2016 at 8:52 AM, Matthew Wilcox  wrote:
>> On Sun, Jan 24, 2016 at 01:03:49AM -0800, Jared Hulbert wrote:
>>> In our defense we didn't know we were sinning at the time.
>>
>> Fair enough.  Cache flushing is Hard.
>>
>>> Can you walk me through the cache flushing hole?  How is it okay on
>>> X86 but not VIVT archs?  I'm missing something obvious here.
>>>
>>> I thought earlier that vm_insert_mixed() handled the necessary
>>> flushing.  Is that even the part you are worried about?
>>
>> No, that part should be fine.  My concern is about write() calls to files
>> which are also mmaped.  See Documentation/cachetlb.txt around line 229,
>> starting with "There exists another whole class of cpu cache issues" ...
>
> oh wow.  So aren't all the copy_to/from_user() variants specifically
> supposed to handle such cases?
>
>>> What flushing functions would you call if you did have a cache page?
>>
>> Well, that's the problem; they don't currently exist.
>>
>>> There are all kinds of cache flushing functions that work without a
>>> struct page. If nothing else the specialized ASM instructions that do
>>> the various flushes don't use struct page as a parameter.  This isn't
>>> the first time I've run into the lack of a sane cache API.  Grep for
>>> inval_cache in the mtd drivers; it should have been much easier.  Isn't
>>> the proper solution to fix update_mmu_cache() or build out a pageless
>>> cache flushing API?
>>>
>>> I don't get the explicit mapping solution.  What are you mapping
>>> where?  What addresses would be SHMLBA?  Phys, kernel, userspace?
>>
>> The problem comes in dax_io() where the kernel stores to an alias of the
>> user address (or reads from an alias of the user address).  Theoretically,
>> we should flush user addresses before we read from the kernel's alias,
>> and flush the kernel's alias after we store to it.
>
> Reasoning this out loud here.  Please correct.
>
> For the dax read case:
> - kernel virt is mapped to pfn
> - data is memcpy'd from kernel virt
>
> For the dax write case:
> - kernel virt is mapped to pfn
> - data is memcpy'd to kernel virt
> - user virt map to pfn attempts to read
>
> Is that right?  I see the x86 does a nocache copy_to/from operation,
> I'm not familiar with the semantics of that call and it would take me
> a while to understand the assembly but I assume it's doing some magic
> opcodes that forces the writes down to physical memory with each
> load/store.  Does the caching model of the x86 arch update the
> cache entries tied to the physical memory on update?
>
> For architectures that don't do auto coherency magic...
>
> For reads:
> - User dcaches need flushing before kernel virtual mapping to ensure
> kernel reads latest data.  If the user has unflushed data in the
> dcache it would not be reflected in the read copy.
> This failure mode is only a problem if the filesystem is RW.
>
> For writes:
> - Unlike the read case we don't need up to date data for the user's
> mapping of a pfn.  However, the user will need its caches invalidated
> to get fresh data, so we should make sure to writeback any affected
> lines in the user caches so they don't get lost if we do an
> invalidate.  I suppose uncommitted data might corrupt the new data
> written from the kernel mapping if the cachelines get flushed later.
> - After the data is memcpy'ed to the kernel virt map the cache, and
> possibly the write buffers, should be flushed.  Without this flush the
> data might not ever get to the user mapped versions.
> - Assuming the user maps were all flushed at the outset they should be
> reloaded with fresh data on access.
>
> Do I get it more or less?

I assume the silence means I don't get it.

Moving along...

The need to flush kernel aliases and user aliases without a struct page
was articulated and cited as the reason why DAX doesn't work on
ARM, MIPS, and SPARC.

One of the following routines should work for kernel flushing, right?
--  flush_cache_vmap(unsigned long start, unsigned long end)
--  flush_kernel_vmap_range(void *vaddr, int size)
--  invalidate_kernel_vmap_range(void *vaddr, int size)

For user aliases I'm less confident, but at first glance I
don't see why these wouldn't work:
-- flush_cache_page(struct vm_area_struct *vma, unsigned long addr,
unsigned long pfn)
-- flush_cache_range(struct vm_area_struct *vma, unsigned long start,
unsigned long end)

Help?!  I'm missing something here.

>> But if we create a new address for the kernel to use which lands on the
>> same cache line as the user's address (and this is what SHMLBA is used
>> to indicate), there is no incoherency between the kernel's view and the
>> user's view.  And no new cache flushing API is needed.
>
> So... how exactly would one force the kernel address to be at the
> SHMLBA boundary?
>
>> Is that clearer?  I'm not always good at explaining these things in a
>> way which makes sense to other people :-(
>
> Yeah.  I 

Re: [PATCH v12 10/20] dax: Replace XIP documentation with DAX documentation

2016-01-25 Thread Jared Hulbert
On Mon, Jan 25, 2016 at 8:52 AM, Matthew Wilcox  wrote:
> On Sun, Jan 24, 2016 at 01:03:49AM -0800, Jared Hulbert wrote:
>> In our defense we didn't know we were sinning at the time.
>
> Fair enough.  Cache flushing is Hard.
>
>> Can you walk me through the cache flushing hole?  How is it okay on
>> X86 but not VIVT archs?  I'm missing something obvious here.
>>
>> I thought earlier that vm_insert_mixed() handled the necessary
>> flushing.  Is that even the part you are worried about?
>
> No, that part should be fine.  My concern is about write() calls to files
> which are also mmaped.  See Documentation/cachetlb.txt around line 229,
> starting with "There exists another whole class of cpu cache issues" ...

oh wow.  So aren't all the copy_to/from_user() variants specifically
supposed to handle such cases?

>> What flushing functions would you call if you did have a cache page?
>
> Well, that's the problem; they don't currently exist.
>
>> There are all kinds of cache flushing functions that work without a
>> struct page. If nothing else the specialized ASM instructions that do
>> the various flushes don't use struct page as a parameter.  This isn't
>> the first time I've run into the lack of a sane cache API.  Grep for
>> inval_cache in the mtd drivers; it should have been much easier.  Isn't
>> the proper solution to fix update_mmu_cache() or build out a pageless
>> cache flushing API?
>>
>> I don't get the explicit mapping solution.  What are you mapping
>> where?  What addresses would be SHMLBA?  Phys, kernel, userspace?
>
> The problem comes in dax_io() where the kernel stores to an alias of the
> user address (or reads from an alias of the user address).  Theoretically,
> we should flush user addresses before we read from the kernel's alias,
> and flush the kernel's alias after we store to it.

Reasoning this out loud here.  Please correct.

For the dax read case:
- kernel virt is mapped to pfn
- data is memcpy'd from kernel virt

For the dax write case:
- kernel virt is mapped to pfn
- data is memcpy'd to kernel virt
- user virt map to pfn attempts to read

Is that right?  I see the x86 does a nocache copy_to/from operation,
I'm not familiar with the semantics of that call and it would take me
a while to understand the assembly but I assume it's doing some magic
opcodes that forces the writes down to physical memory with each
load/store.  Does the caching model of the x86 arch update the
cache entries tied to the physical memory on update?

For architectures that don't do auto coherency magic...

For reads:
- User dcaches need flushing before kernel virtual mapping to ensure
kernel reads latest data.  If the user has unflushed data in the
dcache it would not be reflected in the read copy.
This failure mode is only a problem if the filesystem is RW.

For writes:
- Unlike the read case we don't need up to date data for the user's
mapping of a pfn.  However, the user will need its caches invalidated
to get fresh data, so we should make sure to writeback any affected
lines in the user caches so they don't get lost if we do an
invalidate.  I suppose uncommitted data might corrupt the new data
written from the kernel mapping if the cachelines get flushed later.
- After the data is memcpy'ed to the kernel virt map the cache, and
possibly the write buffers, should be flushed.  Without this flush the
data might not ever get to the user mapped versions.
- Assuming the user maps were all flushed at the outset they should be
reloaded with fresh data on access.

Do I get it more or less?

> But if we create a new address for the kernel to use which lands on the
> same cache line as the user's address (and this is what SHMLBA is used
> to indicate), there is no incoherency between the kernel's view and the
> user's view.  And no new cache flushing API is needed.

So... how exactly would one force the kernel address to be at the
SHMLBA boundary?

> Is that clearer?  I'm not always good at explaining these things in a
> way which makes sense to other people :-(

Yeah.  I think I'm at 80% comprehension here.  Or at least I think I
am.  Thanks.


Re: [PATCH v12 10/20] dax: Replace XIP documentation with DAX documentation

2016-01-25 Thread Matthew Wilcox
On Sun, Jan 24, 2016 at 01:03:49AM -0800, Jared Hulbert wrote:
> In our defense we didn't know we were sinning at the time.

Fair enough.  Cache flushing is Hard.

> Can you walk me through the cache flushing hole?  How is it okay on
> X86 but not VIVT archs?  I'm missing something obvious here.
> 
> I thought earlier that vm_insert_mixed() handled the necessary
> flushing.  Is that even the part you are worried about?

No, that part should be fine.  My concern is about write() calls to files
which are also mmaped.  See Documentation/cachetlb.txt around line 229,
starting with "There exists another whole class of cpu cache issues" ...

> What flushing functions would you call if you did have a cache page?

Well, that's the problem; they don't currently exist.

> There are all kinds of cache flushing functions that work without a
> struct page. If nothing else the specialized ASM instructions that do
> the various flushes don't use struct page as a parameter.  This isn't
> the first time I've run into the lack of a sane cache API.  Grep for
> inval_cache in the mtd drivers; it should have been much easier.  Isn't
> the proper solution to fix update_mmu_cache() or build out a pageless
> cache flushing API?
> 
> I don't get the explicit mapping solution.  What are you mapping
> where?  What addresses would be SHMLBA?  Phys, kernel, userspace?

The problem comes in dax_io() where the kernel stores to an alias of the
user address (or reads from an alias of the user address).  Theoretically,
we should flush user addresses before we read from the kernel's alias,
and flush the kernel's alias after we store to it.

But if we create a new address for the kernel to use which lands on the
same cache line as the user's address (and this is what SHMLBA is used
to indicate), there is no incoherency between the kernel's view and the
user's view.  And no new cache flushing API is needed.

Is that clearer?  I'm not always good at explaining these things in a
way which makes sense to other people :-(


Re: [PATCH v12 10/20] dax: Replace XIP documentation with DAX documentation

2016-01-24 Thread Jared Hulbert
In our defense we didn't know we were sinning at the time.

Can you walk me through the cache flushing hole?  How is it okay on
X86 but not VIVT archs?  I'm missing something obvious here.

I thought earlier that vm_insert_mixed() handled the necessary
flushing.  Is that even the part you are worried about?

vm_insert_mixed()->insert_pfn()->update_mmu_cache() _should_ handle
the flush.  Except of course now that I look at the ARM code it looks
like it isn't doing anything if !pfn_valid().  I need to spend
some more time looking at this again.

What flushing functions would you call if you did have a cache page?
There are all kinds of cache flushing functions that work without a
struct page. If nothing else the specialized ASM instructions that do
the various flushes don't use struct page as a parameter.  This isn't
the first time I've run into the lack of a sane cache API.  Grep for
inval_cache in the mtd drivers; it should have been much easier.  Isn't
the proper solution to fix update_mmu_cache() or build out a pageless
cache flushing API?

I don't get the explicit mapping solution.  What are you mapping
where?  What addresses would be SHMLBA?  Phys, kernel, userspace?


Re: [PATCH v12 10/20] dax: Replace XIP documentation with DAX documentation

2016-01-22 Thread Matthew Wilcox
On Fri, Jan 22, 2016 at 01:48:08PM +, Chris Brandt wrote:
> I believe the motivation for the new DAX code was being able to
> read/write data directly to specific physical memory. However, with
> the AXFS file system, XIP file mapping was mostly beneficial for direct
> access to executable code pages, not data. Code pages were XIP-ed, and
> data pages were copied to RAM as normal. This results in a significant
> reduction in system RAM, especially when used with an XIP_KERNEL. In
> some systems, most of your RAM is eaten up by lots of code pages from
> big bloated shared libraries, not R/W data. (of course I'm talking about
> smaller embedded systems here)

OK, I can't construct a failure case for read-only usages.  If you want
to put together a patch-set that re-enables DAX in a read-only way on
those architectures, I'm fine with that.

I think your time would be better spent fixing the read-write problems;
once we see persistent memory on the embedded platforms, we'll need that
code anyway.



RE: [PATCH v12 10/20] dax: Replace XIP documentation with DAX documentation

2016-01-22 Thread Chris Brandt
I believe the motivation for the new DAX code was being able to read/write data 
directly to specific physical memory. However, with the AXFS file system, XIP 
file mapping was mostly beneficial for direct access to executable code pages, 
not data. Code pages were XIP-ed, and data pages were copied to RAM as normal. 
This results in a significant reduction in system RAM, especially when used 
with an XIP_KERNEL. In some systems, most of your RAM is eaten up by lots of 
code pages from big bloated shared libraries, not R/W data. (of course I'm 
talking about smaller embedded systems here)


Also, it's up to the file system to decide what should be XIP/DAX or 
not. If your motivation is to DAX/XIP code pages to save RAM, then you don't 
have to worry about '/etc/passwd' cache issues, because that file would be 
handled in a traditional manner.

I think it comes down to what your motivation for DAX is: DAX data or DAX code.


Chris



-----Original Message-----
From: Wilcox, Matthew R [mailto:matthew.r.wil...@intel.com] 
Sent: Friday, January 22, 2016 8:08 AM
To: Jared Hulbert
Cc: Linux FS Devel; LKML; Linux Memory Management List; Matthew Wilcox; Andrew
Morton; Carsten Otte; Chris Brandt
Subject: RE: [PATCH v12 10/20] dax: Replace XIP documentation with DAX 
documentation

Hi Jared,

The old filemap_xip code was living in a state of sin ;-)  It was writing to 
the kernel's mapping of an address, and then not flushing the cache before 
telling userspace that the data was updated.  That left userspace able to read 
stale data, which might actually have been a security hole (had that page 
previously contained, say, /etc/passwd).

We don't have cache flushing functions that work without a struct page.  So we 
need to come up with a new solution.  My preferred solution is to explicitly 
map the memory before using it.  On ARM, MIPS & SPARC, each page should be 
mapped to an address that is at a multiple of SHMLBA from the address that the 
user has the page mapped at.  On other architectures, there is no d-cache flush 
problem, so they can use an identity map.

Or you can just enable the DAX code and continue living in the state of sin 
that you were in before.  It probably won't bite you ... maybe ...

-----Original Message-----
From: Jared Hulbert [mailto:jare...@gmail.com]
Sent: Thursday, January 21, 2016 10:38 AM
To: Wilcox, Matthew R
Cc: Linux FS Devel; LKML; Linux Memory Management List; Matthew Wilcox; Andrew 
Morton; Carsten Otte; Chris Brandt
Subject: Re: [PATCH v12 10/20] dax: Replace XIP documentation with DAX 
documentation

Hi!  I've been out of the community for a while, but I'm trying to step back in 
here and catch up with some of my old areas of specialty.
Couple questions, sorry to drag up such old conversations.

The DAX documentation that made it into kernel 4.0 has the following line  "The 
DAX code does not work correctly on architectures which have virtually mapped 
caches such as ARM, MIPS and SPARC."

1) It really doesn't support ARM?  I never had problems with the old 
filemap_xip.c stuff on ARM, what changed?
2) Is there a thread discussing this?

On Fri, Oct 24, 2014 at 2:20 PM, Matthew Wilcox  
wrote:
> From: Matthew Wilcox 
>
> Based on the original XIP documentation, this documents the current 
> state of affairs, and includes instructions on how users can enable 
> DAX if their devices and kernel support it.
>
> Signed-off-by: Matthew Wilcox 
> Reviewed-by: Randy Dunlap 
> ---
>  Documentation/filesystems/00-INDEX |  5 ++-
>  Documentation/filesystems/dax.txt  | 89 ++
>  Documentation/filesystems/xip.txt  | 71 --
>  3 files changed, 92 insertions(+), 73 deletions(-)
>  create mode 100644 Documentation/filesystems/dax.txt
>  delete mode 100644 Documentation/filesystems/xip.txt
>
> diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX
> index ac28149..9922939 100644
> --- a/Documentation/filesystems/00-INDEX
> +++ b/Documentation/filesystems/00-INDEX
> @@ -34,6 +34,9 @@ configfs/
> - directory containing configfs documentation and example code.
>  cramfs.txt
> - info on the cram filesystem for small storage (ROMs etc).
> +dax.txt
> +   - info on avoiding the page cache for files stored on CPU-addressable
> + storage devices.
>  debugfs.txt
> - info on the debugfs filesystem.
>  devpts.txt
> @@ -154,5 +157,3 @@ xfs-self-describing-metadata.txt
> - info on XFS Self Describing Metadata.
>  xfs.txt
> - info and mount options for the XFS filesystem.
> -xip.txt
> -   - info on execute-in-place for file mappings.
> diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
> new file mode 100644
> index 000..635adaa
> 

RE: [PATCH v12 10/20] dax: Replace XIP documentation with DAX documentation

2016-01-22 Thread Wilcox, Matthew R
Hi Jared,

The old filemap_xip code was living in a state of sin ;-)  It was writing to 
the kernel's mapping of an address, and then not flushing the cache before 
telling userspace that the data was updated.  That left userspace able to read 
stale data, which might actually have been a security hole (had that page 
previously contained, say, /etc/passwd).

We don't have cache flushing functions that work without a struct page.  So we 
need to come up with a new solution.  My preferred solution is to explicitly 
map the memory before using it.  On ARM, MIPS & SPARC, each page should be 
mapped to an address that is at a multiple of SHMLBA from the address that the 
user has the page mapped at.  On other architectures, there is no d-cache flush 
problem, so they can use an identity map.

Or you can just enable the DAX code and continue living in the state of sin 
that you were in before.  It probably won't bite you ... maybe ...
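
As a rough illustration of the SHMLBA rule above (a sketch only, not kernel
code): two virtual mappings of the same physical page should index the same
lines of an aliasing cache when their addresses are congruent modulo SHMLBA.
The helper names ex_colour(), ex_same_colour() and ex_pick_alias(), and the
EX_PAGE_SIZE and EX_SHMLBA values, are assumptions invented for this example;
the real SHMLBA is defined per architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values for illustration; the real SHMLBA is per-architecture. */
#define EX_PAGE_SIZE 4096u
#define EX_SHMLBA    (4u * EX_PAGE_SIZE)	/* assumed: four cache colours */

/* Cache colour of a virtual address: its page-aligned offset within SHMLBA. */
static inline uintptr_t ex_colour(uintptr_t vaddr)
{
	return vaddr & (EX_SHMLBA - 1) & ~(uintptr_t)(EX_PAGE_SIZE - 1);
}

/* True when the two addresses differ by a multiple of SHMLBA, page-wise. */
static inline bool ex_same_colour(uintptr_t a, uintptr_t b)
{
	return ex_colour(a) == ex_colour(b);
}

/*
 * Pick the page in a kernel mapping window that has the same colour as the
 * user's mapping.  window_base is assumed to be SHMLBA-aligned.
 */
static inline uintptr_t ex_pick_alias(uintptr_t window_base, uintptr_t user_vaddr)
{
	assert((window_base & (EX_SHMLBA - 1)) == 0);
	return window_base + ex_colour(user_vaddr);
}
```

With these assumed values, a user mapping at 0x40003123 and a kernel alias
chosen by ex_pick_alias(0xC0000000, 0x40003123) share a colour, which is the
property the explicit-mapping approach relies on.  On architectures without
the d-cache aliasing problem, the colour check is unnecessary and an identity
map would do.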

-----Original Message-----
From: Jared Hulbert [mailto:jare...@gmail.com] 
Sent: Thursday, January 21, 2016 10:38 AM
To: Wilcox, Matthew R
Cc: Linux FS Devel; LKML; Linux Memory Management List; Matthew Wilcox; Andrew 
Morton; Carsten Otte; Chris Brandt
Subject: Re: [PATCH v12 10/20] dax: Replace XIP documentation with DAX 
documentation

Hi!  I've been out of the community for a while, but I'm trying to
step back in here and catch up with some of my old areas of specialty.
A couple of questions; sorry to drag up such old conversations.

The DAX documentation that made it into kernel 4.0 has the following
line: "The DAX code does not work correctly on architectures which
have virtually mapped caches such as ARM, MIPS and SPARC."

1) Does it really not support ARM?  I never had problems with
the old filemap_xip.c stuff on ARM; what changed?
2) Is there a thread discussing this?

On Fri, Oct 24, 2014 at 2:20 PM, Matthew Wilcox
<matthew.r.wil...@intel.com> wrote:
> From: Matthew Wilcox <wi...@linux.intel.com>
>
> Based on the original XIP documentation, this documents the current
> state of affairs, and includes instructions on how users can enable DAX
> if their devices and kernel support it.
>
> Signed-off-by: Matthew Wilcox <wi...@linux.intel.com>
> Reviewed-by: Randy Dunlap <rdun...@infradead.org>
> ---
>  Documentation/filesystems/00-INDEX |  5 ++-
>  Documentation/filesystems/dax.txt  | 89 ++
>  Documentation/filesystems/xip.txt  | 71 --
>  3 files changed, 92 insertions(+), 73 deletions(-)
>  create mode 100644 Documentation/filesystems/dax.txt
>  delete mode 100644 Documentation/filesystems/xip.txt
>
> diff --git a/Documentation/filesystems/00-INDEX 
> b/Documentation/filesystems/00-INDEX
> index ac28149..9922939 100644
> --- a/Documentation/filesystems/00-INDEX
> +++ b/Documentation/filesystems/00-INDEX
> @@ -34,6 +34,9 @@ configfs/
> - directory containing configfs documentation and example code.
>  cramfs.txt
> - info on the cram filesystem for small storage (ROMs etc).
> +dax.txt
> +   - info on avoiding the page cache for files stored on CPU-addressable
> + storage devices.
>  debugfs.txt
> - info on the debugfs filesystem.
>  devpts.txt
> @@ -154,5 +157,3 @@ xfs-self-describing-metadata.txt
> - info on XFS Self Describing Metadata.
>  xfs.txt
> - info and mount options for the XFS filesystem.
> -xip.txt
> -   - info on execute-in-place for file mappings.
> diff --git a/Documentation/filesystems/dax.txt 
> b/Documentation/filesystems/dax.txt
> new file mode 100644
> index 000..635adaa
> --- /dev/null
> +++ b/Documentation/filesystems/dax.txt
> @@ -0,0 +1,89 @@
> +Direct Access for files
> +-----------------------
> +
> +Motivation
> +----------
> +
> +The page cache is usually used to buffer reads and writes to files.
> +It is also used to provide the pages which are mapped into userspace
> +by a call to mmap.
> +
> +For block devices that are memory-like, the page cache pages would be
> +unnecessary copies of the original storage.  The DAX code removes the
> +extra copy by performing reads and writes directly to the storage device.
> +For file mappings, the storage device is mapped directly into userspace.
> +
> +
> +Usage
> +-----
> +
> +If you have a block device which supports DAX, you can make a filesystem
> +on it as usual.  When mounting it, use the -o dax option manually
> +or add 'dax' to the options in /etc/fstab.
> +
> +
> +Implementation Tips for Block Driver Writers
> +--------------------------------------------
> +
> +To support DAX in your block driver, implement the 'direct_access'
> +block device operation.  It is used to translate the sector number
> +(expressed in units of 512-byte sectors) to a page frame number (pfn)
> +that identifies the physical page for the memory.  It 


RE: [PATCH v12 10/20] dax: Replace XIP documentation with DAX documentation

2016-01-22 Thread Chris Brandt
I believe the motivation for the new DAX code was being able to read/write data 
directly to specific physical memory. However, with the AXFS file system, XIP 
file mapping was mostly beneficial for direct access to executable code pages, 
not data. Code pages were XIP-ed, and data pages were copied to RAM as normal. 
This results in a significant reduction in system RAM usage, especially when used 
with an XIP_KERNEL. In some systems, most of your RAM is eaten up by lots of 
code pages from big bloated shared libraries, not R/W data. (Of course, I'm 
talking about smaller embedded systems here.)


Also, it's up to the file system to decide what should be XIP/DAX or 
not. If your motivation is to DAX/XIP code pages to save RAM, then you don't 
have to worry about '/etc/passwd' cache issues, because that file would be 
handled in the traditional manner.

I think it comes down to what your motivation for DAX is: DAX data or DAX code.


Chris



-----Original Message-----
From: Wilcox, Matthew R [mailto:matthew.r.wil...@intel.com] 
Sent: Friday, January 22, 2016 8:08 AM
To: Jared Hulbert <jare...@gmail.com>
Cc: Linux FS Devel <linux-fsde...@vger.kernel.org>; LKML 
<linux-kernel@vger.kernel.org>; Linux Memory Management List 
<linux...@kvack.org>; Matthew Wilcox <wi...@linux.intel.com>; Andrew Morton 
<a...@linux-foundation.org>; Carsten Otte <co...@de.ibm.com>; Chris Brandt 
<chris.bra...@renesas.com>
Subject: RE: [PATCH v12 10/20] dax: Replace XIP documentation with DAX 
documentation

Hi Jared,

The old filemap_xip code was living in a state of sin ;-)  It was writing to 
the kernel's mapping of an address, and then not flushing the cache before 
telling userspace that the data was updated.  That left userspace able to read 
stale data, which might actually have been a security hole (had that page 
previously contained, say, /etc/passwd).

We don't have cache flushing functions that work without a struct page.  So we 
need to come up with a new solution.  My preferred solution is to explicitly 
map the memory before using it.  On ARM, MIPS & SPARC, each page should be 
mapped to an address that is at a multiple of SHMLBA from the address that the 
user has the page mapped at.  On other architectures, there is no d-cache flush 
problem, so they can use an identity map.

Or you can just enable the DAX code and continue living in the state of sin 
that you were in before.  It probably won't bite you ... maybe ...

-----Original Message-----
From: Jared Hulbert [mailto:jare...@gmail.com]
Sent: Thursday, January 21, 2016 10:38 AM
To: Wilcox, Matthew R
Cc: Linux FS Devel; LKML; Linux Memory Management List; Matthew Wilcox; Andrew 
Morton; Carsten Otte; Chris Brandt
Subject: Re: [PATCH v12 10/20] dax: Replace XIP documentation with DAX 
documentation

Hi!  I've been out of the community for a while, but I'm trying to step back in 
here and catch up with some of my old areas of specialty.
A couple of questions; sorry to drag up such old conversations.

The DAX documentation that made it into kernel 4.0 has the following line: "The 
DAX code does not work correctly on architectures which have virtually mapped 
caches such as ARM, MIPS and SPARC."

1) Does it really not support ARM?  I never had problems with the old 
filemap_xip.c stuff on ARM; what changed?
2) Is there a thread discussing this?

On Fri, Oct 24, 2014 at 2:20 PM, Matthew Wilcox <matthew.r.wil...@intel.com> 
wrote:
> From: Matthew Wilcox <wi...@linux.intel.com>
>
> Based on the original XIP documentation, this documents the current 
> state of affairs, and includes instructions on how users can enable 
> DAX if their devices and kernel support it.
>
> Signed-off-by: Matthew Wilcox <wi...@linux.intel.com>
> Reviewed-by: Randy Dunlap <rdun...@infradead.org>
> ---
>  Documentation/filesystems/00-INDEX |  5 ++-
>  Documentation/filesystems/dax.txt  | 89 ++
>  Documentation/filesystems/xip.txt  | 71 --
>  3 files changed, 92 insertions(+), 73 deletions(-)
>  create mode 100644 Documentation/filesystems/dax.txt
>  delete mode 100644 Documentation/filesystems/xip.txt
>
> diff --git a/Documentation/filesystems/00-INDEX 
> b/Documentation/filesystems/00-INDEX
> index ac28149..9922939 100644
> --- a/Documentation/filesystems/00-INDEX
> +++ b/Documentation/filesystems/00-INDEX
> @@ -34,6 +34,9 @@ configfs/
> - directory containing configfs documentation and example code.
>  cramfs.txt
> - info on the cram filesystem for small storage (ROMs etc).
> +dax.txt
> +   - info on avoiding the page cache for files stored on CPU-addressable
> + storage devices.
>  debugfs.txt
> - info on the debugfs filesystem.
>  devpts.txt
> @@ -154,5 +157,3 @@ xfs-self-describing-metadata.t

Re: [PATCH v12 10/20] dax: Replace XIP documentation with DAX documentation

2016-01-22 Thread Matthew Wilcox
On Fri, Jan 22, 2016 at 01:48:08PM +, Chris Brandt wrote:
> I believe the motivation for the new DAX code was being able to
> read/write data directly to specific physical memory. However, with
> the AXFS file system, XIP file mapping was mostly beneficial for direct
> access to executable code pages, not data. Code pages were XIP-ed, and
> data pages were copied to RAM as normal. This results in a significant
> reduction in system RAM, especially when used with an XIP_KERNEL. In
> some systems, most of your RAM is eaten up by lots of code pages from
> big bloated shared libraries, not R/W data. (of course I'm talking about
> smaller embedded system here)

OK, I can't construct a failure case for read-only usages.  If you want
to put together a patch-set that re-enables DAX in a read-only way on
those architectures, I'm fine with that.

I think your time would be better spent fixing the read-write problems;
once we see persistent memory on the embedded platforms, we'll need that
code anyway.



Re: [PATCH v12 10/20] dax: Replace XIP documentation with DAX documentation

2016-01-21 Thread Jared Hulbert
Hi!  I've been out of the community for a while, but I'm trying to
step back in here and catch up with some of my old areas of specialty.
A couple of questions; sorry to drag up such old conversations.

The DAX documentation that made it into kernel 4.0 has the following
line: "The DAX code does not work correctly on architectures which
have virtually mapped caches such as ARM, MIPS and SPARC."

1) Does it really not support ARM?  I never had problems with
the old filemap_xip.c stuff on ARM; what changed?
2) Is there a thread discussing this?

On Fri, Oct 24, 2014 at 2:20 PM, Matthew Wilcox
<matthew.r.wil...@intel.com> wrote:
> From: Matthew Wilcox <wi...@linux.intel.com>
>
> Based on the original XIP documentation, this documents the current
> state of affairs, and includes instructions on how users can enable DAX
> if their devices and kernel support it.
>
> Signed-off-by: Matthew Wilcox <wi...@linux.intel.com>
> Reviewed-by: Randy Dunlap <rdun...@infradead.org>
> ---
>  Documentation/filesystems/00-INDEX |  5 ++-
>  Documentation/filesystems/dax.txt  | 89 ++
>  Documentation/filesystems/xip.txt  | 71 --
>  3 files changed, 92 insertions(+), 73 deletions(-)
>  create mode 100644 Documentation/filesystems/dax.txt
>  delete mode 100644 Documentation/filesystems/xip.txt
>
> diff --git a/Documentation/filesystems/00-INDEX 
> b/Documentation/filesystems/00-INDEX
> index ac28149..9922939 100644
> --- a/Documentation/filesystems/00-INDEX
> +++ b/Documentation/filesystems/00-INDEX
> @@ -34,6 +34,9 @@ configfs/
> - directory containing configfs documentation and example code.
>  cramfs.txt
> - info on the cram filesystem for small storage (ROMs etc).
> +dax.txt
> +   - info on avoiding the page cache for files stored on CPU-addressable
> + storage devices.
>  debugfs.txt
> - info on the debugfs filesystem.
>  devpts.txt
> @@ -154,5 +157,3 @@ xfs-self-describing-metadata.txt
> - info on XFS Self Describing Metadata.
>  xfs.txt
> - info and mount options for the XFS filesystem.
> -xip.txt
> -   - info on execute-in-place for file mappings.
> diff --git a/Documentation/filesystems/dax.txt 
> b/Documentation/filesystems/dax.txt
> new file mode 100644
> index 000..635adaa
> --- /dev/null
> +++ b/Documentation/filesystems/dax.txt
> @@ -0,0 +1,89 @@
> +Direct Access for files
> +-----------------------
> +
> +Motivation
> +----------
> +
> +The page cache is usually used to buffer reads and writes to files.
> +It is also used to provide the pages which are mapped into userspace
> +by a call to mmap.
> +
> +For block devices that are memory-like, the page cache pages would be
> +unnecessary copies of the original storage.  The DAX code removes the
> +extra copy by performing reads and writes directly to the storage device.
> +For file mappings, the storage device is mapped directly into userspace.
> +
> +
> +Usage
> +-----
> +
> +If you have a block device which supports DAX, you can make a filesystem
> +on it as usual.  When mounting it, use the -o dax option manually
> +or add 'dax' to the options in /etc/fstab.
> +
> +
> +Implementation Tips for Block Driver Writers
> +--------------------------------------------
> +
> +To support DAX in your block driver, implement the 'direct_access'
> +block device operation.  It is used to translate the sector number
> +(expressed in units of 512-byte sectors) to a page frame number (pfn)
> +that identifies the physical page for the memory.  It also returns a
> +kernel virtual address that can be used to access the memory.
> +
> +The direct_access method takes a 'size' parameter that indicates the
> +number of bytes being requested.  The function should return the number
> +of bytes that can be contiguously accessed at that offset.  It may also
> +return a negative errno if an error occurs.
> +
> +In order to support this method, the storage must be byte-accessible by
> +the CPU at all times.  If your device uses paging techniques to expose
> +a large amount of memory through a smaller window, then you cannot
> +implement direct_access.  Equally, if your device can occasionally
> +stall the CPU for an extended period, you should also not attempt to
> +implement direct_access.
> +
> +These block devices may be used for inspiration:
> +- axonram: Axon DDR2 device driver
> +- brd: RAM backed block device driver
> +- dcssblk: s390 dcss block device driver
> +
> +
> +Implementation Tips for Filesystem Writers
> +------------------------------------------
> +
> +Filesystem support consists of
> +- adding support to mark inodes as being DAX by setting the S_DAX flag in
> +  i_flags
> +- implementing the direct_IO address space operation, and calling
> +  dax_do_io() instead of blockdev_direct_IO() if S_DAX is set
> +- implementing an mmap file operation for DAX files which sets the
> +  VM_MIXEDMAP flag on the VMA, and setting the vm_ops to include handlers
> +  for fault and page_mkwrite (which should probably call dax_fault() and
> +  dax_mkwrite(), 

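To make the direct_access contract in the quoted dax.txt text a little more
concrete, here is a minimal userspace model of it: a 512-byte sector number
goes in, a virtual address and a page frame number come out, and the return
value is the number of contiguously accessible bytes or a negative errno.
This is a sketch under stated assumptions, not the kernel's interface:
struct ex_ramdev, ex_direct_access(), the 4 KiB page size and the choice of
error codes are all hypothetical, and returning min(size, bytes remaining)
is only one reading of "the number of bytes that can be contiguously
accessed at that offset".

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define EX_SECTOR_SHIFT 9	/* 512-byte sectors, as in the quoted text */
#define EX_PAGE_SHIFT   12	/* assumed 4 KiB pages */

/* Hypothetical memory-like device: one contiguous, always-mapped region. */
struct ex_ramdev {
	unsigned char *base;		/* virtual address of byte 0 of the device */
	uint64_t       phys_base;	/* assumed physical address of byte 0 */
	uint64_t       size;		/* device size in bytes */
};

/*
 * Translate a sector to an address and pfn.  Returns the number of bytes
 * contiguously accessible from that sector (capped at 'size'), or a
 * negative errno.
 */
static long ex_direct_access(const struct ex_ramdev *dev, uint64_t sector,
			     void **kaddr, unsigned long *pfn, long size)
{
	uint64_t offset, avail;

	assert(dev && kaddr && pfn);

	if (size < 0 || sector > (UINT64_MAX >> EX_SECTOR_SHIFT))
		return -EINVAL;

	offset = sector << EX_SECTOR_SHIFT;
	if (offset >= dev->size)
		return -ERANGE;

	*kaddr = dev->base + offset;
	*pfn = (unsigned long)((dev->phys_base + offset) >> EX_PAGE_SHIFT);

	avail = dev->size - offset;
	return (uint64_t)size < avail ? size : (long)avail;
}
```

A caller would pass the sector and the number of bytes it wants, then use at
most the returned number of bytes starting at *kaddr.  A device that pages a
large memory through a smaller window could not be modelled this way, which
matches the restriction in the quoted text.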

Re: [PATCH v12 10/20] dax: Replace XIP documentation with DAX documentation

2015-01-12 Thread Andrew Morton
On Fri, 24 Oct 2014 17:20:42 -0400 Matthew Wilcox <matthew.r.wil...@intel.com>
wrote:

> Based on the original XIP documentation, this documents the current
> state of affairs, and includes instructions on how users can enable DAX
> if their devices and kernel support it.

Nice ;)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

