On Thu, Sep 03, 2015 at 09:32:02AM +0300, Boaz Harrosh wrote:
> On 09/02/2015 10:04 PM, Ross Zwisler wrote:
> > On Tue, Sep 01, 2015 at 03:18:41PM +0300, Boaz Harrosh wrote:
> <>
> >> Apps expect all these to work:
> >> 1. open mmap m-write msync ... close
> >> 2. open mmap m-write fsync ... close
On 09/02/2015 07:19 PM, Dave Hansen wrote:
> On 09/02/2015 09:00 AM, Boaz Harrosh wrote:
We are going to have 2-socket systems with 6TB of persistent memory in
them. I think it's important to design this mechanism so that it scales
to memory sizes like that and supports large
On 09/02/2015 10:04 PM, Ross Zwisler wrote:
> On Tue, Sep 01, 2015 at 03:18:41PM +0300, Boaz Harrosh wrote:
<>
>> Apps expect all these to work:
>> 1. open mmap m-write msync ... close
>> 2. open mmap m-write fsync ... close
>> 3. open mmap m-write unmap ... fsync close
>>
>> 4. open mmap m-write
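To make those patterns concrete, here is a minimal user-space sketch of patterns 1 and 2 (the file path and size are hypothetical; it assumes a DAX-mounted filesystem and a file at least one page long):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/mnt/pmem/data", O_RDWR);	/* hypothetical DAX file */
	if (fd < 0) { perror("open"); return 1; }

	char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) { perror("mmap"); return 1; }

	memcpy(p, "persist me", 10);		/* the "m-write" step */

	if (msync(p, 4096, MS_SYNC) < 0)	/* pattern 1: msync makes it durable */
		perror("msync");

	munmap(p, 4096);
	close(fd);	/* pattern 2 would call fsync(fd) before close instead */
	return 0;
}

Pattern 3 differs only in ordering (unmap before fsync), which is exactly the case a flush that walks live VMAs would miss.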
On Wed, Sep 02, 2015 at 12:13:21PM +0300, Kirill A. Shutemov wrote:
> On Wed, Sep 02, 2015 at 08:49:22AM +1000, Dave Chinner wrote:
> > On Tue, Sep 01, 2015 at 01:08:04PM +0300, Kirill A. Shutemov wrote:
> > > On Tue, Sep 01, 2015 at 09:38:03AM +1000, Dave Chinner wrote:
> > > > On Mon, Aug 31,
On Wed, Sep 02, 2015 at 01:04:01PM -0600, Ross Zwisler wrote:
> On Tue, Sep 01, 2015 at 03:18:41PM +0300, Boaz Harrosh wrote:
> > So the approach we took was a bit different, to solve exactly these
> > problems and also to avoid over-flushing. Here is what we did.
> >
> > * At
On Tue, Sep 01, 2015 at 03:18:41PM +0300, Boaz Harrosh wrote:
> So the approach we took was a bit different, to solve exactly these
> problems and also to avoid over-flushing. Here is what we did.
>
> * At vm_operations_struct we also override the .close vector (call it,
>   say, dax_vm_close)
>
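A minimal sketch of that idea, with hypothetical handler names (the actual patch body is truncated in this listing, and .close runs during unmap, so locking details are glossed over): when the mapping is torn down, flush the range the VMA covered through the filesystem:

static void dax_vm_close(struct vm_area_struct *vma)
{
	loff_t start = (loff_t)vma->vm_pgoff << PAGE_SHIFT;
	loff_t end = start + (vma->vm_end - vma->vm_start) - 1;

	/* push CPU-cache-dirty data in the mapped range out to media */
	vfs_fsync_range(vma->vm_file, start, end, 0);
}

static const struct vm_operations_struct dax_vm_ops = {
	.fault		= dax_fault_handler,	/* assumed existing handlers */
	.page_mkwrite	= dax_mkwrite_handler,
	.pfn_mkwrite	= dax_pfn_mkwrite_handler,
	.close		= dax_vm_close,
};

This is what makes pattern 3 above safe under this approach: the flush happens at unmap time, so the later fsync has nothing left to chase.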
On Tue, Sep 01, 2015 at 04:12:42PM +0300, Boaz Harrosh wrote:
> On 08/31/2015 09:59 PM, Ross Zwisler wrote:
> > @@ -753,3 +755,18 @@ int dax_truncate_page(struct inode *inode, loff_t from, get_block_t get_block)
> > 	return dax_zero_page_range(inode, from, length, get_block);
> > }
> >
On 09/02/2015 09:00 AM, Boaz Harrosh wrote:
>> > We are going to have 2-socket systems with 6TB of persistent memory in
>> > them. I think it's important to design this mechanism so that it scales
>> > to memory sizes like that and supports large mmap()s.
>> >
>> > I'm not sure the application
On 09/02/2015 06:39 PM, Dave Hansen wrote:
> On 09/02/2015 08:18 AM, Boaz Harrosh wrote:
>> On 09/02/2015 05:23 PM, Dave Hansen wrote:
I'd be curious what the cost is in practice. Do you have any actual
numbers of the cost of doing it this way?
Even if the instruction is a
On 09/02/2015 08:18 AM, Boaz Harrosh wrote:
> On 09/02/2015 05:23 PM, Dave Hansen wrote:
>> > I'd be curious what the cost is in practice. Do you have any actual
>> > numbers of the cost of doing it this way?
>> >
>> > Even if the instruction is a "noop", I'd really expect the overhead to
>> >
On 09/02/2015 05:23 PM, Dave Hansen wrote:
<>
> I'd be curious what the cost is in practice. Do you have any actual
> numbers of the cost of doing it this way?
>
> Even if the instruction is a "noop", I'd really expect the overhead to
> really add up for a tens-of-gigabytes mapping, no matter
On 09/02/2015 03:27 AM, Boaz Harrosh wrote:
>> > Yet you're ignoring the fact that flushing the entire range of the
>> > relevant VMAs may not be very efficient. It may be a very
>> > large mapping with only a few pages that need flushing from the
>> > cache, but you still iterate the mappings
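For scale: a cacheline write-back loop touches one line at a time, so a 10 GiB mapping means roughly 10 GiB / 64 B, about 168 million flush instructions, even when nearly every line is clean. That is the cost Dave Hansen is asking about. A user-space sketch of such a loop (assumes a CPU with CLFLUSHOPT; the kernel-side pmem helpers are loops of this general shape):

#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE	64

static void flush_range(void *addr, size_t len)
{
	char *p = (char *)((uintptr_t)addr & ~(uintptr_t)(CACHELINE - 1));
	char *end = (char *)addr + len;

	for (; p < end; p += CACHELINE)
		_mm_clflushopt(p);	/* one flush per cacheline, clean or not */
	_mm_sfence();			/* order the flushes before claiming durability */
}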
On 09/02/2015 12:47 PM, Kirill A. Shutemov wrote:
<>
>
> I don't insist on applying the patch. And I worry about false-positives.
>
Thanks, yes
Boaz
On 09/02/2015 08:17 AM, Dave Chinner wrote:
> On Tue, Sep 01, 2015 at 09:19:45PM -0600, Ross Zwisler wrote:
>> On Wed, Sep 02, 2015 at 08:21:20AM +1000, Dave Chinner wrote:
>>> Which means applications that should "just work" without
>>> modification on DAX are now subtly broken and don't actually
On 09/02/2015 06:19 AM, Ross Zwisler wrote:
> On Wed, Sep 02, 2015 at 08:21:20AM +1000, Dave Chinner wrote:
>> Which means applications that should "just work" without
>> modification on DAX are now subtly broken and don't actually
>> guarantee data is safe after a crash. That's a pretty nasty
>>
On Wed, Sep 02, 2015 at 12:41:44PM +0300, Boaz Harrosh wrote:
> On 09/02/2015 12:37 PM, Boaz Harrosh wrote:
> >>
> >> +	/*
> >> +	 * Make sure that for VM_MIXEDMAP VMA has both
> >> +	 * vm_ops->page_mkwrite and vm_ops->pfn_mkwrite or has none.
>
On 09/02/2015 12:37 PM, Boaz Harrosh wrote:
>>
>> +	/*
>> +	 * Make sure that for VM_MIXEDMAP VMA has both
>> +	 * vm_ops->page_mkwrite and vm_ops->pfn_mkwrite or has none.
>> +	 */
>> +	if ((vma->vm_ops->page_mkwrite ||
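Restated, the invariant the hunk above enforces is that a VM_MIXEDMAP vma defines both write-notify hooks or neither. A compact sketch of the same check (placement and warning text are hypothetical, not the patch as posted):

	if ((vma->vm_flags & VM_MIXEDMAP) && vma->vm_ops &&
	    !!vma->vm_ops->page_mkwrite != !!vma->vm_ops->pfn_mkwrite)
		WARN_ONCE(1, "VM_MIXEDMAP vma with mismatched mkwrite hooks\n");

Otherwise write-protect tracking behaves differently for page-backed and pfn-backed faults within the same mapping.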
On 09/02/2015 12:13 PM, Kirill A. Shutemov wrote:
> On Wed, Sep 02, 2015 at 08:49:22AM +1000, Dave Chinner wrote:
>> On Tue, Sep 01, 2015 at 01:08:04PM +0300, Kirill A. Shutemov wrote:
>>> On Tue, Sep 01, 2015 at 09:38:03AM +1000, Dave Chinner wrote:
On Mon, Aug 31, 2015 at 12:59:44PM -0600,
On Wed, Sep 02, 2015 at 08:49:22AM +1000, Dave Chinner wrote:
> On Tue, Sep 01, 2015 at 01:08:04PM +0300, Kirill A. Shutemov wrote:
> > On Tue, Sep 01, 2015 at 09:38:03AM +1000, Dave Chinner wrote:
> > > On Mon, Aug 31, 2015 at 12:59:44PM -0600, Ross Zwisler wrote:
> > > Even for DAX, msync has to
On Tue, Sep 01, 2015 at 09:19:45PM -0600, Ross Zwisler wrote:
> On Wed, Sep 02, 2015 at 08:21:20AM +1000, Dave Chinner wrote:
> > Which means applications that should "just work" without
> > modification on DAX are now subtly broken and don't actually
> > guarantee data is safe after a crash.
On Wed, Sep 02, 2015 at 08:21:20AM +1000, Dave Chinner wrote:
> Which means applications that should "just work" without
> modification on DAX are now subtly broken and don't actually
> guarantee data is safe after a crash. That's a pretty nasty
> landmine, and goes against *everything* we've
On Tue, Sep 01, 2015 at 01:08:04PM +0300, Kirill A. Shutemov wrote:
> On Tue, Sep 01, 2015 at 09:38:03AM +1000, Dave Chinner wrote:
> > On Mon, Aug 31, 2015 at 12:59:44PM -0600, Ross Zwisler wrote:
> > Even for DAX, msync has to call vfs_fsync_range() for the filesystem to
> > commit the
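That path already exists for ordinary mappings; a simplified paraphrase of the mm/msync.c logic being referred to (not a verbatim copy, error handling elided):

	/* for each MAP_SHARED file vma covering [start, end): MS_SYNC
	 * funnels into vfs_fsync_range(), where the filesystem commits
	 * the metadata (e.g. block allocations) behind the range; a raw
	 * cacheline flush alone would skip this step */
	if ((flags & MS_SYNC) && file && (vma->vm_flags & VM_SHARED)) {
		get_file(file);
		up_read(&mm->mmap_sem);
		error = vfs_fsync_range(file, fstart, fend, 1);
		fput(file);
		down_read(&mm->mmap_sem);
	}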
On Tue, Sep 01, 2015 at 09:06:08AM +0200, Christoph Hellwig wrote:
> On Tue, Sep 01, 2015 at 09:38:03AM +1000, Dave Chinner wrote:
> > On Mon, Aug 31, 2015 at 12:59:44PM -0600, Ross Zwisler wrote:
> > > For DAX msync we just need to flush the given range using
> > > wb_cache_pmem(), which is now a
On 08/31/2015 09:59 PM, Ross Zwisler wrote:
> For DAX msync we just need to flush the given range using
> wb_cache_pmem(), which is now a public part of the PMEM API.
>
> The inclusion of <linux/pmem.h> in fs/dax.c was done to make checkpatch
> happy. Previously it was complaining about a bunch of undeclared
On 09/01/2015 10:06 AM, Christoph Hellwig wrote:
> On Tue, Sep 01, 2015 at 09:38:03AM +1000, Dave Chinner wrote:
>> On Mon, Aug 31, 2015 at 12:59:44PM -0600, Ross Zwisler wrote:
>>> For DAX msync we just need to flush the given range using
>>> wb_cache_pmem(), which is now a public part of the
On 09/01/2015 01:08 PM, Kirill A. Shutemov wrote:
<>
>
> Is that because XFS doesn't provide vm_ops->pfn_mkwrite?
>
Right, that would explain it, because I sent that patch exactly to solve
this problem. I haven't looked at the latest code for a while, but I should
check out the latest and make a patch
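For context, the generic helper in question had been added to fs/dax.c earlier in 2015; a filesystem opts in by wiring it into its DAX vm_ops, roughly like this (ext4-style names used for illustration; per the thread, XFS was missing the hook at the time):

static const struct vm_operations_struct ext4_dax_vm_ops = {
	.fault		= ext4_dax_fault,
	.page_mkwrite	= ext4_dax_mkwrite,
	.pfn_mkwrite	= dax_pfn_mkwrite,	/* generic helper in fs/dax.c */
};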
On Tue, Sep 01, 2015 at 09:38:03AM +1000, Dave Chinner wrote:
> On Mon, Aug 31, 2015 at 12:59:44PM -0600, Ross Zwisler wrote:
> > For DAX msync we just need to flush the given range using
> > wb_cache_pmem(), which is now a public part of the PMEM API.
>
> This is wrong, because it still leaves
On Mon, Aug 31, 2015 at 12:59:44PM -0600, Ross Zwisler wrote:
> For DAX msync we just need to flush the given range using
> wb_cache_pmem(), which is now a public part of the PMEM API.
This is wrong, because it still leaves fsync() broken on dax.
Flushing dirty data to stable storage is the
On Mon, Aug 31, 2015 at 01:26:40PM -0600, Ross Zwisler wrote:
> > Should this be abstracted by adding a ->msync method? Maybe not
> > worth to do for now, but it might be worth to keep that in mind.
>
> Where would we add the ->msync method? Do you mean to the PMEM API, or
> somewhere else?
On Mon, Aug 31, 2015 at 09:06:19PM +0200, Christoph Hellwig wrote:
> On Mon, Aug 31, 2015 at 12:59:44PM -0600, Ross Zwisler wrote:
> > For DAX msync we just need to flush the given range using
> > wb_cache_pmem(), which is now a public part of the PMEM API.
> >
> > The inclusion of in fs/dax.c
On Mon, Aug 31, 2015 at 12:59:44PM -0600, Ross Zwisler wrote:
> For DAX msync we just need to flush the given range using
> wb_cache_pmem(), which is now a public part of the PMEM API.
>
> The inclusion of <linux/pmem.h> in fs/dax.c was done to make checkpatch
> happy. Previously it was complaining about a
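A sketch of what "flush the given range" means in terms of that API (the helper name and address plumbing are hypothetical, since the patch body is truncated in this listing; in the 2015 PMEM API, wb_cache_pmem() took a __pmem kernel address and a length, and wmb_pmem() ordered the write-backs to durability):

static void dax_flush_range_sketch(void __pmem *kaddr, size_t len)
{
	wb_cache_pmem(kaddr, len);	/* write dirty cachelines back to media */
	wmb_pmem();			/* make the write-backs durable */
}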