Re: [PATCH 2/2] Updating ctime and mtime at syncing

2008-01-15 Thread Peter Zijlstra

On Tue, 2008-01-15 at 20:18 +0300, Anton Salikhmetov wrote:
> 2008/1/15, Peter Zijlstra <[EMAIL PROTECTED]>:
> >
> > On Tue, 2008-01-15 at 19:02 +0300, Anton Salikhmetov wrote:
> >
> > > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > > index 3d3848f..53d0e34 100644
> > > --- a/mm/page-writeback.c
> > > +++ b/mm/page-writeback.c
> > > @@ -997,35 +997,39 @@ int __set_page_dirty_no_writeback(struct page *page)
> > >   */
> > >  int __set_page_dirty_nobuffers(struct page *page)
> > >  {
> > > - if (!TestSetPageDirty(page)) {
> > > - struct address_space *mapping = page_mapping(page);
> > > - struct address_space *mapping2;
> > > + struct address_space *mapping = page_mapping(page);
> > > + struct address_space *mapping2;
> > >
> > > - if (!mapping)
> > > - return 1;
> > > + if (!mapping)
> > > + return 1;
> > >
> > > - write_lock_irq(>tree_lock);
> > > - mapping2 = page_mapping(page);
> > > - if (mapping2) { /* Race with truncate? */
> > > - BUG_ON(mapping2 != mapping);
> > > - WARN_ON_ONCE(!PagePrivate(page) && 
> > > !PageUptodate(page));
> > > - if (mapping_cap_account_dirty(mapping)) {
> > > - __inc_zone_page_state(page, NR_FILE_DIRTY);
> > > - __inc_bdi_stat(mapping->backing_dev_info,
> > > - BDI_RECLAIMABLE);
> > > - task_io_account_write(PAGE_CACHE_SIZE);
> > > - }
> > > - radix_tree_tag_set(>page_tree,
> > > - page_index(page), PAGECACHE_TAG_DIRTY);
> > > - }
> > > - write_unlock_irq(>tree_lock);
> > > - if (mapping->host) {
> > > - /* !PageAnon && !swapper_space */
> > > - __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
> > > + mapping->mtime = CURRENT_TIME;
> > > + set_bit(AS_MCTIME, >flags);
> >
> > This seems vulnerable to the race we have against truncate, handled by
> > the mapping2 magic below. Do we care?
> >
> > > +
> > > + if (TestSetPageDirty(page))
> > > + return 0;
> > > +
> > > + write_lock_irq(>tree_lock);
> > > + mapping2 = page_mapping(page);
> > > + if (mapping2) {
> > > + /* Race with truncate? */
> > > + BUG_ON(mapping2 != mapping);
> > > + WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page));
> > > + if (mapping_cap_account_dirty(mapping)) {
> > > + __inc_zone_page_state(page, NR_FILE_DIRTY);
> > > + __inc_bdi_stat(mapping->backing_dev_info,
> > > + BDI_RECLAIMABLE);
> > > + task_io_account_write(PAGE_CACHE_SIZE);
> > >   }
> > > - return 1;
> > > + radix_tree_tag_set(>page_tree,
> > > + page_index(page), PAGECACHE_TAG_DIRTY);
> > >   }
> > > - return 0;
> > > + write_unlock_irq(>tree_lock);
> > > +
> > > + if (mapping->host)
> > > + __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
> 
> The inode gets marked dirty using the same "mapping" variable
> as my code does. So, AFAIU, my change does not introduce any new
> vulnerabilities. I would nevertherless be grateful to you for a scenario
> where the race would be triggered.

Ah, right, so that would be a resounding no to my previous question :-)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] Updating ctime and mtime at syncing

2008-01-15 Thread Anton Salikhmetov
2008/1/15, Christoph Hellwig <[EMAIL PROTECTED]>:
> On Tue, Jan 15, 2008 at 07:02:45PM +0300, Anton Salikhmetov wrote:
> > +/*
> > + * Update the ctime and mtime stamps for memory-mapped block device files.
> > + */
> > +static void bd_inode_update_time(struct inode *inode, struct timespec *ts)
> > +{
> > + struct block_device *bdev = inode->i_bdev;
> > + struct list_head *p;
> > +
> > + if (bdev == NULL)
> > + return;
>
> inode->i_bdev is never NULL for inodes currently beeing written to.
>
> > +
> > + mutex_lock(>bd_mutex);
> > + list_for_each(p, >bd_inodes) {
> > + inode = list_entry(p, struct inode, i_devices);
>
> this should use list_for_each_entry.
>
>

Thank you very much for your recommenations. I'll take them into account.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] Updating ctime and mtime at syncing

2008-01-15 Thread Christoph Hellwig
On Tue, Jan 15, 2008 at 07:02:45PM +0300, Anton Salikhmetov wrote:
> +/*
> + * Update the ctime and mtime stamps for memory-mapped block device files.
> + */
> +static void bd_inode_update_time(struct inode *inode, struct timespec *ts)
> +{
> + struct block_device *bdev = inode->i_bdev;
> + struct list_head *p;
> +
> + if (bdev == NULL)
> + return;

inode->i_bdev is never NULL for inodes currently beeing written to.

> +
> + mutex_lock(>bd_mutex);
> + list_for_each(p, >bd_inodes) {
> + inode = list_entry(p, struct inode, i_devices);

this should use list_for_each_entry.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] Updating ctime and mtime at syncing

2008-01-15 Thread Anton Salikhmetov
2008/1/15, Peter Zijlstra <[EMAIL PROTECTED]>:
>
> On Tue, 2008-01-15 at 19:02 +0300, Anton Salikhmetov wrote:
>
> > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > index 3d3848f..53d0e34 100644
> > --- a/mm/page-writeback.c
> > +++ b/mm/page-writeback.c
> > @@ -997,35 +997,39 @@ int __set_page_dirty_no_writeback(struct page *page)
> >   */
> >  int __set_page_dirty_nobuffers(struct page *page)
> >  {
> > - if (!TestSetPageDirty(page)) {
> > - struct address_space *mapping = page_mapping(page);
> > - struct address_space *mapping2;
> > + struct address_space *mapping = page_mapping(page);
> > + struct address_space *mapping2;
> >
> > - if (!mapping)
> > - return 1;
> > + if (!mapping)
> > + return 1;
> >
> > - write_lock_irq(>tree_lock);
> > - mapping2 = page_mapping(page);
> > - if (mapping2) { /* Race with truncate? */
> > - BUG_ON(mapping2 != mapping);
> > - WARN_ON_ONCE(!PagePrivate(page) && 
> > !PageUptodate(page));
> > - if (mapping_cap_account_dirty(mapping)) {
> > - __inc_zone_page_state(page, NR_FILE_DIRTY);
> > - __inc_bdi_stat(mapping->backing_dev_info,
> > - BDI_RECLAIMABLE);
> > - task_io_account_write(PAGE_CACHE_SIZE);
> > - }
> > - radix_tree_tag_set(>page_tree,
> > - page_index(page), PAGECACHE_TAG_DIRTY);
> > - }
> > - write_unlock_irq(>tree_lock);
> > - if (mapping->host) {
> > - /* !PageAnon && !swapper_space */
> > - __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
> > + mapping->mtime = CURRENT_TIME;
> > + set_bit(AS_MCTIME, >flags);
>
> This seems vulnerable to the race we have against truncate, handled by
> the mapping2 magic below. Do we care?
>
> > +
> > + if (TestSetPageDirty(page))
> > + return 0;
> > +
> > + write_lock_irq(>tree_lock);
> > + mapping2 = page_mapping(page);
> > + if (mapping2) {
> > + /* Race with truncate? */
> > + BUG_ON(mapping2 != mapping);
> > + WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page));
> > + if (mapping_cap_account_dirty(mapping)) {
> > + __inc_zone_page_state(page, NR_FILE_DIRTY);
> > + __inc_bdi_stat(mapping->backing_dev_info,
> > + BDI_RECLAIMABLE);
> > + task_io_account_write(PAGE_CACHE_SIZE);
> >   }
> > - return 1;
> > + radix_tree_tag_set(>page_tree,
> > + page_index(page), PAGECACHE_TAG_DIRTY);
> >   }
> > - return 0;
> > + write_unlock_irq(>tree_lock);
> > +
> > + if (mapping->host)
> > + __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);

The inode gets marked dirty using the same "mapping" variable
as my code does. So, AFAIU, my change does not introduce any new
vulnerabilities. I would nevertherless be grateful to you for a scenario
where the race would be triggered.

> > +
> > + return 1;
> >  }
> >  EXPORT_SYMBOL(__set_page_dirty_nobuffers);
> >
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] Updating ctime and mtime at syncing

2008-01-15 Thread Peter Zijlstra

On Tue, 2008-01-15 at 19:02 +0300, Anton Salikhmetov wrote:

> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 3d3848f..53d0e34 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -997,35 +997,39 @@ int __set_page_dirty_no_writeback(struct page *page)
>   */
>  int __set_page_dirty_nobuffers(struct page *page)
>  {
> - if (!TestSetPageDirty(page)) {
> - struct address_space *mapping = page_mapping(page);
> - struct address_space *mapping2;
> + struct address_space *mapping = page_mapping(page);
> + struct address_space *mapping2;
>  
> - if (!mapping)
> - return 1;
> + if (!mapping)
> + return 1;
>  
> - write_lock_irq(>tree_lock);
> - mapping2 = page_mapping(page);
> - if (mapping2) { /* Race with truncate? */
> - BUG_ON(mapping2 != mapping);
> - WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page));
> - if (mapping_cap_account_dirty(mapping)) {
> - __inc_zone_page_state(page, NR_FILE_DIRTY);
> - __inc_bdi_stat(mapping->backing_dev_info,
> - BDI_RECLAIMABLE);
> - task_io_account_write(PAGE_CACHE_SIZE);
> - }
> - radix_tree_tag_set(>page_tree,
> - page_index(page), PAGECACHE_TAG_DIRTY);
> - }
> - write_unlock_irq(>tree_lock);
> - if (mapping->host) {
> - /* !PageAnon && !swapper_space */
> - __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
> + mapping->mtime = CURRENT_TIME;
> + set_bit(AS_MCTIME, >flags);

This seems vulnerable to the race we have against truncate, handled by
the mapping2 magic below. Do we care?

> +
> + if (TestSetPageDirty(page))
> + return 0;
> +
> + write_lock_irq(>tree_lock);
> + mapping2 = page_mapping(page);
> + if (mapping2) {
> + /* Race with truncate? */
> + BUG_ON(mapping2 != mapping);
> + WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page));
> + if (mapping_cap_account_dirty(mapping)) {
> + __inc_zone_page_state(page, NR_FILE_DIRTY);
> + __inc_bdi_stat(mapping->backing_dev_info,
> + BDI_RECLAIMABLE);
> + task_io_account_write(PAGE_CACHE_SIZE);
>   }
> - return 1;
> + radix_tree_tag_set(>page_tree,
> + page_index(page), PAGECACHE_TAG_DIRTY);
>   }
> - return 0;
> + write_unlock_irq(>tree_lock);
> +
> + if (mapping->host)
> + __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
> +
> + return 1;
>  }
>  EXPORT_SYMBOL(__set_page_dirty_nobuffers);
>  

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] updating ctime and mtime at syncing

2008-01-15 Thread Anton Salikhmetov
2008/1/15, Miklos Szeredi <[EMAIL PROTECTED]>:
> > Thanks for your review, Peter and Miklos!
> >
> > I overlooked this case when AS_MCTIME flag has been turned off and the
> > page is still dirty.
> >
> > On the other hand, the words "shall be marked for update" may be
> > considered as just setting the AS_MCTIME flag, not updating the time
> > stamps.
> >
> > What do you think about calling mapping_update_time() inside of "if
> > (MS_SYNC & flags)"? I suggest such change because the code for
> > analysis of the case you've mentioned above seems impossible to me.
>
> I think that's a good idea.  As a first iteration, just updating the
> mtime/ctime in msync(MS_SYNC) and remove_vma() (called at munmap time)
> would be a big improvement over what we currently have.
>
> I would also recommend, that you drop mapping_update_time() and the
> related functions from the patch, and just use file_update_time()
> instead.

Thank you for your recommendations. I will submit my new solution shortly.

By the way, I've already changed the unlink_file_vma() function.

>
> Miklos
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] updating ctime and mtime at syncing

2008-01-15 Thread Miklos Szeredi
> Thanks for your review, Peter and Miklos!
> 
> I overlooked this case when AS_MCTIME flag has been turned off and the
> page is still dirty.
> 
> On the other hand, the words "shall be marked for update" may be
> considered as just setting the AS_MCTIME flag, not updating the time
> stamps.
> 
> What do you think about calling mapping_update_time() inside of "if
> (MS_SYNC & flags)"? I suggest such change because the code for
> analysis of the case you've mentioned above seems impossible to me.

I think that's a good idea.  As a first iteration, just updating the
mtime/ctime in msync(MS_SYNC) and remove_vma() (called at munmap time)
would be a big improvement over what we currently have.

I would also recommend, that you drop mapping_update_time() and the
related functions from the patch, and just use file_update_time()
instead.

Miklos
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] updating ctime and mtime at syncing

2008-01-15 Thread Miklos Szeredi
 Thanks for your review, Peter and Miklos!
 
 I overlooked this case when AS_MCTIME flag has been turned off and the
 page is still dirty.
 
 On the other hand, the words shall be marked for update may be
 considered as just setting the AS_MCTIME flag, not updating the time
 stamps.
 
 What do you think about calling mapping_update_time() inside of if
 (MS_SYNC  flags)? I suggest such change because the code for
 analysis of the case you've mentioned above seems impossible to me.

I think that's a good idea.  As a first iteration, just updating the
mtime/ctime in msync(MS_SYNC) and remove_vma() (called at munmap time)
would be a big improvement over what we currently have.

I would also recommend, that you drop mapping_update_time() and the
related functions from the patch, and just use file_update_time()
instead.

Miklos
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] updating ctime and mtime at syncing

2008-01-15 Thread Anton Salikhmetov
2008/1/15, Miklos Szeredi [EMAIL PROTECTED]:
  Thanks for your review, Peter and Miklos!
 
  I overlooked this case when AS_MCTIME flag has been turned off and the
  page is still dirty.
 
  On the other hand, the words shall be marked for update may be
  considered as just setting the AS_MCTIME flag, not updating the time
  stamps.
 
  What do you think about calling mapping_update_time() inside of if
  (MS_SYNC  flags)? I suggest such change because the code for
  analysis of the case you've mentioned above seems impossible to me.

 I think that's a good idea.  As a first iteration, just updating the
 mtime/ctime in msync(MS_SYNC) and remove_vma() (called at munmap time)
 would be a big improvement over what we currently have.

 I would also recommend, that you drop mapping_update_time() and the
 related functions from the patch, and just use file_update_time()
 instead.

Thank you for your recommendations. I will submit my new solution shortly.

By the way, I've already changed the unlink_file_vma() function.


 Miklos

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] Updating ctime and mtime at syncing

2008-01-15 Thread Peter Zijlstra

On Tue, 2008-01-15 at 19:02 +0300, Anton Salikhmetov wrote:

 diff --git a/mm/page-writeback.c b/mm/page-writeback.c
 index 3d3848f..53d0e34 100644
 --- a/mm/page-writeback.c
 +++ b/mm/page-writeback.c
 @@ -997,35 +997,39 @@ int __set_page_dirty_no_writeback(struct page *page)
   */
  int __set_page_dirty_nobuffers(struct page *page)
  {
 - if (!TestSetPageDirty(page)) {
 - struct address_space *mapping = page_mapping(page);
 - struct address_space *mapping2;
 + struct address_space *mapping = page_mapping(page);
 + struct address_space *mapping2;
  
 - if (!mapping)
 - return 1;
 + if (!mapping)
 + return 1;
  
 - write_lock_irq(mapping-tree_lock);
 - mapping2 = page_mapping(page);
 - if (mapping2) { /* Race with truncate? */
 - BUG_ON(mapping2 != mapping);
 - WARN_ON_ONCE(!PagePrivate(page)  !PageUptodate(page));
 - if (mapping_cap_account_dirty(mapping)) {
 - __inc_zone_page_state(page, NR_FILE_DIRTY);
 - __inc_bdi_stat(mapping-backing_dev_info,
 - BDI_RECLAIMABLE);
 - task_io_account_write(PAGE_CACHE_SIZE);
 - }
 - radix_tree_tag_set(mapping-page_tree,
 - page_index(page), PAGECACHE_TAG_DIRTY);
 - }
 - write_unlock_irq(mapping-tree_lock);
 - if (mapping-host) {
 - /* !PageAnon  !swapper_space */
 - __mark_inode_dirty(mapping-host, I_DIRTY_PAGES);
 + mapping-mtime = CURRENT_TIME;
 + set_bit(AS_MCTIME, mapping-flags);

This seems vulnerable to the race we have against truncate, handled by
the mapping2 magic below. Do we care?

 +
 + if (TestSetPageDirty(page))
 + return 0;
 +
 + write_lock_irq(mapping-tree_lock);
 + mapping2 = page_mapping(page);
 + if (mapping2) {
 + /* Race with truncate? */
 + BUG_ON(mapping2 != mapping);
 + WARN_ON_ONCE(!PagePrivate(page)  !PageUptodate(page));
 + if (mapping_cap_account_dirty(mapping)) {
 + __inc_zone_page_state(page, NR_FILE_DIRTY);
 + __inc_bdi_stat(mapping-backing_dev_info,
 + BDI_RECLAIMABLE);
 + task_io_account_write(PAGE_CACHE_SIZE);
   }
 - return 1;
 + radix_tree_tag_set(mapping-page_tree,
 + page_index(page), PAGECACHE_TAG_DIRTY);
   }
 - return 0;
 + write_unlock_irq(mapping-tree_lock);
 +
 + if (mapping-host)
 + __mark_inode_dirty(mapping-host, I_DIRTY_PAGES);
 +
 + return 1;
  }
  EXPORT_SYMBOL(__set_page_dirty_nobuffers);
  

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] Updating ctime and mtime at syncing

2008-01-15 Thread Anton Salikhmetov
2008/1/15, Peter Zijlstra [EMAIL PROTECTED]:

 On Tue, 2008-01-15 at 19:02 +0300, Anton Salikhmetov wrote:

  diff --git a/mm/page-writeback.c b/mm/page-writeback.c
  index 3d3848f..53d0e34 100644
  --- a/mm/page-writeback.c
  +++ b/mm/page-writeback.c
  @@ -997,35 +997,39 @@ int __set_page_dirty_no_writeback(struct page *page)
*/
   int __set_page_dirty_nobuffers(struct page *page)
   {
  - if (!TestSetPageDirty(page)) {
  - struct address_space *mapping = page_mapping(page);
  - struct address_space *mapping2;
  + struct address_space *mapping = page_mapping(page);
  + struct address_space *mapping2;
 
  - if (!mapping)
  - return 1;
  + if (!mapping)
  + return 1;
 
  - write_lock_irq(mapping-tree_lock);
  - mapping2 = page_mapping(page);
  - if (mapping2) { /* Race with truncate? */
  - BUG_ON(mapping2 != mapping);
  - WARN_ON_ONCE(!PagePrivate(page)  
  !PageUptodate(page));
  - if (mapping_cap_account_dirty(mapping)) {
  - __inc_zone_page_state(page, NR_FILE_DIRTY);
  - __inc_bdi_stat(mapping-backing_dev_info,
  - BDI_RECLAIMABLE);
  - task_io_account_write(PAGE_CACHE_SIZE);
  - }
  - radix_tree_tag_set(mapping-page_tree,
  - page_index(page), PAGECACHE_TAG_DIRTY);
  - }
  - write_unlock_irq(mapping-tree_lock);
  - if (mapping-host) {
  - /* !PageAnon  !swapper_space */
  - __mark_inode_dirty(mapping-host, I_DIRTY_PAGES);
  + mapping-mtime = CURRENT_TIME;
  + set_bit(AS_MCTIME, mapping-flags);

 This seems vulnerable to the race we have against truncate, handled by
 the mapping2 magic below. Do we care?

  +
  + if (TestSetPageDirty(page))
  + return 0;
  +
  + write_lock_irq(mapping-tree_lock);
  + mapping2 = page_mapping(page);
  + if (mapping2) {
  + /* Race with truncate? */
  + BUG_ON(mapping2 != mapping);
  + WARN_ON_ONCE(!PagePrivate(page)  !PageUptodate(page));
  + if (mapping_cap_account_dirty(mapping)) {
  + __inc_zone_page_state(page, NR_FILE_DIRTY);
  + __inc_bdi_stat(mapping-backing_dev_info,
  + BDI_RECLAIMABLE);
  + task_io_account_write(PAGE_CACHE_SIZE);
}
  - return 1;
  + radix_tree_tag_set(mapping-page_tree,
  + page_index(page), PAGECACHE_TAG_DIRTY);
}
  - return 0;
  + write_unlock_irq(mapping-tree_lock);
  +
  + if (mapping-host)
  + __mark_inode_dirty(mapping-host, I_DIRTY_PAGES);

The inode gets marked dirty using the same mapping variable
as my code does. So, AFAIU, my change does not introduce any new
vulnerabilities. I would nevertherless be grateful to you for a scenario
where the race would be triggered.

  +
  + return 1;
   }
   EXPORT_SYMBOL(__set_page_dirty_nobuffers);
 


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] Updating ctime and mtime at syncing

2008-01-15 Thread Christoph Hellwig
On Tue, Jan 15, 2008 at 07:02:45PM +0300, Anton Salikhmetov wrote:
 +/*
 + * Update the ctime and mtime stamps for memory-mapped block device files.
 + */
 +static void bd_inode_update_time(struct inode *inode, struct timespec *ts)
 +{
 + struct block_device *bdev = inode-i_bdev;
 + struct list_head *p;
 +
 + if (bdev == NULL)
 + return;

inode-i_bdev is never NULL for inodes currently beeing written to.

 +
 + mutex_lock(bdev-bd_mutex);
 + list_for_each(p, bdev-bd_inodes) {
 + inode = list_entry(p, struct inode, i_devices);

this should use list_for_each_entry.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] Updating ctime and mtime at syncing

2008-01-15 Thread Anton Salikhmetov
2008/1/15, Christoph Hellwig [EMAIL PROTECTED]:
 On Tue, Jan 15, 2008 at 07:02:45PM +0300, Anton Salikhmetov wrote:
  +/*
  + * Update the ctime and mtime stamps for memory-mapped block device files.
  + */
  +static void bd_inode_update_time(struct inode *inode, struct timespec *ts)
  +{
  + struct block_device *bdev = inode-i_bdev;
  + struct list_head *p;
  +
  + if (bdev == NULL)
  + return;

 inode-i_bdev is never NULL for inodes currently beeing written to.

  +
  + mutex_lock(bdev-bd_mutex);
  + list_for_each(p, bdev-bd_inodes) {
  + inode = list_entry(p, struct inode, i_devices);

 this should use list_for_each_entry.



Thank you very much for your recommenations. I'll take them into account.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] Updating ctime and mtime at syncing

2008-01-15 Thread Peter Zijlstra

On Tue, 2008-01-15 at 20:18 +0300, Anton Salikhmetov wrote:
 2008/1/15, Peter Zijlstra [EMAIL PROTECTED]:
 
  On Tue, 2008-01-15 at 19:02 +0300, Anton Salikhmetov wrote:
 
   diff --git a/mm/page-writeback.c b/mm/page-writeback.c
   index 3d3848f..53d0e34 100644
   --- a/mm/page-writeback.c
   +++ b/mm/page-writeback.c
   @@ -997,35 +997,39 @@ int __set_page_dirty_no_writeback(struct page *page)
 */
int __set_page_dirty_nobuffers(struct page *page)
{
   - if (!TestSetPageDirty(page)) {
   - struct address_space *mapping = page_mapping(page);
   - struct address_space *mapping2;
   + struct address_space *mapping = page_mapping(page);
   + struct address_space *mapping2;
  
   - if (!mapping)
   - return 1;
   + if (!mapping)
   + return 1;
  
   - write_lock_irq(mapping-tree_lock);
   - mapping2 = page_mapping(page);
   - if (mapping2) { /* Race with truncate? */
   - BUG_ON(mapping2 != mapping);
   - WARN_ON_ONCE(!PagePrivate(page)  
   !PageUptodate(page));
   - if (mapping_cap_account_dirty(mapping)) {
   - __inc_zone_page_state(page, NR_FILE_DIRTY);
   - __inc_bdi_stat(mapping-backing_dev_info,
   - BDI_RECLAIMABLE);
   - task_io_account_write(PAGE_CACHE_SIZE);
   - }
   - radix_tree_tag_set(mapping-page_tree,
   - page_index(page), PAGECACHE_TAG_DIRTY);
   - }
   - write_unlock_irq(mapping-tree_lock);
   - if (mapping-host) {
   - /* !PageAnon  !swapper_space */
   - __mark_inode_dirty(mapping-host, I_DIRTY_PAGES);
   + mapping-mtime = CURRENT_TIME;
   + set_bit(AS_MCTIME, mapping-flags);
 
  This seems vulnerable to the race we have against truncate, handled by
  the mapping2 magic below. Do we care?
 
   +
   + if (TestSetPageDirty(page))
   + return 0;
   +
   + write_lock_irq(mapping-tree_lock);
   + mapping2 = page_mapping(page);
   + if (mapping2) {
   + /* Race with truncate? */
   + BUG_ON(mapping2 != mapping);
   + WARN_ON_ONCE(!PagePrivate(page)  !PageUptodate(page));
   + if (mapping_cap_account_dirty(mapping)) {
   + __inc_zone_page_state(page, NR_FILE_DIRTY);
   + __inc_bdi_stat(mapping-backing_dev_info,
   + BDI_RECLAIMABLE);
   + task_io_account_write(PAGE_CACHE_SIZE);
 }
   - return 1;
   + radix_tree_tag_set(mapping-page_tree,
   + page_index(page), PAGECACHE_TAG_DIRTY);
 }
   - return 0;
   + write_unlock_irq(mapping-tree_lock);
   +
   + if (mapping-host)
   + __mark_inode_dirty(mapping-host, I_DIRTY_PAGES);
 
 The inode gets marked dirty using the same mapping variable
 as my code does. So, AFAIU, my change does not introduce any new
 vulnerabilities. I would nevertherless be grateful to you for a scenario
 where the race would be triggered.

Ah, right, so that would be a resounding no to my previous question :-)

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] updating ctime and mtime at syncing

2008-01-14 Thread Anton Salikhmetov
2008/1/14, Miklos Szeredi <[EMAIL PROTECTED]>:
> > 2008/1/14, Miklos Szeredi <[EMAIL PROTECTED]>:
> > > > http://bugzilla.kernel.org/show_bug.cgi?id=2645
> > > >
> > > > Changes for updating the ctime and mtime fields for memory-mapped files:
> > > >
> > > > 1) new flag triggering update of the inode data;
> > > > 2) new function to update ctime and mtime for block device files;
> > > > 3) new helper function to update ctime and mtime when needed;
> > > > 4) updating time stamps for mapped files in sys_msync() and do_fsync();
> > > > 5) implementing the feature of auto-updating ctime and mtime.
> > >
> > > How exactly is this done?
> > >
> > > Is this catering for this case:
> > >
> > >  1 page is dirtied through mapping
> > >  2 app calls msync(MS_ASYNC)
> > >  3 page is written again through mapping
> > >  4 app calls msync(MS_ASYNC)
> > >  5 ...
> > >  6 page is written back
> > >
> > > What happens at 4?  Do we care about this one at all?
> >
> > The POSIX standard requires updating the file times every time when msync()
> > is called with MS_ASYNC. I.e. the time stamps should be updated even
> > when no physical synchronization is being done immediately.
>
> Yes.  However, on linux MS_ASYNC is basically a no-op, and without
> doing _something_ with the dirty pages (which afaics your patch
> doesn't do), it's impossible to observe later writes to the same page.
>
> I don't advocate full POSIX conformance anymore, because it's probably
> too expensive to do (I've tried).  Rather than that, we should
> probably find some sane compromise, that just fixes the real life
> issue.  Here's a pointer to the thread about this:
>
> http://lkml.org/lkml/2007/3/27/55
>
> Your patch may be a good soultion, but you should describe in detail
> what it does when pages are dirtied, and when msync/fsync are called,
> and what happens with multiple msync calls that I've asked about.
>
> I suspect your patch is ignoring writes after the first msync, but
> then why care about msync at all?  What's so special about the _first_
> msync?  Is it just that most test programs only check this, and not
> what happens if msync is called more than once?  That would be a bug
> in the test cases.
>
> > > > +/*
> > > > + * Update the ctime and mtime stamps for memory-mapped block device 
> > > > files.
> > > > + */
> > > > +static void bd_inode_update_time(struct inode *inode)
> > > > +{
> > > > + struct block_device *bdev = inode->i_bdev;
> > > > + struct list_head *p;
> > > > +
> > > > + if (bdev == NULL)
> > > > + return;
> > > > +
> > > > + mutex_lock(>bd_mutex);
> > > > + list_for_each(p, >bd_inodes) {
> > > > + inode = list_entry(p, struct inode, i_devices);
> > > > + inode_update_time(inode);
> > > > + }
> > > > + mutex_unlock(>bd_mutex);
> > > > +}
> > >
> > > Umm, why not just call with file->f_dentry->d_inode, so that you don't
> > > need to do this ugly search for the physical inode?  The file pointer
> > > is available in both msync and fsync.
> >
> > I'm not sure if I undestood your question. I see two possible
> > interpretations for this question, and I'm answering both.
> >
> > The intention was to make the data changes in the block device data
> > visible to all device files associated with the block device. Hence
> > the search, because the time stamps for all such device files should
> > be updated as well.
>
> Ahh, but it will only update "active" devices, which are currently
> open, no?  Is that what we want?
>
> > Not only the sys_msync() and do_fsync() routines call the helper
> > function mapping_update_time().
>
> Ah yes, __sync_single_inode() calls it as well.  Why?

The __sync_single_inode() function calls mapping_update_time()
to enable the "auto-updating" feature discussed earlier.

>
> Miklos
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] updating ctime and mtime at syncing

2008-01-14 Thread Anton Salikhmetov
2008/1/14, Peter Zijlstra <[EMAIL PROTECTED]>:
>
> On Mon, 2008-01-14 at 14:14 +0100, Miklos Szeredi wrote:
> > > 2008/1/14, Miklos Szeredi <[EMAIL PROTECTED]>:
> > > > > http://bugzilla.kernel.org/show_bug.cgi?id=2645
> > > > >
> > > > > Changes for updating the ctime and mtime fields for memory-mapped 
> > > > > files:
> > > > >
> > > > > 1) new flag triggering update of the inode data;
> > > > > 2) new function to update ctime and mtime for block device files;
> > > > > 3) new helper function to update ctime and mtime when needed;
> > > > > 4) updating time stamps for mapped files in sys_msync() and 
> > > > > do_fsync();
> > > > > 5) implementing the feature of auto-updating ctime and mtime.
> > > >
> > > > How exactly is this done?
> > > >
> > > > Is this catering for this case:
> > > >
> > > >  1 page is dirtied through mapping
> > > >  2 app calls msync(MS_ASYNC)
> > > >  3 page is written again through mapping
> > > >  4 app calls msync(MS_ASYNC)
> > > >  5 ...
> > > >  6 page is written back
> > > >
> > > > What happens at 4?  Do we care about this one at all?
> > >
> > > The POSIX standard requires updating the file times every time when 
> > > msync()
> > > is called with MS_ASYNC. I.e. the time stamps should be updated even
> > > when no physical synchronization is being done immediately.
> >
> > Yes.  However, on linux MS_ASYNC is basically a no-op, and without
> > doing _something_ with the dirty pages (which afaics your patch
> > doesn't do), it's impossible to observe later writes to the same page.
> >
> > I don't advocate full POSIX conformance anymore, because it's probably
> > too expensive to do (I've tried).  Rather than that, we should
> > probably find some sane compromise, that just fixes the real life
> > issue.  Here's a pointer to the thread about this:
> >
> > http://lkml.org/lkml/2007/3/27/55
> >
> > Your patch may be a good soultion, but you should describe in detail
> > what it does when pages are dirtied, and when msync/fsync are called,
> > and what happens with multiple msync calls that I've asked about.
> >
> > I suspect your patch is ignoring writes after the first msync, but
> > then why care about msync at all?  What's so special about the _first_
> > msync?  Is it just that most test programs only check this, and not
> > what happens if msync is called more than once?  That would be a bug
> > in the test cases.
>
> I must agree, doing the mmap dirty, MS_ASYNC, mmap retouch, MS_ASYNC
> case correctly would need a lot more code which I doubt is worth the
> effort.
>
> It would require scanning the PTEs and marking them read-only again on
> MS_ASYNC, and some more logic in set_page_dirty() because that currently
> bails out early if the page in question is already dirty.

Thanks for your review, Peter and Miklos!

I overlooked this case when AS_MCTIME flag has been turned off and the
page is still dirty.

On the other hand, the words "shall be marked for update" may be
considered as just setting the AS_MCTIME flag, not updating the time
stamps.

What do you think about calling mapping_update_time() inside of "if
(MS_SYNC & flags)"? I suggest such change because the code for
analysis of the case you've mentioned above seems impossible to me.

>
>
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] updating ctime and mtime at syncing

2008-01-14 Thread Miklos Szeredi
> > More fun, it would require marking them RO but leaving the dirty bit
> > set, because this ext3 fudge where we confuse the page dirty state - or
> > did that get fixed?
> 
> That got fixed by Nick, I think.
> 
> The alternative to marking pages RO, is to walk the PTEs in MS_ASYNC,
> note the dirty bit and mark pages clean.  But it's possibly even more
  
 ptes, I mean
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] updating ctime and mtime at syncing

2008-01-14 Thread Miklos Szeredi
> More fun, it would require marking them RO but leaving the dirty bit
> set, because this ext3 fudge where we confuse the page dirty state - or
> did that get fixed?

That got fixed by Nick, I think.

The alternative to marking pages RO, is to walk the PTEs in MS_ASYNC,
note the dirty bit and mark pages clean.  But it's possibly even more
complicated.

Miklos
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] updating ctime and mtime at syncing

2008-01-14 Thread Peter Zijlstra

On Mon, 2008-01-14 at 14:35 +0100, Peter Zijlstra wrote:
> On Mon, 2008-01-14 at 14:14 +0100, Miklos Szeredi wrote:
> > > 2008/1/14, Miklos Szeredi <[EMAIL PROTECTED]>:
> > > > > http://bugzilla.kernel.org/show_bug.cgi?id=2645
> > > > >
> > > > > Changes for updating the ctime and mtime fields for memory-mapped 
> > > > > files:
> > > > >
> > > > > 1) new flag triggering update of the inode data;
> > > > > 2) new function to update ctime and mtime for block device files;
> > > > > 3) new helper function to update ctime and mtime when needed;
> > > > > 4) updating time stamps for mapped files in sys_msync() and 
> > > > > do_fsync();
> > > > > 5) implementing the feature of auto-updating ctime and mtime.
> > > >
> > > > How exactly is this done?
> > > >
> > > > Is this catering for this case:
> > > >
> > > >  1 page is dirtied through mapping
> > > >  2 app calls msync(MS_ASYNC)
> > > >  3 page is written again through mapping
> > > >  4 app calls msync(MS_ASYNC)
> > > >  5 ...
> > > >  6 page is written back
> > > >
> > > > What happens at 4?  Do we care about this one at all?
> > > 
> > > The POSIX standard requires updating the file times every time when 
> > > msync()
> > > is called with MS_ASYNC. I.e. the time stamps should be updated even
> > > when no physical synchronization is being done immediately.
> > 
> > Yes.  However, on linux MS_ASYNC is basically a no-op, and without
> > doing _something_ with the dirty pages (which afaics your patch
> > doesn't do), it's impossible to observe later writes to the same page.
> > 
> > I don't advocate full POSIX conformance anymore, because it's probably
> > too expensive to do (I've tried).  Rather than that, we should
> > probably find some sane compromise, that just fixes the real life
> > issue.  Here's a pointer to the thread about this:
> > 
> > http://lkml.org/lkml/2007/3/27/55
> > 
> > Your patch may be a good soultion, but you should describe in detail
> > what it does when pages are dirtied, and when msync/fsync are called,
> > and what happens with multiple msync calls that I've asked about.
> > 
> > I suspect your patch is ignoring writes after the first msync, but
> > then why care about msync at all?  What's so special about the _first_
> > msync?  Is it just that most test programs only check this, and not
> > what happens if msync is called more than once?  That would be a bug
> > in the test cases.
> 
> I must agree, doing the mmap dirty, MS_ASYNC, mmap retouch, MS_ASYNC
> case correctly would need a lot more code which I doubt is worth the
> effort.
> 
> It would require scanning the PTEs and marking them read-only again on
> MS_ASYNC, and some more logic in set_page_dirty() because that currently
> bails out early if the page in question is already dirty.

More fun, it would require marking them RO but leaving the dirty bit
set, because this ext3 fudge where we confuse the page dirty state - or
did that get fixed?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] updating ctime and mtime at syncing

2008-01-14 Thread Peter Zijlstra

On Mon, 2008-01-14 at 14:14 +0100, Miklos Szeredi wrote:
> > 2008/1/14, Miklos Szeredi <[EMAIL PROTECTED]>:
> > > > http://bugzilla.kernel.org/show_bug.cgi?id=2645
> > > >
> > > > Changes for updating the ctime and mtime fields for memory-mapped files:
> > > >
> > > > 1) new flag triggering update of the inode data;
> > > > 2) new function to update ctime and mtime for block device files;
> > > > 3) new helper function to update ctime and mtime when needed;
> > > > 4) updating time stamps for mapped files in sys_msync() and do_fsync();
> > > > 5) implementing the feature of auto-updating ctime and mtime.
> > >
> > > How exactly is this done?
> > >
> > > Is this catering for this case:
> > >
> > >  1 page is dirtied through mapping
> > >  2 app calls msync(MS_ASYNC)
> > >  3 page is written again through mapping
> > >  4 app calls msync(MS_ASYNC)
> > >  5 ...
> > >  6 page is written back
> > >
> > > What happens at 4?  Do we care about this one at all?
> > 
> > The POSIX standard requires updating the file times every time when msync()
> > is called with MS_ASYNC. I.e. the time stamps should be updated even
> > when no physical synchronization is being done immediately.
> 
> Yes.  However, on linux MS_ASYNC is basically a no-op, and without
> doing _something_ with the dirty pages (which afaics your patch
> doesn't do), it's impossible to observe later writes to the same page.
> 
> I don't advocate full POSIX conformance anymore, because it's probably
> too expensive to do (I've tried).  Rather than that, we should
> probably find some sane compromise, that just fixes the real life
> issue.  Here's a pointer to the thread about this:
> 
> http://lkml.org/lkml/2007/3/27/55
> 
> Your patch may be a good soultion, but you should describe in detail
> what it does when pages are dirtied, and when msync/fsync are called,
> and what happens with multiple msync calls that I've asked about.
> 
> I suspect your patch is ignoring writes after the first msync, but
> then why care about msync at all?  What's so special about the _first_
> msync?  Is it just that most test programs only check this, and not
> what happens if msync is called more than once?  That would be a bug
> in the test cases.

I must agree, doing the mmap dirty, MS_ASYNC, mmap retouch, MS_ASYNC
case correctly would need a lot more code which I doubt is worth the
effort.

It would require scanning the PTEs and marking them read-only again on
MS_ASYNC, and some more logic in set_page_dirty() because that currently
bails out early if the page in question is already dirty.



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] updating ctime and mtime at syncing

2008-01-14 Thread Miklos Szeredi
> 2008/1/14, Miklos Szeredi <[EMAIL PROTECTED]>:
> > > http://bugzilla.kernel.org/show_bug.cgi?id=2645
> > >
> > > Changes for updating the ctime and mtime fields for memory-mapped files:
> > >
> > > 1) new flag triggering update of the inode data;
> > > 2) new function to update ctime and mtime for block device files;
> > > 3) new helper function to update ctime and mtime when needed;
> > > 4) updating time stamps for mapped files in sys_msync() and do_fsync();
> > > 5) implementing the feature of auto-updating ctime and mtime.
> >
> > How exactly is this done?
> >
> > Is this catering for this case:
> >
> >  1 page is dirtied through mapping
> >  2 app calls msync(MS_ASYNC)
> >  3 page is written again through mapping
> >  4 app calls msync(MS_ASYNC)
> >  5 ...
> >  6 page is written back
> >
> > What happens at 4?  Do we care about this one at all?
> 
> The POSIX standard requires updating the file times every time when msync()
> is called with MS_ASYNC. I.e. the time stamps should be updated even
> when no physical synchronization is being done immediately.

Yes.  However, on linux MS_ASYNC is basically a no-op, and without
doing _something_ with the dirty pages (which afaics your patch
doesn't do), it's impossible to observe later writes to the same page.

I don't advocate full POSIX conformance anymore, because it's probably
too expensive to do (I've tried).  Rather than that, we should
probably find some sane compromise, that just fixes the real life
issue.  Here's a pointer to the thread about this:

http://lkml.org/lkml/2007/3/27/55

Your patch may be a good soultion, but you should describe in detail
what it does when pages are dirtied, and when msync/fsync are called,
and what happens with multiple msync calls that I've asked about.

I suspect your patch is ignoring writes after the first msync, but
then why care about msync at all?  What's so special about the _first_
msync?  Is it just that most test programs only check this, and not
what happens if msync is called more than once?  That would be a bug
in the test cases.

> > > +/*
> > > + * Update the ctime and mtime stamps for memory-mapped block device 
> > > files.
> > > + */
> > > +static void bd_inode_update_time(struct inode *inode)
> > > +{
> > > + struct block_device *bdev = inode->i_bdev;
> > > + struct list_head *p;
> > > +
> > > + if (bdev == NULL)
> > > + return;
> > > +
> > > + mutex_lock(>bd_mutex);
> > > + list_for_each(p, >bd_inodes) {
> > > + inode = list_entry(p, struct inode, i_devices);
> > > + inode_update_time(inode);
> > > + }
> > > + mutex_unlock(>bd_mutex);
> > > +}
> >
> > Umm, why not just call with file->f_dentry->d_inode, so that you don't
> > need to do this ugly search for the physical inode?  The file pointer
> > is available in both msync and fsync.
> 
> I'm not sure if I undestood your question. I see two possible
> interpretations for this question, and I'm answering both.
> 
> The intention was to make the data changes in the block device data
> visible to all device files associated with the block device. Hence
> the search, because the time stamps for all such device files should
> be updated as well.

Ahh, but it will only update "active" devices, which are currently
open, no?  Is that what we want?

> Not only the sys_msync() and do_fsync() routines call the helper
> function mapping_update_time().

Ah yes, __sync_single_inode() calls it as well.  Why?

Miklos
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] updating ctime and mtime at syncing

2008-01-14 Thread Anton Salikhmetov
2008/1/14, Miklos Szeredi <[EMAIL PROTECTED]>:
> > > http://bugzilla.kernel.org/show_bug.cgi?id=2645
> > >
> > > Changes for updating the ctime and mtime fields for memory-mapped files:
> > >
> > > 1) new flag triggering update of the inode data;
> > > 2) new function to update ctime and mtime for block device files;
> > > 3) new helper function to update ctime and mtime when needed;
> > > 4) updating time stamps for mapped files in sys_msync() and do_fsync();
> > > 5) implementing the feature of auto-updating ctime and mtime.
> >
> > How exactly is this done?
> >
> > Is this catering for this case:
> >
> >  1 page is dirtied through mapping
> >  2 app calls msync(MS_ASYNC)
> >  3 page is written again through mapping
> >  4 app calls msync(MS_ASYNC)
> >  5 ...
> >  6 page is written back
> >
> > What happens at 4?  Do we care about this one at all?
>
> Oh, and here's a test program I wrote, that can be used to check this
> behavior.   It has two options:
>
>  -s   use MS_SYNC instead of MS_ASYNC
>  -f   fork and do the msync on a different mapping
>
> Back then I haven't found a single OS, that fully conformed to all the
> stupid POSIX rules regarding mmaps and ctime/mtime.

Thank you very much for sharing your code.

I'll integrate the MS_ASYNC and fork() test cases into my own unit test.

>
> Miklos
> 
>
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
>
> static const char *filename;
> static int msync_flag = MS_ASYNC;
> static int msync_fork = 0;
>
> static void print_times(const char *msg)
> {
> struct stat stbuf;
> stat(filename, );
> printf("%s\t%li\t%li\t%li\n", msg, stbuf.st_ctime, stbuf.st_mtime,
>stbuf.st_atime);
> }
>
> static void do_msync(void *addr, int len)
> {
> int res;
> if (!msync_fork) {
> res = msync(addr, len, msync_flag);
> if (res == -1) {
> perror("msync");
> exit(1);
> }
> } else {
> int pid = fork();
> if (pid == -1) {
> perror("fork");
> exit(1);
> }
> if (!pid) {
> int fd = open(filename, O_RDWR);
> if (fd == -1) {
> perror("open");
> exit(1);
> }
> addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 
> 0);
> if (addr == MAP_FAILED) {
> perror("mmap");
> exit(1);
> }
> res = msync(addr, len, msync_flag);
> if (res == -1) {
> perror("msync");
> exit(1);
> }
> exit(0);
> }
> wait(NULL);
> }
> }
>
> static void usage(const char *progname)
> {
> fprintf(stderr, "usage: %s filename [-sf]\n", progname);
> exit(1);
> }
>
> int main(int argc, char *argv[])
> {
> int res;
> char *addr;
> int fd;
>
> if (argc < 2)
> usage(argv[0]);
>
> filename = argv[1];
> if (argc > 2) {
> if (argc > 3)
> usage(argv[0]);
> if (strcmp(argv[2], "-s") == 0)
> msync_flag = MS_SYNC;
> else if (strcmp(argv[2], "-f") == 0)
> msync_fork = 1;
> else if (strcmp(argv[2], "-sf") == 0 || strcmp(argv[2], "-fs") == 0) {
> msync_flag = MS_SYNC;
> msync_fork = 1;
> } else
> usage(argv[0]);
> }
>
> fd = open(filename, O_RDWR | O_TRUNC | O_CREAT, 0666);
> if (fd == -1) {
> perror(filename);
> return 1;
> }
> print_times("begin");
> sleep(1);
> write(fd, "\n", 4);
> print_times("write");
> sleep(1);
> addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> if (addr == MAP_FAILED) {
> perror("mmap");
> return 1;
> }
> print_times("mmap");
> sleep(1);
>
> addr[1] = 'b';
> print_times("b");
> sleep(1);
> do_msync(addr, 4);
> print_times("msync b");
> sleep(1);
>
> addr[2] = 'c';
> print_times("c");
> sleep(1);
> do_msync(addr, 4);
> print_times("msync c");
> sleep(1);
>
> addr[3] = 'd';
> print_times("d");
> sleep(1);
> res = munmap(addr, 4);
> if (res == -1) {
> perror("munmap");
> return 1;
> }
> print_times("munmap");
> sleep(1);
>
> res = close(fd);
> if (res == -1) {
> perror("close");
> return 1;
> }
> print_times("close");
> sleep(1);
> sync();
> print_times("sync");
>
> return 0;
> }
>
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] updating ctime and mtime at syncing

2008-01-14 Thread Anton Salikhmetov
2008/1/14, Miklos Szeredi <[EMAIL PROTECTED]>:
> > http://bugzilla.kernel.org/show_bug.cgi?id=2645
> >
> > Changes for updating the ctime and mtime fields for memory-mapped files:
> >
> > 1) new flag triggering update of the inode data;
> > 2) new function to update ctime and mtime for block device files;
> > 3) new helper function to update ctime and mtime when needed;
> > 4) updating time stamps for mapped files in sys_msync() and do_fsync();
> > 5) implementing the feature of auto-updating ctime and mtime.
>
> How exactly is this done?
>
> Is this catering for this case:
>
>  1 page is dirtied through mapping
>  2 app calls msync(MS_ASYNC)
>  3 page is written again through mapping
>  4 app calls msync(MS_ASYNC)
>  5 ...
>  6 page is written back
>
> What happens at 4?  Do we care about this one at all?

The POSIX standard requires updating the file times every time when msync()
is called with MS_ASYNC. I.e. the time stamps should be updated even
when no physical synchronization is being done immediately.

At least, this is how I undestand the standard. Please tell me if I am wrong.

>
> More comments inline.
>
> > Signed-off-by: Anton Salikhmetov <[EMAIL PROTECTED]>
> > ---
> >  fs/buffer.c |1 +
> >  fs/fs-writeback.c   |2 ++
> >  fs/inode.c  |   42 +++---
> >  fs/sync.c   |2 ++
> >  include/linux/fs.h  |9 -
> >  include/linux/pagemap.h |3 ++-
> >  mm/msync.c  |   24 
> >  mm/page-writeback.c |1 +
> >  8 files changed, 67 insertions(+), 17 deletions(-)
> >
> > diff --git a/fs/buffer.c b/fs/buffer.c
> > index 7249e01..09adf7e 100644
> > --- a/fs/buffer.c
> > +++ b/fs/buffer.c
> > @@ -719,6 +719,7 @@ static int __set_page_dirty(struct page *page,
> >   }
> >   write_unlock_irq(>tree_lock);
> >   __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
> > + set_bit(AS_MCTIME, >flags);
> >
> >   return 1;
> >  }
> > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > index 0fca820..c25ebd5 100644
> > --- a/fs/fs-writeback.c
> > +++ b/fs/fs-writeback.c
> > @@ -243,6 +243,8 @@ __sync_single_inode(struct inode *inode, struct 
> > writeback_control *wbc)
> >
> >   spin_unlock(_lock);
> >
> > + mapping_update_time(mapping);
> > +
> >   ret = do_writepages(mapping, wbc);
> >
> >   /* Don't write the inode if only I_DIRTY_PAGES was set */
> > diff --git a/fs/inode.c b/fs/inode.c
> > index ed35383..c02bfab 100644
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -1243,8 +1243,8 @@ void touch_atime(struct vfsmount *mnt, struct dentry 
> > *dentry)
> >  EXPORT_SYMBOL(touch_atime);
> >
> >  /**
> > - *   file_update_time-   update mtime and ctime time
> > - *   @file: file accessed
> > + *   inode_update_time   -   update mtime and ctime time
> > + *   @inode: inode accessed
> >   *
> >   *   Update the mtime and ctime members of an inode and mark the inode
> >   *   for writeback.  Note that this function is meant exclusively for
> > @@ -1253,10 +1253,8 @@ EXPORT_SYMBOL(touch_atime);
> >   *   S_NOCTIME inode flag, e.g. for network filesystem where these
> >   *   timestamps are handled by the server.
> >   */
> > -
> > -void file_update_time(struct file *file)
> > +void inode_update_time(struct inode *inode)
> >  {
> > - struct inode *inode = file->f_path.dentry->d_inode;
> >   struct timespec now;
> >   int sync_it = 0;
> >
> > @@ -1279,8 +1277,39 @@ void file_update_time(struct file *file)
> >   if (sync_it)
> >   mark_inode_dirty_sync(inode);
> >  }
> > +EXPORT_SYMBOL(inode_update_time);
> > +
> > +/*
> > + * Update the ctime and mtime stamps for memory-mapped block device files.
> > + */
> > +static void bd_inode_update_time(struct inode *inode)
> > +{
> > + struct block_device *bdev = inode->i_bdev;
> > + struct list_head *p;
> > +
> > + if (bdev == NULL)
> > + return;
> > +
> > + mutex_lock(>bd_mutex);
> > + list_for_each(p, >bd_inodes) {
> > + inode = list_entry(p, struct inode, i_devices);
> > + inode_update_time(inode);
> > + }
> > + mutex_unlock(>bd_mutex);
> > +}
>
> Umm, why not just call with file->f_dentry->d_inode, so that you don't
> need to do this ugly search for the physical inode?  The file pointer
> is available in both msync and fsync.

I'm not sure if I undestood your question. I see two possible
interpretations for this
question, and I'm answering both.

The intention was to make the data changes in the block device data visible to
all device files associated with the block device. Hence the search,
because the time
stamps for all such device files should be updated as well.

Not only the sys_msync() and do_fsync() routines call the helper function
mapping_update_time(). Not all call sites of the helper function have the file
pointer available. Therefore, I pass the mapping pointer to the helper 

Re: [PATCH 2/2] updating ctime and mtime at syncing

2008-01-14 Thread Miklos Szeredi
> > http://bugzilla.kernel.org/show_bug.cgi?id=2645
> > 
> > Changes for updating the ctime and mtime fields for memory-mapped files:
> > 
> > 1) new flag triggering update of the inode data;
> > 2) new function to update ctime and mtime for block device files;
> > 3) new helper function to update ctime and mtime when needed;
> > 4) updating time stamps for mapped files in sys_msync() and do_fsync();
> > 5) implementing the feature of auto-updating ctime and mtime.
> 
> How exactly is this done?
> 
> Is this catering for this case:
> 
>  1 page is dirtied through mapping
>  2 app calls msync(MS_ASYNC)
>  3 page is written again through mapping
>  4 app calls msync(MS_ASYNC)
>  5 ...
>  6 page is written back
> 
> What happens at 4?  Do we care about this one at all?

Oh, and here's a test program I wrote, that can be used to check this
behavior.   It has two options:

 -s   use MS_SYNC instead of MS_ASYNC
 -f   fork and do the msync on a different mapping

Back then I haven't found a single OS, that fully conformed to all the
stupid POSIX rules regarding mmaps and ctime/mtime.

Miklos


#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

static const char *filename;
static int msync_flag = MS_ASYNC;
static int msync_fork = 0;

static void print_times(const char *msg)
{
struct stat stbuf;
stat(filename, );
printf("%s\t%li\t%li\t%li\n", msg, stbuf.st_ctime, stbuf.st_mtime,
   stbuf.st_atime);
}

static void do_msync(void *addr, int len)
{
int res;
if (!msync_fork) {
res = msync(addr, len, msync_flag);
if (res == -1) {
perror("msync");
exit(1);
}
} else {
int pid = fork();
if (pid == -1) {
perror("fork");
exit(1);
}
if (!pid) {
int fd = open(filename, O_RDWR);
if (fd == -1) {
perror("open");
exit(1);
}
addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
if (addr == MAP_FAILED) {
perror("mmap");
exit(1);
}
res = msync(addr, len, msync_flag);
if (res == -1) {
perror("msync");
exit(1);
}
exit(0);
}
wait(NULL);
}
}

static void usage(const char *progname)
{
fprintf(stderr, "usage: %s filename [-sf]\n", progname);
exit(1);
}

int main(int argc, char *argv[])
{
int res;
char *addr;
int fd;

if (argc < 2)
usage(argv[0]);

filename = argv[1];
if (argc > 2) {
if (argc > 3)
usage(argv[0]);
if (strcmp(argv[2], "-s") == 0)
msync_flag = MS_SYNC;
else if (strcmp(argv[2], "-f") == 0)
msync_fork = 1;
else if (strcmp(argv[2], "-sf") == 0 || strcmp(argv[2], "-fs") == 0) {
msync_flag = MS_SYNC;
msync_fork = 1;
} else
usage(argv[0]);
}

fd = open(filename, O_RDWR | O_TRUNC | O_CREAT, 0666);
if (fd == -1) {
perror(filename);
return 1;
}
print_times("begin");
sleep(1);
write(fd, "\n", 4);
print_times("write");
sleep(1);
addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
if (addr == MAP_FAILED) {
perror("mmap");
return 1;
}
print_times("mmap");
sleep(1);

addr[1] = 'b';
print_times("b");
sleep(1);
do_msync(addr, 4);
print_times("msync b");
sleep(1);

addr[2] = 'c';
print_times("c");
sleep(1);
do_msync(addr, 4);
print_times("msync c");
sleep(1);

addr[3] = 'd';
print_times("d");
sleep(1);
res = munmap(addr, 4);
if (res == -1) {
perror("munmap");
return 1;
}
print_times("munmap");
sleep(1);

res = close(fd);
if (res == -1) {
perror("close");
return 1;
}
print_times("close");
sleep(1);
sync();
print_times("sync");

return 0;
}


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] updating ctime and mtime at syncing

2008-01-14 Thread Miklos Szeredi
> http://bugzilla.kernel.org/show_bug.cgi?id=2645
> 
> Changes for updating the ctime and mtime fields for memory-mapped files:
> 
> 1) new flag triggering update of the inode data;
> 2) new function to update ctime and mtime for block device files;
> 3) new helper function to update ctime and mtime when needed;
> 4) updating time stamps for mapped files in sys_msync() and do_fsync();
> 5) implementing the feature of auto-updating ctime and mtime.

How exactly is this done?

Is this catering for this case:

 1 page is dirtied through mapping
 2 app calls msync(MS_ASYNC)
 3 page is written again through mapping
 4 app calls msync(MS_ASYNC)
 5 ...
 6 page is written back

What happens at 4?  Do we care about this one at all?

More comments inline.

> Signed-off-by: Anton Salikhmetov <[EMAIL PROTECTED]>
> ---
>  fs/buffer.c |1 +
>  fs/fs-writeback.c   |2 ++
>  fs/inode.c  |   42 +++---
>  fs/sync.c   |2 ++
>  include/linux/fs.h  |9 -
>  include/linux/pagemap.h |3 ++-
>  mm/msync.c  |   24 
>  mm/page-writeback.c |1 +
>  8 files changed, 67 insertions(+), 17 deletions(-)
> 
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 7249e01..09adf7e 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -719,6 +719,7 @@ static int __set_page_dirty(struct page *page,
>   }
>   write_unlock_irq(>tree_lock);
>   __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
> + set_bit(AS_MCTIME, >flags);
>  
>   return 1;
>  }
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 0fca820..c25ebd5 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -243,6 +243,8 @@ __sync_single_inode(struct inode *inode, struct 
> writeback_control *wbc)
>  
>   spin_unlock(_lock);
>  
> + mapping_update_time(mapping);
> +
>   ret = do_writepages(mapping, wbc);
>  
>   /* Don't write the inode if only I_DIRTY_PAGES was set */
> diff --git a/fs/inode.c b/fs/inode.c
> index ed35383..c02bfab 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -1243,8 +1243,8 @@ void touch_atime(struct vfsmount *mnt, struct dentry 
> *dentry)
>  EXPORT_SYMBOL(touch_atime);
>  
>  /**
> - *   file_update_time-   update mtime and ctime time
> - *   @file: file accessed
> + *   inode_update_time   -   update mtime and ctime time
> + *   @inode: inode accessed
>   *
>   *   Update the mtime and ctime members of an inode and mark the inode
>   *   for writeback.  Note that this function is meant exclusively for
> @@ -1253,10 +1253,8 @@ EXPORT_SYMBOL(touch_atime);
>   *   S_NOCTIME inode flag, e.g. for network filesystem where these
>   *   timestamps are handled by the server.
>   */
> -
> -void file_update_time(struct file *file)
> +void inode_update_time(struct inode *inode)
>  {
> - struct inode *inode = file->f_path.dentry->d_inode;
>   struct timespec now;
>   int sync_it = 0;
>  
> @@ -1279,8 +1277,39 @@ void file_update_time(struct file *file)
>   if (sync_it)
>   mark_inode_dirty_sync(inode);
>  }
> +EXPORT_SYMBOL(inode_update_time);
> +
> +/*
> + * Update the ctime and mtime stamps for memory-mapped block device files.
> + */
> +static void bd_inode_update_time(struct inode *inode)
> +{
> + struct block_device *bdev = inode->i_bdev;
> + struct list_head *p;
> +
> + if (bdev == NULL)
> + return;
> +
> + mutex_lock(>bd_mutex);
> + list_for_each(p, >bd_inodes) {
> + inode = list_entry(p, struct inode, i_devices);
> + inode_update_time(inode);
> + }
> + mutex_unlock(>bd_mutex);
> +}

Umm, why not just call with file->f_dentry->d_inode, so that you don't
need to do this ugly search for the physical inode?  The file pointer
is available in both msync and fsync.

> -EXPORT_SYMBOL(file_update_time);
> +/*
> + * Update the ctime and mtime stamps after checking if they are to be 
> updated.
> + */
> +void mapping_update_time(struct address_space *mapping)
> +{
> + if (test_and_clear_bit(AS_MCTIME, >flags)) {
> + if (S_ISBLK(mapping->host->i_mode))
> + bd_inode_update_time(mapping->host);
> + else
> + inode_update_time(mapping->host);
> + }
> +}
>  
>  int inode_needs_sync(struct inode *inode)
>  {
> @@ -1290,7 +1319,6 @@ int inode_needs_sync(struct inode *inode)
>   return 1;
>   return 0;
>  }
> -
>  EXPORT_SYMBOL(inode_needs_sync);
>  
>  int inode_wait(void *word)
> diff --git a/fs/sync.c b/fs/sync.c
> index 7cd005e..5561464 100644
> --- a/fs/sync.c
> +++ b/fs/sync.c
> @@ -87,6 +87,8 @@ long do_fsync(struct file *file, int datasync)
>   goto out;
>   }
>  
> + mapping_update_time(mapping);
> +
>   ret = filemap_fdatawrite(mapping);
>  
>   /*
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index b3ec4a4..1dccd4b 100644
> --- 

Re: [PATCH 2/2] updating ctime and mtime at syncing

2008-01-14 Thread Anton Salikhmetov
2008/1/14, Miklos Szeredi [EMAIL PROTECTED]:
  http://bugzilla.kernel.org/show_bug.cgi?id=2645
 
  Changes for updating the ctime and mtime fields for memory-mapped files:
 
  1) new flag triggering update of the inode data;
  2) new function to update ctime and mtime for block device files;
  3) new helper function to update ctime and mtime when needed;
  4) updating time stamps for mapped files in sys_msync() and do_fsync();
  5) implementing the feature of auto-updating ctime and mtime.

 How exactly is this done?

 Is this catering for this case:

  1 page is dirtied through mapping
  2 app calls msync(MS_ASYNC)
  3 page is written again through mapping
  4 app calls msync(MS_ASYNC)
  5 ...
  6 page is written back

 What happens at 4?  Do we care about this one at all?

The POSIX standard requires updating the file times every time when msync()
is called with MS_ASYNC. I.e. the time stamps should be updated even
when no physical synchronization is being done immediately.

At least, this is how I undestand the standard. Please tell me if I am wrong.


 More comments inline.

  Signed-off-by: Anton Salikhmetov [EMAIL PROTECTED]
  ---
   fs/buffer.c |1 +
   fs/fs-writeback.c   |2 ++
   fs/inode.c  |   42 +++---
   fs/sync.c   |2 ++
   include/linux/fs.h  |9 -
   include/linux/pagemap.h |3 ++-
   mm/msync.c  |   24 
   mm/page-writeback.c |1 +
   8 files changed, 67 insertions(+), 17 deletions(-)
 
  diff --git a/fs/buffer.c b/fs/buffer.c
  index 7249e01..09adf7e 100644
  --- a/fs/buffer.c
  +++ b/fs/buffer.c
  @@ -719,6 +719,7 @@ static int __set_page_dirty(struct page *page,
}
write_unlock_irq(mapping-tree_lock);
__mark_inode_dirty(mapping-host, I_DIRTY_PAGES);
  + set_bit(AS_MCTIME, mapping-flags);
 
return 1;
   }
  diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
  index 0fca820..c25ebd5 100644
  --- a/fs/fs-writeback.c
  +++ b/fs/fs-writeback.c
  @@ -243,6 +243,8 @@ __sync_single_inode(struct inode *inode, struct 
  writeback_control *wbc)
 
spin_unlock(inode_lock);
 
  + mapping_update_time(mapping);
  +
ret = do_writepages(mapping, wbc);
 
/* Don't write the inode if only I_DIRTY_PAGES was set */
  diff --git a/fs/inode.c b/fs/inode.c
  index ed35383..c02bfab 100644
  --- a/fs/inode.c
  +++ b/fs/inode.c
  @@ -1243,8 +1243,8 @@ void touch_atime(struct vfsmount *mnt, struct dentry 
  *dentry)
   EXPORT_SYMBOL(touch_atime);
 
   /**
  - *   file_update_time-   update mtime and ctime time
  - *   @file: file accessed
  + *   inode_update_time   -   update mtime and ctime time
  + *   @inode: inode accessed
*
*   Update the mtime and ctime members of an inode and mark the inode
*   for writeback.  Note that this function is meant exclusively for
  @@ -1253,10 +1253,8 @@ EXPORT_SYMBOL(touch_atime);
*   S_NOCTIME inode flag, e.g. for network filesystem where these
*   timestamps are handled by the server.
*/
  -
  -void file_update_time(struct file *file)
  +void inode_update_time(struct inode *inode)
   {
  - struct inode *inode = file-f_path.dentry-d_inode;
struct timespec now;
int sync_it = 0;
 
  @@ -1279,8 +1277,39 @@ void file_update_time(struct file *file)
if (sync_it)
mark_inode_dirty_sync(inode);
   }
  +EXPORT_SYMBOL(inode_update_time);
  +
  +/*
  + * Update the ctime and mtime stamps for memory-mapped block device files.
  + */
  +static void bd_inode_update_time(struct inode *inode)
  +{
  + struct block_device *bdev = inode-i_bdev;
  + struct list_head *p;
  +
  + if (bdev == NULL)
  + return;
  +
  + mutex_lock(bdev-bd_mutex);
  + list_for_each(p, bdev-bd_inodes) {
  + inode = list_entry(p, struct inode, i_devices);
  + inode_update_time(inode);
  + }
  + mutex_unlock(bdev-bd_mutex);
  +}

 Umm, why not just call with file-f_dentry-d_inode, so that you don't
 need to do this ugly search for the physical inode?  The file pointer
 is available in both msync and fsync.

I'm not sure if I undestood your question. I see two possible
interpretations for this
question, and I'm answering both.

The intention was to make the data changes in the block device data visible to
all device files associated with the block device. Hence the search,
because the time
stamps for all such device files should be updated as well.

Not only the sys_msync() and do_fsync() routines call the helper function
mapping_update_time(). Not all call sites of the helper function have the file
pointer available. Therefore, I pass the mapping pointer to the helper function
in order to make this function adequate for all situations.


  -EXPORT_SYMBOL(file_update_time);
  +/*
  + * Update the ctime and mtime stamps after checking if they are to be 
  updated.
  + 

Re: [PATCH 2/2] updating ctime and mtime at syncing

2008-01-14 Thread Miklos Szeredi
 2008/1/14, Miklos Szeredi [EMAIL PROTECTED]:
   http://bugzilla.kernel.org/show_bug.cgi?id=2645
  
   Changes for updating the ctime and mtime fields for memory-mapped files:
  
   1) new flag triggering update of the inode data;
   2) new function to update ctime and mtime for block device files;
   3) new helper function to update ctime and mtime when needed;
   4) updating time stamps for mapped files in sys_msync() and do_fsync();
   5) implementing the feature of auto-updating ctime and mtime.
 
  How exactly is this done?
 
  Is this catering for this case:
 
   1 page is dirtied through mapping
   2 app calls msync(MS_ASYNC)
   3 page is written again through mapping
   4 app calls msync(MS_ASYNC)
   5 ...
   6 page is written back
 
  What happens at 4?  Do we care about this one at all?
 
 The POSIX standard requires updating the file times every time when msync()
 is called with MS_ASYNC. I.e. the time stamps should be updated even
 when no physical synchronization is being done immediately.

Yes.  However, on linux MS_ASYNC is basically a no-op, and without
doing _something_ with the dirty pages (which afaics your patch
doesn't do), it's impossible to observe later writes to the same page.

I don't advocate full POSIX conformance anymore, because it's probably
too expensive to do (I've tried).  Rather than that, we should
probably find some sane compromise, that just fixes the real life
issue.  Here's a pointer to the thread about this:

http://lkml.org/lkml/2007/3/27/55

Your patch may be a good soultion, but you should describe in detail
what it does when pages are dirtied, and when msync/fsync are called,
and what happens with multiple msync calls that I've asked about.

I suspect your patch is ignoring writes after the first msync, but
then why care about msync at all?  What's so special about the _first_
msync?  Is it just that most test programs only check this, and not
what happens if msync is called more than once?  That would be a bug
in the test cases.

   +/*
   + * Update the ctime and mtime stamps for memory-mapped block device 
   files.
   + */
   +static void bd_inode_update_time(struct inode *inode)
   +{
   + struct block_device *bdev = inode-i_bdev;
   + struct list_head *p;
   +
   + if (bdev == NULL)
   + return;
   +
   + mutex_lock(bdev-bd_mutex);
   + list_for_each(p, bdev-bd_inodes) {
   + inode = list_entry(p, struct inode, i_devices);
   + inode_update_time(inode);
   + }
   + mutex_unlock(bdev-bd_mutex);
   +}
 
  Umm, why not just call with file-f_dentry-d_inode, so that you don't
  need to do this ugly search for the physical inode?  The file pointer
  is available in both msync and fsync.
 
 I'm not sure if I undestood your question. I see two possible
 interpretations for this question, and I'm answering both.
 
 The intention was to make the data changes in the block device data
 visible to all device files associated with the block device. Hence
 the search, because the time stamps for all such device files should
 be updated as well.

Ahh, but it will only update active devices, which are currently
open, no?  Is that what we want?

 Not only the sys_msync() and do_fsync() routines call the helper
 function mapping_update_time().

Ah yes, __sync_single_inode() calls it as well.  Why?

Miklos
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] updating ctime and mtime at syncing

2008-01-14 Thread Miklos Szeredi
  More fun, it would require marking them RO but leaving the dirty bit
  set, because this ext3 fudge where we confuse the page dirty state - or
  did that get fixed?
 
 That got fixed by Nick, I think.
 
 The alternative to marking pages RO, is to walk the PTEs in MS_ASYNC,
 note the dirty bit and mark pages clean.  But it's possibly even more
  
 ptes, I mean
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] updating ctime and mtime at syncing

2008-01-14 Thread Miklos Szeredi
 More fun, it would require marking them RO but leaving the dirty bit
 set, because this ext3 fudge where we confuse the page dirty state - or
 did that get fixed?

That got fixed by Nick, I think.

The alternative to marking pages RO, is to walk the PTEs in MS_ASYNC,
note the dirty bit and mark pages clean.  But it's possibly even more
complicated.

Miklos
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] updating ctime and mtime at syncing

2008-01-14 Thread Peter Zijlstra

On Mon, 2008-01-14 at 14:35 +0100, Peter Zijlstra wrote:
 On Mon, 2008-01-14 at 14:14 +0100, Miklos Szeredi wrote:
   2008/1/14, Miklos Szeredi [EMAIL PROTECTED]:
 http://bugzilla.kernel.org/show_bug.cgi?id=2645

 Changes for updating the ctime and mtime fields for memory-mapped 
 files:

 1) new flag triggering update of the inode data;
 2) new function to update ctime and mtime for block device files;
 3) new helper function to update ctime and mtime when needed;
 4) updating time stamps for mapped files in sys_msync() and 
 do_fsync();
 5) implementing the feature of auto-updating ctime and mtime.
   
How exactly is this done?
   
Is this catering for this case:
   
 1 page is dirtied through mapping
 2 app calls msync(MS_ASYNC)
 3 page is written again through mapping
 4 app calls msync(MS_ASYNC)
 5 ...
 6 page is written back
   
What happens at 4?  Do we care about this one at all?
   
   The POSIX standard requires updating the file times every time when 
   msync()
   is called with MS_ASYNC. I.e. the time stamps should be updated even
   when no physical synchronization is being done immediately.
  
  Yes.  However, on linux MS_ASYNC is basically a no-op, and without
  doing _something_ with the dirty pages (which afaics your patch
  doesn't do), it's impossible to observe later writes to the same page.
  
  I don't advocate full POSIX conformance anymore, because it's probably
  too expensive to do (I've tried).  Rather than that, we should
  probably find some sane compromise, that just fixes the real life
  issue.  Here's a pointer to the thread about this:
  
  http://lkml.org/lkml/2007/3/27/55
  
  Your patch may be a good soultion, but you should describe in detail
  what it does when pages are dirtied, and when msync/fsync are called,
  and what happens with multiple msync calls that I've asked about.
  
  I suspect your patch is ignoring writes after the first msync, but
  then why care about msync at all?  What's so special about the _first_
  msync?  Is it just that most test programs only check this, and not
  what happens if msync is called more than once?  That would be a bug
  in the test cases.
 
 I must agree, doing the mmap dirty, MS_ASYNC, mmap retouch, MS_ASYNC
 case correctly would need a lot more code which I doubt is worth the
 effort.
 
 It would require scanning the PTEs and marking them read-only again on
 MS_ASYNC, and some more logic in set_page_dirty() because that currently
 bails out early if the page in question is already dirty.

More fun, it would require marking them RO but leaving the dirty bit
set, because this ext3 fudge where we confuse the page dirty state - or
did that get fixed?

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] updating ctime and mtime at syncing

2008-01-14 Thread Peter Zijlstra

On Mon, 2008-01-14 at 14:14 +0100, Miklos Szeredi wrote:
  2008/1/14, Miklos Szeredi [EMAIL PROTECTED]:
http://bugzilla.kernel.org/show_bug.cgi?id=2645
   
Changes for updating the ctime and mtime fields for memory-mapped files:
   
1) new flag triggering update of the inode data;
2) new function to update ctime and mtime for block device files;
3) new helper function to update ctime and mtime when needed;
4) updating time stamps for mapped files in sys_msync() and do_fsync();
5) implementing the feature of auto-updating ctime and mtime.
  
   How exactly is this done?
  
   Is this catering for this case:
  
1 page is dirtied through mapping
2 app calls msync(MS_ASYNC)
3 page is written again through mapping
4 app calls msync(MS_ASYNC)
5 ...
6 page is written back
  
   What happens at 4?  Do we care about this one at all?
  
  The POSIX standard requires updating the file times every time when msync()
  is called with MS_ASYNC. I.e. the time stamps should be updated even
  when no physical synchronization is being done immediately.
 
 Yes.  However, on linux MS_ASYNC is basically a no-op, and without
 doing _something_ with the dirty pages (which afaics your patch
 doesn't do), it's impossible to observe later writes to the same page.
 
 I don't advocate full POSIX conformance anymore, because it's probably
 too expensive to do (I've tried).  Rather than that, we should
 probably find some sane compromise, that just fixes the real life
 issue.  Here's a pointer to the thread about this:
 
 http://lkml.org/lkml/2007/3/27/55
 
 Your patch may be a good soultion, but you should describe in detail
 what it does when pages are dirtied, and when msync/fsync are called,
 and what happens with multiple msync calls that I've asked about.
 
 I suspect your patch is ignoring writes after the first msync, but
 then why care about msync at all?  What's so special about the _first_
 msync?  Is it just that most test programs only check this, and not
 what happens if msync is called more than once?  That would be a bug
 in the test cases.

I must agree, doing the mmap dirty, MS_ASYNC, mmap retouch, MS_ASYNC
case correctly would need a lot more code which I doubt is worth the
effort.

It would require scanning the PTEs and marking them read-only again on
MS_ASYNC, and some more logic in set_page_dirty() because that currently
bails out early if the page in question is already dirty.



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] updating ctime and mtime at syncing

2008-01-14 Thread Anton Salikhmetov
2008/1/14, Peter Zijlstra [EMAIL PROTECTED]:

 On Mon, 2008-01-14 at 14:14 +0100, Miklos Szeredi wrote:
   2008/1/14, Miklos Szeredi [EMAIL PROTECTED]:
 http://bugzilla.kernel.org/show_bug.cgi?id=2645

 Changes for updating the ctime and mtime fields for memory-mapped 
 files:

 1) new flag triggering update of the inode data;
 2) new function to update ctime and mtime for block device files;
 3) new helper function to update ctime and mtime when needed;
 4) updating time stamps for mapped files in sys_msync() and 
 do_fsync();
 5) implementing the feature of auto-updating ctime and mtime.
   
How exactly is this done?
   
Is this catering for this case:
   
 1 page is dirtied through mapping
 2 app calls msync(MS_ASYNC)
 3 page is written again through mapping
 4 app calls msync(MS_ASYNC)
 5 ...
 6 page is written back
   
What happens at 4?  Do we care about this one at all?
  
   The POSIX standard requires updating the file times every time when 
   msync()
   is called with MS_ASYNC. I.e. the time stamps should be updated even
   when no physical synchronization is being done immediately.
 
  Yes.  However, on linux MS_ASYNC is basically a no-op, and without
  doing _something_ with the dirty pages (which afaics your patch
  doesn't do), it's impossible to observe later writes to the same page.
 
  I don't advocate full POSIX conformance anymore, because it's probably
  too expensive to do (I've tried).  Rather than that, we should
  probably find some sane compromise, that just fixes the real life
  issue.  Here's a pointer to the thread about this:
 
  http://lkml.org/lkml/2007/3/27/55
 
  Your patch may be a good soultion, but you should describe in detail
  what it does when pages are dirtied, and when msync/fsync are called,
  and what happens with multiple msync calls that I've asked about.
 
  I suspect your patch is ignoring writes after the first msync, but
  then why care about msync at all?  What's so special about the _first_
  msync?  Is it just that most test programs only check this, and not
  what happens if msync is called more than once?  That would be a bug
  in the test cases.

 I must agree, doing the mmap dirty, MS_ASYNC, mmap retouch, MS_ASYNC
 case correctly would need a lot more code which I doubt is worth the
 effort.

 It would require scanning the PTEs and marking them read-only again on
 MS_ASYNC, and some more logic in set_page_dirty() because that currently
 bails out early if the page in question is already dirty.

Thanks for your review, Peter and Miklos!

I overlooked this case when AS_MCTIME flag has been turned off and the
page is still dirty.

On the other hand, the words shall be marked for update may be
considered as just setting the AS_MCTIME flag, not updating the time
stamps.

What do you think about calling mapping_update_time() inside of if
(MS_SYNC  flags)? I suggest such change because the code for
analysis of the case you've mentioned above seems impossible to me.





--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] updating ctime and mtime at syncing

2008-01-14 Thread Anton Salikhmetov
2008/1/14, Miklos Szeredi [EMAIL PROTECTED]:
  2008/1/14, Miklos Szeredi [EMAIL PROTECTED]:
http://bugzilla.kernel.org/show_bug.cgi?id=2645
   
Changes for updating the ctime and mtime fields for memory-mapped files:
   
1) new flag triggering update of the inode data;
2) new function to update ctime and mtime for block device files;
3) new helper function to update ctime and mtime when needed;
4) updating time stamps for mapped files in sys_msync() and do_fsync();
5) implementing the feature of auto-updating ctime and mtime.
  
   How exactly is this done?
  
   Is this catering for this case:
  
1 page is dirtied through mapping
2 app calls msync(MS_ASYNC)
3 page is written again through mapping
4 app calls msync(MS_ASYNC)
5 ...
6 page is written back
  
   What happens at 4?  Do we care about this one at all?
 
  The POSIX standard requires updating the file times every time when msync()
  is called with MS_ASYNC. I.e. the time stamps should be updated even
  when no physical synchronization is being done immediately.

 Yes.  However, on linux MS_ASYNC is basically a no-op, and without
 doing _something_ with the dirty pages (which afaics your patch
 doesn't do), it's impossible to observe later writes to the same page.

 I don't advocate full POSIX conformance anymore, because it's probably
 too expensive to do (I've tried).  Rather than that, we should
 probably find some sane compromise, that just fixes the real life
 issue.  Here's a pointer to the thread about this:

 http://lkml.org/lkml/2007/3/27/55

 Your patch may be a good soultion, but you should describe in detail
 what it does when pages are dirtied, and when msync/fsync are called,
 and what happens with multiple msync calls that I've asked about.

 I suspect your patch is ignoring writes after the first msync, but
 then why care about msync at all?  What's so special about the _first_
 msync?  Is it just that most test programs only check this, and not
 what happens if msync is called more than once?  That would be a bug
 in the test cases.

+/*
+ * Update the ctime and mtime stamps for memory-mapped block device 
files.
+ */
+static void bd_inode_update_time(struct inode *inode)
+{
+ struct block_device *bdev = inode-i_bdev;
+ struct list_head *p;
+
+ if (bdev == NULL)
+ return;
+
+ mutex_lock(bdev-bd_mutex);
+ list_for_each(p, bdev-bd_inodes) {
+ inode = list_entry(p, struct inode, i_devices);
+ inode_update_time(inode);
+ }
+ mutex_unlock(bdev-bd_mutex);
+}
  
   Umm, why not just call with file-f_dentry-d_inode, so that you don't
   need to do this ugly search for the physical inode?  The file pointer
   is available in both msync and fsync.
 
  I'm not sure if I undestood your question. I see two possible
  interpretations for this question, and I'm answering both.
 
  The intention was to make the data changes in the block device data
  visible to all device files associated with the block device. Hence
  the search, because the time stamps for all such device files should
  be updated as well.

 Ahh, but it will only update active devices, which are currently
 open, no?  Is that what we want?

  Not only the sys_msync() and do_fsync() routines call the helper
  function mapping_update_time().

 Ah yes, __sync_single_inode() calls it as well.  Why?

The __sync_single_inode() function calls mapping_update_time()
to enable the auto-updating feature discussed earlier.


 Miklos

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] updating ctime and mtime at syncing

2008-01-12 Thread Rik van Riel
On Sun, 13 Jan 2008 07:39:59 +0300
Anton Salikhmetov <[EMAIL PROTECTED]> wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=2645
> 
> Changes for updating the ctime and mtime fields for memory-mapped files:
> 
> 1) new flag triggering update of the inode data;
> 2) new function to update ctime and mtime for block device files;
> 3) new helper function to update ctime and mtime when needed;
> 4) updating time stamps for mapped files in sys_msync() and do_fsync();
> 5) implementing the feature of auto-updating ctime and mtime.
> 
> Signed-off-by: Anton Salikhmetov <[EMAIL PROTECTED]>

Acked-by: Rik van Riel <[EMAIL PROTECTED]>

-- 
All rights reversed.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] updating ctime and mtime at syncing

2008-01-12 Thread Rik van Riel
On Sun, 13 Jan 2008 07:39:59 +0300
Anton Salikhmetov [EMAIL PROTECTED] wrote:

 http://bugzilla.kernel.org/show_bug.cgi?id=2645
 
 Changes for updating the ctime and mtime fields for memory-mapped files:
 
 1) new flag triggering update of the inode data;
 2) new function to update ctime and mtime for block device files;
 3) new helper function to update ctime and mtime when needed;
 4) updating time stamps for mapped files in sys_msync() and do_fsync();
 5) implementing the feature of auto-updating ctime and mtime.
 
 Signed-off-by: Anton Salikhmetov [EMAIL PROTECTED]

Acked-by: Rik van Riel [EMAIL PROTECTED]

-- 
All rights reversed.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/