Re: [RFC v2 PATCH] kvm tools, qcow: Add the support for copy-on-write clusters

2011-11-21 Thread Kevin Wolf
Am 20.11.2011 11:59, schrieb Pekka Enberg:
 On Sun, 2011-11-20 at 14:14 +0800, Lan, Tianyu wrote:
 OK. Thx.
 But fsync is too slow. I try to find a way to sync a range of file. 
 Are there any solutions to meet my purpose?
 
 On Sun, 2011-11-20 at 08:23 +0200, Sasha Levin wrote:
 fdatasync() is as good as it'll get.

 tbh, maybe we should just consider opening QCOW images with O_SYNC and
 just get it over with?
 
 No, lets not do that. It's easier to improve the performance of correct
 code that doesn't use O_SYNC.

Yes, O_SYNC gives you horrible performance. With explicit fsyncs cluster
allocation is somewhat slow (you can try to speed it up by batching
updates like qemu does), but at least rewrites will perform reasonably
close to raw.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [RFC v2 PATCH] kvm tools, qcow: Add the support for copy-on-write clusters

2011-11-20 Thread Pekka Enberg
On Sun, 2011-11-20 at 14:14 +0800, Lan, Tianyu wrote:
  OK. Thx.
  But fsync is too slow. I try to find a way to sync a range of file. 
  Are there any solutions to meet my purpose?

On Sun, 2011-11-20 at 08:23 +0200, Sasha Levin wrote:
 fdatasync() is as good as it'll get.
 
 tbh, maybe we should just consider opening QCOW images with O_SYNC and
 just get it over with?

No, lets not do that. It's easier to improve the performance of correct
code that doesn't use O_SYNC.

Pekka

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [RFC v2 PATCH] kvm tools, qcow: Add the support for copy-on-write clusters

2011-11-19 Thread Lan, Tianyu
How about using the sync_file_range to sync the metadata?

-Original Message-
From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of 
Lan, Tianyu
Sent: Saturday, November 19, 2011 10:09 AM
To: Kevin Wolf
Cc: penb...@kernel.org; kvm@vger.kernel.org; asias.he...@gmail.com; 
levinsasha...@gmail.com; prasadjoshi...@gmail.com
Subject: RE: [RFC v2 PATCH] kvm tools, qcow: Add the support for copy-on-write 
clusters

Hi Kevin:
Thanks for your review. The following means that there should be a 
fsync after updating 
metadata(refcunt block, l1 table and l2 table).

Thanks 
Tianyu Lan

-Original Message-
 + /*write l2 table*/
 + l2t-dirty = 1;
 + if (qcow_l2_cache_write(q, l2t)  0)
   goto free_cache;

You need to make sure that the refcount update is written first (e.g.
with fsync), otherwise you risk corruption when the host crashes in the middle.

  
 - if (cache_table(q, l2t)  0) {
 - if (ftruncate(q-fd, f_sz)  0)
 - goto free_cache;
 + /* Update the l1 talble */
 + l1t-l1_table[l1t_idx] = cpu_to_be64(l2t_new_offset
 + | QCOW2_OFLAG_COPIED);
  
 - goto free_cache;
 - }
 + if (pwrite_in_full(q-fd, l1t-l1_table,
 + l1t-table_size * sizeof(u64),
 + header-l1_table_offset)  0)
 + goto error;

Likewise, the L1 table write must be ordered against the L2 write.

goto error is using the wrong label.

  
 - /* Update the in-core entry */
 - l1t-l1_table[l1t_idx] = cpu_to_be64(l2t_offset);
 + /*cache l2 table*/
 + cache_table(q, l2t);

After so many explicit comments, you can probably guess what's wrong here...
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [RFC v2 PATCH] kvm tools, qcow: Add the support for copy-on-write clusters

2011-11-19 Thread Sasha Levin
On Sat, 2011-11-19 at 23:30 +0800, Lan, Tianyu wrote:
 How about using the sync_file_range to sync the metadata?

sync_file_range() is only a hint, it doesn't actually assure anything.

-- 

Sasha.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [RFC v2 PATCH] kvm tools, qcow: Add the support for copy-on-write clusters

2011-11-19 Thread Lan, Tianyu
OK. Thx.
But fsync is too slow. I try to find a way to sync a range of file. 
Are there any solutions to meet my purpose?

Thanks
Tianyu Lan

-Original Message-
From: Sasha Levin [mailto:levinsasha...@gmail.com] 
Sent: Sunday, November 20, 2011 12:27 AM
To: Lan, Tianyu
Cc: Kevin Wolf; penb...@kernel.org; kvm@vger.kernel.org; asias.he...@gmail.com; 
prasadjoshi...@gmail.com
Subject: RE: [RFC v2 PATCH] kvm tools, qcow: Add the support for copy-on-write 
clusters

On Sat, 2011-11-19 at 23:30 +0800, Lan, Tianyu wrote:
 How about using the sync_file_range to sync the metadata?

sync_file_range() is only a hint, it doesn't actually assure anything.

-- 

Sasha.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [RFC v2 PATCH] kvm tools, qcow: Add the support for copy-on-write clusters

2011-11-19 Thread Sasha Levin
On Sun, 2011-11-20 at 14:14 +0800, Lan, Tianyu wrote:
 OK. Thx.
 But fsync is too slow. I try to find a way to sync a range of file. 
 Are there any solutions to meet my purpose?

fdatasync() is as good as it'll get.

tbh, maybe we should just consider opening QCOW images with O_SYNC and
just get it over with?

 Thanks
 Tianyu Lan
 
 -Original Message-
 From: Sasha Levin [mailto:levinsasha...@gmail.com] 
 Sent: Sunday, November 20, 2011 12:27 AM
 To: Lan, Tianyu
 Cc: Kevin Wolf; penb...@kernel.org; kvm@vger.kernel.org; 
 asias.he...@gmail.com; prasadjoshi...@gmail.com
 Subject: RE: [RFC v2 PATCH] kvm tools, qcow: Add the support for 
 copy-on-write clusters
 
 On Sat, 2011-11-19 at 23:30 +0800, Lan, Tianyu wrote:
  How about using the sync_file_range to sync the metadata?
 
 sync_file_range() is only a hint, it doesn't actually assure anything.
 

-- 

Sasha.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [RFC v2 PATCH] kvm tools, qcow: Add the support for copy-on-write clusters

2011-11-19 Thread Lan, Tianyu
Yeah. That will make the work quite simple.
After testing, fdatasync is better than fsync.

-Original Message-
From: Sasha Levin [mailto:levinsasha...@gmail.com] 
Sent: Sunday, November 20, 2011 2:23 PM
To: Lan, Tianyu
Cc: Kevin Wolf; penb...@kernel.org; kvm@vger.kernel.org; asias.he...@gmail.com; 
prasadjoshi...@gmail.com
Subject: RE: [RFC v2 PATCH] kvm tools, qcow: Add the support for copy-on-write 
clusters

On Sun, 2011-11-20 at 14:14 +0800, Lan, Tianyu wrote:
 OK. Thx.
 But fsync is too slow. I try to find a way to sync a range of file. 
 Are there any solutions to meet my purpose?

fdatasync() is as good as it'll get.

tbh, maybe we should just consider opening QCOW images with O_SYNC and
just get it over with?

 Thanks
 Tianyu Lan
 
 -Original Message-
 From: Sasha Levin [mailto:levinsasha...@gmail.com] 
 Sent: Sunday, November 20, 2011 12:27 AM
 To: Lan, Tianyu
 Cc: Kevin Wolf; penb...@kernel.org; kvm@vger.kernel.org; 
 asias.he...@gmail.com; prasadjoshi...@gmail.com
 Subject: RE: [RFC v2 PATCH] kvm tools, qcow: Add the support for 
 copy-on-write clusters
 
 On Sat, 2011-11-19 at 23:30 +0800, Lan, Tianyu wrote:
  How about using the sync_file_range to sync the metadata?
 
 sync_file_range() is only a hint, it doesn't actually assure anything.
 

-- 

Sasha.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 PATCH] kvm tools, qcow: Add the support for copy-on-write clusters

2011-11-18 Thread Kevin Wolf
Am 18.11.2011 09:47, schrieb Lan Tianyu:
 When meeting request to write the cluster without copied flag,
 allocate a new cluster and write original data with modification
 to the new cluster. This also adds support for the writing operation
 of the qcow2 compressed image. After testing, image file can pass
 through qemu-img check.
 
  Please enter the commit message for your changes. Lines starting
 
 Signed-off-by: Lan Tianyu tianyu@intel.com
 ---
  tools/kvm/disk/qcow.c|  366 
 +-
  tools/kvm/include/kvm/qcow.h |2 +
  2 files changed, 255 insertions(+), 113 deletions(-)
 
 diff --git a/tools/kvm/disk/qcow.c b/tools/kvm/disk/qcow.c
 index 680b37d..4d9125d 100644
 --- a/tools/kvm/disk/qcow.c
 +++ b/tools/kvm/disk/qcow.c
 @@ -122,9 +122,6 @@ static int cache_table(struct qcow *q, struct 
 qcow_l2_table *c)
*/
   lru = list_first_entry(l1t-lru_list, struct qcow_l2_table, 
 list);
  
 - if (qcow_l2_cache_write(q, lru)  0)
 - goto error;
 -
   /* Remove the node from the cache */
   rb_erase(lru-node, r);
   list_del_init(lru-list);
 @@ -618,9 +615,6 @@ static int cache_refcount_block(struct qcow *q, struct 
 qcow_refcount_block *c)
   if (rft-nr_cached == MAX_CACHE_NODES) {
   lru = list_first_entry(rft-lru_list, struct 
 qcow_refcount_block, list);
  
 - if (write_refcount_block(q, lru)  0)
 - goto error;
 -
   rb_erase(lru-node, r);
   list_del_init(lru-list);
   rft-nr_cached--;
 @@ -706,6 +700,11 @@ static struct qcow_refcount_block 
 *qcow_read_refcount_block(struct qcow *q, u64
  
   rfb_offset = be64_to_cpu(rft-rf_table[rft_idx]);
  
 + if (!rfb_offset) {
 + pr_warning(Don't support to grow refcount table);
 + return NULL;
 + }
 +
   rfb = refcount_block_search(q, rfb_offset);
   if (rfb)
   return rfb;
 @@ -728,35 +727,121 @@ error_free_rfb:
   return NULL;
  }
  
 -/*
 - * QCOW file might grow during a write operation. Not only data but metadata 
 is
 - * also written at the end of the file. Therefore it is necessary to ensure
 - * every write is committed to disk. Hence we use uses qcow_pwrite_sync() to
 - * synchronize the in-core state of QCOW image to disk.
 - *
 - * We also try to restore the image to a consistent state if the metdata
 - * operation fails. The two metadat operations are: level 1 and level 2 table
 - * update. If either of them fails the image is truncated to a consistent 
 state.
 +static u16 qcow_get_refcount(struct qcow *q, u64 clust_idx)
 +{
 + struct qcow_refcount_block *rfb = NULL;
 + struct qcow_header *header = q-header;
 + u64 rfb_idx;
 +
 + rfb = qcow_read_refcount_block(q, clust_idx);
 + if (!rfb) {
 + pr_warning(Error while reading refcount table);
 + return -1;
 + }
 +
 + rfb_idx = clust_idx  (((1ULL 
 + (header-cluster_bits - QCOW_REFCOUNT_BLOCK_SHIFT)) - 1));
 +
 + if (rfb_idx = rfb-size) {
 + pr_warning(L1: refcount block index out of bounds);
 + return -1;
 + }
 +
 + return be16_to_cpu(rfb-entries[rfb_idx]);
 +}
 +
 +static int update_cluster_refcount(struct qcow *q, u64 clust_idx, u16 append)
 +{
 + struct qcow_refcount_block *rfb = NULL;
 + struct qcow_header *header = q-header;
 + u16 refcount;
 + u64 rfb_idx;
 +
 + rfb = qcow_read_refcount_block(q, clust_idx);
 + if (!rfb) {
 + pr_warning(error while reading refcount table);
 + return -1;
 + }
 +
 + rfb_idx = clust_idx  (((1ULL 
 + (header-cluster_bits - QCOW_REFCOUNT_BLOCK_SHIFT)) - 1));
 + if (rfb_idx = rfb-size) {
 + pr_warning(refcount block index out of bounds);
 + return -1;
 + }
 +
 + refcount = be16_to_cpu(rfb-entries[rfb_idx]) + append;
 + rfb-entries[rfb_idx] = cpu_to_be16(refcount);
 + rfb-dirty = 1;
 +
 + /*write refcount block*/
 + write_refcount_block(q, rfb);

Missing error handling.

 +
 + /*update free_clust_idx since refcount becomes zero*/
 + if (!refcount  clust_idx  q-free_clust_idx)
 + q-free_clust_idx = clust_idx;
 +
 + return 0;
 +}
 +
 +static void  qcow_free_clusters(struct qcow *q, u64 clust_start, u64 size)
 +{
 + struct qcow_header *header = q-header;
 + u64 start, end, offset;
 +
 + start = clust_start  ~(q-cluster_size - 1);
 + end = (clust_start + size - 1)  ~(q-cluster_size - 1);
 + for (offset = start; offset = end; offset += q-cluster_size)
 + update_cluster_refcount(q, offset  header-cluster_bits, -1);
 +}
 +
 +/*Allocate clusters according to the size. Find a postion that
 + *can satisfy the size. free_clust_idx is initialized to zero and
 + *Record last position.
 +*/
 +static u64 qcow_alloc_clusters(struct 

RE: [RFC v2 PATCH] kvm tools, qcow: Add the support for copy-on-write clusters

2011-11-18 Thread Lan, Tianyu
Hi Kevin:
Thanks for your review. The following means that there should be a 
fsync after updating 
metadata(refcunt block, l1 table and l2 table).

Thanks 
Tianyu Lan

-Original Message-
 + /*write l2 table*/
 + l2t-dirty = 1;
 + if (qcow_l2_cache_write(q, l2t)  0)
   goto free_cache;

You need to make sure that the refcount update is written first (e.g.
with fsync), otherwise you risk corruption when the host crashes in the middle.

  
 - if (cache_table(q, l2t)  0) {
 - if (ftruncate(q-fd, f_sz)  0)
 - goto free_cache;
 + /* Update the l1 talble */
 + l1t-l1_table[l1t_idx] = cpu_to_be64(l2t_new_offset
 + | QCOW2_OFLAG_COPIED);
  
 - goto free_cache;
 - }
 + if (pwrite_in_full(q-fd, l1t-l1_table,
 + l1t-table_size * sizeof(u64),
 + header-l1_table_offset)  0)
 + goto error;

Likewise, the L1 table write must be ordered against the L2 write.

goto error is using the wrong label.

  
 - /* Update the in-core entry */
 - l1t-l1_table[l1t_idx] = cpu_to_be64(l2t_offset);
 + /*cache l2 table*/
 + cache_table(q, l2t);

After so many explicit comments, you can probably guess what's wrong here...
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html