Re: [4/4] DST: Algorithms used in distributed storage.
On Wed, Dec 12, 2007 at 12:12:47PM +0300, Dmitry Monakhov ([EMAIL PROTECTED]) wrote: > On 14:47 Mon 10 Dec , Evgeniy Polyakov wrote: > > > > Algorithms used in distributed storage. > > Mirror and linear mapping code. > Hi, i've finally take a look on your DST solution. > It seems what your current implementation will not work on nonstandard > devices for example software raid0. > other comments are follows: > > +static int dst_mirror_process_node_data(struct dst_node *n, > > + struct dst_mirror_node_data *ndata, int op) > > + > > + kunmap(cmp->page); > << MINOR_BUG: >You has forgot to unmap page on error path, so IMHO it is better to move >kunmap to "err_out_free_cmp" label. Yep, I will fix this. > > + priv = kzalloc(sizeof(struct dst_mirror_priv), GFP_KERNEL); > > + if (!priv) > > + return -ENOMEM; > > + > > + priv->chunk_num = st->disk_size; > > + > > + priv->chunk = vmalloc(DIV_ROUND_UP(priv->chunk_num, BITS_PER_LONG) * > > sizeof(long)); > << Ohhh. My. I want to add my 500G hdd. Do you really wanna >say what i have to store 128Mb in memory object for this. Right now yes. There was a code which used single bit for bigger data units, but I dropped it because of resync troubles (i.e. when one single sector has been updated, it requires to resync the whole block). I can not say which case is better though. > > + dprintk("%s: start: %llu, size: %llu/%u, bio: %p, req: %p, " > > + "node: %p.\n", > > + __func__, req->start, req->size, nr_pages, bio, > > + req, req->node); > > + > > + err = n->st->queue->make_request_fn(n->st->queue, bio); > << Why direct make_request_fn instead of generic_make_request? generic_make_request() will queue the bio in this case, so I call request_fn directly. > > + for (i = 0; i < DIV_ROUND_UP(priv->chunk_num, BITS_PER_LONG); ++i) { > > + int bit, num, start; > > + unsigned long word = priv->chunk[i]; > > + > > + if (!word) > > + continue; > > + > > + num = 0; > > + start = -1; > > + while (word && num < BITS_PER_LONG) { > > + bit = __ffs(word); > > + if (start == -1) > > + start = bit; > > + num++; > << MINOR_BUG: Seems you have misstyped here. AFAIU @num represent position >of last non zero bit (start + num == last_non_zero_bit_pos) > if (start == -1) { > start = bit; > num = 1; > } else > num += bit; Yes, you are right of course. Since I shift word to more than a single bit, @num has to be update accordingly. > > + word >>= (bit+1); Dmitry, thanks a lot for comments, I will fix issues you pointed in the next release, although will stay bitmap case opened for a while. -- Evgeniy Polyakov -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [4/4] DST: Algorithms used in distributed storage.
On 14:47 Mon 10 Dec , Evgeniy Polyakov wrote: > > Algorithms used in distributed storage. > Mirror and linear mapping code. Hi, i've finally take a look on your DST solution. It seems what your current implementation will not work on nonstandard devices for example software raid0. other comments are follows: > > Signed-off-by: Evgeniy Polyakov <[EMAIL PROTECTED]> > > > diff --git a/drivers/block/dst/alg_linear.c b/drivers/block/dst/alg_linear.c > new file mode 100644 > index 000..9dc0976 > --- /dev/null > +++ b/drivers/block/dst/alg_linear.c > @@ -0,0 +1,105 @@ > +/* > + * 2007+ Copyright (c) Evgeniy Polyakov <[EMAIL PROTECTED]> > + * All rights reserved. > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License as published by > + * the Free Software Foundation; either version 2 of the License, or > + * (at your option) any later version. > + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + * GNU General Public License for more details. > + */ > + > +#include > +#include > +#include > +#include > + > +static struct dst_alg *alg_linear; > + > +/* > + * This callback is invoked when node is removed from storage. > + */ > +static void dst_linear_del_node(struct dst_node *n) > +{ > +} > + > +/* > + * This callback is invoked when node is added to storage. > + */ > +static int dst_linear_add_node(struct dst_node *n) > +{ > + struct dst_storage *st = n->st; > + > + dprintk("%s: disk_size: %llu, node_size: %llu.\n", > + __func__, st->disk_size, n->size); > + > + mutex_lock(>tree_lock); > + n->start = st->disk_size; > + st->disk_size += n->size; > + set_capacity(st->disk, st->disk_size); > + mutex_unlock(>tree_lock); > + > + return 0; > +} > + > +static int dst_linear_remap(struct dst_request *req) > +{ > + int err; > + > + if (req->node->bdev) { > + generic_make_request(req->bio); > + return 0; > + } > + > + err = kst_check_permissions(req->state, req->bio); > + if (err) > + return err; > + > + return req->state->ops->push(req); > +} > + > +/* > + * Failover callback - it is invoked each time error happens during > + * request processing. > + */ > +static int dst_linear_error(struct kst_state *st, int err) > +{ > + if (err) > + set_bit(DST_NODE_FROZEN, >node->flags); > + else > + clear_bit(DST_NODE_FROZEN, >node->flags); > + return 0; > +} > + > +static struct dst_alg_ops alg_linear_ops = { > + .remap = dst_linear_remap, > + .add_node = dst_linear_add_node, > + .del_node = dst_linear_del_node, > + .error = dst_linear_error, > + .owner = THIS_MODULE, > +}; > + > +static int __devinit alg_linear_init(void) > +{ > + alg_linear = dst_alloc_alg("alg_linear", _linear_ops); > + if (!alg_linear) > + return -ENOMEM; > + > + return 0; > +} > + > +static void __devexit alg_linear_exit(void) > +{ > + dst_remove_alg(alg_linear); > +} > + > +module_init(alg_linear_init); > +module_exit(alg_linear_exit); > + > +MODULE_LICENSE("GPL"); > +MODULE_AUTHOR("Evgeniy Polyakov <[EMAIL PROTECTED]>"); > +MODULE_DESCRIPTION("Linear distributed algorithm."); > diff --git a/drivers/block/dst/alg_mirror.c b/drivers/block/dst/alg_mirror.c > new file mode 100644 > index 000..3c457ff > --- /dev/null > +++ b/drivers/block/dst/alg_mirror.c > @@ -0,0 +1,1128 @@ > +/* > + * 2007+ Copyright (c) Evgeniy Polyakov <[EMAIL PROTECTED]> > + * All rights reserved. > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License as published by > + * the Free Software Foundation; either version 2 of the License, or > + * (at your option) any later version. > + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + * GNU General Public License for more details. > + */ > + > +#include > +#include > +#include > +#include > +#include > + > +struct dst_mirror_node_data > +{ > + u64 age; > +}; > + > +struct dst_mirror_priv > +{ > + unsigned intchunk_num; > + > + u64 last_start; > + > + spinlock_t backlog_lock; > + struct list_headbacklog_list; > + > + struct dst_mirror_node_data old_data, new_data; > + > + unsigned long *chunk; > +}; > + > +static struct dst_alg *alg_mirror; > +static struct bio_set *dst_mirror_bio_set; > + > +static int dst_mirror_resync(struct dst_node *n, int ndp); > + > +static void dst_mirror_mark_sync(struct
Re: [4/4] DST: Algorithms used in distributed storage.
On 14:47 Mon 10 Dec , Evgeniy Polyakov wrote: Algorithms used in distributed storage. Mirror and linear mapping code. Hi, i've finally take a look on your DST solution. It seems what your current implementation will not work on nonstandard devices for example software raid0. other comments are follows: Signed-off-by: Evgeniy Polyakov [EMAIL PROTECTED] diff --git a/drivers/block/dst/alg_linear.c b/drivers/block/dst/alg_linear.c new file mode 100644 index 000..9dc0976 --- /dev/null +++ b/drivers/block/dst/alg_linear.c @@ -0,0 +1,105 @@ +/* + * 2007+ Copyright (c) Evgeniy Polyakov [EMAIL PROTECTED] + * All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + */ + +#include linux/module.h +#include linux/kernel.h +#include linux/init.h +#include linux/dst.h + +static struct dst_alg *alg_linear; + +/* + * This callback is invoked when node is removed from storage. + */ +static void dst_linear_del_node(struct dst_node *n) +{ +} + +/* + * This callback is invoked when node is added to storage. + */ +static int dst_linear_add_node(struct dst_node *n) +{ + struct dst_storage *st = n-st; + + dprintk(%s: disk_size: %llu, node_size: %llu.\n, + __func__, st-disk_size, n-size); + + mutex_lock(st-tree_lock); + n-start = st-disk_size; + st-disk_size += n-size; + set_capacity(st-disk, st-disk_size); + mutex_unlock(st-tree_lock); + + return 0; +} + +static int dst_linear_remap(struct dst_request *req) +{ + int err; + + if (req-node-bdev) { + generic_make_request(req-bio); + return 0; + } + + err = kst_check_permissions(req-state, req-bio); + if (err) + return err; + + return req-state-ops-push(req); +} + +/* + * Failover callback - it is invoked each time error happens during + * request processing. + */ +static int dst_linear_error(struct kst_state *st, int err) +{ + if (err) + set_bit(DST_NODE_FROZEN, st-node-flags); + else + clear_bit(DST_NODE_FROZEN, st-node-flags); + return 0; +} + +static struct dst_alg_ops alg_linear_ops = { + .remap = dst_linear_remap, + .add_node = dst_linear_add_node, + .del_node = dst_linear_del_node, + .error = dst_linear_error, + .owner = THIS_MODULE, +}; + +static int __devinit alg_linear_init(void) +{ + alg_linear = dst_alloc_alg(alg_linear, alg_linear_ops); + if (!alg_linear) + return -ENOMEM; + + return 0; +} + +static void __devexit alg_linear_exit(void) +{ + dst_remove_alg(alg_linear); +} + +module_init(alg_linear_init); +module_exit(alg_linear_exit); + +MODULE_LICENSE(GPL); +MODULE_AUTHOR(Evgeniy Polyakov [EMAIL PROTECTED]); +MODULE_DESCRIPTION(Linear distributed algorithm.); diff --git a/drivers/block/dst/alg_mirror.c b/drivers/block/dst/alg_mirror.c new file mode 100644 index 000..3c457ff --- /dev/null +++ b/drivers/block/dst/alg_mirror.c @@ -0,0 +1,1128 @@ +/* + * 2007+ Copyright (c) Evgeniy Polyakov [EMAIL PROTECTED] + * All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + */ + +#include linux/module.h +#include linux/kernel.h +#include linux/init.h +#include linux/poll.h +#include linux/dst.h + +struct dst_mirror_node_data +{ + u64 age; +}; + +struct dst_mirror_priv +{ + unsigned intchunk_num; + + u64 last_start; + + spinlock_t backlog_lock; + struct list_headbacklog_list; + + struct dst_mirror_node_data old_data, new_data; + + unsigned long *chunk; +}; + +static struct dst_alg *alg_mirror; +static struct bio_set *dst_mirror_bio_set; + +static int dst_mirror_resync(struct dst_node *n, int ndp); + +static void dst_mirror_mark_sync(struct dst_node *n) +{ + if (test_bit(DST_NODE_NOTSYNC, n-flags)) { +
Re: [4/4] DST: Algorithms used in distributed storage.
On Wed, Dec 12, 2007 at 12:12:47PM +0300, Dmitry Monakhov ([EMAIL PROTECTED]) wrote: On 14:47 Mon 10 Dec , Evgeniy Polyakov wrote: Algorithms used in distributed storage. Mirror and linear mapping code. Hi, i've finally take a look on your DST solution. It seems what your current implementation will not work on nonstandard devices for example software raid0. other comments are follows: +static int dst_mirror_process_node_data(struct dst_node *n, + struct dst_mirror_node_data *ndata, int op) + + kunmap(cmp-page); MINOR_BUG: You has forgot to unmap page on error path, so IMHO it is better to move kunmap to err_out_free_cmp label. Yep, I will fix this. + priv = kzalloc(sizeof(struct dst_mirror_priv), GFP_KERNEL); + if (!priv) + return -ENOMEM; + + priv-chunk_num = st-disk_size; + + priv-chunk = vmalloc(DIV_ROUND_UP(priv-chunk_num, BITS_PER_LONG) * sizeof(long)); Ohhh. My. I want to add my 500G hdd. Do you really wanna say what i have to store 128Mb in memory object for this. Right now yes. There was a code which used single bit for bigger data units, but I dropped it because of resync troubles (i.e. when one single sector has been updated, it requires to resync the whole block). I can not say which case is better though. + dprintk(%s: start: %llu, size: %llu/%u, bio: %p, req: %p, + node: %p.\n, + __func__, req-start, req-size, nr_pages, bio, + req, req-node); + + err = n-st-queue-make_request_fn(n-st-queue, bio); Why direct make_request_fn instead of generic_make_request? generic_make_request() will queue the bio in this case, so I call request_fn directly. + for (i = 0; i DIV_ROUND_UP(priv-chunk_num, BITS_PER_LONG); ++i) { + int bit, num, start; + unsigned long word = priv-chunk[i]; + + if (!word) + continue; + + num = 0; + start = -1; + while (word num BITS_PER_LONG) { + bit = __ffs(word); + if (start == -1) + start = bit; + num++; MINOR_BUG: Seems you have misstyped here. AFAIU @num represent position of last non zero bit (start + num == last_non_zero_bit_pos) if (start == -1) { start = bit; num = 1; } else num += bit; Yes, you are right of course. Since I shift word to more than a single bit, @num has to be update accordingly. + word = (bit+1); Dmitry, thanks a lot for comments, I will fix issues you pointed in the next release, although will stay bitmap case opened for a while. -- Evgeniy Polyakov -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/