On 06/06/2012 08:39 PM, Christoph Hellwig wrote:
> Allow users to override the address advertised to other sheep. This is
> important for setups where the computers running sheep nodes have multiple
> network interfaces and we need to use a specific one.
Applied first one, the second one needs rebasing
On 06/07/2012 06:23 AM, Christoph Hellwig wrote:
Applied first one, and the other need rebasing to master.
Thanks,
Yuan
--
sheepdog mailing list
sheepdog@lists.wpkg.org
http://lists.wpkg.org/mailman/listinfo/sheepdog
On 06/06/2012 08:49 PM, Christoph Hellwig wrote:
> Pass the actual node number to get_max_nr_copies_from, instead of
> the size of the nodes array.
This patch is dropped because a subsequent patchset from you will override
it; I don't want to merge a patch that is going to be overridden very soon.
Thanks,
On 06/07/2012 09:28 AM, MORITA Kazutaka wrote:
> At Wed, 06 Jun 2012 18:52:16 +0800,
> Liu Yuan wrote:
>>
>> On 06/06/2012 06:44 PM, Christoph Hellwig wrote:
>>
>>> On Wed, Jun 06, 2012 at 11:56:53AM +0800, Liu Yuan wrote:
The pro is simplification of code, but you don't mention the cons. I think we
At Wed, 06 Jun 2012 18:52:16 +0800,
Liu Yuan wrote:
>
> On 06/06/2012 06:44 PM, Christoph Hellwig wrote:
>
> > On Wed, Jun 06, 2012 at 11:56:53AM +0800, Liu Yuan wrote:
> >> The pro is simplification of code, but you don't mention the cons. I think we
> >> should know the perf numbers we lose by adding a
I'm trying to understand the use case for the leave_list and all code
associated with it.
From my reading the intention is to allow a cluster to start as long
as all the original nodes tried to join the cluster. What makes an
original node that tried to join the cluster but failed special over
o
By open-coding it in the two callers we can not only simplify the code, but also
differentiate the nr_nodes = 0 case, where we don't want to read the epoch log,
from a real error reading the epoch log.
Signed-off-by: Christoph Hellwig
diff --git a/sheep/group.c b/sheep/group.c
index 96dea8d..d81454
Signed-off-by: Christoph Hellwig
diff --git a/sheep/farm/snap.c b/sheep/farm/snap.c
index 3134c28..1e5917d 100644
--- a/sheep/farm/snap.c
+++ b/sheep/farm/snap.c
@@ -161,10 +161,17 @@ int snap_file_write(uint32_t epoch, unsigned char *trunksha1, unsigned char *out
struct strbuf buf = ST
Pass a struct sd_node array instead of an unformatted buffer to all
epoch_log_read variants, and cut down the epoch_log_read/epoch_log_read_nr
split down to a single variant, which returns the number of nodes,
but is called epoch_log_read. Also make epoch_log_read_remote return
the number of nodes
On 06/06/2012 10:02 PM, Christoph Hellwig wrote:
> On Wed, Jun 06, 2012 at 09:56:00PM +0800, Liu Yuan wrote:
>>> That's why I said I like the approach, but issues like the one above need
>>> to be carefully audited and fixed.
>>>
>>
>>
>> Yes, but I can't come up with an easy fix for concurrent VDI c
On Wed, Jun 06, 2012 at 09:56:00PM +0800, Liu Yuan wrote:
> > That's why I said I like the approach, but issues like the one above need
> > to be carefully audited and fixed.
> >
>
>
> Yes, but I can't come up with an easy fix for concurrent VDI creation of
> the same name inside sheep and the uppe
On 06/06/2012 09:31 PM, Christoph Hellwig wrote:
> On Wed, Jun 06, 2012 at 09:28:38PM +0800, Liu Yuan wrote:
>> Yes, this patch only serializes the cluster requests on the local node, but I
>> think this is good enough. Sheepdog doesn't need to carry the burden of
>> things such as name collision detection, w
On 2012-06-06 14:19, Liu Yuan wrote:
Well, the membership management backend such as corosync can only
reliably support fewer than 20 nodes. This means you can't add more
nodes
into a running cluster with Corosync when the number exceeds 15~20. See
more
info about it at https://github.com/collie/sh
On Wed, Jun 06, 2012 at 09:28:38PM +0800, Liu Yuan wrote:
> Yes, this patch only serializes the cluster requests on the local node, but I
> think this is good enough. Sheepdog doesn't need to carry the burden of
> things such as name collision detection, which could be handled very well
> (better) by the admin
On 06/06/2012 09:08 PM, Christoph Hellwig wrote:
> I like this idea, but I don't think it will actually work as-is.
>
> The current blocking cluster operations quiesce other cluster operations
> from when they are first handled in the main thread, over the
> execution in the worker thread, un
I like this idea, but I don't think it will actually work as-is.
The current blocking cluster operations quiesce other cluster operations
from when they are first handled in the main thread, over the
execution in the worker thread, until after the process_main method
is called on all nodes.
B
Pass the actual node number to get_max_nr_copies_from, instead of
the size of the nodes array.
Signed-off-by: Christoph Hellwig
diff --git a/sheep/ops.c b/sheep/ops.c
index 91f3536..2fe6fb4 100644
--- a/sheep/ops.c
+++ b/sheep/ops.c
@@ -349,16 +349,17 @@ static int local_stat_cluster(struct requ
Allow users to override the address advertised to other sheep. This is
important for setups where the computers running sheep nodes have multiple
network interfaces and we need to use a specific one.
Signed-off-by: Christoph Hellwig
diff --git a/sheep/group.c b/sheep/group.c
index 71c64ca..4715
Getting a suitable address to advertise to other sheep is substantially
different functionality from initializing the cluster driver. Split it
into a separate optional method that falls back to the getifaddrs loop
if not specified.
Signed-off-by: Christoph Hellwig
diff --git a/sheep/cluster.h b
On 06/06/2012 07:50 PM, Bastian Scholz wrote:
> Hi all,
>
> firstly, it's cool stuff you made :-)
>
> Thanks to all participants.
>
> Maybe it helps, if we can collect some use-cases here?
>
Hi Bastian,
Thanks for your feedback. Yes, we highly appreciate
constructive feedback from
Hi all,
firstly, it's cool stuff you made :-)
Thanks to all participants.
Maybe it helps, if we can collect some use-cases here?
Am 2012-06-06 12:59, schrieb Liu Yuan:
On 06/06/2012 06:54 PM, Christoph Hellwig wrote:
I'd say performance numbers only start to really matter for 20,30+
nodes, or
On 06/06/2012 06:38 PM, Christoph Hellwig wrote:
> Currently we can easily get into a situation where we can't read objects
> after losing a node in an offline cluster and then doing a manual recovery.
>
> To fix this call start_recovery from cluster_manual_recover. Also move
> get_vnodes_from_ep
On 06/06/2012 06:36 PM, Christoph Hellwig wrote:
> The combination of gethostname and getnameinfo does not seem to work very
> well to find an IP address for a system that doesn't seem to have a host
> name, or for one that has IPv6 configured in the kernel without actually
> using it.
Applied this
On 06/06/2012 06:54 PM, Christoph Hellwig wrote:
> On Wed, Jun 06, 2012 at 06:52:16PM +0800, Liu Yuan wrote:
>> At your convenience :) Maybe two (small cluster and bigger one) would
>> suffice. With cached FD pool, we might not lose that much performance
>> for a quick thought, though, but this is
On Wed, Jun 06, 2012 at 06:52:16PM +0800, Liu Yuan wrote:
> At your convenience :) Maybe two (small cluster and bigger one) would
> suffice. With cached FD pool, we might not lose that much performance
> for a quick thought, though, but this is quite radical change of design,
> numbers will definit
On 06/06/2012 06:44 PM, Christoph Hellwig wrote:
> Looks fine, although I have a patch to completely remove read_epoch()
> in my queue.
will it be ready in near future? I am fine with dropping this patch.
Thanks,
Yuan
On 06/06/2012 06:40 PM, Christoph Hellwig wrote:
> Actually we should simply discard this patch 2 - local_stat_cluster
> handles the 0 return just fine. So we'll only need to apply v2 of
> patch 1.
Applied first one v2.
Thanks,
Yuan
On 06/06/2012 06:44 PM, Christoph Hellwig wrote:
> On Wed, Jun 06, 2012 at 11:56:53AM +0800, Liu Yuan wrote:
>> The pro is simplification of code, but you don't mention the cons. I think we
>> should know the perf numbers we lose by adding network overhead for IOs that
>> happen to be local; this can be easi
On Wed, Jun 06, 2012 at 11:56:53AM +0800, Liu Yuan wrote:
> The pro is simplification of code, but you don't mention the cons. I think we
> should know the perf numbers we lose by adding network overhead for IOs that
> happen to be local; this can be easily collected by the built-in tracer.
What kind of setup do
On Wed, Jun 06, 2012 at 05:48:04PM +0800, Liu Yuan wrote:
> From: Liu Yuan
>
> When starting up the cluster for the first time, we would get the following
> message:
>
> Jun 06 17:29:32 read_epoch(525) failed to read epoch 0
>
> This shouldn't be printed, because the cluster isn't in an error state.
Looks fine,
On Wed, Jun 06, 2012 at 06:39:08AM -0400, Christoph Hellwig wrote:
> On Wed, Jun 06, 2012 at 10:40:07AM +0800, Liu Yuan wrote:
> > epoch_log_read_remote() doesn't return -1, so no need to check it. Also
> > we need to remove the {} for the one-liner if clause.
>
> Oops, forgot to send the updated one when I mad
On Wed, Jun 06, 2012 at 10:40:07AM +0800, Liu Yuan wrote:
> epoch_log_read_remote() doesn't return -1, so no need to check it. Also
> we need to remove the {} for the one-liner if clause.
Oops, forgot to send the updated one when I made the first patch not
return -1. I'll resend this one, but any reason the up
Currently we can easily get into a situation where we can't read objects
after losing a node in an offline cluster and then doing a manual recovery.
To fix this call start_recovery from cluster_manual_recover. Also move
get_vnodes_from_epoch into group.c and rename it to fit with the rest of
the v
The combination of gethostname and getnameinfo does not seem to work very
well to find an IP address for a system that doesn't seem to have a host
name, or for one that has IPv6 configured in the kernel without actually
using it.
Signed-off-by: Christoph Hellwig
diff --git a/lib/net.c b/lib/net.c
On Wed, Jun 06, 2012 at 10:45:12AM +0800, Liu Yuan wrote:
> Why callbacked isn't removed as in zookeeper and accord driver?
Because corosync also uses it for the join message.
From: Liu Yuan
When starting up the cluster for the first time, we would get the following message:
Jun 06 17:29:32 read_epoch(525) failed to read epoch 0
This shouldn't be printed, because the cluster isn't in an error state.
Signed-off-by: Liu Yuan
---
sheep/store.c |3 +++
1 file changed, 3 insertions(+
From: Liu Yuan
update: clean up struct cluster_driver too
--- >8
This patch tries to completely remove block/unblock as well as the
sd_block_handler() code from both the cluster drivers and core sheep code,
simplifying the code a lot and also boosting performance a lot. We
From: Liu Yuan
This patch tries to completely remove block/unblock as well as the
sd_block_handler() code from both the cluster drivers and core sheep code,
simplifying the code a lot and also boosting performance a lot. We actually
have a very nice construct to do blocking request handling: our top/bottom style w
On 06/05/2012 08:07 PM, Christoph Hellwig wrote:
> Let sd_block_handler handle the fine details of how to handle an incoming
> blocking event. By passing the sender node structure we can easily handle
> ignoring it on other nodes, and by keeping a local operation in progress
> flag in group.c we
hmm, I can reproduce this issue with the following steps:
1) insert a base image into the sheep cluster.
2) clone 2~3 VMs per host.
3) start all VMs simultaneously.
then I can see the random return value
On Tue, Jun 5, 2012 at 6:22 PM, MORITA Kazutaka
wrote:
> At Tue, 05 Jun 2012 16:43:05 +0800,
> Liu