On Tue, Aug 21, 2012 at 1:50 PM, Dietmar Maurer diet...@proxmox.com wrote:
Disabling automatic recovery by default doesn't work for you? You can
control the time to start recovery with collie cluster recover enable.
It just looks strange to me to design the system for immediate/automatic
Hi Dietmar, Hi Yuan,
Am 2012-08-21 07:27, schrieb Dietmar Maurer:
Membership change can happen for many reasons. It can happen if
something is
wrong on the switch (or if some admin configures the switch), a
damaged network cable,
a bug in the bonding driver, a damaged network card, or simply a
At Tue, 21 Aug 2012 14:14:23 +0800,
Yunkai Zhang wrote:
I need a conclusion:
Does sheepdog need delay recovery supported by this series (or by
Kazum's new idea and implementation) ?
There are two different discussions in this thread:
1. turn on/off automatic recovery with a collie command
On Tue, Aug 21, 2012 at 2:43 PM, MORITA Kazutaka
morita.kazut...@lab.ntt.co.jp wrote:
At Tue, 21 Aug 2012 14:14:23 +0800,
Yunkai Zhang wrote:
I need a conclusion:
Does sheepdog need delay recovery supported by this series (or by
Kazum's new idea and implementation) ?
There are two
On 08/21/2012 02:48 PM, Yunkai Zhang wrote:
Ok, I'll continue to improve this series after I complete other things.
Why not choose Kazutaka's idea to implement delay recovery? It looks
simple yet efficient at least to me.
Thanks,
Yuan
On Tue, Aug 21, 2012 at 2:58 PM, Liu Yuan namei.u...@gmail.com wrote:
On 08/21/2012 02:48 PM, Yunkai Zhang wrote:
Ok, I'll continue to improve this series after I complete other things.
Why not choose Kazutaka's idea to implement delay recovery? It looks
simple yet efficient at least to me.
On Tue, Aug 21, 2012 at 3:04 PM, Yunkai Zhang yunkai...@gmail.com wrote:
On Tue, Aug 21, 2012 at 2:58 PM, Liu Yuan namei.u...@gmail.com wrote:
On 08/21/2012 02:48 PM, Yunkai Zhang wrote:
Ok, I'll continue to improve this series after I complete other things.
Why not choose Kazutaka's idea to
On Tue, Aug 21, 2012 at 2:03 AM, MORITA Kazutaka
morita.kazut...@lab.ntt.co.jp wrote:
At Mon, 20 Aug 2012 23:34:10 +0800,
Yunkai Zhang wrote:
In fact, I have thought about this method, but we would face nearly the same
problem:
After a sheep joins back, it should know which objects are dirty,
At Wed, 22 Aug 2012 01:16:49 +0800,
Yunkai Zhang wrote:
I have read this patch and done a simple test with it; it works most of the time.
But write operations will be blocked in wait_forward_request(); I think
there are some corner cases we should handle.
Can you create a testcase to reproduce it?
On Wed, Aug 22, 2012 at 9:31 AM, MORITA Kazutaka
morita.kazut...@lab.ntt.co.jp wrote:
At Wed, 22 Aug 2012 01:16:49 +0800,
Yunkai Zhang wrote:
I have read this patch and done a simple test with it; it works most of the time.
But write operations will be blocked in wait_forward_request(); I think
there
On 08/22/2012 09:44 AM, Yunkai Zhang wrote:
Could you give a mature patch? We really want to use it in our cluster
as soon as possible.
Okay, but I'm currently working on another problem - sheep blocks I/O
requests for a long time while stale objects are moved to the farm backend
store. I'll
On Wed, Aug 22, 2012 at 9:55 AM, Liu Yuan namei.u...@gmail.com wrote:
On 08/22/2012 09:44 AM, Yunkai Zhang wrote:
Could you give a mature patch? We really want to use it in our cluster
as soon as possible.
Okay, but I'm currently working on another problem - sheep blocks I/O
requests long
On Wed, Aug 22, 2012 at 10:21 AM, MORITA Kazutaka
morita.kazut...@lab.ntt.co.jp wrote:
At Wed, 22 Aug 2012 10:14:07 +0800,
Yunkai Zhang wrote:
My intention is to respect Kazum's idea; if you need my help, it's my pleasure
to do it :).
If you complete the work, it will help me a lot. :)
Well, I'll
On Mon, Aug 20, 2012 at 9:00 PM, MORITA Kazutaka
morita.kazut...@lab.ntt.co.jp wrote:
At Thu, 9 Aug 2012 16:43:38 +0800,
Yunkai Zhang wrote:
From: Yunkai Zhang qiushu@taobao.com
V2:
- fix a typo
- when an object is updated, delete its old version
- reset cluster recovery state in
On 08/20/2012 11:34 PM, Yunkai Zhang wrote:
On Mon, Aug 20, 2012 at 9:00 PM, MORITA Kazutaka
morita.kazut...@lab.ntt.co.jp wrote:
At Thu, 9 Aug 2012 16:43:38 +0800,
Yunkai Zhang wrote:
From: Yunkai Zhang qiushu@taobao.com
V2:
- fix a typo
- when an object is updated, delete its old
On Mon, Aug 20, 2012 at 11:34:10PM +0800, Yunkai Zhang wrote:
sheep can succeed in a write operation even if the data is not fully
replicated. But, if we allow it, it is difficult to prevent VMs from
reading old data. Actually this series put a lot of effort into it.
We want to upgrade
On 08/21/2012 12:07 AM, Christoph Hellwig wrote:
Another thing that sprang into mind is that instead of the formal
recovery enable/disable we should simply always delay recovery, that
is only do recovery after every N seconds if changes happened.
Especially in the cases of whole racks going
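The "always delay recovery" idea quoted above can be sketched roughly as follows. This is only an illustration in Python, not sheepdog code: the class name `DelayedRecovery`, its methods, and the `interval` parameter are all hypothetical, and the sketch assumes that every membership change seen inside one N-second window can be served by a single recovery run. It does not address the stale-object and network-partition concerns raised elsewhere in the thread.

```python
class DelayedRecovery:
    """Hypothetical helper: batch membership changes into one recovery run."""

    def __init__(self, interval):
        self.interval = interval    # N seconds to wait (assumed to be tunable)
        self.first_change = None    # time of the first unhandled change, or None

    def membership_changed(self, now):
        # Only the first change in a quiet period opens a window;
        # later changes simply join the window that is already open.
        if self.first_change is None:
            self.first_change = now

    def poll(self, now):
        """Return True when recovery should start at time `now`."""
        if self.first_change is None:
            return False            # nothing changed, nothing to recover
        if now - self.first_change < self.interval:
            return False            # still inside the delay window
        self.first_change = None    # window closed: recover once, then reset
        return True
```

A caller would report each join or leave event through `membership_changed()` and call `poll()` from a timer; two nodes dropping one second apart then cause one recovery instead of two.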
At Mon, 20 Aug 2012 23:34:10 +0800,
Yunkai Zhang wrote:
In fact, I have thought about this method, but we would face nearly the same
problem:
After a sheep joins back, it should know which objects are dirty, and
should do the cleanup work (because old-version objects stay in
its working
At Tue, 21 Aug 2012 00:29:50 +0800,
Liu Yuan wrote:
On 08/21/2012 12:07 AM, Christoph Hellwig wrote:
Another thing that sprang into mind is that instead of the formal
recovery enable/disable we should simply always delay recovery, that
is only do recovery after every N seconds if changes
On Tue, Aug 21, 2012 at 10:46 AM, Liu Yuan namei.u...@gmail.com wrote:
On 08/21/2012 02:29 AM, MORITA Kazutaka wrote:
I think delaying recovery for a few seconds always is useful for many
users. Under heavy network load, sheep can wrongly detect node
failure and node membership can change
On Tue, Aug 21, 2012 at 2:03 AM, MORITA Kazutaka
morita.kazut...@lab.ntt.co.jp wrote:
At Mon, 20 Aug 2012 23:34:10 +0800,
Yunkai Zhang wrote:
In fact, I have thought about this method, but we would face nearly the same
problem:
After a sheep joins back, it should know which objects are dirty,
At Tue, 21 Aug 2012 10:46:19 +0800,
Liu Yuan wrote:
So I think we have to handle epoch mismatch and object multi-version
problems before evaluating delay recovery for network partition.
Yes, delay recovery doesn't solve my example at all unless sheepdog
handles network partition. I didn't
On 08/21/2012 11:21 AM, MORITA Kazutaka wrote:
At Tue, 21 Aug 2012 10:46:19 +0800,
Liu Yuan wrote:
So I think we have to handle epoch mismatch and object multi-version
problems before evaluating delay recovery for network partition.
Yes, delay recovery doesn't solve my example at all
On 08/21/2012 12:07 AM, Christoph Hellwig wrote:
Another thing that sprang into mind is that instead of the formal
recovery enable/disable we should simply always delay recovery, that
is only do recovery after every N seconds if changes happened.
Especially in the cases of whole racks
At Tue, 21 Aug 2012 04:34:05 +,
Dietmar Maurer wrote:
On 08/21/2012 12:07 AM, Christoph Hellwig wrote:
Another thing that sprang into mind is that instead of the formal
recovery enable/disable we should simply always delay recovery, that
is only do recovery after every N seconds
On 08/21/2012 12:34 PM, Dietmar Maurer wrote:
I still think that automatic recovery without delay is the wrong approach.
At least for small clusters you simply want to avoid unnecessary traffic.
Such recovery can produce massive traffic on the network (several TB of
data), and can make the
I think your example is very vague. What kind of driver do you use? Sheep itself
doesn't sense membership and relies on cluster drivers to maintain
membership. Could you detail exactly how it happens in a real case?
Membership change can happen for many reasons. It can happen if something is
wrong on
Disabling automatic recovery by default doesn't work for you? You can
control the time to start recovery with collie cluster recover enable.
It just looks strange to me to design the system for immediate/automatic
recovery, and
make 'disabling automatic recovery' an option. I would include
On 08/09/2012 04:43 PM, Yunkai Zhang wrote:
- fix a typo
- when an object is updated, delete its old version
- reset cluster recovery state in finish_recovery()
You should briefly describe what your patch set does in the introductory cover
letter. I have no idea what your INTRODUCE means. Please complete
On Mon, Aug 13, 2012 at 10:29 AM, Liu Yuan namei.u...@gmail.com wrote:
On 08/09/2012 04:43 PM, Yunkai Zhang wrote:
- fix a typo
- when an object is updated, delete its old version
- reset cluster recovery state in finish_recovery()
You should briefly describe what your patch set does in the introduction
From: Yunkai Zhang qiushu@taobao.com
V2:
- fix a typo
- when an object is updated, delete its old version
- reset cluster recovery state in finish_recovery()
Yunkai Zhang (11):
sheep: enable variale-length of join_message in response of join
event
sheep: share joining nodes with newly