Re: [Qemu-devel] [sheepdog] [PATCH v2 0/9] sheepdog: reconnect server after connection failure

2013-07-25 Thread Liu Yuan
On Thu, Jul 25, 2013 at 02:53:57PM +0900, MORITA Kazutaka wrote:
 At Thu, 25 Jul 2013 13:25:33 +0800,
 Liu Yuan wrote:
  
  Hello Kazutaka,
  
  I have two patches fixing the problems I found in my testing; they are
  complementary patches. Please consider sending them on top of your patch
  set.
 
 Thanks a lot for your comments and patches, but I've already prepared
 patches, which would probably be better fixes.  I'll send the v3
 series soon.  I'd appreciate it if you could review it.
 

Okay, no problem. Note that patch 2/2 in my previous patches isn't correct: I
botched the manual rebase with a hasty copy. Just FYI.

Thanks
Yuan



2013-07-24 Thread Liu Yuan
On Wed, Jul 24, 2013 at 04:56:24PM +0900, MORITA Kazutaka wrote:
 Currently, if a sheepdog server exits, all the connecting VMs need to
 be restarted.  This series implements a feature to reconnect the
 server, and enables us to do online sheepdog upgrade and avoid
 restarting VMs when sheepdog servers crash unexpectedly.
 

It doesn't work in my test. I started linux-0.2.img stored in a sheepdog
cluster and then:

1. did some buffered writes
2. restarted the sheep that this QEMU VM was connected to
3. $ sync

I got the following error:

$ ../qemu/x86_64-softmmu/qemu-system-x86_64 --enable-kvm -m 1024 -hda sheepdog:test
qemu-system-x86_64: failed to get the header, Resource temporarily unavailable
qemu-system-x86_64: Failed to connect to socket: Connection refused
qemu-system-x86_64: Failed to connect to socket: Connection refused
qemu-system-x86_64: Failed to connect to socket: Connection refused
qemu-system-x86_64: Failed to connect to socket: Connection refused
qemu-system-x86_64: Failed to connect to socket: Connection refused
...repeat...

The QEMU version is the tip of master.

Thanks
Yuan



2013-07-24 Thread MORITA Kazutaka
At Wed, 24 Jul 2013 16:28:30 +0800,
Liu Yuan wrote:
 
 It doesn't work in my test. I started linux-0.2.img stored in a sheepdog
 cluster and then:
 
 1. did some buffered writes
 2. restarted the sheep that this QEMU VM was connected to
 3. $ sync
 
 I got the following error:
 
 $ ../qemu/x86_64-softmmu/qemu-system-x86_64 --enable-kvm -m 1024 -hda sheepdog:test
 qemu-system-x86_64: failed to get the header, Resource temporarily unavailable
 qemu-system-x86_64: Failed to connect to socket: Connection refused
 qemu-system-x86_64: Failed to connect to socket: Connection refused
 qemu-system-x86_64: Failed to connect to socket: Connection refused
 qemu-system-x86_64: Failed to connect to socket: Connection refused
 qemu-system-x86_64: Failed to connect to socket: Connection refused
 ...repeat...
 
 The QEMU version is the tip of master.

Your sheep daemon appears to be unreachable from qemu.  I tried the same
procedure, but couldn't reproduce the problem.

Is the problem reproducible?  Can you make sure that you can connect
to the sheep daemon from collie while the error message shows up?

Thanks,

Kazutaka



2013-07-24 Thread Liu Yuan
On Wed, Jul 24, 2013 at 06:07:21PM +0900, MORITA Kazutaka wrote:
 Your sheep daemon appears to be unreachable from qemu.  I tried the same
 procedure, but couldn't reproduce the problem.
 
 Is the problem reproducible?  Can you make sure that you can connect
 to the sheep daemon from collie while the error message shows up?
 

Yes. I tried to reproduce it with the following steps:

1. did some buffered writes
2. killed the sheep
3. $ sync # in the guest; 'sync' now hangs waiting for a response
4. restarted the sheep

After step 4, 'sync' still hangs until it times out with the message
hda:dma_timer_expiry: dma status == 0x21

The guest ends up frozen.

The QEMU output is the same:
qemu-system-x86_64: failed to get the header, Resource temporarily unavailable
qemu-system-x86_64: Failed to connect to socket: Connection refused
qemu-system-x86_64: Failed to connect to socket: Connection refused
qemu-system-x86_64: Failed to connect to socket: Connection refused
qemu-system-x86_64: Failed to connect to socket: Connection refused

But note that if I restart the sheep while the guest is doing nothing, your
patch set works like a charm.

Thanks
Yuan



2013-07-24 Thread Liu Yuan
On Wed, Jul 24, 2013 at 11:42:49PM +0800, Liu Yuan wrote:
 Yes. I tried to reproduce it with the following steps:
 
 1. did some buffered writes
 2. killed the sheep
 3. $ sync # in the guest; 'sync' now hangs waiting for a response
 4. restarted the sheep
 
 After step 4, 'sync' still hangs until it times out with the message
 hda:dma_timer_expiry: dma status == 0x21
 
 The guest ends up frozen.
 
 The QEMU output is the same:
 qemu-system-x86_64: failed to get the header, Resource temporarily unavailable
 qemu-system-x86_64: Failed to connect to socket: Connection refused
 qemu-system-x86_64: Failed to connect to socket: Connection refused
 qemu-system-x86_64: Failed to connect to socket: Connection refused
 qemu-system-x86_64: Failed to connect to socket: Connection refused
 
 But note that if I restart the sheep while the guest is doing nothing, your
 patch set works like a charm.

I have debugged it a bit. The problem is that at step 3, 'sync' invokes
add_aio_request() in the sheepdog driver, and add_aio_request() *succeeds*, so
the aio is put on the inflight_aio_head list, *not* on the failed_aio_head
list. So in reconnect_to_sdog() we have no way to resend that aio, and 'sync'
waits forever.

Thanks
Yuan



2013-07-24 Thread MORITA Kazutaka
At Thu, 25 Jul 2013 13:25:33 +0800,
Liu Yuan wrote:
 
 Hello Kazutaka,
 
 I have two patches fixing the problems I found in my testing; they are
 complementary patches. Please consider sending them on top of your patch set.

Thanks a lot for your comments and patches, but I've already prepared
patches, which would probably be better fixes.  I'll send the v3
series soon.  I'd appreciate it if you could review it.

Thanks,

Kazutaka