Re: [Qemu-devel] QEMU Command Line Options

2014-03-28 Thread junqing . wang
hi, Qasim:
 pls refer to 'qemu-options.hx' and the big switch in 'vl.c'.


Thanks,
Jules. 





At 2014-03-28 14:25:43,Qasim Maqbool qasim.maqb...@gmail.com wrote:

Hi,


I need to add a few command line options to QEMU. However, I am yet to 
determine how QEMU takes input from the command line and parses the option 
values. I have tried looking at various files including vl.c and cmd.c but 
nothing seems to work right now.


Can anyone put me on the right path here? Or am I trying to do something which 
is impossible?


Thanks.


Re: [Qemu-devel] [PATCH v2 2/4] Curling: cmdline interface.

2013-10-09 Thread junqing . wang
At 2013-10-01 06:16:34,Eric Blake ebl...@redhat.com wrote: On 09/29/2013 
02:14 PM, Jules Wang wrote:  Add an option '-f' to migration cmdline.  
Indicating whether to enable fault tolerant or not.Signed-off-by: Jules 
Wang junqing.w...@cs2c.com.cn  ---   .help   = migrate to 
URI (using -d to not wait for completion)  \n\t\t\t -b for 
migration without shared storage with   full copy of disk\n\t\t\t 
-i for migration without   shared storage with incremental copy of 
disk   -   (base image shared between src and destination),  +   
(base image shared between src and destination)  +   \n\t\t\t -f for 
fault tolerant, this is another   +   feature rather than migrate,  
That sounds awkward, and overly long.  Maybe go with just:  -f for fault 
tolerance mode  and let the user then read the full documentation for what it 
entails.

Agree.

 -@item migrate [-d] [-b] [-i] @var{uri}
 +@item migrate [-d] [-b] [-i] [-f] @var{uri}
  @findex migrate
  Migrate to @var{uri} (using -d to not wait for completion).
  -b for migration with full copy of disk
  -i for migration with incremental copy of disk (base image is shared)
 +-f for fault tolerant

Can -d and -f be used at the same time, or are they exclusive?

AFAK, The migration is always detached(In the code, the -d option is always 
false),  -d and -f can be used at the same time with no doubt.
qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false, !!ft, ft, err);

By the way, neither -b nor -i could be used at the same time with -f,  fault 
tolerant needs shared storage.

  +++ b/hmp.c
 @@ -1213,10 +1213,11 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
  int detach = qdict_get_try_bool(qdict, detach, 0);
  int blk = qdict_get_try_bool(qdict, blk, 0);
  int inc = qdict_get_try_bool(qdict, inc, 0);
 +   int ft   = qdict_get_try_bool(qdict, ft, 0);

Why two spaces?

To align the '=',  I will remove them if you like. 

 
 +++ b/qapi-schema.json
 @@ -2420,7 +2420,8 @@
  # Since: 0.14.0
  ##
  { 'command': 'migrate',
 -  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' 
 } }
 +  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool',
 +   '*ft': 'bool' } }

Missing documentation, including mention that the new option was only
made available in 1.7.  We still don't have introspection; is there some
other means by which libvirt and other management apps can tell whether
this feature is available? 

I'm not clear about how to do that, could you pls give me some hints, where to 
add code and documentation. 

Furthermore, 'ft' is an awfully short name;
for QMP, we prefer to use full words where possible, such as
'fault-tolerant'.

Agree.

 -- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



Re: [Qemu-devel] [PATCH RFC 0/4] Curling: KVM Fault Tolerance

2013-09-12 Thread junqing . wang
hi,


At the moment in your implementation the prefetch buffer can be very large 
(several copies of guest memory size)  are you planning to address this 
issue? I agree but we need some way to notify the user of such problem.

This issue has been handled (maybe not in the best way).  The prefetch buffer 
size could be increased up to 1.5 * vm memory size. When the migration data 
size is larger than it, the prefetching is stopped with a warning (pls refer to 
the code 4/4) and the loading is started. In this situation, 
broken-in-the-middle problem is inevitable.

The second is that sadly live migration doesn't always converge this means  
that the backup VM won't have a consist state to failover to. You need to 
detect such a case and throttle down the guest to force convergence.
 
 Yes, that's a problem. AFAK, qemu already have an auto convergence feature.
 How about activating it when you do fault tolerance automatically?
That is feasible. Any comments by others?



Re: [Qemu-devel] [PATCH RFC 3/4] Curling: the sender

2013-09-11 Thread junqing . wang
hi,


 +bool create = false;
 This variable is never set.

It is set in the following 'if' block.
 +create = true;===

 -migration_bitmap = bitmap_new(ram_pages);
 -bitmap_set(migration_bitmap, 0, ram_pages);
 -migration_dirty_pages = ram_pages;
 +if (!ft_enabled() || !migration_bitmap)  {
 +migration_bitmap = bitmap_new(ram_pages);
 +bitmap_set(migration_bitmap, 0, ram_pages);
 +migration_dirty_pages = ram_pages;
 +create = true;   ==
 +}

Nothing in this patch sets the migration_bitmap to anything.

Let me explain all the odd 'if'  block:
1  +if (!ft_enabled() || !migration_bitmap)  {
2  +if (!ft_enabled() || create) {
3  +if (!ft_enabled()) {

As I mentioned in the commit log: 
 We need to handle the variables related to live migration very
 carefully. So the new migration does not restart from the very
 begin of the migration, instead, it continues the previous
 migration.

Some variables should not be reset after one migration, because
the next one need these variables to continue the migration.
This explains all the if ft_enabled()

Besides, some variables need to be initialized at the first migration of 
curling.
That explains the if create and if  !migration_bitmap

 +if (ft_enabled()) {
 +if (old_vm_running) {
 +qemu_mutex_lock_iothread();
 +vm_start();
 +qemu_mutex_unlock_iothread();
 +
 +current_time = 
 qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
 +time_spent = current_time - migration_start_time;
 +DPRINTF(this migration lasts for % PRId64 ms\n,
 +time_spent);
 +if (time_spent  time_window) {
 +g_usleep((time_window - time_spent)*1000);

Why are we waiting here?  If we are migration faster than allowed,  why
we are waiting?

Looping fast is not good, that means we enter iothread lock and do vm stop more 
frequently. The performance will drop and vm user will experience input stall 
if we do not sleep.

How to deal with this is a difficult issue, any suggestion is welcomed.

THIS IS ONE OF THE TWO MAIN PROBLEMS.  The other one is related to the magic 
number 0xfeedcafe.




Re: [Qemu-devel] [PATCH RFC 4/4] Curling: the receiver

2013-09-11 Thread junqing . wang
hi,


At 2013-09-10 22:19:48,Juan Quintela quint...@redhat.com wrote:

 @@ -112,13 +113,24 @@ static void process_incoming_migration_co(void *opaque)
  {
  QEMUFile *f = opaque;
  int ret;
 +int count = 0;
  
 -ret = qemu_loadvm_state(f);
 -qemu_fclose(f);
 -if (ret  0) {
 -fprintf(stderr, load of migration failed\n);
 -exit(EXIT_FAILURE);
 +if (ft_enabled()) {
 +while (qemu_loadvm_state_ft(f) = 0) {
 +count++;
 +DPRINTF(incoming count %d\r, count);
 +}
 +qemu_fclose(f);
 +fprintf(stderr, ft connection lost, launching self..\n);

Obviously,  here we are needing something more that an fprintf,,  right?

We are not checking either if it is one error.

Agree.

 +} else {
 +ret = qemu_loadvm_state(f);
 +qemu_fclose(f);
 +if (ret  0) {
 +fprintf(stderr, load of migration failed\n);
 +exit(EXIT_FAILURE);
 +}
  }
 +cpu_synchronize_all_post_init();
  qemu_announce_self();
  DPRINTF(successfully loaded vm state\n);
  
 diff --git a/savevm.c b/savevm.c
 index 6daf690..d5bf153 100644
 --- a/savevm.c
 +++ b/savevm.c
 @@ -52,6 +52,8 @@
  #define ARP_PTYPE_IP 0x0800
  #define ARP_OP_REQUEST_REV 0x3
  
 +#define PFB_SIZE 0x01
 +
  static int announce_self_create(uint8_t *buf,
  uint8_t *mac_addr)
  {
 @@ -135,6 +137,10 @@ struct QEMUFile {
  unsigned int iovcnt;
  
  int last_error;
 +
 +uint8_t *pfb;   /* pfb - PerFetch Buffer */

s/PreFetch/Prefetcth/

prefetch_buffer as name?  not used in so many places,  makes things
clearer or more convoluted?  Other comments?


Agree.

 +static int socket_get_prefetch_buffer(void *opaque, uint8_t *buf,
 +  int64_t pos, int size)
 +{
 +QEMUFile *f = opaque;
 +
 +if (f-pfb_size - pos = 0) {
 +return 0;
 +}
 +
 +if (f-pfb_size - pos  size) {
 +size = f-pfb_size - pos;
 +}
 +
 +memcpy(buf, f-pfb+pos, size);
 +
 +return size;
 +}
 +
 +
  static int socket_close(void *opaque)
  {
  QEMUFileSocket *s = opaque;
 @@ -440,6 +465,7 @@ QEMUFile *qemu_fdopen(int fd, const char *mode)
  static const QEMUFileOps socket_read_ops = {
  .get_fd = socket_get_fd,
  .get_buffer = socket_get_buffer,
 +.get_prefetch_buffer = socket_get_prefetch_buffer,
  .close =  socket_close
  };
  

  if (f-last_error) {
  ret = f-last_error;
  }
 +
 +if (f-pfb) {
 +g_free(f-pfb);

g_free(f-pfb);
It already checks for NULL.

Got it.

 +}
 +
  g_free(f);
  return ret;
  }
 @@ -822,6 +853,14 @@ void qemu_put_byte(QEMUFile *f, int v)
  
  static void qemu_file_skip(QEMUFile *f, int size)
  {
 +if (f-pfb_index + size = f-pfb_size) {
 +f-pfb_index += size;
 +return;
 +} else {
 +size -= f-pfb_size - f-pfb_index;
 +f-pfb_index = f-pfb_size;
 +}
 +
  if (f-buf_index + size = f-buf_size) {
  f-buf_index += size;
  }
 @@ -831,6 +870,21 @@ static int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, 
 int size, size_t offset)
  {
  int pending;
  int index;
 +int done;
 +
 +if (f-ops-get_prefetch_buffer) {
 +if (f-pfb_index + offset  f-pfb_size) {
 +done = f-ops-get_prefetch_buffer(f, buf, f-pfb_index + 
 offset,
 +   size);
 +if (done == size) {
 +return size;
 +}
 +size -= done;
 +buf  += done;
 +} else {
 +offset -= f-pfb_size - f-pfb_index;
 +}
 +}
  
  assert(!qemu_file_is_writable(f));
  
 @@ -875,7 +929,15 @@ int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size)
  
  static int qemu_peek_byte(QEMUFile *f, int offset)
  {
 -int index = f-buf_index + offset;
 +int index;
 +
 +if (f-pfb_index + offset  f-pfb_size) {
 +return f-pfb[f-pfb_index + offset];
 +} else {
 +offset -= f-pfb_size - f-pfb_index;
 +}
 +
 +index = f-buf_index + offset;
  
  assert(!qemu_file_is_writable(f));
  
 @@ -1851,7 +1913,7 @@ void qemu_savevm_state_begin(QEMUFile *f,
  }
  se-ops-set_params(params, se-opaque);
  }
 -
 +
  qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
  qemu_put_be32(f, QEMU_VM_FILE_VERSION);
  
 @@ -2294,8 +2356,6 @@ int qemu_loadvm_state(QEMUFile *f)
  }
  }
  
 -cpu_synchronize_all_post_init();
 -
  ret = 0;
  
  out:
 @@ -2311,6 +2371,89 @@ out:
  return ret;
  }
  
 +int qemu_loadvm_state_ft(QEMUFile *f)
 +{
 +int ret = 0;
 +int i   = 0;
 +int j   = 0;
 +int done = 0;
 +uint64_t size = 0;
 +uint64_t count = 0;
 +uint8_t *pfb = NULL;
 +uint8_t *buf = NULL;
 +
 +uint64_t max_mem = last_ram_offset() * 1.5;
 +
 +if (!f-ops-get_prefetch_buffer) {
 +fprintf(stderr, Fault tolerant is not supported by 

Re: [Qemu-devel] [PATCH RFC 0/4] Curling: KVM Fault Tolerance

2013-09-10 Thread junqing . wang
Hi,

The first is that if the VM failure happen in the middle on the live migration 
 the backup VM state will be inconsistent which means you can't failover to 
it.

Yes, I have concerned about this problem. That is why we need a prefetch buffer.

Solving it is not simple as you need some transaction mechanism that will 
change the backup VM state only when the transaction completes (the live 
migration completes). Kemari has something like that. 

The backup VM state will be loaded only when the one whole migration data is 
prefetched. Otherwise, VM state will not be loaded. So the backup VM is ensured 
to have a consistent state like a checkpoint.
However, how close this checkpoint to the point of the VM failure depends on 
the workload and bandwidth.

The second is that sadly live migration doesn't always converge this means  
that the backup VM won't have a consist state to failover to. You need to 
detect such a case and throttle down the guest to force convergence.

Yes, that's a problem. AFAK, qemu already have an auto convergence feature.
From another perspective,  if many migrations could not converge, maybe the 
workload is high and the bandwidth is low,  and it is not recommended to use 
FT in general.



Re: [Qemu-devel] [PATCH RFC 2/4] Curling: cmdline interface

2013-09-10 Thread junqing . wang


 Shouldn't this be in migration_state?  Just wondering.  And yes,  I
 don't see either a trivial place how to get it.  get_current_migration()?

That's a better idea, I will put 'ft_enabled'  in MigrationState Struct.

 I think for the outgoing side it should just be migrate -f tcp:foo:.
 On the incoming side, perhaps you could have a different ID instead of
 QEMU_VM_FILE_MAGIC, that triggers fault-tolerance mode automatically?

I am OK with this solution,  '-f'  indicates fault-tolerance, right?


Have youdecided yet?