Re: [vdsm] moving the collection of statistics to external process

2012-12-07 Thread Mark Wu

On 12/05/2012 10:23 PM, ybronhei wrote:
As part of investigating an issue where starting 200 VMs at the same time takes hours for a reason that is still undefined, we thought about moving the collection of statistics out of vdsm.


It could help, because stats collection runs in internal vdsm threads that can consume a fair amount of time. I'm not sure it would help with the issue of starting many VMs simultaneously, but it might improve vdsm's responsiveness.


Currently we start a thread for each VM and then collect stats on them at constant intervals. Having 200 threads like this, each of which can take some time, must affect vdsm. For example, if we have connection errors to storage and can't receive its response, all 200 threads can get stuck and lock other threads (GIL issue).
As far as I know, the design of oop is meant to solve the problem you describe. However, I don't understand how the GIL can cause this problem: Python should release the GIL before executing any instruction that involves I/O. I ran some tests earlier and found that the other threads can continue to run while one thread is stuck on I/O.
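A minimal sketch (plain CPython threading, not vdsm code) illustrating the point: the thread blocked in a socket call releases the GIL, so the second thread keeps making progress.

import socket
import threading
import time

def blocked_on_io():
    # connect() sits in a blocking system call; CPython releases the GIL
    # while it waits, so this thread does not stall the others.
    try:
        socket.create_connection(("10.255.255.1", 80), timeout=30)
    except OSError:
        pass

def keeps_running():
    for i in range(5):
        print("other thread still running:", i)
        time.sleep(1)

threading.Thread(target=blocked_on_io).start()
worker = threading.Thread(target=keeps_running)
worker.start()
worker.join()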


I wanted to know what you think about it, and whether you have a better solution that avoids initiating so many threads. And is splitting vdsm a good idea here?
At first glance, my opinion is that it could help, and that it would be nice to have a vmStatisticService that runs and writes the VMs' status to a separate log.


The problem with this solution is that those interval functions may need to communicate with internal parts of vdsm to set values or start internal processes when something has changed, so those flows depend on the stat functions, and I'm not sure the stat functions should control internal flows.
Today we rely on this method to recognize connectivity errors, but we could add a polling mechanism for those issues (which can raise the same problems we are trying to deal with...).


I would like to hear your ideas and comments. Thanks.



___
vdsm-devel mailing list
vdsm-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel


Re: [vdsm] moving the collection of statistics to external process

2012-12-07 Thread Mark Wu

On 12/06/2012 11:29 PM, Adam Litke wrote:

On Thu, Dec 06, 2012 at 11:19:34PM +0800, Shu Ming wrote:

On 2012-12-6 4:51, Itamar Heim wrote:

On 12/05/2012 10:33 PM, Adam Litke wrote:

On Wed, Dec 05, 2012 at 10:21:39PM +0200, Itamar Heim wrote:

On 12/05/2012 10:16 PM, Adam Litke wrote:

On Wed, Dec 05, 2012 at 09:01:24PM +0200, Itamar Heim wrote:

On 12/05/2012 08:57 PM, Adam Litke wrote:

On Wed, Dec 05, 2012 at 08:30:10PM +0200, Itamar Heim wrote:

On 12/05/2012 04:42 PM, Adam Litke wrote:

I wanted to know what you think about it, and whether you have a better solution that avoids initiating so many threads. And is splitting vdsm a good idea here?
At first glance, my opinion is that it could help, and that it would be nice to have a vmStatisticService that runs and writes the VMs' status to a separate log.

Vdsm recently started requiring the MOM package. MOM also performs some host and guest statistics collection as part of the policy framework. I think it would be a really good idea to consolidate all stats collection into MOM. Then all stats become usable within the policy and by vdsm for its own internal purposes. Today, MOM has one stats collection thread per VM and one thread for the host stats. It has an API for gathering the most recently collected stats which vdsm can use.
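As a rough sketch of the shape described above (illustrative only; the class and method names are not MOM's actual code or API), a per-VM collector thread with an accessor for the latest sample might look like:

import threading
import time

class VmStatsCollector(threading.Thread):
    def __init__(self, vm_id, interval=15):
        threading.Thread.__init__(self)
        self.daemon = True
        self.vm_id = vm_id
        self.interval = interval
        self._lock = threading.Lock()
        self._latest = {}

    def _sample(self):
        # Placeholder for a real libvirt / guest agent query.
        return {'timestamp': time.time(), 'vm': self.vm_id}

    def run(self):
        while True:
            stats = self._sample()
            with self._lock:
                self._latest = stats
            time.sleep(self.interval)

    def get_latest_stats(self):
        # Callers (e.g. vdsm) read the most recent sample without waiting
        # for a new collection cycle.
        with self._lock:
            return dict(self._latest)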


isn't this what collectd (and its libvirt plugin) or
pcp are already doing?

Lots of things collect statistics, but as of right now we're using MOM and we're not yet using collectd on the host, right?


I think we should have a single stats collection service and clients for it. I think MOM and vdsm should get their stats from that service, rather than have either of them beholden to every new stat something needs to collect.

How would this work for collecting guest statistics? Would we require collectd to be installed in all guests running under oVirt?


My understanding is that collectd is installed on the host and uses collectd's libvirt plugin to collect guest statistics?

Yes, but some statistics can only be collected by making a call to the oVirt guest agent (e.g. guest memory statistics). The logical next step would be to write a collectd plugin for ovirt-guest-agent, but vdsm owns the connections to the guest agents and probably does not want to multiplex those connections, for many reasons (security being the main one).


and some will come from qemu-ga which libvirt will support?
maybe a collectd vdsm plugin for the guest agent stats?


I am thinking of having collectd as a standalone service that collects the statistics from both the oVirt guest agent and qemu-ga. Then collectd could export the information to the host's proc filesystem in a layered architecture, and MOM or other vdsm services could get the information from the proc filesystem like the other OS statistics exported on the host.

You wouldn't use the host /proc filesystem for this purpose.  /proc is an
interface between userspace and the kernel.  It is not for direct application
use.

The problem I see with hooking collectd up to ovirt-ga is that vdsm still needs a connection to ovirt-ga for things like shutdown and desktopLogin. Today, vdsm owns the connection to the guest agent and there is not a nice way to multiplex that connection for use by multiple clients simultaneously.

Actually, I don't like collecting statistics from the guest agent. Libvirt can now provide the statistics for vcpu, block, and network interfaces, so I think we should reconsider enabling the guest memory report in the virtio balloon driver. I am not sure whether async events are supported in QMP now. What do you think of it?

In vdsm and MOM we don't just collect statistics; we also need to perform appropriate actions based on them. So we would probably still need an output plugin for collectd that makes the data available to vdsm and MOM, and that generates an event to vdsm or MOM when the data reaches a given threshold.

Just an idea. I am not sure how easy it would be to implement.
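A rough sketch of such an output plugin, assuming collectd's Python plugin interface; the threshold, the metric matching, and notify_vdsm() are placeholders rather than any existing vdsm or MOM interface:

import collectd

MEM_THRESHOLD = 90.0  # hypothetical threshold on a percentage metric

def notify_vdsm(instance, metric, value):
    # Placeholder: a real plugin would push an event over a local socket
    # or API that vdsm/MOM listens on.
    collectd.info("would notify vdsm: %s %s=%s" % (instance, metric, value))

def write_callback(vl, data=None):
    # vl is a collectd.Values object; plugin_instance usually carries the
    # domain name when the sample comes from the libvirt plugin.
    for value in vl.values:
        if vl.type == 'percent' and value > MEM_THRESHOLD:
            notify_vdsm(vl.plugin_instance, vl.type_instance, value)

collectd.register_write(write_callback)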

___
vdsm-devel mailing list
vdsm-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel


Re: [vdsm] moving the collection of statistics to external process

2012-12-07 Thread Itamar Heim

On 12/07/2012 12:39 PM, Mark Wu wrote:

Actually, I don't like collecting statistics from the guest agent. Libvirt can now provide the statistics for vcpu, block, and network interfaces, so I think we should reconsider enabling the guest memory report in the virtio balloon driver. I am not sure whether async events are supported in QMP now. What do you think of it?

In vdsm and MOM we don't just collect statistics; we also need to perform appropriate actions based on them. So we would probably still need an output plugin for collectd that makes the data available to vdsm and MOM, and that generates an event to vdsm or MOM when the data reaches a given threshold.
Just an idea. I am not sure how easy it would be to implement.


Should be easy for such stats; the question is what other items are reported by the current guest agent (say, the list of installed applications).

___
vdsm-devel mailing list
vdsm-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel


Re: [vdsm] RFC: New Storage API

2012-12-07 Thread Saggi Mizrahi


- Original Message -
 From: Deepak C Shetty deepa...@linux.vnet.ibm.com
 To: Saggi Mizrahi smizr...@redhat.com
 Cc: Shu Ming shum...@linux.vnet.ibm.com, engine-devel 
 engine-de...@ovirt.org, VDSM Project Development
 vdsm-devel@lists.fedorahosted.org, Deepak C Shetty 
 deepa...@linux.vnet.ibm.com
 Sent: Friday, December 7, 2012 12:23:15 AM
 Subject: Re: [vdsm] RFC: New Storage API
 
 On 12/06/2012 10:22 PM, Saggi Mizrahi wrote:
 
  - Original Message -
  From: Shu Ming shum...@linux.vnet.ibm.com
  To: Saggi Mizrahi smizr...@redhat.com
  Cc: VDSM Project Development
  vdsm-devel@lists.fedorahosted.org, engine-devel
  engine-de...@ovirt.org
  Sent: Thursday, December 6, 2012 11:02:02 AM
  Subject: Re: [vdsm] RFC: New Storage API
 
  Saggi,
 
Thanks for sharing your thoughts; I have some comments below.
 
 
  Saggi Mizrahi:
  I've been throwing a lot of bits out about the new storage API
  and
  I think it's time to talk a bit.
  I will purposefully try and keep implementation details away and
  concentrate about how the API looks and how you use it.
 
  First major change is in terminology, there is no long a storage
  domain but a storage repository.
  This change is done because so many things are already called
  domain in the system and this will make things less confusing for
  new-commers with a libvirt background.
 
One other change is that repositories no longer have a UUID.
The UUID was only used in the pool members' manifest and is no longer needed.
 
 
connectStorageRepository(repoId, repoFormat, connectionParameters={}):
repoId - a transient name that will be used to refer to the connected domain; it is not persisted and doesn't have to be the same across the cluster.
repoFormat - similar to what used to be type (e.g. localfs-1.0, nfs-3.4, clvm-1.2).
connectionParameters - format specific, and will be used to tell VDSM how to connect to the repo.
 
Where does repoID come from? I think repoID doesn't exist before connectStorageRepository() returns. Isn't repoID a return value of connectStorageRepository()?
No, repoIDs are no longer part of the domain; they are just a transient handle.
The user can put whatever they want there, as long as it isn't already taken by another currently connected domain.
 
So what happens when the user mistakenly gives a repoID that is already in use? There should be something in the return value that specifies the error and/or the reason for the error, so that the user can retry with a new/different repoID?
As I said, connect fails if the repoId is currently in use.
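A hypothetical client-side sketch of that flow (the api object, the parameter style, and the RepoIdInUseError name are all illustrative, not the actual proposed bindings):

import uuid

class RepoIdInUseError(Exception):
    # Illustrative only; the real error reporting is unspecified here.
    pass

def connect_repo(api, repo_format, connection_params):
    # repoId is a transient handle chosen by the caller.
    repo_id = "repo-%s" % uuid.uuid4()
    try:
        api.connectStorageRepository(repo_id, repo_format, connection_params)
    except RepoIdInUseError:
        # The handle was already taken; pick another one and retry.
        repo_id = "repo-%s" % uuid.uuid4()
        api.connectStorageRepository(repo_id, repo_format, connection_params)
    return repo_id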
 
  disconnectStorageRepository(self, repoId)
 
 
In the new API there are only images; some images are mutable and some are not.
Mutable images are also called VirtualDisks; immutable images are also called Snapshots.

There are no explicit templates; you can create as many images as you want from any snapshot.
 
  There are 4 major image operations:
 
 
createVirtualDisk(targetRepoId, size, baseSnapshotId=None, userData={}, options={}):

targetRepoId - ID of a connected repo where the disk will be created
size - the size of the image you wish to create
baseSnapshotId - the ID of the snapshot you want to base the new virtual disk on
userData - optional data that will be attached to the new VD; could be anything the user desires.
options - options to modify VDSM's default behavior
 
IIUC, I can use options to do storage offloads? For e.g., can I create a LUN that represents this VD on my storage array based on the 'options' parameter? Is this the intended way to use 'options'?
No, this has nothing to do with offloads.
If by offloads you mean having other VDSM hosts do the heavy lifting, then that is what the option autoFix=False and the fix mechanism are for.
If you are talking about advanced SCSI features (i.e. WRITE SAME), they will be used automatically whenever possible.
In any case, how we manage LUNs (if they are even used) is an implementation detail.
 
 
returns the id of the new VD
I think we will also need a function to check whether a VirtualDisk is based on a specific snapshot.
Like: isSnapshotOf(virtualDiskId, baseSnapshotID):
No, the design is that volume dependencies are an implementation detail.
There is no reason for you to know that an image is physically a snapshot of another.
Logical snapshots, template information, and any other information can be set by the user using the userData field available for every image.
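To make the intended flow concrete, a hypothetical usage sketch of the calls described above (not actual vdsm client code; the api object, the connection parameters, sizes, and userData keys are all illustrative):

# Connect to a local repo under a caller-chosen transient handle.
repo_id = "repo-local-1"
api.connectStorageRepository(repo_id, "localfs-1.0",
                             {"path": "/var/lib/images"})  # params are format specific

# Base a new 20 GiB virtual disk on an existing snapshot, recording the
# "template" relationship only as caller-defined userData.
golden_snapshot_id = "11111111-aaaa-bbbb-cccc-222222222222"  # an existing Snapshot
vd_id = api.createVirtualDisk(repo_id,
                              size=20 * 1024 ** 3,
                              baseSnapshotId=golden_snapshot_id,
                              userData={"logicalTemplate": golden_snapshot_id,
                                        "owner": "engine"})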
  createSnapshot(targetRepoId, baseVirtualDiskId,
   userData={}, options={}):
targetRepoId - The ID of a connected repo where the new snapshot will be created and where the original image exists as well.
  size - The size of the image you wish to create
  baseVirtualDisk - the ID of a mutable image (Virtual Disk) you
  want
  to snapshot
  userData - optional data that will be attached to the new
  Snapshot,
  could be anything that