Re: [vdsm] [RFC] GlusterFS domain specific changes

2012-09-07 Thread Ayal Baron


- Original Message -
> On Fri, 07 Sep 2012 13:27:15 +0530, "M. Mohan Kumar"
>  wrote:
> > On Fri, 07 Sep 2012 09:35:10 +0300, Itamar Heim 
> > wrote:
> > > On 09/07/2012 08:21 AM, M. Mohan Kumar wrote:
> > > > On Thu, 6 Sep 2012 18:59:19 -0400 (EDT), Ayal Baron
> > > >  wrote:
> > > >>
> > > >>
> > >
> > > >
> > > > To start using the LVs we will always do a truncate to the
> > > > required size, which will resize the LV. I didn't get what you
> > > > are mentioning about thin-provisioning, but I have some dumb code
> > > > using dm-thin targets showing that BD xlators can be extended to
> > > > use dm-thin targets for thin-provisioning.
> > > 
> > > So even though this is block storage, it will be extended as
> > > needed? How does that work exactly?
> > > Say I have a VM with a 100GB disk.
> > > Thin provisioning means we only allocated 1GB to it, then as the
> > > guest uses that storage, we allocate more as needed (lvextend,
> > > pause guest, lvrefresh, resume guest).
> > > 
> > > 
> > 
> > When we use device=lv, it means we use only thick provisioned logical
> > volumes. If this logical volume runs out of space in the guest, one can
> > resize it from the client by using truncate (which results in lvresize
> > at the server side) and run filesystem tools in the guest to get the
> > added space.
> > 
> > But with the device=thin type, all LVs are thinly provisioned and
> > allocating space to them is taken care of by the device-mapper thin
> > target automatically. The thin-pool should have enough space to
> > accommodate the sizing requirements.
> > 
> As of now the BD xlator supports only linear logical volumes, which are
> thick provisioned. The gluster CLI command "gluster volume create"
> with the option "device=lv" allows working with logical volumes as files.
> 
> As a POC I have code (not posted to the external list) where the option
> "device=thin" to the gluster volume create command allows working with
> thin provisioned targets. But it does not take care of resizing the
> thin-pool when it reaches the low-level threshold. Supporting thin targets
> is in our TODO list. We have a dependency on the lvm2 library to provide
> APIs to create thin targets.

I'm definitely missing some background here.
1. Can the LV span multiple bricks in Gluster?
 i. If 'yes' then
   a. do you use Gluster's replication and distribution schemes to gain 
performance and redundancy?
   b. what performance gain is there over normal Gluster with files?
 ii. If 'no' then you're only exposing single-host local storage LVM? (in 
which case I don't see why Gluster is used at all, and where).

From a different angle, the only benefit I can think of in exposing an FS 
interface over LVM is for consumers who do not wish to know the details of the 
underlying storage but want the performance gain of using block storage.
vdsm is already intimately familiar with LVM and block devices, so adding the 
FS layer scheme on top doesn't strike me as adding any value. In addition, you 
require the consumer to know a lot about your interface because it's not truly 
an FS interface, e.g. the consumer is not allowed to create directories and 
files are not sparse, not to mention that if you're indeed using LVM then I 
don't think you're considering the VG metadata and extent size limitations:
1. LVM currently has severe limitations wrt the number of objects it can 
manage (the limitation is actually the size of the VG metadata, but the 
distinction is not important just yet).  This means that creating a metadata 
LV in addition to each data LV is very costly (at around 1000 LVs you'd hit a 
problem).  vdsm currently creates 2 files per snapshot (the data and a small 
file with metadata describing it), meaning that you'd reach this limit really 
fast.
2. LVM's max LV size is extent size * 65K extents, which means that if I 
choose a 4K extent size then my max LV size would be 256MB. This obviously 
won't do for VM disks, so you'd choose a much larger extent size.  However, a 
larger extent size means that each metadata file vdsm creates wastes a lot of 
storage space.  So even if LVM could scale, your storage utilization plummets 
and your $/MB ratio increases.
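
To make the extent-size trade-off concrete, here is a rough back-of-the-envelope 
sketch (assuming the ~65K extents-per-LV figure quoted above; Python is used 
purely for the arithmetic):

# Rough arithmetic for the extent-size trade-off described above.
# Assumes the ~65K (65536) extents-per-LV limit quoted in this mail.
MAX_EXTENTS = 64 * 1024

def max_lv_size(extent_size):
    """Largest LV (in bytes) that fits in MAX_EXTENTS extents."""
    return extent_size * MAX_EXTENTS

print(max_lv_size(4 * 1024) // 2**20)        # 4K extents   -> 256 MB max LV
print(max_lv_size(128 * 1024**2) // 2**40)   # 128M extents -> 8 TB max LV,
                                             # but every small vdsm metadata
                                             # file now occupies a 128M extent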
The way around these limits is of course not to have a metadata file per 
volume but to have one file containing all the metadata, but then that means 
I'm fully aware of the limitations of the environment, and treating my objects 
as files gains me nothing (but does require a new hybrid domain, a lot more 
code, etc.).

Also note that without thin provisioning we lose our ability to create 
snapshots.

> 
> 
>  
> 
> ___
> vdsm-devel mailing list
> vdsm-devel@lists.fedorahosted.org
> https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
> 
___
vdsm-devel mailing list
vdsm-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel


Re: [vdsm] Change in vdsm[master]: bootstrap: perform reboot asynchronously

2012-09-07 Thread Alon Bar-Lev


- Original Message -
> From: "Ryan Harper" 
> To: "Alon Bar-Lev" 
> Cc: "Ryan Harper" , vdsm-devel@lists.fedorahosted.org
> Sent: Friday, September 7, 2012 10:47:10 PM
> Subject: Re: Change in vdsm[master]: bootstrap: perform reboot asynchronously
> 
> * Alon Bar-Lev  [2012-09-07 14:45]:
> > 
> > 
> > - Original Message -
> > > From: "Ryan Harper" 
> > > To: "Alon Bar-Lev" 
> > > Cc: vdsm-devel@lists.fedorahosted.org
> > > Sent: Friday, September 7, 2012 10:30:18 PM
> > > Subject: Re: Change in vdsm[master]: bootstrap: perform reboot
> > > asynchronously
> > > 
> > > * Alon Bar-Lev  [2012-09-05 16:11]:
> > > > Alon Bar-Lev has uploaded a new change for review.
> > > > 
> > > > Change subject: bootstrap: perform reboot asynchronously
> > > > ..
> > > > 
> > > > bootstrap: perform reboot asynchronously
> > > > 
> > > > The use of /sbin/reboot may cause the reboot to be performed in
> > > > the middle of script execution.
> > > > 
> > > > Reboot should be delayed in the background so that the script
> > > > will have a fair chance to terminate properly.
> > > 
> > > So, we fork and sleep 10 seconds?  Is that really what we want to
> > > do?
> > > Why is 10 seconds enough?
> > > 
> > > Shouldn't the deployUtil be tracking the script execution and
> > > waiting
> > > for the scripts to complete before rebooting?
> > 
> > Hi,
> > 
> > Reboot is called at the very end of the script; 10 seconds is more
> > than enough.
> 
> I don't know how we can assert that... we're not the sole process on
> the
> box.
> 
> > 
> > You are right that we can track the pid of the bootstrap script's
> > parent's parent's parent, but it will introduce more complexity that
> > I am not sure is worth it.
> 
> Why can't we just wait on the PID if we know it?

Because if we want to do this precisely, we need to track the following chain 
of processes:

sshd->sh->python->python

If we only track the last link in the chain, it is not enough, as we have a 
race anyway and have to wait some extra seconds, since the sh is doing some 
more logic and cleanup.

We can walk the process tree, which stops either at sshd or init... but even 
then, if this is run differently, we have a problem.
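
For what it's worth, a minimal sketch of what walking that chain would look 
like on Linux (reading the ppid from /proc; purely illustrative, not part of 
the patch):

# Illustrative only: walk a process's ancestry via /proc/<pid>/stat.
# This is the bookkeeping the "precise" approach would need, and it still
# depends on how the bootstrap script was launched.
import os

def ancestry(pid):
    chain = []
    while pid > 1:
        with open("/proc/%d/stat" % pid) as f:
            # stat is "pid (comm) state ppid ..."; comm may contain spaces,
            # so split after the closing parenthesis.
            rest = f.read().rsplit(")", 1)[1].split()
        with open("/proc/%d/comm" % pid) as f:
            comm = f.read().strip()
        chain.append((pid, comm))
        pid = int(rest[1])  # ppid
    return chain

print(ancestry(os.getpid()))  # e.g. python -> sh -> sshd -> init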

Alon.
___
vdsm-devel mailing list
vdsm-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel


Re: [vdsm] Change in vdsm[master]: bootstrap: perform reboot asynchronously

2012-09-07 Thread Ryan Harper
* Alon Bar-Lev  [2012-09-07 14:45]:
> 
> 
> - Original Message -
> > From: "Ryan Harper" 
> > To: "Alon Bar-Lev" 
> > Cc: vdsm-devel@lists.fedorahosted.org
> > Sent: Friday, September 7, 2012 10:30:18 PM
> > Subject: Re: Change in vdsm[master]: bootstrap: perform reboot 
> > asynchronously
> > 
> > * Alon Bar-Lev  [2012-09-05 16:11]:
> > > Alon Bar-Lev has uploaded a new change for review.
> > > 
> > > Change subject: bootstrap: perform reboot asynchronously
> > > ..
> > > 
> > > bootstrap: perform reboot asynchronously
> > > 
> > > The use of /sbin/reboot may cause the reboot to be performed in the
> > > middle of script execution.
> > > 
> > > Reboot should be delayed in the background so that the script will
> > > have a fair chance to terminate properly.
> > 
> > So, we fork and sleep 10 seconds?  Is that really what we want to do?
> > Why is 10 seconds enough?
> > 
> > Shouldn't the deployUtil be tracking the script execution and waiting
> > for the scripts to complete before rebooting?
> 
> Hi,
> 
> Reboot is called at the very end of the script; 10 seconds is more than 
> enough.

I don't know how we can assert that... we're not the sole process on the
box.

> 
> You are right that we can track the pid of the bootstrap script's
> parent's parent's parent, but it will introduce more complexity that I am
> not sure is worth it.

Why can't we just wait on the PID if we know it?

> 
> Regards,
> Alon

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

___
vdsm-devel mailing list
vdsm-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel


Re: [vdsm] Change in vdsm[master]: bootstrap: perform reboot asynchronously

2012-09-07 Thread Alon Bar-Lev


- Original Message -
> From: "Ryan Harper" 
> To: "Alon Bar-Lev" 
> Cc: vdsm-devel@lists.fedorahosted.org
> Sent: Friday, September 7, 2012 10:30:18 PM
> Subject: Re: Change in vdsm[master]: bootstrap: perform reboot asynchronously
> 
> * Alon Bar-Lev  [2012-09-05 16:11]:
> > Alon Bar-Lev has uploaded a new change for review.
> > 
> > Change subject: bootstrap: perform reboot asynchronously
> > ..
> > 
> > bootstrap: perform reboot asynchronously
> > 
> > The use of /sbin/reboot may cause the reboot to be performed in the
> > middle of script execution.
> > 
> > Reboot should be delayed in the background so that the script will have
> > a fair chance to terminate properly.
> 
> So, we fork and sleep 10 seconds?  Is that really what we want to do?
> Why is 10 seconds enough?
> 
> Shouldn't the deployUtil be tracking the script execution and waiting
> for the scripts to complete before rebooting?

Hi,

Reboot is called at the very end of the script; 10 seconds is more than enough.

You are right that we can track the pid of the bootstrap script's parent's 
parent's parent, but it will introduce more complexity that I am not sure is 
worth it.

Regards,
Alon
___
vdsm-devel mailing list
vdsm-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel


Re: [vdsm] Change in vdsm[master]: bootstrap: perform reboot asynchronously

2012-09-07 Thread Ryan Harper
* Alon Bar-Lev  [2012-09-05 16:11]:
> Alon Bar-Lev has uploaded a new change for review.
> 
> Change subject: bootstrap: perform reboot asynchronously
> ..
> 
> bootstrap: perform reboot asynchronously
> 
> The use of /sbin/reboot may cause the reboot to be performed in the middle
> of script execution.
> 
> Reboot should be delayed in the background so that the script will have a
> fair chance to terminate properly.

So, we fork and sleep 10 seconds?  Is that really what we want to do?
Why is 10 seconds enough?  

Shouldn't the deployUtil be tracking the script execution and waiting
for the scripts to complete before rebooting?


> 
> Change-Id: I0abb02ae4d5033a8b9f2d468da86fcdc53e2e1c2
> Signed-off-by: Alon Bar-Lev 
> ---
> M vdsm_reg/deployUtil.py.in
> 1 file changed, 39 insertions(+), 5 deletions(-)
> 
> 
>   git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/83/7783/1
> 
> diff --git a/vdsm_reg/deployUtil.py.in b/vdsm_reg/deployUtil.py.in
> index ebc7d36..b72cb44 100644
> --- a/vdsm_reg/deployUtil.py.in
> +++ b/vdsm_reg/deployUtil.py.in
> @@ -166,13 +166,47 @@
>  
>  def reboot():
>      """
> -    This function reboots the machine.
> +    This function reboots the machine async
>      """
> -    fReturn = True
> +    fReturn = False
>  
> -    out, err, ret = _logExec([EX_REBOOT])
> -    if ret:
> -        fReturn = False
> +    # Default maximum for the number of available file descriptors.
> +    MAXFD = 1024
> +
> +    import resource  # Resource usage information.
> +    maxfd = resource.getrlimit(resource.RLIMIT_NOFILE)[1]
> +    if (maxfd == resource.RLIM_INFINITY):
> +        maxfd = MAXFD
> +
> +    try:
> +        pid = os.fork()
> +        if pid == 0:
> +            try:
> +                os.setsid()
> +                for fd in range(0, maxfd):
> +                    try:
> +                        os.close(fd)
> +                    except OSError:  # ERROR, fd wasn't open to begin with (ignored)
> +                        pass
> +
> +                os.open(os.devnull, os.O_RDWR)  # standard input (0)
> +                os.dup2(0, 1)  # standard output (1)
> +                os.dup2(0, 2)  # standard error (2)
> +
> +                if os.fork() != 0:
> +                    os._exit(0)
> +
> +                time.sleep(10)
> +                os.execl(EX_REBOOT, EX_REBOOT)
> +            finally:
> +                os._exit(1)
> +
> +        pid, status = os.waitpid(pid, 0)
> +
> +        if os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0:
> +            fReturn = True
> +    except OSError:
> +        pass
>  
>      return fReturn
>  
> 
> 
> --
> To view, visit http://gerrit.ovirt.org/7783
> To unsubscribe, visit http://gerrit.ovirt.org/settings
> 
> Gerrit-MessageType: newchange
> Gerrit-Change-Id: I0abb02ae4d5033a8b9f2d468da86fcdc53e2e1c2
> Gerrit-PatchSet: 1
> Gerrit-Project: vdsm
> Gerrit-Branch: master
> Gerrit-Owner: Alon Bar-Lev 
> ___
> vdsm-patches mailing list
> vdsm-patc...@lists.fedorahosted.org
> https://lists.fedorahosted.org/mailman/listinfo/vdsm-patches

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

___
vdsm-devel mailing list
vdsm-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel


Re: [vdsm] Change in vdsm[master]: report cpuUser and cpuSys separately

2012-09-07 Thread Ryan Harper
* Mark Wu  [2012-09-07 05:00]:
> Mark Wu has posted comments on this change.
> 
> Change subject: report cpuUser and cpuSys separately
> ..
> 
> 
> Patch Set 1:
> 
> Here's the cpustats of a running vm:
> 
> virsh # cpu-stats 3
> CPU0:
>   cpu_time   152.626922854 seconds
> CPU1:
>   cpu_time   460.343462849 seconds
> CPU2:
>   cpu_time    37.569566994 seconds
> CPU3:
>   cpu_time    21.092908393 seconds
> Total:
>   cpu_time   671.632861090 seconds
>   user_time       3.44000 seconds
>   system_time    18.38000 seconds
> 
> You can see cpu_time is much bigger than the sum of user_time and
> system_time, because the guest time is not reported here. So most cpu
> time is used to run guest code. It makes sense!  For a qemu-kvm
> process, user_time represents the time used by qemu, and system_time
> represents the time used by the kvm module, which doesn't contain guest
> time. I am not sure if this is what you expect. It could be a little
> bit confusing for users.
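
(Just to make the arithmetic behind that observation explicit, a quick sanity 
check using the numbers quoted above:)

# Quick check with the cpu-stats numbers quoted above.
cpu_time = 671.632861090     # total vCPU time (seconds)
user_time = 3.44             # qemu userspace time
system_time = 18.38          # kvm module time

guest_time = cpu_time - user_time - system_time
print("%.1f s guest time, %.0f%% of total"
      % (guest_time, 100 * guest_time / cpu_time))
# -> ~649.8 s, i.e. roughly 97% of the CPU time runs guest code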

I think it's useful to split the data out separately for debugging
purposes; it's not clear to me that the user is interested in the breakdown.
Advanced performance tuning will definitely want to observe the data, as
will users looking to debug issues.

I don't see the harm in adding additional information.


> 
> --
> To view, visit http://gerrit.ovirt.org/7718
> To unsubscribe, visit http://gerrit.ovirt.org/settings
> 
> Gerrit-MessageType: comment
> Gerrit-Change-Id: I663ad25ff3ff5ce426b5159b6c9a65b7f5167605
> Gerrit-PatchSet: 1
> Gerrit-Project: vdsm
> Gerrit-Branch: master
> Gerrit-Owner: Laszlo Hornyak 
> Gerrit-Reviewer: Dan Kenigsberg 
> Gerrit-Reviewer: Gal Hammer 
> Gerrit-Reviewer: Laszlo Hornyak 
> Gerrit-Reviewer: Mark Wu 
> Gerrit-Reviewer: Royce Lv 
> Gerrit-Reviewer: oVirt Jenkins CI Server
> ___
> vdsm-patches mailing list
> vdsm-patc...@lists.fedorahosted.org
> https://lists.fedorahosted.org/mailman/listinfo/vdsm-patches

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com

___
vdsm-devel mailing list
vdsm-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel


Re: [vdsm] [RFC] GlusterFS domain specific changes

2012-09-07 Thread M. Mohan Kumar
On Fri, 07 Sep 2012 13:27:15 +0530, "M. Mohan Kumar"  wrote:
> On Fri, 07 Sep 2012 09:35:10 +0300, Itamar Heim  wrote:
> > On 09/07/2012 08:21 AM, M. Mohan Kumar wrote:
> > > On Thu, 6 Sep 2012 18:59:19 -0400 (EDT), Ayal Baron  
> > > wrote:
> > >>
> > >>
> >
> > >
> > > To start using the LVs we will always do a truncate to the required
> > > size, which will resize the LV. I didn't get what you are mentioning about
> > > thin-provisioning, but I have some dumb code using dm-thin targets showing
> > > that BD xlators can be extended to use dm-thin targets for thin-provisioning.
> > 
> > So even though this is block storage, it will be extended as needed? How 
> > does that work exactly?
> > Say I have a VM with a 100GB disk.
> > Thin provisioning means we only allocated 1GB to it, then as the guest 
> > uses that storage, we allocate more as needed (lvextend, pause guest, 
> > lvrefresh, resume guest).
> > 
> > 
> 
> When we use device=lv, it means we use only thick provisioned logical
> volumes. If this logical volume runs out of space in the guest, one can
> resize it from the client by using truncate (which results in lvresize at the
> server side) and run filesystem tools in the guest to get the added space.
> 
> But with the device=thin type, all LVs are thinly provisioned and allocating
> space to them is taken care of by the device-mapper thin target
> automatically. The thin-pool should have enough space to accommodate the
> sizing requirements. 
> 
As of now the BD xlator supports only linear logical volumes, which are
thick provisioned. The gluster CLI command "gluster volume create"
with the option "device=lv" allows working with logical volumes as files.

As a POC I have code (not posted to the external list) where the option
"device=thin" to the gluster volume create command allows working with
thin provisioned targets. But it does not take care of resizing the
thin-pool when it reaches the low-level threshold. Supporting thin targets
is in our TODO list. We have a dependency on the lvm2 library to provide
APIs to create thin targets.


 

___
vdsm-devel mailing list
vdsm-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel


Re: [vdsm] [RFC] GlusterFS domain specific changes

2012-09-07 Thread M. Mohan Kumar
On Fri, 07 Sep 2012 14:23:08 +0800, Shu Ming  wrote:
> On 2012-9-7 13:21, M. Mohan Kumar wrote:
> > On Thu, 6 Sep 2012 18:59:19 -0400 (EDT), Ayal Baron  
> > wrote:
> >>
> >> - Original Message -
> >>> - Original Message -
>  From: "M. Mohan Kumar" 
>  To: vdsm-devel@lists.fedorahosted.org
>  Sent: Wednesday, July 25, 2012 1:26:15 PM
>  Subject: [vdsm] [RFC] GlusterFS domain specific changes
> 
> 
>  We are developing a GlusterFS server translator to export block devices
>  as regular files to the client. Using block devices to serve VM images
>  gives performance improvements, since it avoids some file system
>  bottlenecks in the host kernel. The goal is to use one block device
>  (i.e. a file at the client side) per VM image and feed this file to QEMU
>  to get the performance improvements. QEMU will talk to the glusterfs
>  server directly using libgfapi.
> 
>  Currently we support only exporting Volume groups and Logical
>  Volumes. Logical volumes are exported as regular files to the
>  client.
> >> Are you actually using LVM behind the scenes?
> >> If so, why bother with exposing the LVs as files and not raw block devices?
> >>
> > Ayal,
> >
> > The idea is to provide an FS interface for managing block devices. One
> > can mount the Block Device Gluster Volume and create an LV and size it
> > just by
> >   $ touch lv1
> >   $ truncate -s5G lv1
> >
> > And other file commands can be used to clone LVs, snapshot LVs
> >   $ ln lv1 lv2 # clones
> >   $ ln -s lv1 lv1.sn # creates snapshot
> Do we have a special reason to use "ln"?
> Why not use "cp" as the command to do the snapshot instead of "ln"?

cp involves opening the source file in read-only mode, opening/creating the
destination file in write mode, and issuing a series of reads on the source
file and writes into the destination file until the end of the source file.

But we can't apply this to logical volume copy (or clone), because when
we create a logical volume we have to specify the size, and that's not
possible with the above approach, i.e. open/create does not take a size
parameter, so we can't create the destination LV with the required size.

But if I use the link interface to copy LVs, VFS/FUSE/GlusterFS provides a
link() interface that takes the source file and destination file name. In the
BD xlator link() code, I get the size of the source LV, create the destination
LV with that size and copy the contents.

This problem could be solved if we had a syscall copyfile(source, dest,
size). There have been discussions in the past on a copyfile() interface which
could be made use of in this scenario:
http://www.spinics.net/lists/linux-nfs/msg26203.html 
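
To make the client-side view concrete, a minimal sketch (the /mnt/bd-vol mount 
point is hypothetical; the clone/snapshot semantics of link()/symlink() are 
the BD xlator behaviour described above, not ordinary filesystem semantics):

# Illustrative only: cloning and snapshotting an LV exposed as a file on a
# BD-xlator-backed Gluster mount (hypothetical path), assuming lv1 was
# already created with the touch/truncate commands quoted earlier.
import os

base = "/mnt/bd-vol"
src = os.path.join(base, "lv1")

# "ln lv1 lv2": the BD xlator's link() reads the source LV's size, creates a
# destination LV of that size and copies the contents (a clone).
os.link(src, os.path.join(base, "lv2"))

# "ln -s lv1 lv1.sn": symlink() creates a snapshot LV of the source.
os.symlink("lv1", os.path.join(base, "lv1.sn"))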

> >
> > By enabling this feature GlusterFS can directly export storage in
> > SAN. We are planning to add feature to export LUNs also as regular files
> > in future.
> 
> IMO, the major feature of GlusterFS is to export distributed local disks 
> to the clients.
> If we have a SAN in the backend, that means the storage block devices 
> should be exported to clients naturally.  Why do we need GlusterFS to 
> export the block devices in the SAN?
> 

By enabling this feature we are allowing GlusterFS to work with local
storage, NAS storage and SAN storage, i.e. it allows machines that are not
directly connected to the SAN storage to access block devices from the SAN.

Also, providing block devices as VM disk images has some advantages:
 * it does not incur host-side filesystem overhead
 * if storage arrays provide storage offload features such as flashcopy,
   they can be exploited (these offloads will usually be at the LUN level)

___
vdsm-devel mailing list
vdsm-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel


Re: [vdsm] [RFC] GlusterFS domain specific changes

2012-09-07 Thread M. Mohan Kumar
On Fri, 07 Sep 2012 09:35:10 +0300, Itamar Heim  wrote:
> On 09/07/2012 08:21 AM, M. Mohan Kumar wrote:
> > On Thu, 6 Sep 2012 18:59:19 -0400 (EDT), Ayal Baron  
> > wrote:
> >>
> >>
>
> >
> > To start using the LVs we will always do a truncate to the required
> > size, which will resize the LV. I didn't get what you are mentioning about
> > thin-provisioning, but I have some dumb code using dm-thin targets showing
> > that BD xlators can be extended to use dm-thin targets for thin-provisioning.
> 
> So even though this is block storage, it will be extended as needed? How 
> does that work exactly?
> Say I have a VM with a 100GB disk.
> Thin provisioning means we only allocated 1GB to it, then as the guest 
> uses that storage, we allocate more as needed (lvextend, pause guest, 
> lvrefresh, resume guest).
> 
> 

When we use device=lv, it means we use only thick provisioned logical
volumes. If this logical volume runs out of space in the guest, one can
resize it from the client by using truncate (which results in lvresize at the
server side) and run filesystem tools in the guest to get the added space.

But with the device=thin type, all LVs are thinly provisioned and allocating
space to them is taken care of by the device-mapper thin target
automatically. The thin-pool should have enough space to accommodate the
sizing requirements. 
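
As a concrete illustration of that resize flow, a minimal sketch (the 
/mnt/bd-vol path is hypothetical; the resize2fs step happens inside the guest, 
not on the client):

# Illustrative only: create and later grow a thick-provisioned LV that the
# BD xlator exposes as a file on a Gluster mount (hypothetical path).
import os

lv = "/mnt/bd-vol/lv1"

# Create the "file" and size it; the server side creates the LV.
open(lv, "a").close()
with open(lv, "r+b") as f:
    f.truncate(5 * 1024 ** 3)      # 5G to start with

# If the guest runs out of space, grow the LV the same way from the client...
with open(lv, "r+b") as f:
    f.truncate(20 * 1024 ** 3)     # server side performs lvresize to 20G

# ...then run filesystem tools inside the guest (e.g. resize2fs on the
# guest's disk) to make the added space usable.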

___
vdsm-devel mailing list
vdsm-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel


Re: [vdsm] [RFC]about the implement of text-based console

2012-09-07 Thread Xu He Jie

On 09/04/2012 10:36 PM, Xu He Jie wrote:

On 09/04/2012 06:52 PM, Dan Kenigsberg wrote:

On Tue, Sep 04, 2012 at 03:05:37PM +0800, Xu He Jie wrote:

On 09/03/2012 10:33 PM, Dan Kenigsberg wrote:

On Thu, Aug 30, 2012 at 04:26:31PM -0500, Adam Litke wrote:

On Thu, Aug 30, 2012 at 11:32:02AM +0800, Xu He Jie wrote:

Hi,

   I submitted a patch for a text-based console:
http://gerrit.ovirt.org/#/c/7165/

The issues I want to discuss are below:
1. fixed port vs. dynamic port

Use a fixed port for all VMs' consoles. Connect to a console with 'ssh
vmUUID@ip -p port', distinguishing VMs by vmUUID.


   In the current implementation, vdsm allocates a port for the console
dynamically and spawns a sub-process when the VM is created.
In the sub-process, the main thread is responsible for accepting new
connections and dispatching the console's output to each connection.
When a new connection comes in, the main process creates a new thread for
it. Dynamic ports mean that a port is allocated for each VM from a range
of ports, which isn't good for firewall rules.



   So I got a suggestion to use a fixed port, and connect to a console with
'ssh vmuuid@hostip -p fixport'. This is simpler for the user.
We need one process to accept new connections on the fixed port and, when a
new connection comes in, spawn a sub-process for each VM.
But because the console can only be opened by one process, the main process
needs to be responsible for dispatching the console output of all VMs to all
connections.
So the code will be a little more complex than with dynamic ports.

   So this is dynamic port vs. fixed port, and simple code vs. complex
code.
From a usability point of view, I think the fixed port suggestion is nicer.
This means that a system administrator needs only to open one port to enable
remote console access.  If your initial implementation limits console access
to one connection per VM, would that simplify the code?
Yes, using a fixed port for all consoles of all VMs seems like a cooler
idea. Besides the firewall issue, there's user experience: instead of
calling getVmStats to tell the VM port and then using ssh, only one ssh
call is needed. (Taking this one step further - it would make sense to
add another layer on top, directing console clients to the specific host
currently running the VM.)

I did not take a close look at your implementation, and did not research
this myself, but have you considered using sshd for this? I suppose you
can configure sshd to collect the list of known "users" from
`getAllVmStats`, and force it to run a command that redirects the VM's
console to the ssh client. It has the potential of being a more robust
implementation.

I have considered using sshd and an ssh tunnel. They
can't provide a fixed port and a shared console.

Would you elaborate on that? Usually sshd listens on a fixed port (22)
and allows multiple users to have independent shells. What do you mean by
"shared console"?


A sharable console is like qemu VNC: you can open multiple connections, 
but the picture is the same in all of them. virsh limits the console so that 
only one user can open it, so I think making it sharable is more
powerful.

Hmm... for sshd, I think I was missing something. It could be implemented 
using sshd in the following way:


Add a new system user for that VM on setVmTicket, and change that user's 
login program to another program that can redirect the console.
To share the console among multiple connections, a process needs to redirect 
the console to a local unix socket; then we can copy the console's output to 
multiple connections.
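
A minimal sketch of that last idea (purely illustrative; the socket path is 
made up, and the console fd is whatever the owning process holds open for the 
VM's pty):

# Illustrative only: one process owns the VM console and copies its output
# to every connected client, so all connections see the same stream.
import os
import socket
import threading

clients = []
lock = threading.Lock()

def pump_console(console_fd):
    """Copy console output to every connected client."""
    while True:
        data = os.read(console_fd, 4096)
        if not data:
            break
        with lock:
            for c in list(clients):
                try:
                    c.sendall(data)
                except OSError:
                    clients.remove(c)

def serve(console_fd, path="/run/vm-console.sock"):
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(path)
    srv.listen(5)
    threading.Thread(target=pump_console, args=(console_fd,)).start()
    while True:
        conn, _addr = srv.accept()
        with lock:
            clients.append(conn)   # every new connection shares the stream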


This is just an idea in my mind. I am going to give it a try. Thanks for your 
suggestion!


I gave system sshd a try. That can work. But I think adding a user in the 
system for each VM isn't good enough. So I have looked into PAM, trying to 
find a way to skip creating a real user in the system, but it doesn't work. 
Even if we can create a virtual user with PAM, we still can't tell sshd which 
user to use and which login program. That means sshd doesn't support that, and 
I didn't find any other solution, unless I missed something.


I think creating users in the system isn't good; there are security 
implications too, and it will mess up the system configuration, so we need to 
be careful to clean up all the VM users. So I am thinking again about 
implementing the console server ourselves. I want to ask: is that really 
unsafe? We just use the ssh protocol as the transfer protocol. It isn't a real 
sshd. It doesn't access any system resources or a shell. It can only 
redirect the VM's console after setVmTicket.





With the current implementation we can do anything that we want.

Yes, it is completely under our control, but there are downsides, too:
we have to maintain another process and another entry point, instead of
configuring a universally-used, well-maintained and debugged
application.

Dan.



___
vdsm-devel mailing list
vdsm-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel


___
vdsm-devel mailing list
vdsm-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/ma