Re: [PATCH master 00/30] Remove the instance disk template

2014-11-28 Thread Constantinos Venetsanopoulos
Hello,

On 11/28/14, 13:52 PM, 'Aaron Karper' via ganeti-devel wrote:
> Hi Constantinos,
>
> On Fri, Nov 28, 2014 at 01:31:18PM +0200, Constantinos Venetsanopoulos wrote:
>> Hello Aaron,
>>
>> On 11/27/14, 18:24 PM, 'Aaron Karper' via ganeti-devel wrote:
>>> This patch series removes the instance disk template for all intents and
>>> purposes.
>>>
>>> This introduces the notion of the aggregate disk template: It is 'diskless'
>>> if no disks are attached to the instance, 'mixed' if multiple types of disks
>>> are attached and the unique disk type otherwise.
>> Is there a specific reason to have this slot? Is it here for transitional
>> reasons and will go away in later iterations, or is it there by design?
>>
> The reason is backwards-compatibility and graceful transition.
>
> In a future iteration no internal code should depend on this, but RAPI,
> for example, shouldn't be broken if the use case doesn't change. The idea
> is that someone who doesn't use the new features doesn't need to fix
> their scripts to handle the removal of the instance disk template.

Ack. That makes sense.

> This transition isn't complete with this series, there are still spots
> where the aggregate disk template is used internally.

Ack.

>> IMHO, we do not need anything related to disk templates at instance
>> level. It is duplicate information and we then need to make sure it is
>> synced with the underlying disks.
>>
>> If we need something like it at instance level, shouldn't it be a
>> property, dynamically calculated from the actual underlying disks?
>>
> Yes, sorry for being imprecise in the original description! The
> aggregate disk template refers to the GetInstanceDiskTemplate method of
> the configuration.

No worries. OK, that sounds good.
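As an illustration of that idea, here is a minimal sketch of a dynamically
calculated aggregate template; the helper name and the disks' 'dev_type'
attribute are assumptions for the example, not the actual
GetInstanceDiskTemplate implementation:

# Hypothetical sketch: derive the aggregate disk template from the disks
# themselves instead of storing it on the instance.

DT_DISKLESS = "diskless"
DT_MIXED = "mixed"


def AggregateDiskTemplate(disks):
  """Return 'diskless', 'mixed', or the single disk type of an instance."""
  dev_types = set(disk.dev_type for disk in disks)
  if not dev_types:
    return DT_DISKLESS
  if len(dev_types) > 1:
    return DT_MIXED
  return dev_types.pop()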

Thanks for the explanation,
Constantinos

> Thanks and cheerio,
>
> Aaron
>
>> Best Regards,
>> Constantinos
>>> * The cfgupgrade removes the entry and adds it for downgrades (if the
>>>   instance is homogeneous)
>>> * Via the query interface, the aggregate disk template is served.
>>> * Checks are replaced by either the aggregate disk template or
>>>   Any/AllTypeOfDisk - or iterative handling of the disks.
>>> * LUInstanceSetParam currently assumes that there are no mixed
>>>   instances, i.e. it doesn't have to convert them to single-disk-type
>>>   instances.
>>>
>>> Aaron Karper (30):
>>>   Remove instance disk template from Query.Instance
>>>   Remove disk template from haskell
>>>   Show all steps before failing the up/downgrade
>>>   Remove disk template from config in cfgupgrade
>>>   Determine enabled disk templates by looking at disks
>>>   Add helper to simulate disk template
>>>   Use disk based export summary
>>>   Add constant for disk template "mixed"
>>>   Add config getter for disk template
>>>   Make CalculateFileStorageDir params explicit
>>>   Wait for sync on create if any disk is DRBD
>>>   Ensure that ganeti.objects_unittest passes
>>>   Remove SetInstanceDiskTemplate from config
>>>   Introduce IDISK_TYPE
>>>   Remove instance disk template from error reporting
>>>   Remove disk_template option from IAReqInstanceAlloc
>>>   Remove iallocator dependency on disk_template
>>>   Decide cache value per disk for KVM hypervisor
>>>   Don't serialise fake disk_template
>>>   Use consensus disk template for hook env
>>>   Check modifications for LUInstanceSetParams correctly
>>>   Unify finding the correct file driver
>>>   Introduce temporary homodiskosity assumption
>>>   Only sync disks that are internally mirrored
>>>   Check that removed disks have enabled type
>>>   Remove default param 'disk_template' for creation
>>>   Use dev_type in moving of disks
>>>   Replace various checks in LUInstanceSetParams with per-disk
>>>   Ensure that disk template is given on creation of disk
>>>   Move tools/cfgupgrade to lib/tools/cfgupgrade.py
>>>
>>>  Makefile.am |   3 +-
>>>  lib/backend.py  |   3 +-
>>>  lib/cmdlib/instance_create.py   |   7 +-
>>>  lib/cmdlib/instance_set_params.py   | 182 ---
>>>  lib/cmdlib/instance_storage.py  |  57 +-
>>>  lib/cmdlib/instance_utils.py|  11 +-
>>>  lib/cmdlib/test.py  

Re: [PATCH master 00/30] Remove the instance disk template

2014-11-28 Thread Constantinos Venetsanopoulos
Hello Aaron,

On 11/27/14, 18:24 PM, 'Aaron Karper' via ganeti-devel wrote:
> This patch series removes the instance disk template for all intents and
> purposes.
>
> This introduces the notion of the aggregate disk template: It is 'diskless' if
> no disks are attached to the instance, 'mixed' if multiple types of disks are
> attached and the unique disk type otherwise.

Is there a specific reason to have this slot? Is it here for transitional
reasons and will go away in later iterations, or is it there by design?

IMHO, we do not need anything related to disk templates at instance
level. It is duplicate information and we then need to make sure it is
synced with the underlying disks.

If we need something like it at instance level, shouldn't it be a
property, dynamically calculated from the actual underlying disks?

Best Regards,
Constantinos

>
> * The cfgupgrade removes the entry and adds it for downgrades (if the instance
>   is homogeneous)
> * Via the query interface, the aggregate disk template is served.
> * Checks are replaced by either the aggregate disk template or
>   Any/AllTypeOfDisk - or iterative handling of the disks (see the sketch
>   after this list).
> * LUInstanceSetParam currently assumes that there are no mixed
>   instances, i.e. it doesn't have to convert them to single-disk-type
>   instances.
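For illustration, a minimal sketch of what such per-disk checks could look
like; the helper names and the 'dev_type' attribute are assumptions for the
example, not necessarily the helpers the series actually adds:

# Hypothetical sketch of per-disk type checks replacing comparisons
# against the old instance-level disk_template.

def AnyDiskOfType(disks, dev_types):
  """Return True if at least one disk has one of the given types."""
  return any(d.dev_type in dev_types for d in disks)


def AllDisksOfType(disks, dev_types):
  """Return True if every disk (possibly none) has one of the given types."""
  return all(d.dev_type in dev_types for d in disks)


# Example use, mirroring "Wait for sync on create if any disk is DRBD":
#   if AnyDiskOfType(instance_disks, ["drbd8"]):
#     ...wait for the mirror to sync...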
>
> Aaron Karper (30):
>   Remove instance disk template from Query.Instance
>   Remove disk template from haskell
>   Show all steps before failing the up/downgrade
>   Remove disk template from config in cfgupgrade
>   Determine enabled disk templates by looking at disks
>   Add helper to simulate disk template
>   Use disk based export summary
>   Add constant for disk template "mixed"
>   Add config getter for disk template
>   Make CalculateFileStorageDir params explicit
>   Wait for sync on create if any disk is DRBD
>   Ensure that ganeti.objects_unittest passes
>   Remove SetInstanceDiskTemplate from config
>   Introduce IDISK_TYPE
>   Remove instance disk template from error reporting
>   Remove disk_template option from IAReqInstanceAlloc
>   Remove iallocator dependency on disk_template
>   Decide cache value per disk for KVM hypervisor
>   Don't serialise fake disk_template
>   Use consensus disk template for hook env
>   Check modifications for LUInstanceSetParams correctly
>   Unify finding the correct file driver
>   Introduce temporary homodiskosity assumption
>   Only sync disks that are internally mirrored
>   Check that removed disks have enabled type
>   Remove default param 'disk_template' for creation
>   Use dev_type in moving of disks
>   Replace various checks in LUInstanceSetParams with per-disk
>   Ensure that disk template is given on creation of disk
>   Move tools/cfgupgrade to lib/tools/cfgupgrade.py
>
>  Makefile.am |   3 +-
>  lib/backend.py  |   3 +-
>  lib/cmdlib/instance_create.py   |   7 +-
>  lib/cmdlib/instance_set_params.py   | 182 ---
>  lib/cmdlib/instance_storage.py  |  57 +-
>  lib/cmdlib/instance_utils.py|  11 +-
>  lib/cmdlib/test.py  |   3 -
>  lib/config/__init__.py  |  36 +-
>  lib/hypervisor/hv_kvm/__init__.py   |  29 +-
>  lib/masterd/iallocator.py   |  24 +-
>  lib/masterd/instance.py |  31 +-
>  lib/objects.py  |   8 +-
>  lib/tools/cfgupgrade.py | 809 
> 
>  lib/utils/__init__.py   |  18 +
>  src/Ganeti/Constants.hs |  26 +-
>  src/Ganeti/Objects/Instance.hs  |   1 -
>  src/Ganeti/Query/Instance.hs|  42 +-
>  test/data/instance-disks.txt|   1 -
>  test/hs/Test/Ganeti/Objects.hs  |   4 -
>  test/hs/Test/Ganeti/Query/Instance.hs   |   2 +-
>  test/py/cfgupgrade_unittest.py  |  31 +-
>  test/py/cmdlib/instance_storage_unittest.py |   3 +-
>  test/py/cmdlib/test_unittest.py |   3 -
>  test/py/ganeti.config_unittest.py   |  14 -
>  test/py/ganeti.objects_unittest.py  |   2 +
>  test/py/ganeti.query_unittest.py|   4 +-
>  test/py/ganeti.utils_unittest.py|  16 +
>  tools/cfgupgrade| 751 +-
>  28 files changed, 1129 insertions(+), 992 deletions(-)
>  create mode 100644 lib/tools/cfgupgrade.py
>



Re: [PATCH master 09/21] cmdlib: Add checks for attach/detach

2014-11-20 Thread Constantinos Venetsanopoulos
On 11/19/14, 15:05 PM, Aaron Karper wrote:
> On Fri, Nov 14, 2014 at 11:37:43AM +0200, Constantinos Venetsanopoulos wrote:
>> also cc:ing the list
>>
>> On 11/14/14, 11:36 AM, Constantinos Venetsanopoulos wrote:
>>> Hello Aaron,
>>>
>>> On 11/10/14, 18:04 PM, 'Aaron Karper' via ganeti-devel wrote:
>>>> On Thu, Nov 06, 2014 at 12:30:43AM +0200, Alex Pyrgiotis wrote:
>>>>> Add checks for:
>>>>> * number of arguments,
>>>>> * instance's compatibility with disk template/nodes
>>>>>
>>>>> and update existing ones.
>>>>>
>>>>> Signed-off-by: Alex Pyrgiotis <apyr...@grnet.gr>
>>>>> diff --git a/lib/cmdlib/instance_set_params.py b/lib/cmdlib/instance_set_params.py
>>>>> index 37ece5b..a4cc576 100644
>>>>> --- a/lib/cmdlib/instance_set_params.py
>>>>> +++ b/lib/cmdlib/instance_set_params.py
>>>>> @@ -89,6 +89,30 @@ class LUInstanceSetParams(LogicalUnit):
>>>>>   HTYPE = constants.HTYPE_INSTANCE
>>>>>   REQ_BGL = False
>>>>>
>>>>> +  def GenericGetDiskInfo(self, **params):
>>>> Avoid taking all params and prefer `params.get('uuid', None)` with
>>>> checks for the values provided here. The linter warns for a reason.
>>>>
>>>>> +    """Find a disk object using the provided params.
>>>>> +
>>>>> +    Accept arguments as keywords and use the GetDiskInfo/GetDiskInfoByName
>>>>> +    config functions to retrieve the disk info based on these arguments.
>>>>> +
>>>>> +    In case of an error, raise the appropriate exceptions.
>>>>> +    """
>>>>> +    name = constants.IDISK_NAME
>>>>> +    if "uuid" in params:
>>>>> +      disk = self.cfg.GetDiskInfo(params["uuid"])
>>>>> +      if disk is None:
>>>>> +        raise errors.OpPrereqError("No disk was found with this UUID: %s" %
>>>>> +                                   params["uuid"], errors.ECODE_INVAL)
>>>>> +    elif name in params:
>>>>> +      disk = self.cfg.GetDiskInfoByName(params[name])
>>>>> +      if disk is None:
>>>>> +        raise errors.OpPrereqError("No disk was found with this %s: %s" %
>>>>> +                                   (name, params[name]), errors.ECODE_INVAL)
>>>>> +    else:
>>>>> +      raise errors.ProgrammerError("No disk UUID or name was given")
>>>>> +
>>>>> +    return disk
>>>>> +
>>>>>   @staticmethod
>>>>>   def _UpgradeDiskNicMods(kind, mods, verify_fn):
>>>>>     assert ht.TList(mods)
>>>>> @@ -99,14 +123,15 @@ class LUInstanceSetParams(LogicalUnit):
>>>>>
>>>>>       addremove = 0
>>>>>       for op, params in mods:
>>>>> -        if op in (constants.DDM_ADD, constants.DDM_REMOVE):
>>>>> +        if op in (constants.DDM_ADD, constants.DDM_ATTACH,
>>>>> +                  constants.DDM_REMOVE, constants.DDM_DETACH):
>>>>>           result.append((op, -1, params))
>>>>>           addremove += 1
>>>>>
>>>>>           if addremove > 1:
>>>>> -            raise errors.OpPrereqError("Only one %s add or remove operation is"
>>>>> -                                       " supported at a time" % kind,
>>>>> -                                       errors.ECODE_INVAL)
>>>>> +            raise errors.OpPrereqError("Only one %s add/attach/remove/detach "
>>>>> +                                       "operation is supported at a time" %
>>>>> +                                       kind, errors.ECODE_INVAL)
>>>>>         else:
>>>>>           result.append((constants.DDM_MODIFY, op, params))
>>>>>
>>>>> @@ -120,21 +145,33 @@ class LUInstanceSetParams(LogicalUnit):
>>>>>   def _CheckMods(kind, mods, key_types, item_fn):
>>>>>     """Ensures requested disk/NIC modifications are valid.
>>>>>
>>>>> +No

Re: [PATCH master 00/21] Add attach/detach functionality for disks

2014-11-20 Thread Constantinos Venetsanopoulos
Hello Aaron,

On 11/19/14, 15:11 PM, Aaron Karper wrote:
> Hi Constantinos,
>
> Thanks for replying for Alex and helping with this :) Actually shortly
> after writing that, I got push rights, so there actually will be no need
> for a second review.

That sounds good. Then, Alex will resend the patch series incorporating
your comments once he gets back.

Thanks,
Constantinos

> Cheerio,
> Aaron
>
>
> On Mon, Nov 17, 2014 at 08:07:52PM +0200, Constantinos Venetsanopoulos wrote:
>> Hello Aaron,
>>
>> I read all your comments and most of them seem reasonable.
>> I guess the only one that I could argue against is the one on
>> patch 9 that I've already replied upon. This means that Alex
>> will be happy to do all the changes you propose. The only
>> problem is that Alex will be out of the office until the end of the
>> month and won't be able to reply to your comments (that's
>> why I'm doing it :) ). So would it be possible to have the second
>> reviewer review the patch now, so that Alex can incorporate the
>> comments of both once he gets back? I think we'll save a lot of
>> time this way.
>>
>> Thanks a lot,
>> Constantinos
>>
>> On 11/10/14, 17:48 PM, 'Aaron Karper' via ganeti-devel wrote:
>>> On Thu, Nov 06, 2014 at 12:30:34AM +0200, Alex Pyrgiotis wrote:
>>>> Hi everybody,
>>>>
>>>> This patch series implements the necessary logic for the attach/detach
>>>> operation on disks. This functionality will allow users to remove a disk
>>>> from an instance and attach it later on to another one, or keep it for
>>>> forensic reasons. The latter part means that the Ganeti configuration
>>>> now supports having detached disks.
>>>>
>>>> In order to document this new functionality, we have extended the man
>>>> page for gnt-instance, as well as the design doc for disks. Also, to
>>>> make sure that we don't introduce any new bugs, we have added many new
>>>> unit tests and extended older ones. Moreover, we have extended the
>>>> burnin tool in order to live-test this new functionality on an existing
>>>> cluster.
>>>>
>>>> Finally, a related patch about file-based instances is in the works and
>>>> will be sent in a couple of days.
>>>>
>>>> Cheers,
>>>> Alex
>>>>
>>>> Alex Pyrgiotis (19):
>>>>   config: Use disk name to get its info
>>>>   config: Fix docstring of _UnlockedGetInstanceNodes
>>>>   cmdlib: Turn index code to functions
>>>>   cmdlib: Return RPC result in AssembleInstanceDisks
>>>>   config: Create wrappers for attach/detach ops
>>>>   constants: Add attach/detach and UUID constant
>>>>   cmdlib: Add checks for attach/detach
>>>>   cmdlib: Add core attach/detach logic
>>>>   config: Remove checks for detached disks
>>>>   burnin: Test if attach/detach works properly
>>>>   tests.config: Add test for GetDiskInfoByName
>>>>   tests.cmdlib: Test index operations
>>>>   tests.opcodes/client: Add test for constants
>>>>   tests.config: Test attach/detach wrappers
>>>>   tests.cmdlib: Check if attach/detach checks work
>>>>   tests.cmdlib: Extend tests for _ApplyContainerMods
>>>>   tests.cluster: Add test for VerifyConfig
>>>>   doc: Extend the design document for disks
>>>>   man: Update the gnt-instance man page
>>>>
>>>> Ilias Tsitsimpis (2):
>>>>   Implement GetAllDisksInfo in config
>>>>   AllocateDRBDMinor per disk, not per instance
>>>>
>>>>  doc/design-disks.rst   |   49 ++-
>>>>  lib/client/gnt_instance.py |   54 +++-
>>>>  lib/cmdlib/cluster.py  |   35 ++-
>>>>  lib/cmdlib/instance.py |   10 +-
>>>>  lib/cmdlib/instance_create.py  |3 +-
>>>>  lib/cmdlib/instance_migration.py   |4 +-
>>>>  lib/cmdlib/instance_set_params.py  |  185 +--
>>>>  lib/cmdlib/instance_storage.py |   47 +--
>>>>  lib/cmdlib/instance_utils.py   |   97 --
>>>>  lib/config.py  |  138 ++--
>>>>  lib/tools/burnin.py|  122 ++--
>>>>  man/gnt-instance.rst   |   25 +-
>>>>  src/Ganeti/Co

Re: [PATCH master 00/21] Add attach/detach functionality for disks

2014-11-17 Thread Constantinos Venetsanopoulos
Hello Aaron,

I read all your comments and most of them seem reasonable.
I guess the only one that I could argue against is the one on
patch 9 that I've already replied upon. This means that Alex
will be happy to do all the changes you propose. The only
problem is that Alex will be out of the office until the end of the
month and won't be able to reply to your comments (that's
why I'm doing it :) ). So would it be possible to have the second
reviewer review the patch now, so that Alex can incorporate the
comments of both once he gets back? I think we'll save a lot of
time this way.

Thanks a lot,
Constantinos

On 11/10/14, 17:48 PM, 'Aaron Karper' via ganeti-devel wrote:
> On Thu, Nov 06, 2014 at 12:30:34AM +0200, Alex Pyrgiotis wrote:
> > Hi everybody,
> >
> > This patch series implements the necessary logic for the attach/detach
> > operation on disks. This functionality will allow users to remove a disk
> > from an instance and attach it later on to another one, or keep it for
> > forensic reasons. The latter part means that the Ganeti configuration
> > now supports having detached disks.
> >
> > In order to document this new functionality, we have extended the man
> > page for gnt-instance, as well as the design doc for disks. Also, to
> > make sure that we don't introduce any new bugs, we have added many new
> > unit tests and extended older ones. Moreover, we have extended the
> > burnin tool in order to live-test this new functionality on an existing
> > cluster.
> >
> > Finally, a related patch about file-based instances is in the works and
> > will be sent in a couple of days.
> >
> > Cheers,
> > Alex
> >
> > Alex Pyrgiotis (19):
> >   config: Use disk name to get its info
> >   config: Fix docstring of _UnlockedGetInstanceNodes
> >   cmdlib: Turn index code to functions
> >   cmdlib: Return RPC result in AssembleInstanceDisks
> >   config: Create wrappers for attach/detach ops
> >   constants: Add attach/detach and UUID constant
> >   cmdlib: Add checks for attach/detach
> >   cmdlib: Add core attach/detach logic
> >   config: Remove checks for detached disks
> >   burnin: Test if attach/detach works properly
> >   tests.config: Add test for GetDiskInfoByName
> >   tests.cmdlib: Test index operations
> >   tests.opcodes/client: Add test for constants
> >   tests.config: Test attach/detach wrappers
> >   tests.cmdlib: Check if attach/detach checks work
> >   tests.cmdlib: Extend tests for _ApplyContainerMods
> >   tests.cluster: Add test for VerifyConfig
> >   doc: Extend the design document for disks
> >   man: Update the gnt-instance man page
> >
> > Ilias Tsitsimpis (2):
> >   Implement GetAllDisksInfo in config
> >   AllocateDRBDMinor per disk, not per instance
> >
> >  doc/design-disks.rst   |   49 ++-
> >  lib/client/gnt_instance.py |   54 +++-
> >  lib/cmdlib/cluster.py  |   35 ++-
> >  lib/cmdlib/instance.py |   10 +-
> >  lib/cmdlib/instance_create.py  |3 +-
> >  lib/cmdlib/instance_migration.py   |4 +-
> >  lib/cmdlib/instance_set_params.py  |  185 +--
> >  lib/cmdlib/instance_storage.py |   47 +--
> >  lib/cmdlib/instance_utils.py   |   97 --
> >  lib/config.py  |  138 ++--
> >  lib/tools/burnin.py|  122 ++--
> >  man/gnt-instance.rst   |   25 +-
> >  src/Ganeti/Config.hs   |1 +
> >  src/Ganeti/Constants.hs|8 +-
> >  src/Ganeti/Types.hs|4 +
> >  src/Ganeti/WConfd/Core.hs  |   14 +-
> >  src/Ganeti/WConfd/TempRes.hs   |   40 +--
> >  test/py/cmdlib/cluster_unittest.py |6 +
> >  test/py/cmdlib/instance_unittest.py|  399 
> >  test/py/cmdlib/testsupport/config_mock.py  |   18 +-
> >  test/py/ganeti.client.gnt_instance_unittest.py |   20 +-
> >  test/py/ganeti.config_unittest.py  |   96 ++
> >  test/py/ganeti.opcodes_unittest.py |3 +
> >  23 files changed, 1097 insertions(+), 281 deletions(-)
> >
> > --
> > 1.7.10.4
> >
>
> Hi Alex,
> Thanks for adding this feature. I will be doing a first pass of review,
> but since I don't have push rights, there will be a second reviewer.
>
>
> --
> Google Germany GmbH
> Dienerstr. 12
> 80331 München
>
> Registergericht und -nummer: Hamburg, HRB 86891
> Sitz der Gesellschaft: Hamburg
> Geschäftsführer: Graham Law, Christine Elizabeth Flores



Re: [PATCH master 09/21] cmdlib: Add checks for attach/detach

2014-11-14 Thread Constantinos Venetsanopoulos
also cc:ing the list

On 11/14/14, 11:36 AM, Constantinos Venetsanopoulos wrote:
> Hello Aaron,
>
> On 11/10/14, 18:04 PM, 'Aaron Karper' via ganeti-devel wrote:
>> On Thu, Nov 06, 2014 at 12:30:43AM +0200, Alex Pyrgiotis wrote:
>>> Add checks for:
>>> * number of arguments,
>>> * instance's compatibility with disk template/nodes
>>>
>>> and update existing ones.
>>>
>>> Signed-off-by: Alex Pyrgiotis <apyr...@grnet.gr>
>>> diff --git a/lib/cmdlib/instance_set_params.py b/lib/cmdlib/instance_set_params.py
>>> index 37ece5b..a4cc576 100644
>>> --- a/lib/cmdlib/instance_set_params.py
>>> +++ b/lib/cmdlib/instance_set_params.py
>>> @@ -89,6 +89,30 @@ class LUInstanceSetParams(LogicalUnit):
>>>   HTYPE = constants.HTYPE_INSTANCE
>>>   REQ_BGL = False
>>>
>>> +  def GenericGetDiskInfo(self, **params):
>> Avoid taking all params and prefer `params.get('uuid', None)` with
>> checks for the values provided here. The linter warns for a reason.
>>
>>> +    """Find a disk object using the provided params.
>>> +
>>> +    Accept arguments as keywords and use the GetDiskInfo/GetDiskInfoByName
>>> +    config functions to retrieve the disk info based on these arguments.
>>> +
>>> +    In case of an error, raise the appropriate exceptions.
>>> +    """
>>> +    name = constants.IDISK_NAME
>>> +    if "uuid" in params:
>>> +      disk = self.cfg.GetDiskInfo(params["uuid"])
>>> +      if disk is None:
>>> +        raise errors.OpPrereqError("No disk was found with this UUID: %s" %
>>> +                                   params["uuid"], errors.ECODE_INVAL)
>>> +    elif name in params:
>>> +      disk = self.cfg.GetDiskInfoByName(params[name])
>>> +      if disk is None:
>>> +        raise errors.OpPrereqError("No disk was found with this %s: %s" %
>>> +                                   (name, params[name]), errors.ECODE_INVAL)
>>> +    else:
>>> +      raise errors.ProgrammerError("No disk UUID or name was given")
>>> +
>>> +    return disk
>>> +
>>>   @staticmethod
>>>   def _UpgradeDiskNicMods(kind, mods, verify_fn):
>>>     assert ht.TList(mods)
>>> @@ -99,14 +123,15 @@ class LUInstanceSetParams(LogicalUnit):
>>>
>>>       addremove = 0
>>>       for op, params in mods:
>>> -        if op in (constants.DDM_ADD, constants.DDM_REMOVE):
>>> +        if op in (constants.DDM_ADD, constants.DDM_ATTACH,
>>> +                  constants.DDM_REMOVE, constants.DDM_DETACH):
>>>           result.append((op, -1, params))
>>>           addremove += 1
>>>
>>>           if addremove > 1:
>>> -            raise errors.OpPrereqError("Only one %s add or remove operation is"
>>> -                                       " supported at a time" % kind,
>>> -                                       errors.ECODE_INVAL)
>>> +            raise errors.OpPrereqError("Only one %s add/attach/remove/detach "
>>> +                                       "operation is supported at a time" %
>>> +                                       kind, errors.ECODE_INVAL)
>>>         else:
>>>           result.append((constants.DDM_MODIFY, op, params))
>>>
>>> @@ -120,21 +145,33 @@ class LUInstanceSetParams(LogicalUnit):
>>>   def _CheckMods(kind, mods, key_types, item_fn):
>>>     """Ensures requested disk/NIC modifications are valid.
>>>
>>> +    Note that the 'attach' action needs a way to refer to the UUID of the
>>> +    disk, since the disk name is not unique cluster-wide. However, the UUID
>>> +    of the disk is not settable but rather generated by Ganeti automatically,
>>> +    therefore it cannot be passed as an IDISK parameter. For this reason,
>>> +    this function will override the checks to accept uuid parameters solely
>>> +    for the attach action.
>>>     """
>>> +    # Create a key_types copy with the 'uuid' as a valid key type.
>>> +    key_types_attach = key_types.copy()
>>> +    key_types_attach['uuid'] = 'string'
>>> +
>> Why not add the UUID as 
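For illustration, a minimal sketch of the explicit-parameter style suggested
in the review comment about **params further up; a hypothetical rewrite in
the same module context (self.cfg, errors), not the code that was eventually
merged:

  # Hypothetical variant of GenericGetDiskInfo without **params: the two
  # possible lookup keys become explicit keyword arguments.
  def GenericGetDiskInfo(self, uuid=None, name=None):
    """Find a disk object by UUID or by name, raising on failure."""
    if uuid is not None:
      disk = self.cfg.GetDiskInfo(uuid)
      if disk is None:
        raise errors.OpPrereqError("No disk was found with this UUID: %s" %
                                   uuid, errors.ECODE_INVAL)
    elif name is not None:
      disk = self.cfg.GetDiskInfoByName(name)
      if disk is None:
        raise errors.OpPrereqError("No disk was found with this name: %s" %
                                   name, errors.ECODE_INVAL)
    else:
      raise errors.ProgrammerError("No disk UUID or name was given")
    return disk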

Re: [PATCH stable-2.10] Clarify default setting of 'metavg'

2014-04-22 Thread Constantinos Venetsanopoulos
On 04/22/2014 02:37 PM, 'Helga Velroyen' via ganeti-devel wrote:

This fixes issue 810, suggesting to clarify where the
default for 'metavg' comes from.

Signed-off-by: Helga Velroyen 
---
  man/gnt-instance.rst | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/man/gnt-instance.rst b/man/gnt-instance.rst
index 0174a65..ba12b62 100644
--- a/man/gnt-instance.rst
+++ b/man/gnt-instance.rst
@@ -74,7 +74,9 @@ vg
  
  metavg

 This options specifies a different VG for the metadata device. This
-   works only for DRBD devices
+   works only for DRBD devices. If not specified, the default megavg


s/megavg/metavg

Thanks,
Constantinos


+   of the node-group (possibly inherited from the cluster-wide settings)
+   will be used.
  
  When creating ExtStorage disks, also arbitrary parameters can be passed,

  to the ExtStorage provider. Those parameters are passed as additional




Re: [PATCH master 00/15] OS image in instance add and reinstall

2014-03-11 Thread Constantinos Venetsanopoulos

On 03/11/2014 12:15 PM, Jose A. Lopes wrote:

On Mar 10 13:33, Constantinos Venetsanopoulos wrote:

On 03/10/2014 12:47 PM, Jose A. Lopes wrote:

On Mar 10 12:29, Constantinos Venetsanopoulos wrote:

On 03/07/2014 07:15 PM, Jose A. Lopes wrote:

On Mar 07 17:30, Constantinos Venetsanopoulos wrote:

Hello,

On 03/07/2014 03:42 PM, Jose A. Lopes wrote:


This patch series allows instances to be installed with OS images.
Instances are installed via 'gnt-instance create' or 'gnt-instance
reinstall' and a new OS parameter, 'os-image', is added to
'--os-parameters' which allows images to be specified, either through
an absolute file path or a URL.

Quoting from the design doc:
"Currently, it is already possible to specify an installation medium,
  such as, a cdrom, but not a disk image. Therefore, a new parameter
  ``--os-image`` will be used to specify the location of a disk image
  which will be dumped to the instance's first disk before the instance
  is started."

IMHO, it makes more sense to have a new option --os-image as stated
in the design doc, rather than make os-image an OS parameter. OS
parameters are coupled with OS scripts, verified by them and can
have arbitrary names depending on the definition. I think 'imaging'
an instance's disk should be a completely different and independent
mechanism from OS scripts, switched on and off by a dedicated option.

Is there something that I'm missing, that led to this approach?

Also, won't it be a bit strange to pass the --os-parameters option
just to specify os-name in a case where we don't want to run OS
scripts at all (thus not passing --os-type)?

Having the os-image in OS parameters has several advantages.

The OS image is not completely independent from the OS scripts.  As
per the design doc, it is possible to specify an OS image and run the
OS scripts simultaneously.  The scripts can know if an OS image
parameter was specified simply by checking the OS parameters which
they already receive.  And this is extremely important because the
scripts must know whether the disk is already imaged or not, given
that a non-imaged disk will have to be formatted, installed, etc.

So, this keeps the semantics of the OS parameters the same, as in they
are the parameters passed to the scripts.  It also makes this
parameter optional automatically, given that the OS parameters are
also optional.  Making it a class slot requires more logic to keep the
parameter optional and, as an example, you can look at how the "os"
attribute of the class "Instance" was changed to become optional and
the result is not as good and reliable.

Even if you don't use the scripts, there is no harm done.  And, you
get to reuse a lot of code from Ganeti that already handles the OS
parameters everywhere, which helps keep the codebase smaller.  That's
also how I see the OS parameters dict, something that can grow as OS
features grow.

My feeling also is that eventually the OS scripts will become less
important and the images more, and the OS parameters will play a
slightly different role, namely, to keep OS related stuff together in
a way similar to the hypervisor parameters, and the other param dicts.

Another OS parameter that will eventually be added is the
personalization package, which follows the same reasoning as the OS
image parameter: it is optional and it is passed to scripts.  Given
that we already have the infrastructure in place to handle OS
parameters, we can just add a new key to the dict and it gets
automatically handled by Ganeti.
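For illustration, a rough sketch of how an OS 'create' script could branch on
such a parameter. It assumes the standard ganeti-os-interface conventions
(OS parameters exported as OSP_* environment variables, the first disk as
DISK_0_PATH); the exact 'OSP_OS_IMAGE' name is only an assumption here:

#!/usr/bin/env python
# Hypothetical OS 'create' script fragment: decide between a from-scratch
# install and pure customization based on whether an OS image was given.
import os
import sys


def main():
  disk = os.environ["DISK_0_PATH"]
  os_image = os.environ.get("OSP_OS_IMAGE", "")

  if os_image:
    # The first disk already carries (or will carry) the dumped image:
    # skip formatting/installation and only run customization steps.
    sys.stdout.write("customizing already-imaged disk %s\n" % disk)
  else:
    # Empty disk: partition, format and install as before.
    sys.stdout.write("installing from scratch on %s\n" % disk)
  return 0


if __name__ == "__main__":
  sys.exit(main())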

Indeed, you are right that you need a way to have scripts
know whether the disk is empty or not. You have a point
there. OS params seem like a good way to pass that through,
although until now OS params are defined by the definition
itself, not by Ganeti and should exist in the definition's
'parameters.list' file. How will that get adjusted?

Will every definition from now on need to have a mandatory
defined parameter, os-name, and in the future the
personalization parameter too?

The only use case that seems to be affected by this is if you specify
an OS image and the OS scripts simultaneously.  As per the design doc,
the image will be first dumped to the instance's first disk and the OS
scripts will run inside the virtualization environment.  Also, this
means that the OS scripts will have to fetch the OS parameters from
the metadata service and, as a result, Ganeti will not verify whether
the OS params are valid according to the OS definition's
'parameters.list'.

Ack.


You can check this in:
   OS reinstall design doc > Installation procedure > Paragraph 4

Also, I think in the future the OS scripts will be marked as untrusted
because giving root access to the scripts on the host is not a good
idea. That means that OS scripts will always run inside the
virtualization environment.  Therefore, th

Re: [PATCH master 00/15] OS image in instance add and reinstall

2014-03-10 Thread Constantinos Venetsanopoulos

On 03/10/2014 12:47 PM, Jose A. Lopes wrote:

On Mar 10 12:29, Constantinos Venetsanopoulos wrote:

On 03/07/2014 07:15 PM, Jose A. Lopes wrote:

On Mar 07 17:30, Constantinos Venetsanopoulos wrote:

Hello,

On 03/07/2014 03:42 PM, Jose A. Lopes wrote:


This patch series allows instances to be installed with OS images.
Instances are installed via 'gnt-instance create' or 'gnt-instance
reinstall' and a new OS parameter, 'os-image', is added to
'--os-parameters' which allows images to be specified, either through
an absolute file path or a URL.

Quoting from the design doc:
"Currently, it is already possible to specify an installation medium,
  such as, a cdrom, but not a disk image. Therefore, a new parameter
  ``--os-image`` will be used to specify the location of a disk image
  which will be dumped to the instance's first disk before the instance
  is started."

IMHO, it makes more sense to have a new option --os-image as stated
in the design doc, rather than make os-image an OS parameter. OS
parameters are coupled with OS scripts, verified by them and can
have arbitrary names depending on the definition. I think 'imaging'
an instance's disk should be a completely different and independent
mechanism from OS scripts, switched on and off by a dedicated option.

Is there something that I'm missing, that led to this approach?

Also, won't it be a bit strange to pass the --os-parameters option
just to specify os-name in a case where we don't want to run OS
scripts at all (thus not passing --os-type)?

Having the os-image in OS parameters has several advantages.

The OS image is not completely independent from the OS scripts.  As
per the design doc, it is possible to specify an OS image and run the
OS scripts simultaneously.  The scripts can know if an OS image
parameter was specified simply by checking the OS parameters which
they already receive.  And this is extremely important because the
scripts must know whether the disk is already imaged or not, given
that a non-imaged disk will have to be formatted, installed, etc.

So, this keeps the semantics of the OS parameters the same, as in they
are the parameters passed to the scripts.  It also makes this
parameter optional automatically, given that the OS parameters are
also optional.  Making it a class slot requires more logic to keep the
parameter optional and, as an example, you can look at how the "os"
attribute of the class "Instance" was changed to become optional and
the result is not as good and reliable.

Even if you don't use the scripts, there is no harm done.  And, you
get to reuse a lot of code from Ganeti that already handles the OS
parameters everywhere, which helps keep the codebase smaller.  That's
also how I see the OS parameters dict, something that can grow as OS
features grow.

My feeling also is that eventually the OS scripts will become less
important and the images more, and the OS parameters will play a
slightly different role, namely, to keep OS related stuff together in
a way similar to the hypervisor parameters, and the other param dicts.

Another OS parameter that will eventually be added is the
personalization package, which follows the same reasoning as the OS
image parameter: it is optional and it is passed to scripts.  Given
that we already have the infrastructure in place to handle OS
parameters, we can just add a new key to the dict and it gets
automatically handled by Ganeti.

Indeed, you are right that you need a way to have scripts
know whether the disk is empty or not. You have a point
there. OS params seem like a good way to pass that through,
although until now OS params are defined by the definition
itself, not by Ganeti and should exist in the definition's
'parameters.list' file. How will that get adjusted?

Will every definition from now on need to have a mandatory
defined parameter, os-name, and in the future the
personalization parameter too?

The only use case that seems to be affected by this is if you specify
an OS image and the OS scripts simultaneously.  As per the design doc,
the image will be first dumped to the instance's first disk and the OS
scripts will run inside the virtualization environment.  Also, this
means that the OS scripts will have to fetch the OS parameters from
the metadata service and, as a result, Ganeti will not verify whether
the OS params are valid according to the OS definition's
'parameters.list'.


Ack.


You can check this in:
   OS reinstall design doc > Installation procedure > Paragraph 4

Also, I think in the future the OS scripts will be marked as untrusted
because giving root access to the scripts on the host is not a good
idea. That means that OS scripts will always run inside the
virtualization environment.  Therefore, the 'parameters.list' will
become somewhat useless.


OK, that makes sense.


Also note th

Re: [PATCH master 00/15] OS image in instance add and reinstall

2014-03-10 Thread Constantinos Venetsanopoulos

On 03/07/2014 07:15 PM, Jose A. Lopes wrote:

On Mar 07 17:30, Constantinos Venetsanopoulos wrote:

Hello,

On 03/07/2014 03:42 PM, Jose A. Lopes wrote:


This patch series allows instances to be installed with OS images.
Instances are installed via 'gnt-instance create' or 'gnt-instance
reinstall' and a new OS parameter, 'os-image', is added to
'--os-parameters' which allows images to be specified, either through
an absolute file path or a URL.

Quoting from the design doc:
"Currently, it is already possible to specify an installation medium,
  such as, a cdrom, but not a disk image. Therefore, a new parameter
  ``--os-image`` will be used to specify the location of a disk image
  which will be dumped to the instance's first disk before the instance
  is started."

IMHO, it makes more sense to have a new option --os-image as stated
in the design doc, rather than make os-image an OS parameter. OS
parameters are coupled with OS scripts, verified by them and can
have arbitrary names depending on the definition. I think 'imaging'
an instance's disk should be a completely different and independent
mechanism from OS scripts, switched on and off by a dedicated option.

Is there something that I'm missing, that led to this approach?

Also, won't it be a bit strange to pass the --os-parameters option
just to specify os-name in a case where we don't want to run OS
scripts at all (thus not passing --os-type)?

Having the os-image in OS parameters has several advantages.

The OS image is not completely independent from the OS scripts.  As
per the design doc, it is possible to specify an OS image and run the
OS scripts simultaneously.  The scripts can know if an OS image
parameter was specified simply by checking the OS parameters which
they already receive.  And this is extremely important because the
scripts must know whether the disk is already imaged or not, given
that a non-imaged disk will have to be formatted, installed, etc.

So, this keeps the semantics of the OS parameters the same, as in they
are the parameters passed to the scripts.  It also makes this
parameter optional automatically, given that the OS parameters are
also optional.  Making it a class slot requires more logic to keep the
parameter optional and, as an example, you can look at how the "os"
attribute of the class "Instance" was changed to become optional and
the result is not as good and reliable.

Even if you don't use the scripts, there is no harm done.  And, you
get to reuse a lot of code from Ganeti that already handles the OS
parameters everywhere, which helps keep the codebase smaller.  That's
also how I see the OS parameters dict, something that can grow as OS
features grow.

My feeling also is that eventually the OS scripts will become less
important and the images more, and the OS parameters will play a
slightly different role, namely, to keep OS related stuff together in
a way similar to the hypervisor parameters, and the other param dicts.

Another OS parameter that will eventually be added is the
personalization package, which follows the same reasoning as the OS
image parameter: it is optional and it is passed to scripts.  Given
that we already have the infrastructure in place to handle OS
parameters, we can just add a new key to the dict and it gets
automatically handled by Ganeti.


Indeed, you are right that you need a way to have scripts
know whether the disk is empty or not. You have a point
there. OS params seem like a good way to pass that through,
although until now OS params are defined by the definition
itself, not by Ganeti and should exist in the definition's
'parameters.list' file. How will that get adjusted?

Will every definition from now on need to have a mandatory
defined parameter, os-name, and in the future the
personalization parameter too?

Also note that in the near future there will be the case where:

 * You won't be fetching and extracting an image on disk
 * You don't want the scripts to fetch anything either
 * You want the scripts to do the customization

That will happen when you have disks that get created from
an image via cloning (by a storage technology that supports it),
so the data will be already on the disk. Then you need to
somehow tell the scripts that the disk is loaded with data, so
that they can customize it, but you won't have an actual URL
or file path to pass.

That's why I was wondering if it would be better to have
an explicit OS interface parameter (e.g.: like INSTANCE_NAME,
and not an os-param) that states whether the disk is empty
or not and this would be independent from the os-name.

What do you think?


Having said that, we could also add an '--os-image' as an alias to
'--os-parameter os-image='.  The reason it was not yet done is
that I'm not entirely convinced it won't be confusi

Re: [PATCH master 00/15] OS image in instance add and reinstall

2014-03-07 Thread Constantinos Venetsanopoulos

Hello,

On 03/07/2014 03:42 PM, Jose A. Lopes wrote:


This patch series allows instances to be installed with OS images.
Instances are installed via 'gnt-instance create' or 'gnt-instance
reinstall' and a new OS parameter, 'os-image', is added to
'--os-parameters' which allows images to be specified, either through
an absolute file path or a URL.


Quoting from the design doc:
"Currently, it is already possible to specify an installation medium,
 such as, a cdrom, but not a disk image. Therefore, a new parameter
 ``--os-image`` will be used to specify the location of a disk image
 which will be dumped to the instance's first disk before the instance
 is started."

IMHO, it makes more sense to have a new option --os-image as stated
in the design doc, rather than make os-image an OS parameter. OS
parameters are coupled with OS scripts, verified by them and can
have arbitrary names depending on the definition. I think 'imaging'
an instance's disk should be a completely different and independent
mechanism from OS scripts, switched on and off by a dedicated option.

Is there something that I'm missing, that led to this approach?

Also, won't it be a bit strange to pass the --os-parameters option
just to specify os-name in a case where we don't want to run OS
scripts at all (thus not passing --os-type)?

I think that the design doc's approach seems a better way to go.



If the image is an absolute file
path, the file will be copied to the respective node, whereas, if the
image is a URL, then the node will download this URL directly to the
instance's first disk.


Don't have a strong opinion on this, but until now Ganeti used
to make the assumption that all needed files should pre-exist
under the directories where it tries to find them, and that the admin
is responsible for putting them there (see OS installation scripts,
RADOS conf files, ExtStorage provider scripts, etc). I had the
impression that it was against its design to move files around,
let alone big ones such as ~13GB Windows images.

Would it make sense to have the same approach here, i.e. have
the admin replicate the files, or set up a shared dir, and just
return an error if the file does not exist?

Or will the Ganeti cache mechanism solve that too, besides
caching files fetched from URLs?

Thanks,
Constantinos



This patch series starts with some general fixes:
* reuse method that parses OS variants
* fix some docstrings
* fix export order according to definition order

OS image related patches
* add helper functions to get and put 'os-image' into OS params
* generalize 'WipeDevice' to 'DumpDevice' which takes a source parameter
* add helper function that downloads a file and dumps it
* add helper function that dispatches to 'DumpDevice' or 
'_DownloadAndDumpDevice'
* add RPC 'blockdev_image' to image devices
* add helper function to remove instance if disks are degraded
* add helper function to image disks while ensuring that disks are paused
* add function to check if the OS image parameter is valid
* instance create with OS image
* add OS images to 'LUInstanceCreate' and make OS scripts optional
* instance reinstall with OS images
* extend QA with OS image tests

Jose A. Lopes (15):
   Reuse method to parse name from OS 'name+variant' string
   Fix docstrings
   Fix export order according to definition order
   Helper functions to get and update OS image from OSParams
   Generalize 'WipeDevice' to 'DumpDevice'
   Helper function that downloads an image and dumps it to disk
   Helper function to image a device by downloading or dumping
   RPC 'blockdev_image' to image devices
   Function to remove instance if disks are degraded
   Function to image disks while ensuring that disks are paused
   Function to check if the OS image parameter is valid
   Instance create with OS image
   OS images in 'LUInstanceCreate' and OS scripts optional
   Instance reinstall with OS images
   Extend QA with OS image

  lib/backend.py  | 138 ++--
  lib/cli.py  |   2 +-
  lib/cmdlib/common.py|  51 +
  lib/cmdlib/instance.py  | 133 ++
  lib/cmdlib/instance_operation.py| 108 ++--
  lib/cmdlib/instance_storage.py  |  62 +++-
  lib/objects.py  |  40 ++-
  lib/rpc_defs.py |   8 ++-
  lib/server/noded.py |  11 ++-
  qa/qa_instance.py   |  19 +
  src/Ganeti/Constants.hs |   5 +-
  test/py/cmdlib/instance_unittest.py |   2 +-
  12 files changed, 486 insertions(+), 93 deletions(-)





Re: Ganeti Cache

2014-03-06 Thread Constantinos Venetsanopoulos

On 03/04/2014 11:36 AM, Jose A. Lopes wrote:

On Feb 27 18:06, Constantinos Venetsanopoulos wrote:

On 02/27/2014 05:38 PM, Jose A. Lopes wrote:

On Feb 27 17:31, Constantinos Venetsanopoulos wrote:

Hi,

On 02/27/2014 04:47 PM, Jose A. Lopes wrote:

Hi,

riba suggested to share the cache design on the mailing list because
others might have comments.  This cache is used for the OS installs in
the following use case:

   Suppose you want to install several vms with the same image which is
   stored somewhere on a remote repository but you only want to
   download the image once (per node).

While reading this keep in mind that the cache is made simple on
purpose.  We could add lots of features but I think currently we do
not have other use cases that justify making it more complicated.
Also, if there is a general interest in the cache we can also make a
design doc.

The cache contains an index that is persisted to disk and operations
that modify the index must first acquire a filesystem lock on the
index file because the node daemon runs jobs that do not share memory.
The index file is read often from disk but it is only written back if
it has been modified (if it is dirty).  Currently the cache index and
files reside in '/var/cache/ganeti'.  riba suggested using '/srv'
instead because there is more space there.  It's also a possibility.

Downloads do not hold a lock on the index but do use an auxiliary lock
in order to prevent and synchronize parallel downloads on the same
file.  File checksums (called fingerprints) are also calculated
without holding the index lock.  Currently, bad fingerprints simply
issue a warning on the log file and the user needs to check the log
file because this warning is not displayed on the CLI.

Each cached resource has a dtime (like ctime and mtime) but for death.
The dtime specifies the absolute time when a resource is expunged.
Once the dtime has expired, the next cache flush will delete the
resources.

The cache uses an ensure semantics which means resources are created
or modified using the same function.  This function simply ensures
that the specified state is in the cache after it has been executed.

When a resource is cached, a file path pointing to the file that is
being cached is returned.  This file path is calculated from the URL
of the resource and therefore the path is the same in all nodes.
Also, by returning a file path we can cache virtually any file.

On the command line, the following commands are used:

   gnt-cluster cache-modify ...
   gnt-cluster cache-list ...

It is also possible to have gnt-cache, if it becomes necessary.

Finally, riba suggested adding a space limit to the cache.  This way
the cache would eliminate resources to free up space if the limit was
reached.  This complicates things considerably as it becomes necessary
to have a policy on what to delete, etc.

Comments/feedback is highly appreciated.
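For illustration, a very small sketch of the mechanics described above: an
index persisted under /var/cache/ganeti, a filesystem lock around index
updates, a per-resource dtime, and a cached path derived from the URL. All
names and the on-disk layout are assumptions, not the actual implementation:

# Hypothetical sketch of the described cache index handling.
import fcntl
import hashlib
import json
import os
import time

CACHE_DIR = "/var/cache/ganeti"    # '/srv' was suggested as an alternative
INDEX_FILE = os.path.join(CACHE_DIR, "index.json")
LOCK_FILE = os.path.join(CACHE_DIR, "index.lock")


def CachedPath(url):
  """Derive the cached file path from the URL (the same on all nodes)."""
  return os.path.join(CACHE_DIR, hashlib.sha1(url.encode("utf-8")).hexdigest())


def EnsureResource(url, ttl):
  """'Ensure' semantics: create or refresh the index entry for url.

  Returns the file path under which the resource is (to be) cached.  The
  index is only modified while holding a filesystem lock, since node
  daemon jobs do not share memory.
  """
  lock = open(LOCK_FILE, "w")
  fcntl.flock(lock, fcntl.LOCK_EX)
  try:
    index = {}
    if os.path.exists(INDEX_FILE):
      index = json.load(open(INDEX_FILE))
    index[url] = {"path": CachedPath(url),
                  "dtime": time.time() + ttl}   # absolute expunge time
    json.dump(index, open(INDEX_FILE, "w"))
    return index[url]["path"]
  finally:
    fcntl.flock(lock, fcntl.LOCK_UN)
    lock.close()

A cache flush would take the same lock, walk the index, and delete every
entry (and its file) whose dtime lies in the past.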

Just 2 questions:

1. Will there be the option to disable the cache altogether?
2. If I understand correctly (from the OS design doc) the first
implementation of fetching an Image will include fetching (a)
from an http/ftp location or (b) a local file location. I
guess cache makes sense when fetching from (a). What happens
with (b)? If cache is enabled it will be used for both?

If you don't want the cache we could have a cluster option to
enable/disable the cache.

I think that this would be definitely a good option to have.


   Just to clarify, by saying the cache is not
used I just mean that the file is downloaded, the instance is created
and the file is immediately deleted afterwards.  Does this sound good?

Oh, so when you are fetching from an http/ftp location you are
always saving the image locally before you copy it to the
instance's disk? Even if the cache is disabled?

What happens if one doesn't want to allocate any space on
the node? Wouldn't it be better for the mechanism to directly
pass the image to the target disk, if the admin chooses so?
(e.g.: by doing curl  | dd 
Is 'dd' used to limit the maximum file size, given that curl cannot do
this reliably?


We do this to buffer curl's output and send it to the disk
in bigger chunks, by tuning dd's bs parameter. Also, in
general we find that using oflag=direct with dd helps with
not polluting the host page cache with streaming image data,
and has better performance.
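For illustration, the same "stream straight to the target disk in big
chunks" idea expressed as a Python sketch rather than a curl | dd pipeline
(hypothetical helper; dd's oflag=direct behaviour is not reproduced here):

# Hypothetical sketch: stream an image URL directly to a block device in
# large chunks, without staging the file on the node first.
import shutil
import urllib2

CHUNK = 4 * 1024 * 1024   # roughly the same idea as a large dd bs value


def StreamImageToDevice(url, device):
  src = urllib2.urlopen(url)
  try:
    with open(device, "wb") as dst:
      shutil.copyfileobj(src, dst, CHUNK)
  finally:
    src.close()

# Example (both arguments are placeholders):
#   StreamImageToDevice("http://example.com/debian.img", "/dev/sdb")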

Thanks,
Constantinos


IMHO, the caching mechanism should be separate and independent
from the fetching mechanism. The admin should have the option
to never store anything on the node if he/she chooses to go
with http fetch + no cache.

What do you think?

Thanks,
Constantinos


Thanks,
Constantinos


Cheers,
Jose





Re: Ganeti Cache

2014-02-27 Thread Constantinos Venetsanopoulos


"Jose A. Lopes"  wrote:
>On Feb 27 18:46, Constantinos Venetsanopoulos wrote:
>> On 02/27/2014 06:29 PM, Jose A. Lopes wrote:
>> >On Feb 27 18:28, Constantinos Venetsanopoulos wrote:
>> >>On 02/27/2014 06:12 PM, Jose A. Lopes wrote:
>> >>>On Feb 27 18:06, Constantinos Venetsanopoulos wrote:
>> >>>>On 02/27/2014 05:38 PM, Jose A. Lopes wrote:
>> >>>>>On Feb 27 17:31, Constantinos Venetsanopoulos wrote:
>> >>>>>>Hi,
>> >>>>>>
>> >>>>>>On 02/27/2014 04:47 PM, Jose A. Lopes wrote:
>> >>>>>>>Hi,
>> >>>>>>>
>> >>>>>>>riba suggested to share the cache design on the mailing list because
>> >>>>>>>others might have comments.  This cache is used for the OS installs in
>> >>>>>>>the following use case:
>> >>>>>>>
>> >>>>>>>   Suppose you want to install several vms with the same image which is
>> >>>>>>>   stored somewhere on a remote repository but you only want to
>> >>>>>>>   download the image once (per node).
>> >>>>>>>
>> >>>>>>>While reading this keep in mind that the cache is made simple on
>> >>>>>>>purpose.  We could add lots of features but I think currently we do
>> >>>>>>>not have other use cases that justify making it more complicated.
>> >>>>>>>Also, if there is a general interest in the cache we can also make a
>> >>>>>>>design doc.
>> >>>>>>>
>> >>>>>>>The cache contains an index that is persisted to disk and operations
>> >>>>>>>that modify the index must first acquire a filesystem lock on the
>> >>>>>>>index file because the node daemon runs jobs that do not share memory.
>> >>>>>>>The index file is read often from disk but it is only written back if
>> >>>>>>>it has been modified (if it is dirty).  Currently the cache index and
>> >>>>>>>files reside in '/var/cache/ganeti'.  riba suggested using '/srv'
>> >>>>>>>instead because there is more space there.  It's also a possibility.
>> >>>>>>>
>> >>>>>>>Downloads do not hold a lock on the index but do use an auxiliary lock
>> >>>>>>>in order to prevent and synchronize parallel downloads on the same
>> >>>>>>>file.  File checksums (called fingerprints) are also calculated
>> >>>>>>>without holding the index lock.  Currently, bad fingerprints simply
>> >>>>>>>issue a warning on the log file and the user needs to check the log
>> >>>>>>>file because this warning is not displayed on the CLI.
>> >>>>>>>
>> >>>>>>>Each cached resource has a dtime (like ctime and mtime) but for death.
>> >>>>>>>The dtime specifies the absolute time when a resource is expunged.
>> >>>>>>>Once the dtime has expired, the next cache flush will delete the
>> >>>>>>>resources.
>> >>>>>>>
>> >>>>>>>The cache uses an ensure semantics which means resources are created
>> >>>>>>>or modified using the same function.  This function simply ensures
>> >>>>>>>that the specified state is in the cache after it has been executed.
>> >>>>>>>
>> >>>>>>>When a resource is cached, a file path pointing to the file that is
>> >>>>>>>being cached is returned.  This file path is calculated from the URL
>> >>>>>>>of the resource and therefore the path is the same in all nodes.
>> >>>>>>>Also, by returning a file path we can cache virtually any file.
>> >>>>>>>
>> >>>>>>>On the command line, the following commands are used:
>> >>>>>>>
>> >>>>>>>   gnt-cluster cache-modify ...
>> >>>>>>>   gnt-cluster cache-list ...
>> >>>>>>>
>> >>>>>>>It is 

Re: Ganeti Cache

2014-02-27 Thread Constantinos Venetsanopoulos

On 02/27/2014 06:29 PM, Jose A. Lopes wrote:

On Feb 27 18:28, Constantinos Venetsanopoulos wrote:

On 02/27/2014 06:12 PM, Jose A. Lopes wrote:

On Feb 27 18:06, Constantinos Venetsanopoulos wrote:

On 02/27/2014 05:38 PM, Jose A. Lopes wrote:

On Feb 27 17:31, Constantinos Venetsanopoulos wrote:

Hi,

On 02/27/2014 04:47 PM, Jose A. Lopes wrote:

Hi,

riba suggested to share the cache design on the mailing list because
others might have comments.  This cache is used for the OS installs in
the following use case:

   Suppose you want to install several vms with the same image which is
   stored somewhere on a remote repository but you only want to
   download the image once (per node).

While reading this keep in mind that the cache is made simple on
purpose.  We could add lots of features but I think currently we do
not have other use cases that justify making it more complicated.
Also, if there is a general interest in the cache we can also make a
design doc.

The cache contains an index that is persisted to disk and operations
that modify the index must first acquire a filesystem lock on the
index file because the node daemon runs jobs that do not share memory.
The index file is read often from disk but it is only written back if
it has been modified (if it is dirty).  Currently the cache index and
files reside in '/var/cache/ganeti'.  riba suggested using '/srv'
instead because there is more space there.  It's also a possibility.

Downloads do not hold a lock on the index but do use an auxiliary lock
in order to prevent and synchronize parallel downloads on the same
file.  File checksums (called fingerprints) are also calculated
without holding the index lock.  Currently, bad fingerprints simply
issue a warning on the log file and the user needs to check the log
file because this warning is not displayed on the CLI.

Each cached resource has a dtime (like ctime and mtime) but for death.
The dtime specifies the absolute time when a resource is expunged.
Once the dtime has expired, the next cache flush will delete the
resources.

The cache uses an ensure semantics which means resources are created
or modified using the same function.  This function simply ensures
that the specified state is in the cache after it has been executed.

When a resource is cached, a file path pointing to the file that is
being cached is returned.  This file path is calculated from the URL
of the resource and therefore the path is the same in all nodes.
Also, by returning a file path we can cache virtually any file.

On the command line, the following commands are used:

   gnt-cluster cache-modify ...
   gnt-cluster cache-list ...

It is also possible to have gnt-cache, if it becomes necessary.

Finally, riba suggested adding a space limit to the cache.  This way
the cache would eliminate resources to free up space if the limit was
reached.  This complicates things considerably as it becomes necessary
to have a policy on what to delete, etc.

Comments/feedback is highly appreciated.

Just 2 questions:

1. Will there be the option to disable the cache altogether?
2. If I understand correctly (from the OS design doc) the first
implementation of fetching an Image will include fetching (a)
from an http/ftp location or (b) a local file location. I
guess cache makes sense when fetching from (a). What happens
with (b)? If cache is enabled it will be used for both?

If you don't want the cache we could have a cluster option to
enable/disable the cache.

I think that this would be definitely a good option to have.


   Just to clarify, by saying the cache is not
used I just mean that the file is downloaded, the instance is created
and the file is immediately deleted afterwards.  Does this sound good?

Oh, so when you are fetching from an http/ftp location you are
always saving the image locally before you copy it to the
instance's disk? Even if the cache is disabled?

What happens if one doesn't want to allocate any space on
the node? Wouldn't it be better for the mechanism to directly
pass the image to the target disk, if the admin chooses so?
(e.g.: by doing curl  | dd 

IMHO, the caching mechanism should be separate and independent
from the fetching mechanism. The admin should have the option
to never store anything on the node if he/she chooses to go
with http fetch + no cache.

What do you think?

Sounds good!

Cool. I think with these minor additions the caching mechanism
will be a really nice feature.

Since we referred to the fetching mechanism: are you guys
planning to make it extensible, so that more protocols/fetching
methods may be added in the future?

What did you have in mind?
  


Haven't thought about it much, but the first thing that comes
to my mind is:

Why not have a very simple interface for fetching, let's
say run a 'fetch' script, which will take a block device
(the instance's disk) and a location URI as its input
and take care of filling up that block device with image
data, then return
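For illustration, a minimal sketch of such an extensible fetch interface:
dispatch on the URI scheme and let each handler fill the target block device
(all names here are hypothetical):

# Hypothetical sketch of a pluggable fetch interface.
import shutil
import urllib2
import urlparse

_CHUNK = 4 * 1024 * 1024


def _FetchHttp(uri, device):
  """Stream an http/https resource onto the block device."""
  src = urllib2.urlopen(uri)
  try:
    with open(device, "wb") as dst:
      shutil.copyfileobj(src, dst, _CHUNK)
  finally:
    src.close()


def _FetchFile(uri, device):
  """Copy a local file (file:// URI or plain path) onto the block device."""
  path = urlparse.urlparse(uri).path or uri
  with open(path, "rb") as src:
    with open(device, "wb") as dst:
      shutil.copyfileobj(src, dst, _CHUNK)


_FETCHERS = {
  "http": _FetchHttp,
  "https": _FetchHttp,
  "file": _FetchFile,
  "": _FetchFile,
}


def FetchImage(uri, device):
  """Fill 'device' with the image found at 'uri', based on its scheme."""
  scheme = urlparse.urlparse(uri).scheme
  try:
    fetch = _FETCHERS[scheme]
  except KeyError:
    raise ValueError("No fetch method registered for scheme %r" % scheme)
  fetch(uri, device)

New protocols (ftp, rbd, ...) would then only need to register another
handler in the table.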

Re: Ganeti Cache

2014-02-27 Thread Constantinos Venetsanopoulos

On 02/27/2014 06:12 PM, Jose A. Lopes wrote:

On Feb 27 18:06, Constantinos Venetsanopoulos wrote:

On 02/27/2014 05:38 PM, Jose A. Lopes wrote:

On Feb 27 17:31, Constantinos Venetsanopoulos wrote:

Hi,

On 02/27/2014 04:47 PM, Jose A. Lopes wrote:

Hi,

riba suggested sharing the cache design on the mailing list because
others might have comments.  This cache is used for the OS installs in
the following use case:

   Suppose you want to install several vms with the same image which is
   stored somewhere on a remote repository but you only want to
   download the image once (per node).

While reading this keep in mind that the cache is made simple on
purpose.  We could add lots of features but I think currently we do
not have other use cases that justify making it more complicated.
Also, if there is a general interest in the cache we can also make a
design doc.

The cache contains an index that is persisted to disk and operations
that modify the index must first acquire a filesystem lock on the
index file because the node daemon runs jobs that do not share memory.
The index file is read often from disk but it is only written back if
it has been modified (if it is dirty).  Currently the cache index and
files reside in '/var/cache/ganeti'.  riba suggested using '/srv'
instead because there is more space there.  It's also a possibility.
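A sketch of that locking scheme, using a companion lock file and writing
the index back only when it is dirty; the file names and the JSON
serialization are assumptions:

  import fcntl
  import json

  INDEX_FILE = "/var/cache/ganeti/index"

  def ModifyIndex(modify_fn):
    """Run 'modify_fn' on the index under an exclusive filesystem lock.

    'modify_fn' must return True if it dirtied the index, in which case
    the index is written back to disk.
    """
    with open(INDEX_FILE + ".lock", "w") as lockfd:
      fcntl.flock(lockfd, fcntl.LOCK_EX)  # released when lockfd is closed
      try:
        with open(INDEX_FILE) as fd:
          index = json.load(fd)
      except IOError:          # no index yet
        index = {}
      if modify_fn(index):
        with open(INDEX_FILE, "w") as fd:
          json.dump(index, fd)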

Downloads do not hold a lock on the index but do use an auxiliary lock
in order to serialize parallel downloads of the same file.  File
checksums (called fingerprints) are also calculated without holding
the index lock.  Currently, a bad fingerprint simply issues a warning
in the log file, and the user needs to check the log because this
warning is not displayed on the CLI.

Each cached resource has a dtime (like ctime and mtime, but for death).
The dtime specifies the absolute time at which a resource is expunged.
Once the dtime has passed, the next cache flush will delete the
resource.

The cache uses 'ensure' semantics, which means resources are created
or modified using the same function.  This function simply ensures
that the specified state is present in the cache once it has run.

When a resource is cached, a file path pointing to the file that is
being cached is returned.  This file path is calculated from the URL
of the resource and therefore the path is the same on all nodes.
Also, by returning a file path we can cache virtually any file.

On the command line, the following commands are used:

   gnt-cluster cache-modify ...
   gnt-cluster cache-list ...

It is also possible to have gnt-cache, if it becomes necessary.

Finally, riba suggested adding a space limit to the cache.  This way
the cache would eliminate resources to free up space if the limit was
reached.  This complicates things considerably as it becomes necessary
to have a policy on what to delete, etc.

Comments/feedback is highly appreciated.

Just 2 questions:

1. Will there be the option to disable the cache altogether?
2. If I understand correctly (from the OS design doc) the first
implementation of fetching an Image will include fetching (a)
from an http/ftp location or (b) a local file location. I
guess cache makes sense when fetching from (a). What happens
with (b)? If the cache is enabled, will it be used for both?

If you don't want the cache we could have a cluster option to
enable/disable the cache.

I think that this would be definitely a good option to have.


   Just to clarify, by saying the cache is not
used I just mean that the file is downloaded, the instance is created
and the file is immediately deleted afterwards.  Does this sound good?

Oh, so when you are fetching from an http/ftp location you are
always saving the image locally before you copy it to the
instance's disk? Even if the cache is disabled?

What happens if one doesn't want to allocate any space on
the node? Wouldn't it be better for the mechanism to directly
pass the image to the target disk, if the admin chooses so?
(e.g.: by doing curl  | dd 
Sounds good!


Cool. I think with these minor additions the caching mechanism
will be a really nice feature.

Since we referred to the fetching mechanism: are you guys
planning to make it extensible, so that more protocols/fetching
methods may be added in the future?

Thanks,
Constantinos



Thanks,
Constantinos


Thanks,
Constantinos


Cheers,
Jose





Re: Ganeti Cache

2014-02-27 Thread Constantinos Venetsanopoulos

On 02/27/2014 05:38 PM, Jose A. Lopes wrote:

On Feb 27 17:31, Constantinos Venetsanopoulos wrote:

Hi,

On 02/27/2014 04:47 PM, Jose A. Lopes wrote:

Hi,

riba suggested sharing the cache design on the mailing list because
others might have comments.  This cache is used for the OS installs in
the following use case:

   Suppose you want to install several vms with the same image which is
   stored somewhere on a remote repository but you only want to
   download the image once (per node).

While reading this keep in mind that the cache is made simple on
purpose.  We could add lots of features but I think currently we do
not have other use cases that justify making it more complicated.
Also, if there is a general interest in the cache we can also make a
design doc.

The cache contains an index that is persisted to disk and operations
that modify the index must first acquire a filesystem lock on the
index file because the node daemon runs jobs that do not share memory.
The index file is read often from disk but it is only written back if
it has been modified (if it is dirty).  Currently the cache index and
files reside in '/var/cache/ganeti'.  riba suggested using '/srv'
instead because there is more space there.  It's also a possibility.

Downloads do not hold a lock on the index but do use an auxiliary lock
in order to serialize parallel downloads of the same file.  File
checksums (called fingerprints) are also calculated without holding
the index lock.  Currently, a bad fingerprint simply issues a warning
in the log file, and the user needs to check the log because this
warning is not displayed on the CLI.

Each cached resource has a dtime (like ctime and mtime, but for death).
The dtime specifies the absolute time at which a resource is expunged.
Once the dtime has passed, the next cache flush will delete the
resource.

The cache uses 'ensure' semantics, which means resources are created
or modified using the same function.  This function simply ensures
that the specified state is present in the cache once it has run.

When a resource is cached, a file path pointing to the file that is
being cached is returned.  This file path is calculated from the URL
of the resource and therefore the path is the same on all nodes.
Also, by returning a file path we can cache virtually any file.

On the command line, the following commands are used:

   gnt-cluster cache-modify ...
   gnt-cluster cache-list ...

It is also possible to have gnt-cache, if it becomes necessary.

Finally, riba suggested adding a space limit to the cache.  This way
the cache would eliminate resources to free up space if the limit was
reached.  This complicates things considerably as it becomes necessary
to have a policy on what to delete, etc.

Comments/feedback is highly appreciated.

Just 2 questions:

1. Will there be the option to disable the cache altogether?
2. If I understand correctly (from the OS design doc) the first
implementation of fetching an Image will include fetching (a)
from an http/ftp location or (b) a local file location. I
guess cache makes sense when fetching from (a). What happens
with (b)? If the cache is enabled, will it be used for both?

If you don't want the cache we could have a cluster option to
enable/disable the cache.


I think that this would be definitely a good option to have.


   Just to clarify, by saying the cache is not
used I just mean that the file is downloaded, the instance is created
and the file is immediately deleted afterwards.  Does this sound good?


Oh, so when you are fetching from an http/ftp location you are
always saving the image locally before you copy it to the
instance's disk? Even if the cache is disabled?

What happens if one doesn't want to allocate any space on
the node? Wouldn't it be better for the mechanism to directly
pass the image to the target disk, if the admin chooses so?
(e.g.: by doing curl  | dd 
Thanks,
Constantinos


Cheers,
Jose





Re: Ganeti Cache

2014-02-27 Thread Constantinos Venetsanopoulos

Hi,

On 02/27/2014 04:47 PM, Jose A. Lopes wrote:

Hi,

riba suggested sharing the cache design on the mailing list because
others might have comments.  This cache is used for the OS installs in
the following use case:

   Suppose you want to install several vms with the same image which is
   stored somewhere on a remote repository but you only want to
   download the image once (per node).

While reading this keep in mind that the cache is made simple on
purpose.  We could add lots of features but I think currently we do
not have other use cases that justify making it more complicated.
Also, if there is a general interest in the cache we can also make a
design doc.

The cache contains an index that is persisted to disk and operations
that modify the index must first acquire a filesystem lock on the
index file because the node daemon runs jobs that do not share memory.
The index file is read often from disk but it is only written back if
it has been modified (if it is dirty).  Currently the cache index and
files reside in '/var/cache/ganeti'.  riba suggested using '/srv'
instead because there is more space there.  It's also a possibility.

Downloads do not hold a lock on the index but do use an auxiliary lock
in order to serialize parallel downloads of the same file.  File
checksums (called fingerprints) are also calculated without holding
the index lock.  Currently, a bad fingerprint simply issues a warning
in the log file, and the user needs to check the log because this
warning is not displayed on the CLI.

Each cached resource has a dtime (like ctime and mtime, but for death).
The dtime specifies the absolute time at which a resource is expunged.
Once the dtime has passed, the next cache flush will delete the
resource.

The cache uses 'ensure' semantics, which means resources are created
or modified using the same function.  This function simply ensures
that the specified state is present in the cache once it has run.

When a resource is cached, a file path pointing to the file that is
being cached is returned.  This file path is calculated from the URL
of the resource and therefore the path is the same on all nodes.
Also, by returning a file path we can cache virtually any file.

On the command line, the following commands are used:

   gnt-cluster cache-modify ...
   gnt-cluster cache-list ...

It is also possible to have gnt-cache, if it becomes necessary.

Finally, riba suggested adding a space limit to the cache.  This way
the cache would eliminate resources to free up space if the limit was
reached.  This complicates things considerably as it becomes necessary
to have a policy on what to delete, etc.

Comments/feedback is highly appreciated.


Just 2 questions:

1. Will there be the option to disable the cache altogether?
2. If I understand correctly (from the OS design doc) the first
   implementation of fetching an Image will include fetching (a)
   from an http/ftp location or (b) a local file location. I
   guess cache makes sense when fetching from (a). What happens
   with (b)? If the cache is enabled, will it be used for both?

Thanks,
Constantinos


Cheers,
Jose





Re: [PATCH master] Add design doc for ifdown script support

2014-02-26 Thread Constantinos Venetsanopoulos

Hello,

after an internal discussion on this, it seems that we are
both talking about the same thing here, but this thread
doesn't reflect it clearly.

You are right that ifdown should make only node-local
changes and not interact with external systems (e.g.: DNS).
This is the hooks' job to do.

Dimitris will update the doc to note that, as you suggest,
and also state that one cannot rely on the ifdown script
for crucial actions since a lot of things may happen and
the script may never run.

I think hooks already contain what we need. If we see in
the future that the environment doesn't suffice, we can
patch the hooks interface accordingly.

So, shall we send an updated doc incorporating your comments
to proceed?

Regards,
Constantinos

On 02/14/2014 11:56 AM, Guido Trotter wrote:


On Thu, Feb 13, 2014 at 3:01 PM, Dimitris Aragiorgis  wrote:

The ifdown script will be responsible for deconfiguring network
devices and cleaning up changes made by the ifup script. The first
implementation will target KVM but it could be ported to Xen as well,
especially when Xen hotplug gets implemented.

Signed-off-by: Dimitris Aragiorgis 
---
  Makefile.am   |1 +
  doc/design-draft.rst  |1 +
  doc/design-ifdown.rst |  134 +
  3 files changed, 136 insertions(+)
  create mode 100644 doc/design-ifdown.rst

diff --git a/Makefile.am b/Makefile.am
index 3e216bd..32bfff6 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -533,6 +533,7 @@ docinput = \
 doc/design-htools-2.3.rst \
 doc/design-http-server.rst \
 doc/design-hugepages-support.rst \
+   doc/design-ifdown.rst \
 doc/design-impexp2.rst \
 doc/design-internal-shutdown.rst \
 doc/design-kvmd.rst \
diff --git a/doc/design-draft.rst b/doc/design-draft.rst
index 87c06e7..6ff8059 100644
--- a/doc/design-draft.rst
+++ b/doc/design-draft.rst
@@ -21,6 +21,7 @@ Design document drafts
 design-hsqueeze.rst
 design-ssh-ports.rst
 design-os.rst
+   design-ifdown.rst

  .. vim: set textwidth=72 :
  .. Local Variables:
diff --git a/doc/design-ifdown.rst b/doc/design-ifdown.rst
new file mode 100644
index 000..839cf8b
--- /dev/null
+++ b/doc/design-ifdown.rst
@@ -0,0 +1,134 @@
+==
+Design for adding ifdown script to KVM
+==
+
+.. contents:: :depth: 4
+
+This is a design document about adding support for an ifdown script responsible
+for deconfiguring network devices and cleaning up changes made by the ifup
+script. The first implementation will target KVM but it could be ported to Xen
+as well, especially when hotplug gets implemented.
+
+
+Current state and shortcomings
+==
+
+Currently, before instance startup, instance migration and NIC hotplug, KVM
+creates a tap and explicitly invokes the kvm-ifup script with the relevant
+environment (INTERFACE, MAC, IP, MODE, LINK, TAGS, and all the network info if
+any: NETWORK\_SUBNET, NETWORK\_TAGS, etc.).
+
+For Xen we have the `vif-ganeti` script (associated with the vif-script
+hypervisor parameter). The main difference is that Xen invokes it by itself,
+passing it as an extra option in the configuration file.
+
+This ifup script can do several things: bridge a tap to a bridge, add ip rules,
+update an external DNS or DHCP server, enable proxy ARP or proxy NDP, issue
+openvswitch commands, etc.  In general we can divide those actions into two
+categories:
+
+1) Commands that change the state of the host
+2) Commands that change the state of external components.
+
+Currently those changes do not get cleaned up or modified upon instance
+shutdown, removal, migration, or NIC hot-unplug. Thus we have stale entries on
+hosts and, most importantly, might have stale/invalid configuration on external
+components like routers, which could affect connectivity.
+
+A workaround could be hooks but:
+
+1) During migrate hooks the environment is the one held in config data
+and not in runtime files. The NIC configuration might have changed on
+master but not on the running KVM process (unless hotplug is used).
+Plus the NIC order in config data might not be the same one on the KVM
+process.
+
+2) On instance modification, changes are not available in hooks. In other
+words, we do not know the configuration before and after the modification.
+
+Since Ganeti is the orchestrator and the one that explicitly configures
+host devices (tap, vif), it should be the one responsible for their cleanup/
+deconfiguration. Especially in an SDN approach, this kind of script might
+be useful to clean up flows in the cluster in order to ensure correct paths
+without ping-pongs between hosts or connectivity loss for the instance.
+
+
+Proposed Changes
+
+
+We add a new script, kvm-ifdown, that is explicitly invoked after:
+
+1) instance shutdown on primary node
+2) successful instance migration on source node
+3) failed instance migration on target node
+4) successful N
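As a rough illustration of the proposal (not part of the patch), invoking
kvm-ifdown could mirror the existing kvm-ifup call, reusing the environment
listed earlier; the script path and the helper name below are assumptions:

  import subprocess

  KVM_IFDOWN_SCRIPT = "/etc/ganeti/kvm-ifdown"  # hypothetical path

  def RunKvmIfdown(tap, nic):
    """Deconfigure 'tap' for the given NIC after shutdown/migration/hot-unplug."""
    env = {
      "INTERFACE": tap,
      "MAC": nic["mac"],
      "MODE": nic["mode"],
      "LINK": nic["link"],
      "IP": nic.get("ip") or "",
      "TAGS": " ".join(nic.get("tags", [])),
    }
    subprocess.check_call([KVM_IFDOWN_SCRIPT], env=env)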

Introducing a student for GSoC 2014

2014-02-26 Thread Constantinos Venetsanopoulos

Hello everybody,

I'd like to introduce to you Dimitris Bliablias. Dimitris
will be applying for this year's Ganeti GSoC for the
"Conversion between disk templates" idea:

https://code.google.com/p/ganeti/wiki/SummerOfCode2014Ideas

Dimitris has worked with us the last year and integrated
Ganeti with CouchDB (to act as the config.data store), in
the context of his diploma thesis. We will be glad to
mentor him and provide whatever is necessary for him to
work on the project.

Thanks,
Constantinos


List of available ExtStorage providers and OS definitions

2014-02-04 Thread Constantinos Venetsanopoulos

Hello everybody,

we just created two new pages on the Ganeti wiki:

1. List of all known ExtStorage providers:
https://code.google.com/p/ganeti/wiki/ExtStorageProviders

2. List of all known OS definitions:
https://code.google.com/p/ganeti/wiki/OSDefinitions

They are also accessible from the wiki's home page under Plugins
(https://code.google.com/p/ganeti/wiki/DocumentationByCategory).

The purpose of these pages is to have a central point on the
official site listing all available external plugins that can be used
with Ganeti. This way, existing Ganeti users can find out about
plugins they didn't know existed and potential new users can
have an overall view of Ganeti's options and integration with
external systems.

If you have implemented an ExtStorage provider, or OS definition
that is not included in these lists (and you want to share it),
please drop us a line on the ganeti@ mailing list along with a link
and description, so that we can include it.

Hope to see the lists growing soon.

Thanks,
Constantinos


Re: [PATCH master] OS installation redesign

2013-12-13 Thread Constantinos Venetsanopoulos

On 12/13/2013 03:48 PM, Michele Tartara wrote:

On Fri, Dec 13, 2013 at 2:31 PM, Jose A. Lopes  wrote:

On Fri, Dec 13, 2013 at 03:15:43PM +0200, Constantinos Venetsanopoulos wrote:

On 12/13/2013 03:08 PM, Michele Tartara wrote:

On Fri, Dec 13, 2013 at 2:06 PM, Michele Tartara  wrote:

On Fri, Dec 13, 2013 at 2:01 PM, Constantinos Venetsanopoulos
 wrote:

On 12/13/2013 12:53 PM, Michele Tartara wrote:

On Fri, Dec 13, 2013 at 11:36 AM, Constantinos Venetsanopoulos
 wrote:

On 12/13/2013 11:40 AM, Jose A. Lopes wrote:

On Thu, Dec 12, 2013 at 06:23:57PM +0200, Constantinos Venetsanopoulos
wrote:

Hello Michele,

On 12/12/2013 03:54 PM, Michele Tartara wrote:

On Thu, Dec 12, 2013 at 1:08 PM, Constantinos Venetsanopoulos
 wrote:

Hello Michele,

the design looks good. Just two points:

1. There is no description in the doc w.r.t. the location that the
  virtual appliance is going to be stored, and how is it going to
be
  spawned. I guess the disk-to-be-customized of the instance in
  ``install`` mode will have the disk template defined by the
user.
  However, how will the disk containing the virtual appliance get
  provisioned? In a QCOW manner maybe, since we want it to be fast
  and since it is going to be read-only?

I left out the location and file format of the virtual appliance as I
considered them as implementation details, that have to appear in the
documentation, rather than in the design.
QCOW sounds indeed as a good option, and probably
/srv/ganeti/helper_vm/ (or something similar) would be a good choice.

OK. That sounds good.


2. More importantly, I don't see how you could do the following:
  dd a predefined OS image onto the disk-to-get-customized of the
  instance (like the one in ``self_install`` mode) and then spawn
  the virtual appliance and continue with the ``install``
procedure.
  How do you plan to support the above scenario which IMHO is
going
  to be the most common case? Maybe we should have the
:image:
  option in ``install`` mode too?

The idea is that in ``install`` mode things work more or less as they
do now, with OS installation scripts doing whatever they like, with
the additional safety of being run inside a VM, so it's up to the
scripts to decide how to write the image on the disk.

The problem is that you may want to access private repositories
to fetch the image data that you don't want to be accessible
by untrusted code, or even access a host's local directory. You do
not want to do that from inside the helper VM. Right?

I know that you could do that from the trusted part of the
definition, but again I don't see how this is possible since you
say that trusted and untrusted code will run synchronously.

Another option would be to pass the data over the communication
channel, but I don't think this is the best way to do it either.

I proposed the :image: option thinking that we could
use the same mechanism that will already get implemented for
the ``self_install`` mode.

What do you think?

If I understand you correctly, you want to pass in an image to be used
as the user instance's disk, then boot the helper VM with the user
instance's disk mounted and run the untrusted OS scripts inside the
helper VM, while at the same time the trusted OS scripts run on the
host.  Is this correct?

Exactly.

And I think it could be the same codepath as in ``self_install``,
if the admin specifies the :image: option after the name
of the definition.

If the :image: is set, then the second disk of the instance
(since the first will contain the virtual appliance) will not be empty,
but rather contain the requested image data. I think it's much
simpler than having the scripts synchronize and pass, let's say,
10GB of Windows image data over the communication channel.

What do you think?


I like the idea in general, but this creates an issue: how can you
specify both an image and an os-installation script to be used? My
original design used two separate parameters for that, but then it was
suggested that they be united in a single one, "-o", with or without
the "image:" prefix.

Isn't it possible to do:

-o my_definition+default:image:
will fetch and extract the image (for ``self_install`` and ``install``).

-o my_definition+default
will not fetch and extract the image (for ``self_install`` and ``install``)

Am I missing something?


Just a lot more parsing required.

And, the fact that the original idea is not to have os scripts at all,

*for the ``self_install`` mode*


so we should also support one more option with respect to the ones
you are listing, which is:

   -o image:

You are right. I forgot that you can have ``self_install``
with no scripts at all.


But I think it's perfectly doable.

Perfect. So, we would have:

for ``install``:
-o my_definition+default:image:
(fetches and extracts image first, then spawns helper)
-o my_definiti

Re: [PATCH master] OS installation redesign

2013-12-13 Thread Constantinos Venetsanopoulos

On 12/13/2013 03:31 PM, Jose A. Lopes wrote:

On Fri, Dec 13, 2013 at 03:15:43PM +0200, Constantinos Venetsanopoulos wrote:

On 12/13/2013 03:08 PM, Michele Tartara wrote:

On Fri, Dec 13, 2013 at 2:06 PM, Michele Tartara  wrote:

On Fri, Dec 13, 2013 at 2:01 PM, Constantinos Venetsanopoulos
 wrote:

On 12/13/2013 12:53 PM, Michele Tartara wrote:

On Fri, Dec 13, 2013 at 11:36 AM, Constantinos Venetsanopoulos
 wrote:

On 12/13/2013 11:40 AM, Jose A. Lopes wrote:

On Thu, Dec 12, 2013 at 06:23:57PM +0200, Constantinos Venetsanopoulos
wrote:

Hello Michele,

On 12/12/2013 03:54 PM, Michele Tartara wrote:

On Thu, Dec 12, 2013 at 1:08 PM, Constantinos Venetsanopoulos
 wrote:

Hello Michele,

the design looks good. Just two points:

1. There is no description in the doc w.r.t. the location that the
  virtual appliance is going to be stored, and how is it going to
be
  spawned. I guess the disk-to-be-customized of the instance in
  ``install`` mode will have the disk template defined by the
user.
  However, how will the disk containing the virtual appliance get
  provisioned? In a QCOW manner maybe, since we want it to be fast
  and since it is going to be read-only?

I left out the location and file format of the virtual appliance as I
considered them as implementation details, that have to appear in the
documentation, rather than in the design.
QCOW sounds indeed as a good option, and probably
/srv/ganeti/helper_vm/ (or something similar) would be a good choice.

OK. That sounds good.


2. More importantly, I don't see how you could do the following:
  dd a predefined OS image onto the disk-to-get-customized of the
  instance (like the one in ``self_install`` mode) and then spawn
  the virtual appliance and continue with the ``install``
procedure.
  How do you plan to support the above scenario which IMHO is
going
  to be the most common case? Maybe we should have the
:image:
  option in ``install`` mode too?

The idea is that in ``install`` mode things work more or less as they
do now, with OS installation scripts doing whatever they like, with
the additional safety of being run inside a VM, so it's up to the
scripts to decide how to write the image on the disk.

The problem is that you may want to access private repositories
to fetch the image data that you don't want to be accessible
by untrusted code, or even access a host's local directory. You do
not want to do that from inside the helper VM. Right?

I know that you could do that from the trusted part of the
definition, but again I don't see how this is possible since you
say that trusted and untrusted code will run synchronously.

Another option would be to pass the data over the communication
channel, but I don't think this is the best way to do it either.

I proposed the :image: option thinking that we could
use the same mechanism that will already get implemented for
the ``self_install`` mode.

What do you think?

If I understand you correctly, you want to pass in an image to be used
as the user instance's disk, then boot the helper VM with the user
instance's disk mounted and run the untrusted OS scripts inside the
helper VM, while at the same time the trusted OS scripts run on the
host.  Is this correct?

Exactly.

And I think it could be the same codepath as in ``self_install``,
if the admin specifies the :image: option after the name
of the definition.

If the :image: is set, then the second disk of the instance
(since the first will contain the virtual appliance) will not be empty,
but rather contain the requested image data. I think it's much
simpler than having the scripts synchronize and pass, let's say,
10GB of Windows image data over the communication channel.

What do you think?


I like the idea in general, but this creates an issue: how can you
specify both an image and an os-installation script to be used? My
original design used two separate parameters for that, but then it was
suggested that they be united in a single one, "-o", with or without
the "image:" prefix.

Isn't it possible to do:

-o my_definition+default:image:
will fetch and extract the image (for ``self_install`` and ``install``).

-o my_definition+default
will not fetch and extract the image (for ``self_install`` and ``install``)

Am I missing something?


Just a lot more parsing required.

And, the fact that the original idea is not to have os scripts at all,

*for the ``self_install`` mode*


so we should also support one more option with respect to the ones
you are listing, which is:

   -o image:

You are right. I forgot that you can have ``self_install``
with no scripts at all.


But I think it's perfectly doable.

Perfect. So, we would have:

for ``install``:
-o my_definition+default:image:
(fetches and extracts image first, then spawns helper)
-o my_definition+default
(just runs the scripts)

and for ``self_insta

Re: [PATCH master] OS installation redesign

2013-12-13 Thread Constantinos Venetsanopoulos

On 12/13/2013 03:08 PM, Michele Tartara wrote:

On Fri, Dec 13, 2013 at 2:06 PM, Michele Tartara  wrote:

On Fri, Dec 13, 2013 at 2:01 PM, Constantinos Venetsanopoulos
 wrote:

On 12/13/2013 12:53 PM, Michele Tartara wrote:

On Fri, Dec 13, 2013 at 11:36 AM, Constantinos Venetsanopoulos
 wrote:

On 12/13/2013 11:40 AM, Jose A. Lopes wrote:

On Thu, Dec 12, 2013 at 06:23:57PM +0200, Constantinos Venetsanopoulos
wrote:

Hello Michele,

On 12/12/2013 03:54 PM, Michele Tartara wrote:

On Thu, Dec 12, 2013 at 1:08 PM, Constantinos Venetsanopoulos
 wrote:

Hello Michele,

the design looks good. Just two points:

1. There is no description in the doc w.r.t. the location that the
  virtual appliance is going to be stored, and how is it going to
be
  spawned. I guess the disk-to-be-customized of the instance in
  ``install`` mode will have the disk template defined by the
user.
  However, how will the disk containing the virtual appliance get
  provisioned? In a QCOW manner maybe, since we want it to be fast
  and since it is going to be read-only?

I left out the location and file format of the virtual appliance as I
considered them as implementation details, that have to appear in the
documentation, rather than in the design.
QCOW sounds indeed as a good option, and probably
/srv/ganeti/helper_vm/ (or something similar) would be a good choice.

OK. That sounds good.


2. More importantly, I don't see how you could do the following:
  dd a predefined OS image onto the disk-to-get-customized of the
  instance (like the one in ``self_install`` mode) and then spawn
  the virtual appliance and continue with the ``install``
procedure.
  How do you plan to support the above scenario which IMHO is
going
  to be the most common case? Maybe we should have the
:image:
  option in ``install`` mode too?

The idea is that in ``install`` mode things work more or less as they
do now, with OS installation scripts doing whatever they like, with
the additional safety of being run inside a VM, so it's up to the
scripts to decide how to write the image on the disk.

The problem is that you may want to access private repositories
to fetch the image data that you don't want to be accessible
by untrusted code, or even access a host's local directory. You do
not want to do that from inside the helper VM. Right?

I know that you could do that from the trusted part of the
definition, but again I don't see how this is possible since you
say that trusted and untrusted code will run synchronously.

Another option would be to pass the data over the communication
channel, but I don't think this is the best way to do it either.

I proposed the :image: option thinking that we could
use the same mechanism that will already get implemented for
the ``self_install`` mode.

What do you think?

If I understand you correctly, you want to pass in an image to be used
as the user instance's disk, then boot the helper VM with the user
instance's disk mounted and run the untrusted OS scripts inside the
helper VM, while at the same time the trusted OS scripts run on the
host.  Is this correct?


Exactly.

And I think it could be the same codepath as in ``self_install``,
if the admin specifies the :image: option after the name
of the definition.

If the :image: is set, then the second disk of the instance
(since the first will contain the virtual appliance) will not be empty,
but rather contain the requested image data. I think it's much
simpler than having the scripts synchronize and pass, let's say,
10GB of Windows image data over the communication channel.

What do you think?


I like the idea in general, but this creates an issue: how can you
specify both an image and an os-installation script to be used? My
original design used two separate parameters for that, but then it was
suggested that they be united in a single one, "-o", with or without
the "image:" prefix.


Isn't it possible to do:

-o my_definition+default:image:
will fetch and extract the image (for ``self_install`` and ``install``).

-o my_definition+default
will not fetch and extract the image (for ``self_install`` and ``install``)

Am I missing something?


Just a lot more parsing required.

And, the fact that the original idea is not to have os scripts at all,

*for the ``self_install`` mode*


so we should also support one more option with respect to the ones
you are listing, which is:

   -o image:


You are right. I forgot that you can have ``self_install``
with no scripts at all.


But I think it's perfectly doable.


Perfect. So, we would have:

for ``install``:
-o my_definition+default:image:
(fetches and extracts image first, then spawns helper)
-o my_definition+default
(just runs the scripts)

and for ``self_install``:
-o image:
(fetches and extracts image onto disk, doesn't run scripts)
None
(doesn't fetch and extract image
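Purely as an illustration of the parsing involved (the syntax was still
under discussion, and the URL in the docstring is a made-up placeholder),
the "-o" forms above could be split like this:

  def ParseOsOption(value):
    """Split an '-o' value such as 'my_definition+default:image:http://x/img'
    into (os_name, variant, image); any component may be None."""
    if value is None:
      return (None, None, None)
    if value.startswith("image:"):                # image only, no OS scripts
      return (None, None, value[len("image:"):])
    image = None
    if ":image:" in value:                        # scripts plus image
      value, image = value.split(":image:", 1)
    if "+" in value:
      os_name, variant = value.split("+", 1)
    else:
      os_name, variant = value, None
    return (os_name, variant, image)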

Re: [PATCH master] OS installation redesign

2013-12-13 Thread Constantinos Venetsanopoulos

On 12/13/2013 12:53 PM, Michele Tartara wrote:

On Fri, Dec 13, 2013 at 11:36 AM, Constantinos Venetsanopoulos
 wrote:

On 12/13/2013 11:40 AM, Jose A. Lopes wrote:

On Thu, Dec 12, 2013 at 06:23:57PM +0200, Constantinos Venetsanopoulos
wrote:

Hello Michele,

On 12/12/2013 03:54 PM, Michele Tartara wrote:

On Thu, Dec 12, 2013 at 1:08 PM, Constantinos Venetsanopoulos
 wrote:

Hello Michele,

the design looks good. Just two points:

1. There is no description in the doc w.r.t. the location that the
 virtual appliance is going to be stored, and how is it going to be
 spawned. I guess the disk-to-be-customized of the instance in
 ``install`` mode will have the disk template defined by the user.
 However, how will the disk containing the virtual appliance get
 provisioned? In a QCOW manner maybe, since we want it to be fast
 and since it is going to be read-only?

I left out the location and file format of the virtual appliance as I
considered them as implementation details, that have to appear in the
documentation, rather than in the design.
QCOW sounds indeed as a good option, and probably
/srv/ganeti/helper_vm/ (or something similar) would be a good choice.

OK. That sounds good.


2. More importantly, I don't see how you could do the following:
 dd a predefined OS image onto the disk-to-get-customized of the
 instance (like the one in ``self_install`` mode) and then spawn
 the virtual appliance and continue with the ``install`` procedure.
 How do you plan to support the above scenario which IMHO is going
 to be the most common case? Maybe we should have the :image:
 option in ``install`` mode too?

The idea is that in ``install`` mode things work more or less as they
do now, with OS installation scripts doing whatever they like, with
the additional safety of being run inside a VM, so it's up to the
scripts to decide how to write the image on the disk.

The problem is that you may want to access private repositories
to fetch the image data that you don't want to be accessible
by untrusted code, or even access a host's local directory. You do
not want to do that from inside the helper VM. Right?

I know that you could do that from the trusted part of the
definition, but again I don't see how this is possible since you
say that trusted and untrusted code will run synchronously.

Another option would be to pass the data over the communication
channel, but I don't think this is the best way to do it either.

I proposed the :image: option thinking that we could
use the same mechanism that will already get implemented for
the ``self_install`` mode.

What do you think?

If I understand you correctly, you want to pass in an image to be used
as the user instance's disk, then boot the helper VM with the user
instance's disk mounted and run the untrusted OS scripts inside the
helper VM, while at the same time the trusted OS scripts run on the
host.  Is this correct?


Exactly.

And I think it could be the same codepath as in ``self_install``,
if the admin specifies the :image: option after the name
of the definition.

If the :image: is set, then the second disk of the instance
(since the first will contain the virtual appliance) will not be empty,
but rather contain the requested image data. I think it's much
simpler than having the scripts synchronize and pass, let's say,
10GB of Windows image data over the communication channel.

What do you think?


I like the idea in general, but this creates an issue: how can you
specify both an image and an os-installation script to be used? My
original design used two separate parameters for that, but then it was
suggested that they be united in a single one, "-o", with or without
the "image:" prefix.


Isn't it possible to do:

-o my_definition+default:image:
will fetch and extract the image (for ``self_install`` and ``install``).

-o my_definition+default
will not fetch and extract the image (for ``self_install`` and ``install``)

Am I missing something?

Regards,
Constantinos



Thanks,
Michele





Re: [PATCH master] OS installation redesign

2013-12-13 Thread Constantinos Venetsanopoulos

On 12/13/2013 10:50 AM, Michele Tartara wrote:

On Thu, Dec 12, 2013 at 5:23 PM, Constantinos Venetsanopoulos
 wrote:

Hello Michele,


On 12/12/2013 03:54 PM, Michele Tartara wrote:

On Thu, Dec 12, 2013 at 1:08 PM, Constantinos Venetsanopoulos
 wrote:

Hello Michele,

the design looks good. Just two points:

1. There is no description in the doc w.r.t. the location that the
 virtual appliance is going to be stored, and how is it going to be
 spawned. I guess the disk-to-be-customized of the instance in
 ``install`` mode will have the disk template defined by the user.
 However, how will the disk containing the virtual appliance get
 provisioned? In a QCOW manner maybe, since we want it to be fast
 and since it is going to be read-only?

I left out the location and file format of the virtual appliance as I
considered them as implementation details, that have to appear in the
documentation, rather than in the design.
QCOW sounds indeed as a good option, and probably
/srv/ganeti/helper_vm/ (or something similar) would be a good choice.


OK. That sounds good.



2. More importantly, I don't see how you could do the following:
 dd a predefined OS image onto the disk-to-get-customized of the
 instance (like the one in ``self_install`` mode) and then spawn
 the virtual appliance and continue with the ``install`` procedure.
 How do you plan to support the above scenario which IMHO is going
 to be the most common case? Maybe we should have the :image:
 option in ``install`` mode too?

The idea is that in ``install`` mode things work more or less as they
do now, with OS installation scripts doing whatever they like, with
the additional safety of being run inside a VM, so it's up to the
scripts to decide how to write the image on the disk.


The problem is that you may want to access private repositories
to fetch the image data that you don't want to be accessible
by untrusted code, or even access a host's local directory. You do
not want to do that from inside the helper VM. Right?

I know that you could do that from the trusted part of the
definition, but again I don't see how this is possible since you
say that trusted and untrusted code will run synchronously.

Another option would be to pass the data over the communication
channel, but I don't think this is the best way to do it either.

I proposed the :image: option thinking that we could
use the same mechanism that will already get implemented for
the ``self_install`` mode.

What do you think?

My idea was that in such a case the user can use the communication
channel to have the untrusted script wait for the trusted one. So, the
trusted one can download the image and put it on the disk, then it
notifies the untrusted scripts that can go on with the rest of the
work.

But let's hear also what Jose thinks about this, given that he's the
one who's going to work on the implementation of this part.



Please see my comments on Jose's reply.

Thanks,
Constantinos




What is not explicit in the design, and that could be indeed added, is
to specify a location, part of the metadata, where the URL of the
image appears, so that the scripts can actually fetch it and use it.


Yes, I also think we should have that in any case.

Thanks,
Constantinos


Thanks,
Michele


Cheers,
Michele





Re: [PATCH master] OS installation redesign

2013-12-13 Thread Constantinos Venetsanopoulos

On 12/13/2013 11:40 AM, Jose A. Lopes wrote:

On Thu, Dec 12, 2013 at 06:23:57PM +0200, Constantinos Venetsanopoulos wrote:

Hello Michele,

On 12/12/2013 03:54 PM, Michele Tartara wrote:

On Thu, Dec 12, 2013 at 1:08 PM, Constantinos Venetsanopoulos
 wrote:

Hello Michele,

the design looks good. Just two points:

1. There is no description in the doc w.r.t. the location that the
virtual appliance is going to be stored, and how is it going to be
spawned. I guess the disk-to-be-customized of the instance in
``install`` mode will have the disk template defined by the user.
However, how will the disk containing the virtual appliance get
provisioned? In a QCOW manner maybe, since we want it to be fast
and since it is going to be read-only?

I left out the location and file format of the virtual appliance as I
considered them as implementation details, that have to appear in the
documentation, rather than in the design.
QCOW sounds indeed as a good option, and probably
/srv/ganeti/helper_vm/ (or something similar) would be a good choice.

OK. That sounds good.


2. More importantly, I don't see how you could do the following:
dd a predefined OS image onto the disk-to-get-customized of the
instance (like the one in ``self_install`` mode) and then spawn
the virtual appliance and continue with the ``install`` procedure.
How do you plan to support the above scenario which IMHO is going
to be the most common case? Maybe we should have the :image:
option in ``install`` mode too?

The idea is that in ``install`` mode things work more or less as they
do now, with OS installation scripts doing whatever they like, with
the additional safety of being run inside a VM, so it's up to the
scripts to decide how to write the image on the disk.

The problem is that you may want to access private repositories
to fetch the image data that you don't want to be accessible
by untrusted code, or even access a host's local directory. You do
not want to do that from inside the helper VM. Right?

I know that you could do that from the trusted part of the
definition, but again I don't see how this is possible since you
say that trusted and untrusted code will run synchronously.

Another option would be to pass the data over the communication
channel, but I don't think this is the best way to do it either.

I proposed the :image: option thinking that we could
use the same mechanism that will already get implemented for
the ``self_install`` mode.

What do you think?

If I understand you correctly, you want to pass in an image to be used
as the user instance's disk, then boot the helper VM with the user
instance's disk mounted and run the untrusted OS scripts inside the
helper VM, while at the same time the trusted OS scripts run on the
host.  Is this correct?


Exactly.

And I think it could be the same codepath as in ``self_install``,
if the admin specifies the :image: option after the name
of the definition.

If the :image: is set, then the second disk of the instance
(since the first will contain the virtual appliance) will not be empty,
but rather contain the requested image data. I think it's much
simpler than having the scripts synchronize and pass, let's say,
10GB of Windows image data over the communication channel.

What do you think?


What is not explicit in the design, and that could be indeed added, is
to specify a location, part of the metadata, where the URL of the
image appears, so that the scripts can actually fetch it and use it.

Yes, I also think we should have that in any case.

Thanks,
Constantinos


Thanks,
Michele





Re: [PATCH master] OS installation redesign

2013-12-12 Thread Constantinos Venetsanopoulos

Hello Michele,

On 12/12/2013 03:54 PM, Michele Tartara wrote:

On Thu, Dec 12, 2013 at 1:08 PM, Constantinos Venetsanopoulos
 wrote:

Hello Michele,

the design looks good. Just two points:

1. There is no description in the doc w.r.t. the location that the
virtual appliance is going to be stored, and how is it going to be
spawned. I guess the disk-to-be-customized of the instance in
``install`` mode will have the disk template defined by the user.
However, how will the disk containing the virtual appliance get
provisioned? In a QCOW manner maybe, since we want it to be fast
and since it is going to be read-only?

I left out the location and file format of the virtual appliance as I
considered them as implementation details, that have to appear in the
documentation, rather than in the design.
QCOW sounds indeed as a good option, and probably
/srv/ganeti/helper_vm/ (or something similar) would be a good choice.


OK. That sounds good.


2. More importantly, I don't see how you could do the following:
dd a predefined OS image onto the disk-to-get-customized of the
instance (like the one in ``self_install`` mode) and then spawn
the virtual appliance and continue with the ``install`` procedure.
How do you plan to support the above scenario which IMHO is going
to be the most common case? Maybe we should have the :image:
option in ``install`` mode too?

The idea is that in ``install`` mode things work more or less as they
do now, with OS installation scripts doing whatever they like, with
the additional safety of being run inside a VM, so it's up to the
scripts to decide how to write the image on the disk.


The problem is that you may want to access private repositories
to fetch the image data that you don't want to be accessible
by untrusted code, or even access a host's local directory. You do
not want to do that from inside the helper VM. Right?

I know that you could do that from the trusted part of the
definition, but again I don't see how this is possible since you
say that trusted and untrusted code will run synchronously.

Another option would be to pass the data over the communication
channel, but I don't think this is the best way to do it either.

I proposed the :image: option thinking that we could
use the same mechanism that will already get implemented for
the ``self_install`` mode.

What do you think?


What is not explicit in the design, and that could be indeed added, is
to specify a location, part of the metadata, where the URL of the
image appears, so that the scripts can actually fetch it and use it.


Yes, I also think we should have that in any case.

Thanks,
Constantinos


Thanks,
Michele





Re: [PATCH master] OS installation redesign

2013-12-12 Thread Constantinos Venetsanopoulos

Hello Michele,

the design looks good. Just two points:

1. There is no description in the doc w.r.t. the location that the
   virtual appliance is going to be stored, and how is it going to be
   spawned. I guess the disk-to-be-customized of the instance in
   ``install`` mode will have the disk template defined by the user.
   However, how will the disk containing the virtual appliance get
   provisioned? In a QCOW manner maybe, since we want it to be fast
   and since it is going to be read-only?

2. More importantly, I don't see how you could do the following:
   dd a predefined OS image onto the disk-to-get-customized of the
   instance (like the one in ``self_install`` mode) and then spawn
   the virtual appliance and continue with the ``install`` procedure.
   How do you plan to support the above scenario which IMHO is going
   to be the most common case? Maybe we should have the :image:
   option in ``install`` mode too?

Thanks,
Constantinos


On 12/12/2013 11:06 AM, Michele Tartara wrote:

Add the document describing a new design for the OS installation process for
new instances.

Signed-off-by: Michele Tartara 
Signed-off-by: Jose A. Lopes 
---
  doc/design-draft.rst |1 +
  doc/design-os.rst|  399 ++
  2 files changed, 400 insertions(+)
  create mode 100644 doc/design-os.rst

diff --git a/doc/design-draft.rst b/doc/design-draft.rst
index c821292..3ed3852 100644
--- a/doc/design-draft.rst
+++ b/doc/design-draft.rst
@@ -20,6 +20,7 @@ Design document drafts
 design-daemons.rst
 design-hsqueeze.rst
 design-ssh-ports.rst
+   design-os.rst
  
  .. vim: set textwidth=72 :

  .. Local Variables:
diff --git a/doc/design-os.rst b/doc/design-os.rst
new file mode 100644
index 000..5714efc
--- /dev/null
+++ b/doc/design-os.rst
@@ -0,0 +1,399 @@
+===
+Ganeti OS installation redesign
+===
+
+.. contents:: :depth: 3
+
+This is a design document detailing a new OS installation procedure, which is
+more secure, able to provide more features and easier to use for many common
+tasks w.r.t. the current one.
+
+Current state and shortcomings
+==
+
+As of Ganeti 2.10, each instance is associated with an OS definition. An OS
+definition is a set of scripts (``create``, ``export``, ``import``, ``rename``)
+that are executed with root privileges on the primary host of the instance to
+perform all the OS-related functionality (setting up an operating system inside
+the disks of the instance being created, exporting/importing the instance,
+renaming it).
+
+These scripts receive, as environment variables, a fixed set of parameters
+related to the instance (such as the hypervisor, the name of the instance, the
+number of disks, and their location) and a set of user defined parameters.
+These parameters are also written in the configuration file of Ganeti, to allow
+future reinstalls of the instance, and in various log files, namely:
+
+* node daemon log file: contains DEBUG strings of the ``/os_validate``,
+  ``/instance_os_add`` and ``/instance_start`` RPC calls.
+
+* master daemon log file: DEBUG strings related to the same RPC calls are
+  stored here as well.
+
+* commands log: the CLI commands that create a new instance, including their
+  parameters, are logged here.
+
+* RAPI log: the RAPI commands that create a new instance, including their
+  parameters, are logged here.
+
+* job logs: the job files stored in the job queue, or in its archive, contain
+  the parameters.
+
+The current situation presents a number of shortcomings:
+
+* Having the installation scripts run as root on the nodes doesn't allow
+  user-defined OS scripts, as they would pose a huge security issue.
+  Furthermore, even a script without malicious intentions might end up
+  disrupting a node because of a bug in it.
+
+* Ganeti cannot be used to create instances starting from user provided disk
+  images: even in the (hypothetical) case where the scripts are completely
+  secure and run not by root but by an unprivileged user with only the power to
+  mount arbitrary files as disk images, this is a security issue. It has been
+  proven that a carefully crafted file system might exploit kernel
+  vulnerabilities to gain control of the system. Therefore, directly mounting
+  images on the Ganeti nodes is not an option.
+
+* There is no way to inject files into an existing disk image. A common use
+  case is for the system administrator to provide a standard image of the
+  system, to be later personalized with the network configuration, private
+  keys identifying the machine, ssh keys of the users and so on. A possible
+  workaround would be
+  for the scripts to mount the image (only if this is trusted!) and to receive
+  the configurations and ssh keys as user defined OS parameters. Unfortunately,
+  this is also not an option for security sensitive material (such as the ssh
+  keys)
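To make the shortcomings above concrete, an OS ``create`` script today sees
roughly the following environment (the variable names follow the usual
Ganeti OS API conventions such as INSTANCE_NAME, DISK_COUNT, DISK_%d_PATH
and the OSP_ prefix for user parameters; treat the exact set as approximate):

  import os

  def ReadCreateEnvironment():
    """Collect the data passed to an OS 'create' script via the environment."""
    disk_count = int(os.environ.get("DISK_COUNT", "0"))
    return {
      "instance": os.environ.get("INSTANCE_NAME"),
      "hypervisor": os.environ.get("HYPERVISOR"),
      "disks": [os.environ.get("DISK_%d_PATH" % i) for i in range(disk_count)],
      # User-defined OS parameters: these are the values that also end up in
      # the Ganeti configuration and in the various log files listed above.
      "osparams": dict((key[len("OSP_"):], value)
                       for key, value in os.environ.items()
                       if key.startswith("OSP_")),
    }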

Re: [PATCH stable-2.8] Allow modification of arbitrary params for ext

2013-12-10 Thread Constantinos Venetsanopoulos

On 12/10/2013 02:34 PM, Guido Trotter wrote:


Makes sense. I'm just worried that the change will expose bugs that 
might hardcode the default values in the dict and there would be no 
way to reset them to default. I guess we should address this checking 
in qa.




Ack.

So, we are good and we should discuss how we can
enable changing parameters also at disk level in a
generic way. Right?

Thanks,
Constantinos


G

On 10 Dec 2013 13:31, "Constantinos Venetsanopoulos"  wrote:


On 12/10/2013 02:26 PM, Guido Trotter wrote:

On Tue, Dec 10, 2013 at 1:03 PM, Dimitris Aragiorgis  wrote:

Disks of ext template are allowed to have arbitrary parameters
stored in Disk object's params slot. Those parameters can be
passed during creation of a new disk, either in
LUInstanceCreate()
or in LUInstanceSetParams(). Still those parameters cannot be
changed afterwards. With this patch we override this
limitation.

Additionally, for other disk templates we allow modifying only
name and mode. If any other parameter is passed
_VerifyDiskModification() will raise an exception.

Why not allowing it also for others?


Which ones exactly? I don't think I'm following here.

Do you mean the "vgname"? If yes, I think this belongs
to a different patch that will introduce such functionality.

Thanks,
Constantinos

Signed-off-by: Dimitris Aragiorgis 
---
  lib/cmdlib/instance.py |   23 +++
  1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/lib/cmdlib/instance.py b/lib/cmdlib/instance.py
index be30315..0a0d29b 100644
--- a/lib/cmdlib/instance.py
+++ b/lib/cmdlib/instance.py
@@ -2277,8 +2277,7 @@ class LUInstanceSetParams(LogicalUnit):
else:
  raise errors.ProgrammerError("Unhandled
operation '%s'" % op)

-  @staticmethod
-  def _VerifyDiskModification(op, params):
+  def _VerifyDiskModification(self, op, params):

Can't you leave this as static and pass the instance in?

  """Verifies a disk modification.

  """
@@ -2308,10 +2307,13 @@ class LUInstanceSetParams(LogicalUnit):
if constants.IDISK_SIZE in params:
  raise errors.OpPrereqError("Disk size change not possible, use"
 " grow-disk", errors.ECODE_INVAL)
-  if len(params) > 2:
-raise errors.OpPrereqError("Disk modification doesn't support"
-   " additional arbitrary parameters",
-   errors.ECODE_INVAL)
+
+  # Disk modification supports changing only the disk name and mode.
+  # Changing arbitrary parameters is allowed only for ext disk template",
+  if self.instance.disk_template != constants.DT_EXT:
+allowed_mods = [constants.IDISK_MODE, constants.IDISK_NAME]

this belongs to constants, not here.

+utils.ForceDictType(params, allowed_mods)
+
name = params.get(constants.IDISK_NAME, None)
if name is not None and name.lower() == constants.VALUE_NONE:
  params[constants.IDISK_NAME] = None
@@ -3182,8 +3184,7 @@ class LUInstanceSetParams(LogicalUnit):
("disk/%d" % idx, "add:size=%s,mode=%s" % (disk.size, disk.mode)),
])

-  @staticmethod
-  def _ModifyDisk(idx, disk, params, _):
+  def _ModifyDisk(self, idx, disk, params, _):
  """Modifies a disk.

  """
@@ -3196,6 +3197,12 @@ class LUInstanceSetParams(LogicalUnit):
disk.name = params.get(constants.IDISK_NAME)
changes.append(("disk.name/%d" % idx, disk.name))

+for key, value in params.iteritems():
+  if (key not in [constants.IDISK_MODE, constants.IDISK_NAME] and
+  self.instance.disk_template == constants.DT_EXT):
+disk.params[key] = value
+changes.append(("disk.params:%s/%d" % (key, idx), value))
+
  return changes

def _RemoveDisk(self, idx, root, _):

Thanks,

Guido






Re: [PATCH stable-2.8] Allow modification of arbitrary params for ext

2013-12-10 Thread Constantinos Venetsanopoulos

On 12/10/2013 02:26 PM, Guido Trotter wrote:

On Tue, Dec 10, 2013 at 1:03 PM, Dimitris Aragiorgis  wrote:

Disks of ext template are allowed to have arbitrary parameters
stored in Disk object's params slot. Those parameters can be
passed during creation of a new disk, either in LUInstanceCreate()
or in LUInstanceSetParams(). Still those parameters cannot be
changed afterwards. With this patch we override this limitation.

Additionally, for other disk templates we allow modifying only
name and mode. If any other parameter is passed
_VerifyDiskModification() will raise an exception.


Why not allowing it also for others?


Which ones exactly? I don't think I'm following here.

Do you mean the "vgname"? If yes, I think this belongs
to a different patch that will introduce such functionality.

Thanks,
Constantinos


Signed-off-by: Dimitris Aragiorgis 
---
  lib/cmdlib/instance.py |   23 +++
  1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/lib/cmdlib/instance.py b/lib/cmdlib/instance.py
index be30315..0a0d29b 100644
--- a/lib/cmdlib/instance.py
+++ b/lib/cmdlib/instance.py
@@ -2277,8 +2277,7 @@ class LUInstanceSetParams(LogicalUnit):
else:
  raise errors.ProgrammerError("Unhandled operation '%s'" % op)

-  @staticmethod
-  def _VerifyDiskModification(op, params):
+  def _VerifyDiskModification(self, op, params):

Can't you leave this as static and pass the instance in?


  """Verifies a disk modification.

  """
@@ -2308,10 +2307,13 @@ class LUInstanceSetParams(LogicalUnit):
if constants.IDISK_SIZE in params:
  raise errors.OpPrereqError("Disk size change not possible, use"
 " grow-disk", errors.ECODE_INVAL)
-  if len(params) > 2:
-raise errors.OpPrereqError("Disk modification doesn't support"
-   " additional arbitrary parameters",
-   errors.ECODE_INVAL)
+
+  # Disk modification supports changing only the disk name and mode.
+  # Changing arbitrary parameters is allowed only for ext disk template",
+  if self.instance.disk_template != constants.DT_EXT:
+allowed_mods = [constants.IDISK_MODE, constants.IDISK_NAME]

this belongs to constants, not here.


+utils.ForceDictType(params, allowed_mods)
+
name = params.get(constants.IDISK_NAME, None)
if name is not None and name.lower() == constants.VALUE_NONE:
  params[constants.IDISK_NAME] = None
@@ -3182,8 +3184,7 @@ class LUInstanceSetParams(LogicalUnit):
("disk/%d" % idx, "add:size=%s,mode=%s" % (disk.size, disk.mode)),
])

-  @staticmethod
-  def _ModifyDisk(idx, disk, params, _):
+  def _ModifyDisk(self, idx, disk, params, _):
  """Modifies a disk.

  """
@@ -3196,6 +3197,12 @@ class LUInstanceSetParams(LogicalUnit):
disk.name = params.get(constants.IDISK_NAME)
changes.append(("disk.name/%d" % idx, disk.name))

+for key, value in params.iteritems():
+  if (key not in [constants.IDISK_MODE, constants.IDISK_NAME] and
+  self.instance.disk_template == constants.DT_EXT):
+disk.params[key] = value
+changes.append(("disk.params:%s/%d" % (key, idx), value))
+
  return changes

def _RemoveDisk(self, idx, root, _):

Thanks,

Guido
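
For illustration, here is a minimal sketch of what the two review comments above point at: keep the check static, pass the instance's disk template in explicitly, and source the allowed keys from a constant. The constant name and the literal keys below are placeholders, not Ganeti's actual identifiers.

MODIFIABLE_IDISK_PARAMS = frozenset(["mode", "name"])
# Hypothetical constant; in Ganeti this would live next to the other
# IDISK_* definitions in lib/constants.py (the name is made up here).


def VerifyDiskModification(params, disk_template):
  """Reject unknown keys unless the disk template supports overrides.

  Kept static on purpose: the caller passes the instance's disk
  template in, so no access to self is needed.
  """
  if disk_template == "ext":
    # ext disks may carry arbitrary provider-specific parameters
    return
  unknown = set(params) - MODIFIABLE_IDISK_PARAMS
  if unknown:
    raise ValueError("Disk modification only supports %s, got: %s" %
                     (", ".join(sorted(MODIFIABLE_IDISK_PARAMS)),
                      ", ".join(sorted(unknown))))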





Re: [PATCH stable-2.8] Do not override disk.params in case of ext

2013-12-09 Thread Constantinos Venetsanopoulos

On 12/09/2013 05:22 PM, Guido Trotter wrote:

On Mon, Dec 9, 2013 at 2:35 PM, Dimitris Aragiorgis  wrote:

* Constantinos Venetsanopoulos  [2013-12-02 17:23:38 +0200]:


Hello Guido,

On 12/02/2013 05:07 PM, Guido Trotter wrote:

The title of the patch is misleading: it seems you're fixing something
just for "ext" while you're actually making a change that affects all
disk templates, due to a bug that affects ext. Problem here is that
doing that you're reverting a bugfix from last year, that says that
some parameters that weren't supposed to be there showed up there.

I totally agree with you regarding the wording.
This should change.


Ack, thanks.


A proper fix would include making sure that the disk parameters stay
empty for other disk templates where overriding is not supported.

However, if I understand correctly, the bug was introduced
in UpgradeConfig and was fixed wrong twice. The initial code
in UpgradeConfig actually filled the dict. If that's right, doesn't
this fix suffice?


Any comment on that?


As long as you confirm (I haven't tried) that by gnt-instance modify
it's always possible to update the change *and* to reset them to
default (aka remove an overridal from this disk) then the fix is
perfect. If that's not possible then fixing gnt-instance modify to
allow it is also required.


Ack.

Dimitris will send the appropriate patches right away.

Thanks,
Constantinos


Thanks a lot,

Guido




Thanks,
Constantinos


This should be done at instance save time, but not at UpgradeConfig
time of course, so the core of this patch is OK, provided the extra
work is added for non-ext disk templates.

Thanks a lot,

Guido

On Mon, Dec 2, 2013 at 3:44 PM, Dimitris Aragiorgis  wrote:

Commits 5dbee5e and cce4616 fix disk upgrades concerning the params
slot. Since 2.7 the params slot should be empty and gets filled
any time it is needed.

Still, the ext template allows passing arbitrary params per disk.
These params should be saved in the config file for future use.
For instance, if we have the shared-filer provider and we
specify the shared_dir param during instance create, this param
is needed when we want to attach the disk, e.g. when
retrieving the instance info. If it gets overridden during a daemon
restart or a config reload we fail to get the instance's info.

To avoid such a failure, we set the params slot to an empty dict
only if params is not found in the first place.

Signed-off-by: Dimitris Aragiorgis 
---
  lib/objects.py |7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/lib/objects.py b/lib/objects.py
index ad9f1d7..f437fb8 100644
--- a/lib/objects.py
+++ b/lib/objects.py
@@ -828,7 +828,12 @@ class Disk(ConfigObject):
  child.UpgradeConfig()

  # FIXME: Make this configurable in Ganeti 2.7
-self.params = {}
+# Params should be an empty dict that gets filled any time needed
+# In case of ext template we allow arbitrary params that should not
+# be overrided during a config reload/upgrade.
+if not self.params or not isinstance(self.params, dict):
+  self.params = {}
+
  # add here config upgrade for this disk

@staticmethod
--
1.7.10.4
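
A rough sketch of the extra work requested above for non-ext disk templates: a single helper that decides which per-disk params may be persisted, callable from wherever the config is written rather than from UpgradeConfig. All names here are illustrative assumptions, not the actual Ganeti code.

# Templates whose disks may keep arbitrary per-disk parameter overrides;
# in this discussion that is only "ext" (set and name are made up here).
OVERRIDABLE_PARAM_TEMPLATES = frozenset(["ext"])


def SanitizeDiskParams(dev_type, params):
  """Return the params dict that should be persisted for a disk.

  Non-ext disks always get an empty dict, so stray entries cannot
  survive a config write; ext disks keep whatever was supplied, falling
  back to an empty dict if nothing usable is there.
  """
  if dev_type in OVERRIDABLE_PARAM_TEMPLATES and isinstance(params, dict):
    return params
  return {}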












Re: [PATCH stable-2.8] Do not override disk.params in case of ext

2013-12-02 Thread Constantinos Venetsanopoulos

Hello Guido,

On 12/02/2013 05:07 PM, Guido Trotter wrote:

The title of the patch is misleading: it seems you're fixing something
just for "ext" while you're actually making a change that affects all
disk templates, due to a bug that affects ext. Problem here is that
doing that you're reverting a bugfix from last year, that says that
some parameters that weren't supposed to be there showed up there.


I totally agree with you regarding the wording.
This should change.


A proper fix would include making sure that the disk parameters stay
empty for other disk templates where overriding is not supported.


However, if I understand correctly, the bug was introduced
in UpgradeConfig and was fixed wrong twice. The initial code
in UpgradeConfig actually filled the dict. If that's right, doesn't
this fix suffice?

Thanks,
Constantinos


This should be done at instance save time, but not at UpgradeConfig
time of course, so the core of this patch is OK, provided the extra
work is added for non-ext disk templates.

Thanks a lot,

Guido

On Mon, Dec 2, 2013 at 3:44 PM, Dimitris Aragiorgis  wrote:

Commits 5dbee5e and cce4616 fix disk upgrades concerning the params
slot. Since 2.7 the params slot should be empty and gets filled
any time it is needed.

Still, the ext template allows passing arbitrary params per disk.
These params should be saved in the config file for future use.
For instance, if we have the shared-filer provider and we
specify the shared_dir param during instance create, this param
is needed when we want to attach the disk, e.g. when
retrieving the instance info. If it gets overridden during a daemon
restart or a config reload we fail to get the instance's info.

To avoid such a failure, we set the params slot to an empty dict
only if params is not found in the first place.

Signed-off-by: Dimitris Aragiorgis 
---
  lib/objects.py |7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/lib/objects.py b/lib/objects.py
index ad9f1d7..f437fb8 100644
--- a/lib/objects.py
+++ b/lib/objects.py
@@ -828,7 +828,12 @@ class Disk(ConfigObject):
  child.UpgradeConfig()

  # FIXME: Make this configurable in Ganeti 2.7
-self.params = {}
+# Params should be an empty dict that gets filled any time needed
+# In case of ext template we allow arbitrary params that should not
+# be overrided during a config reload/upgrade.
+if not self.params or not isinstance(self.params, dict):
+  self.params = {}
+
  # add here config upgrade for this disk

@staticmethod
--
1.7.10.4








Re: [PATCH master] Gluster: Update design document

2013-12-02 Thread Constantinos Venetsanopoulos

On 12/02/2013 02:11 PM, Santi Raffa wrote:

On Mon, Dec 2, 2013 at 12:21 PM, Constantinos Venetsanopoulos
 wrote:

I see that you "Unmount" the volume e.g. inside the "Shutdown"
method. Why would you want to do that even if it was the last
disk on that physical node? Isn't it really possible that you will
need to "Mount" it again soon, once another disk gets created
on that physical node?

Yes, that is possible. Creating the first kernelspace Gluster
instance on a node would do that if it is started immediately. The same
goes for destroying the last running kernelspace Gluster instance on a
node.


OK, so what I understand from what you're saying is that you
will mount only one time for the first instance on the node and
you will unmount when removing the last instance on the node.
Is that right? And if the volume is already mounted by the admin
even before the first instance gets created, this will be idempotent?


That said:
* if we do use userspace, one of the two mounts is not needed in the
first place.


I don't think I get that. Which mount is not needed? Don't
you need to mount to be able to run the OS scripts?


* if we don't, there's still no guarantee of coupling between adding
and starting an instance (gnt-instance add ... --no-start) or between
stopping and removing an instance (gnt-instance shutdown ...)


Again, even without starting, don't you need to mount to
be able to run the OS scripts as above? Maybe with
--no-start and --no-install you could bypass it...


* If this is a problem, it's probably a problem with the callers of
Attach and Shutdown, rather than their implementation.

Am I perhaps misreading you here?

So yes, there are a few... specific, yet common cases where an
instance might be shutdown then reattached immediately thereafter. I
agree this could be optimized, but doing so correctly seems a little
difficult or otherwise out of scope here.


OK. I just wanted to have a clear view of the way you
mount and unmount the volumes and why.

Thanks a lot for the explanation,
Constantinos
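
One way to make the mount step idempotent, as asked above, is to check /proc/mounts before calling mount. A minimal sketch, with the volume spec and mountpoint as placeholder arguments and error handling reduced to the exit code:

import subprocess


def EnsureGlusterMounted(volume, mountpoint):
  """Mount a GlusterFS volume unless something is already mounted there.

  volume is a "server:/volname" spec and mountpoint a local directory;
  both are illustrative placeholders.
  """
  with open("/proc/mounts") as mounts:
    for line in mounts:
      if line.split()[1] == mountpoint:
        return  # already mounted, by Ganeti or by the admin
  rc = subprocess.call(["mount", "-t", "glusterfs", volume, mountpoint])
  if rc != 0:
    raise RuntimeError("mounting %s on %s failed with exit code %d" %
                       (volume, mountpoint, rc))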



Re: [PATCH master] Gluster: Update design document

2013-12-02 Thread Constantinos Venetsanopoulos

Hello,

On 11/28/2013 05:47 PM, Constantinos Venetsanopoulos wrote:

On 11/28/2013 04:36 PM, Santi Raffa wrote:

On Thu, Nov 28, 2013 at 1:27 PM, Constantinos Venetsanopoulos
 wrote:

disk_dev.uuid = lu.cfg.GenerateUniqueID(lu.proc.GetECId())

I stand corrected. Sorry.


No problem.


On Thu, Nov 28, 2013 at 1:27 PM, Constantinos Venetsanopoulos
 wrote:

How exactly does Ganeti "manage the mountpoints for Gluster"
then? Can you elaborate a bit on the following:

Sure: I simply call mount and umount directly. :) You can see the
proposed implementation at http://goo.gl/edqBmo
<https://github.com/badp/ganeti/blob/3842ccf41dc5adf0f9624362370e057877315d77/lib/storage/gluster.py#L207> 



OK. I get that, but what I don't understand is this:

With a really quick look at the code in gluster.py (correct me
if I'm wrong), I understand the following:

A Gluster "Volume" is the whole available space that Ganeti
sees and provisions disks (= files) on. The shared file storage
dir path points to this volume. If these are true, what I don't
understand is this:

I see that you "Unmount" the volume e.g. inside the "Shutdown"
method. Why would you want to do that even if it was the last
disk on that physical node? Isn't it really possible that you will
need to "Mount" it again soon, once another disk gets created
on that physical node?



Any comment on this one?

Thanks,
Constantinos


Thanks,
Constantinos




Re: [PATCH master] Gluster: Update design document

2013-11-28 Thread Constantinos Venetsanopoulos

On 11/28/2013 04:36 PM, Santi Raffa wrote:

On Thu, Nov 28, 2013 at 1:27 PM, Constantinos Venetsanopoulos
 wrote:

disk_dev.uuid = lu.cfg.GenerateUniqueID(lu.proc.GetECId())

I stand corrected. Sorry.


No problem.


On Thu, Nov 28, 2013 at 1:27 PM, Constantinos Venetsanopoulos
 wrote:

How exactly does Ganeti "manage the mountpoints for Gluster"
then? Can you elaborate a bit on the following:

Sure: I simply call mount and umount directly. :) You can see the
proposed implementation at http://goo.gl/edqBmo
<https://github.com/badp/ganeti/blob/3842ccf41dc5adf0f9624362370e057877315d77/lib/storage/gluster.py#L207>


OK. I get that, but what I don't understand is this:

With a really quick look at the code in gluster.py (correct me
if I'm wrong), I understand the following:

A Gluster "Volume" is the whole available space that Ganeti
sees and provisions disks (= files) on. The shared file storage
dir path points to this volume. If these are true, what I don't
understand is this:

I see that you "Unmount" the volume e.g. inside the "Shutdown"
method. Why would you want to do that even if it was the last
disk on that physical node? Isn't it really possible that you will
need to "Mount" it again soon, once another disk gets created
on that physical node?

Thanks,
Constantinos


Re: [PATCH master] Gluster: Update design document

2013-11-28 Thread Constantinos Venetsanopoulos

Hello,

On 11/28/2013 01:45 PM, Santi Raffa wrote:

On Thu, Nov 28, 2013 at 12:36 PM, Constantinos Venetsanopoulos
 wrote:

what is this {id} here? The disk's index? Would it make
sense to include the disk's UUID too?

There is no disk UUID afaik.


In GenerateDiskTemplate just before returning:

disk_dev.uuid = lu.cfg.GenerateUniqueID(lu.proc.GetECId())


The disk's logical name, otoh, (as built
in GenerateDiskTemplate) is going to have the instance's UUID in it -
much like DRBD does. (See [0])


This was done because disks didn't have a UUID of their own.
I guess now that they do, it is maybe better to use that for
the logical name before the actual disk index. But this can be
a separate patch changing that for all templates.

Also you may want to include _DISK_TEMPLATE_NAME_PREFIX
for uniformity with 'rbd' and 'ext', although these can go
away altogether, now that the disk template type has
gone down to the disk level too.


Regarding Gluster, will it be possible for the admin to
setup the Gluster cluster the way he/she thinks fits best,
mount it on the Ganeti nodes, and just tell Ganeti to use it?

The original design document covered both gluster support and gluster
deployment by Ganeti, as that was the scope of the (failed) GSoC
project. The deployment projects, as far as I know, have been dropped
so what you explained above is precisely how Ganeti is supporting
Gluster.


OK. So, I guess I misunderstood something here.
How exactly does Ganeti "manage the mountpoints for Gluster"
then? Can you elaborate a bit on the following:


Also, since Ganeti manages the mountpoints for Gluster, we can also be
a little smart with them and do things such as only creating as many
mountpoints - and therefore, connections to the actual Gluster demons
- as necessary, or even allowing you to have a different kind of
sharedfile and gluster at the same time.




Thanks for your time and attention!


No problem. Thanks for explaining.

Regards,
Constantinos


[0]: http://goo.gl/ntmllC
<https://github.com/badp/ganeti/commit/02a7846f6169b4f9998f7bbb5603c0683314fff4#diff-cd9412e50abfaec548a7b7f6a9e93d4cL460>





Re: [PATCH master] Gluster: Update design document

2013-11-28 Thread Constantinos Venetsanopoulos

On 11/27/2013 07:37 PM, Santi Raffa wrote:

The main reason is enabling userspace access, which requires
gluster-specific knowledge in the codebase. Of course, given the
current architecture of Ganeti, this also means presenting a
kernelspace interface through the mount.glusterfs-provided mountpoint
in order for OS installer scripts to work out their magic.


OK. If you plan to implement userspace access, then yes,
a new template is required.


Additionally, since a gluster volume is essentially a hashmap over the
full file path, this creates trouble with the naming scheme sharedfile
uses ({instance fqdn}/disk.{id}); folders need to be replicated across
all bricks, and if you have dozens of hundreds of those it doesn't
scale very well. For Gluster I'm picking instead ganeti/{instance
uuid}.{id};


what is this {id} here? The disk's index? Would it make
sense to include the disk's UUID too?


  given that the files are shared across n nodes, you have
1/n times the problems that enumerating large directories might have.
See also [0]. I don't know if the assumptions that are valid for
Gluster are general enough to warrant a change in how sharedfile works
(including the need to handle current customers of the sharedfile disk
template).


Ack.


Also, since Ganeti manages the mountpoints for Gluster, we can also be
a little smart with them and do things such as only creating as many
mountpoints - and therefore, connections to the actual Gluster demons
- as necessary,


This sounds good, but I'm a bit worried if we are violating
Ganeti's principal design decision to decouple the external
systems from Ganeti itself and let the admin decide on
how to deploy them. If I remember correctly from a similar
discussion about RBD and automatic configuration of
RADOS by Ganeti, there we had concluded that Ganeti
could be configured to either setup RADOS itself, or use
an existing pre-provisioned RADOS cluster.

Regarding Gluster, will it be possible for the admin to
setup the Gluster cluster the way he/she thinks fits best,
mount it on the Ganeti nodes, and just tell Ganeti to use it?


or even allowing you to have a different kind of
sharedfile and gluster at the same time.


Regarding this point, with which I agree completely, I think
that this is an orthogonal problem to Gluster implementation.
Ganeti should be extended with a generic mechanism to allow
multiple 'sharedfile' directories to be setup, and then defined
during instance or disk creation with a fall back on a default
'sharedfile' directory.

Let's assume one wants to use CephFS in the future or another
shared filesystem with Ganeti. And wants to have more than one
of them enabled simultaneously (e.g., have a CephFS dir, a
GlusterFS dir, and an NFS dir). It would be nice for Ganeti to
provide a mechanism that stores multiple 'sharedfile' paths. It
boosts flexibility with relatively minor effort. What do you think?


Finally, given that mount.glusterfs (at least in the version shipped
in Debian Wheezy, 3.2.7) is in my experience not exactly a very well
behaved program (exit status of 0 no matter what, syntax message
presented on network errors, etc.), I've written a little code making
sure that the mount point did succeed - and if it didn't, why it
mightn't have, including a rather nasty corner case I accidentally
stumbled upon :)


Ack.


Let me know if this doesn't really convince you.


Yes. Your arguments make sense, since you plan to support
userspace access for Gluster.

Thanks a lot for the in-depth explanation.

Kind Regards,
Constantinos


--RS

[0]: http://joejulian.name/blog/dht-misses-are-expensive/




Re: [PATCH master] Gluster: Update design document

2013-11-27 Thread Constantinos Venetsanopoulos

Hello,

On 11/27/2013 06:01 PM, Santi Raffa wrote:

I'm happy to drop the gluster-deploy part of the document altogether.
I also tried to be a little more specific on what exactly the
changes are here.

It's kind of annoying to work with tables in reStructuredText, but
here we go. Interdiff:

--- a/doc/design-glusterfs-ganeti-support.rst
+++ b/doc/design-glusterfs-ganeti-support.rst
@@ -67,17 +67,32 @@ Ganeti has a number of storage types that abstract over disk templates. This
  matters mainly in terms of disk space reporting. Gluster support is improved by
  a rethinking of how disk templates are assigned to storage types in Ganeti.

-+--+-+--+-+
-|Disk template | Before  | After| Characteristics |
-+==+=+==+=+
-| File | File| File storage | - ``gnt-node list`` enabled |
-|  | storage | type | - ``gnt-node list-storage`` enabled |
-|  | type|  | - ``hail`` uses disk information|
-+--+ +--+-+
-| Shared file  | | Shared file  | - ``gnt-node`` list disabled|
-+--+-+ storage type | - ``gnt-node`` list-storage enabled |
-| Gluster (new)| N/A | (new)| - ``hail`` ignores disk information |
-+--+-+--+-+
+This is the summary of the changes:
+
++--+-+-+--+
+| Disk | Current | New | Does it report storage information to... |
+| template | storage | storage +-++---+
+|  | type| type| ``gnt-node  | ``gnt-node | iallocator|
+|  | | | list``  | list-storage`` |   |
++==+=+=+=++===+
+| File | File| File| Yes.| Yes.   | Yes.  |
+|  | | | ||   |
+|  | | | ||   |
++--+-+-+-++---+
+| Shared file  | File| Shared  | No. | Yes.   | No.   |
++--+-+ file| ||   |
+| Gluster (new)| N/A | (new)   | ||   |
++--+-+-+-++---+
+| RBD (for | RBD   | No. | No.| No.   |
+| reference)   |   | ||   |
++--+---+-++---+
+
+Essentially, like RBD, we should not report storage information to gnt-node list
+and hail. The only sensible way to do so right now is by claiming that storage
+reporting for the relevant storage type is not implemented. To do so without
+breaking the File disk template, we must introduce a new storage type.


Please forgive me if I'm missing something here, but if we
are not using the BD translator or the libgfapi interface and
we just use Gluster as a shared filesystem, why do we need
a separate disk template?

Couldn't we just mount Gluster and use the 'sharedfile'
template?

Is it just for storage reporting? And if so, that seems more
like a problem to 'file' and 'sharedfile' template having the same
type rather than a Gluster implementation issue. What do
you think?

Kind Regards,
Constantinos





  Like RBD,
+it does not claim to support disk reporting. However, we can still make an
+effort of reporting stats to ``gnt-node list storage``.

  The rationale is simple. For shared file and gluster storage, disk space is not
  a function of any one node. If storage types with disk space reporting are used,
@@ -99,32 +114,6 @@ This is expected and consistent with behaviour in RBD. A future improvement may
   involve not displaying those columns at all in the command output unless
   per-node disk resources are actually in use.

-Gluster deployment by Ganeti
-
-
-Basic GlusterFS deployment is relatively simple:
-
-1. Create bricks on nodes 1..n
-2. Mount them to /export/brick1/gluster
-3. On nodes 2..n, run `gluster peer probe`
-4. On node 1, run `gluster volume create`, enumerating all bricks.
-5. On node 1, run `gluster volume start`
-
-From here onwards, however, the sky's the limit. Gluster has support for
-translators, essentially "pure transformations" of simpler volumes. The
-`distribute` translator, for example, takes arbitrary "subvolumes" and
-distributes files amongst them, offering as output the union "subvolume".
-Gluster uses this translator by default, but there are

Re: [PATCH master 00/18] Hotplug support for KVM hypervisor

2013-10-18 Thread Constantinos Venetsanopoulos

Hello,

On 10/18/2013 06:45 PM, Dimitris Aragiorgis wrote:

* Thomas Thrainer  [2013-10-18 15:40:08 +0200]:


On Fri, Oct 18, 2013 at 3:10 PM, Dimitris Aragiorgis wrote:


* Thomas Thrainer  [2013-10-18 14:29:18 +0200]:


FYI: I'm testing with QEMU 0.12.5.


And how does the hotplug code run? There is a check in VerifyHotplugSupport()
for the qemu version as returned by qemu-monitor, and if it is < 1.0 it
raises HotplugError and the RPC fails. Isn't that correct?


You are right! The version detection code is broken, and hotplugging
shouldn't even be tried on this machine.
The problem is that the QEMU version is parsed using a regex, which
captures strings rather than integers, so ("0", "12") < (1, 0) is False all
of a sudden!
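
The failure mode is easy to reproduce in isolation; a small sketch of the bug and the obvious fix (the regex and the helper name are illustrative, not the actual hv_kvm code):

import re

QEMU_VERSION_RE = re.compile(r"(\d+)\.(\d+)")


def ParseQemuVersion(text):
  """Return the (major, minor) QEMU version as a tuple of ints.

  re.match captures strings, and under Python 2 any string compares
  greater than any integer, so ("0", "12") < (1, 0) is False -- exactly
  the broken check described above.  Casting to int restores the
  intended ordering.
  """
  major, minor = QEMU_VERSION_RE.match(text).groups()
  return (int(major), int(minor))


assert ParseQemuVersion("0.12.5") < (1, 0)       # hotplug must be refused
assert not ParseQemuVersion("1.4.0") < (1, 0)    # new enough for hotplug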


Nice catch! Sorry for not finding it in the first place. To be honest we
haven't been playing with qemu versions older than 1.1 for more than a year.

Well, psomas found that there was an issue with drive_del back in 0.12 [1].
I guess this is what you are hitting.


Indeed, it currently seems that you are hitting this bug.

However the retry problem is orthogonal to this qemu bug or
hotplug, as we discussed during GanetiCon.

We have also been hitting the same problem in production
under heavy load, with drbd disks due to blkid (without hotplug,
and with newer qemu versions).

So, I think the retry before blockdev removal should be patched in
anyway, even for previous stable versions (2.8, 2.9), before
applying the hotplug patch series for 2.10.

Kind Regards,
Constantinos



I'll fix that in the corresponding patch and make the hotplug QA
configurable, otherwise our automatic builds would fail.

Hopefully we'll manage to set up a KVM cluster with a new enough version of
KVM soonish, so we actually can test the code.


Thanks a lot! I really appreciate it.

dimara

[1] http://lists.gnu.org/archive/html/qemu-devel/2010-11/msg00592.html


Cheers,
Thomas



Hotplugging works fine, only the disk removal fails. I'd prefer to support
older QEMU versions too (by retrying the remove operation), because
spuriously failing removes are a problem in other circumstances as well (as
far as I can remember sometimes during instance removes, where the
hypervisor keeps the disk open even after the instance is already down). So
we could fix multiple problems at once.


In this case I think you are hitting an existing bug (before hotplug
implementation) which is triggered when you remove a disk from a running
instance. blockdev_remove() RPC fails because lvremove fails too,
_RemoveDisk() prints a warning and continues removing it from config.
So you have orphan volumes.


Cheers,
Thomas


On Thu, Oct 17, 2013 at 9:49 PM, Guido Trotter wrote:

qq: might it be that our qemu version is too old, and this should just
be restricted to newer versions?

Guido

On Thu, Oct 17, 2013 at 12:40 PM, Thomas Thrainer
Hi,

Unfortunately I'm still hitting the same problem with the patch.
udev rules are still deactivated such that blkid does not run (I tested
with `lvcreate -L 100G -n test xenvg && for i in {0..100}; do lsof
/dev/xenvg/test; done`).
The instances I'm testing with are not under heavy load, so I don't know
what could keep the disk busy. I suspect it's QEMU, as there is not much
else left... Please note that there are ~2 seconds between the hot add and
the hot remove, so hopefully everything which happens during the add has
already finished once the hot remove takes place.
I guess the next best thing we can do is to actually retry to remove the
disk multiple times. I wouldn't rely on `lsof`, but rather just try the
removal and retry if it fails. You might want to look into retry.py for a
helper function for this.

Cheers,
Thomas
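
The shape of that retry is roughly the following; this is a generic sketch, not Ganeti's own retry.py helper, and the attempt count and delay are arbitrary:

import time


def RetryRemoval(remove_fn, attempts=5, delay=1.0):
  """Call remove_fn until it succeeds or the attempts run out.

  remove_fn stands in for whatever tears the block device down
  (lvremove, drbdsetup down, ...); this is only the generic shape of
  the retry loop being suggested.
  """
  last_err = None
  for _ in range(attempts):
    try:
      return remove_fn()
    except EnvironmentError as err:  # device still held open
      last_err = err
      time.sleep(delay)
  raise last_err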


On Thu, Oct 17, 2013 at 5:12 PM, Thomas Thrainer  wrote:

Sorry, I haven't seen your patch yet. I'll test it right away, and report
back.

Thanks,
Thomas


On Thu, Oct 17, 2013 at 5:11 PM, Thomas Thrainer  wrote:

Hi,

Thanks for the updates. I think the best way would be to make sure that
hot-removing a disk actually only reports back as successful once the
block device is no longer kept open by KVM. If drive_del after device_del
provides just that, that would be perfect!

We can postpone the "retry lvremove / drbdsetup down" change until after
this patch series, if we can fix the hotplug problem with this change.

Thanks a lot,
Thomas


On Thu, Oct 17, 2013 at 2:15 PM, Dimitris Aragiorgis  wrote:

Another UPDATE (thanks to psomas)

If we run drive_del just after device_del, it seems to work as expected!

The VM reports IO error and dmsetup/lvremove succeed even if we dd
inside the VM.

I'll do some more tests and send a one-liner interdiff asap (if tests pass)

Thanks,
dimara

PS: sorry for spamming the list..

* Dimitris Aragiorgis  [2013-10-17 14:45:13 +0300]:

UPDATE!

I hot-added a drbd disk, went inside the VM and

dd if=/dev/zero of=/dev/vdb

In master I hot-removed the device.

Re: 2.10 freeze coming

2013-10-07 Thread Constantinos Venetsanopoulos

Hello Thomas,

On 10/07/2013 11:58 AM, Thomas Thrainer wrote:

Hi all,

our time for freezing 2.10 is approaching.

By October 15th all important/big changes for 2.10 should be reviewed 
and integrated, and all minor changes sent out, and we'll branch the 
stable-2.10 branch.


Please let me know if you still have big features pending which you 
plan to integrate in 2.10, and please hold back patch series which are 
meant to be integrated in 2.11.




The hotplug feature is pending for 2.10. It is already reviewed
by Guido, but was missing some tests and qa. Dimitris has
already implemented them and was about to send it last week,
but the patch broke due to the latest changes with respect to RBD
access mode inside hv_kvm.py.

We are working on it right now and will send it as soon as
possible.

Thanks,
Constantinos


Thanks,
Thomas





Re: Fwd: Extend hail interface

2013-08-09 Thread Constantinos Venetsanopoulos

On 08/09/2013 12:18 PM, Constantinos Venetsanopoulos wrote:

Hi,

On 08/08/2013 08:38 PM, Iustin Pop wrote:

On Thu, Aug 08, 2013 at 05:32:26PM +0200, Guido Trotter wrote:

+list


-- Forwarded message --
From: Guido Trotter 
Date: Thu, Aug 8, 2013 at 5:32 PM
Subject: Re: Extend hail interface
To: Spyros Trigazis 


On Thu, Aug 8, 2013 at 4:48 PM, Spyros Trigazis  wrote:

On 8 Aug 2013, at 5:22 PM, Michele Tartara wrote:

On Thu, Aug 8, 2013 at 4:09 PM, Spyros Trigazis  wrote:


On 8 Aug 2013, at 4:51 PM, Michele Tartara wrote:

On Thu, Aug 8, 2013 at 1:22 PM, Spyros Trigazis  wrote:

Hi Michele,

I've seen the implementation of iallocator and how masterd makes RPCs to
all nodes. I think it is better to use noded to collect data from the
local mond, as we discussed during our last hangout.


As I said the other day, I don't really see a reason for reading the data
from MonD through the node daemon, instead of directly accessing it from
where you need it.

It's not impossible (and that's why I didn't just say no), but it adds a
new level of indirection, thus adding more delay and an extra point of
failure in between. Therefore, my question is: why do you think it's better
to route the query through NodeD? Unless there is a really good reason, I
think just contacting MonD directly could be better.


I don't think I understand what you mean by saying "contact MonD
directly". Should we contact MonD directly from Hail or MasterD?


With "directly" I mean contacting MonD using its own protocol (that is,
just an HTTP GET request at the right address) and getting the JSON as a
result, as opposed to contacting MonD through NodeD.

Given that hail operates taking its input from a file, I guess this should
be done by who's generating that file and passing it to hail, which should
be MasterD.


I was confused because you mentioned Network.Curl which is a Haskell
library and MasterD is written in python. So, I will query all monds
included in the nodes list from lib/masterd/iallocator.py right after
the RPCs and before constructing the final dict. Since this part of the
code is written in Python, I will use PyCurl as you already do in
lib/rapi/client.py.


I wonder if this makes sense at all: wouldn't it be better to contact
mond from hail?
rationale: there will need to be code in htools to contact mond
anyway, in order for hbal to be able to do the right thing, so it
makes sense for hail to also contact mond directly, no?

I haven't seen any doc about how hail/hbal will be changed (or I just
missed it), but this proposed change seems to have some layering
violation in it?


I am probably missing the same information here, too. What is
the exact plan for htools and their interaction with MonD?

If we move the querying functionality in htools, apart from
layering violations, doesn't that also lead to duplicate querying
code if someone wants to implement let's say their own
iallocator?



Any comments on the above?

Thanks,
Constantinos


In 2.7, masterd/confd have the authoritative "view" on the cluster
state. In the future, if hail talks directly to MonD, but jobs talk to
NodeD, won't that lead to inconsistencies? How will this be managed?


From what I've understood so far, and looking at the new
architecture figure again, MonD will query ConfDR for
information, whereas now MonD has its own Data Collectors,
and provides additional information to what's in config.data.

I also see the same problem as mentioned by Iustin. Who
synchronizes the cluster state and who has the authoritative
view?

Thanks,
Constantinos


Just a side note: FYI, it doesn't seem like it's possible to follow
Ganeti development just from this list :/ If there are hang-outs, it
might help people to post a short summary?

just making some noise,
iustin






Re: Fwd: Extend hail interface

2013-08-09 Thread Constantinos Venetsanopoulos

Hi,

On 08/08/2013 08:38 PM, Iustin Pop wrote:

On Thu, Aug 08, 2013 at 05:32:26PM +0200, Guido Trotter wrote:

+list


-- Forwarded message --
From: Guido Trotter 
Date: Thu, Aug 8, 2013 at 5:32 PM
Subject: Re: Extend hail interface
To: Spyros Trigazis 


On Thu, Aug 8, 2013 at 4:48 PM, Spyros Trigazis  wrote:

On 8 Aug 2013, at 5:22 PM, Michele Tartara wrote:

On Thu, Aug 8, 2013 at 4:09 PM, Spyros Trigazis  wrote:


On 8 Aug 2013, at 4:51 PM, Michele Tartara wrote:

On Thu, Aug 8, 2013 at 1:22 PM, Spyros Trigazis 
wrote:

Hi Michele,

I've seen the implementation of iallocator and how masterd makes RPCs to
all nodes. I think it is better to use noded to collect data from the
local mond, as we discussed during our last hangout.


As I said the other day, I don't really see a reason for reading the data
from MonD through the node daemon, instead of directly accessing it from
where you need it.

It's not impossible (and that's why I didn't just say no), but it adds a
new level of indirection, thus adding more delay and an extra point of
failure in between. Therefore, my question is: why do you think it's better
to route the query through NodeD? Unless there is a really good reason, I
think just contacting MonD directly could be better.


I don't think I understand what you mean by saying "contact MonD
directly". Should we contact MonD directly from Hail or MasterD?


With "directly" I mean contacting MonD using its own protocol (that is,
just an HTTP GET request at the right address) and getting the JSON as a
result, as opposed to contacting MonD through NodeD.

Given that hail operates taking its input from a file, I guess this should
be done by who's generating that file and passing it to hail, which should
be MasterD.


I was confused because you mentioned Network.Curl which is a Haskell
library and MasterD is written in python. So, I will query all monds
included in the nodes list from lib/masterd/iallocator.py right after
the RPCs and before constructing the final dict. Since this part of the
code is written in Python, I will use PyCurl as you already do in
lib/rapi/client.py.


I wonder if this makes sense at all: wouldn't it be better to contact
mond from hail?
rationale: there will need to be code in htools to contact mond
anyway, in order for hbal to be able to do the right thing, so it
makes sense for hail to also contact mond directly, no?

I haven't seen any doc about how hail/hbal will be changed (or I just
missed it), but this proposed change seems to have some layering
violation in it?


I am probably missing the same information here, too. What is
the exact plan for htools and their interaction with MonD?

If we move the querying functionality in htools, apart from
layering violations, doesn't that also lead to duplicate querying
code if someone wants to implement let's say their own
iallocator?


In 2.7, masterd/confd have the authoritative "view" on the cluster
state. In the future, if hail talks directly to MonD, but jobs talk to
NodeD, won't that lead to inconsistencies? How will this be managed?


From what I've understood so far, and looking at the new
architecture figure again, MonD will query ConfDR for
information, whereas now MonD has its own Data Collectors,
and provides additional information to what's in config.data.

I also see the same problem as mentioned by Iustin. Who
synchronizes the cluster state and who has the authoritative
view?

Thanks,
Constantinos


Just a side note: FYI, it doesn't seem like it's possible to follow
Ganeti development just from this list :/ If there are hang-outs, it
might help people to post a short summary?

just making some noise,
iustin
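
For reference, "contacting MonD directly" really is just an HTTP GET returning JSON; a minimal sketch, where the port and path are assumptions for illustration rather than the documented MonD interface:

import json

try:
  from urllib.request import urlopen   # Python 3
except ImportError:
  from urllib2 import urlopen          # Python 2


def QueryMond(node, port=1815, path="/1/report/all"):
  """Fetch a MonD report with a plain HTTP GET and decode the JSON.

  The port and path are assumptions used for illustration; the point
  is only that querying MonD is a single GET returning JSON, with no
  NodeD hop in between.
  """
  url = "http://%s:%d%s" % (node, port, path)
  return json.loads(urlopen(url).read().decode("utf-8"))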




Re: [PATCH Master] Implemented RBD userspace support for KVM

2013-07-26 Thread Constantinos Venetsanopoulos

On 07/26/2013 12:16 PM, Thomas Thrainer wrote:




On Thu, Jul 25, 2013 at 9:42 AM, Constantinos Venetsanopoulos  wrote:


Hello Pulkit,

On 7/24/13 18:50 PM, Pulkit Singhal wrote:
> Introduced two new disk params - access, kvm_dev_path for storing access
> type(userspace/kernelspace) and RBD device path used by KVM
> respectively.
>
> Signed-off-by: Pulkit Singhal 
> ---
>  lib/constants.py |   10 --
>  lib/hypervisor/hv_kvm.py |9 +++--
>  lib/storage/bdev.py  |4 
>  3 files changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/lib/constants.py b/lib/constants.py
> index 35b47ae..bbb9e1b 100644
> --- a/lib/constants.py
> +++ b/lib/constants.py
> @@ -1224,6 +1224,7 @@ LDP_DELAY_TARGET = "c-delay-target"
>  LDP_MAX_RATE = "c-max-rate"
>  LDP_MIN_RATE = "c-min-rate"
>  LDP_POOL = "pool"
> +LDP_ACCESS = "access"
>  DISK_LD_TYPES = {
>LDP_RESYNC_RATE: VTYPE_INT,
>LDP_STRIPES: VTYPE_INT,
> @@ -1240,6 +1241,7 @@ DISK_LD_TYPES = {
>LDP_MAX_RATE: VTYPE_INT,
>LDP_MIN_RATE: VTYPE_INT,
>LDP_POOL: VTYPE_STRING,
> +  LDP_ACCESS: VTYPE_STRING,
>}
>  DISK_LD_PARAMETERS = frozenset(DISK_LD_TYPES.keys())
>
> @@ -1262,6 +1264,7 @@ DRBD_MAX_RATE = "c-max-rate"
>  DRBD_MIN_RATE = "c-min-rate"
>  LV_STRIPES = "stripes"
>  RBD_POOL = "pool"
> +RBD_ACCESS = "access"
>  DISK_DT_TYPES = {
>DRBD_RESYNC_RATE: VTYPE_INT,
>DRBD_DATA_STRIPES: VTYPE_INT,
> @@ -1280,6 +1283,7 @@ DISK_DT_TYPES = {
>DRBD_MIN_RATE: VTYPE_INT,
>LV_STRIPES: VTYPE_INT,
>RBD_POOL: VTYPE_STRING,
> +  RBD_ACCESS: VTYPE_STRING,
>}
>
>  DISK_DT_PARAMETERS = frozenset(DISK_DT_TYPES.keys())
> @@ -2225,7 +2229,8 @@ DISK_LD_DEFAULTS = {
>LD_FILE: {},
>LD_BLOCKDEV: {},
>LD_RBD: {
> -LDP_POOL: "rbd"
> +LDP_POOL: "rbd",
> +LDP_ACCESS: "userspace",
>  },
>LD_EXT: {},
>}
> @@ -2260,7 +2265,8 @@ DISK_DT_DEFAULTS = {
>DT_SHARED_FILE: {},
>DT_BLOCK: {},
>DT_RBD: {
> -RBD_POOL: DISK_LD_DEFAULTS[LD_RBD][LDP_POOL]
> +RBD_POOL: DISK_LD_DEFAULTS[LD_RBD][LDP_POOL],
> +RBD_ACCESS: DISK_LD_DEFAULTS[LD_RBD][LDP_ACCESS],
>  },
>DT_EXT: {},
>}
> diff --git a/lib/hypervisor/hv_kvm.py b/lib/hypervisor/hv_kvm.py
> index 4bd8a9e..5daa2c2 100644
> --- a/lib/hypervisor/hv_kvm.py
> +++ b/lib/hypervisor/hv_kvm.py
> @@ -1136,8 +1136,13 @@ class KVMHypervisor(hv_base.BaseHypervisor):
>  if needs_boot_flag and disk_type != constants.HT_DISK_IDE:
>boot_val = ",boot=on"
>
> -  drive_val = "file=%s,format=raw%s%s%s" % (dev_path, if_val, boot_val,
> -cache_val)
> +  if cfdev.params['access'] == 'userspace':

This should be cfdev.params[constants.LDP_ACCESS], since
we don't want to hardcode strings. Also, 'userspace' and 'kernelspace'
should also become constants.

> +kvm_dev_path = cfdev.params['kvm_dev_path']
> +drive_val = "file=%s,format=raw%s%s%s" % (kvm_dev_path,
> +if_val, boot_val, cache_val)
> +  else:
> +drive_val = "file=%s,format=raw%s%s%s" % (dev_path, if_val,
> +  boot_val, cache_val)
>kvm_cmd.extend(["-drive", drive_val])
>
>  #Now we can specify a different device type for CDROM devices.
> diff --git a/lib/storage/bdev.py b/lib/storage/bdev.py
> index 932bc98..82a6be7 100644
> --- a/lib/storage/bdev.py
> +++ b/lib/storage/bdev.py
> @@ -999,6 +999,10 @@ class RADOSBlockDevice(base.BlockDev):
>raise ValueError("Invalid configuration data %s" % str(unique_id))
>
>  self.driver, self.rbd_name = unique_id
> +rbd_pool = params[constants.LDP_POOL]
> +
> +if params['access'] == 'userspace':

same here for 'access' and 'userspace

> +  params['kvm_dev_path'] = "rbd:" + rbd_pool + "/" + self.rbd_name

I'm not sure 'kvm_dev_path' should be passed through params.
Maybe the disk needs to get a new dedicated attribute 

Re: Thoughts on OpenvSwitch and KVM

2013-07-25 Thread Constantinos Venetsanopoulos

On 07/25/2013 03:43 PM, Guido Trotter wrote:

On Thu, Jul 25, 2013 at 2:25 PM, Constantinos Venetsanopoulos
 wrote:

Hi Sebastian,


On 07/25/2013 03:20 PM, Sebastian Gebhard wrote:

Hi Constantinos,

it sounds very good to abstract this even more. I just have one question:

Am 25.07.13 14:01, schrieb Constantinos Venetsanopoulos:

If hypervisor=kvm:
   run kvm-ifup:
   Inside kvm-ifup:
 if custom script (kvm-vif-bridge) is found run that
 else
   run default actions

If hypervisor=xen:
run vif-ganeti
Inside vif-ganeti:
  if custom script (vif-custom) is found run that
  else
   run default actions


Where or how would you want to define such a custom script? Or where would
you check if there is one?


AFAIK, this already happens inside kvm-ifup and vif-ganeti.
kvm-ifup searches for kvm-vif-bridge and vif-ganeti
searches for vif-custom. Dimitris knows the exact details
on this.


Historically this was a way to override the script. This too should be
replaced with a more modular approach to allow overriding single parts
and/or inserting custom code at certain hook points, instead of an
all-or-nothing approach.


Ack. Of course, that would be even better.

Thanks,
Constantinos


Thanks,

Guido




Re: Thoughts on OpenvSwitch and KVM

2013-07-25 Thread Constantinos Venetsanopoulos

Hi Sebastian,

On 07/25/2013 03:20 PM, Sebastian Gebhard wrote:

Hi Constantinos,

it sounds very good to abstract this even more. I just have one question:

Am 25.07.13 14:01, schrieb Constantinos Venetsanopoulos:

If hypervisor=kvm:
  run kvm-ifup:
  Inside kvm-ifup:
if custom script (kvm-vif-bridge) is found run that
else
  run default actions

If hypervisor=xen:
   run vif-ganeti
   Inside vif-ganeti:
 if custom script (vif-custom) is found run that
 else
  run default actions 


Where or how would you want to define such a custom script? Or where 
would you check if there is one?




AFAIK, this already happens inside kvm-ifup and vif-ganeti.
kvm-ifup searches for kvm-vif-bridge and vif-ganeti
searches for vif-custom. Dimitris knows the exact details
on this.

Thanks,
Constantinos


Thanks,
Sebastian




Re: Thoughts on OpenvSwitch and KVM

2013-07-25 Thread Constantinos Venetsanopoulos

Hello everybody,

On 07/25/2013 02:47 PM, Guido Trotter wrote:

On Thu, Jul 25, 2013 at 1:43 PM, Sebastian Gebhard  wrote:

Hi there,

[resend again, as it didn't arrive on the list]

yesterday I gave the OpenvSwitch implementation for KVM some thoughts.
Especially the vlan stuff. I came across Dimitris NIC refactoring [1] which
introduces common functions for KVM and XEN networking using
tools/net-common.in for common functionality which is used by
tools/kvm-ifup.in and tools/vif-ganeti.in respectively.

I think this approach is very good, because it keeps networking as
hypervisor indepentend as it can. Now my implementation of vlan uses
vif-openvswitch which comes with XEN 4.3 and is not compatible with
vif-ganeti right now.

vif-openvswitch requires bridge=switch1.onevlan:anothervlan
whereas
vif-ganeti requires bridge=switch1 (which will crash when giving it
switch1.2:3:4 or something)

I would suggest to merge the functionality of vif-openvswitch into
vif-ganeti. This way users stay independent of whether they wanna use
vif-ganeti or vif-openvswitch.


+1. Not only, but also in net-common, so it can be used under kvm as well.



+1! That's exactly what we were discussing internally with Dimitris
this week and were ready to propose. Specifically, I'd propose a
workflow like the following:

If hypervisor=kvm:
  run kvm-ifup:
  Inside kvm-ifup:
if custom script (kvm-vif-bridge) is found run that
else
  run default actions

If hypervisor=xen:
   run vif-ganeti
   Inside vif-ganeti:
 if custom script (vif-custom) is found run that
 else
  run default actions

In the second workflow the 'default actions' will
include running the actual vif-openvswitch script,
if mode=openvswitch

Anyway, I think this is a good direction and keeps the hypervisor
abstraction. I'm sure Dimitris would like to comment too, since
he knows the exact details.

Just a note here. I think we should probably rename
those scripts at some point, so that they have more
consistent naming. One quick suggestion would be:

kvm-ifup -> kvm-vif-ganeti
kvm-vif-bridge -> kvm-vif-custom
vif-ganeti -> xen-vif-ganeti
vif-custom -> xen-vif-custom
net-common -> vif-common

Thanks,
Constantinos


What are your thoughts about that? Do you see any issues I may have
overlooked?

Also, I would base my future implementations (QoS for example) on this
structure.


Indeed, it would be good if the implementation was independent of
vif-openvswitch vs vif-ganeti, and of course compatible with all the
ganeti hypervisors, and not dependent on Xen.

Thanks,

Guido




Re: [PATCH] Add detail plan to gluster for XEN, KVM and such VM

2013-07-25 Thread Constantinos Venetsanopoulos
On 7/25/13 11:39 AM, Michele Tartara wrote:
> On Thu, Jul 25, 2013 at 10:20 AM, Constantinos Venetsanopoulos
>  wrote:
>
> Hello,
>
> On 7/25/13 11:15 AM, Michele Tartara wrote:
> > On Wed, Jul 24, 2013 at 7:14 PM, Weiwei Jia  wrote:
> >
> > This patch adds detailed plans to support Gluster in XEN,
> QEMU/KVM
> > and such VM, in a way that is similar to sharedfile disk
> template.
> >
> > ---
> >  doc/design-glusterfs-ganeti-support.rst |   37
> > ++-
> >  1 file changed, 36 insertions(+), 1 deletion(-)
> >
> > diff --git a/doc/design-glusterfs-ganeti-support.rst
> > b/doc/design-glusterfs-ganeti-support.rst
> > index a26074e..896c981 100644
> > --- a/doc/design-glusterfs-ganeti-support.rst
> > +++ b/doc/design-glusterfs-ganeti-support.rst
> > @@ -74,7 +74,7 @@ Now, there are two specific enhancements:
> >works with VM images on Gluster volumes.
> >
> >  Proposed implementation
> > 
> > +===
> >
> >  QEMU/KVM includes support for GlusterFS and Ganeti could
> support
> > GlusterFS
> >  through QEMU/KVM. However, this way could just let VMs of
> > QEMU/KVM use GlusterFS
> > @@ -92,6 +92,41 @@ VMs), RBD disk template would be a good
> > reference. The first part would be finis
> >  at first and then the second part would be completed, which is
> > based on the first
> >  part.
> >
> > +Gluster for XEN and QEMU/KVM VM (similar to sharedfile)
> > +---
> > +This part adds support for Gluster for XEN VM, QEMU/KVM VM and
> > such, in a way
> > +which is similar to sharedfile disk template. It would be
> > finished with two
> > +steps as follows.
> > +
> > +- Mount server gluster volume to local directory of each
> instance
> > in the
> > +  cluster.
> > +- Add instance with the disk installed under the local
> directory
> > that is
> > +  based on gluster backend storage.
> > +
> > +The second step is easy to realize because sharedfile disk
> > template is a good
> > +reference (see commit
> 4b97f9024376b7bd5db9fce0aba50e29ed842fa2 and
> > +design-shared-storage.rst for details). The first step would be
> > completed as
> > +follows.
> > +
> > +- Add "--disk-parameters gluster:hostname=node1,volname=ganeti"
> > parameters
> > +  to ``gnt-instance add`` command.
> > +- Run "gnt-instance add --disk-parameters
> > hostname=node1,volname=ganeti
> >   instance.example.com", that would
> > get the disk parameters, mount Gluster volume
> > +  to local Gluster storage dir, save (hostname, node1),
> (volname,
> > ganeti) into
> > +  ganeti config file, and then create instance with Gluster
> > backend storage.
> >
>
> If I understood correctly from Guido's comments in the
> previous thread, we are going for the '-S' option for both
> Gluster/RBD and not --disk-parameters. Am I missing something?
>
>
> Yes, you missed the IRC discussion yesterday afternoon :-)
>
> For setting the disk parameters cluster-wide, -S is the way to go, and
> it will probably be used by gluster as well in the future for setting
> the cluster-wide default.
>
> But for the first implementation, we are just allowing the user to
> specify the parameters on an instance-by-instance basis.

Makes sense for a first iteration, but if we go for disk params, why
don't we do it at nodegroup level, which will work out of the box?
The same thing that happens now with the rbd 'pool' disk parameter.

I guess 'volname' for Gluster is exactly what 'pool' is for RBD. Right?

> So, as Guido just wrote, these are actually instance disks, and
> therefore described through --disk-parameters.

> Thanks,
> Michele
>



Re: [PATCH] Add detail plan to gluster for XEN, KVM and such VM

2013-07-25 Thread Constantinos Venetsanopoulos
Hello,

On 7/25/13 11:15 AM, Michele Tartara wrote:
> On Wed, Jul 24, 2013 at 7:14 PM, Weiwei Jia  > wrote:
>
> This patch adds detailed plans to support Gluster in XEN, QEMU/KVM
> and such VM, in a way that is similar to sharedfile disk template.
>
> ---
>  doc/design-glusterfs-ganeti-support.rst |   37
> ++-
>  1 file changed, 36 insertions(+), 1 deletion(-)
>
> diff --git a/doc/design-glusterfs-ganeti-support.rst
> b/doc/design-glusterfs-ganeti-support.rst
> index a26074e..896c981 100644
> --- a/doc/design-glusterfs-ganeti-support.rst
> +++ b/doc/design-glusterfs-ganeti-support.rst
> @@ -74,7 +74,7 @@ Now, there are two specific enhancements:
>works with VM images on Gluster volumes.
>
>  Proposed implementation
> 
> +===
>
>  QEMU/KVM includes support for GlusterFS and Ganeti could support
> GlusterFS
>  through QEMU/KVM. However, this way could just let VMs of
> QEMU/KVM use GlusterFS
> @@ -92,6 +92,41 @@ VMs), RBD disk template would be a good
> reference. The first part would be finis
>  at first and then the second part would be completed, which is
> based on the first
>  part.
>
> +Gluster for XEN and QEMU/KVM VM (similar to sharedfile)
> +---
> +This part adds support for Gluster for XEN VM, QEMU/KVM VM and
> such, in a way
> +which is similar to sharedfile disk template. It would be
> finished with two
> +steps as follows.
> +
> +- Mount server gluster volume to local directory of each instance
> in the
> +  cluster.
> +- Add instance with the disk installed under the local directory
> that is
> +  based on gluster backend storage.
> +
> +The second step is easy to realize because sharedfile disk
> template is a good
> +reference (see commit 4b97f9024376b7bd5db9fce0aba50e29ed842fa2 and
> +design-shared-storage.rst for details). The first step would be
> completed as
> +follows.
> +
> +- Add "--disk-parameters gluster:hostname=node1,volname=ganeti"
> parameters
> +  to ``gnt-instance add`` command.
> +- Run "gnt-instance add --disk-parameters
> hostname=node1,volname=ganeti
> +  instance.example.com ", that would
> get the disk parameters, mount Gluster volume
> +  to local Gluster storage dir, save (hostname, node1), (volname,
> ganeti) into
> +  ganeti config file, and then create instance with Gluster
> backend storage.
>

If I understood correctly from Guido's comments in the
previous thread, we are going for the '-S' option for both
Gluster/RBD and not --disk-parameters. Am I missing something?

Thanks,
Constantinos

> +- VerifyDisks checks if Gluster storage dir is mounted when
> creating an instance.
> +- If not mounted, ActivateDisks would mount the Gluster volume to
> node's Gluster
> +  storage dir.
> +- Check (and if needed mount) the Gluster volume from inside
> CreateDisk (a function
> +  called as part of the InstanceCreate opcode)
> +
> +In order to run above mount command correctly, the
> '--disk-parameters disk-template:
> +disk-param=value[, disk-param=value...]' option of ``gnt-instance
> add`` command
> +has to support the format of '--disk-parameters
> gluster:hostname=node1,volname=
> +ganeti'. Through this way, hostname and volume name would be
> obtained. The gluster
> +storage dir will also be obtained by adding the parameter
> '--gluster-storage-dir'
> +at the instance adding time.
> +
>  .. vim: set textwidth=72 :
>  .. Local Variables:
>  .. mode: rst
> --
> 1.7.10.4
>
>
> LGTM, thanks.
> Michele
>
>



Re: [PATCH Master] Implemented RBD userspace support for KVM

2013-07-25 Thread Constantinos Venetsanopoulos
Hello Pulkit,

On 7/24/13 18:50 PM, Pulkit Singhal wrote:
> Introduced two new disk params - access, kvm_dev_path for storing access
> type(userspace/kernelspace) and RBD device path used by KVM
> respectively.
>
> Signed-off-by: Pulkit Singhal 
> ---
>  lib/constants.py |   10 --
>  lib/hypervisor/hv_kvm.py |9 +++--
>  lib/storage/bdev.py  |4 
>  3 files changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/lib/constants.py b/lib/constants.py
> index 35b47ae..bbb9e1b 100644
> --- a/lib/constants.py
> +++ b/lib/constants.py
> @@ -1224,6 +1224,7 @@ LDP_DELAY_TARGET = "c-delay-target"
>  LDP_MAX_RATE = "c-max-rate"
>  LDP_MIN_RATE = "c-min-rate"
>  LDP_POOL = "pool"
> +LDP_ACCESS = "access"
>  DISK_LD_TYPES = {
>LDP_RESYNC_RATE: VTYPE_INT,
>LDP_STRIPES: VTYPE_INT,
> @@ -1240,6 +1241,7 @@ DISK_LD_TYPES = {
>LDP_MAX_RATE: VTYPE_INT,
>LDP_MIN_RATE: VTYPE_INT,
>LDP_POOL: VTYPE_STRING,
> +  LDP_ACCESS: VTYPE_STRING,
>}
>  DISK_LD_PARAMETERS = frozenset(DISK_LD_TYPES.keys())
>  
> @@ -1262,6 +1264,7 @@ DRBD_MAX_RATE = "c-max-rate"
>  DRBD_MIN_RATE = "c-min-rate"
>  LV_STRIPES = "stripes"
>  RBD_POOL = "pool"
> +RBD_ACCESS = "access"
>  DISK_DT_TYPES = {
>DRBD_RESYNC_RATE: VTYPE_INT,
>DRBD_DATA_STRIPES: VTYPE_INT,
> @@ -1280,6 +1283,7 @@ DISK_DT_TYPES = {
>DRBD_MIN_RATE: VTYPE_INT,
>LV_STRIPES: VTYPE_INT,
>RBD_POOL: VTYPE_STRING,
> +  RBD_ACCESS: VTYPE_STRING,
>}
>  
>  DISK_DT_PARAMETERS = frozenset(DISK_DT_TYPES.keys())
> @@ -2225,7 +2229,8 @@ DISK_LD_DEFAULTS = {
>LD_FILE: {},
>LD_BLOCKDEV: {},
>LD_RBD: {
> -LDP_POOL: "rbd"
> +LDP_POOL: "rbd",
> +LDP_ACCESS: "userspace",
>  },
>LD_EXT: {},
>}
> @@ -2260,7 +2265,8 @@ DISK_DT_DEFAULTS = {
>DT_SHARED_FILE: {},
>DT_BLOCK: {},
>DT_RBD: {
> -RBD_POOL: DISK_LD_DEFAULTS[LD_RBD][LDP_POOL]
> +RBD_POOL: DISK_LD_DEFAULTS[LD_RBD][LDP_POOL],
> +RBD_ACCESS: DISK_LD_DEFAULTS[LD_RBD][LDP_ACCESS],
>  },
>DT_EXT: {},
>}
> diff --git a/lib/hypervisor/hv_kvm.py b/lib/hypervisor/hv_kvm.py
> index 4bd8a9e..5daa2c2 100644
> --- a/lib/hypervisor/hv_kvm.py
> +++ b/lib/hypervisor/hv_kvm.py
> @@ -1136,8 +1136,13 @@ class KVMHypervisor(hv_base.BaseHypervisor):
>  if needs_boot_flag and disk_type != constants.HT_DISK_IDE:
>boot_val = ",boot=on"
>  
> -  drive_val = "file=%s,format=raw%s%s%s" % (dev_path, if_val, boot_val,
> -cache_val)
> +  if cfdev.params['access'] == 'userspace':

This should be cfdev.params[constants.LDP_ACCESS], since
we don't want to hardcode strings. Also, 'userspace' and 'kernelspace'
should also become constants.

> +kvm_dev_path = cfdev.params['kvm_dev_path']
> +drive_val = "file=%s,format=raw%s%s%s" % (kvm_dev_path,
> +if_val, boot_val, cache_val)
> +  else:
> +drive_val = "file=%s,format=raw%s%s%s" % (dev_path, if_val,
> +  boot_val, cache_val)
>kvm_cmd.extend(["-drive", drive_val])
>  
>  #Now we can specify a different device type for CDROM devices.
> diff --git a/lib/storage/bdev.py b/lib/storage/bdev.py
> index 932bc98..82a6be7 100644
> --- a/lib/storage/bdev.py
> +++ b/lib/storage/bdev.py
> @@ -999,6 +999,10 @@ class RADOSBlockDevice(base.BlockDev):
>raise ValueError("Invalid configuration data %s" % str(unique_id))
>  
>  self.driver, self.rbd_name = unique_id
> +rbd_pool = params[constants.LDP_POOL]
> +
> +if params['access'] == 'userspace':

same here for 'access' and 'userspace

> +  params['kvm_dev_path'] = "rbd:" + rbd_pool + "/" + self.rbd_name

I'm not sure 'kvm_dev_path' should be passed through params.
Maybe the disk needs to get a new dedicated attribute as it has
for 'dev_path'. Team?
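
To make the suggestion concrete, here is a minimal, purely illustrative
sketch; the constant names (the LDP_ACCESS values, RBD_ACCESS_*) and the
helper itself are assumptions for discussion, not the final interface:

# Sketch only -- the names below are placeholders, not agreed constants.
RBD_ACCESS_USERSPACE = "userspace"
RBD_ACCESS_KERNELSPACE = "kernelspace"
LDP_ACCESS = "access"

def RbdKvmPath(params, rbd_name):
  """Return the QEMU 'rbd:<pool>/<image>' string for userspace access,
  or None when kernelspace access (the mapped device) should be used.
  """
  if params.get(LDP_ACCESS, RBD_ACCESS_KERNELSPACE) != RBD_ACCESS_USERSPACE:
    return None
  return "rbd:%s/%s" % (params.get("pool", "rbd"), rbd_name)

The result could then live on a dedicated disk attribute instead of being
pushed back into params, which is what the comment above is getting at.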

Regards,
Constantinos

>  
>  self.major = self.minor = None
>  self.Attach()



Re: [PATCH master 0/2] Add gluster disk template

2013-07-23 Thread Constantinos Venetsanopoulos
Hello everybody,

On 7/23/13 07:04 AM, Weiwei Jia wrote:
> This patch set 1) mounts glusterfs automatically
> via Ganeti and 2) adds a gluster disk template for XEN/KVM
> VMs (similar to the sharedfile disk template). Before mounting
> glusterfs, the user should create and start the volume themselves,
> as follows.
>
> # gluster volume create ganeti node1:/var/tmp/g2
> # gluster volume start ganeti
>
> Then do the above 1) and 2) as follows.
> 1) gnt-cluster init --disk-parameters gluster:hostname=node1,volname=ganeti 
> --gluster-file-storage-dir /srv/ganeti/gluster-file-storage 
> --enabled-disk-templates gluster cluster1

Having also read the relevant thread for RBD, I have one
question:

It seems that for RBD the parameters needed for the auto configuration
are going to be implemented at cluster level (and not as disk params)
with a new option "-S".

Shouldn't both implementations (Gluster/RBD) follow the same
approach, either the one (disk params) or the other ("-S")?

Thanks,
Constantinos

> 2) gnt-instance add -n node1.harrycluster.com -o debootstrap+default -t 
> gluster -s 5G --file-storage-dir=mysubdir instance2.harrycluster.com
>
> Weiwei Jia (2):
>   Add gluster disk template for XEN KVM VMs
>   Add mount glusterfs automatically by Ganeti
>
>  Makefile.am|2 ++
>  configure.ac   |   18 ++
>  lib/bootstrap.py   |   20 +++-
>  lib/cli.py |9 +
>  lib/client/gnt_cluster.py  |7 +--
>  lib/cmdlib/cluster.py  |7 +--
>  lib/cmdlib/common.py   |3 ++-
>  lib/cmdlib/instance.py |2 ++
>  lib/cmdlib/instance_storage.py |6 +-
>  lib/cmdlib/node.py |4 +++-
>  lib/config.py  |8 
>  lib/constants.py   |   31 +++
>  lib/objects.py |   20 +++-
>  lib/opcodes.py |   14 ++
>  lib/pathutils.py   |7 +++
>  lib/ssconf.py  |7 +++
>  lib/utils/storage.py   |   10 ++
>  17 files changed, 166 insertions(+), 9 deletions(-)
>



Re: Regarding GlusterFS support

2013-07-19 Thread Constantinos Venetsanopoulos

On 07/19/2013 08:42 AM, harryxiyou wrote:

On Thu, Jul 18, 2013 at 11:07 PM, harryxiyou  wrote:

On Thu, Jul 18, 2013 at 5:59 PM, Constantinos Venetsanopoulos
 wrote:
Hi Constantinos,


As we have it designed now, the new libgfapi interface will only work on
kvm >= 1.4. I am not sure if similar support will ever make it into Xen or
if it's even possible, but I think that would be an upstream problem to sort
out. Unless we write a driver (I think), there is no way to have glusterfs
provide a block device directly. The patch in kvm essentially does that for
the guest directly.


Ack. I don't think we want to write our own block driver for Gluster,
so I guess we can stick with the file approach. However, as I said
above, we need the template to be able to provide dynamically at
least a file path for every newly created volume, even if the creation
happens using qemu-img.

So, this is something I would like to see in the design doc, explaining
how are you going to be mounting each volume to the host. And
once this is done, how are you going to decide when to enable
userspace access with KVM+libgfapi.

Now, I am finishing the XEN && QEMU/KVM (not via libgfapi) part,
which is similar to the sharedfile way.

I will explain the following in my doc once I finish the
libgfapi + QEMU/KVM (latter) part:

1. How I am going to mount each volume on the host.
2. How I am going to decide when to enable userspace access
   with KVM+libgfapi.
3. Other details not listed here.



After my mentor Michele merges my latest update to
design-gluster-ganeti-support.rst, I will add a detailed description of
the first part (similar to the sharedfile part).


thanks.

Constantinos



--
Thanks
Weiwei  Jia (Harry Wei)




Re: Updates on Userspace mode KVM+RBD

2013-07-18 Thread Constantinos Venetsanopoulos

Hello Pulkit,

On 07/18/2013 05:48 PM, Pulkit Singhal wrote:

Hi,

On Thu, Jul 18, 2013 at 7:23 PM, Constantinos Venetsanopoulos
<c...@grnet.gr> wrote:

>>
>> 2. With a generic name such as `access`, other templates can use the same
>> disk parameter as well.
>>
>>
>> Ack, but see above.
>>
>> 3. Implementation-wise, using a disk template is easier IMHO (especially
>> in the context of a GSoC project).
>>
>>
>> If we want to introduce it just as a template's disk-param, indeed
>> it is easier. But if we want to do that, and in the same time being
>> able to set it at disk level, I think that it is even harder than
>> introducing the functionality independently, because one has
>> to cope with the inheritance issues too.
>>
>> I'd propose we decide in which level we want it to be configurable
>> first. If that is nodegroup level, go with just disk-params. If it is
>> instance or even better disk level, implement it independently.
>> I would be fine with either way, but IMHO mixing both will result
>> in complicated code.
>
> For simplicity reasons, let's stick with disk parameters at the cluster/node
> group level. It makes no sense to implement an individual solution right now
> in a GSoC project if we want to create a more powerful and general solution
> later on with storage pools.
>
>
> ACK. I also agree with you. So, I guess the design doc will probably
> get updated accordingly, to state the limitations we have for now.

I'm a bit unclear about the implementation level we're discussing.
It would be helpful if you could provide some pointers to
understand the issue.





What we are saying is that if you introduce 'access' as a new
disk-param (i.e. RBD_ACCESS in DISK_DT_PARAMETERS), you
do not need to also introduce it as a new IDISK_PARAM. At least
not in the context of GSoC, because it complicates things.

If you just introduce it the same way we already have e.g.
RBD_POOL, then it will be automatically configurable at
cluster and nodegroup level, as happens with all other disk-params.
And you will also automatically have it in your 'params' variable
inside the bdev functions.

The only drawback is that for now, you will not be able to
have different access modes for different disks of the same
instance or for different instances of the same nodegroup.
But, as Thomas says, this is not a problem for now.
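
For example, a cluster- or nodegroup-wide default could then be set with
something like the following (illustrative only, assuming the parameter
ends up being called 'access' and reusing the --disk-parameters syntax
shown in the gluster thread):

gnt-cluster modify --disk-parameters rbd:access=userspace
gnt-group modify --disk-parameters rbd:access=kernelspace group1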


>>
>> Ack. We just need a new disk parameter that will be set in
>> bdev's Attach method with the exact argument that we
>> need to pass to kvm. Then at hv_kvm.py we decide whether
>> to use that argument, according to the 'access' value.
>> Isn't that right?
>
> We can even construct the whole path to the disk ('rbd:/)
> directly in hv_kvm.py, as it's KVM specific anyway. This (small) piece of
> logic does not really belong to KVM nor RBD, so I'd put it where it creates
> the least amount of work.
>
>
> Hmm. Wouldn't it be better to construct it inside the template's
> code in bdev, and set it to a corresponding dedicated disk variable?
>
> That way the kvm code would be totally abstracted. It will just
> check the access value and if it is 'kernelspace' will use the current
> 'dev_path'. If it is 'userspace' it will use the new variable.
>
> So, anybody who is introducing a new storage backend will have
> to write code in one place, bdev, and not having to update the
> logic in hv_kvm every time. Even the ExtStorage interface can
> be extended to support this feature in the future.
>
> I guess it's the same amount of work, just in a different place.
> What do you think?

I believe hypervisor.StartInstance() receives a list of tuples, 
(instance.disks, link_name). 'link_name' is a string returned by 
_SymlinkBlockDevice(). An additional attribute in the bdev.py class 
won't be accessible to the function. Please correct me if I'm wrong.




The first element of the tuple is the disks of the instance, so you
will be able to access the attribute from there. The same way
you will access the type of disk to decide on whether it is RBD.
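
Roughly like this, as a sketch only (the kvm_dev_path attribute is the
hypothetical dedicated slot discussed above, not existing code):

def _AppendDriveArgs(kvm_cmd, block_devices):
  """Sketch: pick the path handed to KVM for each disk.

  block_devices is assumed to be the list of (disk object, link_name)
  pairs mentioned above.
  """
  for cfdev, link_name in block_devices:
    # Userspace access: use the dedicated attribute (e.g. "rbd:pool/image");
    # otherwise fall back to the symlinked kernelspace device.
    drive_path = getattr(cfdev, "kvm_dev_path", None) or link_name
    kvm_cmd.extend(["-drive", "file=%s,format=raw" % drive_path])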

Thanks,
Constantinos


Thanks,
Pulkit Singhal




Re: Updates on Userspace mode KVM+RBD

2013-07-18 Thread Constantinos Venetsanopoulos

Hello Thomas,

Do you have any comments regarding the following?

Thanks,
Constantinos

On 07/16/2013 03:22 PM, Constantinos Venetsanopoulos wrote:

Hi,

On 07/16/2013 09:35 AM, Thomas Thrainer wrote:




On Mon, Jul 15, 2013 at 8:55 PM, Pulkit Singhal
<pulkitati...@gmail.com> wrote:


Hi Constantinos and Thomas,


On 07/15/2013 10:52 AM, Thomas Thrainer wrote:


On Sun, Jul 14, 2013 at 10:51 PM, Pulkit Singhal
<pulkitati...@gmail.com> wrote:

Hi everyone,

I made the following changes for providing direct RBD
access to KVM,

  * Introduction of a "direct:True/False" disk_param

I thought (based on Constantinous' comments) that we wanted
to call this parameter `access:userspace` vs.
`access:kernelspace`?



Yes, it should be access:kernelspace/userspace. Agreed.


I'm not sure if this option should be a disk-param for two
reasons:

1. I think we should at least be able to set it at
instance-level. I thought
that you were thinking to even to do it at disk-level from
our previous
discussions. Introducing it as a disk-param means that we
will only be
able to set it at {nodegroup, cluster}-level.

2. I'd prefer a generic way to handle this operation which
can be used
by other templates too e.g. Gluster or even the ExtStorage
interface in
the future.


Probably I used the wrong term. This option will be used in the
following way.

/gnt-instance -t rbd --disk 0:size=4000m,access:userspace --net
0:ip=10.2.4.17 -n node1 -d instance1/

I believe this should be instance-level.


Actually, this is a disk-param (as it could change in disk 1, for 
example).

I still think that this is the right approach for those reasons:

1. Having it as disk param is the most flexible option. Usually you 
want to treat all nodes/instances in your nodegroup the same, which 
is covered automatically. However, you still can change the 
access-mode on a per-disk basis (and also per-instance, you just have 
to remember to include the parameter for all disks).


AFAIK, defining something as a disk-param of a template doesn't
automatically make it configurable at instance or disk level, so
you can't have different values per instance or disk (e.g. see most
of the DRBD disk-params, or the RBD 'pool' param).

You need additional specific code and addition to IDISK_PARAMS to
be able to handle it at disk level, as shown above. Currently, this
is only done for the 'vg' and 'metavg' disk-params and is not very
consistent with the rest of the disk-params design. That's why,
in previous discussions (with Guido and Iustin, if I remember
correctly) we were thinking of even removing this even for 'vg' and
'metavg', with the introduction of storage pools.

2. With a generic name such as `access`, other templates can use the 
same disk parameter as well.


Ack, but see above.

3. Implementation-wise, using a disk template is easier IMHO 
(especially in the context of a GSoC project).


If we want to introduce it just as a template's disk-param, indeed
it is easier. But if we want to do that, and in the same time being
able to set it at disk level, I think that it is even harder than
introducing the functionality independently, because one has
to cope with the inheritance issues too.

I'd propose we decide in which level we want it to be configurable
first. If that is nodegroup level, go with just disk-params. If it is
instance or even better disk level, implement it independently.
I would be fine with either way, but IMHO mixing both will result
in complicated code.


  * Pass unique_id to RADOSBlockDevice.__init__() where
the last element is a boolean indicating if
direct-rbd integration is requested or not.



Does this mean that the unique_id tuple gets a third element?
This should be also documented as a generic mechanism,
so that other templates can use it too.


Yes, a third element will be added, 'access'.


After looking at the code again, I guess it's not even needed to 
extend the unique_id tuple. All disk related RPC receive the values 
of all disk parameters, so we can directly access them in noded.


Indeed that's true. If it's a disk-param it is already there in bdev.py.
We don't need to pass it through unique_id.


  * Skip _symlinking_ if direct-rbd mode is requested
and the variable /link_name/ is set to
'rbd:device.rbd_pool/device.rbd_name' to be directly
accessible by kvm instead of symlinked path.

Even if we don't symlink, we should not misuse the link_name
variable with something other than an actual link.

Re: Regarding GlusterFS support

2013-07-18 Thread Constantinos Venetsanopoulos

Hi Lance,

On 07/18/2013 01:32 AM, Lance Albertson wrote:




On Wed, Jul 17, 2013 at 4:23 AM, Constantinos Venetsanopoulos
<c...@grnet.gr> wrote:


Hello everybody,

I just finished reading all the threads regarding GlusterFS support
and the corresponding design doc that is already merged on master.

I'd like to ask some questions since some things are not very clear
to me. Please forgive me if I'm not understanding something
correctly, since I have no experience with Gluster yet.

From what I understand, Gluster provides a filesystem namespace
under a shared directory, which means that we could actually be
using Gluster even now, if we had mounted it under the
'shared-storage-dir' and used the 'sharedfile' template. And that
would work for both Xen and KVM. Is that right?


That is correct. It's not the most ideal way to run on top of Gluster,
but you can certainly do it that way. I think the main kicker is that if we
want to have Ganeti manage the GlusterFS cluster, it was thought it should
have its own type similar to shared-file. But the initial run will be
externally managed. I have been pondering whether this is a good approach
or not. But I really wanted to see the libgfapi support added more
than the general shared-file approach, as it has much better performance.




Ack. I think that having Ganeti manage the Gluster cluster itself
doesn't conflict with the way Gluster volumes will be handled. It is
a completely independent problem; the same applies to RBD too.
It's a different feature that can be implemented independently.

I'd like to focus a bit more on the latter, handling of Gluster volumes.
Indeed, if we want to support the libgfapi we need a new 'gluster'
template. But if we introduce a new template, this template has to
work with both Xen and KVM (when 'access=kernelspace') to be
consistent with Ganeti's design and it has to also work with
'access=userspace' for KVM (libgfapi), to support the new feature.
Thus, we need two things:

1. The template should be able to return a block device or file path
of the Gluster volume via the Attach method at the 'dev_path' variable.
Probably a file path, since from what you are saying there is no way
to map to a block device.
2. The template's Attach method should be able to return the
corresponding KVM argument (in a new disk variable? see also the
RBD thread), that will enable the libgfapi support if access=userspace.

(1) is needed for at least 3 reasons:

* Be able to use the template with Xen
* Be able to run 'activate-disks' for troubleshooting
* For the OS definitions to work
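
To illustrate (1) and (2), a very rough, hypothetical sketch of such an
Attach method; the class name, the 'hostname'/'volname' disk params, the
mount-point attribute and the kvm_dev_path slot are all assumptions made
only for the sake of the example:

import os

class GlusterStorage(base.BlockDev):  # hypothetical class
  def Attach(self):
    # (1) File path on the mounted Gluster volume: usable by Xen,
    # 'activate-disks' and the OS definitions.
    self.dev_path = os.path.join(self._mount_point, self.vol_name)
    # (2) Userspace access string for KVM+libgfapi, kept in a dedicated
    # attribute (see the RBD thread) rather than overloading dev_path.
    self.kvm_dev_path = "gluster://%s/%s/%s" % (
      self.params["hostname"], self.params["volname"], self.vol_name)
    return os.path.exists(self.dev_path)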


Now, from what I understand Gluster >= 3.4 provides the libgfapi,
which can be used by QEMU to directly access Gluster volumes.
Does this library also provide a way (a command-line tool like the
'rbd' tool) to handle volumes e.g. provision, remove? Or the only
way to do that is via qemu-img?


The most ideal way is via qemu-img but if we really wanted to we could 
write our own library that interfaces with libgfapi and probably 
control a lot more of what's going on. I think that's outside of the
scope of this project. But for what we need for this project I think 
the access that qemu-img provides is enough. It should allow us to 
create, delete and grow volumes.


Ack.


Also does it provide a way to
map volumes as kernel block devices? From the design doc:

"- Make sure Ganeti VMs can use GlusterFS backends in userspace
   mode (for newer QEMU/KVM which has this support) and
   otherwise, if possible, through some kernel exported block device."

If it doesn't provide the above, if I understand correctly the new
class in bdev.py and the corresponding template type 'gluster' will
be only usable with KVM and only with access=userspace. Is that
correct or am I missing something?


As we have it designed now, the new libgfapi interface will only work 
on kvm >= 1.4. I am not sure if similar support will ever make it into 
Xen or if its even possible but I think that would be an upstream 
problem to sort out. Unless we write a driver (I think), there is no 
way to have glusterfs provide a block device directly. The patch in 
kvm essentially does that for the guest directly.




Ack. I don't think we want to write our own block driver for Gluster,
so I guess we can stick with the file approach. However, as I said
above, we need the template to be able to provide dynamically at
least a file path for every newly created volume, even if the creation
happens using qemu-img.

So, this is something I would like to see in the design doc, explaining
how are you going to be mounting each volume to the host. And
once this is done, how are you going to decide when to enable
userspace access with KVM+libgfapi.

For the latter, you p

Re: Regarding GlusterFS support

2013-07-17 Thread Constantinos Venetsanopoulos
On 7/17/13 14:44 PM, harryxiyou wrote:
> On Wed, Jul 17, 2013 at 7:23 PM, Constantinos Venetsanopoulos
>  wrote:
>> Hello everybody,
>>
>> I just finished reading all the threads regarding GlusterFS support
>> and the corresponding design doc that is already merged on master.
>>
>> I'd like to ask some questions since some things are not very clear
>> to me. Please forgive me if I'm not understanding something
>> correctly, since I have no experience with Gluster yet.
>>
>> From what I understand, Gluster provides a filesystem namespace
>> under a shared directory, which means that we could actually be
>> using Gluster even now, if we had mounted it under the
>> 'shared-storage-dir' and used the 'sharedfile' template. And that
>> would work for both Xen and KVM. Is that right?
> Yup. In addition, for QEMU/KVM VMs there is still another approach to be
> finished for the Gluster disk template (similar to RBD), which is based on
> the libgfapi provided by Gluster, letting QEMU/KVM access the
> Gluster volume directly (this approach has better performance).

OK, I get that, but that's exactly why I'm asking the following questions.

>> Now, from what I understand Gluster >= 3.4 provides the libgfapi,
>> which can be used by QEMU to directly access Gluster volumes.
>> Does this library also provide a way (a command-line tool like the
>> 'rbd' tool) to handle volumes e.g. provision, remove? Or the only
>> way to do that is via qemu-img?
> I just know the way that is via qemu-img.
>
> http://www.gluster.org/2012/11/integration-with-kvmqemu/
> http://raobharata.wordpress.com/2012/10/29/qemu-glusterfs-native-integration/
>> Also does it provide a way to
>> map volumes as kernel block devices? From the design doc:
>>
>> "- Make sure Ganeti VMs can use GlusterFS backends in userspace
>>mode (for newer QEMU/KVM which has this support) and
>>otherwise, if possible, through some kernel exported block device."
>>
>> If it doesn't provide the above, if I understand correctly the new
>> class in bdev.py and the corresponding template type 'gluster' will
>> be only usable with KVM and only with access=userspace. Is that
>> correct or am I missing something?
>>
> This will be finished via the GlusterFS-for-QEMU/KVM way, which is
> supported by the GlusterFS driver in QEMU/KVM. Actually, I will provide
> a more detailed doc (updating the current one) when I finish it later on ;)
>

OK, I understand, but this doesn't actually answer my questions.

Team?

Regards,
Constantinos


Regarding GlusterFS support

2013-07-17 Thread Constantinos Venetsanopoulos

Hello everybody,

I just finished reading all the threads regarding GlusterFS support
and the corresponding design doc that is already merged on master.

I'd like to ask some questions since some things are not very clear
to me. Please forgive me if I'm not understanding something
correctly, since I have no experience with Gluster yet.

From what I understand, Gluster provides a filesystem namespace
under a shared directory, which means that we could actually be
using Gluster even now, if we had mounted it under the
'shared-storage-dir' and used the 'sharedfile' template. And that
would work for both Xen and KVM. Is that right?

Now, from what I understand Gluster >= 3.4 provides the libgfapi,
which can be used by QEMU to directly access Gluster volumes.
Does this library also provide a way (a command-line tool like the
'rbd' tool) to handle volumes e.g. provision, remove? Or the only
way to do that is via qemu-img? Also does it provide a way to
map volumes as kernel block devices? From the design doc:

"- Make sure Ganeti VMs can use GlusterFS backends in userspace
   mode (for newer QEMU/KVM which has this support) and
   otherwise, if possible, through some kernel exported block device."

If it doesn't provide the above, if I understand correctly the new
class in bdev.py and the corresponding template type 'gluster' will
be only usable with KVM and only with access=userspace. Is that
correct or am I missing something?

Thanks,
Constantinos





Re: Updates on Userspace mode KVM+RBD

2013-07-16 Thread Constantinos Venetsanopoulos

Hi,

On 07/16/2013 09:35 AM, Thomas Thrainer wrote:




On Mon, Jul 15, 2013 at 8:55 PM, Pulkit Singhal
<pulkitati...@gmail.com> wrote:


Hi Constantinos and Thomas,


On 07/15/2013 10:52 AM, Thomas Thrainer wrote:


On Sun, Jul 14, 2013 at 10:51 PM, Pulkit Singhal
<pulkitati...@gmail.com> wrote:

Hi everyone,

I made the following changes for providing direct RBD
access to KVM,

  * Introduction of a "direct:True/False" disk_param

I thought (based on Constantinous' comments) that we wanted
to call this parameter `access:userspace` vs.
`access:kernelspace`?



Yes, it should be access:kernelspace/userspace. Agreed.


I'm not sure if this option should be a disk-param for two
reasons:

1. I think we should at least be able to set it at
instance-level. I thought
that you were thinking to even to do it at disk-level from our
previous
discussions. Introducing it as a disk-param means that we will
only be
able to set it at {nodegroup, cluster}-level.

2. I'd prefer a generic way to handle this operation which can
be used
by other templates too e.g. Gluster or even the ExtStorage
interface in
the future.


Probably I used the wrong term. This option will be used in the
following way.

/gnt-instance -t rbd --disk 0:size=4000m,access:userspace --net
0:ip=10.2.4.17 -n node1 -d instance1/

I believe this should be instance-level.


Actually, this is a disk-param (as it could change in disk 1, for 
example).

I still think that this is the right approach for those reasons:

1. Having it as disk param is the most flexible option. Usually you 
want to treat all nodes/instances in your nodegroup the same, which is 
covered automatically. However, you still can change the access-mode 
on a per-disk basis (and also per-instance, you just have to remember 
to include the parameter for all disks).


AFAIK, defining something as a disk-param of a template doesn't
automatically make it configurable at instance or disk level, so
you can't have different values per instance or disk (e.g. see most
of the DRBD disk-params, or the RBD 'pool' param).

You need additional specific code and addition to IDISK_PARAMS to
be able to handle it at disk level, as shown above. Currently, this
is only done for the 'vg' and 'metavg' disk-params and is not very
consistent with the rest of the disk-params design. That's why,
in previous discussions (with Guido and Iustin, if I remember
correctly) we were thinking of even removing this even for 'vg' and
'metavg', with the introduction of storage pools.

2. With a generic name such as `access`, other templates can use the 
same disk parameter as well.


Ack, but see above.

3. Implementation-wise, using a disk template is easier IMHO 
(especially in the context of a GSoC project).


If we want to introduce it just as a template's disk-param, indeed
it is easier. But if we want to do that, and in the same time being
able to set it at disk level, I think that it is even harder than
introducing the functionality independently, because one has
to cope with the inheritance issues too.

I'd propose we decide in which level we want it to be configurable
first. If that is nodegroup level, go with just disk-params. If it is
instance or even better disk level, implement it independently.
I would be fine with either way, but IMHO mixing both will result
in complicated code.


  * Pass unique_id to RADOSBlockDevice.__init__() where
the last element is a boolean indicating if
direct-rbd integration is requested or not.



Does this mean that the unique_id tuple gets a third element?
This should be also documented as a generic mechanism,
so that other templates can use it too.


Yes, a third element will be added, 'access'.


After looking at the code again, I guess it's not even needed to 
extend the unique_id tuple. All disk related RPC receive the values of 
all disk parameters, so we can directly access them in noded.


Indeed that's true. If it's a disk-param it is already there in bdev.py.
We don't need to pass it through unique_id.


  * Skip _symlinking_ if direct-rbd mode is requested and
the variable /link_name/ is set to
'rbd:device.rbd_pool/device.rbd_name' to be directly
accessible by kvm instead of symlinked path.

Even if we don't symlink, we should not misuse the link_name
variable with something other than an actual link. Otherwise
the code will get rather hard to read later on.



I agree with Thomas here. That's why I proposed introducing
a new disk slot/variable ('kvm_dev_path'?) to handle this, in our
previous discussions. Again, this should become a generic

Re: Updates on Userspace mode KVM+RBD

2013-07-15 Thread Constantinos Venetsanopoulos

Hello everybody,

On 07/15/2013 10:52 AM, Thomas Thrainer wrote:




On Sun, Jul 14, 2013 at 10:51 PM, Pulkit Singhal
<pulkitati...@gmail.com> wrote:


Hi everyone,

I made the following changes for providing direct RBD access to KVM,

  * Introduction of a "direct:True/False" disk_param

I thought (based on Constantinous' comments) that we wanted to call 
this parameter `access:userspace` vs. `access:kernelspace`?


I'm not sure if this option should be a disk-param for two reasons:

1. I think we should at least be able to set it at instance-level. I thought
that you were thinking to even to do it at disk-level from our previous
discussions. Introducing it as a disk-param means that we will only be
able to set it at {nodegroup, cluster}-level.

2. I'd prefer a generic way to handle this operation which can be used
by other templates too e.g. Gluster or even the ExtStorage interface in
the future.


  * Pass unique_id to RADOSBlockDevice.__init__() where the last
element is a boolean indicating if direct-rbd integration is
requested or not.



Does this mean that the unique_id tuple gets a third element?
This should be also documented as a generic mechanism,
so that other templates can use it too.


  * Skip _symlinking_ if direct-rbd mode is requested and the
variable /link_name/ is set to
'rbd:device.rbd_pool/device.rbd_name' to be directly
accessible by kvm instead of symlinked path.

Even if we don't symlink, we should not misuse the link_name variable 
with something other than an actual link. Otherwise the code will get 
rather hard to read later on.




I agree with Thomas here. That's why I proposed introducing
a new disk slot/variable ('kvm_dev_path'?) to handle this, in our
previous discussions. Again, this should become a generic
mechanism and documented accordingly.


As per the earlier discussions, mapping rbd as a local block
device was skipped in RADOSBlockDevice.Attach(). But, this
resulted in failure of InstanceOsAdd.

It turns out that the debootstrap script, while installing an OS, looks
for a block device path through the DISK_0_PATH variable. It
seems this step cannot be avoided for an OS installation. Thus,
it seems necessary to continue mapping the RBD volume as a local
block device. This won't obstruct direct RBD access by KVM as we
have already updated the /link_name/ variable which is used for
KVM disk param generation.


As discussed when talking about `gnt-instance activate-disks`, this 
approach seems reasonable to me. The system (when installing the 
instance) or administrator are then responsible for actually mounting 
the block device (and making sure to only do so when it's safe, i.e.
when the instance is not running).


I agree on that, too.

Thanks,
Constantinos



I would appreciate your comments on this approach.

Thanks,
Pulkit Singhal


As a side note, there is not yet a complete and reviewed design 
document submitted for those changes. Could you please make sure to 
update your design document draft with the results of the recent 
discussions and post it again? Feel free to send it as patch against 
master if you feel you don't need any more comments on it.


Thanks,
Thomas


--
Thomas Thrainer | Software Engineer | thoma...@google.com |


Google Germany GmbH
Dienerstr. 12
80331 München

Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Graham Law, Katherine Stephens




Re: Distribute storage in Ganeti clusters

2013-07-05 Thread Constantinos Venetsanopoulos

Hi Pulkit,

On 7/5/13 18:46 PM, Pulkit Singhal wrote:

Hi Constantinos,

On Fri, Jul 5, 2013 at 4:25 PM, Constantinos Venetsanopoulos
<c...@grnet.gr> wrote:



On 7/4/13 21:58 PM, Pulkit Singhal wrote:

* When you say 'only one pool will be supported' for Ceph do
you mean one
pool at Ganeti level (i.e. one storage pool) or one pool on
Ceph level (i.e. one
pool as known by Ceph)?


For the prototype to be developed as part of GSoC, they are same.
I believe even in future, Ganeti won't enforce only one pool(at
ganeti level). Users would want to use all the infrastructure. The
only reason this discussion arose was because of the loopback bug.

http://tracker.ceph.com/issues/3076

By creating different pools(ceph level) on each node group we can
avoid such situation.
.
Sorry, but I don't think I understand completely.
If I understand correctly, the issue notes that one cannot run the RBD
kernel driver or the CephFS kernel driver on the same physical machine
that runs an OSD. How does this translate to pools?

Doesn't making vm_capable and distributed-storage mutually
exclusive suffice at this point?


Yes, making vm_capable and distribute_storage_capable mutually 
exclusive solves everything. But considering that users have been 
using Ganeti to host VMs and their images on the same machine, we 
shouldn't restrict nodes to be either vm or storage capable.




OK, so this goes for future work, when and if the issue get resolved.

For the time being, we're considering a single pool scenario(both 
ganeti and ceph/gluster wise) where all the machines in that 
nodegroup/pool are dedicated for storage. Once the initial 
implementation works, we can work on more scenarios.


OK. What I was trying to say is that the mechanism that prevents
hitting the above issue has nothing to do with the manipulation of
the Ceph pool. ACK.


* The pool in which to put objects in Ceph is already
configurable at cluster
and nodegroup level via the 'pool' diskparam for the rbd
template.


Yes and that pool refers to the disks being used by instances.
Although, I don't think RADOSBlockDevice implementation in
lib/storage/bdev.py uses it.


http://git.ganeti.org/?p=ganeti.git;a=blob;f=lib/storage/bdev.py;h=44860848f49831e64b35c5e006e49b52eb5f8f56;hb=refs/heads/master#l1128

Unless I'm missing something, this needs to be addressed.


Again I don't seem to get what you mean.

Currently, one can set the rbd pool at nodegroup level. If that's
done, then all disks of all instances in that nodegroup will get
created in that pool. The default pool if nothing is set, is 'rbd'
which is also the default pool for disk images in Ceph.
Isn't that what we want?


Yes, the default pool 'rbd' will work out of the box. Through my
earlier comment, I was trying to draw your attention to the fact
that the create() method in the RADOSBlockDevice class still refers to
constants.LDP_POOL for fetching the pool name. Editing the disk param at
nodegroup level won't have any effect. As a result, this needs to be
addressed.


I don't think this is the case. The line you are referring to is:

rbd_pool = params[constants.LDP_POOL]

which should contain the value set at nodegroup level and
not the default.

Am I missing something?



In addition to this, in future if we plan to use multiple pools, for 
location based partitioning or to get around loopback bug, this issue 
ought to be resolved.


Won't we be able to do this by just setting different pools on different
nodegroups, even with the first implementation? Again, I don't see
how this is connected with the loopback bug.

Thanks,
Constantinos


This command should initialize all the nodes in the given
node group for distributed storage. This involves
distributing required configuration files, starting daemons,
etc. If it's not possible to sensibly use the distributed
storage after this step (because there are not enough nodes
in the node group), this distributed storage type should be
marked as disabled in the cluster configuration until it is
actually usable.


Does this mean that if we haven't initialized distributed storage
at all, we won't be able to use the corresponding templates?


If the admin is at this step, one assumption follows (for the
prototype) - /no external ceph/gluster cluster is configured./
If the storage cluster isn't in a healthy/usable state, user
should be warned against using corresponding disk-template.



OK, but this conflicts a bit with the admin's ability to run the
external distributed storage completely independently of
Ganeti and still be able to s

Re: Distribute storage in Ganeti clusters

2013-07-05 Thread Constantinos Venetsanopoulos

Hi Pulkit,

On 7/4/13 21:58 PM, Pulkit Singhal wrote:

Hi Constantinos,

Thanks for your detailed insight. I'm the student working on the Ceph 
integration. Please find my replies inline.


* For instance disks in Ceph/RADOS there is no need for MDSs. We just
need MONs and OSDs. MDSs are needed only for the Ceph "filesystem"
which I don't think is relevant to this discussion.


Yes, you're right. MDS won't be required for RBD.

* When you say 'only one pool will be supported' for Ceph do you
mean one
pool at Ganeti level (i.e. one storage pool) or one pool on Ceph
level (i.e. one
pool as known by Ceph)?


For the prototype to be developed as part of GSoC, they are same. I 
believe even in future, Ganeti won't enforce only one pool(at ganeti 
level). Users would want to use all the infrastructure. The only 
reason this discussion arose was because of the loopback bug.


http://tracker.ceph.com/issues/3076

By creating different pools(ceph level) on each node group we can 
avoid such situation.

.


Sorry, but I don't think I understand completely.
If I understand correctly, the issue notes that one cannot run the RBD
kernel driver or the CephFS kernel driver on the same physical machine
that runs an OSD. How does this translate to pools?

Doesn't making vm_capable and distributed-storage mutually
exclusive suffice at this point?


* The pool in which to put objects in Ceph is already configurable
at cluster
and nodegroup level via the 'pool' diskparam for the rbd template.


Yes and that pool refers to the disks being used by instances. 
Although, I don't think RADOSBlockDevice implementation in 
lib/storage/bdev.py uses it.


http://git.ganeti.org/?p=ganeti.git;a=blob;f=lib/storage/bdev.py;h=44860848f49831e64b35c5e006e49b52eb5f8f56;hb=refs/heads/master#l1128

Unless I'm missing something, this needs to be addressed.


Again I don't seem to get what you mean.

Currently, one can set the rbd pool at nodegroup level. If that's
done, then all disks of all instances in that nodegroup will get
created in that pool. The default pool if nothing is set, is 'rbd'
which is also the default pool for disk images in Ceph.
Isn't that what we want?


Some general Ceph/GlusterFS questions and comments:

* I think that the term 'direct-{rbd,gluster}' is not very
descriptive. Wouldn't
something like: 'access:userspace', 'access:kernelspace' or
--userspace-access,
--kernelspace-access make more sense?


Makes sense.

* Should this option be supported at disk level (i.e. IDISK_PARAM)
or instance
level via an option? Will it be possible to have one disk from
userspace and one
from kernelspace in the same instance?


I plan to support it at the disk level. The information will be passed 
to the RBD class in 'unique_id' tuple and it'll act accordingly.


ACK.


* What happens if someone has already setup a cluster and doesn't
want it to
be handled by Ganeti (i.e. for Ceph already having an
/etc/ceph.conf), but just
have the instances access it, just as happens now?


Previous ceph.conf is respected and cluster configuration is 
warned/denied.


That sounds good. ACK.


Finally, regarding the Attach() procedure: Currently, attach
actually returns
the dev_path. Don't we need something similar that will return the
corresponding
kvm argument (i.e.: file=rbd://)? Should we
introduce a new
variable, let's say 'dev_kvm_arg' for that, since we can't use
dev_path due to the
symlinking afterwards? The variable will get filled only if the
'direct' or whatever
flag is present.


I'm working on the implementation details. I'll get back to you on this.


OK.


On 7/1/13 04:40 AM, Thomas Thrainer wrote:

Distributed storage configuration parameters will be stored at
the cluster level. We only support one storage pool per type
(Ceph/Gluster) in the beginning, and all its parameters are
stored at the cluster level. Note that we initially only want to
support the minimum amount of parameters to keep the
implementation simple. Those parameters should behave like the
hypervisor parameters from a users point of view. Thus, the
cluster init and modify command will get additional options:
`gnt-cluster init -S `
`gnt-cluster modify -S `
(modification of parameters might be restricted as it's necessary)


What do you have in mind regarding these parameters? For example, for
Ceph, the only thing you need to interact with the cluster is the
IPs for each
MON. This is what is included in the minimum /etc/ceph.conf file for a
working setup.


The purpose of the parameters is to allow monitoring/osd specific 
configuration parameters, e.g. osd:/dev/sdb1,/dev/sdc1 etc. More 
advanced options can be added later for providing better ceph 
configuration control through Ganeti.


ACK.


Node groups will be used to actually specify which nodes 

Re: Distribute storage in Ganeti clusters

2013-07-03 Thread Constantinos Venetsanopoulos

Hello everybody,

I just finished reading all the relevant threads for the distributed storage
support in Ganeti (Ceph/GlusterFS), which have grown quite fast
over the last few days :)

I'd like to make some questions and some comments, if I may. First of all,
some comments regarding Ceph:

* For instance disks in Ceph/RADOS there is no need for MDSs. We just
need MONs and OSDs. MDSs are needed only for the Ceph "filesystem"
which I don't think is relevant to this discussion.

* When you say 'only one pool will be supported' for Ceph do you mean one
pool at Ganeti level (i.e. one storage pool) or one pool on Ceph level 
(i.e. one

pool as known by Ceph)?

* The pool in which to put objects in Ceph is already configurable at 
cluster

and nodegroup level via the 'pool' diskparam for the rbd template.


Some general Ceph/GlusterFS questions and comments:

* I think that the term 'direct-{rbd,gluster}' is not very descriptive. 
Wouldn't
something like: 'access:userspace', 'access:kernelspace' or 
--userspace-access,

--kernelspace-access make more sense?

* Should this option be supported at disk level (i.e. IDISK_PARAM) or 
instance
level via an option? Will it be possible to have one disk from userspace 
and one

from kernelspace in the same instance?

* What happens if someone has already setup a cluster and doesn't want it to
be handled by Ganeti (i.e. for Ceph already having an /etc/ceph.conf), 
but just

have the instances access it, just as happens now?

Finally, regarding the Attach() procedure: Currently, attach actually 
returns
the dev_path. Don't we need something similar that will return the 
corresponding

kvm argument (i.e.: file=rbd://)? Should we introduce a new
variable, let's say 'dev_kvm_arg' for that, since we can't use dev_path 
due to the
symlinking afterwards? The variable will get filled only if the 'direct' 
or whatever

flag is present.

Some additional inline comments follow.


On 7/1/13 04:40 AM, Thomas Thrainer wrote:

Hello,

@Harry, Pulkit: We (as the Ganeti core engineers) were discussing the 
design of distributed storage (Ceph and GlusterFS) in Ganeti today. I 
wanted to share our thoughts, so you have the same design decisions in 
mind when you work on your projects.


The goal is that Ganeti can be used to configure distributed storage 
within a cluster automatically. Installation of required software is 
not in the scope of Ganeti.


We made a couple of simplifications which reduce the effort the GSoC 
projects have to put into the implementation. The goal of those 
projects should be a working initial version on which improvements can 
be based.


Distributed storage configuration parameters will be stored at the 
cluster level. We only support one storage pool per type 
(Ceph/Gluster) in the beginning, and all its parameters are stored at 
the cluster level. Note that we initially only want to support the 
minimum amount of parameters to keep the implementation simple. Those 
parameters should behave like the hypervisor parameters from a users 
point of view. Thus, the cluster init and modify command will get 
additional options:

`gnt-cluster init -S `
`gnt-cluster modify -S ` (modification 
of parameters might be restricted as it's necessary)




What do you have in mind regarding these parameters? For example, for
Ceph, the only thing you need to interact with the cluster is the IPs 
for each

MON. This is what is included in the minimum /etc/ceph.conf file for a
working setup.

Node groups will be used to actually specify which nodes belong to the 
distributed storage cluster. To assign a distributed storage cluster 
to a node group, `gnt-storage` will be used:

`gnt-storage connect  `



Is 'connect' what we want here? I think 'connect' is more suitable
for denoting/enabling allocation policies and applying common
parameters rather than initializing a storage backend. See also the
use in networks and the design doc for the abstract storage pool concept.
I'd propose something like 'assign'/'associate'/'init-distributed-storage'.
What do you think?

Also, I didn't understand what you concluded regarding nodes
that are both vm_capable and part of a distributed-storage nodegroup.
Will that be an option or are these mutually exclusive?

This command should initialize all the nodes in the given node group 
for distributed storage. This involves distributing required 
configuration files, starting daemons, etc. If it's not possible to 
sensibly use the distributed storage after this step (because there 
are not enough nodes in the node group), this distributed storage type 
should be marked as disabled in the cluster configuration until it is 
actually usable.


Does this mean that if we haven't initialized distributed storage
at all, we won't be able to use the corresponding templates?

Kind Regards,
Constantinos

Ganeti should automatically keep track of "special" nodes, which run 
things like metadata or monitoring daemons. This can be done by 
following the official gu

Re: [PATCH master 0/4] Adding support for detachings disks from instances

2013-06-27 Thread Constantinos Venetsanopoulos

On 6/27/13 12:15 PM, Guido Trotter wrote:

On Thu, Jun 27, 2013 at 8:25 PM, Constantinos Venetsanopoulos
 wrote:

Hello team,

Just a few questions, since some things are not 100% clear
to me.

What is the exact behavior we want when detaching a disk?
I mean should we actually run the 'shutdown' functions
of bdev? If yes, where is the detach rpc implemented at
'backend' level?

Is 'detach' supported for all disk templates?


Actually this is not completely clear to me, and also it's not clear
why we started to work on this without a design doc.
Ok, michele opened a bug about this. And ok, we have ideas to treat
disks as entities that will require some sort of similar
functionalities, but it's still not a good reason to "just add
functionality" without a proper design.


I also agree, and that's why I asked..

Thanks,
Constantinos


Thanks,

Guido




Re: [PATCH master 0/4] Adding support for detachings disks from instances

2013-06-27 Thread Constantinos Venetsanopoulos

Hello team,

Just a few questions, since some things are not 100% clear
to me.

What is the exact behavior we want when detaching a disk?
I mean should we actually run the 'shutdown' functions
of bdev? If yes, where is the detach rpc implemented at
'backend' level?

Is 'detach' supported for all disk templates?

Thanks,
cven


On 6/26/13 01:39 AM, Klaus Aehlig wrote:

On Wed, Jun 26, 2013 at 02:35:00AM +0200, Sebastian Gebhard wrote:

This patch series allows detaching disks from instances without deleting them, 
so
they can be reused. Fixes issue 482.

Sebastian Gebhard (4):
   Define DETACH functions for DDM
   Add functionality for detaching disks
   Add infos about detaching to man page of gnt-instance
   Adapt unittests for new detach functionality

  lib/client/gnt_instance.py|  3 ++
  lib/cmdlib/instance.py| 60 ---
  lib/constants.py  |  3 ++
  lib/rpc_defs.py   |  3 ++
  man/gnt-instance.rst  |  7 +++--
  test/py/ganeti.cmdlib_unittest.py | 26 ++---
  6 files changed, 78 insertions(+), 24 deletions(-)

As mentioned, patches 2--4 should be squashed (using the log
message of patch 2). Also, please fix the lint errors.

Rest LGTM.





Re: GlusterFS Ganeti Support Plan

2013-06-17 Thread Constantinos Venetsanopoulos

Hello everybody,

On 06/08/2013 02:32 PM, harryxiyou wrote:

On Sat, Jun 8, 2013 at 6:28 PM, Guido Trotter  wrote:
[...]

Would the other way around make more sense? It seems that internal mode
requires external mode, but not give versa.


What do you mean here exactly? Internal mode (a.k.a implementation
inside bdev.py) does not require the external mode (a.k.a implementation
as an ExtStorage provider). You probably need the first, since you want
Ganeti to have native support and not depending on installing an
ExtStorage provider for GlusterFS support to work.


  (Internal and external should
behave the same for instances,


That's true. Even the most part of the code would be the same if
someone wanted to write a GlusterFS ExtStorage provider, but I
don't think this is what we want right now.


it's just node
add/remove/offline/online/drain that would change and create a gluster pool
that then can be used (however it's hosted). Or am I missing something?



Ack.

Kind Regards,
Constantinos


Re: How to use RBD for Ganeti

2013-06-17 Thread Constantinos Venetsanopoulos

Hello everybody,

RBD with Ganeti has been tested extensively and is fairly
simple to set up. You just need the 4 things mentioned in
the guide Michele pointed you at.

In a nutshell, if you are installing with the RADOS cluster
on separate machines you need:

On the Ganeti nodes running the instances:

 * The corresponding rbd kernel modules or linux kernel >= 3.2
 * The `ceph-common' package installed
 * A valid ceph.conf configuration file

On the machines running the actual RADOS cluster:

 * A working setup of RADOS

Once the above are setup, you can spin up rbd instances by
just defining the corresponding disk template at the
gnt-instance command (`-t rbd').
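
For example (illustrative command only; adjust node, OS and size to your
setup):

gnt-instance add -n node1.example.com -o debootstrap+default -t rbd \
  -s 10G instance1.example.com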


Regards,
Constantinos

On 06/17/2013 10:50 AM, harryxiyou wrote:

On Mon, Jun 17, 2013 at 3:18 PM, Michele Tartara  wrote:
[...]

I have never set it up personally, but that's part of the official
documentation of Ganeti, so, yes, somebody definitely tried it. :-)


Yeah, let me have a try, thanks very much ;-)




--
Thanks
Harry Wei




snf-image v0.9.1 released (support for FreeBSD images as guests)

2013-05-30 Thread Constantinos Venetsanopoulos

Hello everybody,

we are happy to announce the release of snf-image v0.9.1.

The major feature in this release is the new support for
FreeBSD-based images as guests. We have also created and
are sharing the first FreeBSD 9.1 image! You can find it here:

https://code.grnet.gr/projects/snf-image/wiki/Wiki#Sample-Images

As you will see above, we also have a new image for Debian
Wheezy and also updated most of our other images to newer
versions too.

Finally, during the last release cycle, we tested snf-image with
Xen-based Ganeti clusters too, to verify the stability of
the Xen support we introduced in v0.8. No problems occurred,
so we encourage you to take it for a spin, if you are running
Ganeti with Xen.

As usual, you can install snf-image using our apt repository:

http://apt.dev.grnet.gr

or find the packages/source here:

https://code.grnet.gr/projects/snf-image/files

For more information about the design, installation, configuration,
sample images, etc., please take a look at the wiki:

https://code.grnet.gr/projects/snf-image/wiki

As always, any kind of feedback is highly appreciated.


Kind Regards,
on behalf of the Synnefo team,
Constantinos


Re: [PATCH master] Extension of storage reporting design doc

2013-03-26 Thread Constantinos Venetsanopoulos

On 03/25/2013 04:23 PM, Helga Velroyen wrote:

This patch rewrites and extends the design doc about storage reporting
with respect to disk templates and storage types. In contrast to the
previous version, we now consider disk templates as the user-facing
entity, that the user can dis/enabled for the cluster. Storage types
on the other hand describe the underlying technology used by the various
disk templates. Storage reporting will use a mapping from disk templates
to storage types to pick the correct method to report the storage for
the respective disk templates.

Note that the design doc in this state still contains some questions and
FIXMEs. Feel free to comment on those. I will complete them directly or
in future patches.

Signed-off-by: Helga Velroyen 
---
  doc/design-storagespace.rst | 232 +++-
  1 file changed, 165 insertions(+), 67 deletions(-)

diff --git a/doc/design-storagespace.rst b/doc/design-storagespace.rst
index f22546a..42bb7a2 100644
--- a/doc/design-storagespace.rst
+++ b/doc/design-storagespace.rst
@@ -7,77 +7,170 @@ Storage free space reporting
  Background
  ==
  
-Currently Space reporting is broken for all storage types except drbd or

-lvm (plain). This design looks at the root causes and proposes a way to
-fix it.
+Currently, there is no consistent management of different variants of storage
+in Ganeti. One direct consequence is that storage space reporting is currently
+broken for all storage that is not based on lvm technolgy. This design looks at
+the root causes and proposes a way to fix it.
+
+FIXME: rename the design doc to make clear that space reporting is not the only
+thing covered here?
  
  Proposed changes

  
  
-The changes below will streamline Ganeti to properly support

-interaction with different storage types.
+We propose to streamline handling of different storage types and disk 
templates.
+Currently, there is no consistent implementation for dis/enabling of disk
+templates and/or storage types.
+
+Our idea is to introduce a list of enabled disk templates, which can be
+used by instances in the cluster. Based on this list, we want to provide
+storage reporting mechanisms for the available disk templates. Since some
+disk templates share the same underlying storage technology (for example
+``drbd`` and ``plain`` are based on ``lvm``), we map disk templates to storage
+types and implement storage space reporting for each storage type.
  
  Configuration changes

  -
  
-Add a new attribute "enabled_storage_types" (type: list of strings) to the

-cluster config which holds the types of storages, for example, "file", "rados",
-or "ext". We consider the first one of the list as the default type.
+Add a new attribute "enabled_disk_templates" (type: list of strings) to the
+cluster config which holds disk templates, for example, "drbd", "file",
+or "ext". This attribute represents the list of disk templates that are enabled
+cluster-wide for usage by the instances. It will not be possible to create
+instances with a disk template that is not enabled, as well as it will not be
+possible to remove a disk template from the list if there are still instances
+using it.
+
+The list of enabled disk templates can contain any non-empty subset of
+the currently implemented disk templates: ``blockdev``, ``diskless``, ``drbd``,
+``ext``, ``file``, ``plain``, ``rbd``, and ``sharedfile``. See
+``DISK_TEMPLATES`` in ``constants.py``.
+
+Note that the abovementioned list of enabled disk types is just a "mechanism"
+parameter that defines which disk templates the cluster can use. Further
+filtering about what's allowed can go in the ipolicy, which is not covered in
+this design doc. Note that it is possible to force an instance to use a disk
+template that is not allowed by the ipolicy. This is not possible if the
+template is not enabled by the cluster.
+
+We consider the first disk template in the list to be the default template for
+instance creation and storage reporting. This will remove the need to specify
+the disk template with ``-t`` on instance creation.
+
+Currently, cluster-wide dis/enabling of disk templates is not implemented
+consistently. ``lvm`` based disk templates are enabled by specifying a volume
+group name on cluster initialization and can only be disabled by explicitly
+using the option ``--no-lvm-storage``. This will be replaced by adding/removing
+``drbd`` and ``plain`` from the set of enabled disk templates.
+
+Up till now, file storage and shared file storage could be dis/enabeled at
+``./configure`` time. This will also be replaced by adding/removing the
+respective disk templates from the set of enabled disk templates.
+
+There is currently no possibility to dis/enable the disk templates
+``diskless``, ``blockdev``, ``ext``, and ``rdb``. By introducing the set of
+enabled enabled disk templates, we will require these disk templates to be


typo: "enabled enabled"


+

Ganeti 2.7 debian/rules file wrt ExtStorage

2013-03-22 Thread Constantinos Venetsanopoulos

Hello,

Just a reminder since we are close to the 2.7 release and I don't
know when you are going to build the relevant debian packages.
For ExtStorage which will be released with 2.7 you need the
following line in the debian/rules file in the `build-stamp' section:

--with-extstorage-search-path=/srv/ganeti/extstorage,/usr/local/lib/ganeti/extstorage,/usr/lib/ganeti/extstorage,/usr/share/ganeti/extstorage

Kind Regards,
Constantinos


Re: Why is there a different Storage Type for file and shared file?

2013-03-22 Thread Constantinos Venetsanopoulos

Hello Helga,

On 03/22/2013 09:50 AM, Helga Velroyen wrote:

Hi Constantinos,

thanks for your input!

I was wondering about that, too, but for now I left them as separate 
types. Of course, the way to calculate the free storage would use the 
same mechanisms, but for file we have to ask every node while for 
shared file storage we might optimize it in a way that not all nodes 
are queried about the same shared storage.




Please see my previous email. The basic argument is that you can decide
whether to query all nodes or just one by checking the disk template and
not the disk type.


Kind Regards,
Constantinos

I haven't started working on the backend yet, so don't consider it set 
in stone yet.


Cheers,
Helga


On Thu, Mar 21, 2013 at 10:37 PM, Guido Trotter <ultrot...@google.com> wrote:


The difference here is that for file we care about space (eg. when
making iallocator decisions) while for sharedfile we don't.

Also one might want to enable or disable them separately. Although
that might also be for drbd and lvm... Mmm...

Thanks,

Guido

    On 21 Mar 2013 17:47, "Constantinos Venetsanopoulos" <c...@grnet.gr> wrote:

Hello team,

I just saw the enabled_storage_types patch series and I have
one question:

Why is there a different Storage Type for the disk template file
and shared file?

# mapping of disk templates to storage types
DISK_TEMPLATES_STORAGE_TYPE = {
  DT_BLOCK: ST_BLOCK,
  DT_DISKLESS: ST_DISKLESS,
  DT_DRBD8: ST_LVM_VG,
  DT_EXT: ST_EXT,
  DT_FILE: ST_FILE,
  DT_PLAIN: ST_LVM_VG,
  DT_RBD: ST_RADOS,
  DT_SHARED_FILE: ST_SHARED_FILE,
  }

Shouldn't we have only one type for files (ST_FILE), since it's
the same backend technology? The difference between
DT_SHARED_FILE and DT_FILE is only there to differentiate the
mirroring logic. Why do we need a different Storage Type to
calculate free space? Shouldn't all nodes report the free storage
space under the directory independently of whether this directory
is local or shared?

Regards,
Constantinos






Re: Why is there a different Storage Type for file and shared file?

2013-03-22 Thread Constantinos Venetsanopoulos

On 03/21/2013 11:37 PM, Guido Trotter wrote:


The difference here is that for file we care about space (eg. when 
making iallocator decisions) while for sharedfile we don't.




Indeed, but for iallocator decisions we currently use the disk template
of the instance and not the disk type, and I think this is the right thing
to do in the future too. E.g., when we are making decisions about drbd
instance moves/migrations/failovers, we don't care that the underlying
technology is of type ST_LVM_VG. But when we want to report free space,
we do.

Also one might want to enable or disable them separately. Although 
that might also be for drbd and lvm... Mmm...




Exactly. How do you disable drbd but keep lvm support with the current
design?

Now that I think about it again, I'm not even sure we need to differentiate
between storage template and storage type for anything other than
free space reporting. It seems to me that we are increasing the complexity
a bit here, without obvious advantages. After all, IMHO, the user only cares
about enabling and disabling the disk templates they will later use.

So, just a thought: why not enable/disable disk templates and have an
internal mapping only for storage space reporting that says:

DT_DRBD8 -> ST_LVM_VG
DT_PLAIN -> ST_LVM_VG
DT_RBD -> ST_RADOS
DT_FILE -> ST_FILE
DT_SHARED_FILE -> ST_FILE

so that the free space reporting logic knows about the underlying technology
of the disk template? The same logic will figure out whether to query all
nodes or one node (for shared storage), again by combining the disk template
with the storage type.
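
(To make this concrete, a minimal Python sketch of the idea; the string values
stand in for the DT_*/ST_* constants, and both the mapping and the "shared"
set are illustrations of this proposal, not existing Ganeti code.)

# Internal mapping used only for storage space reporting.
DISK_TEMPLATE_TO_STORAGE_TYPE = {
  "drbd": "lvm-vg",
  "plain": "lvm-vg",
  "rbd": "rados",
  "file": "file",
  "sharedfile": "file",
}

# Templates backed by shared storage, where one node's report is enough.
SHARED_DISK_TEMPLATES = frozenset(["rbd", "sharedfile"])


def free_space_query_plan(disk_template, node_list):
  """Return (storage_type, nodes to query) for a given disk template."""
  storage_type = DISK_TEMPLATE_TO_STORAGE_TYPE[disk_template]
  if disk_template in SHARED_DISK_TEMPLATES:
    # Shared storage: any single node sees the same free space.
    return storage_type, node_list[:1]
  # Node-local storage: every node has to report its own free space.
  return storage_type, list(node_list)

# free_space_query_plan("sharedfile", ["node1", "node2", "node3"])
# -> ("file", ["node1"])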

Thoughts?

Kind Regards,
Constantinos


Thanks,

Guido

On 21 Mar 2013 17:47, "Constantinos Venetsanopoulos" <c...@grnet.gr> wrote:


Hello team,

I just saw the enabled_storage_types patch series and I have
one question:

Why is there a different Storage Type for the disk template file
and shared file?

# mapping of disk templates to storage types
DISK_TEMPLATES_STORAGE_TYPE = {
  DT_BLOCK: ST_BLOCK,
  DT_DISKLESS: ST_DISKLESS,
  DT_DRBD8: ST_LVM_VG,
  DT_EXT: ST_EXT,
  DT_FILE: ST_FILE,
  DT_PLAIN: ST_LVM_VG,
  DT_RBD: ST_RADOS,
  DT_SHARED_FILE: ST_SHARED_FILE,
  }

Shouldn't we have only one type for files (ST_FILE), since it's
the same backend technology? The difference between
DT_SHARED_FILE and DT_FILE is only there to differentiate the
mirroring logic. Why do we need a different Storage Type to
calculate free space? Shouldn't all nodes report the free storage
space under the directory independently of whether this directory
is local or shared?

Regards,
Constantinos





Why is there a different Storage Type for file and shared file?

2013-03-21 Thread Constantinos Venetsanopoulos

Hello team,

I just saw the enabled_storage_types patch series and I have
one question:

Why is there a different Storage Type for the disk template file
and shared file?

# mapping of disk templates to storage types
DISK_TEMPLATES_STORAGE_TYPE = {
  DT_BLOCK: ST_BLOCK,
  DT_DISKLESS: ST_DISKLESS,
  DT_DRBD8: ST_LVM_VG,
  DT_EXT: ST_EXT,
  DT_FILE: ST_FILE,
  DT_PLAIN: ST_LVM_VG,
  DT_RBD: ST_RADOS,
  DT_SHARED_FILE: ST_SHARED_FILE,
  }

Shouldn't we have only one type for files (ST_FILE), since it's
the same backend technology? The difference between
DT_SHARED_FILE and DT_FILE is only there to differentiate the
mirroring logic. Why do we need a different Storage Type to
calculate free space? Shouldn't all nodes report the free storage
space under the directory independently of whether this directory
is local or shared?

Regards,
Constantinos


Re: Clarification wrt disks' `physical_id' and `logical_id'

2013-03-19 Thread Constantinos Venetsanopoulos

On 03/19/2013 03:48 PM, Iustin Pop wrote:

On Wed, Mar 13, 2013 at 01:56:39PM +0200, Constantinos Venetsanopoulos wrote:

Hello team,

I'd like to know a bit more about the rationale behind the existence of
the `physical_id' and `logical_id' for disks. I see that `physical_id'
differs from `logical_id' only for drbd disks, and it is stored in
config.data only for drbd disks.

Also, when I add a new disk to a drbd instance, then the second disk's
`physical_id' is not stored in config.data, whereas the first disk's
`physical_id' is there. Why does this happen?

The physical_id is or is not stored, depending on how/if/when the disk RPCs
are called. That's the only thing which updates it.


Ack. So, you don't actually need it for the additional disks when e.g.
rebuilding the instance.


Wouldn't it make more sense to just have only one of the two?

Yes. That is the plan, however cleaning this up is low on the priority
list.


Ack.


physical_id is a leftover from the drbd7/md implementation, and is not
needed anymore except for the fact that it's easier to implement the RPC
with it.

The following would need to be done:

- extend disk RPCs to take the physical_id, or change RPC code so that
   it updates the logical_id correctly
- remove the physical_id slot from the instance.


I assume you mean the slot from the disk and not the instance here.
Right?

So, just to double check, ideally we would only need the logical_id.
Would it have the format it has now or we would need to update
it via RPC to the physical_id format?

Thanks for clarifying things,
Constantinos


Clarification wrt disks' `physical_id' and `logical_id'

2013-03-13 Thread Constantinos Venetsanopoulos

Hello team,

I'd like to know a bit more about the rationale behind the existence of
the `physical_id' and `logical_id' for disks. I see that `physical_id'
differs from `logical_id' only for drbd disks, and it is stored in
config.data only for drbd disks.

Also, when I add a new disk to a drbd instance, then the second disk's
`physical_id' is not stored in config.data, whereas the first disk's
`physical_id' is there. Why does this happen?

Wouldn't it make more sense to just have only one of the two?

Thanks in advance,
Constantinos



Re: Naming: Rbd versus Drbd

2013-03-12 Thread Constantinos Venetsanopoulos

Hi,

you are definitely right that those two are very similar and I used
to have the same problem when implementing RBD support.

However, probably RBD is more accurate, because it refers to the
Rados Block Device which is the actual driver you use to talk with
the storage backend; that's what the instance sees. Whereas
RADOS is the external storage cluster itself (the Object Store).

QEMU also refers to RADOS in the same way (as `rbd'). See here:
http://ceph.com/docs/master/rbd/qemu-rbd/

In any case, I don't have a strong opinion and I think that it would
indeed simplify things a lot.

That means that we would also refer to it with `-t rados' ?

Constantinos


On 03/12/2013 06:35 PM, Iustin Pop wrote:

Hi all,

It's nigh on impossible to search (meaningfully) for "rbd", as the very
similar naming to "drbd" makes it hard (grep -w, yes, but still). And
it also makes it harder to read code, as you always have to squint and
make sure there are 3 or 4 letters :)

What about renaming rbd to rados? In the Haskell code I've already used
this name, and it makes for much more readable code (IMHO).

iustin




Fwd: Ganeti core + Grnet storage plans meeting

2013-02-23 Thread Constantinos Venetsanopoulos

Forwarding to synnefo-devel (which is our public list).


 Original Message 
Subject:Ganeti core + Grnet storage plans meeting
Date:   Fri, 22 Feb 2013 13:16:37 -0800
From:   Guido Trotter 
To: Ganeti Development 



Hi all,

We had a quick meeting w.r.t. all the storage design and discussions
that had happened on the list and at conferences in the last few
months. Here is a quick summary, please let us know if you have any
comments:

Participants: ultrotter, helgav, dimitris, constantinos, vangelis, christos

Topics: Free storage reporting, storage pools, multiple disk backends
per instance, disk and nic referencing, RBD, hotplug.

Outcomes:
- Free storage reporting: this can be done without conflicting with
future features. helgav will update the design doc and proceed
with the implementation
- storage pools: new design doc coming in March
- multiple disk backends for an instance: to be removed from the
storage pool design. new design doc to be written. (possibly will
depend on storage pools)
- disk and nic referencing: it would be useful to be able to
reference disks and nics by UUID and/or (optionally) by name. grnet
(christos) to write design and implementation, for 2.8.
- disks as separate entities can be considered afterwards (second part
of 2013) as well.
- hotplug: this is deferred until after UUID referencing is implemented.
- RBD: userspace connection from kvm not to be implemented for now by grnet
- RBD: managing RBD storage as ganeti nodes can be considered as an
idea, after storage pools are implemented.

Thanks,

Guido





Re: [RFC master] Initial Storage Pools design doc

2013-02-14 Thread Constantinos Venetsanopoulos

Hello everybody,

firing up the thread again :)

During the merge of the ExtStorage patch series we had a very interesting
discussion with Iustin, and it seems that we are getting close to bridging
whatever gaps remain, after Guido's comments on this thread too.

Please see here:

https://groups.google.com/forum/?fromgroups=#!searchin/ganeti-devel/Multiple$20ExtStorage$20Providers$20and$20ext-params/ganeti-devel/d8lE_JwHJ70/lfOwS2LlrCwJ

so that we can all be in sync. I think it combines and merges more or less
everybody's comments.

Comments inline, with reference to the above mail.

On 09/14/2012 11:47 AM, Guido Trotter wrote:

On Thu, Sep 13, 2012 at 4:31 PM, Constantinos Venetsanopoulos
 wrote:


Yes, I agree on the mix! So, I think the best would be to have a
cluster-level dict that defines each pool, but not in the previous format.
The template would be just one of the pool's attributes. And there, we
should also define the template's specific params, by setting only the ones
that we want to override from the default cluster-level disk-params dict.
So it would look like this:

storage_pools: {
  "default_drbd_pool": {
    "template": "drbd",
    "template_params": {}
  },
  "second_drbd_pool": {
    "template": "drbd",
    "template_params": {
      "metavg": "different_metavg_than_the_default"
    }
  },
  "default_lvm_pool": {
    "template": "plain",
    "template_params": {}
  },
  "second_lvm_pool": {
    "template": "plain",
    "template_params": {
      "stripes": 2
    }
  },
  "radoscluster1": {
    "template": "rbd",
    "template_params": {}
  },
  "radoscluster2": {
    "template": "rbd",
    "template_params": {
      "pool": "ganetipool"
    }
  },

This sounds good. Just confirming with the Haskell overlords: is this
the best data structure, for parsing and handling in a strongly typed
language?
I believe so, since the "differentiation" between what template_params
should be is defined at the external level in the "template"
parameter. Do you confirm?


   "nas1": {
 "template": "ext",
 "template_params": {},
 "provider": "emc"
   },
   "nas2": {
 "template": "ext",
 "template_params": {},
 "provider": "netapp"
   }
}

Here we might have to put "provider" inside the params, and perhaps if
needed have an extra level of params there, for the reason above.
(unless all pools support a provider, in which case it might be that
they support only one)


ACK. See reference email. We could set 'provider' as a template
parameter and also mark it as modifiable for template 'ext'.
Then we could just ensure that when creating an 'ext' Storage
Pool, this specific parameter is always set at Storage
Pool level. I think that solves both our problems.


If we do that, then indeed we do not need parameters being specified at
nodegroup level. There, we will only define the connected_storage_pools
attribute as a list of the available pools for the specific nodegroup,
defining the mobility domain (exactly as you proposed).



Note that in my proposal it was still possible to override "some"
parameters. An example of this would be instances in the same pool, but
which change dynamic parameters such as resync speed in different
nodegroups (due to different hardware capabilities of the nodes in each
group).
If we don't do this we must have an "easy" way to move instances
between pools, without copying the data if the template is the same,
which sounds more tricky! :/


Another thing to do is to make the "vg" a disk parameter for templates
"plain" and "drbd", to unify the design and align with metavg.


Yes, vg and metavg should be template level parameters. I think
removing the possibility of configuring those separately for each
instance and just tying them to the template is fine.
In the future we might have parameters overridden at the instance
level to solve this as well.


ACK.


We can also drop the disk-params dict at nodegroup level, which
complicates the inheritance issues and doesn't seem to add any
functionality if we proceed with the above design. The only thing I can
think of is: if a parameter is set at nodegroup level, use that to override
the value found inside the "template_params". However, this seems a bit
ugly to me, because it complicates the flow of the inheritance a lot.
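
(A minimal sketch of the inheritance proposed above, assuming a cluster-level
per-template defaults dict; all names and values are made up for
illustration.)

# Cluster-wide defaults per disk template.
cluster_disk_params = {
  "drbd": {"metavg": "xenvg", "resync-rate": 1024},
  "plain": {"stripes": 1},
}

# Pools only carry the values they want to override.
storage_pools = {
  "default_drbd_pool": {"template": "drbd", "template_params": {}},
  "second_drbd_pool": {"template": "drbd",
                       "template_params": {"metavg": "fastvg"}},
}


def effective_params(pool_name):
  """Cluster defaults for the pool's template, overridden by the pool."""
  pool = storage_pools[pool_name]
  params = dict(cluster_disk_params.get(pool["template"], {}))
  params.update(pool["template_params"])
  return params

# effective_params("second_drbd_pool")
# -> {"metavg": "fastvg", "resync-rate": 1024}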


No, I'm not sure we can do that, see above. It actually is a needed
functionality to differentiate what drbd does in different nodegroups

[PATCH devel-2.7] Update man pages wrt ExtStorage

2013-02-14 Thread Constantinos Venetsanopoulos
 * ganeti-extstorage-interface: add examples
 * gnt-instance: document the “ext” template
 * remove a few double spaces

Also link to its design doc in the design-2.7 doc

Signed-off-by: Constantinos Venetsanopoulos 
---

Hello,

resending after applying all comments by Guido.

Regards,
Constantinos

 doc/design-2.7.rst  |2 +
 doc/design-shared-storage.rst   |8 +-
 man/ganeti-extstorage-interface.rst |   33 -
 man/gnt-instance.rst|  136 ---
 4 files changed, 130 insertions(+), 49 deletions(-)

diff --git a/doc/design-2.7.rst b/doc/design-2.7.rst
index 1c27e24..3b5303d 100644
--- a/doc/design-2.7.rst
+++ b/doc/design-2.7.rst
@@ -11,6 +11,8 @@ The following design documents have been implemented in 
Ganeti 2.7:
 - :doc:`design-virtual-clusters`
 - :doc:`design-network`
 - :doc:`design-linuxha`
+- :doc:`design-shared-storage` (Updated to reflect the new ExtStorage
+  Interface)
 
 The following designs have been partially implemented in Ganeti 2.7:
 
diff --git a/doc/design-shared-storage.rst b/doc/design-shared-storage.rst
index 05e91c0..5c9a9a5 100644
--- a/doc/design-shared-storage.rst
+++ b/doc/design-shared-storage.rst
@@ -1,9 +1,9 @@
-==
-Ganeti shared storage support for 2.3+
-==
+=
+Ganeti shared storage support
+=
 
 This document describes the changes in Ganeti 2.3+ compared to Ganeti
-2.3 storage model.
+2.3 storage model. It also documents the ExtStorage Interface.
 
 .. contents:: :depth: 4
 .. highlight:: shell-example
diff --git a/man/ganeti-extstorage-interface.rst 
b/man/ganeti-extstorage-interface.rst
index 4b1e0b1..81c03c8 100644
--- a/man/ganeti-extstorage-interface.rst
+++ b/man/ganeti-extstorage-interface.rst
@@ -66,7 +66,6 @@ VOL_METADATA
 EXECUTABLE SCRIPTS
 --
 
-
 create
 ~~
 
@@ -198,7 +197,6 @@ The script should return ``0`` on success.
 TEXT FILES
 --
 
-
 parameters.list
 ~~~
 
@@ -213,6 +211,37 @@ The parameters can then be used during instance add as 
follows::
 
 # gnt-instance add --disk=0:fromsnap="file_name",nas_ip="1.2.3.4" ...
 
+EXAMPLES
+
+
+In the following examples we assume that you have already installed
+successfully two ExtStorage providers: ``pvdr1`` and ``pvdr2``
+
+Add a new instance with a 10G first disk provided by ``pvdr1`` and a 20G
+second disk provided by ``pvdr2``::
+
+# gnt-instance add -t ext --disk=0:size=10G,provider=pvdr1
+  --disk=1:size=20G,provider=pvdr2
+
+Add a new instance with a 5G first disk provided by provider ``pvdr1``
+and also pass the ``prm1``, ``prm2`` parameters to the provider, with
+the corresponding values ``val1``, ``val2``::
+
+   # gnt-instance add -t ext
+  --disk=0:size=5G,provider=pvdr1,prm1=val1,prm2=val2
+
+Modify an existing instance of disk type ``ext`` by adding a new 30G
+disk provided by provider ``pvdr2``::
+
+   # gnt-instance modify --disk 1:add,size=30G,provider=pvdr2 
+
+Modify an existing instance of disk type ``ext`` by adding 2 new disks,
+of different providers, passing one parameter for the first one::
+
+   # gnt-instance modify --disk 2:add,size=3G,provider=pvdr1,prm1=val1
+ --disk 3:add,size=5G,provider=pvdr2
+ 
+
 NOTES
 -
 
diff --git a/man/gnt-instance.rst b/man/gnt-instance.rst
index 1cb945d..6fe996b 100644
--- a/man/gnt-instance.rst
+++ b/man/gnt-instance.rst
@@ -29,6 +29,7 @@ ADD
 | **add**
 | {-t|\--disk-template {diskless | file \| plain \| drbd \| rbd}}
 | {\--disk=*N*: {size=*VAL* \| 
adopt=*LV*}[,vg=*VG*][,metavg=*VG*][,mode=*ro\|rw*]
+|  \| {size=*VAL*,provider=*PROVIDER*}[,param=*value*... ][,mode=*ro\|rw*]
 |  \| {-s|\--os-size} *SIZE*}
 | [\--no-ip-check] [\--no-name-check] [\--no-start] [\--no-install]
 | [\--net=*N* [:options...] \| \--no-nics]
@@ -50,12 +51,20 @@ The ``disk`` option specifies the parameters for the disks 
of the
 instance. The numbering of disks starts at zero, and at least one disk
 needs to be passed. For each disk, either the size or the adoption
 source needs to be given, and optionally the access mode (read-only or
-the default of read-write) and the LVM volume group can also be
-specified (via the ``vg`` key). For DRBD devices, a different VG can
-be specified for the metadata device using the ``metavg`` key.  The
-size is interpreted (when no unit is given) in mebibytes. You can also
-use one of the suffixes *m*, *g* or *t* to specify the exact the units
-used; these suffixes map to mebibytes, gibibytes and tebibytes.
+the default of read-write). The size is interpreted (when no unit is
+given) in mebibytes. You can also use one of the suffixes *m*, *g* or
+*t* to specify the exact the units used; these suffixes map to
+mebibytes, gibibytes and tebibytes. For LVM and DRBD devices, t

[PATCH devel-2.7] Update man pages wrt ExtStorage

2013-02-14 Thread Constantinos Venetsanopoulos
 * ganeti-extstorage-interface: add examples
 * gnt-instance: document the “ext” template

Also link to its design doc in the design-2.7 doc

Signed-off-by: Constantinos Venetsanopoulos 
---

Hello team,

this addresses issue 354.

In the future you could also copy/paste some parts from the man pages
and the design-shared-storage document to create a new top level section
(after "Ganeti remote API" ?) that will refer to ExtStorage. Or just
link to its design doc from the index, if you would like it to be more
visible to contributors that would like to write their own ExtStorage
providers or use it.

Regards,
Constantinos

 doc/design-2.7.rst  |2 +
 doc/design-shared-storage.rst   |8 +-
 man/ganeti-extstorage-interface.rst |   33 -
 man/gnt-instance.rst|  131 ++
 4 files changed, 122 insertions(+), 52 deletions(-)

diff --git a/doc/design-2.7.rst b/doc/design-2.7.rst
index 1c27e24..3b5303d 100644
--- a/doc/design-2.7.rst
+++ b/doc/design-2.7.rst
@@ -11,6 +11,8 @@ The following design documents have been implemented in 
Ganeti 2.7:
 - :doc:`design-virtual-clusters`
 - :doc:`design-network`
 - :doc:`design-linuxha`
+- :doc:`design-shared-storage` (Updated to reflect the new ExtStorage
+  Interface)
 
 The following designs have been partially implemented in Ganeti 2.7:
 
diff --git a/doc/design-shared-storage.rst b/doc/design-shared-storage.rst
index 05e91c0..5c9a9a5 100644
--- a/doc/design-shared-storage.rst
+++ b/doc/design-shared-storage.rst
@@ -1,9 +1,9 @@
-==
-Ganeti shared storage support for 2.3+
-==
+=
+Ganeti shared storage support
+=
 
 This document describes the changes in Ganeti 2.3+ compared to Ganeti
-2.3 storage model.
+2.3 storage model. It also documents the ExtStorage Interface.
 
 .. contents:: :depth: 4
 .. highlight:: shell-example
diff --git a/man/ganeti-extstorage-interface.rst 
b/man/ganeti-extstorage-interface.rst
index 4b1e0b1..81c03c8 100644
--- a/man/ganeti-extstorage-interface.rst
+++ b/man/ganeti-extstorage-interface.rst
@@ -66,7 +66,6 @@ VOL_METADATA
 EXECUTABLE SCRIPTS
 --
 
-
 create
 ~~
 
@@ -198,7 +197,6 @@ The script should return ``0`` on success.
 TEXT FILES
 --
 
-
 parameters.list
 ~~~
 
@@ -213,6 +211,37 @@ The parameters can then be used during instance add as 
follows::
 
 # gnt-instance add --disk=0:fromsnap="file_name",nas_ip="1.2.3.4" ...
 
+EXAMPLES
+
+
+In the following examples we assume that you have already installed
+successfully two ExtStorage providers: ``pvdr1`` and ``pvdr2``
+
+Add a new instance with a 10G first disk provided by ``pvdr1`` and a 20G
+second disk provided by ``pvdr2``::
+
+# gnt-instance add -t ext --disk=0:size=10G,provider=pvdr1
+  --disk=1:size=20G,provider=pvdr2
+
+Add a new instance with a 5G first disk provided by provider ``pvdr1``
+and also pass the ``prm1``, ``prm2`` parameters to the provider, with
+the corresponding values ``val1``, ``val2``::
+
+   # gnt-instance add -t ext
+  --disk=0:size=5G,provider=pvdr1,prm1=val1,prm2=val2
+
+Modify an existing instance of disk type ``ext`` by adding a new 30G
+disk provided by provider ``pvdr2``::
+
+   # gnt-instance modify --disk 1:add,size=30G,provider=pvdr2 
+
+Modify an existing instance of disk type ``ext`` by adding 2 new disks,
+of different providers, passing one parameter for the first one::
+
+   # gnt-instance modify --disk 2:add,size=3G,provider=pvdr1,prm1=val1
+ --disk 3:add,size=5G,provider=pvdr2
+ 
+
 NOTES
 -
 
diff --git a/man/gnt-instance.rst b/man/gnt-instance.rst
index 1cb945d..99d1d54 100644
--- a/man/gnt-instance.rst
+++ b/man/gnt-instance.rst
@@ -29,6 +29,7 @@ ADD
 | **add**
 | {-t|\--disk-template {diskless | file \| plain \| drbd \| rbd}}
 | {\--disk=*N*: {size=*VAL* \| 
adopt=*LV*}[,vg=*VG*][,metavg=*VG*][,mode=*ro\|rw*]
+|  \| {size=*VAL*,provider=*PROVIDER*}[,param=*value*... ][,mode=*ro\|rw*]
 |  \| {-s|\--os-size} *SIZE*}
 | [\--no-ip-check] [\--no-name-check] [\--no-start] [\--no-install]
 | [\--net=*N* [:options...] \| \--no-nics]
@@ -50,12 +51,20 @@ The ``disk`` option specifies the parameters for the disks 
of the
 instance. The numbering of disks starts at zero, and at least one disk
 needs to be passed. For each disk, either the size or the adoption
 source needs to be given, and optionally the access mode (read-only or
-the default of read-write) and the LVM volume group can also be
-specified (via the ``vg`` key). For DRBD devices, a different VG can
-be specified for the metadata device using the ``metavg`` key.  The
-size is interpreted (when no unit is given) in mebibytes. You can also
-use one of the suffixes *m*, *g* or *t* to specify the exact the u

Re: RFC: ganeti improvements

2013-01-28 Thread Constantinos Venetsanopoulos

On 01/25/2013 08:14 PM, Guido Trotter wrote:




On Fri, Jan 25, 2013 at 5:21 PM, Constantinos Venetsanopoulos <c...@grnet.gr> wrote:


[forwarding to the list too, to continue the discussion]


 Original Message 
Subject:Re: RFC: ganeti improvements
Date:   Fri, 25 Jan 2013 18:06:57 +0200
From:   Constantinos Venetsanopoulos <c...@grnet.gr>
Organization:   Greek Research and Technology Network
To: Guido Trotter <ultrot...@google.com>



On 01/25/2013 03:40 PM, Guido Trotter wrote:
> On Fri, Jan 25, 2013 at 12:52 PM, Constantinos Venetsanopoulos
>  <mailto:c...@grnet.gr>  wrote:
>> Hello all,
>>
>>
>> On 01/25/2013 01:42 PM, Guido Trotter wrote:
>>> On Fri, Jan 25, 2013 at 11:01 AM, Nicolas Sebrecht  
<mailto:nsebre...@piing.fr>
>>> wrote:
>>>> [ X-post to ganeti and ganeti-devel mailing lists.  ]
>>>>
>>>>
>>>> Hi all,
>>>>
>>>> Ganeti is now used in our company, internally. We'd like to make it
>>>> full solution ready for our customers but we are missing some critical
>>>> features.
>>>>
>>>> Before going further and starting to play with the code I'd like to
>>>> share my thoughts for comments. The idea is to submit patches in the
>>>> correct direction as early as possible in the development process.
>>>>
>>>> I didn't look much at the code, yet.
>>>>
>>>>
>>>> 1. Improving the KVM integration.
>>>>
>>>> 1.1. Hibernation.
>>>>
>>>> AFAICT, we can't hibernate a KVM instance from Ganeti. I guess it is
>>>> possible to do with QMP and store the RAM in a dedicated path or LVM
>>>> volume.
>>>>
>>> I believe adding a gnt-instance hibernate (or shutdown --hibernate)
>>> would work well from our side.
>>
>> I agree on that.
>>
>>
>>>> 1.2. CPU Tuning.
>>>>
>>>> - CPU pinning (mask) is supported.
>>>> - CPU SMP is under work.
>>>>
>>> Yes, this is all part of 2.7. Feel free to test it further and let us
>>> know if it works.
>>>
>>>> 2. Snapshots.
>>>>
>>>> The current gnt-backup tool does not fit our needs. Here are some
>>>> limitations I've found (correct me if I'm wrong):
>>>>
>>>> - It relies on os-type (and variants?) which is fine if you have
>>>> homogeneous instances but does not help for a lot of per-instance
>>>> customizing.
>>>> BTW, the os-interface mixes various kinds of requirements (creation,
>>>> export/import, rename, verify). It is hard for us to get relevant
>>>> families of instances matching all these kinds of requirements.
>>>>
>>> We do want to improve the os area, and are willing of course to
>>> discuss a design and accept patches.
>>> While allowing an OS to do something smarter is good for disk space
>>> and efficiency, we can definitely add a fallback option.
>>
>> Comments below.
>>
>>
>>>> - gnt-backup command does not support more than one export strategy.
>>>>
>>> What export strategies would you like?
>>>
>>>> - The export scripts are too hard to understand and modify for our
>>>> system administrators. But full instance management must be
>>>> accessible for sysadmins (creation, tuning, backups, etc).
>>>>
>>> They don't seem "very" hard, but I definitely agree to any
>>> simplification, and to making sure these features can run outside the
>>> node os, if possible. (a-la-snf-image, but ganeti driven rather than
>>> os driven, perhaps?)
>>
>> Comments below.
>>
>>
>>>> - Perhaps other limitations I'm not aware?
>>>>
>>> The oses running as root on the node are a limitation if you want
>>> arbitrary users to upload their own.
>>> snf-image solves it for kvm, but it would be good to have ganeti
>>> understand the concept of "boot auxiliary i

Fwd: Re: RFC: ganeti improvements

2013-01-25 Thread Constantinos Venetsanopoulos

[forwarding to the list too, to continue the discussion]


 Original Message 
Subject:Re: RFC: ganeti improvements
Date:   Fri, 25 Jan 2013 18:06:57 +0200
From:   Constantinos Venetsanopoulos 
Organization:   Greek Research and Technology Network
To: Guido Trotter 




On 01/25/2013 03:40 PM, Guido Trotter wrote:

On Fri, Jan 25, 2013 at 12:52 PM, Constantinos Venetsanopoulos
 wrote:

Hello all,


On 01/25/2013 01:42 PM, Guido Trotter wrote:

On Fri, Jan 25, 2013 at 11:01 AM, Nicolas Sebrecht 
wrote:

[ X-post to ganeti and ganeti-devel mailing lists.  ]


Hi all,

Ganeti is now used in our company, internally. We'd like to make it
full solution ready for our customers but we are missing some critical
features.

Before going further and starting to play with the code I'd like to
share my thoughts for comments. The idea is to submit patches in the
correct direction as early as possible in the development process.

I didn't look much at the code, yet.


1. Improving the KVM integration.

1.1. Hibernation.

AFAICT, we can't hibernate a KVM instance from Ganeti. I guess it is
possible to do with QMP and store the RAM in a dedicated path or LVM
volume.


I believe adding a gnt-instance hibernate (or shutdown --hibernate)
would work well from our side.


I agree on that.



1.2. CPU Tuning.

- CPU pinning (mask) is supported.
- CPU SMP is under work.


Yes, this is all part of 2.7. Feel free to test it further and let us
know if it works.


2. Snapshots.

The current gnt-backup tool does not fit our needs. Here are some
limitations I've found (correct me if I'm wrong):

- It relies on os-type (and variants?) which is fine if you have
homogeneous instances but does not help for a lot of per-instance
customizing.
BTW, the os-interface mixes various kinds of requirements (creation,
export/import, rename, verify). It is hard for us to get relevant
families of instances matching all these kinds of requirements.


We do want to improve the os area, and are willing of course to
discuss a design and accept patches.
While allowing an OS to do something smarter is good for disk space
and efficiency, we can definitely add a fallback option.


Comments below.



- gnt-backup command does not support more than one export strategy.


What export strategies would you like?


- The export scripts are too hard to understand and modify for our
system administrators. But full instance management must be
accessible for sysadmins (creation, tuning, backups, etc).


They don't seem "very" hard, but I definitely agree to any
simplification, and to making sure these features can run outside the
node os, if possible. (a-la-snf-image, but ganeti driven rather than
os driven, perhaps?)


Comments below.



- Perhaps other limitations I'm not aware?


The oses running as root on the node are a limitation if you want
arbitrary users to upload their own.
snf-image solves it for kvm, but it would be good to have ganeti
understand the concept of "boot auxiliary image to drive os
installation and other events".


I believe it might be worth starting another tool from scratch with a new
design. I'm not saying it would have to replace the current
gnt-backup. I'm more suggesting something like 'gnt-snapshot' or
whatever at the side. Any thoughts on this?


I believe having gnt-backup snapshot would be as good. I don't see a
particular reason for another extra command (sometimes I wonder why it
should be gnt-backup at all, rather than subcommands of gnt-instance).


Comments below.



3. Clone.

Cloning an instance does not seem to be supported.


Marco's idea sounds good, on this. Why not gnt-backup clone that does
this via export+import? (if not somehow in a smarter way?)


The ability for clones and snapshots is a very hot issue, but from our
experience,
it's not that easy to tackle, if you want to do it correctly. I believe that
it
should not be confused with the OS definitions level and the gnt-backup
functionality. Some food for thought:

Currently, we are able to do full clones and snapshots using a custom
ExtStorage
provider. This provider speaks with our custom distributed storage layer
which
undertakes the actual operation. So, to create an instance by cloning an
existing
image disk known by the external system, we run:

gnt-instance add -t ext -o snf-image+default  ...
  --disk
0:size=10G,origin="disk-UUID-known-to-the-external-system"

This way "origin" is passed to the ExtStorage provider as a parameter. The
provider knows from where and how to clone the disk transparently to
Ganeti. After that, the OS definition (snf-image) operates on top of the
disk
and does all the customization (sets passwords, resizes the filesystem,
etc).

However, we believe (and after a live talk with Iustin) that it would be
nice
to have Ganeti know about the concept of clo

Re: RFC: ganeti improvements

2013-01-25 Thread Constantinos Venetsanopoulos

Hello all,

On 01/25/2013 01:42 PM, Guido Trotter wrote:

On Fri, Jan 25, 2013 at 11:01 AM, Nicolas Sebrecht  wrote:

[ X-post to ganeti and ganeti-devel mailing lists.  ]


   Hi all,

Ganeti is now used in our company, internally. We'd like to make it
full solution ready for our customers but we are missing some critical
features.

Before going further and starting to play with the code I'd like to
share my thoughts for comments. The idea is to submit patches in the
correct direction as early as possible in the development process.

I didn't look much at the code, yet.


1. Improving the KVM integration.

1.1. Hibernation.

AFAICT, we can't hibernate a KVM instance from Ganeti. I guess it is
possible to do with QMP and store the RAM in a dedicated path or LVM
volume.


I believe adding a gnt-instance hibernate (or shutdown --hibernate)
would work well from our side.


I agree on that.


1.2. CPU Tuning.

- CPU pinning (mask) is supported.
- CPU SMP is under work.


Yes, this is all part of 2.7. Feel free to test it further and let us
know if it works.


2. Snapshots.

The current gnt-backup tool does not fit our needs. Here are some
limitations I've found (correct me if I'm wrong):

- It relies on os-type (and variants?) which is fine if you have
   homogeneous instances but does not help for a lot of per-instance
   customizing.
   BTW, the os-interface mixes various kinds of requirements (creation,
   export/import, rename, verify). It is hard for us to get relevant
   families of instances matching all these kinds of requirements.


We do want to improve the os area, and are willing of course to
discuss a design and accept patches.
While allowing an OS to do something smarter is good for disk space
and efficiency, we can definitely add a fallback option.


Comments below.


- gnt-backup command does not support more than one export strategy.


What export strategies would you like?


- The export scripts are too hard to understand and modify for our
   system administrators. But full instance management must be
   accessible for sysadmins (creation, tuning, backups, etc).


They don't seem "very" hard, but I definitely agree to any
simplification, and to making sure these features can run outside the
node os, if possible. (a-la-snf-image, but ganeti driven rather than
os driven, perhaps?)


Comments below.


- Perhaps other limitations I'm not aware?


The oses running as root on the node are a limitation if you want
arbitrary users to upload their own.
snf-image solves it for kvm, but it would be good to have ganeti
understand the concept of "boot auxiliary image to drive os
installation and other events".


I believe it might be worth starting another tool from scratch with a new
design. I'm not saying it would have to replace the current
gnt-backup. I'm more suggesting something like 'gnt-snapshot' or
whatever at the side. Any thoughts on this?


I believe having gnt-backup snapshot would be as good. I don't see a
particular reason for another extra command (sometimes I wonder why it
should be gnt-backup at all, rather than subcommands of gnt-instance).


Comments below.


3. Clone.

Cloning an instance does not seem to be supported.


Marco's idea sounds good, on this. Why not gnt-backup clone that does
this via export+import? (if not somehow in a smarter way?)


The ability for clones and snapshots is a very hot issue, but from our 
experience,
it's not that easy to tackle, if you want to do it correctly. I believe that it
should not be confused with the OS definitions level and the gnt-backup
functionality. Some food for thought:

Currently, we are able to do full clones and snapshots using a custom ExtStorage
provider. This provider speaks with our custom distributed storage layer which
undertakes the actual operation. So, to create an instance by cloning an 
existing
image disk known by the external system, we run:

gnt-instance add -t ext -o snf-image+default  ...
 --disk 
0:size=10G,origin="disk-UUID-known-to-the-external-system"

This way "origin" is passed to the ExtStorage provider as a parameter. The
provider knows from where and how to clone the disk transparently to
Ganeti. After that, the OS definition (snf-image) operates on top of the disk
and does all the customization (sets passwords, resizes the filesystem, etc).
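
(As a toy illustration of that flow, a provider's `create' script could look
roughly like the following; the VOL_NAME/VOL_SIZE/EXTP_ORIGIN environment
variable names reflect my reading of the ExtStorage interface, and the
backend_* functions are placeholders for the external system's API.)

#!/usr/bin/env python
# Toy ExtStorage "create" script: clone from an "origin" disk if that
# parameter was passed, otherwise create an empty volume.
import os
import sys


def backend_clone(origin, name):
  # Placeholder for the external storage system's clone operation.
  sys.stderr.write("cloning %s -> %s\n" % (origin, name))


def backend_create(name, size_mib):
  # Placeholder for the external storage system's create operation.
  sys.stderr.write("creating %s (%s MiB)\n" % (name, size_mib))


def main():
  name = os.environ["VOL_NAME"]           # volume name chosen by Ganeti
  size_mib = os.environ["VOL_SIZE"]       # requested size
  origin = os.environ.get("EXTP_ORIGIN")  # the origin=... disk parameter
  if origin:
    backend_clone(origin, name)
  else:
    backend_create(name, size_mib)
  return 0


if __name__ == "__main__":
  sys.exit(main())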

However, we believe (and after a live talk with Iustin) that it would be nice
to have Ganeti know about the concept of cloning and snapshotting. But to
do so, I believe we would need the following:

* Disk objects should become logically detached from instances, obtain their
  own UUIDs (as has happened with networks and nodegroups) and exist
  independently of the instances. One should be able to have disks that are
  not attached to any instance.

* The cloning and snapshotting functionality should be implemented at the
  bdev.py   level as separate functions for every template and not at the
  OS level.

The above would allow for very neat things apart from c

Re: [PATCH master] Run pre-migrate hooks on primary node too

2013-01-18 Thread Constantinos Venetsanopoulos

Hello Guido,

Is that OK with you?

Thanks,
Constantinos

On 01/17/2013 05:49 PM, Constantinos Venetsanopoulos wrote:

Signed-off-by: Constantinos Venetsanopoulos 
---
  lib/cmdlib.py |5 +++--
  1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/lib/cmdlib.py b/lib/cmdlib.py
index d97795c..cf4d902 100644
--- a/lib/cmdlib.py
+++ b/lib/cmdlib.py
@@ -8306,8 +8306,9 @@ class LUInstanceMigrate(LogicalUnit):
  
  """

  instance = self._migrater.instance
-nl = [self.cfg.GetMasterNode()] + list(instance.secondary_nodes)
-return (nl, nl + [instance.primary_node])
+snodes = list(instance.secondary_nodes)
+nl = [self.cfg.GetMasterNode(), instance.primary_node] + snodes
+return (nl, nl)
  
  
  class LUInstanceMove(LogicalUnit):




Re: why pre migration hooks don't run on the primary node?

2013-01-18 Thread Constantinos Venetsanopoulos

On 01/17/2013 07:34 PM, Guido Trotter wrote:

The question is: what exacly of the logical id do you need?
Is it the block device? Or something else? And couldn't you just
export that, instead of the logical id?


It is the volume's name e.g. '.disk0'. That's what I do
now, but for now I just export the "second" part of the tuple.
It works for me since I'm only using one specific template
and know it's format, but it would be nice if we could do it
generically.

Thanks,
Constantinos


thanks,

Guido


On Thu, Jan 17, 2013 at 4:36 PM, Iustin Pop  wrote:

On Thu, Jan 17, 2013 at 04:47:59PM +0200, Constantinos Venetsanopoulos wrote:

On 01/17/2013 04:23 PM, Iustin Pop wrote:

On Thu, Jan 17, 2013 at 04:10:38PM +0200, Constantinos Venetsanopoulos wrote:

On 01/17/2013 03:55 PM, Iustin Pop wrote:

On Thu, Jan 17, 2013 at 03:48:56PM +0200, Constantinos Venetsanopoulos wrote:

On 01/17/2013 03:44 PM, Iustin Pop wrote:

On Thu, Jan 17, 2013 at 03:37:38PM +0200, Constantinos Venetsanopoulos wrote:

On 01/17/2013 02:28 PM, Iustin Pop wrote:

On Thu, Jan 17, 2013 at 01:53:51PM +0200, Constantinos Venetsanopoulos wrote:

Hello team,

Is there any particular reason why migration post hooks are run on
master+primary+secondary nodes whereas pre hooks are only run
on master+secondary?

I'm asking because we need to patch Ganeti to run pre migration
hooks on the primary node too and I'm curious if this causes
any trouble that is not so obvious.

It seems that the original live migration code was implemented in
83dcb007, without any mentions of why we only run it on the secondary.
We had other patches that extended on which nodes we run hooks, so I
would guess that for migration, we only run it on the secondary node
because we couldn't think of any reason the source node would refuse the
outgoing migration.

For failover there are reasons not to run it, but not for migration.


If not, I think it would make sense to run also pre migrate hooks on
the primary node too. Thoughts?

Sounds good. What reason do you have for this, I'm curious?

We are testing Ganeti with a custom distributed storage layer called
Archipelago which allows for repetitive cloning and snapshotting
independently of the backing storage technology. So we need to make
sure we have some Archipelago-related locks on the primary node
before migrating the instance.

Ah, I see.


So, should I send a patch or you will take care of it? It's actually a
oneliner patch (in BuildHooksNodes of LUInstanceMigrate).

Please send a patch :)


OK


And something else that is relative.

We also need the disk's logical id to be exported to hooks. Currently,
only 'size', 'mode' and 'disk_count' are exported. For our purpose,
I have already patched the code to export the first two values of the
tuple with dummy names, but it would be best if we found a way to
export it uniformly for all disk templates. Would you have something
to propose here?

This seems a bit of a stretch - the logical ID is quite internal, so
exposing it to hooks… hmm… Not sure right now. We can go cheap and just
export the elements, separated by spaces, but…

What do you mean export the elements separated by spaces?
I'm not sure I get that.

Well, for plain LVM the logical_id is the pair (vg_name, lv_name). We
could just export DISK_3_LID="xenvg x.data", but it's not a good
thing. Exporting DISK_3_LID1="xenvg", DISK_3_LID2=".data" is also a
bit cheating…

OK, I get you now. Yes, that's exactly why I haven't proposed the
way I have implemented it on our setup, because it's the same as
your second one. And I also believe it is not clean at all.

To be honest we have had some internal discussions (no code
yet) in which we were thinking of proposing the UUID model
for disks too. So that Disks are independent objects with their
UUIDs and not tied to the instance, just like node groups,
or networks. Then we could solve all the above problems
(plus others we can also discuss) using just the UUIDs.

The instance object would only have a reference to the disk's
UUID and not contain the whole disk object in config.data

What would you think of that approach?

That sounds good, but is somewhat orthogonal. (It would solve many other
problems, but unrelated).

Of course. I just mentioned it, because then we could just export
the UUID or name of the disk. But that is another conversation :)

Ack. One which we should have, at one point!


The question still remains - how do you export
in shell-format (as opposed to JSON) the disk properties, which vary
across different disk types?

That's indeed a tricky problem.
The best thing I can think of, to bypass the problem would be
to export the number of the tuple contents and the contents
indexed with their numbers, much like you proposed. So we would
have something like:

DISK0_LID_COUNT="2"
DI

[PATCH master] Run pre-migrate hooks on primary node too

2013-01-17 Thread Constantinos Venetsanopoulos
Signed-off-by: Constantinos Venetsanopoulos 
---
 lib/cmdlib.py |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/lib/cmdlib.py b/lib/cmdlib.py
index d97795c..cf4d902 100644
--- a/lib/cmdlib.py
+++ b/lib/cmdlib.py
@@ -8306,8 +8306,9 @@ class LUInstanceMigrate(LogicalUnit):
 
 """
 instance = self._migrater.instance
-nl = [self.cfg.GetMasterNode()] + list(instance.secondary_nodes)
-return (nl, nl + [instance.primary_node])
+snodes = list(instance.secondary_nodes)
+nl = [self.cfg.GetMasterNode(), instance.primary_node] + snodes
+return (nl, nl)
 
 
 class LUInstanceMove(LogicalUnit):
-- 
1.7.2.5



Re: why pre migration hooks don't run on the primary node?

2013-01-17 Thread Constantinos Venetsanopoulos

On 01/17/2013 04:23 PM, Iustin Pop wrote:

On Thu, Jan 17, 2013 at 04:10:38PM +0200, Constantinos Venetsanopoulos wrote:

On 01/17/2013 03:55 PM, Iustin Pop wrote:

On Thu, Jan 17, 2013 at 03:48:56PM +0200, Constantinos Venetsanopoulos wrote:

On 01/17/2013 03:44 PM, Iustin Pop wrote:

On Thu, Jan 17, 2013 at 03:37:38PM +0200, Constantinos Venetsanopoulos wrote:

On 01/17/2013 02:28 PM, Iustin Pop wrote:

On Thu, Jan 17, 2013 at 01:53:51PM +0200, Constantinos Venetsanopoulos wrote:

Hello team,

Is there any particular reason why migration post hooks are run on
master+primary+secondary nodes whereas pre hooks are only run
on master+secondary?

I'm asking because we need to patch Ganeti to run pre migration
hooks on the primary node too and I'm curious if this causes
any trouble that is not so obvious.

It seems that the original live migration code was implemented in
83dcb007, without any mentions of why we only run it on the secondary.
We had other patches that extended on which nodes we run hooks, so I
would guess that for migration, we only run it on the secondary node
because we couldn't think of any reason the source node would refuse the
outgoing migration.

For failover there are reasons not to run it, but not for migration.


If not, I think it would make sense to run also pre migrate hooks on
the primary node too. Thoughts?

Sounds good. What reason do you have for this, I'm curious?

We are testing Ganeti with a custom distributed storage layer called
Archipelago which allows for repetitive cloning and snapshotting
independently of the backing storage technology. So we need to make
sure we have some Archipelago-related locks on the primary node
before migrating the instance.

Ah, I see.


So, should I send a patch or you will take care of it? It's actually a
oneliner patch (in BuildHooksNodes of LUInstanceMigrate).

Please send a patch :)


OK


And something else that is relative.

We also need the disk's logical id to be exported to hooks. Currently,
only 'size', 'mode' and 'disk_count' are exported. For our purpose,
I have already patched the code to export the first two values of the
tuple with dummy names, but it would be best if we found a way to
export it uniformly for all disk templates. Would you have something
to propose here?

This seems a bit of a stretch - the logical ID is quite internal, so
exposing it to hooks… hmm… Not sure right now. We can go cheap and just
export the elements, separated by spaces, but…

What do you mean export the elements separated by spaces?
I'm not sure I get that.

Well, for plain LVM the logical_id is the pair (vg_name, lv_name). We
could just export DISK_3_LID="xenvg x.data", but it's not a good
thing. Exporting DISK_3_LID1="xenvg", DISK_3_LID2=".data" is also a
bit cheating…

OK, I get you now. Yes, that's exactly why I haven't proposed the
way I have implemented it on our setup, because it's the same as
your second one. And I also believe it is not clean at all.

To be honest we have had some internal discussions (no code
yet) in which we were thinking of proposing the UUID model
for disks too. So that Disks are independent objects with their
UUIDs and not tied to the instance, just like node groups,
or networks. Then we could solve all the above problems
(plus others we can also discuss) using just the UUIDs.

The instance object would only have a reference to the disk's
UUID and not contain the whole disk object in config.data

What would you think of that approach?

That sounds good, but is somewhat orthogonal. (It would solve many other
problems, but unrelated).


Of course. I just mentioned it, because then we could just export
the UUID or name of the disk. But that is another conversation :)


The question still remains - how do you export
in shell-format (as opposed to JSON) the disk properties, which vary
across different disk types?


That's indeed a tricky problem.
The best thing I can think of, to bypass the problem would be
to export the number of the tuple contents and the contents
indexed with their numbers, much like you proposed. So we would
have something like:

DISK0_LID_COUNT="2"
DISK0_LID1="xenvg"
DISK0_LID2=".data"

I'm not sure. Do all the tuples have two values for all disk templates?
What is the format for drbd?
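
(For completeness, a tiny sketch of the numbered-element export discussed
above; the disk values are made up, and this is not how Ganeti currently
builds the hook environment.)

def export_logical_id(idx, logical_id):
  """Turn a disk's logical_id tuple into hook environment variables.

  Emits DISK<n>_LID_COUNT plus one DISK<n>_LID<i> entry per element, so
  hooks can cope with templates whose tuples have different lengths.

  """
  env = {"DISK%d_LID_COUNT" % idx: str(len(logical_id))}
  for i, value in enumerate(logical_id):
    env["DISK%d_LID%d" % (idx, i + 1)] = str(value)
  return env

# A plain (LVM) disk, whose logical_id is the (vg_name, lv_name) pair:
print(export_logical_id(0, ("xenvg", "example.disk0.data")))
# -> e.g. {'DISK0_LID_COUNT': '2', 'DISK0_LID1': 'xenvg',
#          'DISK0_LID2': 'example.disk0.data'}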

Constantinos


Re: [PATCH master] Run pre-migrate hooks on primary node too

2013-01-17 Thread Constantinos Venetsanopoulos

Hi Guido,

On 01/17/2013 04:36 PM, Guido Trotter wrote:


Wouldn't it be better to just change the nl variable, rather than 
adding the primary twice? What do other LUs with hooks on master,pnode 
and snodes do?




I also thought about that, but there is no pretty way to do it,
because you would have to split the above line. I checked other
LUs, but each one does it differently.

Thanks,
Constantinos


Thanks,

Guido

On 17 Jan 2013 15:24, "Constantinos Venetsanopoulos" <c...@grnet.gr> wrote:


Signed-off-by: Constantinos Venetsanopoulos <c...@grnet.gr>
---
 lib/cmdlib.py |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/lib/cmdlib.py b/lib/cmdlib.py
index d97795c..54ed10b 100644
--- a/lib/cmdlib.py
+++ b/lib/cmdlib.py
@@ -8307,7 +8307,7 @@ class LUInstanceMigrate(LogicalUnit):
 """
 instance = self._migrater.instance
 nl = [self.cfg.GetMasterNode()] + list(instance.secondary_nodes)
-return (nl, nl + [instance.primary_node])
+return (nl + [instance.primary_node], nl +
[instance.primary_node])


 class LUInstanceMove(LogicalUnit):
--
1.7.2.5





[PATCH master] Run pre-migrate hooks on primary node too

2013-01-17 Thread Constantinos Venetsanopoulos
Signed-off-by: Constantinos Venetsanopoulos 
---
 lib/cmdlib.py |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/lib/cmdlib.py b/lib/cmdlib.py
index d97795c..54ed10b 100644
--- a/lib/cmdlib.py
+++ b/lib/cmdlib.py
@@ -8307,7 +8307,7 @@ class LUInstanceMigrate(LogicalUnit):
 """
 instance = self._migrater.instance
 nl = [self.cfg.GetMasterNode()] + list(instance.secondary_nodes)
-return (nl, nl + [instance.primary_node])
+return (nl + [instance.primary_node], nl + [instance.primary_node])
 
 
 class LUInstanceMove(LogicalUnit):
-- 
1.7.2.5



Re: why pre migration hooks don't run on the primary node?

2013-01-17 Thread Constantinos Venetsanopoulos

On 01/17/2013 03:55 PM, Iustin Pop wrote:

On Thu, Jan 17, 2013 at 03:48:56PM +0200, Constantinos Venetsanopoulos wrote:

On 01/17/2013 03:44 PM, Iustin Pop wrote:

On Thu, Jan 17, 2013 at 03:37:38PM +0200, Constantinos Venetsanopoulos wrote:

On 01/17/2013 02:28 PM, Iustin Pop wrote:

On Thu, Jan 17, 2013 at 01:53:51PM +0200, Constantinos Venetsanopoulos wrote:

Hello team,

Is there any particular reason why migration post hooks are run on
master+primary+secondary nodes whereas pre hooks are only run
on master+secondary?

I'm asking because we need to patch Ganeti to run pre migration
hooks on the primary node too and I'm curious if this causes
any trouble that is not so obvious.

It seems that the original live migration code was implemented in
83dcb007, without any mentions of why we only run it on the secondary.
We had other patches that extended on which nodes we run hooks, so I
would guess that for migration, we only run it on the secondary node
because we couldn't think of any reason the source node would refuse the
outgoing migration.

For failover there are reasons not to run it, but not for migration.


If not, I think it would make sense to run also pre migrate hooks on
the primary node too. Thoughts?

Sounds good. What reason do you have for this, I'm curious?

We are testing Ganeti with a custom distributed storage layer called
Archipelago which allows for repetitive cloning and snapshotting
independently of the backing storage technology. So we need to make
sure we have some Archipelago-related locks on the primary node
before migrating the instance.

Ah, I see.


So, should I send a patch or you will take care of it? It's actually a
oneliner patch (in BuildHooksNodes of LUInstanceMigrate).

Please send a patch :)


OK


And something else that is relative.

We also need the disk's logical id to be exported to hooks. Currently,
only 'size', 'mode' and 'disk_count' are exported. For our purpose,
I have already patched the code to export the first two values of the
tuple with dummy names, but it would be best if we found a way to
export it uniformly for all disk templates. Would you have something
to propose here?

This seems a bit of a stretch - the logical ID is quite internal, so
exposing it to hooks… hmm… Not sure right now. We can go cheap and just
export the elements, separated by spaces, but…

What do you mean export the elements separated by spaces?
I'm not sure I get that.

Well, for plain LVM the logical_id is the pair (vg_name, lv_name). We
could just export DISK_3_LID="xenvg x.data", but it's not a good
thing. Exporting DISK_3_LID1="xenvg", DISK_3_LID2=".data" is also a
bit cheating…


OK, I get you now. Yes, that's exactly why I haven't proposed the
way I have implemented it on our setup, because it's the same as
your second one. And I also believe it is not clean at all.

To be honest we have had some internal discussions (no code
yet) in which we were thinking of proposing the UUID model
for disks too. So that Disks are independent objects with their
UUIDs and not tied to the instance, just like node groups,
or networks. Then we could solve all the above problems
(plus others we can also discuss) using just the UUIDs.

The instance object would only have a reference to the disk's
UUID and not contain the whole disk object in config.data

What would you think of that approach?
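
(Roughly, the proposal would make config.data look something like the sketch
below; this is only an illustration, not an actual or proposed format.)

# Disks become top-level objects keyed by their own UUID ...
disks = {
  "d1a7f280-0001": {"size": 10240, "mode": "rw", "template": "plain"},
  "d1a7f280-0002": {"size": 20480, "mode": "rw", "template": "plain"},
}

# ... and the instance only references them by UUID, instead of
# embedding the full disk objects in config.data.
instance = {
  "name": "instance1.example.com",
  "disks": ["d1a7f280-0001", "d1a7f280-0002"],
}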

Constantinos


Re: why pre migration hooks don't run on the primary node?

2013-01-17 Thread Constantinos Venetsanopoulos

On 01/17/2013 03:44 PM, Iustin Pop wrote:

On Thu, Jan 17, 2013 at 03:37:38PM +0200, Constantinos Venetsanopoulos wrote:

On 01/17/2013 02:28 PM, Iustin Pop wrote:

On Thu, Jan 17, 2013 at 01:53:51PM +0200, Constantinos Venetsanopoulos wrote:

Hello team,

Is there any particular reason why migration post hooks are run on
master+primary+secondary nodes whereas pre hooks are only run
on master+secondary?

I'm asking because we need to patch Ganeti to run pre migration
hooks on the primary node too and I'm curious if this causes
any trouble that is not so obvious.

It seems that the original live migration code was implemented in
83dcb007, without any mentions of why we only run it on the secondary.
We had other patches that extended on which nodes we run hooks, so I
would guess that for migration, we only run it on the secondary node
because we couldn't think of any reason the source node would refuse the
outgoing migration.

For failover there are reasons not to run it, but not for migration.


If not, I think it would make sense to run also pre migrate hooks on
the primary node too. Thoughts?

Sounds good. What reason do you have for this, I'm curious?

We are testing Ganeti with a custom distributed storage layer called
Archipelago which allows for repetitive cloning and snapshotting
independently of the backing storage technology. So we need to make
sure we have some Archipelago-related locks on the primary node
before migrating the instance.

Ah, I see.


So, should I send a patch or you will take care of it? It's actually a
oneliner patch (in BuildHooksNodes of LUInstanceMigrate).

Please send a patch :)



OK


And something else that is relative.

We also need the disk's logical id to be exported to hooks. Currently,
only 'size', 'mode' and 'disk_count' are exported. For our purpose,
I have already patched the code to export the first two values of the
tuple with dummy names, but it would be best if we found a way to
export it uniformly for all disk templates. Would you have something
to propose here?

This seems a bit of a stretch - the logical ID is quite internal, so
exposing it to hooks… hmm… Not sure right now. We can go cheap and just
export the elements, separated by spaces, but…


What do you mean export the elements separated by spaces?
I'm not sure I get that.

Constantinos


Re: why pre migration hooks don't run on the primary node?

2013-01-17 Thread Constantinos Venetsanopoulos

On 01/17/2013 02:28 PM, Iustin Pop wrote:

On Thu, Jan 17, 2013 at 01:53:51PM +0200, Constantinos Venetsanopoulos wrote:

Hello team,

Is there any particular reason why migration post hooks are run on
master+primary+secondary nodes whereas pre hooks are only run
on master+secondary?

I'm asking because we need to patch Ganeti to run pre migration
hooks on the primary node too and I'm curious if this causes
any trouble that is not so obvious.

It seems that the original live migration code was implemented in
83dcb007, without any mentions of why we only run it on the secondary.
We had other patches that extended on which nodes we run hooks, so I
would guess that for migration, we only run it on the secondary node
because we couldn't think of any reason the source node would refuse the
outgoing migration.

For failover there are reasons not to run it, but not for migration.


If not, I think it would make sense to also run the pre-migration hooks
on the primary node. Thoughts?

Sounds good. What reason do you have for this, I'm curious?


We are testing Ganeti with a custom distributed storage layer called
Archipelago, which allows repeated cloning and snapshotting
independently of the backing storage technology. So we need to make
sure we hold some Archipelago-related locks on the primary node
before migrating the instance.

So, should I send a patch or will you take care of it? It's actually a
one-liner patch (in BuildHooksNodes of LUInstanceMigrate).

And something else that is related.

We also need the disk's logical id to be exported to hooks. Currently,
only 'size', 'mode' and 'disk_count' are exported. For our purposes,
I have already patched the code to export the first two values of the
tuple with dummy names, but it would be best if we found a way to
export it uniformly for all disk templates. Would you have something
to propose here?

thanks,
Constantinos


why pre migration hooks don't run on the primary node?

2013-01-17 Thread Constantinos Venetsanopoulos

Hello team,

Is there any particular reason why migration post hooks are run on
master+primary+secondary nodes whereas pre hooks are only run
on master+secondary?

I'm asking because we need to patch Ganeti to run pre migration
hooks on the primary node too and I'm curious if this causes
any trouble that is not so obvious.

If not, I think it would make sense to also run the pre-migration hooks
on the primary node. Thoughts?

Thanks,
Constantinos


Re: [PATCH master 1/2] burnin: Factorize disk template lists

2013-01-15 Thread Constantinos Venetsanopoulos

Hello team,

On 01/15/2013 01:18 PM, Michael Hanselmann wrote:

2013/1/15 Guido Trotter :

On Tue, Jan 15, 2013 at 12:11 PM, Michael Hanselmann  wrote:

+#: Disk templates supporting a single node
+_SINGLE_NODE_DISK_TEMPLATES = compat.UniqueFrozenset([
+  constants.DT_DISKLESS,
+  constants.DT_PLAIN,
+  constants.DT_FILE,
+  constants.DT_SHARED_FILE,
+  ])

Should RBD be here? After all it should be possible to use it from a
single node.

Maybe, but adding it belongs to another patch. Please confirm it
actually works (“should” is not certain enough) and then send a patch.


FYI,

DT_RBD and DT_EXT work on a single-node cluster.
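
So the list above could presumably be extended along these lines (only a
sketch of the follow-up patch, nothing submitted yet):

  from ganeti import compat, constants

  #: Disk templates supporting a single node
  _SINGLE_NODE_DISK_TEMPLATES = compat.UniqueFrozenset([
    constants.DT_DISKLESS,
    constants.DT_PLAIN,
    constants.DT_FILE,
    constants.DT_SHARED_FILE,
    constants.DT_RBD,
    constants.DT_EXT,
    ])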

Regards,
Constantinos


Re: [PATCH master] Document the ExtStorage `SetInfo' functionality

2012-12-28 Thread Constantinos Venetsanopoulos

On 12/28/2012 05:53 PM, Iustin Pop wrote:

On Fri, Dec 28, 2012 at 04:19:00PM +0200, Constantinos Venetsanopoulos wrote:

Signed-off-by: Constantinos Venetsanopoulos 
---
  doc/design-shared-storage.rst   |   20 ++--
  man/ganeti-extstorage-interface.rst |   31 +++
  2 files changed, 41 insertions(+), 10 deletions(-)

-The ExtStorage Interface was introduced in Ganeti 2.6.
-Ganeti 2.6 and up is compatible with the ExtStorage Interface.
+The ExtStorage Interface was introduced in Ganeti 2.7.
+Ganeti 2.7 and up is compatible with the ExtStorage Interface.

Heh, I was wondering about that when pushing the initial design, but I
thought maybe some bits went in before and I lost track :)


No, nothing ExtStorage-related went into 2.6. It was just that the discussions
had started early this year and we initially thought it would make it
into 2.6, hence the reference in the design doc.

Thanks,
Constantinos


[PATCH master] gnt-storage man page: make commands sub-sections

2012-12-28 Thread Constantinos Venetsanopoulos
Signed-off-by: Constantinos Venetsanopoulos 
---

Hello team,

During the review of the ExtStorage patch series, Iustin commented that
the model for gnt-* man pages is for commands to appear as
sub-sections rather than just bolded.

However, I see that the gnt-os man page doesn't comply with this model
either. Commands are bolded there, which is probably why I chose
the same style for gnt-storage.

Either way, this patch makes commands appear as sub-sections, so
depending on which way you want to go, you can apply this patch or not.

Kind Regards,
Constantinos

 man/gnt-storage.rst |   12 +---
 1 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/man/gnt-storage.rst b/man/gnt-storage.rst
index 9fb2325..805ec44 100644
--- a/man/gnt-storage.rst
+++ b/man/gnt-storage.rst
@@ -21,8 +21,10 @@ the Ganeti cluster. At the moment, it manages only external 
storage
 COMMANDS
 
 
+DIAGNOSE
+
 
-**diagnose**
+| **diagnose**
 
 This command provides detailed information about the state of all
 ExtStorage providers available in the Ganeti cluster. The state of each
@@ -34,7 +36,11 @@ missing from a node, or is only partially installed. This 
command will
 show the details of all ExtStorage providers and the reasons they are or
 aren't valid for every nodegroup in the cluster.
 
-**info**
+INFO
+
+
+| **info**
+| [*provider*]
 
 This command will list detailed information about each ExtStorage
 provider found in the cluster, including its nodegroup validity, the
@@ -54,7 +60,7 @@ storage (such as lvm, drbd, etc) and also provide diagnostics 
for them
 too.
 
 It can also be extended to handle internal and external storage pools,
-if/when this kind of abstraction is implemented to Ganeti.
+if/when this kind of abstraction is implemented inside Ganeti.
 
 .. vim: set textwidth=72 :
 .. Local Variables:
-- 
1.7.2.5



[PATCH master] Document the ExtStorage `SetInfo' functionality

2012-12-28 Thread Constantinos Venetsanopoulos
Signed-off-by: Constantinos Venetsanopoulos 
---
 doc/design-shared-storage.rst   |   20 ++--
 man/ganeti-extstorage-interface.rst |   31 +++
 2 files changed, 41 insertions(+), 10 deletions(-)

diff --git a/doc/design-shared-storage.rst b/doc/design-shared-storage.rst
index 3628f1a..05e91c0 100644
--- a/doc/design-shared-storage.rst
+++ b/doc/design-shared-storage.rst
@@ -22,6 +22,7 @@ This includes two distinct disk templates:
 
 Background
 ==
+
 DRBD is currently the only shared storage backend supported by Ganeti.
 DRBD offers the advantages of high availability while running on
 commodity hardware at the cost of high network I/O for block-level
@@ -45,6 +46,7 @@ need to take care about the mirroring process from one host 
to another.
 
 Use cases
 =
+
 We consider the following use cases:
 
 - A virtualization cluster with FibreChannel shared storage, mapping at
@@ -63,9 +65,9 @@ The design addresses the following procedures:
   storage.
 - Introduction of a shared file storage disk template for use with networked
   filesystems.
-- Introduction of shared block device disk template with device
+- Introduction of a shared block device disk template with device
   adoption.
-- Introduction of an External Storage Interface.
+- Introduction of the External Storage Interface.
 
 Additionally, mid- to long-term goals include:
 
@@ -156,8 +158,9 @@ The shared block device template will make the following 
assumptions:
 - The device will be available with the same path under all nodes in the
   node group.
 
-Introduction of an External Storage Interface
+Introduction of the External Storage Interface
 ==
+
 Overview
 
 
@@ -180,6 +183,7 @@ An “ExtStorage provider” will have to provide the following 
methods:
 - Grow a disk
 - Attach a disk to a given node
 - Detach a disk from a given node
+- SetInfo to a disk (add metadata)
 - Verify its supported parameters
 
 The proposed ExtStorage interface borrows heavily from the OS
@@ -191,6 +195,7 @@ provider is expected to provide the following scripts:
 - ``grow``
 - ``attach``
 - ``detach``
+- ``setinfo``
 - ``verify``
 
 All scripts will be called with no arguments and get their input via
@@ -208,6 +213,9 @@ all commands, and some of them might have extra ones.
 ``EXTP_name``
   ExtStorage parameter, where `name` is the parameter in
   upper-case (same as OS interface's ``OSP_*`` parameters).
+``VOL_METADATA``
+  A string containing metadata to be set for the volume.
+  This is exported only to the ``setinfo`` script.
 
 All scripts except `attach` should return 0 on success and non-zero on
 error, accompanied by an appropriate error message on stderr. The
@@ -221,9 +229,9 @@ Implementation
 To support the ExtStorage interface, we will introduce a new disk
 template called `ext`. This template will implement the existing Ganeti
 disk interface in `lib/bdev.py` (create, remove, attach, assemble,
-shutdown, grow), and will simultaneously pass control to the external
-scripts to actually handle the above actions. The `ext` disk template
-will act as a translation layer between the current Ganeti disk
+shutdown, grow, setinfo), and will simultaneously pass control to the
+external scripts to actually handle the above actions. The `ext` disk
+template will act as a translation layer between the current Ganeti disk
 interface and the ExtStorage providers.
 
 We will also introduce a new IDISK_PARAM called `IDISK_PROVIDER =
diff --git a/man/ganeti-extstorage-interface.rst 
b/man/ganeti-extstorage-interface.rst
index 8cf0f85..4b1e0b1 100644
--- a/man/ganeti-extstorage-interface.rst
+++ b/man/ganeti-extstorage-interface.rst
@@ -21,8 +21,9 @@ also be accessible by all nodes of this nodegroup too.
 REFERENCE
 -
 
-There are seven required files: *create*, *attach*, *detach*, *remove*,
-*grow*, *verify* (executables) and *parameters.list* (text file).
+There are eight required files: *create*, *attach*, *detach*, *remove*,
+*grow*, *setinfo*, *verify* (executables) and *parameters.list*
+(text file).
 
 Common environment
 ~~
@@ -57,6 +58,11 @@ EXTP_*name*
 variable, prefixed with ``EXTP_``, and upper-cased. For example, a
 ``fromsnap`` parameter will be exported as ``EXTP_FROMSNAP``.
 
+VOL_METADATA
+Available only to the **setinfo** script. A string containing
+metadata to be associated with the volume. Currently, Ganeti sets
+this value to ``originstname+X`` where ``X`` is the instance's name.
+
 EXECUTABLE SCRIPTS
 --
 
@@ -159,6 +165,23 @@ details on when the running instance becomes aware of its 
grown disk.
 
 The script returns ``0`` on success.
 
+setinfo
+~~~
+
+This script is used to add metadata to an existing volume. It is helpful
+when we need to keep an external, Ganeti-independent mapping between
+instances and volumes; primarily for recovery reasons. This is provider

Re: [PATCH master 7/7] Add ExtStorage related man pages

2012-12-22 Thread Constantinos Venetsanopoulos

On 12/22/2012 12:24 PM, Iustin Pop wrote:

On Thu, Dec 20, 2012 at 03:26:57PM +0100, Iustin Pop wrote:

On Fri, Dec 14, 2012 at 03:50:11PM +0200, Constantinos Venetsanopoulos wrote:

  * ganeti-extstorage-interface man page
  * gnt-storage man page
+gnt-storage(8) Ganeti | Version @GANETI_VERSION@
+
+
+Name
+
+
+gnt-storage - Ganeti storage administration
+
+Synopsis
+
+
+**gnt-storage** {command} [arguments...]
+
+DESCRIPTION
+---
+
+The **gnt-storage** is used for managing the available storage inside
+the Ganeti cluster. At the moment, it manages only external storage
+(ExtStorage).
+
+COMMANDS
+
+
+
+**diagnose**
+
+This command provides detailed information about the state of all
+ExtStorage providers available in the Ganeti cluster. The state of each
+provider is calculated per nodegroup. This means that a provider may be
+valid (meaning usable) for some nodegroups, and invalid (not usable) for
+some others. This command will help you see why an installed ExtStorage
+provider is not valid for a specific nodegroup. It could be that it is
+missing from a node, or is only partially installed. This command will
+show the details of all ExtStorage providers and the reasons they are or
+aren't valid for every nodegroup in the cluster.
+
+**info**
+
+This command will list detailed information about each ExtStorage
+provider found in the cluster, including its nodegroup validity, the
+supported parameters (if any) and their documentations, etc.
+
+For each ExtStorage provider only the valid nodegroups will be listed.
+
+If run with no arguments, it will display info for all ExtStorage
+providers found in the cluster. If given ExtStorage provider's names as
+arguments it will list info only for providers given.

The commands here do not follow the model in other gnt-* man pages (they
are bold-ed, not sub-sections). I'll commit with this changed, once 5/7
is in.

I committed as is for now. This concludes, if I count correctly, the
merge of the ext-storage patch series.


Yes. With this one, the ExtStorage patch series is completely merged.
To wrap up, I see the following pending issues after our whole discussion:

* a patch concerning the query issue from 5/7
* a patch which adds the SetInfo functionality to the design doc and the
  man pages
* a patch concerning the merging of logfiles, after we see how we'll
  handle the 'attach' logfile

I hope I'll come back with patches for at least the first two issues
after Christmas.


Thanks for the review,
Have a nice holiday and merry Christmas,
Constantinos

thanks,
iustin




Re: [PATCH master 5/7] Add the gnt-storage client

2012-12-22 Thread Constantinos Venetsanopoulos

On 12/22/2012 12:24 PM, Iustin Pop wrote:

On Fri, Dec 21, 2012 at 04:34:57PM +0200, Constantinos Venetsanopoulos wrote:

On 12/20/2012 11:52 PM, Iustin Pop wrote:

On Thu, Dec 20, 2012 at 09:55:10PM +0200, Constantinos Venetsanopoulos wrote:

On 12/20/2012 7:28 PM, Iustin Pop wrote:

On Thu, Dec 20, 2012 at 07:18:39PM +0200, Constantinos Venetsanopoulos wrote:

On 12/20/2012 04:13 PM, Iustin Pop wrote:

On Fri, Dec 14, 2012 at 03:50:09PM +0200, Constantinos Venetsanopoulos wrote:

Add a new client called 'gnt-storage'.
The client interacts with the ExtStorage interface, similarly to
the way gnt-os interacts with the OS interface.

For now, only two commands are supported: 'info' and 'diagnose'.

'diagnose' calculates the node status of each provider on each node,
similarly to gnt-os diagnose. Furthermore, for every provider, it
calculates its nodegroup validity for each nodegroup. This is done
inside the LU and not the client (marked as 'TODO' for the global
validity of gnt-os diagnose).

In the future, gnt-storage can be used to manage storage pools,
or even be extended to diagnose other storage types supported by
Ganeti, such as lvm, drbd (INT_MIRROR) or rbd (EXT_MIRROR).

Hmmm. A few issues here.

The model this implements is based on the OS diagnose model, which is
unfortunately not a good one; we wanted to move this to a query for a
long time.

So I'm a bit hesitant to merge this as is, by adding a new opcode,
instead of a clean new query2-based model.

If we do add this as is, then at least the Haskell code needs to get the
new opcode as well.

So, which way do we go?

thanks,
iustin

This is tough. I'm not confident enough in my Haskell knowledge yet
to be the one to port it to Haskell :(

Sorry, I think you misunderstood me.

I just meant to use a query operation, in Python, like gnt-instance list
does, as opposed to an opcode-based one, like gnt-os list.

Oh. OK. Indeed I misunderstood.


I'd propose we merge this now, because it is really useful when
dealing with ExtStorage providers, plus it is the base for other
storage-related OpCodes to come in the future (as discussed in the
Storage Pools thread).

If so, I'll try to learn the new query2-based model (maybe with
some pointers from you; I haven't seen that at all) and I can
provide a patch in the future that moves the diagnose functionality to
the new model.

Well, you already implement the query2 interface, just don't expose it
as query, but instead as an opcode. So the work would be small, in any
case.

Hmm... I don't think I completely get that. It seems that I'm missing
something here. As I said before, I haven't read the query2 design doc,
so that's probably why.

Let me just ask one thing and I will come back soon for more clarifications,
after I've read the doc carefully:

Will all the logic code that now goes into the ExtStorageDiagnose LU remain
as is? Should the changes be done on the client side? Or should we change
the way the LU operates too, in order to expose the functionality as a
query and not an opcode?

I'll try to explain.

There are two ways of interacting with masterd. Either you submit a job
(a list of opcodes) and get a job ID, then poll the job, etc. This is
how gnt-os list works:

- create an opcodes.OpOsDiagnose(output_fields=["name", "variants"],
   names=[])
- submit it to masterd, get a job ID back
- poll until the job is finished, get results

That I knew.


A query, on the other hand, is slightly different:

- the client sends a Luxi request of type Query (in query2) or
   specific QueryNodes (in query1) and waits for the response (in the
   same call)
- this request makes masterd build an opcode, execute it, and send the
   result back to the client

There's no job ID involved, and you get the result in the same call (if
you disconnect, the results are lost).
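
In rough code the difference looks something like this (just a sketch;
the helper and constant names, cli.GetClient/SendJob/PollJob, QR_OS and
the luxi client's Query method, are from memory, so treat them as
assumptions rather than the exact API):

  from ganeti import cli, constants, opcodes

  cl = cli.GetClient()

  # Opcode/job based (gnt-os list style): submit, get a job ID, poll.
  op = opcodes.OpOsDiagnose(output_fields=["name", "variants"], names=[])
  job_id = cli.SendJob([op], cl=cl)
  result = cli.PollJob(job_id, cl=cl)

  # Query based (query2 style): one synchronous luxi call, no job ID.
  response = cl.Query(constants.QR_OS, ["name", "variants"], None)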

That I didn't know :)


As to implementation details, the only thing that changes is that your
OpCode and LU should be named OpQueryStorage, LUQueryStorage (these
are used for query1), you need to extend _QUERY_IMPL in cmdlib.py, etc.

The actual computation of fields/results remains the same, in
lib/query.py.

You can see a recent example in commit "Implement LUNetworkQuery".

OK. It's a bit clearer now. I'll look at the code and the above commit
and come back on it. Thanks for the explanation.


Would that be OK with you?

Hmm, OK, then I'll submit this as is, fix the Haskell definitions to
support the new opcode (that is about 10-20 lines of code), and we'll
see for 2.8 about moving this to a pure query.

That sounds good. Until 2.8, I'll look at the query2 system in more detail
and I think I'll be able to provide the corresponding patch that moves
things in the direction you are proposing.

OK, sounds good then.

OK.

FYI, submitted with this interdiff (adding the opc

Re: [PATCH master 5/7] Add the gnt-storage client

2012-12-21 Thread Constantinos Venetsanopoulos

On 12/20/2012 11:52 PM, Iustin Pop wrote:

On Thu, Dec 20, 2012 at 09:55:10PM +0200, Constantinos Venetsanopoulos wrote:

On 12/20/2012 7:28 PM, Iustin Pop wrote:

On Thu, Dec 20, 2012 at 07:18:39PM +0200, Constantinos Venetsanopoulos wrote:

On 12/20/2012 04:13 PM, Iustin Pop wrote:

On Fri, Dec 14, 2012 at 03:50:09PM +0200, Constantinos Venetsanopoulos wrote:

Add a new client called 'gnt-storage'.
The client interacts with the ExtStorage interface, similarly to
the way gnt-os interacts with the OS interface.

For now, only two commands are supported: 'info' and 'diagnose'.

'diagnose' calculates the node status of each provider on each node,
similarly to gnt-os diagnose. Furthermore, for every provider, it
calculates its nodegroup validity for each nodegroup. This is done
inside the LU and not the client (marked as 'TODO' for the global
validity of gnt-os diagnose).

In the future, gnt-storage can be used to manage storage pools,
or even be extended to diagnose other storage types supported by
Ganeti, such as lvm, drbd (INT_MIRROR) or rbd (EXT_MIRROR).

Hmmm. A few issues here.

The model this implements is based on the OS diagnose model, which is
unfortunately not a good one; we wanted to move this to a query for a
long time.

So I'm a bit hesitant to merge this as is, by adding a new opcode,
instead of a clean new query2-based model.

If we do add this as is, then at least the Haskell code needs to get the
new opcode as well.

So, which way do we go?

thanks,
iustin

This is tough. I'm not confident enough in my Haskell knowledge yet
to be the one to port it to Haskell :(

Sorry, I think you misunderstood me.

I just meant to use a query operation, in Python, like gnt-instance list
does, as opposed to an opcode-based one, like gnt-os list.

Oh. OK. Indeed I misunderstood.


I'd propose we merge this now, because it is really useful when
dealing with ExtStorage providers, plus it is the base for other
storage-related OpCodes to come in the future (as discussed in the
Storage Pools thread).

If so, I'll try to learn the new query2-based model (maybe with
some pointers from you; I haven't seen that at all) and I can
provide a patch in the future that moves the diagnose functionality to
the new model.

Well, you already implement the query2 interface, just don't expose it
as query, but instead as an opcode. So the work would be small, in any
case.

Hmm... I don't think I completely get that. It seems that I'm missing
something here. As I said before, I haven't read the query2 design doc,
so that's probably why.

Let me just ask one thing and I will come back soon for more clarifications,
after I've read the doc carefully:

Will all the logic code that now goes into the ExtStorageDiagnose LU remain
as is? Should the changes be done on the client side? Or should we change
the way the LU operates too, in order to expose the functionality as a
query and not an opcode?

I'll try to explain.

There are two ways of interacting with masterd. Either you submit a job
(a list of opcodes) and get a job ID, then poll the job, etc. This is
how gnt-os list works:

- create an opcodes.OpOsDiagnose(output_fields=["name", "variants"],
   names=[])
- submit it to masterd, get a job ID back
- poll until the job is finished, get results


That I knew.


A query, on the other hand, is slightly different:

- the client sends a Luxi request of type Query (in query2) or
   specific QueryNodes (in query1) and waits for the response (in the
   same call)
- this request makes masterd build an opcode, execute it, and send the
   result back to the client

There's no job ID involved, and you get the result in the same call (if
you disconnect, the results are lost).


That I didn't know :)


As to implementation details, the only thing that changes is that your
OpCode and LU should be named OpQueryStorage, LUQueryStorage (these
are used for query1), you need to extend _QUERY_IMPL in cmdlib.py, etc.

The actual computation of fields/results remains the same, in
lib/query.py.

You can see a recent example in commit "Implement LUNetworkQuery".


OK. It's a bit clearer now. I'll look at the code and the above commit
and come back on it. Thanks for the explanation.


Would that be OK with you?

Hmm, OK, then I'll submit this as is, fix the Haskell definitions to
support the new opcode (that is about 10-20 lines of code), and we'll
see for 2.8 about moving this to a pure query.

That sounds good. Until 2.8, I'll look at the query2 system in more detail
and I think I'll be able to provide the corresponding patch that moves
things in the direction you are proposing.

OK, sounds good then.


OK.

Thanks,
Constantinos


Re: [PATCH master 5/7] Add the gnt-storage client

2012-12-20 Thread Constantinos Venetsanopoulos

On 12/20/2012 7:28 PM, Iustin Pop wrote:

On Thu, Dec 20, 2012 at 07:18:39PM +0200, Constantinos Venetsanopoulos wrote:

On 12/20/2012 04:13 PM, Iustin Pop wrote:

On Fri, Dec 14, 2012 at 03:50:09PM +0200, Constantinos Venetsanopoulos wrote:

Add a new client called 'gnt-storage'.
The client interacts with the ExtStorage interface, similarly to
the way gnt-os interacts with the OS interface.

For now, only two commands are supported: 'info' and 'diagnose'.

'diagnose' calculates the node status of each provider on each node,
similarly to gnt-os diagnose. Furthermore, for every provider, it
calculates its nodegroup validity for each nodegroup. This is done
inside the LU and not the client (marked as 'TODO' for the global
validity of gnt-os diagnose).

In the future, gnt-storage can be used to manage storage pools,
or even be extended to diagnose other storage types supported by
Ganeti, such as lvm, drbd (INT_MIRROR) or rbd (EXT_MIRROR).

Hmmm. A few issues here.

The model this implements is based on the OS diagnose model, which is
unfortunately not a good one; we wanted to move this to a query for a
long time.

So I'm a bit hesitant to merge this as is, by adding a new opcode,
instead of a clean new query2-based model.

If we do add this as is, then at least the Haskell code needs to get the
new opcode as well.

So, which way do we go?

thanks,
iustin

This is tough. I'm not confident enough in my Haskell knowledge yet
to be the one to port it to Haskell :(

Sorry, I think you misunderstood me.

I just meant to use a query operation, in Python, like gnt-instance list
does, as opposed to an opcode-based one, like gnt-os list.


Oh. OK. Indeed I misunderstood.


I'd propose we merge this now, because it is really useful when
dealing with ExtStorage providers, plus it is the base for other
storage-related OpCodes to come in the future (as discussed in the
Storage Pools thread).

If so, I'll try to learn the new query2-based model (maybe with
some pointers from you; I haven't seen that at all) and I can
provide a patch in the future that moves the diagnose functionality to
the new model.

Well, you already implement the query2 interface, just don't expose it
as query, but instead as an opcode. So the work would be small, in any
case.


Hmm... I don't think I completely get that. It seems that I'm missing
something here. As I said before, I haven't read the query2 design doc,
so that's probably why.

Let me just ask one thing and I will come back soon for more clarifications,
after I've read the doc carefully:

Will all the logic code that now goes into the ExtStorageDiagnose LU remain
as is? Should the changes be done on the client side? Or should we change
the way the LU operates too, in order to expose the functionality as a
query and not an opcode?


Would that be OK with you?

Hmm, OK, then I'll submit this as is, fix the Haskell definitions to
support the new opcode (that is about 10-20 lines of code), and we'll
see for 2.8 about moving this to a pure query.


That sounds good. Until 2.8, I'll look at the query2 system in more detail
and I think I'll be able to provide the corresponding patch that moves
things in the direction you are proposing.

Thanks,
Constantinos


Re: [PATCH master 5/7] Add the gnt-storage client

2012-12-20 Thread Constantinos Venetsanopoulos

On 12/20/2012 04:13 PM, Iustin Pop wrote:

On Fri, Dec 14, 2012 at 03:50:09PM +0200, Constantinos Venetsanopoulos wrote:

Add a new client called 'gnt-storage'.
The client interacts with the ExtStorage interface, similarly to
the way gnt-os interacts with the OS interface.

For now, only two commands are supported: 'info' and 'diagnose'.

'diagnose' calculates the node status of each provider on each node,
similarly to gnt-os diagnose. Furthermore, for every provider, it
calculates its nodegroup validity for each nodegroup. This is done
inside the LU and not the client (marked as 'TODO' for the global
validity of gnt-os diagnose).

In the future, gnt-storage can be used to manage storage pools,
or even be extended to diagnose other storage types supported by
Ganeti, such as lvm, drbd (INT_MIRROR) or rbd (EXT_MIRROR).

Hmmm. A few issues here.

The model this implements is based on the OS diagnose model, which is
unfortunately not a good one; we wanted to move this to a query for a
long time.

So I'm a bit hesitant to merge this as is, by adding a new opcode,
instead of a clean new query2-based model.

If we do add this as is, then at least the Haskell code needs to get the
new opcode as well.

So, which way do we go?

thanks,
iustin


This is tough. I'm not confident enough in my Haskell knowledge yet
to be the one to port it to Haskell :(

I'd propose we merge this now, because it is really useful when
dealing with ExtStorage providers, plus it is the base for other
storage-related OpCodes to come in the future (as discussed in the
Storage Pools thread).

If so, I'll try to learn the new query2-based model (maybe with
some pointers from you; I haven't seen that at all) and I can
provide a patch in the future that moves the diagnose functionality to
the new model.

Would that be OK with you?

Thanks,
Constantinos


