Re: [PATCH stable-2.16 1/2] Tune getNodeInstances DRBD secondary computation

2016-05-30 Thread 'Iustin Pop' via ganeti-devel
2016-05-30 11:34 GMT+02:00 Brian Foley :

> On Sun, May 29, 2016 at 12:01:39PM +0200, Iustin Pop wrote:
> > On Thu, May 26, 2016 at 03:56:10PM +0100, Ganeti Development List wrote:
> > > This is exercised by the luxi QueryInstances call when the sinst_cnt
> and
> > > sint_list fields are used. This uses a lot of CPU and does a lot of
> > > short-lived heap allocation on clusters with many instances.
> > >
> > > The reimplementation allocates fewer temporary values, and does fewer
> > > object lookups by UUID. The net effect is to reduce heap use from
> ~3.2GB
> > > to ~1.5GB, and CPU use from ~1200ms to ~770ms in a test harness using
> > > a config with 1000 DRBD instances on 80 nodes.
> > >
> > > Signed-off-by: Brian Foley 
> >
> > Hi Brian,
> >
> > > +-- | Computes the secondary node UUID for a DRBD disk
> > > +computeDiskSecondaryNode :: Disk -> Maybe String
> > > +computeDiskSecondaryNode dsk =
> > > +  case diskLogicalId dsk of
> > > +Just (LIDDrbd8 _nodeA nodeB _ _ _ _) -> Just nodeB
> >
> > Are you sure this snippet is correct? If I remember correctly and things
> > haven't changed, disks do not record primary/secondary roles, they only
> > hold the nodes. I.e. nodeA ≠ primary, it's simple one of the nodes of
> > the disk, similary with nodeB. Hence the naming A/B, instead of Pri/Sec.
>
> Ah blast. No, I'm not sure. When diagnosing DRBD issues on live clusters
> nodeA always seemed to be the primary, so I assumed that this invariant
> was enforced by the code. I'll submit a patch to fix this shortly.
>

nodeA is the primary at the moment of instance creation; afterwards, live
migration/failover and change secondary can change this, but it's 50% which
is which.

thanks!
iustin


Re: [PATCH stable-2.16 1/2] Tune getNodeInstances DRBD secondary computation

2016-05-30 Thread 'Brian Foley' via ganeti-devel
On Sun, May 29, 2016 at 12:01:39PM +0200, Iustin Pop wrote:
> On Thu, May 26, 2016 at 03:56:10PM +0100, Ganeti Development List wrote:
> > This is exercised by the luxi QueryInstances call when the sinst_cnt and
> > sint_list fields are used. This uses a lot of CPU and does a lot of
> > short-lived heap allocation on clusters with many instances.
> > 
> > The reimplementation allocates fewer temporary values, and does fewer
> > object lookups by UUID. The net effect is to reduce heap use from ~3.2GB
> > to ~1.5GB, and CPU use from ~1200ms to ~770ms in a test harness using
> > a config with 1000 DRBD instances on 80 nodes.
> > 
> > Signed-off-by: Brian Foley 
> 
> Hi Brian,
> 
> > +-- | Computes the secondary node UUID for a DRBD disk
> > +computeDiskSecondaryNode :: Disk -> Maybe String
> > +computeDiskSecondaryNode dsk =
> > +  case diskLogicalId dsk of
> > +Just (LIDDrbd8 _nodeA nodeB _ _ _ _) -> Just nodeB
> 
> Are you sure this snippet is correct? If I remember correctly and things
> haven't changed, disks do not record primary/secondary roles, they only
> hold the nodes. I.e. nodeA ≠ primary, it's simple one of the nodes of
> the disk, similary with nodeB. Hence the naming A/B, instead of Pri/Sec.

Ah blast. No, I'm not sure. When diagnosing DRBD issues on live clusters
nodeA always seemed to be the primary, so I assumed that this invariant
was enforced by the code. I'll submit a patch to fix this shortly.

Thanks,
Brian.


Re: [PATCH stable-2.16 1/2] Tune getNodeInstances DRBD secondary computation

2016-05-29 Thread 'Iustin Pop' via ganeti-devel
On Thu, May 26, 2016 at 03:56:10PM +0100, Ganeti Development List wrote:
> This is exercised by the luxi QueryInstances call when the sinst_cnt and
> sint_list fields are used. This uses a lot of CPU and does a lot of
> short-lived heap allocation on clusters with many instances.
> 
> The reimplementation allocates fewer temporary values, and does fewer
> object lookups by UUID. The net effect is to reduce heap use from ~3.2GB
> to ~1.5GB, and CPU use from ~1200ms to ~770ms in a test harness using
> a config with 1000 DRBD instances on 80 nodes.
> 
> Signed-off-by: Brian Foley 

Hi Brian,

> +-- | Computes the secondary node UUID for a DRBD disk
> +computeDiskSecondaryNode :: Disk -> Maybe String
> +computeDiskSecondaryNode dsk =
> +  case diskLogicalId dsk of
> +Just (LIDDrbd8 _nodeA nodeB _ _ _ _) -> Just nodeB

Are you sure this snippet is correct? If I remember correctly and things
haven't changed, disks do not record primary/secondary roles, they only
hold the nodes. I.e. nodeA ≠ primary, it's simple one of the nodes of
the disk, similary with nodeB. Hence the naming A/B, instead of Pri/Sec.

Only the instance knows its current primary node, and thus you need that
to be able to determine which is the disk's secondary node.

regards,
iustin


Re: [PATCH stable-2.16 1/2] Tune getNodeInstances DRBD secondary computation

2016-05-26 Thread 'Viktor Bachraty' via ganeti-devel
LGTM, thanks for tracing this down!

On Thursday, May 26, 2016 at 3:56:14 PM UTC+1, Brian Foley wrote:
>
> This is exercised by the luxi QueryInstances call when the sinst_cnt and 
> sint_list fields are used. This uses a lot of CPU and does a lot of 
> short-lived heap allocation on clusters with many instances. 
>
> The reimplementation allocates fewer temporary values, and does fewer 
> object lookups by UUID. The net effect is to reduce heap use from ~3.2GB 
> to ~1.5GB, and CPU use from ~1200ms to ~770ms in a test harness using 
> a config with 1000 DRBD instances on 80 nodes. 
>
> Signed-off-by: Brian Foley  
> --- 
>  src/Ganeti/Config.hs | 37 ++--- 
>  1 file changed, 26 insertions(+), 11 deletions(-) 
>
> diff --git a/src/Ganeti/Config.hs b/src/Ganeti/Config.hs 
> index cf2f885..5b40694 100644 
> --- a/src/Ganeti/Config.hs 
> +++ b/src/Ganeti/Config.hs 
> @@ -90,7 +90,7 @@ import qualified Data.ByteString as BS 
>  import qualified Data.ByteString.UTF8 as UTF8 
>  import qualified Data.Foldable as F 
>  import Data.List (foldl', nub) 
> -import Data.Maybe (fromMaybe) 
> +import Data.Maybe (fromMaybe, mapMaybe) 
>  import Data.Monoid 
>  import qualified Data.Map as M 
>  import qualified Data.Set as S 
> @@ -158,21 +158,36 @@ instNodes :: ConfigData -> Instance -> S.Set String 
>  instNodes cfg inst = maybe id S.insert (instPrimaryNode inst) 
>$ instDiskNodes cfg inst 
>   
> --- | Computes the secondary nodes of an instance. Since this is valid 
> --- only for DRBD, we call directly 'instDiskNodes', skipping over the 
> --- extra primary insert. 
> -instSecondaryNodes :: ConfigData -> Instance -> S.Set String 
> -instSecondaryNodes cfg inst = 
> -  maybe id S.delete (instPrimaryNode inst) $ instDiskNodes cfg inst 
> +-- | Computes the secondary node UUID for a DRBD disk 
> +computeDiskSecondaryNode :: Disk -> Maybe String 
> +computeDiskSecondaryNode dsk = 
> +  case diskLogicalId dsk of 
> +Just (LIDDrbd8 _nodeA nodeB _ _ _ _) -> Just nodeB 
> +_ -> Nothing 
>   
>  -- | Get instances of a given node. 
>  -- The node is specified through its UUID. 
> +-- The secondary calculation is expensive and frequently called, so 
> optimise 
> +-- this to allocate fewer temporary values 
>  getNodeInstances :: ConfigData -> String -> ([Instance], [Instance]) 
>  getNodeInstances cfg nname = 
> -let all_inst = M.elems . fromContainer . configInstances $ cfg 
> -pri_inst = filter ((== Just nname) . instPrimaryNode) all_inst 
> -sec_inst = filter ((nname `S.member`) . instSecondaryNodes cfg) 
> all_inst 
> -in (pri_inst, sec_inst) 
> +let all_insts = M.elems . fromContainer . configInstances $ cfg 
> +all_disks = fromContainer . configDisks $ cfg 
> + 
> +pri_inst = filter ((== Just nname) . instPrimaryNode) $ all_insts 
> + 
> +find_disk :: String -> Maybe Disk 
> +find_disk d_uuid = M.lookup (UTF8.fromString d_uuid) all_disks 
> + 
> +inst_disks :: [(Instance, [Disk])] 
> +inst_disks = [(i, mapMaybe find_disk $ instDisks i) | i <- 
> all_insts] 
> + 
> +sec_insts :: [Instance] 
> +sec_insts = [inst | 
> +(inst, disks) <- inst_disks, 
> +s_uuid <- mapMaybe computeDiskSecondaryNode disks, 
> +s_uuid == nname] 
> +in (pri_inst, sec_insts) 
>   
>  -- | Computes the role of a node. 
>  getNodeRole :: ConfigData -> Node -> NodeRole 
> -- 
> 2.8.0.rc3.226.g39d4020 
>
>

[PATCH stable-2.16 1/2] Tune getNodeInstances DRBD secondary computation

2016-05-26 Thread 'Brian Foley' via ganeti-devel
This is exercised by the luxi QueryInstances call when the sinst_cnt and
sint_list fields are used. This uses a lot of CPU and does a lot of
short-lived heap allocation on clusters with many instances.

The reimplementation allocates fewer temporary values, and does fewer
object lookups by UUID. The net effect is to reduce heap use from ~3.2GB
to ~1.5GB, and CPU use from ~1200ms to ~770ms in a test harness using
a config with 1000 DRBD instances on 80 nodes.

Signed-off-by: Brian Foley 
---
 src/Ganeti/Config.hs | 37 ++---
 1 file changed, 26 insertions(+), 11 deletions(-)

diff --git a/src/Ganeti/Config.hs b/src/Ganeti/Config.hs
index cf2f885..5b40694 100644
--- a/src/Ganeti/Config.hs
+++ b/src/Ganeti/Config.hs
@@ -90,7 +90,7 @@ import qualified Data.ByteString as BS
 import qualified Data.ByteString.UTF8 as UTF8
 import qualified Data.Foldable as F
 import Data.List (foldl', nub)
-import Data.Maybe (fromMaybe)
+import Data.Maybe (fromMaybe, mapMaybe)
 import Data.Monoid
 import qualified Data.Map as M
 import qualified Data.Set as S
@@ -158,21 +158,36 @@ instNodes :: ConfigData -> Instance -> S.Set String
 instNodes cfg inst = maybe id S.insert (instPrimaryNode inst)
   $ instDiskNodes cfg inst
 
--- | Computes the secondary nodes of an instance. Since this is valid
--- only for DRBD, we call directly 'instDiskNodes', skipping over the
--- extra primary insert.
-instSecondaryNodes :: ConfigData -> Instance -> S.Set String
-instSecondaryNodes cfg inst =
-  maybe id S.delete (instPrimaryNode inst) $ instDiskNodes cfg inst
+-- | Computes the secondary node UUID for a DRBD disk
+computeDiskSecondaryNode :: Disk -> Maybe String
+computeDiskSecondaryNode dsk =
+  case diskLogicalId dsk of
+Just (LIDDrbd8 _nodeA nodeB _ _ _ _) -> Just nodeB
+_ -> Nothing
 
 -- | Get instances of a given node.
 -- The node is specified through its UUID.
+-- The secondary calculation is expensive and frequently called, so optimise
+-- this to allocate fewer temporary values
 getNodeInstances :: ConfigData -> String -> ([Instance], [Instance])
 getNodeInstances cfg nname =
-let all_inst = M.elems . fromContainer . configInstances $ cfg
-pri_inst = filter ((== Just nname) . instPrimaryNode) all_inst
-sec_inst = filter ((nname `S.member`) . instSecondaryNodes cfg) 
all_inst
-in (pri_inst, sec_inst)
+let all_insts = M.elems . fromContainer . configInstances $ cfg
+all_disks = fromContainer . configDisks $ cfg
+
+pri_inst = filter ((== Just nname) . instPrimaryNode) $ all_insts
+
+find_disk :: String -> Maybe Disk
+find_disk d_uuid = M.lookup (UTF8.fromString d_uuid) all_disks
+
+inst_disks :: [(Instance, [Disk])]
+inst_disks = [(i, mapMaybe find_disk $ instDisks i) | i <- all_insts]
+
+sec_insts :: [Instance]
+sec_insts = [inst |
+(inst, disks) <- inst_disks,
+s_uuid <- mapMaybe computeDiskSecondaryNode disks,
+s_uuid == nname]
+in (pri_inst, sec_insts)
 
 -- | Computes the role of a node.
 getNodeRole :: ConfigData -> Node -> NodeRole
-- 
2.8.0.rc3.226.g39d4020