Re: [PATCH stable-2.16 1/2] Tune getNodeInstances DRBD secondary computation
2016-05-30 11:34 GMT+02:00 Brian Foley: > On Sun, May 29, 2016 at 12:01:39PM +0200, Iustin Pop wrote: > > On Thu, May 26, 2016 at 03:56:10PM +0100, Ganeti Development List wrote: > > > This is exercised by the luxi QueryInstances call when the sinst_cnt > and > > > sint_list fields are used. This uses a lot of CPU and does a lot of > > > short-lived heap allocation on clusters with many instances. > > > > > > The reimplementation allocates fewer temporary values, and does fewer > > > object lookups by UUID. The net effect is to reduce heap use from > ~3.2GB > > > to ~1.5GB, and CPU use from ~1200ms to ~770ms in a test harness using > > > a config with 1000 DRBD instances on 80 nodes. > > > > > > Signed-off-by: Brian Foley > > > > Hi Brian, > > > > > +-- | Computes the secondary node UUID for a DRBD disk > > > +computeDiskSecondaryNode :: Disk -> Maybe String > > > +computeDiskSecondaryNode dsk = > > > + case diskLogicalId dsk of > > > +Just (LIDDrbd8 _nodeA nodeB _ _ _ _) -> Just nodeB > > > > Are you sure this snippet is correct? If I remember correctly and things > > haven't changed, disks do not record primary/secondary roles, they only > > hold the nodes. I.e. nodeA ≠ primary, it's simple one of the nodes of > > the disk, similary with nodeB. Hence the naming A/B, instead of Pri/Sec. > > Ah blast. No, I'm not sure. When diagnosing DRBD issues on live clusters > nodeA always seemed to be the primary, so I assumed that this invariant > was enforced by the code. I'll submit a patch to fix this shortly. > nodeA is the primary at the moment of instance creation; afterwards, live migration/failover and change secondary can change this, but it's 50% which is which. thanks! iustin
Re: [PATCH stable-2.16 1/2] Tune getNodeInstances DRBD secondary computation
On Sun, May 29, 2016 at 12:01:39PM +0200, Iustin Pop wrote: > On Thu, May 26, 2016 at 03:56:10PM +0100, Ganeti Development List wrote: > > This is exercised by the luxi QueryInstances call when the sinst_cnt and > > sint_list fields are used. This uses a lot of CPU and does a lot of > > short-lived heap allocation on clusters with many instances. > > > > The reimplementation allocates fewer temporary values, and does fewer > > object lookups by UUID. The net effect is to reduce heap use from ~3.2GB > > to ~1.5GB, and CPU use from ~1200ms to ~770ms in a test harness using > > a config with 1000 DRBD instances on 80 nodes. > > > > Signed-off-by: Brian Foley> > Hi Brian, > > > +-- | Computes the secondary node UUID for a DRBD disk > > +computeDiskSecondaryNode :: Disk -> Maybe String > > +computeDiskSecondaryNode dsk = > > + case diskLogicalId dsk of > > +Just (LIDDrbd8 _nodeA nodeB _ _ _ _) -> Just nodeB > > Are you sure this snippet is correct? If I remember correctly and things > haven't changed, disks do not record primary/secondary roles, they only > hold the nodes. I.e. nodeA ≠ primary, it's simple one of the nodes of > the disk, similary with nodeB. Hence the naming A/B, instead of Pri/Sec. Ah blast. No, I'm not sure. When diagnosing DRBD issues on live clusters nodeA always seemed to be the primary, so I assumed that this invariant was enforced by the code. I'll submit a patch to fix this shortly. Thanks, Brian.
Re: [PATCH stable-2.16 1/2] Tune getNodeInstances DRBD secondary computation
On Thu, May 26, 2016 at 03:56:10PM +0100, Ganeti Development List wrote: > This is exercised by the luxi QueryInstances call when the sinst_cnt and > sint_list fields are used. This uses a lot of CPU and does a lot of > short-lived heap allocation on clusters with many instances. > > The reimplementation allocates fewer temporary values, and does fewer > object lookups by UUID. The net effect is to reduce heap use from ~3.2GB > to ~1.5GB, and CPU use from ~1200ms to ~770ms in a test harness using > a config with 1000 DRBD instances on 80 nodes. > > Signed-off-by: Brian FoleyHi Brian, > +-- | Computes the secondary node UUID for a DRBD disk > +computeDiskSecondaryNode :: Disk -> Maybe String > +computeDiskSecondaryNode dsk = > + case diskLogicalId dsk of > +Just (LIDDrbd8 _nodeA nodeB _ _ _ _) -> Just nodeB Are you sure this snippet is correct? If I remember correctly and things haven't changed, disks do not record primary/secondary roles, they only hold the nodes. I.e. nodeA ≠ primary, it's simple one of the nodes of the disk, similary with nodeB. Hence the naming A/B, instead of Pri/Sec. Only the instance knows its current primary node, and thus you need that to be able to determine which is the disk's secondary node. regards, iustin
Re: [PATCH stable-2.16 1/2] Tune getNodeInstances DRBD secondary computation
LGTM, thanks for tracing this down! On Thursday, May 26, 2016 at 3:56:14 PM UTC+1, Brian Foley wrote: > > This is exercised by the luxi QueryInstances call when the sinst_cnt and > sint_list fields are used. This uses a lot of CPU and does a lot of > short-lived heap allocation on clusters with many instances. > > The reimplementation allocates fewer temporary values, and does fewer > object lookups by UUID. The net effect is to reduce heap use from ~3.2GB > to ~1.5GB, and CPU use from ~1200ms to ~770ms in a test harness using > a config with 1000 DRBD instances on 80 nodes. > > Signed-off-by: Brian Foley> --- > src/Ganeti/Config.hs | 37 ++--- > 1 file changed, 26 insertions(+), 11 deletions(-) > > diff --git a/src/Ganeti/Config.hs b/src/Ganeti/Config.hs > index cf2f885..5b40694 100644 > --- a/src/Ganeti/Config.hs > +++ b/src/Ganeti/Config.hs > @@ -90,7 +90,7 @@ import qualified Data.ByteString as BS > import qualified Data.ByteString.UTF8 as UTF8 > import qualified Data.Foldable as F > import Data.List (foldl', nub) > -import Data.Maybe (fromMaybe) > +import Data.Maybe (fromMaybe, mapMaybe) > import Data.Monoid > import qualified Data.Map as M > import qualified Data.Set as S > @@ -158,21 +158,36 @@ instNodes :: ConfigData -> Instance -> S.Set String > instNodes cfg inst = maybe id S.insert (instPrimaryNode inst) >$ instDiskNodes cfg inst > > --- | Computes the secondary nodes of an instance. Since this is valid > --- only for DRBD, we call directly 'instDiskNodes', skipping over the > --- extra primary insert. > -instSecondaryNodes :: ConfigData -> Instance -> S.Set String > -instSecondaryNodes cfg inst = > - maybe id S.delete (instPrimaryNode inst) $ instDiskNodes cfg inst > +-- | Computes the secondary node UUID for a DRBD disk > +computeDiskSecondaryNode :: Disk -> Maybe String > +computeDiskSecondaryNode dsk = > + case diskLogicalId dsk of > +Just (LIDDrbd8 _nodeA nodeB _ _ _ _) -> Just nodeB > +_ -> Nothing > > -- | Get instances of a given node. > -- The node is specified through its UUID. > +-- The secondary calculation is expensive and frequently called, so > optimise > +-- this to allocate fewer temporary values > getNodeInstances :: ConfigData -> String -> ([Instance], [Instance]) > getNodeInstances cfg nname = > -let all_inst = M.elems . fromContainer . configInstances $ cfg > -pri_inst = filter ((== Just nname) . instPrimaryNode) all_inst > -sec_inst = filter ((nname `S.member`) . instSecondaryNodes cfg) > all_inst > -in (pri_inst, sec_inst) > +let all_insts = M.elems . fromContainer . configInstances $ cfg > +all_disks = fromContainer . configDisks $ cfg > + > +pri_inst = filter ((== Just nname) . instPrimaryNode) $ all_insts > + > +find_disk :: String -> Maybe Disk > +find_disk d_uuid = M.lookup (UTF8.fromString d_uuid) all_disks > + > +inst_disks :: [(Instance, [Disk])] > +inst_disks = [(i, mapMaybe find_disk $ instDisks i) | i <- > all_insts] > + > +sec_insts :: [Instance] > +sec_insts = [inst | > +(inst, disks) <- inst_disks, > +s_uuid <- mapMaybe computeDiskSecondaryNode disks, > +s_uuid == nname] > +in (pri_inst, sec_insts) > > -- | Computes the role of a node. > getNodeRole :: ConfigData -> Node -> NodeRole > -- > 2.8.0.rc3.226.g39d4020 > >