[ovirt-users] Re: hyperconverged single node with SSD cache fails gluster creation

2019-12-01 Thread Strahil Nikolov
 When I first deployed my oVirt lab (v4.2.7 was the latest and greatest at the 
time), the ansible playbook didn't work for me. So I decided to stop the gluster 
processes on one of the nodes, wipe all the LVM and recreate it manually. 
Finally, I managed to use my SSD for write-back cache - but I found out that if 
your chunk size is larger than the default limit, it will never push the data to 
the spinning disks. For details you can check bug 1668163 – "LVM cache cannot 
flush buffer, change cache type or lvremove LV (CachePolicy 'cleaner' also 
doesn't work)".
As we use either 'replica 2 arbiter 1' (old name: replica 3 arbiter 1) or a pure 
replica 3, we can afford to have a gluster node go 'poof', as long as we have 
decent bandwidth and we use sharding.
So far I have changed my brick layout at least twice (for the cluster) without 
the VMs being affected - so you can still try the caching, but please check the 
comments in bug 1668163 about the chunk size of the cache.
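
In case it helps, here is a minimal sketch of how to inspect and flush such a 
cache with plain LVM commands. The VG/LV names follow the pattern the HCI 
wizard generates and are only an assumption - adjust them to your own layout:

# Show chunk size and current policy of the cache pool ('-a' includes hidden LVs)
lvs -a -o lv_name,segtype,lv_size,chunk_size,cache_policy gluster_vg_sdb

# Switch to the 'cleaner' policy to ask dm-cache to write dirty blocks back
lvchange --cachepolicy cleaner gluster_vg_sdb/gluster_thinpool_gluster_vg_sdb

# Detach the cache pool (flushes it first); per bug 1668163 this may never
# finish if the chunk size is too large for the flush to complete
lvconvert --splitcache gluster_vg_sdb/gluster_thinpool_gluster_vg_sdb

# When re-creating the cache pool, the chunk size can be set explicitly up front
lvcreate --type cache-pool -L 550G --chunksize 128k -n cachepool gluster_vg_sdb /dev/sda4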
Best Regards,
Strahil Nikolov




[ovirt-users] Re: hyperconverged single node with SSD cache fails gluster creation

2019-12-01 Thread Thomas Hoberg

Hi Gobinda,

unfortunately it's long gone, because I went back to an un-cached setup.

It was mostly a trial anyway; I had to re-do the 3-node HCI because it 
had died rather horribly on me (a repeating issue I have had so far on 
distinct sets of hardware, which I am still trying to hunt down... 
separate topic).


And since it was a blank(ed) set of servers, I just decided to try the 
SSD cache, to see if the Ansible script generation issue had been sorted 
out upstream as described. I was rather encouraged to see that the 
Ansible script now included the changes that URS had described as 
becoming necessary with the newer Ansible version.


It doesn't actually make a lot of sense in this setup, because the SSD 
cache is a single Samsung EVO 860 1TB unit while the storage is a RAID6 
of seven 4TB 2.5" drives (per server): both have similar bandwidth, and 
IOPS would be very much workload dependent (the second SSD I intended to 
use as a mirror was unfortunately cut from the budget).


The SSD has space left over because the OS doesn't need that much, but I 
don't dare use a single SSD as a write-back cache, especially because 
the RAID controller (HP420i) hides all wear information and doesn't seem 
to pass TRIM either. For write-through, I'm not sure it would do 
noticeably better than the RAID controller (which I also configured not 
to cache the SSD).


So after it failed, I simply went back to no cache for now. This HCI 
cluster is using relatively low-power hardware recalled from retirement 
that will host functional VMs, not high-performance workloads. The nodes 
are well equipped with RAM, and that's always the fastest cache anyway.


I guess you should be able to add and remove the SSD as a cache layer at 
any time during operation, because it sits at a level oVirt doesn't 
manage, and I'd love to see examples of how it's done. The removal part 
in particular would be important to know, in case your SSD signals 
unexpected levels of wear and you need to swap it out on the fly.
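
Just to sketch what I have in mind (untested on the HCI layout, and the 
VG/LV and device names are only assumptions based on the wizard's naming 
scheme), adding and removing the cache behind oVirt's back would 
presumably look something like this in plain LVM:

# Attach: add the SSD partition to the brick VG and cache the thin pool
vgextend gluster_vg_sdb /dev/sda4
lvcreate --type cache-pool -L 550G -n cachepool gluster_vg_sdb /dev/sda4
lvconvert --type cache --cachepool gluster_vg_sdb/cachepool \
          --cachemode writethrough gluster_vg_sdb/gluster_thinpool_gluster_vg_sdb

# Detach: split off the cache pool (dirty blocks are flushed first), then
# remove it and take the SSD back out of the VG
lvconvert --splitcache gluster_vg_sdb/gluster_thinpool_gluster_vg_sdb
lvremove gluster_vg_sdb/cachepool
vgreduce gluster_vg_sdb /dev/sda4
pvremove /dev/sda4

Both directions should work while the LV stays active, which is exactly 
what would make an on-the-fly swap of a worn SSD feasible.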


If I come across another opportunity to test (most likely on a single 
node), I will update here and make sure to collect a full set of log 
files, including the main Ansible config file.


Thank you for your interest and the follow-up,

Thomas



[ovirt-users] Re: hyperconverged single node with SSD cache fails gluster creation

2019-11-29 Thread Gobinda Das
Hi Thomas,
 Can you please share the "hc_wizard_inventory.yml" file which is under 
/etc/ansible/?

On Thu, Nov 28, 2019 at 11:26 PM Thomas Hoberg  wrote:

> Hi URS,
>
> I have tried again using the latest release (4.3.7) and noted that now
> the more "explicit" variant you quote was generated.
>
> The behavior is changed, but it still fails now complaining about
> /dev/sdb being mounted (or inaccessible in any other way).
>
> I am attaching the logs.
>
> I have an HDD RAID on /dev/sdb and an SSD partition on /dev/sda3 with
>  >600GB of space left.
>
> I have mostly gone with defaults everywhere, used an arbiter (at least
> for the vmstore and data volumes) VDO and write-through caching with
> 550GB size (note that it fails to apply that value beyond the first node).
>
> Has anyone else tried a hyperconverged 3-node with SSD caching with
> success recently?
>
> Thanks for your feedback and help so far,
>
> Thomas
>


-- 


Thanks,
Gobinda


[ovirt-users] Re: hyperconverged single node with SSD cache fails gluster creation

2019-11-28 Thread Thomas Hoberg

Hi URS,

I have tried again using the latest release (4.3.7) and noted that the 
more "explicit" variant you quoted is now generated.


The behavior has changed, but it still fails, now complaining about 
/dev/sdb being mounted (or inaccessible in any other way).


I am attaching the logs.

I have an HDD RAID on /dev/sdb and an SSD partition on /dev/sda3 with 
>600GB of space left.


I have mostly gone with the defaults everywhere, used an arbiter (at least 
for the vmstore and data volumes), VDO, and write-through caching with a 
550GB size (note that it fails to apply that value beyond the first node).


Has anyone else tried a hyperconverged 3-node with SSD caching with 
success recently?


Thanks for your feedback and help so far,

Thomas



gluster-deployment.log.gz
Description: GNU Zip compressed data




[ovirt-users] Re: hyperconverged single node with SSD cache fails gluster creation

2019-09-09 Thread thomas
Thanks a ton!

On one hand I'm glad it's a bug that is now known and fixed; on the other hand 
I am more scared than ever that oVirt is too raw to upgrade without intensive QA.

I'll try both the manual approach and the new Ansible scripts once I've 
overcome a new problem that keeps me busy (that will be a new post).

So when would the change flow into the current oVirt release? 4.3.6 or 4.4?


[ovirt-users] Re: hyperconverged single node with SSD cache fails gluster creation

2019-09-04 Thread Sachidananda URS
On Wed, Sep 4, 2019 at 9:27 PM  wrote:

> I am seeing more success than failures at creating single and triple node
> hyperconverged setups after some weeks of experimentation so I am branching
> out to additional features: In this case the ability to use SSDs as cache
> media for hard disks.
>
> I first tried with a single node that combined caching and compression, and
> that failed during the creation of the LVs.
>
> I tried again without the VDO compression, but the results were actually
> identical, whilst VDO compression without the LV cache worked OK.
>
> I tried various combinations, using less space etc., but the results are
> always the same and unfortunately rather cryptic (substituted the physical
> disk label with {disklabel}):
>
> TASK [gluster.infra/roles/backend_setup : Extend volume group]
> *
> failed: [{hostname}] (item={u'vgname': u'gluster_vg_{disklabel}p1',
> u'cachethinpoolname': u'gluster_thinpool_gluster_vg_{disklabel}p1',
> u'cachelvname': u'cachelv_gluster_thinpool_gluster_vg_{disklabel}p1',
> u'cachedisk': u'/dev/sda4', u'cachemetalvname':
> u'cache_gluster_thinpool_gluster_vg_{disklabel}p1', u'cachemode':
> u'writeback', u'cachemetalvsize': u'70G', u'cachelvsize': u'630G'}) =>
> {"ansible_loop_var": "item", "changed": false, "err": "  Physical volume
> \"/dev/mapper/vdo_{disklabel}p1\" still in use\n", "item": {"cachedisk":
> "/dev/sda4", "cachelvname":
> "cachelv_gluster_thinpool_gluster_vg_{disklabel}p1", "cachelvsize": "630G",
> "cachemetalvname": "cache_gluster_thinpool_gluster_vg_{disklabel}p1",
> "cachemetalvsize": "70G", "cachemode": "writeback", "cachethinpoolname":
> "gluster_thinpool_gluster_vg_{disklabel}p1", "vgname":
> "gluster_vg_{disklabel}p1"}, "msg": "Unable to reduce
> gluster_vg_{disklabel}p1 by /dev/dm-15.", "rc": 5}
>
> somewhere within that I see something that points to a race condition
> ("still in use").
>
> Unfortunately I have not been able to pinpoint the raw logs which are used
> at that stage and I wasn't able to obtain more info.
>
> At this point quite a bit of storage setup is already done, so rolling
> back for a clean new attempt can be a bit complicated, with reboots to
> reconcile the kernel with the data on disk.
>
> I don't actually believe it's related to single node, and I'd be quite
> happy to move the creation of the SSD cache to a later stage, but in a VDO
> setup this looks slightly complex to someone without intimate knowledge of
> LVs-with-cache-and-perhaps-thin/VDO/Gluster all thrown into one.
>
> Needless to say, the feature set (SSD caching & compressed dedup) sounds
> terribly attractive, but when things don't just work, it's more terrifying.
>

Hi Thomas,

The way we have to write the variables for Ansible 2.8 while setting up the 
cache has changed. Currently we are writing something like this:

gluster_infra_cache_vars:
- vgname: vg_sdb2
  cachedisk: /dev/sdb3
  cachelvname: cachelv_thinpool_vg_sdb2
  cachethinpoolname: thinpool_vg_sdb2
  cachelvsize: '10G'
  cachemetalvsize: '2G'
  cachemetalvname: cache_thinpool_vg_sdb2
  cachemode: writethrough
===
Note that cachedisk is provided as /dev/sdb3, which the VG vg_sdb2 is extended 
with ... this works well.
The module will take care of extending the VG with /dev/sdb3.

However, with Ansible 2.8 we cannot provide it like this but have to be more 
explicit and mention the PV underlying this volume group vg_sdb2. So, with 
respect to 2.8, we have to write that variable like:

>>>
gluster_infra_cache_vars:
- vgname: vg_sdb2
  cachedisk: '/dev/sdb2,/dev/sdb3'
  cachelvname: cachelv_thinpool_vg_sdb2
  cachethinpoolname: thinpool_vg_sdb2
  cachelvsize: '10G'
  cachemetalvsize: '2G'
  cachemetalvname: cache_thinpool_vg_sdb2
  cachemode: writethrough
=

Note that I have mentioned both /dev/sdb2 and /dev/sdb3.
This change is backward compatible, that is, it works with 2.7 as well. I have 
also raised an issue with Ansible, which can be found here: 
https://github.com/ansible/ansible/issues/56501

However, @olafbuitelaar has fixed this in gluster-ansible-infra, and the 
patch is merged in master. If you check out the master branch, you should be 
fine.
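
To make it concrete, the net effect of those variables is roughly the 
following sequence of LVM commands (a hand-written approximation, not the 
actual tasks the role runs):

# The "Extend volume group" task boils down to this; under Ansible 2.8 the
# module needs the existing PV (/dev/sdb2) listed as well, otherwise it
# apparently tries to shrink the VG down to just the cache disk (the
# "Unable to reduce" error quoted above)
vgextend vg_sdb2 /dev/sdb3

# Carve the cache pool (data + metadata) out of the SSD and attach it to
# the thin pool in writethrough mode
lvcreate --type cache-pool -L 10G --poolmetadatasize 2G \
         -n cachelv_thinpool_vg_sdb2 vg_sdb2 /dev/sdb3
lvconvert --type cache --cachepool vg_sdb2/cachelv_thinpool_vg_sdb2 \
          --cachemode writethrough vg_sdb2/thinpool_vg_sdb2

The role creates the cache data LV and the metadata LV under the names given 
by cachelvname and cachemetalvname before combining them, so the single 
lvcreate above is a simplification.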