Hi All,

I've got some issues with connecting my oVirt Cluster to my Ceph Cluster via 
iSCSI. There are two issues, and I don't know if one is causing the other, if 
they are related at all, or if they are two separate, unrelated issues. Let me 
explain.

The Situation
-------------
- I have a working three node Ceph Cluster (Ceph Quincy on Rocky Linux 8.6)
- The Ceph Cluster has four Storage Pools of between 4 and 8 TB each
- The Ceph Cluster has three iSCSI Gateways
- There is a single iSCSI Target on the Ceph Cluster
- The iSCSI Target has all three iSCSI Gateways attached
- The iSCSI Target has all four Storage Pools attached
- The four Storage Pools have been assigned LUNs 0-3
- I have set up (Discovery) CHAP Authorisation on the iSCSI Target
- I have a working three node self-hosted oVirt Cluster (oVirt v4.5.3 on Rocky 
Linux 8.6)
- The oVirt Cluster has (in addition to the hosted_storage Storage Domain) 
three GlusterFS Storage Domains
- I can ping all three Ceph Cluster Nodes to/from all three oVirt Hosts
- The iSCSI Target on the Ceph Cluster has all three oVirt Hosts' Initiators 
attached
- Each Initiator has all four Ceph Storage Pools attached
- I have set up CHAP Authorisation on the iSCSI Target's Initiators
- The Ceph Cluster Admin Portal reports that all three Initiators are 
"logged_in"
- I have previously connected Ceph iSCSI LUNs to the oVirt Cluster successfully 
(as an experiment), but had to remove and re-instate them for the "final" 
version(?).
- The oVirt Admin Portal (ie HostedEngine) reports that Initiators 1 & 2 
(ie oVirt Hosts 1 & 2) are "logged_in" to all three iSCSI Gateways
- The oVirt Admin Portal reports that Initiator 3 (ie oVirt Host 3) is 
"logged_in" to iSCSI Gateways 1 & 2
- I can "force" Initiator 3 to become "logged_in" to iSCSI Gateway 3, but when 
I do this it is *not* persistent
- oVirt Hosts 1 & 2 can/have discovered all three iSCSI Gateways
- oVirt Hosts 1 & 2 can/have discovered all four LUNs/Targets on all three 
iSCSI Gateways
- oVirt Host 3 can only discover 2 of the iSCSI Gateways
- For Target/LUN 0 oVirt Host 3 can only "see" the LUN provided by iSCSI 
Gateway 1
- For Targets/LUNs 1-3 oVirt Host 3 can only "see" the LUNs provided by iSCSI 
Gateways 1 & 2
- oVirt Host 3 can *not* "see" any of the Targets/LUNs provided by iSCSI 
Gateway 3
- When I create a new oVirt Storage Domain for any of the four LUNs:
  - I am presented with a message saying "The following LUNs are already in 
use..."
  - I am asked to "Approve operation" via a checkbox, which I do
  - As I watch the oVirt Admin Portal I can see the new iSCSI Storage Domain 
appear in the Storage Domain list, and then after a few minutes it is removed
  - After those few minutes I am presented with this failure message: "Error 
while executing action New SAN Storage Domain: Network error during 
communication with the Host."
- I have looked in the engine.log and all I could find that was relevant (as 
far as I know) was this:
~~~
2022-11-28 19:59:20,506+11 ERROR 
[org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStorageDomainVDSCommand] 
(default task-1) [77b0c12d] Command 'CreateStorageDomainVDSCommand(HostName = 
ovirt_node_1.mynet.local, 
CreateStorageDomainVDSCommandParameters:{hostId='967301de-be9f-472a-8e66-03c24f01fa71',
 storageDomain='StorageDomainStatic:{name='data', 
id='2a14e4bd-c273-40a0-9791-6d683d145558'}', 
args='s0OGKR-80PH-KVPX-Fi1q-M3e4-Jsh7-gv337P'})' execution failed: 
VDSGenericException: VDSNetworkException: Message timeout which can be caused 
by communication issues

2022-11-28 19:59:20,507+11 ERROR 
[org.ovirt.engine.core.bll.storage.domain.AddSANStorageDomainCommand] (default 
task-1) [77b0c12d] Command 
'org.ovirt.engine.core.bll.storage.domain.AddSANStorageDomainCommand' failed: 
EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: 
VDSGenericException: VDSNetworkException: Message timeout which can be caused 
by communication issues (Failed with error VDS_NETWORK_ERROR and code 5022)
~~~
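For reference, this is roughly how I've been checking and (temporarily) forcing 
the sessions from the hosts. The IPs and IQN below are placeholders, not my real 
values:

~~~
# List current iSCSI sessions and their state on this host
iscsiadm -m session -P 1

# Re-discover targets from one of the gateways (placeholder IP)
iscsiadm -m discovery -t sendtargets -p 192.168.1.33:3260

# Check the recorded startup setting for the "missing" gateway;
# if it is "manual", the login will not persist
iscsiadm -m node -T iqn.2001-07.com.ceph:example-target \
  -p 192.168.1.33:3260 -o show | grep node.startup

# Make the login to that portal persistent (placeholder IQN/IP)
iscsiadm -m node -T iqn.2001-07.com.ceph:example-target \
  -p 192.168.1.33:3260 -o update -n node.startup -v automatic
~~~

(I'm aware oVirt normally manages these node records itself via VDSM, so I've 
only been using this to inspect, not as a permanent fix.)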

I cannot see/detect any "communication issue" - but then again I'm not 100% 
sure what I should be looking for.

I have looked on-line for an answer, and apart from not being able to get past 
Red Hat's "wall" to see the solutions that they have, all I could find that was 
relevant was this: 
https://lists.ovirt.org/archives/list/de...@ovirt.org/thread/AVLORQNOLJHRWMHTM4WCDRVP7VSIZBGR/
 . If this *is* relevant then there is not enough context for me to 
proceed (eg *where* - on which host/VM - should that command be run?).

I also found (for a previous version of oVirt) notes about manually modifying 
the Postgres DB to resolve a similar issue. While I am more than comfortable 
doing this (I've been an SQL DBA for well over 20 years) this seems like asking 
for trouble - at least until I hear back from the oVirt Devs that this is OK to 
do - and of course, I'll need the relevant commands / locations / 
authorisations to get into the DB.
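On that last point, my understanding (an assumption on my part, based on what I 
believe a default engine-setup writes out - please correct me if the file name 
or variable names differ) is that the DB credentials live on the HostedEngine 
VM and that something like this would get me a psql prompt:

~~~
# On the HostedEngine VM: read the DB credentials written by engine-setup
# (file location/name assumed from a default oVirt install)
source /etc/ovirt-engine/engine.conf.d/10-setup-database.conf

# Connect to the engine database with those credentials
PGPASSWORD="${ENGINE_DB_PASSWORD}" psql \
  -h "${ENGINE_DB_HOST}" -p "${ENGINE_DB_PORT}" \
  -U "${ENGINE_DB_USER}" -d "${ENGINE_DB_DATABASE}"
~~~

But again, I won't touch anything in there until someone who knows the schema 
says it's safe.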

Questions
---------
- Are the two issues (oVirt Host 3 not having a full picture of the Ceph iSCSI 
environment and the oVirt iSCSI Storage Domain creation failure) related?
- Do I need to "refresh" the iSCSI info on the oVirt Hosts, and if so, how do I 
do this?
- Do I need to "flush" the old LUNs from the oVirt Cluster, and if so, how do I 
do this?
- Where else should I be looking for info in the logs (& which logs)?
- Does *anyone* have any other ideas how to resolve the situation - especially 
when using the Ceph iSCSI Gateways?

Thanks in advance

Cheers

Dulux-Oz
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/MCOJD4R6PS4BKUTUM3BXYWSX5RDPWR2N/