GitHub user dcontiveros-nf edited a comment on the discussion: Quick question about keystore (jks) requirement
A few updates here on where I am in this process:

<ins>Goal</ins>

To be able to automate deployment of KVM nodes.

<ins>Blocker</ins>

I cannot seem to get a proper procedure in place to automate generation of a compatible keystore that the CloudStack management node will accept for authentication.

<ins>Investigation</ins>

To get past this blocker, I attempted to find out whether I could replicate the format of the JKS keystore that CloudStack expects. Here is where I see the first issue.

**Issue 1**

After looking at the code, it appears that CloudStack expects `cloud.jks` to be used, so I can only assume that it is not possible to separate keystore and truststore JKS files the way most Java apps are designed to do. That's ok. We can always concatenate and format the relevant certs and insert them into the JKS file in the proper order. Still, this requires knowing how to set up the JKS file. I do believe it is possible to get something the management node expects.

I looked at the mailing lists and saw a post referencing an issue that was opened back in 2020 with a similar goal. Here are the relevant issue comments:

https://github.com/apache/cloudstack/issues/4199#issuecomment-681713147
https://github.com/apache/cloudstack/issues/4199#issuecomment-681740102

It appears this particular individual is automating creation of a PKCS#12 file with their certs, converting that to a JKS store, and then configuring Netty/Nio to use it. I attempted to create a Netty compatible JKS keystore with the following order (same as the comment):

```
PrivateKey
Leaf Cert
Intermediate
Root CA
```

We have an existing pfx bundle, and I did check that the keystore had the above order, but I still received SSL handshake errors via Nio on the Agent side:

```
# KVM node
2025-05-01 17:02:02,589 INFO [cloud.agent.Agent] (main:[]) (logid:) Attempted to connect to the server, but received an unexpected exception, trying again...
com.cloud.utils.exception.NioConnectionException: SSL Handshake failed while connecting to host: HOST REDACTED: 8250

# Management node
2025-05-01 17:03:47,848 ERROR [c.c.u.n.Link] (AgentManager-SSLHandshakeHandler-1:[]) (logid:) SSL error caught during wrap data: Empty client certificate chain, for local address=/REDACTED:8250, remote address=/REDACTED:44838.
```

That last part of the management node log is relevant because of this issue:

https://github.com/apache/cloudstack/issues/5805

This implies that a KVM node has to be added via the GUI, but we are attempting to automate this completely. Specifying `ca.plugin.root.auth.strictness=false` is probably not the way to go here since our team will be deploying numerous nodes. We have deployed the agent successfully, but are missing the proper format of the JKS file. Searching for Jetty or Nio didn't reveal much in the latest docs.

**Issue 2**

It appears that CloudStack does not differentiate between truststores for https connectivity to the GUI and traffic flowing between management nodes and agent nodes. I searched the entire project for JKS and my IDE returned only references to `cloud.jks` or whatever is defined as the JKS file in the variable `https.keystore`. I'm not certain if this is used for both Netty and Nio, but it appears as if it is. Please let me know if I am mistaken, as I only took a cursory glance. This isn't really a MAJOR issue, but there may be some desire to separate these two types of traffic.

**Issue 3**

This is specific to KVM. It appears that helper scripts are used to register the KVM hypervisor node with the management node. The docs point to this:

> When a new host is being setup, such as adding a KVM host or starting a
> systemvm host, the CA framework kicks in and uses ssh to execute
> keystore-setup to generate a new keystore file cloud.jks.new, save a random
> passphrase of the keystore in the agent’s properties file and a CSR cloud.csr
> file.
The CSR is then used to issue certificate for that agent/host and ssh
> is used to execute keystore-cert-import to import the issued certificate
> along with the CA certificate(s), the keystore is that renamed as cloud.jks
> replacing an previous keystore in-use. During this process, keys and
> certificates files are also stored in cloud.key, cloud.crt, cloud.ca.crt in
> the agent’s configuration directory.

It appears this is performed by the utility scripts at these paths:

`scripts/util/keystore-cert-import`
`scripts/util/keystore-setup`

Upon inspecting these files, it appears that on registration the manager will SSH into the host, run `keystore-setup` to set up a `cloud.jks` and generate a CSR for signing, and then inject the signed cert into the keystore. I see that it uses some hard-coded attributes as well. I searched for any info on interacting with the management node to sign a CSR, but there is almost no information in the docs. This makes me suspect I will run into the same issue when I call a REST API endpoint that either relies on a SystemVM or generates one. This is a bit alarming since we believe we will need to deploy some zones that may require RouterVMs. We are not certain, as I cannot get this far into the automation or even register one node.

This part is possibly related to *Issue 1*, but I chose to list it as its own issue since the docs clearly state that this procedure also happens with SystemVMs (although not with that particular helper script).

**Issue 4**

It appears the alias used when successfully storing information is `cloud`. The helper script `scripts/util/keystore-setup` is a bash script, but when it is called the `ALIAS` variable isn't sourced from an argument but rather hardcoded in the script, making this inflexible. I had set up a single node cluster (one testbench machine) to get used to the CloudStack UI prior to this endeavour. Luckily I still had access to this file.
I see the following when I dump all the info with keytool:

```
Keystore type: PKCS12
Keystore provider: SUN

Your keystore contains 2 entries

Alias name: cloud
Entry type: PrivateKeyEntry

Alias name: cloudca.1
Entry type: trustedCertEntry
```

This matches the code in:

`scripts/util/keystore-cert-import`

This makes me believe that prior to deployment we need to stage valid certs with additional aliases based on our Root CA for all agents/hypervisor nodes. Now again, this JKS file was from a single node cluster. I understand there may be some differences in how the JKS will ultimately look in a distributed setup.

*Conclusion*

We are running into quite a few issues with authentication and would like to resolve these ASAP. Since we want to use our own PKI, the sparse documentation on this topic is making things a bit difficult for us. I did post on Slack and saw these messages get sent out to the mailing list, so any feedback or help would be appreciated.

GitHub link: https://github.com/apache/cloudstack/discussions/10784#discussioncomment-13006349

----
This is an automatically sent email for users@cloudstack.apache.org.
To unsubscribe, please send an email to: users-unsubscr...@cloudstack.apache.org