GitHub user dcontiveros-nf edited a comment on the discussion: Quick question 
about keystore (jks) requirement

A few updates on where I am in this process:

<ins>Goal</ins>
To automate the deployment of KVM nodes.

<ins>Blocker</ins>
I have not been able to put together a procedure that automates generating a
compatible keystore, one that the CloudStack management node will accept for
authentication.

<ins>Investigation</ins>
To get past this blocker, I tried to replicate the exact format of the JKS
keystore that CloudStack expects. This is where I hit the first issue.

**Issue 1**
After looking at the code, it appears that CloudStack expects a single
`cloud.jks` file, so I assume it is not possible to keep separate keystore and
truststore JKS files the way most Java applications do. That's OK: we can
concatenate the relevant certs and insert them into the JKS file in the proper
order. Still, this requires knowing how the JKS file must be set up. I do
believe it is possible to produce a keystore the management node will accept.
On the mailing lists I found a post referencing an issue opened in 2020 with a
similar goal. Here are the relevant issue comments:

https://github.com/apache/cloudstack/issues/4199#issuecomment-681713147
https://github.com/apache/cloudstack/issues/4199#issuecomment-681740102

It appears this individual automates the creation of a PKCS#12 file with their
certs, converts it to a JKS store, and then configures Netty/Nio to use it. I
attempted to create a Netty-compatible JKS keystore with the following order
(same as in the comment):

```
PrivateKey
  Leaf Cert
  Intermediate
  Root CA
```
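The ordering above can be checked mechanically before importing anything. Each certificate's issuer should be the next certificate's subject, and the chain should end at a self-signed root.

Below is a minimal Python sketch. It assumes the subject and issuer DNs have already been extracted, for example from the `Owner:` and `Issuer:` lines of `keytool -list -v`. `chain_is_ordered` and the example DNs are hypothetical.

```python
from typing import List, Tuple

# Each entry is (subject_dn, issuer_dn), listed leaf first.
# Hypothetical representation; real DNs would come from keytool/openssl output.
Cert = Tuple[str, str]


def chain_is_ordered(chain: List[Cert]) -> bool:
    """Return True if the chain runs leaf -> intermediate(s) -> self-signed root."""
    if not chain:
        return False
    # Every certificate must be issued by the certificate that follows it.
    for (_, issuer), (next_subject, _) in zip(chain, chain[1:]):
        if issuer != next_subject:
            return False
    # The last certificate should be a self-signed root (subject == issuer).
    last_subject, last_issuer = chain[-1]
    return last_subject == last_issuer


if __name__ == "__main__":
    leaf = ("CN=kvm01.example.internal", "CN=Example Intermediate CA")
    intermediate = ("CN=Example Intermediate CA", "CN=Example Root CA")
    root = ("CN=Example Root CA", "CN=Example Root CA")
    print(chain_is_ordered([leaf, intermediate, root]))
    print(chain_is_ordered([leaf, root, intermediate]))
```

This only compares DNs. It does not verify signatures, validity dates, or key usage, so a chain that passes here can still be rejected during the TLS handshake.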

We have an existing PFX bundle, and I confirmed that the keystore entries were
in the order above, but I still received SSL handshake errors from Nio on the
agent side:

```
# KVM node
2025-05-01 17:02:02,589 INFO  [cloud.agent.Agent] (main:[]) (logid:) Attempted to connect to the server, but received an unexpected exception, trying again... com.cloud.utils.exception.NioConnectionException: SSL Handshake failed while connecting to host: HOST REDACTED: 8250

# Management node
2025-05-01 17:03:47,848 ERROR [c.c.u.n.Link] (AgentManager-SSLHandshakeHandler-1:[]) (logid:) SSL error caught during wrap data: Empty client certificate chain, for local address=/REDACTED:8250, remote address=/REDACTED:44838.
```

The last line of the management node log is relevant because of this issue:

https://github.com/apache/cloudstack/issues/5805

This implies that a KVM node has to be added through the GUI, but we are trying
to automate the process completely. Setting
`ca.plugin.root.auth.strictness=false` is probably not the right approach,
since our team will be deploying many nodes. We have deployed the agent
successfully, but we are missing the correct format for the JKS file.
Searching the latest docs for Jetty or Nio did not turn up much.
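Java reports `Empty client certificate chain` when the server requires a client certificate and the client presents none. That suggests mutual TLS is being enforced on port 8250. As I understand it, `ca.plugin.root.auth.strictness` decides whether agents must present a certificate.

The Python sketch below only illustrates that required-versus-optional distinction with the standard `ssl` module. It is an analogy, not CloudStack's implementation (which is Java/Nio). `agent_listener_context` and the file names in the comments are hypothetical.

```python
import ssl


def agent_listener_context(strict: bool) -> ssl.SSLContext:
    """Build a server-side TLS context for a hypothetical agent listener.

    strict=True  -> client certificate required (handshake fails without one)
    strict=False -> client certificate requested but optional
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.verify_mode = ssl.CERT_REQUIRED if strict else ssl.CERT_OPTIONAL
    # A real listener would also load its own chain and the CA that signs
    # agent certificates (hypothetical file names):
    #   ctx.load_cert_chain("server.crt", "server.key")
    #   ctx.load_verify_locations(cafile="agent-ca.crt")
    return ctx


if __name__ == "__main__":
    for strict in (True, False):
        mode = agent_listener_context(strict).verify_mode
        print(strict, mode.name)
```

If this reading is right, the agent's keystore needs a private key entry whose chain the management server trusts. My understanding, not verified against CloudStack's code, is that Java's default key manager only offers a client certificate issued by one of the CAs the server lists in its certificate request. Otherwise it sends an empty chain, which would match the `Empty client certificate chain` error in the logs above.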

**Issue 2**
It appears that CloudStack does not differentiate between the truststore used
for HTTPS access to the GUI and the one used for traffic between management
nodes and agent nodes. I searched the entire project for JKS, and my IDE
returned only references to `cloud.jks` or to whatever JKS file is set in the
`https.keystore` setting. I'm not certain whether this keystore is used by
both Netty and Nio, but it appears so. Please let me know if I am mistaken, as
I only took a cursory glance. This isn't a major issue, but there may be some
desire to separate these two types of traffic.



**Issue 3**
This is specific to KVM. It appears that helper scripts are used to register
the KVM hypervisor node with the management server. The docs describe this:

> When a new host is being setup, such as adding a KVM host or starting a 
> systemvm host, the CA framework kicks in and uses ssh to execute 
> keystore-setup to generate a new keystore file cloud.jks.new, save a random 
> passphrase of the keystore in the agent’s properties file and a CSR cloud.csr 
> file. The CSR is then used to issue certificate for that agent/host and ssh 
> is used to execute keystore-cert-import to import the issued certificate 
> along with the CA certificate(s), the keystore is that renamed as cloud.jks 
> replacing an previous keystore in-use. During this process, keys and 
> certificates files are also stored in cloud.key, cloud.crt, cloud.ca.crt in 
> the agent’s configuration directory.

The work appears to be done by the utility scripts at these paths:

`scripts/util/keystore-cert-import`

`scripts/util/keystore-setup`

Inspecting these scripts, it appears that on registration the management
server SSHes into the host, runs the scripts to set up a `cloud.jks`, generates
a CSR for signing, and then injects the signed cert into the keystore. The
scripts also use some hard-coded attributes. I searched for any information on
how to interact with the management node to get a CSR signed, but there is
almost nothing in the docs. This makes me suspect I will run into the same
issue when I call a REST API endpoint that relies on or creates a SystemVM.
This is a bit alarming, since we expect to deploy some zones that may require
RouterVMs. We cannot confirm this yet, because I cannot get that far into the
automation; I cannot even register a single node. This issue is possibly
related to *Issue 1*, but I listed it separately because the docs clearly state
that this procedure also happens with SystemVMs (although not with that
particular helper script).
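For anyone trying to reproduce the documented flow by hand, the steps quoted above map roughly onto standard `keytool` operations. The Python sketch below only builds the command lines, so it runs without Java installed.

These commands are my assumption of equivalent steps, not the contents of `keystore-setup` or `keystore-cert-import`. The passwords, key size, validity, DN and the `cloudca` alias are placeholders, and `keystore_flow_commands` is hypothetical.

```python
import shlex
from typing import List


def keystore_flow_commands(keystore: str, storepass: str, dname: str) -> List[List[str]]:
    """Return keytool invocations approximating the documented agent keystore flow.

    Placeholders/assumptions: alias names, key size, validity, and file names.
    """
    new_store = keystore + ".new"
    common = ["-keystore", new_store, "-storepass", storepass]
    return [
        # 1. Generate a fresh key pair in a new keystore (cf. cloud.jks.new).
        ["keytool", "-genkeypair", "-alias", "cloud", "-keyalg", "RSA",
         "-keysize", "2048", "-validity", "365", "-dname", dname,
         "-keypass", storepass] + common,
        # 2. Produce a CSR for the CA to sign (cf. cloud.csr).
        ["keytool", "-certreq", "-alias", "cloud", "-file", "cloud.csr"] + common,
        # 3. Import the CA certificate as a trusted entry (cf. cloud.ca.crt).
        ["keytool", "-importcert", "-noprompt", "-alias", "cloudca",
         "-file", "cloud.ca.crt"] + common,
        # 4. Import the signed certificate onto the existing key entry (cf. cloud.crt).
        ["keytool", "-importcert", "-noprompt", "-alias", "cloud",
         "-file", "cloud.crt"] + common,
    ]


if __name__ == "__main__":
    for cmd in keystore_flow_commands("cloud.jks", "changeit", "CN=kvm01.example.internal"):
        # shlex.join quotes arguments containing spaces for display.
        print(shlex.join(cmd))
```

Each command could be executed with `subprocess.run(cmd, check=True)` on a host with a JDK, followed by renaming `cloud.jks.new` to `cloud.jks`. The CA entry is imported before the signed certificate so that `keytool` can build a trust chain for the certificate reply from the keystore's trusted entries.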

**Issue 4**
It appears the alias under which the key entry is successfully stored is
`cloud`. The helper script `scripts/util/keystore-setup` is a bash script, but
the `ALIAS` variable is hard-coded in the script rather than taken from an
argument, which makes it inflexible. Before this effort I had set up a
single-node cluster (one test bench machine) to get used to the CloudStack UI,
and luckily I still had access to its keystore. This is what I see when I dump
its contents with keytool:

```
Keystore type: PKCS12
Keystore provider: SUN

Your keystore contains 2 entries

Alias name: cloud
Entry type: PrivateKeyEntry

Alias name: cloudca.1
Entry type: trustedCertEntry

``` 
This matches code in:

`scripts/util/keystore-cert-import`

This makes me believe that, prior to deployment, we need to stage valid certs
under these additional aliases, based on our Root CA, for all agent/hypervisor
nodes. Again, this keystore came from a single-node cluster, and I understand
the JKS may look somewhat different in a distributed deployment.
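A pre-staged keystore can be sanity-checked against the layout observed on the working single-node setup by parsing the `keytool -list -v` output. That layout is a `PrivateKeyEntry` under alias `cloud` plus at least one `trustedCertEntry` for the CA.

A minimal Python sketch follows. The function names are hypothetical. Requiring CA aliases to start with `cloudca` is my inference from that one listing, not a documented rule.

```python
import re
from typing import Dict

# Trimmed listing from the working single-node keystore.
SAMPLE = """\
Keystore type: PKCS12
Keystore provider: SUN

Your keystore contains 2 entries

Alias name: cloud
Entry type: PrivateKeyEntry

Alias name: cloudca.1
Entry type: trustedCertEntry
"""


def parse_entries(listing: str) -> Dict[str, str]:
    """Map each alias in `keytool -list -v` style output to its entry type."""
    aliases = [a.strip() for a in re.findall(r"^Alias name: (.+)$", listing, flags=re.MULTILINE)]
    types = [t.strip() for t in re.findall(r"^Entry type: (.+)$", listing, flags=re.MULTILINE)]
    return dict(zip(aliases, types))


def looks_like_agent_keystore(listing: str) -> bool:
    """True if there is a 'cloud' key entry and at least one 'cloudca*' CA entry."""
    entries = parse_entries(listing)
    has_key = entries.get("cloud") == "PrivateKeyEntry"
    has_ca = any(alias.startswith("cloudca") and kind == "trustedCertEntry"
                 for alias, kind in entries.items())
    return has_key and has_ca


if __name__ == "__main__":
    print(parse_entries(SAMPLE))
    print(looks_like_agent_keystore(SAMPLE))
```

In an automated pipeline, the input would be the captured output of `keytool -list -v -keystore cloud.jks -storepass <password>`, for example via `subprocess.run(..., capture_output=True, text=True).stdout`.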

**Conclusion**
We are running into quite a few authentication issues and would like to
resolve them ASAP. Because we want to use our own PKI and the documentation on
this topic is sparse, this is proving difficult for us.

I also posted on Slack and saw those messages forwarded to the mailing list.
Any feedback or help would be appreciated.


GitHub link: 
https://github.com/apache/cloudstack/discussions/10784#discussioncomment-13006349

----
This is an automatically sent email for users@cloudstack.apache.org.