Hi,

I'm also interested in this feature, as we are going to build a PCI-compliant cluster soon, and currently it's almost impossible to track who is doing what.
(We have some info for tasks, but not for all change actions, many of which have no task, and tasks are not centralised.) At a minimum, logging the raw API request URL and content (with credentials and other sensitive information removed) from pveproxy, together with timestamp, user, and IP, could be a first step. Personally, I could manage to parse those and feed them into a SIEM. A structured log with action, object, etc. would indeed be better, but I don't have much time to work on this currently :/

Alexandre

-------- Original Message --------
From: Thomas Skinner <[email protected]>
Reply-To: Proxmox VE development discussion <[email protected]>
To: Proxmox VE development discussion <[email protected]>
Subject: [pve-devel] PVE Auditing System
Date: 26/01/2026 04:03:46

Hello!

I'm looking to implement an auditing system for PVE to help organizations better understand the actions performed by users via the API. In reference to the conversation in bug #4244, it seems that there is currently no development on an auditing system. I am with Fabian in that we need to flesh out a design before going too far into development. Below are Thomas Lamprecht's thoughts from a couple of years ago, with my comments inline.

----

> - [ ] Auditing Framework
>   - [ ] Explore some auditing projects and possibly some (security) standard
>     requirements about what could be a good feature set and design, and
>     about what is a requirement to have to help users with strict
>     requirements/rules on such things (e.g., gov agencies)

A lot of what I've seen has been requirements to be able to adjust/configure success/failure auditing for elevated privileges, for access control CRUD (user/group/domain/ACL), and for other organizationally defined requirements (a catch-all for anything subjectively _important_ happening in the application). The logs must be in a standardized format that includes the entities associated with the event and an accurate representation of what, when, and where actions occurred.
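To make the "standardized format" point concrete, one JSON object per event (one per line) keeps such a log both human-readable and trivially ingestible by a SIEM. Here is a minimal sketch in Python; the field names are purely illustrative, not a proposed PVE schema:

```python
import json
from datetime import datetime, timezone

def make_audit_record(subject, action, obj, status, source_ip, node):
    """Build one structured audit event as a single JSON line (JSONL)."""
    record = {
        "time": datetime.now(timezone.utc).isoformat(),  # when (UTC)
        "subject": subject,      # who performed the action, e.g. "root@pam"
        "action": action,        # what was attempted, e.g. "POST /access/users"
        "object": obj,           # what was acted on, e.g. "user:alice@pve"
        "status": status,        # "success" or "failure"
        "source_ip": source_ip,  # where the request originated
        "node": node,            # where it was performed
    }
    return json.dumps(record, sort_keys=True)

line = make_audit_record("root@pam", "POST /access/users", "user:alice@pve",
                         "success", "192.0.2.10", "pve1")
print(line)
```

One line per event also plays well with logrotate and with shipping the file to an external aggregator.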
Some of the requirements in the US stem from PCI DSS, NIST (specifically SP 800-53), and HIPAA.

> - [ ] Probably add some (root only) log format on disk that can be
>   filtered, rotated and allows configuring some guarantees for how long
>   stuff is saved

I would say leave the size, rotation, and retention up to the user. The way that logrotate is already used for the pveproxy logs should already be sufficiently configurable for this. In some fiddling so far, I found it easiest to emit a least-privilege log file for each of pveproxy, pvedaemon, and spiceproxy, all in JSON format, which is highly ingestible and extensible later. Ultimately, I think the output should go to a file that a log ingester could read for aggregation on another system if required.

> - [ ] Then, one probably wants hook/trace on every config change of guests
>   and node relevant stuff with an signature like: `($type, $id,
>   $change-key, $old, $new)` where `$type` and `$change-key` to be
>   considered API (no arbitrary changes of existing ones) and `$old` and
>   `$new` are arbitrary (scalar, hash/array ref).

A good hook spot looks to be the `rest_handler` function in `PVE/HTTPServer.pm` of the pve-manager package. I'd propose placing the call outside of the eval so that it can record both successes and failures in the logs. In my experience so far, it's necessary to log from each proxy endpoint because of the way validation/permission checks occur: e.g. pvedaemon will never see a failure caused by a permission check in pveproxy, because the code returns before the request is ever proxied over. Putting the logging function here means there is a risk of the function not returning a valid response on an audit log failure, but I have seen requirements where stopping the application when it cannot audit is mandatory (this needs to be made configurable and safe).
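The "hook outside the eval" idea can be sketched language-agnostically. The following Python sketch (not PVE code; names like `AUDIT_FAIL_CLOSED` are hypothetical) shows a wrapper that audits both the success and failure paths of a handler call, with an optional fail-closed mode for environments that must stop serving when auditing fails:

```python
# Hypothetical setting; in PVE this would have to be configurable.
AUDIT_FAIL_CLOSED = False

def audited_call(handler, request, emit_audit):
    """Run a handler and emit an audit event for both outcomes."""
    status, error, result = "success", None, None
    try:
        result = handler(request)          # the eval'd handler, in PVE terms
    except Exception as exc:
        status, error = "failure", str(exc)
    try:
        emit_audit({"action": request["action"],
                    "status": status,
                    "error": error})
    except Exception:
        if AUDIT_FAIL_CLOSED:
            raise                          # refuse to serve if we cannot audit
    if error is not None:
        raise RuntimeError(error)          # propagate the original failure
    return result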
Another option that I've thought of is having another daemon (call it pveauditd) which receives messages from the various PVE processes with audit logging information. This requires some interprocess communication, but it could reduce latency for the calling process, because the daemon could buffer messages and absorb the I/O wait itself. Permission lockdown on the files is easier here, too. The hook would probably still be in the same spot, but audit failures would be handled differently. A nice pro is that other PVE processes could communicate log messages this way as well. I could use a little guidance or an example from existing code if the developers want to go this route.

As far as fields go, I think there are some steadfast required fields:

- Datetime in UTC
- Subject (who performed the action)
- Object (identifier of the object modified, and its type)
- Action (what the subject attempted to/successfully did to the object)
- Status (success/failure)
- Node name or IP (where it was performed; could be implied by where the log resides)
- Source IP (where the request originated)
- Process name and ID (the name could be implied by the log file name)

Some other interesting fields would be:

- Before/after whole objects (mentioned specifically in bug #4244): I'd recommend making this configurable because I think it would require two extra calls: one to retrieve the object before it's modified and another after.
  - Considerations: sensitive fields (especially credentials) would need to be redacted (character replacement or hashing)
- What changed on the object: a breakdown of the differences between the above objects (either the fields that changed or the actual changes made)
  - Considerations: requires at least the information above, and may be redundant if the above is included
- API parameters: the parameters passed to the API endpoint. On the development side this is easy to include; it would be useful for determining how an object was changed and is less costly than the before/after model.
  - Considerations: sensitive fields (especially credentials) would need to be redacted (character replacement or hashing)
- Event ID: some auditing systems (particularly Microsoft's) use a unique ID for every different type of audit event.
  - Pros: translated and custom-formatted messages, easy processing/filtering by automated systems, and ID mappings that could be used across multiple products (e.g. PVE and Datacenter Manager)
  - Cons: uniqueness of use must be guaranteed, and IDs are not always human friendly
  - Considerations: having this field could eliminate the need for separate action/object-type fields if each action/object-type combination has its own ID; the format of the ID

For inclusion of auditing into each API endpoint, I think an addition to the `method_info` construct for each method would be appropriate. Some advanced validation could be done at build time to ensure uniqueness of action/object type or event ID.

> - [ ] Allow one to enable or disable auditing on some/all guests/nodes, and
>   disable it by default due to cost

I completely agree on this one, and I'd argue that it should be built in from the start. The default could easily be to not audit anything explicitly, which should have minimal impact on runtime. It would be nice to have the audit configuration managed via the API and synced across the cluster filesystem.
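The redaction and "what changed" considerations above can be sketched briefly. This Python sketch is illustrative only (the key names are made up, and a real implementation would need to handle nested structures):

```python
# Replace known sensitive keys before logging, then report which fields
# differ between the before/after snapshots of an object.
SENSITIVE_KEYS = {"password", "secret", "token"}  # illustrative list

def redact(obj):
    """Mask sensitive values so the record is safe to log."""
    return {k: ("***" if k in SENSITIVE_KEYS else v) for k, v in obj.items()}

def changed_fields(before, after):
    """Names of the fields whose values differ between two snapshots."""
    keys = set(before) | set(after)
    return sorted(k for k in keys if before.get(k) != after.get(k))

before = {"name": "alice", "password": "old", "enable": 1}
after  = {"name": "alice", "password": "new", "enable": 0}
print(changed_fields(before, after))  # → ['enable', 'password']
print(redact(after))                  # password masked, rest unchanged
```

Logging only `changed_fields` plus the redacted new values would be cheaper than storing both whole objects, at the cost of losing the full before-image.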
I'd recommend node-level overrides of cluster-level settings. If done in the API, this should have a separate permission or API path. My idea for the implementation here is to build a hash during application startup that loads all of the action/object types or event IDs, and to look up their status in that hash. The hash could either be updated or reloaded when the auditing settings change.

> - [ ] Add interface to view and filter audit events

A consistent format in the log files should make this easier on the dev side. Event IDs or action/object combos could be used to determine translatable message formats to show to the user. The interface should have either a separate permission or a separate API path (restricting this even from administrators is a common requirement). A cluster-level view would be nice, with a reasonable default for how many messages to retrieve per node and an option to retrieve more for each node. A node-level view could also be adapted, similar to how the VM events are currently shown.

> - [ ] Allow to produce notification's for an audit filter

I'd use the same logic as for the view/filter interface above. Event IDs might make this easier.

----

I appreciate you all taking the time to review and reply to this thread. An auditing system would be a great addition to the PVE project and would make it even more enterprise friendly. In the implementation I'm thinking of, this would be an addition to the current log files, not a change to any existing formats, which should make it a non-breaking change.
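As an illustration of the startup lookup table mentioned earlier (a sketch only; the keys and settings source are hypothetical), the per-request cost of the "audit nothing by default" model is just one dictionary lookup:

```python
# Build a fast lookup once at startup from the (hypothetical) cluster-wide
# audit settings, keyed by an "action.objecttype" string or an event ID.
# Rebuild or update it whenever the settings change.
def build_audit_table(settings, default=False):
    """settings: e.g. {'user.create': True, 'vm.start': False}."""
    table = dict(settings)
    return lambda event_key: table.get(event_key, default)

is_audited = build_audit_table({"user.create": True, "vm.start": False})
# default=False means unknown events are not audited, matching the
# "disabled by default due to cost" item above.
```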
--
Thomas Skinner

_______________________________________________
pve-devel mailing list
[email protected]
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
