[jira] [Commented] (AMBARI-26532) Add Model Context Protocol (MCP) Server for AI-Driven Cluster Management
[
https://issues.apache.org/jira/browse/AMBARI-26532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18057598#comment-18057598
]
Nikita Pande commented on AMBARI-26532:
---
I've developed a prototype MCP server for Ambari in Go:
[mcp-ambari|https://github.com/nikita15p/mcp-ambari]
*Key enhancements over a potential TypeScript alternative:*
- Built with the official MCP Go SDK for standards compliance and efficiency
- Separates read-only tools (e.g., status queries) from actionable ones (e.g.,
service ops) for secure AI interactions
- Supports the stdio transport as well as TLS- and mTLS-secured network
transports for flexible deployments
- Includes pre-defined workflows, prompts, resources, and tools tailored to
Ambari ops
This aligns directly with the goals of AMBARI-26532 and is designed with
performance on large-scale Hadoop clusters in mind.
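The read-only/actionable split can be enforced inside the server rather than left to the client. Below is a minimal Python sketch (the prototype itself is written in Go) of a tool registry that neither advertises nor executes mutating tools unless the server is started with them enabled. {{Tool}}, {{ToolRegistry}}, and the {{allow_mutations}} switch are hypothetical names, not part of mcp-ambari or the MCP Go SDK. MCP's {{readOnlyHint}} tool annotation can advertise the same distinction to clients, but it is only a hint, so the sketch still checks permissions on every call.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Tool:
    # Hypothetical tool record: name, whether it only reads state, and its handler.
    name: str
    read_only: bool
    handler: Callable[..., Any]


class ToolRegistry:
    """Hypothetical registry that hides and blocks mutating tools unless enabled."""

    def __init__(self, allow_mutations: bool = False) -> None:
        self.allow_mutations = allow_mutations
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def _permitted(self, tool: Tool) -> bool:
        return tool.read_only or self.allow_mutations

    def list_tools(self) -> List[str]:
        # Only advertise tools the current mode permits, so the model never sees the rest.
        return sorted(t.name for t in self._tools.values() if self._permitted(t))

    def call(self, name: str, **kwargs: Any) -> Any:
        tool = self._tools.get(name)
        if tool is None:
            raise KeyError(f"unknown tool: {name}")
        # Enforce on every call as well: hiding a tool is not a security boundary.
        if not self._permitted(tool):
            raise PermissionError(f"{name} modifies the cluster; server is read-only")
        return tool.handler(**kwargs)
```

Starting with {{allow_mutations=False}} would give an observer-only deployment; an operator deployment has to opt in explicitly.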
> Add Model Context Protocol (MCP) Server for AI-Driven Cluster Management
>
>
> Key: AMBARI-26532
> URL: https://issues.apache.org/jira/browse/AMBARI-26532
> Project: Ambari
> Issue Type: New Feature
>Reporter: Nikita Pande
>Assignee: Nikita Pande
>Priority: Major
>
> Integrating Ambari with MCP is not merely a technical exercise; it unlocks a
> new paradigm of cluster management, shifting from manual, UI-driven
> operations to conversational, automated, and ultimately autonomous control.
> This transformation enables a range of high-value use cases that can
> dramatically reduce operational overhead and democratize administrative
> expertise.
> * *Natural Language Diagnostics & Troubleshooting:* This is the most
> immediate and compelling use case. Administrators, regardless of their
> expertise level, can interact with the cluster in plain English to diagnose
> issues. Instead of navigating through multiple screens in the Ambari UI or
> crafting complex {{curl}} commands, they can simply ask questions. For
> instance:
> ** _"Why did the HDFS service health check fail on node '?"_
> ** _"Show me all CRITICAL alerts from the last 24 hours related to YARN."_
> ** _"What is the current heap usage of the NameNode, and how does it compare
> to yesterday?"_ To answer these, an AI agent would leverage MCP {{Resources}}
> to fetch health reports, alert histories, and performance metrics from
> Ambari, then use its reasoning capabilities to synthesize a coherent,
> human-readable answer.
> * *Automated and Agentic Remediation:* Moving beyond diagnosis, this
> integration empowers AI agents to take corrective actions. This creates a
> "self-healing" capability for the cluster. An agent can be instructed to
> execute complex remediation workflows that involve a chain of actions and
> checks. For example:
>
> ** _"The NameNode is in standby. Investigate the logs for critical errors.
> If none are found within the last 15 minutes, attempt a restart and confirm
> it becomes active. Notify the support channel in the chat interface with the
> result."_ This workflow would require the agent to chain multiple MCP calls:
> get logs ({{Resource}}), analyze them (LLM reasoning), restart the service
> ({{Tool}}), and check its status ({{Resource}}), demonstrating a
> sophisticated, agentic process.
> * *Conversational Configuration and Security Audits:* Complex configuration
> changes and security hardening are often error-prone. A conversational
> interface simplifies these tasks significantly.
>
> ** _"Increase the YARN NodeManager memory to 32GB on all worker nodes and
> then perform a rolling restart of the YARN service."_
> ** _"Audit the cluster for security compliance. List all services that do
> not have Kerberos enabled and generate the sequence of API calls required to
> configure them."_ These commands would be translated by the agent into a
> series of {{updateServiceConfig}} and {{restartService}} tool calls, executed
> in the correct order.
> * *Declarative Provisioning via Conversation:* This use case represents an
> evolution of Ambari Blueprints, making cluster provisioning more accessible.
> An administrator could describe the desired cluster in high-level terms, and
> the AI agent would handle the low-level details of creating the Blueprint
> JSON.
>
> ** _"Provision a new 5-node test cluster using . The
> cluster should include HDFS, YARN, and Spark. Designate 'master01' as the
> master node with the NameNode and ResourceManager, and the rest as worker
> nodes with DataNodes and NodeManagers."_ The agent would parse this request,
> generate the corresponding Blueprint, and use an MCP {{Tool}} to submit it to
> the Ambari API, initiating the cluster deployment.
> * *Proposed Solution:* This feature proposes the development and integration
> of a new, standalone {*}Ambari MCP Server{
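
The provisioning use case above maps onto Ambari's existing Blueprint API: a blueprint is registered with {{POST /api/v1/blueprints/<name>}}, and a cluster is then created from it with {{POST /api/v1/clusters/<cluster_name>}} and a host-mapping template. The minimal Python sketch below builds the two JSON bodies an agent might generate for the 5-node example. The function names are hypothetical, the stack name and version are left as parameters (the example request does not name them), and the component lists cover only what the request assigns explicitly. Spark component names and other components a real stack requires (for example ZooKeeper or a Secondary NameNode) vary by stack and are omitted, so this is not a validated blueprint.

```python
from typing import List, Sequence


def blueprint_body(stack_name: str, stack_version: str,
                   master_components: Sequence[str] = ("NAMENODE", "RESOURCEMANAGER"),
                   worker_components: Sequence[str] = ("DATANODE", "NODEMANAGER")) -> dict:
    """Hypothetical helper: body for POST /api/v1/blueprints/<name>.

    Stack name/version are deliberately left to the caller; the component
    defaults are only the ones named in the example request.
    """
    return {
        "Blueprints": {"stack_name": stack_name, "stack_version": stack_version},
        "host_groups": [
            {"name": "master", "components": [{"name": c} for c in master_components]},
            {"name": "worker", "components": [{"name": c} for c in worker_components]},
        ],
    }


def cluster_template(blueprint_name: str, master: str, workers: List[str]) -> dict:
    """Hypothetical helper: body for POST /api/v1/clusters/<cluster_name>."""
    # Each host may belong to exactly one host group.
    if master in workers or len(set(workers)) != len(workers):
        raise ValueError("each host must appear in exactly one host group, once")
    return {
        "blueprint": blueprint_name,
        "host_groups": [
            {"name": "master", "hosts": [{"fqdn": master}]},
            {"name": "worker", "hosts": [{"fqdn": h} for h in workers]},
        ],
    }
```

For the example request, {{cluster_template}} would receive 'master01' as the master and the four remaining hostnames as workers.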
[jira] [Commented] (AMBARI-26532) Add Model Context Protocol (MCP) Server for AI-Driven Cluster Management
[
https://issues.apache.org/jira/browse/AMBARI-26532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18039797#comment-18039797
]
Nikita Pande commented on AMBARI-26532:
---
Hi [~call518], thanks for showing interest in the feature :)
I have been working on a POC:
[https://github.com/nikita15p/ambari-mcp-server|https://github.com/nikita15p/ambari-mcp-server].
It has been tested with Cline as the MCP client, which lists all of its tools
and resources under MCP Servers.
Below is a table that separates the *MCP tools* from the *MCP resources* they
manage, following your request and the ambari-mcp-server code/doc structure:
||MCP Tool Name||MCP Resource Operated On||
|ambari_clusters_getclusters|Cluster Collection|
|ambari_clusters_getcluster|Single Cluster|
|ambari_clusters_createcluster|Cluster|
|ambari_services_getservices|Cluster Services|
|ambari_services_getservice|Single Service|
|ambari_services_getservicestate|Service State|
|ambari_services_startservice|Service|
|ambari_services_stopservice|Service|
|ambari_services_getserviceswithstaleconfigs|Services (Stale Configs)|
|ambari_services_gethostcomponentswithstaleconfigs|Host Components (Stale Configs)|
|ambari_services_restartservice|Service|
|ambari_services_restartcomponents|Component(s)|
|ambari_services_getrollingrestartstatus|Rolling Restart Status|
|ambari_services_enablemaintenancemode|Service/Component|
|ambari_services_disablemaintenancemode|Service/Component|
|ambari_services_runservicecheck|Service|
|ambari_services_isservicechecksupported|Service|
|ambari_services_getservicecheckstatus|Service Check|
|ambari_hosts_gethosts|Host Collection|
|ambari_hosts_gethost|Single Host|
|ambari_alerts_gettargets|Alert Targets|
|ambari_alerts_getalerts|Alerts|
|ambari_alerts_getalertsummary|Alert Summary|
|ambari_alerts_getalertdetails|Alert Definition|
|ambari_alerts_getalertdefinitions|Alert Definitions|
|ambari_alerts_updatealertdefinition|Alert Definition|
|ambari_alerts_getalertgroups|Alert Groups|
|ambari_alerts_createalertgroup|Alert Group|
|ambari_alerts_updatealertgroup|Alert Group|
|ambari_alerts_deletealertgroup|Alert Group|
|ambari_alerts_duplicatealertgroup|Alert Group|
|ambari_alerts_adddefinitiontogroup|Alert Group/Definition|
|ambari_alerts_removedefinitionfromgroup|Alert Group/Definition|
|ambari_alerts_getnotifications|Notification Targets|
|ambari_alerts_createnotification|Notification Target|
|ambari_alerts_updatenotification|Notification Target|
|ambari_alerts_deletenotification|Notification Target|
|ambari_alerts_addnotificationtogroup|Alert Group/Notification Target|
|ambari_alerts_removenotificationfromgroup|Alert Group/Notification Target|
|ambari_alerts_savealertsettings|Cluster Alert Settings|
* {*}MCP Tools{*}: The functions/endpoints that the MCP client or AI agent can
call through the server.
* {*}MCP Resources{*}: The specific Ambari objects (clusters, hosts, services,
components, alert configs, notification targets, etc.) on which those tools
operate.
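One way to read the table is as a routing layer from tool names to Ambari REST calls. The minimal Python sketch below maps a few of the read-only tools to the Ambari v1 endpoints assumed here to return the same objects. It illustrates the idea only and is not how ambari-mcp-server is actually implemented; the {{ROUTES}} table and {{resolve}} helper are hypothetical, and the paths should be checked against the Ambari API reference for the deployed version.

```python
from typing import Dict, Tuple
from urllib.parse import quote

# Hypothetical mapping from a few read-only tool names in the table to Ambari REST calls.
ROUTES: Dict[str, Tuple[str, str]] = {
    "ambari_clusters_getclusters": ("GET", "/api/v1/clusters"),
    "ambari_clusters_getcluster": ("GET", "/api/v1/clusters/{cluster}"),
    "ambari_services_getservices": ("GET", "/api/v1/clusters/{cluster}/services"),
    "ambari_services_getservice": ("GET", "/api/v1/clusters/{cluster}/services/{service}"),
    "ambari_services_gethostcomponentswithstaleconfigs": (
        "GET", "/api/v1/clusters/{cluster}/host_components?HostRoles/stale_configs=true"),
    "ambari_hosts_gethosts": ("GET", "/api/v1/clusters/{cluster}/hosts"),
    "ambari_hosts_gethost": ("GET", "/api/v1/clusters/{cluster}/hosts/{host}"),
    "ambari_alerts_getalerts": ("GET", "/api/v1/clusters/{cluster}/alerts"),
    "ambari_alerts_getalertdefinitions": ("GET", "/api/v1/clusters/{cluster}/alert_definitions"),
}


def resolve(tool: str, **params: str) -> Tuple[str, str]:
    """Return (HTTP method, path) for a tool call, percent-encoding each path parameter."""
    try:
        method, template = ROUTES[tool]
    except KeyError:
        raise ValueError(f"no route for tool {tool!r}") from None
    try:
        path = template.format(**{k: quote(v, safe="") for k, v in params.items()})
    except KeyError as missing:
        raise ValueError(f"{tool} requires parameter {missing}") from None
    return method, path
```

Keeping the mapping declarative makes it easy to review which Ambari endpoints each tool can reach, which also helps when auditing read-only versus mutating tools.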
Limitations:
Currently, authentication details are managed and stored locally before being
passed to Ambari. Our roadmap includes integrating the full set of
authentication and authorization mechanisms that Ambari supports, such as LDAP,
Kerberos, and Active Directory, to enhance security and flexibility.
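As a small step away from locally stored credentials, the server could read them from its environment at startup and build the headers Ambari's REST API expects: HTTP Basic authentication (also used by LDAP/Active Directory users once Ambari is configured for LDAP) plus the {{X-Requested-By}} header that Ambari's CSRF protection requires on modifying requests by default. This is a minimal Python sketch; the variable names {{AMBARI_USER}} and {{AMBARI_PASSWORD}} are hypothetical, and Kerberos (SPNEGO) authentication is not covered.

```python
import base64
import os
from typing import Dict, Mapping


def ambari_headers(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build request headers from credentials supplied via the environment."""
    user = env.get("AMBARI_USER")          # hypothetical variable name
    password = env.get("AMBARI_PASSWORD")  # hypothetical variable name
    if not user or not password:
        raise RuntimeError("AMBARI_USER and AMBARI_PASSWORD must both be set")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {
        "Authorization": f"Basic {token}",
        # Ambari's CSRF protection rejects modifying requests that lack this header.
        "X-Requested-By": "ambari",
    }
```

Injecting credentials at deploy time (for example from a secrets manager into the process environment) keeps them out of the server's own storage until fuller integration is in place.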
[jira] [Commented] (AMBARI-26532) Add Model Context Protocol (MCP) Server for AI-Driven Cluster Management
[
https://issues.apache.org/jira/browse/AMBARI-26532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18023090#comment-18023090
]
JungJungIn commented on AMBARI-26532:
---
Hi there, I’ve been working on a small PoC project called *MCP-Ambari-API*
(GitHub: https://github.com/call518/MCP-Ambari-API), which implements a
lightweight MCP-style interface on top of the Ambari REST API. Although it’s
still in early stages, it already covers a subset of the functionality this
JIRA proposes, so I wanted to share some experiences and suggestions.
*What MCP-Ambari-API currently supports (in my PoC):*
* Mapping Ambari read APIs (hosts, services, configs, metrics) to MCP
“Resources” endpoints
* Some write operations (e.g. update configurations, restart services) mapped
as MCP “Tools”
* Simple chaining/aggregation logic for small workflows
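As an example of a read API exposed as an MCP Resource, the issue's question "Show me all CRITICAL alerts from the last 24 hours related to YARN" could be answered from Ambari's alert history with a time-windowed predicate, followed by a small aggregation step before the data reaches the model. This is a minimal Python sketch; the {{AlertHistory/*}} field names, the epoch-millisecond timestamp unit, and the {{\{"items": [\{"AlertHistory": ...\}]\}}} response shape are assumptions to verify against the Ambari API reference for the deployed version, and the helper names are hypothetical.

```python
import time
from collections import Counter
from typing import Dict, Optional

MS_PER_HOUR = 3600 * 1000


def alert_history_path(cluster: str, service: str, state: str = "CRITICAL",
                       hours: int = 24, now_ms: Optional[int] = None) -> str:
    """Hypothetical helper: path for alert history in [now - hours, now] for one service/state."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # assumed: Ambari timestamps are epoch milliseconds
    since_ms = now_ms - hours * MS_PER_HOUR
    # Assumed field names; the query string is returned unencoded.
    predicate = "&".join([
        f"AlertHistory/service_name={service}",
        f"AlertHistory/state={state}",
        f"AlertHistory/timestamp>={since_ms}",
    ])
    return f"/api/v1/clusters/{cluster}/alert_history?{predicate}"


def count_by_definition(response: dict) -> Dict[str, int]:
    """Hypothetical helper: summarize a response into {definition_name: occurrences}."""
    counts = Counter(item["AlertHistory"]["definition_name"]
                     for item in response.get("items", []))
    return dict(counts)
```

Summarizing server-side keeps long alert histories out of the model's context while still letting it ask for details on a specific definition.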
*Overlap / alignment with AMBARI-26532’s vision:*
* The read-only “Observer” role (expose cluster state, metrics, alert history)
is something I’ve partially implemented
* The “Operator” role (perform actions via MCP Tools) is also in scope
* The concept of translating conversational or agentic workflows into
orchestration over Ambari APIs is very much in the same spirit
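For the Operator role, the issue's example of raising NodeManager memory and then restarting YARN breaks down into two kinds of Ambari calls: a new {{desired_config}} version applied with {{PUT /api/v1/clusters/<cluster>}}, and restart requests posted to {{/api/v1/clusters/<cluster>/requests}}. This is a minimal Python sketch of the request bodies, assuming the {{RESTART}} command format shown below; posting one request per host batch and waiting for each to finish only approximates the batched rolling restart that Ambari itself offers, and the helper names are hypothetical. Because a {{desired_config}} update replaces the whole configuration type, the sketch merges the change into the current properties instead of sending only the changed key.

```python
import time
from typing import Dict, List, Optional


def desired_config_body(config_type: str, current: Dict[str, str],
                        changes: Dict[str, str], tag: Optional[str] = None) -> dict:
    """Hypothetical helper: body for PUT /api/v1/clusters/<cluster> (new config version)."""
    # The new version replaces the whole config type, so start from the current properties.
    merged = {**current, **changes}
    return {"Clusters": {"desired_config": {
        "type": config_type,
        "tag": tag or f"version{int(time.time() * 1000)}",  # tag must be unique per type
        "properties": merged,
    }}}


def restart_batches(service: str, component: str, hosts: List[str],
                    batch_size: int) -> List[dict]:
    """Hypothetical helper: bodies for POST /api/v1/clusters/<cluster>/requests, one per batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    bodies = []
    for start in range(0, len(hosts), batch_size):
        batch = hosts[start:start + batch_size]
        bodies.append({
            "RequestInfo": {"command": "RESTART",
                            "context": f"Restart {component} ({len(batch)} host(s))"},
            "Requests/resource_filters": [{"service_name": service,
                                           "component_name": component,
                                           "hosts": ",".join(batch)}],
        })
    return bodies
```

For the example, the change would be {{\{"yarn.nodemanager.resource.memory-mb": "32768"\}}} in {{yarn-site}} (32 GB expressed in MB). A cluster-level value applies to every NodeManager host unless an Ambari config group overrides it, and the caller would submit each restart body only after the previous request reports completion.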
*Gaps / limitations vs the full proposal in AMBARI-26532:*
* I don’t yet support natural language interpretation / LLM bridging
* No fully general “Prompts” abstraction for complex multi-step workflows
* No full conversational UI or autonomous agent loop
* Limited error handling, security, concurrency, transactionality
*Suggestions / lessons learned from building the PoC:*
# Design a clear mapping layer between MCP primitives (Resource / Tool /
Prompt) and Ambari REST endpoints. A lot of complexity lies in reconciling
Ambari’s semantics (configs, versions, service/component dependencies).
# For multi-step workflows (Prompts), it helps to support templated workflows
(with parameters) rather than fully dynamic planning in first cut.
# Security / authentication boundary is critical. In my PoC, I had to
carefully gate write operations and respect Ambari’s RBAC.
# Metrics / alert data often need time-windowed querying and aggregation, so
consider which time-series or summarization primitives MCP “Resources” need.
# Robustness: retries, fallback logic, and partial rollbacks are important,
especially when chaining tools.
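The suggestions on templated workflows and on retries/rollbacks fit together: a parameterized workflow can be a list of steps, each with an optional retry count and rollback, run by a small executor that undoes completed steps in reverse order when a later step finally fails. This is a minimal Python sketch under that assumption; {{Step}}, {{run_workflow}}, and {{namenode_restart_template}} are hypothetical names, and the callables passed to the template stand in for the MCP calls the issue's NameNode example would need (a log lookup already scoped to the last 15 minutes, a restart, and an HA state check).

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class Step:
    name: str
    action: Callable[[], Any]
    rollback: Optional[Callable[[], None]] = None
    retries: int = 0  # extra attempts after the first failure


class WorkflowError(RuntimeError):
    pass


def run_workflow(steps: List[Step], backoff_s: float = 0.0) -> List[Any]:
    """Run steps in order; on a step's final failure, roll back completed steps in reverse."""
    done: List[Step] = []
    results: List[Any] = []
    for step in steps:
        for attempt in range(step.retries + 1):
            try:
                results.append(step.action())
                break
            except Exception as exc:
                if attempt == step.retries:
                    for prev in reversed(done):
                        if prev.rollback is not None:
                            prev.rollback()
                    raise WorkflowError(
                        f"step {step.name!r} failed after {attempt + 1} attempt(s)") from exc
                time.sleep(backoff_s * (2 ** attempt))  # exponential backoff between retries
        done.append(step)
    return results


def namenode_restart_template(find_critical_errors: Callable[[], list],
                              restart: Callable[[], Any],
                              is_active: Callable[[], bool],
                              verify_retries: int = 3) -> List[Step]:
    """Hypothetical template: restart only if the log check is clean, then confirm."""
    def guard() -> None:
        errors = find_critical_errors()
        if errors:
            raise RuntimeError(f"{len(errors)} critical log error(s); not restarting")

    def verify() -> None:
        if not is_active():
            raise RuntimeError("NameNode is not active yet")

    return [Step("check-logs", guard),
            Step("restart", restart),
            Step("verify-active", verify, retries=verify_retries)]
```

Fixing the step list in a template and letting the model fill in only the parameters keeps the first cut predictable, which matches the advice to prefer templated workflows over fully dynamic planning.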
I’d be happy to contribute parts of MCP-Ambari-API (or collaborate) to the
official implementation of this feature. If maintainers are open, I can try a
pull request or prototype extension.
Thanks for opening this issue. It’s an exciting direction for making Ambari
more “agentic”.
