This is an automated email from the ASF dual-hosted git repository.
yuchaoran pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/yunikorn-site.git
The following commit(s) were added to refs/heads/master by this push:
new 9d5c98828 [YUNIKORN-1355] Generic example of GPU scheduling with Yunikorn (#200)
9d5c98828 is described below
commit 9d5c988281c25ff6ea755b3552e28a5d497880d0
Author: KatLantyss <[email protected]>
AuthorDate: Thu Dec 1 14:30:42 2022 +0800
[YUNIKORN-1355] Generic example of GPU scheduling with Yunikorn (#200)
---
docs/assets/yunikorn-gpu-time-slicing.png | Bin 0 -> 40653 bytes
docs/user_guide/workloads/run_nvidia.md | 346 +++++++++++++++++++++++++
docs/user_guide/workloads/run_tensorflow.md | 244 +++++++++--------
docs/user_guide/workloads/workload_overview.md | 1 +
sidebars.js | 1 +
5 files changed, 468 insertions(+), 124 deletions(-)
diff --git a/docs/assets/yunikorn-gpu-time-slicing.png b/docs/assets/yunikorn-gpu-time-slicing.png
new file mode 100644
index 000000000..8b3d734a4
Binary files /dev/null and b/docs/assets/yunikorn-gpu-time-slicing.png differ
diff --git a/docs/user_guide/workloads/run_nvidia.md b/docs/user_guide/workloads/run_nvidia.md
new file mode 100644
index 000000000..644910851
--- /dev/null
+++ b/docs/user_guide/workloads/run_nvidia.md
@@ -0,0 +1,346 @@
+---
+id: run_nvidia
+title: Run NVIDIA GPU Jobs
+description: How to run a generic example of GPU scheduling with YuniKorn.
+keywords:
+ - NVIDIA GPU
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+## YuniKorn with NVIDIA GPUs
+This guide gives an overview of how to set up the NVIDIA Device Plugin, which enables users to run GPU workloads with YuniKorn. For more details, see [**Kubernetes with GPUs**](https://docs.nvidia.com/datacenter/cloud-native/kubernetes/install-k8s.html#option-2-installing-kubernetes-using-kubeadm).
+
+### Prerequisite
+Before following the steps below, YuniKorn must be deployed on [**Kubernetes with GPUs**](https://docs.nvidia.com/datacenter/cloud-native/kubernetes/install-k8s.html#install-kubernetes).
+
+### Install NVIDIA Device Plugin
+Add the nvidia-device-plugin helm repository.
+```
+helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
+helm repo update
+helm repo list
+```
+
+Verify the latest release version of the plugin is available.
+```
+helm search repo nvdp --devel
+NAME                       CHART VERSION  APP VERSION  DESCRIPTION
+nvdp/nvidia-device-plugin  0.12.3         0.12.3       A Helm chart for ...
+```
+
+Deploy the device plugin.
+```
+kubectl create namespace nvidia
+helm install --generate-name nvdp/nvidia-device-plugin --namespace nvidia --version 0.12.3
+```
+
+Check the status of the pods to ensure the NVIDIA device plugin is running.
+```
+kubectl get pods -A
+
+NAMESPACE      NAME                                      READY   STATUS    RESTARTS      AGE
+kube-flannel   kube-flannel-ds-j24fx                     1/1     Running   1 (11h ago)   11h
+kube-system    coredns-78fcd69978-2x9l8                  1/1     Running   1 (11h ago)   11h
+kube-system    coredns-78fcd69978-gszrw                  1/1     Running   1 (11h ago)   11h
+kube-system    etcd-katlantyss-nzxt                      1/1     Running   3 (11h ago)   11h
+kube-system    kube-apiserver-katlantyss-nzxt            1/1     Running   4 (11h ago)   11h
+kube-system    kube-controller-manager-katlantyss-nzxt   1/1     Running   3 (11h ago)   11h
+kube-system    kube-proxy-4wz7r                          1/1     Running   1 (11h ago)   11h
+kube-system    kube-scheduler-katlantyss-nzxt            1/1     Running   4 (11h ago)   11h
+kube-system    nvidia-device-plugin-1659451060-c92sb     1/1     Running   1 (11h ago)   11h
+```
+
+### Testing NVIDIA Device Plugin
+Create a GPU test YAML file.
+```
+# gpu-pod.yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: gpu-operator-test
+spec:
+  restartPolicy: OnFailure
+  containers:
+    - name: cuda-vector-add
+      image: "nvidia/samples:vectoradd-cuda10.2"
+      resources:
+        limits:
+          nvidia.com/gpu: 1
+Deploy the application.
+```
+kubectl apply -f gpu-pod.yaml
+```
+Check the pod status to ensure the app completed successfully.
+```
+kubectl get pods gpu-operator-test
+
+NAME READY STATUS RESTARTS AGE
+gpu-operator-test 0/1 Completed 0 9d
+```
+Check the result.
+```
+kubectl logs gpu-operator-test
+
+[Vector addition of 50000 elements]
+Copy input data from the host memory to the CUDA device
+CUDA kernel launch with 196 blocks of 256 threads
+Copy output data from the CUDA device to the host memory
+Test PASSED
+Done
+```
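+
+The test pod above does not name a scheduler, so whether YuniKorn places it depends on how YuniKorn was deployed. A minimal sketch of the same pod pinned to YuniKorn follows. It assumes YuniKorn runs under the scheduler name `yunikorn`; the file name, pod name, `applicationId` value, and `queue` value are hypothetical and should be adapted to your cluster.
+```yaml
+# gpu-pod-yunikorn.yaml (hypothetical file name)
+apiVersion: v1
+kind: Pod
+metadata:
+  name: gpu-yunikorn-test          # hypothetical pod name
+  labels:
+    applicationId: "gpu_test_001"  # hypothetical application ID
+    queue: root.default            # assumed queue; adjust to your queue configuration
+spec:
+  schedulerName: yunikorn          # assumes YuniKorn is deployed under this scheduler name
+  restartPolicy: OnFailure
+  containers:
+    - name: cuda-vector-add
+      image: "nvidia/samples:vectoradd-cuda10.2"
+      resources:
+        limits:
+          nvidia.com/gpu: 1
+```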
+
+---
+## Enable GPU Time-Slicing (Optional)
+GPU time-slicing allows multiple tenants to share a single GPU.
+To learn how GPU time-slicing works, refer to [**Time-Slicing GPUs in Kubernetes**](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/gpu-sharing.html#introduction). This page covers ways to enable GPU scheduling in YuniKorn using the [**NVIDIA GPU Operator**](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/gpu-operator).
+
+
+### Configuration
+Specify multiple configurations in a `ConfigMap` as in the following example.
+```yaml
+# time-slicing-config.yaml
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: time-slicing-config
+  namespace: nvidia
+data:
+  a100-40gb: |-
+    version: v1
+    sharing:
+      timeSlicing:
+        resources:
+          - name: nvidia.com/gpu
+            replicas: 8
+          - name: nvidia.com/mig-1g.5gb
+            replicas: 2
+          - name: nvidia.com/mig-2g.10gb
+            replicas: 2
+          - name: nvidia.com/mig-3g.20gb
+            replicas: 3
+          - name: nvidia.com/mig-7g.40gb
+            replicas: 7
+  rtx-3070: |-
+    version: v1
+    sharing:
+      timeSlicing:
+        resources:
+          - name: nvidia.com/gpu
+            replicas: 8
+```
+
+:::note
+If the GPU types in your nodes do not include a100-40gb or rtx-3070, modify the YAML file to match the GPU types you have.
+For example, suppose the local Kubernetes cluster contains only rtx-2080ti GPUs.
+The rtx-2080ti does not support MIG, so it cannot replace the a100-40gb entry.
+It does support time-slicing, so it can replace the rtx-3070 entry.
+:::
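+
+As a sketch of the case described in the note, a cluster with only rtx-2080ti GPUs could use a `ConfigMap` with a single time-slicing entry. The `rtx-2080ti` key and the replica count of 4 are assumptions chosen for illustration, not tested values.
+```yaml
+# time-slicing-config.yaml (illustrative variant, not part of the original example)
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: time-slicing-config
+  namespace: nvidia
+data:
+  # Hypothetical key; later commands must reference the same name.
+  rtx-2080ti: |-
+    version: v1
+    sharing:
+      timeSlicing:
+        resources:
+          - name: nvidia.com/gpu
+            # Assumed value: each physical GPU is advertised as 4 schedulable GPUs.
+            replicas: 4
+```
+With such a file, the later commands that reference `rtx-3070` would reference `rtx-2080ti` instead.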
+
+:::info
+MIG support was added to Kubernetes in 2020. Refer to [**Supporting MIG in Kubernetes**](https://www.google.com/url?q=https://docs.google.com/document/d/1mdgMQ8g7WmaI_XVVRrCvHPFPOMCm5LQD5JefgAh6N8g/edit&sa=D&source=editors&ust=1655578433019961&usg=AOvVaw1F-OezvM-Svwr1lLsdQmu3) for details on how this works.
+:::
+
+Create a `ConfigMap` in the operator namespace.
+```bash
+kubectl create namespace nvidia
+kubectl create -f time-slicing-config.yaml
+```
+
+### Install NVIDIA GPU Operator
+Add the nvidia-gpu-operator helm repository.
+```bash
+helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
+helm repo update
+helm repo list
+```
+
+Enable shared access to GPUs with the NVIDIA GPU Operator.
+- For a fresh install of the NVIDIA GPU Operator with time-slicing enabled:
+ ```bash
+ helm install gpu-operator nvidia/gpu-operator \
+ -n nvidia \
+ --set devicePlugin.config.name=time-slicing-config
+ ```
+
+- To enable time-slicing dynamically when the GPU Operator is already installed:
+ ```bash
+ kubectl patch clusterpolicy/cluster-policy \
+ -n nvidia --type merge \
+ -p '{"spec": {"devicePlugin": {"config": {"name": "time-slicing-config"}}}}'
+ ```
+
+### Applying the Time-Slicing Configuration
+There are two methods:
+- Across the cluster
+
+  Patch the cluster policy with the time-slicing `ConfigMap` name and the default configuration.
+ ```bash
+ kubectl patch clusterpolicy/cluster-policy \
+ -n nvidia --type merge \
+ -p '{"spec": {"devicePlugin": {"config": {"name": "time-slicing-config", "default": "rtx-3070"}}}}'
+ ```
+
+- On certain nodes
+
+  Label the node with the required time-slicing configuration in the `ConfigMap`.
+ ```bash
+ kubectl label node <node-name> nvidia.com/device-plugin.config=rtx-3070
+ ```
+
+Once the GPU Operator is installed and time-slicing is configured, check the status of the pods to ensure all the containers are running and the validation is complete.
+```bash
+kubectl get pods -n nvidia
+```
+
+```bash
+NAME                                                          READY   STATUS      RESTARTS   AGE
+gpu-feature-discovery-qbslx                                   2/2     Running     0          20h
+gpu-operator-7bdd8bf555-7clgv                                 1/1     Running     0          20h
+gpu-operator-node-feature-discovery-master-59b4b67f4f-q84zn   1/1     Running     0          20h
+gpu-operator-node-feature-discovery-worker-n58dv              1/1     Running     0          20h
+nvidia-container-toolkit-daemonset-8gv44                      1/1     Running     0          20h
+nvidia-cuda-validator-tstpk                                   0/1     Completed   0          20h
+nvidia-dcgm-exporter-pgk7v                                    1/1     Running     1          20h
+nvidia-device-plugin-daemonset-w8hh4                          2/2     Running     0          20h
+nvidia-device-plugin-validator-qrpxx                          0/1     Completed   0          20h
+nvidia-operator-validator-htp6b                               1/1     Running     0          20h
+```
+Verify that the time-slicing configuration is applied successfully.
+```bash
+kubectl describe node <node-name>
+```
+
+```bash
+...
+Capacity:
+ nvidia.com/gpu: 8
+...
+Allocatable:
+ nvidia.com/gpu: 8
+...
+```
+
+### Testing GPU Time-Slicing
+Create a workload test file `plugin-test.yaml`.
+```yaml
+# plugin-test.yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: nvidia-plugin-test
+  labels:
+    app: nvidia-plugin-test
+spec:
+  replicas: 5
+  selector:
+    matchLabels:
+      app: nvidia-plugin-test
+  template:
+    metadata:
+      labels:
+        app: nvidia-plugin-test
+    spec:
+      tolerations:
+        - key: nvidia.com/gpu
+          operator: Exists
+          effect: NoSchedule
+      containers:
+        - name: dcgmproftester11
+          image: nvidia/samples:dcgmproftester-2.1.7-cuda11.2.2-ubuntu20.04
+          command: ["/bin/sh", "-c"]
+          args:
+            - while true; do /usr/bin/dcgmproftester11 --no-dcgm-validation -t 1004 -d 300; sleep 30; done
+          resources:
+            limits:
+              nvidia.com/gpu: 1
+          securityContext:
+            capabilities:
+              add: ["SYS_ADMIN"]
+```
+
+Create a deployment with multiple replicas.
+```bash
+kubectl apply -f plugin-test.yaml
+```
+
+Verify that all five replicas are running.
+- In pods
+ ```bash
+ kubectl get pods
+ ```
+
+ ```bash
+ NAME READY STATUS RESTARTS AGE
+ nvidia-plugin-test-677775d6c5-bpsvn 1/1 Running 0 8m8s
+ nvidia-plugin-test-677775d6c5-m95zm 1/1 Running 0 8m8s
+ nvidia-plugin-test-677775d6c5-9kgzg 1/1 Running 0 8m8s
+ nvidia-plugin-test-677775d6c5-lrl2c 1/1 Running 0 8m8s
+ nvidia-plugin-test-677775d6c5-9r2pz 1/1 Running 0 8m8s
+ ```
+- In node
+ ```bash
+ kubectl describe node <node-name>
+ ```
+
+ ```bash
+ ...
+ Allocated resources:
+ (Total limits may be over 100 percent, i.e., overcommitted.)
+ Resource Requests Limits
+ -------- -------- ------
+ ...
+ nvidia.com/gpu 5 5
+ ...
+ ```
+- In the NVIDIA System Management Interface
+ ```bash
+ nvidia-smi
+ ```
+
+ ```bash
+ +-----------------------------------------------------------------------------+
+ | NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
+ |-------------------------------+----------------------+----------------------+
+ | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
+ | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
+ |                               |                      |               MIG M. |
+ |===============================+======================+======================|
+ |   0  NVIDIA GeForce ...  On   | 00000000:01:00.0  On |                  N/A |
+ | 46%   86C    P2   214W / 220W |   4297MiB /  8192MiB |    100%      Default |
+ |                               |                      |                  N/A |
+ +-------------------------------+----------------------+----------------------+
+
+ +-----------------------------------------------------------------------------+
+ | Processes:                                                                  |
+ |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
+ |        ID   ID                                                   Usage      |
+ |=============================================================================|
+ |    0   N/A  N/A   1776886      C   /usr/bin/dcgmproftester11         764MiB |
+ |    0   N/A  N/A   1776921      C   /usr/bin/dcgmproftester11         764MiB |
+ |    0   N/A  N/A   1776937      C   /usr/bin/dcgmproftester11         764MiB |
+ |    0   N/A  N/A   1777068      C   /usr/bin/dcgmproftester11         764MiB |
+ |    0   N/A  N/A   1777079      C   /usr/bin/dcgmproftester11         764MiB |
+ +-----------------------------------------------------------------------------+
+ ```
+
+- In Yunikorn UI applications
+
diff --git a/docs/user_guide/workloads/run_tensorflow.md b/docs/user_guide/workloads/run_tensorflow.md
index 152068bd1..c1375759f 100644
--- a/docs/user_guide/workloads/run_tensorflow.md
+++ b/docs/user_guide/workloads/run_tensorflow.md
@@ -92,141 +92,137 @@ please read the document
[here](../../get_started/get_started.md#access-the-web-

-## Using Time-Slicing GPU
-
-### Prerequisite
-To use Time-Slicing GPU your cluster must be configured to use GPUs and
Time-Slicing GPUs.
-- Nodes must have GPUs attached.
-- Kubernetes version 1.24
-- GPU drivers must be installed on the cluster
-- Use the [GPU
Operator](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/getting-started.html)
to automatically setup and manage the NVIDA software components on the worker
nodes.
-- Set the Configuration of [Time-Slicing GPUs in
Kubernetes](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/gpu-sharing.html)
-
-
-
-Once the GPU Operator and Time-Slicing GPUs is installed, check the status of
the pods to ensure all the containers are running and the validation is
complete :
-```shell script
-kubectl get pod -n gpu-operator
-```
-```shell script
-NAME READY STATUS
RESTARTS AGE
-gpu-feature-discovery-fd5x4 2/2 Running
0 5d2h
-gpu-operator-569d9c8cb-kbn7s 1/1 Running
14 (39h ago) 5d2h
-gpu-operator-node-feature-discovery-master-84c7c7c6cf-f4sxz 1/1 Running
0 5d2h
-gpu-operator-node-feature-discovery-worker-p5plv 1/1 Running
8 (39h ago) 5d2h
-nvidia-container-toolkit-daemonset-zq766 1/1 Running
0 5d2h
-nvidia-cuda-validator-5tldf 0/1
Completed 0 5d2h
-nvidia-dcgm-exporter-95vm8 1/1 Running
0 5d2h
-nvidia-device-plugin-daemonset-7nzvf 2/2 Running
0 5d2h
-nvidia-device-plugin-validator-gj7nn 0/1
Completed 0 5d2h
-nvidia-operator-validator-nz84d 1/1 Running
0 5d2h
-```
-Verify that the time-slicing configuration is applied successfully :
+## Run a TensorFlow job with GPU scheduling
+To use time-sliced GPUs, your cluster must be configured to use [GPUs and Time-Slicing GPUs](https://yunikorn.apache.org/docs/next/user_guide/workloads/run_nvidia).
+This section covers a workload test scenario to validate a TFJob with time-sliced GPUs.
-```shell script
+:::note
+Verify that the time-slicing configuration is applied successfully
+```bash
kubectl describe node
```
-```shell script
+```bash
Capacity:
- nvidia.com/gpu: 16
+ nvidia.com/gpu: 8
...
Allocatable:
- nvidia.com/gpu: 16
+ nvidia.com/gpu: 8
...
```
-### Testing TensorFlow job with GPUs
-This section covers a workload test scenario to validate TFJob with
Time-slicing GPU.
+:::
-1. Create a workload test file `tf-gpu.yaml` as follows:
- ```shell script
- vim tf-gpu.yaml
+Create a workload test file `tf-gpu.yaml`.
+```yaml
+# tf-gpu.yaml
+apiVersion: "kubeflow.org/v1"
+kind: "TFJob"
+metadata:
+  name: "tf-smoke-gpu"
+  namespace: kubeflow
+spec:
+  tfReplicaSpecs:
+    PS:
+      replicas: 1
+      template:
+        metadata:
+          creationTimestamp:
+          labels:
+            applicationId: "tf_job_20200521_001"
+        spec:
+          schedulerName: yunikorn
+          containers:
+            - args:
+                - python
+                - tf_cnn_benchmarks.py
+                - --batch_size=32
+                - --model=resnet50
+                - --variable_update=parameter_server
+                - --flush_stdout=true
+                - --num_gpus=1
+                - --local_parameter_device=cpu
+                - --device=cpu
+                - --data_format=NHWC
+              image: docker.io/kubeflow/tf-benchmarks-cpu:v20171202-bdab599-dirty-284af3
+              name: tensorflow
+              ports:
+                - containerPort: 2222
+                  name: tfjob-port
+              workingDir: /opt/tf-benchmarks/scripts/tf_cnn_benchmarks
+          restartPolicy: OnFailure
+    Worker:
+      replicas: 1
+      template:
+        metadata:
+          creationTimestamp: null
+          labels:
+            applicationId: "tf_job_20200521_001"
+        spec:
+          schedulerName: yunikorn
+          containers:
+            - args:
+                - python
+                - tf_cnn_benchmarks.py
+                - --batch_size=32
+                - --model=resnet50
+                - --variable_update=parameter_server
+                - --flush_stdout=true
+                - --num_gpus=1
+                - --local_parameter_device=cpu
+                - --device=gpu
+                - --data_format=NHWC
+              image: docker.io/kubeflow/tf-benchmarks-gpu:v20171202-bdab599-dirty-284af3
+              name: tensorflow
+              ports:
+                - containerPort: 2222
+                  name: tfjob-port
+              resources:
+                limits:
+                  nvidia.com/gpu: 2
+              workingDir: /opt/tf-benchmarks/scripts/tf_cnn_benchmarks
+          restartPolicy: OnFailure
+```
+Create the TFJob.
+```bash
+kubectl apply -f tf-gpu.yaml
+kubectl get pods -n kubeflow
+```
+```bash
+NAME READY STATUS RESTARTS AGE
+tf-smoke-gpu-ps-0 1/1 Running 0 18m
+tf-smoke-gpu-worker-0 1/1 Running 0 18m
+training-operator-7d98f9dd88-dd45l 1/1 Running 0 19m
+```
+
+Verify that the TFJob is running.
+- In pod logs
+ ```bash
+ kubectl logs tf-smoke-gpu-worker-0 -n kubeflow
```
- ```yaml
- apiVersion: "kubeflow.org/v1"
- kind: "TFJob"
- metadata:
- name: "tf-smoke-gpu"
- namespace: kubeflow
- spec:
- tfReplicaSpecs:
- PS:
- replicas: 1
- template:
- metadata:
- creationTimestamp:
- labels:
- applicationId: "tf_job_20200521_001"
- spec:
- schedulerName: yunikorn
- containers:
- - args:
- - python
- - tf_cnn_benchmarks.py
- - --batch_size=32
- - --model=resnet50
- - --variable_update=parameter_server
- - --flush_stdout=true
- - --num_gpus=1
- - --local_parameter_device=cpu
- - --device=cpu
- - --data_format=NHWC
- image:
docker.io/kubeflow/tf-benchmarks-cpu:v20171202-bdab599-dirty-284af3
- name: tensorflow
- ports:
- - containerPort: 2222
- name: tfjob-port
- workingDir: /opt/tf-benchmarks/scripts/tf_cnn_benchmarks
- restartPolicy: OnFailure
- Worker:
- replicas: 1
- template:
- metadata:
- creationTimestamp: null
- labels:
- applicationId: "tf_job_20200521_001"
- spec:
- schedulerName: yunikorn
- containers:
- - args:
- - python
- - tf_cnn_benchmarks.py
- - --batch_size=32
- - --model=resnet50
- - --variable_update=parameter_server
- - --flush_stdout=true
- - --num_gpus=1
- - --local_parameter_device=cpu
- - --device=gpu
- - --data_format=NHWC
- image:
docker.io/kubeflow/tf-benchmarks-gpu:v20171202-bdab599-dirty-284af3
- name: tensorflow
- ports:
- - containerPort: 2222
- name: tfjob-port
- resources:
- limits:
- nvidia.com/gpu: 2
- workingDir: /opt/tf-benchmarks/scripts/tf_cnn_benchmarks
- restartPolicy: OnFailure
```
-2. Create the TFJob
- ```shell script
- kubectl apply -f tf-gpu.yaml
+ .......
+ ..Found device 0 with properties
+ ..name: NVIDIA GeForce RTX 3080 major: 8 minor: 6 memoryClockRate(GHz): 1.71
+
+ .......
+ ..Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: NVIDIA
GeForce RTX 3080, pci bus id: 0000:01:00.0, compute capability: 8.6)
+ .......
+ ```
+
+- In node
+ ```bash
+ ...
+ Allocated resources:
+ (Total limits may be over 100 percent, i.e., overcommitted.)
+ Resource Requests Limits
+ -------- -------- ------
+ ...
+ nvidia.com/gpu 2 2
+ ...
```
-3. Verify that TFJob are running on YuniKorn:
+
+- In Yunikorn UI applications

- Check the log of the pod:
- ```shell script
- kubectl logs logs po/tf-smoke-gpu-worker-0 -n kubeflow
- ```
- ```
- .......
- ..Found device 0 with properties:
- ..name: NVIDIA GeForce RTX 3080 major: 8 minor: 6 memoryClockRate(GHz):
1.71
-
- .......
- ..Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: NVIDIA
GeForce RTX 3080, pci bus id: 0000:01:00.0, compute capability: 8.6)
- .......
- ```
- 
\ No newline at end of file
+
+
+
diff --git a/docs/user_guide/workloads/workload_overview.md b/docs/user_guide/workloads/workload_overview.md
index c8722c7fe..7040e79bd 100644
--- a/docs/user_guide/workloads/workload_overview.md
+++ b/docs/user_guide/workloads/workload_overview.md
@@ -53,6 +53,7 @@ omitted as it will be set automatically on newly created pods.
Examples of more advanced use cases can be found here:
+* [Run NVIDIA GPU Jobs](run_nvidia)
* [Run Spark Jobs](run_spark)
* [Run Flink Jobs](run_flink)
* [Run TensorFlow Jobs](run_tf)
diff --git a/sidebars.js b/sidebars.js
index 7f8eb922c..9e3ad5fbd 100644
--- a/sidebars.js
+++ b/sidebars.js
@@ -36,6 +36,7 @@ module.exports = {
label: 'Workloads',
items: [
'user_guide/workloads/workload_overview',
+ 'user_guide/workloads/run_nvidia',
'user_guide/workloads/run_spark',
'user_guide/workloads/run_flink',
'user_guide/workloads/run_tf',