anuraagnalluri commented on a change in pull request #369:
URL:
https://github.com/apache/incubator-yunikorn-k8shim/pull/369#discussion_r806327376
##########
File path: test/e2e/basic_scheduling/basic_scheduling_test.go
##########
@@ -70,6 +70,16 @@ var _ = ginkgo.Describe("", func() {
Ω(err3).NotTo(HaveOccurred())
Ω(d).NotTo(BeNil())
+ ginkgo.By("Restart scheduler pod")
+ _, err4 := kClient.ScaleDeployment(configmanager.YKScheduler,
0, configmanager.YuniKornTestConfig.YkNamespace)
+ gomega.Ω(err4).NotTo(gomega.HaveOccurred())
+ err5 :=
kClient.WaitForPodBySelectorTerminated(configmanager.YuniKornTestConfig.YkNamespace,
fmt.Sprintf("component=%s", configmanager.YKScheduler), 60)
Review comment:
I'm not sure why this timeout has to be 60. Waiting for the pods to be
fully terminated takes around 10-12 seconds. ("Terminated" here does not
correspond to the "Terminating" state in the k8s pod lifecycle; it means that
an existence check finds no pods with the `selector` labels in the given
`namespace`.) Even so, setting the timeout to 30 leads to a `timeout exceeded`
error. I wasn't sure how to debug this or how to estimate a better value.
##########
File path: test/e2e/basic_scheduling/basic_scheduling_test.go
##########
@@ -70,6 +70,16 @@ var _ = ginkgo.Describe("", func() {
Ω(err3).NotTo(HaveOccurred())
Ω(d).NotTo(BeNil())
+ ginkgo.By("Restart scheduler pod")
+ _, err4 := kClient.ScaleDeployment(configmanager.YKScheduler,
0, configmanager.YuniKornTestConfig.YkNamespace)
Review comment:
Note that k8s has no API call to bounce a pod, so I'm simply scaling the
replicas in the scheduler deployment to 0 and then back to 1.
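
The order of the steps can be sketched as follows. This is an illustration
only: `restartOps` and `restartByScaling` are hypothetical names, and the three
function fields are simplified stand-ins for the client calls in the diff, not
their real signatures.

```go
package main

import "fmt"

// restartOps bundles the operations the restart needs. The fields are
// hypothetical stand-ins for the test client's methods.
type restartOps struct {
	scale          func(replicas int32) error // set the deployment's replica count
	waitTerminated func() error               // block until no matching pods exist
	waitRunning    func() error               // block until matching pods are running
}

// restartByScaling bounces a deployment's pods by scaling to 0, waiting for
// the old pods to disappear, scaling back to 1, and waiting for the new pod.
// It stops at the first failing step and reports which step failed.
func restartByScaling(ops restartOps) error {
	if err := ops.scale(0); err != nil {
		return fmt.Errorf("scale down: %w", err)
	}
	if err := ops.waitTerminated(); err != nil {
		return fmt.Errorf("wait for termination: %w", err)
	}
	if err := ops.scale(1); err != nil {
		return fmt.Errorf("scale up: %w", err)
	}
	if err := ops.waitRunning(); err != nil {
		return fmt.Errorf("wait for running: %w", err)
	}
	return nil
}
```

Waiting for termination before scaling back up keeps the later running check
from matching the old pod while it is still shutting down.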
##########
File path: test/e2e/framework/helpers/k8s/k8s_utils.go
##########
@@ -295,14 +308,19 @@ func (k *KubeCtl) ListPods(namespace string, selector
string) (*v1.PodList, erro
}
// Wait up to timeout seconds for all pods in 'namespace' with given
'selector' to enter running state.
-// Returns an error if no pods are found or not all discovered pods enter
running state.
-func (k *KubeCtl) WaitForPodBySelectorRunning(namespace string, selector
string, timeout int) error {
+// Returns an error if no pods are found when 'wait' is false or not all
discovered pods enter running state within the 'timeout' duration.
+// If 'wait' is true, error will not be returned if no pods are found. Pods
will be continually listed until there is a non-empty list
+// to iterate over.
+func (k *KubeCtl) WaitForPodBySelectorRunning(namespace string, selector
string, timeout int, wait bool) error {
Review comment:
We add this `wait` parameter because of a timing difference between call
sites. Other invocations of `WaitForPodBySelectorRunning` come directly after
calls to `CreatePod`, so the pod object can already be returned by the API
server when the wait begins. In the newly added code above that restarts the
scheduler pod, there is a delay between scaling the deployment back to 1 and
the resulting pod being created, so the object is not yet available from the
API server. Without the flag, `WaitForPodBySelectorRunning` errors immediately
because there are no pods with the given `selector`. With `wait` set to true,
it keeps listing pods until the list is non-empty.