[jira] [Created] (YUNIKORN-173) Generates one default application ID per namespace in the admission controller
Weiwei Yang created YUNIKORN-173:

Summary: Generates one default application ID per namespace in the admission controller
Key: YUNIKORN-173
URL: https://issues.apache.org/jira/browse/YUNIKORN-173
Project: Apache YuniKorn
Issue Type: Improvement
Components: shim - kubernetes
Reporter: Weiwei Yang
Assignee: Weiwei Yang

If an app doesn't explicitly specify an application ID, let's group such pods into one single app per namespace.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org
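The grouping proposed in YUNIKORN-173 could look roughly like the sketch below. It is only an illustration: the label key `applicationId`, the `default-app-` prefix and the function name `appIDFor` are assumptions made for this example, not names taken from the admission controller.

```go
package main

import "fmt"

// appIDLabel is a hypothetical label key; the key used by the real shim may differ.
const appIDLabel = "applicationId"

// appIDFor returns the application ID a pod should be grouped under.
// A pod that carries an explicit ID keeps it; every other pod in the
// namespace shares one generated default ID (the prefix is an assumption).
func appIDFor(namespace string, labels map[string]string) string {
	// Reading from a nil map is safe in Go and yields the empty string.
	if id := labels[appIDLabel]; id != "" {
		return id
	}
	return "default-app-" + namespace
}

func main() {
	fmt.Println(appIDFor("dev", nil))                                        // default-app-dev
	fmt.Println(appIDFor("dev", map[string]string{appIDLabel: "spark-001"})) // spark-001
}
```

Deriving the default ID from the namespace alone keeps it stable across admission requests, so pods created at different times still end up in the same app.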
[jira] [Updated] (YUNIKORN-173) Generates one default application ID per namespace in the admission controller
[ https://issues.apache.org/jira/browse/YUNIKORN-173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-173: Labels: pull-request-available (was: ) > Generates one default application ID per namespace in the admission controller > -- > > Key: YUNIKORN-173 > URL: https://issues.apache.org/jira/browse/YUNIKORN-173 > Project: Apache YuniKorn > Issue Type: Improvement > Components: shim - kubernetes >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Major > Labels: pull-request-available
[jira] [Updated] (YUNIKORN-168) make test has become "chatty" due to verbose clean
[ https://issues.apache.org/jira/browse/YUNIKORN-168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-168: Labels: newbie pull-request-available (was: newbie) > make test has become "chatty" due to verbose clean > -- > > Key: YUNIKORN-168 > URL: https://issues.apache.org/jira/browse/YUNIKORN-168 > Project: Apache YuniKorn > Issue Type: Task > Components: build >Reporter: Wilfred Spiegelenburg >Assignee: Ting Yao,Huang >Priority: Major > Labels: newbie, pull-request-available > > The changes for the test target in the make file cause a large amount of log > to be spewed by the clean target when it tries to clean the build caches. > We need to clean that up and just log the results that we need not the output > of the clean.
[jira] [Resolved] (YUNIKORN-172) Unit tests cannot run with golang 1.13+
[ https://issues.apache.org/jira/browse/YUNIKORN-172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-172. Resolution: Duplicate > Unit tests cannot run with golang 1.13+ > --- > > Key: YUNIKORN-172 > URL: https://issues.apache.org/jira/browse/YUNIKORN-172 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - scheduler, shim - kubernetes >Reporter: Weiwei Yang >Priority: Major > > If go version is 1.13+, the unit tests will fail with some errors related to > flag. We need to get this fixed. I don't think we have any limitations at the > runtime to support 1.13+ versions. This is just a unit test issue, about how > we use CLI flags.
[jira] [Updated] (YUNIKORN-159) Remove helm charts from the k8shim repo
[ https://issues.apache.org/jira/browse/YUNIKORN-159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kinga Marton updated YUNIKORN-159: -- Summary: Remove helm charts from the k8shim repo (was: remove helm charts from the k8shim repo) > Remove helm charts from the k8shim repo > --- > > Key: YUNIKORN-159 > URL: https://issues.apache.org/jira/browse/YUNIKORN-159 > Project: Apache YuniKorn > Issue Type: Task > Components: build >Reporter: Wilfred Spiegelenburg >Assignee: Kinga Marton >Priority: Major > > After we move the helm deployment to the release repo we should remove the > files from the k8shim repo. > This should include updating the > [documentation|https://github.com/apache/incubator-yunikorn-core/blob/master/docs/user-guide.md#quick-start] > to point to the correct place to find the helm charts and explain the > workings
[jira] [Updated] (YUNIKORN-159) Remove helm charts from the k8shim repo
[ https://issues.apache.org/jira/browse/YUNIKORN-159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-159: Labels: pull-request-available (was: ) > Remove helm charts from the k8shim repo > --- > > Key: YUNIKORN-159 > URL: https://issues.apache.org/jira/browse/YUNIKORN-159 > Project: Apache YuniKorn > Issue Type: Task > Components: build >Reporter: Wilfred Spiegelenburg >Assignee: Kinga Marton >Priority: Major > Labels: pull-request-available
[jira] [Updated] (YUNIKORN-140) Create helm chart repository
[ https://issues.apache.org/jira/browse/YUNIKORN-140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kinga Marton updated YUNIKORN-140: -- Summary: Create helm chart repository (was: Create helm chart repository and publish it to helmhub) > Create helm chart repository > > > Key: YUNIKORN-140 > URL: https://issues.apache.org/jira/browse/YUNIKORN-140 > Project: Apache YuniKorn > Issue Type: Improvement > Components: build >Reporter: Kinga Marton >Assignee: Kinga Marton >Priority: Major > Labels: pull-request-available > Fix For: 0.9 > > > Create a helm chart repository and publish the charts to > [https://hub.helm.sh/.|https://hub.helm.sh/] This will further help users to > try yunikorn out easily. > How to create the helm chart repository is described here: > [https://helm.sh/docs/topics/chart_repository/]
[jira] [Updated] (YUNIKORN-159) Remove helm charts from the k8shim repo and update documentation accordingly
[ https://issues.apache.org/jira/browse/YUNIKORN-159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kinga Marton updated YUNIKORN-159: -- Summary: Remove helm charts from the k8shim repo and update documentation accordingly (was: Remove helm charts from the k8shim repo) > Remove helm charts from the k8shim repo and update documentation accordingly > > > Key: YUNIKORN-159 > URL: https://issues.apache.org/jira/browse/YUNIKORN-159 > Project: Apache YuniKorn > Issue Type: Task > Components: build >Reporter: Wilfred Spiegelenburg >Assignee: Kinga Marton >Priority: Major > Labels: pull-request-available
[jira] [Updated] (YUNIKORN-24) UT compatibility issue with GO 1.13.x
[ https://issues.apache.org/jira/browse/YUNIKORN-24?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wilfred Spiegelenburg updated YUNIKORN-24:
    Priority: Minor (was: Major)

> UT compatibility issue with GO 1.13.x
>
> Key: YUNIKORN-24
> URL: https://issues.apache.org/jira/browse/YUNIKORN-24
> Project: Apache YuniKorn
> Issue Type: Test
> Components: test - unit
> Reporter: Wilfred Spiegelenburg
> Priority: Minor
>
> Unit tests start to fail on flag handling after upgrading Go to 1.13.x.
> This seems to be caused by our reliance on undefined behaviour, combined with fixes in the Go language.
> Calling {{flag.Parse()}} in an init is considered incorrect behaviour. This is part of the 1.13 release notes:
> [https://tip.golang.org/doc/go1.13#testing]
> We currently call {{flag.Parse()}} in the init of the {{SchedulerConf}}, which we need to change.
> * do not use {{init()}} functions; the cost of their convenience is unexpected behaviour like this
> * remove these init functions and replace them with {{sync.Once}}
> {{init()}} functions could be problematic. However, using them is simpler than having to check in each location that the package is initialised. Using {{sync.Once}} could add overhead or wrapping which we do not need. I think we need to look at a combination of the two: {{init()}} when we know it is package-local initialisation only; if we have interaction with other packages or optional initialisation, use {{sync.Once}}.
> This {{flag.Parse()}} is a case of interacting with multiple packages; I don't think we will have many more of these cases.
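A minimal sketch of the {{sync.Once}} option discussed in YUNIKORN-24, assuming a single hypothetical flag. The flag name `policyGroup`, its default and the helper names are made up for this example and are not the actual {{SchedulerConf}} code. The point is only that the flag is defined at package level while parsing is deferred to first use instead of running in {{init()}}.

```go
package main

import (
	"flag"
	"fmt"
	"sync"
)

// Defining a flag in a package-level variable is fine; only *parsing*
// during package initialisation is the problem described in the issue.
// The flag name and default value are hypothetical.
var policyGroup = flag.String("policyGroup", "queues", "hypothetical scheduler policy group")

var parseOnce sync.Once

// ensureParsed parses the command line at most once, on first use.
// If something else (for example the test main) has already parsed the
// flags, it does nothing.
func ensureParsed() {
	parseOnce.Do(func() {
		if !flag.Parsed() {
			flag.Parse()
		}
	})
}

// PolicyGroup returns the flag value, triggering the lazy parse if needed.
func PolicyGroup() string {
	ensureParsed()
	return *policyGroup
}

func main() {
	fmt.Println(PolicyGroup())
}
```

Because nothing is parsed while packages are initialised, the testing package gets the chance to register its own flags before the first call to `PolicyGroup()`.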
[jira] [Updated] (YUNIKORN-24) UT compatibility issue with GO 1.13.x
[ https://issues.apache.org/jira/browse/YUNIKORN-24?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg updated YUNIKORN-24: -- Target Version: 0.9 > UT compatibility issue with GO 1.13.x > - > > Key: YUNIKORN-24 > URL: https://issues.apache.org/jira/browse/YUNIKORN-24 > Project: Apache YuniKorn > Issue Type: Test > Components: test - unit >Reporter: Wilfred Spiegelenburg >Priority: Minor
[jira] [Updated] (YUNIKORN-160) release build on a fresh machine is failing
[ https://issues.apache.org/jira/browse/YUNIKORN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wilfred Spiegelenburg updated YUNIKORN-160:

This is related to the travis setup for the web interface. We need a similar kind of setup to build the web repo when a new PR gets opened, and we do not want to install all of yarn etc. each time as that will become problematic.
Can we re-use the docker image for both cases?

> release build on a fresh machine is failing
>
> Key: YUNIKORN-160
> URL: https://issues.apache.org/jira/browse/YUNIKORN-160
> Project: Apache YuniKorn
> Issue Type: Bug
> Reporter: Sunil G
> Priority: Major
>
> While running ./build-docker-images.sh on a fresh machine, the command fails:
> {code:java}
> Successfully tagged yunikorn/yunikorn-scheduler-admission-controller:latest
> /Users/sgovindan/Work/releases/yunikorn/0.8.0/apache-yunikorn-0.8.0-incubating-src yarn install && yarn build:prod
> /bin/sh: yarn: command not found
> make: *** [build-prod] Error 127
> {code}
> Proposal:
> # It's not good to install "yarn" locally and let the node version that happens to be on the machine manage the web build process. In YARNUI2, I have regularly seen errors where some incompatible package breaks the build.
> # It's better to run the whole build process in docker, which will have a sane, compatible set of versions for all supporting software.
[jira] [Commented] (YUNIKORN-159) Remove helm charts from the k8shim repo and update documentation accordingly
[ https://issues.apache.org/jira/browse/YUNIKORN-159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113025#comment-17113025 ]

Kinga Marton commented on YUNIKORN-159:

I have removed the helm chart from the shim repo, updated the documentation and added the docker images to the web page.
I haven't added the chart repository yet, because I think we should create a 0.8.0 release from the helm charts as well (and upload it to helm hub), and then we can add a download link for it as well.

> Remove helm charts from the k8shim repo and update documentation accordingly
>
> Key: YUNIKORN-159
> URL: https://issues.apache.org/jira/browse/YUNIKORN-159
> Project: Apache YuniKorn
> Issue Type: Task
> Components: build
> Reporter: Wilfred Spiegelenburg
> Assignee: Kinga Marton
> Priority: Major
> Labels: pull-request-available
[jira] [Commented] (YUNIKORN-117) Create event cache for queue and application events
[ https://issues.apache.org/jira/browse/YUNIKORN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113139#comment-17113139 ]

Adam Antal commented on YUNIKORN-117:

Added {{EventMessage}} to the SI side in [this|https://github.com/apache/incubator-yunikorn-scheduler-interface/pull/17] pull request.

> Create event cache for queue and application events
>
> Key: YUNIKORN-117
> URL: https://issues.apache.org/jira/browse/YUNIKORN-117
> Project: Apache YuniKorn
> Issue Type: Sub-task
> Components: core - cache, core - scheduler
> Reporter: Adam Antal
> Assignee: Adam Antal
> Priority: Critical
> Labels: pull-request-available
>
> Create a simple preliminary implementation of the event cache of YUNIKORN-42.
> We have the following limited scope for this task:
> - implement it as a separate process from the scheduler (similar to {{PartitionManager}})
> - only deal with queues and applications (the pods and nodes can be added later)
> - only store the apps' last visited time from the scheduler
> - clean up those objects that haven't been visited in the last 24h
> Other cache implementations can also be considered.
> As a starting point, channels are a safe choice for async communication with the scheduler without incurring a significant performance loss.
[jira] [Updated] (YUNIKORN-171) Add example and document about how to run kubeflow with yunikorn
[ https://issues.apache.org/jira/browse/YUNIKORN-171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-171: Labels: pull-request-available (was: ) > Add example and document about how to run kubeflow with yunikorn > > > Key: YUNIKORN-171 > URL: https://issues.apache.org/jira/browse/YUNIKORN-171 > Project: Apache YuniKorn > Issue Type: Test > Components: documentation >Reporter: Weiwei Yang >Assignee: Ting Yao,Huang >Priority: Major > Labels: pull-request-available > > Provide some example and doc to demonstrate how to run kubeflow with yunikorn.
[jira] [Commented] (YUNIKORN-155) data race in unit test: TestSchedulerRecoveryWhenPlacementRulesApplied
[ https://issues.apache.org/jira/browse/YUNIKORN-155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113179#comment-17113179 ]

Kinga Marton commented on YUNIKORN-155:

I wasn't able to reproduce this issue. I ran the test multiple times and added some sleeps to slow the goroutines down, but the test passed each time and no race condition was reported.

From debugging the test case and the attached stack trace I can see the following:
* one goroutine (Goroutine 137) is started when the event handlers are started in scheduler.go: [https://github.com/apache/incubator-yunikorn-core/blob/master/pkg/scheduler/scheduler.go#L67]
* the other one (Goroutine 135) is started when the event handlers are started in cluster_info.go: [https://github.com/apache/incubator-yunikorn-core/blob/master/pkg/cache/cluster_info.go#L69]

There is one write and one read operation performed on the same ApplicationInfo object:
* the write operation happens while placing an application and setting some fields on the ApplicationInfo. This is protected with a lock (Goroutine 137): [https://github.com/apache/incubator-yunikorn-core/blob/master/pkg/scheduler/placement/placement.go#L200]
* the read operation happens during logging in case of a SchedulerApplicationsUpdateEvent: [https://github.com/apache/incubator-yunikorn-core/blob/master/pkg/scheduler/scheduler.go#L176]

> data race in unit test: TestSchedulerRecoveryWhenPlacementRulesApplied
>
> Key: YUNIKORN-155
> URL: https://issues.apache.org/jira/browse/YUNIKORN-155
> Project: Apache YuniKorn
> Issue Type: Test
> Components: test - unit
> Reporter: Wilfred Spiegelenburg
> Assignee: Kinga Marton
> Priority: Major
> Attachments: data_race.txt
>
> Testing shows a new data race while logging the queue name for an application that gets added.
> Details in the attached logs
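The pattern described in the comment on YUNIKORN-155, a locked write on one goroutine and a read for logging on another, can be sketched as below. The type `appInfo` and its methods are hypothetical stand-ins, not the real ApplicationInfo code. The sketch shows one common remedy, assuming the read in the logging path is the unprotected side: route the read through a getter that takes the read lock.

```go
package main

import (
	"fmt"
	"sync"
)

// appInfo is a stand-in for ApplicationInfo; field and method names
// are hypothetical.
type appInfo struct {
	sync.RWMutex
	queueName string
}

// setQueueName is the write path (placement), taken under the write lock.
func (a *appInfo) setQueueName(name string) {
	a.Lock()
	defer a.Unlock()
	a.queueName = name
}

// getQueueName is the read path (logging). Reading a.queueName directly,
// without RLock, is the kind of access the race detector reports even
// though the writer holds a lock.
func (a *appInfo) getQueueName() string {
	a.RLock()
	defer a.RUnlock()
	return a.queueName
}

func main() {
	app := &appInfo{queueName: "root.unplaced"}

	var wg sync.WaitGroup
	wg.Add(2)
	go func() { // placement goroutine
		defer wg.Done()
		app.setQueueName("root.default")
	}()
	go func() { // logging goroutine; may observe either value
		defer wg.Done()
		fmt.Println("queue:", app.getQueueName())
	}()
	wg.Wait()
}
```

A lock on the writer alone does not prevent a data race; both sides have to synchronise. Running the test with `go test -race` is the usual way to confirm that a report like the attached one is gone.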
[jira] [Comment Edited] (YUNIKORN-155) data race in unit test: TestSchedulerRecoveryWhenPlacementRulesApplied
[ https://issues.apache.org/jira/browse/YUNIKORN-155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113179#comment-17113179 ]

Kinga Marton edited comment on YUNIKORN-155 at 5/21/20, 1:27 PM:

I wasn't able to reproduce this issue. I ran the test multiple times and added some sleeps to slow the goroutines down, but the test passed each time and no race condition was reported.

From debugging the test case and the attached stack trace I can see the following:
* one goroutine (Goroutine 137) is started when the event handlers are started in scheduler.go: [https://github.com/apache/incubator-yunikorn-core/blob/master/pkg/scheduler/scheduler.go#L67]
* the other one (Goroutine 135) is started when the event handlers are started in cluster_info.go: [https://github.com/apache/incubator-yunikorn-core/blob/master/pkg/cache/cluster_info.go#L69]

There is one write and one read operation performed on the same ApplicationInfo object:
* the write operation happens while placing an application and setting some fields on the ApplicationInfo. This is protected with a lock (Goroutine 137): [https://github.com/apache/incubator-yunikorn-core/blob/master/pkg/scheduler/placement/placement.go#L200]
* the read operation happens during logging in case of a SchedulerApplicationsUpdateEvent: [https://github.com/apache/incubator-yunikorn-core/blob/master/pkg/scheduler/scheduler.go#L176]

was (Author: kmarton):

I wasn't able to reproduce this issue. I tried to run it multiple times, added some sleep to make the go routines slower, but the test passed each time and were no any race condition reported.

From debugging the test case and the attached stack trace I can see the following things:
* one goroutine (Goroutine 137) starts when starting the event handlers in scheduler.go: [https://github.com/apache/incubator-yunikorn-core/blob/master/pkg/scheduler/scheduler.go#L67]
* the another one (Goroutine 135) is starting when we are starting event handlers is cluster-info.go: [https://github.com/apache/incubator-yunikorn-core/blob/master/pkg/cache/cluster_info.go#L69]

There is one write and one read operation performed on the same ApplicationInfo object:
* the write operation is while placing an application and set some fields on the ApplicationInfo. This is protected with a lock(Goroutine 137): [https://github.com/apache/incubator-yunikorn-core/blob/master/pkg/scheduler/placement/placement.go#L200]
* the write operation is during logging in case of a SchedulerApplicationsUpdateEvent: [https://github.com/apache/incubator-yunikorn-core/blob/master/pkg/scheduler/scheduler.go#L176]

> data race in unit test: TestSchedulerRecoveryWhenPlacementRulesApplied
>
> Key: YUNIKORN-155
> URL: https://issues.apache.org/jira/browse/YUNIKORN-155
> Project: Apache YuniKorn
> Issue Type: Test
> Components: test - unit
> Reporter: Wilfred Spiegelenburg
> Assignee: Kinga Marton
> Priority: Major
> Attachments: data_race.txt
[jira] [Updated] (YUNIKORN-24) UT compatibility issue with GO 1.13.x
[ https://issues.apache.org/jira/browse/YUNIKORN-24?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YUNIKORN-24: Priority: Major (was: Minor) > UT compatibility issue with GO 1.13.x > - > > Key: YUNIKORN-24 > URL: https://issues.apache.org/jira/browse/YUNIKORN-24 > Project: Apache YuniKorn > Issue Type: Test > Components: test - unit >Reporter: Wilfred Spiegelenburg >Priority: Major
[jira] [Created] (YUNIKORN-174) Fix SI constant change from YUNIKORN-135
Wilfred Spiegelenburg created YUNIKORN-174:

Summary: Fix SI constant change from YUNIKORN-135
Key: YUNIKORN-174
URL: https://issues.apache.org/jira/browse/YUNIKORN-174
Project: Apache YuniKorn
Issue Type: Bug
Components: scheduler-interface
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg

The build of the SI does not clean up correctly and leaves a file behind even if there are no changes (typo in Makefile).

Multiple k8shim-only constants are part of the spec which should not be in the SI. They are not shared between the k8shim and other shims or the core:
* SparkLabelAppID
* SparkLabelRole
* SparkLabelRoleDriver
* LabelApp
* LabelApplicationID
* LabelQueueName

And we have no default resource types in the product. The core does not use them in the production code. They are only used in tests and we should remove them from tests also to highlight the fact that the core is resource type agnostic:
* memory
* vcore
[jira] [Created] (YUNIKORN-175) remove memory and vcore references from resources in tests for core
Wilfred Spiegelenburg created YUNIKORN-175:

Summary: remove memory and vcore references from resources in tests for core
Key: YUNIKORN-175
URL: https://issues.apache.org/jira/browse/YUNIKORN-175
Project: Apache YuniKorn
Issue Type: Bug
Components: core - common, test - unit
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg

The core is resource type agnostic. Many of the core tests, however, reference _memory_ and _vcore_ as if they were pre-defined types. There is no predefined type and we should not imply that there is a default type.
[jira] [Created] (YUNIKORN-176) schedulerCache might become inconsistent sometimes depending on the ordering of the events
Weiwei Yang created YUNIKORN-176:

Summary: schedulerCache might become inconsistent sometimes depending on the ordering of the events
Key: YUNIKORN-176
URL: https://issues.apache.org/jira/browse/YUNIKORN-176
Project: Apache YuniKorn
Issue Type: Bug
Components: shim - kubernetes
Reporter: Weiwei Yang
Assignee: Weiwei Yang

Sometimes we found that some nodes are stuck at pending when working with the auto-scaler, because some daemon set pods were pending and could not be scheduled.
The root cause is:
# the auto-scaler scales up a node
# the daemon set controller creates a pod, e.g. for fluentd (it sets pod.spec.nodeName="newly-added-host")
# YK gets informed by the pod informer: add pod
# the pod is added to the cache (schedulerCache); since the {{pod.spec.nodeName}} is not nil, this adds a {{new nodeInfo}}
# the node informer gets informed: add node
# the node is added to the scheduler cache; the node already exists, so calling SetNode is skipped
# the scheduler tries to allocate the pod to the node
# the predicates fail: NodeUnknownCondition (node x doesn't exist in schedulerCache)
# the allocation always fails and the pod stays pending
# since the daemon set pod cannot be started, the node status will be NotReady
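The ordering problem in the steps above can be reproduced with a toy cache. This is a sketch under stated assumptions: `schedulerCache`, `nodeInfo` and every method below are simplified, hypothetical stand-ins, and `addNodeFixed` shows one possible remedy, not necessarily the change made in the pull request.

```go
package main

import "fmt"

// nodeInfo is a hypothetical cache entry. node stays nil until the node
// object itself has been stored (the SetNode step in the description).
type nodeInfo struct {
	node *string // stand-in for the Kubernetes node object
	pods []string
}

type schedulerCache struct {
	nodes map[string]*nodeInfo
}

// addPod mirrors step 4: a pod with nodeName set creates a placeholder
// nodeInfo if the node has not been seen yet.
func (c *schedulerCache) addPod(pod, nodeName string) {
	ni, ok := c.nodes[nodeName]
	if !ok {
		ni = &nodeInfo{}
		c.nodes[nodeName] = ni
	}
	ni.pods = append(ni.pods, pod)
}

// addNodeBuggy mirrors step 6: an existing entry short-circuits the
// update, so the placeholder never receives its node object.
func (c *schedulerCache) addNodeBuggy(name string) {
	if _, ok := c.nodes[name]; ok {
		return
	}
	c.nodes[name] = &nodeInfo{node: &name}
}

// addNodeFixed stores the node object even when a placeholder exists.
func (c *schedulerCache) addNodeFixed(name string) {
	ni, ok := c.nodes[name]
	if !ok {
		ni = &nodeInfo{}
		c.nodes[name] = ni
	}
	ni.node = &name
}

// nodeKnown is the predicate-style check from step 8.
func (c *schedulerCache) nodeKnown(name string) bool {
	ni, ok := c.nodes[name]
	return ok && ni.node != nil
}

func main() {
	buggy := &schedulerCache{nodes: map[string]*nodeInfo{}}
	buggy.addPod("fluentd-abc", "newly-added-host") // pod event arrives first
	buggy.addNodeBuggy("newly-added-host")          // node event arrives second
	fmt.Println("buggy cache knows node:", buggy.nodeKnown("newly-added-host")) // false

	fixed := &schedulerCache{nodes: map[string]*nodeInfo{}}
	fixed.addPod("fluentd-abc", "newly-added-host")
	fixed.addNodeFixed("newly-added-host")
	fmt.Println("fixed cache knows node:", fixed.nodeKnown("newly-added-host")) // true
}
```

With the events in the opposite order (node first, then pod) both variants behave the same, which matches the description of a failure that depends on event ordering.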
[jira] [Updated] (YUNIKORN-176) schedulerCache might become inconsistent sometimes depending on the ordering of the events
[ https://issues.apache.org/jira/browse/YUNIKORN-176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YUNIKORN-176: Labels: pull-request-available (was: ) > schedulerCache might become inconsistent sometimes depending on the ordering > of the events > -- > > Key: YUNIKORN-176 > URL: https://issues.apache.org/jira/browse/YUNIKORN-176 > Project: Apache YuniKorn > Issue Type: Bug > Components: shim - kubernetes >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Major > Labels: pull-request-available
[jira] [Resolved] (YUNIKORN-148) Define API in scheduler interface to queue administration
[ https://issues.apache.org/jira/browse/YUNIKORN-148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang resolved YUNIKORN-148. -- Resolution: Won't Fix > Define API in scheduler interface to queue administration > - > > Key: YUNIKORN-148 > URL: https://issues.apache.org/jira/browse/YUNIKORN-148 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: scheduler-interface >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Major > Labels: pull-request-available > > Expose queue mgmt API from scheduler interface.
[jira] [Commented] (YUNIKORN-149) Watch K8s namespace and create unmanaged queues accordingly
[ https://issues.apache.org/jira/browse/YUNIKORN-149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113753#comment-17113753 ] Weiwei Yang commented on YUNIKORN-149: -- Same reason as YUNIKORN-148. > Watch K8s namespace and create unmanaged queues accordingly > --- > > Key: YUNIKORN-149 > URL: https://issues.apache.org/jira/browse/YUNIKORN-149 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: shim - kubernetes >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Major > > Watch K8s namespace object and create queues accordingly.
[jira] [Commented] (YUNIKORN-148) Define API in scheduler interface to queue administration
[ https://issues.apache.org/jira/browse/YUNIKORN-148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113752#comment-17113752 ]

Weiwei Yang commented on YUNIKORN-148:
--------------------------------------

The original PR is [https://github.com/apache/incubator-yunikorn-scheduler-interface/pull/16].

[~leftnoteasy], [~wilfreds], and I had a long discussion about the pros and cons of using such admin APIs. The conclusion is to abandon this for now.

The pros of this approach are that we get a more deterministic API for namespace mapping and more interaction with the clients. If we add a queue CRD in the future, we can fail fast when the quota is set incorrectly and explicitly let users know what is going on. The workflow looks like:
1. The user creates a queue CRD.
2. The shim reacts to the add operation and calls the addQueue API to create the queue.
3. If that fails, the shim fails the operation and the user gets an error from step 1.

The cons are that we already support dynamic queues with placement rules, which means yunikorn-core is in charge of queue creation and deletion. If we add such APIs, the shim becomes responsible for this as well. Conceptually, it doesn't look like a placement rule (which only exists in the core).

> Define API in scheduler interface to queue administration
> ---------------------------------------------------------
>
> Key: YUNIKORN-148
> URL: https://issues.apache.org/jira/browse/YUNIKORN-148
> Project: Apache YuniKorn
> Issue Type: Sub-task
> Components: scheduler-interface
> Reporter: Weiwei Yang
> Assignee: Weiwei Yang
> Priority: Major
> Labels: pull-request-available
>
> Expose queue mgmt API from scheduler interface.
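The three-step workflow in the YUNIKORN-148 comment could look roughly like the sketch below. This only illustrates the abandoned proposal: `QueueSpec`, `QueueAdmin`, `AddQueue`, `fakeCore` and `onQueueAdded` are hypothetical names, and the validation rule (a positive quota) is an assumption made up for the example, not something taken from the scheduler interface.

```go
package main

import (
	"errors"
	"fmt"
)

// QueueSpec is a hypothetical, trimmed-down view of a queue CRD.
type QueueSpec struct {
	Name     string
	MaxQuota int64
}

// QueueAdmin is a hypothetical stand-in for the proposed admin API
// that the shim would call on the core.
type QueueAdmin interface {
	AddQueue(spec QueueSpec) error
}

// fakeCore is an in-memory implementation used only for this example.
type fakeCore struct {
	queues map[string]QueueSpec
}

// AddQueue rejects non-positive quotas and duplicate queue names.
func (c *fakeCore) AddQueue(spec QueueSpec) error {
	if spec.MaxQuota <= 0 {
		return errors.New("invalid quota")
	}
	if _, exists := c.queues[spec.Name]; exists {
		return errors.New("queue already exists")
	}
	c.queues[spec.Name] = spec
	return nil
}

// onQueueAdded models steps 2 and 3: the shim reacts to the CRD add
// event, calls the admin API, and returns any error so that the
// original create request from step 1 fails.
func onQueueAdded(api QueueAdmin, spec QueueSpec) error {
	if err := api.AddQueue(spec); err != nil {
		return fmt.Errorf("queue %q rejected: %w", spec.Name, err)
	}
	return nil
}

func main() {
	core := &fakeCore{queues: map[string]QueueSpec{}}
	fmt.Println(onQueueAdded(core, QueueSpec{Name: "root.dev", MaxQuota: 10}))
	fmt.Println(onQueueAdded(core, QueueSpec{Name: "root.bad", MaxQuota: 0}))
}
```

The fail-fast benefit mentioned in the comment corresponds to the error returned from `onQueueAdded`: the user learns about a bad quota at creation time rather than at scheduling time.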
[jira] [Resolved] (YUNIKORN-149) Watch K8s namespace and create unmanaged queues accordingly
[ https://issues.apache.org/jira/browse/YUNIKORN-149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weiwei Yang resolved YUNIKORN-149.
---------------------------------
    Resolution: Won't Fix

> Watch K8s namespace and create unmanaged queues accordingly
> -----------------------------------------------------------
>
> Key: YUNIKORN-149
> URL: https://issues.apache.org/jira/browse/YUNIKORN-149
> Project: Apache YuniKorn
> Issue Type: Sub-task
> Components: shim - kubernetes
> Reporter: Weiwei Yang
> Assignee: Weiwei Yang
> Priority: Major
>
> Watch K8s namespace object and create queues accordingly.
[jira] [Resolved] (YUNIKORN-171) Add example and document about how to run kubeflow with yunikorn
[ https://issues.apache.org/jira/browse/YUNIKORN-171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weiwei Yang resolved YUNIKORN-171.
---------------------------------
    Fix Version/s: 0.9
    Resolution: Fixed

PR merged, thanks!

> Add example and document about how to run kubeflow with yunikorn
> ----------------------------------------------------------------
>
> Key: YUNIKORN-171
> URL: https://issues.apache.org/jira/browse/YUNIKORN-171
> Project: Apache YuniKorn
> Issue Type: Test
> Components: documentation
> Reporter: Weiwei Yang
> Assignee: Ting Yao,Huang
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.9
>
> Provide an example and documentation to demonstrate how to run kubeflow with yunikorn.
[jira] [Created] (YUNIKORN-177) SchedulerName doesn't need to be configurable
Weiwei Yang created YUNIKORN-177:

    Summary: SchedulerName doesn't need to be configurable
    Key: YUNIKORN-177
    URL: https://issues.apache.org/jira/browse/YUNIKORN-177
    Project: Apache YuniKorn
    Issue Type: Improvement
    Components: shim - kubernetes
    Reporter: Weiwei Yang

Currently we allow the user to override the schedulerName. This is not necessary; we should stick to {{schedulerName=yunikorn}}. Let's remove this from the CLI options.
[jira] [Resolved] (YUNIKORN-81) Fix the kubernetes dashboard link in env-setup readme
[ https://issues.apache.org/jira/browse/YUNIKORN-81?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weiwei Yang resolved YUNIKORN-81.
--------------------------------
    Resolution: Later

> Fix the kubernetes dashboard link in env-setup readme
> -----------------------------------------------------
>
> Key: YUNIKORN-81
> URL: https://issues.apache.org/jira/browse/YUNIKORN-81
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: documentation
> Reporter: Ayub Pathan
> Assignee: Ayub Pathan
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Documentation page that needs correction:
> https://github.com/apache/incubator-yunikorn-core/blob/master/docs/setup/env-setup.md
> Under "deploy and access dashboard", the clickable link points to the URL below:
> http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/#!/login
> The correct one should be:
> http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/#/login
[jira] [Created] (YUNIKORN-178) Remove call to get config from the admission controller
Wilfred Spiegelenburg created YUNIKORN-178:
-------------------------------------------

    Summary: Remove call to get config from the admission controller
    Key: YUNIKORN-178
    URL: https://issues.apache.org/jira/browse/YUNIKORN-178
    Project: Apache YuniKorn
    Issue Type: Bug
    Components: shim - kubernetes
    Reporter: Wilfred Spiegelenburg

The admission controller has no way to process configuration from anywhere. However, it does try to retrieve the scheduler name by calling {{conf.GetSchedulerConf().SchedulerName}} in {{updateSchedulerName}}. This will not work and should be replaced by an environment setting, like what was done in YUNIKORN-28 for other values.
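Reading the scheduler name from the environment, as YUNIKORN-178 proposes, might look like the sketch below. It is an illustration under assumptions, not the admission controller's code: the variable name `ADMISSION_CONTROLLER_SCHEDULER_NAME`, the `lookupFunc` type and the `schedulerName` helper are hypothetical. The fallback value `yunikorn` is the name quoted in YUNIKORN-177.

```go
package main

import (
	"fmt"
	"os"
)

const (
	// schedulerNameEnv is a hypothetical variable name chosen for this example.
	schedulerNameEnv = "ADMISSION_CONTROLLER_SCHEDULER_NAME"
	// defaultSchedulerName is the value quoted in YUNIKORN-177.
	defaultSchedulerName = "yunikorn"
)

// lookupFunc has the same signature as os.LookupEnv, so the real
// environment can be swapped for a stub in tests.
type lookupFunc func(key string) (string, bool)

// schedulerName returns the value of the environment setting, or the
// default when the variable is unset or empty.
func schedulerName(lookup lookupFunc) string {
	if v, ok := lookup(schedulerNameEnv); ok && v != "" {
		return v
	}
	return defaultSchedulerName
}

func main() {
	fmt.Println("scheduler name:", schedulerName(os.LookupEnv))
}
```

Passing the lookup function in, rather than calling `os.LookupEnv` directly, keeps the helper free of any dependency on the scheduler's configuration package and makes it easy to test.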
[jira] [Created] (YUNIKORN-179) Allow changing app id generation via option
Wilfred Spiegelenburg created YUNIKORN-179:
-------------------------------------------

    Summary: Allow changing app id generation via option
    Key: YUNIKORN-179
    URL: https://issues.apache.org/jira/browse/YUNIKORN-179
    Project: Apache YuniKorn
    Issue Type: Bug
    Components: shim - kubernetes
    Reporter: Wilfred Spiegelenburg

Currently, changing the way we generate a new app ID in the admission controller requires a code change. In YUNIKORN-173 we moved from a per-app ID to a namespace-based ID. We want to support both without a code change. This requires reinstating the app ID generation code that was removed and making it possible to switch between the two using an environment setting.

We should end up with both ways in the code. We should default to the namespace-generated ID.
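Switching between the two generation modes with an environment setting, as requested in YUNIKORN-179, could be sketched as follows. Everything named here is an assumption for illustration: the variable `ADMISSION_CONTROLLER_APP_ID_MODE`, the mode values, the `podMeta` type and the `autogen-` ID format are hypothetical, and the per-pod mode only stands in for the earlier per-app behaviour, whose exact format the issue does not describe.

```go
package main

import (
	"fmt"
	"os"
)

const (
	// appIDModeEnv and the mode values are hypothetical names for this example.
	appIDModeEnv  = "ADMISSION_CONTROLLER_APP_ID_MODE"
	modeNamespace = "namespace"
	modePerPod    = "pod"
)

// podMeta is a hypothetical, minimal view of the pod metadata needed here.
type podMeta struct {
	Namespace string
	UID       string
}

// generateAppID returns a default application ID for a pod that did not
// specify one. Any unrecognised or empty mode falls back to the
// namespace-based ID, which is the default the issue asks for.
func generateAppID(mode string, pod podMeta) string {
	switch mode {
	case modePerPod:
		// one application per pod: the UID makes the ID unique
		return "autogen-" + pod.Namespace + "-" + pod.UID
	default:
		// one application per namespace: every pod shares the same ID
		return "autogen-" + pod.Namespace
	}
}

func main() {
	mode := os.Getenv(appIDModeEnv) // empty when unset, so the default applies
	pod := podMeta{Namespace: "default", UID: "1234"}
	fmt.Println(generateAppID(mode, pod))
}
```

Keeping both branches in one function matches the requirement that both ways stay in the code, with the namespace-based ID as the fallback.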