[jira] [Created] (YUNIKORN-173) Generates one default application ID per namespace in the admission controller

2020-05-21 Thread Weiwei Yang (Jira)
Weiwei Yang created YUNIKORN-173:


 Summary: Generates one default application ID per namespace in the 
admission controller
 Key: YUNIKORN-173
 URL: https://issues.apache.org/jira/browse/YUNIKORN-173
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: shim - kubernetes
Reporter: Weiwei Yang
Assignee: Weiwei Yang


If an app doesn't explicitly specify an application ID, let's group such pods 
into one single app per namespace.
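
A minimal sketch of the idea, not the admission controller's actual code: the label key `applicationId`, the prefix `default-app-`, and the function name `applicationID` are all assumptions made for illustration.

```go
package main

import "fmt"

// labelApplicationID is an assumed label key; the key the shim really uses may differ.
const labelApplicationID = "applicationId"

// defaultAppIDPrefix is a hypothetical prefix for generated default IDs.
const defaultAppIDPrefix = "default-app-"

// applicationID returns the explicit ID from the pod labels when one is set,
// otherwise a single shared default ID derived from the namespace.
func applicationID(namespace string, labels map[string]string) string {
	if id, ok := labels[labelApplicationID]; ok && id != "" {
		return id
	}
	return defaultAppIDPrefix + namespace
}

func main() {
	// A pod with an explicit ID keeps it.
	fmt.Println(applicationID("dev", map[string]string{labelApplicationID: "spark-001"}))
	// Pods without an ID in the same namespace share one default app.
	fmt.Println(applicationID("dev", nil))
	fmt.Println(applicationID("dev", map[string]string{"app": "sleep"}))
}
```

Because the default depends only on the namespace, every unlabeled pod in a namespace maps to the same application.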



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org



[jira] [Updated] (YUNIKORN-173) Generates one default application ID per namespace in the admission controller

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-173:

Labels: pull-request-available  (was: )

> Generates one default application ID per namespace in the admission controller
> --
>
> Key: YUNIKORN-173
> URL: https://issues.apache.org/jira/browse/YUNIKORN-173
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: shim - kubernetes
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
>  Labels: pull-request-available
>
> If an app doesn't explicitly specify an application ID, let's group such pods 
> into one single app per namespace.






[jira] [Updated] (YUNIKORN-168) make test has become "chatty" due to verbose clean

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-168:

Labels: newbie pull-request-available  (was: newbie)

> make test has become "chatty" due to verbose clean
> --
>
> Key: YUNIKORN-168
> URL: https://issues.apache.org/jira/browse/YUNIKORN-168
> Project: Apache YuniKorn
>  Issue Type: Task
>  Components: build
>Reporter: Wilfred Spiegelenburg
>Assignee: Ting Yao,Huang
>Priority: Major
>  Labels: newbie, pull-request-available
>
> The changes for the test target in the makefile cause a large amount of log 
> output to be spewed by the clean target when it tries to clean the build caches.
> We need to clean that up and log only the results that we need, not the output 
> of the clean.






[jira] [Resolved] (YUNIKORN-172) Unit tests cannot run with golang 1.13+

2020-05-21 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-172.

Resolution: Duplicate

> Unit tests cannot run with golang 1.13+
> ---
>
> Key: YUNIKORN-172
> URL: https://issues.apache.org/jira/browse/YUNIKORN-172
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler, shim - kubernetes
>Reporter: Weiwei Yang
>Priority: Major
>
> If the Go version is 1.13+, the unit tests fail with errors related to 
> flags. We need to get this fixed. I don't think we have any limitations at 
> runtime that prevent supporting 1.13+ versions. This is just a unit test 
> issue about how we use CLI flags.






[jira] [Updated] (YUNIKORN-159) Remove helm charts from the k8shim repo

2020-05-21 Thread Kinga Marton (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kinga Marton updated YUNIKORN-159:
--
Summary: Remove helm charts from the k8shim repo  (was: remove helm charts 
from the k8shim repo)

> Remove helm charts from the k8shim repo
> ---
>
> Key: YUNIKORN-159
> URL: https://issues.apache.org/jira/browse/YUNIKORN-159
> Project: Apache YuniKorn
>  Issue Type: Task
>  Components: build
>Reporter: Wilfred Spiegelenburg
>Assignee: Kinga Marton
>Priority: Major
>
> After we move the helm deployment to the release repo we should remove the 
> files from the k8shim repo.
>  This should include updating the 
> [documentation|https://github.com/apache/incubator-yunikorn-core/blob/master/docs/user-guide.md#quick-start]
>  to point to the correct place to find the helm charts and explain the 
> workings






[jira] [Updated] (YUNIKORN-159) Remove helm charts from the k8shim repo

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-159:

Labels: pull-request-available  (was: )

> Remove helm charts from the k8shim repo
> ---
>
> Key: YUNIKORN-159
> URL: https://issues.apache.org/jira/browse/YUNIKORN-159
> Project: Apache YuniKorn
>  Issue Type: Task
>  Components: build
>Reporter: Wilfred Spiegelenburg
>Assignee: Kinga Marton
>Priority: Major
>  Labels: pull-request-available
>
> After we move the helm deployment to the release repo we should remove the 
> files from the k8shim repo.
>  This should include updating the 
> [documentation|https://github.com/apache/incubator-yunikorn-core/blob/master/docs/user-guide.md#quick-start]
>  to point to the correct place to find the helm charts and explain the 
> workings






[jira] [Updated] (YUNIKORN-140) Create helm chart repository

2020-05-21 Thread Kinga Marton (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kinga Marton updated YUNIKORN-140:
--
Summary: Create helm chart repository  (was: Create helm chart repository 
and publish it to helmhub)

> Create helm chart repository
> 
>
> Key: YUNIKORN-140
> URL: https://issues.apache.org/jira/browse/YUNIKORN-140
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: build
>Reporter: Kinga Marton
>Assignee: Kinga Marton
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9
>
>
> Create a helm chart repository and publish the charts to 
> [https://hub.helm.sh/|https://hub.helm.sh/]. This will make it easier for 
> users to try YuniKorn out.
> How to create the helm chart repository is described here: 
> [https://helm.sh/docs/topics/chart_repository/]






[jira] [Updated] (YUNIKORN-159) Remove helm charts from the k8shim repo and update documentation accordingly

2020-05-21 Thread Kinga Marton (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kinga Marton updated YUNIKORN-159:
--
Summary: Remove helm charts from the k8shim repo and update documentation 
accordingly  (was: Remove helm charts from the k8shim repo)

> Remove helm charts from the k8shim repo and update documentation accordingly
> 
>
> Key: YUNIKORN-159
> URL: https://issues.apache.org/jira/browse/YUNIKORN-159
> Project: Apache YuniKorn
>  Issue Type: Task
>  Components: build
>Reporter: Wilfred Spiegelenburg
>Assignee: Kinga Marton
>Priority: Major
>  Labels: pull-request-available
>
> After we move the helm deployment to the release repo we should remove the 
> files from the k8shim repo.
>  This should include updating the 
> [documentation|https://github.com/apache/incubator-yunikorn-core/blob/master/docs/user-guide.md#quick-start]
>  to point to the correct place to find the helm charts and explain the 
> workings






[jira] [Updated] (YUNIKORN-24) UT compatibility issue with GO 1.13.x

2020-05-21 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-24?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YUNIKORN-24:
--
Priority: Minor  (was: Major)

> UT compatibility issue with GO 1.13.x
> -
>
> Key: YUNIKORN-24
> URL: https://issues.apache.org/jira/browse/YUNIKORN-24
> Project: Apache YuniKorn
>  Issue Type: Test
>  Components: test - unit
>Reporter: Wilfred Spiegelenburg
>Priority: Minor
>
> Unit tests start to fail on flag handling after upgrading Go to 1.13.x.
> This seems to be caused by us relying on undefined behaviour and by fixes in 
> the Go language.
> Calling {{flag.Parse()}} in an init is considered incorrect behaviour.
> This is part of the 1.13 release notes: 
> [https://tip.golang.org/doc/go1.13#testing]
> We currently call {{flag.Parse()}} in the init of the {{SchedulerConf}}, which 
> we need to change.
>  * do not use {{init()}} functions: the cost of convenience is some 
> unexpected behaviour like this.
>  * remove these init functions and replace them with {{sync.Once}}
> {{init()}} functions could be problematic. However, using them is simpler than 
> having to check in each location that the package is initialised. Using 
> {{sync.Once}} could add overhead or wrapping which we do not need. I think we 
> need to look at a combination of the two: {{init()}} when we know it is 
> package-local initialisation only. If we have interaction with other packages 
> or optional initialisation, use {{sync.Once}}.
> This {{flag.Parse()}} is a case of interacting with multiple packages; I 
> don't think we will have many more of these cases.






[jira] [Updated] (YUNIKORN-24) UT compatibility issue with GO 1.13.x

2020-05-21 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-24?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YUNIKORN-24:
--
Target Version: 0.9

> UT compatibility issue with GO 1.13.x
> -
>
> Key: YUNIKORN-24
> URL: https://issues.apache.org/jira/browse/YUNIKORN-24
> Project: Apache YuniKorn
>  Issue Type: Test
>  Components: test - unit
>Reporter: Wilfred Spiegelenburg
>Priority: Minor
>






[jira] [Updated] (YUNIKORN-160) release build on a fresh machine is failing

2020-05-21 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YUNIKORN-160:
---

This is related to the travis setup for the web interface. We need a similar 
kind of setup to build the web repo when a new PR gets opened, and we do not 
want to install all of yarn etc. each time as that will become problematic.

Can we re-use the docker image for both cases?

> release build on a fresh machine is failing
> ---
>
> Key: YUNIKORN-160
> URL: https://issues.apache.org/jira/browse/YUNIKORN-160
> Project: Apache YuniKorn
>  Issue Type: Bug
>Reporter: Sunil G
>Priority: Major
>
> While running ./build-docker-images.sh on a fresh machine, the command fails:
> {code:java}
> Successfully tagged yunikorn/yunikorn-scheduler-admission-controller:latest
> /Users/sgovindan/Work/releases/yunikorn/0.8.0/apache-yunikorn-0.8.0-incubating-src
> yarn install && yarn build:prod
> /bin/sh: yarn: command not found
> make: *** [build-prod] Error 127 {code}
> Proposal:
>  # It's not good to install "yarn" locally and let the node version we have 
> on the machine manage the web build process. In YARNUI2, I have regularly 
> seen an incompatible package break the build.
>  # It's better to do the whole build process in docker, which will have sane 
> version compatibility across all supporting s/w versions.






[jira] [Commented] (YUNIKORN-159) Remove helm charts from the k8shim repo and update documentation accordingly

2020-05-21 Thread Kinga Marton (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113025#comment-17113025
 ] 

Kinga Marton commented on YUNIKORN-159:
---

I have removed the helm chart from the shim repo, updated the documentation and 
added the docker images to the web page. I haven't added the chart repository 
yet, because I think we should create a 0.8.0 release from the helm charts as 
well (+ upload it to helm hub), and then we can add a download link for it as 
well.

> Remove helm charts from the k8shim repo and update documentation accordingly
> 
>
> Key: YUNIKORN-159
> URL: https://issues.apache.org/jira/browse/YUNIKORN-159
> Project: Apache YuniKorn
>  Issue Type: Task
>  Components: build
>Reporter: Wilfred Spiegelenburg
>Assignee: Kinga Marton
>Priority: Major
>  Labels: pull-request-available
>
> After we move the helm deployment to the release repo we should remove the 
> files from the k8shim repo.
>  This should include updating the 
> [documentation|https://github.com/apache/incubator-yunikorn-core/blob/master/docs/user-guide.md#quick-start]
>  to point to the correct place to find the helm charts and explain the 
> workings






[jira] [Commented] (YUNIKORN-117) Create event cache for queue and application events

2020-05-21 Thread Adam Antal (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113139#comment-17113139
 ] 

Adam Antal commented on YUNIKORN-117:
-

Added {{EventMessage}} to the SI side in 
[this|https://github.com/apache/incubator-yunikorn-scheduler-interface/pull/17] 
pull request.

> Create event cache for queue and application events
> ---
>
> Key: YUNIKORN-117
> URL: https://issues.apache.org/jira/browse/YUNIKORN-117
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - cache, core - scheduler
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Critical
>  Labels: pull-request-available
>
> Create a simple preliminary implementation of the event cache of YUNIKORN-42.
> We have the following limited scope for this task:
> - implement it as a separate process from the scheduler (similar to 
> {{PartitionManager}})
> - only deal with queues and applications (the pods and nodes can be added 
> later)
> - only store the apps' last visited time from the scheduler
> - clean up those objects that haven't been visited in the last 24h
> Other cache implementations can also be considered.
> As a starting point, channels are a safe choice for async communication 
> with the scheduler without expecting a significant performance loss.






[jira] [Updated] (YUNIKORN-171) Add example and document about how to run kubeflow with yunikorn

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-171:

Labels: pull-request-available  (was: )

> Add example and document about how to run kubeflow with yunikorn
> 
>
> Key: YUNIKORN-171
> URL: https://issues.apache.org/jira/browse/YUNIKORN-171
> Project: Apache YuniKorn
>  Issue Type: Test
>  Components: documentation
>Reporter: Weiwei Yang
>Assignee: Ting Yao,Huang
>Priority: Major
>  Labels: pull-request-available
>
> Provide examples and documentation demonstrating how to run kubeflow with yunikorn.






[jira] [Commented] (YUNIKORN-155) data race in unit test: TestSchedulerRecoveryWhenPlacementRulesApplied

2020-05-21 Thread Kinga Marton (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113179#comment-17113179
 ] 

Kinga Marton commented on YUNIKORN-155:
---

I wasn't able to reproduce this issue. I tried to run it multiple times and 
added some sleeps to make the goroutines slower, but the test passed each time 
and no race condition was reported.

From debugging the test case and the attached stack trace I can see the 
following things:
 * one goroutine (Goroutine 137) starts when the event handlers are started in 
scheduler.go: 
[https://github.com/apache/incubator-yunikorn-core/blob/master/pkg/scheduler/scheduler.go#L67]
 * the other one (Goroutine 135) starts when the event handlers are started in 
cluster_info.go: 
[https://github.com/apache/incubator-yunikorn-core/blob/master/pkg/cache/cluster_info.go#L69]

There is one write and one read operation performed on the same ApplicationInfo 
object:
 * the write operation happens while placing an application and setting some 
fields on the ApplicationInfo. This is protected with a lock (Goroutine 137): 
[https://github.com/apache/incubator-yunikorn-core/blob/master/pkg/scheduler/placement/placement.go#L200]
 * the read operation happens during logging in case of a 
SchedulerApplicationsUpdateEvent: 
[https://github.com/apache/incubator-yunikorn-core/blob/master/pkg/scheduler/scheduler.go#L176]

> data race in unit test: TestSchedulerRecoveryWhenPlacementRulesApplied
> --
>
> Key: YUNIKORN-155
> URL: https://issues.apache.org/jira/browse/YUNIKORN-155
> Project: Apache YuniKorn
>  Issue Type: Test
>  Components: test - unit
>Reporter: Wilfred Spiegelenburg
>Assignee: Kinga Marton
>Priority: Major
> Attachments: data_race.txt
>
>
> Testing shows a new data race while logging the queue name for an application 
> that gets added.
> Details in the attached logs






[jira] [Comment Edited] (YUNIKORN-155) data race in unit test: TestSchedulerRecoveryWhenPlacementRulesApplied

2020-05-21 Thread Kinga Marton (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113179#comment-17113179
 ] 

Kinga Marton edited comment on YUNIKORN-155 at 5/21/20, 1:27 PM:
-

I wasn't able to reproduce this issue. I tried to run it multiple times and 
added some sleeps to make the goroutines slower, but the test passed each time 
and no race condition was reported.

From debugging the test case and the attached stack trace I can see the 
following things:
 * one goroutine (Goroutine 137) starts when the event handlers are started in 
scheduler.go: 
[https://github.com/apache/incubator-yunikorn-core/blob/master/pkg/scheduler/scheduler.go#L67]
 * the other one (Goroutine 135) starts when the event handlers are started in 
cluster_info.go: 
[https://github.com/apache/incubator-yunikorn-core/blob/master/pkg/cache/cluster_info.go#L69]

There is one write and one read operation performed on the same ApplicationInfo 
object:
 * the write operation happens while placing an application and setting some 
fields on the ApplicationInfo. This is protected with a lock (Goroutine 137): 
[https://github.com/apache/incubator-yunikorn-core/blob/master/pkg/scheduler/placement/placement.go#L200]
 * the read operation happens during logging in case of a 
SchedulerApplicationsUpdateEvent: 
[https://github.com/apache/incubator-yunikorn-core/blob/master/pkg/scheduler/scheduler.go#L176]


> data race in unit test: TestSchedulerRecoveryWhenPlacementRulesApplied
> --
>
> Key: YUNIKORN-155
> URL: https://issues.apache.org/jira/browse/YUNIKORN-155
> Project: Apache YuniKorn
>  Issue Type: Test
>  Components: test - unit
>Reporter: Wilfred Spiegelenburg
>Assignee: Kinga Marton
>Priority: Major
> Attachments: data_race.txt
>
>
> Testing shows a new data race while logging the queue name for an application 
> that gets added.
> Details in the attached logs






[jira] [Updated] (YUNIKORN-24) UT compatibility issue with GO 1.13.x

2020-05-21 Thread Weiwei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-24?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YUNIKORN-24:

Priority: Major  (was: Minor)

> UT compatibility issue with GO 1.13.x
> -
>
> Key: YUNIKORN-24
> URL: https://issues.apache.org/jira/browse/YUNIKORN-24
> Project: Apache YuniKorn
>  Issue Type: Test
>  Components: test - unit
>Reporter: Wilfred Spiegelenburg
>Priority: Major
>






[jira] [Created] (YUNIKORN-174) Fix SI constant change from YUNIKORN-135

2020-05-21 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-174:
--

 Summary: Fix SI constant change from YUNIKORN-135
 Key: YUNIKORN-174
 URL: https://issues.apache.org/jira/browse/YUNIKORN-174
 Project: Apache YuniKorn
  Issue Type: Bug
  Components: scheduler-interface
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg


The build of the SI does not clean up correctly and leaves a file behind even 
if there are no changes (typo in the Makefile).

Multiple k8shim-only constants are part of the spec and should not be in the 
SI. They are not shared between the k8shim and other shims or the core:
 * SparkLabelAppID
 * SparkLabelRole
 * SparkLabelRoleDriver
 * LabelApp
 * LabelApplicationID
 * LabelQueueName

We also have no default resource types in the product. The core does not use 
them in the production code. They are only used in tests, and we should remove 
them from the tests as well to highlight the fact that the core is resource 
type agnostic:
 * memory
 * vcore






[jira] [Created] (YUNIKORN-175) remove memory and vcore references from resources in tests for core

2020-05-21 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-175:
--

 Summary: remove memory and vcore references from resources in 
tests for core 
 Key: YUNIKORN-175
 URL: https://issues.apache.org/jira/browse/YUNIKORN-175
 Project: Apache YuniKorn
  Issue Type: Bug
  Components: core - common, test - unit
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg


The core is resource type agnostic.

Many of the core tests, however, reference _memory_ and _vcore_ as if they were 
pre-defined types. There is no predefined type and we should not imply that 
there is a default type.
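
As an illustration of what a type-agnostic test could look like, the sketch below uses neutral resource names instead of _memory_ and _vcore_. The `Resource` type and `Add` function are simplified stand-ins written for this example, not the core's actual resource code.

```go
package main

import "fmt"

// Resource maps an arbitrary resource type name to a quantity.
// It is a simplified stand-in, not the core's actual type.
type Resource map[string]int64

// Add returns the per-type sum of two resources without assuming
// that any particular type name exists.
func Add(a, b Resource) Resource {
	out := Resource{}
	for k, v := range a {
		out[k] += v
	}
	for k, v := range b {
		out[k] += v
	}
	return out
}

func main() {
	// Neutral names make it obvious that no type is special.
	left := Resource{"first": 5, "second": 1}
	right := Resource{"first": 10, "third": 2}
	fmt.Println(Add(left, right))
}
```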






[jira] [Created] (YUNIKORN-176) schedulerCache might become inconsistent sometimes depending on the ordering of the events

2020-05-21 Thread Weiwei Yang (Jira)
Weiwei Yang created YUNIKORN-176:


 Summary: schedulerCache might become inconsistent sometimes 
depending on the ordering of the events
 Key: YUNIKORN-176
 URL: https://issues.apache.org/jira/browse/YUNIKORN-176
 Project: Apache YuniKorn
  Issue Type: Bug
  Components: shim - kubernetes
Reporter: Weiwei Yang
Assignee: Weiwei Yang


Sometimes we found that some nodes were stuck at pending when working with the 
auto-scaler, because some daemon set pods were waiting to be scheduled.

The root cause is: 
 # auto-scaler scales up a node
 # the daemon set controller creates a pod, e.g. for fluentd (it sets the 
pod.spec.nodeName="newly-added-host")
 # YK is informed by the pod informer: add pod
 # add pod to cache (schedulerCache); since the {{pod.spec.nodeName}} is not 
nil, it adds a {{new nodeInfo}}
 # the node informer is informed: add node
 # add node to scheduler cache; the node already exists, so calling SetNode is 
skipped
 # scheduler tries to allocate the pod to the node
 # predicates fail: NodeUnknownCondition (node x doesn't exist in 
schedulerCache)
 # the allocation always fails and the pod stays pending
 # since the daemon set pod could not be started, node status will be NotReady
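
The ordering problem in the steps above can be reproduced with a small model. This is a sketch under stated assumptions: `schedulerCache`, `nodeInfo` and the method names here are simplified stand-ins, and `addNode` shows one possible way to avoid the problem, which is not necessarily what the actual fix does.

```go
package main

import "fmt"

// nodeInfo is a simplified stand-in for a scheduler cache entry.
type nodeInfo struct {
	hasNode bool     // true once the node object itself has been set
	pods    []string // names of pods assigned to the node
}

// schedulerCache is a simplified model keyed by node name.
type schedulerCache struct {
	nodes map[string]*nodeInfo
}

func newSchedulerCache() *schedulerCache {
	return &schedulerCache{nodes: make(map[string]*nodeInfo)}
}

// addPod models steps 3-4: a pod with nodeName set creates a placeholder entry.
func (c *schedulerCache) addPod(podName, nodeName string) {
	info, ok := c.nodes[nodeName]
	if !ok {
		info = &nodeInfo{}
		c.nodes[nodeName] = info
	}
	info.pods = append(info.pods, podName)
}

// addNodeSkipExisting models steps 5-6: an existing entry is left untouched,
// so a placeholder created by addPod never receives its node.
func (c *schedulerCache) addNodeSkipExisting(nodeName string) {
	if _, ok := c.nodes[nodeName]; ok {
		return
	}
	c.nodes[nodeName] = &nodeInfo{hasNode: true}
}

// addNode always sets the node, whether or not a placeholder already exists.
func (c *schedulerCache) addNode(nodeName string) {
	info, ok := c.nodes[nodeName]
	if !ok {
		info = &nodeInfo{}
		c.nodes[nodeName] = info
	}
	info.hasNode = true
}

// nodeKnown stands in for the predicate check in step 8.
func (c *schedulerCache) nodeKnown(nodeName string) bool {
	info, ok := c.nodes[nodeName]
	return ok && info.hasNode
}

func main() {
	const host = "newly-added-host"

	skipping := newSchedulerCache()
	skipping.addPod("fluentd-abc", host) // pod event arrives first
	skipping.addNodeSkipExisting(host)   // node event arrives second
	fmt.Println("skip existing, node known:", skipping.nodeKnown(host))

	setting := newSchedulerCache()
	setting.addPod("fluentd-abc", host)
	setting.addNode(host)
	fmt.Println("always set, node known:", setting.nodeKnown(host))
}
```

The point of the model is that the outcome should not depend on whether the pod event or the node event arrives first.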






[jira] [Updated] (YUNIKORN-176) schedulerCache might become inconsistent sometimes depending on the ordering of the events

2020-05-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YUNIKORN-176:

Labels: pull-request-available  (was: )

> schedulerCache might become inconsistent sometimes depending on the ordering 
> of the events
> --
>
> Key: YUNIKORN-176
> URL: https://issues.apache.org/jira/browse/YUNIKORN-176
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: shim - kubernetes
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Resolved] (YUNIKORN-148) Define API in scheduler interface to queue administration

2020-05-21 Thread Weiwei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang resolved YUNIKORN-148.
--
Resolution: Won't Fix

> Define API in scheduler interface to queue administration
> -
>
> Key: YUNIKORN-148
> URL: https://issues.apache.org/jira/browse/YUNIKORN-148
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: scheduler-interface
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
>  Labels: pull-request-available
>
> Expose queue mgmt API from scheduler interface.






[jira] [Commented] (YUNIKORN-149) Watch K8s namespace and create unmanaged queues accordingly

2020-05-21 Thread Weiwei Yang (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113753#comment-17113753
 ] 

Weiwei Yang commented on YUNIKORN-149:
--

Same reason as YUNIKORN-148.

> Watch K8s namespace and create unmanaged queues accordingly
> ---
>
> Key: YUNIKORN-149
> URL: https://issues.apache.org/jira/browse/YUNIKORN-149
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: shim - kubernetes
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
>
> Watch K8s namespace object and create queues accordingly.






[jira] [Commented] (YUNIKORN-148) Define API in scheduler interface to queue administration

2020-05-21 Thread Weiwei Yang (Jira)


[ 
https://issues.apache.org/jira/browse/YUNIKORN-148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113752#comment-17113752
 ] 

Weiwei Yang commented on YUNIKORN-148:
--

The original PR is 
[https://github.com/apache/incubator-yunikorn-scheduler-interface/pull/16].

[~leftnoteasy], [~wilfreds], and I had a long discussion about the pros and cons 
of using such admin APIs. The conclusion is to abandon this for now. 

The pros of this approach are that we get a more deterministic API for 
namespace mapping and more interaction with the clients. If we add a queue CRD 
in the future, we can fail fast if the quota is set incorrectly and 
explicitly let users know what's going on. The workflow looks like:
 # the user creates a queue CRD
 # the shim reacts to the add operation and calls the addQueue API to create the queue
 # if that fails, the shim fails the operation and the user gets an error from step 1
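
A minimal sketch of the workflow listed above, under stated assumptions: QueueSpec, QueueAdmin, fakeCore and onQueueCRDAdded are hypothetical names invented for illustration, and the quota check is a made-up validation rule. The proposal was abandoned, so no such API is implied to exist in the scheduler interface.

```go
package main

import (
	"errors"
	"fmt"
)

// QueueSpec is a hypothetical stand-in for the contents of a queue CRD.
type QueueSpec struct {
	Name     string
	MaxQuota int // hypothetical quota field, in arbitrary units
}

// QueueAdmin is a hypothetical shape for the admin API under discussion.
type QueueAdmin interface {
	AddQueue(spec QueueSpec) error
}

// fakeCore is a toy core that rejects invalid quotas and duplicate queues.
type fakeCore struct {
	queues map[string]QueueSpec
}

func (c *fakeCore) AddQueue(spec QueueSpec) error {
	if spec.MaxQuota <= 0 {
		return errors.New("quota must be positive")
	}
	if _, ok := c.queues[spec.Name]; ok {
		return fmt.Errorf("queue %q already exists", spec.Name)
	}
	c.queues[spec.Name] = spec
	return nil
}

// onQueueCRDAdded models steps 2 and 3: the shim forwards the add to the core
// and returns any error so that the user's create request fails immediately.
func onQueueCRDAdded(core QueueAdmin, spec QueueSpec) error {
	if err := core.AddQueue(spec); err != nil {
		return fmt.Errorf("queue %q rejected: %w", spec.Name, err)
	}
	return nil
}

func main() {
	core := &fakeCore{queues: map[string]QueueSpec{}}
	fmt.Println(onQueueCRDAdded(core, QueueSpec{Name: "root.dev", MaxQuota: 10}))
	fmt.Println(onQueueCRDAdded(core, QueueSpec{Name: "root.bad", MaxQuota: 0}))
}
```

The point of the sketch is the fail-fast path: the error from the core travels back through the shim to the operation that created the CRD.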

The cons are that we already support dynamic queues with placement rules, which 
means the yunikorn-core is in charge of queue creation/deletion. If we 
add such APIs, the shim becomes responsible for this as well. 
Conceptually, it doesn't look like a placement rule (which only exists in the 
core). 

> Define API in scheduler interface to queue administration
> -
>
> Key: YUNIKORN-148
> URL: https://issues.apache.org/jira/browse/YUNIKORN-148
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: scheduler-interface
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
>  Labels: pull-request-available
>
> Expose queue mgmt API from scheduler interface.






[jira] [Resolved] (YUNIKORN-149) Watch K8s namespace and create unmanaged queues accordingly

2020-05-21 Thread Weiwei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang resolved YUNIKORN-149.
--
Resolution: Won't Fix

> Watch K8s namespace and create unmanaged queues accordingly
> ---
>
> Key: YUNIKORN-149
> URL: https://issues.apache.org/jira/browse/YUNIKORN-149
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: shim - kubernetes
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
>
> Watch K8s namespace object and create queues accordingly.






[jira] [Resolved] (YUNIKORN-171) Add example and document about how to run kubeflow with yunikorn

2020-05-21 Thread Weiwei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang resolved YUNIKORN-171.
--
Fix Version/s: 0.9
   Resolution: Fixed

PR Merged, thanks!

> Add example and document about how to run kubeflow with yunikorn
> 
>
> Key: YUNIKORN-171
> URL: https://issues.apache.org/jira/browse/YUNIKORN-171
> Project: Apache YuniKorn
>  Issue Type: Test
>  Components: documentation
>Reporter: Weiwei Yang
>Assignee: Ting Yao,Huang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9
>
>
> Provide some examples and docs to demonstrate how to run kubeflow with yunikorn.






[jira] [Created] (YUNIKORN-177) SchedulerName doesn't need to be configurable

2020-05-21 Thread Weiwei Yang (Jira)
Weiwei Yang created YUNIKORN-177:


 Summary: SchedulerName doesn't need to be configurable
 Key: YUNIKORN-177
 URL: https://issues.apache.org/jira/browse/YUNIKORN-177
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: shim - kubernetes
Reporter: Weiwei Yang


Currently, we allow users to override the schedulerName, but this is not 
necessary; we should stick to {{schedulerName=yunikorn}}. Let's remove this 
from the CLI options.






[jira] [Resolved] (YUNIKORN-81) Fix the kubernetes dashboard link in env-setup readme

2020-05-21 Thread Weiwei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-81?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang resolved YUNIKORN-81.
-
Resolution: Later

> Fix the kubernetes dashboard link in env-setup readme
> -
>
> Key: YUNIKORN-81
> URL: https://issues.apache.org/jira/browse/YUNIKORN-81
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Ayub Pathan
>Assignee: Ayub Pathan
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Documentation page which needs correction: 
> https://github.com/apache/incubator-yunikorn-core/blob/master/docs/setup/env-setup.md
> Under "deploy and access dashboard", the clickable link points to the URL 
> below:
> http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/#!/login
> The correct one should be 
> http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/#/login






[jira] [Created] (YUNIKORN-178) Remove call to get config from the admission controller

2020-05-21 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-178:
--

 Summary: Remove call to get config from the admission controller
 Key: YUNIKORN-178
 URL: https://issues.apache.org/jira/browse/YUNIKORN-178
 Project: Apache YuniKorn
  Issue Type: Bug
  Components: shim - kubernetes
Reporter: Wilfred Spiegelenburg


The admission controller has no way to process configuration 
from anywhere.

However, it does try to retrieve the scheduler name by calling 
{{conf.GetSchedulerConf().SchedulerName}} in {{updateSchedulerName}}. This 
will not work and should be replaced by an environment setting, like what was 
done in YUNIKORN-28 for other values.
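
A minimal sketch of reading the scheduler name from an environment setting instead of the scheduler configuration. The variable name ADMISSION_CONTROLLER_SCHEDULER_NAME and the helper schedulerName are assumptions made for this example, not names taken from the code base; the default "yunikorn" follows YUNIKORN-177.

```go
package main

import (
	"fmt"
	"os"
)

const (
	// schedulerNameEnv is a hypothetical variable name chosen for this sketch.
	schedulerNameEnv = "ADMISSION_CONTROLLER_SCHEDULER_NAME"
	// defaultSchedulerName is the value YUNIKORN-177 suggests sticking to.
	defaultSchedulerName = "yunikorn"
)

// schedulerName resolves the name through the supplied lookup function and
// falls back to the default when the variable is unset or empty.
func schedulerName(lookup func(string) (string, bool)) string {
	if v, ok := lookup(schedulerNameEnv); ok && v != "" {
		return v
	}
	return defaultSchedulerName
}

func main() {
	// os.LookupEnv reports whether the variable is present in the environment.
	fmt.Println("scheduler name:", schedulerName(os.LookupEnv))
}
```

Passing the lookup function in keeps the helper free of any dependency on the scheduler configuration, which the admission controller cannot load.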






[jira] [Created] (YUNIKORN-179) Allow changing app id generation via option

2020-05-21 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-179:
--

 Summary: Allow changing app id generation via option
 Key: YUNIKORN-179
 URL: https://issues.apache.org/jira/browse/YUNIKORN-179
 Project: Apache YuniKorn
  Issue Type: Bug
  Components: shim - kubernetes
Reporter: Wilfred Spiegelenburg


Currently, changing the way we generate a new app ID in the admission 
controller requires a code change. In YUNIKORN-173 we moved from a per-app ID 
to a namespace-based ID. We want to support both without a code change.

This requires reinstating the app ID generation code that was removed and 
making it possible to switch between the two using an environment setting. We 
should end up with both ways in the code.

We should default to the namespace-generated ID.
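
One way the switch could look, as a sketch only: the environment variable APP_ID_MODE, the mode value "per-app", the "autogen" prefix, the ID formats and the function generateAppID are all hypothetical and chosen for illustration. The only behaviour taken from the description is that both strategies coexist and the namespace-based one is the default.

```go
package main

import (
	"fmt"
	"os"
)

const (
	// appIDModeEnv and modePerApp are hypothetical names for this sketch.
	appIDModeEnv = "APP_ID_MODE"
	modePerApp   = "per-app"
	// idPrefix is a made-up prefix for generated IDs.
	idPrefix = "autogen"
)

// generateAppID returns a unique ID per pod in per-app mode and one shared ID
// per namespace otherwise, so the namespace-based ID is the default.
func generateAppID(mode, namespace, podUID string) string {
	if mode == modePerApp {
		return fmt.Sprintf("%s-%s-%s", idPrefix, namespace, podUID)
	}
	return fmt.Sprintf("%s-%s", idPrefix, namespace)
}

func main() {
	// An unset or unrecognised value falls through to the namespace-based ID.
	mode := os.Getenv(appIDModeEnv)
	fmt.Println(generateAppID(mode, "dev", "pod-uid-1"))
	fmt.Println(generateAppID(mode, "dev", "pod-uid-2"))
}
```

With the variable unset, both pods in the namespace receive the same ID; setting it to the per-app value gives each pod its own ID without a code change.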


