[jira] [Resolved] (YUNIKORN-2665) Gang app originator pod changes after restart

2024-06-12 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-2665.
-
Fix Version/s: 1.6.0, 1.5.2
   Resolution: Fixed

Changes have been committed and backported into the 1.5 branch. Closing.

> Gang app originator pod changes after restart
> -
>
> Key: YUNIKORN-2665
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2665
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: shim - kubernetes
>Affects Versions: 1.3.0, 1.4.0, 1.5.0, 1.5.1
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.6.0, 1.5.2
>
>
> A gang app chooses the first pod (the one that created the app) as the 
> originator pod, which later becomes the real driver pod. While processing a 
> gang app, specifically after placeholder creation and during replacement, a 
> restart can lead to the incorrect behaviour described below.
> During restore, there is no guarantee on the ordering of pods coming from the 
> K8s lister, especially when all the pods were created with the same 
> timestamp. K8s uses second-granularity timestamps, which means all pods 
> created within the same second have the same timestamp. In this situation YK 
> designates whichever pod comes first from the lister as the originator pod. 
> So any placeholder could become the originator pod and the actual originator 
> pod is lost. This change could cause ripple effects leading to unexpected 
> behaviour and needs to be fixed.
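
The ordering problem described in the issue can be illustrated with a small Go sketch. This is not the committed fix. The pod type, its fields and pickOriginator are hypothetical, and the selection rule (skip placeholders, break timestamp ties by name) is only one possible way to make the choice independent of lister order.

```go
package main

import (
	"fmt"
	"time"
)

// pod is a hypothetical, simplified stand-in for a Kubernetes pod.
type pod struct {
	name        string
	created     time.Time // second granularity, as in the K8s metadata
	placeholder bool      // true for gang scheduling placeholder pods
}

// pickOriginator returns the oldest non-placeholder pod, or nil if none exists.
// Ties on the timestamp are broken by name, so the result does not depend on
// the order in which the pods are passed in.
func pickOriginator(pods []pod) *pod {
	var best *pod
	for i := range pods {
		p := &pods[i]
		if p.placeholder {
			continue
		}
		if best == nil || p.created.Before(best.created) ||
			(p.created.Equal(best.created) && p.name < best.name) {
			best = p
		}
	}
	return best
}

// samplePods returns three pods created within the same second, with a
// placeholder listed first, as could happen after a restart.
func samplePods() []pod {
	ts := time.Date(2024, 6, 1, 10, 0, 0, 0, time.UTC)
	return []pod{
		{name: "tg-placeholder-1", created: ts, placeholder: true},
		{name: "driver", created: ts},
		{name: "tg-placeholder-2", created: ts, placeholder: true},
	}
}

func main() {
	fmt.Println(pickOriginator(samplePods()).name)
}
```

With a rule like this the placeholder that happens to come first from the lister is never selected, whatever the list order.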



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: dev-h...@yunikorn.apache.org



[jira] [Created] (YUNIKORN-2672) Upgrade to K8s 1.29.6

2024-06-12 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2672:
---

 Summary: Upgrade to K8s 1.29.6
 Key: YUNIKORN-2672
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2672
 Project: Apache YuniKorn
  Issue Type: Task
  Components: shim - kubernetes
Reporter: Wilfred Spiegelenburg


A major performance regression was fixed in K8s that on analysis mainly impacts 
the plugin implementation. The regression is part of the release 1.29.4 we 
currently build against.

See [https://github.com/kubernetes/kubernetes/pull/125197] for details






[jira] [Created] (YUNIKORN-2655) Cleanup REST API documentation

2024-05-31 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2655:
---

 Summary: Cleanup REST API documentation
 Key: YUNIKORN-2655
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2655
 Project: Apache YuniKorn
  Issue Type: Task
  Components: documentation
Reporter: Wilfred Spiegelenburg


The REST API documentation is not up to date with the current behaviour as it 
does not show any 400 or 404 errors returned by a number of API calls.

The error response only shows a 500 code with the same message for each call.

We should move to a simple list for each call showing the applicable errors 
like this:
{code:java}
### Error responses

**Code** : `400 Bad Request` (URL query is invalid, missing partition name)

**Code** : `404 Not Found` (Partition not found)

**Code** : `500 Internal Server Error` {code}
Remove the error examples as they do not add any required detail.
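
As an illustration of the three error classes listed above, here is a minimal Go sketch of a handler that returns them. The handler, the "partition" query parameter and the knownPartitions map are hypothetical and do not reflect the actual YuniKorn REST implementation.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// knownPartitions is illustrative data, not a real YuniKorn structure.
var knownPartitions = map[string]bool{"default": true}

// getPartition is a hypothetical handler showing the documented error codes.
func getPartition(w http.ResponseWriter, r *http.Request) {
	name := r.URL.Query().Get("partition")
	if name == "" {
		// 400: the request itself is invalid
		http.Error(w, "missing partition name", http.StatusBadRequest)
		return
	}
	if !knownPartitions[name] {
		// 404: the request is valid but the object does not exist
		http.Error(w, "partition not found", http.StatusNotFound)
		return
	}
	w.WriteHeader(http.StatusOK)
}

// statusFor runs the handler against an in-memory request and returns the
// HTTP status code that was written.
func statusFor(target string) int {
	rec := httptest.NewRecorder()
	getPartition(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.Code
}

func main() {
	fmt.Println(statusFor("/partition"))
	fmt.Println(statusFor("/partition?partition=unknown"))
	fmt.Println(statusFor("/partition?partition=default"))
}
```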






[jira] [Created] (YUNIKORN-2654) Remove unused code in k8shim context

2024-05-30 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2654:
---

 Summary: Remove unused code in k8shim context
 Key: YUNIKORN-2654
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2654
 Project: Apache YuniKorn
  Issue Type: Task
  Components: shim - kubernetes
Reporter: Wilfred Spiegelenburg


The NotifyApplicationComplete and NotifyApplicationFail functions are not 
called by anything and are unused code.

The K8shim does not trigger the application completion or failure. This is 
triggered by the core when the application no longer has any activity 
registered.






[jira] [Created] (YUNIKORN-2653) Gang scheduling K8s event formatting compliance

2024-05-30 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2653:
---

 Summary: Gang scheduling K8s event formatting compliance
 Key: YUNIKORN-2653
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2653
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: shim - kubernetes
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg


The K8s events provide definitions and rules around the content of the fields 
within the event. Adjust the content of gang scheduling related events to 
comply with the rules.
Focussed on the reason and action fields only.
 * 'reason' is the reason this event is generated. 'reason' should be short 
and unique; it should be in UpperCamelCase format (starting with a capital 
letter). 
 * 'action' explains what happened with the regarding object, i.e. what action 
the ReportingController took in the object's name; it should be in 
UpperCamelCase format (starting with a capital letter). 

No spaces or long text.
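
A check for the format rule above could look like the following Go sketch. The function name and the regular expression are assumptions made for illustration. The expression only approximates UpperCamelCase: it accepts any string that starts with a capital letter and contains only letters and digits.

```go
package main

import (
	"fmt"
	"regexp"
)

// upperCamel approximates UpperCamelCase: a leading capital letter followed by
// letters and digits only, so no spaces or punctuation.
var upperCamel = regexp.MustCompile(`^[A-Z][A-Za-z0-9]*$`)

// isUpperCamelCase reports whether s could be used as an event reason or action.
func isUpperCamelCase(s string) bool {
	return upperCamel.MatchString(s)
}

func main() {
	for _, reason := range []string{"GangScheduling", "gangScheduling", "Gang scheduling timed out"} {
		fmt.Printf("%q -> %v\n", reason, isUpperCamelCase(reason))
	}
}
```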






[jira] [Created] (YUNIKORN-2648) Add deadlock detection config to the configmap

2024-05-29 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2648:
---

 Summary: Add deadlock detection config to the configmap
 Key: YUNIKORN-2648
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2648
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: core - common
Reporter: Wilfred Spiegelenburg


The current deadlock detection is configured using environment variables. That 
requires a change of the image and a restart of the scheduler to take effect 
and is not easy to maintain.

We should be using yunikorn-defaults config map for the settings. We want a 
default set, turned off, for production use cases. However making the configs 
loadable from the config map makes turning it on easier.

Update the configmap and restart the scheduler to turn the detection on or off.






[jira] [Created] (YUNIKORN-2647) Flaky test TestUpdateNodeCapacity

2024-05-29 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2647:
---

 Summary: Flaky test TestUpdateNodeCapacity
 Key: YUNIKORN-2647
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2647
 Project: Apache YuniKorn
  Issue Type: Bug
  Components: test - unit
Reporter: Wilfred Spiegelenburg


As we saw in YUNIKORN-2573, the single node update test might fail:
{code:java}
--- FAIL: TestUpdateNodeCapacity (0.03s)
    operation_test.go:446: Expected partition resource map[memory:1 
vcore:2], doesn't match with actual partition resource map[memory:1 
vcore:2]{code}
We calculate the delta resources when updating the node capacity. With that 
delta we update the resources in the partition.

The test fails with the following order of calls, the same as for multiple 
nodes:

node.SetCapacity() -> waitForAvailableNodeResource() ->  
partitionInfo.GetTotalPartitionResource()  -> 
partition.updatePartitionResource()






[jira] [Created] (YUNIKORN-2638) Simplify finalizeNodes and finalizePods

2024-05-21 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2638:
---

 Summary: Simplify finalizeNodes and finalizePods
 Key: YUNIKORN-2638
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2638
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: shim - kubernetes
Reporter: Wilfred Spiegelenburg


In finalizeNodes and finalizePods a map is created to store the newly retrieved 
pods and nodes. The map is only used as a reference and the pod and node 
objects themselves are not used.

Instead of storing the objects, the maps could store a boolean value. This 
also simplifies the later check for the existence of the node or pod to just a 
single map lookup. 

We should also set the initial size of the map to the length of the retrieved 
node or pod list, to prevent any re-allocation while filling the map.
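
A minimal Go sketch of the proposed change follows. The node type and missingNodes are hypothetical stand-ins and not the actual finalizeNodes code. The map is sized with the list length as a capacity hint and stores booleans only.

```go
package main

import "fmt"

// node is a hypothetical stand-in for the Kubernetes node object.
type node struct{ name string }

// missingNodes returns the cached node names that no longer appear in the
// freshly listed nodes.
func missingNodes(listed []node, cached []string) []string {
	// Size the map up front so it should not have to grow while being filled.
	seen := make(map[string]bool, len(listed))
	for _, n := range listed {
		seen[n.name] = true
	}
	var missing []string
	for _, name := range cached {
		// A missing key yields false, so one lookup answers the question.
		if !seen[name] {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	listed := []node{{name: "node-1"}, {name: "node-2"}}
	fmt.Println(missingNodes(listed, []string{"node-1", "node-3"}))
}
```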






[jira] [Created] (YUNIKORN-2637) finalizePods should ignore pods like registerPods does

2024-05-20 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2637:
---

 Summary: finalizePods should ignore pods like registerPods does
 Key: YUNIKORN-2637
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2637
 Project: Apache YuniKorn
  Issue Type: Bug
  Components: shim - kubernetes
Reporter: Wilfred Spiegelenburg


The initialisation code is a two step process for pods: first list all pods and 
add them to the system in registerPods(). This returns a list of pods processed.

The second step happens after event handlers are turned on and nodes have been 
cleaned up etc. During the second step pods from the first step are checked and 
removed. However pods that were already in a terminated state in step 1 get 
removed again. Although the step should be idempotent this is unneeded. When 
iterating over the existing pods any pod in a terminal state should be skipped.
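
The skip described above could look like the following Go sketch. The pod type, the phase constants and podsToRemove are simplified, hypothetical stand-ins for the shim code. Only the phase names "Succeeded" and "Failed" match the Kubernetes terminal pod phases.

```go
package main

import "fmt"

type podPhase string

const (
	phaseRunning   podPhase = "Running"
	phaseSucceeded podPhase = "Succeeded"
	phaseFailed    podPhase = "Failed"
)

// pod is a hypothetical, simplified pod.
type pod struct {
	name  string
	phase podPhase
}

// isTerminated reports whether the pod is in a terminal phase.
func isTerminated(p pod) bool {
	return p.phase == phaseSucceeded || p.phase == phaseFailed
}

// podsToRemove returns the pods seen in step one that are gone in step two.
// Pods that were already terminated in step one are skipped.
func podsToRemove(existing []pod, current map[string]bool) []string {
	var remove []string
	for _, p := range existing {
		if isTerminated(p) {
			continue // already handled in step one
		}
		if !current[p.name] {
			remove = append(remove, p.name)
		}
	}
	return remove
}

func main() {
	existing := []pod{
		{name: "done", phase: phaseSucceeded},
		{name: "gone", phase: phaseRunning},
		{name: "alive", phase: phaseRunning},
	}
	fmt.Println(podsToRemove(existing, map[string]bool{"alive": true}))
}
```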






[jira] [Created] (YUNIKORN-2630) Release context lock in shim when processing config in the core

2024-05-16 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2630:
---

 Summary: Release context lock in shim when processing config in 
the core
 Key: YUNIKORN-2630
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2630
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: shim - kubernetes
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg


When a change comes in for the configmaps we process the change under a 
context lock as we need to merge the two configmaps.

We keep this lock even after all the work in the shim is done and processing 
has been transferred to the core. This is unneeded as the core has its own 
locking and serialisation of the changes.
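
The pattern can be sketched in Go as follows: hold the lock only while merging and release it before handing over to the core. shimContext, update and sendToCore are hypothetical names, and the sketch assumes, as stated above, that the receiving side serialises updates itself.

```go
package main

import (
	"fmt"
	"sync"
)

// shimContext is a hypothetical, much reduced stand-in for the shim context.
type shimContext struct {
	lock       sync.Mutex
	configMaps [2]map[string]string // index 0: defaults, index 1: overrides
}

// update stores the changed configmap, merges both maps under the lock and
// releases the lock before passing the merged result on.
func (c *shimContext) update(index int, cm map[string]string, sendToCore func(map[string]string)) {
	c.lock.Lock()
	c.configMaps[index] = cm
	merged := make(map[string]string)
	for _, m := range c.configMaps {
		for k, v := range m {
			merged[k] = v // later maps override earlier ones
		}
	}
	c.lock.Unlock()
	// Assumption: the core serialises config updates with its own locking.
	sendToCore(merged)
}

// demo applies two updates and reports the merged result plus whether the
// context lock was free while the callback ran.
func demo() (map[string]string, bool) {
	ctx := &shimContext{}
	ctx.update(0, map[string]string{"a": "1", "b": "1"}, func(map[string]string) {})
	var result map[string]string
	var free bool
	ctx.update(1, map[string]string{"b": "2"}, func(m map[string]string) {
		result = m
		if free = ctx.lock.TryLock(); free {
			ctx.lock.Unlock()
		}
	})
	return result, free
}

func main() {
	merged, free := demo()
	fmt.Println(merged["a"], merged["b"], free)
}
```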






[jira] [Resolved] (YUNIKORN-2628) fix release announcement links

2024-05-16 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-2628.
-
Fix Version/s: 1.6.0
   Resolution: Fixed

links are fixed after removing the {{..}} from the path

> fix release announcement links
> --
>
> Key: YUNIKORN-2628
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2628
> Project: Apache YuniKorn
>  Issue Type: Task
>  Components: website
>    Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> In YUNIKORN-2595 a regression snuck in breaking the links to the release 
> announcements.
> Need to reverse that path change for the release announcements.






[jira] [Resolved] (YUNIKORN-2627) Add K8s 1.30 to the e2e matrix

2024-05-16 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-2627.
-
Fix Version/s: 1.6.0
   Resolution: Fixed

Upgraded kind to version 0.23 and added 1.30 as a new version to test with

> Add K8s 1.30 to the e2e matrix
> --
>
> Key: YUNIKORN-2627
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2627
> Project: Apache YuniKorn
>  Issue Type: Improvement
>    Reporter: Wilfred Spiegelenburg
>Assignee: Tseng Hsi-Huang
>Priority: Major
>  Labels: newbie, pull-request-available
> Fix For: 1.6.0
>
>
> k8s 1.30 support in kind is now available as part of the [0.23 
> release|https://github.com/kubernetes-sigs/kind/releases/tag/v0.23.0]
> Need to add 1.30 to the matrix for the next release






[jira] [Created] (YUNIKORN-2628) fix release announcement links

2024-05-14 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2628:
---

 Summary: fix release announcement links
 Key: YUNIKORN-2628
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2628
 Project: Apache YuniKorn
  Issue Type: Task
  Components: website
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg


In YUNIKORN-2596 a regression snuck in breaking the links to the release 
announcements.

Need to reverse that path change for the release announcements.






[jira] [Created] (YUNIKORN-2627) Add K8s 1.30 to the e2e matrix

2024-05-14 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2627:
---

 Summary: Add K8s 1.30 to the e2e matrix
 Key: YUNIKORN-2627
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2627
 Project: Apache YuniKorn
  Issue Type: Improvement
Reporter: Wilfred Spiegelenburg


k8s 1.30 support in kind is now available as part of the [0.23 
release|https://github.com/kubernetes-sigs/kind/releases/tag/v0.23.0]

Need to add 1.30 to the matrix for the next release






[jira] [Resolved] (YUNIKORN-2531) Create unit tests for AsyncRMCallback

2024-05-14 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-2531.
-
Fix Version/s: 1.6.0
   Resolution: Fixed

new tests added to the system to improve coverage

> Create unit tests for AsyncRMCallback
> -
>
> Key: YUNIKORN-2531
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2531
> Project: Apache YuniKorn
>  Issue Type: Test
>  Components: shim - kubernetes
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> There are no unit tests for the {{AsyncRMCallback}} type in the shim 
> (scheduler_callback.go). It's tested indirectly but we have no idea about the 
> coverage or how it behaves in rare scenarios.
> At least longer methods such as {{UpdateApplication()}}, 
> {{UpdateAllocation()}} and {{UpdateNode()}} should be covered.






[jira] [Resolved] (YUNIKORN-2615) Remove named returns from predicate_manager.go

2024-05-14 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-2615.
-
Fix Version/s: 1.6.0
   Resolution: Fixed

refactor committed to master for 1.6.0

> Remove named returns from predicate_manager.go
> --
>
> Key: YUNIKORN-2615
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2615
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: shim - kubernetes
>    Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> Predicate manager has defined named returns on some functions but does not 
> use them. They should be removed as the way they are used can cause issues 
> that are hard to debug.
>  






Re: Improving documentation about observability

2024-05-13 Thread Wilfred Spiegelenburg
Please file jiras for any of the issues mentioned, or one jira if it
can all be handled from one.
All your remarks make sense.

You can even open a PR for the changes that you would like to make.
The documentation is in the yunikorn-site repository.
The sample deployment is located in the yunikorn-k8shim repository [1].
Contributions are always welcome. We should document and/or set
sensible configuration values if we provide any.

Wilfred

[1] 
https://github.com/apache/yunikorn-k8shim/blob/master/deployments/scheduler/prometheus.yml#L18

On Mon, 13 May 2024 at 19:31, Wiard van Rij  wrote:
>
> Hello everyone,
>
> I'm getting in touch through the mailing list since I haven't set up my Jira 
> account yet.
>
> I'd like to discuss the content found at 
> https://yunikorn.apache.org/docs/user_guide/prometheus/. It seems that out of 
> the box, it doesn't offer sensible default values. Typically, Prometheus is 
> deployed as a comprehensive solution, not just for a single service like 
> yunikorn. Thus, suggesting a configuration change that alters the global 
> interval rate to 3 seconds might not be the most advisable approach. Instead, 
> I'd argue that adjusting this interval isn't necessary, especially 
> considering you're recommending adding another job to the static config.
>
> Specifically, I propose the following adjustments:
>
>   *   Eliminate the global block from the configuration.
>   *   If an evaluation_interval is suggested, ensure its value matches the 
> scrape interval.
>   *   Set the scrape_interval to either 15 seconds or 30 seconds. I lean 
> towards 15 seconds as it should be more than adequate.
>   *   Encourage users to avoid using overrides in the scrape_configs. Instead, 
> they could utilize annotations on the service or implement a serviceMonitor 
> when using Prometheus Operator.
>       *   This is honestly a much easier solution that doesn't involve 
> changing the Prometheus 'core' configuration.
>
> Thanks in advance,
>
> Wiard
>




Re: [VOTE] Release Apache YuniKorn 1.5.1 RC1

2024-05-13 Thread Wilfred Spiegelenburg
+1 (binding)

- Verified signatures and checksums
- Verified LICENSE and NOTICE files
- Verified release tarball structure
- Built release on Mac Sonoma (ARM64):
  - make image with go 1.22 and 1.21
- Ran make test, all tests passed
- Installed locally on Kind cluster (1.29)

- REST interface checks:
  - verified the SHA references in the cluster detail
  - verified the build date is set correctly
- checked REST endpoints and UI

On Fri, 10 May 2024 at 18:40, Peter Bacsko  wrote:
>
> Hello everyone,
>
> I would like to call a vote for releasing Apache YuniKorn 1.5.1 RC1.
> This is a minor release which contains only bugfixes.
>
> The release artefacts have been uploaded here:
>   https://dist.apache.org/repos/dist/dev/yunikorn/1.5.1-RC1/
>
> My public key is located in the KEYS file:
>   https://downloads.apache.org//yunikorn/KEYS
>
> JIRA issues that have been resolved in this release:
>https://issues.apache.org/jira/issues/?filter=12353383
>
> The release solves a deadlock issue. If possible, test Yunikorn with
> workloads that put Yunikorn under stress (i.e. thousands/tens of thousands
> of pods).
>
> Git tags for each component are as follows:
> yunikorn-scheduler-interface: v1.5.1-1
> yunikorn-core: v1.5.1-1
> yunikorn-k8shim: v1.5.1-1
> yunikorn-web: v1.5.1-1
> yunikorn-release: v1.5.1-1
>
> Once the release is voted on and approved, all repos will be tagged
> 1.5.1 for consistency.
>
> Please review and vote. The vote will be open for at least 96 hours
> and closes on Tuesday 14 May 2024, 20:00:00 CEST.
>
> [ ] +1 Approve
> [ ] +0 No opinion
> [ ] -1 Disapprove (and the reason why)
>
>
> Thank you,
> Peter




[jira] [Created] (YUNIKORN-2618) Streamline AsyncRMCallback UpdateAllocation

2024-05-09 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2618:
---

 Summary: Streamline AsyncRMCallback UpdateAllocation
 Key: YUNIKORN-2618
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2618
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: shim - kubernetes
Reporter: Wilfred Spiegelenburg


If a task is not found, nil is returned from {{context.getTask}}. For 
{{response.New}} processing we should just log that fact and proceed to the 
next alloc. This simplifies the flow as we never need to check for a nil task. 
We should never have a pod in the cache that does not exist as a task on an 
application.

We retrieve the application using the application ID from the response but 
never use the object. We only use the application ID to pass into an event. 
The context event handler then does the exact same lookup again to process the 
event on the app.

We need to become much smarter in this area: we do double or triple lookups 
and generate async events that just change the state of the app or task or 
kick off another event.
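
The "log and proceed to the next alloc" flow can be sketched in Go as below. The cache, task and allocation types and processNew are hypothetical and far simpler than the real AsyncRMCallback code. Only the early continue on a nil task is the point.

```go
package main

import "fmt"

// task, allocation and cache are hypothetical, simplified types.
type task struct{ id string }

type allocation struct{ taskID, nodeID string }

type cache struct{ tasks map[string]*task }

// getTask returns nil when the task is unknown.
func (c *cache) getTask(id string) *task { return c.tasks[id] }

// processNew handles new allocations and returns the IDs of the tasks handled.
func processNew(c *cache, allocs []allocation) []string {
	var handled []string
	for _, alloc := range allocs {
		t := c.getTask(alloc.taskID)
		if t == nil {
			// Log and move on: nothing below needs a nil check.
			fmt.Printf("task %s not found, skipping allocation\n", alloc.taskID)
			continue
		}
		handled = append(handled, t.id)
	}
	return handled
}

// sampleCache returns a cache that knows a single task.
func sampleCache() *cache {
	return &cache{tasks: map[string]*task{"task-1": {id: "task-1"}}}
}

func main() {
	allocs := []allocation{{taskID: "task-1", nodeID: "node-1"}, {taskID: "task-2", nodeID: "node-1"}}
	fmt.Println(processNew(sampleCache(), allocs))
}
```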






[jira] [Created] (YUNIKORN-2616) Remove unused bool return from PreemptionPredicates()

2024-05-08 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2616:
---

 Summary: Remove unused bool return from PreemptionPredicates()
 Key: YUNIKORN-2616
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2616
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: shim - kubernetes
Reporter: Wilfred Spiegelenburg


The predicate manager method {{PreemptionPredicates()}} returns two values: an 
int and a boolean. The boolean is false if the integer is -1 and true for 0 or 
larger. There is no need for the boolean as the -1 already indicates the same.
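
The single return value convention can be shown with a small Go sketch. firstFit is a hypothetical helper and does not mirror the real PreemptionPredicates() signature. It only shows that a -1 index carries the same information as an extra boolean.

```go
package main

import "fmt"

// firstFit returns the index of the first entry that can hold the ask,
// or -1 if none can. No separate boolean is needed.
func firstFit(free []int, ask int) int {
	for i, f := range free {
		if f >= ask {
			return i
		}
	}
	return -1
}

func main() {
	free := []int{1, 4, 8}
	if idx := firstFit(free, 3); idx >= 0 {
		fmt.Println("fits at index", idx)
	}
	if idx := firstFit(free, 16); idx < 0 {
		fmt.Println("no fit")
	}
}
```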






[jira] [Created] (YUNIKORN-2615) Remove named returns from predicate_manager.go

2024-05-08 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2615:
---

 Summary: Remove named returns from predicate_manager.go
 Key: YUNIKORN-2615
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2615
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: shim - kubernetes
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg


Predicate manager has defined named returns on some functions but does not use 
them. They should be removed as the way they are used can cause issues that are 
hard to debug.
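
One well-known Go pitfall with named returns is shadowing, sketched below. This is a generic example with hypothetical function names and is not taken from predicate_manager.go. The issue does not state which specific problem was observed there.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseNamed declares named results but never assigns them.
func parseNamed(s string) (n int, err error) {
	if s != "" {
		// ":=" declares a new n and err scoped to this block, shadowing the
		// named results of the function.
		n, err := strconv.Atoi(s)
		if err != nil || n < 0 {
			fmt.Println("invalid input:", s)
		}
	}
	// The named results are still at their zero values here.
	return
}

// parse uses unnamed results, so every return states exactly what goes back.
func parse(s string) (int, error) {
	if s == "" {
		return 0, nil
	}
	n, err := strconv.Atoi(s)
	if err != nil {
		return 0, err
	}
	return n, nil
}

func main() {
	fmt.Println(parseNamed("42"))
	fmt.Println(parse("42"))
}
```

The named version compiles without complaint yet loses both the parsed value and any error, which is the kind of behaviour that is hard to spot in review.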

 






[jira] [Resolved] (YUNIKORN-2601) Update kindest/node: v1.29.1 to v1.29.2, v1.28.6 to v1.28.7, v1.27.10 to v1.27.11, v1.26.13 -> v1.26.14

2024-05-08 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-2601.
-
Fix Version/s: 1.6.0
   Resolution: Fixed

Changes committed.

No Kind for 1.30 is available yet; we should log a new Jira to add it later.

> Update kindest/node:  v1.29.1 to v1.29.2, v1.28.6 to v1.28.7, v1.27.10 to 
> v1.27.11, v1.26.13 -> v1.26.14
> 
>
> Key: YUNIKORN-2601
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2601
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: test - e2e
>Reporter: Chia-Ping Tsai
>Assignee: Hsien-Cheng(Ryan) Huang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> as title






Algolia search: not indexing 1.5.0

2024-05-07 Thread Wilfred Spiegelenburg
Hi all,

I was checking the website for some doc and found that the Algolia
search does not work for the 1.5.0 documentation release.
The index skips from 1.4.0 to next, as can be seen in the sitemap [1].
I am not sure if this was caused by the change to Docusaurus 3.0 or
something else is broken. I have no idea which credentials (email +
password) own the Algolia search setup for YuniKorn. At this point
there is no way to check.

I have also not seen a crawler message being sent to the private
mailing list in a while.
@Weiwei Yang I copied you in specifically because I think you had
access last time but I am not sure anymore.

Wilfred

[1] https://yunikorn.apache.org/sitemap.xml




[jira] [Resolved] (YUNIKORN-2591) Document placement rules always

2024-05-06 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-2591.
-
Fix Version/s: 1.5.1, 1.5.0, 1.4.0
   Resolution: Fixed

Change made to the docs going back to 1.4.0 and 1.5.0.

Will be part of the 1.5.1 release also.

> Document placement rules always
> ---
>
> Key: YUNIKORN-2591
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2591
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: documentation
>    Reporter: Wilfred Spiegelenburg
>Assignee: Hsien-Cheng(Ryan) Huang
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.5.1, 1.5.0, 1.4.0
>
>
> The current [doc 
> says|https://yunikorn.apache.org/docs/user_guide/queue_config#placement-rules]:
> {quote}If no rules are defined the placement manager is not started and each 
> application _must_ have a queue set on submit.
> {quote}
> This is not correct, we moved to placement rules always in YUNIKORN-1793 in 
> YuniKorn 1.4. The documentation needs to be updated to reflect that.






[jira] [Resolved] (YUNIKORN-2596) Enhance layout for release announcements

2024-05-06 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-2596.
-
Fix Version/s: 1.5.1
   Resolution: Fixed

Fixed and published changes applied to 1.5.0 layout, before the 1.5.1 release.

marking as fixed in 1.5.1

> Enhance layout for release announcements
> 
>
> Key: YUNIKORN-2596
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2596
> Project: Apache YuniKorn
>  Issue Type: Task
>  Components: website
>    Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.5.1
>
> Attachments: release_announce.png, releasee_announce_updated.png
>
>
> The current release announcements page lacks a decent layout. The page is 
> generated during the build based on the directory content.
> Some simple updates would make the page more readable.






[jira] [Resolved] (YUNIKORN-2595) Fix download page links

2024-05-06 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-2595.
-
Fix Version/s: 1.5.1
   Resolution: Fixed

download page fixed for 1.5.0, deployed before the 1.5.1 release

Marking as fixed in 1.5.1

> Fix download page links
> ---
>
> Key: YUNIKORN-2595
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2595
> Project: Apache YuniKorn
>  Issue Type: Task
>  Components: website
>    Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.5.1
>
>
> The download links must follow a specific set of rules as specified 
> [here|https://infra.apache.org/release-download-pages.html].
> We currently do not set the correct download link for the source package. We 
> dropped the closer.lua resolution for the content network in one of the 
> releases. With the next release, 1.5.1, coming up we need to fix this.






[jira] [Created] (YUNIKORN-2595) Fix download page links

2024-04-29 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2595:
---

 Summary: Fix download page links
 Key: YUNIKORN-2595
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2595
 Project: Apache YuniKorn
  Issue Type: Task
  Components: website
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg


The download links must follow a specific set of rules as specified 
[here|https://infra.apache.org/release-download-pages.html].

We currently do not set the correct download link for the source package. We 
dropped the closer.lua resolution for the content network in one of the 
releases. With the next release, 1.5.1, coming up we need to fix this.






Re: [DISCUSSION] Yunikorn release 1.5.1

2024-04-28 Thread Wilfred Spiegelenburg
Peter,

Thank you for starting this discussion. See inline for further comments.

> Hi all,
>
> Due to the number of problems that we have discovered since the release of
> 1.5.0, I believe it makes sense to create a new Yunikorn release which
> consists of bug fixes only. If I'm not mistaken we haven't done this before
> (at least since leaving the ASF incubator), so this would be the first
> minor Yunikorn release.

+1
I am totally for releasing YuniKorn 1.5.1 with the lock fixes.
Looking at all the work you have done for this release: would you be
willing to also step up as a release manager for the 1.5.1 release?

> There are a bunch of fixes that are already on branch-1.5:
>
>- YUNIKORN-2521 Scheduler deadlock (resolved indirectly by YUNIKORN-2544)
>- YUNIKORN-2539 Add optional deadlock detection
>- YUNIKORN-2544 [UMBRELLA] Fix Yunikorn potential locking issues
>   - YUNIKORN-2543 Fix locking in RMProxy
>   - YUNIKORN-2545 Eliminate multiple lock calls from Queue
>   - YUNIKORN-2548 Potential deadlock during concurrent
>   bottom-up/top-down queue traversal
>   - YUNIKORN-2550 Fix locking in PartitionContext
>   - YUNIKORN-2552 Recursive locking when sending remove queue event
>   - YUNIKORN-2553 [core] Enable deadlock detection during unit tests
>   - YUNIKORN-2563 [shim] Enable deadlock detection during unit tests
>   - YUNIKORN-2574 totalPartitionResource should not be mutated with
>   AddTo/SubFrom
>   - YUNIKORN-2562 Nil pointer panic in Application.ReplaceAllocation()
>

Yes for all the above.

> The following is In Progress for 1.5.1:
>
>- YUNIKORN-2526 Discrepancy between shim cache and core app/task list
>after scheduler restart

This would be a good one to get in if we have some progress on this.
Do we understand what is going on yet? I looked at the jira and am not
sure if we understand the root cause.

> Candidates:
>
>- YUNIKORN-2520 PVC errors in AssumePod() are not handled properly -
>Resolved, only cherry-picking is needed

Yes, this could be added.

I also think we need to check if we have any CVE fixes that need to be added.
A quick check shows the following:
* golang.org/x/net 0.23 (CVE-2023-45288 or GO-2024-2687 via YUNIKORN-2541)
* google.golang.org/protobuf to v1.33.0 (CVE-2024-24786 via YUNIKORN-2469)
* build with golang 1.21.9

To satisfy the scanners, although we are not affected:
* K8s 1.29.4 (CVE-2024-3177)


>- YUNIKORN-2057 FindQueueByAppID is slow - Critical priority, "In
>progress" since Oct 2023
>- YUNIKORN-1089 Application handling with invalid task group annotations
>- Critical priority, no progress
>- YUNIKORN-1988 Preemption happens when a queue lower than its
>guaranteed capacity - Critical priority, "In progress" since Sep 2023

No for the last 3 mentioned. We did not block the 1.5.0 release on
these and they have not made enough progress since then.
I would not consider them as a possible candidate for 1.5.1

Wilfred

>
> Thoughts, opinions? What should be the scope of 1.5.1?
>
> Thanks,
> Peter




[jira] [Created] (YUNIKORN-2591) Document placement rules always

2024-04-25 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2591:
---

 Summary: Document placement rules always
 Key: YUNIKORN-2591
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2591
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: documentation
Reporter: Wilfred Spiegelenburg


The current [doc 
says|https://yunikorn.apache.org/docs/user_guide/queue_config#placement-rules]:
{quote}If no rules are defined the placement manager is not started and each 
application _must_ have a queue set on submit.
{quote}
This is not correct: we moved to always using placement rules in YUNIKORN-1793 in 
YuniKorn 1.4. The documentation needs to be updated to reflect that.






[jira] [Created] (YUNIKORN-2590) Handler tests should check for nil request on create

2024-04-25 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2590:
---

 Summary: Handler tests should check for nil request on create
 Key: YUNIKORN-2590
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2590
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: core - common, test - unit
Reporter: Wilfred Spiegelenburg


In the handler_test.go file we have an anti-pattern that triggers a large number 
(40+) of warnings in an IDE:
{quote}'req' might have 'nil' or other unexpected value as its corresponding 
error variable might be not 'nil'
{quote}
The warnings are due to the fact that we have the following pattern:
{code:java}
req, err = http.NewRequest("GET", "path", strings.NewReader(""))
req = req.WithContext(context.WithValue(req.Context(), httprouter.ParamsKey, 
httprouter.Params{})){code}
There is no error assertion after the request creation. We should insert a simple 
{{assert.NilError(t, err, "HTTP request create failed")}} between 
creating and using the request.
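
A minimal sketch of the corrected pattern, using only the standard library so it runs on its own. The helper name newTestRequest and the context key paramsKey are hypothetical stand-ins: the real tests use httprouter.ParamsKey and assert.NilError from the test dependencies.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"strings"
)

// ctxKey is a hypothetical stand-in for the httprouter.ParamsKey type.
type ctxKey string

const paramsKey ctxKey = "params"

// newTestRequest creates a request and only uses it after the error check.
func newTestRequest(method, path string) (*http.Request, error) {
	req, err := http.NewRequest(method, path, strings.NewReader(""))
	if err != nil {
		// req is nil here: calling req.WithContext() would panic.
		return nil, fmt.Errorf("HTTP request create failed: %w", err)
	}
	ctx := context.WithValue(req.Context(), paramsKey, []string{})
	return req.WithContext(ctx), nil
}

func main() {
	if _, err := newTestRequest("GET", "path"); err != nil {
		fmt.Println("unexpected error:", err)
	}
	// A method containing a space is rejected by http.NewRequest.
	_, err := newTestRequest("BAD METHOD", "path")
	fmt.Println("error returned for invalid method:", err != nil)
}
```

In the actual test file the error check would be the assert.NilError call mentioned above, placed directly after http.NewRequest.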






[jira] [Created] (YUNIKORN-2581) Expose running placement rules in REST

2024-04-23 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2581:
---

 Summary: Expose running placement rules in REST
 Key: YUNIKORN-2581
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2581
 Project: Apache YuniKorn
  Issue Type: New Feature
  Components: core - common
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg


Since introducing the always-on use of placement rules and the recovery rule, the 
queue config does not correctly show the running rules.

Also, if a config update has been rejected, for any reason, the rules shown would not 
be correct.

Exposing the configured rules from the placement manager works around all these 
issues.






[jira] [Resolved] (YUNIKORN-2575) Make logging for IsPodFitNode clear

2024-04-23 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-2575.
-
Fix Version/s: 1.6.0
   Resolution: Fixed

Unique errors are now returned for all failure cases, which at DEBUG level will show 
exactly why the failure occurred.

> Make logging for IsPodFitNode clear
> ---
>
> Key: YUNIKORN-2575
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2575
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: shim - kubernetes
>    Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> The logging in {{IsPodFitNode()}} logs the same message for a missing pod and 
> node. We should log clearly which thing is missing: the node or the pod.






[jira] [Resolved] (YUNIKORN-2580) Remove executionTimeoutMilliSeconds

2024-04-23 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-2580.
-
Resolution: Won't Fix

This is used for the placeholder timeout and cannot be removed.

> Remove executionTimeoutMilliSeconds
> ---
>
> Key: YUNIKORN-2580
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2580
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: scheduler-interface
>Reporter: Chia-Ping Tsai
>Priority: Minor
>
> [https://github.com/apache/yunikorn-scheduler-interface/blob/b70081933c38018fd7f01c82635f5b186c4ef394/si.proto#L211]
> It is not used actually, and hence we should either remove it or add facility 
> for it. Personally, I'd like to remove it to simplify the interface.






[jira] [Created] (YUNIKORN-2578) Refactor SchedulerCache.GetPod() remove bool return

2024-04-23 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2578:
---

 Summary: Refactor SchedulerCache.GetPod() remove bool return
 Key: YUNIKORN-2578
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2578
 Project: Apache YuniKorn
  Issue Type: Task
  Components: shim - kubernetes
Reporter: Wilfred Spiegelenburg


SchedulerCache {{GetPod()}} and {{GetPodNoLock()}} return two values:
# *v1.Pod
# bool

The boolean value is redundant as it is false if the pod is not found and a nil 
is returned for the pod. The boolean is true if the pod has a value. Testing 
for a nil pod has the same result.

We do not cache a nil pod in the cache for a pod UID.
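
A minimal sketch of what the single-return version could look like. The Pod and podCache types below are simplified, hypothetical stand-ins for v1.Pod and the shim's SchedulerCache, not the real implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// Pod is a hypothetical stand-in for v1.Pod.
type Pod struct {
	UID  string
	Name string
}

// podCache is a simplified, hypothetical version of the scheduler cache.
type podCache struct {
	sync.RWMutex
	pods map[string]*Pod
}

func newPodCache() *podCache {
	return &podCache{pods: make(map[string]*Pod)}
}

// UpdatePod never stores a nil pod, which is what makes a bool return redundant.
func (c *podCache) UpdatePod(pod *Pod) {
	if pod == nil {
		return
	}
	c.Lock()
	defer c.Unlock()
	c.pods[pod.UID] = pod
}

// GetPod returns the cached pod for the UID, or nil if it is not cached.
func (c *podCache) GetPod(uid string) *Pod {
	c.RLock()
	defer c.RUnlock()
	return c.pods[uid]
}

func main() {
	cache := newPodCache()
	cache.UpdatePod(&Pod{UID: "uid-1", Name: "driver"})
	if pod := cache.GetPod("uid-1"); pod != nil {
		fmt.Println("found:", pod.Name)
	}
	fmt.Println("missing pod is nil:", cache.GetPod("uid-2") == nil)
}
```

Callers then test the returned pointer for nil instead of checking a second value.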






[jira] [Created] (YUNIKORN-2577) Remove named returns from IsPodFitNodeViaPreemption

2024-04-23 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2577:
---

 Summary: Remove named returns from IsPodFitNodeViaPreemption
 Key: YUNIKORN-2577
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2577
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: shim - kubernetes
Reporter: Wilfred Spiegelenburg


IsPodFitNodeViaPreemption has defined named returns but does not use them. They 
should be removed as the way they are used can cause issues that are hard to 
debug.

As part of this change we need to further cleanup:
* The variable {{ok}} also gets shadowed multiple times, not just from the 
named return declaration.
* The if construct around {{GetPodNoLock()}} is not needed as it returns a nil 
for the pod if it returns false. Just adding the result for the pod always has 
the same effect.
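
To illustrate why shadowed named returns are hard to debug, here is a small generic sketch. It is not the IsPodFitNodeViaPreemption code; both function names are hypothetical.

```go
package main

import "fmt"

// lookupShadowed declares named returns, but the inner "ok" shadows the
// named one, so the function always reports false for the second value.
func lookupShadowed(m map[string]int, key string) (value int, ok bool) {
	if v, ok := m[key]; ok { // this ok is a new variable in the if scope
		value = v
	}
	return // returns the outer ok, which was never set
}

// lookupExplicit has no named returns, so nothing can be shadowed silently.
func lookupExplicit(m map[string]int, key string) (int, bool) {
	v, found := m[key]
	return v, found
}

func main() {
	m := map[string]int{"a": 1}
	fmt.Println(lookupShadowed(m, "a")) // prints "1 false"
	fmt.Println(lookupExplicit(m, "a")) // prints "1 true"
}
```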






[jira] [Created] (YUNIKORN-2575) Make logging for IsPodFitNode clear

2024-04-22 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2575:
---

 Summary: Make logging for IsPodFitNode clear
 Key: YUNIKORN-2575
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2575
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: shim - kubernetes
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg


The logging in {{IsPodFitNode()}} logs the same message for a missing pod and 
node. We should log clearly which thing is missing: the node or the pod.
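
One possible shape for this, as a sketch only: return a distinct error per missing object so the log line can say which one it was. The schedulerCache type, the sentinel errors and the method signature are assumptions for illustration and do not match the real shim API.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors, one per failure case.
var (
	errPodNotFound  = errors.New("pod not found in cache")
	errNodeNotFound = errors.New("node not found in cache")
)

// schedulerCache is a simplified, hypothetical cache of known pods and nodes.
type schedulerCache struct {
	pods  map[string]bool
	nodes map[string]bool
}

// isPodFitNode only shows the lookup part: each missing object gets its own error.
func (c *schedulerCache) isPodFitNode(podUID, nodeName string) error {
	if !c.pods[podUID] {
		return fmt.Errorf("%w: %s", errPodNotFound, podUID)
	}
	if !c.nodes[nodeName] {
		return fmt.Errorf("%w: %s", errNodeNotFound, nodeName)
	}
	return nil
}

func main() {
	c := &schedulerCache{
		pods:  map[string]bool{"pod-1": true},
		nodes: map[string]bool{"node-1": true},
	}
	fmt.Println(c.isPodFitNode("pod-2", "node-1"))
	fmt.Println(c.isPodFitNode("pod-1", "node-2"))
	fmt.Println(errors.Is(c.isPodFitNode("pod-1", "node-2"), errNodeNotFound))
}
```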






[jira] [Created] (YUNIKORN-2556) Remove getResourceUsageDAOInfo from test code

2024-04-12 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2556:
---

 Summary: Remove getResourceUsageDAOInfo from test code
 Key: YUNIKORN-2556
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2556
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: core - common
Reporter: Wilfred Spiegelenburg


Remove the {{getResourceUsageDAOInfo()}} call from the test code. If we need to 
retrieve the usage for the whole queueTracker hierarchy we should add that in 
the test code separately instead of using the DAO and converting that back.

The DAO object should also not contain the pointer to the resource object. It 
should contain the DAOMap for the resource object similar to all other DAO 
definitions.






[jira] [Created] (YUNIKORN-2555) Cleanup placement rules in partition

2024-04-12 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2555:
---

 Summary: Cleanup placement rules in partition
 Key: YUNIKORN-2555
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2555
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: core - scheduler
Reporter: Wilfred Spiegelenburg


The placement rule config is tracked in the partition in the object 
{{partition.rules}}.

This object contains the config with which the placement manager is initialised. 
This was needed before the move to always use placement rules. Since 
that change it no longer has a function. The 
config is now also out of sync with the rules used in the placement manager.

There is no need to keep this object in the partition.






Re: [DISCUSSION] (Potential) Locking issues in Yunikorn

2024-04-11 Thread Wilfred Spiegelenburg
Scheduling is lock free when we get to any of the application.Try...()
calls. The scheduling thread does not hold any locks until we get
there. That was how it was designed and implemented.
When we get there nothing but the scheduling thread is allowed to make
changes to the application for the duration of the scheduler working
on that application.
Letting go of the application lock will break that assumption. That
cannot happen.

Setting the termination callback for the application only happens
once: when we add the newly created application to the partition. We
should fix the timing and/or locking around this as part of PR #838, in
which the AddApplication in the partition is refactored.

The locks for accessing nodes in the partition are unneeded. They are
left over from before the node iterator was changed. The nodes used to
be a simple map directly in the partition. Any call thus needed to
lock the partition. Now the nodes are an object that is set on
partition creation and handles its own locking. All of the locks in
the partition around the nodes object should be removed. They are not
needed.
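
As a rough sketch of that layout (all type and method names here are hypothetical, not the core's real ones): the node collection owns its mutex, the partition sets the pointer once on creation, and partition methods that only touch nodes take no partition lock.

```go
package main

import (
	"fmt"
	"sync"
)

// nodeCollection handles its own locking.
type nodeCollection struct {
	mu    sync.RWMutex
	nodes map[string]int // node ID -> available resource (simplified)
}

func newNodeCollection() *nodeCollection {
	return &nodeCollection{nodes: make(map[string]int)}
}

func (nc *nodeCollection) addNode(id string, available int) {
	nc.mu.Lock()
	defer nc.mu.Unlock()
	nc.nodes[id] = available
}

func (nc *nodeCollection) getNode(id string) (int, bool) {
	nc.mu.RLock()
	defer nc.mu.RUnlock()
	available, ok := nc.nodes[id]
	return available, ok
}

// partition sets nodes once on creation and never replaces it, so reading
// the pointer needs no partition lock.
type partition struct {
	nodes *nodeCollection
}

func newPartition() *partition {
	return &partition{nodes: newNodeCollection()}
}

// getNodeAvailable delegates to the collection without a partition lock.
func (p *partition) getNodeAvailable(id string) (int, bool) {
	return p.nodes.getNode(id)
}

func main() {
	p := newPartition()
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			p.nodes.addNode(fmt.Sprintf("node-%d", i), i*1000)
		}(i)
	}
	wg.Wait()
	fmt.Println(p.getNodeAvailable("node-3"))
}
```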

Wilfred

On Thu, 11 Apr 2024 at 21:44, Peter Bacsko  wrote:
>
> Thanks for the replies. I managed to get good progress on this issue.
>
> There's a thing which I'd like to talk about. It's not something which is
> critical but it needs to be addressed IMO.
>
> The scope of the mutex-protected critical section is too large in
> tryAllocate, tryReservedAllocate and tryPlaceholderAllocate:
>
> func (sa *Application) trySomeAllocation() {
>   sa.Lock()
>   defer sa.Unlock()
>
>   .. doing things ...
> }
>
> The "doing things" can mean a lot of different operations. For example, in
> tryAllocate(), this means:
> - checking headroom/quota
> - checking requiredNode: if it's set, then it tries to allocate on a
> specific node
> - iterating through available nodes with the node iterator
> - if we don't have an allocation, we attempt to preempt existing allocations
>
> This is a lot of work to do while holding the application lock. In YUNIKORN-2521
> <https://issues.apache.org/jira/browse/YUNIKORN-2521>, we can clearly see
> why this is bad: state dump times out and other REST endpoints which try to
> access the application will also time out.
>
> I suggest re-thinking the scopes to avoid this.
>
> Peter
>
> On Mon, Apr 8, 2024 at 4:58 AM Wilfred Spiegelenburg 
> wrote:
>
> > Case 1:
> > I am all for simplifying and removing locks. Changing the SI like you
> > propose will trigger a YuniKorn 2.0 as it is incompatible with the
> > current setup. There is a much simpler change that does not require a
> > 2.0 version. See comments in the jira.
> >
> > Case 2:
> > This is a bug I think, which has nothing to do with locking. The call
> > to get the priority of the child is made before we have updated the
> > properties for child or even applied the template. We should not be
> > calling that until after the properties are converted into settings in
> > UpdateQueueProperties. That should solve the double lock issue also.
> >
> > Case 3:
> > The call(s) in GetPartitionQueueDAOInfo inside the lock could be
> > dangerous. The getHeadRoom call walks up the queue and takes locks
> > there.
> > It should be moved outside of the existing queue lock. The comment on
> > the lock even says so but it has slipped through.
> >
> > Wilfred
> >
> > On Sun, 7 Apr 2024 at 05:33, Craig Condit  wrote:
> > >
> > > I’m all for fixing these… and in general where lockless algorithms can
> > be implemented cleanly, I’m in favor of those implementations instead of
> > requiring locks, so for RMProxy I’m +1 on that. The extra memory for an
> > RMProxy instance is irrelevant.
> > >
> > > The recursive locking case is a real problem, and I’m surprised that
> > hasn’t bitten us harder. It can cause all sorts of issues.
> > >
> > > Craig
> > >
> > > > On Apr 6, 2024, at 2:19 PM, Peter Bacsko  wrote:
> > > >
> > > > Hi all,
> > > >
> > > > after YUNIKORN-2539 got merged, we identified some potential deadlocks.
> > > > These are false positives now, but a small change can cause Yunikorn to
> > > > fall apart, so the term "potential deadlock" describes them properly.
> > > >
> > > > Thoughts, opinions are welcome. IMO we should handle these with
> > priority to
> > > > ensure stability.
> > > >
> > > > *1. Locking order: Cache→RMProxy vs RMProxy→Cache*
> > > >
> > > > We grab the locks of these types in different order (read bottom to
>

Re: [DISCUSSION] (Potential) Locking issues in Yunikorn

2024-04-07 Thread Wilfred Spiegelenburg
Case 1:
I am all for simplifying and removing locks. Changing the SI like you
propose will trigger a YuniKorn 2.0 as it is incompatible with the
current setup. There is a much simpler change that does not require a
2.0 version. See comments in the jira.

Case 2:
This is a bug I think, which has nothing to do with locking. The call
to get the priority of the child is made before we have updated the
properties for child or even applied the template. We should not be
calling that until after the properties are converted into settings in
UpdateQueueProperties. That should solve the double lock issue also.

Case 3:
The call(s) in GetPartitionQueueDAOInfo inside the lock could be
dangerous. The getHeadRoom call walks up the queue and takes locks
there.
It should be moved outside of the existing queue lock. The comment on
the lock even says so but it has slipped through.
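
A sketch of the change suggested for case 3, using simplified, hypothetical types (queue, queueDAO, headRoom and daoInfo are illustration names, not the real core code). The walk up the hierarchy happens before the queue takes its own read lock, so no goroutine holds a child lock while asking for a parent lock.

```go
package main

import (
	"fmt"
	"sync"
)

// queue is a simplified, hypothetical queue with a single resource value.
type queue struct {
	mu     sync.RWMutex
	name   string
	max    int
	used   int
	parent *queue
}

// queueDAO is a hypothetical DAO carrying only the fields needed here.
type queueDAO struct {
	Name     string
	HeadRoom int
}

// headRoom releases this queue's lock before walking up to the parent.
func (q *queue) headRoom() int {
	q.mu.RLock()
	own := q.max - q.used
	parent := q.parent
	q.mu.RUnlock()
	if parent == nil {
		return own
	}
	if parentRoom := parent.headRoom(); parentRoom < own {
		return parentRoom
	}
	return own
}

// daoInfo calls headRoom outside of its own lock, then locks to copy fields.
func (q *queue) daoInfo() queueDAO {
	room := q.headRoom()
	q.mu.RLock()
	defer q.mu.RUnlock()
	return queueDAO{Name: q.name, HeadRoom: room}
}

func main() {
	root := &queue{name: "root", max: 10, used: 4}
	child := &queue{name: "root.child", max: 8, used: 1, parent: root}
	fmt.Printf("%+v\n", child.daoInfo())
}
```

The head room value can be slightly stale by the time the DAO is built, which should be acceptable for a REST view.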

Wilfred

On Sun, 7 Apr 2024 at 05:33, Craig Condit  wrote:
>
> I’m all for fixing these… and in general where lockless algorithms can be 
> implemented cleanly, I’m in favor of those implementations instead of 
> requiring locks, so for RMProxy I’m +1 on that. The extra memory for an 
> RMProxy instance is irrelevant.
>
> The recursive locking case is a real problem, and I’m surprised that hasn’t 
> bitten us harder. It can cause all sorts of issues.
>
> Craig
>
> > On Apr 6, 2024, at 2:19 PM, Peter Bacsko  wrote:
> >
> > Hi all,
> >
> > after YUNIKORN-2539 got merged, we identified some potential deadlocks.
> > These are false positives now, but a small change can cause Yunikorn to
> > fall apart, so the term "potential deadlock" describes them properly.
> >
> > Thoughts, opinions are welcome. IMO we should handle these with priority to
> > ensure stability.
> >
> > *1. Locking order: Cache→RMProxy vs RMProxy→Cache*
> >
> > We grab the locks of these types in different order (read bottom to top):
> >
> > pkg/rmproxy/rmproxy.go:307 rmproxy.(*RMProxy).GetResourceManagerCallback
> > ??? <
> > pkg/rmproxy/rmproxy.go:306 rmproxy.(*RMProxy).GetResourceManagerCallback ???
> > pkg/rmproxy/rmproxy.go:359 rmproxy.(*RMProxy).UpdateNode ???
> > cache.(*Context).updateNodeResources ???
> > cache.(*Context).updateNodeOccupiedResources ???
> > cache.(*Context).updateForeignPod ???
> > cache.(*Context).UpdatePod ???
> >
> > cache.(*Context).ForgetPod ??? <
> > cache.(*Context).ForgetPod ???
> > cache.(*AsyncRMCallback).UpdateAllocation ???
> > rmproxy.(*RMProxy).triggerUpdateAllocation ???
> > rmproxy.(*RMProxy).processRMReleaseAllocationEvent ???
> > rmproxy.(*RMProxy).handleRMEvents ???
> >
> > I already created a JIRA for this one:
> > https://issues.apache.org/jira/browse/YUNIKORN-2543.
> >
> > Luckily we only call RLock() inside RMProxy while processing allocations,
> > but if this changes, deadlock is guaranteed. I made a POC which creates a
> > lockless RMProxy instance after calling a factory method, see my comment in
> > the JIRA. The scheduler core compiles & all tests pass.
> >
> > *2. Lock order: multiple locks calls on the same goroutine which interferes
> > with another goroutine*
> >
> > We shouldn't call RLock() multiple times or after Lock() on the same mutex
> > instance. This is dangerous as described here:
> > https://github.com/sasha-s/go-deadlock/?tab=readme-ov-file#grabbing-an-rlock-twice-from-the-same-goroutine
> >
> > objects.(*Queue).GetCurrentPriority ??? <   ← Queue RLock
> > objects.(*Queue).GetCurrentPriority ???
> > objects.(*Queue).addChildQueue ???← Queue Lock
> > objects.NewConfiguredQueue ???
> > scheduler.(*PartitionContext).addQueue ???
> > scheduler.(*PartitionContext).addQueue ???
> > scheduler.(*PartitionContext).initialPartitionFromConfig ???
> > scheduler.newPartitionContext ???
> > scheduler.(*ClusterContext).updateSchedulerConfig ???
> > scheduler.(*ClusterContext).processRMRegistrationEvent ???
> > scheduler.(*Scheduler).handleRMEvent ???
> >
> > objects.(*Queue).internalHeadRoom ??? <   ← Queue RLock #2
> > objects.(*Queue).internalHeadRoom ???
> > objects.(*Queue).getHeadRoom ???
> > objects.(*Queue).getHeadRoom ???
> > objects.(*Queue).GetPartitionQueueDAOInfo ???  ←Queue RLock #1
> > objects.(*Queue).GetPartitionQueueDAOInfo ???
> > objects.(*Queue).GetPartitionQueueDAOInfo ???
> > scheduler.(*PartitionContext).GetPartitionQueues ???
> > webservice.getPartitionQueues ???
> > http.HandlerFunc.ServeHTTP ???
> >
> > No JIRA yet. Fix looks straightforward.
> >
> > *3. Recursive locking*
> >
> > It's basically the same as #2, but it's not about an interaction between
> > two goroutines. It's just a single goroutine which grabs RLock() multiple
> > times. This can be dangerous in itself.
> >
> > objects.(*Queue).internalHeadRoom ??? <   ← Queue RLock #2
> > objects.(*Queue).internalHeadRoom ???
> > objects.(*Queue).getHeadRoom ???
> > objects.(*Queue).getHeadRoom ???
> > objects.(*Queue).GetPartitionQueueDAOInfo ???  ← Queue RLock #1
> > objects.(*Queue).GetPartitionQueueDAOInfo ???
> > 

[jira] [Created] (YUNIKORN-2540) clean up constants in pkg/cache/context_test.go

2024-04-04 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2540:
---

 Summary: clean up constants in pkg/cache/context_test.go
 Key: YUNIKORN-2540
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2540
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: shim - kubernetes
Reporter: Wilfred Spiegelenburg


Constants are duplicated in {{pkg/cache/context_test.go}}.

For example, {{fakeNodeName}} is defined multiple times in the file. We should move 
to a central point of defining the constants for the test at the top of the 
file.






[jira] [Resolved] (YUNIKORN-2520) PVC errors in AssumePod() are not handled properly

2024-04-04 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-2520.
-
Fix Version/s: 1.6.0
   Resolution: Fixed

Changes merged to master

Volume issues should be handled correctly now.

> PVC errors in AssumePod() are not handled properly
> --
>
> Key: YUNIKORN-2520
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2520
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: shim - kubernetes
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> When there is an error caused by a volume operation in 
> {{Context.AssumePod()}}, the allocation on core side will not be removed.
> Although we check the result from {{UpdateAllocation}}, the error handling is 
> just logging:
> {noformat}
> if err := callback.UpdateAllocation(response); err != nil {
>   rmp.handleUpdateResponseError(rmID, err)
> }
> ...
> func (rmp *RMProxy) handleUpdateResponseError(rmID string, err error) {
>   log.Log(log.RMProxy).Error("failed to handle response",
>     zap.String("rmID", rmID),
>     zap.Error(err))
> }{noformat}
> I suggest moving volume-related code to {{Task.postTaskAllocated()}}. In 
> this case, the task will transition to "Failed" state and we'll have 
> allocationID available, so we can release both the ask and the allocation:
> {noformat}
> func (task *Task) releaseAllocation() {
>   ...
>   var releaseRequest *si.AllocationRequest
>   s := TaskStates()
>   switch task.GetTaskState() {
>   case s.New, s.Pending, s.Scheduling, s.Rejected:
>     releaseRequest = common.CreateReleaseAskRequestForTask(
>       task.applicationID, task.taskID, task.application.partition)  <-- release ask + allocation if possible
>   default:
>     if task.allocationID == "" {
>       ... log error ...
>       return
>     }
>     releaseRequest = common.CreateReleaseAllocationRequestForTask(
>       task.applicationID, task.taskID, task.allocationID, task.application.partition, task.terminationType)
>   }
> ...{noformat}
>  






[jira] [Created] (YUNIKORN-2538) Shim cache context pre-allocate slice

2024-04-04 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2538:
---

 Summary: Shim cache context pre-allocate slice
 Key: YUNIKORN-2538
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2538
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: shim - kubernetes
Reporter: Wilfred Spiegelenburg


When building the reason string from all volume failure reasons we should 
allocate a slice once based on the size of the reasons object we get returned.

See [review 
comment|https://github.com/apache/yunikorn-k8shim/pull/810#discussion_r1550882867]
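
A minimal sketch of the pre-allocation. The conflictReason type and the joinReasons helper are hypothetical stand-ins for the reasons object returned by the volume binder and the shim code that builds the string.

```go
package main

import (
	"fmt"
	"strings"
)

// conflictReason is a hypothetical stand-in for the volume failure reason type.
type conflictReason string

// joinReasons sizes the slice once from the input instead of growing it on append.
func joinReasons(reasons []conflictReason) string {
	parts := make([]string, 0, len(reasons))
	for _, reason := range reasons {
		parts = append(parts, string(reason))
	}
	return strings.Join(parts, ", ")
}

func main() {
	reasons := []conflictReason{"node(s) had volume node affinity conflict", "node(s) unavailable due to PVC"}
	fmt.Println(joinReasons(reasons))
}
```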






[jira] [Created] (YUNIKORN-2537) cleanup UpdateAllocation in callback

2024-04-04 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2537:
---

 Summary: cleanup UpdateAllocation in callback
 Key: YUNIKORN-2537
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2537
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: shim - kubernetes
Reporter: Wilfred Spiegelenburg


UpdateAllocation needs a cleanup: {{getTask()}} already checks for the 
application. No need to retrieve the application when we process response.New. 
Sending an event should be linked to the existence of the task not of the 
application.

On top of that we have the appID already in the task so we do not need to get 
it from the app.

The same logic needs to be applied to the whole function, we already do it for 
the release.* handling.






[jira] [Created] (YUNIKORN-2533) Implement String() for TrackedResource

2024-04-04 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2533:
---

 Summary: Implement String() for TrackedResource
 Key: YUNIKORN-2533
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2533
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: core - common
Reporter: Wilfred Spiegelenburg


To fix the way TrackedResources are logged it should implement the String() 
function.
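
A sketch of what such a String() implementation could look like. The trackedResource struct below is a simplified assumption (instance type to resource name to usage); the real type carries its own locking and field names. Keys are sorted so that log output is stable.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// trackedResource is a simplified, hypothetical version of TrackedResource.
type trackedResource struct {
	usage map[string]map[string]int64 // instance type -> resource name -> usage
}

// String implements fmt.Stringer with a deterministic key order.
func (tr *trackedResource) String() string {
	if tr == nil {
		return "TrackedResource{nil}"
	}
	instances := make([]string, 0, len(tr.usage))
	for instance := range tr.usage {
		instances = append(instances, instance)
	}
	sort.Strings(instances)
	parts := make([]string, 0, len(instances))
	for _, instance := range instances {
		names := make([]string, 0, len(tr.usage[instance]))
		for name := range tr.usage[instance] {
			names = append(names, name)
		}
		sort.Strings(names)
		values := make([]string, 0, len(names))
		for _, name := range names {
			values = append(values, fmt.Sprintf("%s=%d", name, tr.usage[instance][name]))
		}
		parts = append(parts, fmt.Sprintf("%s:[%s]", instance, strings.Join(values, " ")))
	}
	return "TrackedResource{" + strings.Join(parts, ", ") + "}"
}

func main() {
	tr := &trackedResource{usage: map[string]map[string]int64{
		"m5.large": {"vcore": 2000, "memory": 1024},
	}}
	// %v and %s pick up String() automatically because *trackedResource is a fmt.Stringer.
	fmt.Printf("usage: %v\n", tr)
}
```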






[jira] [Resolved] (YUNIKORN-2527) Allow remove and re-add configured queue within cleanup time

2024-04-03 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-2527.
-
Fix Version/s: 1.6.0
   Resolution: Fixed

Queues can now be removed and added back again within a cleanup cycle

> Allow remove and re-add configured queue within cleanup time 
> -
>
> Key: YUNIKORN-2527
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2527
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - common
>    Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> When we remove a queue from the config it is marked for cleanup. If we re-add 
> the same queue in the config again before the cleanup gets executed the queue 
> still gets removed.
> reproduction:
>  * edit config map remove a queue, save
>  * immediately edit configmap add the same queue back, save
>  * wait for the cleanup to happen, queue should still exist after the fix






[jira] [Resolved] (YUNIKORN-2519) Remove bypass ACL check from placement rules

2024-04-03 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-2519.
-
Fix Version/s: 1.6.0
   Resolution: Fixed

refactor committed to master for 1.6.0

> Remove bypass ACL check from placement rules
> 
>
> Key: YUNIKORN-2519
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2519
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - scheduler
>    Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> Instead of having all rules except the recovery rule return a flag to not bypass 
> the ACL check, special-case the recovery rule to bypass the checks.
> The recovery queue is created without ACLs or quota and is always a leaf queue. 
> The only rule that can return the recovery queue is the recovery rule, which 
> is the last one in the list.
> Use all these facts to simplify the placement processing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: dev-h...@yunikorn.apache.org



[jira] [Created] (YUNIKORN-2527) Allow remove and re-add configured queue within cleanup time

2024-04-02 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2527:
---

 Summary: Allow remove and re-add configured queue within cleanup 
time 
 Key: YUNIKORN-2527
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2527
 Project: Apache YuniKorn
  Issue Type: Bug
  Components: core - common
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg


When we remove a queue from the config it is marked for cleanup. If we re-add 
the same queue to the config before the cleanup is executed, the queue 
still gets removed.

Reproduction:
 * edit the configmap, remove a queue, save
 * immediately edit the configmap again, add the same queue back, save
 * wait for the cleanup to happen
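
The expected behaviour behind the reproduction steps can be sketched in a few lines of Go. This is only an illustration under assumptions: queueTracker, applyConfig and cleanup are hypothetical names made up for this example and are not the scheduler's real types. The point is that a config update which contains a queue again should clear its cleanup mark.

```go
package main

import "fmt"

// queueTracker is a hypothetical stand-in for the scheduler's queue bookkeeping.
// The map value records whether a queue is marked for cleanup.
type queueTracker struct {
	marked map[string]bool
}

func newQueueTracker(names ...string) *queueTracker {
	qt := &queueTracker{marked: make(map[string]bool)}
	for _, n := range names {
		qt.marked[n] = false
	}
	return qt
}

// applyConfig marks queues that are missing from the config for cleanup and
// clears the mark on (or creates) every queue that is present in the config.
func (qt *queueTracker) applyConfig(configured ...string) {
	inConfig := make(map[string]bool, len(configured))
	for _, n := range configured {
		inConfig[n] = true
	}
	for n := range qt.marked {
		qt.marked[n] = !inConfig[n]
	}
	for _, n := range configured {
		if _, ok := qt.marked[n]; !ok {
			qt.marked[n] = false
		}
	}
}

// cleanup removes every queue that is still marked.
func (qt *queueTracker) cleanup() {
	for n, m := range qt.marked {
		if m {
			delete(qt.marked, n)
		}
	}
}

func (qt *queueTracker) exists(name string) bool {
	_, ok := qt.marked[name]
	return ok
}

func main() {
	qt := newQueueTracker("root.a", "root.b")
	qt.applyConfig("root.a")           // root.b removed from the config
	qt.applyConfig("root.a", "root.b") // root.b added back before the cleanup runs
	qt.cleanup()
	fmt.Println(qt.exists("root.b")) // prints true: the re-added queue survives
}
```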






[jira] [Resolved] (YUNIKORN-2498) Implement force create flag in k8shim for recovery queue

2024-04-01 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-2498.
-
Fix Version/s: 1.6.0
   Resolution: Fixed

> Implement force create flag in k8shim for recovery queue
> 
>
> Key: YUNIKORN-2498
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2498
> Project: Apache YuniKorn
>  Issue Type: Task
>  Components: shim - kubernetes
>    Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> As part of the initialisation changes a new recovery queue was added to allow 
> already running allocations to be restored even if the queue config was 
> changed. The implementation on the k8shim side needs to be added to leverage 
> the forced create flag from YUNIKORN-1887.
> Without that, the changes added for the recovery queue will not be used.






[jira] [Resolved] (YUNIKORN-2494) Revisit IsAtorAbove, WithIn, GetRemaining Guaranteed resources calculation

2024-03-28 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-2494.
-
Fix Version/s: 1.6.0
   Resolution: Fixed

Functions added to the master code, not actively used yet.

> Revisit IsAtorAbove, WithIn, GetRemaining Guaranteed resources calculation
> --
>
> Key: YUNIKORN-2494
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2494
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - common
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> These 3 methods don't expose the actual guaranteed values and only return a 
> boolean value based on the calculation. There are cases where these boolean 
> values are not correct, and there is also a need to know the actual guaranteed 
> values. For example: how much is remaining in Guaranteed? How much can be 
> preempted?






[jira] [Created] (YUNIKORN-2519) Remove bypass ACL check from placement rules

2024-03-27 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2519:
---

 Summary: Remove bypass ACL check from placement rules
 Key: YUNIKORN-2519
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2519
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: core - scheduler
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg


Instead of having every rule except the recovery rule return a flag saying that 
the ACL check must not be bypassed, special-case the recovery rule so that it 
alone bypasses the checks.

The recovery queue is created without ACLs or quota and is always a leaf queue. 
The only rule that can return the recovery queue is the recovery rule, which is 
the last one in the list.

Use all these facts to simplify the placement processing.
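
A rough Go sketch of the simplified flow, to make the idea concrete. Everything in it is an assumption made for illustration: the rule type, the place function and the allowed callback are hypothetical and do not mirror the real placement manager. The recovery queue name is the {{root.@recover@}} queue mentioned elsewhere on this list, and falling through to the next rule when the ACL check fails is also an assumption.

```go
package main

import "fmt"

// recoveryQueue is the fully qualified recovery queue name.
const recoveryQueue = "root.@recover@"

// rule is a hypothetical placement rule: it returns a queue name,
// or "" when the rule does not match the application.
type rule func(app string) string

// place walks the rules in order. Only a result equal to the recovery queue
// skips the ACL check; no rule has to return a "bypass" flag any more.
func place(app string, rules []rule, allowed func(queue, app string) bool) (string, error) {
	for _, r := range rules {
		queue := r(app)
		if queue == "" {
			continue
		}
		if queue == recoveryQueue {
			// The recovery queue has no ACLs, so there is nothing to check.
			return queue, nil
		}
		if allowed(queue, app) {
			return queue, nil
		}
	}
	return "", fmt.Errorf("application %s rejected: no rule placed it", app)
}

func main() {
	rules := []rule{
		func(app string) string { return "root.restricted" },
		func(app string) string { return recoveryQueue }, // recovery rule is last
	}
	denyAll := func(queue, app string) bool { return false }
	queue, err := place("app-1", rules, denyAll)
	fmt.Println(queue, err)
}
```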






[jira] [Created] (YUNIKORN-2518) Allow recovery queue in REST requests

2024-03-27 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2518:
---

 Summary: Allow recovery queue in REST requests
 Key: YUNIKORN-2518
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2518
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: core - common
Reporter: Wilfred Spiegelenburg


The current checks on REST requests that require a queue path to be 
provided prevent looking at the {{root.@recover@}} queue.

The validator filters the queue names, which makes it impossible to use the 
REST requests to check whether the queue has any running applications or pods 
after initialisation.






[jira] [Created] (YUNIKORN-2506) fix

2024-03-19 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2506:
---

 Summary: fix 
 Key: YUNIKORN-2506
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2506
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: webapp
Reporter: Wilfred Spiegelenburg


When running make on the web UI project a deprecation warning is printed for 
the fonts we include:
{code:java}
 WARN  deprecated fontsource-roboto@4.0.0: Package relocated. Please install 
and migrate to @fontsource/roboto. {code}
Move to {{@fontsource/roboto}} to fix the warning






[jira] [Created] (YUNIKORN-2498) Implement force create flag in k8shim for recovery queue

2024-03-19 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2498:
---

 Summary: Implement force create flag in k8shim for recovery queue
 Key: YUNIKORN-2498
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2498
 Project: Apache YuniKorn
  Issue Type: Task
  Components: shim - kubernetes
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg


As part of the initialisation changes a new recovery queue was added to allow 
already running allocations to be restored even if the queue config was changed. 
The implementation on the k8shim side needs to be added to leverage the forced 
create flag from YUNIKORN-1887.

Without that, the changes added for the recovery queue will not be used.






[jira] [Created] (YUNIKORN-2497) Update node.js to 18.19.1

2024-03-18 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2497:
---

 Summary: Update node.js to 18.19.1
 Key: YUNIKORN-2497
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2497
 Project: Apache YuniKorn
  Issue Type: Task
  Components: website
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg


Node 18.x is an LTS version. Version 18.17 has been superseded by two 
other releases, 18.18 and 18.19. Both have some CVE fixes which we should be 
including for stability.

Move the build to 18.19 (currently 18.19.1).






[jira] [Resolved] (YUNIKORN-2496) Fix security issues in website javascript

2024-03-18 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-2496.
-
Fix Version/s: 1.6.0
   Resolution: Fixed

Change committed, all dependabot alerts closed.

> Fix security issues in website javascript
> -
>
> Key: YUNIKORN-2496
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2496
> Project: Apache YuniKorn
>  Issue Type: Task
>  Components: website
>    Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> The change to pnpm triggered a large number of security alerts from 
> dependabot.
> 7 could be fixed directly by the 4 PRs opened by dependabot. 6 need manual 
> intervention.
> The change also included an upgrade of the Algolia search component to 3.x. 
> That change prevents running {{pnpm audit}}. 
> Docusaurus 3.x also contains a large number of backward incompatible changes 
> and an upgrade is planned separately. Using the Algolia 3.x dependency 
> already introduces some of these changes, so it should be reverted to Algolia 
> 2.x, the same as the rest of the Docusaurus environment.






[jira] [Created] (YUNIKORN-2496) Fix security issues in website javascript

2024-03-17 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2496:
---

 Summary: Fix security issues in website javascript
 Key: YUNIKORN-2496
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2496
 Project: Apache YuniKorn
  Issue Type: Task
  Components: website
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg


The change to pnpm triggered a large number of security alerts from dependabot.

7 could be fixed directly by the 4 PRs opened by dependabot. 6 need manual 
intervention.

The change also included an upgrade of the Algolia search component to 3.x. 
That change prevents running {{pnpm audit}}. 
Docusaurus 3.x also contains a large number of backward incompatible changes 
and an upgrade is planned separately. Using the Algolia 3.x dependency already 
introduces some of these changes, so it should be reverted to Algolia 2.x, the 
same as the rest of the Docusaurus environment.






[jira] [Resolved] (YUNIKORN-2490) Add new PMC and committer members

2024-03-14 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-2490.
-
Fix Version/s: 1.6.0
   Resolution: Fixed

Web site is updated with the new details after checks.

Deploy of the new site should take about 30 min.

> Add new PMC and committer members
> -
>
> Key: YUNIKORN-2490
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2490
> Project: Apache YuniKorn
>  Issue Type: Task
>  Components: website
>    Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> We have elected a new PMC member and some committers. Now that they have 
> accepted we should add them to the website.






Re: [ANNOUNCE] Apache YuniKorn v1.5.0 release

2024-03-14 Thread Wilfred Spiegelenburg
Thank you to TingYao for being the release manager.
And the whole community for this rather large release.

Wilfred

On Fri, 15 Mar 2024 at 05:37, 陳昱霖  wrote:
>
> Great! Thanks for Tingyao's hard work!
>
> Yu-Lin Chen




[ANNOUNCE] New committer: Yu-Lin Chen

2024-03-13 Thread Wilfred Spiegelenburg
The Project Management Committee (PMC) for Apache YuniKorn has invited
Yu-Lin Chen to become a committer and we are pleased to announce that he
has accepted.
Please join me in congratulating him.

Congratulations & Welcome aboard Yu-Lin !

Wilfred
on behalf of The Apache YuniKorn PMC




[ANNOUNCE] New committer: Kuan-Po Tseng

2024-03-13 Thread Wilfred Spiegelenburg
The Project Management Committee (PMC) for Apache YuniKorn has invited
Kuan-Po Tseng to become a committer and we are pleased to announce that he
has accepted.
Please join me in congratulating him.

Congratulations & Welcome aboard Kuan-Po !

Wilfred
on behalf of The Apache YuniKorn PMC




[jira] [Created] (YUNIKORN-2490) Add new PMC and committer members

2024-03-13 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2490:
---

 Summary: Add new PMC and committer members
 Key: YUNIKORN-2490
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2490
 Project: Apache YuniKorn
  Issue Type: Task
  Components: website
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg


We have elected a new PMC member and some committers. Now that they have 
accepted we should add them to the website.






[ANNOUNCE] New committer: WenChih (Ryan) Lo

2024-03-13 Thread Wilfred Spiegelenburg
The Project Management Committee (PMC) for Apache YuniKorn has invited
WenChih (Ryan) Lo to become a committer and we are pleased to announce that he
has accepted.
Please join me in congratulating him.

Congratulations & Welcome aboard Ryan !

Wilfred
on behalf of The Apache YuniKorn PMC




[ANNOUNCE] New PMC member: Chia-Ping Tsai

2024-03-13 Thread Wilfred Spiegelenburg
The Project Management Committee (PMC) for Apache YuniKorn has invited
Chia-Ping Tsai to become a PMC member and we are pleased to announce
that he has accepted.

On behalf of the Apache YuniKorn PMC
Wilfred




[jira] [Created] (YUNIKORN-2482) Failure to set template does not return error

2024-03-12 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2482:
---

 Summary: Failure to set template does not return error
 Key: YUNIKORN-2482
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2482
 Project: Apache YuniKorn
  Issue Type: Bug
  Components: core - scheduler
Reporter: Wilfred Spiegelenburg


Setting a template on a parent queue can fail if the template is not 
correct. The error is swallowed and success is returned, even though the update 
of the queue has not finished correctly:
*Queue.applyConf()
{code:java}
if !sq.isLeaf {
    if err = sq.setTemplate(conf.ChildTemplate); err != nil {
        return nil
    }
} {code}
We need to add tests to make sure we do not regress.
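
For illustration, a self-contained sketch of the shape a fix could take: propagate the error instead of returning nil. The types below (queue, queueConf, template) and the validation inside setTemplate are simplified placeholders invented for this example, not the real core code.

```go
package main

import (
	"errors"
	"fmt"
)

// template and queueConf are simplified, hypothetical placeholders.
type template struct{ maxApplications int }

type queueConf struct{ childTemplate *template }

type queue struct {
	isLeaf bool
	tmpl   *template
}

// setTemplate rejects an obviously invalid template (placeholder validation).
func (sq *queue) setTemplate(t *template) error {
	if t != nil && t.maxApplications < 0 {
		return errors.New("template: maxApplications must not be negative")
	}
	sq.tmpl = t
	return nil
}

// applyConf returns the setTemplate error to the caller instead of
// swallowing it, so a failed update is no longer reported as a success.
func (sq *queue) applyConf(conf queueConf) error {
	if !sq.isLeaf {
		if err := sq.setTemplate(conf.childTemplate); err != nil {
			return fmt.Errorf("applying child template: %w", err)
		}
	}
	return nil
}

func main() {
	parent := &queue{}
	err := parent.applyConf(queueConf{childTemplate: &template{maxApplications: -1}})
	fmt.Println(err)
}
```

A regression test would then assert that applyConf returns a non-nil error for a bad template on a parent queue, and nil for a valid one.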






Re: [VOTE]Release Apache YuniKorn 1.5.0 RC2

2024-03-07 Thread Wilfred Spiegelenburg
+1 (binding)

- Verified signatures and checksums
- Verified LICENSE and NOTICE files
- Verified release tarball structure
- Built release on Mac Sonoma (ARM64)
  - verified protobuf CVE change to 1.33 in the image
  - verified LICENSE and NOTICE inside the images
- Installed locally on Kind cluster (1.28)
- Ran a simple preemption triggering workload

- REST interface checks:
  - verified the SHA references in the cluster detail
  - verified the build date is set correctly
- checked REST endpoints and UI

Filed a jira against the new and undocumented REST API for specific queue [1]

Wilfred

[1] https://issues.apache.org/jira/browse/YUNIKORN-2472


On Fri, 8 Mar 2024 at 01:34, TingYao  wrote:
>
> Hello everyone,
>
> I would like to call a vote for releasing Apache YuniKorn 1.5.0 RC2.
>
> The release artefacts have been uploaded here:
>   https://dist.apache.org/repos/dist/dev/yunikorn/1.5.0-RC2/
>
> My public key is located in the KEYS file:
>   https://downloads.apache.org//yunikorn/KEYS
>
> JIRA issues that have been resolved in this release:
>   https://issues.apache.org/jira/issues/?filter=12352958
>
> This release artifact was built with go 1.21.8 to fix several CVEs.
> Compared to RC1, RC2 addresses several CVEs and memory leak issues.
> It also removes the reproducible build details from the draft release notes.
> Please read the draft release notes attached to this vote for further details.
>   https://github.com/apache/yunikorn-site/pull/405
>
> Git tags for each component are as follows:
> yunikorn-scheduler-interface: v1.5.0-1
> yunikorn-core: v1.5.0-3
> yunikorn-k8shim: v1.5.0-3
> yunikorn-web: v1.5.0-1
> yunikorn-release: v1.5.0-3
>
> Once the release is voted on and approved, all repos will be tagged
> 1.5.0 for consistency.
>
> Please review and vote. The vote will be open for at least 72 hours
> and closes on Sunday 10 March 2024, 15:00:00 UTC
>
> [ ] +1 Approve
> [ ] +0 No opinion
> [ ] -1 Disapprove (and the reason why)
>
> Thank you,
> Tingyao




[jira] [Created] (YUNIKORN-2472) REST API returns subtree by default

2024-03-07 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2472:
---

 Summary: REST API returns subtree by default
 Key: YUNIKORN-2472
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2472
 Project: Apache YuniKorn
  Issue Type: Bug
  Components: core - common
Affects Versions: 1.5.0
Reporter: Wilfred Spiegelenburg


The subtree query parameter is interpreted as the opposite of what would be 
expected.

If you call {{/ws/v1/partition/default/queue/root?subtree}} then you do not get 
the subtree. If you call {{/ws/v1/partition/default/queue/root}} you get the 
whole tree rooted at root.

We have not documented the new API yet, so before we add it to the docs we 
should fix the behaviour:
 * subtree given: return the whole tree
 * subtree missing: return one level

The code fix is as simple as a ! in a single call and inverting the test cases 
to pass or not pass {{?subtree}}.
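
As a small illustration of the intended semantics (not the actual handler code), the presence of the parameter can be checked with the standard library. The helper name wantsSubtree is made up for this example. url.Values.Has needs Go 1.17 or later and treats a bare {{?subtree}} without a value as present.

```go
package main

import (
	"fmt"
	"net/url"
)

// wantsSubtree reports whether the request URL carries the subtree query
// parameter, with or without a value.
func wantsSubtree(rawURL string) (bool, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false, err
	}
	return u.Query().Has("subtree"), nil
}

func main() {
	for _, raw := range []string{
		"/ws/v1/partition/default/queue/root?subtree",
		"/ws/v1/partition/default/queue/root",
	} {
		subtree, err := wantsSubtree(raw)
		if err != nil {
			fmt.Println("invalid URL:", err)
			continue
		}
		// true means return the whole tree, false means return one level.
		fmt.Printf("%s -> whole tree: %t\n", raw, subtree)
	}
}
```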






Re: [VOTE] Release Apache YuniKorn 1.5.0 RC1

2024-03-05 Thread Wilfred Spiegelenburg
Yes I think we need to spin a new RC: -1 for RC1

Go 1.21.8 delivers a total of 5 CVE fixes, with another CVE in the
protobuf code.
We should fix the two memory leaks discovered. Both are simple and
non-invasive fixes.

We should remove the reproducible build details from the README until
we figure out what is happening.

Wilfred

On Wed, 6 Mar 2024 at 10:15, Craig Condit  wrote:
>
> All of the below-mentioned issues have been resolved in branch-1.5.0 in 
> preparation for a possible 1.5.0-rc2. Assuming we move forward with rc2, we 
> should build with go 1.21.8 to ensure the latest fixes in the go standard 
> library are included as well.
>
> Craig
>
>
> > On Mar 5, 2024, at 3:12 PM, Craig Condit  wrote:
> >
> > -1 (binding).
> >
> > All,
> >
> > We have a few issues in rc1 that I believe we should address before 
> > shipping 1.5.0:
> >
> > CVEs:
> >
> > - CVE-2024-24783 (requires rebuild with go 1.21.8)
> > - CVE-2023-45290 (requires rebuild with go 1.21.8)
> > - CVE-2023-45289 (requires rebuild with go 1.21.8)
> > - CVE-2024-24786 (requires updates to google.golang.org/protobuf 
> > and possibly github.com/golang/protobuf)
> >
> > Broken functionality:
> >
> > - Reproducible builds (unknown why this has failed, but we will need to 
> > remove the content from the README.md that claims reproducible status)
> >
> > Critical bugs (both memory leaks):
> >
> > - https://issues.apache.org/jira/browse/YUNIKORN-2465 - Remove Task objects 
> > from the shim upon pod completion (fix merged to master and to branch-1.5)
> > - https://issues.apache.org/jira/browse/YUNIKORN-2467 - Remove 
> > AllocationAsk from the core when a pod is completed (PR available; needs 
> > review to determine if this is a 1.5 blocker).
> >
> > I think we should address each of these and cut an rc2. Thoughts?
> >
> > Craig Condit
> >
> >> On Mar 2, 2024, at 10:38 AM, TingYao  wrote:
> >>
> >> Hello everyone,
> >>
> >> I would like to call a vote for releasing Apache YuniKorn 1.5.0 RC1.
> >>
> >> The release artefacts have been uploaded here:
> >> https://dist.apache.org/repos/dist/dev/yunikorn/1.5.0-RC1
> >>
> >> My public key is located in the KEYS file:
> >> https://downloads.apache.org//yunikorn/KEYS
> >>
> >> JIRA issues that have been resolved in this release:
> >> https://issues.apache.org/jira/issues/?filter=12352958
> >>
> >> Git tags for each component are as follows:
> >> yunikorn-scheduler-interface: v1.5.0-1
> >> yunikorn-core: v1.5.0-2
> >> yunikorn-k8shim: v1.5.0-2
> >> yunikorn-web: v1.5.0-1
> >> yunikorn-release: v1.5.0-2
> >>
> >> Once the release is voted on and approved, all repos will be tagged
> >> 1.5.0 for consistency.
> >>
> >> Please review and vote. The vote will be open for at least 72 hours
> >> and closes on Wednesday 5 March 2024, 17:00:00 UTC
> >>
> >> [ ] +1 Approve
> >> [ ] +0 No opinion
> >> [ ] -1 Disapprove (and the reason why)
> >>
> >> Thank you,
> >> Tingyao
> >
>
>



Re: [VOTE] Release Apache YuniKorn 1.5.0 RC1

2024-03-04 Thread Wilfred Spiegelenburg
The binaries generated for me are really different. I copied out the
files generated during the building of the release and compared them
with files I generated based on the release. The sha-512 sums that are
part of the README in my release are again different from any that
have been shown here or in the README.md of the release artefacts.

From the release process (linux/arm64), run on my local machine:
  59811940  4 Mar 14:26 yunikorn-admission-controller
  65162249  4 Mar 14:25 yunikorn-scheduler
  83047917  4 Mar 14:26 yunikorn-scheduler-plugin
From the build after (linux/arm64):
  59811916  4 Mar 14:43 yunikorn-admission-controller
  65162321  4 Mar 14:42 yunikorn-scheduler
  83047941  4 Mar 14:42 yunikorn-scheduler-plugin

That is running on the same machine with the same go compiler. It seems
that something is fundamentally broken. We tested this a number of
times before it was committed. I am not sure why it worked back then and
does not any more.
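
For anyone who wants to repeat the comparison, a minimal Go sketch of the check: hash both binaries with SHA-512 and compare the digests. The helper names are made up for this example, and the byte slices in main are placeholders. In practice the contents would come from reading the two binaries, for example with os.ReadFile.

```go
package main

import (
	"crypto/sha512"
	"encoding/hex"
	"fmt"
)

// digest returns the SHA-512 checksum of data as a lowercase hex string,
// the same form as the sums listed in the README.
func digest(data []byte) string {
	sum := sha512.Sum512(data)
	return hex.EncodeToString(sum[:])
}

// sameBuild reports whether two binaries are byte-for-byte reproducible,
// judged by their SHA-512 digests.
func sameBuild(a, b []byte) bool {
	return digest(a) == digest(b)
}

func main() {
	// Placeholder contents standing in for the two builds of one binary.
	first := []byte("binary from the release process")
	second := []byte("binary from the rebuild")
	fmt.Println(len(digest(first)))                                 // 128 hex characters
	fmt.Println(sameBuild(first, first), sameBuild(first, second)) // true false
}
```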

Wilfred


On Mon, 4 Mar 2024 at 08:23, Craig Condit  wrote:
>
> I’m trying to validate the binaries produced using the new reproducible 
> builds feature, but I’m getting different checksums than what the README 
> indicates I should. Was the release tarball created from a fresh checkout of 
> the release repository with no uncommitted changes?
>
> README.md shows these checksums for amd64:
>
> 74646cecfb0ec1bd171ea58ee28e12466939841ac4f6a4a56b482a3d336388c4cee707eba393bd4214750860bc7d2fd3dd877097bf5cee1e495e9b8b14004bc7
> yunikorn-admission-controller
> 1508297773eb2ef7910abd39b15221b09ee6ce48f29c4d5903f42f46e65a4f583048a8483243845af4a43ceeba911d03659798e8316995bf6cd87c9fcf86f02d
> yunikorn-scheduler
> 67cdfb99f50eb271f932205bd45fa2bf4e9108e815f7d51fcd0c9cf15747eed3dcf7e2f46a1f2eb2ae7d3d43ac88330e21bbf380a5e8b19128f14707c2777f9f
> yunikorn-scheduler-plugin
> 1eaa7485480f6430cd58e85ec6fd1b4c11d1abe08c509e53e6cb6772c188dd75c5f9f2c8d79fc334d68a3b3c8260ccdf5631409897346759cb636c4098efdf94
> yunikorn-web
>
> My results:
>
> c47192a5f0b8b1afe6244b31b1fd31668c664ea8fbc9476c4678e5d2e2c2c4543908af95a960d6fbece36f1ae7ee34ebdeda56cc40989fe10fa51e56360a8c97
> yunikorn-admission-controller
> 71c4531b5d8a38c60196393d5bcbe053e5e24068c0b083dea21de93ae5891909df5d6a6ea2173c526978f23c6368555e5d079112207c1e8bed3a9ec19b69f186
> yunikorn-scheduler
> 54e60d6f9deb834e1fc33b5a065ee9f5db7a2a67374245075da889caf3182d17fe327a4036aed60fda6c9a301f488de917c17bafbb654ec806211297d6fc6ba3
> yunikorn-scheduler-plugin
> 518c70006448426eda6a533b816fc3e8251a92065009f20619d9cd1ca21e80906749d83194d12fd771a44e9f328f8aac4c8af8f76028f17a8c1570f663e25606
> yunikorn-web
>
>
> If we can’t reproduce the results, then the README content is invalid. I’ve 
> tested the release process locally by generating the release tarball and then 
> rebuilding again from the resulting tarball.
>
> Craig
>
> > On Mar 2, 2024, at 10:38 AM, TingYao  wrote:
> >
> > Hello everyone,
> >
> > I would like to call a vote for releasing Apache YuniKorn 1.5.0 RC1.
> >
> > The release artefacts have been uploaded here:
> >  https://dist.apache.org/repos/dist/dev/yunikorn/1.5.0-RC1
> >
> > My public key is located in the KEYS file:
> >  https://downloads.apache.org//yunikorn/KEYS
> >
> > JIRA issues that have been resolved in this release:
> >  https://issues.apache.org/jira/issues/?filter=12352958
> >
> > Git tags for each component are as follows:
> > yunikorn-scheduler-interface: v1.5.0-1
> > yunikorn-core: v1.5.0-2
> > yunikorn-k8shim: v1.5.0-2
> > yunikorn-web: v1.5.0-1
> > yunikorn-release: v1.5.0-2
> >
> > Once the release is voted on and approved, all repos will be tagged
> > 1.5.0 for consistency.
> >
> > Please review and vote. The vote will be open for at least 72 hours
> > and closes on Wednesday 5 March 2024, 17:00:00 UTC
> >
> > [ ] +1 Approve
> > [ ] +0 No opinion
> > [ ] -1 Disapprove (and the reason why)
> >
> > Thank you,
> > Tingyao
>
>



[jira] [Created] (YUNIKORN-2462) incorrect gang annotations in example

2024-02-27 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2462:
---

 Summary: incorrect gang annotations in example
 Key: YUNIKORN-2462
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2462
 Project: Apache YuniKorn
  Issue Type: Bug
  Components: documentation
Reporter: Wilfred Spiegelenburg


The example for turning on gang scheduling with Spark is incorrect.

[https://yunikorn.apache.org/docs/next/user_guide/gang_scheduling/#enable-gang-scheduling-for-spark-jobs]

The example shows:
{code:java}
  yunikorn.apache.org/taskGroupName: “spark-driver”
  yunikorn.apache.org/taskGroup: “
TaskGroups: [ {code}
The {{taskGroupName}} should be {{task-group-name}} and {{taskGroup}} should be 
 {{task-groups}}
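
As an illustration of the corrected names, a short Go sketch that builds the two annotations. The annotation keys are the ones named above. The taskGroup struct and its JSON field names (name, minMember, minResource) follow my reading of the gang scheduling guide and should be treated as assumptions to verify against the docs. gangAnnotations is a made-up helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// taskGroup is a reduced, assumed shape of one task group definition.
type taskGroup struct {
	Name        string            `json:"name"`
	MinMember   int32             `json:"minMember"`
	MinResource map[string]string `json:"minResource"`
}

// gangAnnotations returns the pod annotations using the corrected keys:
// task-group-name and task-groups.
func gangAnnotations(groupName string, groups []taskGroup) (map[string]string, error) {
	raw, err := json.Marshal(groups)
	if err != nil {
		return nil, err
	}
	return map[string]string{
		"yunikorn.apache.org/task-group-name": groupName,
		"yunikorn.apache.org/task-groups":     string(raw),
	}, nil
}

func main() {
	groups := []taskGroup{{
		Name:        "spark-driver",
		MinMember:   1,
		MinResource: map[string]string{"cpu": "1"},
	}}
	annotations, err := gangAnnotations("spark-driver", groups)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for key, value := range annotations {
		fmt.Printf("%s: %s\n", key, value)
	}
}
```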






[jira] [Resolved] (YUNIKORN-2456) Remove weak ciphers from TLS

2024-02-27 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-2456.
-
Fix Version/s: 1.5.0
   Resolution: Fixed

committed to master and cherry-picked into branch-1.5

resolving

> Remove weak ciphers from TLS
> 
>
> Key: YUNIKORN-2456
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2456
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: security, shim - kubernetes
>    Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.5.0
>
>
> The TLS connection for the admission controller allows ciphers that are 
> considered weak in the connection.






[jira] [Created] (YUNIKORN-2456) Remove weak ciphers from TLS

2024-02-26 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2456:
---

 Summary: Remove weak ciphers from TLS
 Key: YUNIKORN-2456
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2456
 Project: Apache YuniKorn
  Issue Type: Bug
  Components: security, shim - kubernetes
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg


The TLS connection for the admission controller allows ciphers that are 
considered weak in the connection.
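
A minimal sketch of what restricting the ciphers can look like with the Go standard library. The chosen suite list and the helper names are assumptions made for illustration, not the change that was committed. Note that CipherSuites only applies up to TLS 1.2, since Go does not allow configuring the TLS 1.3 suites.

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// strictTLSConfig returns a server config limited to TLS 1.2 or later and
// to AEAD cipher suites with forward secrecy.
func strictTLSConfig() *tls.Config {
	return &tls.Config{
		MinVersion: tls.VersionTLS12,
		CipherSuites: []uint16{
			tls.TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,
			tls.TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,
			tls.TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,
			tls.TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,
			tls.TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256,
			tls.TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256,
		},
	}
}

// usesInsecureSuite reports whether cfg lists any suite that the Go
// standard library itself flags as insecure.
func usesInsecureSuite(cfg *tls.Config) bool {
	insecure := make(map[uint16]bool)
	for _, suite := range tls.InsecureCipherSuites() {
		insecure[suite.ID] = true
	}
	for _, id := range cfg.CipherSuites {
		if insecure[id] {
			return true
		}
	}
	return false
}

func main() {
	cfg := strictTLSConfig()
	fmt.Println("insecure suites configured:", usesInsecureSuite(cfg))
}
```

A check like usesInsecureSuite could also be used in a unit test to guard against weak suites being added back later.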






Re: [DISCUSSION] Yunikorn release 1.5.0

2024-02-26 Thread Wilfred Spiegelenburg
The last change, YUNIKORN-1706, is approved. Peter will commit and backport it.
Please check the private@ list for the latest developments.

Wilfred

On Sat, 24 Feb 2024 at 18:09, TingYao  wrote:
>
> Hi Everyone,
>
> Update:
>
> We've moved some jiras to the next release, and we still have two jiras in
> progress.
> I have also created the YuniKorn 1.5 branch for all 4 repos (core, k8shim,
> interface, web). Once the blocker issues are fixed, I will start the
> cherry-picking, tagging and go mod dependency changes.
>
> Thanks,
> Tingyao
>
> On Sun, 18 Feb 2024 at 20:45, TingYao wrote:
>
> > Hi Everyone,
> >
> > I would like to start the discussion for Release 1.5.0.
> >
> > Planned major features:
> >
> > YUNIKORN-970 Change queue metrics to labeled
> > YUNIKORN-2099 [Umbrella] K8shim simplification
> > YUNIKORN-2115 [Umbrella] Application tracking history - Phase 2
> > YUNIKORN-1362 filtering nodes in UI
> > YUNIKORN-1922 display pending resources in web UI
> > YUNIKORN-2140 Web UI: resource display rework
> >
> > Additionally, minor enhancements and bug fixes have been covered as part
> > of this release.
> >
> > There are some open items with target version 1.5.0:
> >
> > https://issues.apache.org/jira/browse/YUNIKORN-2030?jql=project%20%3D%20YUNIKORN%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20%22Target%20Version%22%20%3D%201.5.0%20ORDER%20BY%20priority%20DESC
> >
> > Please review this list and decide whether it's feasible to
> > complete them before code freeze. If not, I will retarget the tickets
> > to 1.6.0.
> >
> > There are some in progress blocker or critical issues with target version
> > 1.5.0:
> >
> > YUNIKORN-2030 Need to check headroom when trying other nodes for reserved
> > allocations
> > YUNIKORN-1706 We should clean up failed apps in shim side
> > YUNIKORN-1089 Application handling with invalid task group annotations
> >
> > We hope to include those changes; otherwise we might need to postpone the
> > release.
> >
> > Here is the preliminary schedule:
> > Code freeze on 22 Feb
> > Branch on 23 Feb
> > First RC out latest by 1 March
> >
> > Based on the voting process, we can tentatively plan to release YuniKorn
> > 1.5.0 around the week of 4 - 8 March.
> >
> > Please feel free to share your thoughts.
> >
> > Thanks,
> > Tingyao
> >




[jira] [Resolved] (YUNIKORN-2042) REST API for specific queue

2024-02-26 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-2042.
-
 Fix Version/s: 1.5.0
Target Version: 1.5.0  (was: 1.6.0)
Resolution: Fixed

change committed and cherry-picked into branch 1.5

> REST API for specific queue
> ---
>
> Key: YUNIKORN-2042
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2042
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - common
>Reporter: Ted Lin
>Assignee: Ted Lin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.5.0
>
>
> Expose a REST API for specific queue:
> /ws/v1/partition/%s/queue/%s/
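
A minimal sketch of how a client might fill in this path, assuming the two %s placeholders are the partition name and the queue path, in that order. The helper name queueURL, the host and port, and the use of url.PathEscape are assumptions for illustration and are not taken from the change itself.

```go
package main

import (
	"fmt"
	"net/url"
)

// queueURL fills the two placeholders of /ws/v1/partition/%s/queue/%s/.
// Hypothetical helper: base is the address of the scheduler's REST service.
func queueURL(base, partition, queue string) string {
	return fmt.Sprintf("%s/ws/v1/partition/%s/queue/%s/",
		base, url.PathEscape(partition), url.PathEscape(queue))
}

func main() {
	// Host, port, partition and queue names are example values only.
	fmt.Println(queueURL("http://localhost:9080", "default", "root.default"))
}
```

Escaping both segments keeps a partition or queue name with unusual characters from changing the shape of the path.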



--
This message was sent by Atlassian Jira
(v8.20.10#820010)




[jira] [Resolved] (YUNIKORN-2030) Need to check headroom when trying other nodes for reserved allocations

2024-02-26 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-2030.
-
Fix Version/s: 1.5.0
   Resolution: Fixed

change committed and cherry-picked into branch-1.5

thank you for the analysis and change.

> Need to check headroom when trying other nodes for reserved allocations
> ---
>
> Key: YUNIKORN-2030
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2030
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.5.0
>
>
> As reported in YUNIKORN-1996, we are seeing many messages like below from 
> time to time:
> {code:java}
>  WARN    objects/application.go:1504 queue update failed unexpectedly 
>    {“error”: “allocation (map[memory:37580963840 pods:1 vcore:2000]) puts 
> queue ‘root.test-queue’ over maximum allocation (map[memory:3300011278336 
> vcore:390584]), current usage (map[memory:3291983380480 pods:91 
> vcore:186000])“}{code}
> Restarting YuniKorn stops it. Creating this Jira to investigate 
> why it happened, because it's not supposed to happen as we check if there is 
> enough resource headroom before calling 
>  
> {code:java}
> func (sa *Application) tryNode(node *Node, ask *AllocationAsk) *Allocation 
> {code}
> which printed the above message, and only call it when there is enough 
> headroom.
> There may be a bug in headroom checking?
>  
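
The numbers in the log message above can be checked by hand: the memory headroom is 3300011278336 - 3291983380480 = 8027897856, which is smaller than the requested 37580963840. The sketch below shows that comparison in isolation. The Resources type and the fitsIn function are illustrative names only and are not the scheduler's own headroom code.

```go
package main

import "fmt"

// Resources maps a resource name to a quantity. Illustrative type only.
type Resources map[string]int64

// fitsIn reports whether ask fits into the remaining headroom (limit - used)
// for every resource type that has a limit set.
func fitsIn(limit, used, ask Resources) bool {
	for name, maxValue := range limit {
		if used[name]+ask[name] > maxValue {
			return false
		}
	}
	return true
}

func main() {
	// Values copied from the WARN message quoted in the issue.
	limit := Resources{"memory": 3300011278336, "vcore": 390584}
	used := Resources{"memory": 3291983380480, "pods": 91, "vcore": 186000}
	ask := Resources{"memory": 37580963840, "pods": 1, "vcore": 2000}
	fmt.Println(fitsIn(limit, used, ask)) // false: the memory ask exceeds the headroom
}
```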






[jira] [Created] (YUNIKORN-2448) Expose 3rd party licenses in the web UI

2024-02-22 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2448:
---

 Summary: Expose 3rd party licenses in the web UI
 Key: YUNIKORN-2448
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2448
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: webapp
Reporter: Wilfred Spiegelenburg


We have a 3rd party license file that gets generated and included in the 
deployment for the web UI. This 3rd party license file is accessible if you 
know what its name is etc.

We should expose this detail as part of the web UI to comply with attribution 
requirements, similar to how Jira exposes it as part of its About Jira pop-up.






[jira] [Resolved] (YUNIKORN-2413) Variables that are initialisms or acronyms should have a consistent case

2024-02-22 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-2413.
-
Fix Version/s: 1.5.0
   Resolution: Fixed

Two refactors left for later: function names should be updated:

[{{{}master{}}}/pkg/events/event_ringbuffer.go#L206|https://github.com/apache/yunikorn-core/blob/master/pkg/events/event_ringbuffer.go?rgh-link-date=2024-02-19T17%3A21%3A31Z#L206]
[{{{}master{}}}/pkg/log/logger_test.go#L38|https://github.com/apache/yunikorn-core/blob/master/pkg/log/logger_test.go?rgh-link-date=2024-02-19T17%3A21%3A31Z#L38]

thank you [~priyansh] 

> Variables that are initialisms or acronyms should have a consistent case
> 
>
> Key: YUNIKORN-2413
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2413
> Project: Apache YuniKorn
>  Issue Type: Improvement
>Reporter: Ryan Lo
>Assignee: Priyansh Choudhary
>Priority: Major
>  Labels: newbie, pull-request-available
> Fix For: 1.5.0
>
>
> Discussed in YUNIKORN-2405
> We mixed up "Id" and "ID" in our code base, and it's better to standardize 
> the use of acronyms and initialisms according to [this 
> doc.|https://go.dev/wiki/CodeReviewComments#initialisms]
> An example:
> current: allocationId
> target: allocationID






[jira] [Resolved] (YUNIKORN-2115) [Umbrella] YuniKorn application traceability - phase II

2024-02-21 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-2115.
-
Fix Version/s: 1.5.0
   Resolution: Fixed

> [Umbrella] YuniKorn application traceability - phase II
> ---
>
> Key: YUNIKORN-2115
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2115
> Project: Apache YuniKorn
>  Issue Type: New Feature
>  Components: core - scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Fix For: 1.5.0
>
>
> This is a follow-up on YUNIKORN-1628.
> This ticket focuses on streaming and user/group events.






[jira] [Resolved] (YUNIKORN-2116) Track user/group events

2024-02-21 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-2116.
-
Fix Version/s: 1.5.0
   Resolution: Fixed

Core changes committed. The changes to the SI were committed last week.

Both PRs are done, closing.

> Track user/group events
> ---
>
> Key: YUNIKORN-2116
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2116
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.5.0
>
>







[jira] [Resolved] (YUNIKORN-2441) Wildcard limits are not applied to the root tracker during creation

2024-02-21 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-2441.
-
Fix Version/s: 1.5.0
   Resolution: Fixed

Change committed

> Wildcard limits are not applied to the root tracker during creation
> ---
>
> Key: YUNIKORN-2441
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2441
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.5.0
>
>
> When a queue tracker is created with {{newQueueTracker()}}, the appropriate 
> wildcard limits are applied if the tracking type is "user".
> The problem is this call:
> {noformat}
>   if trackType == user {
>   if config := m.getUserWildCardLimitsConfig(queuePath + "." + 
> queueName); config != nil {
> {noformat}
> For "root", we'll call "root." (with a dot at the end) instead of "root".
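
The trailing dot can be reproduced in isolation. The sketch below assumes the root tracker is created with the path "root" and an empty queue name, which is the only way the quoted concatenation yields "root.". The helper joinQueuePath is hypothetical and is not the committed fix.

```go
package main

import "fmt"

// naiveKey mirrors the concatenation quoted in the issue.
func naiveKey(queuePath, queueName string) string {
	return queuePath + "." + queueName
}

// joinQueuePath skips the separator when either part is empty.
// Hypothetical helper for illustration only.
func joinQueuePath(queuePath, queueName string) string {
	switch {
	case queueName == "":
		return queuePath
	case queuePath == "":
		return queueName
	default:
		return queuePath + "." + queueName
	}
}

func main() {
	fmt.Println(naiveKey("root", ""))           // root.
	fmt.Println(joinQueuePath("root", ""))      // root
	fmt.Println(joinQueuePath("root", "child")) // root.child
}
```

A lookup keyed on "root." would miss a wildcard limit stored under "root", which matches the behaviour described above.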






[jira] [Created] (YUNIKORN-2445) Add comments around locking setup in tracker code

2024-02-21 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2445:
---

 Summary: Add comments around locking setup in tracker code
 Key: YUNIKORN-2445
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2445
 Project: Apache YuniKorn
  Issue Type: Task
  Components: core - scheduler
Reporter: Wilfred Spiegelenburg


The QueueTracker code is lock free and should stay lock free. Each queue 
tracker object is always only linked to one UserTracker or GroupTracker. 
Locking is thus handled from those objects.

This does mean that calls to the user or group trackers that can modify the 
underlying queue tracker structure must take a write lock. 

This specifically impacts the {{canRunApp()}} and {{headroom()}} calls as they 
add new entries in the queue hierarchy.
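
The locking pattern described here can be sketched as follows. All type and method names (queueTracker, userTracker, touch, count) are simplified stand-ins chosen for this example and do not reflect the real tracker code. The point is only that a call which may create entries in the lock-free structure has to hold the owner's write lock.

```go
package main

import (
	"fmt"
	"sync"
)

// queueTracker holds no lock of its own; the owning tracker guards it.
type queueTracker struct {
	children map[string]*queueTracker
}

// child returns the named child and creates it when missing.
// Creating the entry mutates the tree.
func (qt *queueTracker) child(name string) *queueTracker {
	c, ok := qt.children[name]
	if !ok {
		c = &queueTracker{children: map[string]*queueTracker{}}
		qt.children[name] = c
	}
	return c
}

// userTracker owns exactly one queueTracker tree and does all the locking.
type userTracker struct {
	sync.RWMutex
	root *queueTracker
}

// touch stands in for a call that may add entries while walking the
// hierarchy, so it takes the write lock rather than the read lock.
func (ut *userTracker) touch(path ...string) {
	ut.Lock()
	defer ut.Unlock()
	qt := ut.root
	for _, name := range path {
		qt = qt.child(name)
	}
}

// count only reads, so the read lock is enough.
func (ut *userTracker) count() int {
	ut.RLock()
	defer ut.RUnlock()
	return len(ut.root.children)
}

func main() {
	ut := &userTracker{root: &queueTracker{children: map[string]*queueTracker{}}}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			ut.touch("a", "b")
		}()
	}
	wg.Wait()
	fmt.Println(ut.count())
}
```

If touch used RLock instead, concurrent callers could write to the same map at once, which is a data race in Go.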






[jira] [Created] (YUNIKORN-2440) [UMBRELLA] Remove stateaware scheduling

2024-02-21 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2440:
---

 Summary: [UMBRELLA] Remove stateaware scheduling
 Key: YUNIKORN-2440
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2440
 Project: Apache YuniKorn
  Issue Type: Task
  Components: core - scheduler
Reporter: Wilfred Spiegelenburg


Umbrella jira to track all the work to remove state aware scheduling:
* remove scheduling code
* remove documentation
* remove configuration options
* document way to achieve similar behaviour (FIFO with max applications)






[jira] [Created] (YUNIKORN-2439) Announce deprecation of state aware scheduling

2024-02-21 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2439:
---

 Summary: Announce deprecation of state aware scheduling
 Key: YUNIKORN-2439
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2439
 Project: Apache YuniKorn
  Issue Type: Task
  Components: release-notes
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg


State aware scheduling was a simple scheduling algorithm that provided a stop 
gap until gang scheduling was implemented. Gang scheduling and state aware do 
not work together. Gang scheduling is a more generic way of achieving almost 
the same behaviour.

State aware scheduling has a number of drawbacks and could be used as an attack 
vector to slow down overall scheduling performance.

We should deprecate it and remove it in an upcoming release.






[jira] [Closed] (YUNIKORN-2026) Update features document in Chinese translation

2024-02-21 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg closed YUNIKORN-2026.
---

> Update features document in Chinese translation
> ---
>
> Key: YUNIKORN-2026
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2026
> Project: Apache YuniKorn
>  Issue Type: Task
>  Components: documentation
>Reporter: JiaChi Wang
>Assignee: JiaChi Wang
>Priority: Minor
>  Labels: pull-request-available
>
> Some parts are missing in the Chinese translation of the features document.






[jira] [Closed] (YUNIKORN-1511) Adding Chinese translation of Deploy to Kubernetes

2024-02-21 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg closed YUNIKORN-1511.
---

> Adding Chinese translation of Deploy to Kubernetes
> --
>
> Key: YUNIKORN-1511
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1511
> Project: Apache YuniKorn
>  Issue Type: Task
>Reporter: Chen Yu Teng
>Assignee: Chenchen Lai
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Closed] (YUNIKORN-2220) pod.DeepCopy() is called twice in Task

2024-02-21 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg closed YUNIKORN-2220.
---

> pod.DeepCopy() is called twice in Task
> --
>
> Key: YUNIKORN-2220
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2220
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: shim - kubernetes
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Minor
>
> A small improvement is possible in {{task.go}}.
> In {{handleSubmitTaskEvent()}} and {{{}postTaskAllocated(){}}}, we call 
> {{pod.DeepCopy()}} twice to avoid possible race conditions, but a single copy 
> is enough. Once we have a copy, it's local to the method.
> {noformat}
> events.GetRecorder().Eventf(task.pod.DeepCopy(), nil, v1.EventTypeNormal, 
> "Scheduling", "Scheduling",
>   "%s is queued and waiting for allocation", task.alias)
>   // if this task belongs to a task group, that means the app has gang 
> scheduling enabled
>   // in this case, post an event to indicate the task is being gang 
> scheduled
>   if !task.placeholder && task.taskGroupName != "" {
>   events.GetRecorder().Eventf(task.pod.DeepCopy(), nil,
>   v1.EventTypeNormal, "GangScheduling", "GangScheduling",
>   "Pod belongs to the taskGroup %s, it will be scheduled 
> as a gang member", task.taskGroupName) <-- second copy if GS is used
>   }
> {noformat}
> {noformat}
> events.GetRecorder().Eventf(task.pod.DeepCopy(),
>   nil, v1.EventTypeNormal, "Scheduled", "Scheduled",
>   "Successfully assigned %s to node %s", task.alias, task.nodeName)
> ...
> events.GetRecorder().Eventf(task.pod.DeepCopy(), nil,
>   v1.EventTypeNormal, "PodBindSuccessful", "PodBindSuccessful",
>   "Pod %s is successfully bound to node %s", task.alias, task.nodeName)
> {noformat}
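
The suggested change can be sketched without the Kubernetes types. In the example below, pod and record are stand-ins for the real pod object and the event recorder, and the copy counter exists only to make the effect visible. None of these names come from task.go.

```go
package main

import "fmt"

// pod is a stand-in for the Kubernetes pod object.
// copies counts how often DeepCopy is called.
type pod struct {
	name   string
	copies *int
}

// DeepCopy returns a new pod value and records that a copy was made.
func (p *pod) DeepCopy() *pod {
	*p.copies++
	return &pod{name: p.name, copies: p.copies}
}

// record stands in for the event recorder call.
func record(p *pod, reason string) string {
	return reason + ":" + p.name
}

// postEvents copies the pod once and reuses the local copy for both events.
func postEvents(p *pod, gang bool) []string {
	podCopy := p.DeepCopy()
	events := []string{record(podCopy, "Scheduling")}
	if gang {
		events = append(events, record(podCopy, "GangScheduling"))
	}
	return events
}

func main() {
	n := 0
	p := &pod{name: "driver", copies: &n}
	fmt.Println(postEvents(p, true), n)
}
```

Because podCopy never leaves postEvents, sharing it between the two events does not reintroduce the race the copy was meant to avoid.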






[jira] [Closed] (YUNIKORN-803) Improve coverage of partition.go

2024-02-21 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg closed YUNIKORN-803.
--

> Improve coverage of partition.go
> 
>
> Key: YUNIKORN-803
> URL: https://issues.apache.org/jira/browse/YUNIKORN-803
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - scheduler
>Reporter: Chen Yu Teng
>Assignee: Cliff Su
>Priority: Minor
> Attachments: list.png, partition.go coverage.png
>
>
> Based on the coverage report, add tests to improve the coverage of 
> partition.go






[jira] [Closed] (YUNIKORN-1691) Adding Chinese translation of User Based Resource Usage Tracking

2024-02-21 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-1691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg closed YUNIKORN-1691.
---

>  Adding Chinese translation of User Based Resource Usage Tracking
> -
>
> Key: YUNIKORN-1691
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1691
> Project: Apache YuniKorn
>  Issue Type: Task
>  Components: website
>Reporter: Chen Yu Teng
>Assignee: Chenchen Lai
>Priority: Major
>







[jira] [Closed] (YUNIKORN-1692) Adding Chinese translation of User Based Resource Usage Tracking

2024-02-21 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg closed YUNIKORN-1692.
---

> Adding Chinese translation of User Based Resource Usage Tracking
> 
>
> Key: YUNIKORN-1692
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1692
> Project: Apache YuniKorn
>  Issue Type: Task
>  Components: website
>Reporter: Chen Yu Teng
>Assignee: Huang Guan Hao
>Priority: Major
>







[jira] [Closed] (YUNIKORN-2223) Eliminate separate mutex variables

2024-02-21 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg closed YUNIKORN-2223.
---

> Eliminate separate mutex variables
> --
>
> Key: YUNIKORN-2223
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2223
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: shim - kubernetes
>Reporter: Peter Bacsko
>Priority: Minor
>
> In {{{}cache.Task{}}}, the lock variable is defined as:
> {noformat}
> type Task struct {
> ...
> schedulingState TaskSchedulingState
> sm  *fsm.FSM
> lock*sync.RWMutex
> } {noformat}
> This also applies to {{cache.Application}} and {{cache.Context}}.
> In other parts of the code, we simply embed {{sync.RWMutex}}. There's no need 
> to have a separate variable. Locking and unlocking become simpler.






[jira] [Resolved] (YUNIKORN-1033) Add Chinese translation for developer guide documents

2024-02-21 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-1033.
-
Resolution: Won't Do

With the changes from YUNIKORN-2411 this is no longer relevant.

> Add Chinese translation for developer guide documents
> -
>
> Key: YUNIKORN-1033
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1033
> Project: Apache YuniKorn
>  Issue Type: Task
>  Components: documentation
>Reporter: cdmikechen
>Assignee: Chen Yu Teng
>Priority: Major
>
> Add Chinese translation for developer guide documents. This is a sub-task of 
> https://issues.apache.org/jira/browse/YUNIKORN-1029
> This issue covers the YuniKorn site developer guide documents.






[jira] [Resolved] (YUNIKORN-1691) Adding Chinese translation of User Based Resource Usage Tracking

2024-02-21 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-1691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-1691.
-
Resolution: Won't Do

With the changes from YUNIKORN-2411 this is no longer relevant.

>  Adding Chinese translation of User Based Resource Usage Tracking
> -
>
> Key: YUNIKORN-1691
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1691
> Project: Apache YuniKorn
>  Issue Type: Task
>  Components: website
>Reporter: Chen Yu Teng
>Assignee: Chenchen Lai
>Priority: Major
>







[jira] [Resolved] (YUNIKORN-1692) Adding Chinese translation of User Based Resource Usage Tracking

2024-02-21 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-1692.
-
Resolution: Won't Do

With the changes from YUNIKORN-2411 this is no longer relevant.

> Adding Chinese translation of User Based Resource Usage Tracking
> 
>
> Key: YUNIKORN-1692
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1692
> Project: Apache YuniKorn
>  Issue Type: Task
>  Components: website
>Reporter: Chen Yu Teng
>Assignee: Huang Guan Hao
>Priority: Major
>







[jira] [Resolved] (YUNIKORN-1511) Adding Chinese translation of Deploy to Kubernetes

2024-02-21 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-1511.
-
Resolution: Won't Do

With the changes from YUNIKORN-2411 this is no longer relevant.

> Adding Chinese translation of Deploy to Kubernetes
> --
>
> Key: YUNIKORN-1511
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1511
> Project: Apache YuniKorn
>  Issue Type: Task
>Reporter: Chen Yu Teng
>Assignee: Chenchen Lai
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (YUNIKORN-2026) Update features document in Chinese translation

2024-02-21 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-2026.
-
Resolution: Won't Do

With the changes from YUNIKORN-2411 this is no longer relevant.

> Update features document in Chinese translation
> ---
>
> Key: YUNIKORN-2026
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2026
> Project: Apache YuniKorn
>  Issue Type: Task
>  Components: documentation
>Reporter: JiaChi Wang
>Assignee: JiaChi Wang
>Priority: Minor
>  Labels: pull-request-available
>
> Some parts are missing in the Chinese translation of the features document.






[jira] [Resolved] (YUNIKORN-2337) Update documentation about event streaming

2024-02-21 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-2337.
-
Fix Version/s: 1.5.0
   Resolution: Fixed

New REST API end point added to the docs

> Update documentation about event streaming
> --
>
> Key: YUNIKORN-2337
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2337
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.5.0
>
>
> Update the docs about the new REST endpoint and possible config entries 
> (concurrent streaming limits).






[jira] [Resolved] (YUNIKORN-2425) Release build script should use "go mod" instead of manual replacements

2024-02-19 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-2425.
-
Fix Version/s: 1.5.0
   Resolution: Fixed

Using go mod edit instead of adding lines to go mod file.

> Release build script should use "go mod" instead of manual replacements
> ---
>
> Key: YUNIKORN-2425
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2425
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: release
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.5.0
>
>
> The tools/build-release.py script included in yunikorn-release uses manual 
> file editing to perform module replacements. This is fragile, and can fail in 
> a number of cases (including when a replace directive already exists). Go 
> provides native tooling to script this via the "go mod edit" command.
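
As an illustration of the command form, the sketch below builds the argument list for "go mod edit -replace=old=new". The release script itself is Python, so this Go helper is only a stand-in. The function name replaceArgs and the local path ../core are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// replaceArgs builds the arguments for `go mod edit -replace=old=new`.
// Hypothetical helper: module is the module path to replace and
// target is the replacement, for example a local checkout.
func replaceArgs(module, target string) []string {
	return []string{"mod", "edit", "-replace=" + module + "=" + target}
}

func main() {
	// The local path is an example value only.
	args := replaceArgs("github.com/apache/yunikorn-core", "../core")
	fmt.Println("go " + strings.Join(args, " "))
}
```

Letting the go tool rewrite go.mod avoids appending lines by hand, which is the fragile step the issue describes.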






[jira] [Created] (YUNIKORN-2407) Update review guidelines link

2024-02-11 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2407:
---

 Summary: Update review guidelines link
 Key: YUNIKORN-2407
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2407
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: website
Reporter: Wilfred Spiegelenburg


The coding guidelines link in the contribution guide points to the old 
location. Update the link to point to the wiki instead of the github page:
https://yunikorn.apache.org/community/coding_guidelines#the-basics

Point it to: https://go.dev/wiki/CodeReviewComments






Re: Should we remove Chinese documents

2024-02-08 Thread Wilfred Spiegelenburg
As a non-Chinese speaker I am leaving that decision to our Chinese
speaking community members.

Would be good to get some feedback on the following point:
If the decision is to stop: do we remove current content or let it age out?

Wilfred



On Fri, 9 Feb 2024 at 17:59, Chia-Ping Tsai  wrote:
>
> Dear all,
>
> We have a topic in slack 
> (https://yunikornworkspace.slack.com/archives/CL9CRJ1KM/p1707414644039739) to 
> discuss the future of Chinese documents.
>
> As a YK developer, I’d like to get rid of the Chinese documents because it is 
> hard to keep them up to date. Also, we ought to focus on other, more important 
> features/fixes.
>
> As a (native) Chinese speaker, I’d like to get rid of the Chinese documents 
> because the translation tools are good enough to help me read the English 
> documents. Also, there are some native Chinese-speaking developers in the YK 
> community, and hence I can have discussions with them in Chinese.
>
> As a YK committer, I’d like to see more feedback from the community before 
> making this difficult decision. Hence, PLEASE feel free to raise objections to 
> this proposal via mail or slack.
>
>
> --
> Happy Lunar New Year
> Chia-Ping Tsai (蔡嘉平)
>




[jira] [Closed] (YUNIKORN-1333) [webapp] Expose current user quota usage details

2024-02-08 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg closed YUNIKORN-1333.
---

> [webapp] Expose current user quota usage details
> 
>
> Key: YUNIKORN-1333
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1333
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: webapp
>Reporter: Manikandan R
>Priority: Major
>






