jiangzho commented on code in PR #8: URL: https://github.com/apache/spark-kubernetes-operator/pull/8#discussion_r1577383742
##########
spark-operator-api/src/main/java/org/apache/spark/kubernetes/operator/status/ApplicationAttemptSummary.java:
##########

@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ */
+
+package org.apache.spark.kubernetes.operator.status;
+
+import java.util.Map;
+
+import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
+import com.fasterxml.jackson.annotation.JsonInclude;
+import lombok.AllArgsConstructor;
+import lombok.Data;
+import lombok.EqualsAndHashCode;
+import lombok.NoArgsConstructor;
+
+@Data
+@NoArgsConstructor
+@AllArgsConstructor
+@EqualsAndHashCode(callSuper = true)
+@JsonInclude(JsonInclude.Include.NON_NULL)
+@JsonIgnoreProperties(ignoreUnknown = true)
+public class ApplicationAttemptSummary extends BaseAttemptSummary {
+  // The state transition history for given attempt
+  // This is used when state history trimming is enabled
+  protected Map<Long, ApplicationState> stateTransitionHistory;

Review Comment:
   This is for the sake of unique state identification. We attempt to assign a unique id to each `ApplicationState` that is always incrementing across multiple attempts.
   I also considered adding the state id inside `ApplicationState` instead of introducing a map of state id <-> state, but that ended up with many corner cases when trying to achieve idempotency for state transitions. The map also helps with truncating the state transition history: this history can sometimes get really long and cause big items in etcd, and the map lets us avoid iterating over the full history each time we truncate it.

##########
spark-operator-api/src/main/java/org/apache/spark/kubernetes/operator/status/ApplicationStateSummary.java:
##########

@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ */
+
+package org.apache.spark.kubernetes.operator.status;
+
+import java.util.Set;
+
+public enum ApplicationStateSummary implements BaseStateSummary {
+  /** Spark application is submitted to the cluster but yet scheduled. */
+  SUBMITTED,
+
+  /** Spark application will be restarted with same configuration */
+  SCHEDULED_TO_RESTART,
+
+  /** A request has been made to start driver pod in the cluster */
+  DRIVER_REQUESTED,
+
+  /** Driver pod has reached running state */
+  DRIVER_STARTED,
+
+  /** Spark session is initialized */

Review Comment:
   Updated the doc - yeah, this can be confusing.
   We'll name it in a way that reveals it really means the driver is ready and can be exposed via a service.

##########
spark-operator-api/src/main/java/org/apache/spark/kubernetes/operator/status/ApplicationStateSummary.java:
##########

@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ */
+
+package org.apache.spark.kubernetes.operator.status;
+
+import java.util.Set;
+
+public enum ApplicationStateSummary implements BaseStateSummary {
+  /** Spark application is submitted to the cluster but yet scheduled. */
+  SUBMITTED,
+
+  /** Spark application will be restarted with same configuration */
+  SCHEDULED_TO_RESTART,
+
+  /** A request has been made to start driver pod in the cluster */
+  DRIVER_REQUESTED,
+
+  /** Driver pod has reached running state */
+  DRIVER_STARTED,
+
+  /** Spark session is initialized */
+  DRIVER_READY,
+
+  /** Less that minimal required executor pods become ready during starting up */

Review Comment:
   Updated the description. You are absolutely right that executor pod state does not imply the executors' actual state from Spark's perspective. This is a 'best effort' from the operator's side to observe app status without modifying the core. We do have some ideas to optimize this in future versions.
   Making the operator able to detect app status by:

   * connecting to the driver to get its registered executor information (instead of watching executor pods). I may use the existing Spark UI for this purpose - users should be able to opt in to this feature if they enable pod-to-pod communication between the operator and the driver.
   * having the driver update the CRD status, possibly via a listener.

   These future enhancements may involve core / k8s module changes.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
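Editorial aside on the id-keyed `stateTransitionHistory` map discussed in the first review comment: because state ids only ever increase, an ordered map makes history truncation cheap, since the oldest entries can be dropped without scanning the whole map. Below is a minimal, hypothetical sketch of that idea; the class, record, and method names are invented for illustration and are not the PR's actual code.

```java
import java.util.TreeMap;

public class StateHistoryTruncation {

    // Stand-in for the operator's ApplicationState; only the shape matters here.
    record ApplicationState(String summary) {}

    // Keep only the most recent `maxEntries` states. Ids are assigned from a
    // monotonically increasing counter, so the smallest keys are always the
    // oldest states and can be removed from the head of the sorted map.
    static void truncate(TreeMap<Long, ApplicationState> history, int maxEntries) {
        while (history.size() > maxEntries) {
            history.pollFirstEntry(); // drops the oldest state; no full iteration
        }
    }

    public static void main(String[] args) {
        TreeMap<Long, ApplicationState> history = new TreeMap<>();
        for (long id = 0; id < 10; id++) {
            history.put(id, new ApplicationState("STATE_" + id));
        }
        truncate(history, 3);
        System.out.println(history.keySet()); // [7, 8, 9]
    }
}
```

With a plain `HashMap` the operator would have to iterate every entry to find the oldest states before trimming; a `TreeMap` keyed by the monotonically increasing id keeps entries sorted, so truncation only touches the entries actually being removed.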
