haiyangsun-db commented on code in PR #55657:
URL: https://github.com/apache/spark/pull/55657#discussion_r3236721427

##########
udf/worker/proto/src/main/protobuf/udf_protocol.proto:
##########
@@ -0,0 +1,681 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+syntax = "proto3";
+
+import "common.proto";
+
+package org.apache.spark.udf.worker;
+
+option java_package = "org.apache.spark.udf.worker";
+option java_multiple_files = true;
+
+// =====================================================================
+// Language-agnostic UDF execution protocol.
+//
+// The Spark engine acts as the gRPC client; a UDF worker (in any
+// language) acts as the gRPC server.
+// =====================================================================
+
+// The default UDF gRPC service. A worker that exposes this service
+// MUST do so over the default connection of the worker specification.
+//
+// In future, additional connections (e.g. a separate channel) may be
+// reserved by the worker spec for other purposes.
+service UdfWorker {
+  // Per-execution stream. See [[UdfControlRequest]] for the complete
+  // wire protocol and ordering invariants.
+  //
+  // Error contract: if the gRPC connection breaks at any point, gRPC
+  // surfaces an error on the stream. The engine therefore never needs

Review Comment:
   IMO, detecting hanging UDF code is out of scope for this protocol, since in theory a customer can intentionally define a UDF that sleeps for a long time. But yes, we should be able to set a timeout for that, e.g., a maximum time for processing one row/unit of data.

   A client-side cancel mechanism may not be sufficient here: if the gRPC buffer is filled with input batches and the UDF worker hangs, the cancel message would never make it to the UDF worker, since it sits behind those batches. However, we should be able to handle this with a timeout on the UDF worker side: if processing one row of data takes longer than a threshold, the worker triggers a hanging-user-code exception.

   The comments here are not well written; I will polish them.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

For additional commands, e-mail: [email protected]
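
The worker-side timeout described in the review comment could look roughly like the sketch below. It is not taken from the PR: `HangingUserCodeError`, `run_with_row_timeout`, and the `per_row_timeout_s` parameter are hypothetical names chosen for illustration, and Python is used only because it is a common UDF worker language. The sketch runs each row on a helper thread and gives up on a row once it exceeds the threshold.

```python
import concurrent.futures
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class HangingUserCodeError(RuntimeError):
    """Hypothetical error: one row exceeded the per-row time budget."""


def run_with_row_timeout(
    udf: Callable[[T], R],
    rows: Iterable[T],
    per_row_timeout_s: float,
) -> Iterator[R]:
    """Apply `udf` to each row, failing if a single row takes too long.

    Assumption: the worker enforces the limit itself, so it does not
    depend on a cancel message arriving from the engine.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        for index, row in enumerate(rows):
            future = executor.submit(udf, row)
            try:
                result = future.result(timeout=per_row_timeout_s)
            except concurrent.futures.TimeoutError:
                raise HangingUserCodeError(
                    f"row {index} exceeded {per_row_timeout_s}s"
                ) from None
            yield result
    finally:
        # Python cannot kill a running thread, so do not block on it here.
        # A real worker would more likely terminate the whole process.
        executor.shutdown(wait=False)


def slow_on_three(x: int) -> int:
    """Example UDF that stalls on one particular input."""
    if x == 3:
        time.sleep(0.5)
    return x * 2
```

Because the check runs inside the worker, it still fires when the engine's messages are stuck in a full buffer. The sketch only stops waiting on the hung call; it does not stop the call itself, which is why terminating the worker process is the more realistic follow-up.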
