Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21118#discussion_r183380926

    --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/SupportsScanUnsafeRow.java ---
    @@ -1,46 +0,0 @@
    -/*
    - * Licensed to the Apache Software Foundation (ASF) under one or more
    - * contributor license agreements. See the NOTICE file distributed with
    - * this work for additional information regarding copyright ownership.
    - * The ASF licenses this file to You under the Apache License, Version 2.0
    - * (the "License"); you may not use this file except in compliance with
    - * the License. You may obtain a copy of the License at
    - *
    - *    http://www.apache.org/licenses/LICENSE-2.0
    - *
    - * Unless required by applicable law or agreed to in writing, software
    - * distributed under the License is distributed on an "AS IS" BASIS,
    - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    - * See the License for the specific language governing permissions and
    - * limitations under the License.
    - */
    -
    -package org.apache.spark.sql.sources.v2.reader;
    -
    -import java.util.List;
    -
    -import org.apache.spark.annotation.InterfaceStability;
    -import org.apache.spark.sql.Row;
    -import org.apache.spark.sql.catalyst.expressions.UnsafeRow;
    -
    -/**
    - * A mix-in interface for {@link DataSourceReader}. Data source readers can implement this
    - * interface to output {@link UnsafeRow} directly and avoid the row copy at Spark side.
    - * This is an experimental and unstable interface, as {@link UnsafeRow} is not public and may get
    - * changed in the future Spark versions.
    - */
    -@InterfaceStability.Unstable
    -public interface SupportsScanUnsafeRow extends DataSourceReader {
    --- End diff --

I think we still need this trait. In Spark SQL there is a contract that all operators produce `UnsafeRow`, except some Dataset-related operators. We have operators that assume their input rows are `UnsafeRow` and cast them directly, e.g.
operators which use `GenerateUnsafeRowJoiner`. That is to say, the data source scan has to apply an unsafe projection to guarantee it produces unsafe rows. This is wasted work if the data source already produces unsafe rows. For file-based data sources, we solve this by adding a flag, `needsUnsafeRowConversion`, which decides whether the unsafe projection is needed. Another solution is what @rdblue proposed in the dev list discussion: do an `isInstanceOf[UnsafeRow]` check on each input row and skip the unsafe projection if the row is already an `UnsafeRow`. That may carry a performance penalty because the check happens per row. So this trait is still useful, at least for the built-in file-based data sources.
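For illustration, here is a minimal sketch of the two approaches with simplified stand-in classes. These are hypothetical stand-ins, not Spark's actual scan code: `UnsafeRowLike` plays the role of `UnsafeRow`, and `toUnsafe` plays the role of an `UnsafeProjection`.

```java
// Simplified stand-ins (not Spark's real classes) to illustrate the trade-off.
interface Row {}

final class GenericRowLike implements Row {}   // a non-unsafe row

final class UnsafeRowLike implements Row {}    // stands in for UnsafeRow

class ScanSketch {
    // Hypothetical conversion, standing in for an UnsafeProjection.
    static UnsafeRowLike toUnsafe(Row r) {
        return (r instanceof UnsafeRowLike) ? (UnsafeRowLike) r : new UnsafeRowLike();
    }

    // Option A: decide once per scan, like the needsUnsafeRowConversion flag.
    // If the source declares it already produces unsafe rows, skip conversion entirely.
    static Row project(Row r, boolean needsConversion) {
        return needsConversion ? toUnsafe(r) : r;
    }

    // Option B: check every row, as proposed on the dev list.
    // Correct without any declaration from the source, but pays a per-row check.
    static UnsafeRowLike projectChecked(Row r) {
        if (r instanceof UnsafeRowLike) {
            return (UnsafeRowLike) r;  // already unsafe: skip the projection
        }
        return toUnsafe(r);
    }
}
```

Option A moves the decision to plan time (one branch per scan), which is why a trait like `SupportsScanUnsafeRow` or a flag like `needsUnsafeRowConversion` avoids the per-row cost of Option B.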