[ https://issues.apache.org/jira/browse/SPARK-41276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17639586#comment-17639586 ]

Apache Spark commented on SPARK-41276:
--------------------------------------

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/38811

> Optimize constructor use of `StructType`
> ----------------------------------------
>
>                 Key: SPARK-41276
>                 URL: https://issues.apache.org/jira/browse/SPARK-41276
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib, SQL
>    Affects Versions: 3.4.0
>            Reporter: Yang Jie
>            Priority: Minor
>
> There are two main ways to construct `StructType`:
> - Primary constructor
> ```scala
> case class StructType(fields: Array[StructField])
> ```
> - Use `Seq` as input constructor
> ```scala
> def apply(fields: Seq[StructField]): StructType = StructType(fields.toArray)
> ```
> These two construction methods are widely used in Spark, but the latter 
> requires an additional collection conversion (`fields.toArray`).
> This PR changes the following three scenarios to use the primary 
> constructor, saving one collection conversion in each:
> 1. Where the input is a manually created `Seq`, create an `Array` 
> instead, for example:
> https://github.com/apache/spark/blob/bcf03fe3f86a7230fd977c059b73a58554370d5d/mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala#L55-L63
> 2. Where `toSeq` was added to build the input for Scala 2.13 
> compatibility, call `toArray` directly instead, for example:
> https://github.com/apache/spark/blob/bcf03fe3f86a7230fd977c059b73a58554370d5d/connector/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala#L108-L113
> 3. Where the input is already an `Array`, remove the redundant 
> `toSeq`, for example:
> https://github.com/apache/spark/blob/bcf03fe3f86a7230fd977c059b73a58554370d5d/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala#L587-L592
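
The three rewrites in the issue can be sketched with a small, self-contained stand-in. `Field` and `Schema` below are hypothetical classes that only mimic the shape described above (an `Array`-based primary constructor plus a `Seq`-based `apply` that calls `toArray`); they are not Spark's `StructField`/`StructType`, and the `seqConversions` counter exists only to make the extra conversion visible. The copy behaviour noted in the comments assumes Scala 2.13.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-ins that mimic the StructField/StructType constructor shape.
// Not Spark's code.
final case class Field(name: String, dataType: String)

// Primary constructor: stores the given array as is.
final case class Schema(fields: Array[Field])

object Schema {
  // Counts how often the Seq-based factory had to copy its input into an Array.
  var seqConversions = 0

  // Seq-based factory, analogous to `StructType.apply(fields: Seq[StructField])`.
  def apply(fields: Seq[Field]): Schema = {
    seqConversions += 1
    new Schema(fields.toArray)
  }
}

object Scenarios {
  // Scenario 1: a literal Seq is built, then the Seq factory copies it into an Array.
  def literalBefore: Schema = Schema(Seq(Field("origin", "string"), Field("height", "int")))
  // Building an Array literal resolves to the primary constructor directly.
  def literalAfter: Schema = Schema(Array(Field("origin", "string"), Field("height", "int")))

  // Scenario 2: in Scala 2.13 scala.Seq is immutable, so a mutable buffer needs
  // `.toSeq` (one copy) before the Seq factory copies it again into an Array.
  def bufferBefore(buf: ArrayBuffer[Field]): Schema = Schema(buf.toSeq)
  // Calling `.toArray` directly makes a single copy.
  def bufferAfter(buf: ArrayBuffer[Field]): Schema = Schema(buf.toArray)

  // Scenario 3: the input is already an Array; in Scala 2.13 `arr.toSeq` copies it
  // into an immutable ArraySeq, and the Seq factory copies it back into an Array.
  def arrayBefore(arr: Array[Field]): Schema = Schema(arr.toSeq)
  // Passing the Array straight to the primary constructor makes no copy.
  def arrayAfter(arr: Array[Field]): Schema = Schema(arr)

  def main(args: Array[String]): Unit = {
    val buf = ArrayBuffer(Field("origin", "string"))
    val arr = Array(Field("origin", "string"))
    val start = Schema.seqConversions

    literalAfter
    bufferAfter(buf)
    arrayAfter(arr)
    println(s"conversions via the Array path: ${Schema.seqConversions - start}")

    literalBefore
    bufferBefore(buf)
    arrayBefore(arr)
    println(s"conversions after also using the Seq path: ${Schema.seqConversions - start}")
  }
}
```

Each `...Before` method goes through the `Seq` factory and increments the counter, while each `...After` method reaches the primary constructor directly. In scenario 3 the primary constructor keeps the caller's array as is, so `arrayAfter(arr).fields eq arr` holds.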



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
