bersprockets commented on a change in pull request #23392: [SPARK-26450][SQL]
Avoid rebuilding map of schema for every column in projection
URL: https://github.com/apache/spark/pull/23392#discussion_r244423562
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala
##########
@@ -316,8 +316,10 @@ object GenerateUnsafeProjection extends
CodeGenerator[Seq[Expression], UnsafePro
protected def canonicalize(in: Seq[Expression]): Seq[Expression] =
in.map(ExpressionCanonicalizer.execute)
- protected def bind(in: Seq[Expression], inputSchema: Seq[Attribute]):
Seq[Expression] =
- in.map(BindReferences.bindReference(_, inputSchema))
+ protected def bind(in: Seq[Expression], inputSchema: Seq[Attribute]):
Seq[Expression] = {
+ lazy val inputSchemaAttrSeq: AttributeSeq = inputSchema
Review comment:
Yes, that is the reason. For example, the query <code>df.count</code>, where
df is a DataFrame read from a CSV datasource, calls
GenerateUnsafeProjection.bind with an empty list of expressions.
However, the map inside the AttributeSeq object is not built until someone
accesses exprIdToOrdinal, so maybe it is overkill.
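To illustrate the laziness being discussed, here is a minimal, self-contained sketch (not Spark's actual classes — the `Attribute`, `AttributeSeq`, and `mapBuilt` names below are simplified stand-ins) showing how a `lazy val` map is never constructed when the caller binds an empty list of expressions:

```scala
// Hypothetical simplification of the AttributeSeq pattern: the
// exprId -> ordinal map is a lazy val, so it is only built on first lookup.
object LazyBindSketch {
  final case class Attribute(exprId: Long, name: String)

  class AttributeSeq(val attrs: Seq[Attribute]) {
    var mapBuilt = false // instrumentation for this sketch only
    lazy val exprIdToOrdinal: Map[Long, Int] = {
      mapBuilt = true
      attrs.map(_.exprId).zipWithIndex.toMap
    }
    def indexOf(exprId: Long): Int = exprIdToOrdinal.getOrElse(exprId, -1)
  }

  def main(args: Array[String]): Unit = {
    val schema = new AttributeSeq(Seq(Attribute(1L, "a"), Attribute(2L, "b")))

    // Binding zero expressions never forces the lazy map.
    val bound = Seq.empty[Long].map(schema.indexOf)
    assert(bound.isEmpty && !schema.mapBuilt)

    // The first real lookup builds the map exactly once.
    assert(schema.indexOf(2L) == 1 && schema.mapBuilt)
    println("ok")
  }
}
```

Under this assumption, hoisting the `AttributeSeq` out of the per-column loop costs nothing for empty projections, since the map construction is deferred until a lookup actually happens.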
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]