Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16541#discussion_r95922004
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
    @@ -589,6 +590,171 @@ case class MapObjects private(
       }
     }
     
    +object CollectObjects {
    +  private val curId = new java.util.concurrent.atomic.AtomicInteger()
    +
    +  /**
    +   * Construct an instance of CollectObjects case class.
    +   *
    +   * @param function The function applied on the collection elements.
    +   * @param inputData An expression that when evaluated returns a collection object.
    +   * @param elementType The data type of elements in the collection.
    +   * @param collClass The type of the resulting collection.
    +   */
    +  def apply(
    +      function: Expression => Expression,
    +      inputData: Expression,
    +      elementType: DataType,
    +      collClass: Class[_]): CollectObjects = {
    +    val loopValue = "CollectObjects_loopValue" + curId.getAndIncrement()
    +    val loopIsNull = "CollectObjects_loopIsNull" + curId.getAndIncrement()
    +    val loopVar = LambdaVariable(loopValue, loopIsNull, elementType)
    +    val builderValue = "CollectObjects_builderValue" + curId.getAndIncrement()
    +    CollectObjects(loopValue, loopIsNull, elementType, function(loopVar), inputData,
    +      collClass, builderValue)
    +  }
    +}
    +
    +/**
    + * An equivalent to the [[MapObjects]] case class but returning an ObjectType containing
    + * a Scala collection constructed using the associated builder, obtained by calling
    + * `newBuilder` on the collection's companion object.
    + *
    + * @param loopValue the name of the loop variable used when iterating over the collection,
    + *                  also used as input for the `lambdaFunction`
    + * @param loopIsNull the nullability of the loop variable used when iterating over the
    + *                   collection, also used as input for the `lambdaFunction`
    + * @param loopVarDataType the data type of the loop variable used when iterating over the
    + *                        collection, also used as input for the `lambdaFunction`
    + * @param lambdaFunction A function that takes the `loopVar` as input, used as the lambda
    + *                       function to handle collection elements.
    + * @param inputData An expression that when evaluated returns a collection object.
    + * @param collClass The type of the resulting collection.
    + * @param builderValue The name of the builder variable used to construct the resulting
    + *                     collection.
    + */
    +case class CollectObjects private(
    +    loopValue: String,
    +    loopIsNull: String,
    +    loopVarDataType: DataType,
    +    lambdaFunction: Expression,
    +    inputData: Expression,
    +    collClass: Class[_],
    +    builderValue: String) extends Expression with NonSQLExpression {
    +
    +  override def nullable: Boolean = inputData.nullable
    +
    +  override def children: Seq[Expression] = lambdaFunction :: inputData :: Nil
    +
    +  override def eval(input: InternalRow): Any =
    +    throw new UnsupportedOperationException("Only code-generated evaluation is supported")
    +
    +  override def dataType: DataType = ObjectType(collClass)
    +
    +  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    +    val collObjectName = s"${collClass.getName}$$.MODULE$$"
    +    val getBuilderVar = s"$collObjectName.newBuilder()"
    --- End diff --
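
    The generated code above reaches a collection's companion object (rendered
    as, e.g., `List$.MODULE$` in generated Java) and calls `newBuilder()` on it.
    As a minimal sketch in plain Scala, outside Spark's codegen and with
    hypothetical names, the same builder pattern looks like this:

    ```scala
    import scala.collection.mutable

    // Sketch of the builder pattern the generated code relies on: obtain a
    // builder from the collection's companion object, append each transformed
    // element, then call result() to produce the final collection.
    object BuilderPatternSketch {
      def rebuildDoubled(input: Seq[Int]): List[Int] = {
        // In generated Java this builder comes from List$.MODULE$.newBuilder();
        // plain Scala can call the companion's newBuilder directly.
        val builder: mutable.Builder[Int, List[Int]] = List.newBuilder[Int]
        input.foreach(x => builder += x * 2) // the per-element lambdaFunction step
        builder.result()
      }

      def main(args: Array[String]): Unit = {
        println(rebuildDoubled(Seq(1, 2, 3))) // List(2, 4, 6)
      }
    }
    ```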
    
    Yeah, I see. Although we can't deserialize back to `Range`, serializing it into
    Spark SQL's internal format is no problem, so we can still convert the dataset
    to a dataframe. With `RowEncoder`, we can then deserialize it back to `Row`.
    That is what I do in #16546.
    
    I will check whether the `Seq` builder fallback works for that PR. Thanks.
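    
    To illustrate the fallback idea in plain Scala (a hedged sketch with a
    hypothetical helper name, not the actual Spark code path): `Range` has no
    builder that can reproduce a `Range`, so rebuilding through a generic `Seq`
    builder preserves the data while losing the `Range` type.
    
    ```scala
    // Sketch of the Seq builder fallback: round-trip a Range through a generic
    // Seq builder; the elements survive, the Range type does not.
    object SeqFallbackSketch {
      def roundTrip(input: Seq[Int]): Seq[Int] = {
        val builder = Seq.newBuilder[Int]
        input.foreach(builder += _)
        builder.result() // a generic Seq, not a Range
      }
    
      def main(args: Array[String]): Unit = {
        val range = 1 to 3 // a scala.collection.immutable.Range
        val rebuilt = roundTrip(range)
        println(rebuilt == Seq(1, 2, 3))     // true: the data survives
        println(rebuilt.isInstanceOf[Range]) // false: the Range type is lost
      }
    }
    ```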
    


