rwpenney commented on a change in pull request #30745:
URL: https://github.com/apache/spark/pull/30745#discussion_r544526828



##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Product.scala
##########
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.util.TypeUtils
+import org.apache.spark.sql.types.{ AbstractDataType, DataType, DoubleType, 
NumericType }
+
+
+@ExpressionDescription(
+  usage = "_FUNC_(expr) - Returns the product calculated from values of a 
group.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(col) FROM VALUES (2), (3), (5) AS tab(col);
+       30
+      > SELECT _FUNC_(col) FROM VALUES (NULL), (5), (7) AS tab(col);
+       35
+      > SELECT _FUNC_(col) FROM VALUES (NULL), (NULL) AS tab(col);
+       NULL
+  """,
+  group = "agg_funcs",
+  since = "3.2.0")
+case class Product(child: Expression, scale: Double = 1.0)
+    extends DeclarativeAggregate with ImplicitCastInputTypes {
+
+  override def children: Seq[Expression] = child :: Nil
+
+  override def nullable: Boolean = true
+
+  override def dataType: DataType = resultType
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(NumericType)

Review comment:
       I don't think we want to restrict the inputs to be just `DoubleType`, 
unless I'm incorrect about Spark automatically up-casting integer and float 
types when passed to this function. My thinking, from the user's perspective is 
as follows:
   * It should be easy to pass-in a list of integers without any explicit 
casting in the user's code
   * If the user knows that their result will fit within a 52-bit integer, then 
they can down-cast the output of `product` and obtain the exact integer result
   * It's likely to be commonplace for input datasets to use either floats or 
doubles, and it's an intuitive behaviour that they will silently up-cast to 
`Double` (as with `avg()`)
   * There's no easy way of *preventing* the user from doing something silly 
that causes overflows or numerical round-off errors, but a `Double` output type 
provides a good all-round compromise. I'd expect that situations where the 
output overflows will require particular knowledge of the input dataset that 
must be the user's responsibility.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to