[GitHub] [spark] LuciferYang commented on a diff in pull request #42414: [SPARK-42664][CONNECT] Support `bloomFilter` function for `DataFrameStatFunctions`

via GitHub Wed, 09 Aug 2023 20:55:11 -0700


LuciferYang commented on code in PR #42414:
URL: https://github.com/apache/spark/pull/42414#discussion_r1289514717



##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##########
@@ -1730,6 +1731,36 @@ class SparkConnectPlanner(val sessionHolder: SessionHolder) extends Logging {
         val ignoreNulls = extractBoolean(children(3), "ignoreNulls")
         Some(Lead(children.head, children(1), children(2), ignoreNulls))
 
+      case "bloom_filter_agg" if fun.getArgumentsCount == 3 =>
+        // [col, expectedNumItems: Long, numBits: Long]
+        val children = fun.getArgumentsList.asScala.map(transformExpression)
+
+        // Check expectedNumItems is LongType and value greater than 0L
+        val expectedNumItemsExpr = children(1)
+        val expectedNumItems = expectedNumItemsExpr match {

Review Comment:
   Changed to `Column.fn("bloom_filter_agg", col, lit(expectedNumItems), lit(numBits))`. The logic is indeed simpler now, and I have one point for discussion.
   
   @hvanhovell Do you think we should validate the input here? If we check it here, the error message can be exactly the same as the one from the API in `sql/core`. If we instead rely on the validation in `BloomFilterAggregate`, the error message will differ, but the code will be more concise.
   
   Perhaps we don't need to keep the error message identical to the existing one?
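
To make the trade-off concrete, here is a minimal sketch of what a planner-side check could look like. It is illustration only: `Expr`, `LongLiteral`, `Other`, and `extractPositiveLong` are hypothetical stand-ins, and the message text is made up; none of it is Spark's actual classes or error messages.

```scala
// Hypothetical stand-ins for illustration; not Spark's expression classes.
object BloomFilterArgCheck {
  sealed trait Expr
  final case class LongLiteral(value: Long) extends Expr
  final case class Other(description: String) extends Expr

  // Planner-side validation: the caller controls the exact message text,
  // at the cost of an extra pattern match per argument.
  def extractPositiveLong(expr: Expr, name: String): Long = expr match {
    case LongLiteral(v) if v > 0L => v
    case LongLiteral(v) =>
      throw new IllegalArgumentException(s"$name must be positive, got $v")
    case other =>
      throw new IllegalArgumentException(s"$name must be a long literal, got $other")
  }
}
```

Dropping a check like this and relying on the aggregate's own validation removes the match above, but the wording of the error is then decided by the aggregate rather than by the planner.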



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

