cloud-fan commented on code in PR #53548:
URL: https://github.com/apache/spark/pull/53548#discussion_r2638781376
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/kllAggregates.scala:
##########
@@ -449,6 +449,414 @@ case class KllSketchAggDouble(
   }
 }
 
+/**
+ * The KllMergeAggBigint function merges multiple Apache DataSketches KllLongsSketch instances
+ * that have been serialized to binary format. This is useful for combining sketches created
+ * in separate aggregations (e.g., from different partitions or time windows).
+ * It outputs the merged binary representation of the KllLongsSketch.
+ *
+ * See [[https://datasketches.apache.org/docs/KLL/KLLSketch.html]] for more information.
+ *
+ * @param child
+ *   child expression containing binary KllLongsSketch representations to merge
+ * @param kExpr
+ *   optional expression for the k parameter from the Apache DataSketches library that controls
+ *   the size and accuracy of the sketch. Must be a constant integer between 8 and 65535.
+ *   Default is 200 (normalized rank error ~1.65%). Larger k values provide more accurate
+ *   estimates but result in larger, slower sketches.
+ * @param mutableAggBufferOffset
+ *   offset for mutable aggregation buffer
+ * @param inputAggBufferOffset
+ *   offset for input aggregation buffer
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = """
+    _FUNC_(expr[, k]) - Merges binary KllLongsSketch representations and returns the merged sketch.
+      The input expression should contain binary sketch representations (e.g., from kll_sketch_agg_bigint).
+      The optional k parameter controls the size and accuracy of the merged sketch (default 200, range 8-65535).
+  """,
+  examples = """
+    Examples:
+      > SELECT LENGTH(kll_sketch_to_string_bigint(_FUNC_(sketch_col))) > 0 FROM (SELECT kll_sketch_agg_bigint(col) as sketch_col FROM VALUES (1), (2), (3) tab(col)) t;

Review Comment:
   What does this example mean? IIUC there is only one sketch binary and it doesn't really merge anything.
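For illustration only, here is one way the example could be reshaped so that the merge function receives more than one sketch. This is a sketch under assumptions, not the PR author's fix: the grouping column `grp` and the specific input values are hypothetical, and `_FUNC_` is kept as the `ExpressionDescription` placeholder because the registered function name does not appear in the quoted hunk. Grouping the inner aggregation by `grp` yields one serialized sketch per group, so the outer call has two inputs to merge.

```sql
-- Inner query: kll_sketch_agg_bigint runs once per value of grp (hypothetical column),
-- producing two rows, each holding one binary KllLongsSketch.
-- Outer query: _FUNC_ (placeholder for the merge aggregate under review)
-- combines those two sketches into a single merged sketch.
SELECT LENGTH(kll_sketch_to_string_bigint(_FUNC_(sketch_col))) > 0
FROM (
  SELECT kll_sketch_agg_bigint(col) AS sketch_col
  FROM VALUES (1, 1), (1, 2), (2, 3), (2, 4) tab(grp, col)
  GROUP BY grp
) t;
```

With a shape like this, the documented example would exercise the merge path across distinct sketches rather than passing a single sketch through unchanged.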
