dtenedor commented on code in PR #53548:
URL: https://github.com/apache/spark/pull/53548#discussion_r2673460353
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/kllAggregates.scala:
##########
@@ -449,6 +449,414 @@ case class KllSketchAggDouble(
   }
 }
+/**
+ * The KllMergeAggBigint function merges multiple Apache DataSketches KllLongsSketch instances
+ * that have been serialized to binary format. This is useful for combining sketches created
+ * in separate aggregations (e.g., from different partitions or time windows).
+ * It outputs the merged binary representation of the KllLongsSketch.
+ *
+ * See [[https://datasketches.apache.org/docs/KLL/KLLSketch.html]] for more information.
+ *
+ * @param child
+ * child expression containing binary KllLongsSketch representations to merge
+ * @param kExpr
+ * optional expression for the k parameter from the Apache DataSketches library that controls

Review Comment:
   Sure, this sounds good. Let's go with your suggestion: when k is not specified as a parameter, use the k value from the first accumulated sketch.
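To make the agreed k-resolution rule concrete, here is a minimal sketch of how an aggregation buffer might pick its k: an explicit value (standing in for an evaluated kExpr) wins; otherwise the buffer adopts the k of the first sketch it accumulates. `SketchStub`, `MergeBuffer`, and `explicitK` are hypothetical names, and the "merge" just concatenates items. This is not the DataSketches KllLongsSketch API and not the PR's implementation.

```scala
// Hypothetical stand-in for a deserialized KLL sketch: only the fields needed
// to illustrate k resolution. NOT the Apache DataSketches API.
final case class SketchStub(k: Int, items: Vector[Long])

// Hypothetical merge buffer. `explicitK` plays the role of an evaluated kExpr;
// None means the parameter was not specified.
final class MergeBuffer(explicitK: Option[Int]) {
  private var k: Option[Int] = explicitK
  private var items: Vector[Long] = Vector.empty

  // Accumulate one input sketch. If no k was specified, the first
  // accumulated sketch decides k; later sketches do not change it.
  def update(input: SketchStub): Unit = {
    if (k.isEmpty) k = Some(input.k)
    items = items ++ input.items // stand-in for a real sketch merge
  }

  def resolvedK: Option[Int] = k

  // None when no k was given and nothing was accumulated.
  def result: Option[SketchStub] = k.map(SketchStub(_, items))
}

object MergeKDemo {
  def main(args: Array[String]): Unit = {
    val buf = new MergeBuffer(explicitK = None)
    buf.update(SketchStub(k = 400, items = Vector(1L, 2L)))
    buf.update(SketchStub(k = 200, items = Vector(3L)))
    println(buf.resolvedK) // prints "Some(400)"
  }
}
```

One design point to keep in mind: in a distributed aggregation, which sketch arrives "first" can depend on partition and task ordering. So when input sketches were built with different k values and no k is specified, the resolved k may not be the same from run to run.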
