cboumalh commented on PR #51298:
URL: https://github.com/apache/spark/pull/51298#issuecomment-3146708401

   Hi @HyukjinKwon @cloud-fan @gengliangwang — hope you're doing well.
   
   I wanted to resurface this PR (#51298), which adds Theta Sketch support to 
Spark SQL. It extends the existing HyperLogLog functionality by enabling set 
operations like intersection and difference, with full SQL and Python API 
support.
   
   We’ve been using this heavily at Amazon in production Spark pipelines for 
scalable set analytics (like segmentation and churn). The implementation 
includes tests and benchmarks. I proposed the idea on the dev mailing list, and 
it received positive feedback from the original HLL contributors and the 
DataSketches founder.
   
   I noticed this was targeted to Spark 4.1.0 in JIRA — just wanted to check in 
and see if there’s anything I can do to help move it forward or address 
concerns. Happy to walk through any part of the code or design if that would be 
helpful — just let me know. Thanks again for your time and for maintaining the 
project!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

