Do you mean you want to call pyarrow.compute.tdigest on different inputs over time, and continuously merge the results into one tdigest?
pyarrow.compute.tdigest (the Python wrapper of the C++ kernel) is an aggregate kernel: it consumes an input array and outputs the requested quantiles. It is not designed to return the internal tdigest structure (and how would one make use of the tdigest structure?).

The C++ tdigest utility (not the kernel) does support merging tdigests [1]. Is it possible for you to use the tdigest utility directly?

[1] https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/tdigest.h#L79

Yibo

From: [email protected] <[email protected]>
Sent: Monday, March 21, 2022 10:06 PM
To: [email protected]
Subject: [Python] pyarrow.compute.tdigest return class

Hello everyone,

Is there any way for the pyarrow.compute.tdigest function to return a TDigest structure in such a way that it can be merged?

I have a use case where I would like to store time series percentile distributions. The pyarrow function tdigest is very fast, but its output is plain numbers, and these cannot be aggregated. I have tried using TDigest (https://github.com/CamDavidsonPilon/tdigest), but it is very slow.

Thank you very much.
