Having the option of keeping the t-digest separate could be useful. For instance, Google's SQL dialect in BigQuery lets you track some sketch data structures (such as HyperLogLog++ sketches) separately and merge them later [1].
[1] https://cloud.google.com/bigquery/docs/reference/standard-sql/hll_functions

On Mon, Mar 21, 2022 at 7:42 PM Yibo Cai <[email protected]> wrote:

> Do you mean you want to call pyarrow.compute.tdigest on different inputs
> over time, and continuously merge the results into one tdigest?
>
> pyarrow.compute.tdigest (the Python wrapper of the C++ kernel) is an aggregate
> kernel that consumes an input array and outputs the requested quantiles. It's
> not suitable for returning the internal tdigest structure (and how would one
> make use of the tdigest structure?).
>
> The C++ tdigest utility (not the kernel) does support merging tdigests. [1]
> Is it possible to use the tdigest utility directly?
>
> [1]
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/tdigest.h#L79
>
> Yibo
>
> *From:* [email protected] <[email protected]>
> *Sent:* Monday, March 21, 2022 10:06 PM
> *To:* [email protected]
> *Subject:* [Python] pyarrow.compute.tdigest return class
>
> Hello everyone,
>
> Is there any way for the pyarrow.compute.tdigest function to return a
> TDigest structure in such a way that it can be merged?
>
> I have a use case where I would like to store time series percentile
> distributions. The pyarrow function tdigest is very fast, but its output is
> plain numbers, and these cannot be aggregated.
>
> I have tried using TDigest (https://github.com/CamDavidsonPilon/tdigest),
> but it is very slow.
>
> Thank you very much.
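To make the distinction in this thread concrete, here is a minimal sketch of why a digest, unlike the quantile values it produces, can be merged. It assumes a simplified merging t-digest using the k1 scale function from Dunning's t-digest paper; `MiniTDigest`, `add_batch`, `merge`, and `quantile` are hypothetical names chosen for illustration, not the Arrow C++ `TDigest` utility and not a pyarrow API. Each digest keeps only sorted (mean, weight) centroids, so digests built over different time windows can be combined by concatenating their centroids and re-compressing.

```python
import math
from dataclasses import dataclass, field


@dataclass
class MiniTDigest:
    """Hypothetical, simplified merging t-digest (illustration only, not Arrow's TDigest)."""

    delta: float = 100.0  # compression parameter: larger keeps more centroids
    centroids: list = field(default_factory=list)  # sorted (mean, weight) pairs

    def _k(self, q):
        # k1 scale function: delta / (2*pi) * asin(2q - 1); clamp guards rounding.
        q = min(1.0, max(0.0, q))
        return self.delta / (2 * math.pi) * math.asin(2 * q - 1)

    def _compress(self, points):
        # Greedily fold sorted (mean, weight) pairs into centroids whose k-size
        # stays <= 1, which keeps centroids small near the tails (q ~ 0 or 1).
        if not points:
            return []
        points = sorted(points)
        total = sum(w for _, w in points)
        out = []
        weight_before = 0.0  # total weight of centroids already emitted
        cur_mean, cur_w = points[0]
        for mean, w in points[1:]:
            q_lo = weight_before / total
            q_hi = (weight_before + cur_w + w) / total
            if self._k(q_hi) - self._k(q_lo) <= 1.0:
                # Absorb the next point into the current centroid (weighted mean).
                cur_mean = (cur_mean * cur_w + mean * w) / (cur_w + w)
                cur_w += w
            else:
                out.append((cur_mean, cur_w))
                weight_before += cur_w
                cur_mean, cur_w = mean, w
        out.append((cur_mean, cur_w))
        return out

    def add_batch(self, values):
        # Each raw value starts as a singleton centroid of weight 1.
        new = [(float(v), 1.0) for v in values]
        self.centroids = self._compress(self.centroids + new)

    def merge(self, other):
        # Merging needs only the two centroid lists, not the raw data.
        merged = MiniTDigest(delta=self.delta)
        merged.centroids = self._compress(self.centroids + other.centroids)
        return merged

    def quantile(self, q):
        # Linear interpolation between centroid centers by cumulative weight.
        if not self.centroids:
            raise ValueError("empty digest")
        total = sum(w for _, w in self.centroids)
        target = q * total
        cum = 0.0
        prev_center = prev_mean = None
        for mean, w in self.centroids:
            center = cum + w / 2
            if target <= center:
                if prev_center is None:
                    return mean
                frac = (target - prev_center) / (center - prev_center)
                return prev_mean + frac * (mean - prev_mean)
            prev_center, prev_mean = center, mean
            cum += w
        return self.centroids[-1][0]
```

For example, digesting `range(0, 5000)` and `range(5000, 10000)` separately and calling `a.merge(b).quantile(0.5)` gives a median close to that of a single digest over `range(10000)`, while each digest stores a small number of centroids rather than every value. The single-pass greedy compression keeps the sketch short and is not tuned for accuracy or speed.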
