GitHub user MLnick commented on the pull request:
https://github.com/apache/spark/pull/8362#issuecomment-140019207
@hvanhovell as discussed on the `dev` mailing list, perhaps it would be
interesting to allow the return type to include the aggregated HLL registers.
This could be (for example) in the form of a `StructType`
`{'cardinality': Long, 'hll': Array[Byte]}`, where `hll` is in the same
serialized form that can be used to instantiate, say, a `StreamLib` or
`Algebird` HLL class for use outside of Spark.
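
To make the shape of that return value concrete, here is a minimal sketch in plain Python (not Spark code). `TinyHLL` and `approx_distinct_with_hll` are hypothetical names, and the one-byte-per-register layout is made up for illustration: it is not the `StreamLib` or `Algebird` serialization format, which would have to be matched explicitly.

```python
import hashlib
import math


class TinyHLL:
    """Toy HyperLogLog with one byte per register (illustrative layout only)."""

    def __init__(self, p=12, registers=None):
        self.p = p                      # number of index bits
        self.m = 1 << p                 # number of registers
        # Restore from serialized bytes if given, else start empty.
        self.registers = (bytearray(registers) if registers is not None
                          else bytearray(self.m))

    def add(self, value):
        # 64-bit hash taken from SHA-1 (an arbitrary choice for this sketch).
        h = int.from_bytes(hashlib.sha1(str(value).encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                    # top p bits pick the register
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1  # leading zeros + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def cardinality(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)       # bias constant, valid for m >= 128
        est = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if est <= 2.5 * self.m and zeros:           # small-range correction
            est = self.m * math.log(self.m / zeros)
        return int(round(est))


def approx_distinct_with_hll(values, p=12):
    """Return both the estimate and the raw registers, like the proposed struct."""
    hll = TinyHLL(p)
    for v in values:
        hll.add(v)
    return {"cardinality": hll.cardinality(), "hll": bytes(hll.registers)}
```

A consumer outside the aggregation could then rebuild the sketch from the bytes, e.g. `TinyHLL(p=12, registers=result["hll"])`, and merge it with other sketches by taking the register-wise maximum.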
Is it possible to specify input arguments for `rsd`? So `SELECT APPROX
DISTINCT(column, 0.1) FROM ...`? If so, then another option is to add a further
argument such as `returnHLL: Boolean = false` so that either the raw HLL or the
cardinality is returned?
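
As a rough illustration of what those two arguments could control, the sketch below (plain Python, hypothetical names) maps `rsd` to a register count using the textbook HyperLogLog standard error of about `1.04 / sqrt(m)`, and switches the result on a `return_hll` flag. The constant and the rounding are assumptions; the implementation in this PR may size its registers differently.

```python
import math


def precision_for_rsd(rsd):
    """Smallest p (index bits) such that 1.04 / sqrt(2**p) <= rsd."""
    m = (1.04 / rsd) ** 2               # registers needed for the target error
    return max(4, math.ceil(math.log2(m)))


def approx_distinct_result(cardinality, registers, return_hll=False):
    """Return the raw registers or the estimate, depending on the flag."""
    return bytes(registers) if return_hll else int(cardinality)
```

For example, `precision_for_rsd(0.1)` gives 7 index bits, i.e. 128 registers. Note that a result type which depends on a flag would presumably require the flag to be a constant, so that the output type is known before execution.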