Makes sense. Thanks!
On 12/15/21 21:36, Jungtaek Lim wrote:
> If ASF wants to do it, INFRA could probably deal with it for all
> projects, like the ASF code of conduct that recently started showing up
> on the right side of all ASF GitHub repos.
If ASF wants to do it, INFRA could probably deal with it for all projects,
like the ASF code of conduct that recently started showing up on the right
side of all ASF GitHub repos.
On Wed, Dec 15, 2021 at 11:49 PM Sean Owen wrote:
> It might imply that this is a way to fund Spark alone, and it isn't.
>
Thanks for the suggestions. I suppose I should share a bit more about what
I tried/learned, so others who come later can understand why a
memory-efficient, exact median is not in Spark.
Spark's own ApproximatePercentile also uses QuantileSummaries internally.
Nicholas,
This may or may not be much help, but in RasterFrames we have an
approximate quantiles Expression computed against Tiles (2d geospatial
arrays) which makes use of
`org.apache.spark.sql.catalyst.util.QuantileSummaries` to do the hard work.
So perhaps a directionally correct example of
Parquet and ORC already have the necessary stats to make this fast too, but
that only helps if you want the median of data that is sorted as stored on
disk, rather than the general case. I'm not sure you can do better than
roughly what a sort entails if you want the exact median.
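To make the sorted-on-disk case concrete, here is a minimal plain-Python sketch (not Spark or Parquet code). It assumes the data is globally sorted and split into chunks whose row counts are known from metadata, so only the chunk(s) holding the middle rows are read. The names `median_from_sorted_chunks`, `row_counts`, and `read_chunk` are hypothetical and only for illustration.

```python
from bisect import bisect_right
from itertools import accumulate


def median_from_sorted_chunks(row_counts, read_chunk):
    """Exact median of globally sorted data stored as consecutive chunks.

    row_counts: rows per chunk, assumed to come from file metadata.
    read_chunk: hypothetical callback returning the sorted values of chunk i.
    """
    total = sum(row_counts)
    if total == 0:
        raise ValueError("median of empty input is undefined")

    # Cumulative row counts: ends[i] is the number of rows in chunks 0..i.
    ends = list(accumulate(row_counts))

    def value_at(k):
        # First chunk whose cumulative end exceeds the global row index k.
        i = bisect_right(ends, k)
        start = ends[i] - row_counts[i]
        return read_chunk(i)[k - start]

    mid = total // 2
    if total % 2 == 1:
        return value_at(mid)
    # Even row count: average the two middle values.
    return (value_at(mid - 1) + value_at(mid)) / 2
```

If the data is not already sorted this shortcut does not apply, and you are back to something like a full sort (or repeated passes) to get an exact answer.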
On Wed, Dec 15, 2021, 8:56 AM Pol
Correct me if I am wrong, but if the dataset were indexed by the given
column, you could get the median without reading the whole dataset,
shuffling, and so on. Disclaimer: I work at Qbeast. So the issue is more
about the data format and the possibility of pushing the operation down to
the data source.
It might imply that this is a way to fund Spark alone, and it isn't.
Probably no big deal either way but maybe not worth it. It won't be a
mystery how to find and fund the ASF for the few orgs that want to, as
compared to a small project.
On Wed, Dec 15, 2021, 8:34 AM Maciej wrote:
Hi All,
Just wondering ‒ would it make sense to add .github/FUNDING.yml with a
custom link pointing to one (or both) of these:
* https://www.apache.org/foundation/sponsorship.html
* https://www.apache.org/foundation/contributing.html
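For reference, a minimal sketch of what such a file could look like, assuming GitHub's `custom` key is used and both links are listed. This is only an illustration of the idea, not an agreed proposal.

```yaml
# .github/FUNDING.yml
# Sketch only: points the repository's sponsor button at the ASF pages above.
custom:
  - https://www.apache.org/foundation/sponsorship.html
  - https://www.apache.org/foundation/contributing.html
```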
--
Best regards,
Maciej Szymkiewicz