date:20211215

Re: [MISC] Should we add .github/FUNDING.yml

2021-12-15 Thread Maciej

Makes sense. Thanks!

On 12/15/21 21:36, Jungtaek Lim wrote:
> If ASF wants to do it, INFRA could probably deal with it for entire
> projects, like ASF code of conduct being exposed to the right side of
> the all ASF github repos recently.
>
> On Wed, Dec 15, 2021 at 11:49 PM Sean Owen wrote:

Re: [MISC] Should we add .github/FUNDING.yml

2021-12-15 Thread Jungtaek Lim

If ASF wants to do it, INFRA could probably handle it for all projects at once, as with the ASF code of conduct that recently started appearing on the right-hand side of all ASF GitHub repos.

On Wed, Dec 15, 2021 at 11:49 PM Sean Owen wrote:
> It might imply that this is a way to fund Spark alone, and it isn't.

Re: Creating a memory-efficient AggregateFunction to calculate Median

2021-12-15 Thread Nicholas Chammas

Thanks for the suggestions. I suppose I should share a bit more about what I tried/learned, so others who come later can understand why a memory-efficient, exact median is not in Spark. Spark's own ApproximatePercentile also uses QuantileSummaries internally

Re: Creating a memory-efficient AggregateFunction to calculate Median

2021-12-15 Thread Fitch, Simeon

Nicholas, This may or may not be much help, but in RasterFrames we have an approximate quantiles Expression computed against Tiles (2d geospatial arrays) which makes use of `org.apache.spark.sql.catalyst.util.QuantileSummaries` to do the hard work. So perhaps a directionally correct example of

Re: Creating a memory-efficient AggregateFunction to calculate Median

2021-12-15 Thread Sean Owen

Parquet and ORC already have the necessary stats to make this fast too, but that only helps if you want the median of sorted data as stored on disk, rather than the general case. Not sure you can do better than roughly what a sort entails if you want the exact median.

On Wed, Dec 15, 2021, 8:56 AM Pol

Re: Creating a memory-efficient AggregateFunction to calculate Median

2021-12-15 Thread Pol Santamaria

Correct me if I am wrong, but if the dataset were indexed by the given column, you could get the median without reading the whole dataset, shuffling, and so on. (Disclaimer: I work at Qbeast.) So the issue is more about the data format and the possibility of pushing the operation down to the data source.
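The trade-off discussed in this thread can be illustrated with a minimal, Spark-free sketch. It is not code from any of the messages, and the function names `exact_median_by_sort` and `median_of_presorted` are hypothetical. The first function shows why an exact median over unordered data costs roughly a sort; the second shows that, assuming the data is already ordered by the column and its length is known, only the middle element(s) need to be read.

```python
def exact_median_by_sort(values):
    """Exact median of unordered data: costs a full sort, O(n log n)."""
    ordered = sorted(values)
    return median_of_presorted(ordered)


def median_of_presorted(ordered):
    """Exact median of data already sorted by the column of interest.

    Assumes random access and a known length, so only the middle
    one or two elements are read.
    """
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty sequence is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # Even count: average the two middle elements.
    return (ordered[mid - 1] + ordered[mid]) / 2


if __name__ == "__main__":
    print(exact_median_by_sort([5, 1, 3]))     # 3
    print(median_of_presorted([1, 2, 3, 4]))   # 2.5
```

In a distributed setting the sort implies a shuffle, which is the cost the thread is trying to avoid. The pre-sorted case corresponds to the indexed or sorted-on-disk situation mentioned above.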

Re: [MISC] Should we add .github/FUNDING.yml

2021-12-15 Thread Sean Owen

It might imply that this is a way to fund Spark alone, and it isn't. Probably no big deal either way, but maybe not worth it. It won't be a mystery how to find and fund the ASF for the few orgs that want to, as compared to a small project.

On Wed, Dec 15, 2021, 8:34 AM Maciej wrote:
> Hi All,

[MISC] Should we add .github/FUNDING.yml

2021-12-15 Thread Maciej

Hi All,

Just wondering: would it make sense to add .github/FUNDING.yml with a custom link pointing to one (or both) of these:

* https://www.apache.org/foundation/sponsorship.html
* https://www.apache.org/foundation/contributing.html

--
Best regards,
Maciej Szymkiewicz

Web:
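For reference, a minimal sketch of what such a file might look like, assuming GitHub's `custom` key for sponsor links and using both URLs from the message. This is an illustration only, not a file that exists in the Spark repository.

```yaml
# .github/FUNDING.yml (hypothetical contents for the proposal above)
custom:
  - "https://www.apache.org/foundation/sponsorship.html"
  - "https://www.apache.org/foundation/contributing.html"
```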

8 matches
