Re: Creating a memory-efficient AggregateFunction to calculate Median

2021-12-15 Thread Fitch, Simeon
Nicholas, This may or may not be much help, but in RasterFrames we have an approximate quantiles Expression, computed against Tiles (2D geospatial arrays), which makes use of `org.apache.spark.sql.catalyst.util.QuantileSummaries` to do the hard work. So perhaps a directionally correct example of
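The excerpt above is cut off before any code. Here is a minimal sketch of the same idea, assuming Spark 3.x (where `QuantileSummaries.query` returns an `Option[Double]`). It wraps `QuantileSummaries` in a typed `Aggregator`, so each partial buffer holds a summary whose size is bounded by the compress threshold and relative error instead of by the number of rows. `QuantileSummaries` lives in Catalyst's internal `util` package, so it is not a public API and may change between releases. The object name `ApproxMedian` and the use of Java serialization for the buffer encoder are illustrative assumptions. This is not RasterFrames' actual implementation.

```scala
import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.catalyst.util.QuantileSummaries
import org.apache.spark.sql.expressions.Aggregator

// Hypothetical approximate-median aggregator (sketch, assumes Spark 3.x).
// The buffer is a QuantileSummaries sketch, whose size is bounded by its
// compress threshold and relative error rather than by the row count.
object ApproxMedian extends Aggregator[Double, QuantileSummaries, Double] {

  // Empty summary using Spark's default tuning constants.
  override def zero: QuantileSummaries =
    new QuantileSummaries(
      QuantileSummaries.defaultCompressThreshold,
      QuantileSummaries.defaultRelativeError)

  // insert() may return a new instance, so always keep its result.
  override def reduce(buf: QuantileSummaries, x: Double): QuantileSummaries =
    buf.insert(x)

  // merge() requires both sides to have their head buffers flushed,
  // which is what compress() does.
  override def merge(a: QuantileSummaries, b: QuantileSummaries): QuantileSummaries =
    a.compress().merge(b.compress())

  // query() also requires a compressed summary; it returns None when no
  // values were seen, which is mapped to NaN here.
  override def finish(buf: QuantileSummaries): Double =
    buf.compress().query(0.5).getOrElse(Double.NaN)

  // QuantileSummaries is Serializable, so Java serialization is used for
  // simplicity (an assumption; Kryo is another option).
  override def bufferEncoder: Encoder[QuantileSummaries] =
    Encoders.javaSerialization[QuantileSummaries]

  override def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}
```

Since Spark 3.0 an `Aggregator` like this can be registered for SQL or DataFrame use via `functions.udaf`, e.g. `spark.udf.register("approx_median", udaf(ApproxMedian))`. For a plain median over a numeric column, Spark's built-in `percentile_approx` is backed by the same summary class and may be all that is needed.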

Re: [SPARK-20384][SQL] Support value class in schema of Dataset (third time's a charm)

2021-08-03 Thread Fitch, Simeon
Emil, We too are interested in this work. Thank you for resurrecting it. I hope the Spark committers work to incorporate it. Regards, Simeon On Mon, Aug 2, 2021 at 9:51 AM Emil Ejbyfeldt wrote: > Hi dev, > > After looking into the details of this and discussing with the other > authors that

Help Migrating BaseRelation to Spark 3.x

2021-06-03 Thread Fitch, Simeon
Hi, I'm the tech lead on RasterFrames, which adds geospatial raster data capability to Apache Spark SQL. We are trying to migrate to Spark 3.x, and are struggling with getting our various DataSources to work, and wondered if someone might share some tips on what might be going on. Most of our issues

Re: Public API access to UDTs

2021-02-01 Thread Fitch, Simeon
, seems pretty small as a change? >> >> On Thu, Jan 28, 2021 at 5:10 PM Fitch, Simeon wrote: >> >>> Hi, >>> >>> First time posting here, so apologies if I need to be directing this >>> topic elsewhere. >>> >>> I'm the author of

Re: Public API access to UDTs

2021-01-29 Thread Fitch, Simeon
75-L76 > Just making it public for developers, even with a 'use at your own risk' > warning, seems pretty small as a change? > > On Thu, Jan 28, 2021 at 5:10 PM Fitch, Simeon wrote: > >> Hi, >> >> First time posting here, so apologies if I need to be directing this

Public API access to UDTs

2021-01-28 Thread Fitch, Simeon
Hi, First time posting here, so apologies if I need to be directing this topic elsewhere. I'm the author of RasterFrames, and a contributor to GeoMesa's Spark SQL module. Both make use of fairly low-level Catalyst constructs, including custom UDTs; RasterFrames introduces a geospatial raster