Re: DataFrame joins much slower than SpatialRDD joins

2021-04-14, Jia Yu
Hi folks, some thoughts regarding this. 1. In 1.0.0, the DataFrame geometry serializer was changed from Sedona's original SHAPE serializer to a WKB serializer [1]. In our tests, the old serializer was several times faster than the WKB serializer [2]. We are working on a PR to support both SHAPE serializ
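For readers unfamiliar with WKB (Well-Known Binary), here is what encoding and decoding a single 2D point involves: a byte-order flag, a geometry-type code, then the coordinates. This is a standalone sketch using only Python's `struct` module. It is not Sedona's serializer, it does not model the SHAPE format, and it does not try to explain the measured speed gap. The function names are hypothetical.

```python
import struct
from typing import Tuple

# WKB geometry type code for Point (OGC Simple Features).
WKB_POINT = 1


def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB (hypothetical helper)."""
    # '<' = little-endian, no padding: 1-byte order flag (1 = little-endian),
    # uint32 geometry type, then two float64 coordinates.
    return struct.pack("<BIdd", 1, WKB_POINT, x, y)


def wkb_to_point(buf: bytes) -> Tuple[float, float]:
    """Decode a 2D WKB point, honoring the byte-order flag in the first byte."""
    endian = "<" if buf[0] == 1 else ">"
    (geom_type,) = struct.unpack_from(endian + "I", buf, 1)
    if geom_type != WKB_POINT:
        raise ValueError(f"expected Point (1), got type {geom_type}")
    # Coordinates start after the 1-byte flag and the 4-byte type code.
    x, y = struct.unpack_from(endian + "dd", buf, 5)
    return x, y
```

A point encodes to 21 bytes (1 + 4 + 8 + 8), and `wkb_to_point(point_to_wkb(1.5, -2.0))` returns `(1.5, -2.0)`. Every geometry carries this header, which has to be read before its coordinates.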

Re: DataFrame joins much slower than SpatialRDD joins

2021-04-14, Adam Binford
Are you using the 1.0.0 release? If so, there's a bug that prevents spatial indexing from being used in SQL join queries, which hopefully explains the difference. There will also be broadcast join support, which could make the SQL join even faster than the RDD join for large-small joins. Adam
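To see why a missing spatial index hurts, compare an unindexed nested-loop join with an indexed one. The sketch below is a pure-Python stand-in, not Sedona's implementation (Sedona's own spatial indexes are R-trees or quadtrees). It uses a uniform grid as a simple index, assumes the join predicate is point-in-envelope with inclusive boundaries, and indexes the envelope side; building the index on the smaller side and probing it with the larger one is loosely the idea behind a broadcast join. All names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Envelope = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def contains(env: Envelope, p: Point) -> bool:
    # Inclusive boundaries (an assumption about the join predicate).
    return env[0] <= p[0] <= env[2] and env[1] <= p[1] <= env[3]


def nested_loop_count(envs: List[Envelope], points: List[Point]) -> int:
    """Unindexed join: tests every (envelope, point) pair."""
    return sum(1 for e in envs for p in points if contains(e, p))


def grid_index_count(envs: List[Envelope], points: List[Point], cell: float) -> int:
    """Indexed join: bucket envelopes into grid cells, then probe with each point."""
    grid: Dict[Tuple[int, int], List[Envelope]] = defaultdict(list)
    for e in envs:
        # Register the envelope once in every cell its bounds overlap.
        for cx in range(math.floor(e[0] / cell), math.floor(e[2] / cell) + 1):
            for cy in range(math.floor(e[1] / cell), math.floor(e[3] / cell) + 1):
                grid[(cx, cy)].append(e)
    count = 0
    for p in points:
        # Only envelopes registered in the point's cell can contain it;
        # refine those candidates with the exact test.
        key = (math.floor(p[0] / cell), math.floor(p[1] / cell))
        count += sum(1 for e in grid.get(key, ()) if contains(e, p))
    return count
```

Both functions return the same count. The indexed version only runs the exact test on candidates that share a cell. The cell size is a tuning knob: too large and each cell holds many candidates, too small and large envelopes are copied into many cells.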

DataFrame joins much slower than SpatialRDD joins

2021-04-13, Andrew Brooks
I've noticed that performing joins with the DataFrame API tends to be significantly slower than using the SpatialRDD API directly. To illustrate, I've put together a simple benchmark that generates 10k points and 10k envelopes at random, then counts the number of envelope/point pairs such that
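Here is a minimal single-machine sketch of the kind of workload the benchmark describes: random points, random envelopes, and a count of matching pairs. The message is cut off before the predicate, so point-in-envelope containment is assumed. The unit-square extent, envelope size, and seed are also assumptions, and the helper names are hypothetical. This is not the poster's benchmark code, which compared the DataFrame and SpatialRDD APIs. It only gives a reference count to check either API's result against.

```python
import random
import time
from typing import List, Tuple

Point = Tuple[float, float]
Envelope = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def generate(n_points: int, n_envs: int, max_side: float, seed: int = 0):
    """Hypothetical generator: points uniform in the unit square; envelopes
    anchored in the unit square with sides up to max_side (may extend past 1)."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    points: List[Point] = [(rng.random(), rng.random()) for _ in range(n_points)]
    envs: List[Envelope] = []
    for _ in range(n_envs):
        x, y = rng.random(), rng.random()
        envs.append((x, y, x + rng.uniform(0, max_side), y + rng.uniform(0, max_side)))
    return points, envs


def count_pairs(envs: List[Envelope], points: List[Point]) -> int:
    """Count (envelope, point) pairs where the envelope contains the point (inclusive)."""
    return sum(
        1
        for (x0, y0, x1, y1) in envs
        for (px, py) in points
        if x0 <= px <= x1 and y0 <= py <= y1
    )


if __name__ == "__main__":
    pts, envs = generate(1_000, 1_000, max_side=0.05, seed=1)
    start = time.perf_counter()
    n = count_pairs(envs, pts)
    print(f"{n} pairs in {time.perf_counter() - start:.2f}s")
```

The brute-force count checks every pair, so its cost grows with the product of the two input sizes. Keep the sizes small when using it as a correctness reference.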