Hi folks,
Some thoughts regarding this.
1. In 1.0.0 [1], the DataFrame geometry serializer was changed from Sedona's
original SHAPE serializer to a WKB serializer. Based on our tests, the old
serializer was several times faster than the WKB serializer [2]. We are
working on a PR to support both SHAPE serializ
Are you using the 1.0.0 release? If so, a bug in that release prevented
spatial indexes from being used in SQL join queries, which hopefully
explains the difference. Broadcast join support is also coming, which
could make SQL joins even faster than RDD joins when a large dataset is
joined with a small one.
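To illustrate why a missing spatial index matters so much, here is a minimal pure-Python sketch (not Sedona code) that counts envelope/point pairs two ways: a nested loop that tests every pair, and an indexed join that first buckets the points into a uniform grid and then tests only the points in the cells each envelope overlaps. The uniform grid, the cell size, the "envelope contains point" predicate, and all function names are assumptions made for illustration; Sedona itself uses tree-based indexes such as R-tree or quadtree.

```python
import math
import random
from collections import defaultdict

# Envelopes are (min_x, min_y, max_x, max_y); points are (x, y).

def random_points(n, rng):
    return [(rng.random(), rng.random()) for _ in range(n)]

def random_envelopes(n, rng, max_size=0.05):
    envs = []
    for _ in range(n):
        x, y = rng.random(), rng.random()
        w, h = rng.random() * max_size, rng.random() * max_size
        envs.append((x, y, x + w, y + h))
    return envs

def contains(env, pt):
    # Assumed join predicate: the envelope contains the point (boundary inclusive).
    min_x, min_y, max_x, max_y = env
    x, y = pt
    return min_x <= x <= max_x and min_y <= y <= max_y

def nested_loop_count(envs, pts):
    # No index: every envelope is tested against every point (n * m checks).
    return sum(1 for e in envs for p in pts if contains(e, p))

def cell_of(v, cell):
    # math.floor(v / cell) is monotone in v, so index and query cells agree.
    return math.floor(v / cell)

def grid_index_count(envs, pts, cell=0.05):
    # Build a hypothetical uniform-grid index over the points: cell -> points.
    grid = defaultdict(list)
    for p in pts:
        grid[(cell_of(p[0], cell), cell_of(p[1], cell))].append(p)
    count = 0
    for e in envs:
        min_x, min_y, max_x, max_y = e
        # Probe only the grid cells the envelope overlaps.
        for cx in range(cell_of(min_x, cell), cell_of(max_x, cell) + 1):
            for cy in range(cell_of(min_y, cell), cell_of(max_y, cell) + 1):
                for p in grid.get((cx, cy), ()):
                    if contains(e, p):  # exact refinement check
                        count += 1
    return count

if __name__ == "__main__":
    rng = random.Random(42)
    pts = random_points(2_000, rng)
    envs = random_envelopes(2_000, rng)
    print(nested_loop_count(envs, pts), grid_index_count(envs, pts))
```

Both functions return the same count, but the indexed version skips most of the n × m predicate checks, which is the kind of gap you would expect when a join runs without its spatial index. Building the index on the smaller input and probing it with every row of the larger one is roughly the idea behind a broadcast join for large-small joins.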
Adam
On
I've noticed that performing joins with the DataFrame API tends to be
significantly slower than using the SpatialRDD API directly. To illustrate,
I've put together a simple benchmark that generates 10k random points and 10k
random envelopes, then counts the number of envelope/point pairs such that