Hi Edo, Thanks for the reply. I would like to help build a prototype. I will experiment with AVX2 & AVX512 instructions, and see if I get something in experiments.
Thank You, Gourav On Mon, 12 Apr 2021 at 23:52, Edo Liberty <edo.libe...@gmail.com> wrote: > Hi Gourav > This sounds like a very good idea. I’m sure it will improve the > performance for KLL. Would you care to help build a prototype for it? > Asaik we didn’t experiment with SIMD instruction sets yet. > Edo > > > On Mon, Apr 12, 2021 at 08:50 Gourav Kumar <gourav1...@gmail.com> wrote: > >> Hi All, >> >> Hope you all are keeping well in these COVID times. >> >> I am using KLL Sketch for my project where I need to compute approx >> percentile over a stream. >> I have gone through the paper Streaming Quantiles Algorithms with Small >> Space and Update Time (arxiv.org) <https://arxiv.org/pdf/1907.00236.pdf>. >> >> >> There in the last section in appendix Figure 11, it has been mentioned >> that a significant *amount of time is spend in sorting* after sketch >> becomes full and, also the figure shows with sort the update time goes to >> *50ns( >> Opposed to 20ns without sorting)*. >> >> I was wondering if there has been an effort to improve on this using >> vectorized instruction (leveragin AVX2, AVX512 instructions) even if it >> works just for integers & floats. >> >> There are Sorts like Bitonic Sort which can reduce sorting time with AVX >> instructions for smaller arrays, as they use sorting networks. >> >> My Question to team is : >> >> Q1. Some guidance/pointers on any KLL Sketch vectorization effort in this >> direction if any? (Specially to reduce the sort time, even if it works just >> for integer & float sketches) >> >> I tried to optimize sketch further for integer streams. It was evident in >> profiling that 50% of time is spent in sorting. Sorting suffered mostly >> because of branch misprediction. >> >> Q2. Do you think any such vectorization effort will be better without >> lazy compaction? As the level boundaries will be fixed, which might help >> while loading/storing data into vector registers. >> >> >> -- >> Regards, >> Gourav >> > -- Regards, Gourav