Hi Matt,

> Here I believe strongly that we need tests. Nathan assured me that
> nothing is faster on the GPU than sort+reduce-by-key since
> they are highly optimized. I think they will be hard to beat, and the
> initial timings I had say that this is the case. I am willing to be
> wrong, but I am not willing to overengineer based on supposition.

Fair enough. Is a brute-force implementation for P1 elements sufficient as a baseline for discussion?
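
To make the comparison concrete, here is a minimal CPU sketch of the sort+reduce-by-key pattern mentioned above. It assumes the operation under discussion is summing duplicate (row, col) contributions, as in finite element assembly. The function name `sort_reduce_by_key` and the 1D two-element P1 mesh are hypothetical and purely illustrative; this is plain Python, not the GPU code being timed.

```python
from itertools import groupby

def sort_reduce_by_key(entries):
    """Sum duplicate (row, col) entries: sort by key, then reduce runs of equal keys."""
    key = lambda e: (e[0], e[1])
    ordered = sorted(entries, key=key)  # the "sort" step
    return [
        (row, col, sum(v for _, _, v in run))  # the "reduce-by-key" step
        for (row, col), run in groupby(ordered, key=key)
    ]

# Assumed example: two 1D P1 (linear) elements sharing node 1,
# each contributing the same 2x2 local matrix.
local = [[1.0, -1.0], [-1.0, 1.0]]
elements = [(0, 1), (1, 2)]

# Unassembled list of (row, col, value) contributions, duplicates included.
entries = [
    (nodes[i], nodes[j], local[i][j])
    for nodes in elements
    for i in range(2)
    for j in range(2)
]

assembled = sort_reduce_by_key(entries)
```

The shared node shows up as two separate (1, 1) contributions that end up adjacent after sorting and are summed in the reduction. A brute-force baseline would have to produce the same assembled entries.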

Best regards,
Karli
