masahi edited a comment on pull request #7195: URL: https://github.com/apache/tvm/pull/7195#issuecomment-754199021
@mbrookhart I have a fast path for the single-segment case, so performance is the same between the current and new implementations. I'll update the condition so that it works for dimensions other than two. https://github.com/apache/tvm/blob/26254f522de531569441eac4fecb45885fcdc30a/src/runtime/contrib/thrust/thrust.cu#L57

@trevor-m Yes, I briefly looked at cub's segmented sort. My impression is that it launches one thread block per segment. That sounds great when there are many segments to sort and each segment is not so big. I'm not sure it is a good fit for our use case: I think we are more likely to sort a few large segments, and most likely only one. I'm actually surprised to hear that TRT uses cub's segmented sort.
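
To make the single-segment fast path concrete, here is a minimal host-side sketch in plain C++. It is not the code in thrust.cu: `count_segments` and `sort_segments` are hypothetical names, `std::sort` stands in for the GPU sort, and the sort axis is assumed to be the last one. Under that assumption the segment count is the product of all leading dimensions, which is one way the condition could cover inputs of any rank instead of only 2-D ones.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not TVM's API): number of independent segments when
// sorting along the last axis, i.e. the product of all leading dimensions.
// A 1-D shape yields 1, so it takes the fast path below.
std::size_t count_segments(const std::vector<std::size_t>& shape) {
  std::size_t n = 1;
  for (std::size_t i = 0; i + 1 < shape.size(); ++i) n *= shape[i];
  return n;
}

// Hypothetical helper (not TVM's API): sorts each contiguous run of
// `segment_len` values in ascending order. std::sort stands in for the
// device sort used in the real implementation.
void sort_segments(std::vector<float>& data, std::size_t segment_len) {
  if (segment_len == 0) return;
  assert(data.size() % segment_len == 0);
  const std::size_t num_segments = data.size() / segment_len;

  if (num_segments == 1) {
    // Fast path: a single segment needs only one plain sort of the buffer.
    std::sort(data.begin(), data.end());
    return;
  }

  // General path: sort every segment independently.
  for (std::size_t s = 0; s < num_segments; ++s) {
    auto first = data.begin() + static_cast<std::ptrdiff_t>(s * segment_len);
    std::sort(first, first + static_cast<std::ptrdiff_t>(segment_len));
  }
}
```

With a shape such as `{1, 1000}` or `{1000}`, `count_segments` returns 1 and the whole buffer is sorted in one call, which is the case described above as the most likely one.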