[GitHub] [tvm] yzh119 edited a comment on pull request #10207: Support sub warp reduction for CUDA target.

2022-02-11 Thread GitBox
yzh119 edited a comment on pull request #10207: URL: https://github.com/apache/tvm/pull/10207#issuecomment-1035993696
> Looks like the perf improvement isn't very large? Only when n = 4 is the shuffle-down implementation better than the shared-memory implementation 🤔

My typo, I have
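For context, the "shuffle-down implementation" is the reduction built from CUDA's `__shfl_down_sync`: at each step every lane adds the value held by the lane `offset` positions above it, and `offset` halves until it reaches zero. A minimal Python model of that log-step pattern (illustrative only; the real artifact is the CUDA code TVM generates):
```python
def shfl_down_reduce(lane_vals):
    """Toy model of a shuffle-down tree reduction over one sub-warp group.

    `lane_vals` holds one value per lane; the group size is assumed to be
    a power of two (e.g. n = 4 in the benchmark quoted above). After
    log2(n) steps, lane 0 holds the sum of the whole group.
    """
    vals = list(lane_vals)
    offset = len(vals) // 2
    while offset >= 1:
        # Ascending lane order keeps the "all lanes read, then write"
        # semantics: lane i + offset is read before it is overwritten.
        for lane in range(len(vals) - offset):
            vals[lane] += vals[lane + offset]
        offset //= 2
    return vals[0]

assert shfl_down_reduce([1.0, 2.0, 3.0, 4.0]) == 10.0  # two steps for n = 4
```
For n = 4 this is only two register-to-register steps with no shared-memory traffic or `__syncthreads()`, which is where a shuffle-based path plausibly wins.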

[GitHub] [tvm] yzh119 edited a comment on pull request #10207: Support sub warp reduction for CUDA target.

2022-02-11 Thread GitBox
yzh119 edited a comment on pull request #10207: URL: https://github.com/apache/tvm/pull/10207#issuecomment-1034535980
Sure, below is the measured time of the kernel:
```python
@T.prim_func
def reduce(a: T.handle, b: T.handle, n: T.int32) -> None:
    A = T.match_buffer(a,
```
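The archive truncates the kernel above. A self-contained sketch of what such a TVM script reduction typically looks like; the fixed `[1, 4]` shape and the block name are stand-ins of my own for the truncated dynamic-`n` buffer, not the PR's exact code:
```python
import tvm
from tvm.script import tir as T

@T.prim_func
def reduce(a: T.handle, b: T.handle) -> None:
    # Stand-in shapes: one row of 4 elements in place of the original
    # dynamic [1, n] buffer that the archive cut off.
    A = T.match_buffer(a, [1, 4], dtype="float32")
    B = T.match_buffer(b, [1], dtype="float32")
    for i, j in T.grid(1, 4):
        with T.block("reduce"):
            vi, vj = T.axis.remap("SR", [i, j])
            with T.init():
                B[vi] = T.float32(0)
            B[vi] = B[vi] + A[vi, vj]
```
Timings like the ones being discussed can then be taken by binding the loops to CUDA threads and calling `time_evaluator` on the built module.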

[GitHub] [tvm] yzh119 edited a comment on pull request #10207: Support sub warp reduction for CUDA target.

2022-02-11 Thread GitBox
yzh119 edited a comment on pull request #10207: URL: https://github.com/apache/tvm/pull/10207#issuecomment-1035995100
> BTW do we have this requirement in the codebase now?

@MasterJH5574 yes, there is a notion of `group_extent` and `reduce_extent`.
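A hedged sketch of where those two extents come from: bind a spatial loop and a reduction loop to different thread axes, and each thread block then carries `group_extent` independent reductions of `reduce_extent` lanes each. The schedule below is my own toy example, not code from the PR:
```python
import tvm
from tvm.script import tir as T

@T.prim_func
def batched_reduce(a: T.handle, b: T.handle) -> None:
    A = T.match_buffer(a, [8, 4], dtype="float32")  # 8 groups of 4 lanes
    B = T.match_buffer(b, [8], dtype="float32")
    for i, j in T.grid(8, 4):
        with T.block("reduce"):
            vi, vj = T.axis.remap("SR", [i, j])
            with T.init():
                B[vi] = T.float32(0)
            B[vi] = B[vi] + A[vi, vj]

sch = tvm.tir.Schedule(batched_reduce)
i, j = sch.get_loops(sch.get_block("reduce"))
sch.bind(i, "threadIdx.y")  # group axis: group_extent = 8
sch.bind(j, "threadIdx.x")  # reduce axis: reduce_extent = 4, a sub-warp
```
With `reduce_extent` = 4, the 32 threads of one warp hold eight independent reductions, which is the kind of sub-warp case this PR targets.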

[GitHub] [tvm] yzh119 edited a comment on pull request #10207: Support sub warp reduction for CUDA target.

2022-02-10 Thread GitBox
yzh119 edited a comment on pull request #10207: URL: https://github.com/apache/tvm/pull/10207#issuecomment-1034575574
Some other notes. In the following case:
```python
@T.prim_func
def reduce(a: T.handle, b: T.handle, n: T.int32) -> None:
    A = T.match_buffer(a, [1,
```
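Whatever the truncated case above was, the quickest way to see which lowering a schedule actually got is to build it for CUDA and read the generated source. A sketch, assuming `sch` is a thread-bound reduction schedule like the one in the earlier sketch:
```python
# Assumes `sch` is the tvm.tir.Schedule from the sketch above (or any
# schedule whose reduction loop is bound to a thread axis).
rt_mod = tvm.build(sch.mod, target="cuda")
src = rt_mod.imported_modules[0].get_source()
# With this PR, a sub-warp reduction should show up as __shfl_down_sync
# calls; the shared-memory path instead allocates a __shared__ buffer
# and synchronizes with __syncthreads().
print(src)
```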

[GitHub] [tvm] yzh119 edited a comment on pull request #10207: Support sub warp reduction for CUDA target.

2022-02-10 Thread GitBox
yzh119 edited a comment on pull request #10207: URL: https://github.com/apache/tvm/pull/10207#issuecomment-1034535980
Sure, below is the measured time of the kernel:
```python
@T.prim_func
def reduce(a: T.handle, b: T.handle, n: T.int32) -> None:
    A = T.match_buffer(a,
```

[GitHub] [tvm] yzh119 edited a comment on pull request #10207: Support sub warp reduction for CUDA target.

2022-02-09 Thread GitBox
yzh119 edited a comment on pull request #10207: URL: https://github.com/apache/tvm/pull/10207#issuecomment-1034575574
There are some issues to be solved. In the following case:
```python
@T.prim_func
def reduce(a: T.handle, b: T.handle, n: T.int32) -> None:
    A =
```
