[GitHub] [tvm] masahi commented on pull request #7303: [TOPI] Make cumsum IR reusable, add thrust scan

2021-01-20 Thread GitBox


masahi commented on pull request #7303:
URL: https://github.com/apache/tvm/pull/7303#issuecomment-763456381


   Thanks @mbrookhart @anijain2305 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] masahi commented on pull request #7303: [TOPI] Make cumsum IR reusable, add thrust scan

2021-01-19 Thread GitBox


masahi commented on pull request #7303:
URL: https://github.com/apache/tvm/pull/7303#issuecomment-763291062


   @anijain2305 I added an empty tensor test in 
https://github.com/apache/tvm/pull/7303/commits/20afc3243a17f48084204855f498c7f9af1cad7a
   
   OpenCL seems to have a problem with 0-size buffers, but otherwise both TIR 
scan and thrust scan seem to have no issues. Please take a look.







[GitHub] [tvm] masahi commented on pull request #7303: [TOPI] Make cumsum IR reusable, add thrust scan

2021-01-19 Thread GitBox


masahi commented on pull request #7303:
URL: https://github.com/apache/tvm/pull/7303#issuecomment-763264832


   > Once it is merged, I can try on my end with TF models as well.
   
   No perf improvement is expected: using thrust scan instead of TIR scan only 
improves `get_valid_count` slightly. The purpose of this PR is to enable 
parallelization of other ops that are difficult to parallelize without scan. 
`argwhere` is a perfect example, which I'll demonstrate soon after this one.
   
   @anijain2305 The term you want to search for is "gpu stream compaction".
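
For readers unfamiliar with the term, here is a minimal CPU sketch of stream compaction built on an exclusive scan. It is not the TIR or thrust code from this PR; the helper names `exclusive_scan`, `compact`, and `argwhere_1d` are hypothetical and only illustrate why a scan primitive makes ops like `argwhere` parallelizable.

```python
def exclusive_scan(flags):
    """Exclusive prefix sum: out[i] = sum(flags[:i])."""
    out, running = [], 0
    for f in flags:
        out.append(running)
        running += f
    return out


def compact(data, predicate):
    """Keep the elements of `data` that satisfy `predicate`, preserving order."""
    flags = [1 if predicate(x) else 0 for x in data]
    # positions[i] is the output slot of element i, if it is kept.
    positions = exclusive_scan(flags)
    count = positions[-1] + flags[-1] if flags else 0
    out = [None] * count
    # Each iteration is independent of the others, so on a GPU this loop
    # could be mapped to one thread per input element.
    for i, x in enumerate(data):
        if flags[i]:
            out[positions[i]] = x
    return out


def argwhere_1d(data):
    """Indices of the non-zero elements of a 1D list (hypothetical helper)."""
    return compact(list(range(len(data))), lambda i: data[i] != 0)
```

The only sequential dependency is inside the scan itself; once the output positions are known, the scatter step needs no coordination between elements.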







[GitHub] [tvm] masahi commented on pull request #7303: [TOPI] Make cumsum IR reusable, add thrust scan

2021-01-19 Thread GitBox


masahi commented on pull request #7303:
URL: https://github.com/apache/tvm/pull/7303#issuecomment-763257824


   Hmm, interesting. I've never created a test case with an empty tensor. Is 
that possible?
   
   Note that the IR is copied straight from 
https://github.com/apache/tvm/pull/7303, so the same guard against empty tensor 
is here.
   
   
https://github.com/apache/tvm/blob/4e13a3f4a04300113e9332ef581859cb0a40a082/python/tvm/topi/cuda/scan.py#L59







[GitHub] [tvm] masahi commented on pull request #7303: [TOPI] Make cumsum IR reusable, add thrust scan

2021-01-19 Thread GitBox


masahi commented on pull request #7303:
URL: https://github.com/apache/tvm/pull/7303#issuecomment-763106769


   1. Right now, inclusive scan can be supported by `exclusive_scan(data) + 
data`. I think that is fine for now, given that our scan IR is far from stable 
and we don't want to maintain two IRs for the sake of removing the additional 
sum.
   
   2. Yes, we can definitely do that. But this PR is already not small, and I 
want to keep the IR as close to the original as possible in this PR. There are 
other TODO items for scan (e.g. supporting other binary ops), so I hope we can 
address this problem in the future as well.
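
Point 1 above can be sketched on the CPU as follows. This is a plain-Python illustration, not the scan IR from this PR; `exclusive_scan` and `inclusive_scan` are hypothetical helper names.

```python
def exclusive_scan(data):
    """Exclusive prefix sum: out[i] = sum(data[:i]), so out[0] == 0."""
    out, running = [], 0
    for x in data:
        out.append(running)
        running += x
    return out


def inclusive_scan(data):
    """Inclusive prefix sum computed as exclusive_scan(data) + data."""
    return [e + x for e, x in zip(exclusive_scan(data), data)]
```

The cost of this approach is one extra elementwise addition, which is the "additional sum" mentioned in point 1.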
   
   A related discussion point: Do you expect scan performance on a 
non-innermost axis to be slower than in the innermost case? If so (and I 
believe it is), I think supporting non-innermost scan and other ranks by 
   ```
   reshape + transpose + innermost scan + reshape and transpose back 
   ```
   is a good solution. It is definitely preferable in terms of implementation 
simplicity, since it lets the scan implementation focus on 1D or 2D inputs with 
an innermost scan axis.
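
The reshape and transpose idea above can be sketched with NumPy as a stand-in for the TOPI ops. This is only an illustration under that assumption; the function names `exclusive_scan_innermost` and `exclusive_scan_any_axis` are hypothetical and do not come from this PR.

```python
import numpy as np


def exclusive_scan_innermost(data_2d):
    """Exclusive prefix sum along the last axis of a 2D array."""
    return np.cumsum(data_2d, axis=-1) - data_2d


def exclusive_scan_any_axis(data, axis):
    """Scan along `axis` using only a 2D innermost-axis scan."""
    # Transpose: make the scan axis the innermost one.
    moved = np.moveaxis(data, axis, -1)
    # Reshape: collapse all leading axes into a single batch dimension.
    rows = int(np.prod(moved.shape[:-1]))
    flat = moved.reshape(rows, moved.shape[-1])
    scanned = exclusive_scan_innermost(flat)
    # Reshape and transpose back to the original layout.
    return np.moveaxis(scanned.reshape(moved.shape), -1, axis)
```

With this wrapper, the scan kernel itself only ever sees a 2D input with the scan axis last, at the cost of the extra data movement from the two transposes.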


