(tvm) branch nightly updated (460f6f1d3e -> de91c5ca94)

2024-04-17 Thread github-bot
This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a change to branch nightly
in repository https://gitbox.apache.org/repos/asf/tvm.git


from 460f6f1d3e [QoL][Relax] Infer StructInfo for relax::Tuple on construction (#16860)
 add d030ce27a1 [TVMScript] Optionally use `ruff format` instead of `black` (#16876)
 add 857fe614ab [Target] Don't register AArch64 target tags without LLVM compiler support (#16897)
 add b3ffd97569 [BYOC] Add layout check and update shape check for cublas FP8 BYOC (#16895)
 add da56c89f32 [Dlight] Enhance vectorization for gpu matmul (#16894)
 add de91c5ca94 [Bugfix] rocm shared memory issue on MI250 (#16901)

No new revisions were added by this update.

Summary of changes:
 cmake/modules/LLVM.cmake                         |  1 +
 cmake/utils/FindLLVM.cmake                       | 18 +
 python/tvm/dlight/gpu/gemv.py                    |  5 +-
 python/tvm/dlight/gpu/matmul.py                  |  7 +-
 python/tvm/relax/backend/contrib/cublas.py       | 28 ++-
 python/tvm/script/highlight.py                   | 95 +++-
 src/target/parsers/aprofile.cc                   |  7 +-
 src/target/tag.cc                                |  6 +-
 tests/python/dlight/test_gpu_matmul.py           | 81 ++--
 tests/python/dlight/test_gpu_matmul_tensorize.py | 18 ++---
 tests/python/relax/test_codegen_cublas.py        | 20 +++--
 11 files changed, 199 insertions(+), 87 deletions(-)



[PR] [Bugfix] CudaDeviceAPI::GetAttr may check kExist when GPUs absent [tvm]

2024-04-17 Thread via GitHub


Lunderberg opened a new pull request, #16903:
URL: https://github.com/apache/tvm/pull/16903

   This commit resolves a bug that was introduced in
   https://github.com/apache/tvm/pull/16377.  If no CUDA-capable GPUs are 
present, the call to `cudaGetDeviceCount` will return an error, which will be 
raised as an exception by the `CUDA_CALL` macro.  However, checking the 
`kExist` flag is valid even if no GPUs are present.
   
   This commit removes the use of `CUDA_CALL`, and instead returns false in 
this case.
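
   For reference, a minimal Python sketch of the user-visible behaviour this change restores (assuming `tvm.runtime.Device.exist` queries `kExist` through `CudaDeviceAPI::GetAttr`, as it does on current builds):

```python
import tvm

# On a machine with no CUDA-capable GPU, querying kExist should return
# False rather than raise, even though cudaGetDeviceCount reports an error.
dev = tvm.cuda(0)
print(dev.exist)  # False when no GPU is present, True otherwise
```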


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



(tvm) branch main updated (da56c89f32 -> de91c5ca94)

2024-04-17 Thread tqchen
This is an automated email from the ASF dual-hosted git repository.

tqchen pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


from da56c89f32 [Dlight] Enhance vectorization for gpu matmul (#16894)
 add de91c5ca94 [Bugfix] rocm shared memory issue on MI250 (#16901)

No new revisions were added by this update.

Summary of changes:
 python/tvm/dlight/gpu/gemv.py | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)



Re: [PR] [Bugfix] rocm shared memory issue on MI250 [tvm]

2024-04-17 Thread via GitHub


tqchen merged PR #16901:
URL: https://github.com/apache/tvm/pull/16901





(tvm) branch main updated (b3ffd97569 -> da56c89f32)

2024-04-17 Thread tqchen
This is an automated email from the ASF dual-hosted git repository.

tqchen pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


from b3ffd97569 [BYOC] Add layout check and update shape check for cublas FP8 BYOC (#16895)
 add da56c89f32 [Dlight] Enhance vectorization for gpu matmul (#16894)

No new revisions were added by this update.

Summary of changes:
 python/tvm/dlight/gpu/matmul.py  |  7 +-
 tests/python/dlight/test_gpu_matmul.py   | 81 
 tests/python/dlight/test_gpu_matmul_tensorize.py | 18 +++---
 3 files changed, 54 insertions(+), 52 deletions(-)



Re: [PR] [Dlight] Enhance vectorization for gpu matmul [tvm]

2024-04-17 Thread via GitHub


tqchen merged PR #16894:
URL: https://github.com/apache/tvm/pull/16894





[PR] [Relax] Allow specifying entry_funcs for BYOC [tvm]

2024-04-17 Thread via GitHub


vinx13 opened a new pull request, #16902:
URL: https://github.com/apache/tvm/pull/16902

   cc @tqchen 





Re: [PR] [BYOC] Add layout check and update shape check for cublas FP8 BYOC [tvm]

2024-04-17 Thread via GitHub


tqchen merged PR #16895:
URL: https://github.com/apache/tvm/pull/16895





(tvm) branch main updated (857fe614ab -> b3ffd97569)

2024-04-17 Thread tqchen
This is an automated email from the ASF dual-hosted git repository.

tqchen pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


from 857fe614ab [Target] Don't register AArch64 target tags without LLVM compiler support (#16897)
 add b3ffd97569 [BYOC] Add layout check and update shape check for cublas FP8 BYOC (#16895)

No new revisions were added by this update.

Summary of changes:
 python/tvm/relax/backend/contrib/cublas.py | 28 
 tests/python/relax/test_codegen_cublas.py  | 20 
 2 files changed, 36 insertions(+), 12 deletions(-)



Re: [PR] [Target] Don't register AArch64 target tags without LLVM compiler support [tvm]

2024-04-17 Thread via GitHub


tqchen merged PR #16897:
URL: https://github.com/apache/tvm/pull/16897





(tvm) branch main updated (d030ce27a1 -> 857fe614ab)

2024-04-17 Thread tqchen
This is an automated email from the ASF dual-hosted git repository.

tqchen pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


from d030ce27a1 [TVMScript] Optionally use `ruff format` instead of `black` (#16876)
 add 857fe614ab [Target] Don't register AArch64 target tags without LLVM compiler support (#16897)

No new revisions were added by this update.

Summary of changes:
 cmake/modules/LLVM.cmake       |  1 +
 cmake/utils/FindLLVM.cmake     | 18 ++
 src/target/parsers/aprofile.cc |  7 ---
 src/target/tag.cc              |  6 +-
 4 files changed, 28 insertions(+), 4 deletions(-)



[PR] [CMAKE] Misc improvement of Util [tvm]

2024-04-17 Thread via GitHub


tqchen opened a new pull request, #16900:
URL: https://github.com/apache/tvm/pull/16900

   This PR updates the utils so that `tvm_option` can take in a list argument. It also introduces a flag for MSCCLPP.





Re: [PR] [BYOC] Add layout check and update shape check for cublas FP8 BYOC [tvm]

2024-04-17 Thread via GitHub


vinx13 commented on code in PR #16895:
URL: https://github.com/apache/tvm/pull/16895#discussion_r1569209004


##########
python/tvm/relax/backend/contrib/cublas.py:
##########
@@ -68,11 +69,30 @@ def _check_matmul(context: PatternCheckContext) -> bool:
             # Rows number must be multiples of 4 for IGEMM
             return False
     elif lhs_dtype == "e4m3_float8" and rhs_dtype == "e4m3_float8":
-        # Matrix dimensions must be multiples of 16. This requirement is missing from the cuBLAS
-        # docs, but it was observed during testing.
-        if not isinstance(rhs_shape[-1], (tvm.tir.expr.IntImm, int)) or rhs_shape[-1] % 16 != 0:
+        matmul_rhs_var = matmul_call.args[1]
+        rhs_transposed = False
+        if matmul_rhs_var in context.matched_bindings:
+            matmul_rhs_call = context.matched_bindings[matmul_rhs_var]
+            assert (
+                isinstance(matmul_rhs_call, tvm.relax.Call)
+                and matmul_rhs_call.op.name == "relax.permute_dims"
+            )

Review Comment:
   The condition `if matmul_rhs_var in context.matched_bindings:` implies that rhs is transposed (it is the only pattern in which rhs is another matched binding), so I added an assertion here; it won't crash if we have a non-transposed rhs.






Re: [PR] [Target] Don't register AArch64 target tags without LLVM compiler support [tvm]

2024-04-17 Thread via GitHub


mvermeulen commented on PR #16897:
URL: https://github.com/apache/tvm/pull/16897#issuecomment-2061698191

   I retested with the ROCm build and no longer see warning messages about a 
missing ARM configuration when running ROCm.





Re: [PR] Restore "pytest.mark.gpu" for RELAX tests [tvm]

2024-04-17 Thread via GitHub


Lunderberg commented on code in PR #16741:
URL: https://github.com/apache/tvm/pull/16741#discussion_r1569089772


##########
tests/python/relax/test_codegen_cudnn.py:
##########
@@ -36,12 +36,8 @@ def reset_seed():
 
 has_cudnn = tvm.get_global_func("relax.ext.cudnn", True)
 
-cudnn_enabled = pytest.mark.skipif(
-    not has_cudnn,
-    reason="cuDNN not enabled.",
-)
 
-pytestmark = [cudnn_enabled]
+pytestmark = [*tvm.testing.requires_cudnn.marks()]

Review Comment:
   Good point. I thought I had made `Feature` a subclass of `Mark`, but I guess I never did. In that case, I like your simplification.






Re: [PR] [Dlight] Enhance vectorization for gpu matmul [tvm]

2024-04-17 Thread via GitHub


tqchen commented on PR #16894:
URL: https://github.com/apache/tvm/pull/16894#issuecomment-2061427828

   @vinx13 please fix the testcase





Re: [PR] [relay][feature] save relay IR as onnx for visualize [tvm]

2024-04-17 Thread via GitHub


tqchen commented on PR #16847:
URL: https://github.com/apache/tvm/pull/16847#issuecomment-2061421527

   It seems a better approach would be to write something like https://github.com/lutzroeder/netron/blob/main/source/caffe.js in Netron. That file would parse the JSON file exported by TVM Relax and construct the graph. Alternatively, we could add a visualization spec JSON for TVM and add Netron support.
   
   This would remove the need for protobuf and allow a more direct export.
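
   A rough sketch of the export side of this idea, assuming the JSON comes from the existing `tvm.ir.save_json` (the exact visualization spec is still open in this thread; the module below is a hypothetical example):

```python
import tvm
from tvm.script import relax as R

@tvm.script.ir_module
class Example:  # hypothetical module; any Relax IRModule would do
    @R.function
    def main(x: R.Tensor((4,), "float32")) -> R.Tensor((4,), "float32"):
        return R.add(x, x)

# save_json serializes the IR to JSON text; a Netron-style frontend could
# parse this (or a dedicated visualization spec) to reconstruct the graph,
# with no protobuf involved.
graph_json = tvm.ir.save_json(Example)
print(graph_json[:200])
```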





[PR] [SVE][TOPI] Add conv2d NHWC hybrid SVE schedule for `arm_cpu` [tvm]

2024-04-17 Thread via GitHub


Anndrey24 opened a new pull request, #16899:
URL: https://github.com/apache/tvm/pull/16899

   This commit adds an `arm_cpu` conv2d NHWC schedule which generates SVE 
instructions by extending the hybrid GeMM approach implemented in #16106 to use 
scalable expressions as splitting factors.
   
   Various vscale-related fixes needed to implement the schedule are also 
included, such as:
   
- adding vscale bounds in the `ConstIntBoundAnalyzer` and `IntervalSetEvaluator`
- simplifying `MinNode` and `MaxNode` that have scalable expression operands in `RewriteSimplifier`, which would appear when defining the shape of a buffer padded to be a multiple of vscale and in its respective buffer access indices (e.g. `C_1 = T.Buffer((1024 * (T.vscale() * 16 + 256 - 16 % T.vscale() * 16),), data=C)` instead of `C_1 = T.Buffer((1024 * (T.max(255, T.vscale() * 16 + 255 - 16 % T.vscale() * 16) + 1),), data=C)`); see the sketch after this list
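
   A rough sketch of that `RewriteSimplifier` behaviour, assuming a TVM build recent enough to have `tir.vscale()` and these new rules (the exact printed form may differ):

```python
import tvm
from tvm import tir

ana = tvm.arith.Analyzer()

# Padded-shape expression from the example above; with the new rules the
# T.max(...) form should simplify so that the max folds away.
vs16 = tir.vscale() * 16
before = tir.max(tir.const(255, "int32"), vs16 + 255 - 16 % tir.vscale() * 16) + 1
print(ana.simplify(before))
```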
   
   The correctness of the new schedule is checked using a TOPI test, while the 
presence of generated SVE instructions is verified by a codegen_aarch64 test. 
The new rewrite_simplify rules are also covered by additional test cases.  
   
   cc @ekalda @lhutton1 @Lunderberg 





[I] [Bug] Graph optimization model compilation error involving `Pad` operator [tvm]

2024-04-17 Thread via GitHub


shaoyuyoung opened a new issue, #16898:
URL: https://github.com/apache/tvm/issues/16898

   I am trying to compile an ONNX model (graph below) using TVM.
   
![5618520ff3e8817d39d4422c547eaf9](https://github.com/apache/tvm/assets/100203773/0a00c2d4-70e1-4e66-b7ae-96737e30d5b3)
   
   Of course, this is a complicated graph, but we can simplify it as below.
   
![image](https://github.com/apache/tvm/assets/100203773/10fd68b9-168f-4db7-b593-a88bd410e612)
   
   These two graphs are equivalent. When I try to compile them with TVM, the original ONNX model fails but the simplified one passes. It is very strange!
   
   This seems to involve the `Pad` operator shape-checking problem.
   
   In theory, I think **TVM should have strong compatibility with native ONNX models**. However, the reality is not satisfactory. It seems that only simplified, simple models are acceptable to TVM.
   ### Expected behavior
   ONNX compilation passes
   
   ### Actual behavior
   
   ```
   onnx fail
   Traceback (most recent call last):
  18: tvm::runtime::PackedFuncObj::Extractor::AssignTypedLambda(tvm::transform::__mk_TVM9::{lambda(tvm::transform::Pass, tvm::IRModule)#1}, std::__cxx11::basic_string, std::allocator >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, std::__cxx11::basic_string, std::allocator >, tvm::runtime::TVMRetValue)
  17: tvm::transform::Pass::operator()(tvm::IRModule) const
  16: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
  15: tvm::relay::transform::FunctionPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
  14: _ZN3tvm7runtime13PackedFun
  13: tvm::runtime::TypedPackedFunc::AssignTypedLambda(tvm::relay::transform::DynamicToStatic()::{lambda(tvm::relay::Function, tvm::IRModule, tvm::transform::PassContext)#1})::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*) const
  12: tvm::relay::DynamicToStatic(tvm::relay::Function, tvm::IRModule)
  11: tvm::relay::DynamicToStaticMutator::PrepareInput(tvm::RelayExpr const&)
  10: tvm::transform::Pass::operator()(tvm::IRModule) const
  9: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
  8: tvm::relay::transform::FunctionPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
  7: tvm::transform::Pass::operator()(tvm::IRModule) const
  6: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
  5: tvm::transform::ModulePassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
  4: tvm::runtime::PackedFuncObj::Extractor::AssignTypedLambda(tvm::relay::transform::InferType()::{lambda(tvm::IRModule, tvm::transform::PassContext const&)#1})::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  3: tvm::relay::TypeInferencer::Infer(tvm::GlobalVar, tvm::relay::Function)
  2: tvm::relay::TypeSolver::Solve()
  1: tvm::runtime::PackedFuncObj::Extractor const&, int, tvm::Attrs const&, tvm::TypeReporter const&)>::AssignTypedLambda const&, int, tvm::Attrs const&, tvm::TypeReporter const&)>(bool (*)(tvm::runtime::Array const&, int, tvm::Attrs const&, tvm::TypeReporter const&))::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  0: tvm::relay::PadRel(tvm::runtime::Array const&, int, tvm::Attrs const&, tvm::TypeReporter const&)
  File "/root/anaconda3/conda-bld/tvm-package_1701590675822/work/src/relay/op/nn/pad.cc", line 131
InternalError: Check failed: (data->shape.size() == param->pad_width.size()) is false: There should be as many pad width pairs as shape dimensions but the shape has 5 dimensions and there are 4 pad width pairs.
   ```
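
   In other words, the failing check boils down to a rank mismatch (a standalone illustration, not the reproducer itself):

```python
# Relay's PadRel requires one (before, after) pad pair per input dimension.
data_shape = (5, 5, 4, 2, 1)  # rank-5 input, as in the reproducer
pad_width = [(0, 0)] * 4      # only 4 pairs reach the check

# This is exactly the mismatch the InternalError above reports.
assert len(pad_width) != len(data_shape)
```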
   
   ### Environment
   
   Operating System: Ubuntu 18
   TVM: 0.15
   Torch: 2.1.1
   ONNX: 1.15.0
   
   ### Steps to reproduce
   ONNX file is here: 
[onnx.zip](https://github.com/apache/tvm/files/15011753/onnx.zip)
   
   
   Here is the script
   ```python
from onnxsim import simplify
import tvm
from tvm import relay
import onnx


def compile_onnx(onnx_model, shape):
    mod_from_onnx, params_onnx = relay.frontend.from_onnx(onnx_model, shape=shape)
    with tvm.transform.PassContext(opt_level=4):
        executor = relay.build_module.create_executor(
            'graph', mod_from_onnx, tvm.cpu(), 'llvm', params_onnx
        ).evaluate()


model = onnx.load('./model.onnx')

try:
    compile_onnx(model, {'v0_0': [], 'v6_0': [5, 5, 4, 2, 1]})
except Exception as e:  # reconstructed handler; the archived message cuts off here
    print('onnx fail')
    print(e)
```

Re: [PR] Restore "pytest.mark.gpu" for RELAX tests [tvm]

2024-04-17 Thread via GitHub


apeskov commented on code in PR #16741:
URL: https://github.com/apache/tvm/pull/16741#discussion_r1568848341


##########
tests/python/relax/test_codegen_cudnn.py:
##########
@@ -36,12 +36,8 @@ def reset_seed():
 
 has_cudnn = tvm.get_global_func("relax.ext.cudnn", True)

Review Comment:
   You're absolutely right. Will remove it.






[PR] [CUBLAS] Enable offloading of R.matmul + R.dequantize [tvm]

2024-04-17 Thread via GitHub


ibsidorenko opened a new pull request, #16896:
URL: https://github.com/apache/tvm/pull/16896

   This commit enables offloading of `R.matmul` + `R.dequantize` to cuBLAS codegen. The dequantization scale is passed to the runtime function and set as the alpha parameter. If there is no dequantization, then alpha == 1.0.
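
   A NumPy sketch of the folding this relies on (hypothetical values; the scale is a compile-time constant here):

```python
import numpy as np

a = np.array([[1, 2], [3, 4]], dtype=np.float32)  # stand-in for the FP8 lhs
b = np.array([[5, 6], [7, 8]], dtype=np.float32)  # stand-in for the FP8 rhs
scale = 0.125  # dequantization scale

# Reference: matmul followed by an explicit elementwise dequantize.
reference = np.matmul(a, b) * scale

# Offloaded form: cuBLAS computes alpha * (A @ B) with alpha = scale
# (alpha = 1.0 when there is no dequantize).
offloaded = scale * np.matmul(a, b)

assert np.allclose(reference, offloaded)
```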





Re: [PR] Restore "pytest.mark.gpu" for RELAX tests [tvm]

2024-04-17 Thread via GitHub


apeskov commented on code in PR #16741:
URL: https://github.com/apache/tvm/pull/16741#discussion_r1568846620


##########
tests/python/relax/test_codegen_cudnn.py:
##########
@@ -36,12 +36,8 @@ def reset_seed():
 
 has_cudnn = tvm.get_global_func("relax.ext.cudnn", True)
 
-cudnn_enabled = pytest.mark.skipif(
-    not has_cudnn,
-    reason="cuDNN not enabled.",
-)
 
-pytestmark = [cudnn_enabled]
+pytestmark = [*tvm.testing.requires_cudnn.marks()]

Review Comment:
   I'm not sure that is possible. The global object `pytestmark` should have type `Mark` or `List[Mark]`, but our TVM markers have type `tvm.testing.utils.Feature`. So a direct assignment, like you suggest above, will lead to a type check error:
   > TypeError: got <class 'tvm.testing.utils.Feature'> instead of Mark
   
   Moreover, the idea of this change was to utilise the hierarchical structure of TVM testing features. The `pytest.mark.gpu` mark is the root of this hierarchy, so I have to extract all markers from the current feature and all of its parents. That is exactly what the `marks()` method does.
   
   But you are right that this line can be slightly simplified by removing the explicit list expansion:
   > pytestmark = tvm.testing.requires_cudnn.marks()
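
   For reference, a minimal sketch of that simplified form (assuming the usual TVM test-module layout):

```python
# tests/python/relax/test_codegen_cudnn.py (sketch)
import tvm.testing

# marks() returns the pytest marks for requires_cudnn and all of its
# parent features, so pytest.mark.gpu is included automatically.
pytestmark = tvm.testing.requires_cudnn.marks()
```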






Re: [PR] [BYOC] Add layout check and update shape check for cublas FP8 BYOC [tvm]

2024-04-17 Thread via GitHub


ibsidorenko commented on code in PR #16895:
URL: https://github.com/apache/tvm/pull/16895#discussion_r1568429222


##########
python/tvm/relax/backend/contrib/cublas.py:
##########
@@ -68,11 +69,30 @@ def _check_matmul(context: PatternCheckContext) -> bool:
             # Rows number must be multiples of 4 for IGEMM
             return False
     elif lhs_dtype == "e4m3_float8" and rhs_dtype == "e4m3_float8":
-        # Matrix dimensions must be multiples of 16. This requirement is missing from the cuBLAS
-        # docs, but it was observed during testing.
-        if not isinstance(rhs_shape[-1], (tvm.tir.expr.IntImm, int)) or rhs_shape[-1] % 16 != 0:
+        matmul_rhs_var = matmul_call.args[1]
+        rhs_transposed = False
+        if matmul_rhs_var in context.matched_bindings:
+            matmul_rhs_call = context.matched_bindings[matmul_rhs_var]
+            assert (
+                isinstance(matmul_rhs_call, tvm.relax.Call)
+                and matmul_rhs_call.op.name == "relax.permute_dims"
+            )

Review Comment:
   I am OK with it, thank you! Just a nit question: do we need an assert here for the case when the rhs call is something other than `permute_dims`? Could we just leave `rhs_transposed == False` and return False in the next `if` (without a crash)?
   ```
   if not rhs_transposed:
       # cuBLAS FP8 operations require rhs being transposed
       return False
   ```
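
   For concreteness, a hedged sketch of that assert-free variant (`matmul_call` and `context` as in the diff above; the helper name is hypothetical):

```python
import tvm

def rhs_is_transposed(matmul_call, context) -> bool:
    # Treat any rhs that is not relax.permute_dims as non-transposed,
    # so the caller rejects it instead of crashing on an assert.
    matmul_rhs_var = matmul_call.args[1]
    if matmul_rhs_var in context.matched_bindings:
        rhs_call = context.matched_bindings[matmul_rhs_var]
        return (
            isinstance(rhs_call, tvm.relax.Call)
            and rhs_call.op.name == "relax.permute_dims"
        )
    return False

# Caller side:
# if not rhs_is_transposed(matmul_call, context):
#     return False  # cuBLAS FP8 requires a transposed rhs
```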


