Re: [I] [Bug] Inconsistent Results between Direct Optimization and Sequential Optimization in TVM [tvm]

2024-05-08 Thread via GitHub
Jupiterghy commented on issue #16870: URL: https://github.com/apache/tvm/issues/16870#issuecomment-2102059120 > If you call a pass directly (instead of using `Sequential`), it will bypass the check for `opt_level`, `required_pass`, etc. Thank you for your response. However, I'm curious
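The behavior quoted above can be illustrated with a minimal Python model of a pass manager. This is an illustrative sketch only, not TVM's actual implementation: it shows why a pass invoked directly always runs, while the same pass wrapped in `Sequential` is skipped when its declared `opt_level` exceeds the context's level.

```python
# Minimal model (not TVM code): Sequential gates passes on opt_level,
# while a direct call bypasses the check entirely.

class Pass:
    def __init__(self, name, opt_level, fn):
        self.name = name
        self.opt_level = opt_level
        self.fn = fn

    def __call__(self, module):
        # Calling the pass directly: no gating checks are performed.
        return self.fn(module)

class Sequential:
    def __init__(self, passes, opt_level=2):
        self.passes = passes
        self.opt_level = opt_level

    def __call__(self, module):
        for p in self.passes:
            # The gating check: passes above the context level are skipped.
            if p.opt_level <= self.opt_level:
                module = p.fn(module)
        return module

fold = Pass("FoldConstant", opt_level=3, fn=lambda m: m + ["folded"])

print(fold(["mod"]))                             # direct call: always runs
print(Sequential([fold], opt_level=2)(["mod"]))  # skipped, since 3 > 2
```

Under this model, the "inconsistent results" are expected: the two invocation styles simply run different sets of passes.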

(tvm) branch nightly updated (819b0023e4 -> c0a47ed139)

2024-05-08 Thread github-bot
This is an automated email from the ASF dual-hosted git repository. github-bot pushed a change to branch nightly in repository https://gitbox.apache.org/repos/asf/tvm.git from 819b0023e4 [Relax] Support nested ModuleList in nn.Module (#16971) add 02c4c55eaa [SVE] Add codegen support for

Re: [PR] [DLIGHT][GPU] Improved gemv outer fallback schedule [tvm]

2024-05-08 Thread via GitHub
krishnaraj36 commented on PR #16973: URL: https://github.com/apache/tvm/pull/16973#issuecomment-2101932485 @srkreddy1238 @tqchen : Could you please take a look at this PR? Let me know your advice. -- This is an automated message from the Apache Git Service. To respond to the message, please

Re: [PR] [Unity][BYOC] Use arith.Analyzer to check batch equality of matmul in cublas [tvm]

2024-05-08 Thread via GitHub
rickzx commented on PR #16982: URL: https://github.com/apache/tvm/pull/16982#issuecomment-2101405369 cc: @MasterJH5574

[PR] [Unity][BYOC] Use arith.Analyzer to check batch equality of matmul in cublas [tvm]

2024-05-08 Thread via GitHub
rickzx opened a new pull request, #16982: URL: https://github.com/apache/tvm/pull/16982 For workloads with a mixture of symbolic and concrete shapes as batch sizes, we cannot directly use `int()` to obtain the batch size. Instead, we can use `arith.Analyzer` to check equality. F
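The problem described in this PR can be sketched without TVM at all: a batch dimension may be a concrete integer or a symbolic variable, so `int(dim)` raises on the symbolic case, while an analyzer-style structural comparison can still prove two symbolic dims equal. The `SymVar` class and `can_prove_equal` helper below are illustrative stand-ins, not TVM's `arith.Analyzer` API.

```python
# Illustrative model: why int() fails on symbolic batch dims, and how a
# structural equality check can still succeed. Not TVM code.

class SymVar:
    """Stand-in for a symbolic shape variable such as `n`."""
    def __init__(self, name):
        self.name = name

    def __int__(self):
        raise TypeError(f"symbolic dim '{self.name}' has no concrete value")

def can_prove_equal(a, b):
    # Concrete vs concrete: compare values. Symbolic vs symbolic:
    # compare structurally (same variable). Mixed cases are unprovable.
    if isinstance(a, int) and isinstance(b, int):
        return a == b
    if isinstance(a, SymVar) and isinstance(b, SymVar):
        return a.name == b.name
    return False

n = SymVar("n")
print(can_prove_equal(n, SymVar("n")))   # True: same symbol
print(can_prove_equal(n, 8))             # False: cannot be proven
```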

[PR] [SME] Add scalable fp16->fp32 dense schedule [tvm]

2024-05-08 Thread via GitHub
lhutton1 opened a new pull request, #16981: URL: https://github.com/apache/tvm/pull/16981 This commit extends the functionality of the SME dense and matmul schedules to support operations with fp16 inputs and an fp32 output, where `transpose_a=False` and `transpose_b=True`. For conve

Re: [PR] [Relax] Implement relax.op.view [tvm]

2024-05-08 Thread via GitHub
masahi commented on PR #16955: URL: https://github.com/apache/tvm/pull/16955#issuecomment-2101267069 @tvm-bot rerun

Re: [PR] [SVE] Add support for representing and creating buffer-level predicates [tvm]

2024-05-08 Thread via GitHub
Lunderberg commented on code in PR #16966: URL: https://github.com/apache/tvm/pull/16966#discussion_r1594285519 ## include/tvm/script/ir_builder/tir/ir.h: ## @@ -411,8 +411,10 @@ Var EnvThread(String thread_tag, DataType dtype = DataType::Int(32)); * \param buffer The buffer.

Re: [PR] [RFC] Add NNEF frontend [tvm-rfcs]

2024-05-08 Thread via GitHub
agoston-mc commented on PR #108: URL: https://github.com/apache/tvm-rfcs/pull/108#issuecomment-2100973263 We have updated the PR with the Relax frontend, but we have also kept Relay as an option, thinking it could be useful to have both, because we noticed performance differences during te

Re: [PR] [Relax] Implement relax.op.view [tvm]

2024-05-08 Thread via GitHub
Lunderberg commented on code in PR #16955: URL: https://github.com/apache/tvm/pull/16955#discussion_r1594281102 ## python/tvm/script/parser/relax/entry.py: ## @@ -296,13 +298,17 @@ class CallableProxy(StructInfoProxy): purity : bool Whether the callable is pure.

Re: [PR] [Relax] Implement relax.op.view [tvm]

2024-05-08 Thread via GitHub
Lunderberg commented on code in PR #16955: URL: https://github.com/apache/tvm/pull/16955#discussion_r1594267879 ## python/tvm/relax/op/memory/view.py: ## @@ -0,0 +1,77 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. Se

[PR] [Cuda] Skip FreeDataSpace when CUDA driver is in inconsistent state [tvm]

2024-05-08 Thread via GitHub
Lunderberg opened a new pull request, #16980: URL: https://github.com/apache/tvm/pull/16980 Prior to this commit, the RAII handler in `NDArray` would always attempt to free a cuda memory allocation on destruction. However, the call to `cudaFree` may throw an exception. If this happens dur

[PR] [Disco] Implement `num_workers` property for `disco.Session` [tvm]

2024-05-08 Thread via GitHub
Lunderberg opened a new pull request, #16978: URL: https://github.com/apache/tvm/pull/16978 Prior to this commit, while the `num_workers` argument was provided to the `disco.Session` object, it could not be determined from an existing `disco.Session` object. As a result, functions that int

[PR] [TOPI] Remove `blockIdx.z` in topi sort [tvm]

2024-05-08 Thread via GitHub
Hzfengsy opened a new pull request, #16977: URL: https://github.com/apache/tvm/pull/16977 As `blockIdx.z` is not allowed in WebGPU, this PR splits `blockIdx.z` into `blockIdx.y` to support WebGPU.
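The split described above amounts to plain index arithmetic: fold the z extent into y at launch time, then recover both coordinates inside the kernel. The sketch below assumes a row-major folding order; the function names are hypothetical, not taken from the PR.

```python
# Index arithmetic behind folding blockIdx.z into blockIdx.y, in plain
# Python. Folding order and names are illustrative assumptions.

def fold_launch_dims(grid_y, grid_z):
    # WebGPU disallows blockIdx.z here, so launch with y' = y * z blocks.
    return grid_y * grid_z

def recover_yz(fused_y, grid_z):
    # Inside the kernel, recover the original (y, z) pair from y'.
    return fused_y // grid_z, fused_y % grid_z

grid_y, grid_z = 4, 3
fused = fold_launch_dims(grid_y, grid_z)        # 12 blocks along y
pairs = [recover_yz(i, grid_z) for i in range(fused)]
print(fused, pairs[:4])   # 12 [(0, 0), (0, 1), (0, 2), (1, 0)]
```

Every original (y, z) pair maps to exactly one fused index, so the transformed kernel covers the same iteration space.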

Re: [PR] [DLIGHT][GPU] Enhance opencl thread limit for schedules [tvm]

2024-05-08 Thread via GitHub
tqchen commented on code in PR #16972: URL: https://github.com/apache/tvm/pull/16972#discussion_r1593921530 ## src/target/target_kind.cc: ## @@ -340,9 +340,9 @@ TVM_REGISTER_TARGET_KIND("rocm", kDLROCM) .set_target_parser(UpdateROCmAttrs); TVM_REGISTER_TARGET_KIND("openc

Re: [PR] [DLIGHT][GPU] Enhance opencl thread limit for schedules [tvm]

2024-05-08 Thread via GitHub
krishnaraj36 commented on code in PR #16972: URL: https://github.com/apache/tvm/pull/16972#discussion_r1593921051 ## src/target/target_kind.cc: ## @@ -340,9 +340,9 @@ TVM_REGISTER_TARGET_KIND("rocm", kDLROCM) .set_target_parser(UpdateROCmAttrs); TVM_REGISTER_TARGET_KIND(

Re: [PR] [DLIGHT][GPU] Enhance opencl thread limit for schedules [tvm]

2024-05-08 Thread via GitHub
tqchen commented on code in PR #16972: URL: https://github.com/apache/tvm/pull/16972#discussion_r1593920403 ## src/target/target_kind.cc: ## @@ -340,9 +340,9 @@ TVM_REGISTER_TARGET_KIND("rocm", kDLROCM) .set_target_parser(UpdateROCmAttrs); TVM_REGISTER_TARGET_KIND("openc

Re: [PR] [DLIGHT][GPU] Enhance opencl thread limit for schedules [tvm]

2024-05-08 Thread via GitHub
tqchen commented on code in PR #16972: URL: https://github.com/apache/tvm/pull/16972#discussion_r1593918265 ## src/target/target_kind.cc: ## @@ -340,9 +340,9 @@ TVM_REGISTER_TARGET_KIND("rocm", kDLROCM) .set_target_parser(UpdateROCmAttrs); TVM_REGISTER_TARGET_KIND("openc

Re: [PR] [WebGPU] Support `__dp4a(int8x4, int8x4)` as a pure extern method [tvm]

2024-05-08 Thread via GitHub
tqchen commented on code in PR #16976: URL: https://github.com/apache/tvm/pull/16976#discussion_r1593914420 ## src/target/source/codegen_webgpu.cc: ## @@ -405,6 +410,19 @@ void CodeGenWebGPU::VisitExpr_(const CallNode* op, std::ostream& os) { // NOLIN this->EndScope(els

Re: [PR] [DLIGHT][GPU] Enhance opencl thread limit for schedules [tvm]

2024-05-08 Thread via GitHub
krishnaraj36 commented on code in PR #16972: URL: https://github.com/apache/tvm/pull/16972#discussion_r1593911129 ## src/target/target_kind.cc: ## @@ -340,9 +340,9 @@ TVM_REGISTER_TARGET_KIND("rocm", kDLROCM) .set_target_parser(UpdateROCmAttrs); TVM_REGISTER_TARGET_KIND(

Re: [PR] [DLIGHT][GPU] Enhance opencl thread limit for schedules [tvm]

2024-05-08 Thread via GitHub
tqchen commented on code in PR #16972: URL: https://github.com/apache/tvm/pull/16972#discussion_r1593903178 ## src/target/target_kind.cc: ## @@ -340,9 +340,9 @@ TVM_REGISTER_TARGET_KIND("rocm", kDLROCM) .set_target_parser(UpdateROCmAttrs); TVM_REGISTER_TARGET_KIND("openc

(tvm) branch main updated: [CUBLAS][FP8] Enable R.matmul + R.multiply offloading (#16974)

2024-05-08 Thread masahi
This is an automated email from the ASF dual-hosted git repository. masahi pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/tvm.git The following commit(s) were added to refs/heads/main by this push: new c0a47ed139 [CUBLAS][FP8] Enable R.matmul + R.multiply o

Re: [PR] [CUBLAS][FP8] Enable R.matmul + R.multiply offloading [tvm]

2024-05-08 Thread via GitHub
masahi merged PR #16974: URL: https://github.com/apache/tvm/pull/16974

Re: [PR] [SVE] Add codegen support for `vscale_range()` function attribute [tvm]

2024-05-08 Thread via GitHub
lhutton1 merged PR #16962: URL: https://github.com/apache/tvm/pull/16962

(tvm) branch main updated: [SVE] Add codegen support for `vscale_range()` function attribute (#16962)

2024-05-08 Thread lukhut
This is an automated email from the ASF dual-hosted git repository. lukhut pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/tvm.git The following commit(s) were added to refs/heads/main by this push: new 02c4c55eaa [SVE] Add codegen support for `vscale_range(

Re: [PR] [SVE] Add codegen support for `vscale_range()` function attribute [tvm]

2024-05-08 Thread via GitHub
lhutton1 commented on PR #16962: URL: https://github.com/apache/tvm/pull/16962#issuecomment-2100060824 Thanks @Anndrey24!

[PR] [WebGPU] Support `__dp4a(int8x4, int8x4)` as a pure extern method [tvm]

2024-05-08 Thread via GitHub
Jiawei-Shao opened a new pull request, #16976: URL: https://github.com/apache/tvm/pull/16976 This patch adds support for `__dp4a(int8x4, int8x4)` as a pure extern method on the WebGPU target. In the generated WGSL shader, `int8x4` will be translated into `u32`, and `__dp4a(int8x4, int8x4)` w
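The `__dp4a` semantics referenced above can be written out as a reference model in plain Python: each `u32` packs four signed 8-bit lanes, and the intrinsic returns the lane-wise dot product plus an accumulator. The packing order (lane 0 in the low byte) is an assumption for illustration.

```python
# Reference semantics of a dp4a-style intrinsic in plain Python.
# Lane order (lane 0 = low byte) is an illustrative assumption.

def to_int8(byte):
    # Reinterpret an unsigned byte (0..255) as a signed int8 (-128..127).
    return byte - 256 if byte >= 128 else byte

def unpack_int8x4(word):
    # Extract the four signed 8-bit lanes from a 32-bit word.
    return [to_int8((word >> (8 * i)) & 0xFF) for i in range(4)]

def dp4a(a, b, acc=0):
    # Lane-wise dot product of two int8x4 values, plus an accumulator.
    return acc + sum(x * y for x, y in zip(unpack_int8x4(a), unpack_int8x4(b)))

a = 0x01020304          # lanes [4, 3, 2, 1]
b = 0x010101FF          # lanes [-1, 1, 1, 1]
print(dp4a(a, b))       # 4*(-1) + 3*1 + 2*1 + 1*1 = 2
```

A WebGPU backend emitting this operation would generate the equivalent computation over `u32` operands in WGSL, as the PR describes.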