[TVM Discuss] [Questions] Why convolution written in python
Got it! Thank you very much ~~ --- [Visit Topic](https://discuss.tvm.ai/t/why-convolution-written-in-python/6072/3) to respond.
[TVM Discuss] [Questions] [AutoTVM] Selective tuning of hotspots
Dear community, I'm currently trying to **reduce overall AutoTVM runtimes** by selectively tuning only the kernels that are actual hotspots in the application. **Hotspot detection** can be done fairly easily, e.g. with the **debug runtime**, which produces a detailed call-graph profile when `run()` is executed. My question is **how to match** these identified operations to the kernels AutoTVM selects for tuning.

On the one hand, the profile information looks like [this example](https://docs.tvm.ai/dev/debugger.html) shows: a prioritized list of nodes, mostly identified by their LLVM IR names. On the other hand, selecting the tasks to be tuned with

```
kernels = autotvm.task.extract_from_program(ir["main"], target=target, params=params, ops=None)
```

gives you a list of [Task](https://docs.tvm.ai/api/python/autotvm.html#tvm.autotvm.task.task.Task) objects, e.g.:

```
Task(func_name=dense_nopack.x86, args=(('TENSOR', (1, 16), 'float32'), ('TENSOR', (64, 16), 'float32'), None, 'float32'), kwargs={}, workload=('dense_nopack.x86', ('TENSOR', (1, 16), 'float32'), ('TENSOR', (64, 16), 'float32'), None, 'float32'))
```

So: how can such Tasks be matched to their IR counterparts? Any help, ideas, or suggestions are much appreciated! Thank you & best regards

--- [Visit Topic](https://discuss.tvm.ai/t/autotvm-selective-tuning-of-hotspots/6083/1) to respond.
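For concreteness, this is roughly how I extract and inspect the tasks today (a minimal sketch; it assumes a Relay module `ir` with `params` and `target` as above, and only prints each task's workload so it can be compared against the profiled node names by hand):

```
from tvm import autotvm

# Extract the tunable tasks from the Relay module.
tasks = autotvm.task.extract_from_program(
    ir["main"], target=target, params=params, ops=None)

for task in tasks:
    # task.workload[0] is the TOPI compute name, e.g. "dense_nopack.x86";
    # the remaining entries describe the input tensors (shape, dtype).
    print(task.workload[0], task.workload[1:])
```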
[TVM Discuss] [Application] Unintuitive lowered code in TVM
Hi all, I defined a toy computation and scheduled it in TVM. I am having some difficulty understanding how the lowered code that TVM produces corresponds to the schedule. I have reproduced both the Python code and the lowered IR below.

Python code:

```
import tvm
from tvm import te

batch_size = 24
hidden_size = 256
length = 100

scan_state = te.placeholder((length, batch_size, hidden_size))
scan_init = te.placeholder((1, batch_size, hidden_size))
C = te.compute((length, batch_size, hidden_size),
               lambda t, b, i: scan_state[t - 1, b, i] * 4)
D = te.compute((length, batch_size, hidden_size),
               lambda t, b, i: C[t, b, i] + 7892)
scan_update = D
scan = tvm.te.scan(scan_init, scan_update, scan_state)

s = te.create_schedule([scan.op])
bx = te.thread_axis((0, batch_size), "blockIdx.x")
tx = te.thread_axis((0, hidden_size), "threadIdx.x")
s[scan].env_threads([bx])

xo, xi = s[C].split(s[C].op.axis[1], factor=24)
s[C].bind(xi, bx)
s[C].bind(s[C].op.axis[2], tx)

xo, xi = s[D].split(s[D].op.axis[1], factor=24)
s[D].bind(xi, bx)
s[D].bind(s[D].op.axis[2], tx)

print(tvm.lower(s, [scan_init, scan_state, scan_update, scan], simple_mode=True))
```

Lowered IR:

```
produce scan {
  // attr [iter_var(blockIdx.x, range(min=0, ext=24), blockIdx.x)] thread_extent = 24
  // attr [compute] storage_scope = "shared"
  allocate compute[float32 * 256]
  for (scan.idx, 0, 99) {
    produce compute {
      // attr [iter_var(threadIdx.x, range(min=0, ext=256), threadIdx.x)] thread_extent = 256
      if (likely((blockIdx.x < 1))) {
        if (likely((blockIdx.x < 12))) {
          compute[((blockIdx.x*256) + threadIdx.x)] = (scan[(((scan.idx*6144) + (blockIdx.x*512)) + threadIdx.x)]*4f)
        }
      }
    }
    // attr [iter_var(threadIdx.x, range(min=0, ext=256), threadIdx.x)] thread_extent = 256
    scan[((((scan.idx*6144) + (blockIdx.x*256)) + threadIdx.x) + 6144)] = (compute[threadIdx.x] + 7892f)
  }
}
```

Specifically, I do not understand the predicates (`blockIdx.x < 1` and `blockIdx.x < 12`) guarding the computation of the first compute op.

--- [Visit Topic](https://discuss.tvm.ai/t/unintuitive-lowered-code-in-tvm/6081/1) to respond.
[TVM Discuss] [Questions] How does a Relay OP support variable length parameter list?
I think Relay's convention is to convert multiple parameters into a tuple. --- [Visit Topic](https://discuss.tvm.ai/t/how-does-a-relay-op-support-variable-length-parameter-list/1753/3) to respond.
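For instance (a minimal sketch using `relay.concatenate`, which follows this convention by taking a single tuple of inputs rather than a variadic argument list):

```
import tvm
from tvm import relay

x = relay.var("x", shape=(1, 16))
y = relay.var("y", shape=(1, 16))

# The variable-length inputs are packed into a single Tuple argument.
out = relay.concatenate(relay.Tuple([x, y]), axis=0)
func = relay.Function([x, y], out)
print(func)
```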
[TVM Discuss] [Questions] [VM] The performance degradation of VM runtime and Dynamic Shape support compared to Graph Runtime
Thank you for the response! I am running it on the CPU backend, with target "llvm". --- [Visit Topic](https://discuss.tvm.ai/t/vm-the-performance-degradation-of-vm-runtime-and-dynamic-shape-support-compared-to-graph-runtime/6076/4) to respond.
[TVM Discuss] [Questions] Why convolution written in python
Since TVM is a compiler infrastructure, the convolution defined through the Python API is only a declaration of the computation. When the operator runs, this computation has already been compiled for a backend, e.g. LLVM, OpenCL, or CUDA, so using Python here adds no overhead at inference time. To get an intuition, you can see [in this example](https://docs.tvm.ai/tutorials/tensor_expr_get_started.html) how we define vector addition using the TVM Python API and then compile it to a fast module. --- [Visit Topic](https://discuss.tvm.ai/t/why-convolution-written-in-python/6072/2) to respond.
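Along the lines of that tutorial, here is a minimal sketch of the idea: the Python code below only declares and schedules the computation; `tvm.build` then compiles it to native code, so Python is out of the loop at run time.

```
import numpy as np
import tvm
from tvm import te

n = 1024
A = te.placeholder((n,), name="A")
B = te.placeholder((n,), name="B")
# Declarative description of the computation; nothing executes here.
C = te.compute((n,), lambda i: A[i] + B[i], name="C")

s = te.create_schedule(C.op)
fadd = tvm.build(s, [A, B, C], target="llvm")  # compiled to machine code

ctx = tvm.cpu(0)
a = tvm.nd.array(np.random.rand(n).astype("float32"), ctx)
b = tvm.nd.array(np.random.rand(n).astype("float32"), ctx)
c = tvm.nd.array(np.zeros(n, dtype="float32"), ctx)
fadd(a, b, c)  # runs the compiled kernel, not Python
```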
[TVM Discuss] [Questions] [VM] The performance degradation of VM runtime and Dynamic Shape support compared to Graph Runtime
Are you running this on GPU or CPU? The performance degradation is expected on GPU, as we need heterogeneous runtime support to avoid redundant memory copies between CPU and GPU; @zhiics is currently working on this. Besides, @jroesch is working on memory planning for dynamic-shape cases to reduce the total number of memory allocations and reuse buffers as much as possible. --- [Visit Topic](https://discuss.tvm.ai/t/vm-the-performance-degradation-of-vm-runtime-and-dynamic-shape-support-compared-to-graph-runtime/6076/2) to respond.
[TVM Discuss] [Application] TOPI autotuning integration
Did you use the latest TVM master version? In the latest version, we moved to [Relay Op Strategy](https://docs.tvm.ai/dev/relay_op_strategy.html) to choose which implementation to compile for each op. You need to add your implementation to the strategy in order for it to be used during compilation. --- [Visit Topic](https://discuss.tvm.ai/t/topi-autotuning-integration/6079/2) to respond.
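Roughly like this (a hedged sketch following the pattern in the Relay Op Strategy docs; it registers the stock generic TOPI conv2d, which you would swap for your own compute and schedule, and depending on your TVM version `topi` may instead be imported as `from tvm import topi`):

```
import topi
from tvm.relay.op import op as _op
from tvm.relay.op.strategy.generic import (
    conv2d_strategy, wrap_compute_conv2d, wrap_topi_schedule)

@conv2d_strategy.register("cpu")
def my_conv2d_strategy_cpu(attrs, inputs, out_type, target):
    strategy = _op.OpStrategy()
    # Make the implementation visible to the compiler; replace the topi.*
    # references below with your own TOPI compute and schedule.
    strategy.add_implementation(
        wrap_compute_conv2d(topi.nn.conv2d_nchw),
        wrap_topi_schedule(topi.generic.schedule_conv2d_nchw),
        name="conv2d_nchw.generic")
    return strategy
```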