Alright! I performed some profiling. I've taken
[this](https://github.com/apache/incubator-tvm/blob/master/tutorials/autotvm/tune_relay_x86.py)
tutorial as a basis, using only VGG-16.
This is the **output** of the **debug runtime**:
```
Node Name    Ops
...
```
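For completeness, here is roughly the script behind that profile. It is a sketch based on the tutorial; the `-mcpu` flag and the input shape are assumptions from my setup, and on newer TVM versions `relay.build` returns a module factory while the debugger lives under `tvm.contrib.debugger.debug_executor`:

```python
import numpy as np
import tvm
from tvm import relay
from tvm.relay import testing
from tvm.contrib.debugger import debug_runtime

# VGG-16 workload, as in the tune_relay_x86 tutorial
mod, params = testing.vgg.get_workload(num_layers=16, batch_size=1)
target = "llvm -mcpu=core-avx2"  # assumption: flag for my local CPU

with relay.build_config(opt_level=3):
    graph, lib, params = relay.build(mod, target=target, params=params)

ctx = tvm.cpu(0)
# The debug runtime prints the per-node time table above after run()
m = debug_runtime.create(graph, lib, ctx, dump_root="/tmp/tvmdbg")
m.set_input(**params)
m.set_input("data", np.random.uniform(size=(1, 3, 224, 224)).astype("float32"))
m.run()
```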
Hi!
Does this appear because of this previous issue:
https://discuss.tvm.ai/t/bug-arm-significant-performance-degradation-of-execution-times-between-tvm-revisions/6029/
I was experiencing a similar slowdown on ARM CPUs, which was traced back to the
limited Winograd algorithm...
Cheers
Robert
Damn, you are right! Hmm... matching the debug runtime output to the LLVM IR is
fairly easy. I don't know whether the shapes are somehow encoded in the LLVM
IR. My guess: no...
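For reference, this is how I dump the LLVM IR to compare against the profile (a sketch; `lib` is the module built in the profiling script above, and I assume the fused function names in the IR correspond to the "Ops" column of the debug runtime table, which is what makes the matching straightforward):

```python
# Dump the LLVM IR of the compiled module; grep the result for the fused
# function names (e.g. fused_nn_contrib_conv2d_NCHWc_add) reported by the
# debug runtime.
ir = lib.get_source("ll")
with open("vgg16.ll", "w") as f:
    f.write(ir)
```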
---
I see! That's a very good question. I'm not sure, to be honest. I wanted to look
into it myself to measure a performance baseline for each kernel before
autotuning. I'll look into it in the next few days when I have some spare time.
Please let me know if you find something in the meantime.
Dear community,
I'm currently trying to **reduce overall Auto-TVM runtimes** by selectively
tuning only the kernels that are actual hotspots in the application.
**Hotspot detection** can be performed fairly easily, e.g. by using the **debug
runtime**, which gives a detailed call-graph profile.
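The skeleton I have in mind looks roughly like this (adapted from the tutorial, reusing `mod`, `params`, and `target` from the profiling script above; the hotspot filter is hypothetical, since mapping profile entries back to AutoTVM tasks is exactly the part I am unsure about):

```python
from tvm import autotvm, relay

# Extract all tunable conv2d tasks, as in the tune_relay_x86 tutorial
tasks = autotvm.task.extract_from_program(
    mod["main"], target=target, params=params, ops=(relay.op.get("nn.conv2d"),)
)

# Hypothetical filter: keep only the tasks whose workloads the debug-runtime
# profile flagged as hotspots (building this mapping is the open question)
hot_workloads = set()  # to be filled from the profile
tasks = [t for t in tasks if t.workload in hot_workloads]

measure_option = autotvm.measure_option(
    builder=autotvm.LocalBuilder(),
    runner=autotvm.LocalRunner(number=10, repeat=1),
)
for task in tasks:
    tuner = autotvm.tuner.XGBTuner(task)
    tuner.tune(
        n_trial=min(1000, len(task.config_space)),
        measure_option=measure_option,
        callbacks=[autotvm.callback.log_to_file("vgg16_hotspots.log")],
    )
```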
Hi,
I'm not fully sure whether I understood your questions correctly. If you are
looking for the default schedule for a selected target, TVM fetches that from
[this](https://github.com/uwsampl/tvm-distro/tree/master/tophub) repository
at execution time and places a copy under …
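A minimal sketch of where those fallback configs enter a build (`relay.build` already applies the TopHub context implicitly when no other dispatch context is active, so this only makes it explicit; `mod`, `params`, and `target` are assumed from the earlier script):

```python
from tvm import autotvm, relay

# Explicitly apply the pre-tuned TopHub schedules for the target; relay.build
# does the same internally when no tuning log is in scope.
with autotvm.tophub.context(target):
    with relay.build_config(opt_level=3):
        graph, lib, params = relay.build(mod, target=target, params=params)
```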