zhanghaohit commented on a change in pull request #6126:
URL: https://github.com/apache/incubator-tvm/pull/6126#discussion_r462007395
##
File path: src/tir/transforms/lower_tvm_builtin.cc
##
@@ -88,16 +88,6 @@ class BuiltinLower : public StmtExprMutator {
     op = stmt.as<AllocateNode>();
     // Get constant allocation bound.
     int64_t nbytes = GetVectorBytes(op->dtype);
Review comment:
The original code fails if there are multiple workloads in one schedule. For
example, in `fused_nn_conv2d_add_add_right_shift_clip_cast_31`, both `conv2d`
and `add` may carry a `workload` attr, so we have to select the correct
workload by comparing the `task_name`.
Previously this worked fine because `add` was not a tunable op. But now we
also want to offload middle ALU-only nodes (residual blocks) to VTA, such as
`fused_cast_cast_add_nn_relu_clip_cast_3`, so we create a VTA schedule for
`add` (see
[add.alu](https://github.com/apache/incubator-tvm/blob/a1daa1c47d2a19e51ab96da5c20e187f3bbf3413/vta/python/vta/top/op.py#L166)).
##
File path: python/tvm/autotvm/task/topi_integration.py
##
@@ -227,17 +227,21 @@ def wrapper(outs, *args, **kwargs):
     return _decorate


-def get_workload(outs):
+def get_workload(outs, task_name=None):
     """Retrieve the workload from outputs"""
     def traverse(tensors):
         """traverse all ops to find attached workload"""
         for t in tensors:
             op = t.op
-            if 'workload' in op.attrs:
-                return args_to_workload(op.attrs['workload'])
             wkl = traverse(op.input_tensors)
             if wkl:
                 return wkl
+
Review comment:
The original code fails if there are multiple workloads in one schedule. For
example, in `fused_nn_conv2d_add_add_right_shift_clip_cast_31`, both `conv2d`
and `add` may carry a `workload` attr, so we have to select the correct
workload by comparing the `task_name`.
Previously this worked fine because `add` was not a tunable op. But now we
also want to offload middle ALU-only nodes (residual blocks) to VTA, such as
`fused_cast_cast_add_nn_relu_clip_cast_3`, so we create a VTA schedule for
`add` (see
[add.alu](https://github.com/apache/incubator-tvm/blob/a1daa1c47d2a19e51ab96da5c20e187f3bbf3413/vta/python/vta/top/op.py#L166)).
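The matching-by-task-name idea can be sketched as follows. This is a simplified model, not the actual TVM code: `Op` is a hypothetical stand-in for a TE operation, and the assumption that `workload[0]` holds the task name follows autotvm's workload-tuple convention.

```python
# Simplified sketch (not the actual TVM implementation) of workload
# lookup with task-name filtering. `Op` is a hypothetical stand-in for
# a TE operation; the real code walks tvm.te tensors and uses
# args_to_workload().

class Op:
    def __init__(self, attrs=None, input_tensors=()):
        self.attrs = attrs or {}
        self.input_tensors = list(input_tensors)

def get_workload(ops, task_name=None):
    """Return an attached workload, preferring one whose task name matches.

    By autotvm convention, workload[0] is the task name.
    """
    for op in ops:
        # Search the inputs first, then this op's own attrs.
        wkl = get_workload(op.input_tensors, task_name)
        if wkl:
            return wkl
        wkl = op.attrs.get('workload')
        if wkl and (task_name is None or wkl[0] == task_name):
            return wkl
    return None
```

With a fused graph where both `conv2d` and `add` carry a `workload` attr, passing the `task_name` selects the intended one instead of whichever is found first.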
##
File path: src/relay/backend/compile_engine.cc
##
@@ -230,7 +230,7 @@ class ScheduleGetter : public backend::MemoizedExprTranslator<Array<te::Tensor>>
              << "Two complicated op in a primitive function "
              << " master=" << master_op_ << " current=" << op;
        }
-      if (op_pattern >= master_op_pattern_) {
+      if (op_pattern > master_op_pattern_) {
Review comment:
With this change, among ops that share the same op_pattern, the op visited
first takes priority as the master_op. I think it is reasonable that the
front ops are more important.
This change is also motivated by the introduction of ALU-only nodes, e.g.,
`fused_cast_cast_add_left_shift_add_nn_relu_add_right_shift_cast_2`. In this
case we choose the first `add` as the master_op. The last `add` is actually
an addition of a tensor and a constant; if we chose it as the master_op, the
autotuned config could overflow the ALU buffer, since the whole fused op
needs more ALU buffer than that last `add` alone suggests.
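The tie-breaking behavior can be illustrated with a toy Python mirror of the C++ selection loop. The op names and pattern values below are hypothetical (in Relay, a higher OpPatternKind means a more complex op, e.g. kElemWise=0, kBroadcast=1); this is a sketch of the rule, not the actual ScheduleGetter code.

```python
# Toy model of master-op selection in ScheduleGetter.
# ops: list of (name, pattern) pairs in visit order; pattern values
# mimic relay OpPatternKind (higher = more complex).

def pick_master(ops):
    master_op, master_pattern = None, -1
    for name, pattern in ops:
        # Strict '>' means an earlier op wins ties on equal pattern,
        # so the first op with the highest pattern becomes the master.
        if pattern > master_pattern:
            master_op, master_pattern = name, pattern
    return master_op
```

With `>=` instead, the last op with the highest pattern would win, which is exactly the case that picks the tensor-plus-constant `add` and mis-sizes the ALU buffer.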
##
File path: src/relay/backend/compile_engine.cc
##
@@ -288,7 +288,7 @@ class ScheduleGetter : public backend::MemoizedExprTranslator<Array<te::Tensor>>
   tvm::Target target_;
   Op master_op_;
   Attrs master_attrs_;
-  int master_op_pattern_{0};
+  int master_op_pattern_{-1};
Review comment:
This is due to the change from
```
if (op_pattern >= master_op_pattern_) {
```
to
```
if (op_pattern > master_op_pattern_) {
```
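The reason the initializer must move from 0 to -1: with the strict `>` comparison, a starting value of 0 would prevent any pattern-0 (elementwise) op from ever being selected, leaving `master_op_` unset for an all-elementwise function. A hypothetical Python mirror of the C++ logic makes this concrete:

```python
# Why master_op_pattern_ starts at -1 once the comparison is strict.
# select_master mirrors the C++ loop; pattern 0 plays the role of
# kElemWise.

def select_master(ops, init_pattern):
    master, best = None, init_pattern
    for name, pattern in ops:
        if pattern > best:  # strict comparison, as in the PR
            master, best = name, pattern
    return master
```

Starting from -1, even a function made only of pattern-0 ops gets a master op (the first one visited); starting from 0, it would get none.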
##
File path: src/tir/transforms/lower_tvm_builtin.cc
##
@@ -88,16 +88,6 @@ class BuiltinLower : public StmtExprMutator {
     op = stmt.as<AllocateNode>();
     // Get constant allocation bound.
     int64_t nbytes = GetVectorBytes(op->dtype);
Review comment:
This removes the special handling for kDLCPU. Otherwise it may cause an LLVM
parameter-mismatch error:
```bash
Traceback (most recent call last):
  File "vta/tutorials/frontend/deploy_classification.py", line 210, in <module>
    params=params, target_host=env.target_host)
  File "/4pd/home/zhanghao/workspace/tvm-2/tvm/python/tvm/relay/build_module.py", line 251, in build
    graph_json, mod, params = bld_mod.build(mod, target, target_host, params)
  File "/4pd/home/zhanghao/workspace/tvm-2/tvm/python/tvm/relay/build_module.py", line 120, in build
    self._build(mod, target, target_host)
  File "tvm/_ffi/_cython/./packed_func.pxi", line 321, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 256, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 245, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line
```