(tvm) branch main updated: [CLML] Fix in clml pattern check condition (#16933)
This is an automated email from the ASF dual-hosted git repository. srk pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/tvm.git The following commit(s) were added to refs/heads/main by this push: new 1453893be0 [CLML] Fix in clml pattern check condition (#16933) 1453893be0 is described below commit 1453893be08f34dbde2950a179028d11daf48936 Author: krishnaraj36 AuthorDate: Sat Apr 27 11:06:31 2024 +0530 [CLML] Fix in clml pattern check condition (#16933) * [CLML] Fix in clml pattern check condition Added more check condition to make clml path more robust. 1. Depth_to_space - CLML path only supported for mode="DCR" and NCHW layout 2. Default checks - CLML supports less than 4D tensor dimension and with batch size =1. * Update clml.py --- python/tvm/relay/op/contrib/clml.py| 118 + tests/python/contrib/test_clml/test_ops.py | 30 ++-- 2 files changed, 109 insertions(+), 39 deletions(-) diff --git a/python/tvm/relay/op/contrib/clml.py b/python/tvm/relay/op/contrib/clml.py index 53b022c347..22a7aae2b1 100644 --- a/python/tvm/relay/op/contrib/clml.py +++ b/python/tvm/relay/op/contrib/clml.py @@ -93,6 +93,7 @@ class OptimizeBatchnorm(ExprMutator): if ( not isinstance(arg, (Var, Constant)) and isinstance(arg, tvm.relay.TupleGetItem) +and isinstance(arg.tuple_value.op, tvm.ir.op.Op) and arg.tuple_value.op.name == "nn.batch_norm" and (not isinstance(arg.tuple_value.args[0], (Var, Constant))) and arg.tuple_value.args[0].op.name == "nn.conv2d" @@ -260,7 +261,8 @@ def clml_pattern_table(): ) ) pattern = pattern.optional(is_op("nn.relu")) -pattern = pattern.optional(is_op("clip")) +# Fusion pattern to support with relu6 layer. +pattern = pattern.optional(is_op("clip").has_attr({"a_min": 0.0, "a_max": 6.0})) return pattern def conv_transpose_pattern(): @@ -276,7 +278,8 @@ def clml_pattern_table(): ) ) pattern = pattern.optional(is_op("nn.relu")) -pattern = pattern.optional(is_op("clip")) +# Fusion pattern to support with relu6 layer. 
+pattern = pattern.optional(is_op("clip").has_attr({"a_min": 0.0, "a_max": 6.0})) return pattern def pad_conv_pattern(): @@ -293,7 +296,8 @@ def clml_pattern_table(): ) ) pattern = pattern.optional(is_op("nn.relu")) -pattern = pattern.optional(is_op("clip")) +# Fusion pattern to support with relu6 layer. +pattern = pattern.optional(is_op("clip").has_attr({"a_min": 0.0, "a_max": 6.0})) return pattern def batch_norm_pattern(): @@ -359,6 +363,9 @@ def clml_pattern_table(): if attrs.data_layout != "NCHW": return False +if call.checked_type.shape[0] > 1: +return False + if ( (not clip_found) and (attrs.kernel_size[0] == 3) @@ -411,19 +418,13 @@ def clml_pattern_table(): # Scalars are not supported if len(call.args[1].checked_type.shape) == 0: return False +if call.args[0] == call.args[1]: +return False if tuple(call.args[0].checked_type.shape) != tuple(call.args[1].checked_type.shape): return False -for arg in call.args: -# Avoid any operators with dtype Int64 -if arg.checked_type.dtype == "int64": -return False -# No support for batch> 1 -if arg.checked_type.shape[0] > 1: -return False - -return True +return check_default_op(call) def check_pad_op(extract): call = extract @@ -433,60 +434,117 @@ def clml_pattern_table(): # Pad layers before any convolution are not guarenteed to be NCHW. if isinstance(call.args[0], tvm.relay.expr.Var): return False -return True +return check_default_op(call) def check_softmax_op(extract): call = extract -# supports 2D and 4D tensors +# supports 2D and 4D tensors. if len(call.args[0].checked_type.shape) not in [2, 4]: return False -return True +return check_default_op(call) def check_upsampling_op(extract): call = extract if call.attrs["method"] != "bilinear": return False -return True +return check_default_op(call) def check_concat_op(extract): call = extract if call.attrs["axis"] != 1:
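
Note on the checks above: the commit message for #16933 describes two guards, a default check (batch size 1, limited tensor rank) and a `depth_to_space` check (`mode="DCR"`, NCHW layout). The body of `check_default_op` is not shown in this excerpt, so the following is only a minimal pure-Python sketch of that logic. `TensorInfo`, `default_check`, `depth_to_space_check` and the `max_rank=4` reading of "less than 4D" are assumptions made for illustration, not TVM APIs.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class TensorInfo:
    """Hypothetical stand-in for a Relay checked type (shape and dtype only)."""
    shape: Tuple[int, ...]
    dtype: str


def default_check(args: Sequence[TensorInfo], max_rank: int = 4) -> bool:
    """Reject int64 tensors, ranks above max_rank (assumed limit), and batch > 1."""
    for arg in args:
        if arg.dtype == "int64":
            return False
        if len(arg.shape) > max_rank:
            return False
        if len(arg.shape) > 0 and arg.shape[0] > 1:
            return False
    return True


def depth_to_space_check(mode: str, layout: str, args: Sequence[TensorInfo]) -> bool:
    """Accept only mode DCR with NCHW layout, then fall back to the default check."""
    if mode != "DCR" or layout != "NCHW":
        return False
    return default_check(args)
```

Routing every operator-specific check through one default check, as the diff does with `return check_default_op(call)`, keeps the batch and dtype rules in a single place.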
(tvm) branch main updated: [SCRIPT][ADRENO] Fix in build config for adreno (#16927)
This is an automated email from the ASF dual-hosted git repository.

srk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git

The following commit(s) were added to refs/heads/main by this push:
     new 5bd10472e9 [SCRIPT][ADRENO] Fix in build config for adreno (#16927)
5bd10472e9 is described below

commit 5bd10472e9a1b81a25e355824e84587a6988255c
Author: krishnaraj36
AuthorDate: Fri Apr 26 15:06:10 2024 +0530

    [SCRIPT][ADRENO] Fix in build config for adreno (#16927)

    1. Enable CXX environment setting for empty tvm subgraph.
    2. Enable clml profiling and tuning in rpc environment
    3. Enable Opencl when CLML build.
---
 tests/scripts/setup-adreno-env.sh         | 3 ++-
 tests/scripts/task_build_adreno_bins.sh   | 3 +++
 tests/scripts/task_config_build_adreno.sh | 3 +--
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/tests/scripts/setup-adreno-env.sh b/tests/scripts/setup-adreno-env.sh
index 15c124a0f0..d2c776412e 100755
--- a/tests/scripts/setup-adreno-env.sh
+++ b/tests/scripts/setup-adreno-env.sh
@@ -80,6 +80,7 @@ function def_environment() {
     export RPC_DEVICE_KEY="android"
     export RPC_TARGET="adreno"
     export TVM_NDK_CC="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang"
+    export CXX="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang"
 }

 def_environment
@@ -111,7 +112,7 @@ case ${ENVIRONMENT} in
     adb forward tcp:$((LISTEN_PORT + 1)) tcp:$((LISTEN_PORT + 1))
     adb forward tcp:$((LISTEN_PORT + 2)) tcp:$((LISTEN_PORT + 2))
     adb forward tcp:$((LISTEN_PORT + 3)) tcp:$((LISTEN_PORT + 3))
-    adb shell "cd ${TARGET_FOLDER}; killall -9 tvm_rpc-${USER}; sleep 2; LD_LIBRARY_PATH=${TARGET_FOLDER}/ ./tvm_rpc-${USER} server --host=0.0.0.0 --port=${LISTEN_PORT} --port-end=$((LISTEN_PORT + 10)) --tracker=127.0.0.1:${TVM_TRACKER_PORT} --key=${RPC_DEVICE_KEY}"
+    adb shell "cd ${TARGET_FOLDER}; killall -9 tvm_rpc-${USER}; sleep 2; export CLML_PROFILING=1; export CLML_IS_TUNING_RUN=1; export CLML_TUNING_CACHE=clml.bin; LD_LIBRARY_PATH=${TARGET_FOLDER}/ ./tvm_rpc-${USER} server --host=0.0.0.0 --port=${LISTEN_PORT} --port-end=$((LISTEN_PORT + 10)) --tracker=127.0.0.1:${TVM_TRACKER_PORT} --key=${RPC_DEVICE_KEY}"
     ;;
   "query")
diff --git a/tests/scripts/task_build_adreno_bins.sh b/tests/scripts/task_build_adreno_bins.sh
index 80ac461c4e..38eefd93a6 100755
--- a/tests/scripts/task_build_adreno_bins.sh
+++ b/tests/scripts/task_build_adreno_bins.sh
@@ -31,6 +31,9 @@ cp ../cmake/config.cmake .
 if [ -f "${ADRENO_OPENCL}/CL/cl_qcom_ml_ops.h" ] ; then
 echo set\(USE_CLML "${ADRENO_OPENCL}"\) >> config.cmake
 echo set\(USE_CLML_GRAPH_EXECUTOR "${ADRENO_OPENCL}"\) >> config.cmake
+fi
+if [ -f "${ADRENO_OPENCL}/CL/cl.h" ] ; then
+echo set\(USE_OPENCL "${ADRENO_OPENCL}"\) >> config.cmake
 else
 echo set\(USE_OPENCL ON\) >> config.cmake
 fi
diff --git a/tests/scripts/task_config_build_adreno.sh b/tests/scripts/task_config_build_adreno.sh
index afe6407cba..cf8917c9a5 100755
--- a/tests/scripts/task_config_build_adreno.sh
+++ b/tests/scripts/task_config_build_adreno.sh
@@ -26,9 +26,8 @@ cp ../cmake/config.cmake .
 echo set\(USE_OPENCL_GTEST /googletest\) >> config.cmake
 if [ -f "${ADRENO_OPENCL}/CL/cl_qcom_ml_ops.h" ] ; then
 echo set\(USE_CLML ${ADRENO_OPENCL}\) >> config.cmake
-else
-echo set\(USE_OPENCL ON\) >> config.cmake
 fi
+echo set\(USE_OPENCL ON\) >> config.cmake
 echo set\(USE_RPC ON\) >> config.cmake
 echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
 echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
(tvm) branch main updated: [RUNTIME][CLML] Fix for Softmax op for 4D tensors (#16328)
This is an automated email from the ASF dual-hosted git repository. srk pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/tvm.git The following commit(s) were added to refs/heads/main by this push: new a5e883e846 [RUNTIME][CLML] Fix for Softmax op for 4D tensors (#16328) a5e883e846 is described below commit a5e883e8465e11221d3f22d6ef2f61a1bfa5d1f2 Author: krishnaraj36 AuthorDate: Thu Jan 18 12:38:57 2024 +0530 [RUNTIME][CLML] Fix for Softmax op for 4D tensors (#16328) Fixed the softmax layer for 4D tensors to support for NCHW and NHWC layout types. Enabled relevant test cases for softmax layer --- python/tvm/relay/op/contrib/clml.py| 3 +- src/runtime/contrib/clml/clml_runtime.cc | 62 - tests/python/contrib/test_clml/test_ops.py | 86 -- 3 files changed, 98 insertions(+), 53 deletions(-) diff --git a/python/tvm/relay/op/contrib/clml.py b/python/tvm/relay/op/contrib/clml.py index 14dd35a3cb..53b022c347 100644 --- a/python/tvm/relay/op/contrib/clml.py +++ b/python/tvm/relay/op/contrib/clml.py @@ -437,7 +437,8 @@ def clml_pattern_table(): def check_softmax_op(extract): call = extract -if len(call.args[0].checked_type.shape) > 2: +# supports 2D and 4D tensors +if len(call.args[0].checked_type.shape) not in [2, 4]: return False return True diff --git a/src/runtime/contrib/clml/clml_runtime.cc b/src/runtime/contrib/clml/clml_runtime.cc index aa1e2b82b6..8e69cb8bd1 100644 --- a/src/runtime/contrib/clml/clml_runtime.cc +++ b/src/runtime/contrib/clml/clml_runtime.cc @@ -511,6 +511,7 @@ class CLMLRuntime : public JSONRuntimeBase { /*! * \brief Create an CLML tensor from JSON node entry. Lookup storage map before creation. + * Update input placeholder for NHWC layout * * \param nid The node index of graph JSON. 
* \param shape shape information of tensor @@ -528,15 +529,22 @@ class CLMLRuntime : public JSONRuntimeBase { uint32_t eid = EntryID(nid, 0); node_data = data_entry_[eid]->data; } + auto clml_tensor = MakeCLMLTensorFromJSONNode(node, layout, dtype, node_data, shape); + this->layer_.storage_map.insert({nid, std::make_pair(clml_tensor, node)}); if ("input" == node.GetOpType()) { this->layer_.inputs.insert({nid, this->layer_.storage_map[nid].first}); // Input copy placeholder Tensor -this->layer_.in_placeholder.insert( -{nid, MakeCLMLTensorFromJSONNode(node, CL_TENSOR_LAYOUT_NCHW_QCOM, dtype, node_data, - shape)}); +if (layout == CL_TENSOR_LAYOUT_OPTIMAL_QCOM) { + this->layer_.in_placeholder.insert( + {nid, MakeCLMLTensorFromJSONNode(node, CL_TENSOR_LAYOUT_NCHW_QCOM, dtype, node_data, + shape)}); +} else { + this->layer_.in_placeholder.insert( + {nid, MakeCLMLTensorFromJSONNode(node, layout, dtype, node_data, shape)}); +} } return clml_tensor; @@ -559,6 +567,7 @@ class CLMLRuntime : public JSONRuntimeBase { const auto& node = nodes_[nid]; if ("nn.dense" == node.GetOpName()) CreateDenseLayerTensor(_, node, nid); if ("nn.batch_matmul" == node.GetOpName()) CreateBatchMatmulLayerTensor(_, node, nid); + if ("nn.softmax" == node.GetOpName()) CreateSoftmaxLayerTensor(_, node, nid); } for (nid = 0; nid < nodes_.size(); ++nid) { @@ -1092,6 +1101,37 @@ class CLMLRuntime : public JSONRuntimeBase { return; } + /*! + * \brief Create a Softmax layer Tensors with supported layout. + * \param layer The CLML layer to build. Containing inputs, outputs and the CLML function. + * \param node The JSON representation of the operator. + * \param nid The node index of JSON graph node, which points to this operator. 
+ */ + + void CreateSoftmaxLayerTensor(CachedLayer* layer, const JSONGraphNode& node, size_t nid) { +cl_ml_tensor_layout_qcom layout; +cl_int result = 0; +cl_ml_op_qcom op = nullptr; +DLDataType tvm_dtype = node.GetOpDataType()[0]; +cl_channel_type cl_dtype = MakeCLDataType(tvm_dtype); +auto out_dims = GetTensorDims(nodes_[node.GetInputs()[0].id_]); +int axis = std::stoi(node.GetAttr>("axis")[0]); +// enabling NHWC layout && NCHW layout for 4D, basis the axis value +if (out_dims.h >= 1 && out_dims.w >= 1) { + if (axis == 3 || axis == -1) { +layout = CL_TENSOR_LAYOUT_NHWC_QCOM; + } else { +layout = CL_TENSOR_LAYOUT_NCHW_QCOM; + } +} else { // default layout for 2D + layout = CL_TENSOR_LAYOUT_OPTIMAL_QCOM; +} +auto output = MakeCLMLTensorFromJSONEntry(nid, {}, layout, cl_dtype)
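
Note on the layout selection above: `CreateSoftmaxLayerTensor` picks NHWC when a 4D softmax runs over the last axis (`axis == 3` or `axis == -1`), NCHW for other 4D cases, and the optimal layout for 2D input. The following is a small pure-Python sketch of that decision. It keys on tensor rank, whereas the C++ code tests `out_dims.h` and `out_dims.w`, so treat the rank test and the names `softmax_supported` and `softmax_layout` as assumptions for illustration.

```python
def softmax_supported(rank: int) -> bool:
    """Mirror of the Python-side pattern check: only 2D and 4D inputs are offloaded."""
    return rank in (2, 4)


def softmax_layout(rank: int, axis: int) -> str:
    """Return the layout name a softmax tensor would be created with.

    Assumes rank 4 corresponds to the `h >= 1 && w >= 1` branch in the C++ code.
    """
    if rank == 4:
        # Softmax over the last axis is treated as channels-last.
        return "NHWC" if axis in (3, -1) else "NCHW"
    # 2D tensors keep the default (optimal) layout.
    return "OPTIMAL"
```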
(tvm) branch main updated: [RUNTIME][CLML] Fix for CLML ops and enable more test case (#15896)
This is an automated email from the ASF dual-hosted git repository. srk pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/tvm.git The following commit(s) were added to refs/heads/main by this push: new 3a57a40c1b [RUNTIME][CLML] Fix for CLML ops and enable more test case (#15896) 3a57a40c1b is described below commit 3a57a40c1ba40e1c330346905f8db72775fc9992 Author: krishnaraj36 AuthorDate: Wed Dec 20 13:50:00 2023 +0530 [RUNTIME][CLML] Fix for CLML ops and enable more test case (#15896) * [RUNTIME][CLML] Fix for few clml ops Fixed the dense operator and enhance clml network testcase * [RUNTIME][CLML] Fix for dense layer and float16 Fixed the dense layer issue in network level and improved converage of dense layer with clml Fixed float16 crash error. * Update comment for dense pattern * fix in clml test cases * Enable more test cases and few fixes * Fix the import error * Fix the import error * Fix in batchnorm testcase * Restructure clml test case and enable vm executor * Fix the import error in clml test network * Fix the test failure for vm tests * Update clml.py --- python/tvm/relay/op/contrib/clml.py | 118 ++- src/relay/backend/contrib/clml/codegen.cc| 2 +- src/runtime/contrib/clml/clml_runtime.cc | 521 - tests/python/contrib/test_clml/conftest.py | 21 +- tests/python/contrib/test_clml/infrastructure.py | 242 +++--- tests/python/contrib/test_clml/test_network.py | 249 +++--- tests/python/contrib/test_clml/test_ops.py | 942 +-- tests/scripts/task_python_adreno.sh | 1 + 8 files changed, 1332 insertions(+), 764 deletions(-) diff --git a/python/tvm/relay/op/contrib/clml.py b/python/tvm/relay/op/contrib/clml.py index f194dd114b..14dd35a3cb 100644 --- a/python/tvm/relay/op/contrib/clml.py +++ b/python/tvm/relay/op/contrib/clml.py @@ -18,6 +18,7 @@ """CLML Library supported operators.""" import json from string import Template +import numpy as np import tvm from tvm import relay @@ -27,7 +28,7 @@ from tvm.relay import transform from 
tvm.relay.build_module import bind_params_by_name from tvm.relay import function as _function from tvm.relay.expr_functor import ExprMutator -from tvm.relay.expr import Call, TupleGetItem +from tvm.relay.expr import Call, TupleGetItem, Var, Constant from ...dataflow_pattern import wildcard, is_op, is_constant, is_tuple_get_item, is_tuple from .register import register_pattern_table @@ -81,34 +82,61 @@ class RemoveDropoutPass: return RemoveDropout().visit(func) -class BroadcastInputs(ExprMutator): +class OptimizeBatchnorm(ExprMutator): """ -Binary operators need broadcasting for CLML. +Fuse Conv+Batchnorm and constant folder to generate Conv+Add. """ -def visit_call(self, call): -if call.op.name in ["add", "subtract", "multiply", "divide", "maximum", "minimum"]: -new_fn = self.visit(call.op) -call_shape = call.checked_type.shape -lhs = call.args[0] -rhs = call.args[1] -lhs_shape = lhs.checked_type.shape -rhs_shape = rhs.checked_type.shape -if list(call_shape) != list(lhs_shape): -lhs = relay.broadcast_to(self.visit(lhs), call_shape) -if list(call_shape) != list(rhs_shape): -rhs = relay.broadcast_to(self.visit(rhs), call_shape) -args = [lhs, rhs] -return Call(new_fn, args, call.attrs) -return super().visit_call(call) +def visit_call(self, call) -> relay.expr.Expr: +new_args = [] +for arg in call.args: +if ( +not isinstance(arg, (Var, Constant)) +and isinstance(arg, tvm.relay.TupleGetItem) +and arg.tuple_value.op.name == "nn.batch_norm" +and (not isinstance(arg.tuple_value.args[0], (Var, Constant))) +and arg.tuple_value.args[0].op.name == "nn.conv2d" +): +ep = arg.tuple_value.attrs["epsilon"] +wt = arg.tuple_value.args[1].data.numpy() +bs = arg.tuple_value.args[2].data.numpy() +mn = arg.tuple_value.args[3].data.numpy() +vr = arg.tuple_value.args[4].data.numpy() + ep +dino = np.sqrt(vr) +wt = wt / dino +bs = bs - mn * wt +conv_op = arg.tuple_value.args[0] +conv_args = list(conv_op.args) +wt_conv = conv_args[1].data.numpy() +if conv_op.attrs["kernel_layout"] == "OIHW": 
+wt = wt.r
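
Note on the arithmetic in `OptimizeBatchnorm` above: the diff computes `wt = gamma / sqrt(var + epsilon)` and `bs = beta - mean * wt`, which turns batch norm into a per-channel scale and shift. The diff is cut off before it shows how the scale is applied to the convolution weights, so the sketch below only checks the scalar identity per channel. `fold_batch_norm` and `batch_norm_reference` are hypothetical helper names, and plain Python lists stand in for the numpy arrays used in the source.

```python
import math
from typing import List, Tuple


def fold_batch_norm(
    gamma: List[float], beta: List[float], mean: List[float], var: List[float], epsilon: float
) -> Tuple[List[float], List[float]]:
    """Return per-channel (scale, shift) so that scale * y + shift equals batch_norm(y)."""
    scale = [g / math.sqrt(v + epsilon) for g, v in zip(gamma, var)]
    shift = [b - m * s for b, m, s in zip(beta, mean, scale)]
    return scale, shift


def batch_norm_reference(
    y: float, gamma: float, beta: float, mean: float, var: float, epsilon: float
) -> float:
    """Inference-time batch norm for a single value of one channel."""
    return gamma * (y - mean) / math.sqrt(var + epsilon) + beta
```

Because the scale is a per-output-channel constant, it can in principle be multiplied into the convolution kernel ahead of time, leaving only the shift as an add after the convolution.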
[tvm] branch main updated: [CI][ADRENO] Few updates to Adreno docker setup (#15897)
This is an automated email from the ASF dual-hosted git repository.

srk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git

The following commit(s) were added to refs/heads/main by this push:
     new a79f632333 [CI][ADRENO] Few updates to Adreno docker setup (#15897)
a79f632333 is described below

commit a79f632333d0f319f938ad58575bbd5ea85bd0d3
Author: Siva
AuthorDate: Tue Oct 10 13:51:02 2023 +0530

    [CI][ADRENO] Few updates to Adreno docker setup (#15897)

    Enabling google tests and clang-format version update.
---
 apps/cpp_clml/scripts/clml_codegen.py     | 2 +-
 docker/Dockerfile.ci_adreno               | 8 ++--
 tests/scripts/task_config_build_adreno.sh | 1 +
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/apps/cpp_clml/scripts/clml_codegen.py b/apps/cpp_clml/scripts/clml_codegen.py
index bf19c0e4b9..7540812ed5 100644
--- a/apps/cpp_clml/scripts/clml_codegen.py
+++ b/apps/cpp_clml/scripts/clml_codegen.py
@@ -57,7 +57,7 @@ def main():
     f_src = open("../clml_models.cc", "w")
     f_src.write("\n".join(gen_src))
     f_src.close()
-    os.popen("clang-format-10 -i ../clml_models.cc")
+    os.popen("clang-format-15 -i ../clml_models.cc")

 if __name__ == "__main__":
diff --git a/docker/Dockerfile.ci_adreno b/docker/Dockerfile.ci_adreno
index 11be0a8baa..961977c542 100644
--- a/docker/Dockerfile.ci_adreno
+++ b/docker/Dockerfile.ci_adreno
@@ -16,7 +16,7 @@
 # under the License.

 # CI docker GPU env
-FROM tlcpack/ci-gpu:20220908-060034-62bdc91b1
+FROM tlcpack/ci-gpu

 COPY utils/apt-install-and-clear.sh /usr/local/bin/apt-install-and-clear
@@ -26,4 +26,8 @@ RUN bash /install/ubuntu_install_androidsdk.sh 25.2.9519653 3.22.1 33.0.2 33
 ENV PATH /opt/android-sdk-linux/platform-tools:$PATH

 # Clang tool for CLML source codegen
-RUN apt-get update && apt-install-and-clear -y clang-format-10
+RUN apt-get update && apt-install-and-clear -y clang-format-15
+
+#Google Test
+COPY install/ubuntu_install_googletest.sh /install/ubuntu_install_googletest.sh
+RUN bash install/ubuntu_install_googletest.sh
diff --git a/tests/scripts/task_config_build_adreno.sh b/tests/scripts/task_config_build_adreno.sh
index 1b6750f165..afe6407cba 100755
--- a/tests/scripts/task_config_build_adreno.sh
+++ b/tests/scripts/task_config_build_adreno.sh
@@ -23,6 +23,7 @@ mkdir -p "$BUILD_DIR"
 cd "$BUILD_DIR"
 cp ../cmake/config.cmake .

+echo set\(USE_OPENCL_GTEST /googletest\) >> config.cmake
 if [ -f "${ADRENO_OPENCL}/CL/cl_qcom_ml_ops.h" ] ; then
 echo set\(USE_CLML ${ADRENO_OPENCL}\) >> config.cmake
 else
[tvm] branch main updated: [CLI TOOLS][RTVM] Improve rtvm tool with new options to measure native performance (#15818)
This is an automated email from the ASF dual-hosted git repository. srk pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/tvm.git The following commit(s) were added to refs/heads/main by this push: new def551dfd5 [CLI TOOLS][RTVM] Improve rtvm tool with new options to measure native performance (#15818) def551dfd5 is described below commit def551dfd50bfff4e9d50108dc4e8027b553b8ec Author: Siva AuthorDate: Fri Sep 29 10:30:20 2023 +0530 [CLI TOOLS][RTVM] Improve rtvm tool with new options to measure native performance (#15818) * [RTVM] Improve rtvm tool with new options to measure native performance Few fixes and enhancements that affects model loading times New options to measure performance. * * review comments * * review comments --- apps/cpp_rtvm/README.md | 22 + apps/cpp_rtvm/main.cc | 199 ++-- apps/cpp_rtvm/tvm_runner.cc | 129 +--- apps/cpp_rtvm/tvm_runner.h | 24 +- 4 files changed, 316 insertions(+), 58 deletions(-) diff --git a/apps/cpp_rtvm/README.md b/apps/cpp_rtvm/README.md index c60a7b0e12..652d46eb58 100644 --- a/apps/cpp_rtvm/README.md +++ b/apps/cpp_rtvm/README.md @@ -122,6 +122,11 @@ Command line usage --input- Numpy file for the model input (optional and we use random of not given) --output - Numpy file name to dump the model output as numpy --dump-meta- Dump model meta information +--pre-compiled - The file name of a file where pre-compiled programs should be stored +--profile - Profile over all execution +--dry-run - Profile after given dry runs, default 10 +--run-count- Profile for given runs, default 50 +--zero-copy- Profile with zero copy api Example ./rtvm --model=keras-resnet50 --device="opencl" --dump-meta @@ -366,3 +371,20 @@ stored. If the pre-compiled file name was passed to the `rtvm` then After method `Load`, method `UsePreCompiledProgram` is called. This method loads pre-compiled programs if the file exists. In opposite case the file will be created and pre-compiled programs will be saved to this file. 
+
+# Performance Profiling Options
+The tool has added a few options to measure wall clock performance of the given model on the target natively.
+--profile : Turns on profiling.
+--dry-run : The number of times to dry run the model before measuring the performance. Default value is 10.
+--run-count : The number of times to run the model and take an average. Default value is 50.
+--zero-copy: This option enables graph runtime zero copy to be used for input and output rather than byte copy to DLTensor.
+
+The performance profile options dump an information summary as given below.
+    Module Load              :27 ms
+    Graph Runtime Create     :11 ms
+    Params Read              :15 ms
+    Params Set               :41 ms
+    Pre Compiled Progs Load  :24 ms
+Total Load Time     :118 ms
+Average ExecTime    :27 ms
+Unload Time         :35.9236 ms
diff --git a/apps/cpp_rtvm/main.cc b/apps/cpp_rtvm/main.cc
index c38a5f62bd..dc3cf1c414 100644
--- a/apps/cpp_rtvm/main.cc
+++ b/apps/cpp_rtvm/main.cc
@@ -29,6 +29,7 @@
 #endif
 #include
+#include
 #include
 #include
 #include
@@ -54,7 +55,11 @@ static const string kUsage =
     "--input- Numpy file for the model input (optional and we use random of not given)\n"
     "--output - Numpy file name to dump the model output as numpy\n"
     "--dump-meta- Dump model meta information\n"
-    "--pre-compiled - The file name of a file where pre-compiled programs should be stored"
+    "--pre-compiled - The file name of a file where pre-compiled programs should be stored\n"
+    "--profile - Profile over all execution\n"
+    "--dry-run - Profile after given dry runs, default 10\n"
+    "--run-count- Profile for given runs, default 50\n"
+    "--zero-copy- Profile with zero copy api\n"
     "\n"
     " Example\n"
     " ./rtvm --model=keras-resnet50 --device=\"opencl\" --dump-meta\n"
@@ -68,6 +73,7 @@ static const string kUsage =
  * \arg input Numpy file for the model input
  * \arg output Numpy file name to dump the model output as numpy
  * \arg pre_compiled File name where pre-compiled programs should be stored
+ * \arg profile Do we profile overall execution
  */
 struct ToolArgs {
   string model;
@@ -75,7 +81,11 @@ struct ToolArgs {
   string input;
   string output;
   string pre_compiled;
-  bool dump_meta = false;
+  bool dump_meta{false};
+  bool profile{false};
+  int dry_run{10};
+  int run_count{50};
+  bool zero_copy{false};
 };

 /*!
@@ -89,6 +99,10 @@ void PrintArgs(const ToolArgs& args) {
   LOG(INFO) << "Output= " << args.output;
   LOG(INFO) << "Pre-compiled = " <<
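
Note on the `--dry-run` and `--run-count` options above: the README text describes a warm-up phase followed by a timed phase whose duration is averaged. The `rtvm` implementation itself is not shown in this excerpt, so the following is only a hedged Python sketch of that measurement pattern. `average_exec_time_ms`, `run_once` and the injectable `clock` parameter are hypothetical names chosen for illustration. The defaults 10 and 50 are taken from the README.

```python
import time
from typing import Callable


def average_exec_time_ms(
    run_once: Callable[[], None],
    dry_run: int = 10,
    run_count: int = 50,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Warm up dry_run times, then time run_count runs and return the mean in milliseconds."""
    for _ in range(dry_run):
        run_once()  # warm-up runs are executed but not timed
    start = clock()
    for _ in range(run_count):
        run_once()
    elapsed_seconds = clock() - start
    return elapsed_seconds * 1000.0 / run_count
```

Excluding the warm-up runs keeps one-time costs such as kernel compilation or cache population out of the reported average.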
[tvm] branch main updated: [FRONTEND] Fix unnecessary pylint errors (#15838)
This is an automated email from the ASF dual-hosted git repository.

srk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git

The following commit(s) were added to refs/heads/main by this push:
     new 8b40f5d028 [FRONTEND] Fix unnecessary pylint errors (#15838)
8b40f5d028 is described below

commit 8b40f5d028632da82bd6cbf83865041d4186b068
Author: Siva
AuthorDate: Fri Sep 29 10:29:00 2023 +0530

    [FRONTEND] Fix unnecessary pylint errors (#15838)

    Handle unnecessary pylint errors from these frontends
---
 tests/python/frontend/keras/test_forward.py   | 2 +-
 tests/python/frontend/oneflow/test_forward.py | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tests/python/frontend/keras/test_forward.py b/tests/python/frontend/keras/test_forward.py
index 9d33b15a91..ba3880e186 100644
--- a/tests/python/frontend/keras/test_forward.py
+++ b/tests/python/frontend/keras/test_forward.py
@@ -28,11 +28,11 @@ from tensorflow import keras as tf_keras
 # prevent Keras from using up all gpu memory
 import keras
+import pytest
 import tvm
 from tvm import relay
 from tvm.contrib import graph_executor
 import tvm.testing
-import pytest

 if tf.executing_eagerly():
     GPUS = tf.config.experimental.list_physical_devices("GPU")
diff --git a/tests/python/frontend/oneflow/test_forward.py b/tests/python/frontend/oneflow/test_forward.py
index 7ddc347e86..fda5f1b723 100644
--- a/tests/python/frontend/oneflow/test_forward.py
+++ b/tests/python/frontend/oneflow/test_forward.py
@@ -20,11 +20,11 @@ import os
 import numpy as np
 import oneflow as flow
+from packaging import version as package_version
 import tvm
 import tvm.testing
 import tvm.topi.testing
 from tvm import relay
-from packaging import version as package_version

 MODEL_HOME = "test_model"
[tvm] branch main updated: [OpenCL] Implement save/load pre-compiled programs (#13868)
This is an automated email from the ASF dual-hosted git repository. srk pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/tvm.git The following commit(s) were added to refs/heads/main by this push: new 099ed94951 [OpenCL] Implement save/load pre-compiled programs (#13868) 099ed94951 is described below commit 099ed949519f3b6ae182c31ce69496f18a1f60ad Author: Egor Churaev AuthorDate: Fri Feb 3 05:28:35 2023 +0300 [OpenCL] Implement save/load pre-compiled programs (#13868) * [OpenCL] Implement save/load pre-compiled programs Using pre-compiled programs might significantly improve inference time of the first run. - Added methods `SupportPreCompiledPrograms` which reports if the module supports using pre-compiled programs. - Method `GetPreCompiledPrograms` returns string with bytes of pre-compiled programs. - Method `SetPreCompiledPrograms` allows user to pass pre-compiled programs to the module. * Fix lint * Apply comment: PackedFunc is used * Fix build * Fix CI and rename functions * Apply comments --- apps/cpp_rtvm/README.md| 14 ++ apps/cpp_rtvm/main.cc | 9 + apps/cpp_rtvm/tvm_runner.cc| 29 ++- apps/cpp_rtvm/tvm_runner.h | 4 + src/runtime/opencl/opencl_common.h | 2 + src/runtime/opencl/opencl_device_api.cc| 4 +- src/runtime/opencl/opencl_module.cc| 77 .../opencl/opencl_wrapper/opencl_wrapper.cc| 12 ++ tests/cpp-runtime/opencl/opencl_compile_to_bin.cc | 208 + 9 files changed, 356 insertions(+), 3 deletions(-) diff --git a/apps/cpp_rtvm/README.md b/apps/cpp_rtvm/README.md index e696153282..c60a7b0e12 100644 --- a/apps/cpp_rtvm/README.md +++ b/apps/cpp_rtvm/README.md @@ -352,3 +352,17 @@ python3 -m tvm.driver.tvmc compile --cross-compiler ${ANDROID_NDK_HOME}/toolchai python3 -m tvm.driver.tvmc run --device="cl" keras-resnet50.tar --rpc-key ${TVM_RPC_KEY} --rpc-tracker {TVM_TRACKER_HOST}:{TVM_TRACKER_PORT} --print-time ``` + +# Use pre-compiled OpenCL kernels +Using pre-compiled programs might significantly improve inference time of the 
+first run. E.g. for topology with ~300 kernels compilation time on Adreno was
+about 26 seconds. But after dumping compiled programs to binary files and reuse
+them on the next runs, the compilation time was significantly decreased (more
+than 1000 times) and starts to be around 25 ms.
+
+To use such functionality, the developer have to pass parameter `--pre-compiled`
+to the `rtvm` and specify the file name where pre-compiled programs will be
+stored. If the pre-compiled file name was passed to the `rtvm` then After method
+`Load`, method `UsePreCompiledProgram` is called. This method loads pre-compiled
+programs if the file exists. In opposite case the file will be created and
+pre-compiled programs will be saved to this file.
diff --git a/apps/cpp_rtvm/main.cc b/apps/cpp_rtvm/main.cc
index 31019ee0c9..c38a5f62bd 100644
--- a/apps/cpp_rtvm/main.cc
+++ b/apps/cpp_rtvm/main.cc
@@ -54,6 +54,7 @@ static const string kUsage =
     "--input- Numpy file for the model input (optional and we use random of not given)\n"
     "--output - Numpy file name to dump the model output as numpy\n"
     "--dump-meta- Dump model meta information\n"
+    "--pre-compiled - The file name of a file where pre-compiled programs should be stored"
     "\n"
     " Example\n"
     " ./rtvm --model=keras-resnet50 --device=\"opencl\" --dump-meta\n"
@@ -66,12 +67,14 @@ static const string kUsage =
  * \arg device The target device to use {llvm, cl, ...etc.}
  * \arg input Numpy file for the model input
  * \arg output Numpy file name to dump the model output as numpy
+ * \arg pre_compiled File name where pre-compiled programs should be stored
  */
 struct ToolArgs {
   string model;
   string device;
   string input;
   string output;
+  string pre_compiled;
   bool dump_meta = false;
 };

@@ -84,6 +87,7 @@ void PrintArgs(const ToolArgs& args) {
   LOG(INFO) << "Device= " << args.device;
   LOG(INFO) << "Input = " << args.input;
   LOG(INFO) << "Output= " << args.output;
+  LOG(INFO) << "Pre-compiled = " << args.pre_compiled;
   LOG(INFO) << "Dump Metadata = " <<
((args.dump_meta) ? ("True") : ("False")); } @@ -172,6 +176,8 @@ void ParseCmdArgs(int argc, char* argv[], struct ToolArgs& args) { if (!pmeta.empty()) { args.dump_meta = true; } + + args.pre_compiled = GetCmdOption(argc, argv, "--pre-compiled="); } /*! @@ -190,6 +196,9 @@ int ExecuteMo
[tvm] branch main updated (18b7dc1dd9 -> 56771a87d1)
This is an automated email from the ASF dual-hosted git repository.

srk pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git

    from 18b7dc1dd9 [MetaSchedule] Fix for RewriteLayout + AllocateConst when the rank of the rewritten weight doesn't change (#13851)
     add 56771a87d1 [CLML][RUNTIME] Enable more ops in CLML runtime (#13834)

No new revisions were added by this update.

Summary of changes:
 python/tvm/relay/op/contrib/clml.py        |  16 -
 src/runtime/contrib/clml/clml_runtime.cc   |  67 ++-
 tests/python/contrib/test_clml/test_ops.py | 102 +
 3 files changed, 183 insertions(+), 2 deletions(-)
[tvm] branch main updated: [CLML][RELAY] Enable Pad and Conv2d layer fusion (#13649)
This is an automated email from the ASF dual-hosted git repository.

srk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git

The following commit(s) were added to refs/heads/main by this push:
     new ece99a243b [CLML][RELAY] Enable Pad and Conv2d layer fusion (#13649)
ece99a243b is described below

commit ece99a243beab1fe879d78868367731d5a516a83
Author: krishnaraj36 <45380557+krishnara...@users.noreply.github.com>
AuthorDate: Wed Dec 28 11:24:11 2022 +0530

    [CLML][RELAY] Enable Pad and Conv2d layer fusion (#13649)

    * [CLML][RELAY] Enable Pad and Conv2d layer fusion

    Enabled clml supported nn.pad+nn.conv2d fusion pattern in clml pattern table

    * Fix pad testcase attributes

    * Fix the lint error

    * Fix the lint error

    * Removed redundent check in clml pattern

    * Fix the lint error

    Co-authored-by: kvegiraj
---
 python/tvm/relay/op/contrib/clml.py        | 21 +
 src/relay/backend/contrib/clml/codegen.cc  |  2 +-
 tests/python/contrib/test_clml/test_ops.py |  4 ++--
 3 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/python/tvm/relay/op/contrib/clml.py b/python/tvm/relay/op/contrib/clml.py
index c3d4eb8470..6453b8a06c 100644
--- a/python/tvm/relay/op/contrib/clml.py
+++ b/python/tvm/relay/op/contrib/clml.py
@@ -147,6 +147,23 @@ def clml_pattern_table():
         pattern = pattern.optional(is_op("clip"))
         return pattern
 
+    def pad_conv_pattern():
+        """Create a pad with convolution pattern."""
+        pattern = is_op("nn.pad")(wildcard(), is_constant())
+        pattern = is_op("nn.conv2d")(pattern, is_constant())
+        pattern = pattern.optional(lambda x: is_op("nn.bias_add")(x, is_constant()))
+        pattern = pattern.optional(lambda x: is_op("add")(x, is_constant()))
+        pattern = pattern.optional(
+            lambda x: is_tuple_get_item(
+                is_op("nn.batch_norm")(
+                    x, is_constant(), is_constant(), is_constant(), is_constant()
+                )
+            )
+        )
+        pattern = pattern.optional(is_op("nn.relu"))
+        pattern = pattern.optional(is_op("clip"))
+        return pattern
+
     def batch_norm_pattern():
         """Create a batch norm pattern."""
         pattern = is_op("nn.batch_norm")(
@@ -200,9 +217,11 @@ def clml_pattern_table():
 
         while call.op.name != "nn.conv2d":
             call = call.args[0]
+
         attrs, args = call.attrs, call.args
         if attrs.data_layout != "NCHW":
             return False
+
         if (
             (not clip_found)
             and (attrs.kernel_size[0] == 3)
@@ -211,6 +230,7 @@ def clml_pattern_table():
             and (attrs.channels == attrs.groups)
         ):
             return False
+
         data_typ = args[0].checked_type
         kernel_typ = args[1].checked_type
         is_depthwise = is_depthwise_conv2d(
@@ -246,6 +266,7 @@ def clml_pattern_table():
         return True
 
     return [
+        ("clml.pad_conv2d", pad_conv_pattern(), check_conv),
         ("clml.conv2d", conv_pattern(), check_conv),
         ("clml.dense", dense_pattern(), check_default_op),
         ("clml.pad", pad_pattern(), check_pad_op),
diff --git a/src/relay/backend/contrib/clml/codegen.cc b/src/relay/backend/contrib/clml/codegen.cc
index 9ecec0c453..167c48e1ba 100644
--- a/src/relay/backend/contrib/clml/codegen.cc
+++ b/src/relay/backend/contrib/clml/codegen.cc
@@ -83,7 +83,7 @@ class CLMLJSONSerializer : public backend::contrib::JSONSerializer {
     ICHECK(comp.defined()) << "CLML JSON runtime only supports composite functions.";
     const std::string name = comp.value();
     std::shared_ptr json_node;
-    if (name == "clml.conv2d") {
+    if (name == "clml.conv2d" || name == "clml.pad_conv2d") {
       json_node = CreateCompositeConvJSONNode(cn);
     } else if (name == "clml.batch_norm") {
       json_node = CreateBatchNormJSONNode(cn);
diff --git a/tests/python/contrib/test_clml/test_ops.py b/tests/python/contrib/test_clml/test_ops.py
index d2431d2dfd..da09715fbe 100644
--- a/tests/python/contrib/test_clml/test_ops.py
+++ b/tests/python/contrib/test_clml/test_ops.py
@@ -45,7 +45,7 @@ def _get_conv_model(
     a = relay.var(next(iter(var)), shape=shape, dtype=dtype)
     input_arr = var[next(iter(var))]
     if has_pad:
-        p = ((0, 0), (padding[0], padding[0]), (padding[1], padding[1]), (0, 0))
+        p = ((0, 0), (0, 0), (padding[0], padding[0]), (padding[1], padding[1]))
         a = relay.nn.pad(a, pad_width=p)
         padding = (0, 0, 0, 0)
     else:
@@ -97,7 +97,7 @@ def test_conv2d(device, dtype):
     trials = [
         # Normal c
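The test fix in this commit moves the spatial padding from axes 1 and 2 to axes 2 and 3, which matches the NCHW layout that check_conv requires. A minimal sketch of that axis mapping in plain Python follows; the helper name pad_width_for_layout is hypothetical and is not part of TVM, and the symmetric (pad_h, pad_w) convention is assumed from the test helper.

```python
def pad_width_for_layout(padding, layout="NCHW"):
    """Build a 4-D pad_width tuple that pads only the H and W axes.

    `padding` is (pad_h, pad_w), applied symmetrically on both sides.
    Illustrative helper only; this is not a TVM API.
    """
    if len(layout) != 4 or set(layout) != set("NCHW"):
        raise ValueError(f"unsupported layout: {layout!r}")
    per_axis = {
        "N": (0, 0),                    # never pad the batch axis
        "C": (0, 0),                    # never pad the channel axis
        "H": (padding[0], padding[0]),  # height: before/after
        "W": (padding[1], padding[1]),  # width: before/after
    }
    # Emit the per-axis pairs in the order the layout string names them.
    return tuple(per_axis[axis] for axis in layout)
```

With layout="NHWC" the helper yields the tuple the test used before the fix; with the default "NCHW" it yields the corrected one.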
[incubator-tvm] branch master updated: Don't add cast for TF batch norm when type isn't changing (#5731)
This is an automated email from the ASF dual-hosted git repository.

srk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-tvm.git

The following commit(s) were added to refs/heads/master by this push:
     new 2e1ef8e  Don't add cast for TF batch norm when type isn't changing (#5731)
2e1ef8e is described below

commit 2e1ef8e4b7e39bcd0ce68192c38800e2364e0984
Author: Trevor Morris
AuthorDate: Mon Jun 8 16:43:28 2020 -0700

    Don't add cast for TF batch norm when type isn't changing (#5731)
---
 python/tvm/relay/frontend/tensorflow.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/python/tvm/relay/frontend/tensorflow.py b/python/tvm/relay/frontend/tensorflow.py
index 201c6ba..50987f9 100644
--- a/python/tvm/relay/frontend/tensorflow.py
+++ b/python/tvm/relay/frontend/tensorflow.py
@@ -1227,7 +1227,7 @@ def _fused_batch_norm():
             attr['data_format'] = attr['data_format'].decode("utf-8")
             if attr['data_format'] == 'NCHW':
                 axis = 1
-        if 'U' in attr:
+        if 'U' in attr and attr['U'].name != attr['T'].name:
             need_cast = True
             inputs[0] = _op.cast(inputs[0], dtype=attr['U'].name)
         # Check if mean and variance are empty
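The one-line change in this commit skips the cast when the 'U' and 'T' attributes name the same dtype. A minimal sketch of that predicate follows, assuming a hypothetical DType stand-in for the TensorFlow attribute objects (only the .name field is modeled); needs_cast is an illustrative name, not a function in the frontend.

```python
from collections import namedtuple

# Hypothetical stand-in for a TensorFlow dtype attribute; only `.name` is modeled.
DType = namedtuple("DType", ["name"])


def needs_cast(attr):
    """Return True only when 'U' is present and names a different dtype than 'T'."""
    return "U" in attr and attr["U"].name != attr["T"].name
```

Under the old condition ('U' in attr alone), a float32 model with U == T would still receive a redundant cast; the extra comparison removes it.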
[incubator-tvm] branch master updated (de54754 -> 2ec7caa)
This is an automated email from the ASF dual-hosted git repository.

srk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-tvm.git.

    from de54754  Fix the values for test_fmod since it fails way too often otherwise (#5723)
     add 2ec7caa  fix small bug about dense_grad (#5695)

No new revisions were added by this update.

Summary of changes:
 python/tvm/relay/op/_tensor_grad.py       | 7 ---
 tests/python/relay/test_op_grad_level2.py | 1 +
 2 files changed, 5 insertions(+), 3 deletions(-)
[incubator-tvm] branch master updated (3d61dc8 -> 43dcbc6)
This is an automated email from the ASF dual-hosted git repository.

srk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-tvm.git.

    from 3d61dc8  [ONNX]ReduceL1, ReduceL2, ReduceSumSquare, ReduceLogSum ops added (#5721)
     add 43dcbc6  [TENSORFLOW]StatefulPartitionedCall/PartitionedCall Ops support added (#5617)

No new revisions were added by this update.

Summary of changes:
 python/tvm/relay/frontend/tensorflow.py          | 126 -
 tests/python/frontend/tensorflow/test_forward.py | 344 ++-
 2 files changed, 465 insertions(+), 5 deletions(-)
[incubator-tvm] branch master updated (030a163 -> 70017ef)
This is an automated email from the ASF dual-hosted git repository.

srk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-tvm.git.

    from 030a163  update_document_after_repository_renamed (#4398)
     add 70017ef  [Golang][Doc] improve the samples and doc (#4385)

No new revisions were added by this update.

Summary of changes:
 golang/README.md                   |  8 ++--
 golang/sample/Makefile             |  2 +-
 golang/sample/complex.go           |  2 +-
 golang/sample/gen_mobilenet_lib.py | 91 ++
 4 files changed, 98 insertions(+), 5 deletions(-)
 create mode 100644 golang/sample/gen_mobilenet_lib.py
[incubator-tvm] branch master updated (2baf310 -> a226973)
This is an automated email from the ASF dual-hosted git repository.

srk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-tvm.git.

    from 2baf310  [Relay][Frontend][Tensorflow]Add conv2d_transpose (#4300)
     add a226973  [Frontend]Add TensorFlow FloorMod (#4308)

No new revisions were added by this update.

Summary of changes:
 docs/api/python/topi.rst                         |  2 ++
 docs/frontend/tensorflow.rst                     |  1 +
 docs/langref/relay_op.rst                        |  2 ++
 python/tvm/relay/frontend/tensorflow.py          | 10 +--
 python/tvm/relay/op/_tensor.py                   |  4 +++
 python/tvm/relay/op/tensor.py                    | 36 ++
 src/relay/op/tensor/binary.cc                    | 12
 tests/python/frontend/tensorflow/test_forward.py | 26 ++--
 tests/python/relay/test_op_level1.py             |  4 ++-
 topi/include/topi/broadcast.h                    | 38
 topi/python/topi/broadcast.py                    | 38
 topi/src/topi.cc                                 |  2 ++
 topi/tests/python/test_topi_broadcast.py         | 14 +
 13 files changed, 183 insertions(+), 6 deletions(-)