(tvm) branch main updated: [CLML] Fix in clml pattern check condition (#16933)

2024-04-26 Thread srk
This is an automated email from the ASF dual-hosted git repository.

srk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


The following commit(s) were added to refs/heads/main by this push:
 new 1453893be0 [CLML] Fix in clml pattern check condition (#16933)
1453893be0 is described below

commit 1453893be08f34dbde2950a179028d11daf48936
Author: krishnaraj36 
AuthorDate: Sat Apr 27 11:06:31 2024 +0530

[CLML] Fix in clml pattern check condition (#16933)

* [CLML] Fix in clml pattern check condition

Added more check conditions to make the CLML path more robust.
1. Depth_to_space - the CLML path is only supported for mode="DCR" and
NCHW layout.
2. Default checks - CLML supports tensor dimensions less than 4D, with
batch size = 1.

* Update clml.py
---
 python/tvm/relay/op/contrib/clml.py| 118 +
 tests/python/contrib/test_clml/test_ops.py |  30 ++--
 2 files changed, 109 insertions(+), 39 deletions(-)

diff --git a/python/tvm/relay/op/contrib/clml.py b/python/tvm/relay/op/contrib/clml.py
index 53b022c347..22a7aae2b1 100644
--- a/python/tvm/relay/op/contrib/clml.py
+++ b/python/tvm/relay/op/contrib/clml.py
@@ -93,6 +93,7 @@ class OptimizeBatchnorm(ExprMutator):
 if (
 not isinstance(arg, (Var, Constant))
 and isinstance(arg, tvm.relay.TupleGetItem)
+and isinstance(arg.tuple_value.op, tvm.ir.op.Op)
 and arg.tuple_value.op.name == "nn.batch_norm"
 and (not isinstance(arg.tuple_value.args[0], (Var, Constant)))
 and arg.tuple_value.args[0].op.name == "nn.conv2d"
@@ -260,7 +261,8 @@ def clml_pattern_table():
 )
 )
 pattern = pattern.optional(is_op("nn.relu"))
-pattern = pattern.optional(is_op("clip"))
+# Fusion pattern to support with relu6 layer.
+pattern = pattern.optional(is_op("clip").has_attr({"a_min": 0.0, "a_max": 6.0}))
 return pattern
 
 def conv_transpose_pattern():
@@ -276,7 +278,8 @@ def clml_pattern_table():
 )
 )
 pattern = pattern.optional(is_op("nn.relu"))
-pattern = pattern.optional(is_op("clip"))
+# Fusion pattern to support with relu6 layer.
+pattern = pattern.optional(is_op("clip").has_attr({"a_min": 0.0, "a_max": 6.0}))
 return pattern
 
 def pad_conv_pattern():
@@ -293,7 +296,8 @@ def clml_pattern_table():
 )
 )
 pattern = pattern.optional(is_op("nn.relu"))
-pattern = pattern.optional(is_op("clip"))
+# Fusion pattern to support with relu6 layer.
+pattern = pattern.optional(is_op("clip").has_attr({"a_min": 0.0, "a_max": 6.0}))
 return pattern
 
 def batch_norm_pattern():
@@ -359,6 +363,9 @@ def clml_pattern_table():
 if attrs.data_layout != "NCHW":
 return False
 
+if call.checked_type.shape[0] > 1:
+return False
+
 if (
 (not clip_found)
 and (attrs.kernel_size[0] == 3)
@@ -411,19 +418,13 @@ def clml_pattern_table():
 # Scalars are not supported
 if len(call.args[1].checked_type.shape) == 0:
 return False
+if call.args[0] == call.args[1]:
+return False
 
 if tuple(call.args[0].checked_type.shape) != tuple(call.args[1].checked_type.shape):
 return False
 
-for arg in call.args:
-# Avoid any operators with dtype Int64
-if arg.checked_type.dtype == "int64":
-return False
-# No support for batch> 1
-if arg.checked_type.shape[0] > 1:
-return False
-
-return True
+return check_default_op(call)
 
 def check_pad_op(extract):
 call = extract
@@ -433,60 +434,117 @@ def clml_pattern_table():
 # Pad layers before any convolution are not guarenteed to be NCHW.
 if isinstance(call.args[0], tvm.relay.expr.Var):
 return False
-return True
+return check_default_op(call)
 
 def check_softmax_op(extract):
 call = extract
-# supports 2D and 4D tensors
+# supports 2D and 4D tensors.
 if len(call.args[0].checked_type.shape) not in [2, 4]:
 return False
-return True
+return check_default_op(call)
 
 def check_upsampling_op(extract):
 call = extract
 if call.attrs["method"] != "bilinear":
 return False
-return True
+return check_default_op(call)
 
 def check_concat_op(extract):
 call = extract
 if call.attrs["axis"] != 1:
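
The default checks named in the commit message for #16933 (no int64 operands, batch size of 1, bounded tensor rank) can be illustrated without TVM. The following is a minimal sketch, not the code in `clml.py`: `passes_default_checks`, the `(shape, dtype)` pairs and the `max_rank` parameter are hypothetical stand-ins for the `checked_type` fields, and the rank limit of 4 is an assumption, because the commit message does not say whether 4D itself is included.

```python
def passes_default_checks(args, max_rank=4):
    """Return True if every operand looks offloadable under the assumed rules.

    args: list of (shape, dtype) pairs, e.g. ((1, 3, 224, 224), "float32").
    max_rank: assumed upper bound on tensor rank (hypothetical parameter).
    """
    for shape, dtype in args:
        # The loop removed from the binary-op check rejected int64 operands.
        if dtype == "int64":
            return False
        # Reject tensors with more dimensions than the assumed limit.
        if len(shape) > max_rank:
            return False
        # Only batch size 1 is accepted; scalars have no batch axis to test.
        if len(shape) > 0 and shape[0] > 1:
            return False
    return True


print(passes_default_checks([((1, 3, 224, 224), "float32")]))
print(passes_default_checks([((2, 3, 224, 224), "float32")]))
print(passes_default_checks([((1, 10), "int64")]))
```

Under these assumptions only the first call is accepted; the second fails on batch size and the third on dtype.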
  

(tvm) branch main updated: [SCRIPT][ADRENO] Fix in build config for adreno (#16927)

2024-04-26 Thread srk

srk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


The following commit(s) were added to refs/heads/main by this push:
 new 5bd10472e9 [SCRIPT][ADRENO] Fix in build config for adreno (#16927)
5bd10472e9 is described below

commit 5bd10472e9a1b81a25e355824e84587a6988255c
Author: krishnaraj36 
AuthorDate: Fri Apr 26 15:06:10 2024 +0530

[SCRIPT][ADRENO] Fix in build config for adreno (#16927)

1. Enable the CXX environment setting for an empty TVM subgraph.
2. Enable CLML profiling and tuning in the RPC environment.
3. Enable OpenCL when building with CLML.
---
 tests/scripts/setup-adreno-env.sh | 3 ++-
 tests/scripts/task_build_adreno_bins.sh   | 3 +++
 tests/scripts/task_config_build_adreno.sh | 3 +--
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/tests/scripts/setup-adreno-env.sh b/tests/scripts/setup-adreno-env.sh
index 15c124a0f0..d2c776412e 100755
--- a/tests/scripts/setup-adreno-env.sh
+++ b/tests/scripts/setup-adreno-env.sh
@@ -80,6 +80,7 @@ function def_environment() {
 export RPC_DEVICE_KEY="android"
 export RPC_TARGET="adreno"
 export TVM_NDK_CC="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang"
+export CXX="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang"
 }
 
 def_environment
@@ -111,7 +112,7 @@ case ${ENVIRONMENT} in
 adb forward tcp:$((LISTEN_PORT + 1)) tcp:$((LISTEN_PORT + 1))
 adb forward tcp:$((LISTEN_PORT + 2)) tcp:$((LISTEN_PORT + 2))
 adb forward tcp:$((LISTEN_PORT + 3)) tcp:$((LISTEN_PORT + 3))
-adb shell "cd ${TARGET_FOLDER}; killall -9 tvm_rpc-${USER}; sleep 2; LD_LIBRARY_PATH=${TARGET_FOLDER}/ ./tvm_rpc-${USER} server --host=0.0.0.0 --port=${LISTEN_PORT} --port-end=$((LISTEN_PORT + 10)) --tracker=127.0.0.1:${TVM_TRACKER_PORT} --key=${RPC_DEVICE_KEY}"
+adb shell "cd ${TARGET_FOLDER}; killall -9 tvm_rpc-${USER}; sleep 2; export CLML_PROFILING=1; export CLML_IS_TUNING_RUN=1; export CLML_TUNING_CACHE=clml.bin; LD_LIBRARY_PATH=${TARGET_FOLDER}/ ./tvm_rpc-${USER} server --host=0.0.0.0 --port=${LISTEN_PORT} --port-end=$((LISTEN_PORT + 10)) --tracker=127.0.0.1:${TVM_TRACKER_PORT} --key=${RPC_DEVICE_KEY}"
 ;;
 
   "query")
diff --git a/tests/scripts/task_build_adreno_bins.sh b/tests/scripts/task_build_adreno_bins.sh
index 80ac461c4e..38eefd93a6 100755
--- a/tests/scripts/task_build_adreno_bins.sh
+++ b/tests/scripts/task_build_adreno_bins.sh
@@ -31,6 +31,9 @@ cp ../cmake/config.cmake .
 if [ -f "${ADRENO_OPENCL}/CL/cl_qcom_ml_ops.h" ] ; then
 echo set\(USE_CLML "${ADRENO_OPENCL}"\) >> config.cmake
 echo set\(USE_CLML_GRAPH_EXECUTOR "${ADRENO_OPENCL}"\) >> config.cmake
+fi
+if [ -f "${ADRENO_OPENCL}/CL/cl.h" ] ; then
+echo set\(USE_OPENCL "${ADRENO_OPENCL}"\) >> config.cmake
 else
 echo set\(USE_OPENCL ON\) >> config.cmake
 fi
diff --git a/tests/scripts/task_config_build_adreno.sh b/tests/scripts/task_config_build_adreno.sh
index afe6407cba..cf8917c9a5 100755
--- a/tests/scripts/task_config_build_adreno.sh
+++ b/tests/scripts/task_config_build_adreno.sh
@@ -26,9 +26,8 @@ cp ../cmake/config.cmake .
 echo set\(USE_OPENCL_GTEST /googletest\) >> config.cmake
 if [ -f "${ADRENO_OPENCL}/CL/cl_qcom_ml_ops.h" ] ; then
 echo set\(USE_CLML ${ADRENO_OPENCL}\) >> config.cmake
-else
-echo set\(USE_OPENCL ON\) >> config.cmake
 fi
+echo set\(USE_OPENCL ON\) >> config.cmake
 echo set\(USE_RPC ON\) >> config.cmake
 echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
 echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake



(tvm) branch main updated: [RUNTIME][CLML] Fix for Softmax op for 4D tensors (#16328)

2024-01-17 Thread srk

srk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


The following commit(s) were added to refs/heads/main by this push:
 new a5e883e846 [RUNTIME][CLML] Fix for Softmax op for 4D tensors (#16328)
a5e883e846 is described below

commit a5e883e8465e11221d3f22d6ef2f61a1bfa5d1f2
Author: krishnaraj36 
AuthorDate: Thu Jan 18 12:38:57 2024 +0530

[RUNTIME][CLML] Fix for Softmax op for 4D tensors (#16328)

Fixed the softmax layer for 4D tensors to support the NCHW and NHWC
layout types.
Enabled the relevant test cases for the softmax layer.
---
 python/tvm/relay/op/contrib/clml.py|  3 +-
 src/runtime/contrib/clml/clml_runtime.cc   | 62 -
 tests/python/contrib/test_clml/test_ops.py | 86 --
 3 files changed, 98 insertions(+), 53 deletions(-)

diff --git a/python/tvm/relay/op/contrib/clml.py b/python/tvm/relay/op/contrib/clml.py
index 14dd35a3cb..53b022c347 100644
--- a/python/tvm/relay/op/contrib/clml.py
+++ b/python/tvm/relay/op/contrib/clml.py
@@ -437,7 +437,8 @@ def clml_pattern_table():
 
 def check_softmax_op(extract):
 call = extract
-if len(call.args[0].checked_type.shape) > 2:
+# supports 2D and 4D tensors
+if len(call.args[0].checked_type.shape) not in [2, 4]:
 return False
 return True
 
diff --git a/src/runtime/contrib/clml/clml_runtime.cc b/src/runtime/contrib/clml/clml_runtime.cc
index aa1e2b82b6..8e69cb8bd1 100644
--- a/src/runtime/contrib/clml/clml_runtime.cc
+++ b/src/runtime/contrib/clml/clml_runtime.cc
@@ -511,6 +511,7 @@ class CLMLRuntime : public JSONRuntimeBase {
 
   /*!
* \brief Create an CLML tensor from JSON node entry. Lookup storage map before creation.
+   * Update input placeholder for NHWC layout
*
* \param nid The node index of graph JSON.
* \param shape shape information of tensor
@@ -528,15 +529,22 @@ class CLMLRuntime : public JSONRuntimeBase {
 uint32_t eid = EntryID(nid, 0);
 node_data = data_entry_[eid]->data;
   }
+
   auto clml_tensor = MakeCLMLTensorFromJSONNode(node, layout, dtype, node_data, shape);
+
   this->layer_.storage_map.insert({nid, std::make_pair(clml_tensor, node)});
 
   if ("input" == node.GetOpType()) {
 this->layer_.inputs.insert({nid, this->layer_.storage_map[nid].first});
 // Input copy placeholder Tensor
-this->layer_.in_placeholder.insert(
-{nid, MakeCLMLTensorFromJSONNode(node, CL_TENSOR_LAYOUT_NCHW_QCOM, dtype, node_data,
- shape)});
+if (layout == CL_TENSOR_LAYOUT_OPTIMAL_QCOM) {
+  this->layer_.in_placeholder.insert(
+  {nid, MakeCLMLTensorFromJSONNode(node, CL_TENSOR_LAYOUT_NCHW_QCOM, dtype, node_data,
+   shape)});
+} else {
+  this->layer_.in_placeholder.insert(
+  {nid, MakeCLMLTensorFromJSONNode(node, layout, dtype, node_data, shape)});
+}
   }
 
   return clml_tensor;
@@ -559,6 +567,7 @@ class CLMLRuntime : public JSONRuntimeBase {
   const auto& node = nodes_[nid];
   if ("nn.dense" == node.GetOpName()) CreateDenseLayerTensor(&layer_, node, nid);
   if ("nn.batch_matmul" == node.GetOpName()) CreateBatchMatmulLayerTensor(&layer_, node, nid);
+  if ("nn.softmax" == node.GetOpName()) CreateSoftmaxLayerTensor(&layer_, node, nid);
 }
 
 for (nid = 0; nid < nodes_.size(); ++nid) {
@@ -1092,6 +1101,37 @@ class CLMLRuntime : public JSONRuntimeBase {
 return;
   }
 
+  /*!
+   * \brief Create a Softmax layer Tensors with supported layout.
+   * \param layer The CLML layer to build. Containing inputs, outputs and the CLML function.
+   * \param node The JSON representation of the operator.
+   * \param nid The node index of JSON graph node, which points to this operator.
+   */
+
+  void CreateSoftmaxLayerTensor(CachedLayer* layer, const JSONGraphNode& node, size_t nid) {
+cl_ml_tensor_layout_qcom layout;
+cl_int result = 0;
+cl_ml_op_qcom op = nullptr;
+DLDataType tvm_dtype = node.GetOpDataType()[0];
+cl_channel_type cl_dtype = MakeCLDataType(tvm_dtype);
+auto out_dims = GetTensorDims(nodes_[node.GetInputs()[0].id_]);
+int axis = std::stoi(node.GetAttr<std::vector<std::string>>("axis")[0]);
+// enabling  NHWC layout && NCHW layout for 4D,  basis the axis value
+if (out_dims.h >= 1 && out_dims.w >= 1) {
+  if (axis == 3 || axis == -1) {
+layout = CL_TENSOR_LAYOUT_NHWC_QCOM;
+  } else {
+layout = CL_TENSOR_LAYOUT_NCHW_QCOM;
+  }
+} else {  // default layout for 2D
+  layout = CL_TENSOR_LAYOUT_OPTIMAL_QCOM;
+}
+auto output = MakeCLMLTensorFromJSONEntry(nid, {}, layout, cl_dtype)
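
The layout choice in `CreateSoftmaxLayerTensor` above depends only on whether the tensor has spatial dimensions and on the softmax axis. A minimal Python sketch of the same decision follows; `choose_softmax_layout` and the string return values are hypothetical names used for illustration only, standing in for the `CL_TENSOR_LAYOUT_*_QCOM` constants.

```python
def choose_softmax_layout(h, w, axis):
    """Mirror the branch structure of the diff for the softmax tensor layout."""
    if h >= 1 and w >= 1:
        # Softmax over the last axis of a 4D tensor is treated as NHWC.
        if axis in (3, -1):
            return "NHWC"
        # Any other axis on a 4D tensor falls back to NCHW.
        return "NCHW"
    # Tensors without spatial dimensions (the 2D case) keep the optimal layout.
    return "OPTIMAL"


print(choose_softmax_layout(h=7, w=7, axis=-1))
print(choose_softmax_layout(h=7, w=7, axis=1))
print(choose_softmax_layout(h=0, w=0, axis=1))
```

The sketch assumes, as the diff's comment suggests, that a 2D tensor reports `h` and `w` below 1.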

(tvm) branch main updated: [RUNTIME][CLML] Fix for CLML ops and enable more test case (#15896)

2023-12-20 Thread srk

srk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


The following commit(s) were added to refs/heads/main by this push:
 new 3a57a40c1b [RUNTIME][CLML] Fix for CLML ops and enable more test case (#15896)
3a57a40c1b is described below

commit 3a57a40c1ba40e1c330346905f8db72775fc9992
Author: krishnaraj36 
AuthorDate: Wed Dec 20 13:50:00 2023 +0530

[RUNTIME][CLML] Fix for CLML ops and enable more test case (#15896)

* [RUNTIME][CLML] Fix for few clml ops

Fixed the dense operator and enhanced the CLML network test case.

* [RUNTIME][CLML] Fix for dense layer and float16

Fixed the dense layer issue at the network level and improved
coverage of the dense layer with CLML.
Fixed the float16 crash.

* Update comment for dense pattern

* fix in clml test cases

* Enable more test cases and few fixes

* Fix the import error

* Fix the import error

* Fix in batchnorm testcase

* Restructure clml test case and enable vm executor

* Fix the import error in clml test network

* Fix the test failure for vm tests

* Update clml.py
---
 python/tvm/relay/op/contrib/clml.py  | 118 ++-
 src/relay/backend/contrib/clml/codegen.cc|   2 +-
 src/runtime/contrib/clml/clml_runtime.cc | 521 -
 tests/python/contrib/test_clml/conftest.py   |  21 +-
 tests/python/contrib/test_clml/infrastructure.py | 242 +++---
 tests/python/contrib/test_clml/test_network.py   | 249 +++---
 tests/python/contrib/test_clml/test_ops.py   | 942 +--
 tests/scripts/task_python_adreno.sh  |   1 +
 8 files changed, 1332 insertions(+), 764 deletions(-)

diff --git a/python/tvm/relay/op/contrib/clml.py b/python/tvm/relay/op/contrib/clml.py
index f194dd114b..14dd35a3cb 100644
--- a/python/tvm/relay/op/contrib/clml.py
+++ b/python/tvm/relay/op/contrib/clml.py
@@ -18,6 +18,7 @@
 """CLML Library supported operators."""
 import json
 from string import Template
+import numpy as np
 import tvm
 
 from tvm import relay
@@ -27,7 +28,7 @@ from tvm.relay import transform
 from tvm.relay.build_module import bind_params_by_name
 from tvm.relay import function as _function
 from tvm.relay.expr_functor import ExprMutator
-from tvm.relay.expr import Call, TupleGetItem
+from tvm.relay.expr import Call, TupleGetItem, Var, Constant
 
 from ...dataflow_pattern import wildcard, is_op, is_constant, is_tuple_get_item, is_tuple
 from .register import register_pattern_table
@@ -81,34 +82,61 @@ class RemoveDropoutPass:
 return RemoveDropout().visit(func)
 
 
-class BroadcastInputs(ExprMutator):
+class OptimizeBatchnorm(ExprMutator):
 """
-Binary operators need broadcasting for CLML.
+Fuse Conv+Batchnorm and constant folder to generate Conv+Add.
 """
 
-def visit_call(self, call):
-if call.op.name in ["add", "subtract", "multiply", "divide", "maximum", "minimum"]:
-new_fn = self.visit(call.op)
-call_shape = call.checked_type.shape
-lhs = call.args[0]
-rhs = call.args[1]
-lhs_shape = lhs.checked_type.shape
-rhs_shape = rhs.checked_type.shape
-if list(call_shape) != list(lhs_shape):
-lhs = relay.broadcast_to(self.visit(lhs), call_shape)
-if list(call_shape) != list(rhs_shape):
-rhs = relay.broadcast_to(self.visit(rhs), call_shape)
-args = [lhs, rhs]
-return Call(new_fn, args, call.attrs)
-return super().visit_call(call)
+def visit_call(self, call) -> relay.expr.Expr:
+new_args = []
+for arg in call.args:
+if (
+not isinstance(arg, (Var, Constant))
+and isinstance(arg, tvm.relay.TupleGetItem)
+and arg.tuple_value.op.name == "nn.batch_norm"
+and (not isinstance(arg.tuple_value.args[0], (Var, Constant)))
+and arg.tuple_value.args[0].op.name == "nn.conv2d"
+):
+ep = arg.tuple_value.attrs["epsilon"]
+wt = arg.tuple_value.args[1].data.numpy()
+bs = arg.tuple_value.args[2].data.numpy()
+mn = arg.tuple_value.args[3].data.numpy()
+vr = arg.tuple_value.args[4].data.numpy() + ep
+dino = np.sqrt(vr)
+wt = wt / dino
+bs = bs - mn * wt
+conv_op = arg.tuple_value.args[0]
+conv_args = list(conv_op.args)
+wt_conv = conv_args[1].data.numpy()
+if conv_op.attrs["kernel_layout"] == "OIHW":
+wt = wt.r
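
The arithmetic in `OptimizeBatchnorm.visit_call` above folds the batch-norm statistics into a per-channel scale and bias. A minimal sketch for one output channel follows, using plain Python floats instead of NumPy arrays. `fold_batch_norm`, `batch_norm` and the sample values are hypothetical, and the step of multiplying the scale into the convolution weights is an assumption, since the diff is cut off before that step is shown.

```python
import math


def fold_batch_norm(gamma, beta, mean, var, eps):
    """Return (scale, bias) such that scale * y + bias == batch_norm(y)."""
    scale = gamma / math.sqrt(var + eps)  # wt = wt / sqrt(vr + ep)
    bias = beta - mean * scale            # bs = bs - mn * wt
    return scale, bias


def batch_norm(y, gamma, beta, mean, var, eps):
    """Reference inference-mode batch norm for a single value."""
    return gamma * (y - mean) / math.sqrt(var + eps) + beta


# Hypothetical per-channel statistics and a 1x1 convolution with two inputs.
gamma, beta, mean, var, eps = 1.5, 0.2, 0.4, 2.0, 1e-5
weights, inputs = [0.3, -0.7], [1.0, 2.0]

conv_out = sum(w * x for w, x in zip(weights, inputs))
expected = batch_norm(conv_out, gamma, beta, mean, var, eps)

# Folded form: scale the weights, then add the bias (the "Conv+Add" result).
scale, bias = fold_batch_norm(gamma, beta, mean, var, eps)
folded = sum(w * scale * x for w, x in zip(weights, inputs)) + bias

print(abs(expected - folded) < 1e-9)
```

The two results agree up to floating-point rounding, which is why the batch-norm node can be replaced by a convolution with rescaled weights followed by an add.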

[tvm] branch main updated: [CI][ADRENO] Few updates to Adreno docker setup (#15897)

2023-10-10 Thread srk

srk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


The following commit(s) were added to refs/heads/main by this push:
 new a79f632333 [CI][ADRENO] Few updates to Adreno docker setup (#15897)
a79f632333 is described below

commit a79f632333d0f319f938ad58575bbd5ea85bd0d3
Author: Siva 
AuthorDate: Tue Oct 10 13:51:02 2023 +0530

[CI][ADRENO] Few updates to Adreno docker setup (#15897)

Enabling google tests and clang-format version update.
---
 apps/cpp_clml/scripts/clml_codegen.py | 2 +-
 docker/Dockerfile.ci_adreno   | 8 ++--
 tests/scripts/task_config_build_adreno.sh | 1 +
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/apps/cpp_clml/scripts/clml_codegen.py b/apps/cpp_clml/scripts/clml_codegen.py
index bf19c0e4b9..7540812ed5 100644
--- a/apps/cpp_clml/scripts/clml_codegen.py
+++ b/apps/cpp_clml/scripts/clml_codegen.py
@@ -57,7 +57,7 @@ def main():
 f_src = open("../clml_models.cc", "w")
 f_src.write("\n".join(gen_src))
 f_src.close()
-os.popen("clang-format-10 -i ../clml_models.cc")
+os.popen("clang-format-15 -i ../clml_models.cc")
 
 
 if __name__ == "__main__":
diff --git a/docker/Dockerfile.ci_adreno b/docker/Dockerfile.ci_adreno
index 11be0a8baa..961977c542 100644
--- a/docker/Dockerfile.ci_adreno
+++ b/docker/Dockerfile.ci_adreno
@@ -16,7 +16,7 @@
 # under the License.
 
 # CI docker GPU env
-FROM tlcpack/ci-gpu:20220908-060034-62bdc91b1
+FROM tlcpack/ci-gpu
 
 COPY utils/apt-install-and-clear.sh /usr/local/bin/apt-install-and-clear
 
@@ -26,4 +26,8 @@ RUN bash /install/ubuntu_install_androidsdk.sh 25.2.9519653 3.22.1 33.0.2 33
 ENV PATH /opt/android-sdk-linux/platform-tools:$PATH
 
 # Clang tool for CLML source codegen
-RUN apt-get update && apt-install-and-clear -y clang-format-10
+RUN apt-get update && apt-install-and-clear -y clang-format-15
+
+#Google Test
+COPY install/ubuntu_install_googletest.sh /install/ubuntu_install_googletest.sh
+RUN bash install/ubuntu_install_googletest.sh
diff --git a/tests/scripts/task_config_build_adreno.sh b/tests/scripts/task_config_build_adreno.sh
index 1b6750f165..afe6407cba 100755
--- a/tests/scripts/task_config_build_adreno.sh
+++ b/tests/scripts/task_config_build_adreno.sh
@@ -23,6 +23,7 @@ mkdir -p "$BUILD_DIR"
 cd "$BUILD_DIR"
 cp ../cmake/config.cmake .
 
+echo set\(USE_OPENCL_GTEST /googletest\) >> config.cmake
 if [ -f "${ADRENO_OPENCL}/CL/cl_qcom_ml_ops.h" ] ; then
 echo set\(USE_CLML ${ADRENO_OPENCL}\) >> config.cmake
 else



[tvm] branch main updated: [CLI TOOLS][RTVM] Improve rtvm tool with new options to measure native performance (#15818)

2023-09-28 Thread srk

srk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


The following commit(s) were added to refs/heads/main by this push:
 new def551dfd5 [CLI TOOLS][RTVM] Improve rtvm tool with new options to measure native performance (#15818)
def551dfd5 is described below

commit def551dfd50bfff4e9d50108dc4e8027b553b8ec
Author: Siva 
AuthorDate: Fri Sep 29 10:30:20 2023 +0530

[CLI TOOLS][RTVM] Improve rtvm tool with new options to measure native performance (#15818)

* [RTVM] Improve rtvm tool with new options to measure native performance

A few fixes and enhancements that affect model loading times.
New options to measure performance.

* * review comments

* * review comments
---
 apps/cpp_rtvm/README.md |  22 +
 apps/cpp_rtvm/main.cc   | 199 ++--
 apps/cpp_rtvm/tvm_runner.cc | 129 +---
 apps/cpp_rtvm/tvm_runner.h  |  24 +-
 4 files changed, 316 insertions(+), 58 deletions(-)

diff --git a/apps/cpp_rtvm/README.md b/apps/cpp_rtvm/README.md
index c60a7b0e12..652d46eb58 100644
--- a/apps/cpp_rtvm/README.md
+++ b/apps/cpp_rtvm/README.md
@@ -122,6 +122,11 @@ Command line usage
 --input- Numpy file for the model input (optional and we use random of not given)
 --output   - Numpy file name to dump the model output as numpy
 --dump-meta- Dump model meta information
+--pre-compiled - The file name of a file where pre-compiled programs should be stored
+--profile  - Profile over all execution
+--dry-run  - Profile after given dry runs, default 10
+--run-count- Profile for given runs, default 50
+--zero-copy- Profile with zero copy api
 
   Example
   ./rtvm --model=keras-resnet50 --device="opencl" --dump-meta
@@ -366,3 +371,20 @@ stored. If the pre-compiled file name was passed to the `rtvm` then After method
 `Load`, method `UsePreCompiledProgram` is called. This method loads pre-compiled
 programs if the file exists. In opposite case the file will be created and
 pre-compiled programs will be saved to this file.
+
+# Performnace Profiling Options
+The tool has added few options to measure wall clock performance of the given model on Target natively.
+--profile : Can turn on the profiling
+--dry-run : The number of times dry run the model before mearuring the performance. Default value os 10
+--run-count : The number times to run the model and take an average. Default value is 50.
+--zero-copy: This option enables graph runtime zero copy to be used for input and output than byte copy to DLTensor.
+
+Performance profile options dumps information summary as given below.
+ Module Load  :27 ms
+ Graph Runtime Create :11 ms
+ Params Read  :15 ms
+ Params Set   :41 ms
+ Pre Compiled Progs Load  :24 ms
+Total Load Time :118 ms
+Average ExecTime:27 ms
+Unload Time :35.9236 ms
diff --git a/apps/cpp_rtvm/main.cc b/apps/cpp_rtvm/main.cc
index c38a5f62bd..dc3cf1c414 100644
--- a/apps/cpp_rtvm/main.cc
+++ b/apps/cpp_rtvm/main.cc
@@ -29,6 +29,7 @@
 #endif
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -54,7 +55,11 @@ static const string kUsage =
 "--input- Numpy file for the model input (optional and we use random of not given)\n"
 "--output   - Numpy file name to dump the model output as numpy\n"
 "--dump-meta- Dump model meta information\n"
-"--pre-compiled - The file name of a file where pre-compiled programs should be stored"
+"--pre-compiled - The file name of a file where pre-compiled programs should be stored\n"
+"--profile  - Profile over all execution\n"
+"--dry-run  - Profile after given dry runs, default 10\n"
+"--run-count- Profile for given runs, default 50\n"
+"--zero-copy- Profile with zero copy api\n"
 "\n"
 "  Example\n"
 "  ./rtvm --model=keras-resnet50 --device=\"opencl\" --dump-meta\n"
@@ -68,6 +73,7 @@ static const string kUsage =
  * \arg input Numpy file for the model input
  * \arg output Numpy file name to dump the model output as numpy
  * \arg pre_compiled File name where pre-compiled programs should be stored
+ * \arg profile Do we profile overall execution
  */
 struct ToolArgs {
   string model;
@@ -75,7 +81,11 @@ struct ToolArgs {
   string input;
   string output;
   string pre_compiled;
-  bool dump_meta = false;
+  bool dump_meta{false};
+  bool profile{false};
+  int dry_run{10};
+  int run_count{50};
+  bool zero_copy{false};
 };
 
 /*!
@@ -89,6 +99,10 @@ void PrintArgs(const ToolArgs& args) {
   LOG(INFO) << "Output= " << args.output;
   LOG(INFO) << "Pre-compiled  = " <&l
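
The `--dry-run` and `--run-count` options describe a common measurement pattern: run the model a few times untimed, then average the wall-clock time over a fixed number of runs. A minimal Python sketch of that pattern follows. It is not the `rtvm` implementation; `profile`, its parameters and the stand-in workload are hypothetical, with the defaults (10 and 50) taken from the usage text.

```python
import time


def profile(run_once, dry_run=10, run_count=50):
    """Return the average wall-clock time of run_once in milliseconds."""
    # Warm-up runs are executed but not timed.
    for _ in range(dry_run):
        run_once()
    start = time.perf_counter()
    for _ in range(run_count):
        run_once()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / run_count


# Stand-in workload that only records that it was called.
calls = []
avg_ms = profile(lambda: calls.append(1), dry_run=3, run_count=5)
print(len(calls), avg_ms >= 0.0)
```

Excluding the warm-up runs keeps one-time costs, such as kernel compilation on the first execution, out of the reported average.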

[tvm] branch main updated: [FRONTEND] Fix unnecessary pylint errors (#15838)

2023-09-28 Thread srk

srk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


The following commit(s) were added to refs/heads/main by this push:
 new 8b40f5d028 [FRONTEND] Fix unnecessary pylint errors (#15838)
8b40f5d028 is described below

commit 8b40f5d028632da82bd6cbf83865041d4186b068
Author: Siva 
AuthorDate: Fri Sep 29 10:29:00 2023 +0530

[FRONTEND] Fix unnecessary pylint errors (#15838)

Handle unnecessary pylint errors from these frontends
---
 tests/python/frontend/keras/test_forward.py   | 2 +-
 tests/python/frontend/oneflow/test_forward.py | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tests/python/frontend/keras/test_forward.py b/tests/python/frontend/keras/test_forward.py
index 9d33b15a91..ba3880e186 100644
--- a/tests/python/frontend/keras/test_forward.py
+++ b/tests/python/frontend/keras/test_forward.py
@@ -28,11 +28,11 @@ from tensorflow import keras as tf_keras
 # prevent Keras from using up all gpu memory
 import keras
 
+import pytest
 import tvm
 from tvm import relay
 from tvm.contrib import graph_executor
 import tvm.testing
-import pytest
 
 if tf.executing_eagerly():
 GPUS = tf.config.experimental.list_physical_devices("GPU")
diff --git a/tests/python/frontend/oneflow/test_forward.py b/tests/python/frontend/oneflow/test_forward.py
index 7ddc347e86..fda5f1b723 100644
--- a/tests/python/frontend/oneflow/test_forward.py
+++ b/tests/python/frontend/oneflow/test_forward.py
@@ -20,11 +20,11 @@ import os
 
 import numpy as np
 import oneflow as flow
+from packaging import version as package_version
 import tvm
 import tvm.testing
 import tvm.topi.testing
 from tvm import relay
-from packaging import version as package_version
 
 MODEL_HOME = "test_model"
 



[tvm] branch main updated: [OpenCL] Implement save/load pre-compiled programs (#13868)

2023-02-02 Thread srk

srk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


The following commit(s) were added to refs/heads/main by this push:
 new 099ed94951 [OpenCL] Implement save/load pre-compiled programs (#13868)
099ed94951 is described below

commit 099ed949519f3b6ae182c31ce69496f18a1f60ad
Author: Egor Churaev 
AuthorDate: Fri Feb 3 05:28:35 2023 +0300

[OpenCL] Implement save/load pre-compiled programs (#13868)

* [OpenCL] Implement save/load pre-compiled programs

Using pre-compiled programs might significantly improve inference time
of the first run.

- Added methods `SupportPreCompiledPrograms` which reports if the module
  supports using pre-compiled programs.
- Method `GetPreCompiledPrograms` returns string with bytes of
  pre-compiled programs.
- Method `SetPreCompiledPrograms` allows user to pass pre-compiled
  programs to the module.

* Fix lint

* Apply comment: PackedFunc is used

* Fix build

* Fix CI and rename functions

* Apply comments
---
 apps/cpp_rtvm/README.md|  14 ++
 apps/cpp_rtvm/main.cc  |   9 +
 apps/cpp_rtvm/tvm_runner.cc|  29 ++-
 apps/cpp_rtvm/tvm_runner.h |   4 +
 src/runtime/opencl/opencl_common.h |   2 +
 src/runtime/opencl/opencl_device_api.cc|   4 +-
 src/runtime/opencl/opencl_module.cc|  77 
 .../opencl/opencl_wrapper/opencl_wrapper.cc|  12 ++
 tests/cpp-runtime/opencl/opencl_compile_to_bin.cc  | 208 +
 9 files changed, 356 insertions(+), 3 deletions(-)

diff --git a/apps/cpp_rtvm/README.md b/apps/cpp_rtvm/README.md
index e696153282..c60a7b0e12 100644
--- a/apps/cpp_rtvm/README.md
+++ b/apps/cpp_rtvm/README.md
@@ -352,3 +352,17 @@ python3 -m tvm.driver.tvmc compile --cross-compiler ${ANDROID_NDK_HOME}/toolchai
 python3 -m tvm.driver.tvmc run --device="cl" keras-resnet50.tar --rpc-key ${TVM_RPC_KEY} --rpc-tracker {TVM_TRACKER_HOST}:{TVM_TRACKER_PORT} --print-time
 
 ```
+
+# Use pre-compiled OpenCL kernels
+Using pre-compiled programs might significantly improve inference time of the
+first run. E.g. for topology with ~300 kernels compilation time on Adreno was
+about 26 seconds. But after dumping compiled programs to binary files and reuse
+them on the next runs, the compilation time was significantly decreased (more
+than 1000 times) and starts to be around 25 ms.
+
+To use such functionality, the developer have to pass parameter `--pre-compiled`
+to the `rtvm` and specify the file name where pre-compiled programs will be
+stored. If the pre-compiled file name was passed to the `rtvm` then After method
+`Load`, method `UsePreCompiledProgram` is called. This method loads pre-compiled
+programs if the file exists. In opposite case the file will be created and
+pre-compiled programs will be saved to this file.
diff --git a/apps/cpp_rtvm/main.cc b/apps/cpp_rtvm/main.cc
index 31019ee0c9..c38a5f62bd 100644
--- a/apps/cpp_rtvm/main.cc
+++ b/apps/cpp_rtvm/main.cc
@@ -54,6 +54,7 @@ static const string kUsage =
 "--input- Numpy file for the model input (optional and we use random of not given)\n"
 "--output   - Numpy file name to dump the model output as numpy\n"
 "--dump-meta- Dump model meta information\n"
+"--pre-compiled - The file name of a file where pre-compiled programs should be stored"
 "\n"
 "  Example\n"
 "  ./rtvm --model=keras-resnet50 --device=\"opencl\" --dump-meta\n"
@@ -66,12 +67,14 @@ static const string kUsage =
  * \arg device The target device to use {llvm, cl, ...etc.}
  * \arg input Numpy file for the model input
  * \arg output Numpy file name to dump the model output as numpy
+ * \arg pre_compiled File name where pre-compiled programs should be stored
  */
 struct ToolArgs {
   string model;
   string device;
   string input;
   string output;
+  string pre_compiled;
   bool dump_meta = false;
 };
 
@@ -84,6 +87,7 @@ void PrintArgs(const ToolArgs& args) {
   LOG(INFO) << "Device= " << args.device;
   LOG(INFO) << "Input = " << args.input;
   LOG(INFO) << "Output= " << args.output;
+  LOG(INFO) << "Pre-compiled  = " << args.pre_compiled;
   LOG(INFO) << "Dump Metadata = " << ((args.dump_meta) ? ("True") : ("False"));
 }
 
@@ -172,6 +176,8 @@ void ParseCmdArgs(int argc, char* argv[], struct ToolArgs& args) {
   if (!pmeta.empty()) {
 args.dump_meta = true;
   }
+
+  args.pre_compiled = GetCmdOption(argc, argv, "--pre-compiled=");
 }
 
 /*!
@@ -190,6 +196,9 @@ int ExecuteMo
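
The load-or-create behaviour described in the README text above (load the pre-compiled programs if the file exists, otherwise build them and write the file) can be sketched as follows. This is an illustration in Python, not the C++ `UsePreCompiledProgram` code; `use_precompiled_programs`, `compile_programs` and the byte payloads are hypothetical.

```python
import os
import tempfile


def use_precompiled_programs(path, compile_programs):
    """Return (program bytes, loaded_from_cache) for the given cache file."""
    if os.path.exists(path):
        # Cache hit: reuse the binaries saved by an earlier run.
        with open(path, "rb") as f:
            return f.read(), True
    # Cache miss: compile now and save the result for the next run.
    blob = compile_programs()
    with open(path, "wb") as f:
        f.write(blob)
    return blob, False


with tempfile.TemporaryDirectory() as tmp:
    cache = os.path.join(tmp, "programs.bin")
    first = use_precompiled_programs(cache, lambda: b"compiled-kernels")
    second = use_precompiled_programs(cache, lambda: b"should-not-run")
    print(first, second)
```

The first call compiles and writes the file; the second call finds the file and returns its contents without invoking the compile step, which is where the first-run saving reported in the commit message comes from.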

[tvm] branch main updated (18b7dc1dd9 -> 56771a87d1)

2023-01-27 Thread srk

srk pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


from 18b7dc1dd9 [MetaSchedule] Fix for RewriteLayout + AllocateConst when the rank of the rewritten weight doesn't change (#13851)
 add 56771a87d1 [CLML][RUNTIME] Enable more ops in CLML runtime (#13834)

No new revisions were added by this update.

Summary of changes:
 python/tvm/relay/op/contrib/clml.py|  16 -
 src/runtime/contrib/clml/clml_runtime.cc   |  67 ++-
 tests/python/contrib/test_clml/test_ops.py | 102 +
 3 files changed, 183 insertions(+), 2 deletions(-)



[tvm] branch main updated: [CLML][RELAY] Enable Pad and Conv2d layer fusion (#13649)

2022-12-27 Thread srk

srk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


The following commit(s) were added to refs/heads/main by this push:
 new ece99a243b [CLML][RELAY] Enable Pad and Conv2d layer fusion (#13649)
ece99a243b is described below

commit ece99a243beab1fe879d78868367731d5a516a83
Author: krishnaraj36 <45380557+krishnara...@users.noreply.github.com>
AuthorDate: Wed Dec 28 11:24:11 2022 +0530

[CLML][RELAY] Enable Pad and Conv2d layer fusion (#13649)

* [CLML][RELAY] Enable Pad and Conv2d layer fusion

Enabled clml supported nn.pad+nn.conv2d fusion pattern in clml pattern table

* Fix pad testcase attributes

* Fix the lint error

* Fix the lint error

* Removed redundant check in clml pattern

* Fix the lint error

Co-authored-by: kvegiraj 
---
 python/tvm/relay/op/contrib/clml.py| 21 +
 src/relay/backend/contrib/clml/codegen.cc  |  2 +-
 tests/python/contrib/test_clml/test_ops.py |  4 ++--
 3 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/python/tvm/relay/op/contrib/clml.py b/python/tvm/relay/op/contrib/clml.py
index c3d4eb8470..6453b8a06c 100644
--- a/python/tvm/relay/op/contrib/clml.py
+++ b/python/tvm/relay/op/contrib/clml.py
@@ -147,6 +147,23 @@ def clml_pattern_table():
 pattern = pattern.optional(is_op("clip"))
 return pattern
 
+def pad_conv_pattern():
+"""Create a pad with convolution pattern."""
+pattern = is_op("nn.pad")(wildcard(), is_constant())
+pattern = is_op("nn.conv2d")(pattern, is_constant())
+pattern = pattern.optional(lambda x: is_op("nn.bias_add")(x, is_constant()))
+pattern = pattern.optional(lambda x: is_op("add")(x, is_constant()))
+pattern = pattern.optional(
+lambda x: is_tuple_get_item(
+is_op("nn.batch_norm")(
+x, is_constant(), is_constant(), is_constant(), is_constant()
+)
+)
+)
+pattern = pattern.optional(is_op("nn.relu"))
+pattern = pattern.optional(is_op("clip"))
+return pattern
+
 def batch_norm_pattern():
 """Create a batch norm pattern."""
 pattern = is_op("nn.batch_norm")(
@@ -200,9 +217,11 @@ def clml_pattern_table():
 
 while call.op.name != "nn.conv2d":
 call = call.args[0]
+
 attrs, args = call.attrs, call.args
 if attrs.data_layout != "NCHW":
 return False
+
 if (
 (not clip_found)
 and (attrs.kernel_size[0] == 3)
@@ -211,6 +230,7 @@ def clml_pattern_table():
 and (attrs.channels == attrs.groups)
 ):
 return False
+
 data_typ = args[0].checked_type
 kernel_typ = args[1].checked_type
 is_depthwise = is_depthwise_conv2d(
@@ -246,6 +266,7 @@ def clml_pattern_table():
 return True
 
 return [
+("clml.pad_conv2d", pad_conv_pattern(), check_conv),
 ("clml.conv2d", conv_pattern(), check_conv),
 ("clml.dense", dense_pattern(), check_default_op),
 ("clml.pad", pad_pattern(), check_pad_op),
diff --git a/src/relay/backend/contrib/clml/codegen.cc b/src/relay/backend/contrib/clml/codegen.cc
index 9ecec0c453..167c48e1ba 100644
--- a/src/relay/backend/contrib/clml/codegen.cc
+++ b/src/relay/backend/contrib/clml/codegen.cc
@@ -83,7 +83,7 @@ class CLMLJSONSerializer : public backend::contrib::JSONSerializer {
 ICHECK(comp.defined()) << "CLML JSON runtime only supports composite functions.";
 const std::string name = comp.value();
 std::shared_ptr<JSONGraphNode> json_node;
-if (name == "clml.conv2d") {
+if (name == "clml.conv2d" || name == "clml.pad_conv2d") {
   json_node = CreateCompositeConvJSONNode(cn);
 } else if (name == "clml.batch_norm") {
   json_node = CreateBatchNormJSONNode(cn);
diff --git a/tests/python/contrib/test_clml/test_ops.py b/tests/python/contrib/test_clml/test_ops.py
index d2431d2dfd..da09715fbe 100644
--- a/tests/python/contrib/test_clml/test_ops.py
+++ b/tests/python/contrib/test_clml/test_ops.py
@@ -45,7 +45,7 @@ def _get_conv_model(
 a = relay.var(next(iter(var)), shape=shape, dtype=dtype)
 input_arr = var[next(iter(var))]
 if has_pad:
-p = ((0, 0), (padding[0], padding[0]), (padding[1], padding[1]), (0, 0))
+p = ((0, 0), (0, 0), (padding[0], padding[0]), (padding[1], padding[1]))
 a = relay.nn.pad(a, pad_width=p)
 padding = (0, 0, 0, 0)
 else:
@@ -97,7 +97,7 @@ def test_conv2d(device, dtype):
 trials = [
 # Normal c

[incubator-tvm] branch master updated: Don't add cast for TF batch norm when type isn't changing (#5731)

2020-06-08 Thread srk

srk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-tvm.git


The following commit(s) were added to refs/heads/master by this push:
 new 2e1ef8e  Don't add cast for TF batch norm when type isn't changing (#5731)
2e1ef8e is described below

commit 2e1ef8e4b7e39bcd0ce68192c38800e2364e0984
Author: Trevor Morris 
AuthorDate: Mon Jun 8 16:43:28 2020 -0700

Don't add cast for TF batch norm when type isn't changing (#5731)
---
 python/tvm/relay/frontend/tensorflow.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/python/tvm/relay/frontend/tensorflow.py b/python/tvm/relay/frontend/tensorflow.py
index 201c6ba..50987f9 100644
--- a/python/tvm/relay/frontend/tensorflow.py
+++ b/python/tvm/relay/frontend/tensorflow.py
@@ -1227,7 +1227,7 @@ def _fused_batch_norm():
 attr['data_format'] = attr['data_format'].decode("utf-8")
 if attr['data_format'] == 'NCHW':
 axis = 1
-if 'U' in attr:
+if 'U' in attr and attr['U'].name != attr['T'].name:
 need_cast = True
 inputs[0] = _op.cast(inputs[0], dtype=attr['U'].name)
 # Check if mean and variance are empty
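
The changed condition can be read as: cast the input only when a 'U' attribute is present and names a different dtype than 'T'. A minimal standalone sketch of that check, assuming only that the attribute values expose a .name field as in the diff; the function name needs_input_cast and the SimpleNamespace stand-ins are hypothetical and not part of the frontend:

```python
from types import SimpleNamespace


def needs_input_cast(attr):
    """Return True only when 'U' is present and names a different dtype than 'T'."""
    return "U" in attr and attr["U"].name != attr["T"].name


# Stand-ins for the dtype attribute objects; only `.name` is used here.
float32 = SimpleNamespace(name="float32")
float16 = SimpleNamespace(name="float16")

print(needs_input_cast({"T": float32, "U": float32}))  # same dtype: no cast
print(needs_input_cast({"T": float16, "U": float32}))  # dtypes differ: cast
print(needs_input_cast({"T": float32}))                # no 'U' attribute: no cast
```

Before this change, the presence of 'U' alone triggered the cast, so a cast was added even when 'U' and 'T' named the same dtype.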



[incubator-tvm] branch master updated (de54754 -> 2ec7caa)

2020-06-05 Thread srk

srk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-tvm.git.


from de54754  Fix the values for test_fmod since it fails way too often otherwise (#5723)
 add 2ec7caa  fix small bug about dense_grad (#5695)

No new revisions were added by this update.

Summary of changes:
 python/tvm/relay/op/_tensor_grad.py   | 7 ---
 tests/python/relay/test_op_grad_level2.py | 1 +
 2 files changed, 5 insertions(+), 3 deletions(-)



[incubator-tvm] branch master updated (3d61dc8 -> 43dcbc6)

2020-06-03 Thread srk

srk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-tvm.git.


from 3d61dc8  [ONNX]ReduceL1, ReduceL2, ReduceSumSquare, ReduceLogSum ops added (#5721)
 add 43dcbc6  [TENSORFLOW]StatefulPartitionedCall/PartitionedCall Ops support added  (#5617)

No new revisions were added by this update.

Summary of changes:
 python/tvm/relay/frontend/tensorflow.py  | 126 -
 tests/python/frontend/tensorflow/test_forward.py | 344 ++-
 2 files changed, 465 insertions(+), 5 deletions(-)



[incubator-tvm] branch master updated (030a163 -> 70017ef)

2019-11-21 Thread srk

srk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-tvm.git.


from 030a163  update_document_after_repository_renamed (#4398)
 add 70017ef  [Golang][Doc] improve the samples and doc (#4385)

No new revisions were added by this update.

Summary of changes:
 golang/README.md   |  8 ++--
 golang/sample/Makefile |  2 +-
 golang/sample/complex.go   |  2 +-
 golang/sample/gen_mobilenet_lib.py | 91 ++
 4 files changed, 98 insertions(+), 5 deletions(-)
 create mode 100644 golang/sample/gen_mobilenet_lib.py



[incubator-tvm] branch master updated (2baf310 -> a226973)

2019-11-17 Thread srk

srk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-tvm.git.


from 2baf310  [Relay][Frontend][Tensorflow]Add conv2d_transpose (#4300)
 add a226973  [Frontend]Add TensorFlow FloorMod (#4308)

No new revisions were added by this update.

Summary of changes:
 docs/api/python/topi.rst |  2 ++
 docs/frontend/tensorflow.rst |  1 +
 docs/langref/relay_op.rst|  2 ++
 python/tvm/relay/frontend/tensorflow.py  | 10 +--
 python/tvm/relay/op/_tensor.py   |  4 +++
 python/tvm/relay/op/tensor.py| 36 ++
 src/relay/op/tensor/binary.cc| 12 
 tests/python/frontend/tensorflow/test_forward.py | 26 ++--
 tests/python/relay/test_op_level1.py |  4 ++-
 topi/include/topi/broadcast.h| 38 
 topi/python/topi/broadcast.py| 38 
 topi/src/topi.cc |  2 ++
 topi/tests/python/test_topi_broadcast.py | 14 +
 13 files changed, 183 insertions(+), 6 deletions(-)
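
For context on the FloorMod op added in a226973: a floor-style modulo takes the sign of the divisor, whereas a C-style truncated modulo takes the sign of the dividend. The sketch below shows the difference in plain Python without TVM or TensorFlow; floor_mod and trunc_mod are hypothetical helper names, not functions from the commit:

```python
import math


def floor_mod(a, b):
    """Remainder consistent with floor division: result has the sign of b."""
    return a - b * math.floor(a / b)


def trunc_mod(a, b):
    """Remainder consistent with truncated division: result has the sign of a."""
    return math.fmod(a, b)


for a, b in [(7, 3), (-7, 3), (7, -3)]:
    print(a, b, floor_mod(a, b), trunc_mod(a, b))
```

The two agree when both operands have the same sign and differ otherwise, which is why a frontend needs a distinct mapping for each op.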