[tvm] branch nightly updated (cf081d9929 -> def551dfd5)
This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a change to branch nightly
in repository https://gitbox.apache.org/repos/asf/tvm.git

    from cf081d9929 [BugFix][CPP] Fix cpp deploy bug (#15773)
     add 9d8e6fda50 [ADRENO] Minor changes for Adreno docs and help scripts (#15830)
     add 8b40f5d028 [FRONTEND] Fix unnecessary pylint errors (#15838)
     add def551dfd5 [CLI TOOLS][RTVM] Improve rtvm tool with new options to measure native performance (#15818)

No new revisions were added by this update.

Summary of changes:
 apps/cpp_rtvm/README.md                       |  22 +++
 apps/cpp_rtvm/main.cc                         | 199 ++
 apps/cpp_rtvm/tvm_runner.cc                   | 129 +
 apps/cpp_rtvm/tvm_runner.h                    |  24 +++-
 docs/how_to/deploy/adreno.rst                 |   2 +-
 tests/python/frontend/keras/test_forward.py   |   2 +-
 tests/python/frontend/oneflow/test_forward.py |   2 +-
 tests/scripts/setup-adreno-env.sh             |  18 ++-
 tests/scripts/task_config_build_adreno.sh     |   2 +
 9 files changed, 334 insertions(+), 66 deletions(-)
[tvm] branch main updated: [CLI TOOLS][RTVM] Improve rtvm tool with new options to measure native performance (#15818)
This is an automated email from the ASF dual-hosted git repository. srk pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/tvm.git The following commit(s) were added to refs/heads/main by this push: new def551dfd5 [CLI TOOLS][RTVM] Improve rtvm tool with new options to measure native performance (#15818) def551dfd5 is described below commit def551dfd50bfff4e9d50108dc4e8027b553b8ec Author: Siva AuthorDate: Fri Sep 29 10:30:20 2023 +0530 [CLI TOOLS][RTVM] Improve rtvm tool with new options to measure native performance (#15818) * [RTVM] Improve rtvm tool with new options to measure native performance Few fixes and enhancements that affects model loading times New options to measure performance. * * review comments * * review comments --- apps/cpp_rtvm/README.md | 22 + apps/cpp_rtvm/main.cc | 199 ++-- apps/cpp_rtvm/tvm_runner.cc | 129 +--- apps/cpp_rtvm/tvm_runner.h | 24 +- 4 files changed, 316 insertions(+), 58 deletions(-) diff --git a/apps/cpp_rtvm/README.md b/apps/cpp_rtvm/README.md index c60a7b0e12..652d46eb58 100644 --- a/apps/cpp_rtvm/README.md +++ b/apps/cpp_rtvm/README.md @@ -122,6 +122,11 @@ Command line usage --input- Numpy file for the model input (optional and we use random of not given) --output - Numpy file name to dump the model output as numpy --dump-meta- Dump model meta information +--pre-compiled - The file name of a file where pre-compiled programs should be stored +--profile - Profile over all execution +--dry-run - Profile after given dry runs, default 10 +--run-count- Profile for given runs, default 50 +--zero-copy- Profile with zero copy api Example ./rtvm --model=keras-resnet50 --device="opencl" --dump-meta @@ -366,3 +371,20 @@ stored. If the pre-compiled file name was passed to the `rtvm` then After method `Load`, method `UsePreCompiledProgram` is called. This method loads pre-compiled programs if the file exists. In opposite case the file will be created and pre-compiled programs will be saved to this file. + +# Performnace Profiling Options +The tool has added few options to measure wall clock performance of the given model on Target natively. +--profile : Can turn on the profiling +--dry-run : The number of times dry run the model before mearuring the performance. Default value os 10 +--run-count : The number times to run the model and take an average. Default value is 50. +--zero-copy: This option enables graph runtime zero copy to be used for input and output than byte copy to DLTensor. + +Performance profile options dumps information summary as given below. 
+ Module Load :27 ms + Graph Runtime Create :11 ms + Params Read :15 ms + Params Set :41 ms + Pre Compiled Progs Load :24 ms +Total Load Time :118 ms +Average ExecTime:27 ms +Unload Time :35.9236 ms diff --git a/apps/cpp_rtvm/main.cc b/apps/cpp_rtvm/main.cc index c38a5f62bd..dc3cf1c414 100644 --- a/apps/cpp_rtvm/main.cc +++ b/apps/cpp_rtvm/main.cc @@ -29,6 +29,7 @@ #endif #include +#include #include #include #include @@ -54,7 +55,11 @@ static const string kUsage = "--input- Numpy file for the model input (optional and we use random of not given)\n" "--output - Numpy file name to dump the model output as numpy\n" "--dump-meta- Dump model meta information\n" -"--pre-compiled - The file name of a file where pre-compiled programs should be stored" +"--pre-compiled - The file name of a file where pre-compiled programs should be stored\n" +"--profile - Profile over all execution\n" +"--dry-run - Profile after given dry runs, default 10\n" +"--run-count- Profile for given runs, default 50\n" +"--zero-copy- Profile with zero copy api\n" "\n" " Example\n" " ./rtvm --model=keras-resnet50 --device=\"opencl\" --dump-meta\n" @@ -68,6 +73,7 @@ static const string kUsage = * \arg input Numpy file for the model input * \arg output Numpy file name to dump the model output as numpy * \arg pre_compiled File name where pre-compiled programs should be stored + * \arg profile Do we profile overall execution */ struct ToolArgs { string model; @@ -75,7 +81,11 @@ struct ToolArgs { string input; string output; string pre_compiled; - bool dump_meta = false; + bool dump_meta{false}; + bool profile{false}; + int dry_run{10}; + int run_count{50}; + bool zero_copy{false}; }; /*! @@ -89,6 +99,10 @@ void PrintArgs(const ToolArgs& args) { LOG(INFO) << "Output= " << args.output; LOG(INFO) << "Pre-compiled = " << args.pre_compiled; LOG(INFO) << "Dump Metadata = " << ((args.dump_meta) ? ("True") : ("False")); + LOG(INFO) << "Profile = " << ((args.profile) ? ("True") : ("False")); +
[GitHub] [tvm] srkreddy1238 merged pull request #15818: [CLI TOOLS][RTVM] Improve rtvm tool with new options to measure native performance
srkreddy1238 merged PR #15818: URL: https://github.com/apache/tvm/pull/15818
[tvm] branch main updated: [FRONTEND] Fix unnecessary pylint errors (#15838)
This is an automated email from the ASF dual-hosted git repository. srk pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/tvm.git The following commit(s) were added to refs/heads/main by this push: new 8b40f5d028 [FRONTEND] Fix unnecessary pylint errors (#15838) 8b40f5d028 is described below commit 8b40f5d028632da82bd6cbf83865041d4186b068 Author: Siva AuthorDate: Fri Sep 29 10:29:00 2023 +0530 [FRONTEND] Fix unnecessary pylint errors (#15838) Handle unnecessary pylint errors from these frontends --- tests/python/frontend/keras/test_forward.py | 2 +- tests/python/frontend/oneflow/test_forward.py | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/tests/python/frontend/keras/test_forward.py b/tests/python/frontend/keras/test_forward.py index 9d33b15a91..ba3880e186 100644 --- a/tests/python/frontend/keras/test_forward.py +++ b/tests/python/frontend/keras/test_forward.py @@ -28,11 +28,11 @@ from tensorflow import keras as tf_keras # prevent Keras from using up all gpu memory import keras +import pytest import tvm from tvm import relay from tvm.contrib import graph_executor import tvm.testing -import pytest if tf.executing_eagerly(): GPUS = tf.config.experimental.list_physical_devices("GPU") diff --git a/tests/python/frontend/oneflow/test_forward.py b/tests/python/frontend/oneflow/test_forward.py index 7ddc347e86..fda5f1b723 100644 --- a/tests/python/frontend/oneflow/test_forward.py +++ b/tests/python/frontend/oneflow/test_forward.py @@ -20,11 +20,11 @@ import os import numpy as np import oneflow as flow +from packaging import version as package_version import tvm import tvm.testing import tvm.topi.testing from tvm import relay -from packaging import version as package_version MODEL_HOME = "test_model"
[GitHub] [tvm] srkreddy1238 merged pull request #15838: [FRONTEND] Fix unnecessary pylint errors
srkreddy1238 merged PR #15838: URL: https://github.com/apache/tvm/pull/15838
[GitHub] [tvm] vinx13 commented on pull request #15839: [Unity] Include last kernel launch in captured CudaGraph
vinx13 commented on PR #15839: URL: https://github.com/apache/tvm/pull/15839#issuecomment-1740116164 That's correct. It's relying on the current behavior of `StaticPlanBlockMemory`, which doesn't rewrite the input/output tensors. So only tensors from `R.builtin.alloc_tensor` can be returned (similar to the current implementation of this pass, `StaticPlanBlockMemory` doesn't rewrite the body of `SeqExpr`).
[GitHub] [tvm] Lunderberg commented on pull request #15839: [Unity] Include last kernel launch in captured CudaGraph
Lunderberg commented on PR #15839: URL: https://github.com/apache/tvm/pull/15839#issuecomment-1740113779 I'm afraid I don't quite understand. Either `R.memory.alloc_storage` or `R.builtin.alloc_tensor` could be independent allocations that could be returned as output. (e.g. a static memory plan allocates using `R.memory.alloc_storage`, reuses that buffer several times internally, before returning it with the final result.) Oh, I think I see. That would mean that it's relying on the current behavior of `StaticPlanBlockMemory`, and assumes that the user-provided parameters are identical across each call. Effectively, the last kernel launch would also be serving as a memcpy from the memory used by the captured cudagraph to other memory space. Is that understanding correct?
[GitHub] [tvm] zxybazh commented on a diff in pull request #15823: [Unity] Add Relax multi-device e2e cases
zxybazh commented on code in PR #15823: URL: https://github.com/apache/tvm/pull/15823#discussion_r1340704043 ## src/tir/transforms/default_gpu_schedule.cc: ## @@ -98,24 +98,48 @@ IRModule MarkScheduled(const IRModule& mod) { mod->type_definitions, // type_definitions mod->import_set_, // import_set mod->source_map,// map - mod->attrs);// attrs); + mod->attrs, // attrs + mod->global_infos); // global_infos +} + +bool IsScheduledOnGPU(const BaseFunc& func) { + Optional target = func->attrs.GetAttr(tvm::attr::kTarget); + if (target.defined()) { Review Comment: Do we need to also check target from context here, i.e., GPU target defined in context but not in the function attribute?
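For reference, a minimal Python-side sketch of the situation the review comment asks about: a GPU target supplied only by an enclosing target context, not by a function attribute. `tvm.target.Target.current()` is the context-side lookup; this is an illustrative sketch assuming a stock TVM install and is not part of the patch.

```python
import tvm

# No target context active: a lookup that relies only on context sees nothing.
print(tvm.target.Target.current(allow_none=True))  # None

with tvm.target.Target("cuda"):
    # Inside the context, the GPU target is recoverable even if the function
    # being scheduled carries no "target" attribute of its own.
    target = tvm.target.Target.current(allow_none=True)
    print(target.kind.name)  # cuda
```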
[GitHub] [tvm] yongwww commented on pull request #15772: [Unity][QNN] Add QNN quantize/dequantize ops
yongwww commented on PR #15772: URL: https://github.com/apache/tvm/pull/15772#issuecomment-1740031754 cc: @slyubomirsky @psrivas2
[GitHub] [tvm] yongwww commented on pull request #15686: [Unity] Add new Relax annotation ops: smooth and absmax
yongwww commented on PR #15686: URL: https://github.com/apache/tvm/pull/15686#issuecomment-1739990584 @ibsidorenko thanks for the effort! It looks like we might not need to use these two ops; shall we close this PR?
[GitHub] [tvm] vinx13 commented on pull request #15839: [Unity] Include last kernel launch in captured CudaGraph
vinx13 commented on PR #15839: URL: https://github.com/apache/tvm/pull/15839#issuecomment-1739959284 I didn't consider `R.builtin.alloc_tensor` previously because it can be either a dynamic allocation or memory that is visible externally (input/output tensors). The implication here is that running the model with CUDA graph enabled will overwrite the output tensors of previous runs.
[GitHub] [tvm] masahi commented on pull request #15839: [Unity] Include last kernel launch in captured CudaGraph
masahi commented on PR #15839: URL: https://github.com/apache/tvm/pull/15839#issuecomment-1739872085 cc @vinx13
[tvm] branch unity updated: [Unity] Support Padding Reversal in Alter-Op pass (#15679)
This is an automated email from the ASF dual-hosted git repository. masahi pushed a commit to branch unity in repository https://gitbox.apache.org/repos/asf/tvm.git The following commit(s) were added to refs/heads/unity by this push: new 225d067fb8 [Unity] Support Padding Reversal in Alter-Op pass (#15679) 225d067fb8 is described below commit 225d067fb85e37723bbb3a5ca51f872c0e0f6abf Author: rutkoor <120498024+rutk...@users.noreply.github.com> AuthorDate: Thu Sep 28 23:19:01 2023 +0530 [Unity] Support Padding Reversal in Alter-Op pass (#15679) * Support for padding Reversal in Alter-op pass * Removing lambda from remove_pad te.compute * Applying clang-format on .cc file * Removing extra line from manipulate.h file --- .../tvm/relax/transform/legalize_ops/manipulate.py | 10 ++- python/tvm/relax/transform/transform.py| 2 +- src/relax/transform/alter_op_impl.cc | 81 +++--- src/te/operation/compute_op.cc | 6 +- tests/python/relax/test_transform_alter_op_impl.py | 80 + 5 files changed, 153 insertions(+), 26 deletions(-) diff --git a/python/tvm/relax/transform/legalize_ops/manipulate.py b/python/tvm/relax/transform/legalize_ops/manipulate.py index 4e06a0df39..e56240dc0d 100644 --- a/python/tvm/relax/transform/legalize_ops/manipulate.py +++ b/python/tvm/relax/transform/legalize_ops/manipulate.py @@ -182,7 +182,15 @@ def _layout_transform(bb: BlockBuilder, call: Call) -> Expr: ) index_map: tvm.tir.IndexMap = call.attrs.index_map -pad_value = call.attrs.pad_value.value +pad_value = call.attrs.pad_value +if pad_value is not None: +pad_value = pad_value.value +else: +if "int" in call.args[0].struct_info.dtype: +pad_value = int(0) +else: +pad_value = float(0.0) + axis_separators: tvm.tir.IndexMap.AXIS_SEPARATOR = call.attrs.axis_separators # Convert to list from array axis_separators = list(map(lambda x: x.value, axis_separators)) diff --git a/python/tvm/relax/transform/transform.py b/python/tvm/relax/transform/transform.py index 2a06d5098e..72a9966a4b 100644 --- a/python/tvm/relax/transform/transform.py +++ b/python/tvm/relax/transform/transform.py @@ -1034,7 +1034,7 @@ def AlterOpImpl( l = [] for transform in transform_list: if isinstance(transform, Callable): -transform = IndexMap.from_func(transform) +transform = IndexMap.from_func_with_separators(transform)[0] l.append(transform) op_buffer_transforms[operator_name] = l diff --git a/src/relax/transform/alter_op_impl.cc b/src/relax/transform/alter_op_impl.cc index 9813c4ed24..98d64dd7a8 100644 --- a/src/relax/transform/alter_op_impl.cc +++ b/src/relax/transform/alter_op_impl.cc @@ -30,7 +30,11 @@ #include #include #include +#include #include +#include + +#include "../../te/operation/create_primfunc.h" namespace tvm { namespace relax { @@ -162,8 +166,18 @@ class AlterOpImplMutator : public ExprMutator { return arr_tensor_sinfo; } + bool IsScalarConstant(const Expr& expr) { +if (expr->IsInstance() && expr.as()->is_scalar()) { + return true; +} +return false; + } + Expr TransformLayout(const Expr& expr, const IndexMap& index_map, - const Array axis_separators) { + const Array& axis_separators) { +if (IsScalarConstant(expr) || index_map.get() == nullptr) { + return expr; +} ObjectPtr attrs = make_object(); // We want to avoid two layout_transform ops to share the same index map even if they are // identical. The scope of vars used in index map initial indices is local to the op. Not doing @@ -173,19 +187,70 @@ class AlterOpImplMutator : public ExprMutator { return Call(layout_transform_op_, {expr}, Attrs{std::move(attrs)}, {}); } + /*! 
+ * \brief Adds the \p remove_pad op to the module if it has not already been added before. + * \returns The global var associated with the remove_pad PrimFunc. + */ + GlobalVar GetOrCreateRemovePadOp(const Array& old_shape, const DataType& dtype) { +int t_shape = old_shape.size(); +if (remove_pad_map_.count(t_shape) != 0) { + return remove_pad_map_[t_shape]; +} +// Create dynamic shapes for input and output tensors +Array dyn_padded_shape, dyn_old_shape; +for (int i = 0; i < t_shape; i++) { + tir::Var var1("p" + std::to_string(i), old_shape[i].dtype()); + tir::Var var2("i" + std::to_string(i), old_shape[i].dtype()); + dyn_padded_shape.push_back(var1); + dyn_old_shape.push_back(var2); +} + +// Input tensor of remove_pad op +te::Tensor placeholder_tensor = te::placeholder(dyn_padded_shape, dtype, "input"); +// Output tensor of remove_pad op +te::Tensor output_tensor = te::compute( +dyn_old_shape, +
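As a side note on the `transform.py` hunk in the commit above, `IndexMap.from_func_with_separators` returns a pair rather than a single map, which is why the pass keeps only element `[0]`. A small hedged sketch follows; the example mapping is mine and not taken from the commit.

```python
from tvm.tir import IndexMap

# The helper returns (index_map, axis_separators); a mapping function that uses
# no IndexMap.AXIS_SEPARATOR sentinel simply yields an empty separator list.
index_map, axis_separators = IndexMap.from_func_with_separators(
    lambda n, c, h, w: (n, c // 4, h, w, c % 4)
)
print(index_map)        # NCHW -> NCHW4c style layout transform
print(axis_separators)  # []
```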
[GitHub] [tvm] masahi merged pull request #15679: [Unity] Support Padding Reversal in Alter-Op pass
masahi merged PR #15679: URL: https://github.com/apache/tvm/pull/15679
[GitHub] [tvm] Lunderberg commented on pull request #15839: [Unity] Include last kernel launch in captured CudaGraph
Lunderberg commented on PR #15839: URL: https://github.com/apache/tvm/pull/15839#issuecomment-1739748683 As part of this implementation, moving the `R.memory.kill_tensor` and `R.memory.kill_storage` to be generated in a later `KillAfterLastUse` pass meant that the `RemoveCUDAGraph` pass did not need to track these calls.
[GitHub] [tvm] Lunderberg opened a new pull request, #15839: [Unity] Include last kernel launch in captured CudaGraph
Lunderberg opened a new pull request, #15839: URL: https://github.com/apache/tvm/pull/15839 Prior to this commit, the last kernel launch would not be included in a captured CUDA graph. This commit updates `RewriteCUDAGraph` to include the last kernel launch. The previous implementation assumed that any calls to `R.builtin.alloc_tensor` that remain after `StaticPlanBlockMemory` are dynamic allocations. This is not the case, as the allocation of a static-shaped output tensor may still use `R.builtin.alloc_tensor`. The primary change of this commit was to update `RewriteCUDAGraph` to check for static allocations directly, rather than inferring a static allocation based on the operation being used. This change triggered an additional bug, in which the previous implementation only checked for output variables if they occurred as part of a `VarBinding`, and not if they occurred as the body of a `SeqExpr`. As a result, a captured CUDA graph whose output was immediately used as the output of the containing Relax function would contain an undefined variable. This commit updates `RewriteCUDAGraph` to operate on a `SeqExpr` rather than a `BindingBlock`, so that the `SeqExprNode::body` may be inspected for output variables.
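For readers trying the pass out, a hedged sketch of how the rewrite is switched on. The `relax.backend.use_cuda_graph` flag is the one checked in `relax.build()` (see the `vm_build.py` hunk later in this digest); the snippet assumes a TVM build that includes the Relax VM backend, where this config key is registered.

```python
import tvm

# Building a Relax module inside this context makes relax.build() append the
# RewriteCUDAGraph pass to its lowering pipeline.
with tvm.transform.PassContext(config={"relax.backend.use_cuda_graph": True}):
    ctx = tvm.transform.PassContext.current()
    print(ctx.config.get("relax.backend.use_cuda_graph", False))  # True
```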
[GitHub] [tvm] tlopex commented on issue #15148: [Tracking Issue] TFLite operator support
tlopex commented on issue #15148: URL: https://github.com/apache/tvm/issues/15148#issuecomment-1739729430

@p3achyjr
```
# with respect to its fp32 input range, defined in fake_quant.
# s = 255/(fmax-fmin); m = -fmin*s (the zero point)
for i in input_arrays:
    try:
        quant_scale = 255 / (input_range[i][1] - input_range[i][0])
    except ZeroDivisionError:
        print("Min and max of the input range for tensor " + i + " can't be equal")
    mean = -input_range[i][0] * quant_scale
    input_stats[i] = (mean, quant_scale)
```
Here, if the range is symmetrical, the zero point will always be 128.
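A quick numeric check of the formula quoted above, with example values of my own choosing rather than values from the test: for any symmetric fp32 range the zero point works out to 127.5 and rounds to 128.

```python
# s = 255 / (fmax - fmin); m = -fmin * s  (the zero point)
fmin, fmax = -3.0, 3.0                # symmetric example range
quant_scale = 255 / (fmax - fmin)     # 42.5
zero_point = -fmin * quant_scale      # 127.5, i.e. 128 after rounding
print(quant_scale, zero_point, round(zero_point))
```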
[GitHub] [tvm] srkreddy1238 commented on a diff in pull request #15833: [VM] Memory Manager moved up to runtime
srkreddy1238 commented on code in PR #15833: URL: https://github.com/apache/tvm/pull/15833#discussion_r1340435814 ## include/tvm/runtime/memory/memory_manager.h: ## @@ -37,15 +37,15 @@ namespace tvm { namespace runtime { -namespace vm { +namespace memory { struct Buffer { /*! \brief The pointer to the allocated block of memory. */ void* data{nullptr}; /*! \brief The size of the block. */ size_t size{0}; /*! \brief The shape of the tensor. */ - std::vector shape; + ShapeTuple shape; Review Comment: Agreed. Even the two stage memory allocation for OpenCL texture doesn't need it.
[GitHub] [tvm] yongwww commented on a diff in pull request #15833: [VM] Memory Manager moved up to runtime
yongwww commented on code in PR #15833: URL: https://github.com/apache/tvm/pull/15833#discussion_r1340399667 ## include/tvm/runtime/memory/memory_manager.h: ## @@ -37,15 +37,15 @@ namespace tvm { namespace runtime { -namespace vm { +namespace memory { struct Buffer { /*! \brief The pointer to the allocated block of memory. */ void* data{nullptr}; /*! \brief The size of the block. */ size_t size{0}; /*! \brief The shape of the tensor. */ - std::vector shape; + ShapeTuple shape; Review Comment: we can remove this member `ShapeTuple shape;`, and the related `buf.shape.push_back(shape[i]);` in naive_allocator.h. It looks like it is not used. ## src/runtime/memory/memory_manager.cc: ## @@ -154,29 +154,30 @@ Allocator* MemoryManager::GetAllocator(Device dev) { return it->second.get(); } -NDArray Allocator::Empty(std::vector shape, DLDataType dtype, DLDevice dev) { +NDArray Allocator::Empty(ShapeTuple shape, DLDataType dtype, DLDevice dev, + Optional mem_scope) { VerifyDataType(dtype); NDArray::Container* container = new NDArray::Container(nullptr, shape, dtype, dev); container->SetDeleter(BufferDeleter); - size_t size = GetDataSize(container->dl_tensor); + size_t size = DeviceAPI::Get(dev)->GetDataSize(container->dl_tensor); size_t alignment = GetDataAlignment(container->dl_tensor); Buffer* buffer = new Buffer; - *buffer = this->Alloc(size, alignment, dtype); + if (!mem_scope.defined() || mem_scope == "global") { +*buffer = this->Alloc(size, alignment, dtype); + } else { +*buffer = this->Alloc(shape, dtype, mem_scope.value()); + } container->manager_ctx = reinterpret_cast(buffer); container->dl_tensor.data = buffer->data; return NDArray(GetObjectPtr(container)); } Review Comment: it would be helpful to add the definition of Storage as below in memory_manager.cc, as it was not defined yet. ``` Storage::Storage(Buffer buffer) { auto n = make_object(); n->buffer = std::move(buffer); data_ = std::move(n); } ```
[GitHub] [tvm] Lunderberg merged pull request #15810: [Unity] Implement relax.transform.KillAfterLastUse
Lunderberg merged PR #15810: URL: https://github.com/apache/tvm/pull/15810
[tvm] branch unity updated: [Unity] Implement relax.transform.KillAfterLastUse (#15810)
This is an automated email from the ASF dual-hosted git repository. lunderberg pushed a commit to branch unity in repository https://gitbox.apache.org/repos/asf/tvm.git The following commit(s) were added to refs/heads/unity by this push: new aa4587feb5 [Unity] Implement relax.transform.KillAfterLastUse (#15810) aa4587feb5 is described below commit aa4587feb5103927d95e5e931149debd0a0aeafc Author: Eric Lunderberg AuthorDate: Thu Sep 28 08:54:22 2023 -0500 [Unity] Implement relax.transform.KillAfterLastUse (#15810) * [Unity][Util] Expose CanonicalizeBindings as internal utility * [Unity] Implement relax.transform.KillAfterLastUse Prior to this commit, intermediate objects produced while executing a Relax function would persist until the end of the Relax function. While re-use of static allocations is handled by the `StaticPlanBlockMemory` transform, re-use of dynamic allocations is handled by the `relax_vm::PooledAllocator`. For large Relax functions representing end-to-end model execution, releasing memory from the VM registers to the `relax_vm::PooledAllocator` at the end of the function call may be insufficient. This commit introduces a new pass, `relax.transform.KillAfterLastUse`, which identifies the last usage of each Relax variable and inserts a `relax.memory.kill_tensor`, `relax.memory.kill_storage`, or `relax.vm.kill_object` call depending on the object type. This insertion is suppressed if a Relax variables is already killed, such as static allocations and tensors tracked by `StaticPlanBlockMemory`. * Avoid calling R.vm.kill_object on objects not in registers --- python/tvm/relax/transform/transform.py| 10 + python/tvm/relax/vm_build.py | 1 + src/relax/transform/kill_after_last_use.cc | 289 + src/relax/transform/utils.h| 13 ++ tests/python/relax/test_kill_after_last_use.py | 55 + 5 files changed, 368 insertions(+) diff --git a/python/tvm/relax/transform/transform.py b/python/tvm/relax/transform/transform.py index 13874aa044..2a06d5098e 100644 --- a/python/tvm/relax/transform/transform.py +++ b/python/tvm/relax/transform/transform.py @@ -368,6 +368,16 @@ def StaticPlanBlockMemory() -> tvm.ir.transform.Pass: return _ffi_api.StaticPlanBlockMemory() # type: ignore +def KillAfterLastUse() -> tvm.ir.transform.Pass: +"""Drop all tensor/storage objects after last use + +Returns +--- +ret : tvm.ir.transform.Pass +""" +return _ffi_api.KillAfterLastUse() # type: ignore + + def VMBuiltinLower() -> tvm.ir.transform.Pass: """Lowering generic intrinsic to VM intrinsics. diff --git a/python/tvm/relax/vm_build.py b/python/tvm/relax/vm_build.py index d8679522db..142da5c451 100644 --- a/python/tvm/relax/vm_build.py +++ b/python/tvm/relax/vm_build.py @@ -310,6 +310,7 @@ def build( passes.append(relax.transform.RemovePurityChecking()) passes.append(relax.transform.CallTIRRewrite()) passes.append(relax.transform.StaticPlanBlockMemory()) +passes.append(relax.transform.KillAfterLastUse()) if tvm.transform.PassContext.current().config.get("relax.backend.use_cuda_graph", False): passes.append(relax.transform.RewriteCUDAGraph()) diff --git a/src/relax/transform/kill_after_last_use.cc b/src/relax/transform/kill_after_last_use.cc new file mode 100644 index 00..0f28c6c2b9 --- /dev/null +++ b/src/relax/transform/kill_after_last_use.cc @@ -0,0 +1,289 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. 
The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +/*! + * \file src/relax/transform/kill_after_last_use.cc + * \brief Kill storage/tensor objects after last use, if not already killed + */ +#include +#include +#include +#include +#include +#include + +#include +#include +#include + +#include "utils.h" + +namespace tvm { +namespace relax { + +class UnusedTrivialBindingRemover : public ExprMutator { + public: + static Expr Apply(Expr expr) { +struct UsedCollector : ExprVisitor { + void VisitExpr_(const VarNode* val) override { used.insert(val); } + void VisitExpr_(const DataflowVarNode* val) override { +
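For orientation, a hedged sketch of where the new pass sits relative to its neighbours, mirroring the `vm_build.py` hunk in the commit above; it only constructs the pass sequence and does not claim to reproduce the full `relax.build()` pipeline.

```python
import tvm
from tvm import relax

# Memory is planned statically first; KillAfterLastUse then inserts kill calls
# for whatever the static planner did not already free.
lowering = tvm.transform.Sequential(
    [
        relax.transform.RemovePurityChecking(),
        relax.transform.CallTIRRewrite(),
        relax.transform.StaticPlanBlockMemory(),
        relax.transform.KillAfterLastUse(),
    ]
)
# lowered_mod = lowering(relax_module)  # applied to a Relax IRModule during build
```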
[GitHub] [tvm] Lunderberg commented on issue #15148: [Tracking Issue] TFLite operator support
Lunderberg commented on issue #15148: URL: https://github.com/apache/tvm/issues/15148#issuecomment-1739257241

> I see what you're saying--maybe we can add min/max overrides for `_test_forward_elemwise_quantized`.

That's what I'm thinking as well. It looks like it currently uses the same range for both quantization and for data generation. I think it will need to override the data generation range to exclude zero from the denominator, but to keep zero in the quantization range, as zero may occur in the output.

> I'm surprised that `div` and `floor_div` aren't failing in this case though, since the rhs can generate 0s :/.

Agreed, as I would expect the same problem to affect any operator with a restricted domain. My guess is that there's some optimization that assumes the inputs to be valid (a legal assumption, as the output is typically undefined when the denominator is zero), and that that optimization is affecting floormod differently from floordiv. It probably would be good to track that optimization down at some point, if it occurs at the TVM level, but I don't think that should delay the re-enabling of the unit test.

> May I ask how you're running these tests multiple times?

It's a bit of a hacky way to do so. I commented out everything in `test_all_elemwise` except for the `_test_forward_elemwise_quantized(_test_floor_mod)` line, then added a parametrized pytest fixture to the file. When running pytest as usual (`python3 -mpytest -sv tests/python/frontend/tflite/test_forward.py::test_all_elemwise`), it then repeats every test the number of times specified.

```python
import pytest

@pytest.fixture(params=list(range(100)), autouse=True)
def repeat_all_tests(request):
    return request.param
```

I suppose I could have just made a for loop, but I was lazy and this let me use pytest's pass/fail counter instead of making my own :P.
[GitHub] [tvm] srkreddy1238 commented on pull request #15818: [CLI TOOLS][RTVM] Improve rtvm tool with new options to measure native performance
srkreddy1238 commented on PR #15818: URL: https://github.com/apache/tvm/pull/15818#issuecomment-1739147760 Never mind. While I was relooking into the code I found a reference to the deleted object and fixed it.
[GitHub] [tvm] echuraev commented on a diff in pull request #15818: [CLI TOOLS][RTVM] Improve rtvm tool with new options to measure native performance
echuraev commented on code in PR #15818: URL: https://github.com/apache/tvm/pull/15818#discussion_r1339925369 ## apps/cpp_rtvm/main.cc: ## @@ -350,43 +350,46 @@ int ExecuteModel(ToolArgs& args) { } else if (!args.input.empty() && !args.output.empty()) { LOG(INFO) << "Executing with Input:" << args.input << " Output:" << args.output; // Set Input from Numpy Input -runner->SetInput(args.input); +runner.SetInput(args.input); // Run the model -runner->Run(); +runner.Run(); // Get Output as Numpy dump -runner->GetOutput(args.output); +runner.GetOutput(args.output); } else { LOG(INFO) << "Executing dry run ... "; // Set random input for all inputs for (auto& elem : mInfo.input_info) { LOG(INFO) << "Set Random Input for :" << elem.first; auto shape = elem.second.first; - size_t ssize = runner->GetInputMemSize(elem.first); + size_t ssize = runner.GetInputMemSize(elem.first); char* data = (char*)malloc(ssize); LOG(INFO) << "Random Input Size:" << ssize << " bytes"; - runner->SetInput(elem.first, data); + runner.SetInput(elem.first, data); free(data); } // Run the model -runner->Run(); +runner.Run(); // Get Output and dump few values for (auto& elem : mInfo.output_info) { LOG(INFO) << "Get Output for :" << elem.first; auto shape = elem.second.first; - size_t ssize = runner->GetOutputMemSize(elem.first); + size_t ssize = runner.GetOutputMemSize(elem.first); char* data = (char*)malloc(ssize); - runner->GetOutput(elem.first, data); + runner.GetOutput(elem.first, data); LOG(INFO) << "Output Size:" << ssize << " bytes"; free(data); } } + + if (args.profile) { +// Print Stats +runner.PrintStats(); + } auto tstart = std::chrono::high_resolution_clock::now(); - delete runner; + runner.~TVMRunner(); Review Comment: Ah.. My bad, I missed it during reviewing... Sorry. Probably in this case dynamic allocation should be more preferable in terms of readability and syntax. So we can leave it as it is or change back to the dynamic allocation.
[GitHub] [tvm] srkreddy1238 commented on pull request #15818: [CLI TOOLS][RTVM] Improve rtvm tool with new options to measure native performance
srkreddy1238 commented on PR #15818: URL: https://github.com/apache/tvm/pull/15818#issuecomment-1738870461 @echuraev There are some requirements and concerns on high load and unload timings with TVM in general. With the pre-compiled bin we could reduce the majority of it, and there is some more scope in params loading. Unload time is also a bit significant. I am working on a few improvements around these areas ...
[GitHub] [tvm] srkreddy1238 commented on a diff in pull request #15818: [CLI TOOLS][RTVM] Improve rtvm tool with new options to measure native performance
srkreddy1238 commented on code in PR #15818: URL: https://github.com/apache/tvm/pull/15818#discussion_r1339876548 ## apps/cpp_rtvm/main.cc: ## @@ -350,43 +350,46 @@ int ExecuteModel(ToolArgs& args) { } else if (!args.input.empty() && !args.output.empty()) { LOG(INFO) << "Executing with Input:" << args.input << " Output:" << args.output; // Set Input from Numpy Input -runner->SetInput(args.input); +runner.SetInput(args.input); // Run the model -runner->Run(); +runner.Run(); // Get Output as Numpy dump -runner->GetOutput(args.output); +runner.GetOutput(args.output); } else { LOG(INFO) << "Executing dry run ... "; // Set random input for all inputs for (auto& elem : mInfo.input_info) { LOG(INFO) << "Set Random Input for :" << elem.first; auto shape = elem.second.first; - size_t ssize = runner->GetInputMemSize(elem.first); + size_t ssize = runner.GetInputMemSize(elem.first); char* data = (char*)malloc(ssize); LOG(INFO) << "Random Input Size:" << ssize << " bytes"; - runner->SetInput(elem.first, data); + runner.SetInput(elem.first, data); free(data); } // Run the model -runner->Run(); +runner.Run(); // Get Output and dump few values for (auto& elem : mInfo.output_info) { LOG(INFO) << "Get Output for :" << elem.first; auto shape = elem.second.first; - size_t ssize = runner->GetOutputMemSize(elem.first); + size_t ssize = runner.GetOutputMemSize(elem.first); char* data = (char*)malloc(ssize); - runner->GetOutput(elem.first, data); + runner.GetOutput(elem.first, data); LOG(INFO) << "Output Size:" << ssize << " bytes"; free(data); } } + + if (args.profile) { +// Print Stats +runner.PrintStats(); + } auto tstart = std::chrono::high_resolution_clock::now(); - delete runner; + runner.~TVMRunner(); Review Comment: We are timing it to measure the unload time.
[GitHub] [tvm] echuraev merged pull request #15830: [ADRENO] Minor changes for Adreno docs and help scripts
echuraev merged PR #15830: URL: https://github.com/apache/tvm/pull/15830
[tvm] branch main updated: [ADRENO] Minor changes for Adreno docs and help scripts (#15830)
This is an automated email from the ASF dual-hosted git repository. echuraev pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/tvm.git The following commit(s) were added to refs/heads/main by this push: new 9d8e6fda50 [ADRENO] Minor changes for Adreno docs and help scripts (#15830) 9d8e6fda50 is described below commit 9d8e6fda50bce14bf597de1f87711230e6001e4e Author: Siva AuthorDate: Thu Sep 28 15:38:30 2023 +0530 [ADRENO] Minor changes for Adreno docs and help scripts (#15830) [ADRENO] Minor changes for Adreno docs and help scripts NCHW is mandatory layout for CLML offload. Updated the docs. CI scripts will keep OpenCL enbaled as fallback always. Enable configurable device bind ports. Helps in multi user environments. --- docs/how_to/deploy/adreno.rst | 2 +- tests/scripts/setup-adreno-env.sh | 18 +- tests/scripts/task_config_build_adreno.sh | 2 ++ 3 files changed, 16 insertions(+), 6 deletions(-) diff --git a/docs/how_to/deploy/adreno.rst b/docs/how_to/deploy/adreno.rst index ed016a3ff7..f0b8c6f757 100644 --- a/docs/how_to/deploy/adreno.rst +++ b/docs/how_to/deploy/adreno.rst @@ -432,7 +432,7 @@ as the OpenCL path is fallback option for any operator didn't go through OpenCLM python3 -m tvm.driver.tvmc compile \ --cross-compiler ${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang \ - --target="opencl, clml, llvm" --target-llvm-mtriple aarch64-linux-gnu --target-opencl-device adreno \ + --target="opencl, clml, llvm" --desired-layout NCHW --target-llvm-mtriple aarch64-linux-gnu --target-opencl-device adreno \ --tuning-records keras-resnet50.log -o keras-resnet50.tar resnet50.h5 On successful compilation, above command produce ``keras-resnet50.tar``. diff --git a/tests/scripts/setup-adreno-env.sh b/tests/scripts/setup-adreno-env.sh index 55a92c5f61..15c124a0f0 100755 --- a/tests/scripts/setup-adreno-env.sh +++ b/tests/scripts/setup-adreno-env.sh @@ -20,12 +20,13 @@ ENVIRONMENT="" RPC_PORT="" ADB_SERIAL="" +LISTEN_PORT=5000 function usage() { echo "Helper script to setup the environment for Tracker, RPC Device and for application" echo "Usage (Help) : source setup-adreno-env.sh -h" echo "Usage (Tracker): source setup-adreno-env.sh -e tracker -p " -echo "Usage (Device): source setup-adreno-env.sh -e device -p -d " +echo "Usage (Device): source setup-adreno-env.sh -e device -p -d [-l ]" echo "Usage (Query): source setup-adreno-env.sh -e query -p " } @@ -46,6 +47,11 @@ while [[ $# -gt 0 ]]; do shift # past argument shift # past value ;; +-l|--listen-port) + LISTEN_PORT="$2" + shift # past argument + shift # past value + ;; -h|--help) usage return 0 @@ -62,6 +68,7 @@ done echo "ENVIRONMENT = ${ENVIRONMENT}" echo "RPC_PORT = ${RPC_PORT}" echo "ADB_SERIAL= ${ADB_SERIAL}" +echo "DEVICE LISTEN POPRT= ${LISTEN_PORT}" function def_environment() { @@ -100,10 +107,11 @@ case ${ENVIRONMENT} in fi adb reverse tcp:${TVM_TRACKER_PORT} tcp:${TVM_TRACKER_PORT} -adb forward tcp:5000 tcp:5000 -adb forward tcp:5001 tcp:5001 -adb forward tcp:5002 tcp:5002 -adb shell "cd ${TARGET_FOLDER}; killall -9 tvm_rpc-${USER}; sleep 2; LD_LIBRARY_PATH=${TARGET_FOLDER}/ ./tvm_rpc-${USER} server --host=0.0.0.0 --port=5000 --port-end=5010 --tracker=127.0.0.1:${TVM_TRACKER_PORT} --key=${RPC_DEVICE_KEY}" +adb forward tcp:${LISTEN_PORT} tcp:${LISTEN_PORT} +adb forward tcp:$((LISTEN_PORT + 1)) tcp:$((LISTEN_PORT + 1)) +adb forward tcp:$((LISTEN_PORT + 2)) tcp:$((LISTEN_PORT + 2)) +adb forward tcp:$((LISTEN_PORT + 3)) tcp:$((LISTEN_PORT + 3)) +adb shell "cd 
${TARGET_FOLDER}; killall -9 tvm_rpc-${USER}; sleep 2; LD_LIBRARY_PATH=${TARGET_FOLDER}/ ./tvm_rpc-${USER} server --host=0.0.0.0 --port=${LISTEN_PORT} --port-end=$((LISTEN_PORT + 10)) --tracker=127.0.0.1:${TVM_TRACKER_PORT} --key=${RPC_DEVICE_KEY}" ;; "query") diff --git a/tests/scripts/task_config_build_adreno.sh b/tests/scripts/task_config_build_adreno.sh index 62e6ffecbc..1b6750f165 100755 --- a/tests/scripts/task_config_build_adreno.sh +++ b/tests/scripts/task_config_build_adreno.sh @@ -25,6 +25,8 @@ cp ../cmake/config.cmake . if [ -f "${ADRENO_OPENCL}/CL/cl_qcom_ml_ops.h" ] ; then echo set\(USE_CLML ${ADRENO_OPENCL}\) >> config.cmake +else +echo set\(USE_OPENCL ON\) >> config.cmake fi echo set\(USE_RPC ON\) >> config.cmake echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
[GitHub] [tvm] echuraev commented on a diff in pull request #15818: [CLI TOOLS][RTVM] Improve rtvm tool with new options to measure native performance
echuraev commented on code in PR #15818: URL: https://github.com/apache/tvm/pull/15818#discussion_r1339863992 ## apps/cpp_rtvm/main.cc: ## @@ -350,43 +350,46 @@ int ExecuteModel(ToolArgs& args) { } else if (!args.input.empty() && !args.output.empty()) { LOG(INFO) << "Executing with Input:" << args.input << " Output:" << args.output; // Set Input from Numpy Input -runner->SetInput(args.input); +runner.SetInput(args.input); // Run the model -runner->Run(); +runner.Run(); // Get Output as Numpy dump -runner->GetOutput(args.output); +runner.GetOutput(args.output); } else { LOG(INFO) << "Executing dry run ... "; // Set random input for all inputs for (auto& elem : mInfo.input_info) { LOG(INFO) << "Set Random Input for :" << elem.first; auto shape = elem.second.first; - size_t ssize = runner->GetInputMemSize(elem.first); + size_t ssize = runner.GetInputMemSize(elem.first); char* data = (char*)malloc(ssize); LOG(INFO) << "Random Input Size:" << ssize << " bytes"; - runner->SetInput(elem.first, data); + runner.SetInput(elem.first, data); free(data); } // Run the model -runner->Run(); +runner.Run(); // Get Output and dump few values for (auto& elem : mInfo.output_info) { LOG(INFO) << "Get Output for :" << elem.first; auto shape = elem.second.first; - size_t ssize = runner->GetOutputMemSize(elem.first); + size_t ssize = runner.GetOutputMemSize(elem.first); char* data = (char*)malloc(ssize); - runner->GetOutput(elem.first, data); + runner.GetOutput(elem.first, data); LOG(INFO) << "Output Size:" << ssize << " bytes"; free(data); } } + + if (args.profile) { +// Print Stats +runner.PrintStats(); + } auto tstart = std::chrono::high_resolution_clock::now(); - delete runner; + runner.~TVMRunner(); Review Comment: Why do you need to call the dtor here? Probably, I missed something in the logic. I suppose that the object will be automatically destroyed at the end of the program.
[GitHub] [tvm] srkreddy1238 opened a new pull request, #15838: [FRONTEND] Fix unnecessary pylint errors
srkreddy1238 opened a new pull request, #15838: URL: https://github.com/apache/tvm/pull/15838 Handle unnecessary pylint errors from these frontends
[GitHub] [tvm] srkreddy1238 commented on a diff in pull request #15833: [VM] Memory Manager moved up to runtime
srkreddy1238 commented on code in PR #15833: URL: https://github.com/apache/tvm/pull/15833#discussion_r1339827581 ## include/tvm/runtime/memory_manager.h: ## @@ -130,7 +131,7 @@ class MemoryManager { class StorageObj : public Object { public: /*! \brief The index into the VM function table. */ - Buffer buffer; + MBuffer buffer; /*! \brief Allocate an NDArray from a given piece of storage. */ NDArray AllocNDArray(size_t offset, std::vector shape, DLDataType dtype); Review Comment: Handled ShapeTuple across memory_manager and Allocators
[GitHub] [tvm] masahi opened a new pull request, #15837: [Unity][BYOC] Support attention over batched sequences of variable sequence lengths
masahi opened a new pull request, #15837: URL: https://github.com/apache/tvm/pull/15837 (no comment)
[GitHub] [tvm] lhutton1 opened a new pull request, #15836: [CI] Update ci-gpu image
lhutton1 opened a new pull request, #15836: URL: https://github.com/apache/tvm/pull/15836 The ci-gpu image is updated so that the changes in #15819 are reflected in CI. cc @ashutosh-arm