[GitHub] [incubator-tvm] u99127 commented on a change in pull request #5848: [TFLite] QNN support for TFLite 2.1.0 quantized models
u99127 commented on a change in pull request #5848: URL: https://github.com/apache/incubator-tvm/pull/5848#discussion_r448548518 ## File path: python/tvm/relay/frontend/tflite.py ## @@ -651,12 +701,43 @@ def convert_shape(self, op): def convert_relu(self, op): """Convert TFLite ReLU""" +try: +from tflite.ActivationFunctionType import ActivationFunctionType +except ImportError: +raise ImportError("The tflite package must be installed") + Review comment: I think this is unnecessary given the import of ActivationFunctionType in the constructor [here](https://github.com/apache/incubator-tvm/blob/b979bf6a7630ada055fff2d65e1cd0f8d55bb6a0/python/tvm/relay/frontend/tflite.py#L54) ## File path: python/tvm/relay/frontend/tflite.py ## @@ -692,6 +773,11 @@ def _hard_swish(data): def convert_relu6(self, op): """Convert TFLite ReLU6""" +try: +from tflite.ActivationFunctionType import ActivationFunctionType Review comment: Same as relu, I think this is unnecessary given the import of ActivationFunctionType in the constructor [here](https://github.com/apache/incubator-tvm/blob/b979bf6a7630ada055fff2d65e1cd0f8d55bb6a0/python/tvm/relay/frontend/tflite.py#L54) ## File path: python/tvm/relay/frontend/tflite.py ## @@ -263,21 +305,29 @@ def get_tensor_value(self, tensor_wrapper): except ImportError: raise ImportError("The tflite package must be installed") +# Read the data from the buffer. Also extract the shape. +# The shape is used later to reshape the data. +data = tensor_wrapper.buffer.DataAsNumpy() +shape = tensor_wrapper.tensor.ShapeAsNumpy() + +# When TFLite buffer is of size 1 (scalar), then TFLite tensor shape is set to 0. +# Therefore, we set the shape to 1 for numpy reshape to work. Set shape to 1 if the data is +# a scalar type +if data.size == 1 and isinstance(shape, int) and shape == 0: +shape = (1,) + +if tensor_wrapper.tensor.Type() == TensorType.INT8: Review comment: Minor nit and this should really be credited to Dmitriy Smirnov. https://github.com/d-smirnov the condition here could well be pulled out into a helper function that has a dictionary to help us map from TensorType to numpy type ? Would make the code much cleaner and reduce duplication. i.e. something like def get_tensor_type_as_numpy(self, tensor_wrapper): """Returns np.dtype out of TensorType""" """Returns np.dtype out of TensorType""" assert isinstance(tensor_wrapper, TensorWrapper) try: from tflite.TensorType import TensorType return { TensorType.UINT8: np.uint8, TensorType.INT8:np.int8, TensorType.FLOAT32: np.float32, TensorType.INT32: np.int32, TensorType.INT64: np.int64, TensorType.BOOL:np.bool_ }[ tensor_wrapper.tensor.Type() ] except ImportError: raise ImportError("The tflite package must be installed") except KeyError: raise NotImplementedError("Tensor type '{}' currently not supported" .format(tensor_wrapper.tensor.Type())) def get_tensor_value(self, tensor_wrapper): """Get tensor buffer value from given tensor wrapper""" assert isinstance(tensor_wrapper, TensorWrapper) value_type = self.get_tensor_type_as_numpy( tensor_wrapper ) return np.frombuffer( tensor_wrapper.buffer.DataAsNumpy(), dtype=value_type ).reshape( tensor_wrapper.tensor.ShapeAsNumpy() \ if 0 != tensor_wrapper.tensor.ShapeLength() \ else [] ) ## File path: tests/python/frontend/tflite/test_forward.py ## @@ -787,14 +897,30 @@ def test_forward_convolution(): _test_convolution([4, 17, 17, 124], [1, 1, 124, 19], [1, 1], [1, 1], 'SAME', 'NHWC', quantized=quantized) _test_convolution([4, 17, 17, 12], [3, 3, 12, 32], [1, 1], [2, 2], 'VALID', 'NHWC', quantized=quantized) -# depthwise convolution -_test_convolution([4, 8, 8, 176], [1, 1, 176, 1], [1, 1], [1, 1], 'SAME', 'NHWC', True) -_test_convolution([4, 17, 17, 19], [3, 3, 19, 1], [1, 1], [2, 2], 'VALID', 'NHWC', True) -_test_convolution([4, 17, 17, 124], [1, 1, 124, 1], [1, 1], [1, 1], 'SAME', 'NHWC', True) -_test_convolution([4, 17, 17, 12], [3, 3, 12, 1], [1, 1], [2, 2], 'VALID', 'NHWC', True) -_test_convolution([4, 17, 17, 12], [3, 3, 12, 2], [1, 1], [2, 2], 'VALID', 'NHWC', True) -# dephtwise convolution with single input channel -_test_convolution([1, 76, 64, 1], [9, 5, 1, 96], [1,
[GitHub] [incubator-tvm] u99127 commented on a change in pull request #5848: [TFLite] QNN support for TFLite 2.1.0 quantized models
u99127 commented on a change in pull request #5848: URL: https://github.com/apache/incubator-tvm/pull/5848#discussion_r445098409 ## File path: python/tvm/relay/frontend/tflite.py ## @@ -2566,17 +2613,27 @@ def convert_quantize(self, op): input_tensors = self.get_input_tensors(op) assert len(input_tensors) == 1, "input tensors length should be 1" input_tensor = input_tensors[0] +input_tensor_type_str = self.get_tensor_type_str(input_tensor.tensor.Type()) in_expr = self.get_expr(input_tensor.tensor_idx) output_tensors = self.get_output_tensors(op) assert len(output_tensors) == 1, "output tensors length should be 1" output_tensor = output_tensors[0] +output_tensor_type_str = self.get_tensor_type_str(output_tensor.tensor.Type()) # The output must be quantized assert output_tensor.qnn_params -# Quantize the input -out = self.quantize(in_expr, output_tensor) +# TFLite Quantize op can also act as Requantize op +if input_tensor_type_str == "float32": +out = self.quantize(in_expr, output_tensor) +else: +out = _qnn.op.requantize(in_expr, + input_scale=input_tensor.qnn_params['scale'], + input_zero_point=input_tensor.qnn_params['zero_point'], + output_scale=output_tensor.qnn_params['scale'], + output_zero_point=output_tensor.qnn_params['zero_point'], + out_dtype=output_tensor_type_str) return out Review comment: This to me looks like it can go in by it's own right as a separate PR but this needs a unit test change in tflite/test_forward.py . ## File path: python/tvm/relay/frontend/tflite.py ## @@ -243,10 +243,46 @@ def get_tensors(self, tensors_idx_list): qnn_params = None tflite_qnn_params = tensor.Quantization() if tflite_qnn_params is not None: -scale = float(tflite_qnn_params.ScaleAsNumpy()) -zero_point = int(tflite_qnn_params.ZeroPointAsNumpy()) +# Params might be per-tensor or per-axis quantized. For per-tensor, scale and zero +# points are scalar. For per-axis, scale and zero points are tensors. But as per +# TFLite quantization spec, the restrictions on ops suggest that for per-axis, even +# if zero point is a tensor - all the zero points are identical. More infomration +# here - https://www.tensorflow.org/lite/performance/quantization_spec + +tflite_scale = tflite_qnn_params.ScaleAsNumpy() +tflite_zero_point = tflite_qnn_params.ZeroPointAsNumpy() +is_qnn_params_valid = True + +# Handle Per-axis and per-tensor cases +if isinstance(tflite_scale, np.ndarray): +assert isinstance(tflite_zero_point, np.ndarray) + +# Tensor - Per-axis quantization +if tflite_scale.shape != (1,) and tflite_zero_point.shape != (1,): +scale = tflite_scale +# Ensure that all zero points are identical +zero_point = tflite_zero_point +assert all(x == zero_point[0] for x in zero_point) Review comment: Minor Nit : Can we use an error here instead of an assert to show us clearly the change that has happened ? It also means we can provide some sensible diagnostic ? ## File path: python/tvm/relay/frontend/tflite.py ## @@ -243,10 +243,46 @@ def get_tensors(self, tensors_idx_list): qnn_params = None tflite_qnn_params = tensor.Quantization() if tflite_qnn_params is not None: -scale = float(tflite_qnn_params.ScaleAsNumpy()) -zero_point = int(tflite_qnn_params.ZeroPointAsNumpy()) +# Params might be per-tensor or per-axis quantized. For per-tensor, scale and zero +# points are scalar. For per-axis, scale and zero points are tensors. But as per +# TFLite quantization spec, the restrictions on ops suggest that for per-axis, even +# if zero point is a tensor - all the zero points are identical. More infomration +# here - https://www.tensorflow.org/lite/performance/quantization_spec Review comment: To be clear, we are interpreting this from the fact that Conv2d and Depthwise_conv2d have a zero_point of 0 listed in their restriction even though they have per-axis quantization. I would make the comment more explicit . For per-axis or per-channel quantization the scale and zero points for the weights are tensors (?) ## File path: