From: Tomer Tayar <tta...@habana.ai>

When a PCIe AXI drain event happens, the driver might be unable to
access the device through PCIe, and therefore unable to send a
hard-reset request to FW.
Starting from FW version 1.13, FW initiates a hard-reset in such a
case without waiting for a reset request from the driver.
Mark the event as critical when running with such a FW version.

Signed-off-by: Tomer Tayar <tta...@habana.ai>
Reviewed-by: Oded Gabbay <ogab...@kernel.org>
Signed-off-by: Oded Gabbay <ogab...@kernel.org>
---
 drivers/accel/habanalabs/common/habanalabs.h | 8 ++++++++
 drivers/accel/habanalabs/gaudi2/gaudi2.c     | 2 ++
 2 files changed, 10 insertions(+)
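
The version check added in habanalabs.h can be exercised outside the driver.
The following is a minimal user-space sketch, not driver code: struct fw_ver
and the fw_ver_* names are hypothetical stand-ins for the fw_sw_major_ver and
fw_sw_minor_ver fields of struct hl_device, and the self-check only covers
boundary cases around the 1.13 threshold used in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two version fields kept in struct hl_device. */
struct fw_ver {
	uint32_t major;
	uint32_t minor;
};

/*
 * Same comparison as hl_is_fw_sw_ver_equal_or_greater(): a newer major
 * version always passes, an equal major version passes only if the minor
 * version is at least the requested one.
 */
static bool fw_ver_equal_or_greater(const struct fw_ver *v, uint32_t major, uint32_t minor)
{
	return v->major > major || (v->major == major && v->minor >= minor);
}

/* Boundary cases around the 1.13 threshold. */
static void fw_ver_self_check(void)
{
	const struct fw_ver below = { 1, 12 };
	const struct fw_ver exact = { 1, 13 };
	const struct fw_ver newer_major = { 2, 0 };
	const struct fw_ver older_major = { 0, 99 };

	assert(!fw_ver_equal_or_greater(&below, 1, 13));
	assert(fw_ver_equal_or_greater(&exact, 1, 13));
	assert(fw_ver_equal_or_greater(&newer_major, 1, 13));
	assert(!fw_ver_equal_or_greater(&older_major, 1, 13));
}
```

Comparing the major version first keeps a 2.0 FW from being rejected only
because its minor number is below 13.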

diff --git a/drivers/accel/habanalabs/common/habanalabs.h b/drivers/accel/habanalabs/common/habanalabs.h
index 1655c101c705..5c69a482b8de 100644
--- a/drivers/accel/habanalabs/common/habanalabs.h
+++ b/drivers/accel/habanalabs/common/habanalabs.h
@@ -3594,6 +3594,14 @@ static inline bool hl_is_fw_sw_ver_below(struct hl_device *hdev, u32 fw_sw_major
        return false;
 }
 
+static inline bool hl_is_fw_sw_ver_equal_or_greater(struct hl_device *hdev, u32 fw_sw_major,
+                                                       u32 fw_sw_minor)
+{
+       return (hdev->fw_sw_major_ver > fw_sw_major ||
+                       (hdev->fw_sw_major_ver == fw_sw_major &&
+                                       hdev->fw_sw_minor_ver >= fw_sw_minor));
+}
+
 /*
  * Kernel module functions that can be accessed by entire module
  */
diff --git a/drivers/accel/habanalabs/gaudi2/gaudi2.c b/drivers/accel/habanalabs/gaudi2/gaudi2.c
index 819660c684cf..b739078c2d87 100644
--- a/drivers/accel/habanalabs/gaudi2/gaudi2.c
+++ b/drivers/accel/habanalabs/gaudi2/gaudi2.c
@@ -10007,6 +10007,8 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent
                error_count = gaudi2_handle_pcie_drain(hdev, &eq_entry->pcie_drain_ind_data);
                reset_flags |= HL_DRV_RESET_FW_FATAL_ERR;
                event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
+               if (hl_is_fw_sw_ver_equal_or_greater(hdev, 1, 13))
+                       is_critical = true;
                break;
 
        case GAUDI2_EVENT_PSOC59_RPM_ERROR_OR_DRAIN:
-- 
2.34.1
