[PATCH 02/14] accel/habanalabs/gaudi2: include block id in ECC error reporting

2023-09-18 Thread Oded Gabbay
From: Ofir Bitton During ECC event handling, Memory wrapper id was mistakenly printed as block id. Fix the print and in addition fetch the actual block-id from firmware. Signed-off-by: Ofir Bitton Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/gaudi2/gaudi2

Re: [PATCH] accel/habanalabs: refactor deprecated strncpy

2023-09-18 Thread Oded Gabbay
On Fri, Aug 25, 2023 at 12:19 PM Stanislaw Gruszka wrote: > > On Wed, Aug 23, 2023 at 12:23:08AM +, Justin Stitt wrote: > > `strncpy` is deprecated for use on NUL-terminated destination strings [1]. > > > > A suitable replacement is `strscpy` [2] due to the fact that it > > guarantees

Re: [PATCH] accel/habanalabs/gaudi2: Fix incorrect string length computation in gaudi2_psoc_razwi_get_engines()

2023-09-13 Thread Oded Gabbay
On Tue, Sep 5, 2023 at 3:28 PM Stanislaw Gruszka wrote: > > On Mon, Sep 04, 2023 at 09:18:36PM +0200, Christophe JAILLET wrote: > > snprintf() returns the "number of characters which *would* be generated for > > the given input", not the size *really* generated. > > > > In order to avoid too
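
The pitfall quoted in this thread can be shown outside the driver. What follows is a minimal userspace sketch, not the patch itself (the snippet above is truncated): snprintf() returns the length the output would have had, so adding that value to a running length can move the length past the end of the buffer. The helper name append_clamped() and the engine-name strings are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: append "<name> " to buf at offset len and return
 * the length that is really stored, never the "would have been" length
 * that snprintf() reports on truncation.
 */
static size_t append_clamped(char *buf, size_t size, size_t len, const char *name)
{
	size_t avail;
	int n;

	assert(size > 0 && len < size);
	avail = size - len - 1;		/* room left, excluding the NUL */

	n = snprintf(buf + len, size - len, "%s ", name);
	if (n < 0)
		return len;		/* output error: length unchanged */

	/* Clamp: snprintf() may report more than it was able to write. */
	return len + ((size_t)n > avail ? avail : (size_t)n);
}
```

In kernel code the same clamping is what scnprintf() provides; the sketch only spells the arithmetic out so the difference from a plain "len += snprintf(...)" is visible.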

Re: [PATCH] accel/habanalabs: refactor deprecated strncpy to strscpy_pad

2023-08-31 Thread Oded Gabbay
On Sat, Aug 26, 2023 at 1:13 AM Kees Cook wrote: > > On Fri, Aug 25, 2023 at 10:09:51PM +, Justin Stitt wrote: > > `strncpy` is deprecated for use on NUL-terminated destination strings [1]. > > > > We see that `prop->cpucp_info.card_name` is supposed to be > > NUL-terminated based on its
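
For readers following the strncpy discussion, here is a userspace sketch of the semantics being asked for: the destination is always NUL-terminated, truncation is reported to the caller, and the unused tail is zero-filled. It is an illustration under stated assumptions, not the kernel's strscpy_pad(): the name copy_terminated_pad() is made up, and it returns -1 on truncation where the kernel helper returns -E2BIG.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the behaviour discussed in the thread.
 * Copies at most size - 1 bytes, always terminates dst, zero-fills the
 * remainder, and returns the copied length or -1 if src did not fit.
 */
static long copy_terminated_pad(char *dst, const char *src, size_t size)
{
	size_t i = 0;
	int truncated;

	assert(size > 0);

	while (i < size - 1 && src[i] != '\0') {
		dst[i] = src[i];
		i++;
	}
	truncated = (src[i] != '\0');

	/* Terminate and pad in one step: everything from i onward is zero. */
	memset(dst + i, 0, size - i);

	return truncated ? -1 : (long)i;
}
```

By contrast, strncpy() leaves the destination unterminated when the source is at least as long as the count, which is the reason given for calling it deprecated on NUL-terminated destinations.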

[PATCH 2/2] accel/habanalabs: fix ETR/ETF flush logic

2023-07-24 Thread Oded Gabbay
From: Benjamin Dotan When config_etr or config_etf are called we need to validate the parameters that are passed into them to make sure the requested operation is valid. Signed-off-by: Benjamin Dotan Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/gaudi

[PATCH 1/2] accel/habanalabs/gaudi2 : remove psoc_arc access

2023-07-24 Thread Oded Gabbay
From: Benjamin Dotan Because firmware is blocking PSOC_ARC_DBG, we need to disable access to this block. Signed-off-by: Benjamin Dotan Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- .../habanalabs/gaudi2/gaudi2_coresight.c | 24 +-- 1 file changed, 12

[PATCH] accel/habanalabs/gaudi2: prepare to remove cpu_rst_status

2023-07-20 Thread Oded Gabbay
compatibility. Signed-off-by: Igor Grinberg Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/gaudi2/gaudi2.c | 8 ++-- 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/drivers/accel/habanalabs/gaudi2/gaudi2.c b/drivers/accel/habanalabs/gaudi2

Re: [PATCH] accel/habanalabs: add more debugfs stub helpers

2023-07-20 Thread Oded Gabbay
On Thu, Jul 20, 2023 at 1:29 PM Daniel Vetter wrote: > > On Sun, Jun 11, 2023 at 12:50:31PM +0300, Oded Gabbay wrote: > > On Fri, Jun 9, 2023 at 4:37 PM Tomer Tayar wrote: > > > > > > On 09/06/2023 15:06, Arnd Bergmann wrote: > > > > From: Arnd Bergman

Re: [RFC 0/5] Proposal to use netlink for RAS and Telemetry across drm subsystem

2023-07-17 Thread Oded Gabbay
-soc-fatal-mdfi-east 0x1023 > > error-gt1-soc-fatal-mdfi-south 0x1024 > > error-gt1-soc-fatal-hbm-ss0-0 0x1025 > > error-gt1-soc-fatal-hbm-ss0-1 0x1000

Re: [PATCH 1/2] eventfd: simplify eventfd_signal()

2023-07-13 Thread Oded Gabbay
/* already in OOM ? */ > if (memcg->under_oom) > - eventfd_signal(eventfd, 1); > + eventfd_signal(eventfd); > spin_unlock(_oom_lock); > > return 0; > @@ -4791,7 +4791,7 @@ static void memcg_event_remove(struct work_struct *work) > event->unregister_event(memcg, event->eventfd); > > /* Notify userspace the event is going away. */ > - eventfd_signal(event->eventfd, 1); > + eventfd_signal(event->eventfd); > > eventfd_ctx_put(event->eventfd); > kfree(event); > diff --git a/mm/vmpressure.c b/mm/vmpressure.c > index b52644771cc4..ba4cdef37e42 100644 > --- a/mm/vmpressure.c > +++ b/mm/vmpressure.c > @@ -169,7 +169,7 @@ static bool vmpressure_event(struct vmpressure *vmpr, > continue; > if (level < ev->level) > continue; > - eventfd_signal(ev->efd, 1); > + eventfd_signal(ev->efd); > ret = true; > } > mutex_unlock(>events_lock); > diff --git a/samples/vfio-mdev/mtty.c b/samples/vfio-mdev/mtty.c > index a60801fb8660..5edcf8d738de 100644 > --- a/samples/vfio-mdev/mtty.c > +++ b/samples/vfio-mdev/mtty.c > @@ -1028,9 +1028,9 @@ static int mtty_trigger_interrupt(struct mdev_state > *mdev_state) > } > > if (mdev_state->irq_index == VFIO_PCI_MSI_IRQ_INDEX) > - ret = eventfd_signal(mdev_state->msi_evtfd, 1); > + ret = eventfd_signal(mdev_state->msi_evtfd); > else > - ret = eventfd_signal(mdev_state->intx_evtfd, 1); > + ret = eventfd_signal(mdev_state->intx_evtfd); > > #if defined(DEBUG_INTR) > pr_info("Intx triggered\n"); > diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c > index 89912a17f5d5..c0e230f4c3e9 100644 > --- a/virt/kvm/eventfd.c > +++ b/virt/kvm/eventfd.c > @@ -61,7 +61,7 @@ static void irqfd_resampler_notify(struct > kvm_kernel_irqfd_resampler *resampler) > > list_for_each_entry_srcu(irqfd, >list, resampler_link, > > srcu_read_lock_held(>kvm->irq_srcu)) > - eventfd_signal(irqfd->resamplefd, 1); > + eventfd_signal(irqfd->resamplefd); > } > > /* > @@ -786,7 +786,7 @@ ioeventfd_write(struct kvm_vcpu *vcpu, struct > kvm_io_device *this, gpa_t addr, > if 
(!ioeventfd_in_range(p, addr, len, val)) > return -EOPNOTSUPP; > > - eventfd_signal(p->eventfd, 1); > + eventfd_signal(p->eventfd); > return 0; > } > > > -- > 2.34.1 > For habanalabs (device.c): Reviewed-by: Oded Gabbay

[PATCH 12/12] accel/habanalabs: release user interfaces earlier in device fini

2023-07-11 Thread Oded Gabbay
accesses to these interfaces, this check is not hermetic and it is better to just reverse the order of the code in hl_device_fini(). Signed-off-by: Tomer Tayar Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/common/device.c | 12 ++-- 1 file changed, 6

[PATCH 11/12] accel/habanalabs: Move ioctls to the device specific ioctls range

2023-07-11 Thread Oded Gabbay
From: Tomer Tayar To use drm_ioctl(), move the ioctls to the device specific ioctls range at [DRM_COMMAND_BASE, DRM_COMMAND_END). Signed-off-by: Tomer Tayar Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- .../accel/habanalabs/common/command_buffer.c | 5 +- .../habanalabs/common

[PATCH 04/12] accel/habanalabs/gaudi2: fix missing check of kernel ctx

2023-07-11 Thread Oded Gabbay
If we are initializing the kernel context when we have a Gaudi2 device, we don't need to do any late initializing of that context with specific Gaudi2 code. Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/gaudi2/gaudi2.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers

[PATCH 10/12] accel/habanalabs: update debugfs-driver-habanalabs with the accel path

2023-07-11 Thread Oded Gabbay
From: Tomer Tayar Replace "/sys/kernel/debug/habanalabs/hl/..." with "/sys/kernel/debug/accel//...". Signed-off-by: Tomer Tayar Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- .../ABI/testing/debugfs-driver-habanalabs | 84 +-- 1 file ch

[PATCH 07/12] accel/habanalabs: add info ioctl for engine error reports

2023-07-11 Thread Oded Gabbay
From: Ofir Bitton User gets notification for every engine error report, but he still lacks the exact engine information. Hence, we allow user to query for the exact engine reported an error. Signed-off-by: Ofir Bitton Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel

[PATCH 08/12] accel/habanalabs: register compute device as an accel device

2023-07-11 Thread Oded Gabbay
it will be handled in subsequent patches. Signed-off-by: Tomer Tayar Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/common/debugfs.c | 22 +-- drivers/accel/habanalabs/common/device.c | 163 +++--- drivers/accel/habanalabs/common/habanalabs.h

[PATCH 09/12] accel/habanalabs: update sysfs-driver-habanalabs with the accel path

2023-07-11 Thread Oded Gabbay
From: Tomer Tayar Replace "/sys/class/habanalabs/hl/..." with "/sys/class/accel/accel/device/...". Signed-off-by: Tomer Tayar Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- .../ABI/testing/sysfs-driver-habanalabs | 64 +-- 1 file changed,

[PATCH 06/12] accel/habanalabs: set default device release watchdog T/O as 30 sec

2023-07-11 Thread Oded Gabbay
to collect debug data. Increase the default value to 30 sec. Signed-off-by: Tomer Tayar Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/common/device.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/accel/habanalabs/common/device.c b

[PATCH 03/12] accel/habanalabs/gaudi2: prepare to remove soft_rst_irq

2023-07-11 Thread Oded Gabbay
the backward compatibility. Signed-off-by: Igor Grinberg Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/gaudi2/gaudi2.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/accel/habanalabs/gaudi2/gaudi2.c b/drivers/accel/habanalabs/gaudi2

[PATCH 05/12] accel/habanalabs: handle f/w reserved dram space request

2023-07-11 Thread Oded Gabbay
From: Dani Liberman It is possible for FW to request reserved space in dram. If the device supports this option, it will retrieve the size from the f/w and will reserve it. Currently we add the common code infrastructure to support it. Signed-off-by: Dani Liberman Reviewed-by: Oded Gabbay

[PATCH 02/12] accel/habanalabs/gaudi2: unsecure tpc count registers

2023-07-11 Thread Oded Gabbay
From: Ofir Bitton As TPC kernels now must use those registers we unsecure them. Signed-off-by: Ofir Bitton Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/gaudi2/gaudi2_security.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/accel/habanalabs

[PATCH 01/12] accel/habanalabs/gaudi2: un-secure register for engine cores interrupt

2023-07-11 Thread Oded Gabbay
From: Tomer Tayar The F/W dynamically allocates one of the PSOC scratchpad registers for the engine cores, so they can raise events towards the F/W. To allow the engine cores to access this register, this register must be non-secured. Signed-off-by: Tomer Tayar Reviewed-by: Oded Gabbay Signed

Re: [PATCH] habanalabs/gaudi: Add MODULE_FIRMWARE macros

2023-07-03 Thread Oded Gabbay
+ > #define GAUDI_DMA_POOL_BLK_SIZE0x100 /* 256 bytes */ > > #define GAUDI_RESET_TIMEOUT_MSEC 2000/* 2000ms */ > -- > 2.37.2 > Reviewed-by: Oded Gabbay Applied to -next. Thanks, Oded

Re: [PATCH] accel: make accel_class a static const structure

2023-07-03 Thread Oded Gabbay
time > > placing it into read-only memory, instead of having to be dynamically > > allocated at boot time. > > > > Cc: Oded Gabbay > > Cc: dri-devel@lists.freedesktop.org > > Suggested-by: Greg Kroah-Hartman > > Signed-off-by: Ivan Orlov > > Signed-o

Re: [PATCH v2 15/24] habanalabs: use vmalloc_array and vcalloc

2023-07-02 Thread Oded Gabbay
sizeof(u32)); > if (!sync_objects) > return NULL; > > @@ -453,8 +454,8 @@ hl_state_dump_alloc_read_sm_block_monito > s64 base_addr; /* Base addr can be negative */ > int i; > > - monitors = vmalloc(sds->props[SP_MONITORS_AMOUNT] * > - sizeof(struct hl_mon_state_dump)); > + monitors = vmalloc_array(sds->props[SP_MONITORS_AMOUNT], > +sizeof(struct hl_mon_state_dump)); > if (!monitors) > return NULL; > > Reviewed-by: Oded Gabbay
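
The conversion in the quoted hunk replaces an open-coded "count * element size" with a two-argument array allocator. The usual motivation is that the multiplication can overflow before the allocator ever sees it. A rough userspace sketch of that idea follows; alloc_array() is a hypothetical name, and malloc() stands in for vmalloc(), so this is an analogy rather than the kernel implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical array allocator: refuse the request when n * size would
 * wrap around, instead of silently allocating a too-small buffer.
 */
static void *alloc_array(size_t n, size_t size)
{
	assert(size != 0);	/* element size must be non-zero */

	if (n > SIZE_MAX / size)
		return NULL;	/* n * size would overflow size_t */

	return malloc(n * size);
}
```

With the open-coded form, a huge element count could wrap to a small byte count and the caller would later write past the allocation; with the checked form the caller only has to handle NULL, which the quoted code already does.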

Re: [PATCH drm-next v5 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI

2023-06-20 Thread Oded Gabbay
On Tue, Jun 20, 2023 at 10:13 AM Dave Airlie wrote: > > On Tue, 20 Jun 2023 at 17:06, Oded Gabbay wrote: > > > > On Tue, Jun 20, 2023 at 7:05 AM Dave Airlie wrote: > > > > > > Since this is feature is nouveau only currently and doesn't disturb > >

Re: [PATCH drm-next v5 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI

2023-06-20 Thread Oded Gabbay
On Tue, Jun 20, 2023 at 7:05 AM Dave Airlie wrote: > > Since this is feature is nouveau only currently and doesn't disturb > the current nouveau code paths, I'd like to try and get this work in > tree so other drivers can work from it. > > If there are any major objections to this, I'm happy to

Re: [Intel-xe] [RFC PATCH 1/1] drm/xe: Introduce function pointers for MMIO functions

2023-06-18 Thread Oded Gabbay
On Thu, Jun 15, 2023 at 7:34 PM Matt Roper wrote: > > On Thu, Jun 15, 2023 at 04:04:18PM +0300, Oded Gabbay wrote: > > On Thu, Jun 15, 2023 at 3:01 AM Matt Roper > > wrote: > > > > > > On Mon, Jun 12, 2023 at 06:31:57PM +0200, Francois Dugast wrote: >

[PATCH 3/3] accel/habanalabs: dump temperature threshold boot error

2023-06-12 Thread Oded Gabbay
From: Ofir Bitton Add dump of an error reported from f/w during boot time. This error indicates a failure with setting temperature threshold. Signed-off-by: Ofir Bitton Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/common/firmware_if.c | 5 + 1 file

[PATCH 2/3] accel/habanalabs: reset device if scrubbing failed

2023-06-12 Thread Oded Gabbay
If scrubbing memory after user released device has failed it means the device is in a bad state and should be reset. Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/common/device.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/accel/habanalabs/common

[PATCH 1/3] accel/habanalabs: remove pdev check on idle check

2023-06-12 Thread Oded Gabbay
Our simulator supports idle check so no need anymore to check if pdev exists. Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/common/device.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/accel/habanalabs/common/device.c b/drivers/accel/habanalabs/common

[PATCH 1/3] accel/habanalabs: Allow single timestamp registration request at a time

2023-06-12 Thread Oded Gabbay
, without proper protection, we could end up adding the same node twice to the interrupts wait lists. Signed-off-by: farah kassabri Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- .../habanalabs/common/command_submission.c| 318 +++--- drivers/accel/habanalabs/common

[PATCH 2/3] accel/habanalabs: fix wait_for_interrupt abortion flow

2023-06-12 Thread Oded Gabbay
of the completion API will be greater than 0 since it will return the timeout, but as this indicates successful completion, the driver should mark it as aborted. Signed-off-by: farah kassabri Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- .../habanalabs/common/command_submission.c

[PATCH 3/3] accel/habanalabs: change user interrupt to threaded IRQ

2023-06-12 Thread Oded Gabbay
From: Tal Cohen It is preferable to handle the user interrupt job from a threaded IRQ context. This will allow to avoid disabling interrupts when the user process registers for a new event and to avoid long handling inside an interrupt. Signed-off-by: Tal Cohen Reviewed-by: Oded Gabbay Signed

Re: [PATCH] accel/habanalabs: add more debugfs stub helpers

2023-06-11 Thread Oded Gabbay
On Fri, Jun 9, 2023 at 4:37 PM Tomer Tayar wrote: > > On 09/06/2023 15:06, Arnd Bergmann wrote: > > From: Arnd Bergmann > > > > Two functions got added with normal prototypes for debugfs, but not > > alternative when building without it: > > > > drivers/accel/habanalabs/common/device.c: In

[PATCH 10/12] accel/habanalabs: print return code when process termination fails

2023-06-08 Thread Oded Gabbay
From: Koby Elbaz As part of driver teardown, we attempt to kill all user processes. It shouldn't fail, but if it does we want to print the error code that the kapi returned to us. Signed-off-by: Koby Elbaz Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs

[PATCH 12/12] accel/habanalabs: rename fd_list to hpriv_list

2023-06-08 Thread Oded Gabbay
From: Koby Elbaz Every time an FD is returned to the user, the driver adds a corresponding private structure to the list. Yet, it's still a list of private structures rather than of FDs. Remove, as well, an unnecessary comment. Signed-off-by: Koby Elbaz Reviewed-by: Oded Gabbay Signed-off

[PATCH 09/12] accel/habanalabs: fix standalone preboot descriptor request

2023-06-08 Thread Oded Gabbay
there are no backward compatibility issues as older f/w versions simply ignore this value. Signed-off-by: farah kassabri Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/common/firmware_if.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git

[PATCH 11/12] accel/habanalabs: call put_pid after hpriv list is updated

2023-06-08 Thread Oded Gabbay
Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/common/device.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/accel/habanalabs/common/device.c b/drivers/accel/habanalabs/common/device.c index 764d40c0d666..c61a58a2e622 100644 --- a/drivers/accel

[PATCH 06/12] accel/habanalabs: set device status 'malfunction' while in rmmod

2023-06-08 Thread Oded Gabbay
attempting to kill all processes in a list that can't be ever really empty. Signed-off-by: Koby Elbaz Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/common/device.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/accel

[PATCH 07/12] accel/habanalabs: stop fetching MME SBTE error cause

2023-06-08 Thread Oded Gabbay
From: Ofir Bitton Because in this case we have only a single possible cause, we can safely stop fetching the cause from firmware. Signed-off-by: Ofir Bitton Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/gaudi2/gaudi2.c | 31 ++-- 1 file

[PATCH 08/12] accel/habanalabs: handle arc farm razwi

2023-06-08 Thread Oded Gabbay
From: Dani Liberman Implement razwi handling for arc farm and add it to arc farm sei event handler. Signed-off-by: Dani Liberman Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/gaudi2/gaudi2.c | 16 +--- 1 file changed, 13 insertions(+), 3

[PATCH 05/12] accel/habanalabs: print task name upon creation of a user context

2023-06-08 Thread Oded Gabbay
From: Tomer Tayar It is useful for debug to know which user process have acquired the device. Add this info to the relevant debug print, in addition to the already printed user context's ASID. Signed-off-by: Tomer Tayar Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel

[PATCH 03/12] accel/habanalabs: notify user about undefined opcode event

2023-06-08 Thread Oded Gabbay
From: Ofir Bitton In order for user to be aware of undefined opcode events, we must store all relevant information and notify user about the failure. The user will fetch the stored info via info ioctl. Signed-off-by: Ofir Bitton Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay

[PATCH 04/12] accel/habanalabs: print task name and request code upon ioctl failure

2023-06-08 Thread Oded Gabbay
From: Tomer Tayar When an ioctl fails, it is useful to know what is the task command name and the full ioctl request code, in addition to the task pid and the ioctl number. Add the additional information to the relevant debug error prints. Signed-off-by: Tomer Tayar Reviewed-by: Oded Gabbay

[PATCH 02/12] accel/habanalabs: update pending reset flags with new reset requests

2023-06-08 Thread Oded Gabbay
for the device status. To prevent such cases, update the pending reset flags with the new requests flags before the requests are dropped. Signed-off-by: Tomer Tayar Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/common/device.c | 4 +++- 1 file changed, 3 insertions

[PATCH 01/12] accel/habanalabs: prevent immediate hard reset due to 2 adjacent H/W events

2023-06-08 Thread Oded Gabbay
immediate reset, modify the driver to perform it if the user is not registered to events AND we don't already have a pending reset for a previous H/W event. Signed-off-by: Tomer Tayar Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/common/device.c | 11

[git pull] habanalabs for drm-next-6.5

2023-06-08 Thread Oded Gabbay
fit Moti Haimovski (3): accel/habanalabs: fix bug in free scratchpad memory accel/habanalabs: call to HW/FW err returns 0 when no events exist accel/habanalabs: fix mem leak in capture user mappings Oded Gabbay (5): accel/habanalabs: set unused bit as reserved a

Re: DRM debugfs cleanup take 4

2023-06-01 Thread Oded Gabbay
On Wed, Apr 12, 2023 at 5:52 PM Christian König wrote: > > Hi guys, > > took me some tries to get the Intel CI happy with this patch set. > > This is the version rebased on drm-misc-next, for a CI run you actually > need to rebase the last patch to drm-tip. So I'm planning to merge 1-4 > for this

[PATCH 2/3] accel/habanalabs: add event queue extra validation

2023-05-28 Thread Oded Gabbay
From: Ofir Bitton In order to increase reliability of the event queue interface, we apply to Gaudi2 the same mechanism we have in Gaudi1. The extra validation is basically checking that the received event index matches the expected index. Signed-off-by: Ofir Bitton Reviewed-by: Oded Gabbay
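
The check described here (the received event index must match the expected index) can be sketched in a few lines. Everything below is assumed for illustration: the struct, the function name, and the 8-bit index width are hypothetical and are not taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EQ_INDEX_MASK 0xffu	/* hypothetical index width */

/* Hypothetical per-queue state: the index the next entry should carry. */
struct eq_tracker {
	uint32_t expected_index;
};

/*
 * Accept an entry only if its index is the one we expect next.
 * On success the expected index advances (with wrap-around); on a
 * mismatch the state is left untouched so the caller can report it.
 */
static bool eq_entry_is_valid(struct eq_tracker *eq, uint32_t received_index)
{
	assert(eq != NULL);

	if ((received_index & EQ_INDEX_MASK) != eq->expected_index)
		return false;

	eq->expected_index = (eq->expected_index + 1) & EQ_INDEX_MASK;
	return true;
}
```

A mismatch would mean an entry was skipped, duplicated, or read before the device finished writing it, which is the class of problem an index check of this kind is meant to catch.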

[PATCH 3/3] accel/habanalabs: refactor error info reset

2023-05-28 Thread Oded Gabbay
From: Dani Liberman Moved error info reset code to single function for future use from other places in the driver. Signed-off-by: Dani Liberman Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/common/device.c | 8 drivers/accel/habanalabs

[PATCH 1/3] accel/habanalabs: unsecure TSB_CFG_MTRR regs

2023-05-28 Thread Oded Gabbay
From: Ofir Bitton In order to utilize Engine Barrier padding, user must have access to this register set. Signed-off-by: Ofir Bitton Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/gaudi2/gaudi2_security.c | 4 1 file changed, 4 insertions(+) diff

Re: [PATCH 0/7] Add a DRM driver to support AI Processing Unit (APU)

2023-05-24 Thread Oded Gabbay
On Wed, May 24, 2023 at 2:34 AM Kevin Hilman wrote: > > Jeffrey Hugo writes: > > > On 5/17/2023 8:52 AM, Alexandre Bailon wrote: > >> This adds a DRM driver that implements communication between the CPU and an > >> APU. The driver target embedded device that usually run inference using > >>

Re: [PATCH 0/5] accel/ivpu: Add debugfs support

2023-05-24 Thread Oded Gabbay
On Wed, May 24, 2023 at 11:29 AM Stanislaw Gruszka wrote: > > Hi > > On Wed, May 24, 2023 at 10:55:08AM +0300, Oded Gabbay wrote: > > On Wed, May 24, 2023 at 10:49 AM Stanislaw Gruszka > > wrote: > > > > > > Add debugfs support for ivpu driver, most imp

Re: [PATCH 0/5] accel/ivpu: Add debugfs support

2023-05-24 Thread Oded Gabbay
On Wed, May 24, 2023 at 10:49 AM Stanislaw Gruszka wrote: > > Add debugfs support for ivpu driver, most importantly firmware loging > and tracing. Hi, Without looking at the code I have 2 comments/questions: 1. Please add an ABI documentation in Documentation/ABI/testing/ or

Re: [PATCH 1/4] accel/habanalabs: remove sim code

2023-05-22 Thread Oded Gabbay
On Mon, May 22, 2023 at 2:33 PM Dan Carpenter wrote: > > Thanks! > > On Mon, May 22, 2023 at 02:25:45PM +0300, Oded Gabbay wrote: > > diff --git a/drivers/accel/habanalabs/common/device.c > > b/drivers/accel/habanalabs/common/device.c > > index cab5a63db8c1..ca15c8d

[PATCH 3/4] accel/habanalabs: fix bug of not fetching addr_dec info

2023-05-22 Thread Oded Gabbay
From: Ofir Bitton addr_dec info should always be fetched, regardless of cause value. Signed-off-by: Ofir Bitton Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/gaudi2/gaudi2.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git

[PATCH 4/4] accel/habanalabs: move ioctl error print to debug level

2023-05-22 Thread Oded Gabbay
We don't want to allow users to spam the kernel log and sending ioctls with bad opcodes is a sure way to do it. Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/common/habanalabs_ioctl.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/accel/habanalabs

[PATCH 2/4] accel/habanalabs: add description to several info ioctls

2023-05-22 Thread Oded Gabbay
From: Dani Liberman Several info ioctls may return success although no data retrieved. Signed-off-by: Dani Liberman Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- include/uapi/drm/habanalabs_accel.h | 10 ++ 1 file changed, 10 insertions(+) diff --git a/include/uapi/drm

[PATCH 1/4] accel/habanalabs: remove sim code

2023-05-22 Thread Oded Gabbay
There were a few places where simulator only code got into the upstream. Remove those places that can confuse other developers. Fixes: 2a0a839b6a28 ("habanalabs: extend fatal messages to contain PCI info") Cc: Moti Haimovski Cc: Dan Carpenter Signed-off-by: Oded Gabbay --- dri

[PATCH 08/12] accel/habanalabs: use binning info when handling razwi

2023-05-16 Thread Oded Gabbay
-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/gaudi2/gaudi2.c | 17 ++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/drivers/accel/habanalabs/gaudi2/gaudi2.c b/drivers/accel/habanalabs/gaudi2/gaudi2.c index b8644d87f817..a6aa17d86820 100644

[PATCH 11/12] accel/habanalabs: update state when loading boot fit

2023-05-16 Thread Oded Gabbay
From: Koby Elbaz Any FW component we load must be followed by a corresponding state update. However, it seems that so far we skipped doing so for the bootfit case, so fix that. Signed-off-by: Koby Elbaz Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/common

[PATCH 12/12] accel/habanalabs: mask part of hmmu page fault captured address

2023-05-16 Thread Oded Gabbay
Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/gaudi2/gaudi2.c | 14 +++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/drivers/accel/habanalabs/gaudi2/gaudi2.c b/drivers/accel/habanalabs/gaudi2/gaudi2.c index 4981b8eb0ff5..1cb2b72e1cd2

[PATCH 07/12] accel/habanalabs: remove support for mmu disable

2023-05-16 Thread Oded Gabbay
From: Ofir Bitton As mmu disable mode is only used for bring-up stages, let's remove this option and all code related to it. Signed-off-by: Ofir Bitton Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- .../accel/habanalabs/common/command_buffer.c | 6 - .../habanalabs/common

[PATCH 06/12] accel/habanalabs: upon DMA errors, use FW-extracted error cause

2023-05-16 Thread Oded Gabbay
From: Koby Elbaz Initially, the driver used to read the error cause data directly from the ASIC. However, the FW now clears it before the driver could read it. Therefore we should use the error cause data that is extracted by the FW. Signed-off-by: Koby Elbaz Reviewed-by: Oded Gabbay Signed

[PATCH 09/12] accel/habanalabs: use lower QM in QM errors handling

2023-05-16 Thread Oded Gabbay
omer Tayar Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/gaudi2/gaudi2.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/accel/habanalabs/gaudi2/gaudi2.c b/drivers/accel/habanalabs/gaudi2/gaudi2.c index a6aa17d86820.

[PATCH 05/12] accel/habanalabs: print max timeout value on CS stuck

2023-05-16 Thread Oded Gabbay
If a workload got stuck, we print an error to the kernel log about it. Add to that print the configured max timeout value, as that value is not fixed between ASICs and in addition it can be configured using a kernel module parameter. Signed-off-by: Oded Gabbay --- .../habanalabs/common

[PATCH 10/12] accel/habanalabs: print qman data on error only for lower qman

2023-05-16 Thread Oded Gabbay
for the lower QMAN. Signed-off-by: Tomer Tayar Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/gaudi2/gaudi2.c | 146 +++--- drivers/accel/habanalabs/gaudi2/gaudi2P.h | 2 +- .../include/gaudi2/asic_reg/gaudi2_regs.h | 11 ++ 3 files

[PATCH 03/12] accel/habanalabs: fix mem leak in capture user mappings

2023-05-16 Thread Oded Gabbay
From: Moti Haimovski This commit fixes a memory leak caused when clearing the user_mappings info when a new context is opened immediately after user_mapping is captured and a hard reset is performed. Signed-off-by: Moti Haimovski Reviewed-by: Dani Liberman Reviewed-by: Oded Gabbay Signed-off

[PATCH 04/12] accel/habanalabs: align to latest firmware specs

2023-05-16 Thread Oded Gabbay
Update the firmware common interface files with the latest version. Signed-off-by: Oded Gabbay --- .../habanalabs/include/common/cpucp_if.h | 18 .../habanalabs/include/common/hl_boot_if.h| 41 --- 2 files changed, 16 insertions(+), 43 deletions(-) diff --git

[PATCH 02/12] accel/habanalabs: set unused bit as reserved

2023-05-16 Thread Oded Gabbay
Get latest f/w gaudi2 interface file which marks unused bist_need_iatu_config bit in cold_rst_data structure as reserved bit. Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/include/gaudi2/gaudi2_fw_if.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/accel

[PATCH 01/12] accel/habanalabs: rename security functions related arguments

2023-05-16 Thread Oded Gabbay
From: Koby Elbaz Make the argument names specify the registers array represent registers that should be unsecured so the user can access them. Signed-off-by: Koby Elbaz Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/common/security.c | 57

Re: [PATCH] accel/habanalabs: fix gaudi2_get_tpc_idle_status() return

2023-05-16 Thread Oded Gabbay
pcs(hdev, _iter); > > - return tpc_idle_data.is_idle; > + return *tpc_idle_data.is_idle; > } > > static bool gaudi2_get_decoder_idle_status(struct hl_device *hdev, u64 > *mask_arr, u8 mask_len, > -- > 2.39.2 > Reviewed-by: Oded Gabbay Applied to -next. Thanks, Oded
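
The one-character fix in the quoted hunk is easy to miss, so here is a standalone sketch of the bug class. The struct and function names are hypothetical; only the pattern (returning a pointer where the pointed-to bool was meant) comes from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the iterator context: is_idle is a pointer. */
struct idle_data {
	bool *is_idle;
};

/* Buggy form: a non-NULL pointer converts to true, whatever it points at. */
static bool idle_status_buggy(struct idle_data d)
{
	return d.is_idle;
}

/* Fixed form: dereference to return the value that was actually recorded. */
static bool idle_status_fixed(struct idle_data d)
{
	assert(d.is_idle != NULL);
	return *d.is_idle;
}
```

With a valid pointer the buggy form reports "idle" unconditionally, so a busy engine would never be noticed; the dereference restores the intended result.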

Re: [PATCH -next] habanalabs: Fix some kernel-doc comments

2023-05-16 Thread Oded Gabbay
; + * @user_regs_range_array: register range array > + * @user_regs_range_array_size: register range array size > * > */ > int hl_init_pb_ranges_single_dcore(struct hl_device *hdev, u32 dcore_offset, > -- > 2.20.1.7.g153144c > Reviewed-by: Oded Gabbay Applied to -next. Thanks, Oded

Re: [PATCH v2] accel/habanalabs: Make use of rhashtable

2023-05-07 Thread Oded Gabbay
On Mon, May 8, 2023 at 8:28 AM Cai Huoqing wrote: > > On 07 May 23 16:17:55, Oded Gabbay wrote: > > On Sat, May 6, 2023 at 12:25 PM Cai Huoqing wrote: > > > > > > On 04 May 23 09:12:40, Oded Gabbay wrote: > > > > On Thu, May 4, 2023 at 6:00 AM Cai Huoqing

Re: [PATCH v2] accel/habanalabs: Make use of rhashtable

2023-05-07 Thread Oded Gabbay
On Sat, May 6, 2023 at 12:25 PM Cai Huoqing wrote: > > On 04 May 23 09:12:40, Oded Gabbay wrote: > > On Thu, May 4, 2023 at 6:00 AM Cai Huoqing wrote: > > > > > > On 30 Apr 23 09:36:29, Oded Gabbay wrote: > > > > On Fri, Apr 28, 2

Re: [PATCH v2] accel/habanalabs: Make use of rhashtable

2023-05-04 Thread Oded Gabbay
On Thu, May 4, 2023 at 6:00 AM Cai Huoqing wrote: > > On 30 Apr 23 09:36:29, Oded Gabbay wrote: > > On Fri, Apr 28, 2023 at 5:49 PM Cai Huoqing wrote: > > > > > > Using rhashtable to accelerate the search for userptr by address, > > > instead of using a l

[PATCH 6/6] accel/habanalabs: always fetch pci addr_dec error info

2023-05-01 Thread Oded Gabbay
From: Ofir Bitton Due to missing indication of address decode source (LBW/HBW bus), we should always try and fetch extended information. Signed-off-by: Ofir Bitton Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/gaudi2/gaudi2.c | 14 ++ 1 file

[PATCH 5/6] accel/habanalabs: fix a static warning - 'dubious: x & !y'

2023-05-01 Thread Oded Gabbay
From: Koby Elbaz Use a straight forward approach to get a conditional result. Signed-off-by: Koby Elbaz Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/gaudi2/gaudi2.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/accel

[PATCH 3/6] accel/habanalabs: expose debugfs files later

2023-05-01 Thread Oded Gabbay
it to be done right afterwards. The initialization of the debugfs entry structure is left in its current position because it is used before creating the files. Signed-off-by: Tomer Tayar Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/common/debugfs.c| 37

[PATCH 4/6] accel/habanalabs: poll for device status update following WFE cmd

2023-05-01 Thread Oded Gabbay
at that stage, but it might not be. Therefore, we increase WFE's robustness by polling on the status register that will be updated once the device is actually halted. Signed-off-by: Koby Elbaz Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/common/firmware_if.c | 28
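The polling idea described above can be sketched in userspace as a bounded loop over a status read. This is an illustration only: `struct fake_dev`, `STATUS_HALTED`, `fake_read_status` and `wait_for_halt` are hypothetical names standing in for the real device and register, and the sketch is not the code in firmware_if.c.

```c
#include <stdbool.h>
#include <stdint.h>

#define STATUS_HALTED 0x1u /* hypothetical status value */

/* Hypothetical stand-in for a device whose status register changes later. */
struct fake_dev {
	unsigned int reads_left; /* reads remaining before it reports halted */
};

static uint32_t fake_read_status(struct fake_dev *dev)
{
	if (dev->reads_left == 0)
		return STATUS_HALTED;
	dev->reads_left--;
	return 0;
}

/*
 * Poll the status up to max_polls times instead of assuming the device
 * halted as soon as the command was sent. Returns true once the halted
 * status is observed, false if the poll budget runs out.
 */
static bool wait_for_halt(struct fake_dev *dev, unsigned int max_polls)
{
	for (unsigned int i = 0; i < max_polls; i++) {
		if (fake_read_status(dev) == STATUS_HALTED)
			return true;
		/* a real driver would delay or sleep between reads here */
	}
	return false;
}
```

A bounded loop keeps the caller from waiting forever if the device never reports the expected status.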

[PATCH 1/6] accel/habanalabs: add missing tpc interrupt info

2023-05-01 Thread Oded Gabbay
From: Dafna Hirschfeld For some reason the last possible tpc interrupt cause in gaudi2_tpc_interrupts_cause is missing from the code. Signed-off-by: Dafna Hirschfeld Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/gaudi2/gaudi2.c | 3 ++- 1 file changed, 2

[PATCH 2/6] accel/habanalabs: add pci health check during heartbeat

2023-05-01 Thread Oded Gabbay
. Signed-off-by: Ofir Bitton Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/common/device.c | 15 ++- drivers/accel/habanalabs/common/habanalabs.h | 2 ++ drivers/accel/habanalabs/common/habanalabs_drv.c | 2 -- 3 files changed, 16 insertions

Re: [PATCH v2] accel/habanalabs: Make use of rhashtable

2023-04-30 Thread Oded Gabbay
On Fri, Apr 28, 2023 at 5:49 PM Cai Huoqing wrote: > > Using rhashtable to accelerate the search for userptr by address, > instead of using a list. > > Preferably, the lookup complexity of a hash table is O(1). > > This patch will speedup the method > hl_userptr_is_pinned by
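To illustrate why a hash table keyed by address speeds up this kind of lookup compared with walking a list, here is a small userspace sketch. It deliberately does not use the kernel rhashtable API; `struct userptr_node`, `struct userptr_table`, the bucket count and the hash function are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 64 /* power of two, chosen arbitrarily for the sketch */

struct userptr_node {
	uint64_t addr;             /* lookup key: user address */
	struct userptr_node *next; /* chain of entries in the same bucket */
};

struct userptr_table {
	struct userptr_node *buckets[NBUCKETS];
};

static size_t bucket_of(uint64_t addr)
{
	/* Multiplicative hash; the top 6 bits select one of 64 buckets. */
	return (size_t)((addr * 0x9E3779B97F4A7C15ull) >> 58);
}

static void table_insert(struct userptr_table *t, struct userptr_node *n)
{
	size_t b = bucket_of(n->addr);

	n->next = t->buckets[b];
	t->buckets[b] = n;
}

/* Only the entries in one bucket are scanned, not every entry. */
static struct userptr_node *table_lookup(const struct userptr_table *t,
					 uint64_t addr)
{
	for (struct userptr_node *n = t->buckets[bucket_of(addr)]; n; n = n->next)
		if (n->addr == addr)
			return n;
	return NULL;
}
```

With a reasonable hash and load factor, each lookup touches a short chain, which is where the expected O(1) cost comes from; a plain list scan is O(n) in the number of pinned entries.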

[PATCH 07/10] accel/habanalabs: call to HW/FW err returns 0 when no events exist

2023-04-18 Thread Oded Gabbay
From: Moti Haimovski This commit modifies the call to retrieve HW or FW error events to return success when no events are pending, as done in the calls to other events. Signed-off-by: Moti Haimovski Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- .../habanalabs/common

[PATCH 10/10] accel/habanalabs: refactor abort of completions and waits

2023-04-18 Thread Oded Gabbay
From: Koby Elbaz Aborting CS completions should be in command_submission.c but aborting waiting for user interrupts should be in device.c. This separation is also for adding more abort operations in the future. Signed-off-by: Koby Elbaz Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay

[PATCH 09/10] accel/habanalabs: minimize encapsulation signal mutex lock time

2023-04-18 Thread Oded Gabbay
disables preemption, it could lead to sleeping in atomic context. Signed-off-by: Koby Elbaz Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/common/command_submission.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/accel

[PATCH 08/10] accel/habanalabs: add unregister timestamp uapi

2023-04-18 Thread Oded Gabbay
-by: farah kassabri Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- .../habanalabs/common/command_submission.c| 123 ++ include/uapi/drm/habanalabs_accel.h | 1 + 2 files changed, 101 insertions(+), 23 deletions(-) diff --git a/drivers/accel/habanalabs/common

[PATCH 03/10] accel/habanalabs: extract and save the FW's SW major/minor/sub-minor

2023-04-18 Thread Oded Gabbay
Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/common/firmware_if.c | 78 +-- drivers/accel/habanalabs/common/habanalabs.h | 6 ++ 2 files changed, 78 insertions(+), 6 deletions(-) diff --git a/drivers/accel/habanalabs/common/firmware_if.c b/drivers/accel

[PATCH 02/10] accel/habanalabs: rename fw_{major/minor}_version to fw_inner_{major/minor}_ver

2023-04-18 Thread Oded Gabbay
From: Dafna Hirschfeld We later want to add fields for Firmware SW version. The current extracted FW version is the inner FW versioning so the new name is better and also better differentiates it from the FW's SW version. Signed-off-by: Dafna Hirschfeld Reviewed-by: Oded Gabbay Signed-off

[PATCH 05/10] accel/habanalabs: do soft-reset using cpucp packet

2023-04-18 Thread Oded Gabbay
From: Dafna Hirschfeld This is done depending on the FW version. The cpucp method is preferable and saves scratchpads resource. Signed-off-by: Dafna Hirschfeld Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/common/firmware_if.c | 14 ++ drivers

[PATCH 06/10] accel/habanalabs: unsecure TPC bias registers

2023-04-18 Thread Oded Gabbay
From: Ofir Bitton User needs to be able to perform downcast / upcast of fp8_143 dtype. Hence bias register needs to be accessed by the user. Signed-off-by: Ofir Bitton Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/gaudi2/gaudi2_security.c | 1 + 1 file

[PATCH 04/10] accel/habanalabs: check fw version using sw version

2023-04-18 Thread Oded Gabbay
From: Dafna Hirschfeld The fw inner version is less trustworthy; instead, use the fw general sw release version. Signed-off-by: Dafna Hirschfeld Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/common/habanalabs.h | 10 -- drivers/accel/habanalabs/gaudi2

[PATCH 01/10] accel/habanalabs: add helper to extract the FW major/minor

2023-04-18 Thread Oded Gabbay
From: Dafna Hirschfeld The helper is extract_u32_until_given_char and can later be used to also get the major/minor of the sw version. Signed-off-by: Dafna Hirschfeld Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/common/firmware_if.c | 69
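As a rough illustration of what a helper of this kind might do, the sketch below parses an unsigned decimal number up to a delimiter character in userspace C. The name `parse_u32_until`, its signature and its error handling are assumptions for the example; it is not the driver's extract_u32_until_given_char, whose body is not shown in this snippet.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse the decimal digits that precede `delim` in `str` into *out.
 * Returns a pointer to the delimiter on success, or NULL if the
 * delimiter is missing, no digits precede it, a non-digit is found,
 * or the value does not fit in 32 bits.
 */
static const char *parse_u32_until(const char *str, uint32_t *out, char delim)
{
	char buf[11] = {0}; /* room for the 10 digits of UINT32_MAX plus NUL */
	const char *end = strchr(str, delim);
	unsigned long val;
	size_t len;

	if (!end || end == str)
		return NULL;

	len = (size_t)(end - str);
	if (len >= sizeof(buf))
		return NULL;

	for (size_t i = 0; i < len; i++)
		if (!isdigit((unsigned char)str[i]))
			return NULL;

	memcpy(buf, str, len);
	errno = 0;
	val = strtoul(buf, NULL, 10);
	if (errno || val > UINT32_MAX)
		return NULL;

	*out = (uint32_t)val;
	return end;
}
```

Returning the delimiter position lets the caller chain calls, for example parsing a major number up to the first '.' and then a minor number starting one character later.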

[PATCH 4/4] accel/habanalabs: fix bug in free scratchpad memory

2023-04-16 Thread Oded Gabbay
From: Moti Haimovski This commit fixes a bug in Gaudi2 when freeing the scratchpad memory in case software init fails. Signed-off-by: Moti Haimovski Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/gaudi2/gaudi2.c | 4 ++-- 1 file changed, 2 insertions(+), 2

[PATCH 2/4] accel/habanalabs: allow user to modify EDMA RL register

2023-04-16 Thread Oded Gabbay
From: Rakesh Ughreja The EDMA transpose workload requires a signal for every activation. User FW sends all the dummy signals to RD_LBW_RATE_LIM_CFG, to save lbw bandwidth. We need the user to be able to access that register to configure it. Signed-off-by: Rakesh Ughreja Reviewed-by: Oded Gabbay

[PATCH 3/4] accel/habanalabs: remove commented code that won't be used

2023-04-16 Thread Oded Gabbay
From: Koby Elbaz Once it was decided that these security settings are to be done by FW rather than by the driver, there's no reason to keep them in the code. Signed-off-by: Koby Elbaz Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/gaudi2/gaudi2_security.c

[PATCH 1/4] accel/habanalabs: ignore false positive razwi

2023-04-16 Thread Oded Gabbay
is a false positive. The driver should not "count" a PSOC RAZWI event error when the address that caused it is zeroed. Signed-off-by: Tal Cohen Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/gaudi2/gaudi2.c | 43 +++- 1 file changed, 27

Re: [PATCH] accel/habanalabs: remove variable gaudi_irq_name

2023-04-16 Thread Oded Gabbay
1_0", "gaudi cq 1_1", "gaudi cq 1_2", "gaudi cq > 1_3", > - "gaudi cq 5_0", "gaudi cq 5_1", "gaudi cq 5_2", "gaudi cq > 5_3", > - "gaudi cpu eq" > -}; > - > static const u8 gaudi_dma_assignment[GAUDI_DMA_MAX] = { > [GAUDI_PCI_DMA_1] = GAUDI_ENGINE_ID_DMA_0, > [GAUDI_PCI_DMA_2] = GAUDI_ENGINE_ID_DMA_1, > -- > 2.27.0 > Reviewed-by: Oded Gabbay Applied to -next Thanks, Oded
