Re: [RESEND] [PATCH v3] cxl: Prevent adapter reset if an active context exists
On 04/11/16 23:07, Frederic Barrat wrote:
>> When I inject an EEH error, this patch causes the following WARN.
>> Thoughts?
>
> mmm, hard to see a relation with that patch. I couldn't reproduce
> either. Could it bear any relation with the patch you're working on
> (lspci called while the capi device is unconfigured)?

No, this was without any other patches...

>> [ 60.593116] pci :01 : [PE# 000] Switching PHB to CXL
>> [ 60.622727] Adapter context unlocked with 0 active contexts
>> [ 60.622762] [ cut here ]
>> [ 60.622771] WARNING: CPU: 12 PID: 627 at ../drivers/misc/cxl/main.c:325 cxl_adapter_context_unlock+0x60/0x80 [cxl]
>> [ 60.622772] Modules linked in: fuse powernv_rng rng_core leds_powernv powernv_op_panel led_class vmx_crypto ib_iser rdma_cm iw_cm ib_cm ib_core libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq multipath bnx2x mdio libcrc32c cxl
>> [ 60.622794] CPU: 12 PID: 627 Comm: eehd Not tainted 4.9.0-rc1-ajd-6-g6fb17cc #4
>> [ 60.622795] task: c003be084900 task.stack: c003be108000
>> [ 60.622797] NIP: d4350be0 LR: d4350bdc CTR: c0492fd0
>> [ 60.622799] REGS: c003be10b660 TRAP: 0700 Not tainted (4.9.0-rc1-ajd-6-g6fb17cc)
>> [ 60.622800] MSR: 90010282b033
>> [ 60.622810] CR: 28000282 XER: 2000
>> [ 60.622811] SOFTE: 1 CFAR: c094fc88
>> [ 60.622814] GPR00: d4350bdc c003be10b8e0 d4379ae8 002f
>> [ 60.622818] GPR04: 0001 03b8
>> [ 60.622822] GPR08: 0001
>> [ 60.622826] GPR12: cfe03000 c00baac8 c003c5166500
>> [ 60.622830] GPR16:
>> [ 60.622834] GPR20: c0b14fe8
>> [ 60.622837] GPR24: c0b14fc0 c003afc10400 c003b0c4
>> [ 60.622841] GPR28: c003c505a098 c003afc10400 0006
>> [ 60.622850] NIP [d4350be0] cxl_adapter_context_unlock+0x60/0x80 [cxl]
>> [ 60.622856] LR [d4350bdc] cxl_adapter_context_unlock+0x5c/0x80 [cxl]
>> [ 60.622857] Call Trace:
>> [ 60.622863] [c003be10b8e0] [d4350bdc] cxl_adapter_context_unlock+0x5c/0x80 [cxl] (unreliable)
>> [ 60.622871] [c003be10b940] [d435e810] cxl_configure_adapter+0x930/0x960 [cxl]
>> [ 60.622879] [c003be10b9f0] [d435e88c] cxl_pci_slot_reset+0x4c/0x230 [cxl]
>> [ 60.622883] [c003be10baa0] [c0032cd4] eeh_report_reset+0x164/0x1a0
>> [ 60.622887] [c003be10bae0] [c0031220] eeh_pe_dev_traverse+0x90/0x170
>> [ 60.622890] [c003be10bb70] [c0033354] eeh_handle_normal_event+0x3d4/0x520
>> [ 60.622892] [c003be10bc20] [c0033624] eeh_handle_event+0x44/0x360
>> [ 60.622895] [c003be10bcd0] [c0033a58] eeh_event_handler+0x118/0x1d0
>> [ 60.622898] [c003be10bd80] [c00babc8] kthread+0x108/0x130
>> [ 60.622902] [c003be10be30] [c000c0a0] ret_from_kernel_thread+0x5c/0xbc
>> [ 60.622903] Instruction dump:
>> [ 60.622905] 2f84 4dfe0020 7c0802a6 7c8407b4 3920 f8010010 f821ffa1 91230348
>> [ 60.622911] 3c62 e8638070 48016639 e8410018 <0fe0> 38210060 e8010010 7c0803a6
>> [ 60.622918] ---[ end trace d358551c9a007b4f ]---
>> [ 60.622959] cxl afu0.0: Activating AFU directed mode
>> [ 60.623097] EEH: Notify device driver to resume

That *definitely* looks related to this patch...

Andrew

--
Andrew Donnellan              OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited
Re: [RESEND] [PATCH v3] cxl: Prevent adapter reset if an active context exists
On 05/11/16 00:15, Uma Krishnan wrote:
> Frederic/Andrew,
>
> Just recently this issue has been reported by system test without any
> of the two patches you are suspecting - this patch nor the lspci
> patch. I was hoping the lspci patch from Andrew can possibly solve it.
> System test CQ is SW370625. The stack reported in that is same,
>
> [ 5895.245959] EEH: PHB#2 failure detected, location: N/A
> [ 5895.246078] CPU: 19 PID: 121774 Comm: lspci Not tainted 3.10.0-514.el7.ppc64le #1
> [ 5895.246240] Call Trace:
> [ 5895.246307] [c009f3707a60] [c0017ce0] show_stack+0x80/0x330 (unreliable)
> [ 5895.246501] [c009f3707b10] [c09b22f4] dump_stack+0x30/0x44
> [ 5895.246665] [c009f3707b30] [c003b9ac] eeh_dev_check_failure+0x21c/0x580
> [ 5895.246855] [c009f3707bd0] [c00879dc] pnv_pci_read_config+0xbc/0x160
> [ 5895.247045] [c009f3707c10] [c0527d54] pci_user_read_config_dword+0x84/0x160
> [ 5895.247233] [c009f3707c60] [c0547224] pci_read_config+0xf4/0x2e0
> [ 5895.247398] [c009f3707ce0] [c03efb3c] read+0x10c/0x2a0
> [ 5895.247561] [c009f3707da0] [c031d160] vfs_read+0x110/0x290
> [ 5895.247726] [c009f3707de0] [c031ec70] SyS_pread64+0xb0/0xd0

This isn't a WARN - this stack trace is printed explicitly by the EEH
code in the case of a PHB failure. arch/powerpc/kernel/eeh.c, line 403.

Andrew

--
Andrew Donnellan              OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited
Re: [RESEND] [PATCH v3] cxl: Prevent adapter reset if an active context exists
Frederic/Andrew,

Just recently this issue has been reported by system test without any
of the two patches you are suspecting - this patch nor the lspci patch.
I was hoping the lspci patch from Andrew can possibly solve it. System
test CQ is SW370625. The stack reported in that is same,

[ 5895.245959] EEH: PHB#2 failure detected, location: N/A
[ 5895.246078] CPU: 19 PID: 121774 Comm: lspci Not tainted 3.10.0-514.el7.ppc64le #1
[ 5895.246240] Call Trace:
[ 5895.246307] [c009f3707a60] [c0017ce0] show_stack+0x80/0x330 (unreliable)
[ 5895.246501] [c009f3707b10] [c09b22f4] dump_stack+0x30/0x44
[ 5895.246665] [c009f3707b30] [c003b9ac] eeh_dev_check_failure+0x21c/0x580
[ 5895.246855] [c009f3707bd0] [c00879dc] pnv_pci_read_config+0xbc/0x160
[ 5895.247045] [c009f3707c10] [c0527d54] pci_user_read_config_dword+0x84/0x160
[ 5895.247233] [c009f3707c60] [c0547224] pci_read_config+0xf4/0x2e0
[ 5895.247398] [c009f3707ce0] [c03efb3c] read+0x10c/0x2a0
[ 5895.247561] [c009f3707da0] [c031d160] vfs_read+0x110/0x290
[ 5895.247726] [c009f3707de0] [c031ec70] SyS_pread64+0xb0/0xd0

Uma Krishnan

On 11/4/2016 7:07 AM, Frederic Barrat wrote:
> Hi Andrew,
>
> On 04/11/2016 at 07:27, Andrew Donnellan wrote:
>> On 14/10/16 20:38, Vaibhav Jain wrote:
>>> This patch prevents resetting the cxl adapter via sysfs in the
>>> presence of one or more active cxl_context on it. This protects
>>> against an unrecoverable error caused by the PSL owning a dirty
>>> cache line even after reset and the host trying to touch the same
>>> cache line. In case a force reset of the card is required
>>> irrespective of any active contexts, the int value -1 can be stored
>>> in the 'reset' sysfs attribute of the card.
>>>
>>> The patch introduces a new atomic_t member named contexts_num inside
>>> struct cxl that holds the number of active contexts attached to the
>>> card, which is checked against '0' before proceeding with the reset.
>>> To protect against a race condition where a context is activated
>>> just after the reset check is performed, contexts_num is atomically
>>> set to '-1' after the reset check to indicate that no more contexts
>>> can be activated on the card. Before activating a context we
>>> atomically test whether contexts_num is non-negative and, if so,
>>> increment its value by one. If the value of contexts_num is
>>> negative, the card is about to be reset and context activation is
>>> errored out at that point.
>>>
>>> Cc: sta...@vger.kernel.org
>>> Fixes: 62fa19d4 ("cxl: Add ability to reset the card")
>>> Acked-by: Frederic Barrat
>>> Reviewed-by: Andrew Donnellan
>>> Signed-off-by: Vaibhav Jain
>>
>> When I inject an EEH error, this patch causes the following WARN.
>> Thoughts?
>
> mmm, hard to see a relation with that patch. I couldn't reproduce
> either. Could it bear any relation with the patch you're working on
> (lspci called while the capi device is unconfigured)?
>
>   Fred
>
>> [ 55.965011] EEH: PHB#0 failure detected, location: N/A
>> [ 55.965078] CPU: 20 PID: 9933 Comm: lspci Not tainted 4.9.0-rc1-ajd-6-g6fb17cc #4
>> [ 55.965080] Call Trace:
>> [ 55.965091] [c0036818fab0] [c0950ec8] dump_stack+0xb0/0xf0 (unreliable)
>> [ 55.965100] [c0036818faf0] [c002eb44] eeh_dev_check_failure+0x1e4/0x540
>> [ 55.965107] [c0036818fb90] [c0064090] pnv_pci_read_config+0xc0/0x130
>> [ 55.965114] [c0036818fbd0] [c04bec24] pci_user_read_config_dword+0x84/0x160
>> [ 55.965119] [c0036818fc20] [c04d12f4] pci_read_config+0x164/0x2a0
>> [ 55.965125] [c0036818fca0] [c0318e70] sysfs_kf_bin_read+0x70/0xc0
>> [ 55.965131] [c0036818fcc0] [c0317ff8] kernfs_fop_read+0xd8/0x260
>> [ 55.965136] [c0036818fd10] [c0278b7c] __vfs_read+0x3c/0x180
>> [ 55.965141] [c0036818fda0] [c0279e2c] vfs_read+0xac/0x1a0
>> [ 55.965146] [c0036818fde0] [c027bc24] SyS_pread64+0xb4/0xd0
>> [ 55.965152] [c0036818fe30] [c000bd20] system_call+0x38/0xfc
>> [ 55.965171] EEH: Detected error on PHB#0
>> [ 55.965173] EEH: This PCI device has failed 1 times in the last hour
>> [ 55.965174] EEH: Notify device drivers to shutdown
>> [ 55.965182] cxl afu0.0: Deactivating AFU directed mode
>> [ 55.965261] Harmless Hypervisor Maintenance interrupt [Recovered]
>> [ 55.965263] Error detail: Unknown
>> [ 55.965265] HMER: 8040
>> [ 55.965267] Harmless Hypervisor Maintenance interrupt [Recovered]
>> [ 55.965268] Error detail: Unknown
>> [ 55.965270] HMER: 8040
>> [ 55.965326] cxl afu0.0: PSL Purge called with link down, ignoring
>> [ 55.965563] EEH: Collect temporary log
>> [ 55.965565] PHB3 PHB#0 Diag-data (Version: 1)
>> [ 55.965566] brdgCtl:
>> [ 55.965568] UtlSts: 0020
>> [ 55.965570] RootSts:
>> [ 55.965571] RootErrSts:
>> [ 55.965572] RootErrLog:
>> [ 55.965574] RootErrLog1:
Re: [RESEND] [PATCH v3] cxl: Prevent adapter reset if an active context exists
Hi Andrew,

On 04/11/2016 at 07:27, Andrew Donnellan wrote:
> On 14/10/16 20:38, Vaibhav Jain wrote:
>> This patch prevents resetting the cxl adapter via sysfs in the
>> presence of one or more active cxl_context on it. This protects
>> against an unrecoverable error caused by the PSL owning a dirty
>> cache line even after reset and the host trying to touch the same
>> cache line. In case a force reset of the card is required
>> irrespective of any active contexts, the int value -1 can be stored
>> in the 'reset' sysfs attribute of the card.
>>
>> The patch introduces a new atomic_t member named contexts_num inside
>> struct cxl that holds the number of active contexts attached to the
>> card, which is checked against '0' before proceeding with the reset.
>> To protect against a race condition where a context is activated
>> just after the reset check is performed, contexts_num is atomically
>> set to '-1' after the reset check to indicate that no more contexts
>> can be activated on the card. Before activating a context we
>> atomically test whether contexts_num is non-negative and, if so,
>> increment its value by one. If the value of contexts_num is
>> negative, the card is about to be reset and context activation is
>> errored out at that point.
>>
>> Cc: sta...@vger.kernel.org
>> Fixes: 62fa19d4 ("cxl: Add ability to reset the card")
>> Acked-by: Frederic Barrat
>> Reviewed-by: Andrew Donnellan
>> Signed-off-by: Vaibhav Jain
>
> When I inject an EEH error, this patch causes the following WARN.
> Thoughts?

mmm, hard to see a relation with that patch. I couldn't reproduce
either. Could it bear any relation with the patch you're working on
(lspci called while the capi device is unconfigured)?

  Fred

> [ 55.965011] EEH: PHB#0 failure detected, location: N/A
> [ 55.965078] CPU: 20 PID: 9933 Comm: lspci Not tainted 4.9.0-rc1-ajd-6-g6fb17cc #4
> [ 55.965080] Call Trace:
> [ 55.965091] [c0036818fab0] [c0950ec8] dump_stack+0xb0/0xf0 (unreliable)
> [ 55.965100] [c0036818faf0] [c002eb44] eeh_dev_check_failure+0x1e4/0x540
> [ 55.965107] [c0036818fb90] [c0064090] pnv_pci_read_config+0xc0/0x130
> [ 55.965114] [c0036818fbd0] [c04bec24] pci_user_read_config_dword+0x84/0x160
> [ 55.965119] [c0036818fc20] [c04d12f4] pci_read_config+0x164/0x2a0
> [ 55.965125] [c0036818fca0] [c0318e70] sysfs_kf_bin_read+0x70/0xc0
> [ 55.965131] [c0036818fcc0] [c0317ff8] kernfs_fop_read+0xd8/0x260
> [ 55.965136] [c0036818fd10] [c0278b7c] __vfs_read+0x3c/0x180
> [ 55.965141] [c0036818fda0] [c0279e2c] vfs_read+0xac/0x1a0
> [ 55.965146] [c0036818fde0] [c027bc24] SyS_pread64+0xb4/0xd0
> [ 55.965152] [c0036818fe30] [c000bd20] system_call+0x38/0xfc
> [ 55.965171] EEH: Detected error on PHB#0
> [ 55.965173] EEH: This PCI device has failed 1 times in the last hour
> [ 55.965174] EEH: Notify device drivers to shutdown
> [ 55.965182] cxl afu0.0: Deactivating AFU directed mode
> [ 55.965261] Harmless Hypervisor Maintenance interrupt [Recovered]
> [ 55.965263] Error detail: Unknown
> [ 55.965265] HMER: 8040
> [ 55.965267] Harmless Hypervisor Maintenance interrupt [Recovered]
> [ 55.965268] Error detail: Unknown
> [ 55.965270] HMER: 8040
> [ 55.965326] cxl afu0.0: PSL Purge called with link down, ignoring
> [ 55.965563] EEH: Collect temporary log
> [ 55.965565] PHB3 PHB#0 Diag-data (Version: 1)
> [ 55.965566] brdgCtl:
> [ 55.965568] UtlSts: 0020
> [ 55.965570] RootSts:
> [ 55.965571] RootErrSts:
> [ 55.965572] RootErrLog:
> [ 55.965574] RootErrLog1:
> [ 55.965575] nFir:8090 0030006e 8000
> [ 55.965577] PhbSts: 001c 001c
> [ 55.965578] Lem: 0210 40018e2400022482 0010
> [ 55.965582] OutErr: 0020 0020
> [ 55.965584] InAErr: 8000 8000 0402
> [ 55.965586] PE[ 0] A/B: 8000 8000
> [ 55.965587] EEH: Reset without hotplug activity
> [ 60.592750] EEH: Notify device drivers the completion of reset
> [ 60.592760] cxl-pci :01:00.0: enabling device (0140 -> 0142)
> [ 60.593018] pci :01 : [PE# 000] Switching PHB to CXL
> [ 60.593116] pci :01 : [PE# 000] Switching PHB to CXL
> [ 60.622727] Adapter context unlocked with 0 active contexts
> [ 60.622762] [ cut here ]
> [ 60.622771] WARNING: CPU: 12 PID: 627 at ../drivers/misc/cxl/main.c:325 cxl_adapter_context_unlock+0x60/0x80 [cxl]
> [ 60.622772] Modules linked in: fuse powernv_rng rng_core leds_powernv powernv_op_panel led_class vmx_crypto ib_iser rdma_cm iw_cm ib_cm ib_core libiscsi scsi_transport_iscsi autofs4 btrfs raid
Re: [RESEND] [PATCH v3] cxl: Prevent adapter reset if an active context exists
On 14/10/16 20:38, Vaibhav Jain wrote:
> This patch prevents resetting the cxl adapter via sysfs in the
> presence of one or more active cxl_context on it. This protects
> against an unrecoverable error caused by the PSL owning a dirty
> cache line even after reset and the host trying to touch the same
> cache line. In case a force reset of the card is required
> irrespective of any active contexts, the int value -1 can be stored
> in the 'reset' sysfs attribute of the card.
>
> The patch introduces a new atomic_t member named contexts_num inside
> struct cxl that holds the number of active contexts attached to the
> card, which is checked against '0' before proceeding with the reset.
> To protect against a race condition where a context is activated
> just after the reset check is performed, contexts_num is atomically
> set to '-1' after the reset check to indicate that no more contexts
> can be activated on the card. Before activating a context we
> atomically test whether contexts_num is non-negative and, if so,
> increment its value by one. If the value of contexts_num is
> negative, the card is about to be reset and context activation is
> errored out at that point.
>
> Cc: sta...@vger.kernel.org
> Fixes: 62fa19d4 ("cxl: Add ability to reset the card")
> Acked-by: Frederic Barrat
> Reviewed-by: Andrew Donnellan
> Signed-off-by: Vaibhav Jain

When I inject an EEH error, this patch causes the following WARN.
Thoughts?

[ 55.965011] EEH: PHB#0 failure detected, location: N/A
[ 55.965078] CPU: 20 PID: 9933 Comm: lspci Not tainted 4.9.0-rc1-ajd-6-g6fb17cc #4
[ 55.965080] Call Trace:
[ 55.965091] [c0036818fab0] [c0950ec8] dump_stack+0xb0/0xf0 (unreliable)
[ 55.965100] [c0036818faf0] [c002eb44] eeh_dev_check_failure+0x1e4/0x540
[ 55.965107] [c0036818fb90] [c0064090] pnv_pci_read_config+0xc0/0x130
[ 55.965114] [c0036818fbd0] [c04bec24] pci_user_read_config_dword+0x84/0x160
[ 55.965119] [c0036818fc20] [c04d12f4] pci_read_config+0x164/0x2a0
[ 55.965125] [c0036818fca0] [c0318e70] sysfs_kf_bin_read+0x70/0xc0
[ 55.965131] [c0036818fcc0] [c0317ff8] kernfs_fop_read+0xd8/0x260
[ 55.965136] [c0036818fd10] [c0278b7c] __vfs_read+0x3c/0x180
[ 55.965141] [c0036818fda0] [c0279e2c] vfs_read+0xac/0x1a0
[ 55.965146] [c0036818fde0] [c027bc24] SyS_pread64+0xb4/0xd0
[ 55.965152] [c0036818fe30] [c000bd20] system_call+0x38/0xfc
[ 55.965171] EEH: Detected error on PHB#0
[ 55.965173] EEH: This PCI device has failed 1 times in the last hour
[ 55.965174] EEH: Notify device drivers to shutdown
[ 55.965182] cxl afu0.0: Deactivating AFU directed mode
[ 55.965261] Harmless Hypervisor Maintenance interrupt [Recovered]
[ 55.965263] Error detail: Unknown
[ 55.965265] HMER: 8040
[ 55.965267] Harmless Hypervisor Maintenance interrupt [Recovered]
[ 55.965268] Error detail: Unknown
[ 55.965270] HMER: 8040
[ 55.965326] cxl afu0.0: PSL Purge called with link down, ignoring
[ 55.965563] EEH: Collect temporary log
[ 55.965565] PHB3 PHB#0 Diag-data (Version: 1)
[ 55.965566] brdgCtl:
[ 55.965568] UtlSts: 0020
[ 55.965570] RootSts:
[ 55.965571] RootErrSts:
[ 55.965572] RootErrLog:
[ 55.965574] RootErrLog1:
[ 55.965575] nFir:8090 0030006e 8000
[ 55.965577] PhbSts: 001c 001c
[ 55.965578] Lem: 0210 40018e2400022482 0010
[ 55.965582] OutErr: 0020 0020
[ 55.965584] InAErr: 8000 8000 0402
[ 55.965586] PE[ 0] A/B: 8000 8000
[ 55.965587] EEH: Reset without hotplug activity
[ 60.592750] EEH: Notify device drivers the completion of reset
[ 60.592760] cxl-pci :01:00.0: enabling device (0140 -> 0142)
[ 60.593018] pci :01 : [PE# 000] Switching PHB to CXL
[ 60.593116] pci :01 : [PE# 000] Switching PHB to CXL
[ 60.622727] Adapter context unlocked with 0 active contexts
[ 60.622762] [ cut here ]
[ 60.622771] WARNING: CPU: 12 PID: 627 at ../drivers/misc/cxl/main.c:325 cxl_adapter_context_unlock+0x60/0x80 [cxl]
[ 60.622772] Modules linked in: fuse powernv_rng rng_core leds_powernv powernv_op_panel led_class vmx_crypto ib_iser rdma_cm iw_cm ib_cm ib_core libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq multipath bnx2x mdio libcrc32c cxl
[ 60.622794] CPU: 12 PID: 627 Comm: eehd Not tainted 4.9.0-rc1-ajd-6-g6fb17cc #4
[ 60.622795] task: c003be084900 t
[RESEND] [PATCH v3] cxl: Prevent adapter reset if an active context exists
This patch prevents resetting the cxl adapter via sysfs in the presence
of one or more active cxl_context on it. This protects against an
unrecoverable error caused by the PSL owning a dirty cache line even
after reset and the host trying to touch the same cache line. In case a
force reset of the card is required irrespective of any active
contexts, the int value -1 can be stored in the 'reset' sysfs attribute
of the card.

The patch introduces a new atomic_t member named contexts_num inside
struct cxl that holds the number of active contexts attached to the
card, which is checked against '0' before proceeding with the reset. To
protect against a race condition where a context is activated just
after the reset check is performed, contexts_num is atomically set to
'-1' after the reset check to indicate that no more contexts can be
activated on the card. Before activating a context we atomically test
whether contexts_num is non-negative and, if so, increment its value by
one. If the value of contexts_num is negative, the card is about to be
reset and context activation is errored out at that point.

Cc: sta...@vger.kernel.org
Fixes: 62fa19d4 ("cxl: Add ability to reset the card")
Acked-by: Frederic Barrat
Reviewed-by: Andrew Donnellan
Signed-off-by: Vaibhav Jain
---
Changelog:

RESEND v3
* Marked the patch for stable and added sign-offs & Fixes tag

v3..v1 ->
* Context-lock is now taken earlier in cxl_start_context to prevent
  leaking ctx->pid in the error path, as pointed out by Frederic
  Barrat.
* Fixed tabs that sneaked their way into sysfs-class-cxl changes.
  Thanks Andrew Donnellan for catching that.

v2..v1 ->
* Addressed following review comments from Frederic Barrat:
  - Spell error changing 'Incase' to 'In case'.
  - Changed the comment description for the contexts_num member to use
    a slightly more universal notation for integers.
  - Added cleanup code for context irqs in case context lock is taken.
  - Added a new function called cxl_adapter_context_unlock that sets
    contexts_num to '0' (forcibly if needed).
  - cxl adapter struct when allocated is initialized with context lock
    taken and released when the card config is complete.
  - Simplified code flow in function reset_adapter_store.
---
 Documentation/ABI/testing/sysfs-class-cxl |  7 --
 drivers/misc/cxl/api.c                    |  9 +++
 drivers/misc/cxl/context.c                |  3 +++
 drivers/misc/cxl/cxl.h                    | 24 ++
 drivers/misc/cxl/file.c                   | 11
 drivers/misc/cxl/guest.c                  |  3 +++
 drivers/misc/cxl/main.c                   | 42 ++-
 drivers/misc/cxl/pci.c                    |  2 ++
 drivers/misc/cxl/sysfs.c                  | 27 +---
 9 files changed, 121 insertions(+), 7 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-class-cxl b/Documentation/ABI/testing/sysfs-class-cxl
index 4ba0a2a..640f65e 100644
--- a/Documentation/ABI/testing/sysfs-class-cxl
+++ b/Documentation/ABI/testing/sysfs-class-cxl
@@ -220,8 +220,11 @@ What: /sys/class/cxl//reset
 Date:           October 2014
 Contact:        linuxppc-dev@lists.ozlabs.org
 Description:    write only
-                Writing 1 will issue a PERST to card which may cause the card
-                to reload the FPGA depending on load_image_on_perst.
+                Writing 1 will issue a PERST to card provided there are no
+                contexts active on any one of the card AFUs. This may cause
+                the card to reload the FPGA depending on load_image_on_perst.
+                Writing -1 will do a force PERST irrespective of any active
+                contexts on the card AFUs.
 Users:          https://github.com/ibm-capi/libcxl
 
 What: /sys/class/cxl//perst_reloads_same_image (not in a guest)
diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c
index f3d34b9..af23d7d 100644
--- a/drivers/misc/cxl/api.c
+++ b/drivers/misc/cxl/api.c
@@ -229,6 +229,14 @@ int cxl_start_context(struct cxl_context *ctx, u64 wed,
 	if (ctx->status == STARTED)
 		goto out; /* already started */
 
+	/*
+	 * Increment the mapped context count for adapter. This also checks
+	 * if adapter_context_lock is taken.
+	 */
+	rc = cxl_adapter_context_get(ctx->afu->adapter);
+	if (rc)
+		goto out;
+
 	if (task) {
 		ctx->pid = get_task_pid(task, PIDTYPE_PID);
 		ctx->glpid = get_task_pid(task->group_leader, PIDTYPE_PID);
@@ -240,6 +248,7 @@ int cxl_start_context(struct cxl_context *ctx, u64 wed,
 
 	if ((rc = cxl_ops->attach_process(ctx, kernel, wed, 0))) {
 		pu
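
The contexts_num protocol described in the commit message can be illustrated with a small user-space sketch. This is not the driver code: it uses C11 atomics instead of the kernel's atomic_t, and `struct adapter` and the `adapter_context_*` functions are hypothetical stand-ins named after the helpers mentioned in the patch. The "put" path and the exact return values are assumptions, since the excerpt above does not show them.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical stand-in for struct cxl; only the counter is modelled. */
struct adapter {
	atomic_int contexts_num; /* -1: locked for reset, >= 0: active contexts */
};

/* Take a context reference unless the adapter is locked for reset. */
static int adapter_context_get(struct adapter *a)
{
	int cur = atomic_load(&a->contexts_num);

	while (cur >= 0) {
		/* On failure cur is refreshed and its sign is re-checked. */
		if (atomic_compare_exchange_weak(&a->contexts_num, &cur, cur + 1))
			return 0;
	}
	return -EBUSY;
}

/* Drop a context reference; the counter never goes below zero (assumed). */
static void adapter_context_put(struct adapter *a)
{
	int cur = atomic_load(&a->contexts_num);

	while (cur > 0) {
		if (atomic_compare_exchange_weak(&a->contexts_num, &cur, cur - 1))
			return;
	}
}

/* Lock for reset: succeeds only when there are zero active contexts. */
static int adapter_context_lock(struct adapter *a)
{
	int expected = 0;

	if (atomic_compare_exchange_strong(&a->contexts_num, &expected, -1))
		return 0;
	return -EBUSY;
}

/*
 * Release the reset lock. If the lock was not held, force the counter to
 * zero and report it. Returns the value found before unlocking.
 */
static int adapter_context_unlock(struct adapter *a)
{
	int expected = -1;

	if (!atomic_compare_exchange_strong(&a->contexts_num, &expected, 0)) {
		atomic_store(&a->contexts_num, 0);
		fprintf(stderr, "unlocked with %d active contexts\n", expected);
	}
	return expected;
}
```

In this sketch, a reset request would call adapter_context_lock() and proceed only if it returns 0, while context activation would call adapter_context_get() first and fail with -EBUSY once the lock is held. Calling adapter_context_unlock() while the counter is already 0 takes the forced path and prints a message similar in shape to the "Adapter context unlocked with 0 active contexts" line in Andrew's log; whether that is what happens in the EEH recovery path is not established by this thread.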