[Bug 1971576] Re: SATA device hot plug regression on AMD EPYC (Asus) server

2022-05-26 Thread Zhanglei Mao
To summarize: #1 Symptom: On the AMD EPYC (Rome) server platform, SATA hot plug does not work on Ubuntu 22.04 LTS. #2 Root cause: The Ubuntu kernel is compiled with CONFIG_SATA_MOBILE_LPM_POLICY=3. During device scan (boot, PCI scan, ahci driver load), if it did not detect any valid
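To check which default a given kernel was built with, the config option named above can be read out of the kernel config text. This is a minimal sketch, not part of the thread: the function name `parse_lpm_policy` is hypothetical, and it assumes the config is available as plain text (on Ubuntu it is normally shipped as /boot/config-<kernel version>).

```python
import re
from typing import Optional


def parse_lpm_policy(config_text: str) -> Optional[int]:
    """Return the CONFIG_SATA_MOBILE_LPM_POLICY value found in kernel
    config text, or None if the option is not set there."""
    match = re.search(
        r"^CONFIG_SATA_MOBILE_LPM_POLICY=(\d+)\s*$",
        config_text,
        re.MULTILINE,
    )
    return int(match.group(1)) if match else None


# Illustrative config fragment (assumed, not copied from a real system).
sample = "CONFIG_SATA_AHCI=m\nCONFIG_SATA_MOBILE_LPM_POLICY=3\n"
print(parse_lpm_policy(sample))
```

A result of 3 corresponds to the default described in the summary above.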

[Bug 1971576] Re: SATA device hot plug regression on AMD EPYC (Asus) server

2022-05-24 Thread Zhanglei Mao
@Mario, may I ask two follow-up questions? A. What is the result on an AMD client (such as a laptop)? I thought hot plug works out of the box on clients; what in the kernel code causes this difference? B. I was told that hot plug works out of the box on their Intel server. What causes this

[Bug 1971576] Re: SATA device hot plug regression on AMD EPYC (Asus) server

2022-05-24 Thread Mario Limonciello
@KH: Thanks for sharing that! I agree with you. I've sent this up to explicitly document the new behavior. https://lore.kernel.org/linux-ide/20220524170508.563-4-mario.limoncie...@amd.com/T/#u -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed

[Bug 1971576] Re: SATA device hot plug regression on AMD EPYC (Asus) server

2022-05-24 Thread Zhanglei Mao
@Mario, thanks for the deep and detailed analysis of the root cause. -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1971576 Title: SATA device hot plug regression on AMD EPYC (Asus) server To

[Bug 1971576] Re: SATA device hot plug regression on AMD EPYC (Asus) server

2022-05-24 Thread Kai-Heng Feng
According to AHCI spec v1.3.1, "7.3 Native Hot Plug Support", once LPM is enabled, hotplug needs to be disabled. So I agree with 2): I think we should write documentation and let users know how to change the LPM policy for hotplug detection. For 3) I don't think we need to change the way it works,
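As a starting point for such documentation, the per-host LPM policy currently in effect can be inspected through sysfs. A minimal sketch, assuming the standard libata attribute `link_power_management_policy` under /sys/class/scsi_host; the function name `read_lpm_policies` and the `root` parameter are hypothetical, introduced only so the code can be pointed at a test directory.

```python
from pathlib import Path
from typing import Dict


def read_lpm_policies(root: str = "/sys/class/scsi_host") -> Dict[str, str]:
    """Map each SCSI host name (e.g. 'host0') to the text of its
    link_power_management_policy attribute. Hosts without that
    attribute are skipped."""
    policies = {}
    pattern = "host*/link_power_management_policy"
    for policy_file in sorted(Path(root).glob(pattern)):
        policies[policy_file.parent.name] = policy_file.read_text().strip()
    return policies
```

Calling `read_lpm_policies()` on an affected machine should show one entry per AHCI port. To my understanding, writing a policy name such as `max_performance` to the same attribute as root is the usual way to change the policy at runtime, but that step was not tested in this thread.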

[Bug 1971576] Re: SATA device hot plug regression on AMD EPYC (Asus) server

2022-05-23 Thread Mario Limonciello
0 -> ATA_LPM_UNKNOWN
1 -> ATA_LPM_MAX_POWER
2 -> ATA_LPM_MED_POWER
3 -> ATA_LPM_MED_POWER_WITH_DIPM /* Med power + DIPM as win IRST does */
4 -> ATA_LPM_MIN_POWER_WITH_PARTIAL /* Min Power + partial and slumber */
5 -> ATA_LPM_MIN_POWER /* Min power + no partial (slumber
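The mapping above can be written down as an enumeration, which makes it easier to translate a numeric policy from the kernel command line into its name. This is only an illustrative Python mirror of the values listed in the message; the class name `AtaLpmPolicy` and the helper `describe_policy` are hypothetical.

```python
from enum import IntEnum


class AtaLpmPolicy(IntEnum):
    """Numeric LPM policy values as listed in the thread."""
    ATA_LPM_UNKNOWN = 0
    ATA_LPM_MAX_POWER = 1
    ATA_LPM_MED_POWER = 2
    ATA_LPM_MED_POWER_WITH_DIPM = 3
    ATA_LPM_MIN_POWER_WITH_PARTIAL = 4
    ATA_LPM_MIN_POWER = 5


def describe_policy(value: int) -> str:
    """Return the symbolic name for a policy number, or a note if the
    number is outside the listed range."""
    try:
        return AtaLpmPolicy(value).name
    except ValueError:
        return f"invalid policy {value}"


print(describe_policy(3))
```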

[Bug 1971576] Re: SATA device hot plug regression on AMD EPYC (Asus) server

2022-05-23 Thread Zhanglei Mao
Hi Mario, the test results for 5.18.0-4 are below:

Kernel parameter            SATA hot plug works?
default                     No
ahci.mobile_lpm_policy=0    Yes
ahci.mobile_lpm_policy=1    Yes
ahci.mobile_lpm_policy=2    Yes

By the way, what is
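The reported results can be captured as a small lookup, for example to generate the list of kernel parameters that were observed to restore hot plug. A sketch only: the data is exactly what the message above reports for 5.18.0-4 (policies 3 to 5 were not reported as explicit parameters), and the names `OBSERVED_HOTPLUG`, `hotplug_observed`, and `working_parameters` are hypothetical.

```python
from typing import List, Optional

# Results reported in the thread for kernel 5.18.0-4.
# Key None stands for "no ahci.mobile_lpm_policy parameter" (the default).
OBSERVED_HOTPLUG = {None: False, 0: True, 1: True, 2: True}


def hotplug_observed(policy: Optional[int]) -> Optional[bool]:
    """Return the reported result for a policy, or None if that policy
    was not part of the reported test."""
    return OBSERVED_HOTPLUG.get(policy)


def working_parameters() -> List[str]:
    """Kernel command line parameters reported to make hot plug work."""
    return [
        f"ahci.mobile_lpm_policy={policy}"
        for policy, works in OBSERVED_HOTPLUG.items()
        if works and policy is not None
    ]


print(working_parameters())
```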

[Bug 1971576] Re: SATA device hot plug regression on AMD EPYC (Asus) server

2022-05-23 Thread Kai-Heng Feng
Can you please collect the trace again, this time with the "scsi" filter added: $ sudo trace-cmd record -p function -l "*sata*" -l "*ahci*" -l "*scsi*" Thanks again!

[Bug 1971576] Re: SATA device hot plug regression on AMD EPYC (Asus) server

2022-05-20 Thread Zhanglei Mao
For #40, trace data for the case where hotplug is not working. ** Attachment added: "trace-no-hotplug.zip" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1971576/+attachment/5591529/+files/trace-no-hotplug.zip

[Bug 1971576] Re: SATA device hot plug regression on AMD EPYC (Asus) server

2022-05-20 Thread Zhanglei Mao
For #39 (Mario's), the partner said all cases fail (hotplug was not working). I guess we expected it to work, so I will check with him whether he set the kernel parameter correctly.

[Bug 1971576] Re: SATA device hot plug regression on AMD EPYC (Asus) server

2022-05-20 Thread Zhanglei Mao
For #40, I asked them to collect trace data for both the working and the non-working hotplug cases. ** Attachment added: "trace-hotplug.zip" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1971576/+attachment/5591527/+files/trace-hotplug.zip

[Bug 1971576] Re: SATA device hot plug regression on AMD EPYC (Asus) server

2022-05-19 Thread Kai-Heng Feng
I guess the hotplug event is filtered out. Please collect an ftrace log so we can investigate the issue: $ sudo trace-cmd record -p function -l "*sata*" -l "*ahci*"

[Bug 1971576] Re: SATA device hot plug regression on AMD EPYC (Asus) server

2022-05-18 Thread Mario Limonciello
On any failing kernel, can you try to add "ahci.mobile_lpm_policy=0" to the kernel command line and confirm if that fixes things? If it does, could you also compare "ahci.mobile_lpm_policy=1" and "ahci.mobile_lpm_policy=2"?
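When running such a comparison, it helps to confirm that the parameter actually reached the kernel, since a misspelled name is silently ignored by the driver. A minimal sketch that extracts the value from a command line string such as the contents of /proc/cmdline; the function name `lpm_policy_from_cmdline` is hypothetical, and treating the last occurrence as the effective one is an assumption.

```python
import re
from typing import Optional


def lpm_policy_from_cmdline(cmdline: str) -> Optional[int]:
    """Return the ahci.mobile_lpm_policy value on a kernel command line,
    or None if the parameter is absent. If it appears more than once,
    the last occurrence is returned (assumed to be the effective one)."""
    values = re.findall(
        r"(?:^|\s)ahci\.mobile_lpm_policy=(-?\d+)(?=\s|$)", cmdline
    )
    return int(values[-1]) if values else None


# Illustrative command line (assumed, not taken from the thread).
example = "BOOT_IMAGE=/vmlinuz ro quiet ahci.mobile_lpm_policy=0"
print(lpm_policy_from_cmdline(example))
```

On a live system the input would typically come from reading /proc/cmdline.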

[Bug 1971576] Re: SATA device hot plug regression on AMD EPYC (Asus) server

2022-05-18 Thread Zhanglei Mao
Hey Khaled, Mario, both 5.15 and 5.18 failed; the kernels were from Khaled's builds in #37. The "uname -r" output from the screenshots: 5.15.0-32-generic 5.18.0-4-generic. Thanks!

[Bug 1971576] Re: SATA device hot plug regression on AMD EPYC (Asus) server

2022-05-18 Thread Khaled El Mously
Hey Zhanglei. We do not have a 5.16 Ubuntu kernel. We do have 5.15 kernels. The current mainline version is 5.18 not 5.16. I have built 2 kernels, one 5.18 and one 5.15 kernel for you. 5.15: https://kernel.ubuntu.com/~kmously/kernel-kmously-45afdcc-gwrv/ 5.18:

[Bug 1971576] Re: SATA device hot plug regression on AMD EPYC (Asus) server

2022-05-17 Thread Mario Limonciello
No need to build a kernel; at least for a quick test you can pick one up from the mainline PPA and try it. https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.18-rc7/

[Bug 1971576] Re: SATA device hot plug regression on AMD EPYC (Asus) server

2022-05-17 Thread Zhanglei Mao
Hello Khaled, regarding Mario's comment #33, could you build a mainline kernel (v5.16) for testing? //thanks.

[Bug 1971576] Re: SATA device hot plug regression on AMD EPYC (Asus) server

2022-05-17 Thread Zhanglei Mao
Hello Khaled, first, thanks for your effort in finding this root cause in such a short time. Regarding your comment #31, the partner has confirmed the behavior is the same for other types of disk. I also asked them to raise this issue with AMD, and the AMD technical contact replied as below. "Yes, 1022:7901h is AMD SATA AHCI

[Bug 1971576] Re: SATA device hot plug regression on AMD EPYC (Asus) server

2022-05-17 Thread Mario Limonciello
> but I believe you are using an AMD EPYC server, so I don't understand why you would be affected at all. It may be that this server silicon has the same HW IP as the client chip. The change was tested on client chips before submitting. What it is supposed to do is set the policy for the drives

[Bug 1971576] Re: SATA device hot plug regression on AMD EPYC (Asus) server

2022-05-17 Thread Kai-Heng Feng
Can you please give the mainline kernel a try? If the mainline kernel still doesn't work, please run the following: $ sudo trace-cmd record -p function -l "*ata*" -l "*ahci*" ... then plug in the disk, press Ctrl + C in trace-cmd, and attach trace.dat here.
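Because the set of function filters changed over the course of the thread, a small helper that assembles the trace-cmd invocation from a list of patterns can reduce copy-and-paste mistakes. A sketch only: it builds the same `trace-cmd record -p function -l ...` command quoted above and prints it instead of running it; the function name `build_trace_cmd` is hypothetical.

```python
import shlex
from typing import List


def build_trace_cmd(patterns: List[str]) -> List[str]:
    """Build the argument list for the function-tracer recording command
    requested in the thread, with one -l filter per pattern."""
    cmd = ["sudo", "trace-cmd", "record", "-p", "function"]
    for pattern in patterns:
        cmd += ["-l", pattern]
    return cmd


# Print a shell-safe form; the glob patterns are quoted so the shell
# does not expand them.
print(shlex.join(build_trace_cmd(["*ata*", "*ahci*"])))
```

The recording itself is still stopped by hand with Ctrl + C after the disk is plugged in, as described above.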

[Bug 1971576] Re: SATA device hot plug regression on AMD EPYC (Asus) server

2022-05-17 Thread Khaled El Mously
Thanks Zhanglei. Great. We have identified the problem patch, which is this one: 380cd49e207ba4 ata: ahci: Add Green Sardine vendor ID as board_ahci_mobile But I am not really sure why this patch is causing a problem. The patch only adds one new line as you can see here:

[Bug 1971576] Re: SATA device hot plug regression on AMD EPYC (Asus) server

2022-05-17 Thread Zhanglei Mao
Hello Khaled, version 3007 failed. Thanks! //Mao

[Bug 1971576] Re: SATA device hot plug regression on AMD EPYC (Asus) server

2022-05-16 Thread Khaled El Mously
Hello Zhanglei. Thanks. This means:
3000: BUG
3002: NO BUG
3003: NO BUG
3004: NO BUG
3005: BUG
3006: NO BUG
There are only 5 patches between 3005 and 3006 so one of them is the problem. You can see the list of patches here: https://pastebin.canonical.com/p/TkvDGcfHWk/plain/ Only one of them

[Bug 1971576] Re: SATA device hot plug regression on AMD EPYC (Asus) server

2022-05-16 Thread Zhanglei Mao
Hello Khaled, the 3006 kernel passes. Thanks. //Mao

[Bug 1971576] Re: SATA device hot plug regression on AMD EPYC (Asus) server

2022-05-14 Thread Khaled El Mously
Hello Zhanglei. Thanks for the update. This means:
3000: BUG
3002: NO BUG
3003: NO BUG
3004: NO BUG
3005: BUG
We are getting very close now. Please try version 3006 from this link: https://kernel.ubuntu.com/~kmously/kernel-kmously-c756bab-RGcD/ Please make sure you are running 3006. Thanks.

[Bug 1971576] Re: SATA device hot plug regression on AMD EPYC (Asus) server

2022-05-13 Thread Zhanglei Mao
Hello Khaled, hot plug fails on version 3005. Thanks.

[Bug 1971576] Re: SATA device hot plug regression on AMD EPYC (Asus) server

2022-05-12 Thread Khaled El Mously
Hello again Zhanglei. This means that so far:
3000: BUG
3002: NO BUG
3003: NO BUG
3004: NO BUG
You can find kernel 3005 here: https://kernel.ubuntu.com/~kmously/kernel-kmously-039f206-aRqC/ Please ensure you are running version -3005. Thank you

[Bug 1971576] Re: SATA device hot plug regression on AMD EPYC (Asus) server

2022-05-12 Thread Khaled El Mously
Thanks Zhanglei. I am building 3005 now. It should be ready in about 30 minutes. I will update again soon

[Bug 1971576] Re: SATA device hot plug regression on AMD EPYC (Asus) server

2022-05-12 Thread Zhanglei Mao
Hello Khaled, hot plug passes on version 3004. Thanks. //Mao

[Bug 1971576] Re: SATA device hot plug regression on AMD EPYC (Asus) server

2022-05-12 Thread Khaled El Mously
Thanks Zhanglei. This means that so far:
3000: BUG
3002: NO BUG
3003: NO BUG
We are getting closer. This is the remaining set of patches:
00501b41aaf73f (tag: test-3000, tag: fail1) s390/pci: move pseudo-MMIO to prevent MIO overlap
14914e943b0ca5 cpufreq: Fix get_cpu_device() failure in

[Bug 1971576] Re: SATA device hot plug regression on AMD EPYC (Asus) server

2022-05-11 Thread Zhanglei Mao
Hello Khaled, hot plug passed on version 3003. //thanks

[Bug 1971576] Re: SATA device hot plug regression on AMD EPYC (Asus) server

2022-05-11 Thread Zhanglei Mao
Hello Khaled, thanks for the quick response. Your understanding (3000: bug, 3002: no bug) is correct. I have asked them to verify 3003 now. //thanks

[Bug 1971576] Re: SATA device hot plug regression on AMD EPYC (Asus) server

2022-05-11 Thread Khaled El Mously
Hello Zhanglei. Thanks for the update. From my understanding, so far:
3000: BUG
3002: NO BUG
I have the next kernel, 3003, available here: https://kernel.ubuntu.com/~kmously/kernel-kmously-8243717-Kutu Once again, please ensure that you are testing with -3003 when testing. Thank you

[Bug 1971576] Re: SATA device hot plug regression on AMD EPYC (Asus) server

2022-05-11 Thread Zhanglei Mao
Hello Khaled, hot plug passes on version 3002. //thanks.

[Bug 1971576] Re: SATA device hot plug regression on AMD EPYC (Asus) server

2022-05-10 Thread Zhanglei Mao
Hello Khaled, thanks for sharing the details of your patching and for explaining why multiple kernels cannot be built at once. The partner engineer did use the 3000 kernel for the SATA hotplug test; they sent me a screenshot of the "uname -r" output. I have asked them to test the new 3002 kernel now.

[Bug 1971576] Re: SATA device hot plug regression on AMD EPYC (Asus) server

2022-05-10 Thread Khaled El Mously
Hello Zhanglei, Thanks for the update. I am a little surprised that this kernel failed. There are 2 SATA related changes in kernel -100 which I suspected were the root cause. However, the kernel that I provided (version 3000) did NOT contain those patches, so I expected it to work. The patches

[Bug 1971576] Re: SATA device hot plug regression on AMD EPYC (Asus) server

2022-05-10 Thread Zhanglei Mao
Hello Khaled, the hot plug test fails with this kernel version. Please build the next kernel. By the way, is it possible to build multiple kernels, so that they can test them all in one shot? The partner engineer is working from home this week and has to find someone in the office each time.

[Bug 1971576] Re: SATA device hot plug regression on AMD EPYC (Asus) server

2022-05-10 Thread Zhanglei Mao
Hello Khaled, I have asked the partner engineer to test the kernel you built. Thanks.

[Bug 1971576] Re: SATA device hot plug regression on AMD EPYC (Asus) server

2022-05-09 Thread Khaled El Mously
Hello Zhanglei, thanks for confirming the working/broken versions. I am not sure if I will be able to reproduce the issue myself. There are 270 changes between -99 and -100. If you can help me bisect them, we should be able to quickly identify the problem. Would you be able to test the kernels I
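The bisection proposed here halves the candidate range with every tested kernel, so 270 changes need roughly nine test builds in the worst case. The following is a generic sketch of that idea, not the procedure actually used to produce the numbered test kernels; `bisect_steps`, `first_bad`, and the `is_bad_up_to` callback (which stands in for "build a kernel with changes 0..i applied and test hot plug") are hypothetical names.

```python
import math
from typing import Callable, Sequence


def bisect_steps(num_changes: int) -> int:
    """Worst-case number of test builds needed to isolate one change."""
    return math.ceil(math.log2(num_changes)) if num_changes > 1 else 0


def first_bad(changes: Sequence[str],
              is_bad_up_to: Callable[[int], bool]) -> int:
    """Return the index of the first change that introduces the bug.

    Assumes the kernel with all changes applied is bad, the kernel with
    none applied is good, and the bug stays present once introduced."""
    lo, hi = 0, len(changes) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad_up_to(mid):
            hi = mid          # bug already present: culprit is at or before mid
        else:
            lo = mid + 1      # still good: culprit is after mid
    return lo


print(bisect_steps(270))
```

In practice each call to `is_bad_up_to` corresponds to one round trip with the tester, which is why the number of steps matters more than the compute time.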

[Bug 1971576] Re: SATA device hot plug regression on AMD EPYC (Asus) server

2022-05-08 Thread Zhanglei Mao
** Summary changed:
- SATA device hot plug regression on AMD EYPC (Asus) server
+ SATA device hot plug regression on AMD EPYC (Asus) server