[Bug 1771557] Re: NVMe boot drives not supported - failing in generating initramfs

2018-05-18 Thread Guilherme G. Piccoli
Thanks a lot Eric and Brian for handling the SRU process. I just tested the packages initramfs-tools and initramfs-tools-bin version 0.103ubuntu4.11, from trusty-proposed, following the test procedure of this LP's description. Everything is working fine, so I'll mark this as verification-done.

[Bug 1771557] Re: NVMe boot drives not supported - failing in generating initramfs

2018-05-16 Thread Guilherme G. Piccoli
Fixed in Debian ** Project changed: linux => initramfs-tools ** Changed in: initramfs-tools Status: Unknown => Fix Released -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1771557 Title:

[Bug 1771557] Re: NVMe boot drives not supported - failing in generating initramfs

2018-05-16 Thread Guilherme G. Piccoli
** Changed in: initramfs-tools (Ubuntu) Status: In Progress => Fix Released ** Changed in: initramfs-tools (Ubuntu) Importance: High => Medium ** Changed in: initramfs-tools (Ubuntu) Milestone: trusty-updates => None

[Bug 1771557] [NEW] NVMe boot drives not supported - failing in generating initramfs

2018-05-16 Thread Guilherme G. Piccoli
** Affects: initramfs-tools Importance: Unknown Status: Fix Released ** Affects: initramfs-tools (Ubuntu) Importance: High Assignee: Guilherme G. Piccoli (gpiccoli) Status: In Progress ** Affects: initramfs-tools (Ubuntu Trusty) Importance: High Assignee:

[Bug 1771557] Re: NVMe boot drives not supported - failing in generating initramfs

2018-05-16 Thread Guilherme G. Piccoli
mportance: Undecided => High ** Changed in: initramfs-tools (Ubuntu Trusty) Status: New => In Progress ** Changed in: initramfs-tools (Ubuntu Trusty) Assignee: (unassigned) => Guilherme G. Piccoli (gpiccoli) ** Changed in: initramfs-tools (Ubuntu Trusty) Milestone: None => tr

[Bug 1771557] Re: NVMe boot drives not supported - failing in generating initramfs

2018-05-16 Thread Guilherme G. Piccoli
** Description changed: - [Impact] - The initramfs-tools hook-functions script cannot translate nvmeXnYpZ to nvmeXnY block device, so it's failing and not building the initram disk. + [Impact] - Upstream solution is composed for at least 2 patches (it's a series, but + When creating the
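The nvmeXnYpZ to nvmeXnY translation named in the description above can be illustrated with a minimal sketch. This is not the initramfs-tools hook-functions code (which is shell) nor the patch attached to the bug; the helper name nvme_parent_device is hypothetical, and the sketch assumes partition names always end in "p" followed by digits.

```python
import re

# Hypothetical helper for illustration only: matches NVMe partition names
# of the form nvmeXnYpZ and captures the nvmeXnY part.
_NVME_PARTITION = re.compile(r"^(nvme\d+n\d+)p\d+$")


def nvme_parent_device(name: str) -> str:
    """Return nvmeXnY for an nvmeXnYpZ name; return any other name unchanged."""
    match = _NVME_PARTITION.match(name)
    return match.group(1) if match else name


print(nvme_parent_device("nvme0n1p2"))  # nvme0n1
print(nvme_parent_device("nvme0n1"))    # already a whole device: unchanged
```

Names that do not match the pattern are passed through untouched, so a caller could apply the helper to any block device name.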

[Bug 1771557] Re: NVMe boot drives not supported - failing in generating initramfs

2018-05-16 Thread Guilherme G. Piccoli
This solution was suggested by Szilard Cserey and further improved by Dan Streetman - thanks both! ** Patch added: "lp1771557_v1.debdiff" https://bugs.launchpad.net/ubuntu/+source/initramfs-tools/+bug/1771557/+attachment/5140444/+files/lp1771557_v1.debdiff

[Bug 1771108] Re: package nvidia-340 (not installed) failed to install/upgrade: trying to overwrite '/lib/udev/rules.d/71-nvidia.rules', which is also in package nvidia-kernel-common-390 390.48-0

2018-06-05 Thread Guilherme G. Piccoli
Hi Anton, thank you! So, if you're not experiencing this bug anymore, I'd suggest marking this LP as Invalid. Cheers, Guilherme https://bugs.launchpad.net/bugs/1771108

[Bug 1770269] Re: package nvidia-340 340.106-0ubuntu3 failed to install/upgrade: installed nvidia-340 package post-installation script subprocess returned error exit status 139

2018-06-05 Thread Guilherme G. Piccoli
Thanks a lot Mike! In fact, it seems (from the logs) that the modules are building fine and the issue happens after the build - which is different from the issue I've seen on my system (caused by a failure in the build process). Given you got a segfault at the end of the installation process, I'd

[Bug 1770497] Re: package nvidia-340 (not installed) failed to install/upgrade: trying to overwrite '/lib/udev/rules.d/71-nvidia.rules', which is also in package nvidia-kernel-common-390 390.48-0ubunt

2018-05-27 Thread Guilherme G. Piccoli
Hello, I was experiencing something similar. I'd like to request some data from you; it seems we might have duplicate bugs, so I'm trying to narrow down whether they all report the same failures, and I have a potential fix for it. Please try to rebuild the packages by running (as root):

[Bug 1770269] Re: package nvidia-340 340.106-0ubuntu3 failed to install/upgrade: installed nvidia-340 package post-installation script subprocess returned error exit status 139

2018-05-27 Thread Guilherme G. Piccoli
Hello, I was experiencing something similar. I'd like to request some data from you; it seems we might have duplicate bugs, so I'm trying to narrow down whether they all report the same failures, and I have a potential fix for it. Please try to rebuild the packages by running (as root):

[Bug 1771413] Re: package nvidia-340 340.106-0ubuntu3 failed to install/upgrade: new nvidia-340 package pre-installation script subprocess returned error exit status 1

2018-05-27 Thread Guilherme G. Piccoli
Hello, I was experiencing something similar. I'd like to request some data from you; it seems we might have duplicate bugs, so I'm trying to narrow down whether they all report the same failures, and I have a potential fix for it. Please try to rebuild the packages by running (as root):

[Bug 1771108] Re: package nvidia-340 (not installed) failed to install/upgrade: trying to overwrite '/lib/udev/rules.d/71-nvidia.rules', which is also in package nvidia-kernel-common-390 390.48-0

2018-05-27 Thread Guilherme G. Piccoli
Hello, I was experiencing something similar. I'd like to request some data from you; it seems we might have duplicate bugs, so I'm trying to narrow down whether they all report the same failures, and I have a potential fix for it. Please try to rebuild the packages by running (as root):

[Bug 1744300] Re: bt_iter() crash due to NULL pointer

2018-01-19 Thread Guilherme G. Piccoli
** Also affects: linux (Ubuntu) Importance: Undecided Status: New ** No longer affects: linux-gcp (Ubuntu) https://bugs.launchpad.net/bugs/1744300

[Bug 1744300] [NEW] bt_iter() crash due to NULL pointer

2018-01-19 Thread Guilherme G. Piccoli
Public bug reported: SRU Justification: [Impact] The following crash was observed in Ubuntu 16.04 running linux-gcp kernel version 4.13 (specifically 4.13.0-1006.9): [ 10.972644] BUG: unable to handle kernel NULL pointer dereference at 0030 [ 10.980708] IP: bt_iter+0x31/0x50 [

[Bug 1744300] Re: bt_iter() crash due to NULL pointer

2018-01-19 Thread Guilherme G. Piccoli
** No longer affects: linux (Ubuntu Bionic) ** Changed in: linux (Ubuntu Xenial) Status: New => Fix Released ** Changed in: linux (Ubuntu Artful) Status: New => In Progress

[Bug 1744300] Re: bt_iter() crash due to NULL pointer

2018-02-14 Thread Guilherme G. Piccoli
** Tags removed: verification-needed-artful ** Tags added: verification-done-artful https://bugs.launchpad.net/bugs/1744300

[Bug 1749961] Re: xhci_hcd: TRB DMA errors reported with ASMedia ASM1142 USB 3.1 Controller

2018-02-20 Thread Guilherme G. Piccoli
The patch was modified (by adding the PCI_ID of device 1142A, which confusingly is 1242!) and the problem still reproduces. New approaches will be tried soon.

[Bug 1667750] Re: xhci_hcd: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13

2018-02-20 Thread Guilherme G. Piccoli
As an informative note: it was observed that the adapter "ASMedia Technology Inc. ASM1142 USB 3.1 Host Controller [1b21:1242]" has a similar bug, but it seems the quirk (from kernel commit 9da5a1092b13), even if applied to the right PCI_ID, doesn't fix the issue. I guess this was the case from the user

[Bug 1749961] [NEW] xhci_hcd: TRB DMA errors reported with ASMedia ASM1142 USB 3.1 Controller

2018-02-16 Thread Guilherme G. Piccoli
Public bug reported: It was observed that while trying to use a 4K USB webcam connected to a USB port provided by an ASMedia ASM1142 USB 3.1 Controller, the webcam does not work and the kernel log shows the following messages: [431.928016] xhci_hcd :12:00.0: ERROR Transfer event TRB DMA ptr not part

[Bug 1750013] Re: systemd-logind: memory leaks on session's connections (trusty-only)

2018-02-16 Thread Guilherme G. Piccoli
As already briefly mentioned, Trusty's systemd package has additional patches to customize systemd - in particular, glue code was added to make Trusty's systemd work with cgmanager. This is necessary due to the choice of upstart as init manager - it prevents regular cgroup handling from

[Bug 1750013] [NEW] systemd-logind: memory leaks on session's connections (trusty-only)

2018-02-16 Thread Guilherme G. Piccoli
essions alive. This can be verified through the command "loginctl list-sessions" - each session that once connected is present there "forever". The memory leaks can eventually lead to an OOM situation for this process. Debug progress will be tracked here, in this LP. ** Affects: systemd

[Bug 1750013] Re: systemd-logind: memory leaks on session's connections (trusty-only)

2018-02-16 Thread Guilherme G. Piccoli
After some debugging, it was observed that the only path in which a session is freed (other than in user/seat free or manager free) is garbage collection, specifically in the function manager_gc(). It means a closing session should somehow be added to gc so that gc has a chance to validate if the

[Bug 1750013] Re: systemd-logind: memory leaks on session's connections (trusty-only)

2018-02-16 Thread Guilherme G. Piccoli
It was noticed that during the logind manager logic for creating a session based on a dbus event, the session's initialization procedure will allocate 2 structures related to cgroup management of a session: controllers and reset_controllers. Both these structs are filled with cgroup controllers

[Bug 1750013] Re: systemd-logind: memory leaks on session's connections (trusty-only)

2018-02-16 Thread Guilherme G. Piccoli
** Changed in: systemd (Ubuntu) Importance: Undecided => Medium ** Changed in: systemd (Ubuntu) Status: New => In Progress https://bugs.launchpad.net/bugs/1750013

[Bug 1749961] Re: xhci_hcd: TRB DMA errors reported with ASMedia ASM1142 USB 3.1 Controller

2018-02-16 Thread Guilherme G. Piccoli
** Changed in: linux (Ubuntu Trusty) Importance: Undecided => Medium ** Changed in: linux (Ubuntu Trusty) Status: New => In Progress ** Changed in: linux (Ubuntu Trusty) Assignee: (unassigned) => Guilherme G. Piccoli (gpiccoli) ** Changed in: linux (Ubuntu Xenial) I

[Bug 1750013] Re: systemd-logind: memory leaks on session's connections (trusty-only)

2018-02-16 Thread Guilherme G. Piccoli
** Changed in: systemd (Ubuntu Trusty) Importance: Undecided => Medium ** Changed in: systemd (Ubuntu Trusty) Status: New => In Progress ** Changed in: systemd (Ubuntu Trusty) Assignee: (unassigned) => Guilherme G. Piccoli (gpiccoli) ** Changed in: systemd (Ubun

[Bug 1785686] [NEW] makedumpfile not able to use compression in trusty kernel 3.13 (fallback to 'cp')

2018-08-06 Thread Guilherme G. Piccoli
Public bug reported: When kdumping in 3.13.0-85-generic, using the sysrq trigger "echo c > /proc/sysrq-trigger", I got the following messages: The kernel version is not supported. The created dumpfile may be incomplete. check_release: Can't get the kernel version. makedumpfile Failed. *

[Bug 1770269] Re: package nvidia-340 340.106-0ubuntu3 failed to install/upgrade: installed nvidia-340 package post-installation script subprocess returned error exit status 139

2018-07-06 Thread Guilherme G. Piccoli
Wow, that's interesting. Glad it works now, I'll close the LP then. Thanks for the report, if it ever happens again let us know. Cheers, Guilherme ** Changed in: nvidia-graphics-drivers-340 (Ubuntu) Status: New => Invalid

[Bug 1750013] Re: systemd-logind: memory leaks on session's connections (trusty-only)

2018-03-12 Thread Guilherme G. Piccoli
We had a potential regression reported after some users installed this package from -proposed. The reports start at comment #17 in LP: #1303649. Summary: users are reporting high CPU loads from both systemd-logind and cgmanager processes, as well as delays in logins. I'm investigating to

[Bug 1750013] Re: systemd-logind: memory leaks on session's connections (trusty-only)

2018-03-12 Thread Guilherme G. Piccoli
Thanks Stéphane, the 2nd point is a reasonable consideration. Regardless of whether I could provide a fix to cgmanager for the regression, it certainly won't be added again as a dependency. Cheers, Guilherme

[Bug 1316970] Re: g_dbus memory leak in lrmd

2018-03-01 Thread Guilherme G. Piccoli
Dan, thanks for your v4! I just tested it and the leak seems much less intense - the patch mitigated it very well. But still, we're leaking 16K/hour. I'll investigate this further.

[Bug 1303649] Re: systemd-logind spins in cgmanager_ping_sync()

2018-03-12 Thread Guilherme G. Piccoli
Thanks Dale and Marcelo for your quick report - this is really useful. The problem seems to be caused by cgmanager being added as a dependency of systemd in -proposed - this request was clearly explained in LP #1750013 (not a dup of this one, it's another issue with systemd-logind). During my test

[Bug 1750013] Re: systemd-logind: memory leaks on session's connections (trusty-only)

2018-03-09 Thread Guilherme G. Piccoli
Thanks Dan and Brian. Just tested: after 1h and more than 5500 SSH sessions, no leaks at all were observed. (As a comparison, testing the "old" version 20.26 I got 1.8 MB of leak in half an hour!) Cheers, Guilherme ** Tags removed: sts-sru-needed verification-needed

[Bug 1303649] Re: systemd-logind spins in cgmanager_ping_sync()

2018-04-03 Thread Guilherme G. Piccoli
Folks that are experiencing this issue: I guess the best way to work around it for now is to downgrade the systemd package to version 204-5ubuntu20.26 and remove cgmanager. To remove cgmanager: "sudo apt-get remove cgmanager" To downgrade systemd version: "sudo apt-get install systemd-

[Bug 1749961] Re: xhci_hcd: TRB DMA errors reported with ASMedia ASM1142 USB 3.1 Controller

2018-04-10 Thread Guilherme G. Piccoli
Hi Imperia, I built a mainline kernel (version 4.16) with a different quirk that I think might help here. Can you test it? Thanks in advance! Instructions (run all as root): 1) wget people.canonical.com/~gpiccoli/imperia416.tgz 2) mv imperia416.tgz / 3) tar -zxf imperia416.tgz 4)

[Bug 1316970] Re: g_dbus memory leak in lrmd

2018-04-05 Thread Guilherme G. Piccoli
Thanks Łukasz, I've tested the pacemaker packages in -proposed and they fixed the issue for me. The version I've tested is: 1.1.10+git20130802-1ubuntu2.5 . The test was performed like this: I have a test-case that I ran against the version in trusty-updates (call it baseline); then I performed

[Bug 1750013] Re: systemd-logind: memory leaks on session's connections (trusty-only)

2018-04-12 Thread Guilherme G. Piccoli
Hi Łukasz, thanks for the heads-up. I just tested systemd from -proposed, version 204-5ubuntu20.28, and it fixes the issue reported in this LP. The test consists of opening multiple SSH sessions in a loop (better explained in the description). I've run the test with and without cgmanager installed, in

[Bug 1303649] Re: systemd-logind spins in cgmanager_ping_sync()

2018-04-12 Thread Guilherme G. Piccoli
For the folks that observed the recent cgmanager issue with logind: there's a new systemd version in -proposed; it's version "204-5ubuntu20.28". I've just tested it with and without cgmanager, in kernels 4.4.0-119 and 3.13.0-145 (according to LP #1750013), and didn't observe any constant CPU

[Bug 1303649] Re: systemd-logind spins in cgmanager_ping_sync()

2018-04-15 Thread Guilherme G. Piccoli
Thanks a lot for your feedback Marcelo; I'm glad everything seems fine now. Cheers, Guilherme https://bugs.launchpad.net/bugs/1303649

[Bug 1750013] Re: systemd-logind: memory leaks on session's connections (trusty-only)

2018-04-19 Thread Guilherme G. Piccoli
As mentioned by Dan in the above comments, some failures in autopkgtest like: autopkgtest [14:44:48]: test ubuntu-regression-suite: [--- Source Package Version: 4.4.0-1017.17 Running Kernel Version: 3.13.0-145.194 ERROR: running version does not match source package Are

[Bug 1749961] Re: xhci_hcd: TRB DMA errors reported with ASMedia ASM1142 USB 3.1 Controller

2018-03-28 Thread Guilherme G. Piccoli
Nice, Imperia, thanks for the report here. First we need to be sure it's exactly the same adapter. Can you provide the output of "lspci -nn"? Then, if it's the same adapter: 0) Which Ubuntu version are you running? Which kernel version are you using? Can you try in the latest 4.13 for

[Bug 1749961] Re: xhci_hcd: TRB DMA errors reported with ASMedia ASM1142 USB 3.1 Controller

2018-03-29 Thread Guilherme G. Piccoli
Thanks a lot Imperia! It's indeed the same PCI adapter, and it's even better you're running an upstream kernel like this. I'll analyze your logs in order to match with the ones I have here. I might need some xhci traces to understand the TRBs operations (like the enqueue and completion of TRBs).

[Bug 1750013] Re: systemd-logind: memory leaks on session's connections (trusty-only)

2018-04-02 Thread Guilherme G. Piccoli
Thank you Mauro! One thing worth looking at is that you're using kernel 3.13, and this could be related to the high CPU utilization issue you're observing. We're changing the approach of this fix to not rely on cgmanager anymore. The CPU utilization issue is however another bug that

[Bug 1749961] Re: xhci_hcd: TRB DMA errors reported with ASMedia ASM1142 USB 3.1 Controller

2018-04-02 Thread Guilherme G. Piccoli
Thanks again Imperia, the traces are fine. They're only 25MB, so they shouldn't have caused any kind of disk issue, such as an out-of-space condition. Also, I'd like to see the correlated kernel log to match the problematic TRBs from the kernel log with trace information. Can you provide me the relevant

[Bug 1750013] Re: systemd-logind: memory leaks on session's connections (trusty-only)

2018-04-03 Thread Guilherme G. Piccoli
Dan, this is the v3 of the patch. I bumped the version to 20.28 since my proposed 20.27 caused the aforementioned regression. For this version, I removed the dependency on cgmanager, along with the code that added closing sessions to the garbage collector. It happens that similar code is present on

[Bug 1749961] Re: xhci_hcd: TRB DMA errors reported with ASMedia ASM1142 USB 3.1 Controller

2018-03-29 Thread Guilherme G. Piccoli
Wow Imperia, you're being really helpful here, thank you very much! To enable traces, these are the instructions I've provided to other people affected so far: 0) Reboot the machine in order to put it in a consistent state; 1) echo "module xhci_hcd +flpt" >

[Bug 1316970] Re: g_dbus memory leak in lrmd

2018-03-16 Thread Guilherme G. Piccoli
Hi Dan, made some progress on the investigation (not definitive, but still it helps us to continue with the SRU process). By using Valgrind memcheck analyzer I couldn't observe any non-constant leaks after Seyeong's patch gets applied. Then, I started to use two more analyzers in order to obtain

[Bug 1750013] Re: systemd-logind: memory leaks on session's connections (trusty-only)

2018-03-05 Thread Guilherme G. Piccoli
Thanks Dan. About cgmanager: if we don't have it installed, it does not affect memory of systemd-logind tool per se. What happens, IMHO, is even more severe: we don't free/de-allocate sysfs cgroup paths for sessions. So, in my tests _without_ cgmanager, after 8000 SSH sessions to my target

[Bug 1750013] Re: systemd-logind: memory leaks on session's connections (trusty-only)

2018-03-02 Thread Guilherme G. Piccoli
Dan, really well-observed! Thanks for the suggestion, it makes total sense. I'll update the approach: instead of using session_remove_fifo() to add the session to GC, I'll use the DBus handler itself. After testing, I'll submit a debdiff_v2 here. Thanks, Guilherme

[Bug 1750013] Re: systemd-logind: memory leaks on session's connections (trusty-only)

2018-03-02 Thread Guilherme G. Piccoli
Dan, thanks for looking at the patch and for your question! I tried to explain this a bit in comment #1, though not in much detail: "[...]closing session should be somehow added to gc [...] and then (the gc) performs the clean-up. To include a session in gc, the function

[Bug 1750013] Re: systemd-logind: memory leaks on session's connections (trusty-only)

2018-03-02 Thread Guilherme G. Piccoli
Dan, please let me know if the debdiff v2 above addresses your concern and if it's properly following your suggestion. I tested the modification and it's working fine. Thanks, Guilherme

[Bug 1750013] Re: systemd-logind: memory leaks on session's connections (trusty-only)

2018-03-02 Thread Guilherme G. Piccoli
** Patch added: "debdiff v2" https://bugs.launchpad.net/ubuntu/artful/+source/systemd/+bug/1750013/+attachment/5067376/+files/lp1750013-trusty_v2.debdiff

[Bug 1750013] Re: systemd-logind: memory leaks on session's connections (trusty-only)

2018-02-28 Thread Guilherme G. Piccoli
** Description changed: - It was observed that systemd-logind tool is leaking memory at each session - connected. The issue happens in systemd from Trusty (14.04), which latest - version currently (Feb/2018) is 204-5ubuntu20.26 (and still reproduces - the bug). - - The basic test-case is to run

[Bug 1316970] Re: g_dbus memory leak in lrmd

2018-02-28 Thread Guilherme G. Piccoli
** Changed in: pacemaker (Ubuntu Trusty) Assignee: Seyeong Kim (xtrusia) => Guilherme G. Piccoli (gpiccoli) https://bugs.launchpad.net/bugs/1316970

[Bug 1749961] Re: xhci_hcd: TRB DMA errors reported with ASMedia ASM1142 USB 3.1 Controller

2018-10-05 Thread Guilherme G. Piccoli
Hi Roy, thanks for your quick response. First, I'd like to ask you to attach the output of "lspci -vvv" and "dmidecode" to this LP so we can validate the adapters and be sure they are exactly the same, and also check the motherboard type. Run both commands as root user. After that, I'll ask you

[Bug 1792660] Re: nvme name floated after boot with 4.15.0 kernel

2018-10-08 Thread Guilherme G. Piccoli
Hi Guo, the Ubuntu kernels schedule can be checked here: https://wiki.ubuntu.com/Kernel/Support . Ubuntu kernel team chooses a kernel version for a specific release, and once that release is available, there's a support schedule. Some versions have long-term support, so they're called LTS. In

[Bug 1797990] [NEW] kdump fail due to an IRQ storm

2018-10-15 Thread Guilherme G. Piccoli
, the PCI NIC was an "Intel 82599ES 10-Gigabit [8086:10fb]" that was used in SR-IOV PCI passthrough mode (vfio_pci), under high load on the guest. ** Affects: linux (Ubuntu) Importance: High Assignee: Guilherme G. Piccoli (gpiccoli) Status: Confirmed ** Tags: st

[Bug 1797990] Re: kdump fail due to an IRQ storm

2018-10-16 Thread Guilherme G. Piccoli
One problem faced during this approach was that the early-quirks code in x86 performs a recursive search in the PCI bus descending from the "first" bus :00, and walking through all secondary busses by jumping between bridges. For historical perspective about this code's evolution, see [0].
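To illustrate why a descent that starts only at the first bus can be a problem, here is a toy sketch. It is not the kernel's early-quirks code: the topology dictionary, the device names and both function names are hypothetical, and the sketch assumes (as the attached multi root bridge lspci tree suggests) that some buses are not reachable through bridges from bus 0x00.

```python
# Toy model: bus number -> list of (device name, secondary bus or None).
# Buses 0x80/0x81 hang off a second root bridge and are not reachable
# from bus 0x00 by following bridges.
topology = {
    0x00: [("host-bridge-0", None), ("bridge-to-01", 0x01)],
    0x01: [("nic-a", None)],
    0x80: [("bridge-to-81", 0x81)],
    0x81: [("nic-b", None)],
}


def scan_from_first_bus(topo, bus=0x00, seen=None):
    """Recursive descent from one bus, following bridges to secondary buses."""
    seen = set() if seen is None else seen
    if bus in seen:
        return []
    seen.add(bus)
    found = []
    for name, secondary in topo.get(bus, []):
        found.append(name)
        if secondary is not None:
            found.extend(scan_from_first_bus(topo, secondary, seen))
    return found


def scan_all_buses(topo):
    """Flat scan over every bus number present in the model."""
    return [name for bus in sorted(topo) for name, _ in topo[bus]]


print(scan_from_first_bus(topology))  # nic-b is never visited
print(scan_all_buses(topology))       # every device is visited
```

In this model the recursive walk never reaches the devices behind the second root bridge, while the flat scan does.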

[Bug 1797990] Re: kdump fail due to an IRQ storm

2018-10-16 Thread Guilherme G. Piccoli
** Attachment added: "lspci tree output of a multi root bridge PCI topology" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1797990/+attachment/5201886/+files/lspci_multi_root.txt

[Bug 1797990] Re: kdump fail due to an IRQ storm

2018-10-16 Thread Guilherme G. Piccoli
** Attachment added: "lspci tree output of a single root bridge PCI topology" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1797990/+attachment/5201885/+files/lspci_single_root.txt

[Bug 1797990] Re: kdump fail due to an IRQ storm

2018-10-16 Thread Guilherme G. Piccoli
** Changed in: linux (Ubuntu Bionic) Assignee: (unassigned) => Guilherme G. Piccoli (gpiccoli) ** Changed in: linux (Ubuntu Xenial) Assignee: (unassigned) => Guilherme G. Piccoli (gpiccoli) ** Changed in: linux (Ubuntu Trusty) Assignee: (unassigned) => Guilherme G

[Bug 1797990] Re: kdump fail due to an IRQ storm

2018-10-16 Thread Guilherme G. Piccoli
During the investigation, we've noticed that the PCI specification mentions the need for the MSI/MSI-X capability to be disabled during a system boot/reset; from the PCI Local Bus specification 3.0, sections 6.8.1.3 and 6.8.2.3: "[...] MSI Enable: This bit’s state after reset is 0 (MSI is disabled)." PCI
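As a rough illustration of the enable bits quoted above, the sketch below walks the capability list of a raw PCI configuration space image and tests the MSI Enable bit (Message Control bit 0) and the MSI-X Enable bit (Message Control bit 15). It is only a userspace model, not the kernel patch from this bug; the function names and the hand-built sample configuration space are assumptions made for the example.

```python
PCI_STATUS = 0x06            # Status register offset
PCI_STATUS_CAP_LIST = 0x10   # Status bit 4: capability list present
PCI_CAPABILITY_LIST = 0x34   # offset of the first capability pointer
PCI_CAP_ID_MSI = 0x05
PCI_CAP_ID_MSIX = 0x11
MSI_ENABLE = 0x0001          # Message Control bit 0
MSIX_ENABLE = 0x8000         # Message Control bit 15


def find_cap(cfg, cap_id):
    """Return the offset of capability cap_id in cfg, or 0 if absent."""
    status = int.from_bytes(cfg[PCI_STATUS:PCI_STATUS + 2], "little")
    if not status & PCI_STATUS_CAP_LIST:
        return 0
    pos = cfg[PCI_CAPABILITY_LIST] & 0xFC
    for _ in range(48):          # bound the walk to guard against loops
        if pos < 0x40:           # pointers below the header end the list
            return 0
        if cfg[pos] == cap_id:
            return pos
        pos = cfg[pos + 1] & 0xFC
    return 0


def cap_enabled(cfg, cap_id, enable_bit):
    """True if the capability exists and its enable bit is set."""
    pos = find_cap(cfg, cap_id)
    if not pos:
        return False
    control = int.from_bytes(cfg[pos + 2:pos + 4], "little")
    return bool(control & enable_bit)


# Hand-built sample: MSI at 0x50 (enabled), MSI-X at 0x70 (disabled).
cfg = bytearray(256)
cfg[PCI_STATUS] = PCI_STATUS_CAP_LIST
cfg[PCI_CAPABILITY_LIST] = 0x50
cfg[0x50:0x54] = bytes([PCI_CAP_ID_MSI, 0x70, 0x01, 0x00])
cfg[0x70:0x74] = bytes([PCI_CAP_ID_MSIX, 0x00, 0x00, 0x00])

print(cap_enabled(cfg, PCI_CAP_ID_MSI, MSI_ENABLE))    # True
print(cap_enabled(cfg, PCI_CAP_ID_MSIX, MSIX_ENABLE))  # False
```

A device that still shows the MSI bit set when a freshly booted kernel inspects it would be in the state the quoted sections say should not exist after reset.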

[Bug 1797990] Re: kdump fail due to an IRQ storm

2018-10-16 Thread Guilherme G. Piccoli
** Description changed: We have reports of a kdump failure in Ubuntu (on an x86 machine) that was narrowed down to an MSI IRQ storm coming from a PCI network device. The bug manifests as a lack of progress in the boot process of the kdump kernel, and a storm of kernel messages like:

[Bug 1797990] Re: kdump fail due to an IRQ storm

2018-10-18 Thread Guilherme G. Piccoli
** Patch added: "Export pci capabilities function from AGP to early-PCI code" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1797990/+attachment/5202666/+files/0002-x86-PCI-Export-find_cap-to-be-used-in-early-PCI-code.patch

[Bug 1797990] Re: kdump fail due to an IRQ storm

2018-10-18 Thread Guilherme G. Piccoli
** Patch added: "Parameter to enable quirk in early boot to disable MSIs on kexec" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1797990/+attachment/5202667/+files/0003-x86-quirks-Add-parameter-to-clear-MSIs-early-on-boot.patch

[Bug 1797990] Re: kdump fail due to an IRQ storm

2018-10-18 Thread Guilherme G. Piccoli
Mailing list archive URL: https://marc.info/?l=linux-pci&m=153988799707413 (navigate using "next in list") https://bugs.launchpad.net/bugs/1797990

[Bug 1797990] Re: kdump fail due to an IRQ storm

2018-10-18 Thread Guilherme G. Piccoli
Patches sent to the mailing lists today. ** Patch added: "Patch 1: Scan all PCI busses" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1797990/+attachment/5202664/+files/0001-x86-quirks-Scan-all-busses-for-early-PCI-quirks.patch

[Bug 1764956] Re: Guests using IBRS incur a large performance penalty

2018-10-23 Thread Guilherme G. Piccoli
he guest, host gets its performance back. I'll investigate some commits upstream, including the one you suggested, and once we figure out the exact fix for this, will request an SRU from the kernel team. Thanks, Guilherme ** Changed in: linux (Ubuntu) Assignee: (unassigned) => Guilherme G. Picco

[Bug 1800562] [NEW] Replace "nousb" option in kdump command-line for the newer "usbcore.nousb"

2018-10-29 Thread Guilherme G. Piccoli
odule parameter"). We need to take this into account in kdump-tools, or else we may boot with USB enabled in kdump even though the command line says the opposite. ** Affects: makedumpfile (Ubuntu) Importance: High Assignee: Guilherme G. Piccoli (gpiccoli) Status: Confirmed ** Tags
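The replacement requested in this bug's title can be sketched as a whole-token substitution on the command line. This is only an illustration, not the kdump-tools change (kdump-tools is shell); the function name modernize_nousb and the sample command line are hypothetical.

```python
def modernize_nousb(cmdline: str) -> str:
    """Replace the standalone token "nousb" with "usbcore.nousb".

    Only exact tokens are rewritten, so an existing "usbcore.nousb"
    is left alone. Runs of whitespace are collapsed to single spaces.
    """
    tokens = cmdline.split()
    return " ".join("usbcore.nousb" if tok == "nousb" else tok for tok in tokens)


# Illustrative command line, not taken from the bug report.
print(modernize_nousb("irqpoll nousb maxcpus=1"))
```

Matching whole tokens rather than substrings avoids turning "usbcore.nousb" into "usbcore.usbcore.nousb" if the function is applied twice.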

[Bug 1800566] [NEW] Make the reset_devices parameter default for kdump kernels

2018-10-29 Thread Guilherme G. Piccoli
Public bug reported: The kernel has the "reset_devices" parameter, which drivers can opt in to in order to perform special handling when this parameter is parsed from the command line. For example, in kdump kernels it hints to the drivers that they may be booting from a non-healthy condition and need to issue

[Bug 1800873] [NEW] Add KDUMP_CMDLINE_REMOVE option to remove portions of kernel command-line

2018-10-31 Thread Guilherme G. Piccoli
posed here is KDUMP_CMDLINE_REMOVE, which would tentatively "sed"-out the options from the kernel command-line before appending the new ones from KDUMP_CMDLINE_APPEND. ** Affects: makedumpfile (Ubuntu) Importance: Low Assignee: Guilherme G. Piccoli (gpiccoli) Status:
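The remove-then-append order described for KDUMP_CMDLINE_REMOVE can be sketched as below. This is a hedged illustration, not the kdump-tools implementation (which would use sed in shell); it assumes exact whole-token matching, which the snippet does not specify, and the function name and sample values are hypothetical.

```python
def build_kdump_cmdline(current: str, remove: str, append: str) -> str:
    """Drop every token listed in `remove` from `current`, then append `append`.

    Assumption: options are matched as exact whitespace-separated tokens.
    """
    unwanted = set(remove.split())
    kept = [tok for tok in current.split() if tok not in unwanted]
    return " ".join(kept + append.split())


# Illustrative values only.
print(build_kdump_cmdline(
    current="root=/dev/sda1 ro quiet splash",
    remove="quiet splash",
    append="nr_cpus=1 irqpoll",
))
```

Removing first and appending afterwards means an option named in both lists still ends up on the final command line.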

[Bug 1749961] Re: xhci_hcd: TRB DMA errors reported with ASMedia ASM1142 USB 3.1 Controller

2018-10-05 Thread Guilherme G. Piccoli
nux (Debian) Assignee: (unassigned) => Guilherme G. Piccoli (gpiccoli) https://bugs.launchpad.net/bugs/1749961

[Bug 1791758] Re: ldisc crash on reopened tty

2018-10-02 Thread Guilherme G. Piccoli
905f05 - - $ git describe --contains 71472fa9c52b1da27663c275d416d8654b905f05 - v4.12-rc1 + ** Changed in: linux (Ubuntu) Assignee: (unassigned) => Guilherme G. Piccoli (gpiccoli) ** Changed in: linux (Ubuntu Trusty) Assignee: (unassigned) => Guilherme G. Piccoli (gpiccoli)

[Bug 1791758] Re: ldisc crash on reopened tty

2018-10-02 Thread Guilherme G. Piccoli
Kernel core dump analysis: crash> set PID: 23697 COMMAND: "kworker/u82:0" TASK: 88370bcfaa80 [THREAD_INFO: 883708104000] CPU: 33 STATE: TASK_RUNNING (PANIC) crash> bt PID: 23697 TASK: 88370bcfaa80 CPU: 33 COMMAND: "kworker/u82:0" [...] #3 [883708107b78] __bad_area_nosemaphore at

[Bug 1791758] Re: ldisc crash on reopened tty

2018-10-02 Thread Guilherme G. Piccoli
: Confirmed => In Progress ** Changed in: linux (Ubuntu Xenial) Assignee: (unassigned) => Guilherme G. Piccoli (gpiccoli) ** Changed in: linux (Ubuntu Bionic) Assignee: (unassigned) => Guilherme G. Piccoli (gpiccoli) ** Description changed: [Impact] The fo

[Bug 1794877] Re: Crash in ixgbe, during tx packet xmit (while potentially changing queues number)

2018-10-02 Thread Guilherme G. Piccoli
** Changed in: linux (Ubuntu Xenial) Status: New => Confirmed ** Changed in: linux (Ubuntu) Status: Confirmed => In Progress ** Changed in: linux (Ubuntu Xenial) Assignee: (unassigned) => Guilherme G. Piccoli (gpiccoli)

[Bug 1792660] Re: nvme name floated after boot with 4.15.0 kernel

2018-10-02 Thread Guilherme G. Piccoli
I've just tested a mainline kernel version 4.17.0, and nvme names didn't float when using the kernel parameter "nvme_core.multipath=0", which reinforces that the fix patch is present in 4.17, so Guo: I guess your 4.17 version is really based on 4.17-rc1. Let me know if there's anything else to

[Bug 1792660] Re: nvme name floated after boot with 4.15.0 kernel

2018-09-28 Thread Guilherme G. Piccoli
Hi Mao, I see...the partner is not using a regular Ubuntu build. This patch was introduced upstream in kernel 4.17, so it's not present in a regular 4.15 or 4.16. It is present in our Ubuntu kernel though, because it was backported and added, but I can't guarantee it's present in the custom

[Bug 1792660] Re: nvme name floated after boot with 4.15.0 kernel

2018-10-11 Thread Guilherme G. Piccoli
You're very welcome Guo! I'll mark this as resolved; if you have questions, feel free to comment here. Cheers, Guilherme ** Changed in: linux (Ubuntu) Status: Triaged => Fix Released ** Changed in: linux (Ubuntu Bionic) Status: Triaged => Fix Released

[Bug 1792660] Re: nvme name floated after boot with 4.15.0 kernel

2018-10-01 Thread Guilherme G. Piccoli
Hi Guo, thanks for your tests! So, to confirm: a) With Ubuntu kernel 4.15.0-34, using the kernel parameter "nvme_core.multipath=0", you _don't_ see the issue; b) With kernel 4.17.0-041700rc1-generic, even using the parameter "nvme_core.multipath=0", you *can reproduce* the issue. Right? I've

[Bug 1792660] Re: nvme name floated after boot with 4.15.0 kernel

2018-09-24 Thread Guilherme G. Piccoli
** Changed in: linux (Ubuntu) Assignee: (unassigned) => Guilherme G. Piccoli (gpiccoli) ** Changed in: linux (Ubuntu Bionic) Assignee: (unassigned) => Guilherme G. Piccoli (gpiccoli)

[Bug 1792660] Re: nvme name floated after boot with 4.15.0 kernel

2018-09-24 Thread Guilherme G. Piccoli
Hi Zhanglei, I've faced this issue some time ago - in fact, it's not a bug, but some annoyance caused by the multipath introduction in the nvme driver. It started recently, after [0] - the introduction of NVMe multipath brought a change in the way namespaces' "identity" is calculated. Basically,

[Bug 1792660] Re: nvme name floated after boot with 4.15.0 kernel

2018-09-26 Thread Guilherme G. Piccoli
Hi Zhanglei, thanks for the test and screenshot. I can't say for sure, but based on the screenshot, seems they are still running 4.15.0-29 - I'm seeing the BOOT_IMAGE entry in the /proc/cmdline. Specifically, this doesn't mean much (one can boot like a 4.4 kernel and add a BOOT_IMAGE of a 4.15,
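The check discussed in this thread (which BOOT_IMAGE was booted, and whether "nvme_core.multipath=0" was passed) comes down to reading the kernel command line. A minimal sketch of that parsing follows; the sample command line is hypothetical and not taken from the reporter's screenshot, and quoted parameter values are not handled.

```python
def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into a {name: value} dict.

    Parameters without '=' (e.g. "ro") map to None. Quoted values
    containing spaces are NOT handled; this is a simplification.
    """
    params = {}
    for token in cmdline.split():
        # partition() splits on the first '=', so "root=UUID=abcd" keeps
        # "UUID=abcd" as the value.
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


# Hypothetical command line, for illustration only.
sample = (
    "BOOT_IMAGE=/boot/vmlinuz-4.15.0-34-generic "
    "root=/dev/nvme0n1p1 ro nvme_core.multipath=0"
)
params = parse_cmdline(sample)
print(params.get("BOOT_IMAGE"))
print(params.get("nvme_core.multipath"))
```

On a live system the input would be the contents of /proc/cmdline. As the comment above notes, BOOT_IMAGE only reflects what the bootloader passed, so comparing it against the output of "uname -r" is the safer confirmation of the running kernel.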

[Bug 1794877] [NEW] Crash in ixgbe, during tx packet xmit (while potentially changing queues number)

2018-09-27 Thread Guilherme G. Piccoli
0f 43 53 54 [28663.498992] RIP [] ixgbe_xmit_frame_ring+0x81/0xf50 [ixgbe] [28663.512112] RSP [28663.518217] CR2: 0058 ** Affects: linux (Ubuntu) Importance: High Assignee: Guilherme G. Piccoli (gpiccoli) Status: Confirmed ** Tags: sts

[Bug 1764246] Re: kdump kernel panics on Bionic

2018-09-27 Thread Guilherme G. Piccoli
I tried here with a Bionic VM (kernel 4.15.0-34) and kdump worked fine, using the default config (128M reserved for the crash kernel). I'll try to mimic the HW of Daniel's guest and perhaps the kernel version, although it's super old and seems not available anymore. @Paul, perhaps worth for you

[Bug 1794877] Re: Crash in ixgbe, during tx packet xmit (while potentially changing queues number)

2018-09-27 Thread Guilherme G. Piccoli
** Attachment added: "ethtool_-i_eth5.txt" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1794877/+attachment/5193805/+files/ethtool_-i_eth5.txt

[Bug 1794877] Re: Crash in ixgbe, during tx packet xmit (while potentially changing queues number)

2018-09-27 Thread Guilherme G. Piccoli
lspci -nn output for this adapter: 04:00.1 Ethernet controller [0200]: Intel Corporation 82599 10 Gigabit Dual Port Backplane Connection [8086:10f8] (rev 01) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz-

[Bug 1794877] Re: Crash in ixgbe, during tx packet xmit (while potentially changing queues number)

2018-09-27 Thread Guilherme G. Piccoli
** Attachment added: "disassembly of relevant functions in crash" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1794877/+attachment/5193816/+files/ixgbe_xmit_ring.asm

[Bug 1794877] Re: Crash in ixgbe, during tx packet xmit (while potentially changing queues number)

2018-09-27 Thread Guilherme G. Piccoli
A preliminary analysis of the problem, based on a collected crash dump. From dmesg, we have [28663.018356] BUG: unable to handle kernel NULL pointer dereference at 0058 [28663.026266] IP: [] ixgbe_xmit_frame_ring+0x81/0xf50 [ixgbe] Using addr2line to validate the line in the
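The "symbol+offset/size [module]" notation in the dmesg lines above is the usual kernel oops format: the faulting instruction is 0x81 bytes into ixgbe_xmit_frame_ring, which is 0xf50 bytes long and lives in the ixgbe module. As a rough sketch (the helper name and regular expression are illustrative, not part of any kernel tooling), the fields can be extracted like this before handing the offset to a tool such as addr2line:

```python
import re

# Matches e.g. "ixgbe_xmit_frame_ring+0x81/0xf50 [ixgbe]".
# The trailing "[module]" part is optional (absent for built-in code).
_SYM_RE = re.compile(
    r"(?P<symbol>[A-Za-z_][\w.]*)"
    r"\+0x(?P<offset>[0-9a-fA-F]+)"
    r"/0x(?P<size>[0-9a-fA-F]+)"
    r"(?:\s+\[(?P<module>[^\]]+)\])?"
)


def parse_oops_symbol(text: str):
    """Return symbol, offset, size and module from an oops line, or None."""
    match = _SYM_RE.search(text)
    if match is None:
        return None
    return {
        "symbol": match.group("symbol"),
        "offset": int(match.group("offset"), 16),  # bytes into the function
        "size": int(match.group("size"), 16),      # total function length
        "module": match.group("module"),           # None when not in a module
    }


info = parse_oops_symbol("IP: [] ixgbe_xmit_frame_ring+0x81/0xf50 [ixgbe]")
print(info)
```

The offset is relative to the start of the function, so resolving it to a source line also requires the function's address in a module object built with debug information.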

[Bug 1794877] Re: Crash in ixgbe, during tx packet xmit (while potentially changing queues number)

2018-09-27 Thread Guilherme G. Piccoli
I was reading assembly and comparing with the code to evaluate the accuracy of the registers during the dump, and also some points in which it could have failed. In the tx transmit function of ixgbe - ixgbe_xmit_frame(), we have the following: tx_ring = ring ? ring :

[Bug 1794877] Re: Crash in ixgbe, during tx packet xmit (while potentially changing queues number)

2018-09-27 Thread Guilherme G. Piccoli
** Attachment added: "lspci_-nnvv.txt" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1794877/+attachment/5193804/+files/lspci_-nnvv.txt

[Bug 1810328] Re: iommu - need to effectively disable iommu if "intel_iommu=off" is passed as a kernel parameter

2019-01-02 Thread Guilherme G. Piccoli
** Attachment added: "Fix scenario (iommu correctly disabled with the proposed patch)" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1810328/+attachment/5226417/+files/iommu-patched

[Bug 1810328] [NEW] iommu - need to effectively disable iommu if "intel_iommu=off" is passed as a kernel parameter

2019-01-02 Thread Guilherme G. Piccoli
the iommu after it was enabled before (which would show that the patch works fine indeed, by allowing iommu to get disabled) - some drivers/devices may require iommu to be enabled in order to work; in such a case, disabling it would prevent the device from working. ** Affects: linux (Ubuntu) Importanc

[Bug 1810328] Re: iommu - need to effectively disable iommu if "intel_iommu=off" is passed as a kernel parameter

2019-01-02 Thread Guilherme G. Piccoli
** Attachment added: "Issue scenario (iommu couldn't get disabled without the patch)" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1810328/+attachment/5226416/+files/iommu-missing

[Bug 1810328] Re: iommu - need to effectively disable iommu if "intel_iommu=off" is passed as a kernel parameter

2019-01-02 Thread Guilherme G. Piccoli
Patch submitted to kernel-team mailing list: https://lists.ubuntu.com/archives/kernel-team/2019-January/097452.html

[Bug 1807393] Re: nvme - Polling on timeout

2019-01-17 Thread Guilherme G. Piccoli
The bug was verified in Xenial kernel 4.4.0-142.168 available in -proposed. I'm running in an AWS 2-cpu instance, which exhibits the issue if we run a small reproducer script (a loop that basically changes IRQ affinity for the NVMe MSIs/legacy interrupt among the CPUs and performs a 4K write to
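The reproducer script itself is not included in this excerpt. As a hedged illustration of only the affinity-rotation part of such a loop, the sketch below computes the hexadecimal CPU masks that would be written, in turn, to an interrupt's /proc/irq/<N>/smp_affinity file. The function names are hypothetical, nothing is written to the system, and the 4K write step is left out.

```python
from itertools import cycle, islice


def affinity_mask(cpu: int) -> str:
    """Hex bitmask selecting a single CPU, as used by smp_affinity."""
    if cpu < 0:
        raise ValueError("cpu index must be non-negative")
    return format(1 << cpu, "x")


def rotate_masks(num_cpus: int, iterations: int) -> list:
    """Masks for moving an IRQ across CPUs 0..num_cpus-1, round-robin."""
    cpus = islice(cycle(range(num_cpus)), iterations)
    return [affinity_mask(cpu) for cpu in cpus]


# Two CPUs, as on the instance described above: the mask alternates 1, 2.
print(rotate_masks(2, 4))
```

This is adequate for small CPU counts such as the 2-CPU case here; on machines with more than 32 CPUs the kernel displays the mask as comma-separated 32-bit groups, which this sketch does not produce.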

[Bug 1810328] Re: iommu - need to effectively disable iommu if "intel_iommu=off" is passed as a kernel parameter

2019-01-17 Thread Guilherme G. Piccoli
Verification was done in Xenial kernel 4.4.0-142, available in -proposed pocket. Basically, a regular boot with "intel_iommu=on" was completed, and then we kexec'ed the same kernel, but now with "intel_iommu=off". Everything works as expected; before this patch inclusion, we saw DMAR errors (as

[Bug 1791758] Re: ldisc crash on reopened tty

2019-01-17 Thread Guilherme G. Piccoli
We're still pending the Xenial verification; I'm waiting for the trusty-HWE kernel to be released in -proposed, since we have a user capable of reproducing the crash in that version, so I'm planning to ask them to try in the trusty-HWE package.

[Bug 1791758] Re: ldisc crash on reopened tty

2019-01-17 Thread Guilherme G. Piccoli
I was able to verify/test in both Bionic and Cosmic proposed kernels, respectively: 4.15.0-44.47 and 4.18.0-14.15. I don't have a reproducer, but to exercise the paths modified by the patches, the following approach was taken: (a) Open ssh connection to the host/test machine, and run the
