[Kernel-packages] [Bug 1757497] Re: Ubuntu 18.04 - IO Hang on some namespaces when running HTX with 16 namespaces (Bolt / NVMe)
** Changed in: ubuntu-power-systems
       Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1757497

Title:
  Ubuntu 18.04 - IO Hang on some namespaces when running HTX with 16
  namespaces (Bolt / NVMe)

Status in The Ubuntu-power-systems project:
  Fix Released
Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Bionic:
  Fix Released

Bug description:
  ---Problem Description---
  We are seeing similar IO Hang on some namespaces when running HTX 16
  namespaces on Ubuntu18.04

  ---uname output---
  Linux ltciofvtr-spoon4 4.15.0-10-generic #11-Ubuntu SMP Tue Feb 13 18:21:52 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux

  ---Additional Hardware Info---
  (Bolt / NVMe)
  0003:01:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller 172Xa [144d:a822] (rev 01)

  Machine Type = AC922

  ---Steps to Reproduce---
  1> Install Ubuntu18.04 , upgrade to 4.15.0-10 kernel
  2> Install htxubuntu-472.deb
  3> make sure you create name spaces

  #!/bin/bash
  device=/dev/nvme0
  echo $device
  nvme format $device
  nvme set-feature $device -f 0x0b --value=0x0100
  nvme delete-ns $device -n 0x
  sleep 5
  nvme list
  nvme get-log $device -l 200 -i 4
  max=`nvme id-ctrl $device | grep ^nn | awk '{print $NF}'`
  for i in $(eval echo {1..$max})
  do
    echo $i
    nvme create-ns $device --nsze=700 --ncap=700 --flbas=0 --dps=0
    nvme attach-ns $device --namespace-id=$i --controllers=`nvme list-ctrl $device | awk -F: '{print $2}'`
    sleep 2
    nvme get-log $device -l 200 -i 4
    sleep 2
  done
  nvme list

  3> run mdt.hd on those namespaces

  Contact Information = naveed...@in.ibm.com

  Stack trace output:

  Device id:/dev/nvme0n8
  Timestamp:Feb 20 16:57:30 2018
  err=
  sev=1
  Exerciser Name:hxestorage
  Serial No:Not Available
  Part No:Not Available
  Location:Not Available
  FRU Number:Not Available
  Device:Not Available
  Error Text:Hardware Exerciser stopped on error

  Device id:/dev/nvme0n10
  Timestamp:Feb 20 16:57:36 2018
  err=
  sev=1
  Exerciser Name:hxestorage
  Serial No:Not Available
  Part No:Not Available
  Location:Not Available
  FRU Number:Not Available
  Device:Not Available
  Error Text:Hung I/O alert! Segment table-1, Detected 1 I/O(s) hung.
  Current time: 1519163856; hang criteria: 600 secs, Hard hang threshold: 3
  Process ID: 0x8161
  1st lba Blocks KernelHang Duration (Hex)(Hex)ThreadCnt(Secs)
  ** Threshold of 1800 secs on one or more I/Os exceeded!
  0x5ae08b 8 7e0457eaf180 44800

  Device id:/dev/nvme0n10
  Timestamp:Feb 20 16:57:36 2018
  err=
  sev=1
  Exerciser Name:hxestorage
  Serial No:Not Available
  Part No:Not Available
  Location:Not Available
  FRU Number:Not Available
  Device:Not Available
  Error Text:Hardware Exerciser stopped on error

  Device id:/dev/nvme0n4
  Timestamp:Feb 20 17:14:19 2018
  err=
  sev=4
  Exerciser Name:hxestorage
  Serial No:Not Available
  Part No:Not Available
  Location:Not Available
  FRU Number:Not Available
  Device:Not Available
  Error Text:Hung I/O alert! Segment table-1, Detected 1 I/O(s) hung.
  Current time: 1519164859; hang criteria: 600 secs, Hard hang threshold: 3
  Process ID: 0x815b
  1st lba Blocks KernelHang Duration (Hex)(Hex)ThreadCnt(Secs)
  0x398a7e 2 71d5a180 33000

  [17643.202114] INFO: task hxestorage:39744 blocked for more than 120 seconds.
  [17643.202180] Not tainted 4.15.0-10-generic #11-Ubuntu
  [17643.202224] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [17643.202342] hxestorage D0 39744
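The hung-task warning quoted in the description above names the blocked task, its PID, and the timeout. When scanning a long `dmesg` capture for these warnings, something like the following can pull those three fields out. This is only a sketch: the function name `parse_hung_task` is made up for illustration, and it assumes the message wording matches the line quoted in this report.

```shell
#!/bin/sh
# parse_hung_task is a hypothetical helper, not part of HTX or the kernel.
# It prints "<task> <pid> <seconds>" for a hung-task warning line and
# prints nothing for any other line.
parse_hung_task() {
    printf '%s\n' "$1" |
        sed -n -E 's/.*INFO: task ([^:]+):([0-9]+) blocked for more than ([0-9]+) seconds.*/\1 \2 \3/p'
}

# Example with the line from the bug description.
parse_hung_task '[17643.202114] INFO: task hxestorage:39744 blocked for more than 120 seconds.'
# prints: hxestorage 39744 120
```

To apply it to a live system, the same `sed` expression could be fed from `dmesg` instead of a single string.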
[Kernel-packages] [Bug 1757497] Re: Ubuntu 18.04 - IO Hang on some namespaces when running HTX with 16 namespaces (Bolt / NVMe)
This bug was fixed in the package linux - 4.15.0-15.16

---
linux (4.15.0-15.16) bionic; urgency=medium

  * linux: 4.15.0-15.16 -proposed tracker (LP: #1761177)

  * FFe: Enable configuring resume offset via sysfs (LP: #1760106)
    - PM / hibernate: Make passing hibernate offsets more friendly

  * /dev/bcache/by-uuid links not created after reboot (LP: #1729145)
    - SAUCE: (no-up) bcache: decouple emitting a cached_dev CHANGE uevent

  * Ubuntu18.04:POWER9:DD2.2 - Unable to start a KVM guest with default
    machine type(pseries-bionic) complaining "KVM implementation does not
    support Transactional Memory, try cap-htm=off" (kvm) (LP: #1752026)
    - powerpc: Use feature bit for RTC presence rather than timebase presence
    - powerpc: Book E: Remove unused CPU_FTR_L2CSR bit
    - powerpc: Free up CPU feature bits on 64-bit machines
    - powerpc: Add CPU feature bits for TM bug workarounds on POWER9 v2.2
    - powerpc/powernv: Provide a way to force a core into SMT4 mode
    - KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9
    - KVM: PPC: Book3S HV: Work around XER[SO] bug in fake suspend mode
    - KVM: PPC: Book3S HV: Work around TEXASR bug in fake suspend state

  * Important Kernel fixes to be backported for Power9 (kvm) (LP: #1758910)
    - powerpc/mm: Fixup tlbie vs store ordering issue on POWER9

  * Ubuntu 18.04 - IO Hang on some namespaces when running HTX with 16
    namespaces (Bolt / NVMe) (LP: #1757497)
    - powerpc/64s: Fix lost pending interrupt due to race causing lost update
      to irq_happened

  * fwts-efi-runtime-dkms 18.03.00-0ubuntu1: fwts-efi-runtime-dkms kernel
    module failed to build (LP: #1760876)
    - [Packaging] include the retpoline extractor in the headers

linux (4.15.0-14.15) bionic; urgency=medium

  * linux: 4.15.0-14.15 -proposed tracker (LP: #1760678)

  * [Bionic] mlx4 ETH - mlnx_qos failed when set some TC to vendor
    (LP: #1758662)
    - net/mlx4_en: Change default QoS settings

  * AT_BASE_PLATFORM in AUXV is absent on kernels available on Ubuntu 17.10
    (LP: #1759312)
    - powerpc/64s: Fix NULL AT_BASE_PLATFORM when using DT CPU features

  * Bionic update to 4.15.15 stable release (LP: #1760585)
    - net: dsa: Fix dsa_is_user_port() test inversion
    - openvswitch: meter: fix the incorrect calculation of max delta_t
    - qed: Fix MPA unalign flow in case header is split across two packets.
    - tcp: purge write queue upon aborting the connection
    - qed: Fix non TCP packets should be dropped on iWARP ll2 connection
    - sysfs: symlink: export sysfs_create_link_nowarn()
    - net: phy: relax error checking when creating sysfs link netdev->phydev
    - devlink: Remove redundant free on error path
    - macvlan: filter out unsupported feature flags
    - net: ipv6: keep sk status consistent after datagram connect failure
    - ipv6: old_dport should be a __be16 in __ip6_datagram_connect()
    - ipv6: sr: fix NULL pointer dereference when setting encap source address
    - ipv6: sr: fix scheduling in RCU when creating seg6 lwtunnel state
    - mlxsw: spectrum_buffers: Set a minimum quota for CPU port traffic
    - net: phy: Tell caller result of phy_change()
    - ipv6: Reflect MTU changes on PMTU of exceptions for MTU-less routes
    - net sched actions: return explicit error when tunnel_key mode is not
      specified
    - ppp: avoid loop in xmit recursion detection code
    - rhashtable: Fix rhlist duplicates insertion
    - test_rhashtable: add test case for rhltable with duplicate objects
    - kcm: lock lower socket in kcm_attach
    - sch_netem: fix skb leak in netem_enqueue()
    - ieee802154: 6lowpan: fix possible NULL deref in lowpan_device_event()
    - net: use skb_to_full_sk() in skb_update_prio()
    - net: Fix hlist corruptions in inet_evict_bucket()
    - s390/qeth: free netdevice when removing a card
    - s390/qeth: when thread completes, wake up all waiters
    - s390/qeth: lock read device while queueing next buffer
    - s390/qeth: on channel error, reject further cmd requests
    - soc/fsl/qbman: fix issue in qman_delete_cgr_safe()
    - dpaa_eth: fix error in dpaa_remove()
    - dpaa_eth: remove duplicate initialization
    - dpaa_eth: increment the RX dropped counter when needed
    - dpaa_eth: remove duplicate increment of the tx_errors counter
    - dccp: check sk for closed state in dccp_sendmsg()
    - ipv6: fix access to non-linear packet in ndisc_fill_redirect_hdr_option()
    - l2tp: do not accept arbitrary sockets
    - net: ethernet: arc: Fix a potential memory leak if an optional regulator
      is deferred
    - net: ethernet: ti: cpsw: add check for in-band mode setting with RGMII
      PHY interface
    - net: fec: Fix unbalanced PM runtime calls
    - net/iucv: Free memory obtained by kzalloc
    - netlink: avoid a double skb free in genlmsg_mcast()
    - net: Only honor ifindex in IP_PKTINFO if non-0
    - net: systemport: Rewrite __bcm_sysport_tx_reclaim()
    -
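The changelog above lists the fix for this bug under linux 4.15.0-15.16, while the report was filed against 4.15.0-10-generic. A rough way to tell whether a given Bionic kernel release string is at or past the fixed upload is sketched below. It rests on two assumptions that are not stated in this thread: that package version 4.15.0-15.16 shows up in `uname -r` as ABI number 15 (for example `4.15.0-15-generic`), and that only the 4.15.0 series matters. The function name `kernel_has_fix` is hypothetical, and the comparison is deliberately simpler than full Debian version ordering.

```shell
#!/bin/sh
# kernel_has_fix is a hypothetical helper. It succeeds (exit 0) when a
# release string of the form "4.15.0-<abi>-<flavour>" has ABI >= 15.
# Assumption: ABI 15 corresponds to the 4.15.0-15.16 upload in the changelog.
kernel_has_fix() {
    release=$1
    base=${release%%-*}     # e.g. "4.15.0"
    rest=${release#*-}      # e.g. "10-generic"
    abi=${rest%%-*}         # e.g. "10"
    [ "$base" = "4.15.0" ] && [ "$abi" -ge 15 ] 2>/dev/null
}

# Example: the kernel from the report and the first fixed ABI.
for r in 4.15.0-10-generic 4.15.0-15-generic; do
    if kernel_has_fix "$r"; then
        echo "$r: fix expected"
    else
        echo "$r: fix not expected"
    fi
done
```

On a real system the argument would typically be `"$(uname -r)"`. Where `dpkg` is available, `dpkg --compare-versions` against the installed package version is the more robust check.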
[Kernel-packages] [Bug 1757497] Re: Ubuntu 18.04 - IO Hang on some namespaces when running HTX with 16 namespaces (Bolt / NVMe)
** Changed in: linux (Ubuntu Bionic)
       Status: In Progress => Fix Committed

Status in The Ubuntu-power-systems project:
  Fix Committed
Status in linux package in Ubuntu:
  Fix Committed
Status in linux source package in Bionic:
  Fix Committed
[Kernel-packages] [Bug 1757497] Re: Ubuntu 18.04 - IO Hang on some namespaces when running HTX with 16 namespaces (Bolt / NVMe)
** Changed in: ubuntu-power-systems
       Status: In Progress => Fix Committed

Status in The Ubuntu-power-systems project:
  Fix Committed
Status in linux package in Ubuntu:
  Fix Committed
Status in linux source package in Bionic:
  Fix Committed
[Kernel-packages] [Bug 1757497] Re: Ubuntu 18.04 - IO Hang on some namespaces when running HTX with 16 namespaces (Bolt / NVMe)
Bionic request submitted:
https://lists.ubuntu.com/archives/kernel-team/2018-April/091346.html

Status in The Ubuntu-power-systems project:
  In Progress
Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Bionic:
  In Progress
[Kernel-packages] [Bug 1757497] Re: Ubuntu 18.04 - IO Hang on some namespaces when running HTX with 16 namespaces (Bolt / NVMe)
** Changed in: ubuntu-power-systems
       Status: Triaged => In Progress

Status in The Ubuntu-power-systems project:
  In Progress
Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Bionic:
  In Progress
[Kernel-packages] [Bug 1757497] Re: Ubuntu 18.04 - IO Hang on some namespaces when running HTX with 16 namespaces (Bolt / NVMe)
** Changed in: linux (Ubuntu Bionic)
       Status: Triaged => In Progress

** Changed in: linux (Ubuntu Bionic)
     Assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) => Joseph Salisbury (jsalisbury)

Status in The Ubuntu-power-systems project:
  Triaged
Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Bionic:
  In Progress
[Kernel-packages] [Bug 1757497] Re: Ubuntu 18.04 - IO Hang on some namespaces when running HTX with 16 namespaces (Bolt / NVMe)
** Tags added: triage-g

Status in The Ubuntu-power-systems project:
  Triaged
Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Bionic:
  Triaged
[Kernel-packages] [Bug 1757497] Re: Ubuntu 18.04 - IO Hang on some namespaces when running HTX with 16 namespaces (Bolt / NVMe)
** Changed in: ubuntu-power-systems
       Status: New => Triaged

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1757497

Title:
  Ubuntu 18.04 - IO Hang on some namespaces when running HTX with 16
  namespaces (Bolt / NVMe)

Status in The Ubuntu-power-systems project:
  Triaged
Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Bionic:
  Triaged

Bug description:
  ---Problem Description---
  We are seeing similar IO Hang on some namespaces when running HTX 16
  namespaces on Ubuntu18.04

  ---uname output---
  Linux ltciofvtr-spoon4 4.15.0-10-generic #11-Ubuntu SMP Tue Feb 13 18:21:52 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux

  ---Additional Hardware Info---
  (Bolt / NVMe)
  0003:01:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller 172Xa [144d:a822] (rev 01)

  Machine Type = AC922

  ---Steps to Reproduce---
  1> Install Ubuntu18.04 , upgrade to 4.15.0-10 kernel
  2> Install htxubuntu-472.deb
  3> make sure you create name spaces

  #!/bin/bash
  device=/dev/nvme0
  echo $device
  nvme format $device
  nvme set-feature $device -f 0x0b --value=0x0100
  nvme delete-ns $device -n 0x
  sleep 5
  nvme list
  nvme get-log $device -l 200 -i 4
  max=`nvme id-ctrl $device | grep ^nn | awk '{print $NF}'`
  for i in $(eval echo {1..$max})
  do
    echo $i
    nvme create-ns $device --nsze=700 --ncap=700 --flbas=0 --dps=0
    nvme attach-ns $device --namespace-id=$i --controllers=`nvme list-ctrl $device | awk -F: '{print $2}'`
    sleep 2
    nvme get-log $device -l 200 -i 4
    sleep 2
  done
  nvme list

  3> run mdt.hd on those namespaces

  Contact Information = naveed...@in.ibm.com

  Stack trace output:
  - -
  Device id:/dev/nvme0n8
  Timestamp:Feb 20 16:57:30 2018
  err= sev=1
  Exerciser Name:hxestorage
  Serial No:Not Available
  Part No:Not Available
  Location:Not Available
  FRU Number:Not Available
  Device:Not Available
  Error Text:Hardware Exerciser stopped on error
  - -
  Device id:/dev/nvme0n10
  Timestamp:Feb 20 16:57:36 2018
  err= sev=1
  Exerciser Name:hxestorage
  Serial No:Not Available
  Part No:Not Available
  Location:Not Available
  FRU Number:Not Available
  Device:Not Available
  Error Text:Hung I/O alert! Segment table-1, Detected 1 I/O(s) hung.
  Current time: 1519163856; hang criteria: 600 secs, Hard hang threshold: 3
  Process ID: 0x8161
  1st lba Blocks KernelHang Duration
  (Hex)(Hex)ThreadCnt(Secs)
  ** Threshold of 1800 secs on one or more I/Os exceeded!
  0x5ae08b 8 7e0457eaf180 44800
  - -
  Device id:/dev/nvme0n10
  Timestamp:Feb 20 16:57:36 2018
  err= sev=1
  Exerciser Name:hxestorage
  Serial No:Not Available
  Part No:Not Available
  Location:Not Available
  FRU Number:Not Available
  Device:Not Available
  Error Text:Hardware Exerciser stopped on error
  - -
  Device id:/dev/nvme0n4
  Timestamp:Feb 20 17:14:19 2018
  err= sev=4
  Exerciser Name:hxestorage
  Serial No:Not Available
  Part No:Not Available
  Location:Not Available
  FRU Number:Not Available
  Device:Not Available
  Error Text:Hung I/O alert! Segment table-1, Detected 1 I/O(s) hung.
  Current time: 1519164859; hang criteria: 600 secs, Hard hang threshold: 3
  Process ID: 0x815b
  1st lba Blocks KernelHang Duration
  (Hex)(Hex)ThreadCnt(Secs)
  0x398a7e 2 71d5a180 33000
  -
  [17643.202114] INFO: task hxestorage:39744 blocked for more than 120 seconds.
  [17643.202180] Not tainted 4.15.0-10-generic #11-Ubuntu
  [17643.202224] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [17643.202342] hxestorage D0 39744 3424 0x0004
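The "Current time" values in the hung I/O alerts look like Unix epoch seconds. Assuming that is what they are, and assuming GNU date is available (as on Ubuntu), a minimal sketch for converting them so they can be compared with the Timestamp fields might look like this. The helper name epoch_to_utc is purely illustrative and is not part of HTX or nvme-cli.

```shell
#!/bin/bash
# Assumption: the HTX "Current time" field is seconds since the Unix epoch.
# epoch_to_utc is a hypothetical helper name used only for this sketch.
epoch_to_utc() {
    # GNU date: -u prints UTC, -d @N interprets N as epoch seconds.
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

epoch_to_utc 1519163856   # value from the alert on /dev/nvme0n10
epoch_to_utc 1519164859   # value from the alert on /dev/nvme0n4
```

Under that assumption the two values convert to 21:57:36 and 22:14:19 UTC on 2018-02-20. The minutes and seconds match the logged Timestamp fields (16:57:36 and 17:14:19), which would suggest the test machine's clock was set five hours behind UTC.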
[Kernel-packages] [Bug 1757497] Re: Ubuntu 18.04 - IO Hang on some namespaces when running HTX with 16 namespaces (Bolt / NVMe)
** Also affects: ubuntu-power-systems
   Importance: Undecided
       Status: New

** Changed in: ubuntu-power-systems
   Importance: Undecided => High

** Changed in: ubuntu-power-systems
     Assignee: (unassigned) => Canonical Kernel Team (canonical-kernel-team)

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1757497

Title:
  Ubuntu 18.04 - IO Hang on some namespaces when running HTX with 16
  namespaces (Bolt / NVMe)

Status in The Ubuntu-power-systems project:
  New
Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Bionic:
  Triaged
[Kernel-packages] [Bug 1757497] Re: Ubuntu 18.04 - IO Hang on some namespaces when running HTX with 16 namespaces (Bolt / NVMe)
** Tags added: kernel-da-key

** Also affects: linux (Ubuntu Bionic)
   Importance: Undecided
     Assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
       Status: New

** Changed in: linux (Ubuntu Bionic)
   Importance: Undecided => High

** Changed in: linux (Ubuntu Bionic)
       Status: New => Triaged

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1757497

Title:
  Ubuntu 18.04 - IO Hang on some namespaces when running HTX with 16
  namespaces (Bolt / NVMe)

Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Bionic:
  Triaged