[Kernel-packages] [Bug 1757497] Re: Ubuntu 18.04 - IO Hang on some namespaces when running HTX with 16 namespaces (Bolt / NVMe)

2018-04-23 Thread Manoj Iyer
** Changed in: ubuntu-power-systems
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1757497

Title:
  Ubuntu 18.04 - IO Hang on some namespaces when running HTX with 16
  namespaces  (Bolt / NVMe)

Status in The Ubuntu-power-systems project:
  Fix Released
Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Bionic:
  Fix Released

Bug description:
  ---Problem Description---
  We are seeing a similar I/O hang on some namespaces when running HTX with 16 namespaces on Ubuntu 18.04.
   
  ---uname output---
  Linux ltciofvtr-spoon4 4.15.0-10-generic #11-Ubuntu SMP Tue Feb 13 18:21:52 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux
   
  ---Additional Hardware Info---
  (Bolt / NVMe) 0003:01:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller 172Xa [144d:a822] (rev 01)

  Machine Type = AC922 
   
  ---Steps to Reproduce---
  1> Install Ubuntu 18.04, upgrade to the 4.15.0-10 kernel
  2> Install htxubuntu-472.deb
  3> Create the namespaces with the following script:
  #!/bin/bash

  device=/dev/nvme0
  echo "$device"

  nvme format "$device"

  # Feature 0x0b is Asynchronous Event Configuration; 0x0100 enables
  # namespace attribute notices
  nvme set-feature "$device" -f 0x0b --value=0x0100

  nvme delete-ns "$device" -n 0x
  sleep 5
  nvme list

  # Log page 4 is the Changed Namespace List
  nvme get-log "$device" -l 200 -i 4

  # nn = number of namespaces reported by the controller
  max=$(nvme id-ctrl "$device" | grep ^nn | awk '{print $NF}')

  for ((i = 1; i <= max; i++))
  do
    echo "$i"
    nvme create-ns "$device" --nsze=700 --ncap=700 --flbas=0 --dps=0
    nvme attach-ns "$device" --namespace-id="$i" --controllers=$(nvme list-ctrl "$device" | awk -F: '{print $2}')
    sleep 2
    nvme get-log "$device" -l 200 -i 4
    sleep 2
  done
  nvme list

  4> Run mdt.hd on those namespaces
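
The namespace-count lookup and the loop bounds used in the script can be sketched in isolation. This is a minimal sketch, not part of the original report: `parse_nn` is a hypothetical helper name, the sample text is an illustrative excerpt in the `name : value` layout that `nvme id-ctrl` is assumed to print (not captured output), and the value 16 is taken only from the bug title.

```shell
#!/bin/bash
# Hypothetical helper: print the value of the "nn" (number of namespaces)
# field from `nvme id-ctrl`-style text read on stdin.
parse_nn() {
    awk -F: '$1 ~ /^nn[ \t]*$/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# Illustrative sample input (assumed layout, made-up excerpt).
sample='vid       : 0x144d
nn        : 16'

max=$(printf '%s\n' "$sample" | parse_nn)

# A C-style loop iterates 1..max without needing eval.
for ((i = 1; i <= max; i++)); do
    echo "would create and attach namespace $i"
done
```

On a real system the sample would be replaced by `nvme id-ctrl "$device" | parse_nn`.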
   
  Contact Information = naveed...@in.ibm.com 
   
  Stack trace output:
   -

  -
  Device id:/dev/nvme0n8  
  Timestamp:Feb 20 16:57:30 2018
  err=
  sev=1
  Exerciser Name:hxestorage
  Serial No:Not Available
  Part No:Not Available
  Location:Not Available
  FRU Number:Not Available
  Device:Not Available
  Error Text:Hardware Exerciser stopped on error

  
  -

  - 
  
  Device id:/dev/nvme0n10 
  Timestamp:Feb 20 16:57:36 2018   
  err=
  sev=1
  Exerciser Name:hxestorage   
  Serial No:Not Available
  Part No:Not Available
  Location:Not Available
  FRU Number:Not Available   
  Device:Not Available
  Error Text:Hung I/O alert! Segment table-1,  Detected 1 I/O(s) hung.
  Current time: 1519163856; hang criteria: 600 secs, Hard hang threshold: 3
  Process ID: 0x8161
  1st lba (Hex)   Blocks (Hex)   Kernel Thread   Hang Cnt   Duration (Secs)
  ** Threshold of 1800 secs on one or more I/Os exceeded!
  0x5ae08b 8 7e0457eaf180  44800 

 
  -

  -
  Device id:/dev/nvme0n10 
  Timestamp:Feb 20 16:57:36 2018
  err=
  sev=1
  Exerciser Name:hxestorage
  Serial No:Not Available
  Part No:Not Available
  Location:Not Available
  FRU Number:Not Available
  Device:Not Available
  Error Text:Hardware Exerciser stopped on error

  
  -

  - 
  
  Device id:/dev/nvme0n4  
  Timestamp:Feb 20 17:14:19 2018   
  err=
  sev=4
  Exerciser Name:hxestorage   
  Serial No:Not Available
  Part No:Not Available
  Location:Not Available
  FRU Number:Not Available   
  Device:Not Available
  Error Text:Hung I/O alert! Segment table-1,  Detected 1 I/O(s) hung.
  Current time: 1519164859; hang criteria: 600 secs, Hard hang threshold: 3
  Process ID: 0x815b
  1st lba (Hex)   Blocks (Hex)   Kernel Thread   Hang Cnt   Duration (Secs)
  0x398a7e 2 71d5a180  33000 

 
  -

  [17643.202114] INFO: task hxestorage:39744 blocked for more than 120 seconds.
  [17643.202180]   Not tainted 4.15.0-10-generic #11-Ubuntu
  [17643.202224] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [17643.202342] hxestorage  D0 39744   3424 0x0004
  [17643.202346] Call Trace:
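
When triaging a log like the one above, the hung-task reports can be pulled out of kernel log text with a small filter. This is a minimal sketch under stated assumptions: `hung_tasks` is a hypothetical helper name, and it only matches the "blocked for more than N seconds" wording shown in this report.

```shell
#!/bin/bash
# Hypothetical helper: for each hung-task report on stdin, print
# "<task name> <pid> <timeout in seconds>".
hung_tasks() {
    sed -n 's/.*INFO: task \([^:]*\):\([0-9]*\) blocked for more than \([0-9]*\) seconds.*/\1 \2 \3/p'
}

# The log line quoted in the bug description.
line='[17643.202114] INFO: task hxestorage:39744 blocked for more than 120 seconds.'
printf '%s\n' "$line" | hung_tasks
```

On a live system the input would typically come from `dmesg | hung_tasks`.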

[Kernel-packages] [Bug 1757497] Re: Ubuntu 18.04 - IO Hang on some namespaces when running HTX with 16 namespaces (Bolt / NVMe)

2018-04-13 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 4.15.0-15.16
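
To check whether a given machine already runs a kernel at or beyond this release, the version strings can be compared field by field. The following is a minimal sketch, not an official tool: `version_ge` is a hypothetical helper that handles only numeric fields separated by `.` or `-`, and it assumes the usual Ubuntu convention that package 4.15.0-15.16 installs a kernel whose `uname -r` reads 4.15.0-15-<flavour>.

```shell
#!/bin/bash
# Hypothetical helper: return 0 if version $1 >= version $2.
# Only purely numeric fields separated by '.' or '-' are supported.
version_ge() {
    local IFS=.-
    local -a a=($1) b=($2)
    local i x y
    for ((i = 0; i < ${#a[@]} || i < ${#b[@]}; i++)); do
        x=${a[i]:-0}
        y=${b[i]:-0}
        if ((10#$x > 10#$y)); then return 0; fi
        if ((10#$x < 10#$y)); then return 1; fi
    done
    return 0
}

fixed_abi=4.15.0-15   # assumed uname -r prefix for package 4.15.0-15.16

# Compare the kernel from the bug report and the fixed kernel.
for release in 4.15.0-10-generic 4.15.0-15-generic; do
    abi=${release%-*}   # strip the flavour suffix, e.g. "-generic"
    if version_ge "$abi" "$fixed_abi"; then
        echo "$release: includes the fix"
    else
        echo "$release: predates the fix"
    fi
done
```

On a live system, `release=$(uname -r)` would replace the hard-coded list.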

---
linux (4.15.0-15.16) bionic; urgency=medium

  * linux: 4.15.0-15.16 -proposed tracker (LP: #1761177)

  * FFe: Enable configuring resume offset via sysfs (LP: #1760106)
    - PM / hibernate: Make passing hibernate offsets more friendly

  * /dev/bcache/by-uuid links not created after reboot (LP: #1729145)
    - SAUCE: (no-up) bcache: decouple emitting a cached_dev CHANGE uevent

  * Ubuntu18.04:POWER9:DD2.2 - Unable to start a KVM guest with default machine
    type(pseries-bionic) complaining "KVM implementation does not support
    Transactional Memory, try cap-htm=off" (kvm) (LP: #1752026)
    - powerpc: Use feature bit for RTC presence rather than timebase presence
    - powerpc: Book E: Remove unused CPU_FTR_L2CSR bit
    - powerpc: Free up CPU feature bits on 64-bit machines
    - powerpc: Add CPU feature bits for TM bug workarounds on POWER9 v2.2
    - powerpc/powernv: Provide a way to force a core into SMT4 mode
    - KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9
    - KVM: PPC: Book3S HV: Work around XER[SO] bug in fake suspend mode
    - KVM: PPC: Book3S HV: Work around TEXASR bug in fake suspend state

  * Important Kernel fixes to be backported for Power9 (kvm) (LP: #1758910)
    - powerpc/mm: Fixup tlbie vs store ordering issue on POWER9

  * Ubuntu 18.04 - IO Hang on some namespaces when running HTX with 16
    namespaces  (Bolt / NVMe) (LP: #1757497)
    - powerpc/64s: Fix lost pending interrupt due to race causing lost update to
      irq_happened

  * fwts-efi-runtime-dkms 18.03.00-0ubuntu1: fwts-efi-runtime-dkms kernel module
    failed to build (LP: #1760876)
    - [Packaging] include the retpoline extractor in the headers

linux (4.15.0-14.15) bionic; urgency=medium

  * linux: 4.15.0-14.15 -proposed tracker (LP: #1760678)

  * [Bionic] mlx4 ETH - mlnx_qos failed when set some TC to vendor
    (LP: #1758662)
    - net/mlx4_en: Change default QoS settings

  * AT_BASE_PLATFORM in AUXV is absent on kernels available on Ubuntu 17.10
    (LP: #1759312)
    - powerpc/64s: Fix NULL AT_BASE_PLATFORM when using DT CPU features

  * Bionic update to 4.15.15 stable release (LP: #1760585)
    - net: dsa: Fix dsa_is_user_port() test inversion
    - openvswitch: meter: fix the incorrect calculation of max delta_t
    - qed: Fix MPA unalign flow in case header is split across two packets.
    - tcp: purge write queue upon aborting the connection
    - qed: Fix non TCP packets should be dropped on iWARP ll2 connection
    - sysfs: symlink: export sysfs_create_link_nowarn()
    - net: phy: relax error checking when creating sysfs link netdev->phydev
    - devlink: Remove redundant free on error path
    - macvlan: filter out unsupported feature flags
    - net: ipv6: keep sk status consistent after datagram connect failure
    - ipv6: old_dport should be a __be16 in __ip6_datagram_connect()
    - ipv6: sr: fix NULL pointer dereference when setting encap source address
    - ipv6: sr: fix scheduling in RCU when creating seg6 lwtunnel state
    - mlxsw: spectrum_buffers: Set a minimum quota for CPU port traffic
    - net: phy: Tell caller result of phy_change()
    - ipv6: Reflect MTU changes on PMTU of exceptions for MTU-less routes
    - net sched actions: return explicit error when tunnel_key mode is not
      specified
    - ppp: avoid loop in xmit recursion detection code
    - rhashtable: Fix rhlist duplicates insertion
    - test_rhashtable: add test case for rhltable with duplicate objects
    - kcm: lock lower socket in kcm_attach
    - sch_netem: fix skb leak in netem_enqueue()
    - ieee802154: 6lowpan: fix possible NULL deref in lowpan_device_event()
    - net: use skb_to_full_sk() in skb_update_prio()
    - net: Fix hlist corruptions in inet_evict_bucket()
    - s390/qeth: free netdevice when removing a card
    - s390/qeth: when thread completes, wake up all waiters
    - s390/qeth: lock read device while queueing next buffer
    - s390/qeth: on channel error, reject further cmd requests
    - soc/fsl/qbman: fix issue in qman_delete_cgr_safe()
    - dpaa_eth: fix error in dpaa_remove()
    - dpaa_eth: remove duplicate initialization
    - dpaa_eth: increment the RX dropped counter when needed
    - dpaa_eth: remove duplicate increment of the tx_errors counter
    - dccp: check sk for closed state in dccp_sendmsg()
    - ipv6: fix access to non-linear packet in ndisc_fill_redirect_hdr_option()
    - l2tp: do not accept arbitrary sockets
    - net: ethernet: arc: Fix a potential memory leak if an optional regulator is
      deferred
    - net: ethernet: ti: cpsw: add check for in-band mode setting with RGMII PHY
      interface
    - net: fec: Fix unbalanced PM runtime calls
    - net/iucv: Free memory obtained by kzalloc
    - netlink: avoid a double skb free in genlmsg_mcast()
    - net: Only honor ifindex in IP_PKTINFO if non-0
    - net: systemport: Rewrite __bcm_sysport_tx_reclaim()
- 

[Kernel-packages] [Bug 1757497] Re: Ubuntu 18.04 - IO Hang on some namespaces when running HTX with 16 namespaces (Bolt / NVMe)

2018-04-03 Thread Seth Forshee
** Changed in: linux (Ubuntu Bionic)
   Status: In Progress => Fix Committed

Status in The Ubuntu-power-systems project:
  Fix Committed
Status in linux package in Ubuntu:
  Fix Committed
Status in linux source package in Bionic:
  Fix Committed

[Kernel-packages] [Bug 1757497] Re: Ubuntu 18.04 - IO Hang on some namespaces when running HTX with 16 namespaces (Bolt / NVMe)

2018-04-03 Thread Frank Heimes
** Changed in: ubuntu-power-systems
   Status: In Progress => Fix Committed

Status in The Ubuntu-power-systems project:
  Fix Committed
Status in linux package in Ubuntu:
  Fix Committed
Status in linux source package in Bionic:
  Fix Committed

[Kernel-packages] [Bug 1757497] Re: Ubuntu 18.04 - IO Hang on some namespaces when running HTX with 16 namespaces (Bolt / NVMe)

2018-04-03 Thread Joseph Salisbury
Bionic request submitted:
https://lists.ubuntu.com/archives/kernel-team/2018-April/091346.html

Status in The Ubuntu-power-systems project:
  In Progress
Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Bionic:
  In Progress

[Kernel-packages] [Bug 1757497] Re: Ubuntu 18.04 - IO Hang on some namespaces when running HTX with 16 namespaces (Bolt / NVMe)

2018-04-03 Thread Frank Heimes
** Changed in: ubuntu-power-systems
   Status: Triaged => In Progress

Status in The Ubuntu-power-systems project:
  In Progress
Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Bionic:
  In Progress

[Kernel-packages] [Bug 1757497] Re: Ubuntu 18.04 - IO Hang on some namespaces when running HTX with 16 namespaces (Bolt / NVMe)

2018-04-03 Thread Joseph Salisbury
** Changed in: linux (Ubuntu Bionic)
   Status: Triaged => In Progress

** Changed in: linux (Ubuntu Bionic)
 Assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) => Joseph Salisbury (jsalisbury)

Status in The Ubuntu-power-systems project:
  Triaged
Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Bionic:
  In Progress

[Kernel-packages] [Bug 1757497] Re: Ubuntu 18.04 - IO Hang on some namespaces when running HTX with 16 namespaces (Bolt / NVMe)

2018-03-26 Thread Frank Heimes
** Tags added: triage-g

Status in The Ubuntu-power-systems project:
  Triaged
Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Bionic:
  Triaged

Bug description:
  ---Problem Description---
  We are seeing similar IO Hang on some namespaces when running HTX 16 
namespaces on Ubuntu18.04 
   
  ---uname output---
  Linux ltciofvtr-spoon4 4.15.0-10-generic #11-Ubuntu SMP Tue Feb 13 18:21:52 
UTC 2018 ppc64le ppc64le ppc64le GNU/Linux
   
  ---Additional Hardware Info---
  (Bolt / NVMe)0003:01:00.0 Non-Volatile memory controller [0108]: Samsung 
Electronics Co Ltd NVMe SSD Controller 172Xa [144d:a822] (rev 01)

  Machine Type = AC922 
   
  ---Steps to Reproduce---
   1> Install Ubuntu18.04 , upgrade to 4.15.0-10 kernel
  2> Install htxubuntu-472.deb
  3> make sure you create name spaces 
  #!/bin/bash

  device=/dev/nvme0
  echo $device

  nvme format $device

  nvme set-feature $device -f 0x0b --value=0x0100

  nvme delete-ns $device -n 0x
  sleep 5
  nvme list

  nvme get-log $device -l 200 -i 4

  max=`nvme id-ctrl $device | grep ^nn | awk '{print $NF}'`

  for i in $(eval echo {1..$max})
  do
  echo $i
  nvme create-ns $device --nsze=700 --ncap=700 --flbas=0 --dps=0
  nvme attach-ns $device --namespace-id=$i --controllers=`nvme list-ctrl 
$device | awk -F: '{print $2}'`
  sleep 2
  nvme get-log $device -l 200 -i 4
  sleep 2
  done
  nvme list

  3> run mdt.hd on those namespaces
   
  Contact Information = naveed...@in.ibm.com 
   
  Stack trace output:
   -

  -
  Device id:/dev/nvme0n8  
  Timestamp:Feb 20 16:57:30 2018
  err=
  sev=1
  Exerciser Name:hxestorage
  Serial No:Not Available
  Part No:Not Available
  Location:Not Available
  FRU Number:Not Available
  Device:Not Available
  Error Text:Hardware Exerciser stopped on error

  
  -

  - 
  
  Device id:/dev/nvme0n10 
  Timestamp:Feb 20 16:57:36 2018   
  err=
  sev=1
  Exerciser Name:hxestorage   
  Serial No:Not Available
  Part No:Not Available
  Location:Not Available
  FRU Number:Not Available   
  Device:Not Available
  Error Text:Hung I/O alert! Segment table-1,  Detected 1 I/O(s) hung.
  Current time: 1519163856; hang criteria: 600 secs, Hard hang threshold: 3
  Process ID: 0x8161
 1st lba   Blocks   KernelHang   Duration
  (Hex)(Hex)ThreadCnt(Secs)
  ** Threshold of 1800 secs on one or more I/Os exceeded!
  0x5ae08b 8 7e0457eaf180  44800 

 
  -

  -
  Device id:/dev/nvme0n10 
  Timestamp:Feb 20 16:57:36 2018
  err=
  sev=1
  Exerciser Name:hxestorage
  Serial No:Not Available
  Part No:Not Available
  Location:Not Available
  FRU Number:Not Available
  Device:Not Available
  Error Text:Hardware Exerciser stopped on error

  
  -

  - 
  
  Device id:/dev/nvme0n4  
  Timestamp:Feb 20 17:14:19 2018   
  err=
  sev=4
  Exerciser Name:hxestorage   
  Serial No:Not Available
  Part No:Not Available
  Location:Not Available
  FRU Number:Not Available   
  Device:Not Available
  Error Text:Hung I/O alert! Segment table-1,  Detected 1 I/O(s) hung.
  Current time: 1519164859; hang criteria: 600 secs, Hard hang threshold: 3
  Process ID: 0x815b
   1st lba   Blocks   Kernel   Hang   Duration
   (Hex)     (Hex)    Thread   Cnt    (Secs)
  0x398a7e 2 71d5a180  33000 

 
  -
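
The "Current time" values in the hung I/O alerts above are Unix epoch seconds. A small sketch of how they line up with the "Timestamp" fields, using only shell arithmetic; the function name utc_time_of_day is hypothetical.

```shell
#!/bin/bash
# Print the UTC time of day (HH:MM:SS) for a Unix epoch value.
utc_time_of_day() {
    local s=$(( $1 % 86400 ))   # seconds since UTC midnight
    printf '%02d:%02d:%02d\n' $(( s / 3600 )) $(( s % 3600 / 60 )) $(( s % 60 ))
}

utc_time_of_day 1519163856   # prints 21:57:36
utc_time_of_day 1519164859   # prints 22:14:19
```

Both results are exactly five hours ahead of the corresponding Timestamp lines (16:57:36 and 17:14:19), which suggests, though the report does not say so, that the test machine's clock was set to a UTC-5 local time.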

  [17643.202114] INFO: task hxestorage:39744 blocked for more than 120 seconds.
  [17643.202180]   Not tainted 4.15.0-10-generic #11-Ubuntu
  [17643.202224] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [17643.202342] hxestorage  D0 39744   3424 0x0004
  [17643.202346] Call Trace:
  [17643.202352] 

[Kernel-packages] [Bug 1757497] Re: Ubuntu 18.04 - IO Hang on some namespaces when running HTX with 16 namespaces (Bolt / NVMe)

2018-03-21 Thread Frank Heimes
** Changed in: ubuntu-power-systems
   Status: New => Triaged

Status in The Ubuntu-power-systems project:
  Triaged
Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Bionic:
  Triaged


[Kernel-packages] [Bug 1757497] Re: Ubuntu 18.04 - IO Hang on some namespaces when running HTX with 16 namespaces (Bolt / NVMe)

2018-03-21 Thread Andrew Cloke
** Also affects: ubuntu-power-systems
   Importance: Undecided
   Status: New

** Changed in: ubuntu-power-systems
   Importance: Undecided => High

** Changed in: ubuntu-power-systems
 Assignee: (unassigned) => Canonical Kernel Team (canonical-kernel-team)

Status in The Ubuntu-power-systems project:
  New
Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Bionic:
  Triaged


[Kernel-packages] [Bug 1757497] Re: Ubuntu 18.04 - IO Hang on some namespaces when running HTX with 16 namespaces (Bolt / NVMe)

2018-03-21 Thread Joseph Salisbury
** Tags added: kernel-da-key

** Also affects: linux (Ubuntu Bionic)
   Importance: Undecided
 Assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
   Status: New

** Changed in: linux (Ubuntu Bionic)
   Importance: Undecided => High

** Changed in: linux (Ubuntu Bionic)
   Status: New => Triaged

Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Bionic:
  Triaged
