Re: [PVE-User] Proxmox 6 - disk problem

2019-08-22 Thread lord_Niedzwiedz

Hello,

The disks are NVMe (M.2), attached through PCIe adapter cards.
Until now everything worked fine and never hung on Proxmox 5.4.

root@tomas:/var/log# smartctl -a /dev/sda
=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda ES.2
Device Model: ST31000340NS
Serial Number:    9QJ2LV6L
LU WWN Device Id: 5 000c50 01082a141
Firmware Version: SN05
User Capacity:    1,000,203,804,160 bytes [1.00 TB]
Sector Size:  512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:    In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Thu Aug 22 10:16:10 2019 CEST

==> WARNING: There are known problems with these drives,
see the following Seagate web pages:
http://knowledge.seagate.com/articles/en_US/FAQ/207931en
http://knowledge.seagate.com/articles/en_US/FAQ/207963en

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)    Offline data collection activity
                    was completed without error.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:  (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (  642) seconds.
Offline data collection
capabilities:              (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:    (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:    (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   1) minutes.
Extended self-test routine
recommended polling time:      ( 237) minutes.
Conveyance self-test routine
recommended polling time:      (   2) minutes.
SCT capabilities:        (0x003d)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   078   063   044    Pre-fail  Always       -       60157077
  3 Spin_Up_Time            0x0003   099   099   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       119
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       3
  7 Seek_Error_Rate         0x000f   080   060   030    Pre-fail  Always       -       22054801530
  9 Power_On_Hours          0x0032   089   011   000    Old_age   Always       -       10379
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   037   020    Old_age   Always       -       120
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   067   048   045    Old_age   Always       -       33 (Min/Max 28/34)
194 Temperature_Celsius     0x0022   033   052   000    Old_age   Always       -       33 (0 15 0 0 0)
195 Hardware_ECC_Recovered  0x001a   027   008   000    Old_age   Always       -       60157077
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
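
For anyone who wants to scan an attribute table like the one above by script rather than by eye, here is a minimal Python sketch. It is only an illustration, not part of smartmontools: the function names, the `margin` parameter and the `WATCH_RAW` list are my own assumptions, and it expects each attribute on a single unwrapped line, as `smartctl -A` prints it.

```python
import re

# One attribute row: ID, name, flag, value, worst, threshold, type,
# updated, when_failed, raw value (the raw value may contain spaces).
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]+)\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"(\S+)\s+(\S+)\s+(\S+)\s+(.+?)\s*$"
)

# Hypothetical watch list (an assumption): counters where any non-zero
# raw value deserves a closer look.
WATCH_RAW = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "Reported_Uncorrect",
}


def parse_attributes(text):
    """Turn attribute rows into dicts; lines that do not match are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append({
                "id": int(m.group(1)),
                "name": m.group(2),
                "value": int(m.group(4)),
                "worst": int(m.group(5)),
                "thresh": int(m.group(6)),
                "raw": m.group(10),
            })
    return rows


def suspicious(rows, margin=5):
    """Note rows whose worst value came within `margin` of a non-zero
    threshold, or whose watched raw counter is non-zero."""
    notes = []
    for r in rows:
        if r["thresh"] > 0 and r["worst"] - r["thresh"] <= margin:
            notes.append(
                f'{r["name"]}: worst {r["worst"]} near threshold {r["thresh"]}'
            )
        if r["name"] in WATCH_RAW and r["raw"].split()[0] != "0":
            notes.append(f'{r["name"]}: raw value {r["raw"]}')
    return notes


# Three rows copied from the output above, one attribute per line.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail Always   -   3
190 Airflow_Temperature_Cel 0x0022   067   048   045    Old_age Always   -   33 (Min/Max 28/34)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age Always   -   0
"""

if __name__ == "__main__":
    for note in suspicious(parse_attributes(SAMPLE)):
        print(note)
```

On the three sample rows this reports the reallocated sector count and the airflow temperature margin. It is a rough filter only and says nothing about whether this disk is related to the hang.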


SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1    0    0  Not_testing
    2    0    0  Not_testing
    3    0    0  Not_testing
    4    0    0  Not_testing
    5    0 

Re: [PVE-User] Proxmox 6 - disk problem

2019-08-22 Thread Eneko Lacunza

Hi,

So what disks/RAID controller are there on the server? :)

My guess is that a disk has failed :) Did you try smartctl?

Also, I think attachments are stripped off :)

Cheers

On 22/8/19 at 10:03, lord_Niedzwiedz wrote:


--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943569206
Astigarraga bidea 2, 2º izq. oficina 11; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

___
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user


[PVE-User] Proxmox 6 - disk problem

2019-08-22 Thread lord_Niedzwiedz

CPU usage    0.04% of 32 CPU(s)
IO delay    20.38%        !!
Load average    37.97,37.26,30.31
RAM usage    45.25% (56.93 GiB of 125.81 GiB)
KSM sharing    0 B
HD space(root)    0.53% (1.32 GiB of 247.29 GiB)
SWAP usage        N/A
CPU(s)        32 x AMD EPYC 7281 16-Core Processor (1 Socket)
Kernel Version        Linux 5.0.15-1-pve #1 SMP PVE 5.0.15-1 (Wed, 03 Jul 2019 10:51:57 +0200)

PVE Manager Version        pve-manager/6.0-4/2a719255

Proxmox is running very slowly.
I stopped all VMs.

htop     -    shows nothing
iotop    -    shows nothing


If I run the command:
# sync
- the shell hangs !! ;/


The same happens with:
root@tomas:~# pveperf
CPU BOGOMIPS:  134377.28
REGEX/SECOND:  2100393
HD SIZE:   247.29 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND: 531.28

^C^Z
[1]+  Stopped pveperf
root@tomas:~# ^C

After this:    IO delay         40%


On the physical console I have:
INFO: task zvol:554 blocked for more than 120 seconds.
Tainted:    P    O    5.0.15-1-pve #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task txg_quiesce:1007 blocked for more than 120 seconds.
Tainted:    P    O    5.0.15-1-pve #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task kvm:27326 blocked for more than 120 seconds.
Tainted:    P    O    5.0.15-1-pve #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task kvm:8930 blocked for more than 120 seconds.
Tainted:    P    O    5.0.15-1-pve #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task zvol:26963 blocked for more than 120 seconds.
Tainted:    P    O    5.0.15-1-pve #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task zvol:26967 blocked for more than 120 seconds.
Tainted:    P    O    5.0.15-1-pve #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task zvol:26972 blocked for more than 120 seconds.
Tainted:    P    O    5.0.15-1-pve #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task zvol:26974 blocked for more than 120 seconds.
Tainted:    P    O    5.0.15-1-pve #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task zvol:26976 blocked for more than 120 seconds.
Tainted:    P    O    5.0.15-1-pve #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task zvol:26980 blocked for more than 120 seconds.

At the end, on restart, I have:
[  !!  ]  Forcibly rebooting: Ctrl-Alt-Del was pressed more than 7 times within 2s
systemd-shutdown[1]: Syncing filesystems and block devices - timed out, issuing SIGKILL to PID 3940.

Started bpfilter
pvefw-logger [24351]: received terminate request (signal)
pvefw-logger [24351]: stopping pvefw logger

The server does not stop or restart   ;-/
Any ideas        ??!!

Log file attached.










