Re: [gpfsug-discuss] /sbin/rmmod mmfs26 hangs on mmshutdown

2018-07-12 Thread Sven Oehme
Hi,

The problem is the cleanup of the tokens and/or the OpenFile objects. I
suggest you open a defect for this.

sven


On Thu, Jul 12, 2018 at 8:22 AM Billich Heinrich Rainer (PSI) <
heiner.bill...@psi.ch> wrote:

>
> Hello Sven,
>
> The machine has
>
> maxFilesToCache 204800   (2M)
>
> It will become a CES node, hence the higher-than-default value. It’s just
> a 3-node cluster with a remote cluster mount and no activity (yet). But all
> three nodes are listed as token servers by ‘mmdiag --tokenmgr’.
>
> Top showed 100% idle on core 55. This matches the kernel messages about
> rmmod being stuck on core 55.
> I didn’t see a dominating thread/process, but many kernel threads showed
> 30-40% CPU; in sum they used about 50% of all CPU available.
>
> This time mmshutdown did return but left the module loaded; the next
> mmstartup tried to remove the ‘old’ module and got stuck :-(
>
> I append two links to screenshots.
>
> Thank you,
>
> Heiner
>
> https://pasteboard.co/Hu86DKf.png
> https://pasteboard.co/Hu86rg4.png
>
> If the links don’t work I can post the images to the list.

Re: [gpfsug-discuss] /sbin/rmmod mmfs26 hangs on mmshutdown

2018-07-12 Thread Billich Heinrich Rainer (PSI)


Hello Sven,

The machine has

maxFilesToCache 204800   (2M)

It will become a CES node, hence the higher-than-default value. It’s just a 
3-node cluster with a remote cluster mount and no activity (yet). But all three 
nodes are listed as token servers by ‘mmdiag --tokenmgr’.

Top showed 100% idle on core 55. This matches the kernel messages about rmmod 
being stuck on core 55.
I didn’t see a dominating thread/process, but many kernel threads showed 30-40% 
CPU; in sum they used about 50% of all CPU available.

This time mmshutdown did return but left the module loaded; the next mmstartup 
tried to remove the ‘old’ module and got stuck :-(

I append two links to screenshots.

Thank you,

Heiner

https://pasteboard.co/Hu86DKf.png
https://pasteboard.co/Hu86rg4.png

If the links don’t work I can post the images to the list.

Kernel messages:

[  857.791050] CPU: 55 PID: 16429 Comm: rmmod Tainted: GW  OEL 
   3.10.0-693.17.1.el7.x86_64 #1
[  857.842265] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS 
P89 01/22/2018
[  857.884938] task: 883ffafe8fd0 ti: 88342af3 task.ti: 
88342af3
[  857.924120] RIP: 0010:[]  [] 
compound_unlock_irqrestore+0xe/0x20
[  857.970708] RSP: 0018:88342af33d38  EFLAGS: 0246
[  857.999742] RAX:  RBX: 88207ffda068 RCX: 00e5
[  858.037165] RDX: 0246 RSI: 0246 RDI: 0246
[  858.074416] RBP: 88342af33d38 R08:  R09: 
[  858.111519] R10: 88207ffcfac0 R11: ea00fff40280 R12: 0200
[  858.148421] R13: 0001fff40280 R14: 8118cd84 R15: 88342af33ce8
[  858.185845] FS:  7fc797d1e740() GS:883fff0c() 
knlGS:
[  858.227062] CS:  0010 DS:  ES:  CR0: 80050033
[  858.257819] CR2: 004116d0 CR3: 003fc2ec CR4: 001607e0
[  858.295143] DR0:  DR1:  DR2: 
[  858.332145] DR3:  DR6: fffe0ff0 DR7: 0400
[  858.369097] Call Trace:
[  858.384829]  [] put_compound_page+0x149/0x174
[  858.416176]  [] put_page+0x45/0x50
[  858.443185]  [] cxiReleaseAndForgetPages+0xda/0x220 
[mmfslinux]
[  858.481751]  [] ? cxiDeallocPageList+0xbd/0x110 [mmfslinux]
[  858.518206]  [] cxiDeallocPageList+0x45/0x110 [mmfslinux]
[  858.554438]  [] ? _raw_spin_lock+0x10/0x30
[  858.585522]  [] cxiFreeSharedMemory+0x12a/0x130 [mmfslinux]
[  858.622670]  [] kxFreeAllSharedMemory+0xe2/0x160 [mmfs26]
[  858.659246]  [] mmfs+0xc85/0xca0 [mmfs26]
[  858.689379]  [] gpfs_clean+0x26/0x30 [mmfslinux]
[  858.722330]  [] cleanup_module+0x25/0x30 [mmfs26]
[  858.755431]  [] SyS_delete_module+0x19b/0x300
[  858.786882]  [] system_call_fastpath+0x16/0x1b
[  858.818776] Code: 89 ca 44 89 c1 4c 8d 43 10 e8 6f 2b ff ff 89 c2 48 89 13 
5b 5d c3 0f 1f 80 00 00 00 00 55 48 89 e5 f0 80 67 03 fe 48 89 f7 57 9d <0f> 1f 
44 00 00 5d c3 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44
[  859.068528] hrtimer: interrupt took 2877171 ns
[  870.517924] INFO: rcu_sched self-detected stall on CPU { 55}  (t=240003 
jiffies g=18437 c=18436 q=194992)
[  870.577882] Task dump for CPU 55:
[  870.602837] rmmod   R  running task0 16429  16374 0x0008
[  870.645206] Call Trace:
[  870.666388][] sched_show_task+0xa8/0x110
[  870.704271]  [] dump_cpu_task+0x39/0x70
[  870.738421]  [] rcu_dump_cpu_stacks+0x90/0xd0
[  870.775339]  [] rcu_check_callbacks+0x442/0x730
[  870.812353]  [] ? tick_sched_do_timer+0x50/0x50
[  870.848875]  [] update_process_times+0x46/0x80
[  870.884847]  [] tick_sched_handle+0x30/0x70
[  870.919740]  [] tick_sched_timer+0x39/0x80
[  870.953660]  [] __hrtimer_run_queues+0xd4/0x260
[  870.989276]  [] hrtimer_interrupt+0xaf/0x1d0
[  871.023481]  [] local_apic_timer_interrupt+0x35/0x60
[  871.061233]  [] smp_apic_timer_interrupt+0x3d/0x50
[  871.097838]  [] apic_timer_interrupt+0x232/0x240
[  871.133232][] ? put_page_testzero+0x8/0x15
[  871.170089]  [] put_compound_page+0x151/0x174
[  871.204221]  [] put_page+0x45/0x50
[  871.234554]  [] cxiReleaseAndForgetPages+0xda/0x220 
[mmfslinux]
[  871.275763]  [] ? cxiDeallocPageList+0xbd/0x110 [mmfslinux]
[  871.316987]  [] cxiDeallocPageList+0x45/0x110 [mmfslinux]
[  871.356886]  [] ? _raw_spin_lock+0x10/0x30
[  871.389455]  [] cxiFreeSharedMemory+0x12a/0x130 [mmfslinux]
[  871.429784]  [] kxFreeAllSharedMemory+0xe2/0x160 [mmfs26]
[  871.468753]  [] mmfs+0xc85/0xca0 [mmfs26]
[  871.501196]  [] gpfs_clean+0x26/0x30 [mmfslinux]
[  871.536562]  [] cleanup_module+0x25/0x30 [mmfs26]
[  871.572110]  [] SyS_delete_module+0x19b/0x300
[  871.606048]  [] system_call_fastpath+0x16/0x1b

--
Paul Scherrer Institut
Science IT
Heiner Billich
WHGA 106
CH 5232  Villigen PSI
056 310 36 02
https://www.psi.ch


From:  on behalf of Sven Oehme 

Reply-To: gpfsug main discussion list 
Date: Thursday 12 July 2018 at 15:42
To: gpfsug main discussion list 
Subject: Re: 

Re: [gpfsug-discuss] File placement rule for new files in directory - PATH_NAME

2018-07-12 Thread Marc A Kaplan
Why no path name in SET POOL rule?
Maybe more than one reason, but consider that in Unix the API has the 
concept of "current directory" and "create a file in the current directory", 
AND another process or thread may at any time rename (mv!) any 
directory...

So even if you "think" you know the name of the directory in which you 
are creating a file, you really don't know for sure!

So, you may ask, how does the command /bin/pwd work? It follows the 
parent inode field of each inode, searches the parent for a matching 
inode, and stashes the name in a buffer...
When it reaches the root, it prints out the apparent path it found to the 
root... which could be wrong by the time it gets there!

For example:

[root@~/gpfs-git]$mkdir -p /tmp/a/b/c/d
[root@~/gpfs-git]$cd /tmp/a/b/c/d
[root@.../c/d]$/bin/pwd
/tmp/a/b/c/d
[root@.../c/d]$pwd
/tmp/a/b/c/d

[root@.../c/d]$mv /tmp/a/b /tmp/a/b2
[root@.../c/d]$pwd
/tmp/a/b/c/d
# Bash still "thinks" it is in /tmp/a/b/c/d
[root@.../c/d]$/bin/pwd
/tmp/a/b2/c/d
# But /bin/pwd knows better
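
To make that mechanism concrete, here is a rough shell sketch of the same parent-walk (illustrative only; the real getcwd()/pwd code is in C and also compares device IDs, so it handles mount-point crossings):

#!/bin/bash
# Sketch: rebuild the "apparent" path of the current directory by walking ".."
# and matching inode numbers, the way /bin/pwd does internally.
path=""
dir="."
while true; do
    ino=$(stat -c %i "$dir")          # inode of the current level
    pino=$(stat -c %i "$dir/..")      # inode of its parent
    [ "$ino" -eq "$pino" ] && break   # "." and ".." only match at the root
    # search the parent directory for the entry that carries this inode
    name=$(find "$dir/.." -mindepth 1 -maxdepth 1 -inum "$ino" -printf '%f\n' | head -n1)
    path="/$name$path"
    dir="$dir/.."
done
echo "${path:-/}"

If a directory on the path is renamed while this loop runs, the printed path is already stale, which is exactly why a placement rule cannot rely on a stable PATH_NAME at create time.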


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] /sbin/rmmod mmfs26 hangs on mmshutdown

2018-07-12 Thread Billich Heinrich Rainer (PSI)
Hello Sven,

Thank you. I did enable numaMemoryInterleave but the issue stays.

In the meantime I switched to version 5.0.0-2 just to see if it’s version 
dependent – it’s not. All GPFS filesystems are unmounted when this happens.

At shutdown I often need to do a hard reset to force a reboot – OK, I never 
waited more than 5 minutes once I saw a hang; maybe it would recover after some 
more time.

‘rmmod mmfs26’ doesn’t hang every time, maybe at every other shutdown or 
mmstartup/mmshutdown cycle. While rmmod hangs the system seems slow; commands 
like ‘ps -efH’ or ‘history’ take a long time, some mm commands just block, and 
a few times the system became completely inaccessible.
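
A rough sketch of how the kernel-side state of the stuck rmmod could be captured before the hard reset (run as root; the sysrq output ends up in the kernel log):

pid=$(pgrep -x rmmod)
cat /proc/$pid/stack            # kernel stack of the hung task
cat /proc/$pid/wchan; echo      # kernel symbol the task is waiting in
echo w > /proc/sysrq-trigger    # dump all blocked tasks to the kernel log
dmesg | tail -n 100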

I’ll reinstall the systems and move back to 4.2.3-8 to see if this is a stable 
configuration to start from and to rule out any hardware/BIOS issues.

I append output from numactl -H below.

Cheers,

Heiner

Test with 5.0.0-2

[root@xbl-ces-2 ~]# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 36 37 38 39 40 41 42 
43 44 45 46 47 48 49 50 51 52 53
node 0 size: 130942 MB
node 0 free: 60295 MB
node 1 cpus: 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 54 55 56 57 
58 59 60 61 62 63 64 65 66 67 68 69 70 71
node 1 size: 131072 MB
node 1 free: 60042 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10

[root@xbl-ces-2 ~]# mmdiag --config | grep numaM
! numaMemoryInterleave yes

# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.10.0-693.17.1.el7.x86_64 root=/dev/mapper/vg_root-lv_root 
ro crashkernel=auto rd.lvm.lv=vg_root/lv_root console=tty0 console=ttyS0,115200 
nosmap


Example output of ps -efH during mmshutdown when rmmod did hang (last line). 
This is with 5.0.0-2. As far as I can see, all GPFS processes have already 
terminated; just rmmod remains:

root 1 0  0 14:30 ?00:00:10 /usr/lib/systemd/systemd 
--switched-root --system --deserialize 21
root  1035 1  0 14:30 ?00:00:02   
/usr/lib/systemd/systemd-journald
root  1055 1  0 14:30 ?00:00:00   /usr/sbin/lvmetad -f
root  1072 1  0 14:30 ?00:00:11   /usr/lib/systemd/systemd-udevd
root  1478 1  0 14:31 ?00:00:00   /usr/sbin/sssd -i -f
root  1484  1478  0 14:31 ?00:00:00 /usr/libexec/sssd/sssd_be 
--domain D.PSI.CH --uid 0 --gid 0 --debug-to-files
root  1486  1478  0 14:31 ?00:00:00 /usr/libexec/sssd/sssd_nss 
--uid 0 --gid 0 --debug-to-files
root  1487  1478  0 14:31 ?00:00:00 /usr/libexec/sssd/sssd_pam 
--uid 0 --gid 0 --debug-to-files
root  1479 1  0 14:31 ?00:00:00   /usr/sbin/rasdaemon -f -r
root  1482 1  0 14:31 ?00:00:04   /usr/sbin/irqbalance 
--foreground
dbus  1483 1  0 14:31 ?00:00:00   /bin/dbus-daemon --system 
--address=systemd: --nofork --nopidfile --systemd-activation
root  1496 1  0 14:31 ?00:00:00   /usr/sbin/smartd -n -q never
root  1498 1  0 14:31 ?00:00:00   /usr/sbin/gssproxy -D
nscd  1507 1  0 14:31 ?00:00:01   /usr/sbin/nscd
nrpe  1526 1  0 14:31 ?00:00:00   /usr/sbin/nrpe -c 
/etc/nagios/nrpe.cfg -d
root  1531 1  0 14:31 ?00:00:00   
/usr/lib/systemd/systemd-logind
root  1533 1  0 14:31 ?00:00:00   /usr/sbin/rpc.gssd
root  1803 1  0 14:31 ttyS000:00:00   /sbin/agetty --keep-baud 
115200 38400 9600 ttyS0 vt220
root  1804 1  0 14:31 tty1 00:00:00   /sbin/agetty --noclear tty1 
linux
root  2405 1  0 14:32 ?00:00:00   /sbin/dhclient -q -cf 
/etc/dhcp/dhclient-ib0.conf -lf /var/lib/dhclient/dhclient--ib0.l
root  2461 1  0 14:32 ?00:00:00   /usr/sbin/sshd -D
root 11561  2461  0 14:35 ?00:00:00 sshd: root@pts/0
root 11565 11561  0 14:35 pts/000:00:00   -bash
root 16024 11565  0 14:50 pts/000:00:05 ps -efH
root 11609  2461  0 14:35 ?00:00:00 sshd: root@pts/1
root 11644 11609  0 14:35 pts/100:00:00   -bash
root  2718 1  0 14:32 ?00:00:00   /usr/lpp/mmfs/bin/mmksh 
/usr/lpp/mmfs/bin/mmccrmonitor 15 0 no
root  2758 1  0 14:32 ?00:00:00   /usr/libexec/postfix/master -w
postfix   2785  2758  0 14:32 ?00:00:00 pickup -l -t unix -u
postfix   2786  2758  0 14:32 ?00:00:00 qmgr -l -t unix -u
root  3174 1  0 14:32 ?00:00:00   /usr/sbin/crond -n
ntp   3179 1  0 14:32 ?00:00:00   /usr/sbin/ntpd -u ntp:ntp -g
root  3915 1  3 14:32 ?00:00:33   python 
/usr/lpp/mmfs/bin/mmsysmon.py
root 13618 1  0 14:36 ?00:00:00   /usr/lpp/mmfs/bin/mmsdrserv 
1191 10 10 /var/adm/ras/mmsdrserv.log 8192 yes no
root 15936 1  0 14:49 pts/100:00:00   /usr/lpp/mmfs/bin/mmksh 
/usr/lpp/mmfs/bin/runmmfs
root 15992 15936  0 14:49 pts/100:00:00 /sbin/rmmod mmfs26

--
Paul Scherrer Institut
Science IT
Heiner Billich
WHGA 106
CH 5232  Villigen PSI

Re: [gpfsug-discuss] File placement rule for new files in directory

2018-07-12 Thread Uwe Falke
If that has not changed, then:
PATH_NAME is not usable for placement policies. 
Only the FILESET_NAME attribute is accepted.


One might think that PATH_NAME would be just as well known when creating a new 
file as FILESET_NAME is, but for some reason the documentation says: 
"When file attributes are referenced in initial placement rules, only the 
following attributes are valid:
FILESET_NAME, GROUP_ID, NAME, and USER_ID."
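
As an illustration of the fileset-based alternative, a minimal sketch (the filesystem name 'gpfs01' and fileset name 'ABCD' are placeholders; 'mmchpolicy ... -I test' only validates the policy without installing it):

# Placement policy using FILESET_NAME, which is valid at file-creation time
cat > /tmp/placement.pol <<'EOF'
RULE 'ABCD-rule-01' SET POOL 'fastdata' FOR FILESET ('ABCD')
RULE 'default' SET POOL 'system'
EOF

mmchpolicy gpfs01 /tmp/placement.pol -I test   # validate only
mmchpolicy gpfs01 /tmp/placement.pol           # install the placement policy
mmlspolicy gpfs01 -L                           # show the active policy rules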


 
Mit freundlichen Grüßen / Kind regards

 
Dr. Uwe Falke
 
IT Specialist
High Performance Computing Services / Integrated Technology Services / 
Data Center Services
---
IBM Deutschland
Rathausstr. 7
09111 Chemnitz
Phone: +49 371 6978 2165
Mobile: +49 175 575 2877
E-Mail: uwefa...@de.ibm.com
---
IBM Deutschland Business & Technology Services GmbH / Geschäftsführung: 
Thomas Wolter, Sven Schooß
Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, 
HRB 17122 




From:   Michal Zacek 
To: gpfsug-discuss@spectrumscale.org
Date:   12/07/2018 10:49
Subject:Re: [gpfsug-discuss] File placement rule for new files in 
directory
Sent by:gpfsug-discuss-boun...@spectrumscale.org



That's perfect, thank you both.

Best regards
Michal

On 12.7.2018 at 10:39, Smita J Raut wrote:
If ABCD is not a fileset then below rule can be used-

RULE 'ABCD-rule-01' SET POOL 'fastdata' WHERE PATH_NAME LIKE 
'/gpfs/gpfs01/ABCD/%'

Thanks,
Smita



From:Simon Thompson 
To:gpfsug main discussion list 
Date:07/12/2018 01:34 PM
Subject:Re: [gpfsug-discuss] File placement rule for new files in 
directory
Sent by:gpfsug-discuss-boun...@spectrumscale.org



Is ABCD a fileset? If so, it’s easy with something like:

RULE 'ABCD-rule-01' SET POOL 'fastdata' FOR FILESET ('ABCD-fileset-name')

Simon

On 12/07/2018, 07:56, "gpfsug-discuss-boun...@spectrumscale.org on behalf 
of zac...@img.cas.cz"  wrote:

   Hello,
   
   Is it possible to create a file placement policy for new files in one 
   directory? I need something like this --> All new files created in 
   directory "/gpfs/gpfs01/ABCD" will be stored in pool "fastdata".
   Thanks.
   
   Best regards,
   Michal
   
   
   

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss






___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

[attachment "smime.p7s" deleted by Uwe Falke/Germany/IBM] 
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss




___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Analyse steps if disk are down after reboot

2018-07-12 Thread IBM Spectrum Scale
Just to follow up on the question about where to learn why an NSD is marked 
down: you should see a message in the GPFS log, /var/adm/ras/mmfs.log.*
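
A rough sketch of that check, plus how the disks can be brought back once the underlying devices are reachable again (the filesystem name 'fs01' is a placeholder):

grep -iE 'nsd|down' /var/adm/ras/mmfs.log.latest   # why were the NSDs marked down?
mmlsdisk fs01 -e                                   # disks that are not up/ready
mmchdisk fs01 start -a                             # return the disks to service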

Regards, The Spectrum Scale (GPFS) team

--
If you feel that your question can benefit other users of Spectrum Scale 
(GPFS), then please post it to the public IBM developerWorks Forum at 
https://www.ibm.com/developerworks/community/forums/html/forum?id=----0479
. 

If your query concerns a potential software error in Spectrum Scale (GPFS) 
and you have an IBM software maintenance contract please contact 
1-800-237-5511 in the United States or your local IBM Service Center in 
other countries. 

The forum is informally monitored as time permits and should not be used 
for priority messages to the Spectrum Scale (GPFS) team.



From:   "Grunenberg, Renar" 
To: 'gpfsug main discussion list' 
Date:   07/12/2018 06:01 AM
Subject:Re: [gpfsug-discuss] Analyse steps if disk are down after 
reboot
Sent by:gpfsug-discuss-boun...@spectrumscale.org



Hello Achim, hello Simon,
first, thanks for your answers. I think Achim’s answer describes it best: 
the NSD servers (only 2) for these disks were mistakenly restarted in the 
same time window. 
 
Renar Grunenberg
Abteilung Informatik – Betrieb

HUK-COBURG
Bahnhofsplatz
96444 Coburg
Telefon: 09561 96-44110
Telefax: 09561 96-44104
E-Mail: renar.grunenb...@huk-coburg.de
Internet: www.huk.de
HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter 
Deutschlands a. G. in Coburg
Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021
Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg
Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin.
Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav 
Herøy, Dr. Jörg Rheinländer (stv.), Sarah Rössler, Daniel Thomas.
Diese Nachricht enthält vertrauliche und/oder rechtlich geschützte 
Informationen.
Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrtümlich 
erhalten haben,
informieren Sie bitte sofort den Absender und vernichten Sie diese 
Nachricht.
Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht 
ist nicht gestattet.

This information may contain confidential and/or privileged information.
If you are not the intended recipient (or have received this information 
in error) please notify the
sender immediately and destroy this information.
Any unauthorized copying, disclosure or distribution of the material in 
this information is strictly forbidden.

Von: gpfsug-discuss-boun...@spectrumscale.org [
mailto:gpfsug-discuss-boun...@spectrumscale.org] Im Auftrag von Achim 
Rehor
Gesendet: Donnerstag, 12. Juli 2018 11:47
An: gpfsug main discussion list 
Betreff: Re: [gpfsug-discuss] Analyse steps if disk are down after reboot
 
Hi Renar, 

whenever an access to an NSD happens, there is a potential that the node 
cannot access the disk, so if the (only) NSD server is down, there will be 
no chance to access the disk, and the disk will be set down.
If you have twin-tailed disks, the 'second' (or possibly some more) NSD 
server will be asked, switching to networked access, and only if that also 
fails will the disk be set to down as well. 

Not sure how your setup is, but if you reboot 2 NSD servers, and some 
client possibly did IO to a file served by just these 2, then the 'down' 
state would be explainable. 

Rebooting an NSD server should never set a disk to down, except when it was 
the only one serving that NSD.


Mit freundlichen Grüßen / Kind regards

Achim Rehor

Software Technical Support Specialist AIX / EMEA HPC Support
IBM Certified Advanced Technical Expert - Power Systems with AIX
TSCC Software Service, Dept. 7922
Global Technology Services

Phone: +49-7034-274-7862
E-Mail: achim.re...@de.ibm.com
IBM Deutschland, Am Weiher 24, 65451 Kelsterbach, Germany

IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter 
Geschäftsführung: Martin Hartmann (Vorsitzender), Norbert Janzen, Stefan 
Lutz, Nicole Reimer, Dr. Klaus Seifert, Wolfgang Wendt 
Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, 
HRB 14562 WEEE-Reg.-Nr. DE 99369940 



From:"Grunenberg, Renar" 
To:"'gpfsug-discuss@spectrumscale.org'" <
gpfsug-discuss@spectrumscale.org>
Date:12/07/2018 10:17
Subject:[gpfsug-discuss] Analyse steps if disk are down after 
reboot
Sent by:gpfsug-discuss-boun...@spectrumscale.org




Hello all,

after a reboot of two NSD servers we see that some disks in different 
filesystems are down, and we don’t see why. 
The logs (messages, dmesg, kern, ...) say nothing. We are on RHEL 7.4 
and SS 5.0.1.1.
The question now: are there any logs or structures in the GPFS daemon that 
record this situation? What was the reason why the daemon had no access to 
the disks at that startup phase?

Re: [gpfsug-discuss] Analyse steps if disk are down after reboot

2018-07-12 Thread Grunenberg, Renar
Hello Achim, hello Simon,
first, thanks for your answers. I think Achim’s answer describes it best: the 
NSD servers (only 2) for these disks were mistakenly restarted in the same time 
window.
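
A quick sketch of how to confirm which servers back each NSD and which disks ended up down (the filesystem name 'fs01' is a placeholder):

mmlsnsd -L        # NSD-to-server mapping, including the server list per disk
mmgetstate -a     # GPFS state of all nodes, e.g. the rebooted NSD servers
mmlsdisk fs01 -e  # disks of filesystem fs01 that are not up/ready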


Renar Grunenberg
Abteilung Informatik – Betrieb

HUK-COBURG
Bahnhofsplatz
96444 Coburg
Telefon:09561 96-44110
Telefax:09561 96-44104
E-Mail: renar.grunenb...@huk-coburg.de
Internet:   www.huk.de

HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter Deutschlands 
a. G. in Coburg
Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021
Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg
Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin.
Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav 
Herøy, Dr. Jörg Rheinländer (stv.), Sarah Rössler, Daniel Thomas.

Diese Nachricht enthält vertrauliche und/oder rechtlich geschützte 
Informationen.
Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrtümlich 
erhalten haben,
informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht.
Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist 
nicht gestattet.

This information may contain confidential and/or privileged information.
If you are not the intended recipient (or have received this information in 
error) please notify the
sender immediately and destroy this information.
Any unauthorized copying, disclosure or distribution of the material in this 
information is strictly forbidden.

Von: gpfsug-discuss-boun...@spectrumscale.org 
[mailto:gpfsug-discuss-boun...@spectrumscale.org] Im Auftrag von Achim Rehor
Gesendet: Donnerstag, 12. Juli 2018 11:47
An: gpfsug main discussion list 
Betreff: Re: [gpfsug-discuss] Analyse steps if disk are down after reboot

Hi Renar,

whenever an access to an NSD happens, there is a potential that the node cannot 
access the disk, so if the (only) NSD server is down, there will be no chance 
to access the disk, and the disk will be set down.
If you have twin-tailed disks, the 'second' (or possibly some more) NSD server 
will be asked, switching to networked access, and only if that also fails will 
the disk be set to down as well.

Not sure how your setup is, but if you reboot 2 NSD servers, and some client 
possibly did IO to a file served by just these 2, then the 'down' state would 
be explainable.

Rebooting an NSD server should never set a disk to down, except when it was the 
only one serving that NSD.


Mit freundlichen Grüßen / Kind regards

Achim Rehor

Software Technical Support Specialist AIX / EMEA HPC Support
IBM Certified Advanced Technical Expert - Power Systems with AIX
TSCC Software Service, Dept. 7922
Global Technology Services

Phone: +49-7034-274-7862
E-Mail: achim.re...@de.ibm.com
IBM Deutschland, Am Weiher 24, 65451 Kelsterbach, Germany

IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter
Geschäftsführung: Martin Hartmann (Vorsitzender), Norbert Janzen, Stefan Lutz, 
Nicole Reimer, Dr. Klaus Seifert, Wolfgang Wendt
Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 
14562 WEEE-Reg.-Nr. DE 99369940








From:"Grunenberg, Renar" 
mailto:renar.grunenb...@huk-coburg.de>>
To:"'gpfsug-discuss@spectrumscale.org'" 
mailto:gpfsug-discuss@spectrumscale.org>>
Date:12/07/2018 10:17
Subject:[gpfsug-discuss] Analyse steps if disk are down after reboot
Sent by:
gpfsug-discuss-boun...@spectrumscale.org





Hello all,

after a reboot of two NSD servers we see that some disks in different filesystems 
are down, and we don’t see why.
The logs (messages, dmesg, kern, ...) say nothing. We are on RHEL 7.4 and 
SS 5.0.1.1.
The question now: are there any logs or structures in the GPFS daemon that record 
this situation? What was the reason why the daemon had no access to the disks 
at that startup phase?
Any hints are appreciated.

Renar Grunenberg
Abteilung Informatik – Betrieb

HUK-COBURG
Bahnhofsplatz
96444 Coburg
Telefon: 09561 96-44110
Telefax: 09561 96-44104
E-Mail: renar.grunenb...@huk-coburg.de
Internet: www.huk.de


HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter Deutschlands 
a. G. in Coburg
Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021
Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg
Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin.
Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav 
Herøy, Dr. Jörg Rheinländer (stv.), Sarah Rössler, Daniel Thomas.

Re: [gpfsug-discuss] Analyse steps if disk are down after reboot

2018-07-12 Thread Achim Rehor
Hi Renar,

whenever an access to an NSD happens, there is a potential that the node cannot 
access the disk, so if the (only) NSD server is down, there will be no chance to 
access the disk, and the disk will be set down.
If you have twin-tailed disks, the 'second' (or possibly some more) NSD server 
will be asked, switching to networked access, and only if that also fails will 
the disk be set to down as well.

Not sure how your setup is, but if you reboot 2 NSD servers, and some client 
possibly did IO to a file served by just these 2, then the 'down' state would be 
explainable.

Rebooting an NSD server should never set a disk to down, except when it was the 
only one serving that NSD.

Mit freundlichen Grüßen / Kind regards

Achim Rehor

Software Technical Support Specialist AIX / EMEA HPC Support
IBM Certified Advanced Technical Expert - Power Systems with AIX
TSCC Software Service, Dept. 7922
Global Technology Services

Phone: +49-7034-274-7862
E-Mail: achim.re...@de.ibm.com
IBM Deutschland, Am Weiher 24, 65451 Kelsterbach, Germany

IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter
Geschäftsführung: Martin Hartmann (Vorsitzender), Norbert Janzen, Stefan Lutz, 
Nicole Reimer, Dr. Klaus Seifert, Wolfgang Wendt
Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 
14562 WEEE-Reg.-Nr. DE 99369940



From:   "Grunenberg, Renar"
To: "'gpfsug-discuss@spectrumscale.org'"
Date:   12/07/2018 10:17
Subject:    [gpfsug-discuss] Analyse steps if disk are down after reboot
Sent by:    gpfsug-discuss-boun...@spectrumscale.org



Hello all,

after a reboot of two NSD servers we see that some disks in different filesystems 
are down, and we don’t see why.
The logs (messages, dmesg, kern, ...) say nothing. We are on RHEL 7.4 and SS 5.0.1.1.
The question now: are there any logs or structures in the GPFS daemon that record 
this situation? What was the reason why the daemon had no access to the disks at 
that startup phase?
Any hints are appreciated.

Renar Grunenberg
Abteilung Informatik – Betrieb

HUK-COBURG
Bahnhofsplatz
96444 Coburg
Telefon: 09561 96-44110
Telefax: 09561 96-44104
E-Mail: renar.grunenb...@huk-coburg.de
Internet: www.huk.de

HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter Deutschlands 
a. G. in Coburg
Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021
Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg
Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin.
Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav 
Herøy, Dr. Jörg Rheinländer (stv.), Sarah Rössler, Daniel Thomas.

Diese Nachricht enthält vertrauliche und/oder rechtlich geschützte Informationen.
Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrtümlich 
erhalten haben,
informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht.
Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist 
nicht gestattet.

This information may contain confidential and/or privileged information.
If you are not the intended recipient (or have received this information in 
error) please notify the
sender immediately and destroy this information.
Any unauthorized copying, disclosure or distribution of the material in this 
information is strictly forbidden.
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] File placement rule for new files in directory

2018-07-12 Thread Michal Zacek

That's perfect, thank you both.

Best regards
Michal

On 12.7.2018 at 10:39, Smita J Raut wrote:

If ABCD is not a fileset then below rule can be used-

RULE 'ABCD-rule-01' SET POOL 'fastdata' WHERE PATH_NAME LIKE 
'/gpfs/gpfs01/ABCD/%'


Thanks,
Smita



From: Simon Thompson 
To: gpfsug main discussion list 
Date: 07/12/2018 01:34 PM
Subject: Re: [gpfsug-discuss] File placement rule for new files in 
directory

Sent by: gpfsug-discuss-boun...@spectrumscale.org




Is ABCD a fileset? If so, it’s easy with something like:

RULE 'ABCD-rule-01' SET POOL 'fastdata' FOR FILESET ('ABCD-fileset-name')

Simon

On 12/07/2018, 07:56, "gpfsug-discuss-boun...@spectrumscale.org on 
behalf of zac...@img.cas.cz"  wrote:


   Hello,

   Is it possible to create a file placement policy for new files in one
   directory? I need something like this --> All new files created in
   directory "/gpfs/gpfs01/ABCD" will be stored in pool "fastdata".
   Thanks.

   Best regards,
   Michal




___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss






___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss




smime.p7s
Description: S/MIME electronic signature
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Analyse steps if disk are down after reboot

2018-07-12 Thread Simon Thompson
How are the disks attached? We have some IB/SRP storage that is sometimes a 
little slow to appear in multipath, and we have seen this in the past (we have 
since set autoload=off and always check multipath before restarting GPFS on the 
node).
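
A minimal sketch of such a pre-start check (assumes dm-multipath; the expected path counts and the node name are site-specific):

multipath -ll                               # all multipath devices and their paths
multipathd show paths format "%d %t %T"     # per-path device / dm state / checker state
mmstartup -N $(hostname -s)                 # start GPFS on this node only once paths look healthy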

Simon

From:  on behalf of 
"renar.grunenb...@huk-coburg.de" 
Reply-To: "gpfsug-discuss@spectrumscale.org" 
Date: Thursday, 12 July 2018 at 09:17
To: "gpfsug-discuss@spectrumscale.org" 
Subject: [gpfsug-discuss] Analyse steps if disk are down after reboot

Hello all,

after a reboot of two NSD servers we see that some disks in different filesystems 
are down, and we don’t see why.
The logs (messages, dmesg, kern, ...) say nothing. We are on RHEL 7.4 and 
SS 5.0.1.1.
The question now: are there any logs or structures in the GPFS daemon that record 
this situation? What was the reason why the daemon had no access to the disks 
at that startup phase?
Any hints are appreciated.

Renar Grunenberg
Abteilung Informatik – Betrieb

HUK-COBURG
Bahnhofsplatz
96444 Coburg
Telefon: 09561 96-44110
Telefax: 09561 96-44104
E-Mail: renar.grunenb...@huk-coburg.de
Internet: www.huk.de


HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter Deutschlands 
a. G. in Coburg
Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021
Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg
Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin.
Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav 
Herøy, Dr. Jörg Rheinländer (stv.), Sarah Rössler, Daniel Thomas.

Diese Nachricht enthält vertrauliche und/oder rechtlich geschützte 
Informationen.
Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrtümlich 
erhalten haben,
informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht.
Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist 
nicht gestattet.

This information may contain confidential and/or privileged information.
If you are not the intended recipient (or have received this information in 
error) please notify the
sender immediately and destroy this information.
Any unauthorized copying, disclosure or distribution of the material in this 
information is strictly forbidden.

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] File placement rule for new files in directory

2018-07-12 Thread Smita J Raut
If ABCD is not a fileset then below rule can be used-

RULE 'ABCD-rule-01' SET POOL 'fastdata' WHERE PATH_NAME LIKE '/gpfs/gpfs01/ABCD/%'

Thanks,
Smita



From:   Simon Thompson 
To: gpfsug main discussion list 
Date:   07/12/2018 01:34 PM
Subject:Re: [gpfsug-discuss] File placement rule for new files in 
directory
Sent by:gpfsug-discuss-boun...@spectrumscale.org



Is ABCD a fileset? If so, it’s easy with something like:

RULE 'ABCD-rule-01' SET POOL 'fastdata' FOR FILESET ('ABCD-fileset-name')

Simon

On 12/07/2018, 07:56, "gpfsug-discuss-boun...@spectrumscale.org on 
behalf of zac...@img.cas.cz"  wrote:

Hello,
 
Is it possible to create a file placement policy for new files in one 
directory? I need something like this --> All new files created in 
directory "/gpfs/gpfs01/ABCD" will be stored in pool "fastdata".
Thanks.
 
Best regards,
Michal
 
 
 

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss






___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] Analyse steps if disk are down after reboot

2018-07-12 Thread Grunenberg, Renar
Hello all,

after a reboot of two NSD servers we see that some disks in different filesystems 
are down, and we don’t see why.
The logs (messages, dmesg, kern, ...) say nothing. We are on RHEL 7.4 and 
SS 5.0.1.1.
The question now: are there any logs or structures in the GPFS daemon that record 
this situation? What was the reason why the daemon had no access to the disks 
at that startup phase?
Any hints are appreciated.

Renar Grunenberg
Abteilung Informatik – Betrieb

HUK-COBURG
Bahnhofsplatz
96444 Coburg
Telefon:09561 96-44110
Telefax:09561 96-44104
E-Mail: renar.grunenb...@huk-coburg.de
Internet:   www.huk.de

HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter Deutschlands 
a. G. in Coburg
Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021
Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg
Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin.
Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav 
Herøy, Dr. Jörg Rheinländer (stv.), Sarah Rössler, Daniel Thomas.

Diese Nachricht enthält vertrauliche und/oder rechtlich geschützte 
Informationen.
Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrtümlich 
erhalten haben,
informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht.
Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist 
nicht gestattet.

This information may contain confidential and/or privileged information.
If you are not the intended recipient (or have received this information in 
error) please notify the
sender immediately and destroy this information.
Any unauthorized copying, disclosure or distribution of the material in this 
information is strictly forbidden.

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] File placement rule for new files in directory

2018-07-12 Thread Simon Thompson
Is ABCD a fileset? If so, it’s easy with something like:

RULE 'ABCD-rule-01' SET POOL 'fastdata' FOR FILESET ('ABCD-fileset-name')

Simon

On 12/07/2018, 07:56, "gpfsug-discuss-boun...@spectrumscale.org on behalf of 
zac...@img.cas.cz"  wrote:

Hello,

Is it possible to create a file placement policy for new files in one 
directory? I need something like this --> All new files created in 
directory "/gpfs/gpfs01/ABCD" will be stored in pool "fastdata".
Thanks.

Best regards,
Michal




___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] File placement rule for new files in directory

2018-07-12 Thread Michal Zacek

Hello,

Is it possible to create a file placement policy for new files in one 
directory? I need something like this --> All new files created in 
directory "/gpfs/gpfs01/ABCD" will be stored in pool "fastdata".

Thanks.

Best regards,
Michal




smime.p7s
Description: S/MIME electronic signature
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss