Re: [Lustre-discuss] Future of LustreFS?

2010-04-22 Thread Michael Schwartzkopff
On Thursday, 22 April 2010 08:33:14, Janne Aho wrote:
 Hi,

 Today we have a storage system based on NFS, but we are really concerned
 about redundancy and are on the brink of moving to a cluster file system
 such as GlusterFS. However, we have also been told that Lustre would be
 the best option for us, while at the same time those who recommended
 Lustre said that Oracle has pulled the plug and put the resources into
 OCFS2.
 If we use Lustre in a production environment, it would be good to know
 that it won't be discontinued.

 Will there be a long-term future for Lustre?
 Or should we be looking at something else as a long-term solution?

 Thanks in advance for your reply to my somewhat cloudy question.

Hi,

For me, Lustre is a very good option.

But you could also consider a system composed of
- corosync for the cluster communication
- pacemaker as the cluster resource manager
- DRBD for the replication of data between the nodes of a cluster

and

- NFS
or
- OCFS2 or GFS or ...

The NFS option in particular provides you with a highly available NFS server
on a real cluster stack, all managed by pacemaker.
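
A minimal crm shell sketch of such an HA NFS setup could look like the
following (resource names, the device, the export directory and the service
IP are only placeholders; the exact agents available depend on your
distribution):

primitive p_drbd_nfs ocf:linbit:drbd \
        params drbd_resource="r0" \
        op monitor interval="30s" role="Slave" \
        op monitor interval="15s" role="Master"
ms ms_drbd_nfs p_drbd_nfs \
        meta master-max="1" clone-max="2" notify="true"
primitive p_fs_nfs ocf:heartbeat:Filesystem \
        params device="/dev/drbd0" directory="/srv/nfs" fstype="ext3"
primitive p_nfsserver lsb:nfs
primitive p_ip_nfs ocf:heartbeat:IPaddr2 \
        params ip="192.168.1.100" cidr_netmask="24"
group g_nfs p_fs_nfs p_nfsserver p_ip_nfs
colocation col_nfs_on_drbd inf: g_nfs ms_drbd_nfs:Master
order o_drbd_before_nfs inf: ms_drbd_nfs:promote g_nfs:start

The DRBD resource "r0" itself has to be configured in drbd.conf on both
nodes; lsb:nfs is the RHEL init script name, other distributions call it
differently.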

Greetings,
-- 
Dr. Michael Schwartzkopff
MultiNET Services GmbH
Address: Bretonischer Ring 7; 85630 Grasbrunn; Germany
Tel: +49 - 89 - 45 69 11 0
Fax: +49 - 89 - 45 69 11 21
Mobile: +49 - 174 - 343 28 75

Mail: mi...@multinet.de
Web: www.multinet.de

Registered office: 85630 Grasbrunn
Commercial register: Amtsgericht München HRB 114375
Managing directors: Günter Jurgeneit, Hubert Martens

---

PGP Fingerprint: F919 3919 FF12 ED5A 2801 DEA6 AA77 57A4 EDD8 979B
Skype: misch42
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Filesystem monitoring in Heartbeat

2010-01-22 Thread Michael Schwartzkopff
On Thursday, 21 January 2010 23:09:37, Bernd Schubert wrote:
 On Thursday 21 January 2010, Adam Gandelman wrote:
(...)
 I guess you want to use the pacemaker agent I posted in this bugzilla:

 https://bugzilla.lustre.org/show_bug.cgi?id=20807

Hello,

how far did you get with the development of the agent? Is it more or less
finished? Publishable?

Greetings,



Re: [Lustre-discuss] What HA Software to use with Lustre

2010-01-15 Thread Michael Schwartzkopff
On Friday, 15 January 2010 07:30:13, you wrote:
  An introduction to pacemaker can be found at:
  http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/index.html

 I wish I had been aware of the crm CLI before trying to take the XML way
 according to the link above:

   http://www.clusterlabs.org/doc/crm_cli.html

 Cheers,
 Li Wei

We are working on documentation describing how to set up Lustre together with
pacemaker. As soon as it is finished, it will show up in the wiki.
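
As a rough preview of what that could look like in crm shell syntax (device,
mount point, timeouts and the node name are only examples, not the final
recommendation), a single OST can be managed with the standard Filesystem
resource agent:

primitive p_ost0 ocf:heartbeat:Filesystem \
        params device="/dev/mapper/ost0" directory="/mnt/ost0" \
               fstype="lustre" \
        op monitor interval="120s" timeout="60s" \
        op start timeout="300s" \
        op stop timeout="300s"
location l_ost0_prefer p_ost0 100: oss1

The long start and stop timeouts leave room for Lustre recovery during a
failover.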

Greetings,



Re: [Lustre-discuss] What HA Software to use with Lustre

2010-01-14 Thread Michael Schwartzkopff
On Friday, 15 January 2010 00:48:53, Jagga Soorma wrote:
 Hi Guys,

 I am setting up our new Lustre environment and was wondering what the
 recommended (stable) HA clustering software is to use for MDS and OSS
 failover. Any input would be greatly appreciated.

 Thanks,
 -J

The docs describe heartbeat, but that software is not recommended any more,
neither heartbeat version 1 nor heartbeat version 2. Instead, the projects
openais and pacemaker have replaced the functionality of heartbeat. For the
new project please see
www.clusterlabs.org

An introduction to pacemaker can be found at:
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/index.html
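
Very roughly, and assuming the package and file names on your distribution
match (they may not), the replacement stack is brought up like this:

# on both nodes: install the cluster stack
yum install openais pacemaker

# tell openais to start pacemaker by adding to /etc/ais/openais.conf:
#   service {
#       name: pacemaker
#       ver:  0
#   }

service openais start
crm_mon -1        # both nodes should show up as online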

Greetings,


[Lustre-discuss] Implementing MMP correctly

2009-12-22 Thread Michael Schwartzkopff
Hi,

I am trying to understand how to implement MMP correctly in a Lustre failover
cluster.

As far as I understand, MMP protects against the same filesystem being mounted
by different nodes (OSSs) of a failover cluster. So far so good.

If a node was shut down uncleanly, it will still hold its filesystems via MMP,
thus preventing a clean failover to another node. Now I want to implement a
clean failover in the Filesystem resource agent of pacemaker. Is there a good
way to solve the problem with MMP? Possible solutions are:

- Disable the MMP feature in a cluster entirely, since the resource manager
takes care that the same resource is only mounted once in the cluster?

- Do a "tunefs -O ^mmp device" and a "tunefs -O mmp device" before every
mount of a resource?

- Do a "sleep 10" before mounting a resource? But the manual says the file
system mount requires additional time if the file system was not cleanly
unmounted.

- Check if the file system is in use by another OSS through MMP and wait a
little bit longer? How do I do this? (See the sketch below.)
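
One possible sketch for the last option (device, mount point and the retry
interval are just examples) is to retry the mount a few times instead of
failing immediately, so that an MMP interval that has not yet expired only
delays the start:

#!/bin/sh
# retry the Lustre mount so an unexpired MMP interval delays the
# start instead of failing it outright
DEVICE=/dev/mapper/ost0      # example device
MOUNTPOINT=/mnt/ost0         # example mount point
RETRIES=6
SLEEP=10

i=0
while [ $i -lt $RETRIES ]; do
    if mount -t lustre "$DEVICE" "$MOUNTPOINT"; then
        exit 0
    fi
    i=$((i + 1))
    sleep $SLEEP
done
exit 1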

Please mail me any ideas. Thanks.



[Lustre-discuss] Understanding of MMP

2009-10-19 Thread Michael Schwartzkopff
Hi,

perhaps I have a problem understanding multiple mount protection (MMP). I have
a cluster. When a failover happens, I sometimes get the log entry:

Oct 19 15:16:08 sososd7 kernel: LDISKFS-fs warning (device dm-2): 
ldiskfs_multi_mount_protect: Device is already active on another node.
Oct 19 15:16:08 sososd7 kernel: LDISKFS-fs warning (device dm-2): 
ldiskfs_multi_mount_protect: MMP failure info: last update time: 1255958168, 
last update node: sososd3, last update device: dm-2

Does the second line mean that my node (sososd7) tried to mount /dev/dm-2 but 
MMP prevented it from doing so because the last update from the old node 
(sososd3) was too recent?

From the manuals I found the MMP time of 109 seconds? Is it correct that after 
the umount the next node cannot mount the same filesystem within 10 seconds?

So the solution would be to wait for 10 seconds before mounting the resource on
the next node. Is this correct?
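
For what it is worth, the MMP fields recorded in the superblock (including the
update interval the next mount has to wait out) can be inspected with
dumpe2fs, assuming an MMP-capable (Lustre-patched) e2fsprogs:

# show the MMP-related superblock fields of the OST device
dumpe2fs -h /dev/dm-2 2>/dev/null | grep -i mmp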

Thanks.



[Lustre-discuss] Problem re-mounting Lustre on another node

2009-10-14 Thread Michael Schwartzkopff
Hi,

we have a Lustre 1.8 cluster with openais and pacemaker as the cluster manager.
When I migrate one Lustre resource from one node to another node I get an
error. Stopping Lustre on one node is no problem, but the node where Lustre
should start says:

Oct 14 09:54:28 sososd6 kernel: kjournald starting.  Commit interval 5 seconds
Oct 14 09:54:28 sososd6 kernel: LDISKFS FS on dm-4, internal journal
Oct 14 09:54:28 sososd6 kernel: LDISKFS-fs: recovery complete.
Oct 14 09:54:28 sososd6 kernel: LDISKFS-fs: mounted filesystem with ordered 
data mode.
Oct 14 09:54:28 sososd6 multipathd: dm-4: umount map (uevent)
Oct 14 09:54:39 sososd6 kernel: kjournald starting.  Commit interval 5 seconds
Oct 14 09:54:39 sososd6 kernel: LDISKFS FS on dm-4, internal journal
Oct 14 09:54:39 sososd6 kernel: LDISKFS-fs: mounted filesystem with ordered 
data mode.
Oct 14 09:54:39 sososd6 kernel: LDISKFS-fs: file extents enabled
Oct 14 09:54:39 sososd6 kernel: LDISKFS-fs: mballoc enabled
Oct 14 09:54:39 sososd6 kernel: Lustre: mgc134.171.16@tcp: Reactivating 
import
Oct 14 09:54:45 sososd6 kernel: LustreError: 137-5: UUID 'segfs-OST_UUID' 
is not available  for connect (no target)
Oct 14 09:54:45 sososd6 kernel: LustreError: Skipped 3 previous similar 
messages
Oct 14 09:54:45 sososd6 kernel: LustreError: 31334:0:
(ldlm_lib.c:1850:target_send_reply_msg()) @@@ processing error (-19)  
r...@810225fcb800 x334514011/t0 o8-?@?:0/0 lens 368/0 e 0 to 0 dl 
1255506985 ref 1 fl Interpret:/0/0 rc -19/0
Oct 14 09:54:45 sososd6 kernel: LustreError: 31334:0:
(ldlm_lib.c:1850:target_send_reply_msg()) Skipped 3 previous similar messages

These logs continue until the cluster software times out and the resource
reports the error. Any help understanding these logs? Thanks.



Re: [Lustre-discuss] Setup mail cluster

2009-10-12 Thread Michael Schwartzkopff
On Monday, 12 October 2009 15:54:04, Vadym wrote:
 Hello
 I'm designing a schema for a mail service, so I have only one question:
 Can Lustre provide me with a fully automatic failover solution?

No. See the Lustre manual for this. You need a cluster solution for this.
The manual is *hopelessly* outdated at this point. Do NOT use heartbeat any
more. Use pacemaker as the cluster manager. See www.clusterlabs.org.

When I find some time I want to write a HOWTO about setting up a Lustre
cluster with pacemaker and OpenAIS.

 I plan to use standard servers with 1GbE links for storage. I need as
 automatic a solution as possible.
 E.g. RAID5-like functionality: when one or more storage nodes are down,
 user data is still accessible. So if I have 100TB of disk storage I can
 serve 50TB of data in failover mode with no downtime. Can you provide me
 with more information?

Use a bonded device for the cluster interconnect! It is safer!

Use DRBD for replication of the data if you use direct-attached storage.

DRBD can operate on top of LVM, so you can have that functionality as well.
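
Roughly, and only as a sketch (volume group, size and resource name are
examples; the matching resource must be defined in drbd.conf on both nodes
first), DRBD on top of LVM looks like this:

# carve a logical volume out of the local VG and put DRBD on top of it
lvcreate -L 500G -n lv_ost0 vg_local
drbdadm create-md r0      # r0 refers to /dev/vg_local/lv_ost0 in drbd.conf
drbdadm up r0
# once, on the node that holds the initial data only:
drbdadm -- --overwrite-data-of-peer primary r0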

Perhaps you can try clustered LVM; it has nice features.

Or just use ZFS, which offers all this.



[Lustre-discuss] Question about sleeping processes

2009-10-06 Thread Michael Schwartzkopff
Hi,

my system load shows that quite a number of processes are waiting. ps shows me
the same number of processes in state D (uninterruptible sleep). All processes
are ll_mdt_NN, where NN is a decimal number.
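
For reference, the blocked threads can be listed with something like the
following (just an example invocation):

ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'

which prints the PID, state, wait channel and command of every process whose
state starts with D.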

In the logs I find the entry (see log below).

My questions are:
What causes the problem?
Can I kill the hanging processes?

System: Lustre 1.8.1 on RHEL 5.3

thanks for any hints.

---

Oct  5 10:28:03 sosmds2 kernel: Lustre: 0:0:(watchdog.c:181:lcw_cb()) Watchdog 
triggered for pid 28402: it was inactive for 200.00s
Oct  5 10:28:03 sosmds2 kernel: ll_mdt_35 D 81000100c980 0 28402
  
1 28403 28388 (L-TLB)
Oct  5 10:28:03 sosmds2 kernel:  81041c723810 0046 
 7fff
Oct  5 10:28:03 sosmds2 kernel:  81041c7237d0 0001 
81022f3e60c0 81022f12e080
Oct  5 10:28:03 sosmds2 kernel:  000177b2feff847c 14df 
81022f3e62a8 0001028f
Oct  5 10:28:03 sosmds2 kernel: Call Trace:
Oct  5 10:28:03 sosmds2 kernel:  [8008a3ef] 
default_wake_function+0x0/0xe
Oct  5 10:28:03 sosmds2 kernel:  [885b1b26] 
:libcfs:lbug_with_loc+0xc6/0xd0
Oct  5 10:28:03 sosmds2 kernel:  [885b9c70] 
:libcfs:tracefile_init+0x0/0x110
Oct  5 10:28:03 sosmds2 kernel:  [88712218] 
:ptlrpc:lustre_shrink_reply_v2+0xa8/0x240
Oct  5 10:28:03 sosmds2 kernel:  [889ec529] 
:mds:mds_getattr_lock+0xc59/0xce0
Oct  5 10:28:03 sosmds2 kernel:  [88710ea4] 
:ptlrpc:lustre_msg_add_version+0x34/0x110
Oct  5 10:28:03 sosmds2 kernel:  [88602923] 
:lnet:lnet_ni_send+0x93/0xd0
Oct  5 10:28:03 sosmds2 kernel:  [88604d23] 
:lnet:lnet_send+0x973/0x9a0
Oct  5 10:28:03 sosmds2 kernel:  [889e6fca] 
:mds:fixup_handle_for_resent_req+0x5a/0x2c0
Oct  5 10:28:03 sosmds2 kernel:  [889f2a76] 
:mds:mds_intent_policy+0x636/0xc10
Oct  5 10:28:03 sosmds2 kernel:  [886d36f6] 
:ptlrpc:ldlm_resource_putref+0x1b6/0x3a0
Oct  5 10:28:03 sosmds2 kernel:  [886d0d46] 
:ptlrpc:ldlm_lock_enqueue+0x186/0xb30
Oct  5 10:28:03 sosmds2 kernel:  [886ecacf] 
:ptlrpc:ldlm_export_lock_get+0x6f/0xe0
Oct  5 10:28:03 sosmds2 kernel:  [8864fe48] 
:obdclass:lustre_hash_add+0x218/0x2e0
Oct  5 10:28:03 sosmds2 kernel:  [886f5530] 
:ptlrpc:ldlm_server_blocking_ast+0x0/0x83d
Oct  5 10:28:03 sosmds2 kernel:  [886f3669] 
:ptlrpc:ldlm_handle_enqueue+0xc19/0x1210
Oct  5 10:28:03 sosmds2 kernel:  [889f0630] 
:mds:mds_handle+0x4080/0x4cb0
Oct  5 10:28:03 sosmds2 kernel:  [885e0047] 
:lvfs:lprocfs_counter_sub+0x57/0x90
Oct  5 10:28:03 sosmds2 kernel:  [80148d4f] __next_cpu+0x19/0x28
Oct  5 10:28:03 sosmds2 kernel:  [88715a15] 
:ptlrpc:lustre_msg_get_conn_cnt+0x35/0xf0
Oct  5 10:28:03 sosmds2 kernel:  [80089d89] enqueue_task+0x41/0x56
Oct  5 10:28:03 sosmds2 kernel:  [8871a72d] 
:ptlrpc:ptlrpc_check_req+0x1d/0x110
Oct  5 10:28:03 sosmds2 kernel:  [8871ce67] 
:ptlrpc:ptlrpc_server_handle_request+0xa97/0x1160
Oct  5 10:28:03 sosmds2 kernel:  [8003dc3f] lock_timer_base+0x1b/0x3c
Oct  5 10:28:03 sosmds2 kernel:  [80088819] __wake_up_common+0x3e/0x68
Oct  5 10:28:03 sosmds2 kernel:  [88720908] 
:ptlrpc:ptlrpc_main+0x1218/0x13e0
Oct  5 10:28:03 sosmds2 kernel:  [8008a3ef] 
default_wake_function+0x0/0xe
Oct  5 10:28:03 sosmds2 kernel:  [800b48dd] 
audit_syscall_exit+0x327/0x342
Oct  5 10:28:03 sosmds2 kernel:  [8005dfb1] child_rip+0xa/0x11
Oct  5 10:28:03 sosmds2 kernel:  [8871f6f0] 
:ptlrpc:ptlrpc_main+0x0/0x13e0
Oct  5 10:28:03 sosmds2 kernel:  [8005dfa7] child_rip+0x0/0x11




Re: [Lustre-discuss] Question about sleeping processes

2009-10-06 Thread Michael Schwartzkopff
On Tuesday, 6 October 2009 16:22:08, Brian J. Murrell wrote:
 On Tue, 2009-10-06 at 12:48 +0200, Michael Schwartzkopff wrote:
  Hi,

 Hi,

  my system load shows that quite a number of processes are waiting.

 Blocked. I guess the word "waiting" is similar.

  My questions are:
  What causes the problem?

 In this case, the thread has lbugged previously.

 If you look in syslog for the node with these processes you should find
 entries with LBUG and/or ASSERTION messages.  These are the defects that
 are causing the processes to get blocked (uninterruptible sleep).
(...)

Here is some additional information from the logs. Any ideas about that?

Oct  5 10:26:43 sosmds2 kernel: LustreError: 30617:0:
(pack_generic.c:655:lustre_shrink_reply_v2()) ASSERTION(msg->lm_bufcount >
segment) failed
Oct  5 10:26:43 sosmds2 kernel: LustreError: 30617:0:
(pack_generic.c:655:lustre_shrink_reply_v2()) LBUG
Oct  5 10:26:43 sosmds2 kernel: Lustre: 30617:0:(linux-
debug.c:264:libcfs_debug_dumpstack()) showing stack for process 30617
Oct  5 10:26:43 sosmds2 kernel: ll_mdt_47 R  running task   0 30617 
 
1 30618 30616 (L-TLB)
Oct  5 10:26:43 sosmds2 kernel:   0001 
000714a28100 0001
Oct  5 10:26:43 sosmds2 kernel:  0001 0086 
0012 8102212dfe88
Oct  5 10:26:43 sosmds2 kernel:  0001  
802f6aa0 
Oct  5 10:26:43 sosmds2 kernel: Call Trace:
Oct  5 10:26:43 sosmds2 kernel:  [8009daf8] 
autoremove_wake_function+0x9/0x2e
Oct  5 10:26:43 sosmds2 kernel:  [80088819] __wake_up_common+0x3e/0x68
Oct  5 10:26:43 sosmds2 kernel:  [80088819] __wake_up_common+0x3e/0x68
Oct  5 10:26:43 sosmds2 kernel:  [8008f7ac] vprintk+0x2cb/0x317
Oct  5 10:26:43 sosmds2 kernel:  [800a540a] kallsyms_lookup+0xc2/0x17b
Oct  5 10:26:43 sosmds2 last message repeated 3 times
Oct  5 10:26:43 sosmds2 kernel:  [8006bb5d] printk_address+0x9f/0xab
Oct  5 10:26:43 sosmds2 kernel:  [8008f800] printk+0x8/0xbd
Oct  5 10:26:43 sosmds2 kernel:  [8008f84a] printk+0x52/0xbd
Oct  5 10:26:43 sosmds2 kernel:  [800a2e08] 
module_text_address+0x33/0x3c
Oct  5 10:26:43 sosmds2 kernel:  [8009c088] 
kernel_text_address+0x1a/0x26
Oct  5 10:26:43 sosmds2 kernel:  [8006b843] dump_trace+0x211/0x23a
Oct  5 10:26:43 sosmds2 kernel:  [8006b8a0] show_trace+0x34/0x47
Oct  5 10:26:43 sosmds2 kernel:  [8006b9a5] _show_stack+0xdb/0xea
Oct  5 10:26:43 sosmds2 kernel:  [885b1ada] 
:libcfs:lbug_with_loc+0x7a/0xd0
Oct  5 10:26:43 sosmds2 kernel:  [885b9c70] 
:libcfs:tracefile_init+0x0/0x110
Oct  5 10:26:43 sosmds2 kernel:  [88712218] 
:ptlrpc:lustre_shrink_reply_v2+0xa8/0x240
Oct  5 10:26:43 sosmds2 kernel:  [889ec529] 
:mds:mds_getattr_lock+0xc59/0xce0
Oct  5 10:26:43 sosmds2 kernel:  [88710ea4] 
:ptlrpc:lustre_msg_add_version+0x34/0x110
Oct  5 10:26:43 sosmds2 kernel:  [88602923] 
:lnet:lnet_ni_send+0x93/0xd0
Oct  5 10:26:43 sosmds2 kernel:  [88604d23] 
:lnet:lnet_send+0x973/0x9a0
Oct  5 10:26:43 sosmds2 kernel:  [8005c2dc] 
cache_alloc_refill+0x106/0x186
Oct  5 10:26:43 sosmds2 kernel:  [889e6fca] 
:mds:fixup_handle_for_resent_req+0x5a/0x2c0
Oct  5 10:26:43 sosmds2 kernel:  [889f2a76] 
:mds:mds_intent_policy+0x636/0xc10
Oct  5 10:26:43 sosmds2 kernel:  [886d36f6] 
:ptlrpc:ldlm_resource_putref+0x1b6/0x3a0
Oct  5 10:26:43 sosmds2 kernel:  [886d0d46] 
:ptlrpc:ldlm_lock_enqueue+0x186/0xb30
Oct  5 10:26:43 sosmds2 kernel:  [886ecacf] 
:ptlrpc:ldlm_export_lock_get+0x6f/0xe0
Oct  5 10:26:43 sosmds2 kernel:  [8864fe48] 
:obdclass:lustre_hash_add+0x218/0x2e0
Oct  5 10:26:43 sosmds2 kernel:  [886f5530] 
:ptlrpc:ldlm_server_blocking_ast+0x0/0x83d
Oct  5 10:26:43 sosmds2 kernel:  [886f3669] 
:ptlrpc:ldlm_handle_enqueue+0xc19/0x1210
Oct  5 10:26:43 sosmds2 kernel:  [889f0630] 
:mds:mds_handle+0x4080/0x4cb0
Oct  5 10:26:43 sosmds2 kernel:  [80148d4f] __next_cpu+0x19/0x28
Oct  5 10:26:43 sosmds2 kernel:  [80088f32] 
find_busiest_group+0x20d/0x621
Oct  5 10:26:43 sosmds2 kernel:  [88715a15] 
:ptlrpc:lustre_msg_get_conn_cnt+0x35/0xf0
Oct  5 10:26:43 sosmds2 kernel:  [80089d89] enqueue_task+0x41/0x56
Oct  5 10:26:43 sosmds2 kernel:  [8871a72d] 
:ptlrpc:ptlrpc_check_req+0x1d/0x110
Oct  5 10:26:43 sosmds2 kernel:  [8871ce67] 
:ptlrpc:ptlrpc_server_handle_request+0xa97/0x1160
Oct  5 10:26:43 sosmds2 kernel:  [80063098] thread_return+0x62/0xfe
Oct  5 10:26:43 sosmds2 kernel:  [80088819] __wake_up_common+0x3e/0x68
Oct  5 10:26:43 sosmds2 kernel:  [88720908] 
:ptlrpc:ptlrpc_main+0x1218/0x13e0
Oct  5 10:26:43 sosmds2 kernel:  [8008a3ef] 
default_wake_function+0x0/0xe
Oct  5 10:26:43 sosmds2 kernel:  [800b48dd] 
audit_syscall_exit+0x327/0x342
Oct  5 10:26:43

Re: [Lustre-discuss] Question about sleeping processes

2009-10-06 Thread Michael Schwartzkopff
On Tuesday, 6 October 2009 17:08:44, Brian J. Murrell wrote:
 On Tue, 2009-10-06 at 17:01 +0200, Michael Schwartzkopff wrote:
  Here is some additional from the logs. Any ideas about that?
 
  Oct  5 10:26:43 sosmds2 kernel: LustreError: 30617:0:
  (pack_generic.c:655:lustre_shrink_reply_v2()) ASSERTION(msg->lm_bufcount
  > segment) failed

 Here's the failed assertion.

  Oct  5 10:26:43 sosmds2 kernel: LustreError: 30617:0:
  (pack_generic.c:655:lustre_shrink_reply_v2()) LBUG

 Which always leads to an LBUG which is what is putting the thread to
 sleep.

 Any time you see an LBUG in a server log file, you need to reboot the
 server.

 So now you need to take that ASSERTION message to our bugzilla and see
 if you can find a bug for it already, and if not, file a new one, please.

 Cheers,
 b.

Thanks for your fast reply. I think bug #20020 is the one we hit.
Waiting for a solution.
Greetings,