Re: [Samba] Tall tale of woe....

2003-12-16 Thread Ross McInnes (Systems)
On Mon, 15 Dec 2003, Gerald (Jerry) Carter wrote:

> The kernel should log the oops in /var/log/messages.

Yeah, it's not there. The log stops at 11:29:07 and the next entry is at 11:47,
when it's booting.

 
> We can't be blamed for a kernel oops.  If a user space app
> can cause the kernel to die, then that's a kernel bug.
> I would start pursuing this with RedHat (if you have support),
> or logging it in bugzilla.redhat.com.

Not trying to apportion blame here, just trying to get the good old stable
server back :/ I was wondering if anyone else has had anything like this
before?

I will contact Red Hat and see if they can offer any suggestions.

Many thanks

Ross McInnes

-- 
To unsubscribe from this list go to the following URL and read the
instructions:  http://lists.samba.org/mailman/listinfo/samba


Re: [Samba] Tall tale of woe....

2003-12-16 Thread Gerald (Jerry) Carter
Ross McInnes (Systems) wrote:

| Not trying to apportion blame here. Just trying to get
| the good old stable server back :/ I was wondering if anyone
| else has had anything like this before?

I wasn't on the defensive.  Just stating that it would
have to be a kernel bug in this case (one that I've not seen
come up before).  It is possible that a hardware
component is failing (e.g. RAM).




cheers, jerry


Re: [Samba] Tall tale of woe....

2003-12-16 Thread Ross McInnes (Systems)


On Tue, 16 Dec 2003, Gerald (Jerry) Carter wrote:

 
> I wasn't on the defensive.  Just stating that it would
> have to be a kernel bug in this case (one that I've not seen
> come up before).  It is possible that a hardware
> component is failing (e.g. RAM).
 

Sorry, I didn't mean it to come across like that.

If it's something that's not been seen before then it must
be a hardware/kernel issue.

Many Thanks

Ross McInnes



Re: [Samba] Tall tale of woe....

2003-12-15 Thread Ross McInnes (Systems)

Jerry...

> It logs to stdout.

Ah OK, so redirecting to a file will be in order.

 
> I think the key will be figuring out which tdb the
> runaway smbd is reading.
>
> Probably.  Does ifconfig show an abnormal amount of errors?
> If not, then you are probably ok wrt duplex settings, et al.
>
> And to clarify, when the smbd starts sucking up CPU, check
> which client it is connected to and look at the traffic
> pattern from that client to see if the smbd process is doing
> real work on behalf of the client.
 

No, it's fine, so that's one less thing to worry about... or not.

eth0      Link encap:Ethernet  HWaddr 00:06:5B:F2:89:25
          inet addr:172.16.128.254  Bcast:172.16.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1106496 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1078245 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:228665930 (218.0 Mb)  TX bytes:768785456 (733.1 Mb)
          Interrupt:28 Base address:0xdce0 Memory:fe8e-fe90
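
For anyone wanting to watch these counters over time rather than eyeball them, here is a minimal Python sketch that pulls the RX/TX error fields out of ifconfig output. It assumes the classic net-tools layout shown above; the function names are made up for illustration, and other ifconfig versions print a different format that this pattern will not match.

```python
import re

# Sample in the classic net-tools layout (counters copied from the output above).
SAMPLE = """\
eth0      Link encap:Ethernet  HWaddr 00:06:5B:F2:89:25
          RX packets:1106496 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1078245 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
"""

# One "RX packets:..." or "TX packets:..." line, with its name:value fields.
COUNTER_LINE = re.compile(
    r"^[ \t]*(RX|TX) packets:(\d+)((?:[ \t]+\w+:\d+)+)", re.MULTILINE
)


def interface_counters(text):
    """Return {"RX": {...}, "TX": {...}, "collisions": n} parsed from ifconfig text."""
    counters = {}
    for direction, packets, rest in COUNTER_LINE.findall(text):
        fields = {"packets": int(packets)}
        for name, value in re.findall(r"(\w+):(\d+)", rest):
            fields[name] = int(value)
        counters[direction] = fields
    match = re.search(r"collisions:(\d+)", text)
    if match:
        counters["collisions"] = int(match.group(1))
    return counters


def error_ratio(counters, direction):
    """Fraction of packets in one direction counted as errors, drops or overruns."""
    fields = counters[direction]
    bad = fields["errors"] + fields["dropped"] + fields["overruns"]
    return bad / fields["packets"] if fields["packets"] else 0.0


if __name__ == "__main__":
    stats = interface_counters(SAMPLE)
    for direction in ("RX", "TX"):
        print(direction, stats[direction], error_ratio(stats, direction))
```

Running it against two snapshots taken a few minutes apart and comparing the counters would show whether errors are climbing during a slowdown.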

Halfway through writing this reply the server just panicked and halted.

On the screen was (or thereabouts):

smbd process PID 19579, stackpage = f300f000

calltrace [c013e86b] __kmem_cache_alloc

followed by

e1000_alloc_rx_buffers
e1000_alloc_rx_irq

That might shed some light on it.
Don't suppose you know where RH writes panics to? I can't seem to find it.

When I look at the samba.log there is nothing untoward:

[2003/12/15 11:29:06, 1] smbd/service.c:make_connection(636)
  m6-1 (172.16.175.10) connect to service dmn01 as user dmn01 (uid=1269, gid=102) (pid 18746)
[2003/12/15 11:29:07, 0] lib/util_sock.c:read_data(436)
  read_data: read failure for 4. Error = Connection reset by peer
[2003/12/15 11:29:07, 1] smbd/service.c:close_cnum(677)
  m5-3 (172.16.142.30) closed connection to service exams
[2003/12/15 11:29:07, 1] smbd/service.c:close_cnum(677)
  m5-3 (172.16.142.30) closed connection to service shared
[2003/12/15 11:29:07, 1] smbd/service.c:close_cnum(677)
  m5-3 (172.16.142.30) closed connection to service intranet
[2003/12/15 11:29:07, 1] smbd/service.c:close_cnum(677)
  m5-3 (172.16.142.30) closed connection to service winfiles
[2003/12/15 11:29:07, 1] smbd/service.c:close_cnum(677)
  m5-3 (172.16.142.30) closed connection to service netlogon
[2003/12/15 11:29:07, 1] smbd/service.c:close_cnum(677)
  m5-3 (172.16.142.30) closed connection to service ab02
[2003/12/15 11:46:24, 1] smbd/service.c:make_connection(636)
  premises (172.16.180.10) connect to service rsmith as user rsmith (uid=1029, gid=101) (pid 890)

m6-8 (172.16.175.80) connect to service pn02 as user pn02 (uid=2906, gid=102) (pid 19579)
[2003/12/15 11:27:49, 1] smbd/service.c:make_connection(636)

is the offending user/pid. There was nothing untoward in his account, or in the
network traffic to or from his computer, at the time.
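
Matching a runaway pid back to its client can be scripted. The following is a rough Python sketch that scans log text for the "connect to service" entries in the format quoted above and indexes them by pid. The regular expression is only inferred from these few sample lines, so treat it as an assumption rather than a description of every Samba log format; the function name is hypothetical.

```python
import re

# Pattern inferred from the make_connection entries quoted in this thread.
CONNECT = re.compile(
    r"(?P<host>\S+) \((?P<ip>[\d.]+)\) connect to service (?P<service>\S+) "
    r"as user (?P<user>\S+) \(uid=(?P<uid>\d+), gid=(?P<gid>\d+)\) "
    r"\(pid (?P<pid>\d+)\)"
)

SAMPLE_LOG = """\
[2003/12/15 11:29:06, 1] smbd/service.c:make_connection(636)
  m6-1 (172.16.175.10) connect to service dmn01 as user dmn01 (uid=1269,
gid=102) (pid 18746)
[2003/12/15 11:27:49, 1] smbd/service.c:make_connection(636)
  m6-8 (172.16.175.80) connect to service pn02 as user pn02
(uid=2906, gid=102) (pid 19579)
"""


def connections_by_pid(log_text):
    """Return {pid: [connection dicts]} for every connect entry in the log text."""
    # Collapse all whitespace so entries wrapped by a mail client still match.
    flat = " ".join(log_text.split())
    result = {}
    for match in CONNECT.finditer(flat):
        result.setdefault(int(match.group("pid")), []).append(match.groupdict())
    return result


if __name__ == "__main__":
    for entry in connections_by_pid(SAMPLE_LOG).get(19579, []):
        print(entry["host"], entry["ip"], entry["service"], entry["user"])
```

With the host and IP in hand, the traffic from that one client can then be watched separately.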

Unfortunately I was unaware of the slowdown/problems so I was unable to
run strace on the pid.

I'm guessing it panics when the offending pid is left alone, and not kill
-9'd like I normally do.

Many thanks

A perturbed Ross McInnes



Re: [Samba] Tall tale of woe....

2003-12-15 Thread Gerald (Jerry) Carter
Ross McInnes (Systems) wrote:

| Halfway through writing this reply the server just panicked and halted.
| On the screen was (or thereabouts):
| smbd process PID 19579, stackpage = f300f000
| calltrace [c013e86b] __kmem_cache_alloc
| followed by
| e1000_alloc_rx_buffers
| e1000_alloc_rx_irq
|
| That might shed some light on it.
| Don't suppose you know where RH writes panics
| to? I can't seem to find it.

The kernel should log the oops in /var/log/messages.

| When I look at the samba.log there is nothing untoward:
|
| [2003/12/15 11:29:06, 1] smbd/service.c:make_connection(636)
|   m6-1 (172.16.175.10) connect to service dmn01 as user dmn01 (uid=1269,
| gid=102) (pid 18746)
| [2003/12/15 11:29:07, 0] lib/util_sock.c:read_data(436)
|   read_data: read failure for 4. Error = Connection reset by peer
| [2003/12/15 11:29:07, 1] smbd/service.c:close_cnum(677)
|   m5-3 (172.16.142.30) closed connection to service exams
| [2003/12/15 11:29:07, 1] smbd/service.c:close_cnum(677)
|   m5-3 (172.16.142.30) closed connection to service shared
...
|
| m6-8 (172.16.175.80) connect to service pn02 as user pn02
| (uid=2906, gid=102) (pid 19579)
| [2003/12/15 11:27:49, 1] smbd/service.c:make_connection(636)
|
| is the offending user/pid. There was nothing untoward in his account, or in the
| network traffic to or from his computer, at the time.
|
| Unfortunately I was unaware of the slowdown/problems so I was unable to
| run strace on the pid.
|
| I'm guessing it panics when the offending pid is left alone, and not kill
| -9'd like I normally do.

We can't be blamed for a kernel oops.  If a user space app
can cause the kernel to die, then that's a kernel bug.
I would start pursuing this with RedHat (if you have support),
or logging it in bugzilla.redhat.com.




cheers, jerry
~ --
~ Hewlett-Packard- http://www.hp.com
~ SAMBA Team -- http://www.samba.org
~ GnuPG Key   http://www.plainjoe.org/gpg_public.asc
~ If we're adding to the noise, turn off this song --Switchfoot (2003)


Re: [Samba] Tall tale of woe....

2003-12-12 Thread Ross McInnes (Systems)
Many thanks for your reply Jerry, it's certainly shed some light on all of
this.

In answer to your questions:

> I'm assuming that you are running version 2.2.x
> (included with RH8).  Have you tested 3.0 (wait until
> 3.0.1 if you haven't yet since there are a lot of bug
> fixes in it)?

I'm running 2.2.8a with no immediate plans to upgrade, unless everything
else I've tried fails.

> What is the smbd process doing?  Try running strace
> or get a backtrace in gdb to find out where it is spending
> its time.

When and if it happens again I will try and get an strace.
I'm assuming it's simply strace -p PID.

Does it log the results somewhere, or do I redirect it to a log file?
I was thinking just in case it was a lot of information.

 
> Probably fcntl() calls when looking up data in a tdb.
> Find out which tdb (look in /proc/pid/fd to
> match the file descriptor, or use lsof).
>
> Also check the network traffic at this point.

Very useful command, that lsof. Again, when it happens I will definitely have
a look.

> Are you serving printers by chance?  If so have you
> set 'disable spoolss = yes' ?  I've seen high CPU utilization
> cases in relation to this param.

Yes, I am serving printers. I've just checked the config and I don't have
'disable spoolss = yes'.

 
> Use mii-tool and check the duplex settings.  And any
> hardware can have problems no matter what the price tag
> says :-)
> Check your routers.  Maybe they are getting overloaded or
> are dropping packets.

Ah, yes, network traffic. I ran mii-tool and it reports
eth0: negotiated 100baseTx-FD flow-control, link ok

However, it's a GB card and according to the switch it's linked at a GB. I'm
hoping mii-tool is wrong.

I've just enabled monitoring on the switch and will keep an eye on this...
We also run an admin network here, which is on the same equipment. Never
during these outages have they complained about it being slow or
unusable...

However, that's not to say that it has nothing to do with it, since it could be
just that server and the port it's on.

I'd just like to say I really appreciate your help in this matter.

Many Thanks

Ross McInnes



Re: [Samba] Tall tale of woe....

2003-12-12 Thread Gerald (Jerry) Carter
Ross McInnes (Systems) wrote:

| When and if it happens again I will try and get an strace.
| I'm assuming it's simply strace -p PID.
| Does it log the results somewhere, or do I redirect it to a log file?
| I was thinking just in case it was a lot of information.

It logs to stdout.

| Are you serving printers by chance?  If so have you
| set 'disable spoolss = yes' ?  I've seen high CPU utilization
| cases in relation to this param.
|
| Yes, I am serving printers. I've just checked the config
| and I don't have 'disable spoolss = yes'.

I think the key will be figuring out which tdb the
runaway smbd is reading.

| Use mii-tool and check the duplex settings.  And any
| hardware can have problems no matter what the price tag
| says :-)  Check your routers.  Maybe they are getting
| overloaded or are dropping packets.
|
| Ah, yes, network traffic. I ran mii-tool and it reports
| eth0: negotiated 100baseTx-FD flow-control, link ok
|
| However, it's a GB card and according to the switch it's linked at a GB. I'm
| hoping mii-tool is wrong.

Probably.  Does ifconfig show an abnormal amount of errors?
If not, then you are probably ok wrt duplex settings, et al.

And to clarify, when the smbd starts sucking up CPU, check
which client it is connected to and look at the traffic
pattern from that client to see if the smbd process is doing
real work on behalf of the client.


cheers, jerry


Re: [Samba] Tall tale of woe....

2003-12-11 Thread Gerald (Jerry) Carter
Ross McInnes (Systems) wrote:

| The main one is that a smbd process which belongs to a
| user logging in will appear in top (a cpu monitor program)
| using massive amounts of CPU etc. Although the system says
| it still has about 10-15% idle, this generally stops
| everyone logging in.

I'm assuming that you are running version 2.2.x
(included with RH8).  Have you tested 3.0 (wait until
3.0.1 if you haven't yet since there are a lot of bug
fixes in it)?

What is the smbd process doing?  Try running strace
or get a backtrace in gdb to find out where it is spending
its time.

| When this problem occurs it pushes system up to 50-80%!!!

Probably fcntl() calls when looking up data in a tdb.
Find out which tdb (look in /proc/pid/fd to
match the file descriptor, or use lsof).
Also check the network traffic at this point.
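
As an illustration of the /proc/pid/fd lookup: each entry in that directory is a symlink named after the descriptor number and pointing at the open file, so the fd seen in an fcntl() call can be mapped to a tdb path. Below is a minimal Python sketch under the assumption of a Linux /proc layout. The function name and the proc_root parameter (there only to make the function testable) are hypothetical, and reading another user's process normally requires root.

```python
import os


def open_tdb_files(pid, proc_root="/proc"):
    """Map file descriptor number -> path for every *.tdb file the process has open."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    result = {}
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        if target.endswith(".tdb"):
            result[int(name)] = target
    return result


if __name__ == "__main__":
    import sys

    # Usage: python tdbfds.py <pid of the runaway smbd>
    for fd, path in sorted(open_tdb_files(int(sys.argv[1])).items()):
        print(fd, path)
```

If strace shows the process hammering fcntl() on, say, descriptor 7, the entry printed for 7 names the tdb in question.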

| Now I have read other people's emails and gone through
| the archives about this and read about failure for 4. Error =
| No route to host, lib/util_sock.c:read_data(436)
| and oplocking problems as they all appear to be more
| pronounced around the time of this high CPU/rogue smbd
| process.

Are you serving printers by chance?  If so have you
set 'disable spoolss = yes' ?  I've seen high CPU utilization
cases in relation to this param.

| However it would seem a lot of the oplocking problems
| seem to be hardware related. I use decent 3Com kit here
| with a 4950 as a core and 4400s at the edge (i.e. not cheap
| and cheerful netgear/dlink/etc. stuff) so I'm wondering if
| anyone else has had these problems with this kit. Or if it's
| not the kit, what can it actually be?

Use mii-tool and check the duplex settings.  And any
hardware can have problems no matter what the price tag
says :-)

| Any ideas on the read_data(436) and failure for 4. Error = No route to
| host?

Check your routers.  Maybe they are getting overloaded or
are dropping packets.


cheers, jerry


[Samba] Tall tale of woe....

2003-12-09 Thread Ross McInnes (Systems)
For the last year or so I have been having problems in general with samba
(various versions) on the same box:
a Dell 2500, Xeon 1.8 with 2GB of RAM, running Red Hat 8.

What will happen from time to time (although it's now happened 3 times in
the last 5 days, hence this email) is that people will be slow to log in, if
they can at all. Several things appear to happen.

The main one is that a smbd process which belongs to a user logging in
will appear in top (a cpu monitor program) using massive amounts of CPU
etc. Although the system says it still has about 10-15% idle, this
generally stops everyone logging in.

Now as part of top on RH (it doesn't look the same on BSD) there is a system
entry with a % of CPU given over to that. System basically means
anything I/O or kernel related; since the kernel governs resources this
isn't uncommon. During a period of 4 hours I monitored this system figure and
it never went above 10%, and even then only for a matter of seconds.
When this problem occurs it pushes system up to 50-80%!!! I look at the
server and the disks are pretty much idle so it's not disk related. I am at
a loss to find out what it is actually doing to cause this.
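
The "system" percentage that top shows can also be logged independently, which helps when nobody is watching top at the moment the problem starts. Here is a minimal Python sketch, assuming the Linux /proc/stat format in which the first line reads "cpu user nice system idle ..." in clock ticks. It only uses those first four fields, so on kernels that report additional fields the percentages are relative to these four alone. Function names are made up for illustration.

```python
import time


def cpu_shares(before, after):
    """Percent of CPU time per category between two 'cpu' lines from /proc/stat."""
    names = ("user", "nice", "system", "idle")
    first = [int(value) for value in before.split()[1:5]]
    second = [int(value) for value in after.split()[1:5]]
    deltas = [b - a for a, b in zip(first, second)]
    total = sum(deltas)
    if total <= 0:
        return {name: 0.0 for name in names}
    return {name: 100.0 * delta / total for name, delta in zip(names, deltas)}


def read_cpu_line(path="/proc/stat"):
    """Return the aggregate 'cpu' line, which is the first line of /proc/stat."""
    with open(path) as handle:
        return handle.readline()


if __name__ == "__main__":
    earlier = read_cpu_line()
    time.sleep(5)
    shares = cpu_shares(earlier, read_cpu_line())
    print("system %.1f%%  user %.1f%%  idle %.1f%%"
          % (shares["system"], shares["user"], shares["idle"]))
```

Run from cron or a loop with a timestamp, this would record when system time jumps from the usual sub-10% to the 50-80% range.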

However, once I kill off this process it seems to slowly get back to
normal.

Now I have read other people's emails and gone through the archives about
this, and read about "failure for 4. Error = No route to host",
lib/util_sock.c:read_data(436) and oplocking problems, as they all appear
to be more pronounced around the time of this high CPU/rogue smbd process.

However, it would seem a lot of the oplocking problems are
hardware related. I use decent 3Com kit here with a 4950 as a core and
4400s at the edge (i.e. not cheap and cheerful netgear/dlink/etc. stuff) so I'm
wondering if anyone else has had these problems with this kit. Or if it's
not the kit, what can it actually be?

I've tried turning oplocks on and off to no avail. It still has this issue.

Any ideas on the read_data(436) and failure for 4. Error = No route to
host?


Any help offered very gratefully received.

With thanks

Ross McInnes



RE: [Samba] Tall tale of woe....

2003-12-09 Thread Ladner, Eric (Eric.Ladner)

Next time it happens, running strace on the offending process (strace
-p process_id) can provide some insight as to what it's beating around
on, especially if it's system related.  That might help pinpoint a spot
in the code where it's having problems.
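
Since a busy process can produce a lot of trace output, it may be easier to capture it to a file for a fixed period. A rough Python sketch follows, assuming strace is installed and that the caller has permission to attach to the process. It uses strace's -p (attach to pid), -tt (timestamps) and -o (write the trace to a file) options; the output path and function names are only examples.

```python
import signal
import subprocess


def strace_command(pid, output_path):
    """Build an strace command line that attaches to pid and writes to a file."""
    return ["strace", "-tt", "-p", str(pid), "-o", output_path]


def trace_process(pid, output_path, seconds=30):
    """Attach strace for a limited time, then detach and return its exit status."""
    proc = subprocess.Popen(strace_command(pid, output_path))
    try:
        proc.wait(timeout=seconds)
    except subprocess.TimeoutExpired:
        # Equivalent of pressing Ctrl-C: strace detaches and the traced
        # process keeps running.
        proc.send_signal(signal.SIGINT)
        proc.wait()
    return proc.returncode


if __name__ == "__main__":
    import sys

    # Usage: python trace_smbd.py <pid>   (output path is just an example)
    trace_process(int(sys.argv[1]), "/tmp/smbd-%s.trace" % sys.argv[1])
```

The resulting file can then be searched for the system call that repeats most often.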

Eric

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On
Behalf Of Ross McInnes (Systems)
Sent: Tuesday, December 09, 2003 9:34 AM
To: [EMAIL PROTECTED]
Subject: [Samba] Tall tale of woe




AW: [Samba] Tall tale of woe....

2003-12-09 Thread Reiner Lanowski
Hello Ross,

I've had a similar problem with my server for a week now.
I'm running RH 8.0 on a Xeon 1.4 GHz with 2GB RAM, too.
My CPU usage was pretty low (90% idle) but the load average was at 14.
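
A high load average alongside a mostly idle CPU is possible on Linux because the load average counts tasks in uninterruptible sleep (state D, usually waiting on I/O or a lock) as well as runnable ones (state R). A small Python sketch to count processes by state when it happens is shown below. It assumes the Linux /proc/<pid>/stat format; the function name and the proc_root parameter (present only for testing) are hypothetical.

```python
import os


def process_states(proc_root="/proc"):
    """Count processes by state letter (R, S, D, Z, ...) from /proc/<pid>/stat."""
    counts = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                stat = handle.read()
        except OSError:
            continue  # the process exited while we were scanning
        # The command name is wrapped in parentheses and may itself contain
        # spaces or ')', so the state is the first field after the last ')'.
        fields = stat[stat.rfind(")") + 1:].split()
        if not fields:
            continue
        counts[fields[0]] = counts.get(fields[0], 0) + 1
    return counts


if __name__ == "__main__":
    print("load average:", os.getloadavg())
    print("process states:", process_states())
```

A large D count during the next incident would point at processes stuck waiting rather than at raw CPU use.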

For now I'm waiting for the next time it happens to get some more
information.

Best regards,   Reiner

-Original Message-
From: Ross McInnes (Systems) [mailto:[EMAIL PROTECTED]
Sent: Tuesday, 9 December 2003 16:34
To: [EMAIL PROTECTED]
Subject: [Samba] Tall tale of woe

