smbfs/cifs large file support history?

2007-10-24 Thread Ville Herva
Does anyone remember when Linux smbfs (or cifs) gained large file
(>2GB, >4GB) support?

At least most 2.2.x didn't have it (were there 2.2 smbfs LFS patches?)

Was 2.4 the first kernel to support large files on smbfs?
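
For reference, the two thresholds in the question line up with 32-bit file offsets: a signed 32-bit offset stops just short of 2 GiB and an unsigned one just short of 4 GiB. A minimal Python sketch of that arithmetic follows; the helper name `fits` is made up for illustration, and the sketch says nothing about which smbfs or cifs version lifted either limit.

```python
# Largest file sizes representable with different offset widths.
MAX_SIGNED_32 = 2**31 - 1    # signed 32-bit offset: just under 2 GiB
MAX_UNSIGNED_32 = 2**32 - 1  # unsigned 32-bit size field: just under 4 GiB
MAX_SIGNED_64 = 2**63 - 1    # signed 64-bit offset (large file support)

GIB = 2**30


def fits(size_bytes, max_offset):
    """Return True if a file of size_bytes is representable with this limit."""
    return size_bytes <= max_offset


for size in (1 * GIB, 2 * GIB, 3 * GIB, 4 * GIB):
    print(size // GIB, "GiB:",
          "s32" if fits(size, MAX_SIGNED_32) else "-",
          "u32" if fits(size, MAX_UNSIGNED_32) else "-",
          "s64" if fits(size, MAX_SIGNED_64) else "-")
```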



-- v -- 

[EMAIL PROTECTED]

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.4.35 SMP: ext3_readdir: bad entry in directory #323888: rec_len is smaller than minimal

2007-09-20 Thread Ville Herva
On Thu, Sep 20, 2007 at 03:20:55PM +0200, you [Willy Tarreau] wrote:
> OK. And your config seems perfectly standard.
> 
> > gcc 2.96-129:
> > cat /proc/version 
> > Linux version 2.4.35 (root) (gcc version 2.96 2731 (Red Hat Linux 7.2 
> > 2.96-129.7.2)) #1 SMP Thu Aug 9 10:35:37 EEST 2007
> 
> I used not to trust 2.96, but I wouldn't accuse it now.

The box was virtualized a while ago and 2.4.32-rc1 and earlier 2.4 compiled
with the same compiler ran very solidly for years. It was UP before
virtualization, though.
 
> > share of chipset bugs with older Via chipsets, but I think it's very likely
> > in this case.
> 
> I think you meant "unlikely".

Yep, sorry for the typo.
 
> > This could very well be a VMware bug, but I wanted to know if this rings
> > bells for someone.
> 
> It could also be a problem with the host OS, drivers, hardware, etc...

Yes, pretty much anything. There's no solid evidence of anything, only
guesses of what might be more likely and what might be less likely...

If it happens again, I'll try to debug more.



-- v -- 

[EMAIL PROTECTED]



Re: 2.4.35 SMP: ext3_readdir: bad entry in directory #323888: rec_len is smaller than minimal

2007-09-20 Thread Ville Herva
On Tue, Sep 18, 2007 at 11:47:05PM +0200, you [Willy Tarreau] wrote:
> Thanks for your report. Unfortunately, I've rechecked the recent changelogs
> and see nothing related either. At least, in order to keep trace of the
> incident, would you please post some info about your config (CPU, RAM,
> chipset, .config, gcc, and any possible patches you may have applied) ?
> Maybe some of these info may remind old bad memories to some people.
> 
> Also, do you know if this server has ECC memory ? I would more easily
> bet for side effects of one random bit flip in memory than for some
> massive block corruption.
> 
> I vaguely remember about very old reports of people sometimes observing
> zeroed out blocks during writes, which were attributed to chipset bugs
> if my memory serves me. But I would rule this out as recent chipsets
> look more stable than 5-10 years ago !

Willy,

The machine is a virtual machine on a VMware ESX 3.0.1 host.

/proc/cpuinfo shows two of these:
Dual
model   : 15
model name  : Intel(R) Xeon(R) CPU   E5345  @ 2.33GHz
stepping: 8
cpu MHz : 2333.014
cache size  : 64 KB

It has 864MB of memory.

.config is at:
http://v.iki.fi/~vherva/tmp/2.4.35-config
The kernel is plain vanilla 2.4.35 from kernel.org, no patches.

gcc 2.96-129:
cat /proc/version 
Linux version 2.4.35 (root) (gcc version 2.96 2731 (Red Hat Linux 7.2 
2.96-129.7.2)) #1 SMP Thu Aug 9 10:35:37 EEST 2007

Memory is ECC.

The server is HP Proliant ML370 with 82801BA/CA/DB/EB chipset. I've had my
share of chipset bugs with older Via chipsets, but I think it's very likely
in this case.

This could very well be a VMware bug, but I wanted to know if this rings
bells for someone.


-- v -- 

[EMAIL PROTECTED]



Re: 2.4.35 SMP: ext3_readdir: bad entry in directory #323888: rec_len is smaller than minimal

2007-09-18 Thread Ville Herva
On Tue, Sep 18, 2007 at 06:22:56PM +0200, you [Jan Kara] wrote:
> > Sorry for the sparse details, but when you have these kind of problems on
> > live servers, you tend to forget the debuggability...
>   Yes, I can understand that :). It's just that now it's hard to find
> out what has really happened. Anyway, thanks for your report.

If we are really lucky or unlucky it will happen again. 

Zeroed-out block just might be a kernel problem (SMP race, whatever) -
random corruption would more likely be a hardware problem. There were no IO
errors either. But, 2.4 ext3 has been pretty extensively tested, so I don't
suppose that's likely either. And judging from
http://www.kernel.org/pub/linux/kernel/v2.4/ChangeLog-2.4.3[45] there
haven't been many changes to ext3 lately either.
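
To illustrate why a zeroed-out directory block would produce exactly this message, here is a rough Python sketch of the sanity check, loosely modelled on the ext3 directory-entry layout (inode, rec_len, name_len, file_type, name). It is not the kernel code; the function names and the 4 KiB block size are assumptions made for the example.

```python
import struct

# Fixed part of an ext2/ext3 directory entry, little-endian:
# inode (u32), rec_len (u16), name_len (u8), file_type (u8); the name follows.
DIRENT_HEADER = struct.Struct("<IHBB")


def dir_rec_len(name_len):
    """Smallest legal rec_len for a name: 8-byte header + name, rounded up to 4."""
    return (name_len + 8 + 3) & ~3


MIN_REC_LEN = dir_rec_len(1)  # 12 bytes


def check_first_entry(block):
    """Return a problem description for the entry at offset 0, or None if sane."""
    inode, rec_len, name_len, _file_type = DIRENT_HEADER.unpack_from(block, 0)
    if rec_len < MIN_REC_LEN:
        return ("rec_len is smaller than minimal - offset=0, inode=%d, "
                "rec_len=%d, name_len=%d" % (inode, rec_len, name_len))
    if rec_len % 4 != 0:
        return "rec_len is not a multiple of 4"
    if rec_len < dir_rec_len(name_len):
        return "rec_len is too small for name_len"
    return None


# Hypothetical directory block that has been overwritten with zeroes.
zeroed_block = bytes(4096)
print(check_first_entry(zeroed_block))
```

A block of zeroes fails the very first test with inode, rec_len and name_len all zero, which is the same set of values the dmesg lines report.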
 


-- v -- 

[EMAIL PROTECTED]



Re: 2.4.35 SMP: ext3_readdir: bad entry in directory #323888: rec_len is smaller than minimal

2007-09-18 Thread Ville Herva
On Tue, Sep 18, 2007 at 05:12:06PM +0200, you [Jan Kara] wrote:
>   Hello,
> 
> > I got a bunch of these into dmesg:
> > 
> > EXT3-fs error (device sd(8,2)): ext3_readdir: bad entry in directory 
> > #323880: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, 
> > name_len=0
> > EXT3-fs error (device sd(8,2)): ext3_readdir: bad entry in directory 
> > #323888: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, 
> > name_len=0
> > EXT3-fs error (device sd(8,2)): ext3_readdir: bad entry in directory 
> > #323882: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, 
> > name_len=0
> > 
> > The kernel is 2.4.35 SMP, dual-processor. The scsi driver is Fusion MPT SCSI
> > Host driver 2.05.16.
> > 
> > The device is /dev/sda2, root fs.
> > 
> > One line per each directory had dropped into dmesg each night (I think
> > during updatedb) before I noticed.
>   Interesting. Can you look (using debugfs) on the content of the
> /usr/share/doc/ directory? It seems like parts of it have been zeroed
> out...

Unfortunately, no. I removed those directories because those were the only
ones causing problems and wasn't able to reboot for a proper fsck
immediately. The rm -rf command gave no errors (to stdout or dmesg), and a
read-only fsck right after that gave no errors on the directory structure.

Sorry for the sparse details, but when you have these kind of problems on
live servers, you tend to forget the debuggability...



-- v -- 

[EMAIL PROTECTED]



2.4.35 SMP: ext3_readdir: bad entry in directory #323888: rec_len is smaller than minimal

2007-09-18 Thread Ville Herva
Hello, 

I got a bunch of these into dmesg:

EXT3-fs error (device sd(8,2)): ext3_readdir: bad entry in directory #323880: 
rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0
EXT3-fs error (device sd(8,2)): ext3_readdir: bad entry in directory #323888: 
rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0
EXT3-fs error (device sd(8,2)): ext3_readdir: bad entry in directory #323882: 
rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0

The kernel is 2.4.35 SMP, dual-processor. The scsi driver is Fusion MPT SCSI
Host driver 2.05.16.

The device is /dev/sda2, root fs.

One line per each directory had dropped into dmesg each night (I think
during updatedb) before I noticed.

The directories in question have not been written to for ages:

>debugfs /dev/sda2
debugfs:  ncheck 323888
Inode   Pathname
323888  /usr/share/doc/logcheck-1.1.1
debugfs:  ncheck 323882
Inode   Pathname
323882  /usr/share/doc/dev86-0.15.5
debugfs:  ncheck 323880
Inode   Pathname
323880  /usr/share/doc/mod_put-1.3


The hardware _should_ be solid, although I can never rule out disk-level
corruption with 100% certainty.

Does this ring any bells to anyone, short of block level corruption?



-- v -- 

[EMAIL PROTECTED]



Re: 2.6.22 Oops attaching usb-storage device

2007-07-11 Thread Ville Herva
On Wed, Jul 11, 2007 at 04:17:14PM +0200, you [Michal Piotrowski] wrote:
> Hi,
> 
> On 11/07/07, Ville Herva <[EMAIL PROTECTED]> wrote:
> >[98790.366620] Modules linked in: ub nvidia(P) ppp_deflate zlib_deflate 
>
> "When you are using a binary driver, the kernel is "tainted", which
> means that the source of possible problems may be unrelated to the
> kernel code (see
> https://secure-support.novell.com/KanisaPlatform/Publishing/250/3582750_f.SAL_Public.html
> for more details). You can check whether or not the kernel was tainted
> when the problem occurred by looking at the corresponding error
> message. If you can see something similar to the following line:
> EIP:  0060:[] Tainted: P  VLI
> (the word Tainted is crucial here), the kernel was tainted and most
> probably the kernel developers will not be able to help you. In that
> case you should try to reproduce the problem without the binary driver
> loaded. Moreover, if the problem does not occur without it, you should
> send a bug report to the creators of the binary driver and ask them to
> fix it."

Thanks, I do know the kernel was tainted (Nvidia display driver). 

I didn't explicitly mention that in the report, since the kernel developers
see that from the oops message, and because I believe there is a fair chance
that the hated nvidia module played no role in this case. Of course, there's
no way to be sure, as it is closed source.
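
As a small illustration of reading the taint state off the oops text, here is a Python sketch that pulls the flag letters from a "Tainted:" field. The regular expression and the function name are assumptions for the example; the sample line is the EIP line from the oops in this thread.

```python
import re

# Flag letters follow "Tainted:"; an untainted oops says "Not tainted" instead.
TAINT_RE = re.compile(r"Tainted:\s*([A-Z]+)")


def taint_flags(oops_text):
    """Return the taint flag letters found in an oops, or '' if none is reported."""
    match = TAINT_RE.search(oops_text)
    return match.group(1) if match else ""


line = "[98790.366670] EIP:0060:[]Tainted: P   VLI"
flags = taint_flags(line)
if "P" in flags:
    print("a proprietary module was loaded when the oops happened")
else:
    print("no proprietary-module taint reported")
```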

I sent the oops message just in case the usb-storage / usb people can get an
idea from the stack trace what might have gone wrong. I believe it was worth
it even though nvidia module might in theory make the problem unsolvable. 

The problem is spurious (not reproducible). I attached and removed the Nokia
device quite some times today and yesterday without problems, and this only
happened once. 

As regards nvidia being able to fix this problem, I'm sure you understand
that will not happen :) Even if it was the nvidia driver mucking around in the
kernel address space, that would have to be very random and this oops
message couldn't help them.



-- v -- 

[EMAIL PROTECTED]



2.6.22 Oops attaching usb-storage device

2007-07-11 Thread Ville Herva
First: thanks for the new partition print out when failing to mount rootfs.
That came in handy on the very first boot (I had switched harddisks around
and failed to guess the correct root device 2102). I've been longing for
that sort of printout for years. Very useful.

Anyway, 2.6.22 seems pretty solid, but I managed to oops it. I had a Creative
Muvo Zen V attached (I believe it played no role in the oops) and tried to
attach a Nokia E70 in usb storage mode. The first time the Nokia said usb storage
mode couldn't be used since there was an application using the memory card
(in reality there wasn't, but the Nokia has a bug where it thinks there is.)

So I had to unplug the Nokia and then replug it. This is when the Linux
usb-storage oopsed (see below.)

I had to reboot, since all usb related processes hung (like lsusb). 


[98785.107106] usb-storage: device found at 27
[98785.107108] usb-storage: waiting for device to settle before scanning
[98790.104352] usb-storage: device scan complete
[98790.107341] scsi 17:0:0:0: Direct-Access
PQ: 0 ANSI: 0
[98790.117331] sd 17:0:0:0: [sdb] 3910526 512-byte hardware sectors (2002 MB)
[98790.121322] sd 17:0:0:0: [sdb] Write Protect is off
[98790.121328] sd 17:0:0:0: [sdb] Mode Sense: 03 00 00 00
[98790.121331] sd 17:0:0:0: [sdb] Assuming drive cache: write through
[98790.131321] sd 17:0:0:0: [sdb] 3910526 512-byte hardware sectors (2002 MB)
[98790.134311] sd 17:0:0:0: [sdb] Write Protect is off
[98790.134315] sd 17:0:0:0: [sdb] Mode Sense: 03 00 00 00
[98790.134319] sd 17:0:0:0: [sdb] Assuming drive cache: write through
[98790.134323]  sdb:<6>usb 6-1: USB disconnect, address 27
[98790.366609] BUG: unable to handle kernel NULL pointer dereference at virtual 
address 
[98790.366612]  printing eip:
[98790.366614] c02481a9
[98790.366615] *pde = 
[98790.366617] Oops:  [#1]
[98790.366618] SMP 
[98790.366620] Modules linked in: ub nvidia(P) ppp_deflate zlib_deflate 
bsd_comp ppp_async crc_ccitt ppp_generic slhc saa7134_alsa ipt_REJECT 8250_pci 
w83627ehf i2c_isa iptable_filter hidp rfcomm l2cap binfmt_misc nls_iso8859_1 
nls_cp437 dm_mirror dm_mod button battery ac lp nvram loop saa7134_dvb tda826x 
tda10086 tda1004x tda827x snd_hda_intel cx88_dvb cx88_vp3054_i2c mt352 
snd_seq_oss snd_seq_midi_event snd_seq dvb_pll or51132 video_buf_dvb nxt200x 
firmware_class isl6421 zl10353 cx24123 lgdt330x tuner snd_seq_device 
snd_pcm_oss dvb_core snd_mixer_oss cx22702 snd_pcm saa7134 cx8800 cx8802 cx88xx 
ir_kbd_i2c ir_common i2c_i801 i2c_algo_bit intel_agp 8250_pnp parport_pc ide_cd 
8250 compat_ioctl32 tveeprom r8169 videodev video_buf v4l2_common v4l1_compat 
btcx_risc i2c_core parport serial_core agpgart hci_usb cdrom snd_timer 
bluetooth snd soundcore snd_page_alloc pata_jmicron
[98790.366670] CPU:1
[98790.366670] EIP:0060:[]Tainted: P   VLI
[98790.366671] EFLAGS: 00010202   (2.6.22 #1)
[98790.366677] EIP is at make_class_name+0x27/0x7a
[98790.366679] eax:    ebx:    ecx:    edx: 000b
[98790.366681] esi: c038d492   edi:    ebp:    esp: c22f5e68
[98790.366683] ds: 007b   es: 007b   fs: 00d8  gs:   ss: 0068
[98790.366685] Process khubd (pid: 193, ti=c22f4000 task=c21b3030 
task.ti=c22f4000)
[98790.366686] Stack: e0e24a08 e0e24a00 c03cb08c c03cb020 e0e24a08 c02482e0 
 e0e24a00 
[98790.366692]e0e24894 0282 d35c3800 c024836a e0e24800 c0266062 
e0e24800 c842e000 
[98790.366697]c0263f47 c842e030 c842e000 c025fe25 c842e304 f787c818 
c03cd3e0 c02954ec 
[98790.366702] Call Trace:
[98790.366708]  [] class_device_del+0x88/0x10a
[98790.366714]  [] class_device_unregister+0x8/0x10
[98790.366718]  [] __scsi_remove_device+0x23/0x60
[98790.366724]  [] scsi_forget_host+0x2d/0x4a
[98790.366729]  [] scsi_remove_host+0x65/0xd8
[98790.366733]  [] storage_disconnect+0xe/0x16
[98790.366738]  [] usb_unbind_interface+0x44/0x85
[98790.366743]  [] __device_release_driver+0x6e/0x8b
[98790.366747]  [] device_release_driver+0x23/0x39
[98790.366751]  [] bus_remove_device+0x6a/0x7a
[98790.366755]  [] device_del+0x1d7/0x248
[98790.366760]  [] usb_disable_device+0x5c/0xbb
[98790.366765]  [] usb_disconnect+0x88/0x11e
[98790.366771]  [] hub_thread+0x379/0xa83
[98790.366776]  [] __sched_text_start+0x7ef/0x88d
[98790.366785]  [] autoremove_wake_function+0x0/0x35
[98790.366791]  [] hub_thread+0x0/0xa83
[98790.366795]  [] kthread+0x38/0x5d
[98790.366798]  [] kthread+0x0/0x5d
[98790.366801]  [] kernel_thread_helper+0x7/0x10
[98790.366807]  ===
[98790.366808] Code: 5b 04 5b c3 55 31 ed 57 89 c7 56 89 c6 53 89 e8 83 ec 04 
83 cb ff 89 14 24 89 d9 f2 ae f7 d1 49 8b 04 24 89 ca 8b 38 89 d9 89 e8  ae 
f7 d1 49 8d 44 0a 02 ba d0 00 00 00 e8 32 1c f1 ff 31 d2 
[98790.366831] EIP: [] make_class_name+0x27/0x7a SS:ESP 0068:c22f5e68
[98790.366897] sd 17:0:0:0: [sdb] Result: hostbyte=0x01 driverbyte=0x00
[98790.366900] end_request: I/O error, dev sdb, sector 0
[98790.366902] printk: 

Re: 2.6.18-rc7: ide_cd problems

2006-11-26 Thread Ville Herva
On Mon, Nov 27, 2006 at 12:00:30PM +0900, you [Tejun Heo] wrote:
> Ville Herva wrote:
> >When ripping a cd with grip, I noticed the drive was not in DMA mode. I did
> >hdparm -d1 /dev/hdi. The grip process (it uses libcdda_paranoia.so and
> >libcdda_interface.so) hung, and attempt to kill it with -KILL failed.
> >Eventually it died but remained as zombie:
> 
> Known problem but probably won't get fixed.  

Fair enough.

> Just use hdparm only when the drive is idle. Put it somewhere in the boot
> script.

I already did, but DMA had dropped off in the meanwhile. 

It did burn a dvd without a hitch with DMA=1, though.

> Hmm... IDE should enable DMA automatically for most cases.  Can you post 
> full dmesg?

Will do. I'll just have to boot first.

It's currently unavailable, as the 

[2416661.676213] attempt to access beyond end of device
[2416661.676216] loop0: rw=0, want=1401620, limit=946048

messages from loop (/dev/hdi is limited to 946048 blocks as I described) have
filled dmesg.
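
For anyone sifting through a dmesg full of these, a small Python sketch that extracts the device, the requested position and the device limit from such a line is shown below. The regular expression and the function name are assumptions; the units of want and limit depend on the kernel version, so the sketch only compares the two numbers.

```python
import re

# Matches e.g. "loop0: rw=0, want=1401620, limit=946048"
LINE_RE = re.compile(
    r"(?P<dev>\S+): rw=(?P<rw>\d+), want=(?P<want>\d+), limit=(?P<limit>\d+)"
)


def parse_beyond_end(line):
    """Return device, want, limit and the overshoot, or None if the line differs."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    want = int(match.group("want"))
    limit = int(match.group("limit"))
    return {
        "dev": match.group("dev"),
        "want": want,
        "limit": limit,
        "overshoot": want - limit,  # how far past the end the request reached
    }


sample = "[2416661.676216] loop0: rw=0, want=1401620, limit=946048"
print(parse_beyond_end(sample))
```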

Thanks.



-- v -- 

[EMAIL PROTECTED]



Upgrade from 2.6.10-ac8 to 2.6.12.5 broke lvm rootfs

2005-08-16 Thread Ville Herva
After upgrading the kernel from 2.6.10-ac8 to 2.6.12.5 the initramfs was no
longer able to mount rootfs.

  mount: error 6 mounting ext3

All the configuration options are identical, and upgrading the lvm2 packages

  lvm2-2.00.25-1.01       -> lvm2-2.01.14-1.0
  device-mapper-1.00.19-2 -> device-mapper-1.01.04-1.0

did not change anything.

Dm, ext3 and the relevant block device drivers are statically compiled in.

The vg has lvm1 format, fwiw.

I enabled all the debug options I could think of in the nash-based initramfs
init script. That did not appear to tell much: all I was able to tell was
that lvm was successfully called by the init script:

mount -t proc /proc /proc
mount -t sysfs none /sys
insmod /lib/dm-snapshot.ko 
mkdevices /dev
mkdmnod
lvm vgscan -v
# sleep 5
lvm vgchange -ay
# sleep 5
lvm vgmknodes
# sleep 5
mkrootdev /dev/root
umount /sys
# sleep 5
mount -o defaults --ro -t ext3 /dev/root /sysroot
switchroot /sysroot

but those didn't give any meaningful output (other than notices about
setting log indentation level).

Finally, I added "sleep 5" after each lvm command (commented out above),
which appeared to "solve" the problem. 

Apparently the lvm scripts somehow do their initialization asynchronously
and the init script tries to mount root before it is available. I'm not sure
why this is affected by the kernel version, though.
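
If the guess about asynchronous initialization is right, a fixed sleep could be replaced by polling for the device node with a timeout. The sketch below shows the idea in Python purely for illustration; a nash-based initramfs would have to express it with its own commands, and the function name, the timeout values and the /dev/root path in the comment are assumptions.

```python
import os
import time


def wait_for_path(path, timeout=30.0, interval=0.2):
    """Poll until path exists or timeout seconds pass; return True if it appeared."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical use before the mount step:
#     if not wait_for_path("/dev/root", timeout=30.0):
#         print("root device did not appear in time")
```

Compared with a fixed "sleep 5", this returns as soon as the node shows up and fails visibly when it never does.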



-- v --

[EMAIL PROTECTED]


Re: Linux 2.4.30-rc3 md/ext3 problems

2005-03-28 Thread Ville Herva
On Mon, Mar 28, 2005 at 07:25:58PM +0200, you [Willy Tarreau] wrote:
> 
> Since you don't seem to be willing to remove vserver, I guess you really
> need it on this machine, and to be honest, 

Yes, the machine is in production, and for that, it needs vserver. The
fact that it is in production also makes it a tad awkward to test
different options, at least the most experimental ones.

> I too don't see what trouble it could cause in this area. 

Neither do I. Of course it could be memory corruption caused by vserver in
a different part of the kernel, but the fact that the symptoms are so consistent
makes me think that is unlikely. But not impossible.

> However, could you try removing the journal, or simply mount the FS as
> ext2 ? It would help to narrow the problem down.

That is a good idea, if only I could boot the box at will and reliably
reproduce the problem. While it took less than ten minutes to trigger it the
second time, it took more than five hours the first time. 

I will try to reproduce the problem on a different setup first, and if I can
do that, I'll try your suggestion.
 
> To resume, you have your root on ext3 on top of soft raid1 consisting in
> two IDE disks, which works in 2.4.21 but not on 2.4.30-rc3, that's
> correct ? 

Correct. This is PII 266MHz, no SMP.

> There was a fix last week by Neil Brown about RAID1 rebuild process
> (degraded array of 3 disks, etc...), unless it obviously does not come
> from there, you might want to try reverting it first ? 

Sounds sane, although the raid array was not in a degraded state at any stage
and no raid rebuild was ever triggered.

> The next one is from Doug Ledford on 2004/09/18 and should only affect
> SMP.

Ok, as said, this is UP.
 
> My different raid machines run either reiserfs or xfs on soft raid5 on
> top of scsi and with kernel 2.4.27, so there's not much to compare...
> Perhaps someone on the list has a setup similar to yours and could test
> the kernel ?

I will try to construct a similar setup on another machine.


thanks for your insights,

-- v -- 

[EMAIL PROTECTED]



Linux 2.4.30-rc3 md/ext3 problems

2005-03-28 Thread Ville Herva
On Mon, Mar 28, 2005 at 10:34:05AM +0300, [Ville Herva] wrote:
> 
> I just upgraded from linux-2.4.21 + vserver 0.17 to 2.4.30rc3 + vserver
> 1.2.10. The box has been running stable with 2.4.21 + vserver 0.17/0.16 for
> a few years (uptime before reboot was nearly 400 days.)
> 
> The boot went fine, but after a few hours I got 
> Message from [EMAIL PROTECTED] at Sun Mar 27 22:07:00 2005 ...
> kernel: journal commit I/O error
> 
> and dmesg is filled with 
> --8<---
> EXT3-fs error (device md(9,3)) in start_transaction: Journal has aborted
> EXT3-fs error (device md(9,3)) in start_transaction: Journal has aborted
> EXT3-fs error (device md(9,3)) in start_transaction: Journal has aborted
> EXT3-fs error (device md(9,3)) in start_transaction: Journal has aborted
> --8<---
> 
> This is rootfs, on top of software raid1 and two IDE disks. mdstat claims it's
> healthy:
> 
> --8<---
> md3 : active raid1 hdc3[1] hda3[0]
>   37955648 blocks [2/2] [UU]
> --8<---
> 
> While dmesg has filled up and /var/log/messages is read-only - I can't see
> all the kernel messages - there appears to be no IO errors from the
> underlying devices (md, ide). smartctl -a does not report errors for hda nor
> hdc.
> 
> During reboot, fsck was run for md3, and it was clean. Now I get
> 
> --8<---
> Block bitmap differences:  -(7800660--7801060) -(7801934--7802030) 
> -(7802370--7802602) -(7802604--7802613) -(7802681--7802700) 
> -(7802715--7802716) -(7802726--7802732) 
> -(7802744--7802750)-(7802914--7802927) -(7802934--7802937) 
> -(7802946--7802964)  -(7803392--7803417) -(7805060--7808825) 
> -(7808976--7809608) 
> Fix? no
> 
> Inode bitmap differences:  -3899400
> Fix? no
> --8<---
> 
> No errors from the badblocks part of the fsck, though.
> 
> Running fsck triggers the "journal commit I/O error" messages again, and
> still no IO errors from either md or ide.
> 
> This _could_ have something to do with the vserver patch but it doesn't
> appear so. Also, it doesn't immediately look like a hardware problem. 

I rebooted (fsck took the fs errors away, no big offenders), and after a few
minutes, I got the same error ("journal commit I/O error"). So it doesn't
look like random memory corruption. The error happened right when I
logged out, but that might have been a coincidence. No ide nor md errors
this time either. 

I don't know what to suspect. From what I gather from the changelogs, there haven't
been any critical looking ext3 changes in 2.4 lately, but then again,
vserver doesn't mess with block layer / ext3 journalling either.

Any ideas?


-- v -- 

[EMAIL PROTECTED]



Re: Linux 2.4.30-rc3

2005-03-27 Thread Ville Herva
On Sat, Mar 26, 2005 at 01:28:01PM -0300, you [Marcelo Tosatti] wrote:
> 
> Hi, 
> 
> Here goes -rc3.
> 
> A nasty typo happened while merging v2.6 load_elf_library() DoS fix,
> which could lead to oopses.
> 
> Summary of changes from v2.4.30-rc2 to v2.4.30-rc3
> 
> 
> Marcelo Tosatti:
>   o Andreas Arens: Fix deadly mismerge of binfmt_elf DoS fix
>   o Change VERSION to 2.4.30-rc3

I just upgraded from linux-2.4.21 + vserver 0.17 to 2.4.30rc3 + vserver
1.2.10. The box has been running stable with 2.4.21 + vserver 0.17/0.16 for
a few years (uptime before reboot was nearly 400 days.)

The boot went fine, but after a few hours I got 
Message from [EMAIL PROTECTED] at Sun Mar 27 22:07:00 2005 ...
turing kernel: journal commit I/O error

and dmesg is filled with 
--8<---
EXT3-fs error (device md(9,3)) in start_transaction: Journal has aborted
EXT3-fs error (device md(9,3)) in start_transaction: Journal has aborted
EXT3-fs error (device md(9,3)) in start_transaction: Journal has aborted
EXT3-fs error (device md(9,3)) in start_transaction: Journal has aborted
--8<---

This is rootfs, on top of software raid1 and two IDE disks. mdstat claims it's
healthy:

--8<---
md3 : active raid1 hdc3[1] hda3[0]
  37955648 blocks [2/2] [UU]
--8<---
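
A quick way to read the same health indication programmatically is to parse the "[n/m] [UU]" status of each array. The Python sketch below assumes the /proc/mdstat layout shown above; the function name is made up, and a "healthy" result only means all members are present, not that the data on them is intact.

```python
import re

# Status such as "[2/2] [UU]": configured members, active members, per-member flags.
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")
NAME_RE = re.compile(r"(md\d+)\s*:")


def md_health(mdstat_text):
    """Map each md device to True when every member is active ('U'), else False."""
    health = {}
    current = None
    for line in mdstat_text.splitlines():
        name = NAME_RE.match(line)
        if name:
            current = name.group(1)
            continue
        status = STATUS_RE.search(line)
        if status and current:
            total = int(status.group(1))
            active = int(status.group(2))
            flags = status.group(3)
            health[current] = total == active and "_" not in flags
    return health


sample = """md3 : active raid1 hdc3[1] hda3[0]
      37955648 blocks [2/2] [UU]
"""
print(md_health(sample))
```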

While dmesg has filled up and /var/log/messages is read-only - I can't see
all the kernel messages - there appears to be no IO errors from the
underlying devices (md, ide). smartctl -a does not report errors for hda nor
hdc.

During reboot, fsck was run for md3, and it was clean. Now I get

--8<---
Block bitmap differences:  -(7800660--7801060) -(7801934--7802030) 
-(7802370--7802602) -(7802604--7802613) -(7802681--7802700) -(7802715--7802716) 
-(7802726--7802732) -(7802744--7802750)-(7802914--7802927) -(7802934--7802937) 
-(7802946--7802964)  -(7803392--7803417) -(7805060--7808825) 
-(7808976--7809608) 
Fix? no

Inode bitmap differences:  -3899400
Fix? no
--8<---

No errors from the badblocks part of the fsck, though.

Running fsck triggers the "journal commit I/O error" messages again, and
still no IO errors from either md or ide.

This _could_ have something to do with the vserver patch but it doesn't
appear so. Also, it doesn't immediately look like a hardware problem. 

Any ideas?


-- v -- 

[EMAIL PROTECTED]



[OT] Re: Strange errors in /var/log/messages

2001-07-02 Thread Ville Herva

On Mon, Jul 02, 2001 at 01:00:33PM -0400, you [Richard B. Johnson] claimed:
> > Jul  2 15:12:16 gateway SERVER[1240]: Dispatch_input: bad request line
> > 'BBXX%.176u%3
> > 
>00$nsecurity.%301$n%302$n%.192u%303$n\220\220\220\220\220\220\220\220\220\220\220\220\220\220\220\220\220\220\220\22
> > 
>0\220\220\220\220\220\220\220\220\220\220\220\220\220\220\220\220\220\220\220\220\220\220\220\220\220\220\220\220\22
> 
> I think you just got 'rooted'. Look at /etc/inetd.conf (if it exists
> on your system, the xinetd is more robust). It may have a new entry
> on its last line providing a root shell to anybody. This looks somewhat
> like an attack shown by CERN about 6 to 12 months ago.

(This has nothing to do with linux-kernel, sorry...)

I don't think anything particular in that message suggests he actually got
rooted? It just seems that somebody tried to exploit lprNG hole (or
something else) and the daemon logged that. Of course, it *is* perfectly
possible, that he _got_ rooted (although he said he was running redhat-7.0
with all the updates). 

(The attacker may have tried other attacks so if he got rooted, those above
are not necessarily the related log messages. In any case, a 'smart' intruder
would have cleaned the log. Also, a 'smart' attacker probably uses something
more advanced as a backdoor than /etc/inetd.conf these days.)

Or is there something that actually indicates a successful intrusion in the
log snippet that I'm missing?
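
For context, the logged request is full of printf-style directives such as %.176u and %301$n; the %n family is what a format-string exploit uses to write to memory. A small Python sketch that flags such directives in a log line is shown below. The sample is shortened from the quoted log, the names are made up, and the pattern deliberately ignores variants like %hn. A hit shows an attempt, not a successful intrusion.

```python
import re

# %n, optionally positional (e.g. %301$n), stores a count through a pointer
# when it reaches a printf-style function.
WRITE_DIRECTIVE_RE = re.compile(r"%(?:\d+\$)?n")


def write_directives(log_line):
    """Return every %n-style directive found in a log line."""
    return WRITE_DIRECTIVE_RE.findall(log_line)


# Shortened version of the request line quoted above.
sample = r"Dispatch_input: bad request line 'BBXX%.176u%301$n%302$n%.192u%303$n\220\220'"
hits = write_directives(sample)
print("suspicious directives:", hits)
```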


-- v --

[EMAIL PROTECTED]



Re: Oops in iput

2001-06-26 Thread Ville Herva

On Tue, Jun 26, 2001 at 11:56:51AM +0100, you [Stephen C. Tweedie] claimed:
> Hi,
> 
> On Tue, Jun 26, 2001 at 11:09:33AM +0300, Ville Herva wrote:
> 
> > Well, I for one use the 2.2 ide patches extensively (on almost all of my
> > machines, including a heavy-duty backup server)
> 
> It is highly hardware-dependent.  A huge amount of effort was spent
> early in 2.4 getting blacklists and hardware tweaks right to work
> around problems with specific chipsets with ide udma.  Just because it
> works for one person doesn't give you any confidence that it won't
> trash data for somebody else.

Well, the report said 'Intel BX chipset' - that's as solid as chipsets get
(to loosely quote Alan). Almost all of my boxes are BX (one has HPT366 in
addition, and another one was changed to Via686a recently), and I imagine
that BX gets most testing since it is a very common chipset. Moreover, it only
does UDMA33, not any fancy 66 or 100 stuff (although I haven't had problems
with those either on HPT366, HPT370 nor 686a).

As said, it could be the ide patch, but surely there are other just as
likely suspects as well.


-- v --

[EMAIL PROTECTED]



Re: Oops in iput

2001-06-26 Thread Ville Herva

On Mon, Jun 25, 2001 at 07:42:13PM +0100, you [Stephen C. Tweedie] claimed:
> Hi,
> 
> On Mon, Jun 25, 2001 at 08:16:12PM +0200, Florian Lohoff wrote:
> > 
> > oops in iput - Kernel 2.2.19/i386 + ide-udma patches + ext3 patches (0.0.7a)
> 
> The ide-udma patches for 2.2 haven't had nearly the testing of the 2.4
> ones, and simply can't be trusted as a baseline for debugging other
> code.  Can you reproduce this problem without them applied?  The oops
> here is a networking oops on the face of it, and I wouldn't expect to
> see that on 2.2 unless something was corrupting memory.

Well, I for one use the 2.2 ide patches extensively (on almost all of my
machines, including a heavy-duty backup server), and haven't seen any
problems whatsoever. I see _much_ more problems with scsi (aic7xxx), for
example.

I don't mean to say the ide patches are 100% bug free, but I wouldn't
consider them as the prime suspect for an oops that happened elsewhere
either. It could be hw or any other part of kernel just as well... What
about memtest86?


-- v --

[EMAIL PROTECTED]



Access beyond end of dev with FAT on 2.4.4ac17

2001-06-25 Thread Ville Herva

It seems that all updatedb processes hang when accessing my 2GB fat partition.
The kernel spits these:

Jun 24 04:02:29 terminator kernel: Filesystem panic (dev 08:01).
Jun 24 04:02:29 terminator kernel:   FAT error
Jun 24 04:02:29 terminator kernel: Directory 889834: bad FAT
Jun 24 04:02:32 terminator kernel: Filesystem panic (dev 08:01).
Jun 24 04:02:32 terminator kernel:   FAT error
Jun 24 04:02:32 terminator kernel: Directory 889836: bad FAT
Jun 24 04:03:08 terminator kernel: Filesystem panic (dev 08:01).
Jun 24 04:03:08 terminator kernel:   FAT error
Jun 24 04:03:08 terminator kernel: Directory 889864: bad FAT
Jun 24 04:03:26 terminator kernel: Filesystem panic (dev 08:01).
Jun 24 04:03:26 terminator kernel:   FAT error
Jun 24 04:03:26 terminator kernel: Directory 889884: bad FAT
Jun 24 04:03:28 terminator kernel: attempt to access beyond end of device
Jun 24 04:03:28 terminator kernel: 08:01: rw=0, want=2095531, limit=2048256
Jun 24 04:03:28 terminator kernel: attempt to access beyond end of device
Jun 24 04:03:28 terminator kernel: 08:01: rw=0, want=2095531, limit=2048256
Jun 24 04:03:28 terminator kernel: attempt to access beyond end of device
Jun 24 04:03:28 terminator kernel: 08:01: rw=0, want=2095532, limit=2048256
Jun 24 04:03:28 terminator kernel: attempt to access beyond end of device
Jun 24 04:03:28 terminator kernel: 08:01: rw=0, want=2095532, limit=2048256
Jun 24 04:03:28 terminator kernel: attempt to access beyond end of device
Jun 24 04:03:28 terminator kernel: 08:01: rw=0, want=2095533, limit=2048256
Jun 24 04:03:28 terminator kernel: attempt to access beyond end of device

This happened once or twice with an NT fat partition. Then I thought
something (possibly hosed scsi termination) had screwed the partition for
good, and wiped the fs. I remade it with mkfs.msdos (fat16, check for bad
blocks), copied 1.5GB of stuff over - and the next night I got the same errors
from updatedb. I didn't reboot the machine in between, never mind running NT.

The ext2 fs on the same disk is (fortunately) 100% ok although it's used a
lot more.

   Device Boot    Start       End    Blocks   Id  System
/dev/sda1   *         1       255   2048256    6  FAT16
/dev/sda2           256      2100  14819962+  83  Linux
/dev/sda3          2101      2213    907672+  82  Linux swap
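
As a rough aid for reading the kernel messages above, here is a minimal Python
sketch (not part of the original report) that pulls the want/limit pairs out
of such log lines and shows how far past the end of the device each request
points. The function name is made up for illustration, and the sketch assumes
both numbers are in the same units (1K blocks, which is what the match between
limit=2048256 and the Blocks figure for sda1 suggests).

```python
import re

# Matches the "08:01: rw=0, want=..., limit=..." part of the lines quoted above.
LINE_RE = re.compile(r"(\w+:\w+): rw=(\d+), want=(\d+), limit=(\d+)")

def overshoots(log_text):
    """Return (device, want, limit, want - limit) for each out-of-range request."""
    found = []
    for m in LINE_RE.finditer(log_text):
        dev = m.group(1)
        want, limit = int(m.group(3)), int(m.group(4))
        if want > limit:
            found.append((dev, want, limit, want - limit))
    return found

sample = """\
Jun 24 04:03:28 terminator kernel: 08:01: rw=0, want=2095531, limit=2048256
Jun 24 04:03:28 terminator kernel: 08:01: rw=0, want=2095533, limit=2048256
"""

for dev, want, limit, over in overshoots(sample):
    print(f"{dev}: want={want} is {over} blocks past limit={limit}")
```

If the units assumption holds, the requests land roughly 46MB beyond the end
of the partition. That would be consistent with a bogus cluster number
somewhere in the FAT or a directory, though the log alone does not prove it.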

This is an 18GB Seagate, Adaptec 2940 UW Pro scsi, 400MHz PII.

Is the fs now hosed? (Updatedb should only read the disk, but it is
possible that copying the data screwed the fs). What can I do to help
debugging the problem?

 
-- v --

[EMAIL PROTECTED]



Re: random errors with bzip2

2001-06-20 Thread Ville Herva

On Tue, Jun 19, 2001 at 06:11:48PM +0200, you [André Dahlqvist] claimed:
> Rodrigo Ventura <[EMAIL PROTECTED]> wrote:
> 
> > - it could be a memory problem, but if it were, lots of kernel
> > oops were expected, right?
> 
> This certainly sounds like a memory problem. I experienced almost the same
> behaviour with a box some years ago, and it turned out to be memory. The
> kernel didn't oops, and I actually had to run several kernel compiles at
> the same time to have gcc die.
> 
> Try memtest86 on the suspect box.

Seconded.

Exactly the same symptoms (bzip2); the culprit turned out to be memory.

That's when I wrote memburn (http://v.iki.fi/~vherva/memburn.c) for quick
testing without a boot (it did find the problem) and I then verified the
problem with memtest86 (http://reality.sgi.com/cbrady_denver/memtest86/).
You do have to run either for hours, probably for days to be sure. 

The box has now run perfectly for a year or so with the BadRam patch from
Rick van Rein (http://rick.vanrein.org/linux/badram/).
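
For readers unfamiliar with this kind of tool, the idea of a quick user-space
memory test can be sketched in a few lines of Python. This only illustrates
the write-pattern/read-back technique; it is not how memburn.c or memtest86
are actually implemented, the function names are invented here, and a
user-space test only ever sees whatever pages the kernel happens to hand it.

```python
import os

def pattern_pass(buf, pattern):
    """Fill buf with a repeating pattern, read it back, return bad offsets."""
    n = len(buf)
    expected = (pattern * (n // len(pattern) + 1))[:n]
    buf[:] = expected                      # write phase
    readback = bytes(buf)                  # read phase
    if readback == expected:
        return []
    return [i for i in range(n) if readback[i] != expected[i]]

def burn(size_bytes, passes=1):
    """Run fixed and random patterns over one buffer; return all mismatches."""
    buf = bytearray(size_bytes)
    errors = []
    for p in range(passes):
        for pattern in (b"\x00", b"\xff", b"\xaa", b"\x55", os.urandom(64)):
            for off in pattern_pass(buf, pattern):
                errors.append((p, pattern[:4].hex(), off))
    return errors

if __name__ == "__main__":
    bad = burn(16 * 1024 * 1024, passes=2)
    print("mismatches:", len(bad))
```

On healthy hardware the list stays empty. As noted above, a marginal module
may need hours or days of repeated passes before it shows anything.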


-- v --

[EMAIL PROTECTED]



Re: initrd oops with 2.4.5ac2: megaraid OOPSes

2001-05-28 Thread Ville Herva

On Mon, May 28, 2001 at 01:28:31PM +0300, you [Ville Herva] claimed:
> 
> The other OOPS (http://v.iki.fi/~vherva/tmp/bootlog.grub and
> http://v.iki.fi/~vherva/tmp/ksymoops-grub) still remains: 

That one appears to be because it couldn't find the initrd (incorrect boot
param, my fault). Should it oops then? 2.4.2-2 definitely doesn't. With the
correct initrd param 2.4.4 boots ok, but 2.4.5ac2 still OOPSes. Now it only
does it later, during scsi adapter probe:

ksymoops 2.4.1 on i686 2.4.5-ac2.  Options used
 -v ./vmlinux (specified)
 -K (specified)
 -l /proc/modules (default)
 -o /lib/modules/2.4.5-ac2/ (default)
 -m ./System.map (specified)

No modules in ksyms, skipping objects
No ksyms, skipping lsmod
Unable to handle kernel NULL pointer dereference at virtual address 004f
c01c02bf
*pde = 
Oops: 0002
CPU:0
EIP:0010:[]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010246
eax:    ebx: f7ef007c   ecx: 0001   edx: c1eec400
esi: 000f   edi: f7ef8aac   ebp: f880   esp: cfb2bee8
ds: 0018   es: 0018   ss: 0018
Process swapper (pid: 1, stackpage=cfb2b000)
Stack:  c0114966 c1eec400  0001 0f001667 1681
003f
   9060 101e 1667 f7ef 1960 8086 
10cd7b40
   3344103c   c02d7b40 c0267aa0 c01c05ce c0267aa0
8086
Call Trace: [] [] [] [] []
   [] [] [] [] []
[]
   [] []
Code: 89 68 50 8b 83 dc 02 00 00 c6 40 54 10 8b 8b dc 02 00 00 0f

>>EIP; c01c02bf<=
Trace; c0114966 <__call_console_drivers+46/60>
Trace; c01c05ce 
Trace; c01af674 
Trace; c0105000 
Trace; c01af024 
Trace; c0105000 
Trace; c014786f 
Trace; c0147bb0 
Trace; c0105000 
Trace; c0105209 
Trace; c0105000 
Trace; c01056a6 
Trace; c0105200 
Code;  c01c02bf
 <_EIP>:
Code;  c01c02bf   <=
   0:   89 68 50                  mov    %ebp,0x50(%eax)   <=
Code;  c01c02c2
   3:   8b 83 dc 02 00 00         mov    0x2dc(%ebx),%eax
Code;  c01c02c8
   9:   c6 40 54 10               movb   $0x10,0x54(%eax)
Code;  c01c02cc
   d:   8b 8b dc 02 00 00         mov    0x2dc(%ebx),%ecx
Code;  c01c02d2
  13:   0f 00 00                  sldt   (%eax)


bootlog:
http://v.iki.fi/~vherva/tmp/bootlog.sym53
ksymoops:
http://v.iki.fi/~vherva/tmp/ksymoops-sym53

I have sym53c875 and megaraid in this box. If the megaraid scsi bios is
disabled, it boots ok (from the sym53c875), if I enable the bios and boot
from megaraid, I get the oops above.


-- v --

[EMAIL PROTECTED]



Re: initrd oops with 2.4.5ac2: The other oops remains (one fixed)

2001-05-28 Thread Ville Herva

On Mon, May 28, 2001 at 01:05:07PM +0300, you [Ville Herva] claimed:
> On Mon, May 28, 2001 at 06:02:54PM +0900, you [Masaru Kawashima] claimed:
> > On Mon, 28 May 2001 10:25:51 +0300
> > Ville Herva <[EMAIL PROTECTED]> wrote:
> > > The oops call trace seems to be the same as in 
> > > 
> > > http://marc.theaimsgroup.com/?l=linux-kernel&m=99079948404775&w=2
> > > 
> > > Any ideas?
> > 
> > Did you try the patch posted by Go Taniguchi <[EMAIL PROTECTED]>?
> > Following is the copy of his message and the patch itself.
> > 
> > --- linux/fs/block_dev.c.orig   Mon May 28 12:40:12 2001
> > +++ linux/fs/block_dev.c        Mon May 28 12:40:12 2001
> > @@ -602,6 +602,7 @@
> >         if (!bdev->bd_op->ioctl)
> >                 return -EINVAL;
> >         inode_fake.i_rdev=rdev;
> > +       inode_fake.i_bdev=bdev;
> >         init_waitqueue_head(&inode_fake.i_wait);
> >         set_fs(KERNEL_DS);
> >         res = bdev->bd_op->ioctl(&inode_fake, NULL, cmd, arg);
> 
> Yes, I actually spotted the patch on l-k just a while ago and tried it.
> 
> It does fix the initrd case; I haven't tried the grub case, but I suspect it
> still remains. Will try that as well asap.

The other OOPS (http://v.iki.fi/~vherva/tmp/bootlog.grub and
http://v.iki.fi/~vherva/tmp/ksymoops-grub) still remains: 

>>EIP; c014259a<=
Trace; c0142995 
Trace; c012a492 <__alloc_pages+62/230>
Trace; c0145c60 
Trace; c0145d4c 
Trace; c01343e2 
Trace; c01348ff 
Trace; c0105000 
Trace; c0100197 
Code;  c014259a
 <_EIP>:
Code;  c014259a   <=
   0:   39 7e 20                  cmp    %edi,0x20(%esi)   <=
Code;  c014259d
   3:   75 f1                     jne    fff6 <_EIP+0xfff6> c0142590
Code;  c014259f
   5:   8b 44 24 14               mov    0x14(%esp,1),%eax
Code;  c01425a3
   9:   39 86 90 00 00 00         cmp    %eax,0x90(%esi)
Code;  c01425a9
   f:   75 e5                     jne    fff6 <_EIP+0xfff6> c0142590
Code;  c01425ab
  11:   8b 54 24 00               mov    0x0(%esp,1),%edx


2.4.2-2 (redhat) is fine; 2.4.4 vanilla oopses after probing PCI hardware
(so it goes little further than 2.4.5ac2), and 2.4.5ac2 oopses after making
page-cache hash table.

 
-- v --

[EMAIL PROTECTED]



Re: initrd oops with 2.4.5ac2: FIXED by Kawashima (the other oops may remain)

2001-05-28 Thread Ville Herva

On Mon, May 28, 2001 at 06:02:54PM +0900, you [Masaru Kawashima] claimed:
> On Mon, 28 May 2001 10:25:51 +0300
> Ville Herva <[EMAIL PROTECTED]> wrote:
> > The oops call trace seems to be the same as in 
> > 
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=99079948404775&w=2
> > 
> > Any ideas?
> 
> Did you try the patch posted by Go Taniguchi <[EMAIL PROTECTED]>?
> Following is the copy of his message and the patch itself.
> 
> --- linux/fs/block_dev.c.orig Mon May 28 12:40:12 2001
> +++ linux/fs/block_dev.c  Mon May 28 12:40:12 2001
> @@ -602,6 +602,7 @@
>   if (!bdev->bd_op->ioctl)
>   return -EINVAL;
>   inode_fake.i_rdev=rdev;
> + inode_fake.i_bdev=bdev;
>   init_waitqueue_head(&inode_fake.i_wait);
>   set_fs(KERNEL_DS);
>   res = bdev->bd_op->ioctl(&inode_fake, NULL, cmd, arg);

Yes, I actually spotted the patch on l-k just a while ago and tried it.

It does fix the initrd case; I haven't tried the grub case, but I suspect it
still remains. Will try that as well asap.

Thanks,


-- v --

[EMAIL PROTECTED]



Re: initrd oops; still happens with 2.4.5ac2

2001-05-28 Thread Ville Herva

On Mon, May 28, 2001 at 12:12:20AM +0300, you [Ville Herva] claimed:
> On Sun, May 27, 2001 at 07:26:50PM +0300, you [Ville Herva] claimed:
> > 
> > I have a reproducible oops on 2.4.4ac17 at initrd unmount (see
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=99079948404775&w=2 for
> > details) that seems to be related:
> 
> Ok, some more info: 
> 
> 2.4.2-2 (redhat)   BOOTS OK
> 2.4.4ac17          OOPS
> 2.4.4ac17+av       OOPS
> 2.4.5              OOPS
> 2.4.5ac1+av        OOPS
> 2.4.4              BOOTS OK
> 2.4.4ac9           BOOTS OK
> 2.4.4ac10          BOOTS OK
> 2.4.4ac11          BOOTS OK
> 2.4.4ac12          fails to mount root ("Checking root filesystem.
>                    /dev/sdb is mounted.")
> 2.4.4ac14          fails to mount root
> 2.4.4ac15          OOPS

2.4.5ac2 OOPS

The oops call trace seems to be the same as in 

http://marc.theaimsgroup.com/?l=linux-kernel&m=99079948404775&w=2

Any ideas?


-- v --

[EMAIL PROTECTED]



initrd oops [was Re: Linux 2.4.5-ac1]

2001-05-27 Thread Ville Herva

On Sun, May 27, 2001 at 07:26:50PM +0300, you [Ville Herva] claimed:
> On Sat, May 26, 2001 at 10:58:25PM +0100, you [Alan Cox] claimed:
> >
> > o   Free the initial ramdisk correctly
> 
> Who made this fix, or who can I contact? 
> 
> I have a reproducible oops on 2.4.4ac17 (see
> http://marc.theaimsgroup.com/?l=linux-kernel&m=99079948404775&w=2 for
> details) that seems to be related:

Ok, some more info: 

2.4.2-2 (redhat)   BOOTS OK
2.4.4ac17          OOPS
2.4.4ac17+av       OOPS
2.4.5              OOPS
2.4.5ac1+av        OOPS
2.4.4              BOOTS OK
2.4.4ac9           BOOTS OK
2.4.4ac10          BOOTS OK
2.4.4ac11          BOOTS OK
2.4.4ac12          fails to mount root ("Checking root filesystem.
                   /dev/sdb is mounted.")
2.4.4ac14          fails to mount root
2.4.4ac15          OOPS

This is:
600Mhz Xeon 
256MB   
megaraid RAID and sym53c875 (from which I boot) 
gcc-2.96-85 (tried .91 as well) 


So I gather the ac12 and ac15 Linux tree / av merges are the culprit?   
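
The table above can be read mechanically: order the -ac results by patch level
and look for the points where the outcome changes. A small Python sketch of
that, using only the results listed in this message (the helper name is made
up; ac13 and ac16 are absent simply because they were not tested):

```python
# (ac patch level on top of 2.4.4, observed result), as reported above.
results = [
    (9, "BOOTS OK"),
    (10, "BOOTS OK"),
    (11, "BOOTS OK"),
    (12, "fails to mount root"),
    (14, "fails to mount root"),
    (15, "OOPS"),
    (17, "OOPS"),
]

def transitions(results):
    """Return (last_old, first_new, old_result, new_result) at each change."""
    ordered = sorted(results)
    changes = []
    for (prev_ver, prev_res), (ver, res) in zip(ordered, ordered[1:]):
        if res != prev_res:
            changes.append((prev_ver, ver, prev_res, res))
    return changes

for prev_ver, ver, prev_res, res in transitions(results):
    print(f"2.4.4ac{prev_ver} ({prev_res}) -> 2.4.4ac{ver} ({res})")
```

This narrows the behaviour changes to somewhere in ac11..ac12 and ac14..ac15,
which is the guess made in the text. Untested versions in between (ac13) could
still move the second boundary.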


-- v --

[EMAIL PROTECTED]



Re: 2.4.4-ac17-2.4.5-ac1 oops in swapper process

2001-05-27 Thread Ville Herva

On Sun, May 27, 2001 at 02:18:57PM -0400, you [[EMAIL PROTECTED]] claimed:
> All,
> 
> I have been getting a oops ever since 2.4.4-ac17 right after the kernel loads
> the sym53c895 driver. I hand copied part of the oops before rebooting.  This
> happens in every kernel since 2.4.4-ac17.  I have changed my compiler from
> gcc-2.96 to egcs-1.12, thinking that the Mandrake gcc was bad. I still see
> the same problems at the exact same point even after recompilation.
> 
> ce at virtual address 0296
> print eip: c017f5d6
> *pde: 
> Oops 
> CPU: 0
> 
> EIP: 0010:[]
> eflags: 0010202
> eax: 0286 ebx: 1261 ecx:  edx: dfff3d74
> csi:  edi: dfe66da0 ebp: dfe 63160 esp: dfff3d54 
> ds: 0018 es: 0018 ss: 0018
> process swapper pid 1, stackpage=dfff3000

I think I'm seeing the same; see
http://marc.theaimsgroup.com/?l=linux-kernel&m=99079948404775&w=2
 
Hmm, I have sym53c895 as well, but I thought this was initrd related. 

> thats all I got, I can try again and copy down any other information from that
> oops that may be useful.   Should I copy the whole thing and then put it
> through ksymoops? 

Definitely.



-- v --

[EMAIL PROTECTED]



Re: Please help me fill in the blanks.

2001-05-27 Thread Ville Herva

> > * Dynamic Memory Resilience
> 
> RAM fault tolerance?  There was a patch a long time ago which detected
> bad ram, and would mark those memory clusters as unuseable at boot. 
> However that is clearly not dynamic.

If you are referring to the Badram patch by Rick van Rein
(http://rick.vanrein.org/linux/badram/), it doesn't detect the bad ram;
memtest86 does that part (and does it well) -- you then enter the
badram clusters as a boot param. But I have to say the badram patch works
marvellously (thanks, Rick.) Shame it didn't find its way into the standard
kernel.
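
To make the "enter the clusters as boot param" step a little more concrete,
here is a hedged Python sketch of how a list of faulty addresses could be
folded into one address/mask pair. It assumes the pattern semantics that the
BadRAM documentation describes as far as I recall them (a 1 bit in the mask
means the address bit must match, a 0 bit means don't care); the function
names and the example addresses are made up, and the patch's own documentation
remains the authority on the exact boot parameter format.

```python
def covering_pattern(bad_addrs, width=32):
    """Return one (address, mask) pair matching every address in bad_addrs.

    A 1 bit in mask means "this address bit must equal the bit in address";
    a 0 bit means "don't care".
    """
    full = (1 << width) - 1
    base = bad_addrs[0]
    differing = 0
    for a in bad_addrs:
        differing |= a ^ base          # collect every bit that varies
    mask = full & ~differing
    return base & mask, mask

def matches(addr, pattern):
    """True if addr is covered by the (address, mask) pattern."""
    address, mask = pattern
    return (addr & mask) == address

bad = [0x01a3e400, 0x01a3e404]         # made-up example addresses
address, mask = covering_pattern(bad)
print(f"badram=0x{address:08x},0x{mask:08x}")
```

A single pair like this can cover more addresses than were actually reported
when the faulty addresses differ in several bits, so it trades a little good
memory for a shorter parameter.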

 
-- v --

[EMAIL PROTECTED]



[OOPS] Re: Linux 2.4.5-ac1

2001-05-27 Thread Ville Herva

On Sat, May 26, 2001 at 10:58:25PM +0100, you [Alan Cox] claimed:
>
> o Free the initial ramdisk correctly

Who made this fix, or who can I contact? 

I have a reproducible oops on 2.4.4ac17 (see
http://marc.theaimsgroup.com/?l=linux-kernel&m=99079948404775&w=2 for
details) that seems to be related:

(1)Unable to handle kernel NULL pointer
dereference at virtual address 0013

>>EIP; c02997f6 <__devices_1045+5a/a8>   <= 
Trace; c0136447
Trace; c017fec3
Trace; c0118769 <__run_task_queue+49/60>
Trace; c011b0f6 
Trace; c011b304
Trace; c011868c
Trace; c0118596
Trace; c011849b   
Trace; c010832f   
Trace; c0106f14 
Trace; c0199756 
Trace; c01330d6 
Trace; c0131447  
Trace; c0122cf5
Trace; c01416db
Trace; c0142b81   
Trace; c0136641   
Trace; c013484a  
Trace; c0134ab0
Trace; c0105000
Trace; c01177e6  
Trace; c0105000
Trace; c01051da   
Trace; c010520e 
Trace; c0105000
Trace; c01056a6
Trace; c0105200 
Code;  c02997f6 <__devices_1045+5a/a8>
 <_EIP>:
Code;  c02997f6 <__devices_1045+5a/a8>   <=
   0:   8b 40 10                  mov    0x10(%eax),%eax   <=
Code;  c02997f9 <__devices_1045+5d/a8>
   3:   83 f8 02                  cmp    $0x2,%eax
Code;  c02997fc <__devices_1045+60/a8>
   6:   7e 62                     jle    6a <_EIP+0x6a> c0299860 <__devices_104a+c/20>
Code;  c02997fe <__devices_1045+62/a8>
   8:   b8 f0 ff ff ff            mov    $0xfff0,%eax
Code;  c0299803 <__devices_1045+67/a8>
   d:   eb 74                     jmp    83 <_EIP+0x83> c0299879 <__devices_104b+5/18>
Code;  c0299805 <__devices_1045+69/a8>
   f:   85 c9                     test   %ecx,%ecx
Code;  c0299807 <__devices_1045+6b/a8>
  11:   b8 ea ff 00 00            mov    $0xffea,%eax

Judging from the stack trace, I suspect that after kill_super is called (and
hence, root device invalidated and freed), a function that assumes a
valid rootdev (ioctl_by_bdev) is called. Rdev is NULL, and hence the null
ptr deref. Is this anywhere near the truth?
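
One small cross-check that can be done on the oops itself: the fault is
reported at virtual address 0013 and the faulting instruction is
mov 0x10(%eax),%eax, so the value in the base register follows by
subtraction. A minimal Python sketch of that arithmetic (the function name is
made up, and it assumes the address shown in the archive is the complete
32-bit value):

```python
def base_register(fault_addr, displacement, width=32):
    """For a 'mov disp(%reg),...' access that faulted at fault_addr,
    return the value the base register must have held (modulo 2**width)."""
    return (fault_addr - displacement) % (1 << width)

# Values from the oops above: fault at 0x13, instruction mov 0x10(%eax),%eax
eax = base_register(0x13, 0x10)
print(hex(eax))   # 0x3
```

So the pointer being dereferenced would have held 3 rather than exactly 0: a
small bogus value, which still shows up as a "NULL pointer dereference"
because it falls in the unmapped first page. What that says about the root
cause is beyond what the arithmetic can show.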


-- v --

[EMAIL PROTECTED]

PS: This is a 600MHz Xeon w/ 256MB RAM. Right after boot I did 
'diff -nauR linux-2.4.5 linux-2.4.4ac17' on redhat 2.4.2-2, and it never
finished:

root@machine:/poista>diff -nauR linux-2.4.5 linux-2.4.4ac17
zsh: terminated  diff -nauR linux-2.4.5 linux-2.4.4ac17
root@machine:/poista>free
             total       used       free     shared    buffers     cached
Mem:        255572     253140       2432          0       8332     215108
-/+ buffers/cache:      29700     225872
Swap:            0          0          0
root@machine:/poista>uname -a
Linux machine 2.4.2-2 #1 Sun Apr 8 20:41:30 EDT 2001 i686 unknown
root@machine:/poista>dmesg | tail -2
Out of Memory: Killed process 804 (xfs).
Out of Memory: Killed process 15857 (diff).

So diff (which used very little RAM) filled the _cache_ and OOM rambo kicked
in. Pretty embarrassing to say the least...
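
The claim that the memory was in the cache rather than in use by processes can
be checked against the free output above. A short Python sketch of how the
"-/+ buffers/cache" row follows from the "Mem:" row (values in kilobytes, the
function name is made up):

```python
def buffers_cache_line(used, free, buffers, cached):
    """Recompute the '-/+ buffers/cache' row of free(1) from the Mem: row."""
    used_by_programs = used - buffers - cached
    reclaimable_or_free = free + buffers + cached
    return used_by_programs, reclaimable_or_free

# Mem: row from the output above.
used_apps, avail = buffers_cache_line(
    used=253140, free=2432, buffers=8332, cached=215108)
print(used_apps, avail)   # 29700 225872
```

Both numbers match the row printed by free: about 29MB was really in use by
programs, while about 210MB of the 250MB total was page cache.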

2.4.4ac17 + LVM-0.9.1beta7: Oops on unmount initrd

2001-05-25 Thread Ville Herva

Whenever I try to boot with root on LVM (using initrd), I get an oops. The
oops happens right after (trying to) unmount old (initrd) root. It also
happens when I run it with root=/dev/sdb (which is a plain ext2fs with no
LVM involved, other than the lvmcreate_initrd-generated initrd).

I patched 2.4.4ac17 with LVM-0.9.1beta7 (the generated patch applied cleanly,
other than that I had to add int get_hardblocksize(kdev_t) {} to fs/buffer.c;
it seems to have been removed somewhere between 2.4.1 and 2.4.4). I also tried
2.4.4ac17 vanilla and 2.4.4ac15 vanilla with the same results.

Which is the recommended kernel/LVM version to be used at this time?

Below is the oops and the ksymoops output (gathered by hand, please excuse
possible typos):

(...)
vgscan -- reading all physical volumes (this may take a while...)
vgscan -- found inactive volume group "root-stripe"
vgscan -- "/etc/lvmtab" and "/etc/lvmtab.d" succesfully created
vgscan -- WARNING: This program does not do a VGDA backup of your volume
group
vgchange -- volume group "root-stripe" successfully activated

VFS: Mounted root (ext2 filesystem).
Trying to unmount old root ... (1)Unable to handle kernel NULL pointer
dereference at virtual address 0013
  printing eip:
c01997f6
*pde = 
Oops: 
CPU:0
EIP:0010:[]
EFLAGS: 00010202
eax: 0003   ebx: 1261  ecx:    edx: cfff3d78
esi:    edi: c15d3160  ebp:    esp: cfff3d58
ds: 0018  es: 0018  ss:  0018
Process swapper (pid: 1, stackpage cfff3000)
Stack: cfff2000  c15d3160 c0136447 cfff3d78  1261 
   c0258528 c017fec3 cfff3d94 cfff3d94  c0228769 cfffa320 c02b5e6c
   c02b5e6c c011b0f6 0001 c011b304 0001  c02a0100 

Call trace: [] [] [] [] []
[] [] [] [] [] []
[] [] [] [] [] []
[] [] [] [] [] []
[] [] [] []

Code: 8b 40 10 83 f8 02 7e 62 b8 f0 ff ff ff eb 74 85 c9 b8 ea ff
 <0>Kernel panic: attempted to kill init!



ksymoops -o /lib/modules/2.4.4-ac17/ -m /usr/src/linux/System.map 
 -v ./vmlinux -K -L  < /root/herkules-oops.txt
ksymoops 2.4.1 on i686 2.4.2-2.  Options used
 -v ./vmlinux (specified)
 -K (specified)
 -L (specified)
 -o /lib/modules/2.4.4-ac17/ (specified)
 -m /usr/src/linux/System.map (specified)

No modules in ksyms, skipping objects
Trying to unmount old root ... (1)Unable to handle kernel NULL pointer
dereferen
c01997f6
*pde = 
Oops: 
CPU:0
EIP:0010:[]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010202
eax: 0003   ebx: 1261  ecx:    edx: cfff3d78
esi:    edi: c15d3160  ebp:    esp: cfff3d58
ds: 0018  es: 0018  ss:  0018
Stack: cfff2000  c15d3160 c0136447 cfff3d78  1261 
   c0258528 c017fec3 cfff3d94 cfff3d94  c0228769 cfffa320 c02b5e6c
   c02b5e6c c011b0f6 0001 c011b304 0001  c02a0100 
Call trace: [] [] [] [] []
[] [] [] [] [] []
[] [] [] [] [] []
[] [] [] [] [] []
[] [] [] []
Code: 8b 40 10 83 f8 02 7e 62 b8 f0 ff ff ff eb 74 85 c9 b8 ea ff

>>EIP; c02997f6 <__devices_1045+5a/a8>   <=
Trace; c0136447 
Trace; c017fec3 
Trace; c0118769 <__run_task_queue+49/60>
Trace; c011b0f6 
Trace; c011b304 
Trace; c011868c 
Trace; c0118596 
Trace; c011849b 
Trace; c010832f 
Trace; c0106f14 
Trace; c0199756 
Trace; c01330d6 
Trace; c0131447 
Trace; c0122cf5 
Trace; c01416db 
Trace; c0142b81 
Trace; c0136641 
Trace; c013484a 
Trace; c0134ab0 
Trace; c0105000 
Trace; c01177e6 
Trace; c0105000 
Trace; c01051da 
Trace; c010520e 
Trace; c0105000 
Trace; c01056a6 
Trace; c0105200 
Code;  c02997f6 <__devices_1045+5a/a8>
 <_EIP>:
Code;  c02997f6 <__devices_1045+5a/a8>   <=
   0:   8b 40 10                  mov    0x10(%eax),%eax   <=
Code;  c02997f9 <__devices_1045+5d/a8>
   3:   83 f8 02                  cmp    $0x2,%eax
Code;  c02997fc <__devices_1045+60/a8>
   6:   7e 62                     jle    6a <_EIP+0x6a> c0299860 <__devices_104a+c/20>
Code;  c02997fe <__devices_1045+62/a8>
   8:   b8 f0 ff ff ff            mov    $0xfff0,%eax
Code;  c0299803 <__devices_1045+67/a8>
   d:   eb 74                     jmp    83 <_EIP+0x83> c0299879 <__devices_104b+5/18>
Code;  c0299805 <__devices_1045+69/a8>
   f:   85 c9                     test   %ecx,%ecx
Code;  c0299807 <__devices_1045+6b/a8>
  11:   b8 ea ff 00 00            mov    $0xffea,%eax


-- v --

[EMAIL PROTECTED]



Re: question regarding cpu selection

2001-04-29 Thread Ville Herva

On Sun, Apr 29, 2001 at 11:13:12PM +0200, you [Erik Mouw] claimed:
> On Sun, Apr 29, 2001 at 11:32:51PM +0300, Ville Herva wrote:
> > On Sun, Apr 29, 2001 at 09:28:48PM -0400, you [Duncan Gauld] claimed:
> > > I would supply a patch, but I don't know how to write such a thing :)
> > 
> > It seems Erik Mouw already submitted a patch, although I agree that "Celeron
> > II" might be a better name for the thing than "Celeron (Coppermine)".
> 
> So what about this one? This time I had to change Configure.help and
> setup.c as well to reflect the changes in config.in :)

Hmm. I just checked Intel's web site (should've done so earlier), and it
appears that Intel still dubs the new revision as "Celeron", although I'm
sure it was introduced as CeleronII in some sources (try
http://www.google.com/search?q=CeleronII). So I'll just have to admit that your
first patch was technically correct, and I was wrong. Sorry for the
inconvenience.

I still think "Celeron II" is clearer, but heaven knows what Intel will sell
by that name three years from now (just think of i860).


-- v --

[EMAIL PROTECTED]



Re: question regarding cpu selection

2001-04-29 Thread Ville Herva

On Sun, Apr 29, 2001 at 09:28:48PM -0400, you [Duncan Gauld] claimed:
> 
> compiling kernel 2.4.4 on mandrake 8.
> Just checked - no mention of Celeron II in there-
>Pentium Pro/Pentium II/Celeron
> is the only line mentioning the celeron; maybe the PIII line could be changed 
> to something like "Pentium III/Celeron II"?
> I would supply a patch, but I don't know how to write such a thing :)

It seems Erik Mouw already submitted a patch, although I agree that "Celeron
II" might be a better name for the thing than "Celeron (Coppermine)".


-- v --

[EMAIL PROTECTED]



Re: question regarding cpu selection

2001-04-29 Thread Ville Herva

On Sun, Apr 29, 2001 at 02:56:08PM -0400, you [William Park] claimed:
> On Sun, Apr 29, 2001 at 07:07:51PM -0400, Duncan Gauld wrote:
> > Hi,
> > This seems a silly question but - I have an intel celeron 800mhz CPU and thus 
> > it is of the Coppermine breed. But under cpu selection when configuring the 
> > kernel, should I select PIII or PII/Celeron? Just wondering, since Coppermine 
> > is basically a newish PIII with 128K less cache...
> 
> Try both, and see if your machine throws up.

800Mhz Celeron is actually a CeleronII, and it does SSE just like PIII (the
only difference being cache). Therefore PIII option should work.

Perhaps this should be fixed in the config menu (or is it already? Which
kernel are you compiling?)
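
Whether a given CPU reports SSE can be checked from user space before picking
the config option. A minimal Python sketch, assuming an x86 Linux box whose
/proc/cpuinfo has a "flags" line and that SSE shows up there as "sse" (the
exact flag spelling is an assumption and may differ between kernel versions;
the function name is made up):

```python
def has_flag(cpuinfo_text, flag):
    """Return True if any 'flags' line in /proc/cpuinfo-style text lists flag."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and flag in value.split():
            return True
    return False

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("SSE reported:", has_flag(f.read(), "sse"))
    except OSError:
        print("no /proc/cpuinfo on this system")
```

If the flag is present, the PIII option should be a reasonable choice for the
reason given above. The sketch only reports what the running kernel says about
the CPU; it does not validate the kernel configuration itself.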


-- v --

[EMAIL PROTECTED]



Re: X15 alpha release: as fast as TUX but in user space

2001-04-28 Thread Ville Herva

On Sat, Apr 28, 2001 at 03:24:25PM +0200, you [Ingo Molnar] claimed:
> 
> On Sat, 28 Apr 2001, Ville Herva wrote:
> 
> > Uhh, perhaps I'm stupid, but why not cache the date field and update
> > the field once every five seconds? Or even once a second?
> 
> perhaps the best way would be to do this updating in the sending code
> itself.
> 
> first there would be a 'current time thread', which updates a global
> shared variable that shows the current time. (ie. no extra system-call is
> needed to access current time.) If the header-sending code detects that
> current time is not equal to the timestamp stored in the header itself,
> then the header is reconstructed. Pretty simple.

Yes, that vaguely resembles what I had in mind. Of course I had no idea
about the data structures Tux or X15 use internally, so I couldn't think it
through too thoroughly.
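
The scheme discussed above can be sketched in user-space Python. This is only
an illustration, not how TUX or X15 do it: the class name is invented, and
instead of a separate "current time thread" updating a shared variable it
simply calls time.time() on each request, then rebuilds the header only when
the whole second differs from the one stored with the cached copy.

```python
import time
from email.utils import formatdate

class DateHeaderCache:
    """Cache the formatted Date: header and rebuild it at most once per second."""

    def __init__(self):
        self._second = None      # whole second the cached header was built for
        self._header = b""
        self.rebuilds = 0        # for illustration: how often we reformatted

    def get(self, now=None):
        now = time.time() if now is None else now
        second = int(now)
        if second != self._second:
            # The timestamp in the cached header is stale: reconstruct it.
            text = "Date: " + formatdate(second, usegmt=True) + "\r\n"
            self._header = text.encode("ascii")
            self._second = second
            self.rebuilds += 1
        return self._header

cache = DateHeaderCache()
for _ in range(1000):
    cache.get()
print("requests: 1000, header rebuilds:", cache.rebuilds)
```

The Date: field stays accurate to one second, while the relatively expensive
date formatting happens once per second rather than once per request.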


-- v --

[EMAIL PROTECTED]



Re: X15 alpha release: as fast as TUX but in user space

2001-04-28 Thread Ville Herva

On Sat, Apr 28, 2001 at 10:42:29AM +0200, you [Ingo Molnar] claimed:
> 
> per RFC 2616:
> .
> The Date general-header field represents the date and time at which the
> message was originated, [...]
> 
> Origin servers MUST include a Date header field in all responses, [...]
> .
> 
> i considered the caching of the Date field for TUX too, and avoided it
> exactly due to this issue, to not violate this 'MUST' item in the RFC. It
> can be reasonably expected from a web server to have a 1-second accurate
> Date: field.
> 
> the header-caching in X15 gives it an edge against TUX, obviously, but IMO
> it's a questionable practice.
> 
> if caching of headers were allowed then we could do the obvious trick of
> sendfile()ing complete web replies (first header, then body).

Uhh, perhaps I'm stupid, but why not cache the date field and update the
field once every five seconds? Or even once a second?

I mean, at the rate of thousands of requests per second that should give you
some advantage over dynamically generating it -- especially if that's the
only thing hindering completely sendfile()'ing the answer.


-- v --

[EMAIL PROTECTED]



Re: [PATCH] SMP race in ext2 - metadata corruption.

2001-04-27 Thread Ville Herva

On Fri, Apr 27, 2001 at 09:23:57AM -0400, you [Alexander Viro] claimed:
> 
> 
> On Fri, 27 Apr 2001, Vojtech Pavlik wrote:
> 
> > Actually this is done quite often, even on mounted fs's:
> > 
> > hdparm -t /dev/hda
> 
> You would need either hdparm -t /dev/hda<n> or mounting the
> whole /dev/hda.
> 
> Buffer cache for the disk is unrelated to buffer cache for partitions.

Well, I for one have been running

hdparm -t /dev/md0
or
time head -c 1000m /dev/md0 > /dev/null

while /dev/md0 was mounted without realizing that this could be "stupid" or
that it could eat my data.

/dev/md0 on /backup-versioned type ext2 (rw)

I often cat(1) or head(1) partitions or devices (even mounted ones) if I
need dummy randomish test data for compression or tape drives (that I've
been having trouble with). 
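
For reference, the kind of sequential read timing that the head(1) command
above performs can be sketched in Python as follows. The function name is made
up and the path is whatever file or device the caller passes in. Whether
reading a mounted device this way is safe on a given kernel is exactly the
question under discussion in this thread, and the sketch does not answer it.

```python
import time

def read_throughput(path, total_bytes, chunk=1 << 20):
    """Read up to total_bytes sequentially from path; return (bytes_read, seconds)."""
    done = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while done < total_bytes:
            data = f.read(min(chunk, total_bytes - done))
            if not data:          # end of file or device
                break
            done += len(data)
    return done, time.perf_counter() - start
```

Calling read_throughput("/dev/md0", 1000 * 1024 * 1024) would correspond
roughly to the 1000m read shown above; dividing the byte count by the elapsed
seconds gives the throughput.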

BTW: is 2.2 affected? 2.0? 


-- v --

[EMAIL PROTECTED]



Re: [Fully-OT] Dual Athlon support in kernel

2001-04-24 Thread Ville Herva

On Tue, Apr 24, 2001 at 03:39:22AM -0700, you [Joseph Carter] claimed:
> 
> A warning about agcc, I've discovered that it does not always compile code
> quite the way you expect it.  This is unsurprising given it's based on
> pgcc which is known to change alignments on you in ways that sometimes
> break things subtly.

While people always bash pgcc, I've had pretty good experiences with it.
Mostly everything I've compiled with it has worked quite well - even a
2.0.34 kernel (which I compiled accidentally with pgcc) ran with no problems
for a long time. Sometimes pgcc does give internal errors with the highest
optimizations, though. 

Don't get me wrong: I'm not advocating using pgcc for any serious
production systems (never mind any kernel stuff), but perhaps it shouldn't be
completely discarded for performance-hungry stuff (where a miscompile won't
cause a third World War). It does gain as much as 30% in some cases over older
gcc - I'm not sure how good the newest gcc's are, but oldish pgcc does beat
gcc-2.96 on stuff I tried. (Didn't gcc get a new IA32 backend some time ago?
How good is that?)

> I do not know if agcc actually can produce code which simply does not work
> as is reported with pgcc (I suspect the alignment differences account for
> many of those cases), but I recall reading in the past few days that agcc
> is not supported for compiling the kernel.

Any pgcc variant is quite a bad idea for kernel stuff. There are known
problems (I think), and it doesn't even gain that much since the kernel is
pretty much hand optimized anyway.

But if the instruction scheduler in the compiler knows about K7, I imagine
that could gain something. Perhaps it could use the preload instructions etc
as well? Then again, I'm no kernel NOR compiler guru.
 
> It also fails to properly compile certain other programs, notably anything
> that includes asm functions.  As a result, my own experience suggests you
> consider agcc in the same class as gcc 3.0 at the moment - experimental.
> Hopefully the k7 optimizations that work well will find their way into a
> nice athlon subarch options in standard gcc and agcc won't be necessary.

Hope so. Unfortunately AMD doesn't seem to be doing all that much compiler
work (Intel has a whole compiler suite for Win, and did the beginnings of
pgcc, if I'm not mistaken.)


-- v --

[EMAIL PROTECTED]



Re: [Semi-OT] Dual Athlon support in kernel

2001-04-24 Thread Ville Herva

On Tue, Apr 24, 2001 at 03:33:00AM -0400, you [Tom Leete] claimed:
>
> The build problem with Athlon+SMP was solved by AA's patch. I had tested a
> similar patch on UP over 2.4.0-test and previous 2.4 releases with nary a
> problem.
> 
> This may be too experimental for your purposes, but FWIW I'm writing from a
> 2.4.4-pre3 built with gcc-2.97-20010205 using -march=athlon set by the k7
> config. I've been building kernels with that snapshot since the middle of
> Feb. With the current image, the box has locked up once in continuous use. I
> can't say what caused that one, no log survived. 

There's also AthlonLinux http://athlonlinux.org/ and AthlonGCC
http://athlonlinux.org/agcc/about.shtml, but I have no experience with those
(I have no Athlon ;( ).


-- v --

[EMAIL PROTECTED]



Re: kernel: VM

2001-04-23 Thread Ville Herva

On Mon, Apr 23, 2001 at 05:52:37AM +, you [Subba Rao] claimed:
> Hi,
> 
> I have seen several of these messages in my kernel log this morning. The system

What kernel version?

> responded to ping but won't allow me to login. What is VM? 

The virtual memory subsystem in the Linux kernel.

> What causes these errors and how can I prevent it from happening again?
 
A VM bug.

If you are running an older 2.2, upgrade to 2.2.19. The bug was fixed by Andrea
Arcangeli around 2.2.19pre something.

Also, please search the mailing list archives before asking:

http://marc.theaimsgroup.com/?l=linux-kernel&w=2&r=1&s=try+free+pages+2.2+failed&q=b



-- v --

[EMAIL PROTECTED]



2.0.39 stat/inode handling race? [2.0.39 oopses in sys_new(l)stat]

2001-04-06 Thread Ville Herva

On Thu, Apr 05, 2001 at 11:34:46PM +0200, you [David Weinehall] claimed:
> 
> I'll look into it. A note, however: the additional oops:es that follow
> the first one are almost never ever useful, because the system is no
> longer in a consistent state after the first one.

Apr  5 05:33:35 some kernel: general protection: 
Apr  5 05:33:36 some kernel: CPU:0
Apr  5 05:33:36 some kernel: EIP:0010:[__iget+60/544]
Apr  5 05:33:36 some kernel: EFLAGS: 00010292
Apr  5 05:33:36 some kernel: eax: 0341   ebx: 9a0004b6   ecx: 000203e5 edx: 001c7658
Apr  5 05:33:36 some kernel: esi: 001ba164   edi:    ebp: 001c7658 esp: 06436ef0
Apr  5 05:33:36 some kernel: ds: 0018   es: 0018   fs: 002b   gs: 002b   ss: 0018
Apr  5 05:33:36 some kernel: Process rsync (pid: 15624, process nr: 76, stackpage=06436000)
Apr  5 05:33:36 some kernel: Stack: 05144d00 07ff1418 0004 03070004 07ff1418 00154f27 001c7658 000203e5
Apr  5 05:33:36 some kernel:        0001 05144d00 06436f74 06436f74 0004 0897eaf8 000203e5 0012ce12
Apr  5 05:33:36 some kernel:        05144d00 03070004 0004 06436f74  06436f74 06436fb4 bfffdb30
Apr  5 05:33:36 some kernel: Call Trace: [ext2_lookup+343/368] [lookup+222/248] [_namei+90/228] [lnamei+48/72] [sys_newlstat+41/88] [system_call+85/124]
Apr  5 05:33:36 some kernel: Code: 66 39 03 75 0d 8b 4c 24 1c 39 4b 04 0f 84 fa 00 00 00 8b 5b

I'm trying to make sense of the oops. Looking at the __iget and ext2_lookup
source, (1) 000203e5 might be the inode number? Is (2) 0004 the length
of the filename? Looking at the System.map (0012cd34 T lookup), (3) seems to
be the return address to lookup+222, and (4) the return address to
ext2_lookup+343.

Stack: 05144d00 07ff1418 0004 03070004 07ff1418 00154f27 001c7658 000203e5
                         (2)                    (4)               (1)
       0001 05144d00 06436f74 06436f74 0004 0897eaf8 000203e5 0012ce12
                                                              (3)
       05144d00 03070004 0004 06436f74  06436f74 06436fb4 bfffdb30

Inode 132069 (0x000203e5) is

  File: "./mnt/hdb/backup/backup-versioned/2001-03-10_05.23.01/dev/vcs6"
  Size: 0    Filetype: Character Device
  Mode: (0620/crw--w)    Uid: ( 1414/  vherva)    Gid: (5/ tty)
Device:  3,0    Inode: 132069    Links: 30    Device type:  7,0
Access: Tue May  5 23:32:27 1998 (01066.13:15:05)
Modify: Tue May  5 23:32:27 1998 (01066.13:15:05)
Change: Fri Apr  6 05:28:24 2001 (0.07:19:08)
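
The address arithmetic in this analysis can be double-checked mechanically.
What follows is only a minimal sketch in Python, not a tool that was used for
the analysis. The lookup address is the one quoted from System.map; the
ext2_lookup base address is an assumption, back-computed from the call trace
as 0x00154f27 - 343. SYMBOLS and resolve() are made-up names.

```python
import bisect

# (address, name) pairs sorted by address, as in System.map.
# lookup: taken from the System.map line quoted in the post.
# ext2_lookup: ASSUMED, derived from the call trace (0x00154f27 - 343).
SYMBOLS = [
    (0x0012CD34, "lookup"),
    (0x00154DD0, "ext2_lookup"),
]


def resolve(addr, symbols=SYMBOLS):
    """Return (name, offset) for the nearest symbol at or below addr."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        raise ValueError("address is below the first known symbol")
    base, name = symbols[i]
    return name, addr - base


# Stack word (3): return address into lookup().
print(resolve(0x0012CE12))      # ('lookup', 222)
# Stack word (4): return address into ext2_lookup().
print(resolve(0x00154F27))      # ('ext2_lookup', 343)
# Stack word (1): the suspected inode number, in decimal.
print(int("000203e5", 16))      # 132069
```

With the assumed ext2_lookup base, the two extra return addresses in the
second rsync oops (00154e51 and 00154e84) come out as ext2_lookup+129 and
ext2_lookup+180, which agrees with its call trace.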

At the time of the oops, two heavy stat() callers were running: a cron job
that rsyncs pretty much the whole / to a backup fs (where there are
mutually hardlinked daily snapshots of /) and a cvs checkout that checks out
~20MB of source and then builds it. Rsync runs on the root fs and the backup
fs, whereas cvs is running only on /. Perhaps there is some kind of race,
for example in the inode cache handling? This is of course just a wild
guess... (Note that this is UP, though.) It'd be curious, though, that the
race wouldn't have shown up before.

In addition to the intense and concurrent stat()'ing activity, one
noteworthy thing might be that on the backup fs, the inodes are likely to
have quite high link counts. I don't know what that could affect, though.

Any ideas?


-- v --

[EMAIL PROTECTED]



2.0.39 oopses in sys_new(l)stat

2001-04-05 Thread Ville Herva

I wonder if there might still be a bug in 2.0.39 sys_new(l)stat. Today, one
of my trustworthy servers crashed (see details below), and it has actually
given me two slightly similar looking oopses before.

While this might be a hardware problem (I'll run a memory test asap), it seems
that the oopses are quite similar and could perhaps be caused by a kernel
bug.

This is vanilla 2.0.39 (2.0.37 before), gcc-2.7.2.3, Ppro-200, Intel
motherboard etc. It has been very reliable in the past. These oopses are the
_only_ problems. It runs qmail, samba, cvs, rsync, apache, pop, sshd and
oracle. All local fs's are plain ext2.

I hope somebody (with more kernel hacking experience than me) is still
interested in 2.0.39. I'll be happy to provide any additional details or
try something. The oops will probably be hard to reproduce, however.


==
It began with this:

Apr  5 05:33:35 some kernel: general protection: 
Apr  5 05:33:36 some kernel: CPU:0
Apr  5 05:33:36 some kernel: EIP:0010:[__iget+60/544]
Apr  5 05:33:36 some kernel: EFLAGS: 00010292
Apr  5 05:33:36 some kernel: eax: 0341   ebx: 9a0004b6   ecx: 000203e5 edx: 001c7658
Apr  5 05:33:36 some kernel: esi: 001ba164   edi:    ebp: 001c7658 esp: 06436ef0
Apr  5 05:33:36 some kernel: ds: 0018   es: 0018   fs: 002b   gs: 002b   ss: 0018
Apr  5 05:33:36 some kernel: Process rsync (pid: 15624, process nr: 76, stackpage=06436000)
Apr  5 05:33:36 some kernel: Stack: 05144d00 07ff1418 0004 03070004 07ff1418 00154f27 001c7658 000203e5
Apr  5 05:33:36 some kernel:        0001 05144d00 06436f74 06436f74 0004 0897eaf8 000203e5 0012ce12
Apr  5 05:33:36 some kernel:        05144d00 03070004 0004 06436f74  06436f74 06436fb4 bfffdb30
Apr  5 05:33:36 some kernel: Call Trace: [ext2_lookup+343/368] [lookup+222/248] [_namei+90/228] [lnamei+48/72] [sys_newlstat+41/88] [system_call+85/124]
Apr  5 05:33:36 some kernel: Code: 66 39 03 75 0d 8b 4c 24 1c 39 4b 04 0f 84 fa 00 00 00 8b 5b
Apr  5 05:33:36 some kernel: general protection: 
Apr  5 05:33:36 some kernel: CPU:0
Apr  5 05:33:36 some kernel: EIP:0010:[__iget+60/544]
Apr  5 05:33:36 some kernel: EFLAGS: 00010292
Apr  5 05:33:36 some kernel: eax: 0341   ebx: 9a0004b6   ecx: 000203e5 edx: 000203e5
Apr  5 05:33:36 some kernel: esi: 001ba164   edi:    ebp: 001c7658 esp: 01083ef0
Apr  5 05:33:36 some kernel: ds: 0018   es: 0018   fs: 002b   gs: 002b   ss: 0018
Apr  5 05:33:36 some kernel: Process rsync (pid: 15278, process nr: 77, stackpage=01083000)
Apr  5 05:33:36 some kernel: Stack: 05144d00 01083f74 0004 09036004 00154e51 00154e84 001c7658 000203e5
Apr  5 05:33:36 some kernel:        0001 05144d00 01083f74 01083f74 0004 05144d00 000203e5 0012ce12
Apr  5 05:33:36 some kernel:        05144d00 09036004 0004 01083f74  01083f74 01083fb4 b1f8
Apr  5 05:33:36 some kernel: Call Trace: [ext2_lookup+129/368] [ext2_lookup+180/368] [lookup+222/248] [_namei+90/228] [lnamei+48/72] [sys_newlstat+41/88] [system_call+85/124]
Apr  5 05:33:36 some kernel: Code: 66 39 03 75 0d 8b 4c 24 1c 39 4b 04 0f 84 fa 00 00 00 8b 5b
Apr  5 05:33:37 some kernel: general protection: 
Apr  5 05:33:37 some kernel: CPU:0
Apr  5 05:33:37 some kernel: EIP:0010:[__iget+60/544]
Apr  5 05:33:37 some kernel: EFLAGS: 00010212
Apr  5 05:33:37 some kernel: eax: 0301   ebx: 6a000973   ecx: 01fbc598 edx: 001c6b4c
Apr  5 05:33:37 some kernel: esi: 001b9d34   edi:    ebp: 001c6b4c esp: 054b4ef0
Apr  5 05:33:37 some kernel: ds: 0018   es: 0018   fs: 002b   gs: 002b   ss: 0018
Apr  5 05:33:37 some kernel: Process cvs (pid: 15623, process nr: 70, stackpage=054b4000)
Apr  5 05:33:37 some kernel: Stack: 0191c400 01fbc598 0008 0320f000 01fbc598 00154f27 001c6b4c 000c2b1f
Apr  5 05:33:37 some kernel:        0001 0191c400 054b4f74 054b4f74 0008 075f4cc0 000c2b1f 0012ce12
Apr  5 05:33:37 some kernel:        0191c400 0320f000 0008 054b4f74  054b4f74 054b4fb4 b670
Apr  5 05:33:37 some kernel: Call Trace: [ext2_lookup+343/368] [lookup+222/248] [_namei+90/228] [lnamei+48/72] [sys_newlstat+41/88] [system_call+85/124]
Apr  5 05:33:37 some kernel: Code: 66 39 03 75 0d 8b 4c 24 1c 39 4b 04 0f 84 fa 00 00 00 8b 5b
Apr  5 05:33:43 some kernel: general protection: 
Apr  5 05:33:43 some kernel: CPU:0
Apr  5 05:33:43 some kernel: EIP:0010:[__iget+60/544]
Apr  5 05:33:43 some kernel: EFLAGS: 00010292
Apr  5 05:33:43 some kernel: eax: 1605   ebx: 9a000945   ecx: 098eb398 edx: 001c7330
Apr  5 05:33:43 some kernel: esi: 001ba0ac   edi:    ebp: 001c7330 esp: 01ce5ec4
Apr  5 05:33:43 some kernel: ds: 0018   es: 0018   fs: 002b   gs: 002b   ss: 0018
Apr  5 05:33:43 some kernel: Process smbd (pid: 23077, process nr: 72, stackpage=01ce50

Re: Hang when using loop device

2001-03-20 Thread Ville Herva

On Wed, Mar 21, 2001 at 12:16:05PM +0800, you [[EMAIL PROTECTED]] claimed:
> Hello all,
> 
>   Recently my ext2 partition out of space so I have made a regular file
> in the FAT32 partition and format it  as ext2 partiton and mount it as 
> loop device.However,occasionaly when I extract a large tar to the loop device..
> The computer will hang while extracting. I wonder if deadlock occur.
> I'm using kernel 2.4.1 now and there is no problem when I am using
> kernel 2.2.x kernel

Jens Axboe fixed this. The fix is merged in 2.4.2ac20 and 2.4.3pre6. The fix
will be in 2.4.3. Please search the mailing list archive before asking...


-- v --

[EMAIL PROTECTED]



Re: [OT] how to catch HW fault

2001-03-19 Thread Ville Herva

On Mon, Mar 19, 2001 at 12:35:19PM +0200, you [Ville Herva] claimed:
> I quickly hacked up an user space memory tester, and sure enough it
> reported an error after five

If anyone is interested in the said hack (some already mailed me that they
are), I made it available at

http://v.iki.fi/~vherva/memburn.c

Disclaimer: it's just a quick hack, please use memtest86 if possible.
Memburn does have one found memory error under its belt, though ;).


-- v --

[EMAIL PROTECTED]



Re: [OT] how to catch HW fault

2001-03-19 Thread Ville Herva

On Sun, Mar 18, 2001 at 09:11:46PM +0100, you [kees] claimed:
> Hi,
> 
> I tried memtest86 for 24 hours also and that didn't gave a clue. When bad
> ram was really involved I'd expected to find things like:
> failing fsck's, failing kernel compiles and such. But none of them
> the system runs perfect if it doesn't freeze(lockup).
> 
> So yes, only the CPU's and the mobo are at question. What I was looking
> for was a tool like memtest86 but now for motherboards.

You really cannot say that bad memory is involved ONLY when fsck's fail and
kernel compiles fail. No way. 

Think about it: that failing bit can well be in a place that the kernel never
touches, and gcc usually does not touch. Moreover the bit flip usually does
not happen every time; you have to stress the memory for hours and sometimes
use a specific bit pattern to trigger the problem.

I had one machine that compiled the kernel just fine, and ran pretty smoothly
overall, but experienced weird hiccups like dying apps, a failing oracle
install etc. Not too much though, I was attributing them to buggy software.
Then I tried to take a large backup. Bzip failed (internal error) one third
of the time, and once produced a different result. I quickly hacked up a
user space memory tester, and sure enough it reported an error after five
hours. (The machine was already in production, so I couldn't just take it
down and launch memtest86.) I verified the problem with memtest86 during the
next night, and applied the marvellous badmem patch to the kernel. After
marking the problematic place unusable, all problems disappeared. I just lost
2MB out of 256.

What I learned is that spotting a memory error can be difficult, and the
symptoms can be stealthy.
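
To illustrate what such a user space tester does, here is a minimal sketch.
It is not the tester mentioned above, and it is written in Python only
for brevity; burn(), its parameters and the pattern set are all assumptions.
The idea is simply to fill a large buffer with a bit pattern, read it back,
and repeat with different patterns for as long as you can afford. A user
space test only sees the pages the kernel happens to hand out, so memtest86
remains the better tool when the box can be taken down.

```python
def burn(size=1 << 20, patterns=(0x00, 0xFF, 0x55, 0xAA), passes=1):
    """Write each byte pattern over a buffer and verify it by reading back.

    Returns a list of (pass_number, pattern, offset) for every byte that
    did not read back as written; an empty list means no error was seen.
    """
    buf = bytearray(size)
    errors = []
    for n in range(passes):
        for p in patterns:
            buf[:] = bytes([p]) * size        # write phase
            if buf.count(p) != size:          # fast read-back check
                # Slow path: locate the exact offsets that differ.
                errors.extend((n, p, i) for i, b in enumerate(buf) if b != p)
    return errors


if __name__ == "__main__":
    bad = burn(size=64 << 20, passes=10)
    print("errors found:" if bad else "no errors found", bad[:10])
```

As noted above, a failing bit may only show up after hours of stress, so a
real run would loop far longer and use more patterns than this.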


-- v --

[EMAIL PROTECTED]



Re: [OT] how to catch HW fault

2001-03-17 Thread Ville Herva

On Sat, Mar 17, 2001 at 01:22:46PM -0500, you [Aaron Lunansky] claimed:
> Sounds like the only thing you haven't swapped out of your machine is the
> ram/cpu.
> 
> It could very well be your ram (I don't suspect the cpu). If you can, try a
> different stick of ram.

Or try memtest86 (http://reality.sgi.com/cbrady_denver/memtest86/). It's a
very good memory tester, and my first option if I suspect a hardware fault.


-- v --

[EMAIL PROTECTED]



Re: cdfs

2001-03-13 Thread Ville Herva

On Tue, Mar 13, 2001 at 08:45:07PM +0100, you [J . A . Magallon] claimed:
> 
> On 03.13 Ville Herva wrote:
> > 
> > Below is one response to a similar question from the l-k archive:
> > 
> > From: David Balazic <[EMAIL PROTECTED]>
> > Date: Thu, 13 Jan 2000 12:08:39 -0800 
> > Subject: Re: CD-ROM Driver Design
> > 
> > There are already two file-systems for CD-audio on Linux :
> > - cdfs at
> > http://www.elis.rug.ac.be/~ronsse/cdfs/
> > - audiofs at
> > http://fly.cc.fer.hr/~ptolomei/audiofs/
> > 
> > Are you sure there is a need for a third one ? The audiofs uses the
> 
> Oh, NO. All my searchs give no result, thanks to all the people who
> answered.

Just to clarify: the "Are you sure there..." part was also from David
Balazic's mail. I should have quoted more clearly.


-- v --

[EMAIL PROTECTED]



Re: cdfs

2001-03-13 Thread Ville Herva

On Tue, Mar 13, 2001 at 04:23:41PM +0100, you [J . A . Magallon] claimed:
> Hi, 
> 
> Recently I read the BeOS www page, and answerd a question in other mailing
> list. Both things have remind me of a pretty file system: 'cdfs'.
> 
> Anybody knows if there is a port of 'cdfs' (Audio CD File System) for Linux ?
> Which fs now in kernel would be good as a template to start ?
> I am always looking for something enough easy to start kernel programming,
> and this could be a nice start (look, throw away all your ripping soft
> and just do a 'cp').

Below is one response to a similar question from the l-k archive:

From: David Balazic <[EMAIL PROTECTED]>
Date: Thu, 13 Jan 2000 12:08:39 -0800 
Subject: Re: CD-ROM Driver Design

There are already two file-systems for CD-audio on Linux :
- cdfs at
http://www.elis.rug.ac.be/~ronsse/cdfs/
- audiofs at
http://fly.cc.fer.hr/~ptolomei/audiofs/

Are you sure there is a need for a third one ? The audiofs uses the
CDROMREADAUDIO for reading the data and uses the page-cache for caching. I
personally added the page-cache code , but I don't believe it makes a lot
of sense, because when ripping audio, you read data sequentially , so the
cache just eats all free RAM ( possibly throwing out other more usefull
cached data ) and gives almost no gain. By the way , when there is a
"normal" FS on a "normal" block device, does the data get cached twice,
once in buffer-cache and once in page-cache ?
David Balazic


-- v --

[EMAIL PROTECTED]



Re: Posible bug in gcc

2001-02-27 Thread Ville Herva

On Mon, Feb 26, 2001 at 01:02:45PM -0500, you [Richard B. Johnson] claimed:
> 
> Script started on Mon Feb 26 12:54:20 2001
> # gcc -o xxx bug.c
> # ./xxx
> Correct output: 5 2
> GCC output:  5 2
> # gcc --version
> egcs-2.91.66
> # gcc -O2 -o xxx bug.c
> # ./xxx
> Correct output: 5 2
> GCC output:  10 5
> # exit
> exit

Funny:

vherva@babbage:/tmp>/usr/bin/gcc  c.c -o c; ./c ; /usr/bin/gcc --version 
Correct output: 5 2
GCC output:  5 2
2.96
vherva@babbage:/tmp>/usr/bin/gcc -O2 c.c -o c; ./c ; /usr/bin/gcc --version
Correct output: 5 2
GCC output:  10 5
2.96
vherva@babbage:/tmp>/usr/bin/gcc -O6 c.c -o c; ./c ; /usr/bin/gcc --version
Correct output: 5 2
GCC output:  10 5
2.96
vherva@babbage:/tmp>rpm -q gcc
gcc-2.96-74

vherva@babbage:/tmp>kgcc c.c  -o c; ./c ; kgcc --version 
Correct output: 5 2
GCC output:  5 2
egcs-2.91.66
vherva@babbage:/tmp>kgcc c.c -O2 -o c; ./c ; kgcc --version
Correct output: 5 2
GCC output:  10 5
egcs-2.91.66
vherva@babbage:/tmp>kgcc c.c -O6 -o c; ./c ; kgcc --version
Correct output: 5 2
GCC output:  10 5
egcs-2.91.66
vherva@babbage:/tmp>rpm -q kgcc
kgcc-1.1.2-40

vherva@babbage:/tmp>/usr/local/bin/gcc c.c -o c; ./c ;/usr/local/bin/gcc --version
Correct output: 5 2
GCC output:  5 2
pgcc-2.95.1
vherva@babbage:/tmp>/usr/local/bin/gcc c.c -O2 -o c; ./c ;/usr/local/bin/gcc --version
Correct output: 5 2
GCC output:  5 2
pgcc-2.95.1
vherva@babbage:/tmp>/usr/local/bin/gcc c.c -O6 -o c; ./c ;/usr/local/bin/gcc --version 
Correct output: 5 2
GCC output:  5 2
pgcc-2.95.1

I guess pgcc is not that buggy EVERY time. (Sorry for the off topic post, I
couldn't resist.)


-- v --

[EMAIL PROTECTED]



Re: 2.4.1ac17 hang on mounting loopback fs

2001-02-17 Thread Ville Herva

On Sat, Feb 17, 2001 at 12:42:42PM -0800, you [Nate Eldredge] claimed:
> Alan Cox writes:
>  > > # mount -t ext2 -o loop /spare/i486-linuxaout.img /spare/mnt
>  > > loop: enabling 8 loop devices
>  > 
>  > Loop does not currently work in 2.4. It might partly work by luck
>  > but thats it.  This will change as and when the new loop patches go
>  > in. Until then if you need loop use 2.2
> 
> I see.  Thank you.  I can live without it until then.
> 
> Btw, I applied Jens Axboe's loop-3 patch as suggested by Ville Herva.
> It applied with some fuzz and offset.  However, when I booted it, the
> kernel oopsed when I tried to mount the first ordinary ext2 partition
> (no loopback involved).  I can post the oops if anyone cares, but I
> presume that loop-3 and 2.4.1ac17 are just incompatible.

I'm not sure if it'll apply any more cleanly (and work), but the newest is
loop-4 at

ftp://ftp.kernel.org/pub/linux/kernel/people/axboe/patches/2.4.2-pre1

(It should work with 2.4.1pre1, at least).


-- v --

[EMAIL PROTECTED]



Re: 2.4.1-ac16 - Loopback device seems broken

2001-02-17 Thread Ville Herva

On Sat, Feb 17, 2001 at 08:25:58AM -, you [Ole  André  Vadla  Ravnås ] claimed:
   
> I don't know if this is broken in 2.4.1-ac17 and  
> 2.4.2-pre4, but, what happens when mounting a filesystem  
> using the loopback device is that the process 'dies' in some  
> way and there's no way I can kill it. 
> This is what I did:   
> mount /test-ext2-image.img /mnt/testimage -o loop,rw -t ext2  
> And after that there's no way I can get the process killed... 

Known problem.  

Go to

ftp://ftp.kernel.org/pub/linux/people/axboe 

and take the latest loop-? patch from there. Let Jens Axboe know if it works
or if you still have problems, particularly if you can reliably reproduce
the problem you referred to.

I hear the patch should get merged to 2.4.1ac soon. 

> Please CC replies to this email-address:  
> [EMAIL PROTECTED] 
> As I'm not currently subscribed to the linux kernel mailing-list. :-) 

And, your address [EMAIL PROTECTED] does not work. I'm kinda
puzzled about how you expect people to reply to you. With smoke signals?
   
> Ole André
> Get yourself a free webmail from Hesbynett!
> http://diamondhead.hesbynett.no   

Uh, yeah.


-- v -- 

[EMAIL PROTECTED]



Re: Aic7xxx troubles with 2.4.1ac6

2001-02-16 Thread Ville Herva

On Thu, Feb 15, 2001 at 02:11:55PM +0200, you [Ville Herva] claimed:
> On Thu, Feb 15, 2001 at 01:22:31PM +0200, you [Ville Herva] claimed:
> > On Thu, Feb 15, 2001 at 06:08:12AM -0500, you [Doug Ledford] claimed:
> > > 
> > > There was a new aic7xxx driver (version 5.2.3) that went into the 2.4.1ac
> > > kernel series around 2.4.1-ac7.  I would be curious to know if it worked on
> > > your machine properly.
> > 
> > Ok. Will try. 
> 
> Tried 2.4.1ac13 vanilla. Still a no-go:

Hmm. I think we finally found a fully functional cable.

2.4.1ac13 vanilla now seems to work flawlessly, even at 160MB/s. Pretty
weird, though, that 2.4.1ac13+Gibbs's aic7xxx worked even with the previous
cable.


-- v --

[EMAIL PROTECTED]



Re: Aic7xxx troubles with 2.4.1ac6

2001-02-15 Thread Ville Herva

On Thu, Feb 15, 2001 at 01:22:31PM +0200, you [Ville Herva] claimed:
> On Thu, Feb 15, 2001 at 06:08:12AM -0500, you [Doug Ledford] claimed:
> > 
> > There was a new aic7xxx driver (version 5.2.3) that went into the 2.4.1ac
> > kernel series around 2.4.1-ac7.  I would be curious to know if it worked on
> > your machine properly.
> 
> Ok. Will try. 

Tried 2.4.1ac13 vanilla. Still a no-go:

SCSI host 0 abort (pid 0) timed out - resetting
SCSI bus is being reset for host 0 channel 0
(scsi0:0:0:0) Synchronous at 40.0MBytes/s offset 31
SCSI host 0 abort (pid 0) timed out - trying harder
SCSI bus is being reset for host 0 channel 0
(scsi0:0:0:0) Synchronous at 40.0MBytes/s offset 31
scsi: aborting command due to timeout pid 0 scsi 0 channel 0 id 0 lun 0
Read (10) 00 00 00 00 00 00 00 02 00
SCSI SIGI 0x14 SEDADDR 0x77 SSTAT 0x0 SSTAT 0x2 SG_CACHEPTR 0x6 SSTAT2 0xC0 ST 0x0F0

(copied by hand, so please excuse the typos.)

Although 2.4.1ac13+Gibbs's aic7xxx seems to work perfectly, I still
wouldn't count out the possibility of a hardware fault of some kind, since the
box had already begun failing to find the boot record at 80MB/sec as well.


-- v --

[EMAIL PROTECTED]



Re: Aic7xxx troubles with 2.4.1ac6

2001-02-15 Thread Ville Herva

On Thu, Feb 15, 2001 at 06:08:12AM -0500, you [Doug Ledford] claimed:
> 
> There was a new aic7xxx driver (version 5.2.3) that went into the 2.4.1ac
> kernel series around 2.4.1-ac7.  I would be curious to know if it worked on
> your machine properly.

Ok. Will try. 

Are there any changes that could affect this?


-- v --

[EMAIL PROTECTED]



Re: Aic7xxx troubles with 2.4.1ac6

2001-02-15 Thread Ville Herva

On Thu, Feb 08, 2001 at 06:16:01PM +0200, you [Ville Herva] claimed:
> On Thu, Feb 08, 2001 at 07:53:55AM -0500, you [Doug Ledford] claimed:
> > Ville Herva wrote:
> > > 
> > > It looks like ac6 (which I believe includes the patch you posted) is
> > > still a no-go with 7892. The boot halts and it just prints this once a
> > > second:
> > > 
> > > (SCSI0:0:3:1) Synchronous at 160 Mbyte/sec offset 31
> > > (SCSI0:0:3:1) CRC error during data in phase
> > > (SCSI0:0:3:1)   CRC error in intermediate CRC packet
> > 
> > Check your cables, especially the connector on the card and the drive.  Look
> > for any possible bent pins.  The message you are seeing is *usually*, but not
> > always, a legitimate data corruption issue.  It doesn't show up under the
> > 5.2.1 driver because it limits your Quantum drive to 80MByte/s and that
> > particular speed doesn't include CRC checking.  On this driver you have to be
> > running at 160MByte/s before CRC checking is enabled.
> 
> I checked the cables. I think HP didn't supply proper 160 MB/S capable
> cables (aren't those the ones with wattlings?). When I forced the drive to
> 80MB/s from bios, not only did aic7xxx/ac6 work like charm, but the BIOS
> also found the "missing" MBR. Stupid problem ;).

Umm, I think I said that too early. I began to have problems even during
boot; the scsi bios did recognize the drive, but the bios didn't find the
boot record. This was completely cured by forcing the drive to 80MB/s mode.
So I think the cable wasn't Ultra160 capable.

However, the 2.4.1ac6, 2.4.1ac2 and 2.2.19pre6 aic7xxx.c still had trouble
with the drive. I went back to 80MB/s, 40MB/s and even 20MB/s, but that
still didn't help. 2.4.1* reported a timeout while waiting for a command and
would go into an endless loop resetting the bus. 2.2.19pre6 said there was
an error during the data in phase, but after some coughing it booted up and
seemed to work quite alright.

NT4 booted up without any visible problems.

The HP service guy changed the motherboard (integrated scsi), the cable (to
another 80MB/s one), and the drive logic, but that didn't help.

The problems first started after the motherboard was first changed (due to
a separate problem). The new one had a newer bios and scsi bios.

Anyhow, I just compiled 2.4.1ac13 with Justin Gibbs's aic7xxx, and it does
not suffer from any problems at 80MB/s.


-- v --

[EMAIL PROTECTED]



Re: LDT allocated for cloned task!

2001-02-14 Thread Ville Herva

On Wed, Feb 14, 2001 at 03:12:29PM +0100, you [Gábor Lénárt] claimed:
> 
> xmms-avi uses DLL loader from wine too? 

AFAIK: yes.

> I mean does it use windows codecs
> to play AVIs? In this case, the dll loader set up some LDT settings and
> this casue that message. 

So this is a harmless message?

> However with our player - mplayer - it does not
> detected my myself (it can use DLLs to play DivX movies, as well).

Cool. 


-- v --

[EMAIL PROTECTED]



Re: LDT allocated for cloned task!

2001-02-14 Thread Ville Herva

On Tue, Feb 13, 2001 at 10:48:23AM -0800, you [Simon Kirby] claimed:
> On Tue, Feb 13, 2001 at 06:22:26PM +, Alan Cox wrote:
> 
> > > LDT allocated for cloned task!
> > > 
> > > I'm seeing this message come up fairly often while running vanilla
> > > 2.4.2-pre3 on my dual Celeron system.  I don't think I saw it before
> > > while running 2.4.1, but I may have just missed it.
> > 
> > Are you running wine or dosemu ?
> 
> Actually, I've ran both of them at least a few times this boot.
> 
> I think I've found what's doing it...xmms with the avi-xmms plugin will
> cause the message to appear at startup even without playing anything. 
> Moving the libraries out of the /usr/lib/xmms/Input directory and
> starting xmms again will not produce any message.  I only just recently
> downloaded this plugin which is probably why I didn't see it before.
> 
> It's also happening on my second (non-DRI) head, so it's probably not
> related to that (I'll reboot and try again without any DRI modules loaded
> and see).

I saw/see a lot of those messages on 2.2.18pre19 as well. I hacked the
kernel to show the process in question, and it's always xmms:

LDT allocated for cloned task (pid=20272; count=3)!

20272 pts/10   RN   186:01 xmms

And I do have the xmms-avi plugin in the plugin directory. So if you find a
bug/fix to 2.4, could you please check 2.2 as well? (I'm afraid I'm not
nearly clueful enough.) 

Are these messages serious anyway?


-- v --

[EMAIL PROTECTED]



Re: 2.4.2-pre2(&3) loopback fs hang

2001-02-12 Thread Ville Herva

On Mon, Feb 12, 2001 at 03:45:49PM +0200, you [Ville Herva] claimed:
> 
> Ok, then I just fumbled with my ftp client - I notice the 2.4.2-pre1 dir.

Should've been 'I didn't notice'.

Looks like I can't get anything right, I already rue that I answered the
question in the first place...


-- v --

[EMAIL PROTECTED]



Re: 2.4.2-pre2(&3) loopback fs hang

2001-02-12 Thread Ville Herva

On Mon, Feb 12, 2001 at 02:37:51PM +0100, you [Martin Josefsson] claimed:
> On Mon, 12 Feb 2001, Ville Herva wrote:
> 
> 2.4.2-pre1/loop-4.bz2 is the newest one, but I think I saw that there's
> still a bug in it which can hang the kernel.
> I think he said that he was going to release loop-5 soon.

Ok, then I just fumbled with my ftp client - I notice the 2.4.2-pre1 dir.

Thanks for the correction.


-- v --

[EMAIL PROTECTED]



Re: 2.4.2-pre2(&3) loopback fs hang

2001-02-12 Thread Ville Herva

On Mon, Feb 12, 2001 at 01:54:46AM -0800, you [Colonel] claimed:
> 
> >mount -o loop=/dev/loop1 net.i /var/mnt/image/
> 
> ends up in an uninterruptable sleep state (system cannot umount /
> during shutdown).
> 
> How do I track this down?

This is becoming a FAQ.

Go to 

ftp://ftp.kernel.org/pub/linux/kernel/people/axboe/patches

and get Jens Axboe's newest loopback fs patch (seems to be
2.4.1-pre10/loop-3.bz2 atm, though I thought Jens was going to release
loop-4 shortly.)

See if the problem goes away with it.

I'm not sure if Alan has any plans to merge this to ac-series. It would
appear a worthy candidate...


-- v --

[EMAIL PROTECTED]



Re: [BUG] 2.4.[01] lockups

2001-02-12 Thread Ville Herva

On Sun, Feb 11, 2001 at 09:02:19PM +0100, you [Pavel Machek] claimed:
> Hi!
> 
> > I am experiencing a problem with both 2.4.0 and 2.4.1. The problem is that
> > at seemingly random times the console locks up. After the lockup I can no
> > longer type and the mouse is frozen. As far as I can tell, other systems
> > services are not affected, i.e. programs continue to run, music is being
> > played, I/O is fine. It looks like _only_ the console devices are locked
> > up.
> 
> Login via network or serial cable, and see if /proc/interrupts entry
> for keyboard/mouse changes as you type. Attempt to blink keyboard leds
> with setleds.

Also, try killing gpm.
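
The /proc/interrupts check Pavel suggests can be scripted from a network or serial login. A minimal sketch, assuming the usual layout of that file (an IRQ label followed by a colon, one count column per CPU, then the controller and device names); the function names are illustrative and not part of any existing tool:

```python
def parse_interrupts(text):
    """Map each IRQ label to its count summed over all CPU columns."""
    counts = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the header line listing the CPUs has no colon
        label, rest = line.split(":", 1)
        total = 0
        for field in rest.split():
            if not field.isdigit():
                break  # counts end where the controller/device names begin
            total += int(field)
        counts[label.strip()] = total
    return counts


def changed_irqs(before, after):
    """Return {irq: delta} for every IRQ whose count grew between snapshots."""
    old = parse_interrupts(before)
    new = parse_interrupts(after)
    return {irq: new[irq] - old.get(irq, 0)
            for irq in new if new[irq] > old.get(irq, 0)}
```

Take one snapshot of /proc/interrupts, type on the locked console, take a second snapshot, and pass both strings to `changed_irqs`. If the keyboard or mouse IRQ is missing from the result, the interrupts are not arriving at all.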


-- v --

[EMAIL PROTECTED]



Re: Aic7xxx troubles with 2.4.1ac6

2001-02-08 Thread Ville Herva

On Thu, Feb 08, 2001 at 07:53:55AM -0500, you [Doug Ledford] claimed:
> Ville Herva wrote:
> > 
> > It looks like ac6 (which I believe includes the patch you posted) is
> > still a no-go with 7892. The boot halts and it just prints this once a
> > second:
> > 
> > (SCSI0:0:3:1) Synchronous at 160 Mbyte/sec offset 31
> > (SCSI0:0:3:1) CRC error during data in phase
> > (SCSI0:0:3:1)   CRC error in intermediate CRC packet
> 
> Check your cables, especially the connector on the card and the drive.  Look
> for any possible bent pins.  The message you are seeing is *usually*, but not
> always, a legitimate data corruption issue.  It doesn't show up under the
> 5.2.1 driver because it limits your Quantum drive to 80MByte/s and that
> particular speed doesn't include CRC checking.  On this driver you have to be
> running at 160MByte/s before CRC checking is enabled.

I checked the cables. I think HP didn't supply proper 160 MB/s capable
cables (aren't those the ones with wattlings?). When I forced the drive to
80 MB/s from the BIOS, not only did aic7xxx/ac6 work like a charm, but the BIOS
also found the "missing" MBR. Stupid problem ;).

Thanks for your help!


--
Ville Herva[EMAIL PROTECTED] +358-50-5164500
Viasys Oy  Hannuntie 6  FIN-02360 Espoo  +358-9-2313-2160
PGP key available: http://www.iki.fi/v/pgp.html  fax +358-9-2313-2250



Aic7xxx troubles with 2.4.1ac6

2001-02-08 Thread Ville Herva

It looks like ac6 (which I believe includes the patch you posted) is
still a no-go with 7892. The boot halts and it just prints this once a
second:

(SCSI0:0:3:1) Synchronous at 160 Mbyte/sec offset 31
(SCSI0:0:3:1) CRC error during data in phase
(SCSI0:0:3:1)   CRC error in intermediate CRC packet

This happens also with ac5+the small patch you posted earlier. ac2 works
fine (although something did corrupt my MBR while using it. It is still
a complete mystery to me what could have done it. Now I'm unable to
boot NT; linux of course works with the boot floppy.)


ac2 dmesg's:

SCSI subsystem driver Revision: 1.00
(scsi0)  found at PCI
3/9/0
(scsi0) Wide Channel, SCSI ID=7, 32/255 SCBs
(scsi0) Downloading sequencer code... 392 instructions downloaded
scsi0 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.2.1/5.2.0
   
(scsi0:0:3:0) Synchronous at 80.0 Mbyte/sec, offset 31.
  Vendor: QUANTUM   Model: ATLAS 10K 18WLS   Rev: UCHK
  Type:   Direct-Access  ANSI SCSI revision: 03
Attached scsi disk sda at scsi0, channel 0, id 3, lun 0
SCSI device sda: 35566480 512-byte hdwr sectors (18210 MB)
Partition check:
 /dev/scsi/host0/bus0/target3/lun0: p1 p2 p3


cat /proc/scsi/aic7xxx/0
Adaptec AIC7xxx driver version: 5.2.1/5.2.0
Compile Options:
  TCQ Enabled By Default : Enabled
  AIC7XXX_PROC_STATS : Enabled

Adapter Configuration:
   SCSI Adapter: Adaptec AIC-7892 Ultra 160/m SCSI host adapter
   Ultra-160/m LVD/SE Wide Controller at PCI 3/9/0
PCI MMAPed I/O Base: 0xfc8ff000
 Adapter SEEPROM Config: SEEPROM found and used.
  Adaptec SCSI BIOS: Enabled
IRQ: 10
   SCBs: Active 0, Max Active 8,
 Allocated 31, HW 32, Page 255
 Interrupts: 42493
  BIOS Control Word: 0x58a4
   Adapter Control Word: 0x1c5e
   Extended Translation: Enabled
Disconnect Enable Flags: 0x
 Ultra Enable Flags: 0x
 Tag Queue Enable Flags: 0x0008
Ordered Queue Tag Flags: 0x0008
Default Tag Queue Depth: 8
Tagged Queue By Device array for aic7xxx host instance 0:
  {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}
Actual queue depth per device for aic7xxx host instance 0:
  {1,1,1,8,1,1,1,1,1,1,1,1,1,1,1,1}

Statistics:

(scsi0:0:3:0)
  Device using Wide/Sync transfers at 80.0 MByte/sec, offset 31
  Transinfo settings: current(10/31/1/0), goal(10/127/1/0), user(9/127/1/2)
  Total transfers 42420 (34614 reads and 7806 writes)
            < 2K    2K+    4K+    8K+   16K+   32K+   64K+  128K+
   Reads:     39      0  25175   3674   4257    946    433     90
  Writes:      0      0   3558   1864    502    480    455    947

cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 03 Lun: 00
  Vendor: QUANTUM  Model: ATLAS 10K 18WLS  Rev: UCHK
  Type:   Direct-AccessANSI SCSI revision: 03


-- 
Ville Herva[EMAIL PROTECTED] +358-50-5164500
Viasys Oy  Hannuntie 6  FIN-02360 Espoo  +358-9-2313-2160
PGP key available: http://www.iki.fi/v/pgp.html  fax +358-9-2313-2250



Re: Knowing what options a kernel was compiled with

2001-01-29 Thread Ville Herva

On Mon, Jan 29, 2001 at 08:56:12PM -, you [mirabilos] claimed:
> From: "Torrey Hoffman" <[EMAIL PROTECTED]>
> > Should someone submit a patch to copy the .config to a standard location as
> > part of "make install" or "make modules_install"? If included in the
> > official sources, that good example would encourage the distribution
> > maintainers do the same. 

I find this neat:

http://www.it.uc3m.es/~ptb/proconfig/

It creates a /proc/config entry with the obvious functionality, but wastes
very little RAM.
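
Once the configuration text is readable at run time, checking an option takes only a few lines. A minimal sketch, assuming the text is in the usual .config format (`CONFIG_X=value` lines and `# CONFIG_X is not set` comments); whether the proconfig patch emits exactly that format is an assumption, and `parse_kernel_config` is a hypothetical helper name:

```python
def parse_kernel_config(text):
    """Parse .config-style text into {option: value}; unset options map to None."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(" is not set"):
            # "# CONFIG_FOO is not set" -> CONFIG_FOO: None
            options[line[2:-len(" is not set")]] = None
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options
```

Section comments and blank lines are skipped, so the result holds only real options. An option that is explicitly unset (value `None`) stays distinguishable from one the file never mentions (missing key).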


-- v --

[EMAIL PROTECTED]



2.4.0 quality [was: 2.4.0 uptime]

2001-01-26 Thread Ville Herva

On Fri, Jan 26, 2001 at 08:37:38AM +0100, you [Hans Eric Sandström] claimed:
> BP6/Dual Cel 400 (the 2.0 load is setiathome)
> --
> [root@zekeserv /root]# uptime
>   8:28am  up 20 days, 13:04,  2 users,  load average: 2.00, 2.00, 2.00
> [root@zekeserv /root]# uname -a
> Linux zekeserv 2.4.0 #2 SMP Fri Jan 5 07:37:01 CET 2001 i686 unknown
> [root@zekeserv /root]#

Yeah. I think it can be concluded that 2.4.0 was pretty good for a .0
release. IMHO and based on my own limited experience it's much better than
2.2.0. It certainly does pretty well with that kind of ordinary CPU load,
and the bugs (if any) are related to more exotic conditions.

For most people, it appears to have worked _well_. It certainly has worked
fine for me (I had one X lockup with it, but I'd blame the nvidia drivers
for that, although they are very stable on 2.2).

What known bugs are there, btw? I think the only more serious ones were the
software RAID5 bug and the VIA driver issues, both of which caused fs
corruption. Some people reported problems with X (is this the same forking
problem Jeff Merkey et al. tried to hunt down pre-time?), and some had
trouble booting on 386 and/or other hardware. Any more?

Hopefully Linus will squash all of them in 2.4.1!


-- v --

[EMAIL PROTECTED]



[OFFTOPIC] Re: make mrproper

2001-01-25 Thread Ville Herva

On Thu, Jan 25, 2001 at 09:00:26AM -0500, you [James Lewis Nance] claimed:
> 
> ( mrproper == Mr. Proper )
>
> I saw a post from Linus once about this.  It is Finnish for "Mr. Clean".

Just to be sure: 'proper' does not mean anything in Finnish (nor in Swedish
for that matter, AFAIK); it's just the European(?) product name for 'Mr.
Clean'. Possibly it's from German ('proper' = 'clean').


-- v --

[EMAIL PROTECTED]



Re: File System Corruption with 2.2.18

2001-01-18 Thread Ville Herva

On Thu, Jan 18, 2001 at 12:45:13AM -0800, you [Andre Hedrick] claimed:
> 
> > But it works on all ATA disks? Does it work for SCSI as well?
> 
> The KIOBUFS version may, but not the taskfile version.

Ok.
 
> > I think it would be cool if you'd make it available (on linux-ide.org?),
> > so that people installing servers (and anybody who _cares_) could test their
> > hardware/driver combination before starting to use the box.
> 
> Who is going to ship install images with this kernel code added and the
> most dangerous feature ever created enabled?

I don't think any vendor has to ship it. The memtest86 suite is available
as a floppy image; the harddisk tester could be as well. (Of course with a
LARGE alert saying "THIS *DESTROYS* ALL YOUR HARDDISK CONTENTS, please
unplug everything you don't want tested to be sure".)

> > Now we have memtest86, cpuburn (what more). It would be nice to have a
> > good (if not complete) test set to run on each new linux box. It's not
> > nice to use the box for a month and then go "f..., this box has faulty
> > memory!" or ..."faulty disk!". Yes, that's what's happened to all of us.
> > 
> > It's much nicer to get a warranty replacement, when you don't have any
> > data on the disk.
> 
> It can do that without a file system also ;-)

Yes.


-- v --

[EMAIL PROTECTED]



Re: File System Corruption with 2.2.18

2001-01-18 Thread Ville Herva

On Wed, Jan 17, 2001 at 05:14:02PM -0800, you [Andre Hedrick] claimed:
> On Thu, 18 Jan 2001, Tim Fletcher wrote:
> 
> > > Well, that is a useless test then, because you can not test things completely.
> > 
> > I meant that if the partition has no persistent data on it then the test can
> > be run (the test wipes all data on the partition out during the test,
> > right?) with no loss of data on the machine. The partition is still on the
> > same disk so the test data is valid?
> > 
> > I am thinking that the test is somewhat like badblocks -w or have I got
> > the wrong end of the stick?
> 
> Sorry, there is no stick to get the end of.
> This is a pure diagnostic tool to determine OS/CHIPSET/DEVICE failures.
> You generate a pattern buffer and write it to the disk and step the buffer
> 1 byte per sector and go head to tail.  Then you read it back head to tail
> and compare what should be there with what is there.  Failures == FS
> corruption is likely under highest loads, period.  Then you attempt
> to extract any patterns or periodic events to determine if it is driver or
> device or other portions of the OS.
> 
> I am tired of people pointing the finger at me, claiming my work is the cause
> of FS corruption.
> 
> This is a pattern walk and it will give some performance issue.
> It does not care about the OS, it is doing the direct access that some
> would call bit-banging in the old days.

But it works on all ATA disks? Does it work for SCSI as well?

I think it would be cool if you'd make it available (on linux-ide.org?),
so that people installing servers (and anybody who _cares_) could test their
hardware/driver combination before starting to use the box.

Now we have memtest86, cpuburn (what more). It would be nice to have a
good (if not complete) test set to run on each new linux box. It's not
nice to use the box for a month and then go "f..., this box has faulty
memory!" or ..."faulty disk!". Yes, that's what's happened to all of us.

It's much nicer to get a warranty replacement, when you don't have any
data on the disk.
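
The pattern walk Andre describes above (write a pattern that is stepped one byte per sector from head to tail, then read it back and compare) can be illustrated as follows. This is only a sketch of the idea, not Andre's tool: it assumes that "stepping" means rotating the pattern buffer by one byte per sector, it uses a hypothetical 512-byte sector size, and it runs against an in-memory buffer so that nothing on a real disk is touched.

```python
import io

SECTOR = 512  # assumed sector size for the illustration


def sector_pattern(base, index):
    """Pattern for sector `index`: the base buffer rotated by `index` bytes."""
    shift = index % len(base)
    return base[shift:] + base[:shift]


def write_walk(dev, base, sectors):
    """Write the stepped pattern from the first sector to the last."""
    dev.seek(0)
    for i in range(sectors):
        dev.write(sector_pattern(base, i))


def verify_walk(dev, base, sectors):
    """Read back head to tail; return the indices of sectors that differ."""
    dev.seek(0)
    bad = []
    for i in range(sectors):
        if dev.read(SECTOR) != sector_pattern(base, i):
            bad.append(i)
    return bad


def demo():
    base = bytes(range(256)) * 2         # one 512-byte pattern buffer
    dev = io.BytesIO(bytes(SECTOR * 8))  # stand-in for a disk, 8 sectors
    write_walk(dev, base, 8)
    clean = verify_walk(dev, base, 8)
    dev.seek(3 * SECTOR + 7)
    dev.write(b"\xff")                   # simulate one corrupted byte in sector 3
    return clean, verify_walk(dev, base, 8)
```

Because every sector carries a different rotation, a sector that ends up at the wrong location is detected as well as a flipped bit. The list of failing indices is the raw material for the pattern hunting Andre mentions.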


-- v --

[EMAIL PROTECTED]



Re: Which kernel fixes the VM issues?

2001-01-07 Thread Ville Herva

On Sun, Jan 07, 2001 at 01:47:39PM +0100, you [Andre Tomt] claimed:
> > > This issue is fixed in 2.2.18 AFAIK (never seen it since).
> >
> > Nope.
> >
> > It's fixed in 2.2.19pre2 (which includes Andrea Arcangeli's vm-global-7
> > patch that (among other things) fixes this.)
> 
> I stand corrected. Still, almost-vanilla 2.2.18 (+ ow patches) on a
> highly loaded webserver has not shown any "LRU block list corruption"
> crashes in over 6 weeks, even though it usually died after a week on 2.2.17
> with the same error (if memory serves me right). Could be the system tuning
> that has "fixed" this by making the usual load not - err - load the server
> as much as before.

I'm not sure about the "LRU block list" thing (it may be another issue),
but AFAIK it's vm-global that fixes the try_to_free_pages thing. It's also
my experience that the try_to_free_pages problem is completely gone after
applying vm-global to an otherwise identical 18pre.
 
> > You can also apply the vm-global-patch to 2.2.18 if you like.
> 
> Yep, as stated in my previous mail :-)


-- v --

[EMAIL PROTECTED]



Re: Which kernel fixes the VM issues?

2001-01-07 Thread Ville Herva

On Sun, Jan 07, 2001 at 12:50:07PM +0100, you [Andre Tomt] claimed:
> > of the fuzz
> > I have relating to the VM: do_try_to_free_pages issue.
> 
> 
> 
> > About once a week I get the 'VM: do_try_to_free_pages ...' error and
> > eventually get a complete system lockup. And just this morning it
> > locked up
> > again, although this time with a 'VFS: LRU block list corrupted'
> > message in
> > the logs, which i'm assuming is related to the VM issue as well.
> 
> This issue is fixed in 2.2.18 AFAIK (never seen it since).
> 
> 

Nope.

It's fixed in 2.2.19pre2 (which includes Andrea Arcangeli's vm-global-7
patch that (among other things) fixes this.)

You can also apply the vm-global-patch to 2.2.18 if you like.


-- v --

[EMAIL PROTECTED]



Re: [2.2.18] VM: do_try_to_free_pages failed

2000-12-20 Thread Ville Herva

On Wed, Dec 20, 2000 at 01:03:00PM +0100, you [Matthias Andree] claimed:
> Last night, one of your production machines got wedged, I caught a lot
> of kernel: VM: do_try_to_free_pages failed for ... for a whole range of
> processes, among them ypbind, klogd, syslogd, xntpd, cron, nscd, X,
> How can I get rid of those do_try_to_free_pages lockups? That box
 
Almost everybody (including me) who has seen that problem seems to
have had it fixed with Andrea Arcangeli's VM-global-7 patch
(ftp://ftp.kernel.org/pub/linux/kernel/people/andrea...).

> Should I try the most recent 2.2.19-pre?

Yes, Andrea's VM-global-7 is included in pre2.


-- v --

[EMAIL PROTECTED]



2.2.18pre19 SMP: LDT allocated for cloned task

2000-12-05 Thread Ville Herva

Are these noteworthy: "LDT allocated for cloned task" (2.2.18pre19 SMP)? I
put an additional printk there to see which pid it is, and it is xmms.
The usage count mm->count is 3.


-- v --

[EMAIL PROTECTED]



Re: 2.2.18pre19 oops in try_to_free_pages

2000-11-29 Thread Ville Herva

On Wed, Nov 29, 2000 at 01:38:01AM +0100, you [Andrea Arcangeli] claimed:
> On Tue, Nov 28, 2000 at 01:44:18PM +0200, Ville Herva wrote:
> > try Andrea's vm-global-7 now. It seems to include the bits Rik posted, or
> 
> It doesn't include the bits Rik posted because they were unnecessary.

Ummh. What am I smoking?

% patch -p1 --dry-run < ../riel-vm.patch
patching file mm/vmscan.c
Reversed (or previously applied) patch detected!  Assume -R? [n]
Apply anyway? [n]
Skipping patch.
7 out of 7 hunks ignored -- saving rejects to file mm/vmscan.c.rej

I was sure that I only applied vm-global-7 on top of 2.2.18pre19 before
that. Perhaps I have applied Rik's patch on some morning before waking up,
but...


-- v --

[EMAIL PROTECTED]



Re: 2.2.18pre19 oops in try_to_free_pages

2000-11-28 Thread Ville Herva

On Tue, Nov 28, 2000 at 12:57:59PM +, you [Alan Cox] claimed:
> > I wasn't the one who used the cdrom, so it is possible that the person in
> > question was able to eject the cd without unmounting it first. I'll
> > check if the door locking on that device works.
> 
> Also rpm -e magicdev --nodeps if magicdev is on the box.

Oops. Seems that there was one. Looks like I did not check the system well
enough after rh70 install...

> > But you are certain that the oops was eventually caused by these and not
> > by any bug in vm?
> 
> This one . Yes.

Ok, good. So avoiding ejecting cdrom without unmounting first will save me
from further oopses.

> The VM layer isnt causing any oopses I've seen in 2.2.17+. It doesn't always
> make good choices and Rik or Andrea's stuff is on the list after 2.2.18

Yes. The other problem I saw with 2.2.18pre vm wasn't an oops, it was a
rampaging vm rambo that slaughtered my X while it was idle. Admittedly
that is not as severe as oops, so the vm situation is not as bad as I
thought.

> (I refuse to mix VM changes with the big driver changes)

No problem. 

 
-- v --

[EMAIL PROTECTED]



Re: 2.2.18pre19 oops in try_to_free_pages

2000-11-28 Thread Ville Herva

On Tue, Nov 28, 2000 at 12:45:49PM +, you [Alan Cox] claimed:
> > BTW: What are those seemingly harmless "VFS: busy inodes on changed
> > media." messages I'm getting tons of?
> 
> They are not harmless. Someone forcibly unmounted a disk of some sort
> from a device that was in use, and while that shouldnt have killed the box
> it seems it did

I've gotten hundreds of those over several weeks, and across different boots.
It seems the kernel starts spitting them out right after the cdrom is used.

I wasn't the one who used the cdrom, so it is possible that the person in
question was able to eject the cd without unmounting it first. I'll
check if the door locking on that device works.

But you are certain that the oops was eventually caused by these and not
by any bug in vm?


-- v --

[EMAIL PROTECTED]



2.2.18pre19 oops in try_to_free_pages

2000-11-28 Thread Ville Herva

This is a dual Pro200. Running mainly oracle. The kernel is a 2.2.18pre19
+ide-patch. The oops was unfortunately mangled by sysklogd, but I did try
to reconstruct it. Sysklogd apparently used the wrong System.map (the
redhat default one I had accidentally left in /boot). I tried to map the
addresses back and re-lookup them from the correct System.map. Do the
addresses I got make any sense?

After the oops, the box had locked up solid (in XFree DPMS state), and I
had no way of reading the oops from console. The machine did not react to
any key combination and the networking was dead.

The current 2.2.18pre vm seems very unstable to me. This is the second
machine I'm having serious trouble with (out of three boxes I'm running
2.2.18pre on). Please consider something for the 2.2.18 final (I haven't
heard Alan comment on Andrea's VM-global patch?). Anyway, I'm going to
try Andrea's vm-global-7 now. It seems to include the bits Rik posted, or
am I mistaken?

BTW: What are those seemingly harmless "VFS: busy inodes on changed
media." messages I'm getting tons of?


Nov 28 07:12:37 gistraktori kernel: VFS: busy inodes on changed media.
Nov 28 07:12:37 gistraktori kernel: Unable to handle kernel paging request at virtual address 00400034
Nov 28 07:12:37 gistraktori kernel: current->tss.cr3 = 00101000, %%cr3 = 00101000
Nov 28 07:12:37 gistraktori kernel: *pde = 
Nov 28 07:12:37 gistraktori kernel: Oops: 0002
Nov 28 07:12:37 gistraktori kernel: CPU:0
Nov 28 07:12:37 gistraktori kernel: EIP:0010:[sys_writev+89/176]
Nov 28 07:12:37 gistraktori kernel: EFLAGS: 00010206
Nov 28 07:12:37 gistraktori kernel: eax: 0040   ebx: c9420300   ecx: c9420300   edx: cfeb42e0
Nov 28 07:12:37 gistraktori kernel: esi: c9420300   edi:    ebp: c0449aa0   esp: cffdff78
Nov 28 07:12:37 gistraktori kernel: ds: 0018   es: 0018   ss: 0018
Nov 28 07:12:37 gistraktori kernel: Process kswapd (pid: 5, process nr: 6, stackpage=cffdf000)
Nov 28 07:12:37 gistraktori kernel: Stack: c9420300 c0129ac9 c9420300 c0449aa0 07d7 0001 0030 fffe
Nov 28 07:12:37 gistraktori kernel:        c011df46 c0449aa0 0001 cffde000 000c 0006 0001 0001
Nov 28 07:12:37 gistraktori kernel:        c012355f 0006 0030 cffde000 c01dd85a cffde2bd 00a0 cffde000
Nov 28 07:12:37 gistraktori kernel: Call Trace:
[set_blocksize+409/492]
[do_mmap+930/1004]
[kmem_cache_alloc+159/348]
[ide_hwif_to_major+2167/2437]
[kmem_cache_free+59/408]
[get_options+0/116]
[apm_enable_power_management+111/112]
Nov 28 07:12:37 gistraktori kernel: Code: 89 50 34 c7 01 00 00 00 00 89 02
c7 41 34 00 00 00 00 ff 0d
Nov 28 07:12:39 gistraktori kernel: VFS: busy inodes on changed media.
Nov 28 07:13:11 gistraktori last message repeated 11 times
(and hung solid at this point)


From the false System.map:
c0129930 T set_blocksize               + 409  = C0129AC9
c011dba4 T do_mmap                     + 930  = C011DF46
c01234c0 T kmem_cache_alloc            + 159  = C012355F
c01dcfe3 r ide_hwif_to_major           + 2167 = C01DD85A
c012361c T kmem_cache_free             + 59   = C0123657
c0106000 T get_options                 + 0    = C0106000
c0106488 t apm_enable_power_management + 111  = C01064F7


From the correct System.map:
c0129a90 T try_to_free_buffers
c011de2c T shrink_mmap
c0123504 t do_try_to_free_pages
c01dbb40 r tvecs
c01235f4 T kswapd
c0106000 T get_options
c01064d4 T kernel_thread
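
The manual remapping above (symbol plus offset from the wrong System.map back to a raw address, then a lookup of the nearest preceding symbol in the correct map) can be sketched in a few lines. This is an illustration only; `load_map` and `remap` are hypothetical helper names, and a real lookup needs the complete correct System.map rather than the handful of symbols quoted here.

```python
import bisect


def load_map(text):
    """Parse System.map-style lines ('address type name') into {name: address}."""
    table = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            table[parts[2]] = int(parts[0], 16)
    return table


def remap(frames, wrong_map, right_map):
    """Convert (symbol, offset) pairs resolved with the wrong map to the right one."""
    ordered = sorted((addr, name) for name, addr in right_map.items())
    starts = [addr for addr, _ in ordered]
    result = []
    for symbol, offset in frames:
        addr = wrong_map[symbol] + offset            # recover the raw address
        pos = bisect.bisect_right(starts, addr) - 1  # last symbol at or below it
        if pos < 0:
            result.append((None, addr))              # below every known symbol
            continue
        base, name = ordered[pos]
        result.append((name, addr - base))
    return result
```

Fed with the two listings above, the frame `set_blocksize+409` comes out as `try_to_free_buffers+57` and `kmem_cache_alloc+159` as `do_try_to_free_pages+91`, which matches the hand-computed lookup.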

lsmod at the time:

lsmod
Module  Size  Used by
eepro100   16736   1  (autoclean)
st 25388   0  (unused) 


.config:
#
# Automatically generated by make menuconfig: don't edit
#

#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y

#
# Processor type and features
#
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
CONFIG_M686=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_TSC=y
CONFIG_X86_GOOD_APIC=y
# CONFIG_MICROCODE is not set
# CONFIG_X86_MSR is not set
# CONFIG_X86_CPUID is not set
CONFIG_1GB=y
# CONFIG_2GB is not set
# CONFIG_MATH_EMULATION is not set
# CONFIG_MTRR is not set
CONFIG_SMP=y

#
# Loadable module support
#
CONFIG_MODULES=y
CONFIG_MODVERSIONS=y
CONFIG_KMOD=y

#
# General setup
#
CONFIG_NET=y
CONFIG_PCI=y
# CONFIG_PCI_GOBIOS is not set
# CONFIG_PCI_GODIRECT is not set
CONFIG_PCI_GOANY=y
CONFIG_PCI_BIOS=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_QUIRKS=y
# CONFIG_PCI_OPTIMIZE is not set
CONFIG_PCI_OLD_PROC=y
# CONFIG_MCA is not set
# CONFIG_VISWS is not set
CONFIG_X86_IO_APIC=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_SYSVIPC=y
# CONFIG_BSD_PROCESS_ACCT is not set
CONFIG_SYSCTL=y
CONFIG_BINFMT_AOUT=y
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_MISC=y
# CONFIG_BINFMT_JAVA is not set
# CONFIG_PARPORT is not set
# CONFIG_APM is not set
# CONFIG_TOSHIBA is not set

#
# Plug and Play support
#
# CONFIG_PNP is not set

#
# Block devices
#
CONFIG_BLK_DEV_FD=y
CONFIG_BLK_D

Re: [PATCH] blindingly stupid 2.2 VM bug

2000-11-18 Thread Ville Herva

On Sat, Nov 18, 2000 at 10:04:02PM -0200, you [Rik van Riel] said:
> Hi Alan,
> 
> here's a fix for a blindingly stupid bug that's been in
> 2.2 for ages (and which I've warned you about a few times
> in the last 6 months, and which I've even sent some patches
> for).
> 
> This patch should make 2.2 VM a bit more stable and should
> also fix the complaints from people whose system gets
> flooded by "VM: do_try_to_free_pages failed for process XXX"

Okay, I see those "VM: do_try_to_free_pages failed for process XXX" errors
as well (2.2.18pre19, uptime 8 days, machine had been idle for hours).
All of a sudden, I get 30 times "VM: do_try_to_free_pages failed for
kswapd...", then 15 "VM: do_try_to_free_pages failed for xmms...", then
"VM: killing process xmms", and that repeats for ~10 processes including
X. Never had problems with earlier 2.2.x.

My question is: I saw Andrea's VM-global patch being recommended as a
solution for this problem, and I already compiled it in (although I
haven't booted into it yet). Should I use Rik's or Andrea's patch?

Is either of them going into 2.2.18?


-- v --

[EMAIL PROTECTED]



2.2.18pre19 and HP DAT40i: mysterious medium error

2000-11-13 Thread Ville Herva
ogram of some sort), but there's no Linux version. They did even
  call me back and suggest different SCSI adapter settings (lower
  transfer rate etc). That didn't help. They said they were going to
  bring the issue up "with their engineers" and I haven't heard of them
  since.

(Kai: for your convenience, I've marked the points that I think I haven't
yet reported to you with 'x').

The place where the writing fails varies from 10MB to 20GB. On successive
trials with the same data, the point of failure *tends to be the same*,
even with different tapes! On fifth run or so, however, the writing
process may go much farther.

This could of course be caused by the RAID, IDE and/or e2compr patch, but
it seems a bit far fetched since the "dd if=/dev/sda of=/dev/st0" should
touch none of them (the two ide disks are md'd together, and the only
e2compr'd fs is on that device.)

I'm also a bit confused by the fact that the said dd command sometimes
yields a transfer rate of ~3MB/s (for multiple gigabytes of the same data) but
sometimes only ~1MB/s (with the same data, tape drive and scsi
adapter). What's also weird is that it seems that the failure rate has
gone up during these months of debugging. At first, it seemed to fail
maybe every second or third time, but now it happens practically every
time. On the other machines, it still works.

Originally I thought the problem was caused by too low input data rate
when tarring from the compressed fs. dd'ing the scsi disk should however
be fast enough.

There are absolutely no other problems with the machine or the kernel.
I've run memtests, kernel compilations (a la -j10), seti@homes etc., and
even before installing more fans (before the tape drive was installed),
when the machine ran at >60C ambient temperature(!), there were NO
stability problems. The temperature inside the box and on the CPUs
remains well below 35C nowadays.

I'm _very_ low on ideas here. The only thing I've not tried is installing
NT on the problematic machine, but I am somewhat hesitant to do that since the
machine is in production use. The machine is known, though, to have worked
well with an 8GB scsi drive and the 2940UW in its previous life as an NT
workstation.

If anybody can suggest anything I could try, please do so... I'm also
interested in success/failure reports on similar hardware. If any
information is missing from this post, please ask.

I also want to thank Kai Mäkisara for his forbearing efforts with this
problem, even though a solution is yet to be found.


--
Ville Herva[EMAIL PROTECTED]+358-50-5164500
Viasys Oy  Pohjantie 3  FIN-02100 Espoo  +358-9-4301460
PGP key available: http://www.iki.fi/v/pgp.html  fax +358-9-4301221



Re: 440FX and DMA on 2.2.18pre18

2000-11-05 Thread Ville Herva

On Sun, Nov 05, 2000 at 09:27:36AM -0800, you [Andre Hedrick] said:  
>   
> On Sun, 5 Nov 2000, Ville Herva wrote:
>   
> > You mean that if I apply the IDE-patch, I can get some DMA mode working?
>   
> YES!!! in a word ;-)  

Okay, I can confirm that:   

hdparm -c1 -d1 /dev/hda 

/dev/hda:   
 setting 32-bit I/O support flag to 1   
 setting using_dma to 1 (on)
 I/O support  =  1 (32-bit) 
 using_dma=  1 (on) 


hdparm  /dev/hda

/dev/hda:   
 multcount=  0 (off)
 I/O support  =  1 (32-bit) 
 unmaskirq=  0 (off)
 using_dma=  1 (on) 
 keepsettings =  1 (on) 
 nowerr   =  0 (off)
 readonly =  0 (off)
 readahead=  8 (on) 
 geometry = 3737/255/63, sectors = 60036480, start = 0  

 DMA modes: mdma0 mdma1 *mdma2 udma0 udma1 udma2 udma3 udma4 udma5  

hdparm -tT /dev/hda 

/dev/hda:   
 Timing buffer-cache reads:   128 MB in  2.55 seconds = 50.20 MB/sec
 Timing buffered disk reads:  64 MB in  5.73 seconds = 11.17 MB/sec 


Not a top-notch result, but I guess 440FX just isn't capable of better. In
any case, that's a heck of a lot better than the <3MB/s I was getting in
PIO mode without the IDE-patch.
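
The MB/sec figures hdparm prints are just megabytes divided by seconds, so they are easy to recompute when comparing runs before and after a patch. A small sketch, assuming the "Timing ...: N MB in T seconds" line format shown in the output above; `throughput` is an illustrative helper, not part of hdparm:

```python
import re

# Matches lines such as " Timing buffered disk reads:  64 MB in  5.73 seconds = ..."
TIMING = re.compile(r"Timing (.+?):\s+(\d+) MB in\s+([\d.]+) seconds")


def throughput(output):
    """Return {test name: MB/sec} recomputed from 'N MB in T seconds' lines."""
    rates = {}
    for name, megabytes, seconds in TIMING.findall(output):
        rates[name] = round(int(megabytes) / float(seconds), 2)
    return rates
```

For the run above this reproduces the printed figures: 128 MB in 2.55 s for buffer-cache reads and 64 MB in 5.73 s for buffered disk reads. The latter is more than 3.7 times the under-3MB/s rate seen in PIO mode.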


-- v -- 

[EMAIL PROTECTED]



440FX and DMA on 2.2.18pre18

2000-11-05 Thread Ville Herva

I have a dual Ppro200 with 440FX chipset and an IBM 30GB ide disk. The
kernel is 2.2.18pre18 with no additional patches. DMA appears not to work
with this combination.

lspci:

00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:07.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] (rev 01)
00:07.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:07.2 USB Controller: Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II] (rev 01)
00:0c.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 02)
00:0d.0 VGA compatible controller: Matrox Graphics, Inc. MGA 2164W [Millennium II]
00:0e.0 SCSI storage controller: Adaptec AHA-294x / AIC-7871

hdparm -i /dev/hda:

/dev/hda:

 Model=IBM-DTLA-305030, FwRev=TW3OA60A, SerialNo=YG0YG017000
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=40
 BuffType=DualPortCache, BuffSize=380kB, MaxMultSect=16, MultSect=off
 CurCHS=16383/16/63, CurSects=-66060037, LBA=yes, LBAsects=60036480
 IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio1 pio2 pio3 pio4 
 DMA modes: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5 

hdparm /dev/hda:

/dev/hda:
 multcount=  0 (off)
 I/O support  =  0 (default 16-bit)
 unmaskirq=  0 (off)
 using_dma=  0 (off)
 keepsettings =  0 (off)
 nowerr   =  0 (off)
 readonly =  0 (off)
 readahead=  8 (on)
 geometry = 3737/255/63, sectors = 60036480, start = 0
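As a sanity check on the numbers above, the translated geometry can be
derived from the LBA sector count. This is only a sketch: it assumes
512-byte sectors (not stated in the output) and that the cylinder count is
the LBA size divided by heads * sectors, rounded down.

```python
SECTOR_BYTES = 512          # assumption: standard 512-byte sectors

lba_sectors = 60036480      # LBAsects from hdparm -i
heads = 255                 # from "geometry = 3737/255/63"
sectors_per_track = 63

# Whole cylinders that fit into the LBA-addressable area.
cylinders = lba_sectors // (heads * sectors_per_track)

# Sectors reachable through the translated CHS geometry.
chs_sectors = cylinders * heads * sectors_per_track

# Capacity in decimal gigabytes, which is how disks are marketed.
capacity_gb = lba_sectors * SECTOR_BYTES / 1e9

print(cylinders, chs_sectors, round(capacity_gb, 2))
```

The result matches the reported 3737 cylinders, and the capacity agrees
with the "30GB" disk.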

If I do hdparm -d1 /dev/hda; hdparm -tT /dev/hda, I get:

hda: timeout waiting for DMA
hda: irq timeout: status=0x58 { DriveReady SeekComplete DataRequest }
VFS: Disk change detected on device ide0(3,64)
hda: timeout waiting for DMA
hda: irq timeout: status=0x58 { DriveReady SeekComplete DataRequest }
hda: timeout waiting for DMA
hda: irq timeout: status=0x58 { DriveReady SeekComplete DataRequest }
hda: timeout waiting for DMA
hda: irq timeout: status=0x58 { DriveReady SeekComplete DataRequest }
hda: DMA disabled
ide0: reset: success

On another box with a single PPro200 and 440FX chipset, DMA works just
fine. That box also has an IBM disk (30GB) but runs 2.0.37.

Should I try the IDE-patch or suspect the cable? Can anybody confirm that
DMA works with 440FX and a recent 2.2.x?


-- v --

[EMAIL PROTECTED]



Re: eepro100: card reports no resources [was VM-global...]

2000-10-29 Thread Ville Herva

On Mon, Oct 30, 2000 at 02:23:56PM +0800, you [Andrey Savochkin] claimed:
> Hello,
> 
> On Thu, Oct 26, 2000 at 07:35:08PM +0300, Ville Herva wrote:
> > Markus Pfeiffer <[EMAIL PROTECTED]> wrote:
> > > 
> > > > Oct 26 11:24:13 ns29 kernel: eth0: card reports no resources.
> > > > Oct 26 11:24:15 ns29 kernel: eth0: card reports no resources.
> > > > Oct 26 12:22:21 ns29 kernel: eth0: card reports no resources.
> > > > Oct 26 16:16:59 ns29 kernel: eth0: card reports no resources.
> > > > Oct 26 16:28:37 ns29 kernel: eth0: card reports no resources.
> > > > Oct 26 16:38:01 ns29 kernel: eth0: card reports no resources.
> > > > 
> > > let me guess: intel eepro100 or similar??
> > > Well known problem with that one. dont know if its fully fixed ... With
> > 
> > Happens here too, with 2xPPro200, 2.2.18pre17, Eepro100 and light load.
> > The network stalls for several minutes when it happens.
> > 
> > > 2.4.0-test9-pre3 it doesnt happen on my machine ...
> > 
> > What about a fix for a 2.2.x...?
> 
> The exact reason for this problem is still unknown.

I have to take some of what I said back. The network stalls were actually
due to a much more stupid problem: the server went into a power saving
state and the NIC IRQ didn't wake it up (mouse did). After I disabled the
relevant stupidities in the BIOS setup, it now works.

The error message, however, does appear with the 2.2.18pre17 eepro100.c,
whereas Becker's version does not show it for whatever reason.


-- v --

[EMAIL PROTECTED]



Re: VM-global-2.2.18pre17-7

2000-10-27 Thread Ville Herva

On Fri, Oct 27, 2000 at 11:29:08AM -0200, you [Marcelo Tosatti] claimed:
> 
> 
> On Fri, 27 Oct 2000, Neale Banks wrote:
> 
> > On Thu, 26 Oct 2000, octave klaba wrote:
> > 
> > > > > Oct 26 16:38:01 ns29 kernel: eth0: card reports no resources.
> > > > let me guess: intel eepro100 or similar??
> > > yeap
> > 
> > er, "me too":
> > 
> >   Bus  0, device   2, function  0:
> > Ethernet controller: Intel 82557 (rev 8).
> >   Medium devsel.  Fast back-to-back capable.  IRQ 10.  Master Capable.  Latency=64.  Min Gnt=8.  Max Lat=56.
> >   Non-prefetchable 32 bit memory at 0xb5fff000 [0xb5fff000].
> >   I/O at 0x2400 [0x2401].
> >   Non-prefetchable 32 bit memory at 0xb5e0 [0xb5e0].
> > 
> > On Debian's 2.2.17-compact on a Compaq DL380 - with 60 days uptime I have
> > 6 "eth0: card reports no resources." messages reported in dmesg.
> 
> We are having the same problem with eepro100 on a Compaq DL360. 
> 
> v1.11 of eepro100.c fixed the problem:
> 
> ftp://ftp.scyld.com/pub/network/eepro100.c

The eepro100 problem (2.2.18pre17 stock) happens here too: "card reports
no resources" and then the network stalls for a few minutes.
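To see whether the messages cluster around the stalls, something like the
following could tally them per hour. It is a rough sketch, assuming the
usual syslog layout ("Mon DD HH:MM:SS host kernel: ..."); the sample lines
are the ones quoted earlier in this thread, and the function name is made
up for illustration.

```python
from collections import Counter

MESSAGE = "card reports no resources"

def count_by_hour(log_lines):
    """Count matching kernel messages per 'Mon DD HH:00' bucket."""
    counts = Counter()
    for line in log_lines:
        if MESSAGE not in line:
            continue
        # Assumed syslog layout: month, day, HH:MM:SS, host, ...
        month, day, clock = line.split()[:3]
        counts[f"{month} {day} {clock[:2]}:00"] += 1
    return counts

sample = [
    "Oct 26 11:24:13 ns29 kernel: eth0: card reports no resources.",
    "Oct 26 11:24:15 ns29 kernel: eth0: card reports no resources.",
    "Oct 26 12:22:21 ns29 kernel: eth0: card reports no resources.",
    "Oct 26 16:16:59 ns29 kernel: eth0: card reports no resources.",
    "Oct 26 16:28:37 ns29 kernel: eth0: card reports no resources.",
    "Oct 26 16:38:01 ns29 kernel: eth0: card reports no resources.",
]

per_hour = count_by_hour(sample)
for bucket, n in sorted(per_hour.items()):
    print(bucket, n)
```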

The hack suggested by David Richardson (
http://marc.theaimsgroup.com/?l=linux-kernel&m=96514412914742&w=2)
did not help.

Becker's driver from ftp://ftp.scyld.com/pub/network/eepro100.c cures
the error messages, but the network still stalls, and worse yet, seems to
stall forever (as opposed to a few minutes with the 2.2.18pre17 driver).

A network problem is not out of the question (although the rest of the
network works just fine, and we did try another hub port). It could also be
a flaky card, but the machine and the card worked fine for years in their
past life under NT.

This is a dual PPro200, 256MB, nothing fancy.


-- v --

[EMAIL PROTECTED]



eepro100: card reports no resources [was VM-global...]

2000-10-26 Thread Ville Herva

Markus Pfeiffer <[EMAIL PROTECTED]> wrote:
> 
> > Oct 26 11:24:13 ns29 kernel: eth0: card reports no resources.
> > Oct 26 11:24:15 ns29 kernel: eth0: card reports no resources.
> > Oct 26 12:22:21 ns29 kernel: eth0: card reports no resources.
> > Oct 26 16:16:59 ns29 kernel: eth0: card reports no resources.
> > Oct 26 16:28:37 ns29 kernel: eth0: card reports no resources.
> > Oct 26 16:38:01 ns29 kernel: eth0: card reports no resources.
> > 
> let me guess: intel eepro100 or similar??
> Well known problem with that one. dont know if its fully fixed ... With

Happens here too, with 2xPPro200, 2.2.18pre17, Eepro100 and light load.
The network stalls for several minutes when it happens.

> 2.4.0-test9-pre3 it doesnt happen on my machine ...

What about a fix for a 2.2.x...?


-- v --

[EMAIL PROTECTED]



Re: [PATCH] 2.2: /proc/config.gz

2000-09-01 Thread Ville Herva

On Thu, Aug 31, 2000 at 02:52:56PM +0200, you [[EMAIL PROTECTED]] claimed:
>
> Does  also include the build number (i.e. the first part of
> UTS_VERSION) ? Is it resilient to patches where, by accident,
> EXTRAVERSION or such hasn't been incremented ? Will people always

Speaking of patches, it would be nice to have a standard way for patches
(I'm not speaking of pre-patches and such, but feature-adding
not-included-in-main-tree patches) to add their name and version info
somewhere in the source tree.

For example, if I have 2.2.16pre5 kernel and the following patches:

reiserfs, hedrick-ide, proconfig, lm-sensors, pc-speaker, e2compr,
softraid-0.9x

after applying the patches, I would have something like this in the
source tree:

cat /usr/src/linux/.patches
16pre5
reiserfs-1.3.20
hedrick-ide-31052000
proconfig-0.81
lm-sensors-2.5.0
pc-speaker-0.9
e2compr-0.4.31
softraid-0.9x-6a

so that I can tell what a given source tree contains after 2 months.
Proconfig or the /proc/config.gz patch might even include this information,
so I could get this info through /proc/version or /proc/extra-version or
something. Of course, .patches could contain more than that, for
example a URL and maintainer.
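To make the idea concrete, here is a minimal sketch of how a tool might
read such a file. Everything about the format is an assumption taken from
the example above: the first line is the base kernel version, and every
following line is "name-version", split at the first hyphen that is
followed by a digit. No such file or tool exists in the kernel tree.

```python
import re

# Hypothetical .patches content, copied from the example above.
EXAMPLE = """\
16pre5
reiserfs-1.3.20
hedrick-ide-31052000
proconfig-0.81
lm-sensors-2.5.0
pc-speaker-0.9
e2compr-0.4.31
softraid-0.9x-6a
"""

# Name is everything before the first "-<digit>"; the rest is the version.
ENTRY = re.compile(r"(.+?)-(\d.*)$")

def parse_patches(text):
    """Return (base_version, {patch_name: patch_version})."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    base, entries = lines[0], lines[1:]
    patches = {}
    for entry in entries:
        match = ENTRY.match(entry)
        if match is None:
            raise ValueError(f"cannot split name and version: {entry!r}")
        patches[match.group(1)] = match.group(2)
    return base, patches

base, patches = parse_patches(EXAMPLE)
print(base)
for name, version in patches.items():
    print(f"{name:12s} {version}")
```

Splitting at the first hyphen before a digit keeps names like hedrick-ide
and lm-sensors intact while still treating 0.9x-6a as one version string.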

Just a thought.


-- v --

[EMAIL PROTECTED]



Re: [PATCH] 2.2: /proc/config.gz

2000-08-30 Thread Ville Herva

On Wed, Aug 30, 2000 at 10:36:09AM -0500, you [Timur Tabi] claimed:
> ** Reply to message from Chip Salzenberg <[EMAIL PROTECTED]> on Tue, 29 Aug 2000
> 18:27:20 -0700
> 
> 
> > +CONFIG_PROC_CONFIG
> > +  Say Y here if you want a copy of your current kernel configuration
> > +  saved in the kernel that you build. This is extremely useful if you
> > +  ever build more than one kernel. The cost is around 1K-4K of running
> > +  memory. Only say no if you really can't spare this. You can sneeze
> > +  and lose more on memory than this.
> 
> Wow, this is incredibly useful!!!  Why, oh why, isn't this part of the standard
> kernel?!?!?  It would save so many headaches building kernels.

Note that this

 http://www.it.uc3m.es/~ptb/proconfig/

exists, too.


-- v --

[EMAIL PROTECTED]