Re: Nouveau DRM failure on 5120x1440 screen with 5.8/5.9 kernel

2020-10-13 Thread Byron Stanoszek

On Tue, 13 Oct 2020, Byron Stanoszek wrote:

> I'm having a problem with both the 5.8 and 5.9 kernels using the nouveau DRM
> driver. I have a laptop with a VGA card (specs below) connected to a
> 5120x1440 screen. At boot time, the card correctly detects the screen, tries
> to allocate fbdev fb0, then the video hangs completely for 15-30 seconds
> until it goes blank.

The following message eventually appears:

Workqueue: nvkm-disp nv50_disp_super
RIP: 0010:nv50_disp_super_2_2+0x1b0/0x470
Code: 69 00 00 48 69 c0 d3 4d 62 10 48 c1 e8 26 49 89 c5 0f b7 43 40 44 89 e9 8d 44 
02 f9 0f b7 53 46 29 d0 31 d2 48 98 49 0f af c4 <48> f7 f1 48 89 c6 0f b7 43 4e 
0f b7 53 4c 83 e8 19 29 d0 31 d2 48
RSP: 0018:c95e3e08 EFLAGS: 00010206
RAX:  RBX: 88841b08ed20 RCX: 
RDX:  RSI: c90003614200 RDI: 820c1140
RBP: 88841b202060 R08:  R09: 61ce
R10: 0018 R11: 0018 R12: 
R13:  R14: 88841b96b800 R15: 88841b975000
FS:  () GS:88841dc0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7f922e61e000 CR3: 0240a004 CR4: 001706b0
Call Trace:
 ? nvkm_dp_disable+0x5d/0x70
 ? nv50_disp_super+0x137/0x220
 ? process_one_work+0x19c/0x2c0
 ? worker_thread+0x48/0x350
 ? process_one_work+0x2c0/0x2c0
 ? kthread+0x129/0x150
 ? __kthread_create_worker+0x100/0x100
 ? ret_from_fork+0x22/0x30
---[ end trace dbb0d14fd1ddb445 ]---
nouveau :01:00.0: DRM: core notifier timeout

Thanks,
 -Byron



Nouveau DRM failure on 5120x1440 screen with 5.8/5.9 kernel

2020-10-13 Thread Byron Stanoszek

I'm having a problem with both the 5.8 and 5.9 kernels using the nouveau DRM
driver. I have a laptop with a VGA card (specs below) connected to a 5120x1440
screen. At boot time, the card correctly detects the screen, tries to allocate
fbdev fb0, then the video hangs completely for 15-30 seconds until it goes
blank.

This used to work in Linux 5.7 and earlier, although it allocated a 3840x1080
fb instead of a 5120x1440 one. I've attached the full dmesg. I tried kernel
command-line options such as video=DP-2:3840x1080, but they don't help.

Linux 5.8 boots without hanging if the laptop is not connected to the 5120x1440
screen.


PCI specs:

01:00.0 0300: 10de:0dfc (rev a1)
01:00.0 VGA compatible controller: NVIDIA Corporation GF108GLM [NVS 5200M] (rev a1)


xrandr available resolutions reported (from Linux 5.7 using Xorg):

Screen 0: minimum 320 x 200, current 5120 x 1440, maximum 16384 x 16384
LVDS-1 unknown connection (normal left inverted right x axis y axis)
   1600x900  59.99 +  40.00
   5120x1440 60.00
   1360x1020 73.97
   1152x864  59.97
   1024x768  59.95
   800x600   59.96
   640x480   59.94
DP-1 disconnected (normal left inverted right x axis y axis)
DP-2 connected primary 5120x1440+0+0 (normal left inverted right x axis y axis) 1200mm x 340mm panning 5120x1440+0+0
   3840x1080  59.97 +
   5120x1440  29.98*
   2560x1080  60.00  59.94  59.98
   1920x1080  60.00  60.00  50.00  59.94
   1920x1080i 60.00  50.00  59.94
   1600x1200  60.00
   1280x1024  75.02  60.02
   1280x800   59.81
   1152x864   75.00
   1280x720   60.00  50.00  59.94
   1024x768   75.03  60.00
   800x600    75.00  60.32
   720x576    50.00
   720x480    60.00  59.94
   640x480    75.00  60.00  59.94
   720x400    70.08
HDMI-1 disconnected (normal left inverted right x axis y axis)
VGA-1 disconnected (normal left inverted right x axis y axis)

I'm currently using 5120x1440@30. 60 Hz isn't available. But look below:


xrandr resolutions from Linux 5.9 (even though screen is still blank):

Screen 0: minimum 320 x 200, current 5120 x 1440, maximum 16384 x 16384
LVDS-1 unknown connection (normal left inverted right x axis y axis)
   1600x900  59.99 +  40.00
   5120x1440 60.00
   1360x1020 73.97
   1152x864  59.97
   1024x768  59.95
   800x600   59.96
   640x480   59.94
DP-1 disconnected (normal left inverted right x axis y axis)
DP-2 connected primary 5120x1440+0+0 (normal left inverted right x axis y axis) 1200mm x 340mm panning 5120x1440+0+0
   5120x1440  59.98 +  29.98*
   3840x1080  59.97 +
   2560x1080  60.00  59.94  59.98
   1920x1080  60.00  60.00  50.00  59.94
   1920x1080i 60.00  50.00  59.94
   1600x1200  60.00
   1280x1024  75.02  60.02
   1280x800   59.81
   1152x864   75.00
   1280x720   60.00  50.00  59.94
   1024x768   75.03  60.00
   800x600    75.00  60.32
   720x576    50.00
   720x480    60.00  59.94
   640x480    75.00  60.00  59.94
   720x400    70.08
HDMI-1 disconnected (normal left inverted right x axis y axis)
VGA-1 disconnected (normal left inverted right x axis y axis)
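
One way to see why the 60 Hz mode stands out in the listings above is a rough bandwidth estimate. The sketch below is only an illustration under stated assumptions: it assumes a four-lane DisplayPort link at the HBR rate of 2.7 Gbit/s per lane with 8b/10b coding and 24 bits per pixel, none of which is stated in the report, and it ignores blanking intervals, so real requirements are somewhat higher. The function names are hypothetical.

```c
#include <stdint.h>

/* Bits per second needed for the visible pixels only (blanking ignored). */
static uint64_t active_video_bps(uint32_t width, uint32_t height,
				 uint32_t refresh_hz, uint32_t bits_per_pixel)
{
	return (uint64_t)width * height * refresh_hz * bits_per_pixel;
}

/* Usable DisplayPort payload: 8b/10b coding leaves 80% of the raw rate. */
static uint64_t dp_payload_bps(uint32_t lanes, uint64_t lane_rate_bps)
{
	return lanes * lane_rate_bps * 8 / 10;
}

/* Returns 1 if the active pixel data of the mode fits in the link payload. */
static int mode_fits(uint32_t w, uint32_t h, uint32_t hz, uint32_t bpp,
		     uint32_t lanes, uint64_t lane_rate_bps)
{
	return active_video_bps(w, h, hz, bpp) <=
	       dp_payload_bps(lanes, lane_rate_bps);
}
```

Under these assumptions, 5120x1440 at 60 Hz needs about 10.6 Gbit/s for the visible pixels alone, more than the roughly 8.64 Gbit/s such a link can carry, while the 30 Hz mode needs about 5.3 Gbit/s and fits.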


Let me know if you need additional debug information/etc.

Thanks,
 -Byron

microcode: microcode updated early to revision 0x21, date = 2019-02-13
Linux version 5.9.0 (r...@iss.comtime.lan) (gcc (Gentoo 10.2.0-r2 p3) 10.2.0, GNU ld (Gentoo 2.35.1 p1) 2.35.1) #2 SMP PREEMPT Tue Oct 13 14:54:36 EDT 2020
Command line: auto BOOT_IMAGE=cti ro root=801 cti
KERNEL supported cpus:
  Intel GenuineIntel
x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
BIOS-provided physical RAM map:
BIOS-e820: [mem 0x-0x0009e7ff] usable
BIOS-e820: [mem 0x0009e800-0x0009] reserved
BIOS-e820: [mem 0x000e-0x000f] reserved
BIOS-e820: [mem 0x0010-0xc9f09fff] usable
BIOS-e820: [mem 0xc9f0a000-0xc9ff] reserved
BIOS-e820: [mem 0xca00-0xca752fff] usable
BIOS-e820: [mem 0xca753000-0xca7f] reserved
BIOS-e820: [mem 0xca80-0xcafb1fff] usable
BIOS-e820: [mem 0xcafb2000-0xcaff] ACPI data
BIOS-e820: [mem 0xcb00-0xcc6fbfff] usable
BIOS-e820: [mem 0xcc6fc000-0xcc7f] ACPI NVS
BIOS-e820: [mem 0xcc80-0xcdda0fff] usable
BIOS-e820: [mem 0xcdda1000-0xce7a4fff] reserved
BIOS-e820: [mem 0xce7a5000-0xce7e7fff] ACPI NVS
BIOS-e820: [mem 0xce7e8000-0xcf2b2fff] usable
BIOS-e820: [mem 0xcf2b3000-0xcf7e] reserved
BIOS-e820: [mem 

Re: USBIP is claiming all my USB devices - Commit 7a2f2974f265 is broken

2020-10-03 Thread Byron Stanoszek

On Sat, 3 Oct 2020, Greg Kroah-Hartman wrote:


> On Sat, Oct 03, 2020 at 01:18:36PM -0400, Byron Stanoszek wrote:
> > All,
> >
> > I was testing Linux 5.9-rc7 today when I realized that none of my USB devices
> > were responding anymore. For instance, my mouse does not respond and its usual
> > red LED is not on.
> >
> > Reverting git commit 7a2f2974f265 solved the problem for me.
>
> Can you try the patches listed here:
> https://lore.kernel.org/r/20201003142651.ga794...@kroah.com
>
> As this issue should be solved with them.  Hopefully :)


I confirm this also solved the problem for me.

Thanks,
 -Byron



[PATCH] tmpfs: Restore functionality of nr_inodes=0

2020-09-01 Thread Byron Stanoszek

Commit e809d5f0b5c9 ("tmpfs: per-superblock i_ino support") made changes to
shmem_reserve_inode() in mm/shmem.c, however the original test for
(sbinfo->max_inodes) got dropped. This causes mounting tmpfs with option
nr_inodes=0 to fail:

  # mount -ttmpfs -onr_inodes=0 none /ext0
  mount: /ext0: mount(2) system call failed: Cannot allocate memory.

This patch restores the nr_inodes=0 functionality.

Fixes: e809d5f0b5c9 ("tmpfs: per-superblock i_ino support")
Cc: Chris Down 
Signed-off-by: Byron Stanoszek 
---
 mm/shmem.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 271548ca20f3..8e2b35ba93ad 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -279,11 +279,13 @@ static int shmem_reserve_inode(struct super_block *sb, ino_t *inop)
 
 	if (!(sb->s_flags & SB_KERNMOUNT)) {
 		spin_lock(&sbinfo->stat_lock);
-		if (!sbinfo->free_inodes) {
-			spin_unlock(&sbinfo->stat_lock);
-			return -ENOSPC;
+		if (sbinfo->max_inodes) {
+			if (!sbinfo->free_inodes) {
+				spin_unlock(&sbinfo->stat_lock);
+				return -ENOSPC;
+			}
+			sbinfo->free_inodes--;
 		}
-		sbinfo->free_inodes--;
 		if (inop) {
 			ino = sbinfo->next_ino++;
 			if (unlikely(is_zero_ino(ino)))
-- 
2.28.0
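
The effect of the restored test can be illustrated with a small user-space model. This is only a sketch, not kernel code: struct sb_model and both function names are hypothetical stand-ins for the two shmem_sb_info fields the patch touches, and locking and inode-number allocation are left out.

```c
#include <errno.h>

/* Hypothetical stand-in for the two fields the patch touches. */
struct sb_model {
	unsigned long max_inodes;	/* 0 models nr_inodes=0 (no limit) */
	unsigned long free_inodes;
};

/* Models the code before the patch: the limit is always enforced. */
static int reserve_inode_unpatched(struct sb_model *sb)
{
	if (!sb->free_inodes)
		return -ENOSPC;
	sb->free_inodes--;
	return 0;
}

/* Models the patched code: accounting applies only when a limit is set. */
static int reserve_inode_patched(struct sb_model *sb)
{
	if (sb->max_inodes) {
		if (!sb->free_inodes)
			return -ENOSPC;
		sb->free_inodes--;
	}
	return 0;
}
```

With max_inodes and free_inodes both zero, as in the nr_inodes=0 case, the unpatched model refuses the very first reservation, while the patched model skips the accounting and succeeds.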



Re: [PATCH 4.18 00/79] 4.18.1-stable review

2018-08-15 Thread Byron Stanoszek

On Wed, 15 Aug 2018, Greg Kroah-Hartman wrote:


> On Wed, Aug 15, 2018 at 01:24:25PM -0400, Byron Stanoszek wrote:
> > Hi Greg & Thomas,
> >
> > I'd like to report a regression in Linux 4.18.1 regarding the L1TF patches.
> >
> > The kernel no longer thinks I have SMT enabled in the BIOS. This works fine in
> > 4.18.0.
> >
> > Not sure if this matters, but in my particular 4-core system, my third core is
> > broken (core #2). So I must boot using "maxcpus=2" and then online the other
> > cores & SMT threads at startup using:
> >
> > echo 1 > /sys/devices/system/cpu/cpu3/online
> > echo 1 > /sys/devices/system/cpu/cpu4/online
> > echo 1 > /sys/devices/system/cpu/cpu5/online
> > echo 1 > /sys/devices/system/cpu/cpu7/online
> >
> > In 4.18.0, dmesg shows:
> >
> > smpboot: Booting Node 0 Processor 3 APIC 0x6
> > smpboot: Booting Node 0 Processor 4 APIC 0x1
> > smpboot: Booting Node 0 Processor 5 APIC 0x3
> > smpboot: Booting Node 0 Processor 7 APIC 0x7
> >
> > In 4.18.1, dmesg shows:
> >
> > smpboot: Booting Node 0 Processor 3 APIC 0x6
> > smpboot: Booting Node 0 Processor 4 APIC 0x1
> > smpboot: CPU 4 is now offline
> > smpboot: Booting Node 0 Processor 5 APIC 0x3
> > smpboot: CPU 5 is now offline
> > smpboot: Booting Node 0 Processor 7 APIC 0x7
> > smpboot: CPU 7 is now offline
> >
> > and I get an "Operation cancelled" error in the shell when trying to online 4,
> > 5, and 7.
> >
> > In 4.18.1, /sys/devices/system/cpu/smt/control says "notsupported".
> >
> >  - - -
> >
> > A possible second regression is the following:
> >
> > My CPU normally runs at 3600 MHz. I usually run my CPU at 2800 MHz to keep from
> > overheating under full load (it is a fanless system). I do this by running
> > "echo 1 > /sys/class/thermal/cooling_device5/cur_state", and confirm with "cat
> > /proc/cpuinfo" (shows 2800).
> >
> > This works in 4.18.0 but not in 4.18.1. I get no error from the "echo" command
> > (and the state reads back as "1"), but the CPU remains running at 3600 MHz.
>
> How about Linus's tree at the moment, is it ok there?
>
> thanks,
>
> greg k-h



It also fails in Linus's tree. Seems like this logic is to blame:

/*
 * If SMT was disabled by BIOS, detect it here, after the CPUs have been
 * brought online. This ensures the smt/l1tf sysfs entries are consistent
 * with reality. cpu_smt_available is set to true during the bringup of non
 * boot CPUs when a SMT sibling is detected. Note, this may overwrite
 * cpu_smt_control's previous setting.
 */
void __init cpu_smt_check_topology(void)
{
	if (!cpu_smt_available)
		cpu_smt_control = CPU_SMT_NOT_SUPPORTED;
}

SMT is enabled in my BIOS, but because I booted with maxcpus=2, the init code
never officially booted any SMT thread yet--just primary threads. I suspect the
line 'cpu_smt_available = true;' in kernel/cpu.c function cpu_smt_allowed() is
never being reached.

It is then impossible to boot any SMT thread after init is done, since
'cpu_smt_available' is false.
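
The sequence described above can be modeled in user space. The sketch below is a deliberately simplified illustration, not the kernel's implementation: the struct, enum, and function names are hypothetical, and the real cpu_smt_allowed() has more cases than shown here.

```c
#include <stdbool.h>

enum smt_control { SMT_ENABLED, SMT_NOT_SUPPORTED };

/* Hypothetical user-space model of the state discussed above. */
struct smt_model {
	enum smt_control control;	/* models cpu_smt_control */
	bool available;			/* models cpu_smt_available */
};

/* Models bringing one CPU up during init. */
static void model_boot_cpu(struct smt_model *m, bool is_primary_thread)
{
	if (!is_primary_thread)
		m->available = true;	/* only a booted sibling sets the flag */
}

/* Models the late check in cpu_smt_check_topology(). */
static void model_check_topology(struct smt_model *m)
{
	if (!m->available)
		m->control = SMT_NOT_SUPPORTED;
}

/* Models whether a CPU may be brought online after init. */
static bool model_can_online(const struct smt_model *m, bool is_primary_thread)
{
	return is_primary_thread || m->control == SMT_ENABLED;
}
```

Booting only primary threads (as with maxcpus=2) and then running the late check leaves the model in SMT_NOT_SUPPORTED, so every later attempt to online a sibling thread is refused, which matches the behaviour reported above.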

The following test patch makes everything work for me on both mainline and
4.18.1, but we might as well throw out 'cpu_smt_available' altogether then (or
find another way to set it appropriately). Just because we didn't boot any SMT
threads at init shouldn't mean that SMT is disabled by the BIOS.

---

x86/l1tf: Fix SMT being disabled when booting with a low maxcpus=#

If maxcpus=# is given on the kernel command line where # is too low for
any SMT CPU (thread) to be booted during init, then the kernel thinks
SMT is disabled by the BIOS. SMT threads are then unable to be manually
brought online later after init.

Set 'cpu_smt_available' early on in init instead, if
topology_smt_supported() returns true.

Fixes: 958f338e96f8 ("Merge branch 'l1tf-final' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
Cc: Thomas Gleixner 
Signed-off-by: Byron Stanoszek 
---
 kernel/cpu.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index ed44d7d34c2d..bf9be8b8c0a0 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -378,6 +378,8 @@ void __init cpu_smt_check_topology_early(void)
 {
 	if (!topology_smt_supported())
 		cpu_smt_control = CPU_SMT_NOT_SUPPORTED;
+	else
+		cpu_smt_available = true;
 }
 
 /*
@@ -405,14 +407,6 @@ static inline bool cpu_smt_allowed(unsigned int cpu)
 	if (topology_is_primary_thread(cpu))
 		return true;
 
-	/*
-	 * If the CPU is not a 'primary' thread and the booted_once bit is
-	 * set then the processor has SMT support. Store this information
-	 * for the late check of SMT support in cpu_smt_check_topology().
-	 */
-	if (per_cpu(cpuhp_state, cpu).booted_once)
-		cpu_smt_available = true;
-
 	if (cpu_smt_control == CPU_SMT_ENABLED)
 		return true;
 
--
2.18.0


Re: [PATCH 4.18 00/79] 4.18.1-stable review

2018-08-15 Thread Byron Stanoszek

Hi Greg & Thomas,

I'd like to report a regression in Linux 4.18.1 regarding the L1TF patches.

The kernel no longer thinks I have SMT enabled in the BIOS. This works fine in
4.18.0.

Not sure if this matters, but in my particular 4-core system, my third core is
broken (core #2). So I must boot using "maxcpus=2" and then online the other
cores & SMT threads at startup using:

echo 1 > /sys/devices/system/cpu/cpu3/online
echo 1 > /sys/devices/system/cpu/cpu4/online
echo 1 > /sys/devices/system/cpu/cpu5/online
echo 1 > /sys/devices/system/cpu/cpu7/online

In 4.18.0, dmesg shows:

smpboot: Booting Node 0 Processor 3 APIC 0x6
smpboot: Booting Node 0 Processor 4 APIC 0x1
smpboot: Booting Node 0 Processor 5 APIC 0x3
smpboot: Booting Node 0 Processor 7 APIC 0x7

In 4.18.1, dmesg shows:

smpboot: Booting Node 0 Processor 3 APIC 0x6
smpboot: Booting Node 0 Processor 4 APIC 0x1
smpboot: CPU 4 is now offline
smpboot: Booting Node 0 Processor 5 APIC 0x3
smpboot: CPU 5 is now offline
smpboot: Booting Node 0 Processor 7 APIC 0x7
smpboot: CPU 7 is now offline

and I get an "Operation cancelled" error in the shell when trying to online 4,
5, and 7.

In 4.18.1, /sys/devices/system/cpu/smt/control says "notsupported".
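
For scripting the same check, a minimal sketch of reading a one-line sysfs attribute follows. The helper names are hypothetical; only the path /sys/devices/system/cpu/smt/control and the value "notsupported" come from the report above.

```c
#include <stdio.h>
#include <string.h>

/*
 * Read a one-line sysfs attribute into buf and strip the trailing newline.
 * Returns 0 on success, -1 if the file cannot be opened or read.
 */
static int read_sysfs_attr(const char *path, char *buf, size_t len)
{
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

/* Returns 1 if the attribute reads "notsupported", 0 otherwise. */
static int smt_reported_unsupported(const char *control_path)
{
	char state[32];

	if (read_sysfs_attr(control_path, state, sizeof(state)) != 0)
		return 0;
	return strcmp(state, "notsupported") == 0;
}
```

Calling smt_reported_unsupported("/sys/devices/system/cpu/smt/control") on the affected 4.18.1 system would be expected to return 1, given the value quoted above.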

 - - -

A possible second regression is the following:

My CPU normally runs at 3600 MHz. I usually run my CPU at 2800 MHz to keep from
overheating under full load (it is a fanless system). I do this by running
"echo 1 > /sys/class/thermal/cooling_device5/cur_state", and confirm with "cat
/proc/cpuinfo" (shows 2800).

This works in 4.18.0 but not in 4.18.1. I get no error from the "echo" command
(and the state reads back as "1"), but the CPU remains running at 3600 MHz.

Thanks,
 -Byron



Re: Standalone DRM application

2013-04-19 Thread Byron Stanoszek

On Thu, 18 Apr 2013, David Herrmann wrote:


> You can acquire/drop DRM-Master via drmSetMaster/drmDropMaster.
>
> If your DRM card is a PCI device, you can use the sysfs "boot_vga"
> attribute of the parent PCI device.
> (/sys/class/drm/card0/device/boot_vga)


David,

Thanks! That was exactly what I was looking for. Both ideas work wonderfully.

Regards,
 -Byron

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
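
The boot_vga suggestion above can be sketched in a few lines of C. This is a minimal illustration, not code from the thread: the function names are hypothetical, and it assumes the attribute's first character is '1' for the boot device.

```c
#include <stdio.h>

/* Build the sysfs path of the boot_vga attribute for /dev/dri/card<N>. */
static int boot_vga_path(char *buf, size_t len, unsigned int card)
{
	int n = snprintf(buf, len, "/sys/class/drm/card%u/device/boot_vga", card);

	return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/*
 * Returns 1 if the attribute starts with '1', 0 if it does not, and -1 if
 * it cannot be opened (for example, the card is not a PCI device).
 */
static int card_is_boot_vga(unsigned int card)
{
	char path[64];
	FILE *f;
	int c;

	if (boot_vga_path(path, sizeof(path), card) != 0)
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	c = fgetc(f);
	fclose(f);
	return c == '1';
}
```

An application could call card_is_boot_vga() for card0, card1, and so on, and treat the first card that returns 1 as "screen 1".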


Standalone DRM application

2013-04-17 Thread Byron Stanoszek

David,

I'm developing a small application that uses libdrm (DRM ioctls) to change the
resolution of a single graphics display and show a framebuffer. I've run into
two problems with this implementation that I'm hoping you can address.


1. Each application is its own process, which is designed to control 1 graphics
display. This is unlike X, for instance, which could be configured to grab all
of the displays in the system at once.

Depending on our stackup, there can be as many as 4 displays connected to a
single graphics card. One process could open /dev/dri/card0 and call
drmModeSetCrtc() to initialize one of its displays to the requested resolution.
However, whenever a second process calls drmModeSetCrtc() to control a second
display on the same card, it gets -EPERM back from the ioctl.

I've traced this down to the following line in linux/drivers/gpu/drm/drm_drv.c:

DRM_IOCTL_DEF(DRM_IOCTL_MODE_SETCRTC, drm_mode_setcrtc, DRM_MASTER|DRM_CONTROL_ALLOW|DRM_UNLOCKED),

If I remove the DRM_MASTER flag, then my application behaves correctly, and 4
separate processes can then control each individual display on the card without
issue.

My question is, is there any real benefit to restricting drm_mode_setcrtc()
with DRM_MASTER, or can we lose this flag in order to support one-process-per-
display programs like the above?


2. My application has the design requirement that "screen 1" always refers to
the card that was initialized by the PC BIOS for bootup. This is the same card
that the Linux Console framebuffer will come up on by default, and therefore
extra processing is required to handle VT switches (e.g. pause the display,
restore original CRTC mode, etc.)

Depending on the "Boot Display First [Onboard] or [PCI Slot]" option in the
BIOS, this might mean either /dev/dri/card0 or /dev/dri/card1 becomes the
default VGA card, as set by the vga_set_default_device() call in
arch/x86/pci/fixup.c.

Is there a way in userspace to identify which card# is the default card? Or
alternatively, is there some way to get the underlying PCI bus/slot ID from a
/dev/dri/card# device?
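
One possible approach to the second question, sketched under assumptions: on a PCI card, /sys/class/drm/card0/device is a symlink to the parent PCI device directory, so the last component of the link target (as returned by readlink(2)) is the PCI address. The struct and function names below are hypothetical, and the sample target string in the comment is illustrative only.

```c
#include <stdio.h>
#include <string.h>

struct pci_addr {
	unsigned int domain, bus, dev, func;
};

/*
 * Parse the PCI address from a sysfs link target such as
 * "../../../0000:01:00.0". Returns 0 on success, -1 on malformed input.
 */
static int pci_addr_from_link(const char *target, struct pci_addr *out)
{
	const char *name = strrchr(target, '/');

	/* Keep only the last path component. */
	name = name ? name + 1 : target;
	if (sscanf(name, "%x:%x:%x.%x",
		   &out->domain, &out->bus, &out->dev, &out->func) != 4)
		return -1;
	return 0;
}
```

The parsed bus, device, and function numbers can then be compared with the slot reported by lspci.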

Thanks,
 -Byron



Re: [RFC] rootmpfs

2013-04-05 Thread Byron Stanoszek

Rob,

FWIW I have a patch to do something like this. It even gives you an rdsize=xxx
tunable kernel parameter that lets you specify the size of the tmpfs, which
acts like the -osize= mount flag (so values like 100M or 20% work). So doing
things like 'cat /dev/zero > filename' will not run you out of all available
memory. (Note: If you don't specify rdsize= on the kernel command line, it will
not convert rootfs to tmpfs.)

See attached.

 -Byron

On Wed, 3 Apr 2013, Rob Landley wrote:

> Attached is my quick and dirty hack to make rootfs be tmpfs when CONFIG_TMPFS
> is enabled. It can't be this easy or somebody would have done it in the
> _eight_years_ since https://lkml.org/lkml/2006/7/31/145
>
> Yes, it's got an #ifdef and out of place prototypes. Yes, it manually calls a
> module init function and compensates by making it reentrant. But it works, and
> when I "cat /dev/zero > filename" the filesystem fills _up_ instead of
> panicking the kernel.
>
> So now that I've posted the error, would someone please tell me how I
> _should_ have done it?
>
> Rob
>
> P.S. If I actually change the filesystem type to a name other than "rootfs",
> it panics on the way up because various bits of the kernel are looking for
> that magic name. Sigh.
>
> P.P.S. removing MS_NOUSER is actually intentional, there's a local Cray patch
> that does the same thing because otherwise you can't --bind mount directories
> out of this filesystem, which is a thing they wanted to do.

--- linux/init/initramfs.c.orig	2012-12-05 21:39:09.0 -0500
+++ linux/init/initramfs.c	2012-12-10 12:15:49.0 -0500
@@ -569,14 +569,56 @@
 }
 #endif
 
+static char * __initdata ramdisk_size;
+
+static int __init rdsize_setup(char *str)
+{
+	ramdisk_size = str;
+	return 1;
+}
+__setup("rdsize=", rdsize_setup);
+
+static int __init change_root_to_tmpfs(void)
+{
+	char size[38], *s;
+
+	sprintf(size, "size=%.20s,nr_inodes=0", ramdisk_size);
+	if ((s = strchr(size, ',')))
+		*s = '\0';
+
+	if (!sys_mkdir("/root", 0700) &&
+	    !sys_mount("/dev/root", "/root", "tmpfs", 0, size) &&
+	    !sys_chdir("/root") &&
+	    !sys_mount(".", "/", NULL, MS_MOVE, NULL) &&
+	    !sys_chroot("."))
+		return 0;
+
+	panic("Failed to mount tmpfs as root filesystem");
+}
+
+
 static int __init populate_rootfs(void)
 {
-	char *err = unpack_to_rootfs(__initramfs_start, __initramfs_size);
+	char *err;
+
+	if (ramdisk_size)
+		change_root_to_tmpfs();
+
+	err = unpack_to_rootfs(__initramfs_start, __initramfs_size);
 	if (err)
 		panic(err);	/* Failed to decompress INTERNAL initramfs */
 	if (initrd_start) {
 #ifdef CONFIG_BLK_DEV_RAM
 		int fd;
+		if (ramdisk_size) {
+			printk(KERN_INFO "Unpacking initramfs...\n");
+			err = unpack_to_rootfs((char *)initrd_start,
+				initrd_end - initrd_start);
+			if (err)
+				panic(err);
+			free_initrd();
+			return 0;
+		}
 		printk(KERN_INFO "Trying to unpack rootfs image as initramfs...\n");
 		err = unpack_to_rootfs((char *)initrd_start,
 			initrd_end - initrd_start);
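
The option-string handling in change_root_to_tmpfs() can be tried out in user space. The sketch below mirrors the sprintf()/strchr() sequence from the patch; the function name build_tmpfs_opts is hypothetical, and snprintf() is used here in place of sprintf() as a bounds-checked substitute.

```c
#include <stdio.h>
#include <string.h>

/*
 * User-space mirror of the option-string handling in change_root_to_tmpfs().
 * out must hold 38 bytes: "size=" + 20 + ",nr_inodes=0" + NUL.
 */
static void build_tmpfs_opts(char out[38], const char *rdsize)
{
	char *s;

	snprintf(out, 38, "size=%.20s,nr_inodes=0", rdsize);
	s = strchr(out, ',');
	if (s)
		*s = '\0';	/* cuts at the FIRST comma, as in the patch */
}
```

For rdsize=100M this yields "size=100M", and extra comma-separated options appended to rdsize= are discarded. Because the cut happens at the first comma, the trailing nr_inodes=0 also appears to be dropped as posted; the thread does not say whether that was intended.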


Re: [RFC] rootmpfs

2013-04-05 Thread Byron Stanoszek

Rob,

FWIW I have a patch to do something like this. It even gives you a rdsize=xxx
tunable kernel parameter that lets you specify the size of the tmpfs, which
acts like the -osize= mount flag (so phrases like 100M or 20% works). So doing
things like 'cat /dev/zero  filename' will not run you out of all available
memory. (Note: If you don't specify rdsize= on the kernel command line, it will
not convert rootfs to tmpfs).

See attached.

 -Byron

On Wed, 3 Apr 2013, Rob Landley wrote:

Attached is my quick and dirty hack to make rootfs be tmpfs when CONFIG_TMPFS 
is
enabled. It can't be this easy or somebody would have done it in the 
_eight_years_

since https://lkml.org/lkml/2006/7/31/145

Yes, it's got an #ifdef and out of place prototypes. Yes, it manually calls a 
module
init function and compensates by making it reentrant. But it works, and when 
I
cat /dev/zero  filename the filesystem fills _up_ instead of panicing the 
kernel.


So now that I've posted the error, would someone please tell me how I 
_should_ have done it?


Rob

P.S. If I actually change the filesystem type to a name other than rootfs, 
it panics on the way up because various bits of the kernel are looking for 
that magic name. Sigh.


P.P.S. removing MS_NOUSER is actually intentional, there's a local cray patch 
that does the same thing because otherwise you can't --bind mount directories 
out of this filesystem, which is a thing they wanted to do.--

To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
--- linux/init/initramfs.c.orig	2012-12-05 21:39:09.0 -0500
+++ linux/init/initramfs.c	2012-12-10 12:15:49.0 -0500
@@ -569,14 +569,56 @@
 }
 #endif
 
+static char * __initdata ramdisk_size;
+
+static int __init rdsize_setup(char *str)
+{
+	ramdisk_size = str;
+	return 1;
+}
+__setup("rdsize=", rdsize_setup);
+
+static int __init change_root_to_tmpfs(void)
+{
+	char size[38], *s;
+
+	sprintf(size, "size=%.20s,nr_inodes=0", ramdisk_size);
+	if ((s = strchr(size, ',')))
+		*s = '\0';
+
+	if (!sys_mkdir("/root", 0700) &&
+	    !sys_mount("/dev/root", "/root", "tmpfs", 0, size) &&
+	    !sys_chdir("/root") &&
+	    !sys_mount(".", "/", NULL, MS_MOVE, NULL) &&
+	    !sys_chroot("."))
+		return 0;
+
+	panic("Failed to mount tmpfs as root filesystem");
+}
+
+
 static int __init populate_rootfs(void)
 {
-	char *err = unpack_to_rootfs(__initramfs_start, __initramfs_size);
+	char *err;
+
+	if (ramdisk_size)
+		change_root_to_tmpfs();
+
+	err = unpack_to_rootfs(__initramfs_start, __initramfs_size);
 	if (err)
 		panic(err);	/* Failed to decompress INTERNAL initramfs */
 	if (initrd_start) {
 #ifdef CONFIG_BLK_DEV_RAM
 		int fd;
+		if (ramdisk_size) {
+			printk(KERN_INFO "Unpacking initramfs...\n");
+			err = unpack_to_rootfs((char *)initrd_start,
+				initrd_end - initrd_start);
+			if (err)
+				panic(err);
+			free_initrd();
+			return 0;
+		}
 		printk(KERN_INFO "Trying to unpack rootfs image as initramfs...\n");
 		err = unpack_to_rootfs((char *)initrd_start,
 			initrd_end - initrd_start);


Re: [PATCH 00/33] Swap over NFS -v14

2007-10-31 Thread Byron Stanoszek

On Wed, 31 Oct 2007, Nick Piggin wrote:


On Wednesday 31 October 2007 15:37, David Miller wrote:

From: Nick Piggin <[EMAIL PROTECTED]>
Date: Wed, 31 Oct 2007 14:26:32 +1100


Is it really worth all the added complexity of making swap
over NFS files work, given that you could use a network block
device instead?


Don't be misled.  Swapping over NFS is just a scarecrow for the
seemingly real impetus behind these changes which is network storage
stuff like iSCSI.


Oh, I'm OK with the network reserves stuff (not the actual patch,
which I'm not really qualified to review, but at least the idea
of it...).

And also I'm not as such against the idea of swap over network.

However, specifically the change to make swapfiles work through
the filesystem layer (ATM it goes straight to the block layer,
modulo some initialisation stuff which uses block filesystem-
specific calls).

I mean, I assume that anybody trying to swap over network *today*
has to be using a network block device anyway, so the idea of
just being able to transparently improve that case seems better
than adding new complexities for seemingly not much gain.


I have some embedded diskless devices that have 16 MB of RAM and >500MB of
swap. Their root fs and swap device are both done over NBD because NFS is too
expensive in 16MB of RAM. Any memory contention (i.e., needing memory to swap
memory over the network), however infrequent, causes the system to freeze when
about 50 MB of VM is used up. I would love to see some work done in this area.

 -Byron

--
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]


Re: [PATCH] Introduce O_CLOEXEC (take >2)

2007-06-01 Thread Byron Stanoszek

On Thu, 31 May 2007, Kyle McMartin wrote:


On Fri, Jun 01, 2007 at 11:38:40AM +1000, Stephen Rothwell wrote:

This also breaks Alpha (which uses 0200 for O_DIRECT) and parisc
(which uses 0200 for O_RSYNC).  So you either need to choose a
different value or define O_CLOEXEC for those two architectures.



That's easy enough to fix...

Signed-off-by: Kyle McMartin <[EMAIL PROTECTED]>

diff --git a/include/asm-parisc/fcntl.h b/include/asm-parisc/fcntl.h
index 317851f..4ca0fb0 100644
--- a/include/asm-parisc/fcntl.h
+++ b/include/asm-parisc/fcntl.h
@@ -14,6 +14,7 @@
#define O_DSYNC 0100 /* HPUX only */
#define O_RSYNC 0200 /* HPUX only */
#define O_NOATIME   0400
+#define O_CLOEXEC  0800 /* set close_on_exec */

#define O_DIRECTORY 0001 /* must be a directory */
#define O_NOFOLLOW  0200 /* don't follow links */


These are octal values, so you really want to use 01000 instead of
0800. :-)

While looking at that file further, I noticed these two flags share the same
value. I don't know DMAPI/XDSM, but could they potentially conflict?

#define O_NOATIME   0400
#define O_INVISIBLE 0400 /* invisible I/O, for DMAPI/XDSM */
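To make the octal point concrete, here is a small sketch. The FLAG_* names and
values are made up for illustration and are not the real parisc definitions.
In C, a constant with a leading 0 is octal, so the digit 8 is not allowed and
the bit after 0400 is written 01000. Two flags can only be told apart when
they share no bits.

```c
/* Illustrative values only; NOT the real asm-parisc/fcntl.h definitions. */
#define FLAG_A 0100u
#define FLAG_B 0200u
#define FLAG_C 0400u
#define FLAG_D 01000u	/* next bit after 0400; "0800" would not compile */

/* Two flags conflict when they have at least one bit in common. */
int flags_conflict(unsigned int a, unsigned int b)
{
	return (a & b) != 0;
}

/* The next single-bit flag above a given single-bit flag. */
unsigned int next_flag(unsigned int flag)
{
	return flag << 1;
}
```

Under these assumptions, next_flag(FLAG_C) equals FLAG_D, and two flags that
are both defined as 0400 are reported as conflicting.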

Regards,
 -Byron

--
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]


Q: Shared Memory vs. Ramdisk?

2005-02-23 Thread Byron Stanoszek

Hi all,

I have an application on x86-64 that requires sharing two memory segments of
10+ GB each among several processes. Would it be better performance-wise to
mmap two files from a tmpfs filesystem, or to create two large ramdisks
(/dev/ram0 & /dev/ram1) and mmap those?

I'm not concerned about swap; I'm just trying to avoid as much kernel overhead
as possible while accessing gobs of memory.

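For reference, the tmpfs variant of the question would look roughly like the
sketch below. The helper name map_shared_file is hypothetical, and the path is
whatever file the cooperating processes agree on (a file under a tmpfs mount
such as /dev/shm is assumed, but any writable file works the same way). It
says nothing about which option is faster.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map len bytes of the file at path as shared memory, creating and sizing
 * the file if necessary. Returns NULL on failure. */
void *map_shared_file(const char *path, size_t len)
{
	void *p;
	int fd = open(path, O_RDWR | O_CREAT, 0600);

	if (fd < 0)
		return NULL;
	if (ftruncate(fd, (off_t)len) < 0) {
		close(fd);
		return NULL;
	}
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);	/* the mapping stays valid after the descriptor is closed */
	return p == MAP_FAILED ? NULL : p;
}
```

Every process that calls this with the same path and MAP_SHARED sees the same
pages, so a store made through one mapping is visible through the others.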
Thanks,
 -Byron
--
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]


[OOPS] insmod, 2.4.6-ac1

2001-07-05 Thread Byron Stanoszek
).
Serial driver version 5.05b (2001-05-03) with MANY_PORTS SHARE_IRQ SERIAL_PCI enabled
ttyS00 at 0x03f8 (irq = 4) is a 16550A
block: queued sectors max/low 83888kB/27962kB, 256 slots per queue
Uniform Multi-Platform E-IDE driver Revision: 6.31
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
PIIX4: IDE controller on PCI bus 00 dev 39
PIIX4: chipset revision 1
PIIX4: not 100% native mode: will probe irqs later
ide1: BM-DMA at 0x14a8-0x14af, BIOS settings: hdc:DMA, hdd:pio
hdc: Lite-On LTN483S 48x Max, ATAPI CD/DVD-ROM drive
hdd: PLEXTOR CD-R PX-W1210A, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
hdc: ATAPI 48X CD-ROM drive, 120kB Cache
Uniform CD-ROM driver Revision: 3.12
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
loop: loaded (max 8 devices)
PCI: Found IRQ 11 for device 00:0d.0
PCI: The same IRQ used for device 01:00.0
3c59x.c:LK1.1.13 27 Jan 2001  Donald Becker and others. http://www.scyld.com/network/vortex.html
See Documentation/networking/vortex.txt
eth0: 3Com PCI 3c905C Tornado at 0x1400,  00:01:02:35:2e:06, IRQ 11
  product code 454e rev 00.14 date 02-10-00
  8K byte-wide RAM 5:3 Rx:Tx split, autoselect/Autonegotiate interface.
  MII transceiver found at address 24, status 782d.
  Enabling bus-master transmits and whole-frame receives.
eth0: scatter/gather enabled. h/w checksums enabled
SCSI subsystem driver Revision: 1.00
PCI: Found IRQ 10 for device 00:0f.0
scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.1.13
        <Adaptec 2940 Ultra2 SCSI adapter (OEM)>
aic7890/91: Ultra2 Wide Channel A, SCSI Id=7, 32/255 SCBs

  Vendor: QUANTUM   Model: ATLAS 10K 18WLS   Rev: UCH0
  Type:   Direct-Access  ANSI SCSI revision: 03
(scsi0:A:2): 80.000MB/s transfers (40.000MHz, offset 31, 16bit)
scsi0:0:2:0: Tagged Queuing enabled.  Depth 8
scsi1 : SCSI host adapter emulation for IDE ATAPI devices
  Vendor: PLEXTOR   Model: CD-R   PX-W1210A  Rev: 1.02
  Type:   CD-ROM ANSI SCSI revision: 02
Attached scsi disk sda at scsi0, channel 0, id 2, lun 0
SCSI device sda: 35566499 512-byte hdwr sectors (18210 MB)
Partition check:
 sda: sda1 sda2 sda3
Attached scsi CD-ROM sr0 at scsi1, channel 0, id 0, lun 0
sr0: scsi3-mmc drive: 32x/32x writer cd/rw xa/form2 cdda tray
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP
IP: routing cache hash table of 1024 buckets, 8Kbytes
TCP: Hash tables configured (established 8192 bind 8192)
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
fatfs: bogus logical sector size 0
You didn't specify the type of your ufs filesystem

mount -t ufs -o ufstype=sun|sunx86|44bsd|old|hp|nextstep|netxstep-cd|openstep ...

>>>WARNING<<< Wrong ufstype may corrupt your filesystem, default is ufstype=old
ufs_read_super: bad magic number
reiserfs: checking transaction log (device 08:03) ...
Using r5 hash to sort names
ReiserFS version 3.6.25
VFS: Mounted root (reiserfs filesystem) readonly.
Freeing unused kernel memory: 196k freed
Adding Swap: 136544k swap-space (priority -1)

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]




Re: Linux 2.4.3-ac9

2001-04-17 Thread Byron Stanoszek

On Wed, 18 Apr 2001, Jason Thomas wrote:

> Alan,
> 
> This does not seem to fix the problem with "clock timer", which
> repeatedly prints the following message:
> 
> probable hardware bug: clock timer configuration lost - probably a VIA686a motherboard.
> probable hardware bug: restoring chip configuration.
> 
> The machine does not get any further than printing the above message.
> This message only appears with an SMP kernel, there are no ide devices
> in the machine.

I've seen this on my Dell P3 700 machine several times. It seems to happen at
odd intervals after I use my CD burner, but that might just be coincidental.
I'd like to point out, though, that I've never seen this on my VIA686a machine
itself. The P3 machine is UP too, not SMP. I've seen this ever since I switched
the machine to 2.4.2-ac8 and beyond (previously 2.2.18).

 -Byron

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]




Need help with allocating a 2M buffer size

2001-03-15 Thread Byron Stanoszek

I have a real picky tape drive (DLT series) that likes to be fed large chunks
of data at once, otherwise after every 2-4KB of data it halts and rewinds
itself because its cache for writing to the tape is empty.

My best solution to this problem was to use 'tar -b 4096', which sends 4096 x
512-byte blocks at once for a total of a 2MB buffer size. This worked fine for
several weeks, until 2 days ago I got this message (and the backup fails):

st: failed to enlarge buffer to 2097152 bytes.

Free memory shows:

             total       used       free     shared    buffers     cached
Mem:        517036     514468       2568     751908      47804     189488
-/+ buffers/cache:     277176     239860
Swap:       136544        452     136092

Unfortunately, all of the "free" memory right now is eaten up using cache. Is
there a way I can just tell the kernel to allocate memory from the cache for
the buffer? I'm sure there's gotta be a 2MB-sized chunk in that 189MB cache
-somewhere-.

Why doesn't the kernel's get_free_pages() function support moving data around
in memory to get larger chunks for what it needs? I see this same problem
happening in SVGATextMode, where allocating space for an NxM character screen
(where NxM >= 16384) fails because there is no contiguous memory space. I think
at least it should be able to use some cache.
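To put numbers on it: 4096 blocks x 512 bytes = 2097152 bytes, which matches
the size in the st message. Assuming 4 KB pages (typical for x86) and assuming
the buffer is requested as one physically contiguous block, that is 512 pages
in a row, i.e. an order-9 request. The helper below is only a sketch of that
arithmetic; order_for_size is a made-up name, not a kernel function.

```c
#include <stddef.h>

/* Smallest order such that (page_size << order) >= bytes. */
unsigned int order_for_size(size_t bytes, size_t page_size)
{
	unsigned int order = 0;
	size_t span = page_size;

	while (span < bytes) {
		span <<= 1;
		order++;
	}
	return order;
}
```

The higher the order, the more consecutive free pages are needed, which is why
such a request can fail even when plenty of memory is sitting in the cache.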

Suggestions?
 -Byron

--
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]




Memory-related hangup (2.4.2-ac3)

2001-03-02 Thread Byron Stanoszek

Is there a reason why the kernel appears to hang temporarily for 3-5 minutes
under this circumstance:

gandalf:~> free
             total       used       free     shared    buffers     cached
Mem:        126700     125024       1676          0        964      61640
-/+ buffers/cache:      62420      64280
Swap:        97648      97648          0


The system seems to favor the cache, but leaves no room for processes to use
the remaining 64MB of ram. This happened while running netscape after viewing
a couple of pages with a lot of images on them.
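For anyone reading the free output above, the "-/+ buffers/cache" row is
derived from the "Mem:" row. The sketch below shows that arithmetic; the
struct and function names are made up for illustration.

```c
/* One "Mem:" row as printed by free, in kilobytes. */
struct mem_row {
	unsigned long total, used, free, shared, buffers, cached;
};

/* "-/+ buffers/cache" used column: memory held by processes. */
unsigned long used_by_processes(const struct mem_row *m)
{
	return m->used - m->buffers - m->cached;
}

/* "-/+ buffers/cache" free column: free memory if buffers and cache were
 * given back. */
unsigned long free_without_cache(const struct mem_row *m)
{
	return m->free + m->buffers + m->cached;
}
```

With the figures above, 125024 - 964 - 61640 = 62420 KB is used by processes
and 1676 + 964 + 61640 = 64280 KB would be available if the cache were
released, which is the "remaining 64MB" in question.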

Older kernels would happily allow processes to eat up cache space when memory
was low. In fact, I used to be able to use 32MB of swap without any problems
(even when netscape had more memory allocated to it than this now).

Lately with 2.4 kernels I had to add another 64MB swap file to the existing 32,
and the performance seems no different than without it, when compared to the
old way of letting netscape just use all 125MB if it wants to (and sacrifice
cached files, which aren't important in this case).

Is there a setting I can control to force the kernel to give up cache when
memory is low without hanging the machine? I personally don't think 50% process
memory + 50% cache is an ideal solution--especially when running stuff that
really wants >= 150MB (RAM + swap).

Actually I'd prefer having cache use half the remaining RAM not taken up by
processes, instead of half the total RAM on the system. Any suggestions?

Regards,
 Byron

--
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]




Re: [preview] VIA IDE 4.0 and AMD IDE 2.0 with automatic PCI clockdetection

2001-02-09 Thread Byron Stanoszek

> I've decided that too much trouble has been caused by a wrong PCI clock
> specified to the IDE drivers (which in turn compute wrong IDE timings).
> 
> I've made the VIA and AMD drivers detect the PCI clock automatically.
> Because this is a very significant change, I've upped the major release
> numbers to 4 and 2.
> 
> Could anyone with these chipsets check these drivers if they detect the
> PCI clock correctly on their systems?

Certainly. I tested this on my machine with PCIClk 33, 34, and 36.6. The driver
correctly tuned the clocks to 33, 34, and 37 (expected and not harmful). It
works very well.

One thing to note: You should probably display the new clock speed in the
kernel debug messages on bootup. Also I don't know if you've done this already,
but if the user specifies an idebus=xx then that should override the auto
detection.
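In other words, something along these lines. This is only a sketch of the
suggested precedence, not code from the driver; effective_pci_clock and its
kHz input are hypothetical.

```c
/* Pick the PCI clock (in MHz) that the IDE timing code should use.
 * user_idebus_mhz: value given with idebus=xx, or 0 if the option is absent.
 * measured_khz:    autodetected PCI clock in kHz (hypothetical input). */
unsigned int effective_pci_clock(unsigned int user_idebus_mhz,
				 unsigned int measured_khz)
{
	if (user_idebus_mhz)
		return user_idebus_mhz;		/* explicit setting wins */
	return (measured_khz + 500) / 1000;	/* round to the nearest MHz */
}
```

A measured 36.6 MHz rounds to 37 here, matching what I saw, while a
user-supplied idebus value is returned unchanged.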

I do have a concern however. When you autodetect the PCI clock, does that
propagate to other IDE controllers that have been initialized? For instance, my
Abit KT7-Raid also has a Highpoint 370 controller. My fear is that it may get
initialized to 33 before the VIA controller is started and before detecting
that the true PCI clock is really 37. Unless the Highpoint controller has an
external timing mechanism I think this could pose a problem.

If it could help things, maybe a patch can be made to the standard IDE setup
routines that will replace the message "Assuming 33MHz for PIO modes" with
"Autodetected PCI Clock at 37MHz". This would ensure that all the IDE drivers
get set up with the correct detected PCI clock, and not just VIA/AMD's.

Thoughts/comments?
 -Byron

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]




RE: aacraid 2.4.0 kernel

2001-02-07 Thread Byron Stanoszek

On Wed, 7 Feb 2001, Jason Ford wrote:

> Byron,
> 
> I got your patch to compile in fine however it still exhibits the same
> behavior that the older patches did. It looks like the commands sent to the
> controller are still not working correctly as the new subsystem in the
> kernel was rewritten.
> 
> This is the error I get in my messages file when trying to copy from one
> disk partition to another one.
> 
> so on and so on.. Am I doing something wrong?

Nope. It looks horribly broken.  Oh well.. I guess I'd stick to 2.2.19-pre on
the Dell machines for the time being.

 -Byron

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]




Re: aacraid 2.4.0 kernel

2001-02-07 Thread Byron Stanoszek

On Wed, 7 Feb 2001, Jason Ford wrote:

> I see in the archives of this mailing list that someone was working on the
> aacraid driver for the 2.4 kernel however that post was almost 2 months old.
> I know Alan Cox denied inclusion of the driver due to the poor nature it was
> written for the 2.2 tree. Every post that I have seen so far has just said
> that Adaptec is working on it. However, I am sure there are many people out
> there like myself that have to support this card in environments that would
> be beneficial to upgrade to 2.4 kernel. I am not a part of this list however
> have been scouring through geocrawler.com archives of this list everyday for
> the last month hoping and waiting.

While it's totally unofficial, I have a patch for aacraid 1.0.6 for 2.4.1-ac5.
I have not tested it yet, but it compiles cleanly. I'd like to hear any results
(good or bad) you have on it.

You can find it at:

  ftp://ftp.winds.org/linux/patches/2.4.1/aacraid-2.4.1-1.0.6.patch

Regards,
 Byron

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]




2.4.1-ac5 - The loopback hang saga continues

2001-02-07 Thread Byron Stanoszek

It appears that the loopback-hang parasite is alive and well in 2.4.1-ac5.
I've done several tests and I thus provide the following information:

The bug occurs whether the kernel is configured for UP or SMP: it hung both
ways, but the box itself is UP.

It appears to hang when internal buffers get filled. The way I see it, copying
files from disk to the loopback device (which is a file on the same disk)
begins to read from the disk. When the internal read buffer is full, the
kernel's queued writes start activating and the data gets copied to the
loopback file. This process repeats over and over, as it should normally.

Sometimes however, during a read from the disk, it fills up its buffers and
then never makes the accompanying write. In fact, the entire device freezes on
the read.

I was able to lessen the frequency of hanging by using the -v flag and tapping
^S and ^Q to temporarily 'pause' copying. This keeps the read buffer from
becoming full to the point where it could cause the hang, and it appeared
to work -- until it came across the libc.a file. There was no way to pause it
here because nothing is printed to the screen while it's copying
libc.a. Unfortunately, it fills the buffer too quickly and hangs every time.
The disk is totally unresponsive at this point, and a hard reset is necessary.

I hope this helps anyone who is still tracking down the loopback problem.

Regards,
 Byron

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]




Re: NFS stop/start problems (related to datagram shutdown bug?)

2001-02-06 Thread Byron Stanoszek

> There does seem to be a possible problem with sk_inuse not being
> updated atomically, so a race between an increment and a decrement
> could lose one of them.
> svc_sock_release seems to often be called with no more protection than
> the BKL, and it decrements sk_inuse.
>
> svc_sock_enqueue, on the other hand increments sk_inuse, and is
> protected by sv_lock, but not, I think, by the BKL, as it is called by
> a networking layer callback. So there might be a possibility for a
> race here.
>
> The attached patch might fix it, so if you are having reproducible
> problems, it might be worth applying this patch.
>
> NeilBrown

I applied the patch and the problem seems to have gone away, where it was
fairly reproducible beforehand. It waits a little longer (about 4 seconds)
during the NFS daemon shutdown before [  OK  ] pops up, but that could be my
imagination because I was doing it on the 166 and I'm used to the 866's.

But what matters is that I can stop and restart NFS just fine now whereas
before I couldn't. Thanks for the patch.
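
The lost update Neil describes can be illustrated outside the kernel. What
follows is only a minimal Python sketch, not the nfsd code: Sock,
racy_interleaving and locked_updates are hypothetical names, and the bad
ordering is replayed by hand so the result is deterministic. It assumes the
increment and the decrement are each a plain read-modify-write.

```python
import threading


class Sock:
    """Stand-in for a socket with a use count (hypothetical, not kernel code)."""

    def __init__(self, inuse):
        self.inuse = inuse
        self.lock = threading.Lock()


def racy_interleaving(sock):
    """Replay one bad ordering of an unprotected increment and decrement."""
    seen_by_enqueue = sock.inuse      # enqueue path reads the counter
    seen_by_release = sock.inuse      # release path reads the same old value
    sock.inuse = seen_by_enqueue + 1  # enqueue writes back its increment
    sock.inuse = seen_by_release - 1  # release overwrites it: increment is lost
    return sock.inuse


def locked_updates(sock, rounds=1000):
    """Run increments and decrements in two threads under one shared lock."""

    def bump(delta):
        for _ in range(rounds):
            with sock.lock:           # both paths take the same lock
                sock.inuse += delta

    threads = [threading.Thread(target=bump, args=(d,)) for d in (+1, -1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sock.inuse


if __name__ == "__main__":
    print(racy_interleaving(Sock(1)))  # one +1 and one -1 should leave 1
    print(locked_updates(Sock(1)))
```

With a single lock covering both paths, neither update can be dropped; a
count that ends up one too low or too high is the kind of error that could
leave a socket stuck.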

 -Byron

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]




[OT] Why so much memory 'reserved'?

2001-02-05 Thread Byron Stanoszek

This is an offtopic question. What determines the amount of 'reserved' memory,
and how much to reserve?

With 2.4.1-ac3, I came up with the following different memory readings for
both a Pentium 166 and an Athlon 750.

Pentium 166: (96MB RAM)

BIOS-provided physical RAM map:
 BIOS-e820: 0009fc00 @  (usable)
 BIOS-e820: 0400 @ 0009fc00 (usable)
 BIOS-e820: 0001 @ 000f (reserved)
 BIOS-e820: 05f0 @ 0010 (usable)
 BIOS-e820: 0001 @  (reserved)
On node 0 totalpages: 24576
zone(0): 4096 pages.
zone(1): 20480 pages.
zone(2): 0 pages.
Memory: 94732k/98304k available (890k kernel code, 3184k reserved, 261k data,
176k init, 0k highmem)

Athlon 750: (128MB RAM)
---
BIOS-provided physical RAM map:
 BIOS-e820: 0009fc00 @  (usable)
 BIOS-e820: 0400 @ 0009fc00 (reserved)
 BIOS-e820: 0001 @ 000f (reserved)
 BIOS-e820: 0001 @  (reserved)
 BIOS-e820: 07f0 @ 0010 (usable)
On node 0 totalpages: 32768
zone(0): 4096 pages.
zone(1): 28672 pages.
zone(2): 0 pages.
Memory: 126500k/131072k available (1127k kernel code, 4184k reserved, 322k
data, 200k init, 0k highmem)

Last year, when I had 32MB of memory in the Pentium 166 machine, the amount of
'reserved' memory seemed lower. It almost looked as if the amount of reserved
memory is a fraction of total available memory.
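
The figures in the two boot logs can be cross-checked with a little
arithmetic. This is only a sketch: summarize is a hypothetical helper, the
4 KiB page size is the usual x86 value, and the script says nothing about
what the kernel actually counts as 'reserved'.

```python
PAGE_KB = 4  # x86 page size in KiB


def summarize(totalpages, available_kb, reserved_kb):
    """Return total KiB, KiB not available, and the reserved share in percent."""
    total_kb = totalpages * PAGE_KB
    unavailable_kb = total_kb - available_kb
    reserved_pct = round(100.0 * reserved_kb / total_kb, 2)
    return total_kb, unavailable_kb, reserved_pct


if __name__ == "__main__":
    # Figures copied from the two "Memory:" lines in this message.
    print("Pentium 166:", summarize(24576, 94732, 3184))
    print("Athlon 750: ", summarize(32768, 126500, 4184))
```

On both machines the reserved share comes out a little over 3% of total RAM,
which fits the impression that it grows with installed memory.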

Is there a way I can 'regain' this memory from the system, especially in cases
when there's only 32MB to work with?

Thanks,
 Byron

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]




Re: NFS stop/start problems (related to datagram shutdown bug?)

2001-02-05 Thread Byron Stanoszek

On Tue, 6 Feb 2001, Neil Brown wrote:

> How repeatable is this?  Is the server SMP?

I've tested this on two UP Athlons and 2 SMP Pentium 3's and the same problem
occurred. I have not tested it more than once on the same system (I left the
NFS servers untouched after the reboot).

The Athlon systems running NFS were 2.4.1-ac3 and the Pentiums were running
2.2.19-pre7. All computers exporting the FS had one directory mounted at least
once.

In one case, only 1 directory was mounted once and then unmounted before
shutting off the NFS server. When I realized I forgot to copy a directory over,
I went to restart NFS on the server and found out I was unable to. Probably
irrelevant, but this had been after transferring 7 gigs of data over 100 Mbps.

I still have the 'broken' server running, so if you would like me to run a
command or two on it I can show you the results.

> The attached patch might fix it, so if you are having reproducible
> problems, it might be worth applying this patch.

I can try it tomorrow and see if it fixes the problem, but since this problem
also occurred on a UP, using spin locks probably will not correct it. Perhaps
it's something else.

> [patch snipped]

 -Byron

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]




Re: NFS stop/start problems (related to datagram shutdown bug?)

2001-02-05 Thread Byron Stanoszek

On Mon, 5 Feb 2001, Alan Cox wrote:

> > Seems recently, on both redhat 6.1 and 7.0 using kernel 2.4.1-ac3, I
> > ran into this problem:
> 
> Ok seen this in older 2.2 but not 2.4
> 
> > nfsd: terminating on signal 9
> > svc: server socket destroy delayed
> > 
> > And restarting NFS has the following error message:
> > Starting NFS mountd:   [  OK  ]
> > Starting NFS daemon: nfssvc: Address already in use
> >[FAILED]
> 
> A socket got stuck. That's preventing you restarting it. The bug is whatever
> leak caused the svc: server socket destroy delayed case.
> 
> Just for reference what network card ?

Both machines had a 3c905b-tx-nm card in them.

3c59x.c:LK1.1.12 06 Jan 2000  Donald Becker and others.
http://www.scyld.com/network/vortex.html $Revision: 1.102.2.46 $
See Documentation/networking/vortex.txt
eth0: 3Com PCI 3c905B Cyclone 100baseTx at 0x6100,  00:50:da:cd:c8:b9, IRQ 11
  product code 'XC' rev 00.13 date 12-29-99
  8K byte-wide RAM 5:3 Rx:Tx split, autoselect/Autonegotiate interface.
  MII transceiver found at address 24, status 786d.
  Enabling bus-master transmits and whole-frame receives.

-Byron

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]




NFS stop/start problems (related to datagram shutdown bug?)

2001-02-05 Thread Byron Stanoszek

Seems recently, on both redhat 6.1 and 7.0 using kernel 2.4.1-ac3, I
ran into this problem:

Stopping NFS says the following in the kernel logs:

nfsd: terminating on signal 9
nfsd: terminating on signal 9
nfsd: terminating on signal 9
nfsd: terminating on signal 9
nfsd: terminating on signal 9
nfsd: terminating on signal 9
nfsd: terminating on signal 9
nfsd: terminating on signal 9
svc: server socket destroy delayed

And restarting NFS has the following error message:

root:~> /etc/rc.d/init.d/nfs start
Starting NFS services: [  OK  ]
Starting NFS quotas:   [  OK  ]
Starting NFS mountd:   [  OK  ]
Starting NFS daemon: nfssvc: Address already in use
   [FAILED]

From that moment forward, the NFS server is completely broken until the system
is rebooted, and other machines respond during a 'mount' by saying,

nfs: server xxx not responding, still trying

When I tried this, the remote computer had unmounted this NFS-served partition
prior to shutting NFS down with '/etc/rc.d/init.d/nfs stop'. I was wondering if
this could be related to that datagram shutdown bug, and maybe if there's a
quick solution in the meantime to kill the socket so that I can restart NFS
without rebooting.
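
For reference, "Address already in use" is the EADDRINUSE error that bind()
reports while another socket still holds the address. The following is a
minimal user-space sketch in Python, unrelated to the nfsd internals;
second_bind_error is a hypothetical helper and the loopback address is just
a convenient test target.

```python
import errno
import socket


def second_bind_error():
    """Bind two UDP sockets to one address and return the second bind's errno."""
    first = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        first.bind(("127.0.0.1", 0))   # let the OS pick a free port
        addr = first.getsockname()
        try:
            second.bind(addr)          # same address while `first` is still open
        except OSError as exc:
            return exc.errno
        return 0                       # the second bind unexpectedly succeeded
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    code = second_bind_error()
    print(errno.errorcode.get(code, code))
```

In user space the address becomes free again once the first socket is
closed. In the case reported here the socket lives inside the kernel, so
there is nothing comparable to close from the shell.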

Thanks,
 Byron

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]




Re: VT82C686A corruption with 2.4.x

2001-02-01 Thread Byron Stanoszek

On Thu, 1 Feb 2001, Vojtech Pavlik wrote:

> On Thu, Feb 01, 2001 at 11:46:08AM -0500, Byron Stanoszek wrote:
> 
> > Yeah, my BIOS does the same thing too on the Abit KT7(a).
> 
> Ok, I'll remember this. This is most likely the cause of the problems
> many people had with the KT7 in the past.

What cause are you referring to? As far as I know, there are two options for
increasing the FSB clock: one increases both FSB and PCICLK, the other
increases only the FSB. If you increase the FSB only, it should keep PCICLK at
a solid 33. (But I could be wrong, I've never tested that. I can tomorrow
though.)

> The U33 chips do UDMA timing in PCICLK (T = 30ns @ 33MHz) increments, U66 in
> PCICLK*2 (T = 15ns @ 33 MHz) increments, and for U100 it's assumed that
> there is an external 100MHz clock fed to the chip, so that the UDMA timing is
> in T = 10ns increments independent of the PCICLK. I'm not 100% sure about
> the last, it might be just PCICLK*3 (T = 10ns @ 33 MHz). An experiment needs
> to be carried out to verify this.

I don't have a KT7A personally, I only have a KT7. Can anyone else with a KT7A
verify this? By verify, I take it you mean to use idebus=33 and overclock
PCICLK? :) At least that would determine if UDMA100 is based on PCI or an
external 100MHz source.
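
The increments Vojtech quotes follow from T = 1000 / f (T in ns, f in MHz)
if the nominal PCI clock is taken as 33.3 MHz. A small sketch, with
udma_tick_ns as a hypothetical helper name and 37 MHz as an example
overclock:

```python
def udma_tick_ns(pciclk_mhz, multiplier=1):
    """Length of one timing increment in ns for a clock of pciclk_mhz * multiplier."""
    return 1000.0 / (pciclk_mhz * multiplier)


if __name__ == "__main__":
    for pciclk in (100.0 / 3, 37.0):   # nominal PCI clock, then an overclocked one
        ticks = [udma_tick_ns(pciclk, m) for m in (1, 2, 3)]
        print(round(pciclk, 1), [round(t, 1) for t in ticks])
```

If U100 timing really is PCICLK*3, the increment shrinks below 10 ns as soon
as PCICLK is raised; with an external 100MHz source it would stay at 10 ns.
That difference is what the experiment would have to expose.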

Regards,
 Byron

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]




Re: VT82C686A corruption with 2.4.x

2001-02-01 Thread Byron Stanoszek

On Thu, 1 Feb 2001, safemode wrote:

> Vojtech Pavlik wrote:
> 
> > Ugh. What chips does your KA7 have? As far as I know the KX133 chip (vt8731)
> > can't do asynchronous PCI, allowing for 2x, 3x and 4x FSB/PCI divisors
> > only. So I don't see a way to have your FSB at 114 and your PCI at 34 with
> > this chip.
> 
> Actually it can, and it's a simple BIOS option. I'd show you, but it's in the
> manual and it's hard to scan stuff without a scanner. You can have an
> asynchronous FSB increase of up to 28MHz, so I can have 128MHz FSB with 33MHz
> PCI. After that I have to use the synchronous increase, which changes PCI as I
> change the FSB value, but the other value gets added onto that asynchronously.
> It's really a standard feature of this board. I'm not making it up, and the
> proof is me not changing idebus at all and still working after a day at full
> load and semi-constant usage and MANY compiles. Also, the BIOS screen doesn't
> lie.

Yeah, my BIOS does the same thing too on the Abit KT7(a). But you might not
want to run your PCI clock at 34 instead of 33. Two problems can occur. If you
don't specify idebus=34 at the kernel boot prompt, the drive might eventually
get a DMA reset under 100% disk access. If you do say idebus=34, then you drop
your maximum throughput from 33 MB/s to 27 MB/s.

I was curious and compiled a list of timings from Vojtech's formula for certain
idebus=xx MHz ratings (I _think_ the UDMA-66 timings are correct, maybe you can
check on these, Vojtech..)

Clock | Setup  Active  Recover  Cycle  UDMA | UDMA-33  UDMA-66  UDMA-100
   21 |   1      2        1       3      0  |   28.0     56.0      84.0
   22 |   1      2        1       3      0  |   29.3     58.6      88.0
   23 |   1      2        1       3      0  |   30.6     61.2      92.0
   24 |   1      2        1       3      0  |   32.0     64.0      96.0
   25 |   1      2        1       3      0  |   33.3     66.6     100.0
   26 |   1      2        2       4      0  |   26.0     52.0      78.0
   27 |   1      2        2       4      0  |   27.0     54.0      81.0
   28 |   1      2        2       4      0  |   28.0     56.0      84.0
   29 |   1      3        1       4      0  |   29.0     58.0      87.0
   30 |   1      3        1       4      0  |   30.0     60.0      90.0
   31 |   1      3        1       4      0  |   31.0     62.0      93.0
   32 |   1      3        1       4      0  |   32.0     64.0      96.0
   33 |   1      3        1       4      0  |   33.0     66.0      99.0
   34 |   1      3        2       5      0  |   27.2     54.4      81.6
   35 |   1      3        2       5      0  |   28.0     56.0      84.0
   36 |   1      3        2       5      0  |   28.8     57.6      86.4
   37 |   1      3        2       5      0  |   29.6     59.2      88.8
   38 |   1      3        2       5      0  |   30.4     60.8      91.2
   39 |   1      3        2       5      0  |   31.2     62.4      93.6
   40 |   1      3        2       5      0  |   32.0     64.0      96.0
   41 |   2      3        2       5      0  |   32.8     65.6      98.4
   42 |   2      4        2       6      0  |   28.0     56.0      84.0
   43 |   2      4        2       6      0  |   28.6     57.2      86.0
   44 |   2      4        2       6      1  |   29.3     58.6      88.0
   45 |   2      4        2       6      1  |   30.0     60.0      90.0
   46 |   2      4        2       6      1  |   30.6     61.2      92.0
   47 |   2      4        2       6      1  |   31.3     62.6      94.0
   48 |   2      4        2       6      1  |   32.0     64.0      96.0
   49 |   2      4        2       6      1  |   32.6     65.2      98.0
   50 |   2      4        2       6      1  |   33.3     66.6     100.0
   51 |   2      4        3       7      1  |   29.1     58.2      87.4
   52 |   2      4        3       7      1  |   29.7     59.4      89.1
   53 |   2      4        3       7      1  |   30.2     60.4      90.8
   54 |   2      4        3       7      1  |   30.8     61.6      92.5
   55 |   2      4        3       7      1  |   31.4     62.8      94.2
   56 |   2      5        3       8      1  |   28.0     56.0      84.0
   57 |   2      5        3       8      1  |   28.5     57.0      85.5
   58 |   2      5        3       8      1  |   29.0     58.0      87.0
   59 |   2      5        3       8      1  |   29.5     59.0      88.5
   60 |   2      5        3       8      1  |   30.0     60.0      90.0
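
The three throughput columns are consistent with a simple rule: UDMA-33 is
Clock * 4 / Cycle, truncated to one decimal, where Cycle is the fourth
timing value in each row (3 for clocks 21 to 25, 4 for 26 to 33, and so on).
This is a pattern read off the table, not necessarily Vojtech's formula;
udma_rates is a hypothetical helper and the Cycle values are taken from the
table rather than computed.

```python
def udma_rates(clock_mhz, cycle):
    """Return (UDMA-33, UDMA-66, UDMA-100) in MB/s for a bus clock and cycle count."""
    u33_tenths = clock_mhz * 40 // cycle     # clock * 4 / cycle, truncated to 0.1
    u66_tenths = 2 * u33_tenths              # the table doubles the truncated value
    u100_tenths = clock_mhz * 120 // cycle   # clock * 12 / cycle, truncated to 0.1
    return u33_tenths / 10, u66_tenths / 10, u100_tenths / 10


if __name__ == "__main__":
    # (clock, cycle) pairs copied from the table
    for clock, cycle in [(33, 4), (34, 5), (37, 5), (51, 7)]:
        print(clock, cycle, udma_rates(clock, cycle))
```

The step from 33 to 34 is where the cycle count goes from 4 to 5, which is
why idebus=34 costs about 6 MB/s.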

Personally I like the 113 MHz FSB setting, which runs PCI at 37 and memory at
150 (133*1.13). It helps to have memory rated for 150. :) I've had a system
run at this rate for the past 4 months now and I've never had any problems.
Of course, your results may vary.

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]


Re: VT82C686A corruption with 2.4.x

2001-02-01 Thread Byron Stanoszek

On Thu, 1 Feb 2001, safemode wrote:

 Vojtech Pavlik wrote:
 
  Ugh. What chips your KA7 has? As far as I know the KX133 chip (vt8731)
  can't do asynchronous PCI, allowing for 2x, 3x and 4x FSB/PCI divisors
  only. So I don't a way to have your FSB at 114 and your PCI at 34 with
  this chip.
 
 Actually it can and it's a simple bios option.  I'd show you but it's in the manual 
and
 it's hard to scan stuff without a scanner. You can have asynchronous FSB up to
 28Mhzso i can have 128Mhz FSB with 33Mhz PCI  after that i have to use the
 synchronous increase which changes PCI as i change the FSB value   but the other 
value
 gets added onto that asynchronously.  It's really a standard feature of this board.
 I'm not making it up and the proof is me not changing idebus at all and still working
 after a day at full load and semi-constant usage and MANY compiles.   also the bios
 screen doesn't lie.

Yeah, by bios does the same thing too on the Abit KT7(a). But you might not
want to run your PCI clock at 34 instead of 33. Two problems can occur. If you
don't specify idebus=34 on the kernel prompt, the IDE timings might eventually
get a DMA reset under 100% disk access. If you do say idebus=34, then you drop
your maximum throughput from 33 MB/s to 27MB/s.

I was curious and compiled a list of timings from Vojtech's formula for certain
idebus=xx MHz ratings (I _think_ the UDMA-66 timings are correct, maybe you can
check on these, Vojtech..)

Clock | Setup  Active  Recover  Cycle  UDMA | UDMA-33  UDMA-66  UDMA-100
   21 |1   21  30   |   28.0 56.0  84.0
   22 |1   21  30   |   29.3 58.6  88.0
   23 |1   21  30   |   30.6 61.2  92.0
   24 |1   21  30   |   32.0 64.0  96.0
   25 |1   21  30   |   33.3 66.6 100.0
   26 |1   22  40   |   26.0 52.0  78.0
   27 |1   22  40   |   27.0 54.0  81.0
   28 |1   22  40   |   28.0 56.0  84.0
   29 |1   31  40   |   29.0 58.0  87.0
   30 |1   31  40   |   30.0 60.0  90.0
   31 |1   31  40   |   31.0 62.0  93.0
   32 |1   31  40   |   32.0 64.0  96.0
   33 |1   31  40   |   33.0 66.0  99.0
   34 |1   32  50   |   27.2 54.4  81.6
   35 |1   32  50   |   28.0 56.0  84.0
   36 |1   32  50   |   28.8 57.6  86.4
   37 |1   32  50   |   29.6 59.2  88.8
   38 |1   32  50   |   30.4 60.8  91.2
   39 |1   32  50   |   31.2 62.4  93.6
   40 |1   32  50   |   32.0 64.0  96.0
   41 |2   32  50   |   32.8 65.6  98.4
   42 |2   42  60   |   28.0 56.0  84.0
   43 |2   42  60   |   28.6 57.2  86.0
   44 |2   42  61   |   29.3 58.6  88.0
   45 |2   42  61   |   30.0 60.0  90.0
   46 |2   42  61   |   30.6 61.2  92.0
   47 |2   42  61   |   31.3 62.6  94.0
   48 |2   42  61   |   32.0 64.0  96.0
   49 |2   42  61   |   32.6 65.2  98.0
   50 |2   42  61   |   33.3 66.6 100.0
   51 |2   43  71   |   29.1 58.2  87.4
   52 |2   43  71   |   29.7 59.4  89.1
   53 |2   43  71   |   30.2 60.4  90.8
   54 |2   43  71   |   30.8 61.6  92.5
   55 |2   43  71   |   31.4 62.8  94.2
   56 |2   53  81   |   28.0 56.0  84.0
   57 |2   53  81   |   28.5 57.0  85.5
   58 |2   53  81   |   29.0 58.0  87.0
   59 |2   53  81   |   29.5 59.0  88.5
   60 |2   53  81   |   30.0 60.0  90.0

Personally I like the 113 MHz FSB setting, which runs PCI at 37 and memory at
150 (133*1.13). It helps to have memory rated for 150. :) I've had a system
run at this rate for the past 4 months now and I've never had any problems.
Of course, your results may vary.

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message

Re: VT82C686A corruption with 2.4.x

2001-02-01 Thread Byron Stanoszek

On Thu, 1 Feb 2001, Vojtech Pavlik wrote:

 On Thu, Feb 01, 2001 at 11:46:08AM -0500, Byron Stanoszek wrote:
 
  Yeah, by bios does the same thing too on the Abit KT7(a).
 
 Ok, I'll remember this. This is most likely the cause of the problems
 many people had with the KT7 in the past.

What cause are you referring to? As far as I know, there are two options for
increasing the FSB clock: one increases both FSB and PCICLK, the other
increases only the FSB. If you increase only the FSB, it should keep PCICLK at
a solid 33. (But I could be wrong, I've never tested that. I can tomorrow
though.)

> The U33 chips do UDMA timing in PCICLK (T = 30ns @ 33MHz) increments, U66 in
> PCICLK*2 (T = 15ns @ 33 MHz) increments, and for U100 it's assumed that
> there is an external 100MHz clock fed to the chip, so that the UDMA timing is
> in T = 10ns increments independent of the PCICLK. I'm not 100% sure about
> the last, it might be just PCICLK*3 (T = 10ns @ 33 MHz). An experiment needs
> to be carried out to verify this.

I don't have a KT7A personally, I only have a KT7. Can anyone else with a KT7A
verify this? By verify, I take it you mean to use idebus=33 and overclock
PCICLK? :) At least that would determine if UDMA100 is based on PCI or an
external 100MHz source.
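
To make the experiment concrete, here is a small sketch of what each hypothesis predicts. It is not driver code: the helper names are hypothetical, and the nominal PCI clock is assumed to be 33.33 MHz so that T comes out at the quoted 30 ns.

```python
# Hypothetical helpers for reasoning about the experiment; not driver code.
NOMINAL_PCICLK_MHZ = 100.0 / 3.0  # assumed nominal PCI clock (about 33.33 MHz)


def udma_period_ns(clock_mhz, multiplier=1):
    """Length in ns of one timing increment T for a clock of clock_mhz * multiplier."""
    return 1000.0 / (clock_mhz * multiplier)


def u100_period_ns(pciclk_mhz, external_clock):
    """T for U100 under the two hypotheses from the quoted text."""
    if external_clock:
        return udma_period_ns(100.0)       # fixed 100 MHz source, ignores PCICLK
    return udma_period_ns(pciclk_mhz, 3)   # PCICLK*3, follows any overclock


if __name__ == "__main__":
    for pciclk in (NOMINAL_PCICLK_MHZ, 37.0):
        print(f"PCICLK {pciclk:.2f} MHz: "
              f"U33 T={udma_period_ns(pciclk):.1f} ns, "
              f"U66 T={udma_period_ns(pciclk, 2):.1f} ns, "
              f"U100 T={u100_period_ns(pciclk, True):.1f} ns (external) "
              f"or {u100_period_ns(pciclk, False):.1f} ns (PCICLK*3)")
```

At the nominal clock both hypotheses give T = 10 ns for U100, so they only become distinguishable once PCICLK is overclocked: with an external source T stays at 10 ns, while with PCICLK*3 it shrinks.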

Regards,
 Byron

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: VT82C686A corruption with 2.4.x

2001-01-31 Thread Byron Stanoszek

On Wed, 31 Jan 2001, safemode wrote:

> Yeah, I know... same mode. I also had a big problem with DMA timeouts on
> 2.4, so I don't know what's up with 2.4 and my motherboard. 2.2 hasn't
> shown a single IRQ or DMA error since I went back to it. Currently
> 2.2.19-pre7 is using UDMA4. I just flashed the BIOS today, so hopefully
> that fixed any problems. I get 24MB/s each according to hdparm -t on my
> hdds, and both are on the same channel. This is much better than I ever
> got with 2.4, even when only one drive was on a channel. Right now my
> k7-2 750 is at 849MHz with an FSB of 114MHz and PCI at 34MHz. My nbench
> performance under 2.2 is comparable to results for 1GHz T-birds, so I'm
> happy with 2.2. The only thing that would make me want to upgrade would
> be latency patches. I'm convinced 2.4 has performance issues, so I guess
> I'll be using 2.2 until 2.5 begins. Is it really only 1 or 2 people having
> this VIA corruption problem? I doubt it's a BIOS problem, because wouldn't
> 2.2 be affected by a BIOS bug if 2.4 is? In either case the changelogs
> don't show any fixes for it.

If your FSB is running at 114 MHz, you should try the kernel parameter
idebus=37 to get DMA working correctly. Otherwise you'll see an ide-reset error
on bootup because the IDE timings the driver programs are too fast for the
real bus clock. The VIA driver on 2.2 doesn't correctly program the PCI card,
so you don't see weird behavior running 2.2 with a faster PCI clock.

(Note: 1.14 * 33 = 37.6 PCI Clk)
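
As a rough sketch of that calculation, assuming the board scales PCICLK together with the FSB (not every FSB option does) and that the value is truncated to a whole number to match the idebus=37 suggestion above. The function name is made up for illustration.

```python
def idebus_param(fsb_mhz, nominal_fsb_mhz=100.0, nominal_pci_mhz=33.0):
    """Build an idebus= boot parameter for an overclocked FSB.

    Assumes PCICLK scales linearly with the FSB; the nominal values are
    the ones used in the note above.
    """
    pci_mhz = nominal_pci_mhz * fsb_mhz / nominal_fsb_mhz
    return f"idebus={int(pci_mhz)}"  # int() truncates, e.g. 37.62 -> 37


if __name__ == "__main__":
    print(idebus_param(114))
```

Whether truncating or rounding to the nearest MHz is the better choice is not something this thread settles; the BIOS readout of the actual PCICLK is the more reliable source when it is available.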

It also matters what motherboard you're using, and if it can support speeds up
past 100. If you're serious about overclocking, buy one of the new KT133A
boards that support speeds up to 133 (or an average overclocked 145 limit).

For instance, my Epox KX133 board is unstable at anything above 110 FSB, but
I've seen the Abit KT7 go as high as 116. You should also have some good
memory that is rated for 150MHz, otherwise you're just asking for trouble.

As always, if you have problems with the kernel and want to submit a bug
report, please put all the settings back to normal and test thoroughly before
continuing. It's funny how many bug reports I've heard from people who've
overclocked their FSB and expected the IDE DMA to be set appropriately under
2.4... maybe this should be mentioned somewhere in ide.txt, even though
overclocking is frowned upon.

Regards,
 Byron

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]




Re: VIA VT82C686X

2001-01-31 Thread Byron Stanoszek

On Tue, 30 Jan 2001, David D.W. Downey wrote:

> I removed the ide and ata setting. System is running stably as in no
> kernel crashes, but I am getting daemon and shell crashes. With this
> current kernel I've had 1 kernel crash in about 3 hours as compared to 1
> every 10 or 15 minutes. Crash, reboot, 10 minutes or so crash, reboot. ect
> ect.
> 
> I'm wanting to test something else out. I'm wondering if there isn't some
> hardware issue with the RAM. This particular board will do 1GB of PC133,
> or 2.5GB of PC100. I'm wondering if there isn't something wrong with how
> it reads the speed and the appropriate limitation. It's running stably if
> I only run 768MB of PC133 RAM. But if I run a solid 1GB of PC133 I get
> segfaults and sig11 crashes constantly. All the RAM has been
> professionally tested and certified.

That definitely sounds like a RAM problem. The system should perform the same
independent of how many RAM chips you put in there (segfault-wise). If you're
still in doubt, you can try booting up with memtest86 and run it for several
hours with only the memory chip that you think might be causing the problem.

You can grab the bootdisk image from:
  ftp://ftp.winds.org/linux/images/memtest86-2.5.bin

and just write it to a floppy with 'cat memtest86-2.5.bin > /dev/fd0', then
boot up with that disk.

Regards,
 Byron

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]




Re: VIA VT82C686X

2001-01-30 Thread Byron Stanoszek

On Tue, 30 Jan 2001, David D.W. Downey wrote:

> 
> Woohoo! Just found out that ATA66 on the VIA aint too great.
> 
> I set the kernel boot options idebus=66 ide0=ata66 enabling ATA66
> according to dmesg. The HDD is a WDC UDMA100 30.5GB drive. I retried the

The 'idebus=xx' parameter doesn't refer to the speed of the IDE drive, but
instead the speed of the PCI bus. On the VIA686, that speed should always be 33
(unless you're overclocking). Setting it to 66 will cause the VIA driver to
believe your PCI bus is running at 66MHz and will program the IDE controller to
run at half the speed to maintain 33MHz. In reality, your controller now runs
at 16.

I believe v3.20 of the via82cxxx.c driver disallows any setting lower than 20
or higher than 50.
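
A minimal sketch of the effect described above, under the assumption that the controller clock simply scales by the ratio of the real PCI clock to the one given in idebus=. The names are hypothetical, and the 20 to 50 range check only mirrors my recollection of v3.20, not a verified reading of the driver.

```python
INTENDED_IDE_MHZ = 33.0  # assumed target clock the timings are derived for


def effective_ide_mhz(actual_pci_mhz, idebus_mhz):
    """Clock the controller really ends up with when idebus= is wrong."""
    return INTENDED_IDE_MHZ * actual_pci_mhz / idebus_mhz


def idebus_accepted(idebus_mhz, low=20, high=50):
    """Range check per the v3.20 recollection above (unverified)."""
    return low <= idebus_mhz <= high


if __name__ == "__main__":
    print(effective_ide_mhz(33.0, 66.0))  # real bus at 33, idebus=66
    print(idebus_accepted(66))
```

With a real 33 MHz bus and idebus=66 this gives 16.5, i.e. the "runs at 16" case, and 66 falls outside the range a newer driver would accept.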

AFAIK the driver auto-selects the speed of your drive based on how it is
configured in the BIOS, and whether you have the 40- or 80-wire cable. The
'ide0=ata66' option should not be necessary.

To others, I've been running this driver with both a KX133 and a KT133 (both
via686a) for quite some time now and have never seen any problems. Just make
sure 'idebus=xx' matches the speed of your PCICLK as shown in the bios and
you'll be fine (Default is 33).

Regards,
 Byron

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]




[PATCH] (2) Compile warning fixes for gcc 2.97

2001-01-02 Thread Byron Stanoszek

Here's a set of patches that fix compile warnings using gcc 2.97. The first
patch is purely a syntactical change (mainly removing default: statements that
do nothing), the second is a change in code structure that "looks" correct but
was brought on by the same type of warning where the case label has no effect.

So I split up the patch into two parts so we can decide to throw out the second
if it is indeed incorrect. The 2nd patch (arp.patch) changes the following code:

    switch (dev->type) {
    default:
        break;
    case ARPHRD_ROSE:
#if defined(CONFIG_AX25) || defined(CONFIG_AX25_MODULE)
    case ARPHRD_AX25:
#if defined(CONFIG_NETROM) || defined(CONFIG_NETROM_MODULE)
    case ARPHRD_NETROM:
#endif
        neigh->ops = &arp_broken_ops;
        neigh->output = neigh->ops->output;
        return 0;
#endif
    }


--to this:--


    switch (dev->type) {
    case ARPHRD_ROSE:
#if defined(CONFIG_AX25) || defined(CONFIG_AX25_MODULE)
    case ARPHRD_AX25:
#endif
#if defined(CONFIG_NETROM) || defined(CONFIG_NETROM_MODULE)
    case ARPHRD_NETROM:
#endif
        neigh->ops = &arp_broken_ops;
        neigh->output = neigh->ops->output;
        return 0;
    }

---
Which I believe is really the correct flow for that switch statement.
If someone disagrees, just toss it and apply the first patch. :-)

Regards,
 Byron

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]


--- linux/drivers/sound/sequencer.c.bak Tue Jan  2 22:34:44 2001
+++ linux/drivers/sound/sequencer.c Tue Jan  2 22:36:32 2001
@@ -510,8 +510,6 @@
voice = chn;
synth_devs[dev]->aftertouch(dev, voice, parm);
break;
-
-   default:
}
 #undef dev
 #undef cmd
@@ -613,8 +611,6 @@
else/* MODE 1 */
synth_devs[dev]->bender(dev, chn, w14);
break;
-
-   default:
}
 }
 
@@ -683,8 +679,6 @@
seq_copy_to_input((unsigned char *) &parm, 4);
}
break;
-
-   default:
}
 
return TIMER_NOT_ARMED;
@@ -700,8 +694,6 @@
case LOCL_STARTAUDIO:
DMAbuf_start_devices(parm);
break;
-
-   default:
}
 }
 
@@ -858,8 +850,6 @@
case EV_SYSEX:
seq_sysex_message(q);
break;
-
-   default:
}
return 0;
 }
--- linux/drivers/sound/sb_ess.c.bak    Tue Jan  2 22:40:01 2001
+++ linux/drivers/sound/sb_ess.c    Tue Jan  2 22:43:05 2001
@@ -766,12 +766,6 @@
case IMODE_INPUT:
DMAbuf_inputintr (dev);
break;
-
-   case IMODE_INIT:
-   break;
-
-   default:
-   /* printk(KERN_WARN "ESS: Unexpected interrupt\n"); */
}
 }
 
@@ -1529,13 +1523,11 @@
 
 static int ess_has_rec_mixer (int submodel)
 {
-   switch (submodel) {
-   case SUBMDL_ES1887:
+   if(submodel == SUBMDL_ES1887)
return 1;
-   default:
+   else
return 0;
-   };
-};
+}
 
 #ifdef FKS_LOGGING
 static int ess_mixer_mon_regs[]
--- linux/drivers/sound/sound_timer.c.bak   Tue Jan  2 22:44:56 2001
+++ linux/drivers/sound/sound_timer.c   Tue Jan  2 22:45:06 2001
@@ -164,8 +164,6 @@
case TMR_ECHO:
seq_copy_to_input(event, 8);
break;
-
-   default:
}
return TIMER_NOT_ARMED;
 }
--- linux/fs/isofs/inode.c.bak  Tue Jan  2 22:46:55 2001
+++ linux/fs/isofs/inode.c  Tue Jan  2 22:47:01 2001
@@ -1264,7 +1264,7 @@
(volume_seq_no != 0) && (volume_seq_no != 1)) {
printk(KERN_WARNING "Multi-volume CD somehow got mounted.\n");
} else
-#endif IGNORE_WRONG_MULTI_VOLUME_SPECS
+#endif /* IGNORE_WRONG_MULTI_VOLUME_SPECS */
{
if (S_ISREG(inode->i_mode)) {
inode->i_fop = &generic_ro_fops;


--- linux/net/ipv4/arp.c.bak    Tue Jan  2 22:53:31 2001
+++ linux/net/ipv4/arp.c    Tue Jan  2 22:56:13 2001
@@ -267,7 +267,6 @@
   in old paradigm.
 */
 
-#if 1
/* So... these "amateur" devices are hopeless.
   The only thing, that I can say now:


Re: test13-pre6 (Fork Bug with Athlons? Temporary Fix)

2000-12-29 Thread Byron Stanoszek

On Fri, 29 Dec 2000, Linus Torvalds wrote:

> 
> Ok, there's a test13-pre6 out there now, which does a partial sync with
> Alan, in addition to hopefully fixing the innd shared mapping writeback
> problem for good.  Thanks to Marcelo Tosatti and others..

I've been noticing a problem with the memory context switching conflicting with
fork() on my Athlon. The problem began in the test13-pre2 patch, and because
nobody else has seen this problem (or otherwise reported it) since then, I
felt I should look into it a little further.

I narrowed the problem down to a subset of patches from the MM set in
test13-pre2. Reversing the attached 'context.patch' fixes the problem (only for
i386), but I'm not yet sure why. test13-pre2 and up work without any problems
on an Intel cpu (Pentium 180 & P3 800 tested).

Anyways, I can't seem to find out what really changes with the patch except for
the obvious 'void *segment' changing into a typedef-struct. The only thing I
can think of is that the compiler decodes it differently, but I think I can
safely rule that out. I tried both 2.91.66 and 2.95.2, using both different
types of parameters for P5 & K7 (-march=i586 & -march=i686 -malign-functions=4)
and it still gives the problem on the Athlon. Maybe there's something I've
overlooked in that attached patch. Request for an extra pair of eyes please. :)


Here are the symptoms I see. The parent seems to die as soon as a forked child
exits, which suggests to me that a new LDT isn't being initialized correctly:

root:~> ps -aux
USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
root         1  1.1  0.4  1228  532 ?        S    21:42   0:05 init [3]
root         2  0.0  0.0     0    0 ?        SW   21:42   0:00 [keventd]
root         3  0.0  0.0     0    0 ?        SW   21:42   0:00 [kswapd]
root         4  0.0  0.0     0    0 ?        SW   21:42   0:00 [kreclaimd]
root         5  0.0  0.0     0    0 ?        SW   21:42   0:00 [bdflush]
root         6  0.0  0.0     0    0 ?        SW   21:42   0:00 [kupdate]
root       289  0.0  0.4  1284  604 ?        S    21:42   0:00 syslogd -m 0
root       299  0.0  0.8  1912 1104 ?        S    21:42   0:00 klogd
root       351  0.0  1.2  9292 1576 ?        S    21:42   0:00 named
root       361  0.0  0.0     0    0 ?        Z    21:42   0:00 [named <defunct>]
root       363  0.0  1.2  9292 1576 ?        S    21:42   0:00 named
root       364  0.0  1.2  9292 1576 ?        S    21:42   0:00 named
root       365  0.0  0.7  2064  936 ?        S    21:42   0:00 /usr/sbin/sshd
..etc
(Note PID 361)

root:~> strace nslookup sunsite.unc.edu
 :
 :
rt_sigaction(SIGINT, {0x4003ce78, ~[], 0x400}, NULL, 8) = 0
rt_sigaction(SIGTERM, {0x4003ce78, ~[], 0x400}, NULL, 8) = 0
rt_sigaction(SIGPIPE, {SIG_IGN}, NULL, 8) = 0
rt_sigaction(SIGHUP, {SIG_DFL}, NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [HUP INT TERM], NULL, 8) = 0
getpid()                                = 2615
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3
close(3)                                = 0
socket(PF_INET6, SOCK_STREAM, 0)        = -1 ENOSYS (Function not implemented)
socket(PF_INET6, SOCK_STREAM, 0)        = -1 ENOSYS (Function not implemented)
socket(PF_INET6, SOCK_STREAM, 0)        = -1 EAFNOSUPPORT (Address family not supported by protocol)
--- SIGSEGV (Segmentation fault) ---
+++ killed by SIGSEGV +++


---Example parent/child process:

root:~> tar -xzvvf ../pkgs/zgv-5.2.tar.gz
 :
 :
-rw--- rus/users  1356 2000-06-01 11:46:57 zgv-5.2/INSTALL
-rw--- rus/users 17976 1994-08-23 16:09:05 zgv-5.2/COPYING
-rw--- rus/users  1077 1998-08-26 09:24:31 zgv-5.2/README.fonts
-rw--- rus/users   120 2000-04-22 22:46:49 zgv-5.2/AUTHORS
-rw--- rus/users  3714 2000-01-23 16:29:40 zgv-5.2/SECURITY
Segmentation fault (core dumped)

root:~> strace tar -xzvvf ../pkgs/zgv-5.2.tar.gz
 :
 :
open("zgv-5.2/COPYING", O_WRONLY|O_CREAT|O_EXCL|O_LARGEFILE, 0600) = 4
write(4, "\t\tGNU GENERAL PUBLIC LICENSE"..., 9728) = 9728
read(3, "ccept this License.  Therefore, "..., 10240) = 10240
write(4, "ccept this License.  Therefore, "..., 8248) = 8248
close(4)                                = 0
utime("zgv-5.2/COPYING", [2000/12/29-20:21:16, 1994/08/23-16:09:05]) = 0
chown32("zgv-5.2/COPYING", 500, 100)    = 0
write(1, "-rw--- rus/users  1077 1"..., 72-rw--- rus/users  1077 1998-08-26 09:24:31 zgv-5.2/README.fonts
) = 72
open("zgv-5.2/README.fonts", O_WRONLY|O_CREAT|O_EXCL|O_LARGEFILE, 0600) = 4
write(4, "The copyright for *.bdf (taken f"..., 1024) = 1024
read(3, "\"as\nis\" without express or impli"..., 10240) = 8192
--- SIGCHLD (Child exited) ---
--- SIGSEGV (Segmentation fault) ---
+++ killed by SIGSEGV +++
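
A stripped-down fork-and-wait check along the following lines might help separate the kernel behaviour from anything specific to tar or nslookup. It is only a sketch of the pattern (parent forks, child exits, parent reaps it), not a confirmed reproducer, and the function name is made up for illustration.

```python
import os


def fork_and_reap(child_exit_code=0):
    """Fork a child that exits at once; return the exit code the parent collects."""
    pid = os.fork()
    if pid == 0:
        os._exit(child_exit_code)  # child: leave immediately, skip cleanup handlers
    _, status = os.waitpid(pid, 0)  # parent: block until the child has exited
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return None  # child was killed by a signal instead of exiting


if __name__ == "__main__":
    print("parent survived, child exit code:", fork_and_reap(0))
```

On a healthy kernel the parent prints its message and exits normally; if the parent is killed around the time the child exits, that would match the SIGCHLD followed by SIGSEGV seen in the strace output above.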

Ideas, anyone?

 -Byron

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]


Re: ATI Mach 64 (2.4.0-test12)

2000-12-23 Thread Byron Stanoszek

On Sat, 23 Dec 2000, Rok Pergarec wrote:

> Hello,
> 
> I have problems with the ATI Mach 64 (Rage 2) video card. After a boot, I
> get just a blank screen with a few vertical lines, but the system boots up
> normally because I can reboot the machine anyway. I don't get a single
> error in compiling.

Try configuring just standard 'VGA text console' and disabling 'Video mode
selection support'. Don't configure any special ATI choices either (or frame
buffering), and see if it boots up then. The Mach64 should have standard VGA
capabilities and should be compatible with every other vga card on bootup.

If it still doesn't work, send me your /usr/src/linux/.config file so I can see
what you have configured.

Regards,
 Byron

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]




Re: 2.4.0-test12 randomly hangs up

2000-12-13 Thread Byron Stanoszek

On Wed, 13 Dec 2000, Lukasz Trabinski wrote:

> In article <[EMAIL PROTECTED]> you wrote:
> 
> > I can (re)confirm that. I work several hours on console without any
> > problem ... then I start X session and after several minutes system
> > hangs.
> 
> I can confirm that, too.
> Today, two different machines crashed:
> One: AMD-K6 3D, 300 MHz, RH 7.0 + updates, 64MB RAM
> Second one: AMD Athlon 600, 600MHz, with 128MB RAM, RH 7.0+updates
> 
> > Red Hat 7.0, XFree-3.3.6 (SVGA server), S3Virge/G2 (4MB)

I've been running 2.4.0-test12 patched with only the O_SYNC bug fix and I have
_not_ experienced any lockups on this machine.

Classic Athlon 825(750) MHz, 128MB Ram,
RH 7.0 based w/glibc 2.2, XFree-3.3.6 (S3 Trio 64 accel server), gcc 2.95.2

Not sure what the problem is yet... keep trying folks. :)

 -Byron

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]




Re: Trashing ext2 with hdparm

2000-12-06 Thread Byron Stanoszek

On Wed, 6 Dec 2000, Udo A. Steinberg wrote:

> Hi,
> 
> Following the discussion in another thread where someone
> reported fs corruption when enabling DMA with hdparm, I've
> played around with hdparm and found that even the rather
> harmless hdparm operations are capable of trashing an ext2
> filesystem quite nicely.
> 
> hdparm version is 3.9
> 
> hdparm -tT /dev/hdb1 does the trick here.
> 
> After that, several files are corrupted, such as /etc/mtab.
> Reboot+fsck fixes the problem, however e2fsck never finds
> any errors in the fs on disk.
> 
> I'm quite sure that earlier kernel versions didn't exhibit
> this kind of behaviour, although I'm not quite sure at
> which point things started to break. I have test12-pre6
> here atm, but I have test-11 still lying around and will
> test that in a bit.

I've seen this behavior on test-6 and up. I think it has something to do with
a problem in shared memory which is used by the 'hdparm -tT' code snippet. I
believe it munges over a lot of the memory segments that contain cached disk
files (the common ones accessed, such as /etc/mtab and /etc/utmp, etc.). When
looking at the contents of those files, the memory is obtained from the cache
and they appear bogus, but on disk they are still correct.

If this problem occurs, it's best to hit the reset button so that no 'bad' data
is written back to disk during a sync call.

Can anyone else verify that the problem is in shared memory and not the disk
caching layer?

Regards,
 Byron

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]




Re: major problem with linux-2.4.0test10

2000-11-08 Thread Byron Stanoszek

On Wed, 8 Nov 2000, J.D. Hollis wrote:

> I've been having a problem with memory since 2.4.0test9...for some reason,
> shortly after my system boots, my hard drive begins to seek rapidly for no
> apparent reason, and suddenly about 150MB of my 256MB RAM is filled up with
> something gtop labels 'Other.'  whatever's filling my memory isn't attached
> to any process.   the one time I managed to get gtop open while the hard
> drive was seeking, I noticed that kflushd was using about 98% of my
> processor (an Athlon 900MHz).  I'm running Redhat 7.0 (although I have no
> idea whether that makes a difference).

It's probably a program called 'sa' that anacron starts up right after boot
(on RH 7.0). It's a system accounting program that starts up and scans the
entire drive looking for stuff, which fills up cache in RAM.

I generally disable all anacron stuff and remove /etc/cron.??* and the
daily/weekly/monthly entries in /etc/crontab, then I run 
'/etc/rc.d/init.d/crond restart'. That oughta fix it, but you might want to
look into the cron scripts individually and selectively remove the lines you
don't want. :-)

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]




Re: test10-pre5 -- Compile error drivers/video/video.o: In function`vesafb_set_disp': undefined references

2000-10-24 Thread Byron Stanoszek

On Tue, 24 Oct 2000, Miles Lane wrote:

> James, I tried something even more drastic than running
> make mrproper.  I blew away my old source tree, untarred
> a test9 tree, patched it to test10-pre5, copied my old .config
> file into it, ran make oldconfig menuconfig dep all install
> modules modules_install.  When I ran menuconfig, I set the
> options as I mentioned in my first message.  I still got the
> errors.

I sometimes have better luck when splitting the make commands into three:
 make clean
 make dep
 make bzImage

Try that and see if it works any better.

Regards,
 Byron

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]




Re: OOM Test Case - Failed!

2000-10-21 Thread Byron Stanoszek

On Sat, 21 Oct 2000, Rik van Riel wrote:

> > The oom killer avoided killing your busy, large, root-owned
> > process. Don't run gcc compiles as root.  Protecting root
> > processes is an explicit design goal here.
> 
> Also:
> 
> 1) his system pretty much continued to run
> 2) since only httpd children got killed, no work
>    was lost

The system ran, but nothing moved. No process was able to do any activity,
because they were all waiting on swapped out space or waiting to use more
as-of-yet unallocated virtual memory. I could verify this because one of
my daemons writes one line to disk every 5 minutes. That stopped completely
during this event.

> (only the fact that he ran genattrtab as root screwed
> up things a bit and kept the system from killing the
> task -- but probably only just)

If I had known, I would have done otherwise.

 -Byron

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]




OOM Test Case - Failed!

2000-10-17 Thread Byron Stanoszek

I am very unimpressed with the current OOM killer. After 10 days of uptime, I
decided to try compiling gcc again, the very culprit that killed my last system
under 2.4.0-test8 on Friday night (I was unable to reset the system until
Monday morning).

GCC started compiling normally, until it reached the command:
  ./genattrtab ../../gcc/config/i386/i386.md > tmp-attrtab.c

At this time, genattrtab started to accumulate 70+ Megabytes of memory. For
comparison, I only have 32MB of RAM and 64MB of swap space. Also during this
time were several daemon and user-level programs running, using at most 4MB
of ram each and running peacefully in the background.

The system slowed down to a crawl. 5 minutes later, the OOM killer finally
kicked in and killed 5 processes: httpd. I figure, okay, httpd doesn't need
to run, I'd rather give the GCC-compilation the extra RAM it needs to finish
its 'genattrtab' program.

10 minutes pass and the system does not get better. Then all of a sudden, the
console flashes with more httpd processes killed. "What is going on here," I
thought to myself. There were only 6 httpd processes running when I first
started the compilation. It appears that the OOM killer destroyed only the
children of the Apache web daemon, and not the daemon itself! The web daemon
just spawned more httpd processes to fill in the children that it lost earlier.

Meanwhile, genattrtab continued to consume RAM in the background. After 10 more
minutes of waiting on the OOM killer, I come back to a console that is filled
with 'Killing process httpd' messages. It never had the bright idea to kill
the parent or any process OTHER than httpd.

The expected process to kill here would be ./genattrtab, which at the time was
consuming more RAM than available and had only started 25 minutes prior...

root  1099 63.6 61.5 71424 18740 pts/0   R    09:39   1:22 ./genattrtab

This was my first OOM killer test, run on 2.4.0-test9-final with Rik's VM
patches that went into test10-pre1. My prognosis is that the VM runs almost 2x
as fast when there is memory available and swapping occurs, compared to the
old VM. However, when memory runs out, it takes up to 5 minutes for the OOM
killer to start killing processes, and does a bad job at that.

Granted, the random OOM killer in 2.2 was better at its job than this because
it brought back a usable system. Even something that killed the process that's
using the most RAM or the process that allocates the most space in a set period
of time would be good in this case. We need to decide on a better algorithm,
albeit simple, that will alleviate this problem before 2.4.0 final comes out.
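
As a rough illustration of the simplest policy mentioned above (kill whatever
is using the most memory), here is a minimal user-space sketch. The structure
and function names are hypothetical and are not kernel code; skipping PID 1 is
an added assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-process record; field names are illustrative only. */
struct proc_info {
    int pid;
    unsigned long vsz_kb;   /* virtual size in KB, like the ps VSZ column */
};

/* Return the index of the process with the largest VSZ, or -1 if there is
 * no candidate. PID 1 is never chosen (an assumption of this sketch). */
int pick_largest(const struct proc_info *procs, size_t n)
{
    int victim = -1;

    assert(procs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (procs[i].pid == 1)
            continue;
        if (victim < 0 || procs[i].vsz_kb > procs[victim].vsz_kb)
            victim = (int)i;
    }
    return victim;
}
```

With a table holding a few 4MB daemons and one 71424 KB genattrtab entry, this
picks genattrtab rather than any of the small processes.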

Regards,
 Byron

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]




CDROMPLAYTRKIND problems (as of 2.4.0-test8)

2000-10-14 Thread Byron Stanoszek

This change seems to have broken playing a track index:

  3.11 Jun 12, 2000 - Jens Axboe <[EMAIL PROTECTED]>
  -- Reinstate "correct" CDROMPLAYTRKIND

The way it's done is very bad for IDE drives (at least). In the cdrom_play_audio
function in ide-cd.c, we see:

  pc.c[0] = GPCMD_PLAY_AUDIO_10;
  put_unaligned(cpu_to_be32(lba_start), (unsigned int *) &pc.c[2]);
  put_unaligned(cpu_to_be16(lba_end - lba_start), (unsigned int *) &pc.c[7]);

Problem is, the length (lba_end - lba_start) on most CDs is much, much higher
than what 16 bits can store. Therefore, when playing audio, the CDROM stops
playing at roughly 12-minute intervals throughout the CD. Sometimes a track
will only play its first few seconds, if it begins right before sector 65535.

I would like to propose switching back to using the scsi CDROMPLAYTRKIND on ide
drives (even though most don't support it), or choose a better method of
playing a track so it'll play to the end of the cd. (Is there a 32-bit version
for [length] with PLAY_AUDIO_10?)
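
To make the truncation concrete, here is a small stand-alone sketch (not the
ide-cd.c code; the function names are made up for illustration). It assumes
the standard audio CD rate of 75 frames (sectors) per second and shows what
survives when a frame count is squeezed into a 16-bit length field.

```c
#include <assert.h>
#include <stdint.h>

#define CD_FRAMES_PER_SECOND 75  /* audio CD frames (sectors) per second */

/* Length as it would end up in a 16-bit field: the high bits are lost. */
uint16_t play10_length_field(uint32_t lba_start, uint32_t lba_end)
{
    assert(lba_end >= lba_start);
    return (uint16_t)(lba_end - lba_start);
}

/* Longest span, in whole seconds, that a 16-bit frame count can express. */
unsigned max_play10_seconds(void)
{
    return UINT16_MAX / CD_FRAMES_PER_SECOND;
}
```

Under these assumptions a request to play frames 0 through 300000 is stored
as 37856 frames (about 8 minutes instead of more than an hour), and no single
command of this form can cover more than 873 seconds, about 14.5 minutes.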

-Byron

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]





Re: Updated 2.4 TODO List -- new addition WAS(test9 PCI

2000-10-11 Thread Byron Stanoszek

On Wed, 11 Oct 2000, Alan Cox wrote:

> The only case that 2.95 was at fault is strstr() miscompiles which 2.96
> snapshots actually got right.

I couldn't get llabs() to compile correctly either on 2.95.2. There were other
small problems when using long long, but I can't remember them right now.

 -Byron

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]




Re: [RFC] New ideas for the OOM handler

2000-10-09 Thread Byron Stanoszek

On Mon, 9 Oct 2000 [EMAIL PROTECTED] wrote:

> how about registering the full path (or inode number of the executable?),
> the owner, and an optional high water mark of memory consumption, over
> which the process is considered to be leaking memory and gets added to the
> algorithm of processes to kill?  this is because while normally i want to
> ignore syslogd, if syslogd is consuming >20MB then it probably has sprung
> a leak.

Now we're getting somewhere! :)

Inode number would have to store disk maj:min too, and have the ability to
flush inodes stored for a particular disk when mounted/umounted. This might
cause a problem for one-time writes to a /proc entry that aren't refreshed when
the fs is [re]mounted.

Also, if you upgrade 'named' and replace the old executable, that inode entry
in /proc no longer makes sense. It's almost too hard to maintain by storing
the inode, unfortunately.

I like the idea for adding a high MB mark at which to kill the process (only
when OOM is kicked in). That can be added in addition to a priority scheme.

I envision a /proc format such as this, one entry per line:

/full/path/to/executable [priority] [max kbytes used]

where Priority is -20 to +20 (default 0), -20 being kill at last resort and +20
being try to kill these processes first.

max kbytes, if present and nonzero, would match this process only if its total
VSZ is >= kb.

An example file could be:

/sbin/init -20
/usr/sbin/syslogd -19
/usr/sbin/named -18
/usr/local/apache/bin/httpd -15
/usr/lib/netscape/netscape +15
/home/byron/daemonprogram -5 32768

Under the rule I described (below), /sbin/init is practically invalid because
adding that entry to the /proc file would come AFTER /sbin/init was first
executed.

httpd would be killed with greater priority than named and syslogd (assuming
VSZ was the same), in case of a memory leak. Of course, you might not want
this, and you can easily change it.

On the other hand, if my daemonprogram starts taking up more than 32 meg, then
kill it first before killing the other daemons (only when system is running low
on memory).
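
A minimal sketch of how one line of that format could be parsed and matched,
written as ordinary user-space C. The structure and function names are
hypothetical; nothing like this exists in the kernel, and the range check on
the priority is my reading of the -20 to +20 rule above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define OOM_PATH_MAX 256

/* One parsed line of the proposed file (hypothetical structure). */
struct oom_entry {
    char path[OOM_PATH_MAX];
    int priority;           /* -20 (kill last) .. +20 (kill first), default 0 */
    unsigned long max_kb;   /* 0 means "no size threshold" */
};

/* Parse "path [priority] [max_kb]". Returns 0 on success, -1 on a bad line. */
int oom_parse_line(const char *line, struct oom_entry *e)
{
    int prio = 0;
    unsigned long kb = 0;
    int n;

    assert(line != NULL && e != NULL);
    n = sscanf(line, "%255s %d %lu", e->path, &prio, &kb);
    if (n < 1 || e->path[0] != '/' || prio < -20 || prio > 20)
        return -1;          /* require a full path and a sane priority */
    e->priority = prio;
    e->max_kb = kb;
    return 0;
}

/* An entry applies if the path matches and, when a threshold is set,
 * the process VSZ has reached it. */
int oom_entry_applies(const struct oom_entry *e, const char *exe_path,
                      unsigned long vsz_kb)
{
    if (strcmp(e->path, exe_path) != 0)
        return 0;
    return e->max_kb == 0 || vsz_kb >= e->max_kb;
}
```

With the example file above, the daemonprogram entry parses to priority -5
and a 32768 KB threshold, so it only applies once that process has grown to
32 meg or more.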

> it also might be good to have options to kill anything connected to a pty
> first, and to not kill anything attached to the console.  obviously these
> leave ways for admins to shoot themselves in the foot, but they could be
> useful.

I _had_ thought of that, but I don't know how clear that is in the process
structure. Malicious users can simply run setsid() to detach from a controlling
tty, thereby defeating the rule.

> also ignoring or including based on users and groups would be good (kill
> all the student logins, keep all the operator logins, or whatever...).

This would almost be in the same category as the 32-bit UIDs, where there are
TONS of people using a server or network cluster where memory shortage might
actually be a problem. I think it is a good route to explore.

> in general i like the idea of tunable OOM killing, since no one OOM killer
> suits everyone.

Agreed.

> > > What about a user-defined list of "wishes"? The administrator should be
> > > enabled to enforce that specific processes are to be terminated only as a
> > > last resource (syslogd), or that they should be killed first (netscape).
> > > Could that be done using some /proc interface - some lines, each
> > > containing a program name, and a modifier for the killing priority?
> > 
> > echo "init" > /proc/sys/oom-ignore
> > echo "httpd" >> /proc/sys/oom-ignore
> > echo "parallel-fft" >> /proc/sys/oom-ignore
> >  etc...
> > 
> > This is a very workable option. It allows the admin to define what is
> > "important" on his computer and tells the OOM killer to terminate at
> > last resort (or ignore completely).
> > 
> > I like it.  Rik, what do you think?
> 

> On Mon, 9 Oct 2000, Matthew Dharm wrote:
> 
> > On Mon, Oct 09, 2000 at 09:25:38PM -0400, Byron Stanoszek wrote:
> > > echo "init" > /proc/sys/oom-ignore
> > > echo "httpd" >> /proc/sys/oom-ignore
> > > echo "parallel-fft" >> /proc/sys/oom-ignore
> > >  etc...
> >
> > I'd be concerned with the security implications of this feature... after
> > all, I can edit argv[0] to change the name of my program -- a malicious
> > application could do this to attempt to "hide" from the OOM Killer.
> 
> You have a point. How about specifying the complete path in the /proc file,
> so that a special bit is set on the process when it is executed.
> 
> echo "/sbin/init" > /proc/sys/oom-ignore
> 
> User does an execvp("init", ...)
> 
> At this time, the user's path is searched for a match for "init".
> The kernel d

Re: [RFC] New ideas for the OOM handler

2000-10-09 Thread Byron Stanoszek

On Mon, 9 Oct 2000, Matthew Dharm wrote:

> On Mon, Oct 09, 2000 at 09:25:38PM -0400, Byron Stanoszek wrote:
> > echo "init" > /proc/sys/oom-ignore
> > echo "httpd" >> /proc/sys/oom-ignore
> > echo "parallel-fft" >> /proc/sys/oom-ignore
> >  etc...
> 
> I'd be concerned with the security implications of this feature... after
> all, I can edit argv[0] to change the name of my program -- a malicious
> application could do this to attempt to "hide" from the OOM Killer.

You have a point. How about specifying the complete path in the /proc file,
so that a special bit is set on the process when it is executed.

echo "/sbin/init" > /proc/sys/oom-ignore

User does an execvp("init", ...)

At this time, the user's path is searched for a match for "init".
The kernel determines "/sbin/init" is the correct path name and begins
to execute the program. It creates the process structure.

Then, before starting program execution, it compares the full path with the
list of entries in the /proc file. Seeing a match, it sets a bit (or a
variable kill-rate) on the process structure. OOM reads this process structure
in at kill-time and instantly determines which it should kill quickly or stay
away from. If none apply, default to standard algorithm.
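
The steps above could look roughly like this, simulated in user space. The
task structure, the flag bit and both function names are hypothetical, and
"largest VSZ" merely stands in for whatever the standard algorithm is.

```c
#include <assert.h>
#include <string.h>

#define OOM_PROTECT 0x1   /* hypothetical flag bit on the process structure */

/* Hypothetical stand-in for the kernel's process structure. */
struct task {
    const char *exe_path;   /* full path resolved at exec time */
    unsigned flags;
    unsigned long vsz_kb;
};

/* Called once at exec: tag the task if its resolved path is on the list. */
void oom_tag_at_exec(struct task *t, const char *const *list, size_t n)
{
    assert(t != NULL && t->exe_path != NULL);
    t->flags &= ~OOM_PROTECT;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(t->exe_path, list[i]) == 0) {
            t->flags |= OOM_PROTECT;
            return;
        }
    }
}

/* At kill time only the flag is consulted; unprotected tasks fall back to
 * a placeholder rule (largest VSZ). Returns an index, or -1 if none. */
int oom_pick(const struct task *tasks, size_t n)
{
    int victim = -1;

    for (size_t i = 0; i < n; i++) {
        if (tasks[i].flags & OOM_PROTECT)
            continue;
        if (victim < 0 || tasks[i].vsz_kb > tasks[victim].vsz_kb)
            victim = (int)i;
    }
    return victim;
}
```

Because the comparison uses the resolved path rather than argv[0], a program
that merely calls itself "init" from some other directory is not tagged.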

 -Byron

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]




Re: [RFC] New ideas for the OOM handler

2000-10-09 Thread Byron Stanoszek

On Tue, 10 Oct 2000, Jochen Striepe wrote:

> Hi, a question regarding the OOM process killer...
> 
> Hmm, sometimes daemon-like processes (e.g. web servers) only need root
> privileges to open a network port<1024 - you may start them as non-root
> if they do not need such a privileged port. Might be hard to sort them
> out...

That is very true. I run several daemons in non-superuser mode that would
become the victim of that.

This reminds me of an earlier post where I discussed that CPU Time should not
be factored in. I may have misunderstood (my apologies to Rik). What is more
important is how long the process has been running (daemons usually get started
first thing at bootup, versus running 'netscape' and shortly using up all
memory in 30 minutes).

Figuring in Time Since Process Creation can almost be a misguided way of doing
things. I don't know how many times I've restarted or upgraded 'named' on my
90-day-uptime system, just to change the configuration file or to back out
invalid serial numbers in DNS zone files.

However, if I have been running a 99% cpu-intensive mathematical modeling
program for the past 10 days, I wouldn't want it to get killed because it
allocated 50 MB in the first second of its life when the system had 250MB to
spare.

On the other hand, what if I started up netscape on day 1, ran tons of other
processes meanwhile, and only used netscape lightly day by day? After 90 days of light
usage, netscape might actually be using 150MB of ram now. Does netscape
(rightfully) get killed, or does my modeling program which only uses 50MB get
killed because I started it only 10 days ago instead of 90 days?

Neither process start date nor CPU Usage % can correctly detect which process
to kill over a period of 10 to 90 days, in this scenario. This is why I don't
like factoring in these two elements. The OOM killer will get this right 90% of
the time, maybe even 95%. But what about the sequential 'child worker' that was
forked off of the modeling program once every 5 hours?

I think there should be a better solution.

> What about a user-defined list of "wishes"? The administrator should be
> enabled to enforce that specific processes are to be terminated only as a
> last resource (syslogd), or that they should be killed first (netscape).
> Could that be done using some /proc interface - some lines, each
> containing a program name, and a modifier for the killing priority?

echo "init" > /proc/sys/oom-ignore
echo "httpd" >> /proc/sys/oom-ignore
echo "parallel-fft" >> /proc/sys/oom-ignore
 etc...

This is a very workable option. It allows the admin to define what is
"important" on his computer and tells the OOM killer to terminate at
last resort (or ignore completely).

I like it.  Rik, what do you think?

> Just a thought. Hope my English is not too bad to make my thoughts
> clear... Sorry if this was discussed on l-k before - I do not have the
> time to read each posting on the list.
> 
> 
> Greetings from Germany,
> 
> Jochen Striepe.
> 
> 

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]




[RFC] New ideas for the OOM handler

2000-10-09 Thread Byron Stanoszek
 with no virtual memory should not be touched -- Kernel threads.

Additional ideas:

I thought of some additional ways of determining which process gets killed
first, prioritized on the above criteria:

 1. Keep a count of the number of sbrk() memory regions in terms of size for
each process. The count should not be a recent total or moving average kept
for the past 5-10 minutes, but instead it should be a ratio relative to the
size of sbrk() requests of other processes. This quickly determines which
process is eating up memory the fastest. 99 out of 100 times this will be a
runaway process, an evil malloc(), or an overly abusive user. At times like
these, the user will 'expect' the program to crash with a SEGV anyway.

 2. Short of marking a process a "System Process", we want to keep programs
like X or Svgalib from crashing. In this manner, I agree with the person
who said programs that have I/O Ports or devices open should be one of the
last to kill.

Also, if such processes DO get killed, we want them to return the user into
a usable state where they can interact with the computer. In all OOM
killers 2.2 and up, killing X with sig 9 is a _bad_ idea. With all due
respect, we should be killing these processes with SIGSEGV instead of
SIGKILL to give programs a chance to cool down. However, when the OOM
killer kicks in there might not be enough memory free for even a printk()
let alone a core dump. It should be possible to reserve memory for handling
OOM situations (for instance, kick in OOM when there is 64kb of memory free
and no less). Chances are the program will just crash due to default signal
handling. But if the program catches SEGV and does nothing about it, then
when 0kb of memory becomes free, completely terminate the program.

This, of course, should only happen when swap is something like 95% full
and the program isn't almost entirely swapped out. We should also set a
flag to Never dump core. We should leave enough space on the swap partition
for memory to get swapped out to disk (and program memory swapped in) to
let a signal handler do its job. I think using 100% swap is a bad idea.
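
Idea 1 above could be sketched like this: compare each process's heap growth
with the growth of all processes together and single out the one with the
largest share. This is a user-space illustration only; the accounting
structure is hypothetical and no such per-process counter is assumed to exist
in the kernel today.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-process accounting; not an existing kernel structure. */
struct heap_stat {
    int pid;
    unsigned long brk_growth_kb;  /* heap growth requested through sbrk() */
};

/* Return the index of the process responsible for the largest share of all
 * heap growth and store that share (in percent) in *share_pct.
 * Returns -1 if nothing has grown at all. */
int fastest_grower(const struct heap_stat *s, size_t n, unsigned *share_pct)
{
    unsigned long long total = 0;
    int worst = -1;

    assert(share_pct != NULL);
    for (size_t i = 0; i < n; i++) {
        total += s[i].brk_growth_kb;
        if (worst < 0 || s[i].brk_growth_kb > s[worst].brk_growth_kb)
            worst = (int)i;
    }
    if (total == 0) {
        *share_pct = 0;
        return -1;
    }
    *share_pct = (unsigned)(s[worst].brk_growth_kb * 100ULL / total);
    return worst;
}
```

A runaway process that accounts for, say, 95% of all recent heap growth
stands out immediately, without needing a moving average over time.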

These are all ideas and suggestions, and I expect most to be flamed out quick.
I wrote this to get people thinking about how we could improve our current OOM
killer and kill the 'right' programs instead of vital system daemons, without
leaving our machine idle for 5 minutes while the OOM killer tries to think of
what to kill next, either because the program is ignoring SIGTERM or there is
100% swap space used.

All in all, the OOM killer we have now is much better than the 2.2 version and
works very well for its intended purpose. These are the types of ideas I would
toss around if I were to implement the killer myself. Keeping it from being too
complicated is the hard part. So, having said the above, elaborate on these
ideas to see if we can _really_ improve our OOM and if it is worth the trouble
doing so.

I however suggest strongly that we implement the check for PID == 1 into the
current OOM and toss out checking for Nice status, which makes no real sense
(see my last post, and the posts for several others).

 -Byron

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-09 Thread Byron Stanoszek

On Mon, 9 Oct 2000 [EMAIL PROTECTED] wrote:

> Anyway, there is/was an API in PTX to say (either from in-kernel or through
> some user machinations) "I Am a System Process".  Turns on a bit in the
> proc struct (task struct) that made it exempt from death from a variety
> of sources, e.g. OOM, generic user signals, portions of system shutdown,
> etc.

The current OOM killer does this, except for init. Checking to see if the
process has a page table is equivalent to checking for the kernel threads that
are integral to the system (PIDs 2-5). These will never be killed by the OOM.
Init, however, still can be killed, and there should be an additional statement
that doesn't kill if PID == 1.
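
A minimal sketch of the filter being asked for, written as a user-space toy rather than kernel code. The `has_mm` field stands in for "the process has a page table", and the task names and PIDs are sample data invented for the example.

```python
from collections import namedtuple

# has_mm stands in for "has its own page tables"; kernel threads do not.
Task = namedtuple("Task", "pid name has_mm")


def oom_candidates(tasks):
    """Drop kernel threads (no page table) and init (PID 1) from the victims."""
    return [t for t in tasks if t.has_mm and t.pid != 1]


# Sample process table; the entries are illustrative only.
tasks = [
    Task(1, "init", True),
    Task(2, "kswapd", False),
    Task(3, "bdflush", False),
    Task(412, "httpd", True),
    Task(977, "netscape", True),
]
survivors = [t.name for t in oom_candidates(tasks)]
```

Without the `t.pid != 1` test, init passes the page-table check like any other user process, which is the gap pointed out above.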

I think we need to sit down and write a better OOM proposal, something that
doesn't use CPU time and the NICE flag. Let's concentrate our efforts on what
constitutes a good selection method instead of bickering with each other.

How about we start by having everyone in this discussion give their opinion on
what the OOM selection process should do, listing the criteria in order of both
importance and severity and giving a rationale for each choice. Maybe then we
can get somewhere.

 -Byron

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]




Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-09 Thread Byron Stanoszek

On Mon, 9 Oct 2000, Marco Colombo wrote:

> On Fri, 6 Oct 2000, Rik van Riel wrote:
> 
> [...]
> > They are niced because the user thinks them a bit less
> > important. 
> 
> Please don't, this assumption is quite wrong. I use nice just to be
> 'nice' to other users. I can run my *important* CPU hog simulation
> nice +10 in order to let other people get more CPU when the need it.
> But if you put the logic "niced == not important" somewhere into the
> kernel, nobody will use nice anymore. I'd rather give a bonus to niced
> processes.
> 
> I agree this is a small issue, the OOM killer job isn't "nice" at all
> anyway. B-) (at OOM time, I'd not even look at the nice of a process at
> all. But my point here is that you do, and you take it as an hint for
> process importance as percieved by the user that run it, and I believe
> it's just wrong guessing).

I agree completely. Friday night I had a talk with a few others at the office,
and we all came to a consensus that the 'nice' value really shouldn't be a
factor in determining which process gets killed first. The primary point was
that 'nice' is most commonly used for background tasks that are meant to run
hidden and unseen at low priority. It would be extremely upsetting if a user
decided to log in and browse 50 picture-intensive pages with netscape,
racking up memory over time, and the OOM killer then zapped the
peaceful, 'nice' process in the background that wasn't causing any harm.

Why else would you nice a process? Because you don't want it to interfere with
normal cpu usage by those that normally use the system. You expect that process
to still be running at the end of the day when everyone's gone home.

Regards,
 Byron

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]




Re: [RFC] New ideas for the OOM handler

2000-10-09 Thread Byron Stanoszek

On Tue, 10 Oct 2000, Jochen Striepe wrote:

> Hi, a question regarding the OOM process killer...
>
> Hmm, sometimes daemon-like processes (e.g. web servers) only need root
> privileges to open a network port <1024 - you may start them as non-root
> if they do not need such a privileged port. Might be hard to sort them
> out...

That is very true. I run several daemons in non-superuser mode that would
become the victim of that.

This reminds me of an earlier post where I discussed that CPU Time should not
be factored in. I may have misunderstood (my apologies to Rik). What is more
important is how long the process has been running (daemons usually get started
first thing at bootup, versus running 'netscape' and shortly using up all
memory in 30 minutes).

Figuring in Time Since Process Creation can almost be a misguided way of doing
things. I don't know how many times I've restarted or upgraded 'named' on my
90-day-uptime system, just to change the configuration file or to back out
invalid serial numbers in DNS zone files.

However, if I have been running a 99% cpu-intensive mathematical modeling
program for the past 10 days, I wouldn't want it to get killed because it
allocated 50 MB in the first second of its life when the system had 250MB to
spare.

However, what if I started up netscape on day 1, ran tons of other processes
meanwhile, and only used netscape lightly day by day. After 90 days of light
usage, netscape might actually be using 150MB of ram now. Does netscape
(rightfully) get killed, or does my modeling program which only uses 50MB get
killed because I started it only 10 days ago instead of 90 days?

Neither process start date nor CPU Usage % can correctly detect which process
to kill over a period of 10 to 90 days, in this scenario. This is why I don't
like factoring in these two elements. The OOM killer will get this right 90% of
the time, maybe even 95%. But what about the sequential 'child worker' that was
forked off of the modeling program once every 5 hours?

I think there should be a better solution.

> What about a user-defined list of "wishes"? The administrator should be
> enabled to enforce that specific processes are to be terminated only as a
> last resource (syslogd), or that they should be killed first (netscape).
> Could that be done using some /proc interface - some lines, each
> containing a program name, and a modifier for the killing priority?

echo "init" > /proc/sys/oom-ignore
echo "httpd" > /proc/sys/oom-ignore
echo "parallel-fft" > /proc/sys/oom-ignore
 etc...

This is a very workable option. It allows the admin to define what is
"important" on his computer and tells the OOM killer to terminate at
last resort (or ignore completely).

I like it.  Rik, what do you think?

> Just a thought. Hope my English is not too bad to make my thoughts
> clear... Sorry if this was discussed on l-k before - I do not have the
> time to read each posting on the list.
>
>
> Greetings from Germany,
>
> Jochen Striepe.

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]




Re: [RFC] New ideas for the OOM handler

2000-10-09 Thread Byron Stanoszek

On Mon, 9 Oct 2000, Matthew Dharm wrote:

> On Mon, Oct 09, 2000 at 09:25:38PM -0400, Byron Stanoszek wrote:
> > echo "init" > /proc/sys/oom-ignore
> > echo "httpd" > /proc/sys/oom-ignore
> > echo "parallel-fft" > /proc/sys/oom-ignore
> >  etc...
>
> I'd be concerned with the security implications of this feature... after
> all, I can edit argv[0] to change the name of my program -- a malicious
> application could do this to attempt to "hide" from the OOM Killer.

You have a point. How about specifying the complete path in the /proc file,
so that a special bit is set on the process when it is executed.

echo "/sbin/init" > /proc/sys/oom-ignore

User does an execvp("init", ...)

At this time, the user's path is searched for a match for "init".
The kernel determines "/sbin/init" is the correct path name and begins
to execute the program. It creates the process structure.

Then, before starting program execution, it compares the full path with the
list of entries in the /proc file. Seeing a match, it sets a bit (or a
variable kill-rate) on the process structure. OOM reads this process structure
in at kill-time and instantly determines which it should kill quickly or stay
away from. If none apply, default to standard algorithm.
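
The exec-time check outlined above can be sketched in user space as follows. Everything here is hypothetical: `resolve` stands in for the PATH search behind execvp(), the `existing` set stands in for the filesystem, and the `oom_ignore` flag is the proposed bit on the process structure, not an existing field.

```python
def resolve(program, search_path, existing):
    """Mimic an execvp()-style search over a list of PATH directories.

    existing is a set of full paths standing in for the filesystem.
    """
    if "/" in program:
        return program if program in existing else None
    for directory in search_path:
        candidate = directory.rstrip("/") + "/" + program
        if candidate in existing:
            return candidate
    return None


def exec_program(program, search_path, existing, protected_paths):
    """Return a process record whose flag is decided once, at exec time."""
    full_path = resolve(program, search_path, existing)
    if full_path is None:
        raise FileNotFoundError(program)
    # The flag depends on the resolved path, not on the name the program
    # chooses to show in argv[0].
    return {"path": full_path, "oom_ignore": full_path in protected_paths}


files = {"/sbin/init", "/home/mallory/bin/init"}
protected = {"/sbin/init"}

real = exec_program("init", ["/sbin", "/bin"], files, protected)
fake = exec_program("init", ["/home/mallory/bin"], files, protected)
```

Here `real` ends up flagged and `fake` does not, even though both were started as "init", which is the property that answers the argv[0] concern.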

 -Byron

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]




Re: [RFC] New ideas for the OOM handler

2000-10-09 Thread Byron Stanoszek

On Mon, 9 Oct 2000 [EMAIL PROTECTED] wrote:

> how about registering the full path (or inode number of the executable?),
> the owner, and an optional high water mark of memory consumption, over
> which the process is considered to be leaking memory and gets added to the
> algorithm of processes to kill?  this is because while normally i want to
> ignore syslogd, if syslogd is consuming 20MB then it probably has sprung
> a leak.

Now we're getting somewhere! :)

Inode number would have to store disk maj:min too, and have the ability to
flush inodes stored for a particular disk when mounted/umounted. This might
cause a problem for one-time writes to a /proc entry that aren't refreshed when
the fs is [re]mounted.

Also, if you upgrade 'named' and replace the old executable, that inode entry
in /proc no longer makes sense. It's almost too hard to maintain by storing
the inode, unfortunately.

I like the idea for adding a high MB mark at which to kill the process (only
when OOM is kicked in). That can be added in addition to a priority scheme.

I envision a /proc format such as this, one entry per line:

/full/path/to/executable [priority] [max kbytes used]

where Priority is -20 to +20 (default 0), -20 being kill at last resort and +20
being try to kill these processes first.

max kbytes, if present and nonzero, would match this process only if its total
VSZ is >= kb.

An example file could be:

/sbin/init -20
/usr/sbin/syslogd -19
/usr/sbin/named -18
/usr/local/apache/bin/httpd -15
/usr/lib/netscape/netscape +15
/home/byron/daemonprogram -5 32768

Under the rule I described (below), /sbin/init is practically invalid because
adding that entry to the /proc file would come AFTER /sbin/init was first
executed.

httpd would be killed with greater priority than named and syslogd (assuming
VSZ was the same), in case of a memory leak. Of course, you might not want
this, and you can easily change it.

On the other hand, if my daemonprogram starts taking up more than 32 meg, then
kill it first before killing the other daemons (only when system is running low
on memory).
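
To make the proposed format concrete, here is a small sketch that parses the example file above and picks a victim from a list of processes. It rests on several assumptions the post does not settle: a rule whose size limit is not yet reached is treated as absent (default priority 0), ties are broken by the larger VSZ, and all function names are invented.

```python
EXAMPLE = """\
/sbin/init -20
/usr/sbin/syslogd -19
/usr/sbin/named -18
/usr/local/apache/bin/httpd -15
/usr/lib/netscape/netscape +15
/home/byron/daemonprogram -5 32768
"""


def parse_oom_rules(text):
    """Parse lines of '/full/path [priority] [max kbytes used]'."""
    rules = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        priority = int(fields[1]) if len(fields) > 1 else 0
        max_kb = int(fields[2]) if len(fields) > 2 else 0
        if not -20 <= priority <= 20:
            raise ValueError("priority out of range: " + line)
        rules[fields[0]] = (priority, max_kb)
    return rules


def effective_priority(rules, path, vsz_kb):
    """Priority for one process; 0 when no rule applies to it."""
    priority, max_kb = rules.get(path, (0, 0))
    if max_kb and vsz_kb < max_kb:
        return 0  # assumption: a size-limited rule is ignored below its limit
    return priority


def pick_victim(rules, processes):
    """processes is a list of (path, vsz_kb); highest priority, then largest VSZ."""
    return max(processes,
               key=lambda p: (effective_priority(rules, p[0], p[1]), p[1]))


rules = parse_oom_rules(EXAMPLE)
daemons = [
    ("/usr/sbin/syslogd", 4000),
    ("/usr/sbin/named", 4000),
    ("/usr/local/apache/bin/httpd", 4000),
]
```

With equal VSZ, `pick_victim(rules, daemons)` selects httpd, and adding daemonprogram at 40000 kB makes it the victim ahead of the other daemons. Note that under the fallback assumed here, daemonprogram below 32768 kB gets priority 0, which is higher than -5; a real design would have to say what applies in that case.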

> it also might be good to have options to kill anything connected to a pty
> first, and to not kill anything attatched to the console.  obviously these
> leave ways for admins to shoot themselves in the foot, but they could be
> useful.

I _had_ thought of that, but I don't know how clear that is in the process
structure. Malicious users can simply run setsid() to detach from a controlling
tty, thereby defeating the rule.

> also ignoring or including based on users and groups would be good (kill
> all the student logins, keep all the operator logins, or whatever...).

This would almost be in the same category as the 32-bit UIDs, where there are
TONS of people using a server or network cluster where memory shortage might
actually be a problem. I think it is a good route to explore.

> in general i like the idea of tunable OOM killing, since no one OOM killer
> suits everyone.

Agreed.

> > > What about a user-defined list of "wishes"? The administrator should be
> > > enabled to enforce that specific processes are to be terminated only as a
> > > last resource (syslogd), or that they should be killed first (netscape).
> > > Could that be done using some /proc interface - some lines, each
> > > containing a program name, and a modifier for the killing priority?
> >
> > echo "init" > /proc/sys/oom-ignore
> > echo "httpd" > /proc/sys/oom-ignore
> > echo "parallel-fft" > /proc/sys/oom-ignore
> >  etc...
> >
> > This is a very workable option. It allows the admin to define what is
> > "important" on his computer and tells the OOM killer to terminate at
> > last resort (or ignore completely).
> >
> > I like it.  Rik, what do you think?
>

> On Mon, 9 Oct 2000, Matthew Dharm wrote:
>
> > On Mon, Oct 09, 2000 at 09:25:38PM -0400, Byron Stanoszek wrote:
> > > echo "init" > /proc/sys/oom-ignore
> > > echo "httpd" > /proc/sys/oom-ignore
> > > echo "parallel-fft" > /proc/sys/oom-ignore
> > >  etc...
> >
> > I'd be concerned with the security implications of this feature... after
> > all, I can edit argv[0] to change the name of my program -- a malicious
> > application could do this to attempt to "hide" from the OOM Killer.
>
> You have a point. How about specifying the complete path in the /proc file,
> so that a special bit is set on the process when it is executed.
>
> echo "/sbin/init" > /proc/sys/oom-ignore
>
> User does an execvp("init", ...)
>
> At this time, the user's path is searched for a match for "init".
> The kernel determines "/sbin/init" is the correct path name and begins
> to execute the program. It creates the process structure.
>
> Then, before starting program execution, it compares the full path with the
> list of entries in the /proc file. Seeing a match, it sets a bit (or a
> variable kill-rate) on the process structure. OOM reads this process structure
> in at kill-time and instan

Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

2000-10-06 Thread Byron Stanoszek

On Fri, 6 Oct 2000, Rik van Riel wrote:

> 3. add the out of memory killer, which has been tuned with
>-test9 to be ran at exactly the right moment; process
>selection: "principle of least surprise"  <== OOM handling

In the OOM killer, shouldn't there be a check for PID 1 just to enforce that
INIT will not be the victim? Sure its total_vm might be small, but if there was
a memory leak in the kernel somewhere, it might eventually become the target.
I suppose, if it ever were to become the victim, your system wouldn't be too
usable anyway...

Can you give me your rationale for selecting 'nice' processes as being badder?
Do you think it would be a good idea to scale the amount of badness according
to how nice the process is (a nice value of 20 could get the full *2, otherwise
a smaller multiplier)?
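
One way to read the scaling question above is a linear ramp: no penalty at nice 0 or below, the full doubling at the top of the range. The sketch below assumes that shape and takes the upper end of 20 from the text; the function name and the linear form are illustrative choices, not what the patch does.

```python
def nice_multiplier(nice, max_nice=20):
    """Scale linearly from 1.0 (nice <= 0) up to 2.0 (nice == max_nice)."""
    positive = min(max(nice, 0), max_nice)  # clamp into 0..max_nice
    return 1.0 + positive / max_nice


# A process at nice +10 would get half of the extra penalty.
half_way = nice_multiplier(10)
```

Compared with a flat doubling for any positive nice value, this keeps a lightly niced process almost as safe as an un-niced one.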

How about using the current process priority level instead of niceness? If a
process was deprioritized (or auto-niced) because it was starting to eat up CPU
time, AND its memory is abnormally high, then should that be our #1 victim? We
also don't want to kill things like benchmarks either, but hopefully they
wouldn't start eating up more than the available system memory.

Just some thoughts.

 -Byron

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]




2.4.0-test9-pre2 OOM problem

2000-09-29 Thread Byron Stanoszek

I've been using 2.4.0-test9-pre2 happily for about 2 weeks without any crashes
or other memory problems (I'm using X 3.3.6, so the shared memory stuff didn't
break anything).

I accidentally compiled one of my programs with Electric Fence. The program
does a huge number of mallocs. Within 1 minute of running, it had eaten up
150MB of ram -- 128mb physical ram and 32mb swap. The bad thing was that the
kernel never killed this program, and it proceeded to eat up all available
memory.

For a few minutes, the HD light blinked a few times as the kernel was probably
sorting out which pages to store in swap and which to keep in memory. But after
2 minutes or so, the HD activity ceased and no other program appeared to run.
Using Alt+ScrollLock (I have sysrq disabled), I was able to determine that the
kernel was still flipping through the processes, so it was still active and
hadn't crashed. However, there was no memory available for processes to do
anything, and judging from Ctrl+ScrollLock, most were in the state waiting to
be swapped in.

I'm afraid to try any kernel above pre2 right now because of the OOPSen
reported by many people. I just wanted to bring attention to this problem in
case it hasn't already been fixed.

Regards,
 Byron

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]




2.4.0-test9-pre6 shmem problems revisited

2000-09-23 Thread Byron Stanoszek

On Sun, 24 Sep 2000, Daniel Stone wrote:

> > Somehow i cant help but think this is somehow linked to an OOM problem
> > that has yet to be fixed with the 2.4.0-testX series.   It seems
> > suspiciously like the kernel is killing init when X decides it would be
> > peachy to gobble up all the ram. i dont know of any way to prove
> > this though.
> 
> The problem is most definitely NOT X as I experienced the exact same
> problems and reported it to l-k yesterday; and my box has no trace of X on
> it. gcc and grep take it down though.

I have been running 2.4.0-test9-pre2 for some time now and have not experienced
any deadlock or shared memory problem, except for one instance.

Any time I do a 'hdparm -tT /dev/hda', it really screws up all the memory
segments in the kernel. A subsequent 'df' or 'w' will print either garbage or
'no data in /etc/wtmp' file, etc. So it really looks like the cache is being
messed with. I don't think it affects programs whose ram is already in
userspace.

I've noticed this problem has existed from 2.4.0-test7 on through to
test9-pre2. I do not know if pre6 has the same problem, but I suspect it does.
I'm using hdparm 1.6, and at a quick glance of the code, it does use shared
memory to do its timing runs from the disk. This leads me to believe shm is
very buggy, and perhaps Rik van Riel's latest patches have just brought the
bug into the spotlight, analogous to how the test8 patches uncovered the
truncate problem.

Linus, I think you should hold off a little before removing Rik's VM patches
from the kernel, and let the Linux community spend more time tracking down this
problem.

Regards,
 Byron

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]




Re: [DOC] Debugging early kernel hangs

2000-09-21 Thread Byron Stanoszek

On Fri, 22 Sep 2000, Keith Owens wrote:

> If a kernel hangs early in the boot process (before the console has
> been initialized) then printk is no use because you never see the
> output.  There is a technique for using the video display to indicate
> boot progress so you can localize the problem.  Reporting "my kernel
> hangs during boot at line nnn in routine xyz" is a lot better than "my
> kernel hangs during boot".
> 
> The idea is to write characters direct to the video screen during
> booting using a macro called VIDEO_CHAR.  This macro takes a character
> position and a single character value to be displayed.  Use different
> positions on the screen for different levels of code and use different
> characters in one position to indicate which stage that level is up to.
> For example, with the patch below, the string EAC at hang indicates
> parse_options(), checksetup().

Why not just redirect printk() to output its string one character at a time
using VIDEO_CHAR until the console subsystem is initialized? You can use a
statically defined int to keep track of what row & column you're on. There is
no need to be so cryptic about the readout.

Regards,
 Byron

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]




IDE Troubles - linux-2.4.0-test9-pre2

2000-09-21 Thread Byron Stanoszek

After about 3 days running 2.4.0-test9-pre2 (32mb i586 machine), I switched on
the system console and saw these messages. Nothing seems to be wrong with the
system. Can anyone enlighten me?

  Flags; bus-master 1, full 0; dirty 1267452(12) current 1267456(0).
  Transmit list 010c72f0 vs. c10c72c0.
  0: @c10c7200  length 85ce status 000105ce
  1: @c10c7210  length 85ea status 000105ea
  2: @c10c7220  length 85ea status 000105ea
  3: @c10c7230  length 85ea status 000105ea
  4: @c10c7240  length 85ea status 000105ea
  5: @c10c7250  length 85ea status 000105ea
  6: @c10c7260  length 85ea status 000105ea
  7: @c10c7270  length 85ea status 000105ea
  8: @c10c7280  length 85ea status 000105ea
  9: @c10c7290  length 85ea status 000105ea
  10: @c10c72a0  length 85ea status 000105ea
  11: @c10c72b0  length 85ea status 000105ea
  12: @c10c72c0  length 85ea status 000105ea
  13: @c10c72d0  length 85ea status 000105ea
  14: @c10c72e0  length 85ea status 000105ea
  15: @c10c72f0  length 85ea status 85ea
TCP: peer shrinks window. Bad, what else can I say?

---

root:~> lspci -v
00:00.0 Host bridge: Intel Corporation 430VX - 82437VX TVX [Triton VX] (rev 02)
Flags: bus master, medium devsel, latency 32

00:07.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] (rev 01)
Flags: bus master, medium devsel, latency 0

00:07.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II] (prog-if 80 [Master])
Flags: bus master, medium devsel, latency 32
I/O ports at f000 [size=16]

00:11.0 Ethernet controller: 3Com Corporation 3c905B 100BaseTX [Cyclone] (rev 34)
Subsystem: 3Com Corporation: Unknown device 9055
Flags: bus master, medium devsel, latency 32, IRQ 11
I/O ports at 6100 [size=128]
Memory at e100 (32-bit, non-prefetchable) [size=128]
Expansion ROM at <unassigned> [disabled] [size=128K]
Capabilities: [dc] Power Management version 1

00:13.0 VGA compatible controller: Tseng Labs Inc ET6000 (rev 30) (prog-if 00 [VGA])
Flags: slow devsel, IRQ 10
Memory at e000 (32-bit, non-prefetchable) [size=16M]
I/O ports at 6200 [size=256]
Expansion ROM at <unassigned> [disabled] [size=16M]


-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]




IDE Troubles - linux-2.4.0-test9-pre2

2000-09-21 Thread Byron Stanoszek

After about 3 days running 2.4.0-test9-pre2 (32mb i586 machine), I switched on
the system console and saw these messages. Nothing seems to be wrong with the
system. Can anyone enlighten me?

  Flags; bus-master 1, full 0; dirty 1267452(12) current 1267456(0).
  Transmit list 010c72f0 vs. c10c72c0.
  0: @c10c7200  length 85ce status 000105ce
  1: @c10c7210  length 85ea status 000105ea
  2: @c10c7220  length 85ea status 000105ea
  3: @c10c7230  length 85ea status 000105ea
  4: @c10c7240  length 85ea status 000105ea
  5: @c10c7250  length 85ea status 000105ea
  6: @c10c7260  length 85ea status 000105ea
  7: @c10c7270  length 85ea status 000105ea
  8: @c10c7280  length 85ea status 000105ea
  9: @c10c7290  length 85ea status 000105ea
  10: @c10c72a0  length 85ea status 000105ea
  11: @c10c72b0  length 85ea status 000105ea
  12: @c10c72c0  length 85ea status 000105ea
  13: @c10c72d0  length 85ea status 000105ea
  14: @c10c72e0  length 85ea status 000105ea
  15: @c10c72f0  length 85ea status 85ea
TCP: peer shrinks window. Bad, what else can I say?

---

root:~ lspci -v
00:00.0 Host bridge: Intel Corporation 430VX - 82437VX TVX [Triton VX] (rev 02)
Flags: bus master, medium devsel, latency 32

00:07.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] (rev 01)
Flags: bus master, medium devsel, latency 0

00:07.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II] (prog-if 80 [Master])
Flags: bus master, medium devsel, latency 32
I/O ports at f000 [size=16]

00:11.0 Ethernet controller: 3Com Corporation 3c905B 100BaseTX [Cyclone] (rev 34)
Subsystem: 3Com Corporation: Unknown device 9055
Flags: bus master, medium devsel, latency 32, IRQ 11
I/O ports at 6100 [size=128]
Memory at e100 (32-bit, non-prefetchable) [size=128]
Expansion ROM at <unassigned> [disabled] [size=128K]
Capabilities: [dc] Power Management version 1

00:13.0 VGA compatible controller: Tseng Labs Inc ET6000 (rev 30) (prog-if 00 [VGA])
Flags: slow devsel, IRQ 10
Memory at e000 (32-bit, non-prefetchable) [size=16M]
I/O ports at 6200 [size=256]
Expansion ROM at <unassigned> [disabled] [size=16M]


-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]




Re: [DOC] Debugging early kernel hangs

2000-09-21 Thread Byron Stanoszek

On Fri, 22 Sep 2000, Keith Owens wrote:

> If a kernel hangs early in the boot process (before the console has
> been initialized) then printk is no use because you never see the
> output.  There is a technique for using the video display to indicate
> boot progress so you can localize the problem.  Reporting "my kernel
> hangs during boot at line nnn in routine xyz" is a lot better than "my
> kernel hangs during boot".
>
> The idea is to write characters direct to the video screen during
> booting using a macro called VIDEO_CHAR.  This macro takes a character
> position and a single character value to be displayed.  Use different
> positions on the screen for different levels of code and use different
> characters in one position to indicate which stage that level is up to.
> For example, with the patch below, the string EAC at hang indicates
> parse_options(), checksetup().

Why not just redirect printk() to output a string of characters one by one
using VIDEO_CHAR until the console subsystem is initialized? You can use a
statically defined int to keep track of what row and column you're on. There
is no need to be so cryptic about the readout.
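
A minimal sketch of that idea, written as user-space C so it can be compiled
and tried outside the kernel. Everything here is an assumption made for
illustration: the VIDEO_CHAR definition is a stand-in (the real macro comes
from Keith's patch), fake_vram replaces VGA text memory, and early_puts() and
early_char_at() are hypothetical helper names. The cursor is kept in two
static ints and wraps to the top of the screen instead of scrolling.

```c
#include <stddef.h>
#include <stdint.h>

#define VGA_COLS 80
#define VGA_ROWS 25

/* Stand-in for VGA text memory; a plain array so the sketch runs in user space. */
static uint16_t fake_vram[VGA_ROWS * VGA_COLS];

/* Hypothetical VIDEO_CHAR: store one character (grey on black) at a cell position. */
#define VIDEO_CHAR(pos, ch) \
    (fake_vram[(pos)] = (uint16_t)(0x0700 | (unsigned char)(ch)))

/* Statically defined cursor, as suggested in the text above. */
static int early_row, early_col;

/* Move to the start of the next row, wrapping to the top rather than scrolling. */
static void early_newline(void)
{
    early_col = 0;
    if (++early_row == VGA_ROWS)
        early_row = 0;
}

/* Emit a string one character at a time through VIDEO_CHAR. */
void early_puts(const char *s)
{
    for (; *s; s++) {
        if (*s == '\n') {
            early_newline();
            continue;
        }
        VIDEO_CHAR(early_row * VGA_COLS + early_col, *s);
        if (++early_col == VGA_COLS)
            early_newline();
    }
}

/* Read back the character stored at (row, col); only useful for checking the sketch. */
char early_char_at(int row, int col)
{
    return (char)(fake_vram[row * VGA_COLS + col] & 0xff);
}
```

Calling early_puts("ab\nc") leaves 'a' and 'b' in the first two cells of row 0
and 'c' at the start of row 1. An early printk() could format into a small
static buffer and hand the result to a routine like this until the real
console is registered.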

Regards,
 Byron

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]




Re: Very aggressive swapping after 2 hours rest

2000-09-18 Thread Byron Stanoszek

I've finally had a chance to test out the new VM patch on my 32mb system.

It runs much, much better than the previous test8, and the pages->swap change
is actually much smoother than I had expected it to be considering the recent
talk about making it more gradual. I'm against having the swap more gradual
because of the low amount of available memory and the high amount of memory
actually taken up by processes required for normal operation.

At the moment, there's only room for about 5-6 meg of cache. If a gradual swap
goes into effect, then I'm afraid that the processes that actually 'need' to
stay in memory will start swapping out and thrashing, even when there's 6 meg
still available for use. This was precisely the problem with the old VM on my
machine, only the system wanted to keep 16 meg free for cache (*gag*).

So, please take my opinions into consideration when/if you redesign the swap
mechanism.

Regards,
 Byron

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]




Re: Very aggressive swapping after 2 hours rest

2000-09-17 Thread Byron Stanoszek

On Sun, 17 Sep 2000, Andrea Arcangeli wrote:

> Some of the fields recently added to /proc/meminfo are very dependent on the
> internals of the memory management of the kernel, I don't think it's a good
> idea to make them part of the user<->kernel API because there are alternative
> algorithms that won't generate those numbers and that can generate different
> ones. I understand the interest in collecting the new data for debugging
> the behaviour of the MM but a few lines of perl are enough to do that.

Not to be a bother, but I would still like to see a value or at least someone
tell me what calculations I would need to do with the values listed in
/proc/meminfo in order to determine the number of pages actually in use by
processes (or in other words, the amount of memory I can allocate before I
fill up the system RAM at current state).
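
For reference, one commonly used approximation (not an answer given in this
thread) is MemTotal - MemFree - Buffers - Cached. The sketch below assumes
that buffers and page cache can be reclaimed on demand, which is only roughly
true, and that the kernel exposes those four "Name: value kB" lines. The
function names meminfo_field_kb() and process_memory_kb() are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return the value in kB of the line starting with `key` (for example
 * "MemTotal:") in /proc/meminfo-style text, or -1 if it is absent.
 * Matching only at the start of a line keeps "Cached:" from matching
 * "SwapCached:".
 */
long meminfo_field_kb(const char *text, const char *key)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line && *line) {
        if (strncmp(line, key, klen) == 0) {
            long value;
            if (sscanf(line + klen, "%ld", &value) == 1)
                return value;
            return -1;
        }
        line = strchr(line, '\n');
        if (line)
            line++;            /* step past the newline to the next line */
    }
    return -1;
}

/*
 * Approximate memory held by processes (and the kernel itself), in kB,
 * under the assumption that buffers and cache are reclaimable.
 * Returns -1 if any of the four fields is missing.
 */
long process_memory_kb(const char *text)
{
    long total   = meminfo_field_kb(text, "MemTotal:");
    long free_kb = meminfo_field_kb(text, "MemFree:");
    long buffers = meminfo_field_kb(text, "Buffers:");
    long cached  = meminfo_field_kb(text, "Cached:");

    if (total < 0 || free_kb < 0 || buffers < 0 || cached < 0)
        return -1;
    return total - free_kb - buffers - cached;
}
```

To use it, read /proc/meminfo into a buffer with fopen() and fread(),
NUL-terminate it, and pass it to process_memory_kb(). The memory that can
still be allocated before RAM fills is then roughly MemTotal minus that
figure, subject to the same assumption.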

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]



