date:20070330

Re: 2.6.21-rc5-mm2 - compile error on x86-64

2007-03-30 Thread Helge Hafting

The patch did not apply, but mm3 compiled so I'll try that instead.
Helge Hafting
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 2.6.21-rc5-mm2 - compile error on x86-64

2007-03-30 Thread Eric W. Biederman

Helge Hafting <[EMAIL PROTECTED]> writes:

> Correct. I seem to remember that the latter is considered 
> "deprecated, but some programs may still depend on it".  So I disabled it to 
> see what broke.  udev complained about the missing /proc/sys/kernel/hotplug,
> but was happy to use /sys/kernel/uevent_helper instead.  I didn't
> notice other problems, so I left things like that.

Well if anything it is the other way around.  The preferred interface to
sysctls is /proc/sys.  There is the whole thing where people aren't too
happy with non-process-related things in /proc, so in that sense there
is a bit of deprecation, but /proc and /proc/sys are fully supported.
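As a sketch of what "the preferred interface is /proc/sys" means for a program: each sysctl is a file whose path mirrors the dotted sysctl name, so something like kernel.hotplug is read from /proc/sys/kernel/hotplug with ordinary file I/O instead of the binary sys_sysctl syscall. The helper names below (sysctl_to_proc_path, read_sysctl) are hypothetical, not kernel or libc APIs.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: map a dotted sysctl name such as "kernel.hotplug"
 * to its /proc/sys file, e.g. "/proc/sys/kernel/hotplug".
 * Returns 0 on success, -1 if the buffer is too small. */
int sysctl_to_proc_path(const char *name, char *buf, size_t len)
{
    const char *prefix = "/proc/sys/";
    size_t plen = strlen(prefix);
    size_t nlen = strlen(name);

    if (plen + nlen + 1 > len)
        return -1;
    memcpy(buf, prefix, plen);
    for (size_t i = 0; i < nlen; i++)
        buf[plen + i] = (name[i] == '.') ? '/' : name[i];  /* dots become dirs */
    buf[plen + nlen] = '\0';
    return 0;
}

/* Hypothetical helper: read the first line of a sysctl via /proc/sys
 * (no sys_sysctl(2) involved). Returns 0 on success, -1 on error. */
int read_sysctl(const char *name, char *out, size_t outlen)
{
    char path[256];
    FILE *f;

    if (sysctl_to_proc_path(name, path, sizeof(path)) != 0)
        return -1;
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(out, (int)outlen, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    out[strcspn(out, "\n")] = '\0';  /* strip the trailing newline */
    return 0;
}
```

This is the same lookup that a kernel built with CONFIG_PROC_SYSCTL but without CONFIG_SYSCTL_SYSCALL would still serve.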

The plethora of configuration options is what remains from when I dug
into the binary sys_sysctl interface to test the assertions that no one
uses it, that it has been deprecated for years, and that we could just
kill it.

We can now remove the binary sys_sysctl syscall while keeping /proc/sys
support.  Someday I might even get ambitious and add the appropriate
deprecated warnings so we can kill the binary interface.

I got as far as seeing that there was a small handful of real
programs that use sys_sysctl.  I looked at how we were giving notice
and realized that it was insufficient to tell users we were deprecating
the thing.  I didn't see much point (except being able to immediately
drop support) in removing sys_sysctl, and since we would have to go
a couple of years still supporting it to remove it properly, I got
lazy and stopped.

Maybe myself or someone else can get ambitious and deprecate
sys_sysctl properly and we can remove it one of these years...

Eric

Re: mcdx -- do_request(): non-read command to cd!!

2007-03-30 Thread Jens Axboe

On Fri, Mar 30 2007, Rene Herman wrote:
> Hi Al.
> 
> GIT doesn't remember, it's been too long, but IIRC you were the last one 
> to do some work on mcdx (the old proprietary mitsumi cd-rom driver). The 
> thing builds without warnings on 2.6.20.4, unlike most other proprietary 
> CD-ROM drivers, so someone did...
> 
> In any case, I just bet you're positively thrilled receiving bug-reports 
> for the thing right? Mmm?
> 
> I dug up a 1-speed Mitsumi CRMC-LU005S today. Brilliant drive! You push 
> on the front, after which it comes loose and you then yank the entire 
> drive, mechanism and all, out of its casing over some kind of magnetic 
> resistance it seems and then open a _second_ top-loading door, put in 
> the CD and follow the procedure backwards again. I've done that at least 
> 20 times now and I'm not by any means done yet. Brilliant.
> 
> The drive works fine under DOS (*), with both IRQ-less and IRQ-enabled 
> controllers. The linux driver does not work though:
> 
> [EMAIL PROTECTED]:~# modprobe mcdx
> 
> [EMAIL PROTECTED]:~# dmesg | tail -4
> mcdx Version 2.14(hs)
> mcdx $Id: mcdx.c,v 1.21 1997/01/26 07:12:59 davem Exp $
> Uniform CD-ROM driver Revision: 3.20
>  mcdx: Mitsumi CD-ROM installed at 0x300, irq 15. (Firmware version M 4)
> 
> [EMAIL PROTECTED]:~# mount /dev/mcdx0 /mnt/cdrom
> mount: block device /dev/mcdx0 is write-protected, mounting read-only
> mount: /dev/mcdx0: can't read superblock
> 
> [EMAIL PROTECTED]:~# dmesg | tail -4
>  mcdx: Mitsumi CD-ROM installed at 0x300, irq 15. (Firmware version M 4)
> mcdx do_request(): non-read command to cd!!
> end_request: I/O error, dev mcdx0, sector 0
> FAT: unable to read boot sector
> [EMAIL PROTECTED]:~#
> 
> This same 300/15 pair works under DOS in the same machine and IRQ15 is 
> firing. The error sounds very block-ish. Would you happen to know?
> 
> I'll happily test patches :-)

Try this.

diff --git a/drivers/cdrom/mcdx.c b/drivers/cdrom/mcdx.c
index f574962..7086313 100644
--- a/drivers/cdrom/mcdx.c
+++ b/drivers/cdrom/mcdx.c
@@ -577,6 +577,11 @@ static void do_mcdx_request(request_queue_t * q)
if (!req)
return;
 
+   if (!blk_fs_request(req)) {
+   end_request(req, 0);
+   goto again;
+   }
+
stuffp = req->rq_disk->private_data;
 
if (!stuffp->present) {
@@ -596,7 +601,7 @@ static void do_mcdx_request(request_queue_t * q)
xtrace(REQUEST, "do_request() (%lu + %lu)\n",
   req->sector, req->nr_sectors);
 
-   if (req->cmd != READ) {
+   if (rq_data_dir(req) != READ) {
xwarn("do_request(): non-read command to cd!!\n");
xtrace(REQUEST, "end_request(0): write\n");
end_request(req, 0);
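The two checks in the patch can be modeled outside the kernel. This is a hypothetical sketch: the fake_request type and classify_request() are illustrations only, with `kind` standing in for blk_fs_request() and `dir` for rq_data_dir(). One plausible reading of the original bug (an inference, not stated in the thread): if req->cmd is the command-byte array of struct request and READ is 0, then `req->cmd != READ` compares an array address with 0 and is always true, so every request would take the "non-read command" path.

```c
/* Hypothetical, simplified model of the request filtering in the mcdx patch. */
enum fake_rq_kind { RQ_FS, RQ_SPECIAL };            /* stand-in for blk_fs_request() */
enum fake_rq_dir  { DIR_READ = 0, DIR_WRITE = 1 };  /* stand-in for rq_data_dir()    */

struct fake_request {
    enum fake_rq_kind kind;
    enum fake_rq_dir dir;
};

enum verdict { FAIL_NON_FS, FAIL_WRITE, SERVICE_READ };

enum verdict classify_request(const struct fake_request *req)
{
    /* Non-filesystem requests (e.g. packet commands) are failed up front
     * instead of being mistaken for data transfers. */
    if (req->kind != RQ_FS)
        return FAIL_NON_FS;
    /* The drive is read-only: any data request that is not a read fails. */
    if (req->dir != DIR_READ)
        return FAIL_WRITE;
    return SERVICE_READ;
}
```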

-- 
Jens Axboe


Re: [test] hackbench.c interactivity results: vanilla versus SD/RSDL

2007-03-30 Thread Mike Galbraith

On Sat, 2007-03-31 at 08:31 +0200, Mike Galbraith wrote:
> On Fri, 2007-03-30 at 22:41 -0700, Xenofon Antidides wrote:
> 
> > Patch makes X yuck with any load. I stick with SD.

General comment directed at nobody in particular:

If anyone thinks the current scheduler sucks rocks, maybe they should
try to fix it.  If they think SD is the best thing since sliced bread,
maybe they should help Con fix that.  Code talks...

-Mike


Re: [4/4] 2.6.21-rc5: known regressions (v2)

2007-03-30 Thread Frédéric Riss

On Friday, March 30, 2007 at 23:49 +0200, Adrian Bunk wrote:
> Subject: MacMini doesn't come out of suspend to ram  (i386 clockevents)
>  (CONFIG_HPET_TIMER)
> References : http://lkml.org/lkml/2007/3/21/374
> Submitter  : Frédéric Riss <[EMAIL PROTECTED]>
>  Tino Keitel <[EMAIL PROTECTED]>
> Caused-By  : Thomas Gleixner <[EMAIL PROTECTED]>
>  commit e9e2cdb412412326c4827fc78ba27f410d837e6e
> Status : unknown

This one has been fixed by 399afa4fc9238fbae42116cf25a54671c0e8f56e.
Suspend to ram now works with HPET enabled (and regardless of the NO_HZ
setting).

Thanks!

Fred.


Re: [PATCH] fix dependency generation

2007-03-30 Thread Sam Ravnborg

On Thu, Mar 29, 2007 at 10:27:14AM +0100, Jan Beulich wrote:
> Commit 2e3646e51b2d6415549b310655df63e7e0d7a080 changed the way
> the split config tree is built, but failed to also adjust fixdep
> accordingly - if changing a config option from or to m, files
> referencing the respective CONFIG_..._MODULE (but not the
> corresponding CONFIG_...) didn't get rebuilt.

The problem is that a tristate symbol represents three values:
=n => CONFIG_SYMBOL is undefined
=y => CONFIG_SYMBOL is defined
=m => CONFIG_SYMBOL_MODULE is defined
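The three-way mapping can be spelled out as a small function. This is a hypothetical sketch (tristate_macro() is not a kconfig function); it only shows which macro ends up defined for each value, and why a dependency that tracks CONFIG_APM alone cannot see a switch between =m and =n.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: given a tristate symbol name (without CONFIG_) and
 * its value ('y', 'm' or 'n'), write the macro that ends up defined into
 * buf, or an empty string for 'n'.
 * Returns 0 on success, -1 on bad input or a too-small buffer. */
int tristate_macro(const char *sym, char val, char *buf, size_t len)
{
    int n;

    if (sym == NULL || strlen(sym) == 0 || len == 0)
        return -1;
    switch (val) {
    case 'y':
        n = snprintf(buf, len, "CONFIG_%s", sym);          /* =y */
        break;
    case 'm':
        n = snprintf(buf, len, "CONFIG_%s_MODULE", sym);   /* =m */
        break;
    case 'n':
        buf[0] = '\0';                                     /* =n: nothing defined */
        return 0;
    default:
        return -1;
    }
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```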

The function split_config does not take the different values into
account, and 'fixing' this in fixdep is wrong. fixdep does not know
whether the variable is a tristate symbol or not, so it can either
blindly remove _MODULE (your patch) or, each time it encounters
_MODULE, check for a symbol both with and without _MODULE.

The better fix is to teach the split_config function that
for tristate symbols two files shall be created in the include/config
hierarchy. So for apm we get:
include/config/apm.h
include/config/apm/module.h
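The path scheme can be sketched as follows, assuming (as the fixdep comment quoted further down describes for include/config/his/driver.h) that the symbol name is lowercased and each '_' becomes a directory separator. config_dep_path() is a hypothetical helper, and the `module` flag models the proposed extra file, not existing kconfig behaviour.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical sketch of the include/config path scheme:
 *   CONFIG_APM        -> include/config/apm.h
 *   CONFIG_APM_MODULE -> include/config/apm/module.h  (proposed)
 * Returns 0 on success, -1 if buf is too small. */
int config_dep_path(const char *sym, int module, char *buf, size_t len)
{
    const char *prefix = "include/config/";
    const char *suffix = module ? "/module.h" : ".h";
    size_t plen = strlen(prefix), slen = strlen(sym), xlen = strlen(suffix);

    if (plen + slen + xlen + 1 > len)
        return -1;
    memcpy(buf, prefix, plen);
    for (size_t i = 0; i < slen; i++) {
        unsigned char c = (unsigned char)sym[i];
        /* '_' separates directory levels; letters are lowercased */
        buf[plen + i] = (c == '_') ? '/' : (char)tolower(c);
    }
    memcpy(buf + plen + slen, suffix, xlen + 1);  /* copies the '\0' too */
    return 0;
}
```

With two files per tristate symbol, a source file that only mentions CONFIG_APM_MODULE depends on apm/module.h and gets rebuilt when the =m state changes.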

This will make kconfig behave correctly the day that someone adds a config
symbol with a _MODULE suffix.

I will follow-up with two patches that implement the changes to split_config.
The first is a pure code refactoring preparing for the second patch.

Roman - please ack/nack these since they touch the kconfig backend.

Sam

> 
> Once at it, also eliminate false dependencies due to use of
> ...CONFIG_... identifiers.
> 
> Signed-off-by: Jan Beulich <[EMAIL PROTECTED]>
> 
> --- linux-2.6.21-rc5/scripts/basic/fixdep.c   2007-02-04 19:44:54.0 +0100
> +++ 2.6.21-rc5-fixdep-mod/scripts/basic/fixdep.c  2007-03-29 11:11:10.0 +0200
> @@ -29,8 +29,7 @@
>   * option which is mentioned in any of the listed prequisites.
>   *
>   * To be exact, split-include populates a tree in include/config/,
> - * e.g. include/config/his/driver.h, which contains the #define/#undef
> - * for the CONFIG_HIS_DRIVER option.
> + * e.g. include/config/his/driver.h, consiting of empty files.
>   *
>   * So if the user changes his CONFIG_HIS_DRIVER option, only the objects
>   * which depend on "include/linux/config/his/driver.h" will be rebuilt,
> @@ -223,7 +222,7 @@ void use_config(char *m, int slen)
>  void parse_config_file(char *map, size_t len)
>  {
>   int *end = (int *) (map + len);
> - /* start at +1, so that p can never be < map */
> + /* start at +1, so that p can never be <= map */
>   int *m   = (int *) map + 1;
>   char *p, *q;
>  
> @@ -235,6 +234,8 @@ void parse_config_file(char *map, size_t
>   continue;
>   conf:
>   if (p > map + len - 7)
> + break;
> + if (isalnum(p[-1]) || p[-1] == '_')
>   continue;
>   if (memcmp(p, "CONFIG_", 7))
>   continue;
> @@ -245,6 +246,8 @@ void parse_config_file(char *map, size_t
>   continue;
>  
>   found:
> + if (!memcmp(q - 7, "_MODULE", 7))
> + q -= 7;
>   use_config(p+7, q-p-7);
>   }
>  }
> 
> 
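The two checks Jan's patch adds to parse_config_file() (reject a "CONFIG_" that is the tail of a longer identifier, and fold a trailing "_MODULE" back onto the base symbol) can be illustrated with a simplified scanner. This is a hypothetical sketch, not fixdep's code: fixdep scans an mmap()ed file word by word, while next_config_ref() here just uses strstr() on a C string.

```c
#include <ctype.h>
#include <string.h>

static int is_ident(int c)
{
    return isalnum(c) || c == '_';
}

/* Hypothetical simplified scanner. Finds the next standalone CONFIG_
 * reference at or after *pos in buf, stores the option name (without the
 * CONFIG_ prefix and with any _MODULE suffix removed) in *name, advances
 * *pos past the identifier and returns the name's length.
 * Returns 0 when there are no more references. */
size_t next_config_ref(const char *buf, const char **pos, const char **name)
{
    const char *p = *pos;

    while ((p = strstr(p, "CONFIG_")) != NULL) {
        const char *start = p + 7, *end = start;

        while (is_ident((unsigned char)*end))
            end++;
        /* e.g. MY_CONFIG_FOO is not a reference to CONFIG_FOO */
        if ((p > buf && is_ident((unsigned char)p[-1])) || end == start) {
            p = end;
            continue;
        }
        *pos = end;
        if (end - start > 7 && memcmp(end - 7, "_MODULE", 7) == 0)
            end -= 7;                       /* CONFIG_FOO_MODULE -> FOO */
        *name = start;
        return (size_t)(end - start);
    }
    return 0;
}
```

The suffix stripping is exactly the "blindly remove _MODULE" behaviour Sam objects to: a real symbol whose name ends in _MODULE would be misattributed.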

2.6.21-rc5: Thinkpad X60 gets critical thermal shutdowns

2007-03-30 Thread Jeremy Fitzhardinge

When I run 2.6.21-rc5 + Andi's x86 patches + paravirt_ops patches, I've
been getting my machine shut down with critical thermal shutdown messages:

Mar 30 23:19:03 localhost kernel: ACPI: Critical trip point
Mar 30 23:19:03 localhost kernel: Critical temperature reached (128 C), shutting down.
Mar 30 23:19:03 localhost kernel: Critical temperature reached (128 C), shutting down.
Mar 30 23:19:03 localhost shutdown[19417]: shutting down for system halt

and the machine does feel pretty hot.  Interestingly, when the machine
reboots, the fan spins up to a noticeably higher speed, so it seems that
maybe something is getting fan speed control wrong.

The machine is a Thinkpad X60, with a 1.8GHz Core Duo.  I can run it
indefinitely with the FC6 2.6.20-1.2933.fc6 kernel, so I don't think
there's anything wrong with the hardware.  And it was sitting on a
desktop plugged into mains, so there are no problems with obstructed airflow.

I was running a normal email/browsing/editing/compiling workload, and I
don't think there was anything particularly CPU intensive running at the
time.  I run cpufreq with the conservative governor.

Running now with the FC6 kernel, I get:
: ezr:pts/2; cat /proc/acpi/thermal_zone/THM?/temperature
temperature: 69 C
temperature: 82 C
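For logging these readings over time, a minimal C sketch of parsing that procfs output, assuming the "temperature: <N> C" line format shown above. Both helpers are hypothetical, and the zone directory name is whatever the BIOS exports (the THM? glob above matched two of them).

```c
#include <stdio.h>

/* Hypothetical helper: parse one line of
 * /proc/acpi/thermal_zone/<zone>/temperature, e.g. "temperature: 69 C".
 * Returns 0 and stores degrees Celsius on success, -1 otherwise. */
int parse_acpi_temperature(const char *line, int *celsius)
{
    return sscanf(line, "temperature: %d C", celsius) == 1 ? 0 : -1;
}

/* Hypothetical helper: read one zone by directory name. */
int read_zone_temperature(const char *zone, int *celsius)
{
    char path[128], line[64];
    FILE *f;
    int rc = -1;

    snprintf(path, sizeof(path), "/proc/acpi/thermal_zone/%s/temperature", zone);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (fgets(line, sizeof(line), f))
        rc = parse_acpi_temperature(line, celsius);
    fclose(f);
    return rc;
}
```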


Config attached.

J
CONFIG_X86_32=y
CONFIG_GENERIC_TIME=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION="-paravirt"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_SYSFS_DEPRECATED=y
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_KALLSYMS_EXTRA_PASS=y
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SHMEM=y
CONFIG_SLAB=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODVERSIONS=y
CONFIG_MODULE_SRCVERSION_ALL=y
CONFIG_KMOD=y
CONFIG_STOP_MACHINE=y
CONFIG_BLOCK=y
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
CONFIG_DEFAULT_CFQ=y
CONFIG_DEFAULT_IOSCHED="cfq"
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_SMP=y
CONFIG_X86_PC=y
CONFIG_PARAVIRT=y
CONFIG_VMI=y
CONFIG_MPENTIUMM=y
CONFIG_X86_CMPXCHG=y
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_TSC=y
CONFIG_HPET_TIMER=y
CONFIG_NR_CPUS=8
CONFIG_SCHED_MC=y
CONFIG_PREEMPT_VOLUNTARY=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_MCE=y
CONFIG_X86_MCE_P4THERMAL=y
CONFIG_VM86=y
CONFIG_X86_CPUID=m
CONFIG_EDD=m
CONFIG_HIGHMEM64G=y
CONFIG_PAGE_OFFSET=0xC000
CONFIG_HIGHMEM=y
CONFIG_X86_PAE=y
CONFIG_ARCH_FLATMEM_ENABLE=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SELECT_MEMORY_MODEL=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_FLATMEM_MANUAL=y
CONFIG_FLATMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
CONFIG_SPARSEMEM_STATIC=y
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_RESOURCES_64BIT=y
CONFIG_ZONE_DMA_FLAG=1
CONFIG_HIGHPTE=y
CONFIG_MATH_EMULATION=y
CONFIG_MTRR=y
CONFIG_IRQBALANCE=y
CONFIG_HZ_1000=y
CONFIG_HZ=1000
CONFIG_PHYSICAL_START=0x10
CONFIG_PHYSICAL_ALIGN=0x10
CONFIG_HOTPLUG_CPU=y
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_PM=y
CONFIG_PM_DEBUG=y
CONFIG_SOFTWARE_SUSPEND=y
CONFIG_PM_STD_PARTITION=""
CONFIG_SUSPEND_SMP=y
CONFIG_ACPI=y
CONFIG_ACPI_SLEEP=y
CONFIG_ACPI_SLEEP_PROC_FS=y
CONFIG_ACPI_PROCFS=y
CONFIG_ACPI_AC=y
CONFIG_ACPI_BATTERY=y
CONFIG_ACPI_BUTTON=y
CONFIG_ACPI_VIDEO=m
CONFIG_ACPI_FAN=y
CONFIG_ACPI_PROCESSOR=y
CONFIG_ACPI_HOTPLUG_CPU=y
CONFIG_ACPI_THERMAL=y
CONFIG_ACPI_IBM=m
CONFIG_ACPI_IBM_BAY=y
CONFIG_ACPI_BLACKLIST_YEAR=1999
CONFIG_ACPI_EC=y
CONFIG_ACPI_POWER=y
CONFIG_ACPI_SYSTEM=y
CONFIG_X86_PM_TIMER=y
CONFIG_ACPI_CONTAINER=y
CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_TABLE=y
CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE=y
CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
CONFIG_CPU_FREQ_GOV_POWERSAVE=y
CONFIG_CPU_FREQ_GOV_USERSPACE=y
CONFIG_CPU_FREQ_GOV_ONDEMAND=y
CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y
CONFIG_X86_ACPI_CPUFREQ=y
CONFIG_X86_SPEEDSTEP_CENTRINO=y
CONFIG_X86_SPEEDSTEP_CENTRINO_ACPI=y
CONFIG_X86_SPEEDSTEP_CENTRINO_TABLE=y

Re: 2.6.21-rc5-mm2 - compile error on x86-64

2007-03-30 Thread Helge Hafting

On Thu, Mar 29, 2007 at 02:28:16PM -0700, Andrew Morton wrote:
> On Thu, 29 Mar 2007 20:20:20 +0200
> Helge Hafting <[EMAIL PROTECTED]> wrote:
> 
[...]
> yup, people will presumably work on fixing these things up after the
> feature hits mainline.
> 
> >   LD  init/built-in.o
> >   LD  .tmp_vmlinux1
> > fs/built-in.o: In function `proc_root_init':
> > /usr/src/linux/fs/proc/root.c:83: undefined reference to `proc_sys_init'
> 
> Ah.  I assume you have CONFIG_SYSCTL=y, CONFIG_PROC_SYSCTL=n?

Correct. I seem to remember that the latter is considered 
"deprecated, but some programs may still depend on it".  So I disabled it to 
see what broke.  udev complained about the missing /proc/sys/kernel/hotplug,
but was happy to use /sys/kernel/uevent_helper instead.  I didn't
notice other problems, so I left things like that.
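The link failure above (CONFIG_SYSCTL=y, CONFIG_PROC_SYSCTL=n leaving proc_root_init() with an undefined proc_sys_init) is the kind of error usually handled with a static inline stub when the feature is configured out. This is a sketch of that general idiom, not the actual -mm fix; the int return type and example_proc_root_init() are assumptions.

```c
/* Sketch of the stub idiom: declare the real function when the option
 * is enabled, otherwise provide an inline no-op so callers still link.
 * The int return type is an assumption for illustration. */
#ifdef CONFIG_PROC_SYSCTL
extern int proc_sys_init(void);
#else
static inline int proc_sys_init(void)
{
    return 0;   /* nothing to register without /proc/sys support */
}
#endif

/* Hypothetical caller standing in for proc_root_init(): it compiles and
 * links whether or not CONFIG_PROC_SYSCTL is defined. */
int example_proc_root_init(void)
{
    return proc_sys_init();
}
```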

Helge Hafting

Re: [test] hackbench.c interactivity results: vanilla versus SD/RSDL

2007-03-30 Thread Mike Galbraith

On Fri, 2007-03-30 at 22:41 -0700, Xenofon Antidides wrote:

> Patch makes X yuck with any load. I stick with SD.

Shrug.  My mileage is different, but hey, it's a work in progress.  If SD
ever gets to the point that it actually delivers what it claims, I may
join you.

In the meantime, IMHO mainline is MUCH better in the general case.  If
the general case were what the various sleep exploits do, the
history mechanism in mainline wouldn't have survived its first day.

-Mike


Re: 2.6.21-rc5-mm1

2007-03-30 Thread Mariusz Kozłowski

Hello,

> > > 2) This was found a couple minutes later when the system was
> > >really busy and close to oom condition.
> > > 
> > >  INFO: lockdep is turned off.
> > >  BUG: soft lockup detected on CPU#0!
> > >   [] show_trace_log_lvl+0x1a/0x30
> > >   [] show_trace+0x12/0x14
> > >   [] dump_stack+0x16/0x18
> > >   [] softlockup_tick+0x81/0xa8
> > >   [] run_local_timers+0x12/0x14
> > >   [] update_process_times+0x2b/0x63
> > >   [] tick_sched_timer+0x4d/0x9e
> > >   [] hrtimer_interrupt+0x12e/0x1a6
> > >   [] timer_interrupt+0xe/0x15
> > >   [] handle_IRQ_event+0x28/0x59
> > >   [] handle_level_irq+0x6e/0xe7
> > >   [] do_IRQ+0x3d/0x7f
> > >   [] common_interrupt+0x2e/0x34
> > >   [] do_softirq+0x4d/0x50
> > >   [] irq_exit+0x7e/0x80
> > >   [] do_IRQ+0x42/0x7f
> > >   [] common_interrupt+0x2e/0x34
> > >   [] core_sys_select+0x1c6/0x310
> > >   [] sys_select+0x39/0x18f
> > >   [] sysenter_past_esp+0x5d/0x99
> > >   ===
> > >  Clocksource tsc unstable (delta = 9372804176 ns)
> > >  Time: acpi_pm clocksource has been installed.
> 
> Hmm.. No clue right off. Does booting w/ clocksource=acpi_pm avoid the
> issue?

Sorry. Can't reproduce it either way.

Mariusz

Re: [test] hackbench.c interactivity results: vanilla versus SD/RSDL

2007-03-30 Thread Mike Galbraith

On Sat, 2007-03-31 at 05:42 +0200, Mike Galbraith wrote:

> Yesterday, I piddled around with tracking interactive backlog as a way
> to detect when the load isn't really an interactive load, that's very
> simple and has potential.

Kinda like the patch below (though it can all be done slow path), or
something like my old throttling patches do (for grins I revived one,
and watched it yawn at your exploit)...

top - 07:49:36 up 6 min, 13 users,  load average: 4.42, 3.11, 1.40

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND
 6027 root      20   0  1564  104   24 R   45  0.0   0:09.47 1 fiftypercent
 6028 root      19   0  1564  104   24 R   40  0.0   0:09.43 1 fiftypercent
 6025 root      25   0  2892 1240 1032 R   32  0.1   0:09.04 1 sh
 6024 root      16   0  1564  436  356 S   30  0.0   0:10.45 0 fiftypercent
 6026 root      15   0  1564  104   24 R   27  0.0   0:09.52 0 fiftypercent
 6029 root      16   0  1564  104   24 R   18  0.0   0:09.33 0 fiftypercent

...or both, or maybe something clever instead :)
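The backlog accounting in the patch that follows can be modeled in userspace. This is a hypothetical sketch: each priority array keeps a running sum of the time slices of queued interactive tasks, and the load is treated as "not really interactive" once that sum exceeds INTERACTIVE_LIMIT. It assumes HZ=1000, so DEF_TIMESLICE is taken as 100 ticks, and it assumes a task's interactivity and time slice do not change between enqueue and dequeue (in the kernel they can, so the real counter could drift).

```c
#include <stdbool.h>

/* Assumed values: DEF_TIMESLICE at HZ=1000; the limit mirrors the patch. */
#define DEF_TIMESLICE      100
#define INTERACTIVE_LIMIT  (DEF_TIMESLICE * 4)

struct model_task  { bool interactive; int time_slice; };
struct model_array { int nr_active; int interactive_ticks; };

void model_enqueue(struct model_array *a, const struct model_task *p)
{
    a->nr_active++;
    if (p->interactive)
        a->interactive_ticks += p->time_slice;   /* grow the backlog */
}

void model_dequeue(struct model_array *a, const struct model_task *p)
{
    a->nr_active--;
    if (p->interactive)
        a->interactive_ticks -= p->time_slice;   /* shrink the backlog */
}

/* Counterpart of INTERACTIVE_BACKLOG_EXCEEDED(array) in the patch. */
bool backlog_exceeded(const struct model_array *a)
{
    return a->interactive_ticks > INTERACTIVE_LIMIT;
}
```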

--- kernel/sched.c.org  2007-03-27 15:47:49.0 +0200
+++ kernel/sched.c  2007-03-31 06:56:57.0 +0200
@@ -109,6 +109,7 @@ unsigned long long __attribute__((weak))
 #define MAX_SLEEP_AVG  (DEF_TIMESLICE * MAX_BONUS)
 #define STARVATION_LIMIT   (MAX_SLEEP_AVG)
 #define NS_MAX_SLEEP_AVG   (JIFFIES_TO_NS(MAX_SLEEP_AVG))
+#define INTERACTIVE_LIMIT  (DEF_TIMESLICE * 4)
 
 /*
  * If a task is 'interactive' then we reinsert it in the active
@@ -167,6 +168,9 @@ unsigned long long __attribute__((weak))
(JIFFIES_TO_NS(MAX_SLEEP_AVG * \
(MAX_BONUS / 2 + DELTA((p)) + 1) / MAX_BONUS - 1))
 
+#define INTERACTIVE_BACKLOG_EXCEEDED(array) \
+   ((array)->interactive_ticks > INTERACTIVE_LIMIT)
+
 #define TASK_PREEMPTS_CURR(p, rq) \
((p)->prio < (rq)->curr->prio)
 
@@ -201,6 +205,7 @@ static inline unsigned int task_timeslic
 
 struct prio_array {
unsigned int nr_active;
+   int interactive_ticks;
DECLARE_BITMAP(bitmap, MAX_PRIO+1); /* include 1 bit for delimiter */
struct list_head queue[MAX_PRIO];
 };
@@ -234,6 +239,7 @@ struct rq {
 */
unsigned long nr_uninterruptible;
 
+   unsigned long switch_timestamp;
unsigned long expired_timestamp;
/* Cached timestamp set by update_cpu_clock() */
unsigned long long most_recent_timestamp;
@@ -691,6 +697,8 @@ static void dequeue_task(struct task_str
list_del(>run_list);
if (list_empty(array->queue + p->prio))
__clear_bit(p->prio, array->bitmap);
+   if (TASK_INTERACTIVE(p))
+   array->interactive_ticks -= p->time_slice;
 }
 
 static void enqueue_task(struct task_struct *p, struct prio_array *array)
@@ -700,6 +708,8 @@ static void enqueue_task(struct task_str
__set_bit(p->prio, array->bitmap);
array->nr_active++;
p->array = array;
+   if (TASK_INTERACTIVE(p))
+   array->interactive_ticks += p->time_slice;
 }
 
 /*
@@ -882,7 +892,11 @@ static int recalc_task_prio(struct task_
/* Caller must always ensure 'now >= p->timestamp' */
unsigned long sleep_time = now - p->timestamp;
 
-   if (batch_task(p))
+   /*
+* Migration timestamp adjustment may induce negative time.
+* Ignore unquantifiable values as well as SCHED_BATCH tasks.
+*/ 
+   if (now < p->timestamp || batch_task(p))
sleep_time = 0;
 
if (likely(sleep_time > 0)) {
@@ -3051,9 +3065,9 @@ static inline int expired_starving(struc
 {
if (rq->curr->static_prio > rq->best_expired_prio)
return 1;
-   if (!STARVATION_LIMIT || !rq->expired_timestamp)
+   if (!STARVATION_LIMIT)
return 0;
-   if (jiffies - rq->expired_timestamp > STARVATION_LIMIT * rq->nr_running)
+   if (jiffies - rq->switch_timestamp > STARVATION_LIMIT * rq->nr_running)
return 1;
return 0;
 }
@@ -3131,8 +3145,74 @@ void account_steal_time(struct task_stru
cpustat->steal = cputime64_add(cpustat->steal, tmp);
 }
 
+/*
+ * Promote and requeue the next lower priority task.  If no task
+ * is available in the active array, switch to the expired array.
+ * @rq: runqueue to search.
+ * @prio: priority at which to begin search.
+ */
+static inline void promote_next_lower(struct rq *rq, int prio)
+{
+   struct prio_array *array = rq->active;
+   struct task_struct *p = NULL;
+   unsigned long long now = rq->most_recent_timestamp;
+   unsigned long *bitmap;
+   unsigned long starving = JIFFIES_TO_NS(rq->nr_running * DEF_TIMESLICE);
+   int idx = prio + 1, found_noninteractive = 0;
+
+repeat:
+   bitmap = array->bitmap;
+   idx = find_next_bit(bitmap, MAX_PRIO, idx);
+   if (idx < MAX_PRIO) {
+   struct list_head *queue = array->queue + idx;
+
+   p = list_entry(queue->next, struct task_struct,
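The recalc_task_prio() hunk above guards `now - p->timestamp` with `now < p->timestamp` because the subtraction is unsigned: a "negative" interval caused by a migration timestamp adjustment wraps to a huge value that would look like a very long sleep. A hypothetical illustration (guarded_sleep_time() is not a kernel function, and it uses unsigned long long throughout for simplicity):

```c
/* Hypothetical illustration of the guard in recalc_task_prio(). */
unsigned long long guarded_sleep_time(unsigned long long now,
                                      unsigned long long timestamp,
                                      int is_batch)
{
    /* Without this check, now - timestamp would wrap around when
     * now < timestamp and grant an enormous sleep credit. */
    if (now < timestamp || is_batch)
        return 0;   /* unquantifiable (or SCHED_BATCH): no sleep credit */
    return now - timestamp;
}
```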

Re: [test] hackbench.c interactivity results: vanilla versus SD/RSDL

2007-03-30 Thread Xenofon Antidides

--- Mike Galbraith <[EMAIL PROTECTED]> wrote:

> On Fri, 2007-03-30 at 19:36 -0700, Xenofon Antidides wrote:
> 
> > Something different on many cpus? Sorry I was thinking
> > something other. I try 50% run + 50% sleep on one cpu
> > and mainline has big problem. Sorry for bad code I
> > copy bits to make it work. Start program first then
> > run bash 100% cpu (while : ; do : ; done). Try change
> > program forks from 1 till 3 or more mainline kernel
> > and bash gets 0%.

Mainline hangs with the program. SD does not have a problem with the
program and is more responsive than mainline.

> That's mainline with the below (which I'm trying
> various ideas to improve).

Patch makes X yuck with any load. I stick with SD.

Xant


Re: [test] hackbench.c interactivity results: vanilla versus SD/RSDL

2007-03-30 Thread Nick Piggin

Xenofon Antidides wrote:

> - Original Message 
> From: Ingo Molnar <[EMAIL PROTECTED]>
> To: Con Kolivas <[EMAIL PROTECTED]>
> Cc: linux list ; Andrew Morton <[EMAIL PROTECTED]>; 
> Mike Galbraith <[EMAIL PROTECTED]>
> Sent: Thursday, March 29, 2007 9:22:49 PM
> Subject: [test] hackbench.c interactivity results: vanilla versus SD/RSDL
> 
> * Ingo Molnar <[EMAIL PROTECTED]> wrote:
> 
> * Con Kolivas <[EMAIL PROTECTED]> wrote:
> 
> I'm cautiously optimistic that we're at the thin edge of the bugfix 
> wedge now.
> 
> [...]
> 
> and the numbers he posted:
> 
> http://marc.info/?l=linux-kernel=117448900626028=2
> 
> We been staring at these numbers for while now and we come to the conclusion 
> they wrong.
> 
> The test is f is 3 tasks, two on different and one on same cpu as sh here:
> virgin 2.6.21-rc3-rsdl-smp
> top - 13:52:50 up 7 min, 12 users,  load average: 3.45, 2.89, 1.51
> 
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND
>  6560 root      31   0  2892 1236 1032 R   82  0.1   1:50.24 1 sh
>  6558 root      28   0  1428  276  228 S   42  0.0   1:00.09 1 f
>  6557 root      30   0  1424  280  228 R   35  0.0   1:00.25 0 f
>  6559 root      39   0  1424  276  228 R   33  0.0   0:58.36 0 f
> 
> 6560 sh is asking for 100% cpu on cpu number 1
> 6558 f is asking for 50% cpu on cpu number 1
> 6557 f is asking for 50% cpu on cpu number 0
> 6559 f is asking for 50% cpu on cpu number 0
> 
> So if 6560 and 6558 are asking for cpu from cpu number 1:
> 6560 wants 100% and 6558 wants 50%.
> 6560 should get 2/3 cpu 6558 should get 1/3 cpu

I don't think you can say that. If the 50% task alternated between
long periods of running and sleeping, then the end result should
approach a task that is sleeping for 50% of the time, and on the
CPU 25% of the time. As the periods get shorter, then the schedulers
will favour the 50% task relatively more, but details will depend on
implementation.

You could have an implementation that always runs the 50% task
when it becomes runnable, because it decides that its priority is
higher since it has been sleeping.

The only thing you can really say is that the 50% task should get
between 25% and 50% (inclusive) CPU time.
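The bounds can be written out under two simple models (both are assumptions for illustration, not from the thread). In the first, task B is runnable for a fraction $1-s$ of wall-clock time and, while runnable, gets a share $\alpha$ of the CPU it shares with the CPU hog A; $\alpha = \tfrac12$ is an even split and $\alpha = 1$ means B always wins. In the second, B needs $c$ of CPU per cycle and then sleeps for $c$, which is the model behind the 1/3 figure in the quoted mail.

```latex
% Model 1: B runnable for a fixed half of wall-clock time.
\[
  f_B = (1 - s)\,\alpha, \qquad s = \tfrac{1}{2},\;
  \alpha \in \left[\tfrac{1}{2},\, 1\right]
  \;\Longrightarrow\; \tfrac{1}{4} \le f_B \le \tfrac{1}{2}
\]
% Model 2: B needs c of CPU per cycle, then sleeps for c.
\[
  f_B = \frac{c}{c/\alpha + c} = \frac{\alpha}{1 + \alpha}, \qquad
  \alpha = \tfrac{1}{2} \Rightarrow f_B = \tfrac{1}{3}, \quad
  \alpha = 1 \Rightarrow f_B = \tfrac{1}{2}
\]
```

Both models stay inside the 25% to 50% range; which value a scheduler lands on depends on $\alpha$, that is, on how it treats a task that has been sleeping.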

--
SUSE Labs, Novell Inc.

[patch 4/6] Convert PDA into the percpu section

2007-03-30 Thread Jeremy Fitzhardinge

Currently x86 (similar to x86-64) has a special per-cpu structure
called "i386_pda" which can be easily and efficiently referenced via
the %fs register.  An ELF section is more flexible than a structure,
allowing any piece of code to use this area.  Indeed, such a section
already exists: the per-cpu area.

So this patch:
(1) Removes the PDA and uses per-cpu variables for each current member.
(2) Replaces the __KERNEL_PDA segment with __KERNEL_PERCPU.
(3) Creates a per-cpu mirror of __per_cpu_offset called this_cpu_off, which
can be used to calculate addresses for this CPU's variables.
(4) Simplifies startup, because %fs doesn't need to be loaded with a
special segment at early boot; it can be deferred until the first
percpu area is allocated (or never for UP).

The result is less code and one less x86-specific concept.
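A rough userspace model of the per-cpu addressing that replaces the PDA: one template section is copied once per CPU, and a CPU's copy of a variable lives at the variable's template address plus that CPU's offset, which is what __per_cpu_offset[] and the new this_cpu_off mirror hold. Everything below is hypothetical (the names, sizes and the use of a single byte array, chosen so the offset arithmetic stays well-defined C); the kernel instead reaches the current CPU's copy through %fs.

```c
#include <stddef.h>
#include <string.h>

#define MODEL_NR_CPUS  4
#define SECTION_SIZE   64

/* Bytes 0..SECTION_SIZE-1 are the template; each CPU gets a copy after it. */
static unsigned char storage[(MODEL_NR_CPUS + 1) * SECTION_SIZE];
static size_t model_per_cpu_offset[MODEL_NR_CPUS];

/* Template variables are identified by their offset in the section. */
#define VAR_CPU_NUMBER  0   /* an int, like the old PDA cpu_number field */

void setup_model_per_cpu_areas(void)
{
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
        model_per_cpu_offset[cpu] = (size_t)(cpu + 1) * SECTION_SIZE;
        /* copy the template into this CPU's area */
        memcpy(storage + model_per_cpu_offset[cpu], storage, SECTION_SIZE);
    }
}

/* Address of a variable in CPU `cpu`'s area: template address + offset. */
void *model_per_cpu_ptr(size_t var_off, int cpu)
{
    return storage + var_off + model_per_cpu_offset[cpu];
}
```

Accesses go through memcpy() because byte-array storage gives no alignment guarantee for an int.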

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
---
 arch/i386/kernel/asm-offsets.c |5 -
 arch/i386/kernel/cpu/common.c  |   17 -
 arch/i386/kernel/entry.S   |5 -
 arch/i386/kernel/head.S|   31 +
 arch/i386/kernel/i386_ksyms.c  |2 
 arch/i386/kernel/irq.c |3 
 arch/i386/kernel/process.c |   12 ++-
 arch/i386/kernel/smpboot.c |   34 --
 arch/i386/kernel/vmi.c |6 -
 arch/i386/kernel/vmlinux.lds.S |1 
 include/asm-i386/current.h |5 -
 include/asm-i386/irq_regs.h|   12 ++-
 include/asm-i386/pda.h |   99 --
 include/asm-i386/percpu.h  |  132 +---
 include/asm-i386/processor.h   |2 
 include/asm-i386/segment.h |6 -
 include/asm-i386/smp.h |4 -
 17 files changed, 179 insertions(+), 197 deletions(-)

===
--- a/arch/i386/kernel/asm-offsets.c
+++ b/arch/i386/kernel/asm-offsets.c
@@ -15,7 +15,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #define DEFINE(sym, val) \
 asm volatile("\n->" #sym " %0 " #val : : "i" (val))
@@ -101,10 +100,6 @@ void foo(void)
 
OFFSET(crypto_tfm_ctx_offset, crypto_tfm, __crt_ctx);
 
-   BLANK();
-   OFFSET(PDA_cpu, i386_pda, cpu_number);
-   OFFSET(PDA_pcurrent, i386_pda, pcurrent);
-
 #ifdef CONFIG_PARAVIRT
BLANK();
OFFSET(PARAVIRT_enabled, paravirt_ops, paravirt_enabled);
===
--- a/arch/i386/kernel/cpu/common.c
+++ b/arch/i386/kernel/cpu/common.c
@@ -18,7 +18,6 @@
 #include 
 #include 
 #endif
-#include 
 
 #include "cpu.h"
 
@@ -47,12 +46,9 @@ DEFINE_PER_CPU(struct gdt_page, gdt_page
[GDT_ENTRY_APMBIOS_BASE+2] = { 0x, 0x00409200 }, /* data */
 
[GDT_ENTRY_ESPFIX_SS] = { 0x, 0x00c09200 },
-   [GDT_ENTRY_PDA] = { 0x, 0x00c09200 }, /* set in setup_pda */
+   [GDT_ENTRY_PERCPU] = { 0x, 0x },
 } };
 EXPORT_PER_CPU_SYMBOL_GPL(gdt_page);
-
-DEFINE_PER_CPU(struct i386_pda, _cpu_pda);
-EXPORT_PER_CPU_SYMBOL(_cpu_pda);
 
 static int cachesize_override __cpuinitdata = -1;
 static int disable_x86_fxsr __cpuinitdata;
@@ -627,20 +623,13 @@ void __init early_cpu_init(void)
 #endif
 }
 
-/* Make sure %gs is initialized properly in idle threads */
+/* Make sure %fs is initialized properly in idle threads */
 struct pt_regs * __devinit idle_regs(struct pt_regs *regs)
 {
memset(regs, 0, sizeof(struct pt_regs));
-   regs->xfs = __KERNEL_PDA;
+   regs->xfs = __KERNEL_PERCPU;
return regs;
 }
-
-/* Initial PDA used by boot CPU */
-struct i386_pda boot_pda = {
-   ._pda = _pda,
-   .cpu_number = 0,
-   .pcurrent = _task,
-};
 
 /*
  * cpu_init() initializes state that is per-CPU. Some data is already
===
--- a/arch/i386/kernel/entry.S
+++ b/arch/i386/kernel/entry.S
@@ -132,7 +132,7 @@ 1:
movl $(__USER_DS), %edx; \
movl %edx, %ds; \
movl %edx, %es; \
-   movl $(__KERNEL_PDA), %edx; \
+   movl $(__KERNEL_PERCPU), %edx; \
movl %edx, %fs
 
 #define RESTORE_INT_REGS \
@@ -560,7 +560,6 @@ END(syscall_badsys)
 
 #define FIXUP_ESPFIX_STACK \
/* since we are on a wrong stack, we cant make it a C code :( */ \
-   movl %fs:PDA_cpu, %ebx; \
PER_CPU(gdt_page, %ebx); \
GET_DESC_BASE(GDT_ENTRY_ESPFIX_SS, %ebx, %eax, %ax, %al, %ah); \
addl %esp, %eax; \
@@ -685,7 +684,7 @@ error_code:
pushl %fs
CFI_ADJUST_CFA_OFFSET 4
/*CFI_REL_OFFSET fs, 0*/
-   movl $(__KERNEL_PDA), %ecx
+   movl $(__KERNEL_PERCPU), %ecx
movl %ecx, %fs
UNWIND_ESPFIX_STACK
popl %ecx
===
--- a/arch/i386/kernel/head.S
+++ b/arch/i386/kernel/head.S
@@ -317,12 +317,12 @@ 2:movl %cr0,%eax
movl %eax,%cr0

[patch 6/6] Define per_cpu_offset

2007-03-30 Thread Jeremy Fitzhardinge

Define per_cpu_offset in asm-i386/percpu.h when SMP is defined, like
asm-generic/percpu.h does for UP.

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>

---
 include/asm-i386/percpu.h |2 ++
 1 file changed, 2 insertions(+)

===
--- a/include/asm-i386/percpu.h
+++ b/include/asm-i386/percpu.h
@@ -34,6 +34,8 @@
 
 /* This is used for other cpus to find our section. */
 extern unsigned long __per_cpu_offset[];
+
+#define per_cpu_offset(x) (__per_cpu_offset[x])
 
 /* Separate out the type, so (int[3], foo) works. */
 #define DECLARE_PER_CPU(type, name) extern __typeof__(type) per_cpu__##name

-- 


[patch 5/6] cleanups to help using per-cpu variables from asm

2007-03-30 Thread Jeremy Fitzhardinge

This patch does a few small cleanups:
 - use PER_CPU_NAME to generate the names of per-cpu variables
 - use lea to add the per_cpu offset in PER_CPU(), because it doesn't
   affect condition flags
 - add PER_CPU_VAR which allows direct access to per-cpu variables
   with the %fs: prefix on SMP.

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>

---
 include/asm-i386/percpu.h |   12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

===
--- a/include/asm-i386/percpu.h
+++ b/include/asm-i386/percpu.h
@@ -16,12 +16,14 @@
  *PER_CPU(cpu_gdt_descr, %ebx)
  */
 #ifdef CONFIG_SMP
+#define PER_CPU(var, reg)  \
+   movl %fs:per_cpu__##this_cpu_off, reg;  \
+   lea per_cpu__##var(reg), reg
+#define PER_CPU_VAR(var)   %fs:per_cpu__##var
+#else /* ! SMP */
 #define PER_CPU(var, reg)  \
-   movl %fs:per_cpu__this_cpu_off, reg;\
-   addl $per_cpu__##var, reg
-#else /* ! SMP */
-#define PER_CPU(var, reg) \
-   movl $per_cpu__##var, reg;
+   movl $per_cpu__##var, reg
+#define PER_CPU_VAR(var)   per_cpu__##var
 #endif /* SMP */
 
 #else /* ...!ASSEMBLY */

-- 


Re: [test] hackbench.c interactivity results: vanilla versus SD/RSDL

2007-03-30 Thread Al Boldi

Mike Galbraith wrote:
> Yesterday, I piddled around with tracking interactive backlog as a way
> to detect when the load isn't really an interactive load, that's very
> simple and has potential.

You may want to consider fixing latencies per nice relative to load, as the
biggest problem with iab is huge latency delays, which exhibit themselves
as starvation caused by unfair timeslice management.

Thanks!

--
Al


[patch 2/6] Allow percpu variables to be page-aligned

2007-03-30 Thread Jeremy Fitzhardinge

Let's allow page-alignment in general for per-cpu data (wanted by Xen, and
Ingo suggested KVM as well).

Because larger alignments can use more room, we increase the max per-cpu
memory to 64k rather than 32k: it's getting a little tight.

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
Acked-by: Ingo Molnar <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---
 arch/alpha/kernel/vmlinux.lds.S   |    2 +-
 arch/arm/kernel/vmlinux.lds.S     |    2 +-
 arch/cris/arch-v32/vmlinux.lds.S  |    1 +
 arch/frv/kernel/vmlinux.lds.S     |    1 +
 arch/i386/kernel/vmlinux.lds.S    |    2 +-
 arch/m32r/kernel/vmlinux.lds.S    |    2 +-
 arch/mips/kernel/vmlinux.lds.S    |    2 +-
 arch/parisc/kernel/vmlinux.lds.S  |    2 +-
 arch/powerpc/kernel/setup_64.c    |    4 ++--
 arch/powerpc/kernel/vmlinux.lds.S |    6 +-
 arch/ppc/kernel/vmlinux.lds.S     |    2 +-
 arch/s390/kernel/vmlinux.lds.S    |    2 +-
 arch/sh/kernel/vmlinux.lds.S      |    2 +-
 arch/sh64/kernel/vmlinux.lds.S    |    2 +-
 arch/sparc/kernel/vmlinux.lds.S   |    2 +-
 arch/sparc64/kernel/smp.c         |    6 +++---
 arch/x86_64/kernel/setup64.c      |    4 ++--
 arch/x86_64/kernel/vmlinux.lds.S  |    2 +-
 arch/xtensa/kernel/vmlinux.lds.S  |    2 +-
 init/main.c                       |    4 ++--
 kernel/module.c                   |    8
 21 files changed, 29 insertions(+), 31 deletions(-)

===
--- a/arch/alpha/kernel/vmlinux.lds.S
+++ b/arch/alpha/kernel/vmlinux.lds.S
@@ -69,7 +69,7 @@ SECTIONS
   . = ALIGN(8);
   SECURITY_INIT
 
-  . = ALIGN(64);
+  . = ALIGN(8192);
   __per_cpu_start = .;
   .data.percpu : { *(.data.percpu) }
   __per_cpu_end = .;
===
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -59,7 +59,7 @@ SECTIONS
usr/built-in.o(.init.ramfs)
__initramfs_end = .;
 #endif
-   . = ALIGN(64);
+   . = ALIGN(4096);
__per_cpu_start = .;
*(.data.percpu)
__per_cpu_end = .;
===
--- a/arch/cris/arch-v32/vmlinux.lds.S
+++ b/arch/cris/arch-v32/vmlinux.lds.S
@@ -91,6 +91,7 @@ SECTIONS
}
SECURITY_INIT
 
+   . =  ALIGN (8192);
__per_cpu_start = .;
.data.percpu  : { *(.data.percpu) }
__per_cpu_end = .;
===
--- a/arch/frv/kernel/vmlinux.lds.S
+++ b/arch/frv/kernel/vmlinux.lds.S
@@ -57,6 +57,7 @@ SECTIONS
   __alt_instructions_end = .;
  .altinstr_replacement : { *(.altinstr_replacement) }
 
+  . = ALIGN(4096);
   __per_cpu_start = .;
   .data.percpu  : { *(.data.percpu) }
   __per_cpu_end = .;
===
--- a/arch/i386/kernel/vmlinux.lds.S
+++ b/arch/i386/kernel/vmlinux.lds.S
@@ -194,7 +194,7 @@ SECTIONS
__initramfs_end = .;
   }
 #endif
-  . = ALIGN(L1_CACHE_BYTES);
+  . = ALIGN(4096);
   .data.percpu  : AT(ADDR(.data.percpu) - LOAD_OFFSET) {
__per_cpu_start = .;
*(.data.percpu)
===
--- a/arch/m32r/kernel/vmlinux.lds.S
+++ b/arch/m32r/kernel/vmlinux.lds.S
@@ -110,7 +110,7 @@ SECTIONS
   __initramfs_end = .;
 #endif
 
-  . = ALIGN(32);
+  . = ALIGN(4096);
   __per_cpu_start = .;
   .data.percpu  : { *(.data.percpu) }
   __per_cpu_end = .;
===
--- a/arch/mips/kernel/vmlinux.lds.S
+++ b/arch/mips/kernel/vmlinux.lds.S
@@ -119,7 +119,7 @@ SECTIONS
   .init.ramfs : { *(.init.ramfs) }
   __initramfs_end = .;
 #endif
-  . = ALIGN(32);
+  . = ALIGN(_PAGE_SIZE);
   __per_cpu_start = .;
   .data.percpu  : { *(.data.percpu) }
   __per_cpu_end = .;
===
--- a/arch/parisc/kernel/vmlinux.lds.S
+++ b/arch/parisc/kernel/vmlinux.lds.S
@@ -181,7 +181,7 @@ SECTIONS
   .init.ramfs : { *(.init.ramfs) }
   __initramfs_end = .;
 #endif
-  . = ALIGN(32);
+  . = ALIGN(ASM_PAGE_SIZE);
   __per_cpu_start = .;
   .data.percpu  : { *(.data.percpu) }
   __per_cpu_end = .;
===
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -583,14 +583,14 @@ void __init setup_per_cpu_areas(void)
char *ptr;
 
/* Copy section for each CPU (we discard the original) */
-   size = ALIGN(__per_cpu_end - __per_cpu_start, SMP_CACHE_BYTES);
+   size = ALIGN(__per_cpu_end - __per_cpu_start, PAGE_SIZE);
 #ifdef CONFIG_MODULES
if (size < PERCPU_ENOUGH_ROOM)
size = PERCPU_ENOUGH_ROOM;
 #endif
 
for_each_possible_cpu(i) {
-   ptr = alloc_bootmem_node(NODE_DATA(cpu_to_node(i)), size);
+

[patch 0/6] i386 gdt and percpu cleanups

2007-03-30 Thread Jeremy Fitzhardinge

Hi Andi,

This is a series of patches based on your latest queue (as of the
other day, at least).

It includes:
 - the most recent patch to compute the appropriate amount of percpu
   space to allocate, using a separate reservation for modules where
   needed.
 - make the percpu sections page-aligned, so that percpu variables can
   be page aligned if needed (which is used by gdt_page)
 - page-align the gdt
 - remove the pda and convert all pda usages into percpu variables
   (percpu variables still use the %fs prefix mechanism the pda used)
 - some improvements to asm-i386/percpu.h to make asm access to percpu
   variables easy
 - define per_cpu_offset in asm-i386/percpu.h, to match asm-generic/

Thanks,
J

-- 


[patch 3/6] Page-align the GDT

2007-03-30 Thread Jeremy Fitzhardinge

Xen wants a dedicated page for the GDT.  I believe VMI likes it too.
lguest, KVM and native don't care.

Simple transformation to page-aligned "struct gdt_page".

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
Acked-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

---
 arch/i386/kernel/cpu/common.c |    6 +++---
 arch/i386/kernel/entry.S      |    2 +-
 arch/i386/kernel/head.S       |    2 +-
 arch/i386/kernel/traps.c      |    2 +-
 include/asm-i386/desc.h       |    9 +++--
 5 files changed, 13 insertions(+), 8 deletions(-)

===
--- a/arch/i386/kernel/cpu/common.c
+++ b/arch/i386/kernel/cpu/common.c
@@ -22,7 +22,7 @@
 
 #include "cpu.h"
 
-DEFINE_PER_CPU(struct desc_struct, cpu_gdt[GDT_ENTRIES]) = {
+DEFINE_PER_CPU(struct gdt_page, gdt_page) = { .gdt = {
[GDT_ENTRY_KERNEL_CS] = { 0x, 0x00cf9a00 },
[GDT_ENTRY_KERNEL_DS] = { 0x, 0x00cf9200 },
[GDT_ENTRY_DEFAULT_USER_CS] = { 0x, 0x00cffa00 },
@@ -48,8 +48,8 @@ DEFINE_PER_CPU(struct desc_struct, cpu_g
 
[GDT_ENTRY_ESPFIX_SS] = { 0x, 0x00c09200 },
[GDT_ENTRY_PDA] = { 0x, 0x00c09200 }, /* set in setup_pda */
-};
-EXPORT_PER_CPU_SYMBOL_GPL(cpu_gdt);
+} };
+EXPORT_PER_CPU_SYMBOL_GPL(gdt_page);
 
 DEFINE_PER_CPU(struct i386_pda, _cpu_pda) = {
._pda = _cpu___cpu_pda,
===
--- a/arch/i386/kernel/entry.S
+++ b/arch/i386/kernel/entry.S
@@ -558,7 +558,7 @@ END(syscall_badsys)
 #define FIXUP_ESPFIX_STACK \
/* since we are on a wrong stack, we cant make it a C code :( */ \
movl %fs:PDA_cpu, %ebx; \
-   PER_CPU(cpu_gdt, %ebx); \
+   PER_CPU(gdt_page, %ebx); \
GET_DESC_BASE(GDT_ENTRY_ESPFIX_SS, %ebx, %eax, %ax, %al, %ah); \
addl %esp, %eax; \
pushl $__KERNEL_DS; \
===
--- a/arch/i386/kernel/head.S
+++ b/arch/i386/kernel/head.S
@@ -599,7 +599,7 @@ idt_descr:
.word 0 # 32 bit align gdt_desc.address
 ENTRY(early_gdt_descr)
.word GDT_ENTRIES*8-1
-   .long per_cpu__cpu_gdt  /* Overwritten for secondary CPUs */
+   .long per_cpu__gdt_page /* Overwritten for secondary CPUs */
 
 /*
  * The boot_gdt_table must mirror the equivalent in setup.S and is
===
--- a/arch/i386/kernel/traps.c
+++ b/arch/i386/kernel/traps.c
@@ -1037,7 +1037,7 @@ fastcall unsigned long patch_espfix_desc
 fastcall unsigned long patch_espfix_desc(unsigned long uesp,
  unsigned long kesp)
 {
-   struct desc_struct *gdt = __get_cpu_var(cpu_gdt);
+   struct desc_struct *gdt = __get_cpu_var(gdt_page).gdt;
unsigned long base = (kesp - uesp) & -THREAD_SIZE;
unsigned long new_kesp = kesp - base;
unsigned long lim_pages = (new_kesp | (THREAD_SIZE - 1)) >> PAGE_SHIFT;
===
--- a/include/asm-i386/desc.h
+++ b/include/asm-i386/desc.h
@@ -18,10 +18,15 @@ struct Xgt_desc_struct {
unsigned short pad;
 } __attribute__ ((packed));
 
-DECLARE_PER_CPU(struct desc_struct, cpu_gdt[GDT_ENTRIES]);
+struct gdt_page
+{
+   struct desc_struct gdt[GDT_ENTRIES];
+} __attribute__((aligned(PAGE_SIZE)));
+DECLARE_PER_CPU(struct gdt_page, gdt_page);
+
 static inline struct desc_struct *get_cpu_gdt_table(unsigned int cpu)
 {
-   return per_cpu(cpu_gdt, cpu);
+   return per_cpu(gdt_page, cpu).gdt;
 }
 
 extern struct Xgt_desc_struct idt_descr;

-- 


[patch 1/6] i386: Account for module percpu space separately from kernel percpu

2007-03-30 Thread Jeremy Fitzhardinge

Rather than using a single constant PERCPU_ENOUGH_ROOM, compute it as
the sum of kernel_percpu + PERCPU_MODULE_RESERVE.  This is now common
to all architectures; if an architecture wants to set
PERCPU_ENOUGH_ROOM to something special, then it may do so (ia64 is
the only one which does).

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>
Cc: Eric W. Biederman <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>

---
 include/asm-alpha/percpu.h   |   14 --
 include/asm-sparc64/percpu.h |   10 --
 include/asm-x86_64/percpu.h  |   10 --
 include/linux/percpu.h       |    9 -
 init/main.c                  |    7 ++-
 kernel/module.c              |    2 +-
 6 files changed, 11 insertions(+), 41 deletions(-)

===
--- a/include/asm-alpha/percpu.h
+++ b/include/asm-alpha/percpu.h
@@ -1,19 +1,5 @@
 #ifndef __ALPHA_PERCPU_H
 #define __ALPHA_PERCPU_H
-
-/*
- * Increase the per cpu area for Alpha so that
- * modules using percpu area can load.
- */
-#ifdef CONFIG_MODULES
-# define PERCPU_MODULE_RESERVE 8192
-#else
-# define PERCPU_MODULE_RESERVE 0
-#endif
-
-#define PERCPU_ENOUGH_ROOM \
-   (ALIGN(__per_cpu_end - __per_cpu_start, SMP_CACHE_BYTES) + \
-PERCPU_MODULE_RESERVE)
 
 #include 
 
===
--- a/include/asm-sparc64/percpu.h
+++ b/include/asm-sparc64/percpu.h
@@ -4,16 +4,6 @@
 #include 
 
 #ifdef CONFIG_SMP
-
-#ifdef CONFIG_MODULES
-# define PERCPU_MODULE_RESERVE 8192
-#else
-# define PERCPU_MODULE_RESERVE 0
-#endif
-
-#define PERCPU_ENOUGH_ROOM \
-   (ALIGN(__per_cpu_end - __per_cpu_start, SMP_CACHE_BYTES) + \
-PERCPU_MODULE_RESERVE)
 
 extern void setup_per_cpu_areas(void);
 
===
--- a/include/asm-x86_64/percpu.h
+++ b/include/asm-x86_64/percpu.h
@@ -10,16 +10,6 @@
 #ifdef CONFIG_SMP
 
 #include 
-
-#ifdef CONFIG_MODULES
-# define PERCPU_MODULE_RESERVE 8192
-#else
-# define PERCPU_MODULE_RESERVE 0
-#endif
-
-#define PERCPU_ENOUGH_ROOM \
-   (ALIGN(__per_cpu_end - __per_cpu_start, SMP_CACHE_BYTES) + \
-PERCPU_MODULE_RESERVE)
 
 #define __per_cpu_offset(cpu) (cpu_pda(cpu)->data_offset)
 #define __my_cpu_offset() read_pda(data_offset)
===
--- a/include/linux/percpu.h
+++ b/include/linux/percpu.h
@@ -11,8 +11,15 @@
 
 /* Enough to cover all DEFINE_PER_CPUs in kernel, including modules. */
 #ifndef PERCPU_ENOUGH_ROOM
-#define PERCPU_ENOUGH_ROOM 32768
+#ifdef CONFIG_MODULES
+#define PERCPU_MODULE_RESERVE  8192
+#else
+#define PERCPU_MODULE_RESERVE  0
 #endif
+
+#define PERCPU_ENOUGH_ROOM \
+   (__per_cpu_end - __per_cpu_start + PERCPU_MODULE_RESERVE)
+#endif /* PERCPU_ENOUGH_ROOM */
 
 /*
  * Must be an lvalue. Since @var must be a simple identifier,
===
--- a/init/main.c
+++ b/init/main.c
@@ -369,11 +369,8 @@ static void __init setup_per_cpu_areas(v
unsigned long nr_possible_cpus = num_possible_cpus();
 
/* Copy section for each CPU (we discard the original) */
-   size = ALIGN(__per_cpu_end - __per_cpu_start, SMP_CACHE_BYTES);
-#ifdef CONFIG_MODULES
-   if (size < PERCPU_ENOUGH_ROOM)
-   size = PERCPU_ENOUGH_ROOM;
-#endif
+
+   size = ALIGN(PERCPU_ENOUGH_ROOM, SMP_CACHE_BYTES);
ptr = alloc_bootmem(size * nr_possible_cpus);
 
for_each_possible_cpu(i) {
===
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -430,7 +430,7 @@ static int percpu_modinit(void)
pcpu_size = kmalloc(sizeof(pcpu_size[0]) * pcpu_num_allocated,
GFP_KERNEL);
/* Static in-kernel percpu data (used). */
-   pcpu_size[0] = -ALIGN(__per_cpu_end-__per_cpu_start, SMP_CACHE_BYTES);
+   pcpu_size[0] = -(__per_cpu_end-__per_cpu_start);
/* Free room. */
pcpu_size[1] = PERCPU_ENOUGH_ROOM + pcpu_size[0];
if (pcpu_size[1] < 0) {

-- 


[powerpc] RS/6000 43p-150 no longer boots as of 2.6.18

2007-03-30 Thread Peter Samuelson


I know this is a bit late to be reporting this, as it happened before
2.6.18, but my PowerPC CHRP machine (RS/6000 43p-150, 604e CPU) no
longer boots.  From the console:

  instantiating rtas at 0x1ffe5000 ... done
  copying OF device tree ...
  Building dt strings...
  Building dt structure...
  Device tree strings 0x008dc000 -> 0x008dcd97
  Device tree struct  0x008dd000 -> 0x008e2000
  Calling quiesce ...
  returning from prom_init

...and here it hangs.  This happened between 2.6.17-git21 and -git22.
.config is attached.  I'd be happy to test patches and provide more
information.

Thanks,
Peter
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.17-git21
# Sat Feb  3 23:58:54 2007
#
# CONFIG_PPC64 is not set
CONFIG_PPC32=y
CONFIG_PPC_MERGE=y
CONFIG_MMU=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_IRQ_PER_CPU=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_GENERIC_FIND_NEXT_BIT=y
CONFIG_PPC=y
CONFIG_EARLY_PRINTK=y
CONFIG_GENERIC_NVRAM=y
CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_PPC_OF=y
CONFIG_PPC_UDBG_16550=y
# CONFIG_GENERIC_TBSYNC is not set
# CONFIG_DEFAULT_UIMAGE is not set

#
# Processor support
#
CONFIG_CLASSIC32=y
# CONFIG_PPC_52xx is not set
# CONFIG_PPC_82xx is not set
# CONFIG_PPC_83xx is not set
# CONFIG_PPC_85xx is not set
# CONFIG_PPC_86xx is not set
# CONFIG_40x is not set
# CONFIG_44x is not set
# CONFIG_8xx is not set
# CONFIG_E200 is not set
CONFIG_6xx=y
CONFIG_PPC_FPU=y
# CONFIG_ALTIVEC is not set
CONFIG_PPC_STD_MMU=y
CONFIG_PPC_STD_MMU_32=y
# CONFIG_SMP is not set
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32

#
# General setup
#
CONFIG_LOCALVERSION="-wire"
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
# CONFIG_BSD_PROCESS_ACCT_V3 is not set
CONFIG_SYSCTL=y
CONFIG_AUDIT=y
# CONFIG_AUDITSYSCALL is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
# CONFIG_RELAY is not set
CONFIG_INITRAMFS_SOURCE=""
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
# CONFIG_EMBEDDED is not set
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_RT_MUTEXES=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SHMEM=y
CONFIG_SLAB=y
CONFIG_VM_EVENT_COUNTERS=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
# CONFIG_SLOB is not set

#
# Loadable module support
#
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_KMOD=y

#
# Block layer
#
# CONFIG_LBD is not set
# CONFIG_BLK_DEV_IO_TRACE is not set
# CONFIG_LSF is not set

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
CONFIG_DEFAULT_AS=y
# CONFIG_DEFAULT_DEADLINE is not set
# CONFIG_DEFAULT_CFQ is not set
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="anticipatory"

#
# Platform support
#
CONFIG_PPC_MULTIPLATFORM=y
# CONFIG_PPC_ISERIES is not set
# CONFIG_EMBEDDED6xx is not set
# CONFIG_APUS is not set
CONFIG_PPC_CHRP=y
# CONFIG_PPC_PMAC is not set
# CONFIG_PPC_CELL is not set
# CONFIG_PPC_CELL_NATIVE is not set
# CONFIG_UDBG_RTAS_CONSOLE is not set
CONFIG_MPIC=y
CONFIG_PPC_RTAS=y
# CONFIG_RTAS_ERROR_LOGGING is not set
CONFIG_RTAS_PROC=y
# CONFIG_MMIO_NVRAM is not set
CONFIG_PPC_MPC106=y
# CONFIG_PPC_970_NAP is not set
# CONFIG_CPU_FREQ is not set
CONFIG_TAU=y
# CONFIG_TAU_INT is not set
# CONFIG_TAU_AVERAGE is not set
# CONFIG_WANT_EARLY_SERIAL is not set

#
# Kernel options
#
# CONFIG_HIGHMEM is not set
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_MISC=m
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
# CONFIG_KEXEC is not set
CONFIG_ARCH_FLATMEM_ENABLE=y
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_FLATMEM_MANUAL=y
# CONFIG_DISCONTIGMEM_MANUAL is not set
# CONFIG_SPARSEMEM_MANUAL is not set
CONFIG_FLATMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
# CONFIG_SPARSEMEM_STATIC is not set
CONFIG_SPLIT_PTLOCK_CPUS=4
# CONFIG_RESOURCES_64BIT is not set
CONFIG_PROC_DEVICETREE=y
CONFIG_CMDLINE_BOOL=y
CONFIG_CMDLINE="console=ttyS0,9600 console=tty0 root=/dev/hda1"
CONFIG_PM=y
CONFIG_PM_LEGACY=y
# CONFIG_PM_DEBUG is not set
# CONFIG_SOFTWARE_SUSPEND is not set
CONFIG_SECCOMP=y
CONFIG_ISA_DMA_API=y

#
# Bus options
#
CONFIG_ISA=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_PPC_I8259=y
CONFIG_PPC_INDIRECT_PCI=y
CONFIG_PCI=y
CONFIG_PCI_DOMAINS=y
# CONFIG_PCIEPORTBUS is not set
# CONFIG_PCI_DEBUG is not set

#
# PCCARD (PCMCIA/CardBus) support
#
# CONFIG_PCCARD is not set

#
# PCI Hotplug Support
#
# CONFIG_HOTPLUG_PCI is not set

#
# Advanced setup
#
# CONFIG_ADVANCED_OPTIONS is not set

#
# Default settings

Re: [patch 32/37] CRYPTO: api: scatterwalk_copychunks() fails to advance through scatterlist

2007-03-30 Thread Herbert Xu

On Fri, Mar 30, 2007 at 08:11:29PM -0700, Greg KH wrote:
> 
> Is this an "add-on" patch, or a replacement one?

This is an add-on.  In case you want a replacement, here it is:

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/crypto/scatterwalk.c b/crypto/scatterwalk.c
index 35172d3..0f76175 100644
--- a/crypto/scatterwalk.c
+++ b/crypto/scatterwalk.c
@@ -91,6 +91,8 @@ void scatterwalk_copychunks(void *buf, struct scatter_walk *walk,
memcpy_dir(buf, vaddr, len_this_page, out);
scatterwalk_unmap(vaddr, out);
 
+   scatterwalk_advance(walk, len_this_page);
+
if (nbytes == len_this_page)
break;
 
@@ -99,7 +101,5 @@ void scatterwalk_copychunks(void *buf, struct scatter_walk *walk,
 
scatterwalk_pagedone(walk, out, 1);
}
-
-   scatterwalk_advance(walk, nbytes);
 }
 EXPORT_SYMBOL_GPL(scatterwalk_copychunks);

Re: [test] hackbench.c interactivity results: vanilla versus SD/RSDL

2007-03-30 Thread Mike Galbraith

On Sat, 2007-03-31 at 05:23 +0200, Mike Galbraith wrote:
> On Fri, 2007-03-30 at 19:36 -0700, Xenofon Antidides wrote:
> 
> > Something different on many cpus? Sorry I was thinking
> > something other. I try 50% run + 50% sleep on one cpu
> > and mainline has big problem. Sorry for bad code I
> > copy bits to make it work. Start program first then
> > run bash 100% cpu (while : ; do : ; done). Try change
> > program forks from 1 till 3 or more mainline kernel
> > and bash gets 0%.
> 
> top - 05:16:41 up 43 min, 13 users,  load average: 9.51, 4.32, 5.67
> 
>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND
>  7146 root  15   0  1564  104   24 R   43  0.0   0:20.74 0 fiftypercent
>  7142 root  15   0  1564  104   24 S   37  0.0   0:18.08 0 fiftypercent
>  7140 root  15   0  1564  436  356 R   21  0.0   0:18.94 1 fiftypercent
>  7144 root  15   0  1564  104   24 R   21  0.0   0:18.75 1 fiftypercent
>  7143 root  15   0  1564  104   24 R   20  0.0   0:18.85 1 fiftypercent
>  7145 root  15   0  1564  104   24 R   19  0.0   0:18.30 1 fiftypercent
>  7147 root  15   0  1564  104   24 R   19  0.0   0:18.03 1 fiftypercent
>  7141 root  16   0  1564  104   24 R   10  0.0   0:18.29 0 fiftypercent
>  6245 root  16   0  3368 1876 1376 R    7  0.2   0:49.81 0 bash
> 
> That's mainline with the below (which I'm trying various ideas to improve).

Note: that bash is not an sh -c started at the same time as the 50% duty
cycle DoS programs; the pertinent data is that bash is getting into the loop.

Yesterday, I piddled around with tracking interactive backlog as a way
to detect when the load isn't really an interactive load; that's very
simple and has potential.

-Mike


Re: [patch 32/37] CRYPTO: api: scatterwalk_copychunks() fails to advance through scatterlist

2007-03-30 Thread Greg KH

On Sat, Mar 31, 2007 at 12:14:37PM +1000, Herbert Xu wrote:
> On Sat, Mar 31, 2007 at 03:41:32AM +0200, Patrick McHardy wrote:
> >
> > > [CRYPTO] api: scatterwalk_copychunks() fails to advance through scatterlist
> > 
> > This patch seems to cause some problems: I get reproducible freezes
> > on the receiving system with net-2.6.22 when sending IPsec packets
> > larger than the MTU (reproduced about 10 times). Reverting this
> > patch seems to fix it. In a few cases the oops also occurred on the
> > sending system.
> > 
> > Backtrace from UML (sending system):
> > 
> > uml:~# ping 10.0.0.1 -s 20000
> > PING 10.0.0.1 (10.0.0.1) 20000(20028) bytes of data.
> > BUG: soft lockup detected on CPU#0!
> > Call Trace:
> 
> Indeed.  That patch was buggy.  Sorry for not catching this earlier.
> 
> This should fix the problem.
> 
> [CRYPTO] api: Use the right value when advancing scatterwalk_copychunks
> 
> In the scatterwalk_copychunks loop, we should be advancing by
> len_this_page and not nbytes.  The latter is the total length.
> 
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Is this an "add-on" patch, or a replacement one?

thanks,

greg k-h

Re: libata bugfix: preserve LBA bit for HDIO_DRIVE_TASK

2007-03-30 Thread Tejun Heo

Mark Lord wrote:
> Ideally, this would go into linux-2.6.21.
> 
> Preserve the LBA bit in the DevSel/Head register for HDIO_DRIVE_TASK.
> 
> Signed-off-by:  Mark Lord <[EMAIL PROTECTED]>
> ---
> --- linux/drivers/ata/libata-scsi.c.orig	2007-03-21 13:35:02.0 -0400
> +++ linux/drivers/ata/libata-scsi.c	2007-03-30 17:40:58.0 -0400
> @@ -333,7 +333,7 @@
> scsi_cmd[8]  = args[3];
> scsi_cmd[10] = args[4];
> scsi_cmd[12] = args[5];
> -scsi_cmd[13] = args[6] & 0x0f;
> +scsi_cmd[13] = args[6] & 0x4f;
> scsi_cmd[14] = args[0];
> 
> /* Good values for timeout and retries?  Values below

IDE seems to be just overriding devsel (0x10) and leaving the rest
alone.  Maybe we should do (args[6] & ~0x10) here?  Or is it safer this way?

Thanks.

-- 
tejun

Re: [test] hackbench.c interactivity results: vanilla versus SD/RSDL

2007-03-30 Thread Mike Galbraith

On Fri, 2007-03-30 at 19:36 -0700, Xenofon Antidides wrote:

> Something different on many cpus? Sorry I was thinking
> something other. I try 50% run + 50% sleep on one cpu
> and mainline has big problem. Sorry for bad code I
> copy bits to make it work. Start program first then
> run bash 100% cpu (while : ; do : ; done). Try change
> program forks from 1 till 3 or more mainline kernel
> and bash gets 0%.

top - 05:16:41 up 43 min, 13 users,  load average: 9.51, 4.32, 5.67

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND
 7146 root  15   0  1564  104   24 R   43  0.0   0:20.74 0 fiftypercent
 7142 root  15   0  1564  104   24 S   37  0.0   0:18.08 0 fiftypercent
 7140 root  15   0  1564  436  356 R   21  0.0   0:18.94 1 fiftypercent
 7144 root  15   0  1564  104   24 R   21  0.0   0:18.75 1 fiftypercent
 7143 root  15   0  1564  104   24 R   20  0.0   0:18.85 1 fiftypercent
 7145 root  15   0  1564  104   24 R   19  0.0   0:18.30 1 fiftypercent
 7147 root  15   0  1564  104   24 R   19  0.0   0:18.03 1 fiftypercent
 7141 root  16   0  1564  104   24 R   10  0.0   0:18.29 0 fiftypercent
 6245 root  16   0  3368 1876 1376 R    7  0.2   0:49.81 0 bash

That's mainline with the below (which I'm trying various ideas to improve).

--- linux-2.6.21-rc5/kernel/sched.c.org 2007-03-27 15:47:49.0 +0200
+++ linux-2.6.21-rc5/kernel/sched.c 2007-03-30 18:21:12.0 +0200
@@ -234,6 +234,7 @@ struct rq {
 */
unsigned long nr_uninterruptible;
 
+   unsigned long switch_timestamp;
unsigned long expired_timestamp;
/* Cached timestamp set by update_cpu_clock() */
unsigned long long most_recent_timestamp;
@@ -882,7 +883,11 @@ static int recalc_task_prio(struct task_
/* Caller must always ensure 'now >= p->timestamp' */
unsigned long sleep_time = now - p->timestamp;
 
-   if (batch_task(p))
+   /*
+* Migration timestamp adjustment may induce negative time.
+* Ignore unquantifiable values as well as SCHED_BATCH tasks.
+*/ 
+   if (now < p->timestamp || batch_task(p))
sleep_time = 0;
 
if (likely(sleep_time > 0)) {
@@ -3051,9 +3056,9 @@ static inline int expired_starving(struc
 {
if (rq->curr->static_prio > rq->best_expired_prio)
return 1;
-   if (!STARVATION_LIMIT || !rq->expired_timestamp)
+   if (!STARVATION_LIMIT)
return 0;
-   if (jiffies - rq->expired_timestamp > STARVATION_LIMIT * rq->nr_running)
+   if (jiffies - rq->switch_timestamp > STARVATION_LIMIT * rq->nr_running)
return 1;
return 0;
 }
@@ -3131,6 +3136,67 @@ void account_steal_time(struct task_stru
cpustat->steal = cputime64_add(cpustat->steal, tmp);
 }
 
+/*
+ * Promote and requeue the next lower priority task.  If no task
+ * is available in the active array, switch to the expired array.
+ * @rq: runqueue to search.
+ * @prio: priority at which to begin search.
+ */
+static inline void promote_next_lower(struct rq *rq, int prio)
+{
+   struct prio_array *array = rq->active;
+   struct task_struct *p = NULL;
+   unsigned long long now = rq->most_recent_timestamp;
+   unsigned long *bitmap;
+   unsigned long starving = JIFFIES_TO_NS(rq->nr_running * DEF_TIMESLICE);
+   int idx = prio + 1, found_noninteractive = 0;
+
+repeat:
+   bitmap = array->bitmap;
+   idx = find_next_bit(bitmap, MAX_PRIO, idx);
+   if (idx < MAX_PRIO) {
+   struct list_head *queue = array->queue + idx;
+
+   p = list_entry(queue->next, struct task_struct, run_list);
+   if (!TASK_INTERACTIVE(p))
+   found_noninteractive = 1;
+
+   /* Skip non-starved queues. */
+   if (now < p->last_ran + starving) {
+   idx++;
+   p = NULL;
+   goto repeat;
+   }
+   } else if (!found_noninteractive && array == rq->active) {
+   /* Nobody home, check the expired array. */
+   array = rq->expired;
+   idx = 0;
+   p = NULL;
+   goto repeat;
+   }
+
+   /* Found one, requeue it. */
+   if (p) {
+   dequeue_task(p, p->array);
+   if (array == rq->active)
+   p->prio--;
+   /*
+* If we pulled a task from the expired array, correct
+* expired array info.  We can't afford a full search
+* for best_expired_prio, but do the best we can.
+*/
+   else {
+   idx = sched_find_first_bit(array->bitmap);
+   if (idx < MAX_PRIO) {
+   if (rq->best_expired_prio > idx)
+   rq->best_expired_prio = idx;
+   } else
+

[PATCH] Clean up ELF note generation

2007-03-30 Thread Jeremy Fitzhardinge

Three cleanups:

1: ELF notes are never mapped, so there's no need to have any access
   flags in their phdr.

2: When generating them from asm, tell the assembler to use a SHT_NOTE
   section type.  There doesn't seem to be a way to do this from C.

3: Use ANSI rather than traditional cpp behaviour to stringify the
   macro argument.

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Cc: Eric W. Biederman <[EMAIL PROTECTED]>

---
 arch/i386/kernel/vmlinux.lds.S    |    2 +-
 include/asm-generic/vmlinux.lds.h |    2 +-
 include/linux/elfnote.h           |    4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

===
--- a/arch/i386/kernel/vmlinux.lds.S
+++ b/arch/i386/kernel/vmlinux.lds.S
@@ -34,7 +34,7 @@ PHDRS {
 PHDRS {
text PT_LOAD FLAGS(5);  /* R_E */
data PT_LOAD FLAGS(7);  /* RWE */
-   note PT_NOTE FLAGS(4);  /* R__ */
+   note PT_NOTE FLAGS(0);  /* ___ */
 }
 SECTIONS
 {
===
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -208,7 +208,7 @@
}
 
 #define NOTES  \
-   .notes : { *(.note.*) } :note
+   .notes : { *(.note.*) } :note
 
 #define INITCALLS  \
*(.initcall0.init)  \
===
--- a/include/linux/elfnote.h
+++ b/include/linux/elfnote.h
@@ -39,12 +39,12 @@
  *  ELFNOTE(XYZCo, 12, .long, 0xdeadbeef)
  */
 #define ELFNOTE(name, type, desctype, descdata)\
-.pushsection .note.name;   \
+.pushsection .note.name, "",@note  ;   \
   .align 4 ;   \
   .long 2f - 1f/* namesz */;   \
   .long 4f - 3f/* descsz */;   \
   .long type   ;   \
-1:.asciz "name";   \
+1:.asciz #name ;   \
 2:.align 4 ;   \
 3:desctype descdata;   \
 4:.align 4 ;   \


Re: [3/4] 2.6.21-rc5: known regressions (v2)

2007-03-30 Thread Adrian Bunk

On Sat, Mar 31, 2007 at 10:52:59AM +0800, Jeff Chua wrote:
> On 3/31/07, Adrian Bunk <[EMAIL PROTECTED]> wrote:
> 
> >Subject: ThinkPad doesn't resume from suspend to RAM
> >References : http://lkml.org/lkml/2007/2/27/80
> > http://lkml.org/lkml/2007/2/28/348
> >Submitter  : Jens Axboe <[EMAIL PROTECTED]>
> > Jeff Chua <[EMAIL PROTECTED]>
> >Status : unknown
> 
> Fixed with CONFIG_NO_HZ unset and patch from Maxim
> (http://lkml.org/lkml/2007/3/29/108).

Thanks for this information.

Jens, does suspend to RAM also work for you with the latest -git?

> Thanks,
> Jeff,

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed


Re: [RFD driver-core] Lifetime problems of the current driver model

2007-03-30 Thread Tejun Heo

Tejun Heo wrote:
> Cornelia Huck wrote:
>> On Sat, 31 Mar 2007 00:08:19 +0900,
>> Tejun Heo <[EMAIL PROTECTED]> wrote:
>>
>>> (3) make sure all existing kobjects are released by module exit function.
>>>
>>> For example, let's say there is a hypothetical disk device /dev/dk0
>>> driven by a hypothetical driver mydrv.  /dev/dk0 is represented like the
>>> following in the sysfs tree.
>>>
>>> /sys/devices/pci:00/:00:1f.0/dk0/{myknob0,myknob1}
>>>
>>> Owner of both attrs myknob0 and myknob1 is mydrv and opening either
>>> increases the reference counts of dk0 and mydrv and closing does the
>>> opposite.
>>>
>>> * When there is no opener of either knob and the /dev/dk0 isn't used by
>>> anyone.  Reference count of dk0 is 1, mydrv 0.
>> Hm, but as long as dk0 is registered, it can be looked up and someone
>> could get a reference on it.
> 
> Yeah, exactly.  That's why getting any kobject reference backed by a
> module must be accompanied by try_module_get().
> 
> int mydrv_get_dk(struct dk *dk)
> {
> 	if (!try_module_get(mydrv))
> 		return -ENODEV;
> 	kobject_get(&dk->kobj);
> 	return 0;
> }

And one more thing just in case.  In the above code, try_module_get()
and kobject_get() must be and is atomic w.r.t. try_stop_module().
That's why we do the following.

  stop_machine_run(__try_stop_module, &sref, NR_CPUS);
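The atomicity requirement can be modelled outside the kernel. In the sketch below, which is an illustration and not kernel code, a pthread mutex stands in for stop_machine_run(): taking the module and object references happens under the same lock as the "is anyone still using the module?" check, so an unload cannot slip in between try_module_get() and kobject_get(). The `fake_module`/`fake_obj` types and the `model_*` functions are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model: the mutex plays the role of stop_machine_run(). */
struct fake_module {
	pthread_mutex_t lock;
	int refcnt;
	bool going;		/* set once unloading has started */
};

struct fake_obj {
	int refcnt;
};

/* Take a module ref and an object ref together, or neither. */
int model_get_obj(struct fake_module *mod, struct fake_obj *obj)
{
	int ret = 0;

	pthread_mutex_lock(&mod->lock);
	if (mod->going) {
		ret = -1;		/* like try_module_get() failing */
	} else {
		mod->refcnt++;		/* try_module_get() succeeded */
		obj->refcnt++;		/* kobject_get() */
	}
	pthread_mutex_unlock(&mod->lock);
	return ret;
}

void model_put_obj(struct fake_module *mod, struct fake_obj *obj)
{
	pthread_mutex_lock(&mod->lock);
	obj->refcnt--;
	mod->refcnt--;
	pthread_mutex_unlock(&mod->lock);
}

/* Start unloading only if nobody holds a module reference. */
int model_try_stop(struct fake_module *mod)
{
	int ret = 0;

	pthread_mutex_lock(&mod->lock);
	if (mod->refcnt)
		ret = -1;		/* still in use: refuse */
	else
		mod->going = true;
	pthread_mutex_unlock(&mod->lock);
	return ret;
}
```

Once model_try_stop() has marked the module as going, every later model_get_obj() fails, roughly what try_module_get() does for a module that is no longer live.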

-- 
tejun

Re: [patch 0/6] i386 gdt and percpu cleanups

2007-03-30 Thread Jeremy Fitzhardinge

Jeremy Fitzhardinge wrote:
> This is a series of patches based on your latest queue (as of the
> other day, at least).
>   

BTW, the From: line attributions got dropped from a few of these.  These:

Allow percpu variables to be page-aligned
Page-align the GDT

should be From: Rusty.  He did most of the work on the others, but I
changed them enough that he shouldn't be saddled with the blame when
things break ;)

J

Re: [PATCH] [PATCH] uml: fix static linking for real

2007-03-30 Thread Jeff Dike

On Sat, Mar 31, 2007 at 03:20:27AM +0200, Paolo 'Blaisorblade' Giarrusso wrote:
> There was a typo in commit 7632fc8f809a97f9d82ce125e8e3e579390ce2e5, preventing
> it from working - 32bit binaries crashed hopelessly before the below fix and
> work perfectly now.
> Merge for 2.6.21, please.

ACK.

Jeff

-- 
Work email - jdike at linux dot intel dot com

Re: [RFD driver-core] Lifetime problems of the current driver model

2007-03-30 Thread Tejun Heo

Cornelia Huck wrote:
> On Sat, 31 Mar 2007 00:08:19 +0900,
> Tejun Heo <[EMAIL PROTECTED]> wrote:
> 
>> (3) make sure all existing kobjects are released by module exit function.
>>
>> For example, let's say there is a hypothetical disk device /dev/dk0
>> driven by a hypothetical driver mydrv.  /dev/dk0 is represented like the
>> following in the sysfs tree.
>>
>> /sys/devices/pci:00/:00:1f.0/dk0/{myknob0,myknob1}
>>
>> Owner of both attrs myknob0 and myknob1 is mydrv and opening either
>> increases the reference counts of dk0 and mydrv and closing does the
>> opposite.
>>
>> * When there is no opener of either knob and the /dev/dk0 isn't used by
>> anyone.  Reference count of dk0 is 1, mydrv 0.
> 
> Hm, but as long as dk0 is registered, it can be looked up and someone
> could get a reference on it.

Yeah, exactly.  That's why getting any kobject reference backed by a
module must be accompanied by try_module_get().

int mydrv_get_dk(struct dk *dk)
{
	/* try_module_get() returns nonzero on success */
	if (!try_module_get(mydrv))
		return -ENODEV;
	kobject_get(&dk->kobj);
	return 0;
}

>> * User issues rmmod mydrv.  As mydrv's reference count is zero, unload
>> proceeds and mydrv's exit function is called.
>>
>> * mydrv's exit function looks like the following.
>>
>>   mydrv_exit()
>>   {
>>  sysfs_remove_file(dk0, myknob0);
>>  sysfs_remove_file(dk0, myknob1);
>>  device_del(dk0);
>>  deinit controller;
>>  release all resources;
>>   }
>>
>> The device_del(dk0) drops dk0's reference count to zero and its
>> ->release is invoked immediately.
> 
> And here is the problem if someone else still has a reference. The
> module will be unloaded, but ->release will not be called until the
> "someone else" gives up the reference...

Exactly, in that case, module reference count must not be zero.  You and
I are saying the same thing.  Why are we running in circles?
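The rule both sides converge on (the driver core's own reference to dk0 does not pin mydrv, but every other reference does, so mydrv's count cannot be zero while ->release is still pending) fits in a few lines. This is a hypothetical single-threaded sketch: `owner` stands for the module, `dev_obj` for dk0, and the pin taken in dev_get_external() stands for try_module_get(), which unlike this model can fail.

```c
#include <stdbool.h>

/* Hypothetical model, not driver-core code. */
struct owner {
	int refcnt;		/* module reference count */
};

struct dev_obj {
	int refcnt;
	bool released;
	struct owner *owner;
};

static void dev_release(struct dev_obj *d)
{
	d->released = true;	/* stands in for ->release() */
}

void dev_register(struct dev_obj *d, struct owner *o)
{
	d->refcnt = 1;		/* the driver core's reference: does not pin o */
	d->released = false;
	d->owner = o;
}

void dev_get_external(struct dev_obj *d)
{
	d->owner->refcnt++;	/* pin the module first */
	d->refcnt++;
}

static void dev_put(struct dev_obj *d)
{
	if (--d->refcnt == 0)
		dev_release(d);
}

void dev_put_external(struct dev_obj *d)
{
	struct owner *o = d->owner;

	dev_put(d);		/* ->release may run here, while module code is loaded */
	o->refcnt--;		/* only then is the module unpinned */
}

void dev_del(struct dev_obj *d)
{
	dev_put(d);		/* drop the driver core's reference */
}

bool owner_can_unload(const struct owner *o)
{
	return o->refcnt == 0;
}
```

In the scenario from the thread, device_del() then no longer triggers ->release immediately if an outside reference exists, and unload stays blocked until that reference is dropped.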

-- 
tejun

Re: fs/block_dev.c:953: warning: 'found' might be used uninitialized in this function

2007-03-30 Thread Cong WANG


2007/3/31, Adrian Bunk <[EMAIL PROTECTED]>:

> On Thu, Mar 29, 2007 at 11:16:39PM -0400, Kyle Moffett wrote:
> > On Mar 28, 2007, at 16:14:54, Andrew Morton wrote:
> > >On Wed, 28 Mar 2007 19:23:32 +0200 (CEST)
> > >Jiri Kosina <[EMAIL PROTECTED]> wrote:
> > >
> > >>blockdev: bd_claim_by_kobject() could check value of uninitialized
> > >>pointer
> > >>
> > >>Fixes this warning:
> > >>
> > >>fs/block_dev.c: In function `bd_claim_by_kobject':
> > >>fs/block_dev.c:953: warning: 'found' might be used uninitialized
> > >>in this function
> > >>
> > >>struct bd_holder *found is initialized only when bd_claim()
> > >>returns zero. If it returns nonzero, ptr stays uninitialized.
> > >>Later the value of the pointer is checked.
> > >
> > >that generates extra code and people get upset.
> > >
> > >One approach which we could use in here is
> > >
> > > struct bd_holder *found = found;  /* Suppress bogus gcc warning */
> >
> > Well, that would be correct except the warning is an actual kernel
> > bug.  Read Jiri's message (which you also quoted):
> > >struct bd_holder *found is initialized only when bd_claim() returns
> > >zero. If it returns nonzero, ptr stays uninitialized. Later the
> > >value of the pointer is checked.
> >
> > So in this case it has to be initialized to NULL or there's a
> > potential BUG() lurking.
>
> No, the code is correct and it's impossible that the variable ever gets
> read uninitialized.
>
> And BTW, i386 gcc 4.1 doesn't give me a warning for this.
> Toralf, which gcc version and architecture did you see this with?



I am also using i386 gcc 4.1.1, and I did get many warnings of this
kind yesterday. I think we should fix them.

These warnings exist because such variables are meant to be written
before they are read, so their initialization is left out.

--
So Dark The Con Of Man.

Re: [patch 0/6] i386 gdt and percpu cleanups

2007-03-30 Thread Jeremy Fitzhardinge

Rusty Russell wrote:
> One nitpick: I'd really like PER_CPU() renamed to PER_CPU_ADDR().
> That's a separate patch, but I think would be far clearer.
>   

Seems pretty simple, given that it has precisely one use.

J

Re: [patch 4/6] Convert PDA into the percpu section

2007-03-30 Thread Jeremy Fitzhardinge

Andi Kleen wrote:
> On Saturday 31 March 2007 04:00, Jeremy Fitzhardinge wrote:
>   
>> Currently x86 (similar to x86-64) has a special per-cpu structure
>> called "i386_pda" which can be easily and efficiently referenced via
>> the %fs register.  An ELF section is more flexible than a structure,
>> allowing any piece of code to use this area.  Indeed, such a section
>> already exists: the per-cpu area.
>> 
>
> Hmm, I'm a little reluctant. This moves i386 more away from x86-64
> again. If we ever merge them it would mean more work. Do you really need it?

It cleans things up a fair bit:

   1. At initialization, it doesn't require %fs to be loaded before
  being able to use per-cpu variables, since you can use percpu with
  %fs set to a plain 0-based 4G segment; you can defer
  initialization until SMP bringup (which is never on a UP kernel).
  PDA requires %fs to be specially set up to point to an initial
  PDA, which includes setting up a gdt entry, generally before C
  code is run.  For paravirtualized boot, this setup needs to be
  replicated by each hypervisor startup sequence; without the PDA,
  it becomes a non-issue (especially since hypervisors typically
      start up with %fs as a flat segment anyway).  Overall, both UP and
      SMP boot are simpler and less fragile.
   2. Adding things to the pda requires changing pda.h, which
      often means including extra headers to allow added definitions. 
      Since pda.h is used to implement things like "current" and
      "smp_processor_id", it gets included everywhere.  Any header
      included in pda.h effectively gets included everywhere in
  the kernel.  Also, it turns pda.h into a concentrated nest of
  patch conflicts.  percpu requires no central modifications to add
  a new percpu variable.
   3. There's no disadvantage to using a percpu at all, especially if
  you can use the x86_*_percpu functions which allow direct access
  to the variable via %fs.  If one construct will do, why have two? 
  Removing the pda removes quite a bit of unnecessary code.
   4. I think, ultimately, it would be better to migrate x86_64 away
  from using the pda to all percpu too, though this has some tricky
  bits for now.
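Point 2 (each per-cpu variable is declared where it is used, with no central structure to edit) has a close userspace analogue in thread-local storage, which x86 glibc also reaches through a segment register. The sketch below is only that analogy: threads play the CPUs, `__thread` plays the per-cpu section, and `DEFINE_PER_THREAD`/`this_thread` are hypothetical macros, not the kernel's DEFINE_PER_CPU or x86_*_percpu API.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical analogy: each definition lives next to the code that owns it. */
#define DEFINE_PER_THREAD(type, name) __thread type per_thread_##name
#define this_thread(name) (per_thread_##name)

DEFINE_PER_THREAD(unsigned long, irq_count);	/* owned by one "subsystem" */
DEFINE_PER_THREAD(int, cpu_number);		/* owned by another */

static void *worker(void *arg)
{
	this_thread(cpu_number) = *(int *)arg;
	for (int i = 0; i < 1000; i++)
		this_thread(irq_count)++;
	/* Only this thread touched its copy, so the count is exactly 1000. */
	return this_thread(irq_count) == 1000 ? arg : NULL;
}

/* Start up to 16 workers; return how many saw exactly 1000 on their copy. */
int run_workers(int n)
{
	pthread_t tids[16];
	int ids[16];
	int created = 0, ok = 0;

	if (n > 16)
		n = 16;
	for (; created < n; created++) {
		ids[created] = created;
		if (pthread_create(&tids[created], NULL, worker, &ids[created]) != 0)
			break;
	}
	for (int i = 0; i < created; i++) {
		void *ret = NULL;

		pthread_join(tids[i], &ret);
		if (ret)
			ok++;
	}
	return ok;
}
```

Adding a third variable here means one more DEFINE_PER_THREAD line in the file that needs it, which is the property Jeremy contrasts with editing a shared pda.h.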

Certainly, not having this patch at this stage will require me to rework
quite a few of the later patches.  I was going to put off sending out
this patch until later, but reworking everything to work both with pda
and percpu was so fragile and tricky-bug-prone that I decided to push it
early and save myself a lot of work.

J

Re: [patch 0/6] i386 gdt and percpu cleanups

2007-03-30 Thread Rusty Russell

>  - some improvements to asm-i386/percpu.h to make asm access to percpu
>variables easy

One nitpick: I'd really like PER_CPU() renamed to PER_CPU_ADDR().
That's a separate patch, but I think would be far clearer.

Thanks,
Rusty.



Re: [patch 4/6] Convert PDA into the percpu section

2007-03-30 Thread Rusty Russell

On Sat, 2007-03-31 at 04:35 +0200, Andi Kleen wrote:
> On Saturday 31 March 2007 04:00, Jeremy Fitzhardinge wrote:
> > Currently x86 (similar to x86-64) has a special per-cpu structure
> > called "i386_pda" which can be easily and efficiently referenced via
> > the %fs register.  An ELF section is more flexible than a structure,
> > allowing any piece of code to use this area.  Indeed, such a section
> > already exists: the per-cpu area.
> 
> Hmm, I'm a little reluctant. This moves i386 more away from x86-64
> again. If we ever merge them it would mean more work. Do you really need it?

Well, I think the merge should go the other way in this case: this
really does simplify things.

The only thing stopping x86-64 from doing the same as i386 is the
stack-protector stuff.  And that can be fixed (unfortunately requires a
gcc patch to change the %gs:40 to %gs:__gcc_stack_protector_offset and
emit a weak absolute symbol __gcc_stack_protector_offset = 40).

I shall prepare a patch for that next week; I've been busy Kleening up
lguest 8)

Rusty.



Re: [3/4] 2.6.21-rc5: known regressions (v2)

2007-03-30 Thread Jeff Chua


On 3/31/07, Adrian Bunk <[EMAIL PROTECTED]> wrote:
>
> Subject: ThinkPad doesn't resume from suspend to RAM
> References : http://lkml.org/lkml/2007/2/27/80
>  http://lkml.org/lkml/2007/2/28/348
> Submitter  : Jens Axboe <[EMAIL PROTECTED]>
>  Jeff Chua <[EMAIL PROTECTED]>
> Status : unknown

Fixed with CONFIG_NO_HZ unset and patch from Maxim
(http://lkml.org/lkml/2007/3/29/108).

Thanks,
Jeff,

Re: [PATCH 9/9] clocksource: refactor duplicate registration checking

2007-03-30 Thread Daniel Walker

On Fri, 2007-03-30 at 21:59 -0400, James Morris wrote:
> On Fri, 30 Mar 2007, Daniel Walker wrote:
> 
> >  /**
> >   * clocksource_register - Used to install new clocksources
> >   * @t: clocksource to be registered
> >   *
> > - * Returns -EBUSY if registration fails, zero otherwise.
> > + * Always returns zero.
> >   */
> >  int clocksource_register(struct clocksource *c)
> 
> Return should be void, then.

Yeah, that's another patch tho ..

Daniel


Re: [PATCH] CPUSETS: add mems to basic usage documentation

2007-03-30 Thread Simon Horman

On Fri, Mar 30, 2007 at 02:30:47AM -0700, Paul Jackson wrote:
> Simon Horman wrote:
> > +++ linux-2.6/Documentation/cpusets.txt 2007-03-30 13:03:19.0 +0900
> > ...
> > +Add some mems:
> > +# /bin/echo 0-7 > mems
> 
> Nice change - thanks.
> 
> Acked-by: Paul Jackson <[EMAIL PROTECTED]>

Thanks

> (I probably would not add a dmesg complaint; we don't usually
> do that for ordinary system call failures.  Pay close attention
> to the resulting errno - in this case ENOSPC.)

Understood
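Paul's point is that a rejected value written to a cpuset file is reported to the writer through errno (ENOSPC in the case discussed), not through dmesg. A minimal sketch of a caller that surfaces that errno, assuming a cpuset file system mounted somewhere like /dev/cpuset (the mount point is an assumption, and `write_cpuset_file()` is a hypothetical helper):

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Write a list such as "0-7" to a cpuset control file, e.g. the "mems"
 * file of a cpuset directory. Returns 0 on success or a negative errno,
 * so a rejected value shows up as e.g. -ENOSPC instead of being ignored.
 */
int write_cpuset_file(const char *path, const char *value)
{
	size_t len = strlen(value);
	ssize_t n;
	int err = 0;
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return -errno;
	n = write(fd, value, len);
	if (n < 0)
		err = errno;
	else if ((size_t)n != len)
		err = EIO;		/* short write: treat as failure */
	if (close(fd) < 0 && !err)
		err = errno;		/* errors can in general also surface at close() */
	return -err;
}
```

A shell `echo` hides this detail behind its own error message; a C caller can report the exact errno, which is what Paul recommends watching.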

-- 
Horms
  H: http://www.vergenet.net/~horms/
  W: http://www.valinux.co.jp/en/


Re: [4/4] 2.6.21-rc5: known regressions (v2)

2007-03-30 Thread Jeff Chua


On 3/31/07, Adrian Bunk <[EMAIL PROTECTED]> wrote:
>
> Subject: suspend to disk hangs  (CONFIG_NO_HZ)
> References : http://lkml.org/lkml/2007/3/25/217
> Submitter  : Jeff Chua <[EMAIL PROTECTED]>
> Status : unknown


Still broken on 2.6.21-rc5.

Jeff.

Re: [ckrm-tech] [PATCH 7/7] containers (V7): Container interface to nsproxy subsystem

2007-03-30 Thread Srivatsa Vaddagiri

On Mon, Feb 12, 2007 at 12:15:28AM -0800, [EMAIL PROTECTED] wrote:
> +int ns_container_clone(struct task_struct *tsk)
> +{
> + return container_clone(tsk, &ns_subsys);
> +}

This function is a no-op if the ns hierarchy is not mounted at this point.
This means we will miss some directories in the ns hierarchy if it
happens to be mounted later. It would be nice to recreate such missing
directories upon mount. However, I suspect that would not be easy. Maybe
we need to scan the task list and (re-)invoke ns_container_clone() for
every new tsk->nsproxy we find in the list. Alternatively, perhaps we
could auto-mount (kern_mount) the ns hierarchy very early at bootup? On
the flip side, that would require remount support so that additional
controllers (like cpuset, mem) can be bound to the (non-empty) ns
hierarchy after bootup.
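The first option, rescanning the task list at mount time, amounts to visiting each distinct nsproxy once. A hypothetical sketch with a plain array standing in for the task list; real code would walk the tasks under the appropriate lock and call ns_container_clone() or similar, and `replay_ns_clones()` with its callback is not an existing kernel API:

```c
#include <stddef.h>

struct nsproxy {
	int id;
};

struct task {
	struct nsproxy *nsproxy;
};

/* Hypothetical callback that would create the missing container directory. */
typedef void (*create_dir_fn)(struct nsproxy *ns, void *ctx);

/*
 * Visit each distinct, non-NULL nsproxy once, in task-list order, and
 * request a directory for it. Returns how many were requested.
 * O(n^2) for clarity; a real implementation would need a faster lookup.
 */
size_t replay_ns_clones(const struct task *tasks, size_t ntasks,
			create_dir_fn create_dir, void *ctx)
{
	size_t requested = 0;

	for (size_t i = 0; i < ntasks; i++) {
		struct nsproxy *ns = tasks[i].nsproxy;
		int seen = 0;

		if (!ns)
			continue;
		for (size_t j = 0; j < i; j++) {
			if (tasks[j].nsproxy == ns) {
				seen = 1;
				break;
			}
		}
		if (seen)
			continue;
		if (create_dir)
			create_dir(ns, ctx);
		requested++;
	}
	return requested;
}
```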

-- 
Regards,
vatsa

Re: [test] hackbench.c interactivity results: vanilla versus SD/RSDL

2007-03-30 Thread Xenofon Antidides


--- Mike Galbraith <[EMAIL PROTECTED]> wrote:

> On Fri, 2007-03-30 at 15:05 +0000, Xenofon Antidides wrote:
> > - Original Message 
> > From: Ingo Molnar <[EMAIL PROTECTED]>
> > To: Con Kolivas <[EMAIL PROTECTED]>
> > Cc: linux list ; Andrew Morton <[EMAIL PROTECTED]>; Mike Galbraith <[EMAIL PROTECTED]>
> > Sent: Thursday, March 29, 2007 9:22:49 PM
> > Subject: [test] hackbench.c interactivity results: vanilla versus SD/RSDL
> > 
> > 
> > * Ingo Molnar <[EMAIL PROTECTED]> wrote:
> > 
> > > * Con Kolivas <[EMAIL PROTECTED]> wrote:
> > > 
> > > > I'm cautiously optimistic that we're at the thin edge of the bugfix 
> > > > wedge now.
> > [...]
> > 
> > > and the numbers he posted:
> > > 
> > > http://marc.info/?l=linux-kernel&m=117448900626028&w=2
> > 
> > We been staring at these numbers for while now and we come to the conclusion they wrong.
> > 
> > The test is f is 3 tasks, two on different and one on same cpu as sh here:
> > virgin 2.6.21-rc3-rsdl-smp
> > top - 13:52:50 up 7 min, 12 users,  load average: 3.45, 2.89, 1.51
> > 
> >   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND
> >  6560 root  31   0  2892 1236 1032 R   82  0.1   1:50.24 1 sh
> >  6558 root  28   0  1428  276  228 S   42  0.0   1:00.09 1 f
> >  6557 root  30   0  1424  280  228 R   35  0.0   1:00.25 0 f
> >  6559 root  39   0  1424  276  228 R   33  0.0   0:58.36 0 f
> 
> This is a 1 second sample, tasks migrate.
> 
>   -Mike

Something different on many cpus? Sorry, I was thinking of
something else. I tried 50% run + 50% sleep on one cpu
and mainline has a big problem. Sorry for the bad code; I
copied bits to make it work. Start the program first, then
run bash at 100% cpu (while : ; do : ; done). Try changing
the program's forks from 1 to 3 or more: on the mainline
kernel bash gets 0%.



Xant


 

// gcc -O2 -o fiftyp fiftyp.c -lrt
// code from interbench.c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>
int forks=1;
int runus,sleepus=7000;
unsigned long loops_per_ms;
void terminal_error(const char *name)
{
	fprintf(stderr, "\n");
	perror(name);
	exit (1);
}

unsigned long long get_nsecs(struct timespec *myts)
{
	if (clock_gettime(CLOCK_REALTIME, myts))
		terminal_error("clock_gettime");
	return (myts->tv_sec * 1000000000ULL + myts->tv_nsec);
}

void burn_loops(unsigned long loops)
{
	unsigned long i;

	/*
	 * We need some magic here to prevent the compiler from optimising
	 * this loop away. Otherwise trying to emulate a fixed cpu load
	 * with this loop will not work.
	 */
	for (i = 0 ; i < loops ; i++)
	 asm volatile("" : : : "memory");
}

/* Use this many usecs of cpu time */
void burn_usecs(unsigned long usecs)
{
	unsigned long ms_loops;

	ms_loops = loops_per_ms / 1000 * usecs;
	burn_loops(ms_loops);
}

void microsleep(unsigned long long usecs)
{
	struct timespec req, rem;

	rem.tv_sec = rem.tv_nsec = 0;

	req.tv_sec = usecs / 1000000;
	req.tv_nsec = (usecs - (req.tv_sec * 1000000)) * 1000;
continue_sleep:
	if ((nanosleep(&req, &rem)) == -1) {
		if (errno == EINTR) {
			if (rem.tv_sec || rem.tv_nsec) {
req.tv_sec = rem.tv_sec;
req.tv_nsec = rem.tv_nsec;
goto continue_sleep;
			}
			goto out;
		}
		terminal_error("nanosleep");
	}
out:
	return;
}

/*
 * In an unoptimised loop we try to benchmark how many meaningless loops
 * per second we can perform on this hardware to fairly accurately
 * reproduce certain percentage cpu usage
 */
void calibrate_loop(void)
{
	unsigned long long start_time, loops_per_msec, run_time = 0;
	unsigned long loops;
	struct timespec myts;

	loops_per_msec = 1000000;
redo:
	/* Calibrate to within 1% accuracy */
	while (run_time > 1010000 || run_time < 990000) {
		loops = loops_per_msec;
		start_time = get_nsecs(&myts);
		burn_loops(loops);
		run_time = get_nsecs(&myts) - start_time;
		loops_per_msec = (1000000 * loops_per_msec / run_time ? :
			loops_per_msec);
	}

	/* Rechecking after a pause increases reproducibility */
	sleep(1);
	loops = loops_per_msec;
	start_time = get_nsecs(&myts);
	burn_loops(loops);
	run_time = get_nsecs(&myts) - start_time;

	/* Tolerate 5% difference on checking */
	if (run_time > 1050000 || run_time < 950000)
		goto redo;
	loops_per_ms = loops_per_msec;
	sleep(1);
	/* Measure how long a requested sleep really takes; run for as long */
	start_time = get_nsecs(&myts);
	microsleep(sleepus);
	run_time = get_nsecs(&myts) - start_time;
	runus = run_time / 1000;
}

int main(void){
 int i;
 calibrate_loop();
 printf("starting %d forks\n",forks);
 for(i=1;i

Re: [patch 4/6] Convert PDA into the percpu section

2007-03-30 Thread Andi Kleen

On Saturday 31 March 2007 04:00, Jeremy Fitzhardinge wrote:
> Currently x86 (similar to x86-64) has a special per-cpu structure
> called "i386_pda" which can be easily and efficiently referenced via
> the %fs register.  An ELF section is more flexible than a structure,
> allowing any piece of code to use this area.  Indeed, such a section
> already exists: the per-cpu area.

Hmm, I'm a little reluctant. This moves i386 more away from x86-64
again. If we ever merge them it would mean more work. Do you really need it?



-Andi

Re: [bug] hung bootup in various drivers, was: "2.6.21-rc5: known regressions"

2007-03-30 Thread Kay Sievers

On Fri, 2007-03-30 at 12:32 -0700, Greg KH wrote:
> On Fri, Mar 30, 2007 at 07:46:19PM +0200, Ingo Molnar wrote:
> > 
> > * Greg KH <[EMAIL PROTECTED]> wrote:
> > 
> > > >  BUG: at drivers/base/driver.c:187 driver_unregister()
> > > >   [] show_trace_log_lvl+0x19/0x2e
> > > >   [] show_trace+0x12/0x14
> > > >   [] dump_stack+0x14/0x16
> > > >   [] driver_unregister+0x3d/0x43
> > > >   [] pci_unregister_driver+0x10/0x5f
> > > >   [] slgt_init+0x9b/0x1ca
> > > >   [] init+0x15d/0x2bd
> > > >   [] kernel_thread_helper+0x7/0x10
> > 
> > > Yes, we should allow the ability to call unregister_driver from within 
> > > the module_init function.
> > > 
> > > But I don't understand what is causing you to see this problem.  Who 
> > > is holding the reference on the struct device at this point in time?  
> > > Is it the fact that userspace has some files open and it hasn't 
> > > released them yet?
> > 
> > at least in the slgt_init() case the affected codepath is trivial:
> > 
> > if ((rc = pci_register_driver(&pci_driver)) < 0) {
> > printk("%s pci_register_driver error=%d\n", driver_name, 
> > rc);
> > return rc;
> > }
> > pci_registered = 1;
> > 
> > if (!slgt_device_list) {
> > printk("%s no devices found\n",driver_name);
> > pci_unregister_driver(&pci_driver);
> > return -ENODEV;
> > 
> > slgt_device_list is NULL because no matching PCI ID is on my system (i 
> > dont have this hardware), so the ->probe() function did not get called 
> > at all.
> 
> Sorry, no, I realize how this could happen in the driver, I just don't
> see what in the driver core would be keeping this driver from having
> its release function called at the unregister() time.
> 
> Something has grabbed a reference to the driver...
> 
> Oh wait, is this code a module or built into the kernel?
> 
> If it's built in, there's still a reference counting bug in the
> module/driver hookup logic as we really don't have a "module" yet we are
> still thinking we do as we represent it in /sys/module and create the
> linkages.
> 
> I created some horrible patches to try to track this down, as it was
> reported on lkml (look for "Subject: kref refcounting breakage in mainline" )
> but never got it working correctly.
> 
> I bet if you build that code as a module, it will work just fine, can
> you try it?
> 
> Kay, did you ever get a chance to look into this reference counting
> issue?

Does the attached work for you?

Thanks,
Kay
diff --git a/include/linux/device.h b/include/linux/device.h
index caad9bb..5cf30e9 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -128,6 +128,7 @@ struct device_driver {
 
 	struct module		* owner;
 	const char 		* mod_name;	/* used for built-in modules */
+	struct module_kobject	* mkobj;
 
 	int	(*probe)	(struct device * dev);
 	int	(*remove)	(struct device * dev);
diff --git a/kernel/module.c b/kernel/module.c
index fbc51de..dcdb32b 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2384,8 +2384,13 @@ void module_add_driver(struct module *mo
 
 		/* Lookup built-in module entry in /sys/modules */
 		mkobj = kset_find_obj(&module_subsys.kset, drv->mod_name);
-		if (mkobj)
+		if (mkobj) {
 			mk = container_of(mkobj, struct module_kobject, kobj);
+			/* remember our module structure */
+			drv->mkobj = mk;
+			/* kset_find_obj took a reference */
+			kobject_put(mkobj);
+		}
 	}
 
 	if (!mk)
@@ -2405,17 +2410,22 @@ EXPORT_SYMBOL(module_add_driver);
 
 void module_remove_driver(struct device_driver *drv)
 {
+	struct module_kobject *mk = NULL;
 	char *driver_name;
 
 	if (!drv)
 		return;
 
 	sysfs_remove_link(&drv->kobj, "module");
-	if (drv->owner && drv->owner->mkobj.drivers_dir) {
+
+	if (drv->owner)
+		mk = &drv->owner->mkobj;
+	else if (drv->mkobj)
+		mk = drv->mkobj;
+	if (mk && mk->drivers_dir) {
 		driver_name = make_driver_name(drv);
 		if (driver_name) {
-			sysfs_remove_link(drv->owner->mkobj.drivers_dir,
-	  driver_name);
+			sysfs_remove_link(mk->drivers_dir, driver_name);
 			kfree(driver_name);
 		}
 	}
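The module.c hunk relies on a lookup-returns-a-reference convention: kset_find_obj() hands back the kobject with its count raised, so the patch drops that reference with kobject_put() right after saving the pointer in drv->mkobj. A hypothetical userspace model of the convention follows. `registry_find()`, `robj_put()` and `remember()` are invented names, and keeping the pointer after the put is only safe if something else keeps the object alive, as presumably holds for built-in module entries in /sys/module.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical refcounted object in a registry. */
struct robj {
	const char *name;
	int refcnt;
};

/* Like kset_find_obj(): a successful lookup returns a referenced object. */
struct robj *registry_find(struct robj *objs, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(objs[i].name, name) == 0) {
			objs[i].refcnt++;	/* like kobject_get() inside the lookup */
			return &objs[i];
		}
	}
	return NULL;
}

void robj_put(struct robj *o)
{
	o->refcnt--;				/* like kobject_put() */
}

/*
 * Keep a pointer for later use without leaking the lookup reference.
 * Safe only while something else keeps the object alive.
 */
struct robj *remember(struct robj *objs, size_t n, const char *name)
{
	struct robj *o = registry_find(objs, n, name);

	if (o)
		robj_put(o);
	return o;
}
```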

Re: [patch 32/37] CRYPTO: api: scatterwalk_copychunks() fails to advance through scatterlist

2007-03-30 Thread Patrick McHardy

Herbert Xu wrote:
> Indeed.  That patch was buggy.  Sorry for not catching this earlier.
> 
> This should fix the problem.

Works fine, thanks Herbert.

[-mm3 patch]Warning fix: check the return value of kobject_add etc.

2007-03-30 Thread Cong WANG


Since kobject_add, sysfs_create_link and sysfs_create_file are marked
'__must_check', we must always check their return values, or gcc
will warn about them.

Signed-off-by: Cong WANG <[EMAIL PROTECTED]>

---
--- fs/partitions/check.c.orig  2007-03-30 21:35:45.0 +0800
+++ fs/partitions/check.c   2007-03-30 21:49:53.0 +0800
@@ -385,10 +385,16 @@ void add_partition(struct gendisk *disk,
p->kobj.parent = &disk->kobj;
p->kobj.ktype = &ktype_part;
kobject_init(&p->kobj);
-   kobject_add(&p->kobj);
+   if (kobject_add(&p->kobj)) {
+   kfree(p);
+   return;
+   }
if (!disk->part_uevent_suppress)
kobject_uevent(&p->kobj, KOBJ_ADD);
-   sysfs_create_link(&p->kobj, &block_subsys.kset.kobj, "subsystem");
+   if (sysfs_create_link(&p->kobj, &block_subsys.kset.kobj, "subsystem")) {
+   kfree(p);
+   return;
+   }
if (flags & ADDPART_FLAG_WHOLEDISK) {
static struct attribute addpartattr = {
.name = "whole_disk",
@@ -396,7 +402,10 @@ void add_partition(struct gendisk *disk,
.owner = THIS_MODULE,
};

-   sysfs_create_file(&p->kobj, &addpartattr);
+   if (sysfs_create_file(&p->kobj, &addpartattr)) {
+   kfree(p);
+   return;
+   }
}
partition_sysfs_add_subdir(p);
disk->part[part-1] = p;
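`__must_check` is the kernel's wrapper for gcc's `warn_unused_result` attribute. A common way to honour it when several setup steps can fail is to unwind the completed steps in reverse order through goto labels, so each failure path undoes exactly what has been done so far. The sketch below is generic userspace C with hypothetical `step()`/`undo()` functions, not a proposed version of add_partition():

```c
/* Mirrors the kernel's __must_check (gcc's warn_unused_result). */
#define must_check __attribute__((warn_unused_result))

static int undone;	/* counts undo() calls, for illustration only */

/* Hypothetical setup step n; fails with -1 when n == fail_at. */
static int must_check step(int n, int fail_at)
{
	return n == fail_at ? -1 : 0;
}

/* Hypothetical undo of a completed step. */
static void undo(int n)
{
	(void)n;
	undone++;
}

/* Run steps 1..3; on failure undo the completed ones in reverse order. */
int setup_all(int fail_at)
{
	int err;

	undone = 0;
	err = step(1, fail_at);
	if (err)
		goto out;
	err = step(2, fail_at);
	if (err)
		goto undo_1;
	err = step(3, fail_at);
	if (err)
		goto undo_2;
	return 0;

undo_2:
	undo(2);
undo_1:
	undo(1);
out:
	return err;
}
```

Calling step() and discarding its result would trigger the same class of gcc warning that this patch addresses.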

Re: [patch 1/13] signal/timer/event fds v8 - anonymous inode source ...

2007-03-30 Thread Linus Torvalds

On Fri, 30 Mar 2007, Andrew Morton wrote:
> > 
> > Ok, it was panicking, and someone made me change it. Would you please 
> > agree?
> > The system can survive w/out, but it'll be a broken system WRT userspace.
> 
> I'd say panic.  There's not much point in limping along with an
> incorrectly-working kernel, only to have some small number of apps fail
> mysteriously later on.

Well, in this case (since it's at bootup only), I'd agree with panic(), 
but generally I disagree - it's actually much better to have a broken 
system limping along and allowing things like syslogd to write the problem 
to log-files and generally working as well as possible.

If people can do a "dmesg" and send it out as an email, we're much more 
likely to get good bug-reports.

But for early boot, and for something that can't really happen anyway, 
panic() sounds fine.

Linus

Re: [patch 32/37] CRYPTO: api: scatterwalk_copychunks() fails to advance through scatterlist

2007-03-30 Thread Herbert Xu

On Sat, Mar 31, 2007 at 03:41:32AM +0200, Patrick McHardy wrote:
>
> > [CRYPTO] api: scatterwalk_copychunks() fails to advance through scatterlist
> 
> This patch seems to cause some problems, I get reproducible freezes
> on the receiving system with net-2.6.22 when sending IPsec packets
> larger than the mtu (reproduced about 10 times). Reverting this
> patch seems to fix it. In a few cases the oops also occurred on the
> sending system.
> 
> Backtrace from UML (sending system):
> 
> uml:~# ping 10.0.0.1 -s 20000
> PING 10.0.0.1 (10.0.0.1) 20000(20028) bytes of data.
> BUG: soft lockup detected on CPU#0!
> Call Trace:

Indeed.  That patch was buggy.  Sorry for not catching this earlier.

This should fix the problem.

[CRYPTO] api: Use the right value when advancing scatterwalk_copychunks

In the scatterwalk_copychunks loop, we should be advancing by
len_this_page and not nbytes.  The latter is the total length.

Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/crypto/scatterwalk.c b/crypto/scatterwalk.c
index a664231..0f76175 100644
--- a/crypto/scatterwalk.c
+++ b/crypto/scatterwalk.c
@@ -91,7 +91,7 @@ void scatterwalk_copychunks(void *buf, struct scatter_walk *walk,
 		memcpy_dir(buf, vaddr, len_this_page, out);
 		scatterwalk_unmap(vaddr, out);
 
-		scatterwalk_advance(walk, nbytes);
+		scatterwalk_advance(walk, len_this_page);
 
 		if (nbytes == len_this_page)
 			break;
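A simplified user-space model may help show why advancing by nbytes broke the walk. struct seg, struct walk and copy_chunks() below are hypothetical stand-ins, not the kernel's scatterlist or scatterwalk API. Each pass copies at most what is left in the current segment and then advances by exactly that amount (len_this_page), which is what the fix restores.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a scatterlist entry: one contiguous segment. */
struct seg {
	const unsigned char *data;
	size_t len;
};

/* Hypothetical stand-in for struct scatter_walk: current segment + offset. */
struct walk {
	const struct seg *sg;
	size_t offset;
};

/*
 * Copy nbytes out of the segment list into buf, one segment-sized chunk
 * at a time.  The walk advances by len_this_page (what was actually
 * copied), never by nbytes (the total still outstanding).
 * Precondition: the segments hold at least nbytes from the current offset.
 */
static void copy_chunks(unsigned char *buf, struct walk *w, size_t nbytes)
{
	while (nbytes) {
		size_t left_in_seg, len_this_page;

		assert(w->offset <= w->sg->len);
		left_in_seg = w->sg->len - w->offset;
		len_this_page = nbytes < left_in_seg ? nbytes : left_in_seg;

		memcpy(buf, w->sg->data + w->offset, len_this_page);
		buf += len_this_page;
		nbytes -= len_this_page;

		w->offset += len_this_page;	/* the fix: advance by the chunk */
		if (w->offset == w->sg->len) {	/* segment exhausted: move on */
			w->sg++;
			w->offset = 0;
		}
	}
}
```

Advancing by the remaining total instead would push the offset past the end of the current segment whenever a copy spans two segments, leaving the walk in an inconsistent state.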

Re: strange high system cpu usage.

2007-03-30 Thread Elliott Johnson

Lee

Thanks for your help.  In testing different kernels we found that using an 
unpatched kernel from kernel.org seems to fix the problem.  I'm assuming that a 
patch added in the gentoo-sources patch set was creating the problem.  Our once 
8-minute untar is now down to 7-8 seconds with a vanilla 2.6.18.6 kernel.

If anyone is interested in our oprofile code or other info, just ask and I'll 
post it.  Otherwise I'll be reporting this to the gentoo developers.

-E

> - Original Message -
> From: "Elliott Johnson" <[EMAIL PROTECTED]>
> To: linux-kernel@vger.kernel.org
> Subject: Re: strange high system cpu usage.
> Date: Fri, 30 Mar 2007 11:54:57 +0800
> 
> 
> > What problem are you trying to solve?  IOW, how do you know it's not
> > just an artifact of different load average calculation between 2.4 and
> > 2.6?
> >
> > Are you actually seeing reduced throughput/performance?  Or are you
> > just looking at load average?
> >
> > Lee
> 
> Well the problem is apparent: we are having abnormally high CPU usage.
> It's about a 20-40% performance hit.
> 
> The load calculations were not between 2.4 and 2.6 kernel versions,
> but between 2.6.8 and 2.6.19.  Sorry if this wasn't very clear from
> my last email.
> 
> In trying to diagnose the problem I also looked at memory stats
> (vmstat) and found the 'buffered' memory statistic way off from the
> comparable Debian (2.6.8) install (0-300 KB versus 500 MB).
> 
> The vmstat man page has little information on this statistic and
> there seem to be varying explanations on the web.  I was hoping for a
> decisive explanation (or link) and possibly advice in toggling this
> value (or reasons not to).
> 
> I'm still trying to work on this at my end.  Some recent tests show
> that it might be related to the megasas driver or the large number of
> small files we are using on an XFS-formatted 10T array.  I'll keep at it.
> 
> Thanks for your response,
> 
> -Elliott
> 



Re: [PATCH 9/9] clocksource: refactor duplicate registration checking

2007-03-30 Thread James Morris

On Fri, 30 Mar 2007, Daniel Walker wrote:

>  /**
>   * clocksource_register - Used to install new clocksources
>   * @t:   clocksource to be registered
>   *
> - * Returns -EBUSY if registration fails, zero otherwise.
> + * Always returns zero.
>   */
>  int clocksource_register(struct clocksource *c)

Return should be void, then.



- James
-- 
James Morris
<[EMAIL PROTECTED]>

Re: [patch 32/37] CRYPTO: api: scatterwalk_copychunks() fails to advance through scatterlist

2007-03-30 Thread Patrick McHardy

Greg KH wrote:
> -stable review patch.  If anyone has any objections, please let us know.
> 
> --
> From: J. Bruce Fields <[EMAIL PROTECTED]>
> 
> [CRYPTO] api: scatterwalk_copychunks() fails to advance through scatterlist


This patch seems to cause some problems: I get reproducible freezes
on the receiving system with net-2.6.22 when sending IPsec packets
larger than the MTU (reproduced about 10 times). Reverting this
patch seems to fix it. In a few cases the oops also occurred on the
sending system.

Backtrace from UML (sending system):

uml:~# ping 10.0.0.1 -s 20000
PING 10.0.0.1 (10.0.0.1) 20000(20028) bytes of data.
BUG: soft lockup detected on CPU#0!
Call Trace:
61787408:  [<602b346f>] _spin_lock+0x9/0xb
61787418:  [<6004f7b7>] softlockup_tick+0xa1/0xaf
61787438:  [<6003c9d3>] run_local_timers+0x13/0x15
61787448:  [<6003c7e8>] update_process_times+0x49/0x73
61787478:  [<6001926e>] timer_handler+0x21/0x4f
617874a8:  [<60029327>] sig_handler_common_skas+0xff/0x118
617874e8:  [<6002625f>] real_alarm_handler+0x37/0x3b
61787508:  [<600262b6>] alarm_handler+0x53/0x63
61787538:  [<60027e65>] hard_handler+0x15/0x18
617875f8:  [<6015bfd9>] scatterwalk_copychunks+0x6d/0xb4
617876d8:  [<6001adda>] maybe_map+0x32/0x9f
61787728:  [<6015d332>] blkcipher_walk_next+0x11d/0x30f
61787738:  [<6006b58c>] poison_obj+0x27/0x32
61787740:  [<6015d332>] blkcipher_walk_next+0x11d/0x30f
61787758:  [<6006cc92>] cache_alloc_debugcheck_after+0xe5/0x12e
61787780:  [<6015bfbf>] scatterwalk_copychunks+0x53/0xb4
61787788:  [<6006d14e>] __kmalloc+0xb7/0xc4
617877c8:  [<6015d3b6>] blkcipher_walk_next+0x1a1/0x30f
61787828:  [<6015d186>] blkcipher_walk_done+0x12e/0x1bd
61787838:  [<6002dae3>] aes_encrypt+0x0/0xb
61787850:  [<601643d8>] xor_128+0x0/0x1c
61787878:  [<6016416d>] crypto_cbc_encrypt+0x7a/0x8b
61787918:  [<60244183>] esp_output+0x32b/0x44c
61787948:  [<602b34dc>] _spin_unlock_bh+0x12/0x14
617879c8:  [<60257051>] xfrm4_output_one+0xaa/0x16a
61787a08:  [<60257234>] xfrm4_output_finish2+0x123/0x131
61787a28:  [<6025728f>] xfrm4_output_finish+0x3d/0xb9
61787a58:  [<60257366>] xfrm4_output+0x5b/0x5d
61787a78:  [<602183a9>] ip_push_pending_frames+0x374/0x442
61787ac8:  [<6023008d>] raw_sendmsg+0x2d0/0x396
61787b78:  [<60237edd>] inet_sendmsg+0x46/0x53
61787ba8:  [<601bb5ca>] sock_sendmsg+0xea/0x103
61787c18:  [<600473b9>] autoremove_wake_function+0x0/0x39
61787c38:  [<600192d3>] add_mmap+0x37/0x149
61787c98:  [<6001b10d>] buffer_op+0x2e/0x5f
61787cd8:  [<6001b1d5>] copy_from_user_skas+0x7a/0x7c
61787d08:  [<601c2747>] verify_iovec+0x4f/0x90
61787d38:  [<601bcc32>] sys_sendmsg+0x172/0x1db
61787d68:  [<602b34b5>] _spin_unlock_irqrestore+0x18/0x1d
61787d88:  [<601924bf>] __up_read+0x76/0x7f
61787db8:  [<60049ae1>] up_read+0x9/0xb
61787dc8:  [<60019d78>] handle_page_fault+0x1f4/0x224
61787e28:  [<60019f29>] segv+0xa7/0x27e
61787ef8:  [<6001ab91>] handle_syscall+0x65/0x80
61787f08:  [<60019e7c>] segv_handler+0x68/0x6e
61787f28:  [<600287ab>] handle_trap+0xd0/0xdb
61787f68:  [<60028c2d>] userspace+0x139/0x181
61787fc8:  [<6001a8ba>] fork_handler+0x86/0x8d


Re: exposing FSB clock speed in /sys

2007-03-30 Thread Andi Kleen

Stephane Eranian <[EMAIL PROTECTED]> writes:

> It seems that the kernel does not expose the Front-Side Bus (FSB) clock
> speed to user applications. 

You mean the APIC timer frequency which happens to match the FSB 
on some CPUs? 
 
> Knowledge of the FSB speed is very useful to monitoring tools. It is used
> to compute certain bus-related metrics.

Can you describe those metrics in detail? 

-Andi

Re: [patch 1/13] signal/timer/event fds v8 - anonymous inode source ...

2007-03-30 Thread Davide Libenzi

On Fri, 30 Mar 2007, Andrew Morton wrote:

> I'd say panic.  There's not much point in limping along with an
> incorrectly-working kernel, only to have some small number of apps fail
> mysteriously later on.

Panic it is ...


> > > Can we make this optional if CONFIG_EMBEDDED?  You plan on converting epoll
> > > to use this facility, but with CONFIG_EPOLL=n, this is all dead code?
> > 
> > Hmmm, the whole point is that all this stuff works with or without epoll. 
> > And epoll needs no changes to support this.
> 
> I'm suggesting that all known clients of anon_inode be made optional. 
> Hence anon_inode can become optional too.
> 
> It's a desirable objective, at least.  The default, really.

Ok, I'll put them under Kconfig.



- Davide



Re: [patch 10/13] signal/timer/event fds v8 - eventfd core ...

2007-03-30 Thread Andrew Morton

On Fri, 30 Mar 2007 18:11:55 -0700 (PDT) Davide Libenzi wrote:

> 
> > > + */
> > 
> > So it is the caller's responsibility to ensure that *file refers to an
> > eventfd file?
> 
> In which function? I lost you ...
> 

eventfd_signal() assumes that the passed in file* refers to an eventfd
file.  So if a caller passes in a file* for /etc/passwd, the kernel will go
splat.

I guess that's caveat emptor, and any violations of that will show up
quickly in testing.  My main concern would be that there might be some way
for a naughty user to force the kernel to pass a non-eventfd file* into
this function.  That depends upon as-yet-unwritten code - is there a risk
of this happening, and how do we prevent it?
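One way to answer that question is to check the file's operations table before trusting private_data. This is only a user-space sketch of the idea, assuming that a pointer comparison against the eventfd file_operations instance is an acceptable identity test; struct fops, struct fake_file, eventfd_fops_model and eventfd_signal_checked() are all hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, minimal stand-ins for struct file_operations and struct file. */
struct fops { int unused; };
struct fake_file {
	const struct fops *f_op;
	void *private_data;
};

struct eventfd_ctx_model { unsigned long long count; };

/* Only the address of this object matters: it identifies eventfd files. */
static const struct fops eventfd_fops_model;

/*
 * Trust private_data only once f_op proves the file is an eventfd;
 * anything else (say, /etc/passwd) is rejected with -EINVAL instead of
 * being dereferenced.  The overflow clamp is omitted here for brevity.
 */
static int eventfd_signal_checked(struct fake_file *file, int n)
{
	struct eventfd_ctx_model *ctx;

	assert(file != NULL);
	if (file->f_op != &eventfd_fops_model)
		return -EINVAL;
	if (n < 0)
		return -EINVAL;
	ctx = file->private_data;
	ctx->count += (unsigned long long)n;
	return n;
}
```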

> 
> > > +int eventfd_signal(struct file *file, int n)
> > > +{
> > > +	struct eventfd_ctx *ctx = file->private_data;
> > > +	unsigned long flags;
> > > +
> > > +	if (n < 0)
> > > +		return -EINVAL;
> > > +	spin_lock_irqsave(&ctx->lock, flags);
> > > +	if (ULLONG_MAX - ctx->count < n)
> > > +		n = (int) (ULLONG_MAX - ctx->count);
> > > +	ctx->count += n;
> > > +	if (waitqueue_active(&ctx->wqh))
> > > +		wake_up_locked(&ctx->wqh);
> > > +	spin_unlock_irqrestore(&ctx->lock, flags);
> > > +
> > > +	return n;
> > > +}
>
>
>
> > > +	DECLARE_WAITQUEUE(wait, current);
> > > +
> > > +	if (count < sizeof(ucnt))
> > > +		return -EINVAL;
> > > +	if (get_user(ucnt, (const __u64 __user *) buf))
> > > +		return -EFAULT;
> > 
> > Some architectures do not implement 64-bit get_user()
> 
> copy_from_user it is, then ...
> 

spose so.  I think architectures _should_ implement 64-bit get_user() and
put_user() nowadays.  So you could leave the code as-is and inform the arch
maintainers, if you're feeling keen.

If all this code has its own Kconfig options then the architectures won't
break until their maintainers come along to enable the new features, so
they'll implement 64-bit get_user() at that time and things will all unfold
in a nicely non-chaotic fashion.
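The copy_from_user() fallback can be modelled in user space, where memcpy() plays the role of copy_from_user(): copying the eight bytes into a local 64-bit variable needs no 64-bit get_user() and also copes with an unaligned buffer. parse_write_u64() is a hypothetical helper covering only the length check and the copy at the top of eventfd_write().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/*
 * Model of the start of a write() handler that takes a 64-bit value.
 * memcpy() stands in for copy_from_user(): it copies byte-wise, so it
 * needs neither a 64-bit get_user() nor an aligned buffer.
 */
static ssize_t parse_write_u64(const char *buf, size_t count, uint64_t *out)
{
	uint64_t ucnt;

	assert(out != NULL);
	if (count < sizeof(ucnt))
		return -EINVAL;		/* too short to hold a __u64 */
	memcpy(&ucnt, buf, sizeof(ucnt));
	*out = ucnt;
	return (ssize_t)sizeof(ucnt);
}
```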


Re: drivers/video/aty/atyfb_base.c: array overruns

2007-03-30 Thread Antonino A. Daplas

On Mon, 2007-03-19 at 10:22 +0100, Adrian Bunk wrote:
> The Coverity checker spotted the following two array overruns in 
> drivers/video/aty/atyfb_base.c:
> 
> <--  snip  -->
> 
> ...
> static const u32 lt_lcd_regs[] = {
> CONFIG_PANEL_LG,
> LCD_GEN_CNTL_LG,
> DSTN_CONTROL_LG,
> HFB_PITCH_ADDR_LG,
> HORZ_STRETCHING_LG,
> VERT_STRETCHING_LG,
> 0, /* EXT_VERT_STRETCH */
> LT_GIO_LG,
> POWER_MANAGEMENT_LG
> };

We can pad this array with zeroes, as a stop-gap measure. Ville, what do
you think?

Tony

> 
> void aty_st_lcd(int index, u32 val, const struct atyfb_par *par)
> {
> if (M64_HAS(LT_LCD_REGS)) {
> aty_st_le32(lt_lcd_regs[index], val, par);
> ...
> }
> ...
> u32 aty_ld_lcd(int index, const struct atyfb_par *par)
> {
> if (M64_HAS(LT_LCD_REGS)) {
> return aty_ld_le32(lt_lcd_regs[index], par);
> ...
> }
> ...
> static int aty_bl_update_status(struct backlight_device *bd)
> {
> struct atyfb_par *par = class_get_devdata(&bd->class_dev);
> unsigned int reg = aty_ld_lcd(LCD_MISC_CNTL, par);
> ...
> aty_st_lcd(LCD_MISC_CNTL, reg, par);
> 
> return 0;
> }
> ...
> 
> <--  snip  -->
> 
> LCD_MISC_CNTL = 0x14 = 20 > 8
> 
> cu
> Adrian
> 
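The stop-gap of padding lt_lcd_regs with zeroes might look roughly like this. The register values are placeholders (the real *_LG constants are not shown in the report), and lt_lcd_reg_lookup() is a hypothetical bounds-checked accessor. A C99 designated initializer for the highest index actually used makes the compiler size the array to cover it, and the unnamed slots in between are zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Index taken from the report: LCD_MISC_CNTL = 0x14. */
#define LCD_MISC_CNTL 0x14

/*
 * Placeholder register offsets: only the layout matters here.  Naming
 * the highest index that is ever used makes the array long enough for
 * it; slots 9..0x13 are implicitly zero, i.e. the "padding".
 */
static const uint32_t lt_lcd_regs[] = {
	0x01, 0x02, 0x03, 0x04, 0x05, 0x06,
	0,			/* EXT_VERT_STRETCH */
	0x08, 0x09,
	[LCD_MISC_CNTL] = 0,	/* makes index 0x14 in bounds */
};

/* Return the register offset for index, or 0 if index is out of range. */
static uint32_t lt_lcd_reg_lookup(unsigned int index)
{
	if (index >= ARRAY_SIZE(lt_lcd_regs))
		return 0;
	return lt_lcd_regs[index];
}
```

A zero entry still makes the accessors touch register offset 0 for LCD_MISC_CNTL, so this only removes the out-of-bounds read; it does not give that index a meaningful register.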


Re: [uml-devel] [patch 06/37] UML - Fix static linking

2007-03-30 Thread Blaisorblade

On venerdì 30 marzo 2007, Greg KH wrote:
> -stable review patch.  If anyone has any objections, please let us know.

I have one objection: the fix has a typo! This is the additional fix 
(note '.note' instead of 'note'):

--- linux-2.6.git.orig/include/asm-um/common.lds.S
+++ linux-2.6.git/include/asm-um/common.lds.S
@@ -15,7 +15,7 @@
   PROVIDE (_unprotected_end = .);

   . = ALIGN(4096);
-  .note : { *(note.*) }
+  .note : { *(.note.*) }
   __start___ex_table = .;
   __ex_table : { *(__ex_table) }
   __stop___ex_table = .;

With this, the fix should be merged - I just re-hit this bug and rechecked 
everything, now it's ok.

> --
> From: Jeff Dike <[EMAIL PROTECTED]>
>
> During a static link, ld has started putting a .note section in the
> .uml.setup.init section.  This has the result that the UML setups
> begin with 32 bytes of garbage and UML crashes immediately on boot.
>
> This patch creates a specific .note section for ld to drop this stuff
> into.
>
> Signed-off-by: Jeff Dike <[EMAIL PROTECTED]>
> Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>
>
> ---
>  include/asm-um/common.lds.S |    1 +
>  1 file changed, 1 insertion(+)
>
> --- a/include/asm-um/common.lds.S
> +++ b/include/asm-um/common.lds.S
> @@ -15,6 +15,7 @@
>PROVIDE (_unprotected_end = .);
>
>. = ALIGN(4096);
> +  .note : { *(note.*) }
>__start___ex_table = .;
>__ex_table : { *(__ex_table) }
>__stop___ex_table = .;



-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade

[PATCH] [PATCH] uml: fix static linking for real

2007-03-30 Thread Paolo 'Blaisorblade' Giarrusso

There was a typo in commit 7632fc8f809a97f9d82ce125e8e3e579390ce2e5 that prevented
it from working: 32-bit binaries crashed hopelessly without the fix below and
work perfectly with it.
Merge for 2.6.21, please.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 include/asm-um/common.lds.S |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/asm-um/common.lds.S b/include/asm-um/common.lds.S
index b16222b..f5de80c 100644
--- a/include/asm-um/common.lds.S
+++ b/include/asm-um/common.lds.S
@@ -15,7 +15,7 @@
   PROVIDE (_unprotected_end = .);
 
   . = ALIGN(4096);
-  .note : { *(note.*) }
+  .note : { *(.note.*) }
   __start___ex_table = .;
   __ex_table : { *(__ex_table) }
   __stop___ex_table = .;




Re: [patch 10/13] signal/timer/event fds v8 - eventfd core ...

2007-03-30 Thread Davide Libenzi

On Fri, 30 Mar 2007, Davide Libenzi wrote:

> > Some architectures do not implement 64-bit get_user()
> 
> copy_from_user it is, then ...

That's messed up though. We do have put_user and we miss get_user. Bah...



- Davide



Re: [patch 10/13] signal/timer/event fds v8 - eventfd core ...

2007-03-30 Thread Davide Libenzi

On Fri, 30 Mar 2007, Andrew Morton wrote:

> > +struct eventfd_ctx {
> > +   spinlock_t lock;
> > +   wait_queue_head_t wqh;
> > +   __u64 count;
> > +};
> 
> Again, can we borrow wqh.lock?
> 
> `count' needs documentation - these things are key to understanding the
> code.

Added.



> > + */
> 
> So it is the caller's responsibility to ensure that *file refers to an
> eventfd file?

In which function? I lost you ...



> > +int eventfd_signal(struct file *file, int n)
> > +{
> > +	struct eventfd_ctx *ctx = file->private_data;
> > +	unsigned long flags;
> > +
> > +	if (n < 0)
> > +		return -EINVAL;
> > +	spin_lock_irqsave(&ctx->lock, flags);
> > +	if (ULLONG_MAX - ctx->count < n)
> > +		n = (int) (ULLONG_MAX - ctx->count);
> > +	ctx->count += n;
> > +	if (waitqueue_active(&ctx->wqh))
> > +		wake_up_locked(&ctx->wqh);
> > +	spin_unlock_irqrestore(&ctx->lock, flags);
> > +
> > +	return n;
> > +}
> 
> Neither the incoming arg (usefully named "n") nor the return value are
> documented.

Documented now.
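Going by the quoted eventfd_signal() code, n and the return value amount to a saturating add on a 64-bit counter: a negative n is rejected with -EINVAL, and the return value is how much was actually added once the counter is clamped at ULLONG_MAX. A minimal user-space model, using the hypothetical name counter_add() and leaving out the locking and wakeup the kernel version performs:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/*
 * Add n to *count without wrapping.  Returns -EINVAL for a negative n,
 * otherwise the amount actually added, which is smaller than n only
 * when *count would have exceeded ULLONG_MAX.
 */
static int counter_add(unsigned long long *count, int n)
{
	assert(count != NULL);
	if (n < 0)
		return -EINVAL;
	if (ULLONG_MAX - *count < (unsigned long long)n)
		n = (int)(ULLONG_MAX - *count);
	*count += (unsigned long long)n;
	return n;
}
```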




> Needs interface documentation, please.  Even the changelog doesn't tell us
> what an EAGAIN return from read() means.

I'll be adding the errno documentation to all of them.




> > +static ssize_t eventfd_write(struct file *file, const char __user *buf, size_t count,
> > +			     loff_t *ppos)
> > +{
> > +	struct eventfd_ctx *ctx = file->private_data;
> > +	ssize_t res;
> > +	__u64 ucnt;
> > +	DECLARE_WAITQUEUE(wait, current);
> > +
> > +	if (count < sizeof(ucnt))
> > +		return -EINVAL;
> > +	if (get_user(ucnt, (const __u64 __user *) buf))
> > +		return -EFAULT;
> 
> Some architectures do not implement 64-bit get_user()

copy_from_user it is, then ...



- Davide



Re: [patch 6/13] signal/timer/event fds v8 - timerfd core ...

2007-03-30 Thread Andrew Morton

On Fri, 30 Mar 2007 17:47:28 -0700 (PDT) Davide Libenzi wrote:

> On Fri, 30 Mar 2007, Andrew Morton wrote:
> 
> > > +struct timerfd_ctx {
> > > + struct hrtimer tmr;
> > > + ktime_t tintv;
> > > + spinlock_t lock;
> > > + wait_queue_head_t wqh;
> > > + unsigned long ticks;
> > > +};
> > 
> > Did you consider using the (presently unused) lock inside wqh instead of
> > adding a new one?  That's a little bit rude, poking into waitqueue
> > internals like that, but we do it elsewhere and tricks like that are
> > acceptable in core-kernel, I guess.
> 
> Please, no. Gain is not worth the plug into the structure design IMO.
> 

The decision is not that obvious - your patch's main use of
timerfd_ctx.lock is to provide locking for wqh - ie: to duplicate the
function of the existing lock which is there for that purpose.

So I think it's a legitimate optimisation to borrow it.
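The idea behind borrowing the waitqueue's lock is that a single lock already serializes the waiters, so the tick counter can be protected by it instead of by a second spinlock. A loose user-space analogy uses one pthread mutex for both the counter and the condition variable's waiters; struct ticker and its functions are hypothetical, and this is pthreads rather than the kernel waitqueue API.

```c
#include <assert.h>
#include <pthread.h>

/* One lock protects both the counter and the waiters (the condvar). */
struct ticker {
	pthread_mutex_t lock;	/* plays the role of wqh.lock */
	pthread_cond_t wqh;	/* readers sleep here */
	unsigned long ticks;	/* expirations not yet consumed by a reader */
};

#define TICKER_INIT { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0 }

/* Timer side: bump the count and wake readers under the same lock. */
static void ticker_fire(struct ticker *t)
{
	pthread_mutex_lock(&t->lock);
	t->ticks++;
	pthread_cond_broadcast(&t->wqh);
	pthread_mutex_unlock(&t->lock);
}

/* Reader side: wait for at least one tick, then consume them all. */
static unsigned long ticker_read(struct ticker *t)
{
	unsigned long n;

	pthread_mutex_lock(&t->lock);
	while (t->ticks == 0)
		pthread_cond_wait(&t->wqh, &t->lock);
	n = t->ticks;
	t->ticks = 0;
	pthread_mutex_unlock(&t->lock);
	assert(n > 0);
	return n;
}
```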

> 
> > > +static enum hrtimer_restart timerfd_tmrproc(struct hrtimer *htmr)
> > > +{
> > > +	struct timerfd_ctx *ctx = container_of(htmr, struct timerfd_ctx, tmr);
> > > +	enum hrtimer_restart rval = HRTIMER_NORESTART;
> > > +	unsigned long flags;
> > > +
> > > +	spin_lock_irqsave(&ctx->lock, flags);
> > > +	ctx->ticks++;
> > > +	wake_up_locked(&ctx->wqh);
> > > +	if (ctx->tintv.tv64 != 0) {
> > > +		hrtimer_forward(htmr, hrtimer_cb_get_time(htmr), ctx->tintv);
> > > +		rval = HRTIMER_RESTART;
> > > +	}
> > > +	spin_unlock_irqrestore(&ctx->lock, flags);
> > > +
> > > +	return rval;
> > > +}
> > 
> > What's this do?
> 
> Really, do we need to comment such trivial code? There is *nothing* that 
> is worth a line of comment in there. IMO useless comments are more annoying 
> than blank lines.
> 

Look at it from the point of view of someone who knows kernel code but does
not specifically know this subsystem.  That describes the great majority of
people who will be reading your code.
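For a reader new to hrtimers, the quoted timerfd_tmrproc() can be paraphrased as a small user-space model: count the expiration, wake readers, and, for a periodic timer, push the expiry forward past the current time in whole intervals (which is what hrtimer_forward() does) before asking to be restarted. struct tfd_model and tfd_expire() are hypothetical, and times are plain nanosecond counts rather than ktime_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the timerfd context; times are in nanoseconds. */
struct tfd_model {
	int64_t expires;	/* next expiry time */
	int64_t tintv;		/* period; 0 means one-shot */
	unsigned long ticks;	/* expirations counted since the last read */
};

/*
 * Model of one expiry callback:
 *  1. count the expiration (ticks++) so read() can report it,
 *  2. (the real code wakes poll()/read() waiters here),
 *  3. for a periodic timer, move the expiry forward past "now" by whole
 *     intervals, as hrtimer_forward() does, and ask to be restarted.
 * Returns true for "restart" (HRTIMER_RESTART), false for a one-shot.
 */
static bool tfd_expire(struct tfd_model *ctx, int64_t now)
{
	ctx->ticks++;
	if (ctx->tintv == 0)
		return false;
	assert(ctx->tintv > 0);
	while (ctx->expires <= now)
		ctx->expires += ctx->tintv;
	return true;
}
```

As in the quoted code, a late callback still adds only one tick even if several periods were skipped while forwarding.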

> 
> 
> > > +static void timerfd_setup(struct timerfd_ctx *ctx, int clockid, int flags,
> > > +			  const struct itimerspec *ktmr)
> > > +{
> > > +	enum hrtimer_mode htmode;
> > > +	ktime_t texp;
> > > +
> > > +	htmode = (flags & TFD_TIMER_ABSTIME) ? HRTIMER_MODE_ABS: HRTIMER_MODE_REL;
> > > +
> > > +	texp = timespec_to_ktime(ktmr->it_value);
> > > +	ctx->ticks = 0;
> > > +	ctx->tintv = timespec_to_ktime(ktmr->it_interval);
> > > +	hrtimer_init(&ctx->tmr, clockid, htmode);
> > > +	ctx->tmr.expires = texp;
> > > +	ctx->tmr.function = timerfd_tmrproc;
> > > +	if (texp.tv64 != 0)
> > > +		hrtimer_start(&ctx->tmr, texp, htmode);
> > > +}
> > 
> > What does the special case texp.tv64 == 0 signify?  Is that obvious to
> > anyone who understands hrtimers?  Is it something which we can expect
> > Michael to immediately understand?  Should it be documented somewhere?
> 
> Michael should not read the code, but the patch description that comes 
> with it ;)
> 

To some extent, yes - there's a lot of material which is relevant to a
complex system call like this which isn't appropriate to code comments.

But a description of the role of texp.tv64 in here is an aid to
understanding the implementation and hence is appropriate and needed.
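In the quoted timerfd_setup(), the texp.tv64 == 0 case means an all-zero it_value leaves the timer disarmed, the same convention setitimer() and timer_settime() use, while it_interval (possibly zero, i.e. one-shot) becomes the reload period. A user-space sketch of just that decision; ts_to_ns() and tfd_should_arm() are hypothetical helpers standing in for timespec_to_ktime() and the check in timerfd_setup().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Convert a timespec to nanoseconds (a stand-in for timespec_to_ktime()). */
static int64_t ts_to_ns(const struct timespec *ts)
{
	return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/*
 * An all-zero it_value means "leave the timer disarmed".  Otherwise the
 * timer is armed to fire first after *first_ns, then every *period_ns
 * (a zero period means it fires only once).
 */
static bool tfd_should_arm(const struct itimerspec *ktmr,
			   int64_t *first_ns, int64_t *period_ns)
{
	assert(ktmr != NULL);
	*first_ns = ts_to_ns(&ktmr->it_value);
	*period_ns = ts_to_ns(&ktmr->it_interval);
	return *first_ns != 0;
}
```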

> 
> > > +asmlinkage long sys_timerfd(int ufd, int clockid, int flags,
> > > + const struct itimerspec __user *utmr)
> > 
> > Somehow we need to get from this to a manpage.
> 
> Again, the patch description describes (modulo returned errno's) the API 
> pretty well.
> 

A basic description of the inputs, outputs and return value is appropriate
to most high-level kernel functions.  One here won't hurt.

> 
> 
> > OK, this is briefly documented in the patch changelog.  That interface
> > documentation should be fleshed out and moved into the .c file.  a) because
> > it is easier to find and b) if we change it, it's a bit hard to go back and
> > alter that changelog!
> 
> I think it's better to leave it out of the code, and keep it in the patch 
> header.
> 

Patch headers are not maintainable.

Nobody wants to have to go off and waddle through the git repo to understand
the design intent behind each function.

Look, I'm just providing feedback as an experienced kernel developer who is
reading your code for the first time.  I had questions, and I saw things
which I felt were not adequately communicated.  You are the last person who
can judge what is obvious and what is not, because you already understand
it!

I do err on the make-it-easy-for-them side, but that's not a bad thing, I
think.  Very large numbers of people read core kernel code and the actual
change rate of this code will be low.  So we can afford to put the effort
into making these people's code-reading as productive as we can.


Re: [patch 6/13] signal/timer/event fds v8 - timerfd core ...

2007-03-30 Thread Davide Libenzi

On Fri, 30 Mar 2007, Andrew Morton wrote:

> > +struct timerfd_ctx {
> > +   struct hrtimer tmr;
> > +   ktime_t tintv;
> > +   spinlock_t lock;
> > +   wait_queue_head_t wqh;
> > +   unsigned long ticks;
> > +};
> 
> Did you consider using the (presently unused) lock inside wqh instead of
> adding a new one?  That's a little bit rude, poking into waitqueue
> internals like that, but we do it elsewhere and tricks like that are
> acceptable in core-kernel, I guess.

Please, no. Gain is not worth the plug into the structure design IMO.



> I find that the key to understanding kernel code is to understand the data
> structures and the relationships between them.  Once you have that in your
> head, the code tends to just fall out.  Hence there is good maintainability
> payoff in putting work into documenting the struct, its fields, the
> relationship between this struct and other structs, and any and all locking
> requirements.
> 
> 

Seemed obvious to me, but comment added.



> > +static enum hrtimer_restart timerfd_tmrproc(struct hrtimer *htmr);
> > +static void timerfd_setup(struct timerfd_ctx *ctx, int clockid, int flags,
> > + const struct itimerspec *ktmr);
> > +static int timerfd_close(struct inode *inode, struct file *file);
> > +static unsigned int timerfd_poll(struct file *file, poll_table *wait);
> > +static ssize_t timerfd_read(struct file *file, char __user *buf, size_t count,
> > +			    loff_t *ppos);
> 
> It'd be nice to find a way to make these declarations go away.

Gone.


> 
> > +
> > +
> > +
> 
> blankness.

You blank freak! :)



> > +static const struct file_operations timerfd_fops = {
> > +   .release= timerfd_close,
> 
> Rename to timerfd_release

Done.




> > +static enum hrtimer_restart timerfd_tmrproc(struct hrtimer *htmr)
> > +{
> > +	struct timerfd_ctx *ctx = container_of(htmr, struct timerfd_ctx, tmr);
> > +	enum hrtimer_restart rval = HRTIMER_NORESTART;
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(&ctx->lock, flags);
> > +	ctx->ticks++;
> > +	wake_up_locked(&ctx->wqh);
> > +	if (ctx->tintv.tv64 != 0) {
> > +		hrtimer_forward(htmr, hrtimer_cb_get_time(htmr), ctx->tintv);
> > +		rval = HRTIMER_RESTART;
> > +	}
> > +	spin_unlock_irqrestore(&ctx->lock, flags);
> > +
> > +	return rval;
> > +}
> 
> What's this do?

Really, do we need to comment such trivial code? There is *nothing* that 
is worth a line of comment in there. IMO useless comments are more annoying 
than blank lines.





> > +static void timerfd_setup(struct timerfd_ctx *ctx, int clockid, int flags,
> > +			  const struct itimerspec *ktmr)
> > +{
> > +	enum hrtimer_mode htmode;
> > +	ktime_t texp;
> > +
> > +	htmode = (flags & TFD_TIMER_ABSTIME) ? HRTIMER_MODE_ABS: HRTIMER_MODE_REL;
> > +
> > +	texp = timespec_to_ktime(ktmr->it_value);
> > +	ctx->ticks = 0;
> > +	ctx->tintv = timespec_to_ktime(ktmr->it_interval);
> > +	hrtimer_init(&ctx->tmr, clockid, htmode);
> > +	ctx->tmr.expires = texp;
> > +	ctx->tmr.function = timerfd_tmrproc;
> > +	if (texp.tv64 != 0)
> > +		hrtimer_start(&ctx->tmr, texp, htmode);
> > +}
> 
> What does the special case texp.tv64 == 0 signify?  Is that obvious to
> anyone who understands hrtimers?  Is it something which we can expect
> Michael to immediately understand?  Should it be documented somewhere?

Michael should not read the code, but the patch description that comes 
with it ;)




> > +asmlinkage long sys_timerfd(int ufd, int clockid, int flags,
> > +   const struct itimerspec __user *utmr)
> 
> Somehow we need to get from this to a manpage.

Again, the patch description describes (modulo returned errno's) the API 
pretty well.




> OK, this is briefly documented in the patch changelog.  That interface
> documentation should be fleshed out and moved into the .c file.  a) because
> it is easier to find and b) if we change it, it's a bit hard to go back and
> alter that changelog!

I think it's better to leave it out of the code, and keep it in the patch 
header.



> How come it's OK to truncate 64-bit timerfd_ctx.ticks to 32-bit like this?

2^32 ticks should be fine. I could make it a 64 bit thing, but IMO 32 bit 
is OK.
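To put 2^32 in perspective: at an assumed 1 kHz timer rate, a 32-bit count only wraps after 2^32 / 1000 s, roughly 4.29 million seconds or about 49.7 days of unread expirations. The sketch computes that horizon and, as one hypothetical alternative to plain truncation, clamps the 64-bit count at UINT32_MAX; both helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds of unread expirations a 32-bit counter absorbs at rate_hz. */
static double u32_wrap_seconds(double rate_hz)
{
	assert(rate_hz > 0.0);
	return ((double)UINT32_MAX + 1.0) / rate_hz;	/* 2^32 / rate */
}

/* Hypothetical alternative to truncation: saturate instead of wrapping. */
static uint32_t ticks_to_u32(uint64_t ticks)
{
	return ticks > UINT32_MAX ? UINT32_MAX : (uint32_t)ticks;
}
```

With plain truncation, a count of 2^32 + 6 would be reported as 6; the clamped version reports UINT32_MAX instead.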



- Davide



Re: Intel DP965LT Mainboard running?

2007-03-30 Thread Kok, Auke


Grant Coady wrote:
> On Sat, 31 Mar 2007 00:31:38 +0200, Oliver Joa <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> does anyone have a running Intel DP965LT Mainboard? I can not get this 
>> Board running. You can see the Problems in the Thread "Corrupt 
>> XFS-Filesystems on new Hardware and Kernel". Please can you give me a 
>> running Kernel-Config?
>
> http://bugsplatter.mine.nu/system/dp965lt.html  some notes and gotchas
> http://bugsplatter.mine.nu/test/boxen/silly/configs and dmesgs
>
> I've only had reiserfs and ext3 going, not XFS.  


That page mentions that the onboard NIC has problems linking at 100 Mbit. Have 
you tried debugging the issue with us? If you can, send a mail to 
[EMAIL PROTECTED] or file a ticket at e1000.sf.net.


Cheers,

Auke

Re: [patch 2/13] signal/timer/event fds v7 - signalfd core ...

2007-03-30 Thread Davide Libenzi

On Fri, 30 Mar 2007, Andrew Morton wrote:

> General comments:
> 
> - All these patches will be considered a 100% regression by the
>   linux-on-a-cellphone people.  What do we have to do to make all of this
>   stuff Kconfigurable?

I guess we can, yes.



> - All this code is moving us toward being able to unify all asynchronous
>   event handling under epoll, yes?
> 
>   If so, it is a competitor to kevent, only it's coming from the other
>   direction.
> 
>   I personally find it an attractive competitor, because it is much more
>   incremental and is easier from a design POV.  But what are its
>   shortcomings wrt kevent?  Do we have a feel for what the performance
>   difference will be?
> 
>   Which other kernel subsystems need to be wired up for this approach to
>   reach the same level of capability as kevent?

These patches are not bound to any particular interface; that's the whole 
point of it. You can use them with POSIX select/poll if you want.
Epoll was there, and was already covering the huge set of pollable 
devices. Timers, signals and event fds complement this set (and you 
don't need epoll to use them). The KAIO notification coming to an eventfd 
(last patch of the series, about 30 lines) allows you to listen for KAIO 
readiness on an event fd (hence using select, poll or epoll).
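To illustrate that no epoll is needed, here is a sketch using plain poll() on an eventfd. It assumes the eventfd(initval, flags) wrapper from <sys/eventfd.h> that glibc later provided, which may not match the syscall signature in the patches under review; eventfd_poll_demo() is a hypothetical name.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Signal an eventfd, wait for it with plain poll(), then read back the
 * accumulated counter.  Returns the value read, or 0 on any error.
 */
static uint64_t eventfd_poll_demo(uint64_t add)
{
	struct pollfd pfd;
	uint64_t val = 0;
	int fd;

	assert(add != UINT64_MAX);	/* eventfd rejects this value */
	fd = eventfd(0, 0);
	if (fd < 0)
		return 0;
	/* write() adds an 8-byte value to the counter, making it readable. */
	if (write(fd, &add, sizeof(add)) != (ssize_t)sizeof(add))
		goto out;
	pfd.fd = fd;
	pfd.events = POLLIN;
	if (poll(&pfd, 1, 1000) != 1 || !(pfd.revents & POLLIN))
		goto out;
	/* read() returns the counter and resets it to zero. */
	if (read(fd, &val, sizeof(val)) != (ssize_t)sizeof(val))
		val = 0;
out:
	close(fd);
	return val;
}
```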




> - Some poor schmuck needs to document all this stuff.  Other poor schmucks
>   need to program to it, and to develop libraries which talk to it, etc. 
>   Other schmucks need to understand and maintain it.  I judge the code and
>   the patches to be inadequately documented.

Well, many Linux man pages are smaller than the API description that comes 
with those patches ;)




>   Apart from general code commentary, which I will point out at the
>   relevant sites, I wonder about things like:
> 
>   - What are the sharing semantics?
> 
> - Across dup(), dup2() and fork()?

They work. But you'd still be "listening" to the sighand that created the 
signalfd, until that sighand gets detached. After that, read(2) returns zero 
bytes, which tells you that the "remote disconnected" ;)




> - If two !CLONE_SIGHAND, CLONE_FS threads are sharing a signalfd and one
>   alters its signal mask?

The "mask" is private to the file*, and does not alter the sighand one.
A thread can fetch another thread's signals if you like.



> - If two processes are sharing a signalfd across fork() and one
>   alters its signal mask or something?

Which signal mask? Process one or signalfd one?



>   - What are the effects upon the signalfd if the process alters its
> signal state?

As I said, signalfd and the process compete over dequeue_signal() for the 
signal fetch. You get a given signal once, either on the fd or with standard 
async delivery. If you want to be sure to always get it on the fd, you 
need to block it.




>   - What happens if a task has multiple signalfds open?  Does one
> signal get delivered to all of the fds?

Signalfds compete over dequeue_signal(), so only one of them will get it.




>   IMO all combinations and permutations should be documented for
>   posterity and it should be done now so we can review this design.

Ok.



> > +static int signalfd_lock(struct signalfd_ctx *ctx, struct signalfd_lockctx 
> > *lk);
> > +static void signalfd_unlock(struct signalfd_lockctx *lk);
> > +static void signalfd_cleanup(struct signalfd_ctx *ctx);
> > +static int signalfd_close(struct inode *inode, struct file *file);
> > +static unsigned int signalfd_poll(struct file *file, poll_table *wait);
> > +static int signalfd_copyinfo(struct signalfd_siginfo __user *uinfo,
> > +siginfo_t const *kinfo);
> 
> `const siginfo_t *', please.
> 
> I dunno, I find all these forward declarations to be a fugly waste of
> space, and a maintenance hassle.  I think a lot of them can be made to go
> away with some very simple code reorganisations.

Done.




> > +static ssize_t signalfd_read(struct file *file, char __user *buf, size_t 
> > count,
> > +loff_t *ppos);
> > +
> > +
> > +
> > +static const struct file_operations signalfd_fops = {
> > +   .release= signalfd_close,
> 
> Please rename signalfd_close to signalfd_release.

Done.



> > +static int signalfd_lock(struct signalfd_ctx *ctx, struct signalfd_lockctx 
> > *lk)
> > +{
> > +   struct sighand_struct *sighand = NULL;
> > +
> > +   rcu_read_lock();
> > +   lk->tsk = rcu_dereference(ctx->tsk);
> > +   if (likely(lk->tsk != NULL))
> > +   sighand = lock_task_sighand(lk->tsk, &lk->flags);
> > +   rcu_read_unlock();
> > +
> > +   if (sighand && !ctx->tsk) {
> > +   unlock_task_sighand(lk->tsk, &lk->flags);
> > +   sighand = NULL;
> > +   }
> > +
> > +   return sighand != NULL;
> > +}
> 
> This function needs documentation - it really is quite obscure.  What does
> its return value mean?  Why does it sometimes do lock_task_sighand() and
> sometimes does not?  I assume that it's handling exitted tasks

Re: [1/4] 2.6.21-rc5: known regressions (v2)

2007-03-30 Thread Michal Jaegermann

On Fri, Mar 30, 2007 at 11:32:09PM +0200, Adrian Bunk wrote:
> 
> Subject: kernels fail to boot with drives on ATIIXP controller
>  (ACPI/IRQ related)
> References : https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=229621
>  http://lkml.org/lkml/2007/3/4/257
> Submitter  : Michal Jaegermann <[EMAIL PROTECTED]>
> Status : unknown

I now have an even better one with pata_via.  A kernel which for
all practical purposes is 2.6.21-rc5 not only refuses to boot
(and I cannot find any option combination which would let me
boot it anyway) but simply refuses to read _any_ data from the media.
This includes the partitioning information.

An earlier kernel on the same hardware boots without any fuss.

Details are collected as
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=234650

   Michal

Re: [PATCH 1/4] [SCSI]stex: fix id mapping issue

2007-03-30 Thread Jeff Garzik


Ed Lin wrote:

> The internal id/lun mapping of st_vsc and st_vsc1 controllers is different
> from st_shasta. The original driver code can only map the first 16 'entities'
> for st_vsc and st_vsc1, while there are actually 128 available.
>
> Also, ST_MAX_LUN_PER_TARGET should be 8, although the old value did
> no harm because inquiries beyond the boundary are discarded by the firmware.
>
> The correct internal mapping should be:
> id:0~15, lun:0~7 (st_shasta)
> id:0, lun:0~127 (st_yosemite)
> id:0~127, lun:0 (st_vsc and st_vsc1)
> To the scsi mid layer they are all channel:0~7, id:0~15, lun:0, with a maximum
> 'entity' number of 128. The RAID console only interfaces to the scsi mid layer
> and is always mapped at channel:0, id:16, lun:0.
>
> Signed-off-by: Ed Lin <[EMAIL PROTECTED]>


ACK patches 1-4.  I presume James will apply them to scsi-fixes...

Jeff




Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-30 Thread Denis Vlasenko

On Thursday 08 March 2007 18:28, Linus Torvalds wrote:
> The sad part is that there really is no reason why the BSD crowd couldn't 
> have done recvmsg() as an "extended read with per-system call flags", 
> which would have made things like O_NONBLOCK etc unnecessary, because you 
> could do it just with MSG_DONTWAIT..

Wait a second here... O_NONBLOCK is not just unnecessary - it's buggy!

Try to do a nonblocking read from stdin (fd #0):
* setting O_NONBLOCK with fcntl will set it for all other processes
  which have the same stdin!
* trying to reset O_NONBLOCK after the read doesn't help (think kill -9)
* duping fd #0 doesn't help because O_NONBLOCK is not per-fd,
  it's shared just like filepos.

I really like that trick with recvmsg + MSG_DONTWAIT instead.
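The difference can be sketched in user space, with a Unix socketpair standing in for stdin. Note that recv(2) with MSG_DONTWAIT only works on sockets, so on a plain terminal or pipe this trick is not available. The helper names below are hypothetical. `dup_sees_nonblock()` shows that O_NONBLOCK set through one descriptor is visible through a dup() of it, because both refer to the same open file description. `try_recv_dontwait()` does a one-off non-blocking receive without touching the shared flags:

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if O_NONBLOCK is set on fd, 0 if clear, -1 on error. */
static int nonblock_flag(int fd)
{
	int fl = fcntl(fd, F_GETFL);
	return fl < 0 ? -1 : (fl & O_NONBLOCK) != 0;
}

/*
 * Set O_NONBLOCK on `fd` and report what a dup() of it sees.
 * The original flags are restored before returning.
 */
static int dup_sees_nonblock(int fd)
{
	int copy = dup(fd);
	int fl = fcntl(fd, F_GETFL);
	if (copy < 0 || fl < 0) {
		if (copy >= 0)
			close(copy);
		return -1;
	}
	fcntl(fd, F_SETFL, fl | O_NONBLOCK);
	int seen = nonblock_flag(copy);	/* shared status flags: expect 1 */
	fcntl(fd, F_SETFL, fl);		/* put the original flags back */
	close(copy);
	return seen;
}

/*
 * Per-call alternative: returns 1 if recv() would have blocked
 * (EAGAIN/EWOULDBLOCK), 0 if it returned without blocking (data or EOF),
 * -1 on other errors. The descriptor's own flags are never modified.
 */
static int try_recv_dontwait(int sock)
{
	char buf[64];
	ssize_t n = recv(sock, buf, sizeof(buf), MSG_DONTWAIT);
	if (n >= 0)
		return 0;
	return (errno == EAGAIN || errno == EWOULDBLOCK) ? 1 : -1;
}
```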
--
vda

Re: Intel DP965LT Mainboard running?

2007-03-30 Thread Grant Coady

On Sat, 31 Mar 2007 00:31:38 +0200, Oliver Joa <[EMAIL PROTECTED]> wrote:

>Hi,
>
>does anyone have a running Intel DP965LT Mainboard? I can not get this 
>Board running. You can see the Problems in the Thread "Corrupt 
>XFS-Filesystems on new Hardware and Kernel". Please can you give me a 
>running Kernel-Config?

http://bugsplatter.mine.nu/system/dp965lt.html  some notes and gotchas
http://bugsplatter.mine.nu/test/boxen/silly/configs and dmesgs

I've only had reiserfs and ext3 going, not XFS.  

Grant.

[PATCH 4/4] [SCSI]stex: minor cleanup and version update

2007-03-30 Thread Ed Lin

Add debug information to the abort and host_reset routines.
Change ioremap to ioremap_nocache.
Version updated to 3.6..1.

Signed-off-by: Ed Lin <[EMAIL PROTECTED]>
---
diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c
index 9465f35..5a10cfa 100644
--- a/drivers/scsi/stex.c
+++ b/drivers/scsi/stex.c
@@ -32,11 +32,12 @@ #include 
 #include 
 #include 
 #include 
+#include 
 
 #define DRV_NAME "stex"
-#define ST_DRIVER_VERSION "3.1.0.1"
+#define ST_DRIVER_VERSION "3.6..1"
 #define ST_VER_MAJOR   3
-#define ST_VER_MINOR   1
+#define ST_VER_MINOR   6
 #define ST_OEM 0
 #define ST_BUILD_VER   1
 
@@ -1007,6 +1008,11 @@ static int stex_abort(struct scsi_cmnd *
u32 data;
int result = SUCCESS;
unsigned long flags;
+
+   printk(KERN_INFO DRV_NAME
+   "(%s): aborting command\n", pci_name(hba->pdev));
+   scsi_print_command(cmd);
+
base = hba->mmio_base;
spin_lock_irqsave(host->host_lock, flags);
if (tag < host->can_queue && hba->ccb[tag].cmd == cmd)
@@ -1092,6 +1098,10 @@ static int stex_reset(struct scsi_cmnd *
unsigned long before;
hba = (struct st_hba *) &cmd->device->host->hostdata[0];
 
+   printk(KERN_INFO DRV_NAME
+   "(%s): resetting host\n", pci_name(hba->pdev));
+   scsi_print_command(cmd);
+
hba->mu_status = MU_STATE_RESETTING;
 
if (hba->cardtype == st_shasta)
@@ -1211,7 +1221,7 @@ stex_probe(struct pci_dev *pdev, const s
goto out_scsi_host_put;
}
 
-   hba->mmio_base = ioremap(pci_resource_start(pdev, 0),
+   hba->mmio_base = ioremap_nocache(pci_resource_start(pdev, 0),
pci_resource_len(pdev, 0));
if ( !hba->mmio_base) {
printk(KERN_ERR DRV_NAME "(%s): memory map failed\n",



[PATCH 2/4] [SCSI]stex: extend hard reset wait time

2007-03-30 Thread Ed Lin

During a hard bus reset of st_shasta controllers, 1 ms is not enough for
16-port controllers, although it's sufficient for 8-port controllers. Extend the
wait time to 100 ms to allow bus resets to finish successfully.

Signed-off-by: Ed Lin <[EMAIL PROTECTED]>
---
diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c
index 4d68533..1e8d7ac 100644
--- a/drivers/scsi/stex.c
+++ b/drivers/scsi/stex.c
@@ -1055,7 +1055,12 @@ static void stex_hard_reset(struct st_hb
pci_read_config_byte(bus->self, PCI_BRIDGE_CONTROL, &pci_bctl);
pci_bctl |= PCI_BRIDGE_CTL_BUS_RESET;
pci_write_config_byte(bus->self, PCI_BRIDGE_CONTROL, pci_bctl);
-   msleep(1);
+
+   /*
+* 1 ms may be enough for 8-port controllers. But 16-port controllers
+* require more time to finish bus reset. Use 100 ms here for safety
+*/
+   msleep(100);
pci_bctl &= ~PCI_BRIDGE_CTL_BUS_RESET;
pci_write_config_byte(bus->self, PCI_BRIDGE_CONTROL, pci_bctl);
 



[PATCH 3/4] [SCSI]stex: fix reset recovery for console device

2007-03-30 Thread Ed Lin

After a reset completes, the scsi error handler sends START_STOP
and TEST_UNIT_READY to the device. For 'normal' devices these
commands are handled by the firmware. However, because the RAID
console only interfaces to the scsi mid layer, the firmware does not process
these commands for it, so the console is taken offline right
after reset. Add handling in the driver to fix this problem.
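For reference, the value the patch stores in `cmd->result` is a packed word. In kernels of this era, the low byte holds the SCSI status, followed by the message byte, the host byte, and the driver byte. Here is a small sketch of that packing. It assumes the usual values `DID_OK == 0x00` and `COMMAND_COMPLETE == 0x00`, which are redefined locally under hypothetical `MY_` names rather than taken from the kernel headers:

```c
#include <stdint.h>

/* Local copies of the constants used by the patch (assumed to match scsi.h). */
#define MY_DID_OK		0x00
#define MY_COMMAND_COMPLETE	0x00

/* Pack driver/host/msg/status bytes into a 32-bit result word. */
static uint32_t pack_result(uint8_t driver, uint8_t host,
			    uint8_t msg, uint8_t status)
{
	return (uint32_t)driver << 24 | (uint32_t)host << 16 |
	       (uint32_t)msg << 8 | status;
}

/* Extract the host byte (bits 16-23). */
static uint8_t result_host(uint32_t r)
{
	return (r >> 16) & 0xff;
}

/* Extract the message byte (bits 8-15). */
static uint8_t result_msg(uint32_t r)
{
	return (r >> 8) & 0xff;
}
```

With both constants zero, `DID_OK << 16 | COMMAND_COMPLETE << 8` is simply 0, which the mid layer treats as a clean completion.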

Signed-off-by: Ed Lin <[EMAIL PROTECTED]>
---
diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c
index 1e8d7ac..9465f35 100644
--- a/drivers/scsi/stex.c
+++ b/drivers/scsi/stex.c
@@ -605,6 +605,14 @@ stex_queuecommand(struct scsi_cmnd *cmd,
stex_invalid_field(cmd, done);
return 0;
}
+   case TEST_UNIT_READY:
+   case START_STOP:
+   if (id == ST_MAX_ARRAY_SUPPORTED) {
+   cmd->result = DID_OK << 16 | COMMAND_COMPLETE << 8;
+   done(cmd);
+   return 0;
+   }
+   break;
case INQUIRY:
if (id != ST_MAX_ARRAY_SUPPORTED)
break;



[PATCH 1/4] [SCSI]stex: fix id mapping issue

2007-03-30 Thread Ed Lin

The internal id/lun mapping of st_vsc and st_vsc1 controllers is different
from st_shasta. The original driver code can only map the first 16 'entities'
for st_vsc and st_vsc1, while there are actually 128 available.

Also, ST_MAX_LUN_PER_TARGET should be 8, although the old value did
no harm because inquiries beyond the boundary are discarded by the firmware.

The correct internal mapping should be:
id:0~15, lun:0~7 (st_shasta)
id:0, lun:0~127 (st_yosemite)
id:0~127, lun:0 (st_vsc and st_vsc1)
To the scsi mid layer they are all channel:0~7, id:0~15, lun:0, with a maximum
'entity' number of 128. The RAID console only interfaces to the scsi mid layer
and is always mapped at channel:0, id:16, lun:0.
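The three layouts above can be checked with a small user-space model of the new mapping in `stex_queuecommand()`. `ST_MAX_LUN_PER_TARGET` is the value from the patch. The card-type enum, `struct st_req_addr`, and `stex_map()` are hypothetical stand-ins for the driver's own types and request fields:

```c
/* Value from the patch; everything else here is a hypothetical model. */
#define ST_MAX_LUN_PER_TARGET	8

enum card_type { CARD_SHASTA, CARD_VSC, CARD_VSC1, CARD_YOSEMITE };

struct st_req_addr {
	unsigned int target;
	unsigned int lun;
};

/* Translate a mid-layer (id, lun) pair into the firmware's target/lun. */
static struct st_req_addr stex_map(enum card_type type,
				   unsigned int id, unsigned int lun)
{
	struct st_req_addr r;

	switch (type) {
	case CARD_SHASTA:
		/* 1:1, firmware sees id 0..15, lun 0..7 */
		r.target = id;
		r.lun = lun;
		break;
	case CARD_YOSEMITE:
		/* everything on target 0, flattened into lun 0..127 */
		r.target = 0;
		r.lun = id * ST_MAX_LUN_PER_TARGET + lun;
		break;
	default:
		/* st_vsc / st_vsc1: flattened into target 0..127, lun 0 */
		r.target = id * ST_MAX_LUN_PER_TARGET + lun;
		r.lun = 0;
		break;
	}
	return r;
}
```

With id in 0..15 and lun in 0..7, the flattened index `id * 8 + lun` covers exactly 0..127, which matches the 128-entity limit described above.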

Signed-off-by: Ed Lin <[EMAIL PROTECTED]>
---
diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c
index 69be132..4d68533 100644
--- a/drivers/scsi/stex.c
+++ b/drivers/scsi/stex.c
@@ -115,7 +115,7 @@ enum {
 
ST_MAX_ARRAY_SUPPORTED  = 16,
ST_MAX_TARGET_NUM   = (ST_MAX_ARRAY_SUPPORTED+1),
-   ST_MAX_LUN_PER_TARGET   = 16,
+   ST_MAX_LUN_PER_TARGET   = 8,
 
st_shasta   = 0,
st_vsc  = 1,
@@ -645,12 +645,16 @@ stex_queuecommand(struct scsi_cmnd *cmd,
 
req = stex_alloc_req(hba);
 
-   if (hba->cardtype == st_yosemite) {
-   req->lun = lun * (ST_MAX_TARGET_NUM - 1) + id;
-   req->target = 0;
-   } else {
+   if (hba->cardtype == st_shasta) {
req->lun = lun;
req->target = id;
+   } else if (hba->cardtype == st_yosemite){
+   req->lun = id * ST_MAX_LUN_PER_TARGET + lun;
+   req->target = 0;
+   } else {
+   /* st_vsc and st_vsc1 */
+   req->lun = 0;
+   req->target = id * ST_MAX_LUN_PER_TARGET + lun;
}
 
/* cdb */



[PATCH 3/3] slab: avoid __initdata warning (may be a bogus one)

2007-03-30 Thread Paolo 'Blaisorblade' Giarrusso

set_up_list3s is not __init and references initkmem_list3.

Also, kmem_cache_create calls setup_cpu_cache which calls set_up_list3s. The
state machine _may_ prevent the code from accessing this data after freeing
initdata (it makes sure it's used only up to boot), so this warning may be a
false positive.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 mm/slab.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 0934f8d..0772faf 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -305,7 +305,7 @@ struct kmem_list3 {
  * Need this for bootstrapping a per node allocator.
  */
 #define NUM_INIT_LISTS (2 * MAX_NUMNODES + 1)
-struct kmem_list3 __initdata initkmem_list3[NUM_INIT_LISTS];
+struct kmem_list3 initkmem_list3[NUM_INIT_LISTS];
 #defineCACHE_CACHE 0
 #defineSIZE_AC 1
 #defineSIZE_L3 (1 + MAX_NUMNODES)




[PATCH 1/3] utrace - uml: make UML compile with utrace enabled

2007-03-30 Thread Paolo 'Blaisorblade' Giarrusso

* The prototype of arch_ptrace doesn't match the one in include/linux/ptrace.h.
* utrace_um_native is referred to by utrace_native_view but never defined.

Cc: Jeff Dike <[EMAIL PROTECTED]>
Cc: Roland McGrath <[EMAIL PROTECTED]>
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 arch/um/kernel/ptrace.c |7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/um/kernel/ptrace.c b/arch/um/kernel/ptrace.c
index f66d01c..a42caf3 100644
--- a/arch/um/kernel/ptrace.c
+++ b/arch/um/kernel/ptrace.c
@@ -16,7 +16,12 @@ void ptrace_disable(struct task_struct *child)
 { 
 }
 
-long arch_ptrace(struct task_struct *child, long request, long addr, long data)
+const struct utrace_regset_view utrace_um_native;
+
+int arch_ptrace(long *request, struct task_struct *child,
+  struct utrace_attached_engine *engine,
+  unsigned long addr, unsigned long data,
+  long *retval)
 {
return -ENOSYS;
 }




Re: [uml-devel] [PATCH 1/2] UML - Fix umid in xterm titles

2007-03-30 Thread Blaisorblade

On venerdì 30 marzo 2007, Jeff Dike wrote:
> From: Davide Brini <[EMAIL PROTECTED]>
>
> Calls lines_init() *after* xterm_title is modified to include umid.
>
> Signed-off-by: Davide Brini <[EMAIL PROTECTED]>
> Signed-off-by: Jeff Dike <[EMAIL PROTECTED]>

Acked-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

> --
>  arch/um/drivers/ssl.c   |4 ++--
>  arch/um/drivers/stdio_console.c |4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
>
> Index: linux-2.6.21-mm/arch/um/drivers/ssl.c
> ===
> --- linux-2.6.21-mm.orig/arch/um/drivers/ssl.c2007-03-30
> 10:11:01.0 -0400 +++
> linux-2.6.21-mm/arch/um/drivers/ssl.c 2007-03-30 10:28:51.0 -0400
> @@ -191,12 +191,12 @@ static int ssl_init(void)
>   ssl_driver = register_lines(, _ops, serial_lines,
>   ARRAY_SIZE(serial_lines));
>
> - lines_init(serial_lines, ARRAY_SIZE(serial_lines), &opts);
> -
>   new_title = add_xterm_umid(opts.xterm_title);
>   if (new_title != NULL)
>   opts.xterm_title = new_title;
>
> + lines_init(serial_lines, ARRAY_SIZE(serial_lines), &opts);
> +
>   ssl_init_done = 1;
>   register_console(_cons);
>   return 0;
> Index: linux-2.6.21-mm/arch/um/drivers/stdio_console.c
> ===
> --- linux-2.6.21-mm.orig/arch/um/drivers/stdio_console.c  2007-03-30
> 10:11:01.0 -0400 +++
> linux-2.6.21-mm/arch/um/drivers/stdio_console.c   2007-03-30
> 10:28:51.0 -0400 @@ -166,12 +166,12 @@ int stdio_init(void)
>   return -1;
>   printk(KERN_INFO "Initialized stdio console driver\n");
>
> - lines_init(vts, ARRAY_SIZE(vts), &opts);
> -
>   new_title = add_xterm_umid(opts.xterm_title);
>   if(new_title != NULL)
>   opts.xterm_title = new_title;
>
> + lines_init(vts, ARRAY_SIZE(vts), &opts);
> +
>   con_init_done = 1;
>   register_console();
>   return 0;
>



-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade

[PATCH 2/3] sys_futex64-allows-64bit-futexes-workaround for uml

2007-03-30 Thread Paolo 'Blaisorblade' Giarrusso

Copy sys_futex64-allows-64bit-futexes-workaround.patch to UML (to unbreak the
UML build). Note however that in include/asm-generic/futex.h we have:

static inline int
futex_atomic_cmpxchg_inatomic(int __user *uaddr, int oldval, int newval)
{
return -ENOSYS;
}

That is a better solution. Pierre Peiffer, please consider it.

Cc: Pierre Peiffer <[EMAIL PROTECTED]>
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 include/asm-um/futex.h |   13 +
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/include/asm-um/futex.h b/include/asm-um/futex.h
index 6a332a9..e875d3e 100644
--- a/include/asm-um/futex.h
+++ b/include/asm-um/futex.h
@@ -3,4 +3,17 @@
 
 #include 
 
+static inline u64
+futex_atomic_cmpxchg_inatomic64(u64 __user *uaddr, u64 oldval, u64 newval)
+{
+   return 0;
+}
+
+static inline int
+futex_atomic_op_inuser64 (int encoded_op, u64 __user *uaddr)
+{
+   return 0;
+}
+
+
 #endif




[PATCH 0/3] mm-only patches

2007-03-30 Thread Paolo 'Blaisorblade' Giarrusso

Workaround patches for mm-only compile errors/warnings seen on 2.6.21-rc5-mm2;
they still apply to 2.6.21-rc5-mm3.
-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade



Re: [patch 1/13] signal/timer/event fds v8 - anonymous inode source ...

2007-03-30 Thread Andrew Morton

On Fri, 30 Mar 2007 15:44:15 -0700 (PDT)
Davide Libenzi  wrote:

> On Fri, 30 Mar 2007, Andrew Morton wrote:
> 
> > > +#include 
> > > +
> > > +
> > > +
> > 
> > Too many blank lines
> 
> It'd be interesting to know how much is enough. You use one, ppl says it 
> is too dense. You use more, ppl says it's too much.
> There's the one-line rule for inter-function spacing, but what's the 
> include-functions ones? Or the functions-data ones?
> 

less ;)

> 
> > > +static int __init aino_init(void)
> > > +{
> > > + int error;
> > > +
> > > + error = register_filesystem(&aino_fs_type);
> > > + if (error)
> > > + goto err_exit;
> > > + aino_mnt = kern_mount(&aino_fs_type);
> > > + if (IS_ERR(aino_mnt)) {
> > > + error = PTR_ERR(aino_mnt);
> > > + goto err_unregister_filesystem;
> > > + }
> > > + aino_inode = aino_mkinode();
> > > + if (IS_ERR(aino_inode)) {
> > > + error = PTR_ERR(aino_inode);
> > > + goto err_mntput;
> > > + }
> > > +
> > > + return 0;
> > > +
> > > +err_mntput:
> > > + mntput(aino_mnt);
> > > +err_unregister_filesystem:
> > > + unregister_filesystem(&aino_fs_type);
> > > +err_exit:
> > > + printk(KERN_ERR "aino_init() failed (%d)\n", error);
> > 
> > I suspect this is panic time?
> 
> Ok, it was panicking, and someone made me change it. Would you please 
> agree?
> The system can survive w/out, but it'll be a broken system WRT userspace.

I'd say panic.  There's not much point in limping along with an
incorrectly-working kernel, only to have some small number of apps fail
mysteriously later on.

> > 
> > Can we make this optional if CONFIG_EMBEDDED?  You plan on converting epoll
> > to use this facility, but with CONFIG_EPOLL=n, this is all dead code?
> 
> Hmmm, the whole point is that all this stuff works with or without epoll. 
> And epoll need no changes to support this.

I'm suggesting that all known clients of anon_inode be made optional. 
Hence anon_inode can become optional too.

It's a desirable objective, at least.  The default, really.

Re: [PATCH] fix page leak during core dump

2007-03-30 Thread Hugh Dickins

On Fri, 30 Mar 2007, Andrew Morton wrote:
> 
>  again?>

Oooh, yes please.

> diff -puN fs/binfmt_elf_fdpic.c~fix-page-leak-during-core-dump 
> fs/binfmt_elf_fdpic.c
> --- a/fs/binfmt_elf_fdpic.c~fix-page-leak-during-core-dump
> +++ a/fs/binfmt_elf_fdpic.c
> @@ -1480,8 +1480,10 @@ static int elf_fdpic_dump_segments(struc
>   DUMP_SEEK(file->f_pos + PAGE_SIZE);
>   }
>   else if (page == ZERO_PAGE(addr)) {
> - DUMP_SEEK(file->f_pos + PAGE_SIZE);
> - page_cache_release(page);
> + if (!dump_seek(file, file->f_pos + PAGE_SIZE)) {
> + page_cache_release(page);
> + return 0;
> + }
>   }
>   else {
>   void *kaddr;
> _

No, I think that's wrong: whereas the binfmt_elf one did its
page_cache_release down below at the bottom of the block, this
version does it in each subblock, so your change drops the release
on the dump_seek success path.  Can't we preserve that beauteous macro
here and just do...

--- a/fs/binfmt_elf_fdpic.c
+++ b/fs/binfmt_elf_fdpic.c
@@ -1480,8 +1480,8 @@ static int elf_fdpic_dump_segments(struc
DUMP_SEEK(file->f_pos + PAGE_SIZE);
}
else if (page == ZERO_PAGE(addr)) {
-   DUMP_SEEK(file->f_pos + PAGE_SIZE);
page_cache_release(page);
+   DUMP_SEEK(file->f_pos + PAGE_SIZE);
}
else {
void *kaddr;
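The ordering problem is easy to reproduce outside the kernel. In this sketch, `DUMP_SEEK_SIM` is a hypothetical stand-in modelled on the macro's behaviour: it hides a `return 0` that fires when the seek fails. A plain counter stands in for the page reference. Only the order of the release and the seek differs between the two variants:

```c
/* Hypothetical stand-ins: a reference count and a seek that can fail. */
static int page_refs;

static int fake_seek(int ok)
{
	return ok;
}

/* Like DUMP_SEEK: return 0 from the *calling* function if the seek fails. */
#define DUMP_SEEK_SIM(ok) do { if (!fake_seek(ok)) return 0; } while (0)

/* Seek first, release second: the release is skipped when the seek fails. */
static int dump_zero_page_leaky(int seek_ok)
{
	page_refs++;		/* reference taken when the page was looked up */
	DUMP_SEEK_SIM(seek_ok);
	page_refs--;		/* page_cache_release() */
	return 1;
}

/* Release first, then seek: the reference is dropped on both paths. */
static int dump_zero_page_fixed(int seek_ok)
{
	page_refs++;		/* reference taken when the page was looked up */
	page_refs--;		/* page_cache_release() before the seek */
	DUMP_SEEK_SIM(seek_ok);
	return 1;
}
```

Because the page is not used after the seek, dropping the reference first is safe, and it keeps the early-return macro intact.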

[PATCH 2.6.21-rc5 2/3] msi: fix ARM compile

2007-03-30 Thread Dan Williams

In file included from drivers/pci/msi.c:22:
include/asm/smp.h:17:26: asm/arch/smp.h: No such file or directory
include/asm/smp.h:20:3: #error " included in non-SMP build"
include/asm/smp.h:23:1: warning: "raw_smp_processor_id" redefined
In file included from include/linux/sched.h:65,
 from include/linux/mm.h:4,
 from drivers/pci/msi.c:10:
include/linux/smp.h:85:1: warning: this is the location of the previous
definition

Tested on powerpc, i386, and x86_64.

Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
Acked-by: Eric W. Biederman <[EMAIL PROTECTED]>
---

 drivers/pci/msi.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index ad33e01..7a7152b 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -16,10 +16,10 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
-#include 
 
 #include "pci.h"
 #include "msi.h"

[PATCH 2.6.21-rc5 3/3] iop13xx: msi support (rev6)

2007-03-30 Thread Dan Williams

From: Daniel Wolstenholme <[EMAIL PROTECTED]>

Enable devices to signal interrupts via PCI memory cycles.

rev6:
* fix enable/disable typo, Michael Ellerman

rev5:
* fix up ack, enable, and disable for iop13xx_msi_chip

rev4:
* move smp compile fix to separate patch
* use dynamic_irq_init in create_irq()
* hookup mask/unmask routines in iop13xx_msi_chip

rev3:
* change msi.c to use linux/smp.h instead of asm/smp.h
* call dynamic_irq_cleanup at destroy_irq time

rev2:
* destroy_irq did not take the full 128 bits of msi_irq_in_use into account
* added missing '&' for calls to test_and_set_bit and clear_bit
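The rev2 note concerns `msi_irq_in_use[4]`. On 32-bit ARM an `unsigned long` is 32 bits, so four words provide the 128 MSI slots, and any code that walks the bitmap has to cover all four words. Below is a user-space sketch of such an allocator. It uses fixed-width `uint32_t` words to mirror that layout. `msi_slot_alloc()` and `msi_slot_free()` are hypothetical names, and per the rev2 note the real patch uses the kernel's atomic `test_and_set_bit()`/`clear_bit()` helpers rather than this open-coded, non-atomic loop:

```c
#include <stdint.h>

#define MSI_WORDS	4			/* four 32-bit words */
#define MSI_SLOTS	(MSI_WORDS * 32)	/* 128 slots in total */

static uint32_t msi_in_use[MSI_WORDS];

/* Claim the lowest free slot; returns its index, or -1 if all are used. */
static int msi_slot_alloc(void)
{
	for (int bit = 0; bit < MSI_SLOTS; bit++) {
		uint32_t mask = UINT32_C(1) << (bit % 32);
		uint32_t *word = &msi_in_use[bit / 32];

		if (!(*word & mask)) {
			*word |= mask;	/* not atomic, unlike test_and_set_bit() */
			return bit;
		}
	}
	return -1;
}

/* Release a slot; indices outside 0..127 are ignored. */
static void msi_slot_free(int bit)
{
	if (bit < 0 || bit >= MSI_SLOTS)
		return;
	msi_in_use[bit / 32] &= ~(UINT32_C(1) << (bit % 32));
}
```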

[EMAIL PROTECTED]: review comments/suggestions]
[EMAIL PROTECTED]: cleanups/forward port to 2.6-git]
Signed-off-by: Daniel Wolstenholme <[EMAIL PROTECTED]>
Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
Acked-by: Eric W. Biederman <[EMAIL PROTECTED]>
---

 arch/arm/mach-iop13xx/Makefile |1 
 arch/arm/mach-iop13xx/irq.c|5 +
 arch/arm/mach-iop13xx/msi.c|  193 
 arch/arm/mach-iop13xx/pci.c|   16 +++
 include/asm-arm/arch-iop13xx/iop13xx.h |   29 +
 include/asm-arm/arch-iop13xx/irqs.h|8 +
 include/asm-arm/arch-iop13xx/msi.h |   11 ++
 7 files changed, 261 insertions(+), 2 deletions(-)

diff --git a/arch/arm/mach-iop13xx/Makefile b/arch/arm/mach-iop13xx/Makefile
index 4185e05..02bd511 100644
--- a/arch/arm/mach-iop13xx/Makefile
+++ b/arch/arm/mach-iop13xx/Makefile
@@ -9,3 +9,4 @@ obj-$(CONFIG_ARCH_IOP13XX) += pci.o
 obj-$(CONFIG_ARCH_IOP13XX) += io.o
 obj-$(CONFIG_MACH_IQ81340SC) += iq81340sc.o
 obj-$(CONFIG_MACH_IQ81340MC) += iq81340mc.o
+obj-$(CONFIG_PCI_MSI) += msi.o
diff --git a/arch/arm/mach-iop13xx/irq.c b/arch/arm/mach-iop13xx/irq.c
index b2eb0b9..5791add 100644
--- a/arch/arm/mach-iop13xx/irq.c
+++ b/arch/arm/mach-iop13xx/irq.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* INTCTL0 CP6 R0 Page 4
  */
@@ -258,7 +259,7 @@ void __init iop13xx_init_irq(void)
write_intbase(INTBASE);
write_intsize(INTSIZE_4);
 
-   for(i = 0; i < NR_IOP13XX_IRQS; i++) {
+   for(i = 0; i <= IRQ_IOP13XX_HPI; i++) {
if (i < 32)
set_irq_chip(i, _irqchip1);
else if (i < 64)
@@ -271,4 +272,6 @@ void __init iop13xx_init_irq(void)
set_irq_handler(i, handle_level_irq);
set_irq_flags(i, IRQF_VALID | IRQF_PROBE);
}
+
+   iop13xx_msi_init();
 }
diff --git a/arch/arm/mach-iop13xx/msi.c b/arch/arm/mach-iop13xx/msi.c
new file mode 100644
index 000..f620675
--- /dev/null
+++ b/arch/arm/mach-iop13xx/msi.c
@@ -0,0 +1,193 @@
+/*
+ * arch/arm/mach-iop13xx/msi.c
+ *
+ * PCI MSI support for the iop13xx processor
+ *
+ * Copyright (c) 2006, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ */
+#include 
+#include 
+#include 
+#include 
+
+
+static unsigned long msi_irq_in_use[4];
+
+/* IMIPR0 CP6 R8 Page 1
+ */
+static inline u32 read_imipr_0(void)
+{
+   u32 val;
+   asm volatile("mrc p6, 0, %0, c8, c1, 0":"=r" (val));
+   return val;
+}
+static inline void write_imipr_0(u32 val)
+{
+   asm volatile("mcr p6, 0, %0, c8, c1, 0"::"r" (val));
+}
+
+/* IMIPR1 CP6 R9 Page 1
+ */
+static inline u32 read_imipr_1(void)
+{
+   u32 val;
+   asm volatile("mrc p6, 0, %0, c9, c1, 0":"=r" (val));
+   return val;
+}
+static inline void write_imipr_1(u32 val)
+{
+   asm volatile("mcr p6, 0, %0, c9, c1, 0"::"r" (val));
+}
+
+/* IMIPR2 CP6 R10 Page 1
+ */
+static inline u32 read_imipr_2(void)
+{
+   u32 val;
+   asm volatile("mrc p6, 0, %0, c10, c1, 0":"=r" (val));
+   return val;
+}
+static inline void write_imipr_2(u32 val)
+{
+   asm volatile("mcr p6, 0, %0, c10, c1, 0"::"r" (val));
+}
+
+/* IMIPR3 CP6 R11 Page 1
+ */
+static inline u32 read_imipr_3(void)
+{
+   u32 val;
+   asm volatile("mrc p6, 0, %0, c11, c1, 0":"=r" (val));
+   return val;
+}
+static inline void write_imipr_3(u32 val)
+{
+   asm volatile("mcr p6, 0, %0, c11, c1, 0"::"r" (val));
+}
+
+static u32 (*read_imipr[])(void) = {
+   read_imipr_0,
+   read_imipr_1,
+   read_imipr_2,
+   read_imipr_3,
+};
+
+static void (*write_imipr[])(u32) = {
+   write_imipr_0,
+   write_imipr_1,
+   write_imipr_2,
+   write_imipr_3,
+};

[PATCH 2.6.21-rc5 1/3] msi: introduce ARCH_SUPPORTS_MSI Kconfig option (rev2)

2007-03-30 Thread Dan Williams

Allows architectures to advertise that they support MSI rather than listing
each architecture as a PCI_MSI dependency.

rev2:
* update i386 and x86_64 as well

Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
Acked-by: "Eric W. Biederman" <[EMAIL PROTECTED]>
---

 arch/arm/Kconfig |1 +
 arch/i386/Kconfig|1 +
 arch/ia64/Kconfig|1 +
 arch/sparc64/Kconfig |1 +
 arch/x86_64/Kconfig  |1 +
 drivers/pci/Kconfig  |6 +-
 6 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index e7baca2..db00376 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -255,6 +255,7 @@ config ARCH_IOP13XX
depends on MMU
select PLAT_IOP
select PCI
+   select ARCH_SUPPORTS_MSI
help
  Support for Intel's IOP13XX (XScale) family of processors.
 
diff --git a/arch/i386/Kconfig b/arch/i386/Kconfig
index 53d6237..bcf2fc4 100644
--- a/arch/i386/Kconfig
+++ b/arch/i386/Kconfig
@@ -1073,6 +1073,7 @@ config PCI
bool "PCI support" if !X86_VISWS
depends on !X86_VOYAGER
default y if X86_VISWS
+   select ARCH_SUPPORTS_MSI if (X86_LOCAL_APIC && X86_IO_APIC)
help
  Find out whether you have a PCI motherboard. PCI is the name of a
  bus system, i.e. the way the CPU talks to the other stuff inside
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index e19185d..3b71f97 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -14,6 +14,7 @@ config IA64
select PCI if (!IA64_HP_SIM)
select ACPI if (!IA64_HP_SIM)
select PM if (!IA64_HP_SIM)
+   select ARCH_SUPPORTS_MSI
default y
help
  The Itanium Processor Family is Intel's 64-bit successor to
diff --git a/arch/sparc64/Kconfig b/arch/sparc64/Kconfig
index 1a6348b..b9b2b52 100644
--- a/arch/sparc64/Kconfig
+++ b/arch/sparc64/Kconfig
@@ -299,6 +299,7 @@ config SUN_IO
 
 config PCI
bool "PCI support"
+   select ARCH_SUPPORTS_MSI
help
  Find out whether you have a PCI motherboard. PCI is the name of a
  bus system, i.e. the way the CPU talks to the other stuff inside
diff --git a/arch/x86_64/Kconfig b/arch/x86_64/Kconfig
index 56eb14c..e9b4f05 100644
--- a/arch/x86_64/Kconfig
+++ b/arch/x86_64/Kconfig
@@ -676,6 +676,7 @@ menu "Bus options (PCI etc.)"
 
 config PCI
bool "PCI support"
+   select ARCH_SUPPORTS_MSI if (X86_LOCAL_APIC && X86_IO_APIC)
 
 # x86-64 doesn't support PCI BIOS access from long mode so always go direct.
 config PCI_DIRECT
diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index 5ea5bc7..70efe8f 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -1,10 +1,14 @@
 #
 # PCI configuration
 #
+config ARCH_SUPPORTS_MSI
+   bool
+   default n
+
 config PCI_MSI
bool "Message Signaled Interrupts (MSI and MSI-X)"
depends on PCI
-   depends on (X86_LOCAL_APIC && X86_IO_APIC) || IA64 || SPARC64
+   depends on ARCH_SUPPORTS_MSI
help
   This allows device drivers to enable MSI (Message Signaled
   Interrupts).  Message Signaled Interrupts enable a device to

[PATCH 2.6.21-rc5 0/3] iop13xx msi support and a couple msi cleanups

2007-03-30 Thread Dan Williams

Here is the latest revision of some patches that have been bouncing
around linux-pci for a while.  linux-kernel is copied to get a few more
eyes on the ARCH_SUPPORTS_MSI change.  To my knowledge these patches
have not yet been queued into a maintainer tree.

Dan Williams (2):
  msi: introduce ARCH_SUPPORTS_MSI Kconfig option (rev2)
  msi: fix ARM compile
 
Daniel Wolstenholme (1):
  iop13xx: msi support (rev6)

git pull git://lost.foo-projects.org/~dwillia2/git/iop msi

--
Dan

Re: [patch 34/37] libata bugfix: HDIO_DRIVE_TASK

2007-03-30 Thread Mark Lord


Greg KH wrote:
> -stable review patch.  If anyone has any objections, please let us know.
>
> --
> From: Mark Lord <[EMAIL PROTECTED]>
>
> libata bugfix: HDIO_DRIVE_TASK
>
> I was trying to use HDIO_DRIVE_TASK for something today,
> and discovered that the libata implementation does not copy
> over the upper four LBA bits from args[6].
>
> This is serious, as any tools using this ioctl would have their
> commands applied to the wrong sectors on the drive, possibly resulting
> in disk corruption.
>
> Ideally, newer apps should use SG_IO/ATA_16 directly,
> avoiding this bug.  But with libata poised to displace drivers/ide,
> better compatibility here is a must.
>
> This patch fixes libata to use the upper four LBA bits passed
> in from the ioctl.
>
> The original drivers/ide implementation copies over all bits
> except for the master/slave select bit.  With this patch,
> libata will copy only the four high-order LBA bits,
> just in case there are assumptions elsewhere in libata (?).
>
> Signed-off-by: Mark Lord <[EMAIL PROTECTED]>
> Cc: Chuck Ebbert <[EMAIL PROTECTED]>
> Signed-off-by: Jeff Garzik <[EMAIL PROTECTED]>
> Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>

..

Mmmm.. I've just noticed another bit we should  be preserving there,
both for *stable* and current mainline.

Instead of:


+   scsi_cmd[13] = args[6] & 0x0f;


We should be doing:


+   scsi_cmd[13] = args[6] & 0x4f;


As-is, the patch still helps, but it is not as useful as it could be.
Here's the fixed version.  I'm also sending out a 2.6.21 patch via Jeff.

Signed-off-by: Mark Lord <[EMAIL PROTECTED]>
---
 drivers/ata/libata-scsi.c |    1 +
1 file changed, 1 insertion(+)

--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -295,6 +295,7 @@ int ata_task_ioctl(struct scsi_device *s
scsi_cmd[8]  = args[3];
scsi_cmd[10] = args[4];
scsi_cmd[12] = args[5];
+   scsi_cmd[13] = args[6] & 0x4f;
scsi_cmd[14] = args[0];

/* Good values for timeout and retries?  Values below

--
Mark Lord
Real-Time Remedies Inc.
[EMAIL PROTECTED] 

Re: [patch 1/13] signal/timer/event fds v8 - anonymous inode source ...

2007-03-30 Thread Randy Dunlap

On Fri, 30 Mar 2007 15:44:15 -0700 (PDT) Davide Libenzi wrote:

> On Fri, 30 Mar 2007, Andrew Morton wrote:
> 
> > > +#include 
> > > +
> > > +
> > > +
> > 
> > Too many blank lines
> 
> It'd be interesting to know how much is enough. You use one, people say it
> is too dense. You use more, people say it's too much.
> There's the one-line rule for inter-function spacing, but what's the
> include-functions one? Or the functions-data one?

1 :)


> > > +static int ainofs_delete_dentry(struct dentry *dentry);
> > > +static struct inode *aino_mkinode(void);
> > 
> > Unneeded forward declaration.
> 
> Same here. You're the third person to say this, so I'm gonna change it. But
> please consider adding it to the coding style.
> 
> 
> 
> > > +static struct file_system_type aino_fs_type = {
> > > + .name   = "ainofs",
> > > + .get_sb = ainofs_get_sb,
> > > + .kill_sb= kill_anon_super,
> > > +};
> > > +static struct dentry_operations ainofs_dentry_operations = {
> > > + .d_delete   = ainofs_delete_dentry,
> > > +};
> > 
> > If this is moved elsewhere we can perhaps remove some or all of the
> > unpleasing static function forward-declarations.
> 
> Grrr :)

you are a puttycat


> > > +/**
> > > + * aino_getfd - creates a new file instance by hooking it up to and anonymous
> > > + *  inode, and a dentry that describe the "class" of the file
> > > + * @pfd: [out]   pointer to the file descriptor
> > > + * @dpinode: [out]   pointer to the inode
> > > + * @pfile:   [out]   pointer to the file struct
> > > + * @name:[in]name of the "class" of the new file
> > > + * @fops [in]file operations for the new file
> > > + * @priv [in]private data for the new file (will be file's private_data)
> > 
> > The [in] and [out] thing is nice - does kerneldoc handle it appropriately?
> 
> No idea. It should come out as text at least.

Yes, it's just [nice] text.

But the function description needs to fit on one line.  If that's
not enough, put more description after the @params lines, separated
by a
 *
"blank" line.


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

Re: [patch] remove artificial software max_loop limit

2007-03-30 Thread Andrew Morton

On Fri, 30 Mar 2007 15:06:03 -0700
"Ken Chen" <[EMAIL PROTECTED]> wrote:

> On 3/30/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> > So..  this change will cause a fatal error for anyone who is presently
> > using max_loop, won't it?  If they're doing that within their
> > initramfs/initrd/etc then things could get rather ugly for them.
> 
> probably, if they access loop device non-sequentially.
> 

My point is that the modprobe will fail if it is passed an unrecognised
module parameter (won't it?)

So if we're worried about not breaking existing setups, we should retain
this module parameter as a do-nothing thing, maybe with a
this-is-going-away warning printk, too.

> 
> > I don't know how much of a problem this will be in practice - do people use
> > max_loop much?
> 
> I don't know either.

hm.

> 
> > btw, did you test this change as both a module and as linked-into-vmlinux?
> 
> as linked-into-vmlinux.  why do you ask?  Does it break as a module?
> I made a last-minute change to a mutex name and shamefully posted without
> doing a compile test.  Besides that, is there something else that breaks?

Just idle curiosity regarding how much testing it had seen.

Generally one would expect things to be OK, but there can be startup
ordering problems.

The most common problem is that the module simply doesn't load because it's
using some symbol that is not exported to modules.


RE: [PATCH 1/5] RT kernel: force detect HPET from PCI space

2007-03-30 Thread Mikko Tiihonen




Anyone got the same thing for CK804? I had my hopes high, and then I saw
the DECLARE_PCI_FIXUP_HEADER values [and the thread title was misleading]


I have an A8N-E motherboard with AthlonX2 and the ACPI definitions are 
missing the HPET (standard feature of Asus motherboards).


I too wanted to get my motherboard working. Luckily I found
http://lkml.org/lkml/2006/12/17/69, from which I generated the following patch:


--- arch/i386/kernel/quirks.c.orig  2007-03-30 23:43:06.0 +0300
+++ arch/i386/kernel/quirks.c   2007-03-30 23:26:47.0 +0300
@@ -101,5 +101,39 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_I
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL,   PCI_DEVICE_ID_INTEL_ICH7_1,
 force_enable_hpet);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL,   PCI_DEVICE_ID_INTEL_ICH7_31,   
  force_enable_hpet);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL,   PCI_DEVICE_ID_INTEL_ICH8_1,
 force_enable_hpet);
+
+static void __init force_enable_nvidia_hpet(struct pci_dev *dev)
+{
+	u8 enabled;
+	u32 addr;
+
+	if (hpet_address)
+		return;
+
+	pci_read_config_dword(dev, 0x44, &addr);
+	if (addr != 0xfefff000L) {
+		printk(KERN_INFO "Unsafe HPET address 0x%08x. Cannot force enable HPET\n", addr);
+		return;
+	}
+
+	pci_read_config_byte(dev, 0xA3, &enabled);
+	if ((enabled & 4) == 0) {
+		if (enabled != 0xc1) {
+			printk(KERN_INFO "Unsafe HPET enable 0x%02x. Cannot force enable HPET\n", enabled);
+			return;
+		}
+		pci_write_config_byte(dev, 0xA3, enabled | 4);
+		pci_read_config_byte(dev, 0xA3, &enabled);
+		if ((enabled & 4) == 0) {
+			printk(KERN_INFO "Failed to force enable HPET\n");
+			return;
+		}
+	}
+
+	force_hpet_address = addr;
+	printk(KERN_INFO "Force enabled HPET. Base address 0x%08lx\n", force_hpet_address);
+}
+
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_NVIDIA, 0x0050, force_enable_nvidia_hpet); // NForce4
 #endif

Now Linux seems to detect HPET and it passes at least the basic sanity checks:

Force enabled HPET. Base address 0xfefff000
HPET: hpet_period 4000, hpet_tick 8
Successfully registered HPET clocksource

Unfortunately, the 2.6.20-mm2 kernel to which I applied the patch series
seems to hang a few seconds later, about halfway through udev startup event
processing.


It could either be something totally different in 2.6.20-mm2 that just happens 
to fail or more likely some interrupt setup that still needs to be done.


I have no idea how to continue from here.

-Mikko

libata bugfix: preserve LBA bit for HDIO_DRIVE_TASK

2007-03-30 Thread Mark Lord


Ideally, this would go into linux-2.6.21.

Preserve the LBA bit in the DevSel/Head register for HDIO_DRIVE_TASK.

Signed-off-by:  Mark Lord <[EMAIL PROTECTED]>
---
--- linux/drivers/ata/libata-scsi.c.orig	2007-03-21 13:35:02.0 -0400
+++ linux/drivers/ata/libata-scsi.c 2007-03-30 17:40:58.0 -0400
@@ -333,7 +333,7 @@
scsi_cmd[8]  = args[3];
scsi_cmd[10] = args[4];
scsi_cmd[12] = args[5];
-   scsi_cmd[13] = args[6] & 0x0f;
+   scsi_cmd[13] = args[6] & 0x4f;
scsi_cmd[14] = args[0];

/* Good values for timeout and retries?  Values below
--
Mark Lord
Real-Time Remedies Inc.
[EMAIL PROTECTED]

Re: [patch 1/13] signal/timer/event fds v8 - anonymous inode source ...

2007-03-30 Thread Davide Libenzi

On Fri, 30 Mar 2007, Andrew Morton wrote:

> > +#include 
> > +
> > +
> > +
> 
> Too many blank lines

It'd be interesting to know how much is enough. You use one, people say it
is too dense. You use more, people say it's too much.
There's the one-line rule for inter-function spacing, but what's the
include-functions one? Or the functions-data one?



> > +static int ainofs_delete_dentry(struct dentry *dentry);
> > +static struct inode *aino_mkinode(void);
> 
> Unneeded forward declaration.

Same here. You're the third person to say this, so I'm gonna change it. But
please consider adding it to the coding style.



> > +static int ainofs_get_sb(struct file_system_type *fs_type, int flags,
> > +const char *dev_name, void *data, struct vfsmount *mnt);
> > +
> > +
> > +
> > +static struct vfsmount *aino_mnt __read_mostly;
> > +static struct inode *aino_inode;
> > +static const struct file_operations aino_fops = { };
> 
> Unneeded { }

Ack.



> > +static struct file_system_type aino_fs_type = {
> > +   .name   = "ainofs",
> > +   .get_sb = ainofs_get_sb,
> > +   .kill_sb= kill_anon_super,
> > +};
> > +static struct dentry_operations ainofs_dentry_operations = {
> > +   .d_delete   = ainofs_delete_dentry,
> > +};
> 
> If this is moved elsewhere we can perhaps remove some or all of the
> unpleasing static function forward-declarations.

Grrr :)




> > +/**
> > + * aino_getfd - creates a new file instance by hooking it up to and anonymous
> > + *  inode, and a dentry that describe the "class" of the file
> > + * @pfd: [out]   pointer to the file descriptor
> > + * @dpinode: [out]   pointer to the inode
> > + * @pfile:   [out]   pointer to the file struct
> > + * @name:[in]name of the "class" of the new file
> > + * @fops [in]file operations for the new file
> > + * @priv [in]private data for the new file (will be file's private_data)
> 
> The [in] and [out] thing is nice - does kerneldoc handle it appropriately?

No idea. It should come out as text at least.




> > + *
> > + * Creates a new file by hooking it on a single inode. This is useful for files
> > + * that do not need to have a full-fledged inode in order to operate correctly.
> > + * All the files created with aino_getfd() will share a single inode, by hence
> > + * saving memory and avoiding code duplication for the file/inode/dentry setup.
> > + */
> > +int aino_getfd(int *pfd, struct inode **pinode, struct file **pfile,
> > +  char const *name, const struct file_operations *fops, void *priv)
> 
> Dunno about others, but the "aino" naming doesn't grab me, really. 
> anon_inode_getfd() would make more sense.

Why? Don't you like fortran-like compact naming? :)



> We conventionally use `const char *' rather than `char const *', and I thnk
> it is more logical to do so.

Okie




> > +static int ainofs_delete_dentry(struct dentry *dentry)
> > +{
> > +   /*
> > +* We faked vfs to believe the dentry was hashed when we created it.
> > +* Now we restore the flag so that dput() will work correctly.
> > +*/
> > +   dentry->d_flags |= DCACHE_UNHASHED;
> > +   return 1;
> > +}
> 
> Is that legit, or is it a hack??

The same thing is done in pipes. It avoids loading the hash with things
that will never be looked up.




> > +/*
> > + * A single inode exist for all aino files. On the contrary of pipes,
> > + * aino inodes has no per-instance data associated, so we can avoid
> > + * the allocation of multiple of them.
> > + */
> 
> "Contrary to pipes, aino inodes have no "

Ok



> > +static struct inode *aino_mkinode(void)
> > +{
> > +   struct inode *inode = new_inode(aino_mnt->mnt_sb);
> > +
> > +   if (!inode)
> > +   return ERR_PTR(-ENOMEM);
> > +
> > +   inode->i_fop = &aino_fops;
> > +
> > +   /*
> > +* Mark the inode dirty from the very beginning,
> > +* that way it will never be moved to the dirty
> > +* list because mark_inode_dirty() will think
> > +* that it already _is_ on the dirty list.
> > +*/
> 
> Thus breaking what is hopefully a VFS invariant.  How come?

Copied from pipes.




> > +static int __init aino_init(void)
> > +{
> > +   int error;
> > +
> > +   error = register_filesystem(&aino_fs_type);
> > +   if (error)
> > +   goto err_exit;
> > +   aino_mnt = kern_mount(&aino_fs_type);
> > +   if (IS_ERR(aino_mnt)) {
> > +   error = PTR_ERR(aino_mnt);
> > +   goto err_unregister_filesystem;
> > +   }
> > +   aino_inode = aino_mkinode();
> > +   if (IS_ERR(aino_inode)) {
> > +   error = PTR_ERR(aino_inode);
> > +   goto err_mntput;
> > +   }
> > +
> > +   return 0;
> > +
> > +err_mntput:
> > +   mntput(aino_mnt);
> > +err_unregister_filesystem:
> > +   unregister_filesystem(&aino_fs_type);
> > +err_exit:
> > +   printk(KERN_ERR "aino_init() failed (%d)\n", error);
> 
> I suspect this is panic time?

Ok, it was panicking, and someone made me change it. Would you please

RE: [PATCH 1/5] RT kernel: force detect HPET from PCI space

2007-03-30 Thread Nicolas Mailhot

On Saturday 31 March 2007 at 01:09 +0300, Mikko Tiihonen wrote:
> > Anyone got the same thing for CK804? I had my hopes high, and then I saw
> > the DECLARE_PCI_FIXUP_HEADER values [and the thread title was misleading]
> 
> I have an A8N-E motherboard with AthlonX2 and the ACPI definitions are 
> missing the HPET (standard feature of Asus motherboards).
> 
> I too wanted to get my motherboard working. Luckily I found
> http://lkml.org/lkml/2006/12/17/69

Oh, it looks so close to my system

00:01.0 ISA bridge: nVidia Corporation CK804 ISA Bridge (rev a3)
Subsystem: Giga-byte Technology GA-K8N Ultra-9 Mainboard
Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort-
SERR-  HPET disabled (according to your reference)

I don't have an HPET toggle in my BIOS

> from which I generated the following patch:

I'd love to test it, but what kernel baseline did you use?
2.6.21-rc5-mm2?

-- 
Nicolas Mailhot



Intel DP965LT Mainboard running?

2007-03-30 Thread Oliver Joa


Hi,

does anyone have a working Intel DP965LT mainboard? I cannot get this
board running. You can see the problems in the thread "Corrupt
XFS-Filesystems on new Hardware and Kernel". Could you please give me a
working kernel config?


Thanks a lot

Oliver


Re: 2.6.21-rc5-mm2 -- ACPI problems (lid switch broken, always shows running from AC, plugging in AC hangs machine)

2007-03-30 Thread Miles Lane

On 3/30/07, Len Brown <[EMAIL PROTECTED]> wrote:

> On Thursday 29 March 2007 17:49, Miles Lane wrote:
> > Hmm.   I've reproduced these problems with vanilla 2.6.21-rc5, so the
> > latest acpi-git changes are off the hook
>
> I think the 1st message on this thread holds the answer:
>
> > ACPI: BIOS bug: multiple APIC/MADT found, using 2
> > ACPI: If "acpi_apic_instance=0" works better, notify linux-acpi@vger.kernel.org
>
> Looking at your acpidump in
> http://bugzilla.kernel.org/show_bug.cgi?id=8283
> clearly the 2nd table is broken (it is missing the SCI override)
> and that is why you get no ACPI interrupts.
>
> If Windows works on your machine -- ACPI events such as power button etc --
> then that is proof that Windows doesn't use the 2nd MADT and I need
> to revert 09fe58356d148ff66901ddf639e725ca1a48a0af

Yes, the patch you sent fixes these problems.

Thanks!
  Miles

Re: Why is arch/s390/crypto/Kconfig sourced when building for another arch ?

2007-03-30 Thread Robert P. J. Day

On Fri, 30 Mar 2007, Jan Glauber wrote:

> On Fri, 2007-03-30 at 05:55 -0400, Robert P. J. Day wrote:
> >   i'm betting the S390 folks would *really* hate that idea but, if you
> > look closely, the generic Kconfig file *already* has some
> > arch-dependent content:
> >
> > ...
> > config CRYPTO_DEV_PADLOCK
> > tristate "Support for VIA PadLock ACE"
> > depends on X86_32 <-
> > ...
>
> Yes, but the padlock driver is located under drivers/crypto. The
> s390 crypto stuff is not. It is under arch/s390/crypto, thats why
> the Kconfig file is there...

  which differs from the way it's handled with x86_64.  even though
the actual crypto routines specific to x86_64 are in
arch/x86_64/crypto (as they are with s390), the menuconfig selections
for them are defined in the *generic* crypto/Kconfig file:

...
config CRYPTO_TWOFISH_X86_64
tristate "Twofish cipher algorithm (x86_64)"
depends on (X86 || UML_X86) && 64BIT
select CRYPTO_ALGAPI
select CRYPTO_TWOFISH_COMMON
help
  Twofish cipher algorithm (x86_64).

  Twofish was submitted as an AES (Advanced Encryption Standard)
  candidate cipher by researchers at CounterPane Systems.  It is a
  16 round block cipher supporting key sizes of 128, 192, and 256
  bits.

  See also:

...

  a little consistency would be nice.  :-)

rday


Re: [PATCH] fix page leak during core dump

2007-03-30 Thread Andrew Morton

On Fri, 30 Mar 2007 23:01:45 +0100 (BST)
Hugh Dickins <[EMAIL PROTECTED]> wrote:

> On Fri, 30 Mar 2007, Andrew Morton wrote:
> > On Thu, 29 Mar 2007 13:39:13 -0700
> > Brian Pomerantz <[EMAIL PROTECTED]> wrote:
> > 
> > > When the dump cannot occur most likely because of a full file system
> > > and the page to be written is the zero page, the call to
> > > page_cache_release() is missed.
> > > 
> > > Signed-off-by: Brian Pomerantz <[EMAIL PROTECTED]>
> > > 
> > > diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> > > index a2fceba..9cc4f0a 100644
> > > --- a/fs/binfmt_elf.c
> > > +++ b/fs/binfmt_elf.c
> > > @@ -1704,7 +1704,10 @@ static int elf_core_dump(long signr, struct pt_regs *regs, struct file *file)
> > >   DUMP_SEEK(PAGE_SIZE);
> > >   } else {
> > >   if (page == ZERO_PAGE(addr)) {
> > > - DUMP_SEEK(PAGE_SIZE);
> > > + if (!dump_seek(file, PAGE_SIZE)) {
> > > + page_cache_release(page);
> > > + goto end_coredump;
> > > + }
> > 
> > Oh for gawds sake I wish we could be rid of those idiotic macros :(
> > 
> > This patch looks OK to me, although a refcount leak on the ZERO_PAGE is
> > special, because that page is PageReserved().
> > 
> > It used to be the case that we'd ignore attempts to change the refcount on
> > reserved pages (or at least on the ZERO_PAGE), but we changed that, so we
> > now actually refcount the ZERO_PAGE.  (I think, from a quick read of the
> > code.  This contradicts my memory of how it works).
> > 
> > So I expect the net effect here is that a sufficiently determined attacker
> > can overflow the ZERO_PAGE's refcount, thus causing it to be "freed".  The
> > page allocator won't actually free the page due to PG_Reserved, but it'll
> > all become very noisy.
> > 
> > Nick, Hugh: agree?
> 
> I think so - lots of "Bad page state" messages as the count bounces
> around the 0 mark, but not actually freed.  But when CONFIG_DEBUG_VM
> you'll get BUG_ONs.  And I can't swear bad things won't happen some-
> where once the count wraps to negative.  Easier to fix than work out
> the consequences.
> 
> (Of course, Nick is right now proposing a patch to take us back the
> other way, back to not accounting the ZERO_PAGE: so the fix needs
> to go in, then he'll need to reverse that again in his patch.)

OK.

> Doesn't fs/binfmt_elf_fdpic.c need the same fix?  It looks slightly
> different there, but I think when you look closer there's exactly
> the same issue?

Think so.  David, does it look OK?




From: Brian Pomerantz <[EMAIL PROTECTED]>

When the dump cannot occur, most likely because of a full file system, and
the page to be written is the zero page, the call to page_cache_release()
is missed.

Signed-off-by: Brian Pomerantz <[EMAIL PROTECTED]>
Cc: Hugh Dickins <[EMAIL PROTECTED]>
Cc: Nick Piggin <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 fs/binfmt_elf.c       |    5 ++++-
 fs/binfmt_elf_fdpic.c |    6 ++++--
 2 files changed, 8 insertions(+), 3 deletions(-)

diff -puN fs/binfmt_elf.c~fix-page-leak-during-core-dump fs/binfmt_elf.c
--- a/fs/binfmt_elf.c~fix-page-leak-during-core-dump
+++ a/fs/binfmt_elf.c
@@ -1704,7 +1704,10 @@ static int elf_core_dump(long signr, str
DUMP_SEEK(PAGE_SIZE);
} else {
if (page == ZERO_PAGE(addr)) {
-   DUMP_SEEK(PAGE_SIZE);
+   if (!dump_seek(file, PAGE_SIZE)) {
+   page_cache_release(page);
+   goto end_coredump;
+   }
} else {
void *kaddr;
flush_cache_page(vma, addr,
diff -puN fs/binfmt_elf_fdpic.c~fix-page-leak-during-core-dump fs/binfmt_elf_fdpic.c
--- a/fs/binfmt_elf_fdpic.c~fix-page-leak-during-core-dump
+++ a/fs/binfmt_elf_fdpic.c
@@ -1480,8 +1480,10 @@ static int elf_fdpic_dump_segments(struc
DUMP_SEEK(file->f_pos + PAGE_SIZE);
}
else if (page == ZERO_PAGE(addr)) {
-   DUMP_SEEK(file->f_pos + PAGE_SIZE);
-   page_cache_release(page);
+   if (!dump_seek(file, file->f_pos + PAGE_SIZE)) {
+   page_cache_release(page);
+   return 0;
+   }
}
else {
void *kaddr;
_


Re: [uml-devel] [PATCH 2/2] UML - Speed up exec

2007-03-30 Thread Blaisorblade

On Friday 30 March 2007, Jeff Dike wrote:
> flush_thread doesn't need to do a full page table walk in order to
> clear the address space.  It knows what the end result needs to be, so
> it can call unmap directly.
>
> This results in a 10-20% speedup in an exec from bash.

Oh, yeah!
When porting part of Ingo's work, I realized that a similar thing can be done 
for fork().

If the whole address space is unmapped in init_new_context_skas(), the first 
fix_range_common() call won't need to call unmap at all. He did this with 
remap_file_pages(), where init_new_context_skas() must "unmap" everything 
anyway.

This gives some speedup in lmbench (5% better in fork proc, 2% better in
exec proc), but the results are still mixed: one benchmark, called 'mmap
latency', shows a 2% slowdown.

In a loop, it maps, touches a byte per page and unmaps a region with growing 
size (up to 32MB).

However, since the results aren't yet stable for some other benchmarks (the
context switching benchmark is crazy), I'm still investigating this.

> Signed-off-by: Jeff Dike <[EMAIL PROTECTED]>
> --
>  arch/um/kernel/skas/exec.c |   12 +++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> Index: linux-2.6.21-mm/arch/um/kernel/skas/exec.c
> ===
> --- linux-2.6.21-mm.orig/arch/um/kernel/skas/exec.c	2007-03-30 10:28:24.0 -0400
> +++ linux-2.6.21-mm/arch/um/kernel/skas/exec.c	2007-03-30 10:30:15.0 -0400
> @@ -17,7 +17,17 @@
>
>  void flush_thread_skas(void)
>  {
> - force_flush_all();
> + void *data = NULL;
> + unsigned long end = proc_mm ? task_size : CONFIG_STUB_START;
> + int ret;
> +
> + ret = unmap(&current->mm->context.skas.id, 0, end, 1, &data);
> + if(ret){
> + printk("flush_thread_skas - clearing address space failed, "
> +"err = %d\n", ret);
> + force_sig(SIGKILL, current);
> + }
> +
>   switch_mm_skas(&current->mm->context.skas.id);
>  }


-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade
Index: linux-2.6.git/arch/um/include/skas/mmu-skas.h
===
--- linux-2.6.git.orig/arch/um/include/skas/mmu-skas.h
+++ linux-2.6.git/arch/um/include/skas/mmu-skas.h
@@ -16,6 +16,7 @@ struct mmu_context_skas {
 	unsigned long last_pmd;
 #endif
 	uml_ldt_t ldt;
+	int first_flush;
 };
 
 extern void switch_mm_skas(struct mm_id * mm_idp);
Index: linux-2.6.git/arch/um/kernel/skas/mmu.c
===
--- linux-2.6.git.orig/arch/um/kernel/skas/mmu.c
+++ linux-2.6.git/arch/um/kernel/skas/mmu.c
@@ -77,6 +77,7 @@ int init_new_context_skas(struct task_st
 	struct mmu_context_skas *to_mm = &mm->context.skas;
 	unsigned long stack = 0;
 	int ret = -ENOMEM;
+	void *unused = NULL;
 
 	if(skas_needs_stub){
 		stack = get_zeroed_page(GFP_KERNEL);
@@ -121,6 +122,14 @@ int init_new_context_skas(struct task_st
 		else to_mm->id.u.pid = start_userspace(stack);
 	}
 
+	mm->context.skas.first_flush = 1;
+	ret = unmap(&mm->context.skas.id, 0, TASK_SIZE, 1, &unused);
+	if (ret < 0) {
+		printk("init_new_context_skas - unmap failed, "
+		   "errno = %d; continuing\n", ret);
+		mm->context.skas.first_flush = 0;
+	}
+
 	ret = init_new_ldt(to_mm, from_mm);
 	if(ret < 0){
 		printk("init_new_context_skas - init_ldt"
Index: linux-2.6.git/arch/um/kernel/tlb.c
===
--- linux-2.6.git.orig/arch/um/kernel/tlb.c
+++ linux-2.6.git/arch/um/kernel/tlb.c
@@ -139,10 +139,17 @@ void fix_range_common(struct mm_struct *
 	void *flush = NULL;
 	int op_index = -1, last_op = ARRAY_SIZE(ops) - 1;
 	int ret = 0;
+	int first_flush;
 
 	if(mm == NULL)
 		return;
 
+	/* If nothing is mapped in this address space yet, no call to
+	 * add_munmap() is needed */
+	first_flush = mm->context.skas.first_flush;
+
+	mm->context.skas.first_flush = 0;
+
 	ops[0].type = NONE;
 	for(addr = start_addr; addr < end_addr && !ret;){
 		npgd = pgd_offset(mm, addr);
@@ -151,9 +158,10 @@ void fix_range_common(struct mm_struct *
 			if(end > end_addr)
 end = end_addr;
 			if(force || pgd_newpage(*npgd)){
-ret = add_munmap(addr, end - addr, ops,
-		 &op_index, last_op, mmu,
-		 &flush, do_ops);
+if (!first_flush)
+	ret = add_munmap(addr, end - addr, ops,
+			 &op_index, last_op, mmu,
+			 &flush, do_ops);
 pgd_mkuptodate(*npgd);
 			}
 			addr = end;
@@ -166,9 +174,10 @@ void fix_range_common(struct mm_struct *
 			if(end > end_addr)
 end = end_addr;
 			if(force || pud_newpage(*npud)){
-ret = add_munmap(addr, end - addr, ops,
-		 &op_index, last_op, mmu,
-		 &flush, do_ops);
+if (!first_flush)
+	ret = add_munmap(addr, end - addr, ops,
+			 &op_index, last_op, mmu,
+			 &flush, do_ops);
 pud_mkuptodate(*npud);
 			}
 			addr = end;
@@ -181,9 +190,10 @@ void

Re: [patch 34/37] libata bugfix: HDIO_DRIVE_TASK

2007-03-30 Thread Greg KH

On Fri, Mar 30, 2007 at 05:42:13PM -0400, Mark Lord wrote:
> Greg KH wrote:
> >-stable review patch.  If anyone has any objections, please let us know.
> >
> >--
> >From: Mark Lord <[EMAIL PROTECTED]>
> >
> >libata bugfix: HDIO_DRIVE_TASK
> >
> >I was trying to use HDIO_DRIVE_TASK for something today,
> >and discovered that the libata implementation does not copy
> >over the upper four LBA bits from args[6].
> >
> >This is serious, as any tools using this ioctl would have their
> >commands applied to the wrong sectors on the drive, possibly resulting
> >in disk corruption.
> >
> >Ideally, newer apps should use SG_IO/ATA_16 directly,
> >avoiding this bug.  But with libata poised to displace drivers/ide,
> >better compatibility here is a must.
> >
> >This patch fixes libata to use the upper four LBA bits passed
> >in from the ioctl.
> >
> >The original drivers/ide implementation copies over all bits
> >except for the master/slave select bit.  With this patch,
> >libata will copy only the four high-order LBA bits,
> >just in case there are assumptions elsewhere in libata (?).
> >
> >Signed-off-by: Mark Lord <[EMAIL PROTECTED]>
> >Cc: Chuck Ebbert <[EMAIL PROTECTED]>
> >Signed-off-by: Jeff Garzik <[EMAIL PROTECTED]>
> >Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>
> ..
> 
> Mmmm.. I've just noticed another bit we should  be preserving there,
> both for *stable* and current mainline.
> 
> Instead of:
> 
> >+scsi_cmd[13] = args[6] & 0x0f;
> 
> We should be doing:
> 
> >+scsi_cmd[13] = args[6] & 0x4f;
> 
> As-is, the patch still helps, but it is not as useful as it could be.
> Here's the fixed version.  I'm also sending out a 2.6.21 patch via Jeff.
> 
> Signed-off-by: Mark Lord <[EMAIL PROTECTED]>

Can you forward that one to the [EMAIL PROTECTED] address too, with the
full changelog and Jeff's ack/signed-off-by so that I will remember to
add it to the tree?

thanks,

greg k-h

Re: [patch] remove artificial software max_loop limit

2007-03-30 Thread Ken Chen


On 3/30/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> So..  this change will cause a fatal error for anyone who is presently
> using max_loop, won't it?  If they're doing that within their
> initramfs/initrd/etc then things could get rather ugly for them.

Probably, if they access loop devices non-sequentially.

> I don't know how much of a problem this will be in practice - do people use
> max_loop much?

I don't know either.

> btw, did you test this change as both a module and as linked-into-vmlinux?

As linked-into-vmlinux.  Why do you ask?  Does it break as a module?
I made a last-minute change to a mutex name and shamefully posted without
doing a compile test.  Besides that, does something else break?

- Ken
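The patch itself is not quoted in this thread, apart from the DEFINE_MUTEX(loop_devices_mutex) line further down. One common way to drop a fixed device limit is to allocate devices on first use, under a mutex, instead of preallocating an array sized by a parameter. The following is a generic userspace sketch of that lookup-or-allocate pattern, assuming a simple linked list. It is not Ken's implementation. struct fake_loop, loop_list and loop_lookup_or_alloc() are hypothetical names, and a pthread mutex stands in for the kernel mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a loop device (not struct loop_device). */
struct fake_loop {
	int number;
	struct fake_loop *next;
};

/* Devices allocated so far, and the lock guarding the list
 * (a pthread mutex plays the role of the kernel's DEFINE_MUTEX). */
static struct fake_loop *loop_list;
static pthread_mutex_t loop_list_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Return device `number`, allocating it on first use instead of
 * preallocating a fixed array sized by max_loop.
 * Returns NULL only if allocation fails. */
static struct fake_loop *loop_lookup_or_alloc(int number)
{
	struct fake_loop *lo;

	assert(number >= 0);

	/* Search and insert under one lock so concurrent callers
	 * cannot both create the same device number. */
	pthread_mutex_lock(&loop_list_mutex);
	for (lo = loop_list; lo != NULL; lo = lo->next)
		if (lo->number == number)
			goto out;       /* already exists: reuse it */

	lo = calloc(1, sizeof(*lo));
	if (lo != NULL) {
		lo->number = number;
		lo->next = loop_list;   /* push onto the list head */
		loop_list = lo;
	}
out:
	pthread_mutex_unlock(&loop_list_mutex);
	return lo;
}
```

Because devices are keyed by number rather than indexed into a preallocated array, asking for device 300 works even if devices 0 to 299 were never created.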

Re: [PATCH] fix page leak during core dump

2007-03-30 Thread Hugh Dickins

On Fri, 30 Mar 2007, Andrew Morton wrote:
> On Thu, 29 Mar 2007 13:39:13 -0700
> Brian Pomerantz <[EMAIL PROTECTED]> wrote:
> 
> > When the dump cannot occur, most likely because of a full file system,
> > and the page to be written is the zero page, the call to
> > page_cache_release() is missed.
> > 
> > Signed-off-by: Brian Pomerantz <[EMAIL PROTECTED]>
> > 
> > diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> > index a2fceba..9cc4f0a 100644
> > --- a/fs/binfmt_elf.c
> > +++ b/fs/binfmt_elf.c
> > @@ -1704,7 +1704,10 @@ static int elf_core_dump(long signr, struct pt_regs *regs, struct file *file)
> >  				DUMP_SEEK(PAGE_SIZE);
> >  			} else {
> >  				if (page == ZERO_PAGE(addr)) {
> > -					DUMP_SEEK(PAGE_SIZE);
> > +					if (!dump_seek(file, PAGE_SIZE)) {
> > +						page_cache_release(page);
> > +						goto end_coredump;
> > +					}
> 
> Oh for gawds sake I wish we could be rid of those idiotic macros :(
> 
> This patch looks OK to me, although a refcount leak on the ZERO_PAGE is
> special, because that page is PageReserved().
> 
> It used to be the case that we'd ignore attempts to change the refcount on
> reserved pages (or at least on the ZERO_PAGE), but we changed that, so we
> now actually refcount the ZERO_PAGE.  (I think, from a quick read of the
> code.  This contradicts my memory of how it works).
> 
> So I expect the net effect here is that a sufficiently determined attacker
> can overflow the ZERO_PAGE's refcount, thus causing it to be "freed".  The
> page allocator won't actually free the page due to PG_reserved, but it'll
> all become very noisy.
> 
> Nick, Hugh: agree?

I think so - lots of "Bad page state" messages as the count bounces
around the 0 mark, but not actually freed.  But with CONFIG_DEBUG_VM
you'll get BUG_ONs.  And I can't swear bad things won't happen
somewhere once the count wraps to negative.  Easier to fix than work
out the consequences.
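The overflow concern can be put in rough numbers. This is a back-of-the-envelope sketch under stated assumptions: the page reference count is a 32-bit two's-complement counter that wraps, each failed dump leaks exactly one reference (the failure path jumps to end_coredump), the starting count satisfies 0 <= c_0 < 2^31, and the ZERO_PAGE's legitimate gets and puts are ignored. After n leaked references:

```latex
c(n) = (c_0 + n) \bmod 2^{32}, \qquad
c_{\mathrm{signed}}(n) =
\begin{cases}
c(n), & c(n) < 2^{31} \\
c(n) - 2^{32}, & c(n) \ge 2^{31}
\end{cases}

% first negative value:  n = 2^{31} - c_0 \approx 2.1 \times 10^{9}
% back to zero:          n = 2^{32} - c_0 \approx 4.3 \times 10^{9}
```

So a determined attacker would need on the order of two to four billion failed core dumps. That is slow, but it has no hard upper bound, which supports fixing the leak rather than reasoning about the consequences.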

(Of course, Nick is right now proposing a patch to take us back the
other way, back to not accounting the ZERO_PAGE: so the fix needs
to go in, then he'll need to reverse that again in his patch.)

Doesn't fs/binfmt_elf_fdpic.c need the same fix?  It looks slightly
different there, but I think when you look closer there's exactly
the same issue?

Hugh
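The shape of the fix can be sketched in userspace C. Everything here is hypothetical: fake_page, fake_dump_seek() and dump_zero_page() stand in for struct page, dump_seek() and one iteration of the elf_core_dump() loop. Only the pattern mirrors the patch: drop the page reference before jumping to the shared exit label, instead of letting a DUMP_SEEK()-style macro jump away while the reference is still held.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page: just a reference count. */
struct fake_page {
	int count;
};

static void fake_get_page(struct fake_page *p)
{
	p->count++;
}

static void fake_put_page(struct fake_page *p)
{
	assert(p->count > 0);   /* a put without a matching get is a bug */
	p->count--;
}

/* Stand-in for dump_seek(): returns false when the file position cannot
 * be advanced, e.g. because the file system is full. */
static bool fake_dump_seek(bool fs_full)
{
	return !fs_full;
}

/* One loop iteration for a page that turned out to be the zero page.
 * Returns 0 if the dump may continue, -1 if it was aborted. */
static int dump_zero_page(struct fake_page *page, bool fs_full)
{
	fake_get_page(page);            /* like get_user_pages() taking a ref */

	if (!fake_dump_seek(fs_full)) {
		fake_put_page(page);    /* the fix: release before bailing out */
		goto end_coredump;
	}
	fake_put_page(page);            /* normal path: done with the page */
	return 0;

end_coredump:
	return -1;
}
```

The leaky version would omit the fake_put_page() call in the failure branch. After one aborted dump, a page that started at count 1 would then be left at 2.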

Re: [patch] remove artificial software max_loop limit

2007-03-30 Thread Jan Engelhardt


On Mar 30 2007 14:46, Andrew Morton wrote:
>
>ahem.
>
>On Fri, 30 Mar 2007 02:25:37 -0700
>"Ken Chen" <[EMAIL PROTECTED]> wrote:
>
>> +static DEFINE_MUTEX(loop_devices_mutex);
>> ...
>> +mutex_lock(_device_mutex);
>
>which makes me suspect that you didn't send the patch which you meant to
>send, so I'll drop it.

/me smells plagiarism :)


Jan