Re: [Fastboot] [PATCH 2/5] Use the APIC to determine the hardware processor id - i386

2007-02-28 Thread Vivek Goyal
On Thu, Mar 01, 2007 at 04:16:59PM +0900, Fernando Luis Vázquez Cao wrote:
> Use the APIC to determine the hardware processor id when APIC support
> has been selected, independently of whether CONFIG_SMP is set or not.
> 
> Signed-off-by: Fernando Luis Vazquez Cao <[EMAIL PROTECTED]>
> ---
> 
> diff -urNp linux-2.6.21-rc2/include/asm-i386/smp.h linux-2.6.21-rc2-hwcpuid/include/asm-i386/smp.h
> --- linux-2.6.21-rc2/include/asm-i386/smp.h   2007-03-01 14:02:21.0 +0900
> +++ linux-2.6.21-rc2-hwcpuid/include/asm-i386/smp.h   2007-03-01 14:08:50.0 +0900
> @@ -74,20 +74,6 @@ static inline int num_booting_cpus(void)
>   return cpus_weight(cpu_callout_map);
>  }
>  
> -#ifdef CONFIG_X86_LOCAL_APIC
> -
> -#ifdef APIC_DEFINITION
> -extern int hard_smp_processor_id(void);
> -#else
> -#include 
> -static inline int hard_smp_processor_id(void)
> -{
> - /* we don't want to mark this access volatile - bad code generation */
> - return GET_APIC_ID(*(unsigned long *)(APIC_BASE+APIC_ID));
> -}
> -#endif
> -#endif
> -
>  extern int safe_smp_processor_id(void);
>  extern int __cpu_disable(void);
>  extern void __cpu_die(unsigned int cpu);
> @@ -102,10 +88,23 @@ extern unsigned int num_processors;
>  
>  #define NO_PROC_ID   0xFF/* No processor magic marker */
>  
> -#endif
> +#endif /* CONFIG_SMP */
>  
>  #ifndef __ASSEMBLY__
>  
> +#ifdef CONFIG_X86_LOCAL_APIC
> +#ifdef APIC_DEFINITION
> +extern int hard_smp_processor_id(void);
> +#else
> +#include 
> +static inline int hard_smp_processor_id(void)
> +{
> + /* we don't want to mark this access volatile - bad code generation */
> + return GET_APIC_ID(*(unsigned long *)(APIC_BASE+APIC_ID));
> +}
> +#endif /* APIC_DEFINITION */
> +#endif /* CONFIG_X86_LOCAL_APIC */
> +

Won't compilation fail if I build a UP kernel with CONFIG_X86_LOCAL_APIC=n?
There is no definition of hard_smp_processor_id() in that case.

Otherwise, as a concept it seems to make sense that hard_smp_processor_id()
is not necessarily zero on UP systems.
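
A minimal sketch of the kind of fallback that would avoid the build failure,
assuming a kernel without a local APIC can simply report 0, as the UP
definitions elsewhere in this series do. The commented-out switches and
read_apic_id_stub() are hypothetical stand-ins so the fragment compiles in
user space; they are not kernel interfaces.

```c
/* Stand-ins for the Kconfig symbols; uncomment to try other configurations. */
/* #define CONFIG_SMP 1 */
/* #define CONFIG_X86_LOCAL_APIC 1 */

#ifdef CONFIG_X86_LOCAL_APIC
/* Hypothetical stand-in for reading the ID register of the local APIC. */
static inline int read_apic_id_stub(void)
{
	return 3;
}
#define hard_smp_processor_id()	read_apic_id_stub()
#else
/* No local APIC available: keep a definition so that UP builds still link. */
#define hard_smp_processor_id()	0
#endif

/* Any caller compiles in every configuration because a definition always exists. */
static int boot_cpu_hw_id(void)
{
	return hard_smp_processor_id();
}
```

With both switches left undefined this models the UP, no-APIC case raised
above, and boot_cpu_hw_id() falls back to 0.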

Thanks
Vivek
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 04/22] fix deadlock in throttle_vm_writeout

2007-02-28 Thread Miklos Szeredi
> > From: Miklos Szeredi <[EMAIL PROTECTED]>
> > 
> > This deadlock is similar to the one in balance_dirty_pages, but
> > instead of waiting in balance_dirty_pages after submitting a write
> > request, it happens during a memory allocation for filesystem B before
> > submitting a write request.
> > 
> > It is easy to reproduce on a machine with not too much memory.
> > E.g. try this on 2.6.21-rc1 UML with 32MB (works on physical hw as
> > well):
> > 
> >   dd if=/dev/zero of=/tmp/tmp.img bs=1048576 count=40
> >   mke2fs -j -F /tmp/tmp.img
> >   mkdir /tmp/img
> >   mount -oloop /tmp/tmp.img /tmp/img
> >   bash-shared-mapping /tmp/img/foo 3000
> > 
> > The deadlock doesn't happen immediately, sometimes only after a few
> > minutes.
> > 
> > Simplified stack trace for bash-shared-mapping after the deadlock:
> > 
> >   io_schedule_timeout
> >   congestion_wait
> >   balance_dirty_pages
> >   balance_dirty_pages_ratelimited_nr
> >   generic_file_buffered_write
> >   __generic_file_aio_write_nolock
> >   generic_file_aio_write
> >   ext3_file_write
> >   do_sync_write
> >   vfs_write
> >   sys_pwrite64
> > 
> > and for [loop0]:
> > 
> >   io_schedule_timeout
> >   congestion_wait
> >   throttle_vm_writeout
> >   shrink_zone
> >   shrink_zones
> >   try_to_free_pages
> >   __alloc_pages
> >   find_or_create_page
> >   do_lo_send_aops
> >   lo_send
> >   do_bio_filebacked
> >   loop_thread
> > 
> > The requirement for the deadlock is that
> > 
> >   nr_writeback > dirty_thresh * 1.1 + margin
> > 
> > Again margin seems to be in the 100 page range.
> > 
> > The task of throttle_vm_writeout is to limit the rate at which
> > under-writeback pages are created due to swapping.  There's no other
> > way direct reclaim can increase the nr_writeback + nr_file_dirty.
> > 
> > So when there are few or no under-swap pages, it is safe for this
> > function to return.  This ensures, that there's progress with writing
> > back dirty pages.
> > 
> 
> Would this also be solved by the below just-submitted bugfix?  I guess not.
> 
> I think the basic problem here is that the loop thread is responsible for
> cleaning memory, but in throttle_vm_writeout(), the loop thread can get stuck
> waiting for some other thread to clean memory, but that ain't going to happen.
> 
> throttle_vm_writeout() wasn't very well thought through, I suspect.
> 
> 
> I suspect we can just delete throttle_vm_writeout() now.  The original
> rationale was:
> 
> [PATCH] vm: pageout throttling
> 
> With silly pageout testcases it is possible to place huge amounts of memory
> under I/O.  With a large request queue (CFQ uses 8192 requests) it is
> possible to place _all_ memory under I/O at the same time.
> 
> This means that all memory is pinned and unreclaimable and the VM gets
> upset and goes oom.
> 
> The patch limits the amount of memory which is under pageout writeout to be
> a little more than the amount of memory at which balance_dirty_pages()
> callers will synchronously throttle.
> 
> This means that heavy pageout activity can starve heavy writeback activity
> completely, but heavy writeback activity will not cause starvation of
> pageout.  Because we don't want a simple `dd' to be causing excessive
> latencies in page reclaim.
> 
> but now that we limit the amount of dirty MAP_SHARED memory, and given that
> the various pieces of the dirty-memory limiting puzzle also take the number
> of under-writeback pages into account, we should no longer be able to get
> in the situation where the total number of writeback+dirty pages exceeds
> dirty_ratio.

Only with swap, which can generate a lot of writeback pages without
any limit.  Does this matter?  I guess that depends on the queue
length.  If a lot of swap requests are using up memory, then that's
bad.

> From: Andrew Morton <[EMAIL PROTECTED]>
> 
> throttle_vm_writeout() is designed to wait for the dirty levels to subside. 
> But if the caller holds IO or FS locks, we might be holding up that writeout.
> 
> So change it to take a single nap to give other devices a chance to clean some
> memory, then return.
> 

Why does it nap unconditionally for GFP_NOFS/NOIO?  That seems to be a
waste of time.

Btw, I did try not calling throttle_vm_writeout() for GFP_NOFS/NOIO
and IIRC it didn't help.  I'll check again.
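
To make the question concrete, here is a small user-space model of the
throttling loop. It is only a sketch under assumptions: the 10% slack follows
the "dirty_thresh * 1.1" condition quoted above, the margin is ignored, and
the struct, the function names and the completed_per_nap parameter are all
hypothetical. Unlike a version that naps unconditionally, this variant checks
the limit first, so a restricted (GFP_NOFS/GFP_NOIO) caller naps at most once
and only when the limit is actually exceeded.

```c
#include <stdbool.h>

/* Hypothetical model, not kernel code: page counts are plain longs. */
struct vm_counts {
	long nr_writeback;	/* pages currently under writeback */
	long dirty_thresh;	/* threshold from the dirty-limit calculation */
};

/* True while writeback exceeds the threshold plus 10% slack. */
static bool over_writeback_limit(const struct vm_counts *c)
{
	long limit = c->dirty_thresh + c->dirty_thresh / 10;

	return c->nr_writeback > limit;
}

/*
 * Count the naps a caller takes.  may_block_on_io models a caller that is
 * not restricted to GFP_NOFS/GFP_NOIO; completed_per_nap models pages that
 * finish writeback during one nap (0 if nobody else can make progress).
 * max_naps only bounds the model so that a stuck caller terminates.
 */
static int naps_taken(struct vm_counts c, bool may_block_on_io,
		      long completed_per_nap, int max_naps)
{
	int naps = 0;

	while (over_writeback_limit(&c) && naps < max_naps) {
		naps++;
		c.nr_writeback -= completed_per_nap;
		if (!may_block_on_io)
			break;	/* restricted caller: single nap, then return */
	}
	return naps;
}
```

In this model an unrestricted caller with no writeback progress keeps napping
until the cap, which corresponds to the stuck loop thread, while a restricted
caller returns after one nap.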

Thanks,
Miklos


> Cc: Nick Piggin <[EMAIL PROTECTED]>
> Cc: OGAWA Hirofumi <[EMAIL PROTECTED]>
> Cc: Kumar Gala <[EMAIL PROTECTED]>
> Cc: Pete Zaitcev <[EMAIL PROTECTED]>
> Cc: <[EMAIL PROTECTED]>
> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
> ---
> 
>  include/linux/writeback.h |2 +-
>  mm/page-writeback.c   |   13 +++--
>  mm/vmscan.c   |2 +-
>  3 files changed, 13 insertions(+), 4 deletions(-)
> 
> diff -puN 
> include/linux/writeback.h~throttle_vm_writeout-dont-loop-on-gfp_nofs-and-gfp_noio-allocations
>  include/linux/writeback.h
> --- 
> 

Re: [PATCH 2.6.20] kobject net ifindex + rename

2007-02-28 Thread Jarek Poplawski
On Wed, Feb 28, 2007 at 10:45:41AM -0800, Jean Tourrilhes wrote:
> On Wed, Feb 28, 2007 at 10:34:37AM +0100, Jarek Poplawski wrote:
> > On 28-02-2007 02:27, Jean Tourrilhes wrote:
> > >   Hi all,
> > ...
> > >   Patch for 2.6.20 is attached. The patch was tested on a system
> > > running the hotplug scripts, and on another system running udev.
> > > 
> > >   Have fun...
> > > 
> > >   Jean
> > > 
> > > Signed-off-by: Jean Tourrilhes <[EMAIL PROTECTED]>
> > > 
> > > -
> > ...
> > > diff -u -p linux/net/core/net-sysfs.j1.c linux/net/core/net-sysfs.c
> > > --- linux/net/core/net-sysfs.j1.c 2007-02-27 15:01:08.0 -0800
> > > +++ linux/net/core/net-sysfs.c2007-02-27 15:06:49.0 -0800
> > > @@ -412,6 +412,17 @@ static int netdev_uevent(struct class_de
> > >   if ((size <= 0) || (i >= num_envp))
> > >   return -ENOMEM;
> > >  
> > > + /* pass ifindex to uevent.
> > > +  * ifindex is useful as it won't change (interface name may change)
> > > +  * and is what RtNetlink uses natively. */
> > > + envp[i++] = buf;
> > > + n = snprintf(buf, size, "IFINDEX=%d", dev->ifindex) + 1;
> > > + buf += n;
> > > + size -= n;
> > > +
> > > + if ((size <= 0) || (i >= num_envp))
> > 
> > Btw.:
> > 1. if size == 10 and snprintf returns 9 (without NULL)
> >then n == 10 (with NULL), so isn't it enough (here and above):
> >  
> > if ((size < 0) || (i >= num_envp))
> 
>   I just cut'n'pasted the code from a few lines above. If the original
> code is incorrect, it needs fixing. And it will probably need fixing in
> a lot of places.

I think you're kind of responsible for your part, at least.
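
The arithmetic behind point 1 can be checked in user space. The sketch below
assumes snprintf() returns the length the full string would have had, without
the terminating NUL; append_ifindex() is a hypothetical helper that mirrors
only the bookkeeping of the quoted hunk, not the kernel function.

```c
#include <stdio.h>

/*
 * Format "IFINDEX=<n>" into buf and return the space left after the string
 * and its terminating NUL.  A negative result means the output was
 * truncated; zero means it fitted exactly.
 */
static int append_ifindex(char *buf, int size, int ifindex)
{
	int n = snprintf(buf, (size_t)size, "IFINDEX=%d", ifindex) + 1;

	return size - n;
}
```

With size == 10, "IFINDEX=1" needs exactly 10 bytes and leaves 0, so the
entry is intact and only a negative remainder signals truncation. The
"<= 0" form additionally rejects the exact fit, which is conservative if
more entries are to follow.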

> 
> > 2. shouldn't there be (here and above):
> >  
> > envp[--i] = NULL;
> > 
> 
>   No, envp is local, so who cares.

But envp[i] isn't (at least here). So, I guess, a caller
of this function could care.

> > > + if ((size <= 0) || (i >= num_envp))
> > > + return -ENOMEM;

And one more thing (not necessarily for you):
ENOBUFS is probably more adequate here.

Cheers,
Jarek P.


Re: [PATCH 1/1] - Altix: reinitialize acpi tables

2007-02-28 Thread Len Brown
So will the first acpi_table_init() always fail, even
on future machines?

-Len

On Wednesday 28 February 2007 18:47, John Keller wrote:
> To provide compatibility with SN kernels that do and do not
> have ACPI IO support, the SN PROM must build different
> versions of some ACPI tables based on which kernel is booting.
> As such, the tables may have to change at kernel boot time.
> By default, prior to kernel boot, the PROM builds an empty
> DSDT (header only) and no SSDTs. If an ACPI capable kernel
> boots, the kernel will notify the PROM, at platform setup time,
> and the PROM will build full DSDT and SSDT tables.
> 
> With the latest changes to acpi_table_init(), the table lengths
> are saved, and when our PROM changes them, the changes are not seen,
> and the kernel will crash on boot. Because of issues with kexec support,
> we are not able to create the tables prior to acpi_table_init().
> As a result, we are making a second call to acpi_table_init() to
> process the rebuilt DSDT and SSDTs.
> 
> Signed-off-by: John Keller <[EMAIL PROTECTED]>
> ---
> 
> 
> Index: release/arch/ia64/sn/kernel/setup.c
> ===
> --- release.orig/arch/ia64/sn/kernel/setup.c  2007-02-28 11:02:34.558139870 -0600
> +++ release/arch/ia64/sn/kernel/setup.c   2007-02-28 11:02:39.362737953 -0600
> @@ -397,6 +397,8 @@ void __init sn_setup(char **cmdline_p)
>   ia64_sn_set_os_feature(OSF_PCISEGMENT_ENABLE);
>   ia64_sn_set_os_feature(OSF_ACPI_ENABLE);
>  
> + /* Load the new DSDT and SSDT tables into the global table list. */
> + acpi_table_init();
>  
>  #if defined(CONFIG_VT) && defined(CONFIG_VGA_CONSOLE)
>   /*


Re: [patch 03/22] fix deadlock in balance_dirty_pages

2007-02-28 Thread Miklos Szeredi
> > This deadlock happens, when dirty pages from one filesystem are
> > written back through another filesystem.  It is easiest to demonstrate
> > with fuse, although it could affect loopback mounts as well (see
> > following patches).
> > 
> > Let's call the filesystems A(bove) and B(elow).  Process Pr_a is
> > writing to A, and process Pr_b is writing to B.
> > 
> > Pr_a is bash-shared-mapping.  Pr_b is the fuse filesystem daemon
> > (fusexmp_fh), for simplicity let's assume that Pr_b is single
> > threaded.
> > 
> > These are the simplified stack traces of these processes after the
> > deadlock:
> > 
> > Pr_a (bash-shared-mapping):
> > 
> >   (block on queue)
> >   fuse_writepage
> >   generic_writepages
> >   writeback_inodes
> >   balance_dirty_pages
> >   balance_dirty_pages_ratelimited_nr
> >   set_page_dirty_mapping_balance
> >   do_no_page
> > 
> > 
> > Pr_b (fusexmp_fh):
> > 
> >   io_schedule_timeout
> >   congestion_wait
> >   balance_dirty_pages
> >   balance_dirty_pages_ratelimited_nr
> >   generic_file_buffered_write
> >   generic_file_aio_write
> >   ext3_file_write
> >   do_sync_write
> >   vfs_write
> >   sys_pwrite64
> > 
> > 
> > Thanks to the aggressive nature of Pr_a, it can happen, that
> > 
> >   nr_file_dirty > dirty_thresh + margin
> > 
> > This is due to both nr_dirty growing and dirty_thresh shrinking, which
> > in turn is due to nr_file_mapped rapidly growing.  The exact size of
> > the margin at which the deadlock happens is not known, but it's around
> > 100 pages.
> > 
> > At this point Pr_a enters balance_dirty_pages and starts to write back
> > some of its dirty pages.  After submitting some requests, it blocks
> > on the request queue.
> > 
> > The first write request will trigger Pr_b to perform a write()
> > syscall.  This will submit a write request to the block device and
> > then may enter balance_dirty_pages().
> > 
> > The condition for exiting balance_dirty_pages() is
> > 
> >  - either that write_chunk pages have been written
> > 
> >  - or nr_file_dirty + nr_writeback < dirty_thresh
> > 
> > It is entirely possible that less than write_chunk pages were written,
> > in which case balance_dirty_pages() will not exit even after all the
> > submitted requests have been successfully completed.
> > 
> > Which means that the write() syscall does not return.
> 
> But the balance_dirty_pages() loop does more than just wait for those two
> conditions.  It will also submit _more_ dirty pages for writeout.  ie: it
> should be feeding more of file A's pages into writepage.
> 
> Why isn't that happening?

All of A's data is actually written by B.  So just submitting more
pages to some queue doesn't help, it will just make the queue longer.

If the queue length were not limited, and B had limitless threads,
and write() didn't exclude other writes to the same file (i_mutex),
then there would be no deadlock.

But for fuse the first and the last conditions aren't met.

For the loop device the second condition isn't met: loop is single
threaded.
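
The exit test quoted above can be modelled in a few lines. This is only a
sketch: the struct and function names are hypothetical, the counters are
plain numbers rather than kernel statistics, and the model covers nothing
but the two exit conditions listed in the patch description.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the counters the loop consults, in pages. */
struct dirty_state {
	long nr_file_dirty;
	long nr_writeback;
	long dirty_thresh;
};

/* The loop may exit when either condition from the description holds. */
static bool may_exit_balance(const struct dirty_state *s,
			     long pages_written, long write_chunk)
{
	if (pages_written >= write_chunk)
		return true;
	return s->nr_file_dirty + s->nr_writeback < s->dirty_thresh;
}
```

For Pr_b, pages_written stays below write_chunk and nr_file_dirty stays above
the threshold because of A's dirty pages, so the function keeps returning
false even once nr_writeback has dropped to zero. Feeding more of A's pages
into the queue does not change either input.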

Thanks,
Miklos


Re: [PATCH] bonding: replace system timer with work queue

2007-02-28 Thread Andrew Morton
On Wed, 28 Feb 2007 10:12:01 +0100 (CET) Jaroslav Kysela <[EMAIL PROTECTED]> wrote:

> Hi,
> 
>   please, review and apply to mm tree for further testing. The patch 
> is also available at 
> ftp://ftp.alsa-project.org/pub/kernel-patches/bonding-workqueue.patch .

Please cc netdev@vger.kernel.org on net-related patches, thanks.

>   Thank you,
>   Jaroslav
> 
> ==
> bonding: replace system timer with work queue
> 
> This patch replaces system timer with work queue in monitor functions.
> The reason for this change is that bonding handlers call various
> sleeping functions from the timer handler, which is not allowed.

Which sleeping functions?  I'd have expected the kernel to spew runtime
warnings when this happens, but I don't recall any such reports.


> Because we cannot share the main workqueue threads (rtnl_lock is used
> also in linkwatch_event) - new bond workqueue thread is created.
> 
> Signed-off-by: Jaroslav Kysela <[EMAIL PROTECTED]>
> 
> diff -rupN linux-2.6.20.orig/drivers/net/bonding/bond_3ad.c linux-2.6.20/drivers/net/bonding/bond_3ad.c
> --- linux-2.6.20.orig/drivers/net/bonding/bond_3ad.c  2007-02-04 19:44:54.0 +0100
> +++ linux-2.6.20/drivers/net/bonding/bond_3ad.c   2007-02-28 09:19:43.831369202 +0100
> @@ -2097,8 +2097,10 @@ void bond_3ad_unbind_slave(struct slave 
>   * times out, and it selects an aggregator for the ports that are yet not
>   * related to any aggregator, and selects the active aggregator for a bond.
>   */
> -void bond_3ad_state_machine_handler(struct bonding *bond)
> +void bond_3ad_state_machine_handler(struct work_struct *work)
>  {
> + struct ad_bond_info *ad_info = container_of(work, struct ad_bond_info, ad_work.work);
> + struct bonding *bond = (struct bonding *)((char *)ad_info - offsetof(struct bonding, ad_info));

Can we use container_of() here too?

> -void bond_alb_monitor(struct bonding *bond)
> +void bond_alb_monitor(struct work_struct *work)
>  {
> - struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
> + struct alb_bond_info *bond_info = container_of(work, struct alb_bond_info, alb_work.work);
> + struct bonding *bond = (struct bonding *)((char *)bond_info - offsetof(struct bonding, alb_info));

And here.
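
A user-space sketch of what that would look like. The container_of() below is
a simplified stand-in without the kernel's type checking, and the structures
are hypothetical, cut-down versions that keep only the nesting that matters.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel macro; no type checking. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, cut-down stand-ins for the kernel structures. */
struct work_struct { int pending; };
struct delayed_work { struct work_struct work; };

struct ad_bond_info {
	int agg_select_timer;
	struct delayed_work ad_work;
};

struct bonding {
	int id;
	struct ad_bond_info ad_info;
};

/* Walk outwards from the work item in two container_of() steps. */
static struct bonding *bond_from_ad_work(struct work_struct *work)
{
	struct ad_bond_info *ad_info =
		container_of(work, struct ad_bond_info, ad_work.work);

	return container_of(ad_info, struct bonding, ad_info);
}
```

The second step replaces the open-coded cast and offsetof() arithmetic. Since
the member designator may be nested, a single
container_of(work, struct bonding, ad_info.ad_work.work) would presumably do
the same job.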

> + cancel_rearming_delayed_workqueue(bond_wq, &(BOND_AD_INFO(bond).ad_work));
>   break;
>   case BOND_MODE_TLB:
>   case BOND_MODE_ALB:
> - del_timer_sync(&(BOND_ALB_INFO(bond).alb_timer));
> + cancel_rearming_delayed_workqueue(bond_wq, &(BOND_ALB_INFO(bond).alb_work));
>   break;
>   default:
>   break;
> @@ -4289,6 +4272,14 @@ static int bond_init(struct net_device *
>   rwlock_init(&bond->lock);
>   rwlock_init(&bond->curr_slave_lock);
>  
> + /* initialize work */
> + INIT_DELAYED_WORK(&bond->mii_work, (void *)&bond_mii_monitor);
> + if (params->mode == BOND_MODE_ACTIVEBACKUP) {
> + INIT_DELAYED_WORK(&bond->arp_work, (void *)&bond_activebackup_arp_mon);
> + } else {
> + INIT_DELAYED_WORK(&bond->arp_work, (void *)&bond_loadbalance_arp_mon);
> + }

Can we lose the unneeded braces, the unneeded typecasts and fit the code
into 80 cols?



yup.

>   bond->params = *params; /* copy params struct */
>  
>   /* Initialize pointers */
> @@ -4782,6 +4773,12 @@ static int __init bonding_init(void)
>   goto err;
>   }
>  
> + bond_wq = create_singlethread_workqueue("bond");
> + if (bond_wq == NULL) {
> + res = -ENOMEM;
> + goto err;
> + }
> +
>   res = bond_create_sysfs();
>   if (res)
>   goto err;
> @@ -4807,6 +4804,7 @@ static void __exit bonding_exit(void)
>  
>   rtnl_lock();
>   bond_free_all();
> + destroy_workqueue(bond_wq);
>   bond_destroy_sysfs();
>   rtnl_unlock();

Are you sure that all pending delayed works have been cancelled when we
destroy this workqueue?




Re: [patch 01/22] update ctime and mtime for mmaped write

2007-02-28 Thread Miklos Szeredi
> >> What happens if the application overwrites what it had written some
> >> time later?  Nothing.  The page is already read-write, the pte dirty,
> >> so even though the file was clearly modified, there's absolutely no
> >> way in which this can be used to force an update to the timestamp.
> >> 
> >
> > Which, I realize now, actually means, that the patch is wrong.  Msync
> > will have to write protect the page table entries, so that later
> > dirtyings may have an effect on the timestamp.
> 
> I thought that PeterZ's changes were to write-protect the page after
> cleaning it so that future modifications could be detected and tracked
> accordingly?  Does the right thing not happen already?

Yes, but MS_ASYNC does not clean the pages.

In fact a better solution may be to rely on the dirty bit in the page
tables, so that no more page faults are necessary.

Thanks,
Miklos


[PATCH] linux-kernel-markers-documentation-update-flags

2007-02-28 Thread Mathieu Desnoyers
linux-kernel-markers-documentation-update-flags

Documents the flag usage.

Signed-off-by: Mathieu Desnoyers <[EMAIL PROTECTED]>
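
A small user-space illustration of the flag arithmetic, as a sketch only: the
bit values are hypothetical (the real ones live in the marker header), and
the two helper functions exist purely for this example. Clearing a flag from
_MF_DEFAULT needs "& ~flag"; "| ~flag" would set every bit instead.

```c
/* Hypothetical bit values; the real ones live in the marker header. */
#define _MF_LOCKDEP	(1u << 0)
#define _MF_PRINTK	(1u << 1)
#define _MF_DEFAULT	(_MF_LOCKDEP | _MF_PRINTK)

/* Clear one flag from a flag set. */
static unsigned int mf_without(unsigned int flags, unsigned int flag)
{
	return flags & ~flag;
}

/* What OR-ing with the complement computes instead. */
static unsigned int mf_or_not(unsigned int flags, unsigned int flag)
{
	return flags | ~flag;
}
```

With these values, mf_without(_MF_DEFAULT, _MF_LOCKDEP) leaves only
_MF_PRINTK, whereas mf_or_not(_MF_DEFAULT, _MF_LOCKDEP) yields all ones.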

--- a/Documentation/marker.txt
+++ b/Documentation/marker.txt
@@ -78,6 +78,21 @@ which saves a data cache hit, but also requires cross CPU code modification. In
 order to support embedded systems which use read-only memory for their code, the
 optimization can be disabled through menu options.
 
+The MF_* flags can be used to control the type of marker. See the
+include/marker.h header for the list of flags. They can be specified as the
+first parameter of the _MARK() macro, such as the following example which is
+safe wrt lockdep.c (useful for marking lockdep.c functions).
+
+_MARK(_MF_DEFAULT & ~_MF_LOCKDEP, subsystem_eventb,
+  MARK_NOARGS);
+
+Another example is to specify that a specific marker must never call printk :
+_MARK(_MF_DEFAULT & ~_MF_PRINTK, subsystem_eventc,
+  "%d %s %p[struct task_struct]",
+  someint, somestring, current);
+
+Flag compatibility is checked before connecting the probe to the marker.
+
 
 * Probe example
 
@@ -122,6 +137,13 @@ void probe_subsystem_event(const char *format, ...)
va_end(ap);
 }
 
+#define SUBSYSTEM_EVENTB_FORMAT MARK_NOARGS
+void probe_subsystem_eventb(const char *format, ...)
+{
+   /* Increment counters, trace, ... but _never_ generate a call to
+* lockdep.c ! */
+}
+
 static int __init probe_init(void)
 {
int result;
@@ -130,16 +152,24 @@ static int __init probe_init(void)
probe_subsystem_event);
if (!result)
goto cleanup;
+   result = _marker_set_probe(_MF_DEFAULT & ~_MF_LOCKDEP,
+   "subsystem_eventb",
+   SUBSYSTEM_EVENTB_FORMAT,
+   probe_subsystem_eventb);
+   if (!result)
+   goto cleanup;
return 0;
 
 cleanup:
marker_remove_probe(probe_subsystem_event);
+   marker_remove_probe(probe_subsystem_eventb);
return -EPERM;
 }
 
 static void __exit probe_fini(void)
 {
marker_remove_probe(probe_subsystem_event);
+   marker_remove_probe(probe_subsystem_eventb);
 }
 
 module_init(probe_init);
-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


[PATCH] linux-kernel-markers-non-optimized-architures-fallback-flags

2007-02-28 Thread Mathieu Desnoyers
linux-kernel-markers-non-optimized-architures-fallback-flags

- asm-generic/marker.h is now only used as a fallback defining _MARK as
  MARK_GENERIC.
- flags support

Signed-off-by: Mathieu Desnoyers <[EMAIL PROTECTED]>

--- a/include/asm-generic/marker.h
+++ b/include/asm-generic/marker.h
@@ -1,8 +1,11 @@
+#ifndef _ASM_GENERIC_MARKER_H
+#define _ASM_GENERIC_MARKER_H
+
 /*
  * marker.h
  *
  * Code markup for dynamic and static tracing. Generic header.
  *
  * This file is released under the GPLv2.
  * See the file COPYING for more details.
  *
@@ -10,31 +13,18 @@
  * "used" attribute to fix a gcc 4.1.x bug.
  */
 
-#ifdef CONFIG_MARKERS
+#define _MF_DEFAULT  (_MF_LOCKDEP | _MF_PRINTK)
 
-#define GEN_MARK(name, format, args...) \
-   do { \
-   static marker_probe_func *__mark_call_##name = \
-   __mark_empty_function; \
-   static char __marker_enable_##name = 0; \
-   static const struct __mark_marker_c __mark_c_##name \
-   __attribute__((section(".markers.c"))) = \
-   { #name, &__mark_call_##name, format, \
-   MARKER_GENERIC } ; \
-   static const struct __mark_marker __mark_##name \
-   __attribute__((section(".markers"))) = \
-   { &__mark_c_##name, &__marker_enable_##name } ; \
-   asm volatile ( "" : : "i" (&__mark_##name)); \
-   __mark_check_format(format, ## args); \
-   if (unlikely(__marker_enable_##name)) { \
-   preempt_disable(); \
-   (*__mark_call_##name)(format, ## args); \
-   preempt_enable(); \
-   } \
-   } while (0)
+#define MARK_OPTIMIZED MARK_GENERIC
+#define _MARK  MARK_GENERIC
+#define MARK(format, args...)  _MARK(_MF_DEFAULT, format, ## args)
 
+#define MARK_OPTIMIZED_ENABLE_IMMEDIATE_OFFSET \
+   MARK_GENERIC_ENABLE_IMMEDIATE_OFFSET
+#define MARK_OPTIMIZED_ENABLE_TYPE MARK_GENERIC_ENABLE_TYPE
+/* Dereference enable as lvalue from a pointer to its instruction */
+#define MARK_OPTIMIZED_ENABLE  MARK_GENERIC_ENABLE
 
-#define GEN_MARK_ENABLE_IMMEDIATE_OFFSET 0
-#define GEN_MARK_ENABLE_TYPE char
+#define marker_optimized_set_enable marker_generic_set_enable
 
-#endif
+#endif /* _ASM_GENERIC_MARKER_H */
-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


[PATCH 5/5] Move definition of hard_smp_processor_id to asm/smp.h - alpha, m32r, powerpc, s390, sparc, sparc64, um

2007-02-28 Thread Fernando Luis Vázquez Cao
Move definition of hard_smp_processor_id to asm/smp.h on alpha, m32r,
powerpc, s390, sparc, sparc64, and um architectures.

Signed-off-by: Fernando Luis Vazquez Cao <[EMAIL PROTECTED]>
---

diff -urNp linux-2.6.21-rc2/include/asm-alpha/smp.h linux-2.6.21-rc2-hwcpuid/include/asm-alpha/smp.h
--- linux-2.6.21-rc2/include/asm-alpha/smp.h   2007-02-05 03:44:54.0 +0900
+++ linux-2.6.21-rc2-hwcpuid/include/asm-alpha/smp.h   2007-03-07 13:34:14.0 +0900
@@ -51,6 +51,7 @@ int smp_call_function_on_cpu(void (*func
 
 #else /* CONFIG_SMP */
 
+#define hard_smp_processor_id()   0
 #define smp_call_function_on_cpu(func,info,retry,wait,cpu)   ({ 0; })
 
 #endif /* CONFIG_SMP */
diff -urNp linux-2.6.21-rc2/include/asm-m32r/smp.h linux-2.6.21-rc2-hwcpuid/include/asm-m32r/smp.h
--- linux-2.6.21-rc2/include/asm-m32r/smp.h 2007-03-07 12:04:26.0 +0900
+++ linux-2.6.21-rc2-hwcpuid/include/asm-m32r/smp.h 2007-03-07 13:39:22.0 +0900
@@ -108,6 +108,10 @@ extern unsigned long send_IPI_mask_phys(
 #define IPI_SHIFT  (0)
 #define NR_IPIS    (8)
 
-#endif /* CONFIG_SMP */
+#else  /* CONFIG_SMP */
+
+#define hard_smp_processor_id()   0
+
+#endif /* CONFIG_SMP */
 
 #endif /* _ASM_M32R_SMP_H */
diff -urNp linux-2.6.21-rc2/include/asm-powerpc/smp.h linux-2.6.21-rc2-hwcpuid/include/asm-powerpc/smp.h
--- linux-2.6.21-rc2/include/asm-powerpc/smp.h  2007-03-07 12:04:26.0 +0900
+++ linux-2.6.21-rc2-hwcpuid/include/asm-powerpc/smp.h  2007-03-07 13:54:18.0 +0900
@@ -83,6 +83,7 @@ extern void __cpu_die(unsigned int cpu);
 
 #else
 /* for UP */
+#define hard_smp_processor_id()   0
 #define smp_setup_cpu_maps()
 
 #endif /* CONFIG_SMP */
diff -urNp linux-2.6.21-rc2/include/asm-s390/smp.h linux-2.6.21-rc2-hwcpuid/include/asm-s390/smp.h
--- linux-2.6.21-rc2/include/asm-s390/smp.h 2007-03-07 12:04:26.0 +0900
+++ linux-2.6.21-rc2-hwcpuid/include/asm-s390/smp.h 2007-03-07 13:43:31.0 +0900
@@ -113,6 +113,7 @@ static inline void smp_send_stop(void)
__load_psw_mask(psw_kernel_bits & ~PSW_MASK_MCHECK);
 }
 
+#define hard_smp_processor_id()   0
 #define smp_cpu_not_running(cpu)   1
 #define smp_get_cpu(cpu) ({ 0; })
 #define smp_put_cpu(cpu) ({ 0; })
diff -urNp linux-2.6.21-rc2/include/asm-sparc/smp.h linux-2.6.21-rc2-hwcpuid/include/asm-sparc/smp.h
--- linux-2.6.21-rc2/include/asm-sparc/smp.h   2007-02-05 03:44:54.0 +0900
+++ linux-2.6.21-rc2-hwcpuid/include/asm-sparc/smp.h   2007-03-07 13:36:27.0 +0900
@@ -165,6 +165,7 @@ void smp_setup_cpu_possible_map(void);
 
 #else /* SMP */
 
+#define hard_smp_processor_id()   0
 #define smp_setup_cpu_possible_map() do { } while (0)
 
 #endif /* !(SMP) */
diff -urNp linux-2.6.21-rc2/include/asm-sparc64/smp.h linux-2.6.21-rc2-hwcpuid/include/asm-sparc64/smp.h
--- linux-2.6.21-rc2/include/asm-sparc64/smp.h  2007-02-05 03:44:54.0 +0900
+++ linux-2.6.21-rc2-hwcpuid/include/asm-sparc64/smp.h  2007-03-07 13:48:35.0 +0900
@@ -47,6 +47,7 @@ extern void smp_setup_cpu_possible_map(v
 
 #else
 
+#define hard_smp_processor_id()   0
 #define smp_setup_cpu_possible_map() do { } while (0)
 
 #endif /* !(CONFIG_SMP) */
diff -urNp linux-2.6.21-rc2/include/asm-um/smp.h linux-2.6.21-rc2-hwcpuid/include/asm-um/smp.h
--- linux-2.6.21-rc2/include/asm-um/smp.h   2007-02-05 03:44:54.0 +0900
+++ linux-2.6.21-rc2-hwcpuid/include/asm-um/smp.h   2007-03-07 13:26:01.0 +0900
@@ -24,6 +24,10 @@ extern inline void smp_cpus_done(unsigne
 
 extern struct task_struct *idle_threads[NR_CPUS];
 
+#else
+
+#define hard_smp_processor_id()   0
+
 #endif
 
 #endif




[PATCH 4/5] Always ask the hardware to obtain hardware processor id - ia64

2007-02-28 Thread Fernando Luis Vázquez Cao
Always ask the hardware to determine the hardware processor id in both UP
and SMP kernels.

Signed-off-by: Fernando Luis Vazquez Cao <[EMAIL PROTECTED]>
---

diff -urNp linux-2.6.21-rc2/include/asm-ia64/smp.h linux-2.6.21-rc2-hwcpuid/include/asm-ia64/smp.h
--- linux-2.6.21-rc2/include/asm-ia64/smp.h 2007-02-05 03:44:54.0 +0900
+++ linux-2.6.21-rc2-hwcpuid/include/asm-ia64/smp.h 2007-03-07 13:03:48.0 +0900
@@ -38,6 +38,8 @@ ia64_get_lid (void)
return lid.f.id << 8 | lid.f.eid;
 }
 
+#define hard_smp_processor_id()   ia64_get_lid()
+
 #ifdef CONFIG_SMP
 
 #define XTP_OFFSET 0x1e0008
@@ -110,8 +112,6 @@ max_xtp (void)
writeb(0x0f, ipi_base_addr + XTP_OFFSET); /* Set XTP to max */
 }
 
-#define hard_smp_processor_id()   ia64_get_lid()
-
 /* Upping and downing of CPUs */
 extern int __cpu_disable (void);
 extern void __cpu_die (unsigned int cpu);
@@ -128,7 +128,7 @@ extern void unlock_ipi_calllock(void);
 extern void identify_siblings (struct cpuinfo_ia64 *);
 extern int is_multithreading_enabled(void);
 
-#else
+#else /* CONFIG_SMP */
 
 #define cpu_logical_id(i)  0
 #define cpu_physical_id(i) ia64_get_lid()




[PATCH 3/5] Use the APIC to determine the hardware processor id - x86_64

2007-02-28 Thread Fernando Luis Vázquez Cao
Use the APIC to determine the hardware processor id in both UP and SMP
kernels.

Signed-off-by: Fernando Luis Vazquez Cao <[EMAIL PROTECTED]>
---

diff -urNp linux-2.6.21-rc2/include/asm-x86_64/smp.h linux-2.6.21-rc2-hwcpuid/include/asm-x86_64/smp.h
--- linux-2.6.21-rc2/include/asm-x86_64/smp.h   2007-02-05 03:44:54.0 +0900
+++ linux-2.6.21-rc2-hwcpuid/include/asm-x86_64/smp.h   2007-03-07 12:42:47.0 +0900
@@ -58,12 +58,6 @@ static inline int num_booting_cpus(void)
 
 #define raw_smp_processor_id() read_pda(cpunumber)
 
-static inline int hard_smp_processor_id(void)
-{
-   /* we don't want to mark this access volatile - bad code generation */
-   return GET_APIC_ID(*(unsigned int *)(APIC_BASE+APIC_ID));
-}
-
 extern int __cpu_disable(void);
 extern void __cpu_die(unsigned int cpu);
 extern void prefill_possible_map(void);
@@ -72,7 +66,13 @@ extern unsigned disabled_cpus;
 
 #define NO_PROC_ID 0xFF/* No processor magic marker */
 
-#endif
+#endif /* CONFIG_SMP */
+
+static inline int hard_smp_processor_id(void)
+{
+   /* we don't want to mark this access volatile - bad code generation */
+   return GET_APIC_ID(*(unsigned int *)(APIC_BASE+APIC_ID));
+}
 
 /*
  * Some lowlevel functions might want to know about




[PATCH 2/5] Use the APIC to determine the hardware processor id - i386

2007-02-28 Thread Fernando Luis Vázquez Cao
Use the APIC to determine the hardware processor id when APIC support
has been selected, independently of whether CONFIG_SMP is set or not.

Signed-off-by: Fernando Luis Vazquez Cao <[EMAIL PROTECTED]>
---

diff -urNp linux-2.6.21-rc2/include/asm-i386/smp.h linux-2.6.21-rc2-hwcpuid/include/asm-i386/smp.h
--- linux-2.6.21-rc2/include/asm-i386/smp.h 2007-03-01 14:02:21.0 +0900
+++ linux-2.6.21-rc2-hwcpuid/include/asm-i386/smp.h 2007-03-01 14:08:50.0 +0900
@@ -74,20 +74,6 @@ static inline int num_booting_cpus(void)
return cpus_weight(cpu_callout_map);
 }
 
-#ifdef CONFIG_X86_LOCAL_APIC
-
-#ifdef APIC_DEFINITION
-extern int hard_smp_processor_id(void);
-#else
-#include 
-static inline int hard_smp_processor_id(void)
-{
-   /* we don't want to mark this access volatile - bad code generation */
-   return GET_APIC_ID(*(unsigned long *)(APIC_BASE+APIC_ID));
-}
-#endif
-#endif
-
 extern int safe_smp_processor_id(void);
 extern int __cpu_disable(void);
 extern void __cpu_die(unsigned int cpu);
@@ -102,10 +88,23 @@ extern unsigned int num_processors;
 
 #define NO_PROC_ID 0xFF /* No processor magic marker */
 
-#endif
+#endif /* CONFIG_SMP */
 
 #ifndef __ASSEMBLY__
 
+#ifdef CONFIG_X86_LOCAL_APIC
+#ifdef APIC_DEFINITION
+extern int hard_smp_processor_id(void);
+#else
+#include 
+static inline int hard_smp_processor_id(void)
+{
+   /* we don't want to mark this access volatile - bad code generation */
+   return GET_APIC_ID(*(unsigned long *)(APIC_BASE+APIC_ID));
+}
+#endif /* APIC_DEFINITION */
+#endif /* CONFIG_X86_LOCAL_APIC */
+
 extern u8 apicid_2_node[];
 
 #ifdef CONFIG_X86_LOCAL_APIC




[PATCH RFC 0/5] hard_smp_processor_id overhaul

2007-02-28 Thread Fernando Luis Vázquez Cao
With the advent of kdump, the assumption that the boot CPU when running
a UP kernel is always the CPU with a hardware ID of 0 (usually referred
to as BSP on some architectures) does not hold true anymore, because
the dump capture kernel boots on the crashed CPU (the CPU that invoked
crash_kexec).

As a consequence, the hardcoding of hard_smp_processor_id() to 0 on UP
systems (see "linux/smp.h") is not correct.

This patch-set does the following:

1- Remove hardcoding of hard_smp_processor_id on UP systems.

2- Ask the hardware when possible to obtain the hardware processor id on
i386, x86_64, and ia64, independently of whether CONFIG_SMP is set or
not.

3- Move definition of hard_smp_processor_id for the UP case to asm/smp.h
on alpha, m32r, powerpc, s390, sparc, sparc64, and um architectures. I
guess that hardware features could be used to implement
hard_smp_processor_id even in the UP case, but since I am not an expert
in these architectures I just move the definition.

The patches have been tested on i386, x86_64, and ia64.
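
A minimal user-space sketch of the problem described above, not the
kernel code. It assumes a dump capture kernel that was booted on
hardware CPU 3; model_hw_id_register and the helper names are
hypothetical stand-ins for a hardware register read such as the local
APIC ID.

```c
#include <assert.h>

/* Hypothetical stand-in for the hardware ID register (for example the
 * local APIC ID on x86). In this model the capture kernel runs on
 * hardware CPU 3. */
static unsigned int model_hw_id_register = 3;

/* Old UP behaviour: the hardware processor ID is hardcoded to 0. */
static int hard_id_hardcoded(void)
{
	return 0;
}

/* Behaviour the series aims for: ask the hardware, even on UP. */
static int hard_id_from_hardware(void)
{
	return (int)model_hw_id_register;
}

/* Returns 1 if an ID source agrees with the CPU we actually booted on. */
static int id_matches_boot_cpu(int (*id_source)(void), unsigned int boot_cpu)
{
	assert(id_source != 0);
	return id_source() == (int)boot_cpu;
}
```

With the hardcoded variant, id_matches_boot_cpu() fails for any boot CPU
other than 0, which is the kdump case this series is about.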



[PATCH 1/5] Remove hardcoding of hard_smp_processor_id on UP systems

2007-02-28 Thread Fernando Luis Vázquez Cao
With the advent of kdump, the assumption that the boot CPU when booting
a UP kernel is always the CPU with a hardware ID of 0 (usually referred
to as BSP on some architectures) is not valid anymore.

Signed-off-by: Fernando Luis Vazquez Cao <[EMAIL PROTECTED]>
---

diff -urNp linux-2.6.21-rc2/include/linux/smp.h linux-2.6.21-rc2-hwcpuid/include/linux/smp.h
--- linux-2.6.21-rc2/include/linux/smp.h        2007-02-05 03:44:54.0 +0900
+++ linux-2.6.21-rc2-hwcpuid/include/linux/smp.h        2007-03-07 12:02:13.0 +0900
@@ -83,7 +83,6 @@ void smp_prepare_boot_cpu(void);
  * These macros fold the SMP functionality into a single CPU system
  */
 #define raw_smp_processor_id() 0
-#define hard_smp_processor_id()	0
 static inline int up_smp_call_function(void)
 {
return 0;




[PATCH] linux-kernel-markers-i386-optimization-flags

2007-02-28 Thread Mathieu Desnoyers
linux-kernel-markers-i386-optimization-flags

i386 marker optimization flags support.
Also fixes the @progbits assembly section declaration.

Signed-off-by: Mathieu Desnoyers <[EMAIL PROTECTED]>

--- a/arch/i386/kernel/marker.c
+++ b/arch/i386/kernel/marker.c
@@ -40,7 +40,7 @@ static int mark_notifier(struct notifier_block *nb,
 {
enum die_val die_val = (enum die_val) val;
struct die_args *args = (struct die_args *)data;
-
+   
if (!args->regs || user_mode_vm(args->regs))
return NOTIFY_DONE;
 
@@ -56,7 +56,7 @@ static struct notifier_block mark_notify = {
.priority = 0x7fff, /* we need to be notified first */
 };
 
-int arch_marker_set_ins_enable(void *address, char enable)
+int marker_optimized_set_enable(void *address, char enable)
 {
char saved_byte;
int ret;
@@ -91,4 +91,4 @@ int arch_marker_set_ins_enable(void *address, char enable)
flush_icache_range(address, size);
return 0;
 }
-EXPORT_SYMBOL_GPL(arch_marker_set_ins_enable);
+EXPORT_SYMBOL_GPL(marker_optimized_set_enable);
--- a/include/asm-i386/marker.h
+++ b/include/asm-i386/marker.h
@@ -11,16 +11,19 @@
 
 
 #ifdef CONFIG_MARKERS
-#define MARK(name, format, args...) \
+
+#define _MF_DEFAULT (_MF_OPTIMIZED | _MF_LOCKDEP | _MF_PRINTK)
+
+#define MARK_OPTIMIZED(flags, name, format, args...) \
do { \
static marker_probe_func *__mark_call_##name = \
__mark_empty_function; \
static const struct __mark_marker_c __mark_c_##name \
__attribute__((section(".markers.c"))) = \
{ #name, &__mark_call_##name, format, \
-   MARKER_OPTIMIZED } ; \
+   (flags) | _MF_OPTIMIZED } ; \
char condition; \
-   asm volatile(   ".section .markers, \"a\";\n\t" \
+   asm volatile(   ".section .markers, \"a\", @progbits;\n\t" \
".long %1, 0f;\n\t" \
".previous;\n\t" \
".align 2\n\t" \
@@ -36,12 +39,25 @@
} \
} while (0)
 
+#define _MARK(flags, format, args...) \
+do { \
+   if (((flags) & _MF_LOCKDEP) && ((flags) & _MF_OPTIMIZED)) \
+   MARK_OPTIMIZED(flags, format, ## args); \
+   else \
+   MARK_GENERIC(flags, format, ## args); \
+} while (0)
+
+#define MARK(format, args...) _MARK(_MF_DEFAULT, format, ## args)
+
 /* Offset of the immediate value from the start of the movb instruction, in
  * bytes. */
-#define MARK_ENABLE_IMMEDIATE_OFFSET 1
-#define MARK_ENABLE_TYPE char
-#define MARK_POLYMORPHIC
+#define MARK_OPTIMIZED_ENABLE_IMMEDIATE_OFFSET 1
+#define MARK_OPTIMIZED_ENABLE_TYPE char
+/* Dereference enable as lvalue from a pointer to its instruction */
+#define MARK_OPTIMIZED_ENABLE(a) \
+   *(MARK_OPTIMIZED_ENABLE_TYPE*) \
+   ((char*)a+MARK_OPTIMIZED_ENABLE_IMMEDIATE_OFFSET)
 
-extern int arch_marker_set_ins_enable(void *address, char enable);
+extern int marker_optimized_set_enable(void *address, char enable);
 
 #endif
-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


[PATCH] linux-kernel-markers-powerpc-optimization-flags

2007-02-28 Thread Mathieu Desnoyers
linux-kernel-markers-powerpc-optimization-flags

Add flag support to powerpc optimization.
Fix the .section .markers, \"a\", @progbits;\n\t inline assembly:
adding the @progbits seems to remove a warning from the linker (it forces
the section to DATA, which is the same as what the MARK_GENERIC version
does).

Signed-off-by: Mathieu Desnoyers <[EMAIL PROTECTED]>

--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -92,3 +92,4 @@ obj-$(CONFIG_PPC64)   += $(obj64-y)
 
 extra-$(CONFIG_PPC_FPU)+= fpu.o
 extra-$(CONFIG_PPC64)  += entry_64.o
+obj-$(CONFIG_MARKERS_ENABLE_OPTIMIZATION)  += marker.o
--- /dev/null
+++ b/arch/powerpc/kernel/marker.c
@@ -0,0 +1,25 @@
+/* marker.c
+ *
+ * Powerpc optimized marker enabling/disabling.
+ *
+ * Mathieu Desnoyers <[EMAIL PROTECTED]>
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+int marker_optimized_set_enable(void *address, char enable)
+{
+   char newi[MARK_OPTIMIZED_ENABLE_IMMEDIATE_OFFSET+1];
+   int size = MARK_OPTIMIZED_ENABLE_IMMEDIATE_OFFSET
+   + sizeof(MARK_OPTIMIZED_ENABLE_TYPE);
+
+   memcpy(newi, address, size);
+   MARK_OPTIMIZED_ENABLE(&newi[0]) = enable;
+   memcpy(address, newi, size);
+   flush_icache_range((unsigned long)address, size);
+   return 0;
+}
+EXPORT_SYMBOL_GPL(marker_optimized_set_enable);
diff --git a/include/asm-powerpc/marker.h b/include/asm-powerpc/marker.h
index 7ff8387..eeb6ad3 100644
--- a/include/asm-powerpc/marker.h
+++ b/include/asm-powerpc/marker.h
@@ -14,16 +14,18 @@
 
 #ifdef CONFIG_MARKERS
 
-#define MARK(name, format, args...) \
+#define _MF_DEFAULT (_MF_OPTIMIZED | _MF_LOCKDEP | _MF_PRINTK)
+
+#define MARK_OPTIMIZED(flags, name, format, args...) \
do { \
static marker_probe_func *__mark_call_##name = \
__mark_empty_function; \
static const struct __mark_marker_c __mark_c_##name \
__attribute__((section(".markers.c"))) = \
{ #name, &__mark_call_##name, format, \
-   MARKER_OPTIMIZED } ; \
+   (flags) | _MF_OPTIMIZED } ; \
char condition; \
-   asm volatile(   ".section .markers, \"a\";\n\t" \
+   asm volatile(   ".section .markers, \"a\", @progbits;\n\t" \
PPC_LONG "%1, 0f;\n\t" \
".previous;\n\t" \
".align 4\n\t" \
@@ -39,11 +41,25 @@
} \
} while (0)
 
+#define _MARK(flags, format, args...) \
+do { \
+   if ((flags) & _MF_OPTIMIZED) \
+   MARK_OPTIMIZED(flags, format, ## args); \
+   else \
+   MARK_GENERIC(flags, format, ## args); \
+} while (0)
+
+#define MARK(format, args...) _MARK(_MF_DEFAULT, format, ## args)
 
 /* Offset of the immediate value from the start of the addi instruction (result
  * of the li mnemonic), in bytes. */
-#define MARK_ENABLE_IMMEDIATE_OFFSET 2
-#define MARK_ENABLE_TYPE short
-#define MARK_POLYMORPHIC
+#define MARK_OPTIMIZED_ENABLE_IMMEDIATE_OFFSET 2
+#define MARK_OPTIMIZED_ENABLE_TYPE short
+/* Dereference enable as lvalue from a pointer to its instruction */
+#define MARK_OPTIMIZED_ENABLE(a) \
+   *(MARK_OPTIMIZED_ENABLE_TYPE*) \
+   ((char*)a+MARK_OPTIMIZED_ENABLE_IMMEDIATE_OFFSET)
+
+extern int marker_optimized_set_enable(void *address, char enable);
 
 #endif
-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


Re: [patch 04/22] fix deadlock in throttle_vm_writeout

2007-02-28 Thread Andrew Morton
On Tue, 27 Feb 2007 23:38:13 +0100 Miklos Szeredi <[EMAIL PROTECTED]> wrote:

> From: Miklos Szeredi <[EMAIL PROTECTED]>
> 
> This deadlock is similar to the one in balance_dirty_pages, but
> instead of waiting in balance_dirty_pages after submitting a write
> request, it happens during a memory allocation for filesystem B before
> submitting a write request.
> 
> It is easy to reproduce on a machine with not too much memory.
> E.g. try this on 2.6.21-rc1 UML with 32MB (works on physical hw as
> well):
> 
>   dd if=/dev/zero of=/tmp/tmp.img bs=1048576 count=40
>   mke2fs -j -F /tmp/tmp.img
>   mkdir /tmp/img
>   mount -oloop /tmp/tmp.img /tmp/img
>   bash-shared-mapping /tmp/img/foo 3000
> 
> The deadlock doesn't happen immediately, sometimes only after a few
> minutes.
> 
> Simplified stack trace for bash-shared-mapping after the deadlock:
> 
>   io_schedule_timeout
>   congestion_wait
>   balance_dirty_pages
>   balance_dirty_pages_ratelimited_nr
>   generic_file_buffered_write
>   __generic_file_aio_write_nolock
>   generic_file_aio_write
>   ext3_file_write
>   do_sync_write
>   vfs_write
>   sys_pwrite64
> 
> and for [loop0]:
> 
>   io_schedule_timeout
>   congestion_wait
>   throttle_vm_writeout
>   shrink_zone
>   shrink_zones
>   try_to_free_pages
>   __alloc_pages
>   find_or_create_page
>   do_lo_send_aops
>   lo_send
>   do_bio_filebacked
>   loop_thread
> 
> The requirement for the deadlock is that
> 
>   nr_writeback > dirty_thresh * 1.1 + margin
> 
> Again margin seems to be in the 100 page range.
> 
> The task of throttle_vm_writeout is to limit the rate at which
> under-writeback pages are created due to swapping.  There's no other
> way direct reclaim can increase the nr_writeback + nr_file_dirty.
> 
> So when there are few or no under-swap pages, it is safe for this
> function to return.  This ensures, that there's progress with writing
> back dirty pages.
> 

Would this also be solved by the below just-submitted bugfix?  I guess not.

I think the basic problem here is that the loop thread is responsible for
cleaning memory, but in throttle_vm_writeout() the loop thread can get
stuck waiting for some other thread to clean memory, and that ain't going
to happen.

throttle_vm_writeout() wasn't very well thought through, I suspect.


I suspect we can just delete throttle_vm_writeout() now.  The original
rationale was:

[PATCH] vm: pageout throttling

With silly pageout testcases it is possible to place huge amounts of memory
under I/O.  With a large request queue (CFQ uses 8192 requests) it is
possible to place _all_ memory under I/O at the same time.

This means that all memory is pinned and unreclaimable and the VM gets
upset and goes oom.

The patch limits the amount of memory which is under pageout writeout to be
a little more than the amount of memory at which balance_dirty_pages()
callers will synchronously throttle.

This means that heavy pageout activity can starve heavy writeback activity
completely, but heavy writeback activity will not cause starvation of
pageout.  Because we don't want a simple `dd' to be causing excessive
latencies in page reclaim.

but now that we limit the amount of dirty MAP_SHARED memory, and given that
the various pieces of the dirty-memory limiting puzzle also take the number
of under-writeback pages into account, we should no longer be able to get
in the situation where the total number of writeback+dirty pages exceeds
dirty_ratio.



From: Andrew Morton <[EMAIL PROTECTED]>

throttle_vm_writeout() is designed to wait for the dirty levels to subside. 
But if the caller holds IO or FS locks, we might be holding up that writeout.

So change it to take a single nap to give other devices a chance to clean some
memory, then return.
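
A rough user-space model of the two behaviours described above, not the
kernel implementation. The struct, the function names, and the max_naps
guard are hypothetical; the 10% boost over dirty_thresh follows the
"dirty_thresh * 1.1" condition quoted earlier in this message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state; all counts are in pages. */
struct vm_model {
	long nr_writeback;     /* pages currently under writeback */
	long dirty_thresh;     /* global throttling threshold */
	long cleaned_per_nap;  /* pages other threads clean per nap; 0 = stuck */
};

/* One nap: give other devices a chance to clean some memory. */
static void model_nap(struct vm_model *vm)
{
	vm->nr_writeback -= vm->cleaned_per_nap;
	if (vm->nr_writeback < 0)
		vm->nr_writeback = 0;
}

/*
 * Returns the number of naps taken, or -1 if the loop would still be
 * waiting after max_naps (this models the hang).
 * caller_may_wait_for_io is false when the caller holds IO or FS locks.
 */
static int model_throttle(struct vm_model *vm, bool caller_may_wait_for_io,
			  int max_naps)
{
	int naps = 0;

	assert(max_naps > 0);
	if (!caller_may_wait_for_io) {
		/* Take a single nap, then return regardless of the levels. */
		model_nap(vm);
		return 1;
	}
	while (vm->nr_writeback > vm->dirty_thresh + vm->dirty_thresh / 10) {
		if (naps == max_naps)
			return -1;
		model_nap(vm);
		naps++;
	}
	return naps;
}
```

When nothing else can clean memory (cleaned_per_nap == 0), the waiting
variant never makes progress, while the single-nap variant returns after
one nap.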

Cc: Nick Piggin <[EMAIL PROTECTED]>
Cc: OGAWA Hirofumi <[EMAIL PROTECTED]>
Cc: Kumar Gala <[EMAIL PROTECTED]>
Cc: Pete Zaitcev <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 include/linux/writeback.h |2 +-
 mm/page-writeback.c   |   13 +++--
 mm/vmscan.c   |2 +-
 3 files changed, 13 insertions(+), 4 deletions(-)

diff -puN 
include/linux/writeback.h~throttle_vm_writeout-dont-loop-on-gfp_nofs-and-gfp_noio-allocations
 include/linux/writeback.h
--- 
a/include/linux/writeback.h~throttle_vm_writeout-dont-loop-on-gfp_nofs-and-gfp_noio-allocations
+++ a/include/linux/writeback.h
@@ -84,7 +84,7 @@ static inline void wait_on_inode(struct 
 int wakeup_pdflush(long nr_pages);
 void laptop_io_completion(void);
 void laptop_sync_completion(void);
-void throttle_vm_writeout(void);
+void throttle_vm_writeout(gfp_t gfp_mask);
 
 /* These are exported to sysctl. */
 extern int dirty_background_ratio;
diff -puN 
mm/page-writeback.c~throttle_vm_writeout-dont-loop-on-gfp_nofs-and-gfp_noio-allocations
 mm/page-writeback.c
--- 

[PATCH] linux-kernel-markers-kconfig-menus-fix-5

2007-02-28 Thread Mathieu Desnoyers
linux-kernel-markers-kconfig-menus-fix-5

- Change CONFIG_MARKERS_ENABLE_OPTIMIZATION for
  CONFIG_MARKERS_DISABLE_OPTIMIZATION.
- Have CONFIG_MARKERS_DISABLE_OPTIMIZATION depend on EMBEDDED.

Signed-off-by: Mathieu Desnoyers <[EMAIL PROTECTED]>

--- a/kernel/Kconfig.marker
+++ b/kernel/Kconfig.marker
@@ -7,10 +7,14 @@ config MARKERS
  Place an empty function call at each marker site. Can be
  dynamically changed for a probe function.
 
-config MARKERS_ENABLE_OPTIMIZATION
-   bool "Enable marker optimization"
-   depends on MARKERS
-   default y
+config MARKERS_DISABLE_OPTIMIZATION
+   bool "Disable marker optimization"
+   depends on MARKERS && EMBEDDED
+   default n
help
  Disable code replacement jump optimisations. Especially useful if your
  code is in a read-only rom/flash.
+
+config MARKERS_ENABLE_OPTIMIZATION
+   def_bool y
+   depends on MARKERS && !MARKERS_DISABLE_OPTIMIZATION
-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


[PATCH] linux-kernel-markers-architecture-independant-code-flags

2007-02-28 Thread Mathieu Desnoyers
linux-kernel-markers-architecture-independant-code-flags

- GEN_MARK is renamed to MARK_GENERIC, which is now declared in linux/marker.h
- Adds the MF_* flags that can be used by the _MARK macro
  MF_OPTIMIZED (Use optimized markers)
  MF_LOCKDEP (Can call lockdep)
  MF_PRINTK (vprintk can be called in the probe)
- The MARK macro calls _MARK with a default set of flags (_MF_DEFAULT)
- Verification of flag compatibility at probe connexion
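
A small sketch of how the flag bits listed above combine. The MF_* and
_MF_* values are copied from the patch below, and _MF_DEFAULT from the
architecture headers of the same series; mf_test() and mf_clear() are
hypothetical helpers added only for illustration.

```c
#include <assert.h>

/* Bit positions and masks, as defined in the patch. */
#define MF_OPTIMIZED 1	/* Use optimized markers */
#define MF_LOCKDEP   2	/* Can call lockdep */
#define MF_PRINTK    3	/* vprintk can be called in the probe */

#define _MF_OPTIMIZED (1 << MF_OPTIMIZED)
#define _MF_LOCKDEP   (1 << MF_LOCKDEP)
#define _MF_PRINTK    (1 << MF_PRINTK)

#define _MF_DEFAULT (_MF_OPTIMIZED | _MF_LOCKDEP | _MF_PRINTK)

/* Hypothetical helper: non-zero if any bit of mask is set in flags. */
static int mf_test(int flags, int mask)
{
	return (flags & mask) != 0;
}

/* Hypothetical helper: flags with the bits of mask cleared. */
static int mf_clear(int flags, int mask)
{
	assert(mask != 0);
	return flags & ~mask;
}
```

Note that clearing a bit needs "& ~mask". MARK_GENERIC in the diff below
stores "(flags) | ~_MF_OPTIMIZED", which sets every bit outside
_MF_OPTIMIZED instead of clearing that one bit; if the intent is to
record a non-optimized marker, "&" would be the usual operator.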

Signed-off-by: Mathieu Desnoyers <[EMAIL PROTECTED]>

--- a/include/linux/marker.h
+++ b/include/linux/marker.h
@@ -6,29 +6,7 @@
  *
  * Code markup for dynamic and static tracing.
  *
- * Example :
- *
- * MARK(subsystem_event, "%d %s %p[struct task_struct]",
- *   someint, somestring, current);
- * Where :
- * - Subsystem is the name of your subsystem.
- * - event is the name of the event to mark.
- * - "%d %s %p[struct task_struct]" is the formatted string for printk.
- * - someint is an integer.
- * - somestring is a char pointer.
- * - current is a pointer to a struct task_struct.
- *
- * - Dynamically overridable function call based on marker mechanism
- *   from Frank Ch. Eigler <[EMAIL PROTECTED]>.
- * - Thanks to Jeremy Fitzhardinge <[EMAIL PROTECTED]> for his constructive
- *   criticism about gcc optimization related issues.
- *
- * The marker mechanism supports multiple instances of the same marker.
- * Markers can be put in inline functions, inlined static functions and
- * unrolled loops.
- *
- * Note : It is safe to put markers within preempt-safe code : preempt_enable()
- * will not call the scheduler due to the tests in preempt_schedule().
+ * See Documentation/marker.txt.
  *
  * (C) Copyright 2006 Mathieu Desnoyers <[EMAIL PROTECTED]>
  *
@@ -36,17 +14,15 @@
  * See the file COPYING for more details.
  */
 
-#ifndef __ASSEMBLY__
+#ifdef __KERNEL__
 
 typedef void marker_probe_func(const char *fmt, ...);
 
-enum marker_type { MARKER_GENERIC, MARKER_OPTIMIZED };
-
 struct __mark_marker_c {
const char *name;
marker_probe_func **call;
const char *format;
-   enum marker_type type;
+   int flags;
 } __attribute__((packed));
 
 struct __mark_marker {
@@ -54,32 +30,64 @@ struct __mark_marker {
void *enable;
 } __attribute__((packed));
 
-#ifdef CONFIG_MARKERS_ENABLE_OPTIMIZATION
-#include 
-#endif
-
-#include 
-
-#define MARK_NOARGS " "
-#define MARK_MAX_FORMAT_LEN	1024
-
-#ifndef CONFIG_MARKERS
-#define GEN_MARK(name, format, args...) \
+/* Generic marker flavor always available */
+#ifdef CONFIG_MARKERS
+
+#define MF_OPTIMIZED 1 /* Use optimized markers */
+#define MF_LOCKDEP 2   /* Can call lockdep */
+#define MF_PRINTK 3    /* vprintk can be called in the probe */
+
+#define _MF_OPTIMIZED (1 << MF_OPTIMIZED)
+#define _MF_LOCKDEP (1 << MF_LOCKDEP)
+#define _MF_PRINTK (1 << MF_PRINTK)
+
+#define MARK_GENERIC(flags, name, format, args...) \
+   do { \
+   static marker_probe_func *__mark_call_##name = \
+   __mark_empty_function; \
+   static char __marker_enable_##name = 0; \
+   static const struct __mark_marker_c __mark_c_##name \
+   __attribute__((section(".markers.c"))) = \
+   { #name, &__mark_call_##name, format, \
+   (flags) | ~_MF_OPTIMIZED } ; \
+   static const struct __mark_marker __mark_##name \
+   __attribute__((section(".markers"))) = \
+   { &__mark_c_##name, &__marker_enable_##name } ; \
+   asm volatile ( "" : : "i" (&__mark_##name)); \
+   __mark_check_format(format, ## args); \
+   if (unlikely(__marker_enable_##name)) { \
+   preempt_disable(); \
+   (*__mark_call_##name)(format, ## args); \
+   preempt_enable(); \
+   } \
+   } while (0)
+
+#define MARK_GENERIC_ENABLE_IMMEDIATE_OFFSET 0
+#define MARK_GENERIC_ENABLE_TYPE char
+/* Dereference enable as lvalue from a pointer to its instruction */
+#define MARK_GENERIC_ENABLE(a) \
+   *(MARK_GENERIC_ENABLE_TYPE*) \
+   ((char*)a+MARK_GENERIC_ENABLE_IMMEDIATE_OFFSET)
+
+static inline int marker_generic_set_enable(void *address, char enable)
+{
+   MARK_GENERIC_ENABLE(address) = enable;
+   return 0;
+}
+
+#else /* !CONFIG_MARKERS */
+#define MARK_GENERIC(flags, name, format, args...) \
__mark_check_format(format, ## args)
-#endif
+#endif /* CONFIG_MARKERS */
 
-#ifndef MARK
-#define MARK GEN_MARK
-#define MARK_ENABLE_TYPE GEN_MARK_ENABLE_TYPE
-#define MARK_ENABLE_IMMEDIATE_OFFSET GEN_MARK_ENABLE_IMMEDIATE_OFFSET
+#ifdef CONFIG_MARKERS_ENABLE_OPTIMIZATION
+#include /* optimized marker flavor */
+#else
+#include /* fallback on generic markers */
 #endif
 
-/* Dereference enable as lvalue from a pointer to its instruction */
-#define MARK_ENABLE(a) \
-   

Re: [patch 03/22] fix deadlock in balance_dirty_pages

2007-02-28 Thread Andrew Morton
On Tue, 27 Feb 2007 23:38:12 +0100 Miklos Szeredi <[EMAIL PROTECTED]> wrote:

> From: Miklos Szeredi <[EMAIL PROTECTED]>
> 
> This deadlock happens when dirty pages from one filesystem are
> written back through another filesystem.  It is easiest to demonstrate
> with fuse although it could affect loopback mounts as well (see
> following patches).
> 
> Let's call the filesystems A(bove) and B(elow).  Process Pr_a is
> writing to A, and process Pr_b is writing to B.
> 
> Pr_a is bash-shared-mapping.  Pr_b is the fuse filesystem daemon
> (fusexmp_fh), for simplicity let's assume that Pr_b is single
> threaded.
> 
> These are the simplified stack traces of these processes after the
> deadlock:
> 
> Pr_a (bash-shared-mapping):
> 
>   (block on queue)
>   fuse_writepage
>   generic_writepages
>   writeback_inodes
>   balance_dirty_pages
>   balance_dirty_pages_ratelimited_nr
>   set_page_dirty_mapping_balance
>   do_no_page
> 
> 
> Pr_b (fusexmp_fh):
> 
>   io_schedule_timeout
>   congestion_wait
>   balance_dirty_pages
>   balance_dirty_pages_ratelimited_nr
>   generic_file_buffered_write
>   generic_file_aio_write
>   ext3_file_write
>   do_sync_write
>   vfs_write
>   sys_pwrite64
> 
> 
> Thanks to the aggressive nature of Pr_a, it can happen, that
> 
>   nr_file_dirty > dirty_thresh + margin
> 
> This is due to both nr_dirty growing and dirty_thresh shrinking, which
> in turn is due to nr_file_mapped rapidly growing.  The exact size of
> the margin at which the deadlock happens is not known, but it's around
> 100 pages.
> 
> At this point Pr_a enters balance_dirty_pages and starts to write back
> some of its dirty pages.  After submitting some requests, it blocks
> on the request queue.
> 
> The first write request will trigger Pr_b to perform a write()
> syscall.  This will submit a write request to the block device and
> then may enter balance_dirty_pages().
> 
> The condition for exiting balance_dirty_pages() is
> 
>  - either that write_chunk pages have been written
> 
>  - or nr_file_dirty + nr_writeback < dirty_thresh
> 
> It is entirely possible that fewer than write_chunk pages were written,
> in which case balance_dirty_pages() will not exit even after all the
> submitted requests have been successfully completed.
> 
> Which means that the write() syscall does not return.

But the balance_dirty_pages() loop does more than just wait for those two
conditions.  It will also submit _more_ dirty pages for writeout.  ie: it
should be feeding more of file A's pages into writepage.

Why isn't that happening?

> Which means, that no more dirty pages from A will be written back, and
> neither nr_writeback nor nr_file_dirty will decrease.
> 
> Which means, that balance_dirty_pages() will loop forever.
> 
> Q.E.D.
> 
> The solution is to exit balance_dirty_pages() on the condition, that
> there are only a few dirty + writeback pages for this backing dev.  This
> makes sure, that there is always some progress with this setup.
> 
> The number of outstanding dirty + written pages is limited to 8, which
> means that when over the threshold (dirty_exceeded == 1), each
> filesystem may only effectively pin a maximum of 16 (+8 because of
> ratelimiting) extra pages.
> 
> Note: a similar safety vent is always needed if there's a global limit
> for the dirty+writeback pages, even if in the future there will be
> some per-queue (or other) soft limit.
> 
> Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]>
> ---
> 
> Index: linux/mm/page-writeback.c
> ===
> --- linux.orig/mm/page-writeback.c2007-02-27 14:41:07.0 +0100
> +++ linux/mm/page-writeback.c 2007-02-27 14:41:07.0 +0100
> @@ -201,6 +201,17 @@ static void balance_dirty_pages(struct a
>   if (!dirty_exceeded)
>   dirty_exceeded = 1;
>  
> + /*
> +  * Acquit producer of dirty pages if there's little or
> +  * nothing to write back to this particular queue.
> +  *
> +  * Without this check a deadlock is possible for if
> +  * one filesystem is writing data through another.
> +  */
> + if (atomic_long_read(>nr_dirty) +
> + atomic_long_read(>nr_writeback) < 8)
> + break;
> +
>   /* Note: nr_reclaimable denotes nr_dirty + nr_unstable.
>* Unstable writes are a feature of certain networked
>* filesystems (i.e. NFS) in which data may have been
> 
> --
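
The exit conditions described in the quoted message, together with the
proposed per-device escape, can be modelled roughly as follows. This is
a user-space sketch, not the kernel code: struct bdp_state and both
function names are hypothetical, and only the limit of 8 and the two
original conditions are taken from the quoted text.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the counters involved; all values in pages. */
struct bdp_state {
	long pages_written;  /* pages this caller has written so far */
	long write_chunk;    /* pages the caller is expected to write */
	long nr_file_dirty;  /* global dirty pages */
	long nr_writeback;   /* global pages under writeback */
	long dirty_thresh;   /* global threshold */
	long bdi_dirty;      /* dirty pages for this backing device */
	long bdi_writeback;  /* writeback pages for this backing device */
};

#define BDI_ESCAPE_LIMIT 8  /* the "< 8" check from the quoted patch */

/* The two exit conditions listed in the quoted description. */
static bool may_exit_original(const struct bdp_state *s)
{
	return s->pages_written >= s->write_chunk ||
	       s->nr_file_dirty + s->nr_writeback < s->dirty_thresh;
}

/* Same, plus the escape when this device has almost nothing queued. */
static bool may_exit_patched(const struct bdp_state *s)
{
	assert(s != 0);
	if (s->bdi_dirty + s->bdi_writeback < BDI_ESCAPE_LIMIT)
		return true;
	return may_exit_original(s);
}
```

In the scenario from the quoted message, the lower filesystem has few
pages of its own queued while the global counters stay over the
threshold, so only the patched check lets its writer leave the loop.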


Re: Thread flags modified without set_thread_flag() (non atomically)

2007-02-28 Thread David Miller
From: Andrew Morton <[EMAIL PROTECTED]>
Date: Wed, 28 Feb 2007 22:03:49 -0800

> On Mon, 26 Feb 2007 12:10:37 -0800 Mathieu Desnoyers <[EMAIL PROTECTED]> 
> wrote:
> 
> > Other examples :
> > 
> > sparc64/kernel/ptrace.c:if 
> > ((task_thread_info(child)->flags & _TIF_32BIT) != 0) {
> > sparc64/kernel/process.c:   t->flags ^= (_TIF_ABI_PENDING | 
> > _TIF_32BIT);
> > sparc64/kernel/process.c:   t->flags &= ~_TIF_PERFCTR;
> > 
> > sparc/kernel/process.c: current_thread_info()->flags &= 
> > ~_TIF_USEDFPU;
> > sparc/kernel/process.c: current_thread_info()->flags &= 
> > ~_TIF_USEDFPU;
> > sparc/kernel/process.c: current_thread_info()->flags &= 
> > ~_TIF_USEDFPU;
> > sparc/kernel/process.c: current_thread_info()->flags &= 
> > ~(_TIF_USEDFPU);
> > sparc/kernel/traps.c:   current_thread_info()->flags |= _TIF_USEDFPU;
> > sparc/kernel/traps.c:   task_thread_info(fpt)->flags &= ~_TIF_USEDFPU;
> 
> That all looks rather deliberate.
> 
> > powerpc/kernel/process.c:   t->flags ^= (_TIF_ABI_PENDING | 
> > _TIF_32BIT);
> >
> > ia64/kernel/mca.c:  ti->flags = _TIF_MCA_INIT;
> >
> > avr32/kernel/ptrace.c:  ti->flags |= _TIF_BREAKPOINT;
> 
> No, I don't immediately see anything in the flush_old_exec() code path
> which tells us that nobody else can look up this thread_info (or be holding
> a ref to it) in this context.

Provide the counter example, what other threads of control can modify
relevant flags while a thread is exec()'ing?  It's essentially frozen
outside of these code paths, otherwise we wouldn't be able to make all
of these modifications to the task state.

I think these cases are very safe.


Re: [PATCH 14/22] spufs: use SPU master control to prevent wild SPU execution

2007-02-28 Thread Michael Ellerman
On Mon, 2006-11-20 at 18:45 +0100, Arnd Bergmann wrote:
> plain text document attachment (spufs-master-control.diff)
> When the user changes the runcontrol register, an SPU might be
> running without a process being attached to it and waiting for
> events. In order to prevent this, make sure we always disable
> the priv1 master control when we're not inside of spu_run.

Hi Arnd,

Sorry I didn't comment on this when you sent it, I wasn't paying enough
attention. This patch confuses me, you say we should make sure we always
disable the master control when we're not inside spu_run, but I see
several exit paths where we leave the master run bit enabled - or maybe
I'm reading it wrong.

I think I've also seen it happen:

[EMAIL PROTECTED] dma5]# ./put-test 
10963.13
[EMAIL PROTECTED] dma5]# find /spu
/spu
[EMAIL PROTECTED] dma5]# echo x > /proc/sysrq-trigger 
SysRq : Entering xmon

0:mon> ss
..
Stopped spu 06, was running (mfc_sr1: 0x32 runcntl: 0x1)
Stopped spu 07, was running (mfc_sr1: 0x32 runcntl: 0x1)
..

cheers

-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person




Re: [PATCH]: Use stop_machine_run in the Intel RNG driver

2007-02-28 Thread Andrew Morton
On Tue, 27 Feb 2007 07:22:00 -0500 Prarit Bhargava <[EMAIL PROTECTED]> wrote:

> Replace call_smp_function with stop_machine_run in the Intel RNG driver.
> 
> CPU A has done read_lock()
> CPU B has done write_lock_irq() and is waiting for A to release the lock.
> 
> A third CPU calls call_smp_function and issues the IPI.  CPU A takes CPU C's
> IPI.  CPU B is waiting with interrupts disabled and does not see the IPI.
> CPU C is stuck waiting for CPU B to respond to the IPI.
> 
> Deadlock.

I think what you're describing here is just the standard old
smp_call_function() deadlock, rather than anything which is specific to
intel-rng, yes?

It is "well known" that you can't call smp_call_function() with local
interrupts disabled.  In fact i386 will spit a warning if you try it.


intel-rng doesn't do that, but what it _does_ do is:

smp_call_function(..., wait = 0);
local_irq_disable();

so some CPUs will still be entering the IPI while this CPU has gone and
disabled interrupts, thus exposing us to the deadlock, yes?

In which case a suitable fix might be to make intel-rng spin until all the
other CPUs have entered intel_init_wait().
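
A minimal sketch of the "spin until all the other CPUs have entered"
idea, written with C11 atomics rather than kernel primitives. The struct
and function names are hypothetical; in the driver, the counter would be
bumped at the top of the IPI handler and polled by the initiating CPU
before it disables interrupts.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical rendezvous state shared by all CPUs. */
struct rendezvous {
	atomic_int entered;  /* CPUs that have reached the handler */
	int expected;        /* number of other CPUs that must arrive */
};

static void rendezvous_init(struct rendezvous *r, int other_cpus)
{
	assert(other_cpus >= 0);
	atomic_init(&r->entered, 0);
	r->expected = other_cpus;
}

/* Called first thing in the handler on each other CPU. */
static void rendezvous_enter(struct rendezvous *r)
{
	atomic_fetch_add(&r->entered, 1);
}

/* Polled by the initiating CPU; true once every other CPU has arrived. */
static bool rendezvous_complete(struct rendezvous *r)
{
	return atomic_load(&r->entered) == r->expected;
}
```

The initiator would loop on rendezvous_complete() with interrupts still
enabled and only then disable them, so no CPU is left waiting to deliver
or receive the IPI.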

> The solution is to use stop_machine_run instead of call_smp_function
> (call_smp_function should not be called in situations where the CPUs may
> be suspended).

But that seems to be a nice change anyway.  It took rather a lot of code
churn to do it, and it does find it necessary to export stop_machine_run()
to modules, but that seems OK too.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thread flags modified without set_thread_flag() (non atomically)

2007-02-28 Thread Andrew Morton
On Mon, 26 Feb 2007 12:10:37 -0800 Mathieu Desnoyers <[EMAIL PROTECTED]> wrote:

> Hi,

How come I'm the only person around here with a Reply button?

> Looking into the thread flags, I found out that some architecture 
> specific kernel functions (in 2.6.20) sets the thread flags with non 
> atomic operation.
> 
> A good way to list the most trivial : grep -r TIF_ * | grep =
> 
> Some examples follows. If, for instance, 
> x86_64/kernel/process.c:flush_thread is called from an exec system call, 
> it will do the following :
> 
> x86_64/kernel/process.c:t->flags ^= (_TIF_ABI_PENDING | 
> _TIF_IA32);
> x86_64/kernel/process.c:t->flags &= ~_TIF_DEBUG;
> 
> void flush_thread(void)
> {
> struct task_struct *tsk = current;
> struct thread_info *t = current_thread_info();
> 
> if (t->flags & _TIF_ABI_PENDING) {
> t->flags ^= (_TIF_ABI_PENDING | _TIF_IA32);
> if (t->flags & _TIF_IA32)
> current_thread_info()->status |= TS_COMPAT;
> }
> t->flags &= ~_TIF_DEBUG;
> 
> 
> As long as the flags are only updated by the thread itself at this 
> moment, it seems safe, but if other updates coming from other threads 
> are expected, wouldn't it result in a bad behavior ?
> 
> i.e if resched_task ia being called by another CPU at the same time for 
> this specific thread would set the TIF_NEED_RESCHED flag, but it could 
> be overwritten by the non-atomic modification in flush_thread.

It does seem risky.  Perhaps it is a micro-optimisation which utilises
knowledge that this thread_struct cannot be looked up via any path in this
context.

Or perhaps it is a bug.  Andi, can you please comment?
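
For illustration, the lost update described above can be replayed
deterministically in userspace. This is a sketch under assumptions: the
DEMO_* bit values are invented, and the interleaving is written out by hand
in a single thread rather than produced by two CPUs. The first function
mimics a plain "flags &= ~bit" whose load and store are split by a remote
update; the second uses C11 atomic read-modify-write operations, which
cannot be split that way.

```c
#include <stdatomic.h>

/* Hypothetical bit values, chosen for the example only. */
#define DEMO_NEED_RESCHED (1UL << 3)
#define DEMO_DEBUG        (1UL << 21)

/* Plain read-modify-write with a remote set landing between load and store. */
static unsigned long plain_rmw_with_race(unsigned long flags)
{
    unsigned long tmp = flags;      /* local CPU: load */

    flags |= DEMO_NEED_RESCHED;     /* remote CPU: sets its bit */
    tmp &= ~DEMO_DEBUG;             /* local CPU: modifies the stale copy */
    flags = tmp;                    /* local CPU: store, remote bit is lost */
    return flags;
}

/* Same two updates, each done as one indivisible operation. */
static unsigned long atomic_rmw_with_race(unsigned long initial)
{
    atomic_ulong flags;

    atomic_init(&flags, initial);
    atomic_fetch_or(&flags, DEMO_NEED_RESCHED);   /* remote CPU */
    atomic_fetch_and(&flags, ~DEMO_DEBUG);        /* local CPU */
    return atomic_load(&flags);
}
```

Starting from DEMO_DEBUG, the plain variant ends with no bits set, while the
atomic variant keeps DEMO_NEED_RESCHED regardless of which update runs first.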

> And about this specific flush_thread, I am puzzled about the t->flags ^= 
> (_TIF_ABI_PENDING | _TIF_IA32); line. The XOR will clearly flip the 
> _TIF_ABI_PENDING bit to 0, and very likely set _TIF_IA32 to the opposite 
> of its current value. Why does this change need to be written atomically 
> (can other threads play with these flags ?) ?
> 

Don't know.

> 
> 
> Other examples :
> 
> sparc64/kernel/ptrace.c:if 
> ((task_thread_info(child)->flags & _TIF_32BIT) != 0) {
> sparc64/kernel/process.c:   t->flags ^= (_TIF_ABI_PENDING | 
> _TIF_32BIT);
> sparc64/kernel/process.c:   t->flags &= ~_TIF_PERFCTR;
> 
> sparc/kernel/process.c: current_thread_info()->flags &= 
> ~_TIF_USEDFPU;
> sparc/kernel/process.c: current_thread_info()->flags &= 
> ~_TIF_USEDFPU;
> sparc/kernel/process.c: current_thread_info()->flags &= 
> ~_TIF_USEDFPU;
> sparc/kernel/process.c: current_thread_info()->flags &= 
> ~(_TIF_USEDFPU);
> sparc/kernel/traps.c:   current_thread_info()->flags |= _TIF_USEDFPU;
> sparc/kernel/traps.c:   task_thread_info(fpt)->flags &= ~_TIF_USEDFPU;

That all looks rather deliberate.

> powerpc/kernel/process.c:   t->flags ^= (_TIF_ABI_PENDING | 
> _TIF_32BIT);
>
> ia64/kernel/mca.c:  ti->flags = _TIF_MCA_INIT;
>
> avr32/kernel/ptrace.c:  ti->flags |= _TIF_BREAKPOINT;

No, I don't immediately see anything in the flush_old_exec() code path
which tells us that nobody else can look up this thread_info (or be holding
a ref to it) in this context.


> avr32/kernel/ptrace.c:  ti->flags |= TIF_SINGLE_STEP;

heh.  Haarvard, you got a bug.
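
The bug here is presumably that TIF_SINGLE_STEP is a bit number while the
_TIF_ variants are masks. A small sketch of the difference, with invented
DEMO_ values (the real avr32 bit number is not shown in this thread):

```c
/* Hypothetical values following the TIF_x (index) / _TIF_x (mask) convention. */
#define DEMO_TIF_SINGLE_STEP       5
#define DEMO_TIF_SINGLE_STEP_MASK  (1UL << DEMO_TIF_SINGLE_STEP)

/* Wrong: ORs in the bit *number*, so bits 0 and 2 (value 5) get set. */
static unsigned long set_by_index(unsigned long flags)
{
    return flags | DEMO_TIF_SINGLE_STEP;
}

/* Intended: ORs in the mask, so only bit 5 gets set. */
static unsigned long set_by_mask(unsigned long flags)
{
    return flags | DEMO_TIF_SINGLE_STEP_MASK;
}
```

With these values, set_by_index(0) leaves the single-step bit clear and
flips two unrelated flags instead.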




Re: 2.6.20-rc1: CIFS cheers, NFS4 jeers

2007-02-28 Thread Andrew Morton
On Mon, 26 Feb 2007 00:45:00 -0600 [EMAIL PROTECTED] (Florin Iucha) wrote:

> Hello, it's me and my 70 GB of photos again.
> 
> I have tested both CIFS and NFSv4 clients in kernel 2.6.20-rc1 . CIFS
> passed with flying colors and NFSv4 stalled after 7 GB.
> 
> Configuration:
> 
>Server: PIII/1GHz, 512 MB RAM, Debian testing,
>   distro kernel 2.6.18-3-vserver-686, Intel E1000 NIC, 
>   filesystem 170 GB ext3 with default mkfs values on a SATA disk
>   
>Client: AMD x2 4200+, 2 GB RAM, Debian testing/unstable
>   kernel 2.6.20-rc1, Marvell SKGE onboard,
>   filesystem 120 GB ext3 with default mkfs values on a SATA disk
> 
> After the writing stalls, I have echoed 't' into /proc/sysrq-trigger
> and got a trace, which is at http://iucha.net/20-rc1/after.1.  There was
> no oops before the trace request; the 'before' dmesg is at
> http://iucha.net/20-rc1/before.1 .
> 
> Running 'top', one core is idle and the other is 99% waiting, while
> the 'cp' program is in 'D' state.  Also, after NFSv4 stalls, invokations
> of 'lsof' stall as well.  I can 'ssh' into the box without problems.

and

>
> The kernel on the client is 2.6.21-rc1 (but it echoes problems I
> reported in December with 2.6.20 series as well) as can be seen from
> the kernel logs.
> 
> I have corrected the links:
> 
>http://iucha.net/21-rc1/before.1
>http://iucha.net/21-rc1/after.1
>http://iucha.net/21-rc1/config-2.6.21-rc1
> 

The relevant part is:

[ 1215.657827] cpD 00f86f105704 0  2859   2843  
   (NOTLB)
[ 1215.657833]  81007343faa8 0082  
81007343fb58
[ 1215.657837]  0002 81007343faa8 0008 
81007e578ee0
[ 1215.657842]  810002f4a080 2150 81007e5790b8 
00017343fb50
[ 1215.657847] Call Trace:
[ 1215.657852]  [] io_schedule+0x28/0x34
[ 1215.657856]  [] sync_page+0x41/0x45
[ 1215.657859]  [] __wait_on_bit+0x45/0x77
[ 1215.657862]  [] sync_page+0x0/0x45
[ 1215.657867]  [] wait_on_page_bit+0x6e/0x75
[ 1215.657870]  [] wake_bit_function+0x0/0x2a
[ 1215.657874]  [] pagevec_lookup_tag+0x22/0x2b
[ 1215.657878]  [] wait_on_page_writeback_range+0x6e/0x142
[ 1215.657885]  [] filemap_fdatawait+0x20/0x22
[ 1215.657889]  [] filemap_write_and_wait+0x29/0x38
[ 1215.657894]  [] nfs_setattr+0xa0/0x11a
[ 1215.657897]  [] link_path_walk+0xe8/0xfc
[ 1215.657902]  [] autoremove_wake_function+0x0/0x38
[ 1215.657907]  [] poison_obj+0x27/0x32
[ 1215.657910]  [] current_fs_time+0x3f/0x41
[ 1215.657913]  [] __user_walk_fd+0x53/0x62
[ 1215.657918]  [] notify_change+0x129/0x238
[ 1215.657923]  [] do_utimes+0xfc/0x126
[ 1215.657928]  [] _raw_spin_lock+0xf3/0xf9
[ 1215.657933]  [] sys_futimesat+0x45/0x56
[ 1215.657937]  [] sys_utimes+0x14/0x16
[ 1215.657941]  [] system_call+0x7e/0x83

seems that we've simply lost an IO completion.

Was 2.6.19 OK?


Re: PROBLEM: "BUG:" when resuming from suspend-to-ram

2007-02-28 Thread Dmitry Torokhov
On Wednesday 28 February 2007 07:45, Rafael J. Wysocki wrote:
> 
> > This gives:
> > 
> > (gdb) l *evdev_disconnect+0xb1
> > 0xa81 is in evdev_disconnect (include/asm/processor.h:716).
> > 711However we don't do prefetches for pre XP Athlons currently
> > 712That should be fixed. */
> > 713 #define ARCH_HAS_PREFETCH
> > 714 static inline void prefetch(const void *x)
> > 715 {
> > 716 alternative_input(ASM_NOP4,
> > 717   "prefetchnta (%1)",
> > 718   X86_FEATURE_XMM,
> > 719   "r" (x));
> > 720 }
> 
> Hm, interesting.  Looks like a pointer points to nowhere in
> input_unregister_device(), but I don't know which one.  This may be
> an evdev problem ...
> 

Please try the patch below.

-- 
Dmitry

Input: use krefs for refcounting in input handlers

This should fix problems with accessing memory already freed by
another thread.

Signed-off-by: Dmitry Torokhov <[EMAIL PROTECTED]>
---

 drivers/input/evdev.c|   62 ++---
 drivers/input/joydev.c   |   51 +-
 drivers/input/mousedev.c |  166 +--
 drivers/input/tsdev.c|   65 +++---
 4 files changed, 227 insertions(+), 117 deletions(-)

Index: work/drivers/input/evdev.c
===
--- work.orig/drivers/input/evdev.c
+++ work/drivers/input/evdev.c
@@ -29,8 +29,10 @@ struct evdev {
char name[16];
struct input_handle handle;
wait_queue_head_t wait;
-   struct evdev_list *grab;
+   struct kref kref;
struct list_head list;
+
+   struct evdev_list *grab;
 };
 
 struct evdev_list {
@@ -94,53 +96,62 @@ static int evdev_flush(struct file *file
return input_flush_device(&list->evdev->handle, file);
 }
 
-static void evdev_free(struct evdev *evdev)
+static void evdev_free(struct kref *kref)
 {
+   struct evdev *evdev = container_of(kref, struct evdev, kref);
+
evdev_table[evdev->minor] = NULL;
kfree(evdev);
 }
 
-static int evdev_release(struct inode * inode, struct file * file)
+static int evdev_release(struct inode *inode, struct file *file)
 {
struct evdev_list *list = file->private_data;
+   struct evdev *evdev = list->evdev;
 
-   if (list->evdev->grab == list) {
-   input_release_device(&list->evdev->handle);
-   list->evdev->grab = NULL;
+   if (evdev->grab == list) {
+   input_release_device(&evdev->handle);
+   evdev->grab = NULL;
}
 
evdev_fasync(-1, file, 0);
+
list_del(&list->node);
+   kfree(list);
 
-   if (!--list->evdev->open) {
-   if (list->evdev->exist)
-   input_close_device(&list->evdev->handle);
-   else
-   evdev_free(list->evdev);
-   }
+   if (!--evdev->open && evdev->exist)
+   input_close_device(&evdev->handle);
+
+   kref_put(&evdev->kref, evdev_free);
 
-   kfree(list);
return 0;
 }
 
-static int evdev_open(struct inode * inode, struct file * file)
+static int evdev_open(struct inode *inode, struct file *file)
 {
struct evdev_list *list;
+   struct evdev *evdev;
int i = iminor(inode) - EVDEV_MINOR_BASE;
 
-   if (i >= EVDEV_MINORS || !evdev_table[i] || !evdev_table[i]->exist)
+   if (i >= EVDEV_MINORS)
+   return -ENODEV;
+
+   evdev = evdev_table[i];
+   if (!evdev || !evdev->exist)
return -ENODEV;
 
-   if (!(list = kzalloc(sizeof(struct evdev_list), GFP_KERNEL)))
+   list = kzalloc(sizeof(struct evdev_list), GFP_KERNEL);
+   if (!list)
return -ENOMEM;
 
-   list->evdev = evdev_table[i];
-   list_add_tail(&list->node, &evdev_table[i]->list);
+   kref_get(&evdev->kref);
+
+   list->evdev = evdev;
+   list_add_tail(&list->node, &evdev->list);
file->private_data = list;
 
-   if (!list->evdev->open++)
-   if (list->evdev->exist)
-   input_open_device(&list->evdev->handle);
+   if (!evdev->open++ && evdev->exist)
+   input_open_device(&evdev->handle);
 
return 0;
 }
@@ -629,9 +640,11 @@ static struct input_handle *evdev_connec
return NULL;
}
 
-   if (!(evdev = kzalloc(sizeof(struct evdev), GFP_KERNEL)))
+   evdev = kzalloc(sizeof(struct evdev), GFP_KERNEL);
+   if (!evdev)
return NULL;
 
+   kref_init(&evdev->kref);
INIT_LIST_HEAD(&evdev->list);
init_waitqueue_head(&evdev->wait);
 
@@ -672,8 +685,9 @@ static void evdev_disconnect(struct inpu
wake_up_interruptible(&evdev->wait);
list_for_each_entry(list, &evdev->list, node)
kill_fasync(&list->fasync, SIGIO, POLL_HUP);
-   } else
-   evdev_free(evdev);
+   }
+
+   kref_put(&evdev->kref, evdev_free);
 }
 
 static const struct input_device_id evdev_ids[] = {
Index: 
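
The refcounting pattern used in the evdev hunks can be sketched in userspace
as follows. This is an illustration only: the demo_ names are hypothetical,
the counter is a plain int (the kernel's struct kref uses an atomic counter),
and the lifecycle is reduced to connect, open, disconnect, release. The point
it shows is that whichever of disconnect and release runs last frees the
object, so neither path touches freed memory.

```c
#include <stddef.h>
#include <stdlib.h>

struct demo_kref { int refcount; };

/* Same idea as the kernel's container_of(): member pointer -> enclosing object. */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct demo_dev {
    int minor;
    struct demo_kref kref;
};

static int demo_freed;  /* counts release calls, for demonstration */

static void demo_kref_init(struct demo_kref *k) { k->refcount = 1; }
static void demo_kref_get(struct demo_kref *k)  { k->refcount++; }

/* Drops one reference; calls release() and returns 1 when the last one goes. */
static int demo_kref_put(struct demo_kref *k,
                         void (*release)(struct demo_kref *))
{
    if (--k->refcount == 0) {
        release(k);
        return 1;
    }
    return 0;
}

static void demo_dev_free(struct demo_kref *k)
{
    struct demo_dev *dev = demo_container_of(k, struct demo_dev, kref);

    demo_freed++;
    free(dev);
}

/* Returns 1 if the object survived disconnect and was freed once on release. */
static int demo_lifecycle(void)
{
    struct demo_dev *dev = malloc(sizeof(*dev));
    int freed_early, freed_late;

    if (!dev)
        return 0;
    dev->minor = 0;
    demo_freed = 0;

    demo_kref_init(&dev->kref);     /* connect: the handler holds one ref */
    demo_kref_get(&dev->kref);      /* open: the file holds another */
    freed_early = demo_kref_put(&dev->kref, demo_dev_free); /* disconnect */
    freed_late  = demo_kref_put(&dev->kref, demo_dev_free); /* release */

    return !freed_early && freed_late && demo_freed == 1;
}
```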

Re: 2.6.21-rc1: known regressions (v2) (part 2)

2007-02-28 Thread Mike Galbraith
On Thu, 2007-03-01 at 09:01 +1100, Con Kolivas wrote:
> On Wednesday 28 February 2007 15:21, Mike Galbraith wrote:

> > On Wed, 2007-02-28 at 09:58 +1100, Con Kolivas wrote:
> > > On Tuesday 27 February 2007 19:54, Mike Galbraith wrote:
> > > > Agreed.
> > > >
> > > > I was recently looking at that spot because I found that niced tasks
> > > > were taking latency hits, and disabled it, which helped a bunch.
> > >
> > > Ok... as I said above to Ingo, nice means more latency too, and there is
> > > no doubt that if we disable nice as a working feature then the niced
> > > tasks will have less latency. Of course, this ends up meaning that
> > > un-niced tasks no longer receive their better cpu performance..  You're
> > > basically saying that you prefer nice not to work in the setting of HT.
> >
> > No I'm not, but let's go further in that direction just for the sake of
> > argument.  You're then saying that you prefer realtime priorities to not
> > work in the HT setting, given that realtime tasks don't participate in
> > the 'single stream me' program.
> 
> Where do I say that? I do not presume to manage realtime priorities in any 
> way. You're turning my argument about nice levels around and somehow saying 
> that because hyperthreading breaks the single stream me semantics by 
> parallelising them that I would want to stop that happening. Nowhere have I 
> argued that realtime semantics should be changed to somehow work around 
> hyperthreading. SMT nice is about managing nice only, and not realtime 
> priorities which are independent entities.

I see no real difference between the two assertions.  Nice is just a
mechanism to set priority, so I applied your assertion to a different
range of priorities than nice covers, and returned it to show that the
code contradicts itself.  It can't be bad for a nice 1 task to run with
a nice 0 task, but OK for a minimum RT task to run with a maximum RT
task.  Iff HT without corrective measures breaks nice, then it breaks
realtime priorities as well.

> > I'm saying only that we're defeating the purpose of HT, and overriding a
> > user decision every time we force a sibling idle.
> >
> > > > I also
> > > > can't understand why it would be OK to interleave a normal task with an
> > > > RT task sometimes, but not others.. that's meaningless to the RT task.
> > >
> > > Clearly there would be a reason that code is there... The whole problem
> > > with HT is that as soon as you load up a sibling, you slow down the
> > > logical sibling, hence why this code is there in the first place. Since I
> > > know you're one to test things for yourself, I will put it to you this
> > > way:
> > >
> > > Boot into UP. Time how long it takes to do a million of these in a real
> > > time task:
> > >  asm volatile("" : : : "memory");
> > >
> > > Then start up a SCHED_NORMAL task fully cpu bound such as "yes >
> > > /dev/null" and time that again. Obviously the former being a realtime
> > > task will take the same amount of time and the SCHED_NORMAL task will be
> > > starved until the realtime task finishes.
> >
> > Sure.
> >
> > > Now try the same experiment with hyperthreading enabled and an ordinary
> > > SMP kernel. You'll find the realtime task runs at only ~60% performance.
> >
> > So?  User asked for HT.  That's hardware multiplexing. It ain't free.
> > Buyer beware.
> 
> But the buyer is not aware. You are aware because you tinker, but the vast 
> majority of users who enable hyperthreading in their shiny pcs are not aware.

Then we need to make them aware of what they're enabling?
 
> The only thing they know is that if they enable hyperthreading their programs 
> run slower in multitasking environments no matter how much they nice the 
> other processes. Buyers do not buy hardware knowing that the internal design 
> breaks something as fundamental as 'nice'. You seem to presume that most 
> people who get hyperthreading are happy to compromise 'nice' in order to get 
> their second core working and I put it to you that they do not make that 
> decision.

To me it's pretty much black and white.  Either you want to split your
cpu into logical units, which means each has less to offer than the
total, or you want all your processing power in one bucket.

> > >  That's a
> > > serious performance hit for realtime tasks considering you're running a
> > > SCHED_NORMAL task. The SMT code that you seem to dislike fixes this
> > > problem.
> >
> > I don't think it does actually. Let your RT task sleep regularly, and
> > ever so briefly.  We don't evict lower priority tasks from siblings upon
> > wakeup, we only prevent entry... sometimes.
> 
> Well you know as well as I do that you're selecting out the exception rather 
> than the rule, and statistically overall, it does work.

I don't agree that it's the exception, and if you look at this HT thing
from the split cpu perspective, I'm not sure there's even a problem.

Scrolling down, I see that this is getting too long, and we aren't

[PATCH] Loop device - Tracking page writes made to a loop device through mmap

2007-02-28 Thread Kandan Venkataraman
This patch tracks writes made to a loop device through mmap.
 
A file_operations structure variable called loop_fops is initialised with the 
default block device file operations (def_blk_fops).
The mmap operation is overridden with a new function called loop_file_mmap. 
 
A vm_operations structure variable called loop_file_vm_ops is initialised with 
the default operations for a disk file.
The page_mkwrite operation in this variable is initialised to a new function 
called loop_track_pgwrites.
 
In the function lo_open, the file operations pointer of the device file is 
initialised with the address of loop_fops.
 
The function loop_file_mmap simply calls generic_file_mmap and then initialises 
the vm_ops of the vma with address of loop_file_vm_ops.
 
The function loop_track_pgwrites stores the page offset of the page that is 
being written to,  in a red-black tree within the loop device.
 
A flag lo_track_pgwrite has been added to the structs loop_device and 
loop_info64 to turn on/off tracking of page writes.
 
Two new ioctls have been added.
 
The ioctl cmd LOOP_GET_PGWRITES retrieves the page offsets of pages that have 
been written to.
The ioctl cmd LOOP_CLR_PGWRITES empties the red-black tree
 
This functionality would allow us to have a read only version and a write 
version of memory by doing the following:
Associate a normal file as backing storage for  the loop device and mmap to the 
loop device. Call this mmapped address space as area1.
Mmap to a normal file of identical size. Call this mmapped address space as 
area2.
 
Changes made to area1 can be periodically copied to area2 using the ioctl cmds 
(retrieve the dirty page offsets and copy the dirty pages from area1 to area2). 
This facility would provide a quick way of updating the read only version.
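
As a rough sketch of the copy step described above, assuming the offsets
returned by LOOP_GET_PGWRITES are page indices and the page size is 4096
bytes (both are assumptions, as is the helper name sync_dirty_pages), the
userspace side could look like this:

```c
#include <stddef.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096UL   /* assumed page size */

/*
 * Copies every listed dirty page from src (area1) to dst (area2).
 * pgoff[] holds page indices; entries beyond npages are skipped.
 * Returns the number of pages actually copied.
 */
static size_t sync_dirty_pages(unsigned char *dst, const unsigned char *src,
                               size_t npages,
                               const unsigned long *pgoff, size_t num)
{
    size_t i, copied = 0;

    for (i = 0; i < num; i++) {
        size_t byte_off;

        if (pgoff[i] >= npages)
            continue;               /* outside the mapping, ignore */
        byte_off = pgoff[i] * DEMO_PAGE_SIZE;
        memcpy(dst + byte_off, src + byte_off, DEMO_PAGE_SIZE);
        copied++;
    }
    return copied;
}
```

After a successful copy the caller would issue LOOP_CLR_PGWRITES so that the
next round only sees pages written since then.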

Please CC your reply to [EMAIL PROTECTED]
 
The following patch is against vanilla linux-2.6.19.2
 
Signed-off-by: Kandan Venkataraman [EMAIL PROTECTED]


diff -uprN linux-2.6.19.2/drivers/block/loop.c 
linux-2.6.19.2-new/drivers/block/loop.c
--- linux-2.6.19.2/drivers/block/loop.c 2007-01-11 06:10:37.0 +1100
+++ linux-2.6.19.2-new/drivers/block/loop.c 2007-02-27 17:23:18.0 
+1100
@@ -74,12 +74,16 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
 static int max_loop = 8;
 static struct loop_device *loop_dev;
 static struct gendisk **disks;
+static kmem_cache_t *pgoff_elem_cache;
+static char*  cache_name = "loop_pgoff_elem_cache";
+static struct file_operations loop_fops;
 
 /*
  * Transfer functions
@@ -646,6 +650,73 @@ static void do_loop_switch(struct loop_d
complete(&p->wait);
 }
 
+static void pgoff_tree_clear(struct rb_root *rb_root)
+{
+   struct rb_node *rb_node  = rb_root->rb_node;
+
+   while (rb_node != NULL) {
+
+   rb_erase(rb_node, rb_root); 
+   kmem_cache_free(pgoff_elem_cache, rb_entry(rb_node, struct 
pgoff_elem, node));
+   rb_node = rb_root->rb_node;
+   }
+
+  *rb_root = RB_ROOT;
+}
+
+
+static int loop_clr_pgwrites(struct loop_device *lo)
+{
+   struct file *filp = lo->lo_backing_file;
+
+   if (lo->lo_state != Lo_bound)
+   return -ENXIO;
+
+   if (filp == NULL)
+   return -EINVAL;
+
+   if (!lo->lo_track_pgwrite)
+ return 0;
+
+   pgoff_tree_clear(&lo->pgoff_tree);
+
+   return 0;
+}
+
+static int loop_get_pgwrites(struct loop_device *lo, struct loop_pgoff_array 
__user *arg)
+{
+   struct file *filp = lo->lo_backing_file;
+   struct loop_pgoff_array array;
+   loff_t i = 0;
+   struct rb_node *rb_node  = rb_first(&lo->pgoff_tree);
+
+   if (lo->lo_state != Lo_bound)
+   return -ENXIO;
+
+   if (filp == NULL)
+   return -EINVAL;
+
+   if (!lo->lo_track_pgwrite)
+ return 0;
+
+   if (copy_from_user(&array, arg, sizeof (struct loop_pgoff_array)))
+   return -EFAULT;
+
+   while (i < array.max && rb_node != NULL) {
+
+ if (put_user(rb_entry(rb_node, struct pgoff_elem, node)->offset, 
array.pgoff + i))
+return -EFAULT;
+
+ ++i;
+ rb_node = rb_next(rb_node);
+   }
+   array.num = i;
+
+   if (copy_to_user(arg, &array, sizeof(array)))
+ return -EFAULT;
+
+   return 0;
+}
 
 /*
  * loop_change_fd switched the backing store of a loopback device to
@@ -692,6 +763,8 @@ static int loop_change_fd(struct loop_de
if (get_loop_size(lo, file) != get_loop_size(lo, old_file))
goto out_putf;
 
+   pgoff_tree_clear(&lo->pgoff_tree);
+
/* and ... switch */
error = loop_switch(lo, file);
if (error)
@@ -799,6 +872,8 @@ static int loop_set_fd(struct loop_devic
lo->transfer = transfer_none;
lo->ioctl = NULL;
lo->lo_sizelimit = 0;
+   lo->lo_track_pgwrite = 0;
+   lo->pgoff_tree = RB_ROOT;
lo->old_gfp_mask = mapping_gfp_mask(mapping);
mapping_set_gfp_mask(mapping, lo->old_gfp_mask 

Re: [PATCH] input/spi: add ads7843 support to ads7846 touchscreen driver

2007-02-28 Thread Dmitry Torokhov
On Tuesday 20 February 2007 04:19, Nicolas Ferre wrote:
> Add support for the ads7843 touchscreen controller to the ads7846
> driver code.

Applied to the input tree, thank you.

-- 
Dmitry


Re: [Fastboot] [PATCH 1/1] - platform_kernel_launch_event is noop on generic kernel

2007-02-28 Thread Horms
On Wed, Feb 28, 2007 at 01:45:17PM -0600, John Keller wrote:
> Add a missing #define for the platform_kernel_launch_event.
> Without this fix, a call to platform_kernel_launch_event()
> becomes a noop on generic kernels. SN systems require this
> fix to successfully kdump/kexec from certain hardware errors.
>
> Signed-off-by: John Keller <[EMAIL PROTECTED]>

I made a similar change when porting to xen, but I hadn't thought
to see if mainline linux needs it too.

Acked-by: Simon Horman <[EMAIL PROTECTED]>

-- 
Horms
  H: http://www.vergenet.net/~horms/
  W: http://www.valinux.co.jp/en/



[PATCH -mm 4/5] Blackfin: patch add blackfin support in smc91x ethernet controller driver

2007-02-28 Thread Wu, Bryan
Hi folks,

As the SMC91X ethernet controller is used on the blackfin STAMP 533
development board, this patch adds blackfin support to the smc91x linux driver.

Its name is blackfin-driver-net-stamp533.patch.

[PATCH] Blackfin: patch add blackfin support in smc91x ethernet
controller driver

Signed-off-by: Bryan Wu <[EMAIL PROTECTED]> 
---

 drivers/net/Kconfig  |2 +-
 drivers/net/smc91x.h |   47 +++
 2 files changed, 48 insertions(+), 1 deletion(-)

Index: linux-2.6/drivers/net/Kconfig
===
--- linux-2.6.orig/drivers/net/Kconfig  2007-03-01 11:33:24.0 +0800
+++ linux-2.6/drivers/net/Kconfig   2007-03-01 11:39:14.0 +0800
@@ -822,7 +822,7 @@
tristate "SMC 91C9x/91C1xxx support"
select CRC32
select MII
-   depends on NET_ETHERNET && (ARM || REDWOOD_5 || REDWOOD_6 || M32R || 
SUPERH || SOC_AU1X00)
+   depends on NET_ETHERNET && (ARM || REDWOOD_5 || REDWOOD_6 || M32R || 
SUPERH || SOC_AU1X00 || BFIN)
help
  This is a driver for SMC's 91x series of Ethernet chipsets,
  including the SMC91C94 and the SMC91C111. Say Y if you want it
Index: linux-2.6/drivers/net/smc91x.h
===
--- linux-2.6.orig/drivers/net/smc91x.h 2007-03-01 11:33:18.0 +0800
+++ linux-2.6/drivers/net/smc91x.h  2007-03-01 11:39:14.0 +0800
@@ -55,6 +55,53 @@
 #define SMC_insw(a, r, p, l)   readsw((a) + (r), p, l)
 #define SMC_outsw(a, r, p, l)  writesw((a) + (r), p, l)
 
+#elif defined(CONFIG_BFIN)
+
+#define SMC_IRQ_FLAGS  IRQF_TRIGGER_HIGH   
+
+# if defined (CONFIG_BFIN561_EZKIT)
+#define SMC_CAN_USE_8BIT   0
+#define SMC_CAN_USE_16BIT  1
+#define SMC_CAN_USE_32BIT  1
+#define SMC_IO_SHIFT   0
+#define SMC_NOWAIT 1
+#define SMC_USE_BFIN_DMA   0
+
+
+#define SMC_inw(a, r)  readw((a) + (r))
+#define SMC_outw(v, a, r)  writew(v, (a) + (r))
+#define SMC_inl(a, r)  readl((a) + (r))
+#define SMC_outl(v, a, r)  writel(v, (a) + (r))
+#define SMC_outsl(a, r, p, l)  outsl((unsigned long *)((a) + (r)), p, l)
+#define SMC_insl(a, r, p, l)   insl ((unsigned long *)((a) + (r)), p, l)
+# else
+#define SMC_CAN_USE_8BIT   0
+#define SMC_CAN_USE_16BIT  1
+#define SMC_CAN_USE_32BIT  0
+#define SMC_IO_SHIFT   0
+#define SMC_NOWAIT 1
+#define SMC_USE_BFIN_DMA   0
+
+
+#define SMC_inw(a, r)  readw((a) + (r))
+#define SMC_outw(v, a, r)  writew(v, (a) + (r))
+#define SMC_outsw(a, r, p, l)  outsw((unsigned long *)((a) + (r)), p, l)
+#define SMC_insw(a, r, p, l)   insw ((unsigned long *)((a) + (r)), p, l)
+# endif
+/* check if the mac in reg is valid */
+#define SMC_GET_MAC_ADDR(addr) \
+   do {\
+   unsigned int __v;   \
+   __v = SMC_inw( ioaddr, ADDR0_REG ); \
+   addr[0] = __v; addr[1] = __v >> 8;  \
+   __v = SMC_inw( ioaddr, ADDR1_REG ); \
+   addr[2] = __v; addr[3] = __v >> 8;  \
+   __v = SMC_inw( ioaddr, ADDR2_REG ); \
+   addr[4] = __v; addr[5] = __v >> 8;  \
+   if (*(u32 *)([0]) == 0x) { \
+   random_ether_addr(addr);\
+   }   \
+   } while (0)
 #elif defined(CONFIG_REDWOOD_5) || defined(CONFIG_REDWOOD_6)
 
 /* We can only do 16-bit reads and writes in the static memory space. */
---

Thanks,
-Bryan


[PATCH -mm 5/5] Blackfin: on-chip RTC controller driver

2007-02-28 Thread Wu, Bryan
Hi folks,

Here is the blackfin on-chip RTC controller driver for Linux.

Its name is blackfin-driver-rtc.patch

[PATCH] Blackfin: on-chip RTC controller driver

This patch implements the driver necessary to use the Analog Devices
Blackfin processor's on-chip RTC controller.

Signed-off-by: Bryan Wu <[EMAIL PROTECTED]> 
---

 Kconfig|   10 +
 Makefile   |1
 rtc-bfin.c |  445 +
 3 files changed, 456 insertions(+)

Index: linux-2.6/drivers/rtc/Kconfig
===
--- linux-2.6.orig/drivers/rtc/Kconfig  2007-03-01 11:33:07.0 +0800
+++ linux-2.6/drivers/rtc/Kconfig   2007-03-01 11:40:17.0 +0800
@@ -352,4 +352,14 @@
  This driver can also be built as a module. If so, the module
  will be called rtc-v3020.
 
+config RTC_DRV_BFIN
+   tristate "Blackfin On-Chip RTC"
+   depends on RTC_CLASS && BFIN
+   help
+ If you say yes here you will get support for the
+ Blackfin On-Chip Real Time Clock.
+
+ This driver can also be built as a module. If so, the module
+ will be called rtc-bfin.
+
 endmenu
Index: linux-2.6/drivers/rtc/Makefile
===
--- linux-2.6.orig/drivers/rtc/Makefile 2007-03-01 11:33:07.0 +0800
+++ linux-2.6/drivers/rtc/Makefile  2007-03-01 11:40:17.0 +0800
@@ -38,3 +38,4 @@
 obj-$(CONFIG_RTC_DRV_V3020)+= rtc-v3020.o
 obj-$(CONFIG_RTC_DRV_AT91RM9200)+= rtc-at91rm9200.o
 obj-$(CONFIG_RTC_DRV_SH)   += rtc-sh.o
+obj-$(CONFIG_RTC_DRV_BFIN) += rtc-bfin.o
Index: linux-2.6/drivers/rtc/rtc-bfin.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6/drivers/rtc/rtc-bfin.c2007-03-01 11:40:17.0 +0800
@@ -0,0 +1,445 @@
+/*
+ * Blackfin On-Chip Real Time Clock Driver
+ *  Supports BF531/BF532/BF533/BF534/BF536/BF537
+ *
+ * Copyright 2004-2007 Analog Devices Inc.
+ *
+ * Enter bugs at http://blackfin.uclinux.org/
+ *
+ * Licensed under the GPL-2 or later.
+ */
+
+/* The biggest issue we deal with in this driver is that register writes are
+ * synced to the RTC frequency of 1Hz.  So if you write to a register and
+ * attempt to write again before the first write has completed, the new write
+ * is simply discarded.  This can easily be troublesome if userspace disables
+ * one event (say periodic) and then right after enables an event (say alarm).
+ * Since all events are maintained in the same interrupt mask register, if
+ * we wrote to it to disable the first event and then wrote to it again to
+ * enable the second event, that second event would not be enabled as the
+ * write would be discarded and things quickly fall apart.
+ *
+ * To keep this delay from significantly degrading performance (we, in theory,
+ * would have to sleep for up to 1 second everytime we wanted to write a
+ * register), we only check the write pending status before we start to issue
+ * a new write.  We bank on the idea that it doesnt matter when the sync
+ * happens so long as we don't attempt another write before it does.  The only
+ * time userspace would take this penalty is when they try and do multiple
+ * operations right after another ... but in this case, they need to take the
+ * sync penalty, so we should be OK.
+ *
+ * Also note that the RTC_ISTAT register does not suffer this penalty; its
+ * writes to clear status registers complete immediately.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#define stamp(fmt, args...) pr_debug("%s:%i: " fmt "\n", __FUNCTION__, 
__LINE__, ## args)
+#define stampit() stamp("here i am")
+
+struct bfin_rtc {
+   struct rtc_device *rtc_dev;
+   struct rtc_time rtc_alarm;
+   spinlock_t lock;
+};
+
+/* Bit values for the ISTAT / ICTL registers */
+#define RTC_ISTAT_WRITE_COMPLETE  0x8000
+#define RTC_ISTAT_WRITE_PENDING   0x4000
+#define RTC_ISTAT_ALARM_DAY       0x0040
+#define RTC_ISTAT_24HR            0x0020
+#define RTC_ISTAT_HOUR            0x0010
+#define RTC_ISTAT_MIN             0x0008
+#define RTC_ISTAT_SEC             0x0004
+#define RTC_ISTAT_ALARM           0x0002
+#define RTC_ISTAT_STOPWATCH       0x0001
+
+/* Shift values for RTC_STAT register */
+#define DAY_BITS_OFF    17
+#define HOUR_BITS_OFF   12
+#define MIN_BITS_OFF    6
+#define SEC_BITS_OFF    0
+
+/* Some helper functions to convert between the common RTC notion of time
+ * and the internal Blackfin notion that is stored in 32bits.
+ */
+static inline u32 rtc_time_to_bfin(unsigned long now)
+{
+   u32 sec  = (now % 60);
+   u32 min  = (now % (60 * 60)) / 60;
+   u32 hour = (now % (60 * 60 * 24)) / (60 * 60);
+   u32 days = (now / (60 * 60 * 24));
+   return (sec  << SEC_BITS_OFF) +
+  (min  << MIN_BITS_OFF) +
+  
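
The archived copy of the patch breaks off inside rtc_time_to_bfin(). As a
minimal sketch of the packing it appears to implement, assuming the remaining
terms follow the same shift-and-add pattern, the conversion and its inverse
could look like the following. The demo_ function names and the field masks
(6, 6 and 5 bits, inferred from the shift offsets) are assumptions, not taken
from the driver.

```c
#include <stdint.h>

/* Shift values as given in the patch. */
#define DAY_BITS_OFF    17
#define HOUR_BITS_OFF   12
#define MIN_BITS_OFF    6
#define SEC_BITS_OFF    0

/* Seconds since the epoch -> packed day/hour/min/sec word. */
static inline uint32_t demo_time_to_bfin(unsigned long now)
{
    uint32_t sec  = (now % 60);
    uint32_t min  = (now % (60 * 60)) / 60;
    uint32_t hour = (now % (60 * 60 * 24)) / (60 * 60);
    uint32_t days = (now / (60 * 60 * 24));

    return (sec  << SEC_BITS_OFF) +
           (min  << MIN_BITS_OFF) +
           (hour << HOUR_BITS_OFF) +
           (days << DAY_BITS_OFF);
}

/* Packed word -> seconds; masks are inferred from the gaps between shifts. */
static inline unsigned long demo_bfin_to_time(uint32_t v)
{
    unsigned long sec  = (v >> SEC_BITS_OFF)  & 0x3F;
    unsigned long min  = (v >> MIN_BITS_OFF)  & 0x3F;
    unsigned long hour = (v >> HOUR_BITS_OFF) & 0x1F;
    unsigned long days = (v >> DAY_BITS_OFF);

    return sec + min * 60 + hour * (60 * 60) + days * (60 * 60 * 24);
}
```

For example, 3 days, 5 hours, 7 minutes and 9 seconds (277629 seconds) packs
to 414153 and unpacks back to the same value.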

[PATCH -mm 3/5] Blackfin: on-chip ethernet MAC controller driver

2007-02-28 Thread Wu, Bryan
Hi folks,

Here is the blackfin on-chip ethernet MAC controller driver for Linux.

Its name is blackfin-driver-net-stamp537.patch

[PATCH] Blackfin: on-chip ethernet MAC controller driver

This patch implements the driver necessary to use the Analog Devices
Blackfin processor's on-chip ethernet MAC controller.

Signed-off-by: Bryan Wu <[EMAIL PROTECTED]> 
---

 drivers/net/Kconfig|   44 ++
 drivers/net/Makefile   |1
 drivers/net/bfin_mac.c |  988 
+
 drivers/net/bfin_mac.h |  146 +
 4 files changed, 1179 insertions(+)

Index: linux-2.6/drivers/net/Kconfig
===
--- linux-2.6.orig/drivers/net/Kconfig  2007-03-01 11:39:14.0 +0800
+++ linux-2.6/drivers/net/Kconfig   2007-03-01 11:39:19.0 +0800
@@ -836,6 +836,50 @@
  module, say M here and read  as well
  as .
 
+config BFIN_MAC
+   tristate "Blackfin 536/537 on-chip mac support"
+   depends on NET_ETHERNET && (BF537 || BF536) && (!BF537_PORT_H)
+   select CRC32
+   select BFIN_MAC_USE_L1 if DMA_UNCACHED_NONE
+   help
+ This is the driver for blackfin on-chip mac device. Say Y if you want 
it
+ compiled into the kernel. This driver is also available as a module
+ ( = code which can be inserted in and removed from the running kernel
+ whenever you want). The module will be called bfin_mac.
+
+config BFIN_MAC_USE_L1
+   bool "Use L1 memory for rx/tx packets"
+   depends on BFIN_MAC && BF537
+   default y
+   help
+ To get maximum network performace, you should use L1 memory as rx/tx 
buffers.
+ Say N here if you want to reserve L1 memory for other uses.
+
+config BFIN_TX_DESC_NUM
+   int "Number of transmit buffer packets"
+   depends on BFIN_MAC
+   range 6 10 if BFIN_MAC_USE_L1
+   range 10 100
+   default "10"
+   help
+ Set the number of buffer packets used in driver.
+
+config BFIN_RX_DESC_NUM
+   int "Number of receive buffer packets"
+   depends on BFIN_MAC
+   range 20 100 if BFIN_MAC_USE_L1
+   range 20 800
+   default "20"
+   help
+ Set the number of buffer packets used in driver.
+
+config BFIN_MAC_RMII
+   bool "RMII PHY Interface (EXPERIMENTAL)"
+   depends on BFIN_MAC && EXPERIMENTAL
+   default n
+   help
+ Use Reduced PHY MII Interface
+
 config SMC9194
tristate "SMC 9194 support"
depends on NET_VENDOR_SMC && (ISA || MAC && BROKEN)
Index: linux-2.6/drivers/net/Makefile
===
--- linux-2.6.orig/drivers/net/Makefile 2007-03-01 11:33:24.0 +0800
+++ linux-2.6/drivers/net/Makefile  2007-03-01 11:39:19.0 +0800
@@ -195,6 +195,7 @@
 obj-$(CONFIG_MYRI10GE) += myri10ge/
 obj-$(CONFIG_SMC91X) += smc91x.o
 obj-$(CONFIG_SMC911X) += smc911x.o
+obj-$(CONFIG_BFIN_MAC) += bfin_mac.o
 obj-$(CONFIG_DM9000) += dm9000.o
 obj-$(CONFIG_FEC_8XX) += fec_8xx/
 obj-$(CONFIG_PASEMI_MAC) += pasemi_mac.o
Index: linux-2.6/drivers/net/bfin_mac.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6/drivers/net/bfin_mac.c2007-03-01 11:39:19.0 +0800
@@ -0,0 +1,988 @@
+/*
+ * File: drivers/net/bfin_mac.c
+ * Based on:
+ * Author:   Luke Yang <[EMAIL PROTECTED]>
+ *
+ * Created:
+ * Description:
+ *
+ * Rev:  $Id: bfin_mac.c,v 1.60 2006/12/16 11:23:56 hennerich Exp $
+ *
+ * Modified:
+ *   Copyright 2004-2006 Analog Devices Inc.
+ *
+ * Bugs: Enter bugs at http://blackfin.uclinux.org/
+ *
+ * This program is free software ;  you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation ;  either version 2, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY ;  without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program ;  see the file COPYING.
+ * If not, write to the Free Software Foundation,
+ * 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#include "bfin_mac.h"
+
+#define CARDNAME "bfin_mac"
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Luke Yang");
+MODULE_DESCRIPTION("Blackfin MAC Driver");
+
+#if 

[PATCH -mm 2/5] Blackfin: on-chip serial driver update

2007-02-28 Thread Wu, Bryan
Hi folks,

Here is the updated version of the Blackfin serial driver in the -mm tree.
It fixes some bugs; please rename the patch file to
blackfin-driver-serial-core.patch.

[PATCH] Blackfin: serial driver for Blackfin architecture

This patch implements the driver necessary to use the Analog Devices Blackfin
processor's Serial Port.

Signed-off-by: Bryan Wu <[EMAIL PROTECTED]>
Cc: Alan Cox <[EMAIL PROTECTED]>
Cc: Russell King <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 drivers/serial/Kconfig  |   94 
 drivers/serial/Makefile |1
 drivers/serial/bfin_5xx.c   | 1012 
 include/linux/serial_core.h |3
 4 files changed, 1110 insertions(+)

Index: linux-2.6/drivers/serial/Kconfig
===
--- linux-2.6.orig/drivers/serial/Kconfig   2007-03-01 17:20:57.0 +0800
+++ linux-2.6/drivers/serial/Kconfig2007-03-01 17:21:12.0 +0800
@@ -509,6 +509,100 @@
  your boot loader (lilo or loadlin) about how to pass options to the
  kernel at boot time.)
 
+config SERIAL_BFIN
+   tristate "Blackfin serial port support (EXPERIMENTAL)"
+   depends on BFIN && EXPERIMENTAL
+   select SERIAL_CORE
+   select SERIAL_BFIN_UART0 if (BF531 || BF532 || BF533 || BF561)
+   help
+ Add support for the built-in UARTs on the Blackfin.
+
+ To compile this driver as a module, choose M here: the
+ module will be called bfin_5xx.
+
+config SERIAL_BFIN_CONSOLE
+   bool "Console on Blackfin serial port"
+   depends on SERIAL_BFIN
+   select SERIAL_CORE_CONSOLE
+
+choice
+   prompt  "Blackfin UART Mode"
+   depends on SERIAL_BFIN
+   default SERIAL_BFIN_DMA
+   help
+ This driver supports the built-in serial ports of the Blackfin family
+ of CPUs
+
+config SERIAL_BFIN_DMA
+   bool "Blackfin UART DMA mode"
+   depends on DMA_UNCACHED_1M
+   help
+ This driver works under DMA mode. If this option is selected, the
+ blackfin simple dma driver is also enabled.
+
+config SERIAL_BFIN_PIO
+   bool "Blackfin UART PIO mode"
+   help
+ This driver works under PIO mode.
+
+endchoice
+
+config SERIAL_BFIN_UART0
+   bool "Enable UART0"
+   depends on SERIAL_BFIN
+   help
+ Enable UART0
+
+config BFIN_UART0_CTSRTS
+   bool "Enable UART0 hardware flow control"
+   depends on SERIAL_BFIN_UART0
+   help
+ Enable hardware flow control in the driver, using GPIO to emulate the
+ CTS/RTS signal.
+
+config UART0_CTS_PIN
+   int "UART0 CTS pin"
+   depends on BFIN_UART0_CTSRTS
+   default 23
+   help
+ The default pin is GPIO_GP7.
+ Refer to ./include/asm-blackfin/gpio.h to see the GPIO map.
+
+config UART0_RTS_PIN
+   int "UART0 RTS pin"
+   depends on BFIN_UART0_CTSRTS
+   default 22
+   help
+ The default pin is GPIO_GP6.
+ Refer to ./include/asm-blackfin/gpio.h to see the GPIO map.
+
+config SERIAL_BFIN_UART1
+   bool "Enable UART1"
+   depends on SERIAL_BFIN && BF537
+   help
+ Enable UART1
+
+config BFIN_UART1_CTSRTS
+   bool "Enable UART1 hardware flow control"
+   depends on SERIAL_BFIN_UART1
+   help
+ Enable hardware flow control in the driver, using GPIO to emulate the
+ CTS/RTS signal.
+
+config UART1_CTS_PIN
+   int "UART1 CTS pin"
+   depends on BFIN_UART1_CTSRTS
+   default -1
+   help
+ Refer to ./include/asm-blackfin/gpio.h to see the GPIO map.
+
+config UART1_RTS_PIN
+   int "UART1 RTS pin"
+   depends on BFIN_UART1_CTSRTS
+   default -1
+   help
+ Refer to ./include/asm-blackfin/gpio.h to see the GPIO map.
+
 config SERIAL_IMX
bool "IMX serial port support"
depends on ARM && ARCH_IMX
Index: linux-2.6/drivers/serial/Makefile
===
--- linux-2.6.orig/drivers/serial/Makefile  2007-03-01 17:20:21.0 +0800
+++ linux-2.6/drivers/serial/Makefile   2007-03-01 17:21:12.0 +0800
@@ -27,6 +27,7 @@
 obj-$(CONFIG_SERIAL_PXA) += pxa.o
 obj-$(CONFIG_SERIAL_PNX8XXX) += pnx8xxx_uart.o
 obj-$(CONFIG_SERIAL_SA1100) += sa1100.o
+obj-$(CONFIG_SERIAL_BFIN) += bfin_5xx.o
 obj-$(CONFIG_SERIAL_S3C2410) += s3c2410.o
 obj-$(CONFIG_SERIAL_SUNCORE) += suncore.o
 obj-$(CONFIG_SERIAL_SUNHV) += sunhv.o
Index: linux-2.6/drivers/serial/bfin_5xx.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6/drivers/serial/bfin_5xx.c 2007-03-01 17:22:47.0 +0800
@@ -0,0 +1,1012 @@
+/*
+ * File: drivers/serial/bfin_5xx.c
+ * Based on: Based on drivers/serial/sa1100.c
+ * Author:   Aubrey Li <[EMAIL PROTECTED]>
+ *
+ * Created:
+ * Description:  Driver for blackfin 5xx serial ports
+ *
+ * Rev:  

[PATCH -mm 1/5] Blackfin: blackfin architecture patch update

2007-02-28 Thread Wu, Bryan
Hi folks,

Here is the updated version of blackfin-arch.patch in the -mm tree.
It simply adds utrace support and was tested on the Blackfin STAMP board,
as were the other patches that follow.

The whole patch is located at URL:
https://blackfin.uclinux.org/gf/download/frsrelease/39/2583/blackfin-arch.patch
The incremental patch is located at URL:
https://blackfin.uclinux.org/gf/download/frsrelease/39/2584/blackfin-arch-mm2-update.patch

[PATCH] Blackfin Architecture

This adds support for the Analog Devices Blackfin processor
architecture, and currently supports the BF533, BF532, BF531, BF537,
BF536, BF534, and BF561 (Dual Core) devices, with a variety of
development platforms including those available from Analog Devices
(BF533-EZKit, BF533-STAMP, BF537-STAMP, BF561-EZKIT), and Bluetechnix
Tinyboards.

The Blackfin architecture was jointly developed by Intel and Analog
Devices Inc. (ADI) as the Micro Signal Architecture (MSA) core and
introduced in December of 2000. Since then ADI has put this core into
its Blackfin processor family of devices. The Blackfin core has the
advantages of a clean, orthogonal, RISC-like microprocessor instruction
set. It combines a dual-MAC (Multiply/Accumulate), state-of-the-art
signal processing engine and single-instruction, multiple-data (SIMD)
multimedia capabilities into a single instruction-set architecture.

The Blackfin architecture, including the instruction set, is described
by the ADSP-BF53x/BF56x Blackfin® Processor  Programming Reference
http://blackfin.uclinux.org/gf/download/frsrelease/29/2549/Blackfin_PRM.pdf

The Blackfin processor is already supported by major releases of gcc,
and there are binary and source rpms/tarballs for many architectures at:
http://blackfin.uclinux.org/gf/project/toolchain/frs
There is complete documentation, including "getting started" guides
available at:
http://docs.blackfin.uclinux.org/
which provides links to the sources and patches you will need in order
to set up a cross-compiling environment for bfin-linux-uclibc

This patch, as well as the other patches (toolchain, distribution,
uClibc) are actively supported by Analog Devices Inc, at:
http://blackfin.uclinux.org/

We have tested this on LTP, and our test plan (including pass/fails) can
be found at:
http://docs.blackfin.uclinux.org/doku.php?id=testing_the_linux_kernel

Signed-off-by: Bryan Wu <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
--- 

 arch/blackfin/Kconfig  |  989 
 arch/blackfin/Makefile |   81 
 arch/blackfin/boot/Makefile|   27 
 arch/blackfin/defconfig| 1314 +++
 arch/blackfin/kernel/Makefile  |   13 
 arch/blackfin/kernel/asm-offsets.c |  140 +
 arch/blackfin/kernel/bfin_dma_5xx.c|  747 ++
 arch/blackfin/kernel/bfin_gpio.c   |  654 +
 arch/blackfin/kernel/bfin_ksyms.c  |  119 
 arch/blackfin/kernel/dma-mapping.c |  174 +
 arch/blackfin/kernel/dualcore_test.c   |   51 
 arch/blackfin/kernel/entry.S   |   96 
 arch/blackfin/kernel/init_task.c   |   63 
 arch/blackfin/kernel/irqchip.c |  149 +
 arch/blackfin/kernel/module.c  |  431 +++
 arch/blackfin/kernel/process.c |  398 +++
 arch/blackfin/kernel/ptrace.c  |  421 +++
 arch/blackfin/kernel/setup.c   |  921 +++
 arch/blackfin/kernel/signal.c  |  436 +++
 arch/blackfin/kernel/sys_bfin.c|  133 +
 arch/blackfin/kernel/time.c|  330 ++
 arch/blackfin/kernel/traps.c   |  666 +
 arch/blackfin/kernel/vmlinux.lds.S |  221 +
 arch/blackfin/lib/Makefile |   11 
 arch/blackfin/lib/ashldi3.c|   60 
 arch/blackfin/lib/ashrdi3.c|   61 
 arch/blackfin/lib/checksum.c   |  142 +
 arch/blackfin/lib/divsi3.S |  217 +
 arch/blackfin/lib/gcclib.h |   49 
 arch/blackfin/lib/ins.S|   71 
 arch/blackfin/lib/lshrdi3.c|   74 
 arch/blackfin/lib/memchr.S |   65 
 arch/blackfin/lib/memcmp.S |  110 
 arch/blackfin/lib/memcpy.S |  135 +
 arch/blackfin/lib/memmove.S|  103 
 arch/blackfin/lib/memset.S |  109 
 arch/blackfin/lib/modsi3.S |   81 
 arch/blackfin/lib/muldi3.c |  101 
 arch/blackfin/lib/outs.S   |   64 
 arch/blackfin/lib/smulsi3_highpart.S   |   30 
 arch/blackfin/lib/strcmp.c |   11 
 arch/blackfin/lib/strcpy.c   

[PATCH] Fix soft lockup with iSeries viocd driver

2007-02-28 Thread Tony Breeds
From: Tony Breeds <[EMAIL PROTECTED]>

Fix soft lockup with iSeries viocd driver, caused by eventually calling
end_that_request_first() with nr_bytes 0.

The lockup is triggered by hald, interrogating the device.

Signed-off-by: Tony Breeds <[EMAIL PROTECTED]>
Signed-off-by: Jens Axboe <[EMAIL PROTECTED]>

---
 viocd.c |   27 +++
 1 file changed, 23 insertions(+), 4 deletions(-)

Index: linux-2.6.20-rc5/drivers/cdrom/viocd.c
===
--- linux-2.6.20-rc5.orig/drivers/cdrom/viocd.c
+++ linux-2.6.20-rc5/drivers/cdrom/viocd.c
@@ -376,6 +376,25 @@ static int send_request(struct request *
return 0;
 }
 
+static void viocd_end_request(struct request *req, int uptodate)
+{
+   int nsectors = req->hard_nr_sectors;
+
+   /*
+* Make sure it's fully ended, and ensure that we process
+* at least one sector.
+*/
+   if (blk_pc_request(req))
+   nsectors = (req->data_len + 511) >> 9;
+   if (!nsectors)
+   nsectors = 1;
+
+   if (end_that_request_first(req, uptodate, nsectors))
+   BUG();
+   add_disk_randomness(req->rq_disk);
+   blkdev_dequeue_request(req);
+   end_that_request_last(req, uptodate);
+}
 
 static int rwreq;
 
@@ -385,11 +404,11 @@ static void do_viocd_request(request_que
 
while ((rwreq == 0) && ((req = elv_next_request(q)) != NULL)) {
if (!blk_fs_request(req))
-   end_request(req, 0);
+   viocd_end_request(req, 0);
else if (send_request(req) < 0) {
printk(VIOCD_KERN_WARNING
"unable to send message to OS/400!");
-   end_request(req, 0);
+   viocd_end_request(req, 0);
} else
rwreq++;
}
@@ -601,9 +620,9 @@ return_complete:
"with rc %d:0x%04X: %s\n",
req, event->xRc,
bevent->sub_result, err->msg);
-   end_request(req, 0);
+   viocd_end_request(req, 0);
} else
-   end_request(req, 1);
+   viocd_end_request(req, 1);
 
/* restart handling of incoming requests */
spin_unlock_irqrestore(_reqlock, flags);
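
For readers following the nsectors computation in viocd_end_request(), here
is a stand-alone sketch of the same rounding. It is not part of the patch: the
helper name sectors_to_end and the SECTOR_* macros are made up for
illustration, and only the arithmetic is taken from the diff above.

```c
#include <stdint.h>

/* 512-byte sectors: the ">> 9" in the patch is a division by 512. */
#define SECTOR_SHIFT 9
#define SECTOR_SIZE  (1u << SECTOR_SHIFT)

/*
 * Hypothetical helper: round a byte count up to whole sectors and never
 * return zero, so that a zero-length request still completes one sector.
 * Assumes data_len is small enough that adding 511 does not overflow.
 */
static unsigned int sectors_to_end(uint32_t data_len)
{
	unsigned int nsectors = (data_len + SECTOR_SIZE - 1) >> SECTOR_SHIFT;

	if (!nsectors)
		nsectors = 1;
	return nsectors;
}
```

A data_len of 0 is the case described in the changelog: without the clamp the
count would be zero, which is what the patch avoids passing on.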





Yours Tony

  linux.conf.au    http://linux.conf.au/ || http://lca2008.linux.org.au/
  Jan 28 - Feb 02 2008 The Australian Linux Technical Conference!




-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.21-rc1: known regressions (v2) (part 1)

2007-02-28 Thread Jeff Chua

On 3/1/07, Michael S. Tsirkin <[EMAIL PROTECTED]> wrote:


> with 2.6.20, pressing Fn/F4 generates an ACPI event and triggers suspend to RAM.
>
> On 2.6.21-rc2, after resume (when the box is accessible from network),
> pressing Fn/F4 again does not seem to have any effect.


I have the same problem on my IBM X60s with rc1 and rc2: it can't resume
from RAM and can't suspend to disk. Is it possible to revert all the
ACPI changes and test again?

Jeff.


Re: [patch 02/12] syslets: add syslet.h include file, user API/ABI definitions

2007-02-28 Thread Kevin O'Connor
On Wed, Feb 28, 2007 at 10:41:17PM +0100, Ingo Molnar wrote:
> From: Ingo Molnar <[EMAIL PROTECTED]>
> 
> add include/linux/syslet.h which contains the user-space API/ABI
> declarations. Add the new header to include/linux/Kbuild as well.

Hi Ingo,

I'd like to propose a simpler userspace API for syslets.  I believe
this revised API is just as capable as yours (anything done purely in
kernel space with the existing API can also be done with this one).

An "atom" would look like:

struct syslet_uatom {
u32 nr;
u64 ret_ptr;
u64 next;
u64 arg_nr;
u64 args[6];
};

The nr, ret_ptr, and next fields would be unchanged.  The args
array would directly store the arguments to the system call.  To
optimize the case where only a few arguments are necessary, an
explicit argument count would be set in the arg_nr field.

The above is very similar to what Linus and Davide described as a
"single submission" syslet interface - it differs only with the
addition of the next parameter.  As with your API, a null next field
would immediately stop the syslet.  So, a user wishing to run a single
system call asynchronously could use the above interface with the next
field set to null.

Of course, the above lacks the syscall return testing capabilities in
your atoms.  To obtain that capability, one could add a new syscall:

long sys_syslet_helper(long flags, long *ptr, long inc, u64 new_next)

The above is effectively a combination of sys_umem_add and the "flags"
field from the existing syslet_uatom.  The system call would only be
available from syslets.  It would add "inc" to the integer stored in
"ptr" and return the result.  The "flags" field could optionally
contain one of:
 SYSLET_BRANCH_ON_NONZERO
 SYSLET_BRANCH_ON_ZERO
 SYSLET_BRANCH_ON_NEGATIVE
 SYSLET_BRANCH_ON_NON_POSITIVE
If the flag were set and the return result of the syscall met the
specified condition, then the code would arrange for the calling
syslet to branch to "new_next" instead of the normal "next".
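
To make the intended semantics concrete, here is a small user-space model of
the helper. This is only a sketch: the flag values are placeholders, the name
syslet_helper_model is made up, the branch decision is reported through an
extra out-parameter instead of redirecting a real syslet, and writing the sum
back to *ptr is my assumption based on the sys_umem_add comparison.

```c
/* Placeholder values; no numbering is proposed for the real flags. */
enum {
	SYSLET_BRANCH_ON_NONZERO      = 1,
	SYSLET_BRANCH_ON_ZERO         = 2,
	SYSLET_BRANCH_ON_NEGATIVE     = 3,
	SYSLET_BRANCH_ON_NON_POSITIVE = 4,
};

/*
 * Model of sys_syslet_helper(): add inc to *ptr, store and return the sum,
 * and set *take_branch when the condition named by flags holds for the sum.
 * A flags value of 0 never branches.
 */
static long syslet_helper_model(long flags, long *ptr, long inc,
				int *take_branch)
{
	long result = *ptr + inc;

	*ptr = result;	/* assumption: the sum is written back, as umem_add does */
	switch (flags) {
	case SYSLET_BRANCH_ON_NONZERO:
		*take_branch = (result != 0);
		break;
	case SYSLET_BRANCH_ON_ZERO:
		*take_branch = (result == 0);
		break;
	case SYSLET_BRANCH_ON_NEGATIVE:
		*take_branch = (result < 0);
		break;
	case SYSLET_BRANCH_ON_NON_POSITIVE:
		*take_branch = (result <= 0);
		break;
	default:
		*take_branch = 0;
		break;
	}
	return result;
}
```

With inc set to 0 the helper becomes a pure test of the value at ptr; with a
non-zero inc the same call could serve as a loop counter.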

I would also change the event ring notification system.  Instead of
building that support into all syslets, one could introduce an "add to
head" syscall specifically for that purpose.  If done this way,
userspace could arrange for this new sys_addtoring call to always be
the last uatom executed in a syslet.  This would make the support
optional - those userspace applications that prefer to use a futex or
signal as an event system could arrange to have those system calls as
the last one in the chain instead.  With this change, the
sys_async_exec would simplify to:

long sys_async_exec(struct syslet_uatom *uatom);

As above, I believe this API has as much power as the existing system.
The general idea is to make the system call / atoms simpler and use
more atoms when building complex chains.

For example, the open & stat case could be done with a chain like the
following:

atom1: &atom3->args[1] = sys_open(...)
atom2: sys_syslet_helper(SYSLET_BRANCH_ON_NON_POSITIVE,
                         &atom3->args[1], 0, atom4)
atom3: sys_stat([arg1 filled above], ...)
atom4: sys_futex(...)   // alert parent of completion

It is also possible to use sys_syslet_helper to push a return value to
multiple syslet parameters (for example, propagating an fd from open
to multiple reads).  For example:

atom1: &atom3->args[1] = sys_open(...)
atom2: &atom4->args[1] = sys_syslet_helper(0, &atom3->args[1], 0, 0)
atom3: sys_read([arg1 filled in atom1], ...)
atom4: sys_read([arg1 filled in atom2], ...)
...

Although this is a bit ugly, I must wonder how many times one would
build chains complex enough to require it.
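
As a rough illustration of how the open & stat chain would be walked, the
following user-space model may help. Everything in it is hypothetical: the
struct layout, the fake_* stand-ins for the syscalls, and run_chain() are
invented for the sketch, and the helper is hard-wired to the
BRANCH_ON_NON_POSITIVE case. It assumes atom1 stores its result into
atom3's args[1], which is how I read the chain above.

```c
#include <stddef.h>
#include <stdint.h>

struct atom;
typedef intptr_t (*atom_fn)(struct atom *self);

/* Model-only atom: fn stands in for the syscall number. */
struct atom {
	atom_fn fn;
	intptr_t *ret_ptr;	/* where the return value is stored, or NULL */
	struct atom *next;	/* NULL ends the chain */
	struct atom *jump;	/* set by the helper to override next */
	intptr_t args[6];
};

static int stat_calls;		/* how many times fake_stat() ran */
static intptr_t stat_fd;	/* value fake_stat() last saw in args[1] */

/* Stand-ins for sys_open, sys_stat and sys_futex; they do no real I/O. */
static intptr_t fake_open(struct atom *self) { return self->args[0]; }
static intptr_t fake_stat(struct atom *self)
{
	stat_calls++;
	stat_fd = self->args[1];
	return 0;
}
static intptr_t fake_notify(struct atom *self) { (void)self; return 0; }

/* Helper model: args[1] = ptr, args[2] = inc, args[3] = new_next. */
static intptr_t helper_non_positive(struct atom *self)
{
	intptr_t *ptr = (intptr_t *)self->args[1];
	intptr_t result = *ptr + self->args[2];

	*ptr = result;
	if (result <= 0)
		self->jump = (struct atom *)self->args[3];
	return result;
}

/* Execute atoms until next (or the helper's branch target) is NULL. */
static void run_chain(struct atom *a)
{
	while (a != NULL) {
		intptr_t ret;

		a->jump = NULL;
		ret = a->fn(a);
		if (a->ret_ptr != NULL)
			*a->ret_ptr = ret;
		a = a->jump != NULL ? a->jump : a->next;
	}
}

/* Build the four-atom chain and report how often the stat atom ran. */
static int open_then_stat(intptr_t open_result)
{
	struct atom atom4 = { .fn = fake_notify };
	struct atom atom3 = { .fn = fake_stat };
	struct atom atom2 = { .fn = helper_non_positive };
	struct atom atom1 = { .fn = fake_open };

	atom1.args[0] = open_result;	/* what the fake open will return */
	atom1.ret_ptr = &atom3.args[1];
	atom1.next = &atom2;
	atom2.args[1] = (intptr_t)&atom3.args[1];
	atom2.args[2] = 0;
	atom2.args[3] = (intptr_t)&atom4;
	atom2.next = &atom3;
	atom3.next = &atom4;

	stat_calls = 0;
	stat_fd = 0;
	run_chain(&atom1);
	return stat_calls;
}
```

When the fake open returns a positive value the stat atom runs once and sees
that value in args[1]; when it returns a negative value the helper branches
straight to atom4 and the stat atom is skipped.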

Cheers,
-Kevin


Re: [PATCH]: Fix radeon blanking return value.

2007-02-28 Thread Antonino A. Daplas
On Wed, 2007-02-28 at 16:20 -0800, David Miller wrote:
> If you'll recall, over a year ago, I pointed out that the current
> Radeon driver erroneously returns -EINVAL for valid blanking codes,
> here is a link to that thread:
> 
>   http://lkml.org/lkml/2006/1/28/6
> 
> No other driver does this, and it confuses the X server into thinking
> that the device does not support blanking properly.
> 
> As a result I have to switch to a console VC and back to the X server
> to unblank the screen, which is ridiculous.
> 
> I've had to do this for more than 3 years and I'm really tired of this
> problem still being around :-)
> 
> I looked again and there is simply no reason for the Radeon driver to
> return -EINVAL for FB_BLANK_NORMAL.  It claims it wants to do this in
> order to convince fbcon to blank in software, right here:
> 
>   if (fb_blank(info, blank))
>   fbcon_generic_blank(vc, info, blank);
> 
> to software blank the screen.  But it only causes that to happen
> in the FB_BLANK_NORMAL case.
> 
> That makes no sense because the Radeon code does this:
> 
>   val |= CRTC_DISPLAY_DIS;
> 
> in the FB_BLANK_NORMAL case so should be blanking the hardware, and
> there is therefore no reason to SW blank by returning -EINVAL.
> 
> Therefore I propose we finally apply the following patch.  No other
> fbdev driver does this madness, and as I've shown above there is no
> justification for the behavior at all :-)

This dates from when radeonfb did not fully differentiate between the
blanking levels.  That is fixed now, so I agree with this patch.

> 
> Signed-off-by: David S. Miller <[EMAIL PROTECTED]>
Acked-by: Antonino Daplas <[EMAIL PROTECTED]>

Tony




Re: [PATCH] SLUB The unqueued slab allocator V3

2007-02-28 Thread David Miller
From: Christoph Lameter <[EMAIL PROTECTED]>
Date: Wed, 28 Feb 2007 17:06:19 -0800 (PST)

> On Wed, 28 Feb 2007, David Miller wrote:
> 
> > Arguably SLAB_HWCACHE_ALIGN and SLAB_MUST_HWCACHE_ALIGN should
> > not be set here, but SLUBs change in semantics in this area
> > could cause similar grief in other areas, an audit is probably
> > in order.
> > 
> > The above example was from sparc64, but x86 does the same thing
> > as probably do other platforms which use SLAB for pagetables.
> 
> Maybe this will address these concerns?
> 
> Index: linux-2.6.21-rc2/mm/slub.c
> ===
> --- linux-2.6.21-rc2.orig/mm/slub.c   2007-02-28 16:54:23.0 -0800
> +++ linux-2.6.21-rc2/mm/slub.c2007-02-28 17:03:54.0 -0800
> @@ -1229,8 +1229,10 @@ static int calculate_order(int size)
>  static unsigned long calculate_alignment(unsigned long flags,
>   unsigned long align)
>  {
> - if (flags & (SLAB_MUST_HWCACHE_ALIGN|SLAB_HWCACHE_ALIGN))
> + if (flags & SLAB_HWCACHE_ALIGN)
>   return L1_CACHE_BYTES;
> + if (flags & SLAB_MUST_HWCACHE_ALIGN)
> + return max(align, (unsigned long)L1_CACHE_BYTES);
>  
>   if (align < ARCH_SLAB_MINALIGN)
>   return ARCH_SLAB_MINALIGN;

It would achieve parity with existing SLAB behavior, sure.
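
For anyone who wants to poke at the proposed logic outside the kernel, here is
a stand-alone sketch of the patched calculate_alignment(). The flag bits,
L1_CACHE_BYTES and ARCH_SLAB_MINALIGN values are placeholders picked for the
example, and the final "return align" is an assumption, since the quoted hunk
ends before the rest of the function.

```c
/* Placeholder values for this sketch; the kernel's depend on the config. */
#define SLAB_HWCACHE_ALIGN      0x1UL
#define SLAB_MUST_HWCACHE_ALIGN 0x2UL
#define L1_CACHE_BYTES          64UL
#define ARCH_SLAB_MINALIGN      8UL

static unsigned long calculate_alignment(unsigned long flags,
					 unsigned long align)
{
	/* Plain HWCACHE_ALIGN: cache-line alignment, requested align ignored. */
	if (flags & SLAB_HWCACHE_ALIGN)
		return L1_CACHE_BYTES;
	/* MUST_HWCACHE_ALIGN: at least a cache line, more if the caller asks. */
	if (flags & SLAB_MUST_HWCACHE_ALIGN)
		return align > L1_CACHE_BYTES ? align : L1_CACHE_BYTES;

	if (align < ARCH_SLAB_MINALIGN)
		return ARCH_SLAB_MINALIGN;

	/* Assumption: not shown in the quoted hunk. */
	return align;
}
```

One thing the sketch makes visible: because SLAB_HWCACHE_ALIGN is tested
first, a cache created with both flags set still gets L1_CACHE_BYTES in this
model, whatever align it passed.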


System hanging randomly (SMP Kernel 2.6.20) - ATI chipset+Pentium 4HT

2007-02-28 Thread Xavier Callejas

Hi,

I have a Toshiba A70 laptop; I attach the dmesg output so you can see my
system.

The problem is that the system hangs randomly and I don't know why. I can see
nothing related to the hang in /var/log/messages (I have to force a shutdown
first).

With the noapic option the hangs are only delayed or become less frequent.
Sometimes the mouse can still move.

I can't work on my computer like this. I have posted many messages to the
SUSE forum and to other forums, but nobody has been able to help me. I'm
primarily a Linux user, and I'm a little disappointed because I have not been
able to find a solution for months. Here is my most recent post:

http://forums.suselinuxsupport.de/

Even while I was writing this email my laptop hung (I had booted without
the noapic option).

Thank you in advance.

-- 
Xavier Callejas
International Bonded Couriers
El Salvador
+503 2250-5900

¡Toda una Organización Mundial a Su Servicio!

--
Open Your Mind, Use Open Source.
Linux version 2.6.20-245-xavier ([EMAIL PROTECTED]) (gcc version 4.1.2 20061115 (prerelease) (SUSE Linux)) #1 SMP Mon Feb 26 15:00:17 CST 2007
BIOS-provided physical RAM map:
sanitize start
sanitize end
copy_e820_map() start:  size: 0009f800 end: 0009f800 type: 1
copy_e820_map() type is E820_RAM
copy_e820_map() start: 0009f800 size: 0800 end: 000a type: 2
copy_e820_map() start: 000d size: 8000 end: 000d8000 type: 2
copy_e820_map() start: 000e4000 size: 0001c000 end: 0010 type: 2
copy_e820_map() start: 0010 size: 1be7 end: 1bf7 type: 1
copy_e820_map() type is E820_RAM
copy_e820_map() start: 1bf7 size: b000 end: 1bf7b000 type: 3
copy_e820_map() start: 1bf7b000 size: 5000 end: 1bf8 type: 4
copy_e820_map() start: 1bf8 size: 0008 end: 1c00 type: 2
copy_e820_map() start: 2bf8 size: 0008 end: 2c00 type: 2
copy_e820_map() start: fec0 size: 0001 end: fec1 type: 2
copy_e820_map() start: fee0 size: 1000 end: fee01000 type: 2
copy_e820_map() start: fff8 size: 0008 end: 0001 type: 2
 BIOS-e820:  - 0009f800 (usable)
 BIOS-e820: 0009f800 - 000a (reserved)
 BIOS-e820: 000d - 000d8000 (reserved)
 BIOS-e820: 000e4000 - 0010 (reserved)
 BIOS-e820: 0010 - 1bf7 (usable)
 BIOS-e820: 1bf7 - 1bf7b000 (ACPI data)
 BIOS-e820: 1bf7b000 - 1bf8 (ACPI NVS)
 BIOS-e820: 1bf8 - 1c00 (reserved)
 BIOS-e820: 2bf8 - 2c00 (reserved)
 BIOS-e820: fec0 - fec1 (reserved)
 BIOS-e820: fee0 - fee01000 (reserved)
 BIOS-e820: fff8 - 0001 (reserved)
0MB HIGHMEM available.
447MB LOWMEM available.
found SMP MP-table at 000f6ae0
Entering add_active_range(0, 0, 114544) 0 entries of 256 used
Zone PFN ranges:
  DMA 0 -> 4096
  Normal   4096 ->   114544
  HighMem114544 ->   114544
early_node_map[1] active PFN ranges
0:0 ->   114544
On node 0 totalpages: 114544
  DMA zone: 32 pages used for memmap
  DMA zone: 0 pages reserved
  DMA zone: 4064 pages, LIFO batch:0
  Normal zone: 862 pages used for memmap
  Normal zone: 109586 pages, LIFO batch:31
  HighMem zone: 0 pages used for memmap
DMI present.
ACPI: RSDP (v000 PTLTD ) @ 0x000f6b40
ACPI: RSDT (v001 PTLTDRSDT   0x0604  LTP 0x) @ 0x1bf75dd2
ACPI: FADT (v001 TOSCPL Chinook  0x0604 ATI  0x0003) @ 0x1bf7af24
ACPI: MADT (v001 PTLTD   APIC   0x0604  LTP 0x) @ 0x1bf7af98
ACPI: SSDT (v001  PmRefCpuPm 0x3000 INTL 0x20030224) @ 0x1bf75e02
ACPI: DSDT (v001 TOSCPLSB200 0x0604 MSFT 0x010e) @ 0x
ATI board detected. Disabling timer routing over 8254.
ACPI: PM-Timer IO Port: 0x8008
ACPI: Local APIC address 0xfee0
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 15:4 APIC version 20
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
Processor #1 15:4 APIC version 20
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec0] gsi_base[0])
IOAPIC[0]: apic_id 2, version 17, address 0xfec0, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 21 low level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
Enabling APIC mode:  Flat.  Using 1 I/O APICs
Using ACPI (MADT) for SMP configuration information
Allocating 

[PATCH] Loop device - Tracking page writes made to a loop device through mmap

2007-02-28 Thread Kandan Venkataraman
This patch tracks writes made to a loop device through mmap.

A file_operations structure variable called loop_fops is initialised
with the default block device file operations (def_blk_fops).
The mmap operation is overridden with a new function called
loop_file_mmap.

A vm_operations structure variable called loop_file_vm_ops is
initialised with the default operations for a disk file.
The page_mkwrite operation in this variable is initialised to a new
function called loop_track_pgwrites.

In the function lo_open, the file operations pointer of the device file
is initialised with the address of loop_fops.

The function loop_file_mmap simply calls generic_file_mmap and then
initialises the vm_ops of the vma with the address of loop_file_vm_ops.

The function loop_track_pgwrites stores the page offset of the page that
is being written to in a red-black tree within the loop device.

A flag lo_track_pgwrite has been added to the structs loop_device and
loop_info64 to turn tracking of page writes on and off.

Two new ioctls have been added.

The ioctl cmd LOOP_GET_PGWRITES retrieves the page offsets of pages that
have been written to.
The ioctl cmd LOOP_CLR_PGWRITES empties the red-black tree.

This functionality would allow us to have a read-only version and a
write version of memory by doing the following:
Associate a normal file as backing storage for the loop device and mmap
the loop device. Call this mmapped address space area1.
Mmap a normal file of identical size. Call this mmapped address space
area2.

Changes made to area1 can be periodically copied to area2 using the
ioctl cmds (retrieve the dirty page offsets and copy the dirty pages from
area1 to area2). This facility would provide a quick way of updating the
read-only version.
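
The copy step described above could look roughly like the sketch below. It is
an illustration only and makes no ioctl calls: copy_dirty_pages and
SKETCH_PAGE_SIZE are made-up names, the offsets are assumed to be in units of
pages, and the offset array is assumed to have already been filled in by
LOOP_GET_PGWRITES.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for this example */

/*
 * Copy only the pages listed in pgoff[] from area1 (the writable mapping)
 * to area2 (the read-only copy). Offsets beyond area_pages are skipped.
 * Returns the number of pages actually copied.
 */
static size_t copy_dirty_pages(unsigned char *area2,
			       const unsigned char *area1,
			       size_t area_pages,
			       const unsigned long long *pgoff, size_t num)
{
	size_t copied = 0;
	size_t i;

	for (i = 0; i < num; i++) {
		size_t byte_off;

		if (pgoff[i] >= area_pages)
			continue;	/* outside the mapping, ignore */
		byte_off = (size_t)pgoff[i] * SKETCH_PAGE_SIZE;
		memcpy(area2 + byte_off, area1 + byte_off, SKETCH_PAGE_SIZE);
		copied++;
	}
	return copied;
}
```

After a successful copy, user space would presumably issue LOOP_CLR_PGWRITES
so that the next pass only sees pages dirtied since then.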
 
The following patch is against vanilla linux-2.6.19.2
 
Signed-off-by: Kandan Venkataraman [EMAIL PROTECTED]
 
 
diff -uprN linux-2.6.19.2/drivers/block/loop.c linux-2.6.19.2-new/drivers/block/loop.c
--- linux-2.6.19.2/drivers/block/loop.c 2007-01-11 06:10:37.0 +1100
+++ linux-2.6.19.2-new/drivers/block/loop.c 2007-02-27 17:23:18.0 +1100
@@ -74,12 +74,16 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
 static int max_loop = 8;
 static struct loop_device *loop_dev;
 static struct gendisk **disks;
+static kmem_cache_t *pgoff_elem_cache;
+static char*  cache_name = "loop_pgoff_elem_cache";
+static struct file_operations loop_fops;
 
 /*
  * Transfer functions
@@ -646,6 +650,73 @@ static void do_loop_switch(struct loop_d
  complete(>wait);
 }
 
+static void pgoff_tree_clear(struct rb_root *rb_root)
+{
+ struct rb_node *rb_node  = rb_root->rb_node;
+
+ while (rb_node != NULL) {
+
+  rb_erase(rb_node, rb_root); 
+  kmem_cache_free(pgoff_elem_cache, rb_entry(rb_node, struct pgoff_elem, node));
+  rb_node = rb_root->rb_node;
+ }
+
+  *rb_root = RB_ROOT;
+}
+
+
+static int loop_clr_pgwrites(struct loop_device *lo)
+{
+ struct file *filp = lo->lo_backing_file;
+
+ if (lo->lo_state != Lo_bound)
+  return -ENXIO;
+
+ if (filp == NULL)
+  return -EINVAL;
+
+ if (!lo->lo_track_pgwrite)
+   return 0;
+
+ pgoff_tree_clear(&lo->pgoff_tree);
+
+ return 0;
+}
+
+static int loop_get_pgwrites(struct loop_device *lo, struct loop_pgoff_array __user *arg)
+{
+ struct file *filp = lo->lo_backing_file;
+ struct loop_pgoff_array array;
+ loff_t i = 0;
+ struct rb_node *rb_node  = rb_first(&lo->pgoff_tree);
+
+ if (lo->lo_state != Lo_bound)
+  return -ENXIO;
+
+ if (filp == NULL)
+  return -EINVAL;
+
+ if (!lo->lo_track_pgwrite)
+   return 0;
+
+ if (copy_from_user(&array, arg, sizeof (struct loop_pgoff_array)))
+  return -EFAULT;
+
+ while (i < array.max && rb_node != NULL) {
+
+   if (put_user(rb_entry(rb_node, struct pgoff_elem, node)->offset, array.pgoff + i))
+   return -EFAULT;
+
+   ++i;
+   rb_node = rb_next(rb_node);
+ }
+ array.num = i;
+
+ if (copy_to_user(arg, &array, sizeof(array)))
+   return -EFAULT;
+
+ return 0;
+}
 
 /*
  * loop_change_fd switched the backing store of a loopback device to
@@ -692,6 +763,8 @@ static int loop_change_fd(struct loop_de
  if (get_loop_size(lo, file) != get_loop_size(lo, old_file))
   goto out_putf;
 
+ pgoff_tree_clear(&lo->pgoff_tree);
+
  /* and ... switch */
  error = loop_switch(lo, file);
  if (error)
@@ -799,6 +872,8 @@ static int loop_set_fd(struct loop_devic
  lo->transfer = transfer_none;
  lo->ioctl = NULL;
  lo->lo_sizelimit = 0;
+ lo->lo_track_pgwrite = 0;
+ lo->pgoff_tree = RB_ROOT;
  lo->old_gfp_mask = mapping_gfp_mask(mapping);
  mapping_set_gfp_mask(mapping, lo->old_gfp_mask & ~(__GFP_IO|__GFP_FS));
 
@@ -913,6 +988,8 @@ static int loop_clr_fd(struct loop_devic
  lo->lo_sizelimit = 0;
  lo->lo_encrypt_key_size = 0;
  lo->lo_flags = 0;
+ lo->lo_track_pgwrite = 0;
+ pgoff_tree_clear(&lo->pgoff_tree);
  lo->lo_thread = NULL;
  memset(lo->lo_encrypt_key, 0, LO_KEY_SIZE);
  memset(lo->lo_crypt_name, 0, LO_NAME_SIZE);
@@ -969,6 +1046,14 @@ loop_set_status(struct loop_device *lo, 
return -EFBIG;
  }
 
+ if 

Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-02-28 Thread Andrea Arcangeli
On Thu, Mar 01, 2007 at 12:12:28AM +0100, Ingo Molnar wrote:
> more capable by providing more special system calls like sys_upcall() to 
> execute a user-space function. (that way a syslet could still execute 
> user-space code without having to exit out of kernel mode too 
> frequently) Or perhaps a sys_x86_bytecode() call, that would execute a 
> pre-verified, kernel-stored sequence of simplified x86 bytecode, using 
> the kernel stack.

Which means the userspace code would then run with kernel privilege
level somehow (after security verifier, whatever). You remember I
think it's a plain crazy idea...

I don't want to argue about syslets, threadlets, whatever async or
syscall-merging mechanism here, I'm just focusing on the idea of yours
of running userland code in kernel space somehow (I hoped you had given up
on it by now). Fixing the greatest syslets limitation is going to open
a can of worms as far as security is concerned.

The fact that userland code must not run with kernel privilege level,
is the reason why syslets aren't very useful (but again: focusing on
the syslets vs async-syscalls isn't my interest).

Frankly I think this idea of running userland code with kernel
privileges fits in the same category of porting linux to segmentation
to avoid the cost of pagetables to gain some bit of performance
despite losing in many other areas. Nobody in real life will want to
make that trade, for such an incredibly small performance
improvement.

For things that can be frequently combined, it's much simpler and
cheaper to create a "merged" syscall (i.e. sys_spawn =
sys_fork+sys_exec) than to invent a way to upload userland generated
bytecodes to kernel space to do that.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.20 SATA error

2007-02-28 Thread Gerhard Mack
On Wed, 28 Feb 2007, Robert Hancock wrote:

> Date: Wed, 28 Feb 2007 18:21:48 -0600
> From: Robert Hancock <[EMAIL PROTECTED]>
> To: Gerhard Mack <[EMAIL PROTECTED]>
> Cc: linux-kernel ,
> Charles Shannon Hendrix <[EMAIL PROTECTED]>
> Subject: Re: 2.6.20 SATA error
> 
> Gerhard Mack wrote:
> > > > Sorry for the false alarm, 
> > > There is one thing that seems odd, if you do have an nForce4 chipset, the
> > > kernel should be running the SATA controller in ADMA mode in 2.6.20, but
> > > it
> > > doesn't seem like it is from your dmesg output. Can you post the output of
> > > "lspci -vvn"? Also what kind of motherboard is that?
> > > 
> > Sure thing.  It's an Asus m2npv-vm.
> 
> Ah, that would be why, it's not one of the original nForce4 (CK804/MCP04)
> chipsets, it's the newer nForce 430 (MCP51) chipset which doesn't support
> ADMA. NVidia said they'd be sending some patches to allow NCQ support on MCP51
> and MCP61 chipsets back in October, but I haven't seen any, or information
> required to implement same..

fun stuff.. I guess it's back to trying to get the onboard ethernet card 
to work in debian.

Gerhard

--
Gerhard Mack

[EMAIL PROTECTED]

<>< As a computer I find your faith in technology amusing.


Re: [PATCH] udivdi3: 64 bit divide

2007-02-28 Thread Tim Schmielau
On Tue, 27 Feb 2007, Andrew Morton wrote:
> > On Mon, 26 Feb 2007 17:35:17 -0800 Stephen Hemminger <[EMAIL PROTECTED]> 
> > wrote:
> > The kernel already has several implementations and usages of 64 by 64
> > bit divide.
> > 
> > Although it is significantly slower, there are places that need it so
> > provide one generic version using scaling, and allow existing platform
> > versions to continue.
> 
> The reason we implement 64/32 via do_div() is, for better or for worse, to
> make people think before they use it.  And to make it stand out, and so
> that we discover places that are using it by accident, where they could use
> something cheaper.

IMHO it is even more important that the user of your 64/64 div is aware 
that it only returns an approximate result.

I certainly don't want to have any code in the kernel that by accident 
makes an allocation a few bytes short of the actual size of the object 
(just to make up a drastic example).
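
To make the concern concrete, here is a minimal userspace sketch of a scaling-based 64/64 divide. It is not the code from the patch under discussion: the helper name approx_div64 and the shift-until-it-fits strategy are assumptions chosen only for illustration. Both operands are shifted right until the divisor fits in 32 bits, which throws away low-order bits and can change the quotient.

```c
#include <stdint.h>

/*
 * Hypothetical scaling-based divide (not the patch's implementation).
 * Shifts both operands right until the divisor fits in 32 bits, then
 * divides. The caller must pass a non-zero divisor.
 */
static uint64_t approx_div64(uint64_t dividend, uint64_t divisor)
{
	while (divisor > UINT32_MAX) {
		dividend >>= 1;
		divisor >>= 1;
	}
	/* Stands in for a 64/32 divide such as do_div(). */
	return dividend / divisor;
}
```

With a dividend of 12884901890 and a divisor of 4294967297 (2^32 + 1) the exact quotient is 2, while this sketch returns 3, because the shift rounds the divisor down to 2^31.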

Tim


Re: PID entries in /proc sorted by number, not start time in 2.6.19

2007-02-28 Thread Albert Cahalan

On 2/28/07, Eric W. Biederman <[EMAIL PROTECTED]> wrote:

Chuck Ebbert <[EMAIL PROTECTED]> writes:

> Starting with kernel 2.6.19, the process directories in
> /proc are sorted by number. They were sorted by process
> start time in 2.6.18 and earlier. This makes the output
> of procps come out in that order too, pissing off users
> who are used to the old way.


ps --sort=start_time

I've always just assumed the order to be random.


Re: [PATCH] SLUB The unqueued slab allocator V3

2007-02-28 Thread Christoph Lameter
On Wed, 28 Feb 2007, David Miller wrote:

> Maybe if you managed your individual changes in GIT or similar
> this could be debugged very quickly. :-)

I think once things calm down and the changes become smaller it's going 
to be easier. That is likely the case after V4.

> Meanwhile I noticed that your alignment algorithm is different
> than SLAB's.  And I think this is important for the page table
> SLABs that some platforms use.

Ok.
 
> No matter what flags are specified, SLAB gives at least the
> passed in alignment specified in kmem_cache_create().  That
> logic in slab is here:
> 
>   /* 3) caller mandated alignment */
>   if (ralign < align) {
>   ralign = align;
>   }

Hmmm... Right.
 
> Whereas SLUB uses the CPU cacheline size when the MUSTALIGN
> flag is set.  Architectures do things like:
> 
>   pgtable_cache = kmem_cache_create("pgtable_cache",
> PAGE_SIZE, PAGE_SIZE,
> SLAB_HWCACHE_ALIGN |
> SLAB_MUST_HWCACHE_ALIGN,
> zero_ctor,
> NULL);
> 
> to get a PAGE_SIZE aligned slab, SLUB doesn't give the same
> behavior SLAB does in this case.

SLUB only supports this by passing through allocations to the page 
allocator since it does not maintain queues. So the above will cause the 
pgtable_cache to use the caches of the page allocator. The queueing effect 
that you get from SLAB is not present in SLUB since it does not provide 
them. If SLUB is to be used this way then we need to have higher order 
page sizes and allocate chunks from the higher order page for the 
pgtable_cache.

There are other ways of doing it. IA64 f.e. uses a linked list to 
accomplish the same avoiding SLAB overhead.

> Arguably SLAB_HWCACHE_ALIGN and SLAB_MUST_HWCACHE_ALIGN should
> not be set here, but SLUBs change in semantics in this area
> could cause similar grief in other areas, an audit is probably
> in order.
> 
> The above example was from sparc64, but x86 does the same thing
> as probably do other platforms which use SLAB for pagetables.

Maybe this will address these concerns?

Index: linux-2.6.21-rc2/mm/slub.c
===
--- linux-2.6.21-rc2.orig/mm/slub.c 2007-02-28 16:54:23.0 -0800
+++ linux-2.6.21-rc2/mm/slub.c  2007-02-28 17:03:54.0 -0800
@@ -1229,8 +1229,10 @@ static int calculate_order(int size)
 static unsigned long calculate_alignment(unsigned long flags,
unsigned long align)
 {
-   if (flags & (SLAB_MUST_HWCACHE_ALIGN|SLAB_HWCACHE_ALIGN))
+   if (flags & SLAB_HWCACHE_ALIGN)
return L1_CACHE_BYTES;
+   if (flags & SLAB_MUST_HWCACHE_ALIGN)
+   return max(align, (unsigned long)L1_CACHE_BYTES);
 
if (align < ARCH_SLAB_MINALIGN)
return ARCH_SLAB_MINALIGN;
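
For readers who want to try the proposed logic outside the kernel, the following is a standalone sketch of the patched function. The flag values, L1_CACHE_BYTES and ARCH_SLAB_MINALIGN are placeholders picked for illustration, and the final "return align" is an assumption, since the diff does not show the tail of the function.

```c
/* Placeholder values for illustration; the real ones come from kernel headers. */
#define SLAB_HWCACHE_ALIGN      0x2000UL
#define SLAB_MUST_HWCACHE_ALIGN 0x8000UL
#define L1_CACHE_BYTES          64UL
#define ARCH_SLAB_MINALIGN      8UL

static unsigned long calculate_alignment(unsigned long flags,
					 unsigned long align)
{
	/* Same order of checks as in the diff above. */
	if (flags & SLAB_HWCACHE_ALIGN)
		return L1_CACHE_BYTES;
	if (flags & SLAB_MUST_HWCACHE_ALIGN)
		return align > L1_CACHE_BYTES ? align : L1_CACHE_BYTES;

	if (align < ARCH_SLAB_MINALIGN)
		return ARCH_SLAB_MINALIGN;

	/* Assumption: the diff does not show what follows this point. */
	return align;
}
```

In this sketch, SLAB_MUST_HWCACHE_ALIGN alone with align = 4096 yields 4096. A caller that sets both flags, as the quoted pgtable_cache example does, still takes the first branch and gets 64.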


Re: [PATCH] <linux/sysdev.h> needs to include <linux/module.h>

2007-02-28 Thread Andrew Morton
On Sat, 24 Feb 2007 12:22:11 +
Ralf Baechle <[EMAIL PROTECTED]> wrote:

> sysdev.h uses THIS_MODULE so should include <linux/module.h>.
> 
> Signed-off-by: Ralf Baechle <[EMAIL PROTECTED]>
> 
> diff --git a/include/linux/sysdev.h b/include/linux/sysdev.h
> index 389ccf8..e699ab2 100644
> --- a/include/linux/sysdev.h
> +++ b/include/linux/sysdev.h
> @@ -22,6 +22,7 @@
>  #define _SYSDEV_H_
>  
>  #include 
> +#include <linux/module.h>
>  #include 
>  

This causes an x86_64 trainwreck:

akpm2:/usr/src/25> xb arch/x86_64/ia32/ia32_binfmt.o
  CHK include/linux/version.h
  CHK include/linux/utsrelease.h
  CC  arch/x86_64/kernel/asm-offsets.s
  GEN include/asm-x86_64/asm-offsets.h
  CC  arch/x86_64/ia32/ia32_binfmt.o
arch/x86_64/ia32/ia32_binfmt.c:48:1: warning: "ELF_ET_DYN_BASE" redefined
In file included from include/linux/elf.h:7,
 from include/linux/module.h:15,
 from include/linux/sysdev.h:25,
 from include/linux/sched.h:1645,
 from arch/x86_64/ia32/ia32_binfmt.c:11:
include/asm/elf.h:93:1: warning: this is the location of the previous definition
arch/x86_64/ia32/ia32_binfmt.c:58:1: warning: "USE_ELF_CORE_DUMP" redefined
include/asm/elf.h:85:1: warning: this is the location of the previous definition
arch/x86_64/ia32/ia32_binfmt.c:62: error: conflicting types for 'elf_greg_t'
include/asm/elf.h:32: error: previous declaration of 'elf_greg_t' was here
arch/x86_64/ia32/ia32_binfmt.c:64:1: warning: "ELF_NGREG" redefined
include/asm/elf.h:34:1: warning: this is the location of the previous definition
arch/x86_64/ia32/ia32_binfmt.c:65: error: conflicting types for 'elf_gregset_t'
include/asm/elf.h:35: error: previous declaration of 'elf_gregset_t' was here
arch/x86_64/ia32/ia32_binfmt.c:118:1: warning: "ELF_CORE_COPY_REGS" redefined
include/asm/elf.h:99:1: warning: this is the location of the previous definition
arch/x86_64/ia32/ia32_binfmt.c:139:1: warning: "__ASM_X86_64_ELF_H" redefined
include/asm/elf.h:2:1: warning: this is the location of the previous definition
arch/x86_64/ia32/ia32_binfmt.c:144: error: conflicting types for 
'elf_fpregset_t'
include/asm/elf.h:37: error: previous declaration of 'elf_fpregset_t' was here


Re: Kernel Oops with shm namespace cleanups

2007-02-28 Thread Eric W. Biederman
Adam Litke <[EMAIL PROTECTED]> writes:

> Hey.  While testing 2.6.21-rc2 with libhugetlbfs, the shm-fork test case
> causes the kernel to oops.  To reproduce:  Execute 'make check' in the
> latest libhugetlbfs source on a 2.6.21-rc2 kernel with 100 huge pages
> allocated.  Using fewer huge pages will likely also trigger the oops.
> Libhugetlbfs can be downloaded from:
> http://libhugetlbfs.ozlabs.org/snapshots/libhugetlbfs-dev-20070228.tar.gz
>
> I have collected the following information:

Thanks.  I'm going to be offline starting early tomorrow so I'm
unfortunately not going to be timely in tracing this one down.

Ok. Looking at the code I have a clue what is going on.  I think
I must have been out of it the day I wrote this patch.  I don't have
fsync or get_unmapped_area methods appropriately wrapped.  I clearly
did not do a close audit of the filesystem methods that hugetlbfs
inodes use.  I may have just gotten lucky on other architectures.

get_unmapped_area looks like it will be a bit of a trick.

If it is just a failure to wrap a couple of file methods
then the patch below should fix it or come close.  That's the best
I can do before I leave.

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index a60995a..44f1f05 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -168,7 +168,9 @@ void hugetlb_put_quota(struct address_space *mapping);
 
 static inline int is_file_hugepages(struct file *file)
 {
-   return file->f_op == &hugetlbfs_file_operations;
+   return (file->f_op == &hugetlbfs_file_operations) ||
+   is_file_shm_hugepages(file);
+   
 }
 
 static inline void set_file_hugepages(struct file *file)
diff --git a/include/linux/shm.h b/include/linux/shm.h
index a2c896a..ad2e3af 100644
--- a/include/linux/shm.h
+++ b/include/linux/shm.h
@@ -96,12 +96,17 @@ struct shmid_kernel /* private to the kernel */
 
 #ifdef CONFIG_SYSVIPC
 long do_shmat(int shmid, char __user *shmaddr, int shmflg, unsigned long 
*addr);
+extern int is_file_shm_hugepages(struct file *file);
 #else
 static inline long do_shmat(int shmid, char __user *shmaddr,
int shmflg, unsigned long *addr)
 {
return -ENOSYS;
 }
+static inline int is_file_shm_hugepages(struct file *file)
+{
+   return 0;
+}
 #endif
 
 #endif /* __KERNEL__ */
diff --git a/ipc/shm.c b/ipc/shm.c
index 26b935b..93cfa35 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -235,7 +235,7 @@ struct page *shm_nopage(struct vm_area_struct *vma, 
unsigned long address, int *
 }
 
 #ifdef CONFIG_NUMA
-int shm_set_policy(struct vm_area_struct *vma, struct mempolicy *new)
+static int shm_set_policy(struct vm_area_struct *vma, struct mempolicy *new)
 {
struct file *file = vma->vm_file;
struct shm_file_data *sfd = shm_file_data(file);
@@ -245,7 +245,7 @@ int shm_set_policy(struct vm_area_struct *vma, struct 
mempolicy *new)
return err;
 }
 
-struct mempolicy *shm_get_policy(struct vm_area_struct *vma, unsigned long 
addr)
+static struct mempolicy *shm_get_policy(struct vm_area_struct *vma, unsigned 
long addr)
 {
struct file *file = vma->vm_file;
struct shm_file_data *sfd = shm_file_data(file);
@@ -284,21 +284,41 @@ static int shm_release(struct inode *ino, struct file 
*file)
return 0;
 }
 
-#ifndef CONFIG_MMU
+static int shm_fsync(struct file *file, struct dentry *dentry, int datasync)
+{
+   int (*fsync) (struct file *, struct dentry *, int datasync);
+   struct shm_file_data *sfd;
+   int ret;
+   ret = -EINVAL;
+   sfd = shm_file_data(file);
+   fsync = sfd->file->f_op->fsync;
+   if (fsync)
+   ret = fsync(sfd->file, sfd->file->f_path.dentry, datasync);
+   return ret;
+}
+
 static unsigned long shm_get_unmapped_area(struct file *file,
unsigned long addr, unsigned long len, unsigned long pgoff,
unsigned long flags)
 {
struct shm_file_data *sfd = shm_file_data(file);
-   return sfd->file->f_op->get_unmapped_area(sfd->file, addr, len, pgoff,
-   flags);
+   return get_unmapped_area(file, addr, len, pgoff, flags);
+}
+
+int is_file_shm_hugepages(struct file *file)
+{
+   int ret = 0;
+   if (file->f_op == &shm_file_operations) {
+   struct shm_file_data *sfd;
+   sfd = shm_file_data(file);
+   ret = is_file_hugepages(file);
+   }
+   return ret;
 }
-#else
-#define shm_get_unmapped_area NULL
-#endif
 
 static struct file_operations shm_file_operations = {
.mmap   = shm_mmap,
+   .fsync  = shm_fsync,
.release= shm_release,
.get_unmapped_area  = shm_get_unmapped_area,
 };
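
The forwarding idea behind shm_fsync can be shown in isolation. The sketch below uses hypothetical, cut-down structures (struct file_ops, a backing pointer standing in for sfd->file, and a sync_count test hook); none of these are the kernel's definitions. The wrapper method looks up the backing file and calls that file's own method, returning -EINVAL when there is none.

```c
#include <errno.h>
#include <stddef.h>

struct file;

/* Hypothetical, cut-down stand-ins for the kernel's file structures. */
struct file_ops {
	int (*fsync)(struct file *file, int datasync);
};

struct file {
	const struct file_ops *f_op;
	struct file *backing;	/* plays the role of sfd->file */
	int sync_count;		/* test hook: counts fsync calls */
};

/* fsync method of the backing file. */
static int backing_fsync(struct file *file, int datasync)
{
	(void)datasync;
	file->sync_count++;
	return 0;
}

/* Wrapper method: forward to the backing file, or fail if it has no fsync. */
static int wrapper_fsync(struct file *file, int datasync)
{
	struct file *inner = file->backing;

	if (!inner->f_op->fsync)
		return -EINVAL;
	return inner->f_op->fsync(inner, datasync);
}

static const struct file_ops backing_ops = { .fsync = backing_fsync };
static const struct file_ops no_fsync_ops = { .fsync = NULL };
static const struct file_ops wrapper_ops = { .fsync = wrapper_fsync };
```

The point of the pattern is that the wrapper always passes the inner file to the inner method. Passing the outer file again would re-enter the wrapper instead of reaching the backing implementation.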




Re: [PATCH 2.6.20] kobject net ifindex + rename

2007-02-28 Thread Johannes Berg
On Wed, 2007-02-28 at 16:51 -0800, Jean Tourrilhes wrote:

>   I would prefer to fix the comment when this change actually
> happens. I prefer comments to refer to the current reality, rather
> than past/future situation.

Uh, no. device_rename is perfectly fine, even other people may use it in
the future.

>  When you introduce wireless renaming, you
> will need to verify the whole chain anyway, so you might as well fix
> the comment while merging wireless renaming.

No again, device_rename is perfectly fine API, I shouldn't have to look
at its internals to see if it's broken in my use case. Even if it's
only a broken comment.

I'm not going to respin your patches though, if this doesn't make it in
I don't care.

johannes


signature.asc
Description: This is a digitally signed message part


Re: [PATCH 2.6.20] kobject net ifindex + rename

2007-02-28 Thread Jean Tourrilhes
On Thu, Mar 01, 2007 at 01:37:46AM +0100, Johannes Berg wrote:
> On Wed, 2007-02-28 at 16:26 -0800, Jean Tourrilhes wrote:
> 
> > +   /* This function is only used for network interface.
> > +* Some hotplug package track interfaces by their name and
> > +* therefore want to know when the name is changed by the user. */
> 
> Right now, that's true, but wireless is going to start using
> device_rename pretty soon as well. Could you rephrase this comment?
> 
> johannes

I would prefer to fix the comment when this change actually
happens. I prefer comments to refer to the current reality, rather
than past/future situation. When you introduce wireless renaming, you
will need to verify the whole chain anyway, so you might as well fix
the comment while merging wireless renaming.
Note also that my comment is technically correct. I did not
say 'netdev' but the more generic term 'network interface', and I
believe your wireless interface is a 'network interface', even if it's
not a netdev ;-)
But if this really bugs you, please feel free to respin my
patch.

Have fun...

Jean



Re: [PATCH 2.6.20] kobject net ifindex + rename

2007-02-28 Thread Johannes Berg
On Wed, 2007-02-28 at 16:26 -0800, Jean Tourrilhes wrote:

> + /* This function is only used for network interface.
> +  * Some hotplug package track interfaces by their name and
> +  * therefore want to know when the name is changed by the user. */

Right now, that's true, but wireless is going to start using
device_rename pretty soon as well. Could you rephrase this comment?

johannes


signature.asc
Description: This is a digitally signed message part


Re: Make sure we populate the initroot filesystem late enough

2007-02-28 Thread Michael Ellerman
On Wed, 2007-02-28 at 10:13 +, David Woodhouse wrote:
> On Wed, 2007-02-28 at 07:43 +0100, Benjamin Herrenschmidt wrote:
> > I wouldn't be that sure ... I've had problems in the past with PMU based
> > cpufreq... looks like flushing all caches and hard-resetting the
> > processor on the fly when there can be pending DMAs might be a source of
> > trouble... especially on CPUs that don't have working cache flush HW
> > assist. 
> 
> I've seen it on a PowerMac3,1 (400MHz G4) where we don't have cpufreq.
> I've also seen it on the latest 1.5GHz Mac Mini, and on my shinybook.
> They all fall over with the latest kernel, although the shinybook only
> does so immediately when booted with mem=512M. The shinybook does crash
> later with new kernels though; I don't yet know why. It could be the
> same thing, or it could be something different. That one seemed to
> appear between Fedora's 2.6.19-1.2913 and 2.6.19-1.2914 kernels, where
> we did nothing but turned CONFIG_SYSFS_DEPRECATED on.
> 
> I don't blame cpufreq. At various times I've been equally convinced that
> it was due to CONFIG_KPROBES, and Linus' initrd-moving patch.

Is there any pattern to the way it dies? Or is it just randomly dying
somewhere depending on which config options you have enabled?

This is starting to sound reminiscent of a bug I chased for a while last
year on Power5, but didn't find. It was "fixed" on some machines by
disabling CONFIG_KEXEC, and/or other random unrelated CONFIG options.
Unfortunately it magically stopped reproducing so I never caught it :/

cheers

-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person


signature.asc
Description: This is a digitally signed message part


Re: [PATCH 2.6.20] kobject net ifindex + rename

2007-02-28 Thread Jean Tourrilhes
On Wed, Feb 28, 2007 at 07:36:17AM -0800, Greg KH wrote:
> On Tue, Feb 27, 2007 at 05:27:41PM -0800, Jean Tourrilhes wrote:
> > diff -u -p linux/drivers/base/class.j1.c linux/drivers/base/class.c
> > --- linux/drivers/base/class.j1.c   2007-02-26 18:38:10.0 -0800
> > +++ linux/drivers/base/class.c  2007-02-27 15:52:37.0 -0800
> > @@ -841,6 +841,8 @@ int class_device_rename(struct class_dev
> 
> This function is not in the 2.6.21-rc2 kernel, so you might want to
> rework this patch a bit :)

Thanks for all your good comments. I ported my patch to
2.6.21-rc2, and tested it both on a hotplug and a udev system. Patch
is attached, I would be glad if you could push that through the usual
channels.

Also, I realised that I forgot to say in my original e-mail
that migrating udev to use ifindex instead of ifname would fix the
remove/add race condition for network devices. But that's not going to
happen overnight...

Have fun...

Jean

Signed-off-by: Jean Tourrilhes <[EMAIL PROTECTED]>

-

diff -u -p linux/include/linux/kobject.j1.h linux/include/linux/kobject.h
--- linux/include/linux/kobject.j1.h2007-02-28 14:26:29.0 -0800
+++ linux/include/linux/kobject.h   2007-02-28 14:27:54.0 -0800
@@ -48,6 +48,7 @@ enum kobject_action {
KOBJ_OFFLINE= (__force kobject_action_t) 0x06,  /* device 
offline */
KOBJ_ONLINE = (__force kobject_action_t) 0x07,  /* device 
online */
KOBJ_MOVE   = (__force kobject_action_t) 0x08,  /* device move 
*/
+   KOBJ_RENAME = (__force kobject_action_t) 0x09,  /* device 
renamed */
 };
 
 struct kobject {
diff -u -p linux/net/core/net-sysfs.j1.c linux/net/core/net-sysfs.c
--- linux/net/core/net-sysfs.j1.c   2007-02-28 14:26:45.0 -0800
+++ linux/net/core/net-sysfs.c  2007-02-28 14:27:54.0 -0800
@@ -424,6 +424,17 @@ static int netdev_uevent(struct device *
if ((size <= 0) || (i >= num_envp))
return -ENOMEM;
 
+   /* pass ifindex to uevent.
+* ifindex is useful as it won't change (interface name may change)
+* and is what RtNetlink uses natively. */
+   envp[i++] = buf;
+   n = snprintf(buf, size, "IFINDEX=%d", dev->ifindex) + 1;
+   buf += n;
+   size -= n;
+
+   if ((size <= 0) || (i >= num_envp))
+   return -ENOMEM;
+
envp[i] = NULL;
return 0;
 }
diff -u -p linux/lib/kobject_uevent.j1.c linux/lib/kobject_uevent.c
--- linux/lib/kobject_uevent.j1.c   2007-02-28 14:26:58.0 -0800
+++ linux/lib/kobject_uevent.c  2007-02-28 14:27:54.0 -0800
@@ -52,6 +52,8 @@ static char *action_to_string(enum kobje
return "online";
case KOBJ_MOVE:
return "move";
+   case KOBJ_RENAME:
+   return "rename";
default:
return NULL;
}
diff -u -p linux/drivers/base/core.j1.c linux/drivers/base/core.c
--- linux/drivers/base/core.j1.c2007-02-28 15:45:45.0 -0800
+++ linux/drivers/base/core.c   2007-02-28 15:47:30.0 -0800
@@ -1007,6 +1007,8 @@ int device_rename(struct device *dev, ch
char *new_class_name = NULL;
char *old_symlink_name = NULL;
int error;
+   char *devname_string = NULL;
+   char *envp[2];
 
dev = get_device(dev);
if (!dev)
@@ -1014,6 +1016,15 @@ int device_rename(struct device *dev, ch
 
pr_debug("DEVICE: renaming '%s' to '%s'\n", dev->bus_id, new_name);
 
+   devname_string = kmalloc(strlen(dev->bus_id) + 15, GFP_KERNEL);
+   if (!devname_string) {
+   put_device(dev);
+   return -ENOMEM;
+   }
+   sprintf(devname_string, "INTERFACE_OLD=%s", dev->bus_id);
+   envp[0] = devname_string;
+   envp[1] = NULL;
+
 #ifdef CONFIG_SYSFS_DEPRECATED
if ((dev->class) && (dev->parent))
old_class_name = make_class_name(dev->class->name, &dev->kobj);
@@ -1049,12 +1060,20 @@ int device_rename(struct device *dev, ch
sysfs_create_link(&dev->class->subsys.kset.kobj, &dev->kobj,
  dev->bus_id);
}
+
+   /* This function is only used for network interface.
+* Some hotplug package track interfaces by their name and
+* therefore want to know when the name is changed by the user. */
+   if(!error)
+   kobject_uevent_env(&dev->kobj, KOBJ_RENAME, envp);
+
put_device(dev);
 
kfree(new_class_name);
kfree(old_symlink_name);
  out_free_old_class:
kfree(old_class_name);
+   kfree(devname_string);
 
return error;
 }
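
The IFINDEX hunk packs environment strings back to back into one buffer and records a pointer to each in envp. A userspace sketch of that bookkeeping follows. The helper env_add_int and its signature are hypothetical, and unlike the patch it checks for space before writing and returns -1 rather than -ENOMEM.

```c
#include <stdio.h>

/*
 * Hypothetical helper: append "KEY=value" to a shared buffer and record
 * it in a NULL-terminated envp array. Returns 0 on success, -1 if the
 * array or the buffer is too small.
 */
static int env_add_int(char **envp, int num_envp, int *idx,
		       char **buf, int *size, const char *key, int value)
{
	int n;

	/* Need a slot for this entry plus the terminating NULL. */
	if (*size <= 0 || *idx + 1 >= num_envp)
		return -1;

	/* snprintf returns the untruncated length; +1 covers the NUL. */
	n = snprintf(*buf, (size_t)*size, "%s=%d", key, value) + 1;
	if (n > *size)
		return -1;	/* output was truncated */

	envp[(*idx)++] = *buf;
	*buf += n;
	*size -= n;
	envp[*idx] = NULL;
	return 0;
}
```

Each successful call advances the buffer pointer past the string and its terminating NUL, so the next variable starts on a fresh byte, which is what the "+ 1" in the patch accomplishes.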


Re: debug registers and fork

2007-02-28 Thread Stephane Eranian
Alan,

On Wed, Feb 28, 2007 at 07:01:17PM -0500, Alan Stern wrote:
> On Wed, 28 Feb 2007, Roland McGrath wrote:
> 
> > It is true that debug registers are inherited by fork and clone.
> > I am 99% sure that this was never specifically intended, but it
> > has been this way for a long time (since 2.4 at least).  It's an
> > implicit consequence of the do_fork implementation style, which
> > does a blind copy of the whole task_struct and then explicitly
> > reinitializes some individual fields.  I suppose this has some
> > benefit or other, but it is very prone to new pieces of state
> > getting implicitly copied without the person adding that new state
> > ever consciously deciding what its inheritance semantics should be.
> > 
> > Alan Stern is working on a revamp of the x86 debug register
> > support.  This is a fine opportunity to clean this area up and
> > decide positively what the semantics ought to be.
> 
> Absolutely.  Right now I just have a placeholder function with a note
> about checking for CLONE_PTRACE.  The cleanest solution, far and away,
> would be to have the child process inherit no breakpoints and no debug
> register values.
> 
I agree and that is how we have it on IA-64. With debugging, there is
always another process involved and no matter what I think it needs to 
be aware of the new child. I don't think automatic inheritance is good.
It should always be triggered by the controlling process (e.g., debugger).
There is enough support in ptrace to catch the fork/vfork/pthread_create
and decide what to do.  This is how I have coded perfmon so that hardware
performance counters are never automatically inherited.
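
The "blind copy, then reinitialize" style described in the quoted text, combined with the no-inheritance policy argued for here, can be sketched as below. The struct task layout, NR_DEBUG_REGS and copy_task are hypothetical stand-ins, not the kernel's task_struct or do_fork.

```c
#include <string.h>

#define NR_DEBUG_REGS 8

/* Hypothetical, cut-down task state; not the kernel's task_struct. */
struct task {
	int pid;
	unsigned long debugreg[NR_DEBUG_REGS];
};

/* Copy everything, then explicitly reset what must not be inherited. */
static void copy_task(struct task *child, const struct task *parent,
		      int child_pid)
{
	*child = *parent;		/* blind copy of the whole struct */
	child->pid = child_pid;
	/* Policy sketched here: the child starts with no breakpoints. */
	memset(child->debugreg, 0, sizeof(child->debugreg));
}
```

Any field that is not reset after the struct assignment is inherited silently, which is the hazard the quoted message points out.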

-- 
-Stephane


Re: 2.6.20 SATA error

2007-02-28 Thread Robert Hancock

Gerhard Mack wrote:
> > > Sorry for the false alarm,
> > There is one thing that seems odd, if you do have an nForce4 chipset, the
> > kernel should be running the SATA controller in ADMA mode in 2.6.20, but it
> > doesn't seem like it is from your dmesg output. Can you post the output of
> > "lspci -vvn"? Also what kind of motherboard is that?
> >
> Sure thing.  It's an Asus m2npv-vm.


Ah, that would be why, it's not one of the original nForce4 
(CK804/MCP04) chipsets, it's the newer nForce 430 (MCP51) chipset which 
doesn't support ADMA. NVidia said they'd be sending some patches to 
allow NCQ support on MCP51 and MCP61 chipsets back in October, but I 
haven't seen any, or information required to implement same..


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/




Re: SMP performance degradation with sysbench

2007-02-28 Thread Nish Aravamudan

On 2/27/07, Nish Aravamudan <[EMAIL PROTECTED]> wrote:

On 2/27/07, Bill Davidsen <[EMAIL PROTECTED]> wrote:
> Paulo Marques wrote:
> > Rik van Riel wrote:
> >> J.A. Magallón wrote:
> >>> [...]
> >>> It's the same to answer 4+4 queries as 8 at half the speed, isn't it?
> >>
> >> That still doesn't fix the potential Linux problem that this
> >> benchmark identified.
> >>
> >> To clarify: I don't care as much about MySQL performance as
> >> I care about identifying and fixing this potential bug in
> >> Linux.
> >
> > IIRC a long time ago there was a change in the scheduler to prevent a
> > low prio task running on a sibling of a hyperthreaded processor to slow
> > down a higher prio task on another sibling of the same processor.
> >
> > Basically the scheduler would put the low prio task to sleep during an
> > adequate task slice to allow the other sibling to run at full speed for
> > a while.



> > If that is the case, turning off CONFIG_SCHED_SMT would solve the problem.



> Note that Intel does make multicore HT processors, and hopefully when
> this code works as intended it will result in more total throughput. My
> supposition is that it currently is NOT working as intended, since
> disabling SMT scheduling is reported to help.

It does help, but we still drop off, clearly. Also, that's my
baseline, so I'm not able to reproduce the *sharp* dropoff from the
blog post yet.

> A test with MC on and SMT off would be informative for where to look next.

I'm rebooting my box with 2.6.20.1 and exactly this setup now.


Here are the results:

idle.png: average % idle over 120s runs from 1 to 32 threads
transactions.png: TPS over 120s runs from 1 to 32 threads

Hope the data is useful. All I can conclude right now is that SMT
appears to help (contradicting what I said earlier), but that MC seems
to have no effect (or no substantial effect).

Thanks,
Nish


idle.png
Description: PNG image


transactions.png
Description: PNG image


[PATCH]: Fix radeon blanking return value.

2007-02-28 Thread David Miller

If you'll recall, over a year ago, I pointed out that the current
Radeon driver erroneously returns -EINVAL for valid blanking codes,
here is a link to that thread:

http://lkml.org/lkml/2006/1/28/6

No other driver does this, and it confuses the X server into thinking
that the device does not support blanking properly.

As a result I have to switch to a console VC and back to the X server
to unblank the screen, which is ridiculous.

I've had to do this for more than 3 years and I'm really tired of this
problem still being around :-)

I looked again and there is simply no reason for the Radeon driver to
return -EINVAL for FB_BLANK_NORMAL.  It claims it wants to do this in
order to convince fbcon to blank in software, right here:

if (fb_blank(info, blank))
fbcon_generic_blank(vc, info, blank);

to software blank the screen.  But it only causes that to happen
in the FB_BLANK_NORMAL case.

That makes no sense because the Radeon code does this:

val |= CRTC_DISPLAY_DIS;

in the FB_BLANK_NORMAL case so should be blanking the hardware, and
there is therefore no reason to SW blank by returning -EINVAL.

Therefore I propose we finally apply the following patch.  No other
fbdev driver does this madness, and as I've shown above there is no
justification for the behavior at all :-)

Signed-off-by: David S. Miller <[EMAIL PROTECTED]>

diff --git a/drivers/video/aty/radeon_base.c b/drivers/video/aty/radeon_base.c
index c32b714..af7eb07 100644
--- a/drivers/video/aty/radeon_base.c
+++ b/drivers/video/aty/radeon_base.c
@@ -1032,8 +1032,7 @@ int radeon_screen_blank(struct radeonfb_info *rinfo, int 
blank, int mode_switch)
break;
}
 
-   /* let fbcon do a soft blank for us */
-   return (blank == FB_BLANK_NORMAL) ? -EINVAL : 0;
+   return 0;
 }
 
 static int radeonfb_blank (int blank, struct fb_info *info)
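
The effect of the return value on the caller can be modelled in a few lines. In this sketch blank_old and blank_new are hypothetical stand-ins for the driver before and after the patch, needs_soft_blank models the "if (fb_blank(...))" test quoted above, and the FB_BLANK_* values are written out on the assumption that they match linux/fb.h.

```c
#include <errno.h>

/* Blanking levels; values assumed to match the FB_BLANK_* constants. */
enum {
	FB_BLANK_UNBLANK = 0,
	FB_BLANK_NORMAL = 1,
	FB_BLANK_POWERDOWN = 4
};

/* Behaviour before the patch: report failure for FB_BLANK_NORMAL. */
static int blank_old(int blank)
{
	return (blank == FB_BLANK_NORMAL) ? -EINVAL : 0;
}

/* Behaviour after the patch: every handled level reports success. */
static int blank_new(int blank)
{
	(void)blank;
	return 0;
}

/* Models the caller: a non-zero return triggers the software fallback. */
static int needs_soft_blank(int (*driver_blank)(int), int blank)
{
	return driver_blank(blank) != 0;
}
```

Only the old variant reports a failure for FB_BLANK_NORMAL, and that is also the only case in which any caller inspecting the return value sees an error.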


Re: Menuconfig has butterfly effects?

2007-02-28 Thread Rob Landley
On Tuesday 27 February 2007 6:36 pm, Randy Dunlap wrote:
> > The first hunk I expect, the second I did not.  Anybody care to venture a 
> > guess why the visibility logic is unstable?
> 
> can we get .config^Wtryit ?  (version 0, not version 1)

Unfortunately, the first .config was generated by me tooling around in 
menuconfig, and then overwritten by a later build.  And now I can't reproduce 
the problem. :(

Rob
-- 
"Perfection is reached, not when there is no longer anything to add, but
when there is no longer anything to take away." - Antoine de Saint-Exupery


[PATCH] make sscanf honor %n at end of input string

2007-02-28 Thread Johannes Berg
I was playing with some code that sometimes got a string where a %n
match should have been done where the input string ended, for example
like this:

  sscanf("abc123", "abc%d%n", , );

However, the scanf function in the kernel doesn't convert the %n in that
case because it has already matched the complete input after %d. This
patch changes that.
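
For comparison, the userspace C library already fills in %n when the input ends right after the preceding conversion. A minimal sketch, assuming a hosted C library; the helper name parse_abc_number and its return convention are made up for illustration and are not part of the patch:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse "abc<number>" and report how many characters were consumed.
 * Returns the consumed length, or -1 if the number did not match.
 * Name and signature are illustrative only.
 */
int parse_abc_number(const char *input, int *value)
{
	int consumed = -1;	/* stays -1 if %n is never reached */

	assert(input != NULL && value != NULL);

	/* %n stores the count of characters read so far; it consumes no input */
	if (sscanf(input, "abc%d%n", value, &consumed) != 1)
		return -1;

	return consumed;
}
```

With the input "abc123" the helper returns 6 and stores 123, which is the behaviour the patch aims to give the in-kernel vsscanf().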

Signed-off-by: Johannes Berg <[EMAIL PROTECTED]>

---
Shrug. My use case for this collapsed, but I figured having scanf doing
that correctly might be a good thing.

 lib/vsprintf.c |9 +
 1 file changed, 9 insertions(+)

--- wireless-dev.orig/lib/vsprintf.c 2007-03-01 00:28:31.776381760 +0100
+++ wireless-dev/lib/vsprintf.c 2007-03-01 00:59:33.256381760 +0100
@@ -825,6 +825,15 @@ int vsscanf(const char * buf, const char
break;
str = next;
}
+
+   /* Now we've come all the way through so either the input string or
+* the format ended. In the former case, there can be a %n at the
+* current position in the format that needs to be filled. */
+   if (*fmt == '%' && *(fmt+1) == 'n') {
+   int *i = (int *)va_arg(args,int*);
+   *i = str - buf;
+   }
+
return num;
 }
 




Re: MOST(Media Oriented Systems Transport) Interface?

2007-02-28 Thread Bernhard Walle
Hello,

* Jan Kiszka <[EMAIL PROTECTED]> [2007-03-01 00:22]:
> Robin Getz wrote:
> > Does anyone have a pointer for a MOST (Media Oriented Systems Transport) 
> > driver?
> > 
> > http://en.wikipedia.org/wiki/Media_Oriented_Systems_Transport
> > 
> > I have seen announcements of Linux systems that support MOST:
> > 
> > http://linuxdevices.com/news/NS2586090082.html
> > 
> > But I have not seen the driver architecture, or the protocol that people 
> > are running on them...
> > 
> > Any pointers? 
> 
> There were some rumours earlier, but now I actually stumbled over the
> release - and recalled this thread.

Quite interesting that you found it. It was released yesterday, and I
wanted to write an announcement in a few days, at least at the Xenomai
lists. Anyway ...

> This might be what you are looking for:
> 
> http://most4linux.sourceforge.net/

Yes, that's a MOST driver for an OS 8604 PCI interface that I wrote as
my Diploma Thesis at Siemens. It was released as OpenSource just now.

It only supports synchronous data transfer. The Most NetServices are
in userspace. There's an OpenSource demonstration available, but no
full NetServices implementation. You can buy a license from SMC or use
the specification to re-write an OSS implementation. However, that's
all userspace, so no GPL problem here. :)

I must say that it was more a demonstration of how a Linux driver could
be ported to a real-time Linux extension than a real production-quality
MOST driver for Linux.

Well, but of course, it works. At least it did in my environment
(i386/single-CPU).

You can find more information on the homepage ...



Regards,
Bernhard

PS: SUSE has nothing to do with that driver, I only write with my SUSE
address because I don't want to re-subscribe because of one mail.


Re: Menuconfig has butterfly effects?

2007-02-28 Thread Rob Landley
On Tuesday 27 February 2007 6:43 pm, Gregor Jasny wrote:
> Hi,
> 
> 2007/2/28, Rob Landley <[EMAIL PROTECTED]>:
> > I ran "make ARCH=x86_64 menuconfig", did a lot of editing, and saved
> > the .config.  Then I copied that to a backup, ran "make oldconfig" on the
> 
> I'd try with "make ARCH=x86_64 oldconfig"

I did.  Sorry for the confusion there.  (If I'd gotten that wrong, there'd 
have been a bit more change. :)

Rob
-- 
"Perfection is reached, not when there is no longer anything to add, but
when there is no longer anything to take away." - Antoine de Saint-Exupery


Re: debug registers and fork

2007-02-28 Thread Alan Stern
On Wed, 28 Feb 2007, Roland McGrath wrote:

> It is true that debug registers are inherited by fork and clone.
> I am 99% sure that this was never specifically intended, but it
> has been this way for a long time (since 2.4 at least).  It's an
> implicit consequence of the do_fork implementation style, which
> does a blind copy of the whole task_struct and then explicitly
> reinitializes some individual fields.  I suppose this has some
> benefit or other, but it is very prone to new pieces of state
> getting implicitly copied without the person adding that new state
> ever consciously deciding what its inheritance semantics should be.
> 
> Alan Stern is working on a revamp of the x86 debug register
> support.  This is a fine opportunity to clean this area up and
> decide positively what the semantics ought to be.

Absolutely.  Right now I just have a placeholder function with a note
about checking for CLONE_PTRACE.  The cleanest solution, far and away,
would be to have the child process inherit no breakpoints and no debug
register values.
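
The inheritance problem Roland describes can be sketched outside the kernel. The struct and function names below (toy_task, toy_fork_blind, toy_fork_clean) are hypothetical stand-ins, not the real task_struct or do_fork code; the point is only to show how a blind copy carries debug register state into the child unless someone explicitly clears it:

```c
#include <assert.h>
#include <string.h>

#define TOY_NR_DEBUGREGS 8

/* Hypothetical stand-in for task_struct; not the kernel's layout. */
struct toy_task {
	int pid;
	unsigned long debugreg[TOY_NR_DEBUGREGS];
};

/* Blind copy, then fix up selected fields: debugreg[] is inherited by accident. */
void toy_fork_blind(struct toy_task *child, const struct toy_task *parent,
		    int new_pid)
{
	assert(child != NULL && parent != NULL);
	memcpy(child, parent, sizeof(*child));
	child->pid = new_pid;
}

/* Same copy, plus an explicit decision: the child starts with no breakpoints. */
void toy_fork_clean(struct toy_task *child, const struct toy_task *parent,
		    int new_pid)
{
	toy_fork_blind(child, parent, new_pid);
	memset(child->debugreg, 0, sizeof(child->debugreg));
}
```

The second variant corresponds to the "inherit no breakpoints and no debug register values" option; whether CLONE_PTRACE should change that is left open here, as it is in the thread.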

Alan Stern



Re: [PATCH 1/2] Define FIXED_PORT flag for serial_core

2007-02-28 Thread David Gibson
On Wed, Feb 28, 2007 at 10:26:30PM +, Russell King wrote:
> On Tue, Feb 20, 2007 at 02:19:51PM +1100, David Gibson wrote:
> > Therefore, this patch defines a UPF_FIXED_PORT flag for the uart_port
> > structure.  If this flag is set when the serial port is configured,
> > any attempts to alter the port's type, io address, irq or base clock
> > with setserial are ignored.
> 
> I've been wondering about this, and it is questionable whether we
> should allow any serial port which isn't owned by the legacy platform
> device (the one called "serial8250", iow by the 8250 driver itself)
> to have the base addresses and interrupts changed.
> 
> IOW, we apply this "fixed port" to any port registered by probe
> modules external to the 8250 driver itself, such as PCI, PNP, etc.

Sounds reasonable to me.  But maybe in that case we should invert the
sense of the flag.  UPF_MOVABLE_PORT or UPF_USER_CONFIGURABLE or
something.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


Linux 2.6.16.43-rc1

2007-02-28 Thread Adrian Bunk
New hwmon drivers since 2.6.16.42 for the following hardware:
- National Semiconductor pc87427
- SMSC lpc47m192 and lpc47m997
- Winbond w83791d


Location:
ftp://ftp.kernel.org/pub/linux/kernel/people/bunk/linux-2.6.16.y/testing/

git tree:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.16.y.git


Changes since 2.6.16.42:

Adrian Bunk (1):
  Linux 2.6.16.43-rc1

Alexey Dobriyan (1):
  [IPV4/IPV6] multicast: Check add_grhead() return value

Charles Spirakis (2):
  HWMON: w83791d: New hardware monitoring driver for the Winbond W83791D
  w83791d: Documentation update

Francois Romieu (1):
  sis190: failure to set the MAC address from EEPROM

Hartmut Rick (1):
  smsc47m192: New hwmon driver for SMSC LPC47M192/997

Ilpo Järvinen (1):
  [TCP]: Prevent pseudo garbage in SYN's advertized window

Jean Delvare (3):
  hwmon: New PC87427 hardware monitoring driver
  hwmon: Add support for the Winbond W83687THF
  i2c-isa: Restore driver owner

Jim Cromie (2):
  hwmon: Allow sensor attributes arrays
  hwmon: Refactor SENSOR_DEVICE_ATTR_2

Jordan Crouse (1):
  hwmon lm83: Add LM82 support

Kirill Korotaev (1):
  fix ext3 block bitmap leakage

Marcel Siegert (1):
  V4L/DVB: Dvbdev: fix illegal re-usage of fileoperations struct

Martin Devera (1):
  I2C: i2c-piix4: Add Broadcom HT-1000 support

Patrick McHardy (1):
  [DECNET]: Fix sfuzz hanging on 2.6.18

Rudolf Marek (1):
  i2c-piix4: Add ATI IXP200/300/400 support

Stephen Hemminger (6):
  sky2: fix ram buffer allocation settings
  sky2: allow multicast pause frames
  sky2: fix for use on big endian
  sky2: more stats
  sky2: add more pci ids
  sky2: email and version change.


 Documentation/hwmon/lm83            |   16
 Documentation/hwmon/pc87427         |   38
 Documentation/hwmon/smsc47m192      |  102 ++
 Documentation/hwmon/sysfs-interface |    6
 Documentation/hwmon/w83627hf        |    4
 Documentation/hwmon/w83791d         |  120 ++
 Documentation/i2c/busses/i2c-piix4  |    4
 Makefile                            |    2
 drivers/hwmon/Kconfig               |   57 +
 drivers/hwmon/Makefile              |    3
 drivers/hwmon/it87.c                |    1
 drivers/hwmon/lm78.c                |    1
 drivers/hwmon/lm83.c                |   50 -
 drivers/hwmon/pc87360.c             |    1
 drivers/hwmon/pc87427.c             |  627 +
 drivers/hwmon/sis5595.c             |    1
 drivers/hwmon/smsc47b397.c          |    1
 drivers/hwmon/smsc47m1.c            |    1
 drivers/hwmon/smsc47m192.c          |  648 ++
 drivers/hwmon/via686a.c             |    1
 drivers/hwmon/vt8231.c              |    1
 drivers/hwmon/w83627ehf.c           |    1
 drivers/hwmon/w83627hf.c            |   73 +
 drivers/hwmon/w83781d.c             |    1
 drivers/hwmon/w83791d.c             | 1256
 drivers/i2c/busses/Kconfig          |    9
 drivers/i2c/busses/i2c-piix4.c      |   10
 drivers/media/dvb/dvb-core/dvbdev.c |   13
 drivers/net/sis190.c                |    2
 drivers/net/sky2.c                  |  146 ++-
 fs/ext3/inode.c                     |    1
 include/linux/hwmon-sysfs.h         |   24
 include/linux/pci_ids.h             |    4
 net/decnet/af_decnet.c              |    4
 net/ipv4/igmp.c                     |    2
 net/ipv4/tcp_output.c               |    4
 net/ipv6/mcast.c                    |    2
 37 files changed, 3126 insertions(+), 111 deletions(-)



Re: MOST(Media Oriented Systems Transport) Interface?

2007-02-28 Thread Jan Kiszka
Robin Getz wrote:
> Does anyone have a pointer for a MOST (Media Oriented Systems Transport) 
> driver?
> 
> http://en.wikipedia.org/wiki/Media_Oriented_Systems_Transport
> 
> I have seen announcements of Linux systems that support MOST:
> 
> http://linuxdevices.com/news/NS2586090082.html
> 
> But I have not seen the driver architecture, or the protocol that people are 
> running on them...
> 
> Any pointers? 

There were some rumours earlier, but now I actually stumbled over the
release - and recalled this thread.

This might be what you are looking for:

http://most4linux.sourceforge.net/

Jan


[PATCH] Make XFS workqueues nonfreezable

2007-02-28 Thread Rafael J. Wysocki
Since freezable workqueues are broken in 2.6.21-rc
(cf. http://marc.theaimsgroup.com/?l=linux-kernel&m=116855740612755,
http://marc.theaimsgroup.com/?l=linux-kernel&m=117261312523921&w=2)
it's better to change the only user of them, which is XFS, to use "normal"
nonfreezable workqueues.

Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
---
 fs/xfs/linux-2.6/xfs_buf.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6.21-rc2/fs/xfs/linux-2.6/xfs_buf.c
===
--- linux-2.6.21-rc2.orig/fs/xfs/linux-2.6/xfs_buf.c
+++ linux-2.6.21-rc2/fs/xfs/linux-2.6/xfs_buf.c
@@ -1829,11 +1829,11 @@ xfs_buf_init(void)
if (!xfs_buf_zone)
goto out_free_trace_buf;
 
-   xfslogd_workqueue = create_freezeable_workqueue("xfslogd");
+   xfslogd_workqueue = create_workqueue("xfslogd");
if (!xfslogd_workqueue)
goto out_free_buf_zone;
 
-   xfsdatad_workqueue = create_freezeable_workqueue("xfsdatad");
+   xfsdatad_workqueue = create_workqueue("xfsdatad");
if (!xfsdatad_workqueue)
goto out_destroy_xfslogd_workqueue;
 


[PATCH 1/1] - Altix: reinitialize acpi tables

2007-02-28 Thread John Keller
To provide compatibility with SN kernels that do and do not
have ACPI IO support, the SN PROM must build different
versions of some ACPI tables based on which kernel is booting.
As such, the tables may have to change at kernel boot time.
By default, prior to kernel boot, the PROM builds an empty
DSDT (header only) and no SSDTs. If an ACPI capable kernel
boots, the kernel will notify the PROM, at platform setup time,
and the PROM will build full DSDT and SSDT tables.

With the latest changes to acpi_table_init(), the table lengths
are saved, and when our PROM changes them, the changes are not seen,
and the kernel will crash on boot. Because of issues with kexec support,
we are not able to create the tables prior to acpi_table_init().
As a result, we are making a second call to acpi_table_init() to
process the rebuilt DSDT and SSDTs.

Signed-off-by: John Keller <[EMAIL PROTECTED]>
---


Index: release/arch/ia64/sn/kernel/setup.c
===
--- release.orig/arch/ia64/sn/kernel/setup.c 2007-02-28 11:02:34.558139870 -0600
+++ release/arch/ia64/sn/kernel/setup.c 2007-02-28 11:02:39.362737953 -0600
@@ -397,6 +397,8 @@ void __init sn_setup(char **cmdline_p)
ia64_sn_set_os_feature(OSF_PCISEGMENT_ENABLE);
ia64_sn_set_os_feature(OSF_ACPI_ENABLE);
 
+   /* Load the new DSDT and SSDT tables into the global table list. */
+   acpi_table_init();
 
 #if defined(CONFIG_VT) && defined(CONFIG_VGA_CONSOLE)
/*


Re: Wanted: simple, safe x86 stack overflow detection

2007-02-28 Thread Bill Irwin
On Feb 28 2007 15:20, Bill Irwin wrote:
>> I don't know about the rest of the world, but halting the system in the
>> case of memory corruption sounds like an extremely good idea to me.

On Thu, Mar 01, 2007 at 12:36:47AM +0100, Jan Engelhardt wrote:
> Just because a rather "unimportant" driver (e.g. parport) might oops
> thanks to a now-invalid address after memory corruption, I'd still like
> to shutdown the system normally - which should be possible when not
> using parport after said corruption.

Panic on oops/bug is sysctl-activated as things now stand, so you're
all set.
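
The sysctl in question is, as far as I know, kernel.panic_on_oops, which appears as /proc/sys/kernel/panic_on_oops. A small sketch for checking it from userspace follows; the helper name read_int_sysctl is made up, and the path is passed in rather than hard-coded so the function can be tried on any file:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read an integer sysctl from a procfs-style file.
 * Returns the value, or -1 if the file cannot be opened or parsed.
 */
int read_int_sysctl(const char *path)
{
	FILE *f;
	int value = -1;

	assert(path != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%d", &value) != 1)
		value = -1;
	fclose(f);
	return value;
}
```

Calling read_int_sysctl("/proc/sys/kernel/panic_on_oops") should give 0 when an oops is allowed to continue and 1 when it panics the machine.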


-- wli


Re: [PATCH 1/2] mv643xx_eth: move mac_addr inside of mv643xx_eth_platform_data

2007-02-28 Thread Dale Farnsworth
On Wed, Feb 28, 2007 at 03:11:03PM -0800, Stephen Hemminger wrote:
> On Wed, 28 Feb 2007 15:40:31 -0700
> "Dale Farnsworth" <[EMAIL PROTECTED]> wrote:
> 
> > The information contained within platform_data should be self-contained.
> > Replace the pointer to a MAC address with the actual MAC address in
> > struct mv643xx_eth_platform_data.
> > 
> > Signed-off-by: Dale Farnsworth <[EMAIL PROTECTED]>
> > 
> > Index: b/drivers/net/mv643xx_eth.c
> > ===
> > --- a/drivers/net/mv643xx_eth.c
> > +++ b/drivers/net/mv643xx_eth.c
> > @@ -1380,7 +1380,9 @@ static int mv643xx_eth_probe(struct plat
> >  
> > pd = pdev->dev.platform_data;
> > if (pd) {
> > -   if (pd->mac_addr)
> > +   static u8 zero_mac_addr[6] = { 0 };
> > +
> > +   if (memcmp(pd->mac_addr, zero_mac_addr, 6) != 0)
> > memcpy(dev->dev_addr, pd->mac_addr, 6);
> 
> 
> is_zero_ether_addr() is faster/cleaner for this

Thanks.  I'll follow up with a modified patch in a day or two.
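
For readers without the kernel headers at hand, the check Stephen suggests amounts to testing that all six address bytes are zero. The function below is a userspace stand-in written for illustration (the toy_ prefix marks it as not the kernel's is_zero_ether_addr() implementation):

```c
#include <assert.h>
#include <stddef.h>

#define TOY_ETH_ALEN 6

/* Userspace stand-in: returns 1 if all six bytes of the address are zero. */
int toy_is_zero_ether_addr(const unsigned char *addr)
{
	unsigned char acc = 0;
	size_t i;

	assert(addr != NULL);
	for (i = 0; i < TOY_ETH_ALEN; i++)
		acc |= addr[i];	/* any non-zero byte sets a bit */
	return acc == 0;
}
```

In the probe function this would replace the memcmp() against a static all-zero array, so the platform-supplied address is copied only when it is non-zero.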

-Dale


Ramdisk size

2007-02-28 Thread linux-os (Dick Johnson)

Hello,
On an embedded system, I use two ramdisks. They are both
16 megabytes in size. I can create them interactively in
the normal way with mke2fs. However, when the system is
booted using isolinux, the RAM disks become corrupted.
Apparently isolinux.cfg's ramdisk_size (not documented,
only referenced, BTW) is not in 1k increments. That value
seems to affect all ramdisks, not just the `initrd` one.
I can prevent corruption by setting the value to 20
(way too much). However I won't have any RAM left for
applications! I need to know how much RAM a ramdisk of
16386 bytes takes, and how to set it in isolinux.cfg
(what the multiplier is).
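
As a point of reference, the kernel's ramdisk documentation describes the ramdisk_size= boot parameter in units of 1 KiB. Assuming isolinux hands the value to the kernel unchanged (an assumption, not something confirmed in this thread), the arithmetic for a 16 megabyte ramdisk is sketched below; the helper name is made up:

```c
#include <assert.h>
#include <limits.h>

/* Convert a size in MiB to KiB, the unit assumed here for ramdisk_size. */
unsigned long ramdisk_kib_from_mib(unsigned long mib)
{
	assert(mib <= ULONG_MAX / 1024UL);	/* guard against overflow */
	return mib * 1024UL;
}
```

Under that assumption a 16 MiB ramdisk would need ramdisk_size=16384.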

The file-system mounts properly, but a write of one byte
causes the following error:

EXT-FS error (Device ram2);
ext2_new_block: Allocation block in system zone block = 129

e2fsck reports:
+(129--256) + (8321--8448)
block bitmap differences

It's difficult to find out what is being corrupted because
`mke2fs` never creates the same thing twice!! There is some
time-stuff in it, therefore I can't find out what actually
is getting trashed. Currently, it doesn't look like anything
at the ends of the raw disk get trashed. Instead, it looks
like the ramdisk size when booted using isolinux is simply
not the same size as when the ramdisk contents were created.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.16.24 on an i686 machine (5592.65 BogoMips).
New book: http://www.AbominableFirebug.com/


Re: PID entries in /proc sorted by number, not start time in 2.6.19

2007-02-28 Thread Eric W. Biederman
Chuck Ebbert <[EMAIL PROTECTED]> writes:

> Starting with kernel 2.6.19, the process directories in
> /proc are sorted by number. They were sorted by process
> start time in 2.6.18 and earlier. This makes the output
> of procps come out in that order too, pissing off users
> who are used to the old way.
>
> To reproduce:
>   1. Wrap your PID numbers.
>   2. Do ls -fl /proc
>   3. Look at output of ps command.
>
> Compare 2.6.18 to 2.6.19.
>
> See also:
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=230227

Apologies, but this was a bug fix for a more serious issue.  The code
to report the directory entries by start time was fundamentally broken.
In particular the sequence:
opendir
readdir
readdir
readdir

closedir

can miss processes that exist for the entire duration of that
sequence.  Which is non-posix, non-intuitive, and has no reasonable
work around.

The sorting by pid happened as a side effect of finding a stable token
we can come back to so we can at least guarantee normal readdir
semantics: objects that exist for the entire readdir are guaranteed to
be displayed, while objects that come into existence or are deleted
during the readdir may be missed.  That isn't perfect but it is a
usable semantic.
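
The "stable token" idea can be sketched with a sorted array standing in for the process table. This is a toy model with hypothetical names, not the procfs code: each readdir step resumes from "the smallest pid at or above the cursor", so a pid that exists for the whole walk cannot be skipped, whatever else appears or disappears in between.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the smallest PID in the sorted array that is >= cursor,
 * or -1 when the listing is exhausted. The array stands in for the
 * kernel's process table; names here are hypothetical.
 */
int toy_next_pid(const int *sorted_pids, size_t count, int cursor)
{
	size_t i;

	assert(sorted_pids != NULL || count == 0);
	for (i = 0; i < count; i++)
		if (sorted_pids[i] >= cursor)
			return sorted_pids[i];
	return -1;
}
```

A caller would set the cursor to the returned pid plus one before the next step. A start-time ordering has no such resumable position once the remembered entry exits, which is the failure described above.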

Eric


Re: Wanted: simple, safe x86 stack overflow detection

2007-02-28 Thread Jan Engelhardt

On Feb 28 2007 15:20, Bill Irwin wrote:
>
>I don't know about the rest of the world, but halting the system in the
>case of memory corruption sounds like an extremely good idea to me.

Just because a rather "unimportant" driver (e.g. parport) might oops
thanks to a now-invalid address after memory corruption, I'd still like
to shutdown the system normally - which should be possible when not
using parport after said corruption.


Jan
-- 
ft: http://freshmeat.net/p/chaostables/


Re: [stable] [patch 00/21] 2.6.19-stable review

2007-02-28 Thread Eric W. Biederman
Greg KH <[EMAIL PROTECTED]> writes:

> On Wed, Feb 28, 2007 at 05:28:27AM -0700, Eric W. Biederman wrote:
>> 
>> What are the rules that are supposed to govern backports to stable
>> trees these days anyway?
>
> Documentation/stable_kernel_rules.txt


Ok, if that is really what we are going with, then this silly patch isn't
simple enough for a backport.  There used to be other rules to the effect
that the patch must be merged in mainline, and we only backport to one
kernel revision.

I think it fails the 100 lines with context test.

The meaning of obviously correct is a little bit nebulous.  But if
something is obvious multiple people can easily understand what
is going on.  I haven't gotten any feedback that has said yes I
see what you are doing on the mentioned patch.

I'm really not certain how this patch got seriously proposed then.
I guess it was the seriousness of the issue of people's boxes falling
over.

I guess somewhere I got the rules for weird vendor trees confused with
our stable branches.  The relaxed stable branch rules probably did it
to me.

So the best we can do is the commit below for a backport.  It doesn't
fix the issue but it generally keeps the machines from falling over.

p.s. The copy below is whitespace damaged because I just cut and
pasted it into this email.

commit 2fb12a9bca5ad9aa6dcd2c639b4a7656a8843ef8
Author: Eric W. Biederman <[EMAIL PROTECTED]>
Date:   Tue Feb 13 13:26:25 2007 +0100

[PATCH] x86-64: survive having no irq mapping for a vector

Occasionally the kernel has bugs that result in no irq being found for a
given cpu vector.  If we acknowledge the irq the system has a good chance
of continuing even though we dropped an irq message.  If we continue to
simply print a message and not acknowledge the irq the system is likely to
become non-responsive shortly there after.

AK: Fixed compilation for UP kernels

Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]>
Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>
Cc: "Luigi Genoni" <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>

diff --git a/arch/x86_64/kernel/irq.c b/arch/x86_64/kernel/irq.c
index 0c06af6..3bc30d2 100644
--- a/arch/x86_64/kernel/irq.c
+++ b/arch/x86_64/kernel/irq.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 
 atomic_t irq_err_count;
 
@@ -120,9 +121,14 @@ asmlinkage unsigned int do_IRQ(struct pt_regs *regs)
 
if (likely(irq < NR_IRQS))
generic_handle_irq(irq);
-   else if (printk_ratelimit())
-   printk(KERN_EMERG "%s: %d.%d No irq handler for vector\n",
-   __func__, smp_processor_id(), vector);
+   else {
+   if (!disable_apic)
+   ack_APIC_irq();
+
+   if (printk_ratelimit())
+   printk(KERN_EMERG "%s: %d.%d No irq handler for vector\n",
+   __func__, smp_processor_id(), vector);
+   }
 
irq_exit();
 



Re: 2.6.20 SATA error

2007-02-28 Thread Gerhard Mack
On Wed, 28 Feb 2007, Robert Hancock wrote:
> Gerhard Mack wrote:
> > On Wed, 28 Feb 2007, Charles Shannon Hendrix wrote:
> > 
> > > On Wed, 28 Feb 2007 13:25:00 -0500 (EST)
> > > Gerhard Mack <[EMAIL PROTECTED]> wrote:
> > > 
> > >  
> > > > > In another thread, I think they were saying it was either a SATA
> > > > > chipset
> > > > > driver bug, or a problem in libata core.
> > > > I also have an nforce4.
> > > On another mailing list, someone with an Intel chipset is reporting the
> > > same
> > > problem, and also that others without nforce chipsets are seeing it.
> > 
> > I was reaching inside my computer to check something and heard the thing
> > click and got the same error message.
> > 
> > Turns out the adaptor that goes between SATA drive and the old style power
> > connector was loose on the drive side.  Doesn't seem to me like it was very
> > snug fitting to begin with.  I changed it to one of the proper SATA
> > connectors coming off the power supply and it doesn't do that anymore.
> > 
> > Sorry for the false alarm, 
> 
> There is one thing that seems odd, if you do have an nForce4 chipset, the
> kernel should be running the SATA controller in ADMA mode in 2.6.20, but it
> doesn't seem like it is from your dmesg output. Can you post the output of
> "lspci -vvn"? Also what kind of motherboard is that?
> 
Sure thing.  It's an Asus m2npv-vm.

Gerhard

mgerhard:/home/gmack# lspci -vvn
00:00.0 0500: 10de:02f0 (rev a2)
Subsystem: 1043:81c0
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- SERR- TAbort- SERR+ TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- Reset- FastB2B-
Capabilities: [40] Subsystem: 10de:
Capabilities: [48] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] Message Signalled Interrupts: Mask- 64bit+ Queue=0/1 Enable+
Address: fee0300c  Data: 4141
Capabilities: [60] HyperTransport: MSI Mapping
Capabilities: [80] Express Root Port (Slot+) IRQ 0
Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag-
Device: Latency L0s <512ns, L1 <4us
Device: Errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
Device: RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
Device: MaxPayload 128 bytes, MaxReadReq 512 bytes
Link: Supported Speed 2.5Gb/s, Width x1, ASPM L0s L1, Port 2
Link: Latency L0s <512ns, L1 <4us
Link: ASPM Disabled RCB 64 bytes CommClk- ExtSynch-
Link: Speed 2.5Gb/s, Width x1
Slot: AtnBtn- PwrCtrl- MRL- AtnInd- PwrInd- HotPlug- Surpise-
Slot: Number 0, PowerLimit 0.00
Slot: Enabled AtnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq-
Slot: AttnInd Off, PwrInd On, Power-
Root: Correctable- Non-Fatal- Fatal- PME-
Capabilities: [100] Virtual Channel

00:03.0 0604: 10de:02fd (rev a1) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- Reset- FastB2B-
Capabilities: [40] Subsystem: 10de:
Capabilities: [48] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] Message Signalled Interrupts: Mask- 64bit+ Queue=0/1 Enable+
Address: fee0300c  Data: 4149
Capabilities: [60] HyperTransport: MSI Mapping
Capabilities: [80] Express Root Port (Slot+) IRQ 0
Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag-
Device: Latency L0s <512ns, L1 <4us
Device: Errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
Device: RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
Device: MaxPayload 128 bytes, MaxReadReq 512 bytes
Link: Supported Speed 2.5Gb/s, Width x1, ASPM L0s L1, Port 1
Link: Latency L0s <512ns, L1 <4us
Link: ASPM Disabled RCB 64 bytes CommClk- ExtSynch-
Link: Speed 2.5Gb/s, Width x1
Slot: AtnBtn- PwrCtrl- MRL- AtnInd- PwrInd- HotPlug- Surpise-
Slot: Number 0, PowerLimit 0.00
Slot: Enabled AtnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq-
Slot: AttnInd Off, PwrInd On, Power-
Root: Correctable- Non-Fatal- Fatal- PME-
Capabilities: [100] Virtual Channel

00:04.0 0604: 

2.6.21-rc1 build errors with X86_VOYAGER=y

2007-02-28 Thread andrew hendry

2.6.21-rc2-git2
from some make randconfig

# CONFIG_SMP is not set
CONFIG_X86_VOYAGER=y
CONFIG_X86_MSR=y
CONFIG_X86_CPUID=y

LD  .tmp_vmlinux1
arch/i386/kernel/built-in.o: In function `vic_sys_interrupt':
(.text+0x3141): undefined reference to `smp_vic_sys_interrupt'
arch/i386/kernel/built-in.o: In function `vic_cmn_interrupt':
(.text+0x3171): undefined reference to `smp_vic_cmn_interrupt'
arch/i386/kernel/built-in.o: In function `vic_cpi_interrupt':
(.text+0x31a1): undefined reference to `smp_vic_cpi_interrupt'
arch/i386/kernel/built-in.o: In function `qic_timer_interrupt':
(.text+0x31d1): undefined reference to `smp_qic_timer_interrupt'
arch/i386/kernel/built-in.o: In function `qic_invalidate_interrupt':
 (.text+0x3201): undefined reference to `smp_qic_invalidate_interrupt'
arch/i386/kernel/built-in.o: In function `qic_reschedule_interrupt':
(.text+0x3231): undefined reference to `smp_qic_reschedule_interrupt'
arch/i386/kernel/built-in.o: In function `qic_enable_irq_interrupt':
(.text+0x3261): undefined reference to `smp_qic_enable_irq_interrupt'
arch/i386/kernel/built-in.o: In function `qic_call_function_interrupt':
(.text+0x3291): undefined reference to `smp_qic_call_function_interrupt'
arch/i386/kernel/built-in.o: In function `setup_bootmem_allocator':
(.init.text+0x606): undefined reference to `find_smp_config'
arch/i386/mach-voyager/built-in.o: In function `voyager_power_off':
(.text+0x1d4): undefined reference to `voyager_cat_power_off'
arch/i386/mach-voyager/built-in.o: In function `thread':
voyager_thread.c:(.text+0x41f): undefined reference to `voyager_status'
voyager_thread.c:(.text+0x476): undefined reference to `voyager_status'
voyager_thread.c:(.text+0x489): undefined reference to `voyager_cat_psi'
voyager_thread.c:(.text+0x4c3): undefined reference to `voyager_status'
voyager_thread.c:(.text+0x4d6): undefined reference to `voyager_status'

CONFIG_SMP=y
CONFIG_X86_VOYAGER=y
CONFIG_X86_MSR=y
CONFIG_X86_CPUID=y

arch/i386/kernel/built-in.o: In function `msr_read':
msr.c:(.text+0xbd15): undefined reference to `smp_call_function_single'
arch/i386/kernel/built-in.o: In function `msr_write':
msr.c:(.text+0xbe41): undefined reference to `smp_call_function_single'
arch/i386/kernel/built-in.o: In function `cpuid_read':
cpuid.c:(.text+0xc069): undefined reference to `smp_call_function_single'


CONFIG_SMP=y
CONFIG_X86_VOYAGER=y
# CONFIG_X86_MSR is not set
CONFIG_X86_CPUID=y

arch/i386/kernel/built-in.o: In function `cpuid_read':
cpuid.c:(.text+0xbce9): undefined reference to `smp_call_function_single'

CONFIG_SMP=y
CONFIG_X86_VOYAGER=y
# CONFIG_X86_MSR is not set
# CONFIG_X86_CPUID is not set

Builds ok


Re: 2.6.20 SATA error

2007-02-28 Thread Robert Hancock

Charles Shannon Hendrix wrote:

On Wed, 28 Feb 2007 07:40:23 -0500 (EST)
Gerhard Mack <[EMAIL PROTECTED]> wrote:

hello, 


Can someone tell me what this means?
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x40 action 0x2 frozen
ata1.00: cmd 35/00:00:40:a6:23/00:04:00:00:00/e0 tag 0 cdb 0x0 data 524288 out
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1: port is slow to respond, please be patient (Status 0xd0)
ata1: port failed to respond (30 secs, Status 0xd0)
ata1: soft resetting port
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: configured for UDMA/100


I am fairly certain this is a bug in the 2.6.20 kernel.

I never see it in 2.6.19*, just 2.6.20.

It is some kind of bug in the SATA code paths, or at least that's all it
appears to affect on my system.

What chipset do you have?

I have an nforce4 chipset.

In another thread, I think they were saying it was either a SATA chipset
driver bug, or a problem in libata core.


There's a known issue with sata_nv on nForce4 controllers running in 
ADMA mode in 2.6.20 (the first released kernel with ADMA support) where 
commands can time out when switching between NCQ commands and non-NCQ 
commands. Hopefully this is fixed in 2.6.21-rc. This doesn't seem to be 
the issue here, since his system isn't using ADMA mode, for reasons 
unclear to me..


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: Wanted: simple, safe x86 stack overflow detection

2007-02-28 Thread Bill Irwin
On Wed, Feb 28, 2007 at 09:41:44PM +0100, Andi Kleen wrote:
> Likely already too late then -- if critical state is overwritten
> you crashed before. Also a lot of stack intensive code has
> relatively large unused holes, so it might miss the canary completely.
> Anyways if you want a crash on context switch in the non
> hole case you can probably get it by just rearranging thread_info a bit.
> e.g. put preempt_count first. Any corruption of that will lead
> to schedule complaining.
> Don't think it is worth it though.
> I suppose one could have a CONFIG_DEBUG_STACK_OVERFLOW that gets
> the stacks from vmalloc which would catch any overflow with its
> guard pages. That is, you would need to change __pa() to handle
> that too because there might be still some drivers that do
> DMA on stack addresses.  Would be somewhat ugly but doable.
> But I have my doubts it is worth it again -- in my experience static
> analysis works well enough to trace them down and 
> there are not that many anyways.

I don't know about the rest of the world, but halting the system in the
case of memory corruption sounds like an extremely good idea to me.
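
For readers following the canary idea, here is a minimal userspace sketch of the kind of check that could run at context-switch time. The struct, the magic value and both function names are assumptions made up for illustration; this is not the kernel's thread_info layout.

```c
#include <string.h>

#define DEMO_STACK_WORDS  1024          /* hypothetical stack size, in words */
#define DEMO_STACK_CANARY 0x57AC6E9DUL  /* arbitrary magic value */

struct demo_stack {
	/* words[0] is the far end, the first word an overflow would hit */
	unsigned long words[DEMO_STACK_WORDS];
};

/* Clear the stack and plant the canary at its far end. */
void demo_canary_init(struct demo_stack *s)
{
	memset(s->words, 0, sizeof(s->words));
	s->words[0] = DEMO_STACK_CANARY;
}

/* Return 1 if the canary is intact, 0 if something overwrote it. */
int demo_canary_ok(const struct demo_stack *s)
{
	return s->words[0] == DEMO_STACK_CANARY;
}
```

A scheduler hook would call demo_canary_ok() on the outgoing task and halt if it returns 0. As Andi notes above, a frame with a large unused hole can step over the canary word, in which case the check still passes.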


-- wli


Re: solved Re: 2.6.20 SATA error

2007-02-28 Thread Robert Hancock

Gerhard Mack wrote:

On Wed, 28 Feb 2007, Charles Shannon Hendrix wrote:


On Wed, 28 Feb 2007 13:25:00 -0500 (EST)
Gerhard Mack <[EMAIL PROTECTED]> wrote:

 

In another thread, I think they were saying it was either a SATA chipset
driver bug, or a problem in libata core.

I also have an nforce4.

On another mailing list, someone with an Intel chipset is reporting the same
problem, and also that others without nforce chipsets are seeing it.


I was reaching inside my computer to check something and heard the thing 
click and got the same error message.


Turns out the adaptor that goes between SATA drive and the old style power 
connector was loose on the drive side.  Doesn't seem to me like it was 
very snug fitting to begin with.  I changed it to one of the proper SATA 
connectors coming off the power supply and it doesn't do that anymore.


Sorry for the false alarm, 


There is one thing that seems odd, if you do have an nForce4 chipset, 
the kernel should be running the SATA controller in ADMA mode in 2.6.20, 
but it doesn't seem like it is from your dmesg output. Can you post the 
output of "lspci -vvn"? Also what kind of motherboard is that?


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-02-28 Thread Ingo Molnar

* Linus Torvalds <[EMAIL PROTECTED]> wrote:

> So I would repeat my call for getting rid of the atoms, and instead 
> just do a "single submission" at a time. Do the linking by running a 
> threadlet that has user space code (and the stack overhead), which is 
> MUCH more flexible. And do nonlinked single system calls without 
> *either* atoms *or* a user-space stack footprint.

I agree that threadlets are much more flexible - and they might in fact 
win in the long run due to that.

i'll add a one-shot syscall API in v6 and then we'll be able to see them 
side by side. (wanted to do that in v5 but it got delayed by x86_64 
issues, x86_64's entry code is certainly ... tricky wrt. ptregs saving)

wrt. one-shot syscalls, the user-space stack footprint would still 
probably be there, because even async contexts that only do single-shot 
processing need to drop out of kernel mode to handle signals. We could 
probably hack the signal routing code to never deliver to such threads 
(but bounce it over to the head context, which is always available) but 
i think that would be a bit messy. (i dont exclude it though)

I think syslets might also act as a prototyping platform for new system 
calls. If any particular syslet atom string comes up more frequently 
(and we could even automate the profiling of that within the kernel), 
then it's a good candidate for a standalone syscall. Currently we dont 
have such information in any structured way: the connection between 
streams of syscalls done by applications is totally opaque to the 
kernel.

Also, i genuinely believe that to be competitive (performance-wise) with 
fully in-kernel queueing solutions, we need syslets - the syslet NULL 
overhead is 20 cycles (this includes copying, engine overhead, etc.), 
the syscall NULL overhead is 280-300 cycles. It could probably be made 
more capable by providing more special system calls like sys_upcall() to 
execute a user-space function. (that way a syslet could still execute 
user-space code without having to exit out of kernel mode too 
frequently) Or perhaps a sys_x86_bytecode() call, that would execute a 
pre-verified, kernel-stored sequence of simplified x86 bytecode, using 
the kernel stack.

My fear is that if we force all these things over to one-shot syscalls 
or threadlets then this will become another second-tier mechanism. By 
providing syslets we give the message: "sure, come on and play within 
the kernel if you want to, but it's not easy".

Ingo


Re: [PATCH 1/2] mv643xx_eth: move mac_addr inside of mv643xx_eth_platform_data

2007-02-28 Thread Stephen Hemminger
On Wed, 28 Feb 2007 15:40:31 -0700
"Dale Farnsworth" <[EMAIL PROTECTED]> wrote:

> The information contained within platform_data should be self-contained.
> Replace the pointer to a MAC address with the actual MAC address in
> struct mv643xx_eth_platform_data.
> 
> Signed-off-by: Dale Farnsworth <[EMAIL PROTECTED]>
> 
> Index: b/drivers/net/mv643xx_eth.c
> ===
> --- a/drivers/net/mv643xx_eth.c
> +++ b/drivers/net/mv643xx_eth.c
> @@ -1380,7 +1380,9 @@ static int mv643xx_eth_probe(struct plat
>  
>   pd = pdev->dev.platform_data;
>   if (pd) {
> - if (pd->mac_addr)
> + static u8 zero_mac_addr[6] = { 0 };
> +
> + if (memcmp(pd->mac_addr, zero_mac_addr, 6) != 0)
>   memcpy(dev->dev_addr, pd->mac_addr, 6);


is_zero_ether_addr() is faster/cleaner for this
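
A rough sketch of what that would look like. In the kernel the helper comes from <linux/etherdevice.h>; the version below is a userspace stand-in with the same meaning so the snippet compiles on its own, and copy_mac_if_set() is a made-up name for the relevant slice of the probe logic.

```c
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Userspace stand-in for the kernel helper of the same name. */
int is_zero_ether_addr(const uint8_t *addr)
{
	return !(addr[0] | addr[1] | addr[2] | addr[3] | addr[4] | addr[5]);
}

/*
 * Hypothetical slice of the probe path: copy the platform-supplied MAC
 * only when one was actually provided. Returns 1 if copied, 0 if not.
 */
int copy_mac_if_set(uint8_t *dev_addr, const uint8_t *pd_mac)
{
	if (is_zero_ether_addr(pd_mac))
		return 0;
	memcpy(dev_addr, pd_mac, ETH_ALEN);
	return 1;
}
```

This drops the static zero_mac_addr[] array and the memcmp() call from the hunk quoted above.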

-- 
Stephen Hemminger <[EMAIL PROTECTED]>


[patch 2.6.21-rc2] init dma masks in pnp_dev

2007-02-28 Thread David Brownell
PNP now initializes device dma masks, which prevents oopses when generic
dma calls are made using pnp device nodes.

This assumes PNP only uses ISA DMA, with 24 bit addresses; and that it's
safe to init those masks for all devices (rather than finding out which
devices have been assigned DMA channels, and handling only those).
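
To illustrate what a 24 bit mask means for a buffer, here is a small userspace sketch. DMA_24BIT_MASK uses the value I believe <linux/dma-mapping.h> defines; demo_dma_range_ok() is a hypothetical helper, not a kernel function.

```c
#include <stdint.h>

#define DMA_24BIT_MASK 0x0000000000ffffffULL	/* ISA DMA: first 16 MB only */

/*
 * Return 1 if every byte of [addr, addr + len) is addressable under
 * the given mask, 0 otherwise. Written to avoid 64-bit wraparound.
 */
int demo_dma_range_ok(uint64_t mask, uint64_t addr, uint64_t len)
{
	if (len == 0)
		return 1;
	if (addr > mask)
		return 0;
	return (len - 1) <= (mask - addr);
}
```

A buffer ending exactly at 0xffffff passes; one byte more, or anything starting at 16 MB or above, does not.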

Signed-off-by: David Brownell <[EMAIL PROTECTED]>

Index: g26/include/linux/pnp.h
===
--- g26.orig/include/linux/pnp.h2007-02-12 00:31:26.0 -0800
+++ g26/include/linux/pnp.h 2007-02-18 20:18:55.0 -0800
@@ -177,6 +177,7 @@ static inline void pnp_set_card_drvdata 
 
 struct pnp_dev {
struct device dev;  /* Driver Model device interface */
+   u64 dma_mask;
unsigned char number;   /* used as an index, must be unique */
int status;
 
Index: g26/drivers/pnp/core.c
===
--- g26.orig/drivers/pnp/core.c 2005-11-12 22:24:18.0 -0800
+++ g26/drivers/pnp/core.c  2007-02-18 20:42:17.0 -0800
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "base.h"
 
@@ -114,6 +115,8 @@ int __pnp_add_device(struct pnp_dev *dev
int ret;
pnp_fixup_device(dev);
dev->dev.bus = &pnp_bus_type;
+   dev->dev.dma_mask = &dev->dma_mask;
+   dev->dma_mask = dev->dev.coherent_dma_mask = DMA_24BIT_MASK;
dev->dev.release = &pnp_release_device;
dev->status = PNP_READY;
spin_lock(&pnp_lock);


[patch 2.6.21-rc2] parport is an orphan

2007-02-28 Thread David Brownell
The writing on the wall seems to be that the parport stack is orphaned,
rather than maintained by four folk ... and having a webpage that says
the latest patches are based on a 2.5 kernel.

Signed-off-by: David Brownell <[EMAIL PROTECTED]>

Index: g26/MAINTAINERS
===
--- g26.orig/MAINTAINERS2007-02-28 12:46:00.0 -0800
+++ g26/MAINTAINERS 2007-02-28 12:47:58.0 -0800
@@ -2552,16 +2552,8 @@ L:   [EMAIL PROTECTED]
 S: Maintained
 
 PARALLEL PORT SUPPORT
-P: Phil Blundell
-M: [EMAIL PROTECTED]
-P: Tim Waugh
-M: [EMAIL PROTECTED]
-P: David Campbell
-P: Andrea Arcangeli
-M: [EMAIL PROTECTED]
 L: [EMAIL PROTECTED]
-W: http://people.redhat.com/twaugh/parport/
-S: Maintained
+S: Orphan
 
 PARIDE DRIVERS FOR PARALLEL PORT IDE DEVICES
 P: Tim Waugh


Re: [PATCH 2/2] mv643xx_eth: Place explicit port number in mv643xx_eth_platform_data

2007-02-28 Thread Dale Farnsworth
We had been using the platform_device.id field to identify which ethernet
port is used for mv643xx_eth device.  This is not correct in general.
It will be incorrect, for example, if a hardware platform uses a single
port but not the first port.  Here, we add an explicit port_number field
to struct mv643xx_eth_platform_data.

This makes the mv643xx_eth_platform_data structure required, but that
isn't an issue since all users currently provide it already.
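
A small sketch of the failure case described above, with cut-down stand-in structs (the demo_ names are hypothetical and carry only the fields needed here):

```c
/* Stand-in for mv643xx_eth_platform_data, reduced to the new field. */
struct demo_eth_platform_data {
	unsigned int port_number;
};

/* Stand-in for platform_device, reduced to id and platform data. */
struct demo_platform_device {
	int id;
	const struct demo_eth_platform_data *pd;
};

/* Old behaviour: assume the device id is the hardware port. */
unsigned int demo_port_from_id(const struct demo_platform_device *pdev)
{
	return (unsigned int)pdev->id;
}

/* New behaviour: take the port from the platform data. */
unsigned int demo_port_from_pd(const struct demo_platform_device *pdev)
{
	return pdev->pd->port_number;
}
```

On a board that wires up only port 2 and registers it as device id 0, the first function picks port 0 and the second picks port 2.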

Signed-off-by: Dale Farnsworth <[EMAIL PROTECTED]>

diff --git a/arch/mips/momentum/jaguar_atx/platform.c b/arch/mips/momentum/jaguar_atx/platform.c
Index: b/arch/mips/momentum/jaguar_atx/platform.c
===
--- a/arch/mips/momentum/jaguar_atx/platform.c
+++ b/arch/mips/momentum/jaguar_atx/platform.c
@@ -48,6 +48,8 @@ static struct resource mv64x60_eth0_reso
 };
 
 static struct mv643xx_eth_platform_data eth0_pd = {
+   .port_number= 0,
+
.tx_sram_addr   = MV_SRAM_BASE_ETH0,
.tx_sram_size   = MV_SRAM_TXRING_SIZE,
.tx_queue_size  = MV_SRAM_TXRING_SIZE / 16,
@@ -77,6 +79,8 @@ static struct resource mv64x60_eth1_reso
 };
 
 static struct mv643xx_eth_platform_data eth1_pd = {
+   .port_number= 1,
+
.tx_sram_addr   = MV_SRAM_BASE_ETH1,
.tx_sram_size   = MV_SRAM_TXRING_SIZE,
.tx_queue_size  = MV_SRAM_TXRING_SIZE / 16,
@@ -105,7 +109,9 @@ static struct resource mv64x60_eth2_reso
},
 };
 
-static struct mv643xx_eth_platform_data eth2_pd;
+static struct mv643xx_eth_platform_data eth2_pd = {
+   .port_number= 2,
+};
 
 static struct platform_device eth2_device = {
.name   = MV643XX_ETH_NAME,
Index: b/arch/mips/momentum/ocelot_3/platform.c
===
--- a/arch/mips/momentum/ocelot_3/platform.c
+++ b/arch/mips/momentum/ocelot_3/platform.c
@@ -48,6 +48,8 @@ static struct resource mv64x60_eth0_reso
 };
 
 static struct mv643xx_eth_platform_data eth0_pd = {
+   .port_number= 0,
+
.tx_sram_addr   = MV_SRAM_BASE_ETH0,
.tx_sram_size   = MV_SRAM_TXRING_SIZE,
.tx_queue_size  = MV_SRAM_TXRING_SIZE / 16,
@@ -77,6 +79,8 @@ static struct resource mv64x60_eth1_reso
 };
 
 static struct mv643xx_eth_platform_data eth1_pd = {
+   .port_number= 1,
+
.tx_sram_addr   = MV_SRAM_BASE_ETH1,
.tx_sram_size   = MV_SRAM_TXRING_SIZE,
.tx_queue_size  = MV_SRAM_TXRING_SIZE / 16,
@@ -105,7 +109,9 @@ static struct resource mv64x60_eth2_reso
},
 };
 
-static struct mv643xx_eth_platform_data eth2_pd;
+static struct mv643xx_eth_platform_data eth2_pd = {
+   .port_number= 2,
+};
 
 static struct platform_device eth2_device = {
.name   = MV643XX_ETH_NAME,
Index: b/arch/mips/momentum/ocelot_c/platform.c
===
--- a/arch/mips/momentum/ocelot_c/platform.c
+++ b/arch/mips/momentum/ocelot_c/platform.c
@@ -47,6 +47,8 @@ static struct resource mv64x60_eth0_reso
 };
 
 static struct mv643xx_eth_platform_data eth0_pd = {
+   .port_number= 0,
+
.tx_sram_addr   = MV_SRAM_BASE_ETH0,
.tx_sram_size   = MV_SRAM_TXRING_SIZE,
.tx_queue_size  = MV_SRAM_TXRING_SIZE / 16,
@@ -76,6 +78,8 @@ static struct resource mv64x60_eth1_reso
 };
 
 static struct mv643xx_eth_platform_data eth1_pd = {
+   .port_number= 1,
+
.tx_sram_addr   = MV_SRAM_BASE_ETH1,
.tx_sram_size   = MV_SRAM_TXRING_SIZE,
.tx_queue_size  = MV_SRAM_TXRING_SIZE / 16,
Index: b/arch/powerpc/platforms/chrp/pegasos_eth.c
===
--- a/arch/powerpc/platforms/chrp/pegasos_eth.c
+++ b/arch/powerpc/platforms/chrp/pegasos_eth.c
@@ -58,6 +58,7 @@ static struct resource mv643xx_eth0_reso
 
 
 static struct mv643xx_eth_platform_data eth0_pd = {
+   .port_number= 0,
.tx_sram_addr = PEGASOS2_SRAM_BASE_ETH0,
.tx_sram_size = PEGASOS2_SRAM_TXRING_SIZE,
.tx_queue_size = PEGASOS2_SRAM_TXRING_SIZE/16,
@@ -87,6 +88,7 @@ static struct resource mv643xx_eth1_reso
 };
 
 static struct mv643xx_eth_platform_data eth1_pd = {
+   .port_number= 1,
.tx_sram_addr = PEGASOS2_SRAM_BASE_ETH1,
.tx_sram_size = PEGASOS2_SRAM_TXRING_SIZE,
.tx_queue_size = PEGASOS2_SRAM_TXRING_SIZE/16,
Index: b/arch/ppc/syslib/mv64x60.c
===
--- a/arch/ppc/syslib/mv64x60.c
+++ b/arch/ppc/syslib/mv64x60.c
@@ -339,7 +339,9 @@ static struct resource mv64x60_eth0_reso
},
 };
 
-static struct mv643xx_eth_platform_data eth0_pd;
+static struct mv643xx_eth_platform_data eth0_pd = {
+   .port_number= 0,
+};
 
 static struct platform_device eth0_device = {
.name   = MV643XX_ETH_NAME,
@@ -362,7 +364,9 @@ static struct resource mv64x60_eth1_reso
},
 };
 
-static struct 

Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-02-28 Thread Davide Libenzi
On Wed, 28 Feb 2007, Ingo Molnar wrote:

> > Or with a simple/parallel async submission, coupled with threadlets, 
> > we can cover a pretty broad range of real life use cases?
> 
> sure, if we debate its virtualization driven market penetration via self 
> promoting technologies that also drive customer satisfaction, then we'll 
> be able to increase shareholder value by improving the user experience 
> and we'll also succeed in turning this vision into a supply/demand 
> marketplace. Or not?

Okkey then, I guess it's good to go as is :)



- Davide




Re: lanana: Add major/minor entries for PPC QE UART devices

2007-02-28 Thread Jan Engelhardt

On Feb 28 2007 20:25, Segher Boessenkool wrote:
>> Just allocate the four slots and we'll deal with
>> anything above this in custom products.
>
> Another option is to use 46..49 for UARTs #0..3,
> and 192..195 for UARTs #4..7.
>
> Or, perhaps better, use 46..49 for #0..3, and
> 192..199 for #0..7, handling the duplication in
> the driver; and deprecate the old range.

I'd "vote" for the 2nd. 
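
For what it's worth, handling the duplication in the driver could be as small as the sketch below. It assumes the numbers quoted above are minors; demo_qe_uart_index() is a made-up name.

```c
/*
 * Map a device number to a UART index, accepting both the old range
 * (46..49, UARTs #0..3) and the new one (192..199, UARTs #0..7).
 * Returns -1 for anything outside both ranges.
 */
int demo_qe_uart_index(unsigned int minor)
{
	if (minor >= 46 && minor <= 49)
		return (int)(minor - 46);	/* deprecated range */
	if (minor >= 192 && minor <= 199)
		return (int)(minor - 192);	/* new range */
	return -1;
}
```

Numbers 46 and 192 then both resolve to UART #0, which is the duplication Segher mentions.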


Jan
-- 


Re: Problem with freezable workqueues

2007-02-28 Thread Pavel Machek
On Wed 2007-02-28 23:39:30, Rafael J. Wysocki wrote:
> On Wednesday, 28 February 2007 21:35, Oleg Nesterov wrote:
> > On 02/28, Rafael J. Wysocki wrote:
> > > 
> > > Okay, but I've just finished the patch that removes the freezability of
> > > workqueues (appended), so can we please do this in a separate one?
> > 
> > Please, please, no. This patch is of course correct, but it breaks _a lot_
> > of patches in -mm tree.
> > 
> > May I ask you to send just
> > 
> > > ===
> > > --- linux-2.6.21-rc2.orig/fs/xfs/linux-2.6/xfs_buf.c
> > > +++ linux-2.6.21-rc2/fs/xfs/linux-2.6/xfs_buf.c
> > > @@ -1829,11 +1829,11 @@ xfs_buf_init(void)
> > >   if (!xfs_buf_zone)
> > >   goto out_free_trace_buf;
> > >  
> > > - xfslogd_workqueue = create_freezeable_workqueue("xfslogd");
> > > + xfslogd_workqueue = create_workqueue("xfslogd");
> > >   if (!xfslogd_workqueue)
> > >   goto out_free_buf_zone;
> > >  
> > > - xfsdatad_workqueue = create_freezeable_workqueue("xfsdatad");
> > > + xfsdatad_workqueue = create_workqueue("xfsdatad");
> > >   if (!xfsdatad_workqueue)
> > >   goto out_destroy_xfslogd_workqueue;
> > >  
> > > 
> > 
> > this bit?
> > 
> > After that, we can do the "removes the freezability of workqueues" patch
> > against -mm tree.
> 
> Okay, if that's better.
> 
> Pavel, is that acceptable to you?

No problem, but get that acked-by: from XFS people ;-).
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


[PATCH 1/2] mv643xx_eth: move mac_addr inside of mv643xx_eth_platform_data

2007-02-28 Thread Dale Farnsworth
The information contained within platform_data should be self-contained.
Replace the pointer to a MAC address with the actual MAC address in
struct mv643xx_eth_platform_data.

Signed-off-by: Dale Farnsworth <[EMAIL PROTECTED]>

Index: b/drivers/net/mv643xx_eth.c
===
--- a/drivers/net/mv643xx_eth.c
+++ b/drivers/net/mv643xx_eth.c
@@ -1380,7 +1380,9 @@ static int mv643xx_eth_probe(struct plat
 
pd = pdev->dev.platform_data;
if (pd) {
-   if (pd->mac_addr)
+   static u8 zero_mac_addr[6] = { 0 };
+
+   if (memcmp(pd->mac_addr, zero_mac_addr, 6) != 0)
memcpy(dev->dev_addr, pd->mac_addr, 6);
 
if (pd->phy_addr || pd->force_phy_addr)
Index: b/include/linux/mv643xx.h
===
--- a/include/linux/mv643xx.h
+++ b/include/linux/mv643xx.h
@@ -1289,7 +1289,6 @@ struct mv64xxx_i2c_pdata {
 #define MV643XX_ETH_NAME   "mv643xx_eth"
 
 struct mv643xx_eth_platform_data {
-   char*mac_addr;  /* pointer to mac address */
u16 force_phy_addr; /* force override if phy_addr == 0 */
u16 phy_addr;
 
@@ -1304,6 +1303,7 @@ struct mv643xx_eth_platform_data {
u32 tx_sram_size;
u32 rx_sram_addr;
u32 rx_sram_size;
+   u8  mac_addr[6];/* mac address if non-zero*/
 };
 
 #endif /* __ASM_MV643XX_H */
Index: b/arch/mips/momentum/jaguar_atx/platform.c
===
--- a/arch/mips/momentum/jaguar_atx/platform.c
+++ b/arch/mips/momentum/jaguar_atx/platform.c
@@ -47,11 +47,7 @@ static struct resource mv64x60_eth0_reso
},
 };
 
-static char eth0_mac_addr[ETH_ALEN];
-
 static struct mv643xx_eth_platform_data eth0_pd = {
-   .mac_addr   = eth0_mac_addr,
-
.tx_sram_addr   = MV_SRAM_BASE_ETH0,
.tx_sram_size   = MV_SRAM_TXRING_SIZE,
.tx_queue_size  = MV_SRAM_TXRING_SIZE / 16,
@@ -80,11 +76,7 @@ static struct resource mv64x60_eth1_reso
},
 };
 
-static char eth1_mac_addr[ETH_ALEN];
-
 static struct mv643xx_eth_platform_data eth1_pd = {
-   .mac_addr   = eth1_mac_addr,
-
.tx_sram_addr   = MV_SRAM_BASE_ETH1,
.tx_sram_size   = MV_SRAM_TXRING_SIZE,
.tx_queue_size  = MV_SRAM_TXRING_SIZE / 16,
@@ -113,11 +105,7 @@ static struct resource mv64x60_eth2_reso
},
 };
 
-static char eth2_mac_addr[ETH_ALEN];
-
-static struct mv643xx_eth_platform_data eth2_pd = {
-   .mac_addr   = eth2_mac_addr,
-};
+static struct mv643xx_eth_platform_data eth2_pd;
 
 static struct platform_device eth2_device = {
.name   = MV643XX_ETH_NAME,
@@ -200,9 +188,9 @@ static int __init mv643xx_eth_add_pds(vo
int ret;
 
get_mac(mac);
-   eth_mac_add(eth0_mac_addr, mac, 0);
-   eth_mac_add(eth1_mac_addr, mac, 1);
-   eth_mac_add(eth2_mac_addr, mac, 2);
+   eth_mac_add(eth0_pd.mac_addr, mac, 0);
+   eth_mac_add(eth1_pd.mac_addr, mac, 1);
+   eth_mac_add(eth2_pd.mac_addr, mac, 2);
ret = platform_add_devices(mv643xx_eth_pd_devs,
ARRAY_SIZE(mv643xx_eth_pd_devs));
 
Index: b/arch/mips/momentum/ocelot_3/platform.c
===
--- a/arch/mips/momentum/ocelot_3/platform.c
+++ b/arch/mips/momentum/ocelot_3/platform.c
@@ -47,11 +47,7 @@ static struct resource mv64x60_eth0_reso
},
 };
 
-static char eth0_mac_addr[ETH_ALEN];
-
 static struct mv643xx_eth_platform_data eth0_pd = {
-   .mac_addr   = eth0_mac_addr,
-
.tx_sram_addr   = MV_SRAM_BASE_ETH0,
.tx_sram_size   = MV_SRAM_TXRING_SIZE,
.tx_queue_size  = MV_SRAM_TXRING_SIZE / 16,
@@ -80,11 +76,7 @@ static struct resource mv64x60_eth1_reso
},
 };
 
-static char eth1_mac_addr[ETH_ALEN];
-
 static struct mv643xx_eth_platform_data eth1_pd = {
-   .mac_addr   = eth1_mac_addr,
-
.tx_sram_addr   = MV_SRAM_BASE_ETH1,
.tx_sram_size   = MV_SRAM_TXRING_SIZE,
.tx_queue_size  = MV_SRAM_TXRING_SIZE / 16,
@@ -113,11 +105,7 @@ static struct resource mv64x60_eth2_reso
},
 };
 
-static char eth2_mac_addr[ETH_ALEN];
-
-static struct mv643xx_eth_platform_data eth2_pd = {
-   .mac_addr   = eth2_mac_addr,
-};
+static struct mv643xx_eth_platform_data eth2_pd;
 
 static struct platform_device eth2_device = {
.name   = MV643XX_ETH_NAME,
@@ -200,9 +188,9 @@ static int __init mv643xx_eth_add_pds(vo
int ret;
 
get_mac(mac);
-   eth_mac_add(eth0_mac_addr, mac, 0);
-   eth_mac_add(eth1_mac_addr, mac, 1);
-   eth_mac_add(eth2_mac_addr, mac, 2);
+   eth_mac_add(eth0_pd.mac_addr, mac, 0);
+   eth_mac_add(eth1_pd.mac_addr, mac, 1);
+   eth_mac_add(eth2_pd.mac_addr, mac, 2);
ret = 

Re: Problem with freezable workqueues

2007-02-28 Thread Rafael J. Wysocki
On Wednesday, 28 February 2007 21:35, Oleg Nesterov wrote:
> On 02/28, Rafael J. Wysocki wrote:
> > 
> > Okay, but I've just finished the patch that removes the freezability of
> > workqueues (appended), so can we please do this in a separate one?
> 
> Please, please, no. This patch is of course correct, but it breaks _a lot_
> of patches in -mm tree.
> 
> May I ask you to send just
> 
> > ===
> > --- linux-2.6.21-rc2.orig/fs/xfs/linux-2.6/xfs_buf.c
> > +++ linux-2.6.21-rc2/fs/xfs/linux-2.6/xfs_buf.c
> > @@ -1829,11 +1829,11 @@ xfs_buf_init(void)
> > if (!xfs_buf_zone)
> > goto out_free_trace_buf;
> >  
> > -   xfslogd_workqueue = create_freezeable_workqueue("xfslogd");
> > +   xfslogd_workqueue = create_workqueue("xfslogd");
> > if (!xfslogd_workqueue)
> > goto out_free_buf_zone;
> >  
> > -   xfsdatad_workqueue = create_freezeable_workqueue("xfsdatad");
> > +   xfsdatad_workqueue = create_workqueue("xfsdatad");
> > if (!xfsdatad_workqueue)
> > goto out_destroy_xfslogd_workqueue;
> >  
> > 
> 
> this bit?
> 
> After that, we can do the "removes the freezability of workqueues" patch
> against -mm tree.

Okay, if that's better.

Pavel, is that acceptable to you?

Rafael


Re: [RFC][PATCH 1/3] Freezer: Fix vfork problem

2007-02-28 Thread Rafael J. Wysocki
On Wednesday, 28 February 2007 21:30, Oleg Nesterov wrote:
> On 02/28, Rafael J. Wysocki wrote:
> >
> > Okay, I have added a comment to freezer.h.  Please have a look.
> >
> >
> > -extern void thaw_some_processes(int all);
> > +/*
> > + * The PF_FREEZER_SKIP flag should be set by a vfork parent right before it
> > + * calls wait_for_completion() and reset right after it returns from this
> > + * function.  Next, the parent should call try_to_freeze() to freeze itself
> > + * appropriately in case the child has exited before the freezing of tasks is
> > + * complete.  However, we don't want kernel threads to be frozen in unexpected
> > + * places, so we allow them to block freeze_processes() instead or to set
> > + * PF_NOFREEZE if needed and PF_FREEZER_SKIP is only set for userland vfork
> > + * parents.  Fortunately, in the call_usermodehelper() case the parent won't
> > + * really block freeze_processes(), since call_usermodehelper() (the child)
> > + * does a little before exec/exit and it can't be frozen before waking up the
> > + * parent.
> > + */
> 
> I think this comment is accurate and understandable, and I am not suggesting
> to change it.
> 
> However, please note that PF_FREEZER_SKIP can be used not only for vfork().

Yes, it can.

> For example, it seems to me we can also use freezer_...count() to solve the
> problem with coredump. We can use the same "wait_for_completion_freezable"
> pattern in exit_mm() and in coredump_wait(). (i do not claim this is a best
> fix though).

You're right, but in that comment I wanted to explain why it was done this way
rather than what else it could be used for.  There may be some uses of it that
we can't even anticipate right now. :-)

Rafael


Re: [PATCH] udivdi3: 64 bit divide

2007-02-28 Thread Jan Engelhardt

On Feb 27 2007 22:39, Ian Molton wrote:
> Russell King wrote:
>> On Tue, Feb 27, 2007 at 01:36:56PM -0800, Andrew Morton wrote:
>> > On Tue, 27 Feb 2007 13:18:40 -0800 Stephen Hemminger
>> > <[EMAIL PROTECTED]> wrote:
>> > > Then we should pull the existing udivdi3 implementations?
>> > >
>> > Not much point really.  Some architectures have gone
>> > and done that, but x86 has not.  x86 has enough
>> > coverage for us to pick up most problems, and any
>> > remaining problems are obviously in scruffy
>> > architectures which don't care about performance ;)
>> 
>> I doubt arm26 uses udivdi3, but that's something Ian
>> would have to confirm.
>
> I doubt it is used also, however I am not in a position
> to test this until at least after I have moved house.
> Please leave alone for now.

Simple. The non-arch specific code does not use 64/64
divides through the "/" operator (otherwise there would
already have been udivdi3 linking errors). So what
remains to check is arch/arm26. grep -Pr 'int64|\bu64'
returns only a few results to check (kernel/ecard.c,
nwfpe/), so the answer is most likely no.
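
For context: as far as I know, on 32-bit targets gcc turns a 64-by-64 "/" into a call to __udivdi3, and kernel code is expected to use do_div() from <asm/div64.h> for 64-by-32 division instead. The sketch below is a userspace stand-in with the same contract (quotient stored in place, remainder returned); demo_div64_32() is a made-up name, and in userspace libgcc supplies whatever helper the compiler needs.

```c
#include <stdint.h>

/*
 * Divide *n by a 32-bit base in place and return the remainder,
 * mirroring the contract of the kernel's do_div() macro.
 */
uint32_t demo_div64_32(uint64_t *n, uint32_t base)
{
	uint32_t rem = (uint32_t)(*n % base);

	*n /= base;
	return rem;
}
```

For example, dividing 10000000007 by 1000 leaves 10000000 in the variable and returns 7.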



Jan
-- 


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-02-28 Thread Ingo Molnar

* Davide Libenzi  wrote:

> On Wed, 28 Feb 2007, Ingo Molnar wrote:
> 
> > * Davide Libenzi  wrote:
> > 
> > > Did you hide all the complexity of the userspace atom decoding inside 
> > > another function? :)
> > 
> > no, i made the 64-bit and 32-bit structures layout-compatible. This 
> > makes the 32-bit structure as large as the 64-bit ones, but that's not a 
> > big issue, compared to the simplifications it brings.
> 
> Do you have a new version to review?

yep, i've just released -v5.

> How about this, with async_wait returning asynid's back to a userspace 
> ring buffer?
> 
> struct syslet_utaom {
> long *result;
> unsigned long asynid;
> unsigned long nr_sysc;
> unsigned long params[8];
> };

we talked about the parameters at length: if they are pointers the 
layout is significantly more flexible and more capable. It's a pretty 
similar argument to the return-pointer thing. For example take a look at 
how the IO syslet atoms in Jens' FIO engine share the same fd. Even if 
there's 2 of them. And they are fully cacheable in constructed 
state. The same goes for the webserving examples i've got in the 
async-test userspace sample code. I can pick up a cached request and 
only update req->fd, i dont have to reinit the atoms at all. It stays 
nicely in the cache, is not re-dirtied, etc.
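
A stripped-down illustration of that point follows. The demo_ structs and functions are hypothetical and only mimic the idea of pointer parameters; they are not the real uatom layout.

```c
#include <stddef.h>

#define DEMO_MAX_ARGS 6

/* Hypothetical atom: every parameter is a pointer to the actual value. */
struct demo_atom {
	unsigned long  nr;			/* which call to run */
	long          *ret_ptr;			/* where the result goes */
	unsigned long *arg_ptr[DEMO_MAX_ARGS];	/* NULL means "unused" */
};

/* Hypothetical cached request that both atoms refer to. */
struct demo_request {
	unsigned long fd;
	unsigned long len;
	long          ret;
};

/* Build two atoms once; both take their first argument from req->fd. */
void demo_build_atoms(struct demo_request *req,
		      struct demo_atom *first, struct demo_atom *second)
{
	int i;

	for (i = 0; i < DEMO_MAX_ARGS; i++) {
		first->arg_ptr[i] = NULL;
		second->arg_ptr[i] = NULL;
	}
	first->nr = 0;
	second->nr = 1;
	first->ret_ptr = &req->ret;
	second->ret_ptr = &req->ret;
	first->arg_ptr[0] = &req->fd;
	first->arg_ptr[1] = &req->len;
	second->arg_ptr[0] = &req->fd;
}

/* Fetch argument idx of an atom; unused slots read as 0. */
unsigned long demo_atom_arg(const struct demo_atom *atom, int idx)
{
	return atom->arg_ptr[idx] ? *atom->arg_ptr[idx] : 0;
}
```

After demo_build_atoms() has run once, reusing the pair for a new request is a single store to req->fd; neither atom is written again.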

furthermore, having the parameters as pointers is also an optimization: 
look at the copy_uatom() x86 assembly code i did - it can do a simple 
jump out of the parameter fetching code. I actually tried /both/ of 
these variants in assembly (as i mentioned it in a previous reply, in 
the v1 thread) and the speed difference between a pointer and 
non-pointer variant was negligible. (even with 6 parameters filled in)

but yes ... another two more small changes and your layout will be 
awfully similar to the current uatom layout =B-)

> My problem with the syslets in their current form is, do we have a 
> real use for them that justify the extra complexity inside the kernel?

i call bullshit. really. I have just gone out and wasted some time 
cutting & pasting all the syslet engine code: it is 153 lines total, 
plus 51 lines of comments. The total patchset in comparison is:

 35 files changed, 1890 insertions(+), 71 deletions(-)

(and this over-estimates it because if this got removed then we'd still 
have to add an async execution syscall.) And the code is pretty compact 
and self-contained. Threadlets share much of the infrastructure with 
syslets: for example the completion ring code is _100%_ shared, the 
async execution code is 98% shared.

You are free to not like it though, and i'm willing to change any aspect 
of the API to make it more intuitive and more useful, but calling it 
'complexity' at this point is just handwaving. And believe it or not, a 
good number of people actually find syslets pretty cool.

> Or with a simple/parallel async submission, coupled with threadlets, 
> we can cover a pretty broad range of real life use cases?

sure, if we debate its virtualization driven market penetration via self 
promoting technologies that also drive customer satisfaction, then we'll 
be able to increase shareholder value by improving the user experience 
and we'll also succeed in turning this vision into a supply/demand 
marketplace. Or not?

Ingo


PID entries in /proc sorted by number, not start time in 2.6.19

2007-02-28 Thread Chuck Ebbert
Starting with kernel 2.6.19, the process directories in
/proc are sorted by number. They were sorted by process
start time in 2.6.18 and earlier. This makes the output
of procps come out in that order too, pissing off users
who are used to the old way.

To reproduce:
1. Wrap your PID numbers.
2. Do ls -fl /proc
3. Look at output of ps command.

Compare 2.6.18 to 2.6.19.
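
A minimal sketch of why the two orderings diverge once PID numbers have
wrapped. The struct task_entry type, its fields and the sample values are
hypothetical illustration only; this is not kernel or procps code.

```c
#include <stdlib.h>

/* Hypothetical record: one /proc/<pid> directory and when its task started. */
struct task_entry {
    int pid;
    unsigned long start_time; /* arbitrary ticks, larger means started later */
};

/* Order by PID number, the ordering reported for 2.6.19. */
int cmp_by_pid(const void *a, const void *b)
{
    const struct task_entry *x = a, *y = b;
    return (x->pid > y->pid) - (x->pid < y->pid);
}

/* Order by start time, the ordering reported for 2.6.18 and earlier. */
int cmp_by_start(const void *a, const void *b)
{
    const struct task_entry *x = a, *y = b;
    return (x->start_time > y->start_time) - (x->start_time < y->start_time);
}

/* After a PID wrap, a task that started late can carry a small PID. */
struct task_entry sample[3] = {
    { 32000, 100 }, /* old task, high PID */
    { 32700, 200 }, /* started just before the wrap */
    {   310, 300 }, /* started after the wrap, low PID */
};

/* Sort the sample table in place with the chosen comparison. */
void sort_sample(int (*cmp)(const void *, const void *))
{
    qsort(sample, 3, sizeof(sample[0]), cmp);
}
```

With cmp_by_pid the newest task (PID 310) is listed first; with
cmp_by_start it is listed last, which is the order users of the older
kernels were used to.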

See also:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=230227



Re: [PATCH 1/2] Define FIXED_PORT flag for serial_core

2007-02-28 Thread Russell King
On Tue, Feb 20, 2007 at 02:19:51PM +1100, David Gibson wrote:
> Therefore, this patch defines a UPF_FIXED_PORT flag for the uart_port
> structure.  If this flag is set when the serial port is configured,
> any attempts to alter the port's type, io address, irq or base clock
> with setserial are ignored.

I've been wondering about this, and it is questionable whether we
should allow any serial port which isn't owned by the legacy platform
device (the one called "serial8250", iow by the 8250 driver itself)
to have the base addresses and interrupts changed.

IOW, we apply this "fixed port" to any port registered by probe
modules external to the 8250 driver itself, such as PCI, PNP, etc.
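
As a rough sketch of the behaviour the quoted patch description asks for:
a setserial-style request is ignored when the port carries a fixed-port
flag. The struct, the flag value and the function names below are
hypothetical and are not the real serial_core definitions.

```c
#include <stdbool.h>

/* Hypothetical flag bit; not the real UPF_FIXED_PORT value. */
#define SKETCH_FIXED_PORT (1u << 0)

/* Hypothetical stand-in for the settings setserial can alter. */
struct sketch_port {
    unsigned int flags;
    int type;
    unsigned long iobase;
    int irq;
    unsigned int uartclk;
};

/*
 * Apply a setserial-style request. Returns true if the settings were
 * changed, false if the request was ignored because the port is fixed.
 */
bool sketch_set_info(struct sketch_port *port, const struct sketch_port *req)
{
    if (port->flags & SKETCH_FIXED_PORT)
        return false; /* leave type, io address, irq and clock untouched */

    port->type = req->type;
    port->iobase = req->iobase;
    port->irq = req->irq;
    port->uartclk = req->uartclk;
    return true;
}

/* Try to move a port from irq 4 to irq 5 and report the resulting irq. */
int sketch_try_irq_change(unsigned int flags)
{
    struct sketch_port port = { flags, 1, 0x3f8, 4, 1843200 };
    struct sketch_port req = { 0, 1, 0x2f8, 5, 1843200 };

    sketch_set_info(&port, &req);
    return port.irq;
}
```

Under the broader suggestion above, a port registered by a probe module
outside the 8250 driver (PCI, PNP, etc.) would simply always have the
flag set.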

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:


Re: [PATCH] vlan & net drivers: avoid a 4-order allocation

2007-02-28 Thread Stephen Hemminger
On Wed, 28 Feb 2007 14:41:57 +0200
Dan Aloni <[EMAIL PROTECTED]> wrote:

> Hello,
> 
> This patch splits the vlan_group struct into a multi-allocated struct. On
> x86_64, the size of the original struct is a little more than 32KB, causing
> a 4-order allocation, which is prone to problems caused by buddy-system 
> external fragmentation conditions.
> 
> I couldn't just use vmalloc() because vfree() cannot be called in the
> softirq context of the RCU callback.
> 
> Signed-off-by: Dan Aloni <[EMAIL PROTECTED]>
> 

Please submit patch to proper place: netdev@vger.kernel.org
and Ben Greear <[EMAIL PROTECTED]>
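
For reference, the "4-order allocation" in the quoted description follows
from simple arithmetic, sketched below. The helper name is hypothetical,
and the 24 bytes of overhead used in the usage note are an assumed
stand-in for "a little more than 32KB"; the 4096-byte page size is the
x86_64 base page size.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u /* x86_64 base page size */

/* Smallest order such that (SKETCH_PAGE_SIZE << order) >= size. */
unsigned int sketch_order_for(size_t size)
{
    unsigned int order = 0;
    size_t span = SKETCH_PAGE_SIZE;

    while (span < size) {
        span <<= 1; /* each order doubles the contiguous span */
        order++;
    }
    return order;
}
```

sketch_order_for(32768) gives order 3 (8 pages), while
sketch_order_for(32768 + 24) gives order 4, i.e. 16 physically contiguous
pages (64KB), so the small excess over 32KB doubles the size of the
contiguous block the buddy allocator has to find.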



-- 
Stephen Hemminger <[EMAIL PROTECTED]>


Re: [PATCH] bonding: replace system timer with work queue

2007-02-28 Thread Stephen Hemminger
On Wed, 28 Feb 2007 10:12:01 +0100 (CET)
Jaroslav Kysela <[EMAIL PROTECTED]> wrote:

> Hi,
> 
>   please, review and apply to mm tree for further testing. The patch 
> is also available at 
> ftp://ftp.alsa-project.org/pub/kernel-patches/bonding-workqueue.patch .
> 
>   Thank you,
>   Jaroslav
> 

You should submit network patches to the maintainers listed in the MAINTAINERS file.

BONDING DRIVER
P:  Chad Tindel
M:  [EMAIL PROTECTED]
P:  Jay Vosburgh
M:  [EMAIL PROTECTED]
L:  [EMAIL PROTECTED]
W:  http://sourceforge.net/projects/bonding/
S:  Supported


> @@ -3569,20 +3552,20 @@ static int bond_close(struct net_device 
>*/
>  
>   if (bond->params.miimon) {  /* link check interval, in milliseconds. */
> - del_timer_sync(&bond->mii_timer);
> + cancel_rearming_delayed_workqueue(bond_wq, &bond->mii_work);
>   }
>  
>   if (bond->params.arp_interval) {  /* arp interval, in milliseconds. */
> - del_timer_sync(&bond->arp_timer);
> + cancel_rearming_delayed_workqueue(bond_wq, &bond->arp_work);
>   }
>  
>   switch (bond->params.mode) {
>   case BOND_MODE_8023AD:
> - del_timer_sync(&(BOND_AD_INFO(bond).ad_timer));
> + cancel_rearming_delayed_workqueue(bond_wq, &(BOND_AD_INFO(bond).ad_work));
>   break;
>   case BOND_MODE_TLB:
>   case BOND_MODE_ALB:
> - del_timer_sync(&(BOND_ALB_INFO(bond).alb_timer));
> + cancel_rearming_delayed_workqueue(bond_wq, &(BOND_ALB_INFO(bond).alb_work));
>   break;
>   default:
>   break;


This part will deadlock, since it is not safe to cancel a workqueue
entry with the RTNL mutex held. The cancel operation has to wait for the
workqueue entry to finish running, and the entry being run may be stuck
waiting for the RTNL.
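
The circular wait can be modelled in a few lines of single-threaded
userspace C. This is only a sketch: the lock type and every function name
below are hypothetical, nothing here uses real kernel APIs, and the
second helper is a contrast case rather than a proposed fix.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the RTNL mutex. */
struct model_lock {
    bool held;
};

/* Take the lock if it is free; report whether that worked. */
bool model_trylock(struct model_lock *l)
{
    if (l->held)
        return false;
    l->held = true;
    return true;
}

void model_unlock(struct model_lock *l)
{
    l->held = false;
}

/* The work entry needs the lock to finish; true if it could complete. */
bool work_can_complete(struct model_lock *rtnl)
{
    if (!model_trylock(rtnl))
        return false; /* would block: someone else holds the lock */
    model_unlock(rtnl);
    return true;
}

/* Cancelling waits for the work entry; true if that wait could ever end. */
bool cancel_would_return(struct model_lock *rtnl)
{
    return work_can_complete(rtnl);
}

/* Close path that cancels while it still holds the lock. */
bool close_with_lock_held(void)
{
    struct model_lock rtnl = { true };
    return cancel_would_return(&rtnl);
}

/* Contrast case: the same cancel issued while the lock is not held. */
bool close_without_lock(void)
{
    struct model_lock rtnl = { false };
    return cancel_would_return(&rtnl);
}
```

close_with_lock_held() returns false: the canceller waits for the work
entry, the work entry waits for the lock, and the lock is held by the
canceller.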



-- 
Stephen Hemminger <[EMAIL PROTECTED]>


Re: [PATCH] SLUB The unqueued slab allocator V3

2007-02-28 Thread David Miller
From: David Miller <[EMAIL PROTECTED]>
Date: Wed, 28 Feb 2007 14:00:22 -0800 (PST)

> V3 doesn't boot successfully on sparc64

False alarm!

This crash was actually due to an unrelated problem in the parport_pc
driver on my machine.

Slub v3 boots up and seems to work fine so far on sparc64.


RE: 2.6.21-rc1: more ACPI errors (EC__)

2007-02-28 Thread Moore, Robert
This exception appears to be originating somewhere in the EC driver:

ACPI Exception (evregion-0420): AE_NOT_FOUND, Returned by Handler for
[EmbeddedControl] [20070126]



> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:linux-acpi-
> [EMAIL PROTECTED] On Behalf Of Meelis Roos
> Sent: Tuesday, February 27, 2007 12:50 PM
> To: Linux Kernel list; linux-acpi@vger.kernel.org
> Subject: 2.6.21-rc1: more ACPI errors (EC__)
> 
> I tested 2.6.21-rc1 on my laptop (IBM X20 with 440BX) and found two
> problems:
> 
> First, a seemingly harmless one - ACPI error messages during bootup:
> 
> ACPI Exception (evregion-0420): AE_NOT_FOUND, Returned by Handler for
> [EmbeddedControl] [20070126]
> ACPI Exception (dswexec-0462): AE_NOT_FOUND, While resolving operands for
> [OpcodeName unavailable] [20070126]
> ACPI Error (psparse-0537): Method parse/execution failed
> [\_SB_.PCI0.ISA_.EC__.BAT0._STA] (Node d3c435e0), AE_NOT_FOUND
> ACPI Exception (evregion-0420): AE_NOT_FOUND, Returned by Handler for
> [EmbeddedControl] [20070126]
> ACPI Error (psparse-0537): Method parse/execution failed
> [\_SB_.PCI0.ISA_.EC__.AC__._PSR] (Node d3c43540), AE_NOT_FOUND
> ACPI Exception (ac-0095): AE_NOT_FOUND, Error reading AC Adapter state
> [20070126]
> 
> And second, there were ACPI error messages about EC__ during shutdown.
> These messages appeared, then the "acpi_power_off called" message
> appeared, then it waited for about 5 seconds (this does not happen
> normally) and then it succeeded shutting down. Captured a screenshot of
> these messages, temporarily available at
> http://www.cs.ut.ee/~mroos/x20-shutdown-acpi-errors.jpg
> 
> --
> Meelis Roos ([EMAIL PROTECTED])
> -
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: lanana: Add major/minor entries for PPC QE UART devices

2007-02-28 Thread Segher Boessenkool

> Please, let's just leave the four we have

No one is suggesting otherwise.

> and let
> the driver just allocate increasing minor numbers.
> If anyone has a product with more than 4 UARTs,
> they will have to figure out what to do with the
> additional minors.


Since you say no one has ever used more than 4 UARTs,
there are two options:

- Cap the driver at 4 UARTs;
- Assign an extra range of minors for more ports.

Just randomly using some extra minors that aren't
assigned to you isn't such a great idea.
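
The first option could look roughly like this. It is a sketch only: the
function name and the base_minor parameter are hypothetical, and no real
LANANA minor numbers are implied.

```c
/* The four minors already assigned, per this thread. */
#define SKETCH_ASSIGNED_PORTS 4

/*
 * Map a UART index to a minor number. Returns -1 for any index that
 * would fall outside the assigned range, instead of borrowing a minor
 * that has not been assigned to the driver.
 */
int sketch_minor_for_port(int base_minor, int port_index)
{
    if (port_index < 0 || port_index >= SKETCH_ASSIGNED_PORTS)
        return -1;
    return base_minor + port_index;
}
```

A fifth UART (index 4) is refused rather than silently given the next
minor; supporting it would need the second option, an extra assigned
range.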


Segher


