Re: SLUB: Support for statistics to help analyze allocator behavior

2008-02-04 Thread Pekka J Enberg
On Tue, 5 Feb 2008, Eric Dumazet wrote:
> > Looks good but I am wondering if we want to make the statistics per-CPU so
> > that we can see the kmalloc/kfree ping-pong of, for example, hackbench
> > better?
> 
> AFAIK Christoph's patch already has per-CPU statistics :)

Heh, sure, but it's not exported to userspace which is required for 
slabinfo to display the statistics.

Pekka
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] x86_64: make traps on 'iret' be debuggable in user space

2008-02-04 Thread Roland McGrath

This makes the x86-64 behavior for 32-bit processes that set
bogus %cs/%ss values (the only ones that can fault in iret)
match what the native i386 behavior has been since:

commit a879cbbb34cbecfa9707fbb6e5a00c503ac1ecb9
Author: Linus Torvalds <[EMAIL PROTECTED]>
Date:   Fri Apr 29 09:38:44 2005 -0700

x86: make traps on 'iret' be debuggable in user space

Signed-off-by: Roland McGrath <[EMAIL PROTECTED]>
---
 arch/x86/kernel/entry_64.S |   25 +++--
 1 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 07d4aba..62744b1 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -592,13 +592,26 @@ ENTRY(native_iret)
 	.quad native_iret, bad_iret
 	.previous
 	.section .fixup,"ax"
-	/* force a signal here? this matches i386 behaviour */
-	/* running with kernel gs */
 bad_iret:
-	movq $11,%rdi	/* SIGSEGV */
-	TRACE_IRQS_ON
-	ENABLE_INTERRUPTS(CLBR_ANY | ~(CLBR_RDI))
-	jmp do_exit
+	/*
+	 * The iret traps when the %cs or %ss being restored is bogus.
+	 * (This can only happen in a 32-bit process, and only by invalid
+	 * selectors being set via ptrace.  Changing the value enforces
+	 * that the USER_RPL bits are set, but not that the index is valid.)
+	 * We've lost the original trap vector and error code.
+	 * #GPF is the most likely one to get for an invalid selector.
+	 * So pretend we completed the iret and took the #GPF in user mode.
+	 *
+	 * We are now running with the kernel GS after exception recovery.
+	 * But error_entry expects us to have user GS to match the user %cs,
+	 * so swap back.
+	 */
+	INTR_FRAME
+	pushq $0
+	CFI_ADJUST_CFA_OFFSET 8
+	SWAPGS
+	jmp general_protection
+	CFI_ENDPROC
 	.previous
 
 /* edi: workmask, edx: work */


Re: SLUB: Support for statistics to help analyze allocator behavior

2008-02-04 Thread Eric Dumazet

Pekka J Enberg wrote:
> Hi Christoph,
>
> On Mon, 4 Feb 2008, Christoph Lameter wrote:
> > The statistics provided here allow the monitoring of allocator behavior
> > at the cost of some (minimal) loss of performance. Counters are placed in
> > SLUB's per cpu data structure that is already written to by other code.
>
> Looks good but I am wondering if we want to make the statistics per-CPU so
> that we can see the kmalloc/kfree ping-pong of, for example, hackbench
> better?

AFAIK Christoph's patch already has per-CPU statistics :)


+#define STAT_ATTR(si, text) \
+static ssize_t text##_show(struct kmem_cache *s, char *buf) \
+{ \
+	unsigned long sum = 0; \
+	int cpu; \
+ \
+	for_each_online_cpu(cpu) \
+		sum += get_cpu_slab(s, cpu)->stat[si]; \
+	return sprintf(buf, "%lu\n", sum); \
+} \
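
The difference between the two views discussed in this thread (counters kept per CPU, but only the sum exported) can be sketched in plain userspace C. This is an illustration only, not the SLUB code: `cpu_slab_sketch`, `stat_inc()`, `stat_sum()`, `stat_per_cpu()`, the two counter names and `NR_CPUS_SKETCH` are all hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_SKETCH 4	/* hypothetical CPU count, not a kernel constant */

/* Hypothetical counter indices standing in for the patch's stat items. */
enum stat_item { ALLOC_FASTPATH, FREE_FASTPATH, NR_STAT_ITEMS };

/* Stand-in for SLUB's per-CPU structure; only the counters are modelled. */
struct cpu_slab_sketch {
	unsigned long stat[NR_STAT_ITEMS];
};

/* Each CPU bumps only its own slot, so the hot path needs no locking. */
static void stat_inc(struct cpu_slab_sketch *c, enum stat_item si)
{
	assert(si < NR_STAT_ITEMS);
	c->stat[si]++;
}

/* Summed view: what a STAT_ATTR-style show function reports. */
static unsigned long stat_sum(const struct cpu_slab_sketch *cpus,
			      size_t ncpus, enum stat_item si)
{
	unsigned long sum = 0;
	size_t cpu;

	for (cpu = 0; cpu < ncpus; cpu++)
		sum += cpus[cpu].stat[si];
	return sum;
}

/* Per-CPU view: copies each CPU's counter into out[] and returns the sum. */
static unsigned long stat_per_cpu(const struct cpu_slab_sketch *cpus,
				  size_t ncpus, enum stat_item si,
				  unsigned long *out)
{
	unsigned long sum = 0;
	size_t cpu;

	for (cpu = 0; cpu < ncpus; cpu++) {
		out[cpu] = cpus[cpu].stat[si];
		sum += out[cpu];
	}
	return sum;
}
```

With allocations counted on one CPU and frees on another, the summed view shows equal totals while only the per-CPU view reveals the ping-pong pattern.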



Re: [PATCH] USB: mark USB drivers as being GPL only

2008-02-04 Thread Diego Zuccato

Christer Weinigel wrote:

> It isn't that easy.  The "Tamper-Proof Torx" screws on a vacuum cleaner
> or a toaster won't stop anybody from opening up the thing, I mean every
> little hardware store stocks those Torx bits.  But by using a slightly
> odd screw, the company can say "look, we've done all we can to stop
> them, but the user bypassed our security device, and it's not our
> fault".

ROFL! Well, since a lot of screwdriver types are easily available, I
don't think a judge could agree with 'em...

> Apparently Intel and Atheros are trying to protect themselves
> in a similar way, they Open Source everything except for the regulatory
> daemon (Intel) or HAL object file (Atheros).  Why?  Because they believe
> that if they give away the sources to those parts they do the software
> equivalent of putting a normal Phillips screw in a home appliance.
> (Personally I think what they are doing is ridiculous, but apparently
> those companies' lawyers don't agree).

Well, then why close the driver? Simply place the check in the firmware.
Much harder to find, since it has to run on proprietary HW. The OS
driver instead runs on standard (and usually well-known) HW.
To keep the screw analogy, closing the driver is more like using a
Torx, while placing checks in the FW is more like using a lock-only
screw (I have already seen some)...

> [...]
>
> so hiding the source really doesn't help.

Well, we all agree on this... Now we just have to make THEM agree, too...


BYtE,
 Diego.


[PATCH] x86_64: fix iret exception recovery

2008-02-04 Thread Roland McGrath

This change broke recovery of exceptions in iret:

commit 72fe4858544292ad64600765cb78bc02298c6b1c
Author: Glauber de Oliveira Costa <[EMAIL PROTECTED]>

x86: replace privileged instructions with paravirt macros

The ENTRY(native_iret) macro adds alignment padding before the iretq
instruction, so "iret_label" no longer points exactly at the instruction.
It was sloppy to leave the old "iret_label" label behind when replacing
its nearby use.  Removing it would have revealed the other use of the
label later in the file, and upon noticing that use, anyone exercising
the minimum of attention to detail expected of anyone touching this
subtle code would realize it needed to change as well.

Signed-off-by: Roland McGrath <[EMAIL PROTECTED]>
---
 arch/x86/kernel/entry_64.S |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index bea8474..07d4aba 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -582,7 +582,6 @@ retint_restore_args:/* return to kernel space */
TRACE_IRQS_IRETQ
 restore_args:
RESTORE_ARGS 0,8,0  
-iret_label:
 #ifdef CONFIG_PARAVIRT
INTERRUPT_RETURN
 #endif
@@ -911,7 +910,7 @@ error_kernelspace:
   iret run with kernel gs again, so don't set the user space flag.
   B stepping K8s sometimes report an truncated RIP for IRET 
   exceptions returning to compat mode. Check for these here too. */
-   leaq iret_label(%rip),%rbp
+   leaq native_iret(%rip),%rbp
cmpq %rbp,RIP(%rsp) 
je   error_swapgs
movl %ebp,%ebp  /* zero extend */


[PATCH for review] ACPI: Create /sys/firmware/acpi/interrupts/ counters

2008-02-04 Thread Len Brown
From: Len Brown <[EMAIL PROTECTED]>

Here is the ACPI GPE statistics patch, forward ported to grok 2.6.25's kobject 
changes.
(Greg, let me know if I was able to unleash the inner beauty of the kobj 
interface)
(yes, I know you don't like sysfs files with more than 1 value,
 but I violate that rule for only one of the 33 files:-)

thanks,
-Len
--

# ls /sys/firmware/acpi/interrupts/
gpe00  gpe02  gpe04  gpe06  gpe08  gpe0A  gpe0C  gpe0E  gpe10  gpe12  gpe14  gpe16  gpe18  gpe1A  gpe1C  gpe1E  summary
gpe01  gpe03  gpe05  gpe07  gpe09  gpe0B  gpe0D  gpe0F  gpe11  gpe13  gpe15  gpe17  gpe19  gpe1B  gpe1D  gpe1F

# cat /sys/firmware/acpi/interrupts/summary
pm_timer         0
glbl_lock        0
power_btn        0
sleep_btn        0
rtc              0
gpe00            0
gpe01            0
gpe02            0
gpe03            0
gpe04            0
gpe05            0
gpe06            0
gpe07            0
gpe08            0
gpe09            2
gpe0A            0
gpe0B            0
gpe0C            0
gpe0D            0
gpe0E            0
gpe0F            0
gpe10            0
gpe11           60
gpe12            0
gpe13            0
gpe14            0
gpe15            0
gpe16            0
gpe17            0
gpe18            0
gpe19            1
gpe1A            0
gpe1B            0
gpe1C            0
gpe1D            0
gpe1E            0
gpe1F            0
gpe_hi           0
gpe_total       63
acpi_irq        63
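
A userspace consumer of the summary file could parse it one "name count" pair per line. The sketch below is an illustration under assumptions, not part of the patch: `parse_counter_line()` and `sum_gpe_counters()` are hypothetical helpers, and the idea that the `gpeXX` lines plus `gpe_hi` add up to `gpe_total` is inferred from the sample output above (2 + 60 + 1 = 63), not stated by the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse one "name count" line; returns 1 on success, 0 otherwise. */
static int parse_counter_line(const char *line, char *name, size_t name_len,
			      unsigned long *count)
{
	char buf[32];
	unsigned long val;

	assert(line && name && count);
	if (sscanf(line, "%31s %lu", buf, &val) != 2)
		return 0;
	if (strlen(buf) >= name_len)
		return 0;
	strcpy(name, buf);
	*count = val;
	return 1;
}

/*
 * Add up every counter whose name starts with "gpe", except gpe_total
 * itself.  The text is a newline-separated copy of the summary file.
 */
static unsigned long sum_gpe_counters(const char *text)
{
	unsigned long sum = 0;

	while (*text) {
		char line[64], name[32];
		unsigned long count;
		size_t len = strcspn(text, "\n");
		size_t n = len < sizeof(line) - 1 ? len : sizeof(line) - 1;

		memcpy(line, text, n);
		line[n] = '\0';
		if (parse_counter_line(line, name, sizeof(name), &count) &&
		    strncmp(name, "gpe", 3) == 0 &&
		    strcmp(name, "gpe_total") != 0)
			sum += count;

		text += len;
		if (*text == '\n')
			text++;
	}
	return sum;
}
```

Comparing the result against the `gpe_total` line is a cheap sanity check that no GPE counter was dropped when reading the file.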

Inspired-by-original-patch-by: Luming Yu <[EMAIL PROTECTED]>
Signed-off-by: Len Brown <[EMAIL PROTECTED]>
---
 drivers/acpi/events/evevent.c     |    2 +-
 drivers/acpi/events/evgpe.c       |    2 +
 drivers/acpi/osl.c                |   12 ++-
 drivers/acpi/system.c             |  218 +
 drivers/acpi/utilities/utglobal.c |    2 +
 include/acpi/acglobal.h           |    1 +
 include/acpi/aclocal.h            |    1 +
 include/acpi/acpiosxf.h           |    2 +
 include/linux/acpi.h              |    2 +
 9 files changed, 240 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/events/evevent.c b/drivers/acpi/events/evevent.c
index e412878..18514bf 100644
--- a/drivers/acpi/events/evevent.c
+++ b/drivers/acpi/events/evevent.c
@@ -259,7 +259,7 @@ u32 acpi_ev_fixed_event_detect(void)
enable_bit_mask)) {
 
/* Found an active (signalled) event */
-
+   acpi_fixed_event_count[i]++;
int_status |= acpi_ev_fixed_event_dispatch((u32) i);
}
}
diff --git a/drivers/acpi/events/evgpe.c b/drivers/acpi/events/evgpe.c
index e22f4a9..515128f 100644
--- a/drivers/acpi/events/evgpe.c
+++ b/drivers/acpi/events/evgpe.c
@@ -620,6 +620,8 @@ acpi_ev_gpe_dispatch(struct acpi_gpe_event_info *gpe_event_info, u32 gpe_number)
 
acpi_gpe_count++;
 
+   acpi_os_gpe_count(gpe_number);
+
/*
 * If edge-triggered, clear the GPE status bit now.  Note that
 * level-triggered events are cleared after the GPE is serviced.
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index e53fb51..1087efe 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -332,7 +332,15 @@ acpi_os_table_override(struct acpi_table_header * existing_table,
 
 static irqreturn_t acpi_irq(int irq, void *dev_id)
 {
-   return (*acpi_irq_handler) (acpi_irq_context) ? IRQ_HANDLED : IRQ_NONE;
+   u32 handled;
+
+   handled = (*acpi_irq_handler) (acpi_irq_context);
+
+   if (handled) {
+   acpi_irq_handled++;
+   return IRQ_HANDLED;
+   } else
+   return IRQ_NONE;
 }
 
 acpi_status
@@ -341,6 +349,8 @@ acpi_os_install_interrupt_handler(u32 gsi, acpi_osd_handler handler,
 {
unsigned int irq;
 
+   acpi_irq_stats_init();
+
/*
 * Ignore the GSI from the core, and use the value in our copy of the
 * FADT. It may not be the same if an interrupt source override exists
diff --git a/drivers/acpi/system.c b/drivers/acpi/system.c
index 5ffe0ea..538d154 100644
--- a/drivers/acpi/system.c
+++ b/drivers/acpi/system.c
@@ -40,6 +40,8 @@ ACPI_MODULE_NAME("system");
 #define ACPI_SYSTEM_CLASS  "system"
 #define ACPI_SYSTEM_DEVICE_NAME"System"
 
+u32 acpi_irq_handled;
+
 /*
  * Make ACPICA version work as module param
  */
@@ -166,6 +168,222 @@ static int acpi_system_sysfs_init(void)
return 0;
 }
 
+/*
+ * ACPI IRQ counters
+ *
+ * /sys/firmware/acpi/interrupts/
+ *   summary -- IRQ, Fixed Event, and GPE counters
+ *   gpeXX -- broken-out counters for up to 32 GPEs
+ */
+
+static u32 *acpi_gpe_counters;
+static u32 gpe_counter_high;   /* bucket for GPE's >= 32 */
+static u32 number_of_gpes;
+static struct attribute **all_attrs;
+
+static struct attribute_group interrupt_stats_attr_group = {
+   .name = "interrupts",
+};
+static struct kobj_attribute *gpe_attrs;
+
+static int count_num_gpes(void)
+{
+   int count = 0;
+   struct acpi_gpe_xrupt_info *gpe_xrupt_info;
+   struct acpi_gpe_block_info *gpe_block;
+   acpi_cpu_flags flags;
+
+   flags = acpi_os_acquire_lock(acpi_gbl_gpe_lock);
+
+   gpe_xrupt_info = acpi_gbl_gpe_xrupt_list_head;
+   while (gpe_xrupt_info) {
+   gpe_block = 

Re: 2.6.24-mm1 - Build failure at net/sched/cls_flow.c:598

2008-02-04 Thread Rami Rosen
Hello,
  I had sent a patch recently (which is currently pending) which
solves this problem.

see:
http://www.spinics.net/lists/netdev/msg54455.html


Regards,
Rami Rosen


On Feb 5, 2008 1:25 AM, Andrew Morton <[EMAIL PROTECTED]> wrote:
> On Mon, 04 Feb 2008 23:32:49 +0100
> Tilman Schmidt <[EMAIL PROTECTED]> wrote:
>
> > My attempt to build this failed with:
> >
> >CC [M]  net/sched/cls_flow.o
> > net/sched/cls_flow.c: In function 'flow_dump':
> > net/sched/cls_flow.c:598: error: 'struct tcf_ematch_tree' has no member named 'hdr'
> >
> > Config attached.
>
> Thanks.  hm.
>
> #else /* CONFIG_NET_EMATCH */
>
> struct tcf_ematch_tree
> {
> };
>
> methinks Patrick has a CONFIG_NET_EMATCH=n problem?


Re: SLUB: Support for statistics to help analyze allocator behavior

2008-02-04 Thread Pekka J Enberg
Hi Christoph,

On Mon, 4 Feb 2008, Christoph Lameter wrote:
> The statistics provided here allow the monitoring of allocator behavior
> at the cost of some (minimal) loss of performance. Counters are placed in
> SLUB's per cpu data structure that is already written to by other code.

Looks good but I am wondering if we want to make the statistics per-CPU so 
that we can see the kmalloc/kfree ping-pong of, for example, hackbench 
better?

Pekka


Re: > best asked at one of the nvidia forums, not on lkml...

2008-02-04 Thread Arjan van de Ven
On Mon, 04 Feb 2008 22:53:10 -0800
Zachary Amsden <[EMAIL PROTECTED]> wrote:

> 
> On Tue, 2008-02-05 at 13:44 +0700, Igor M Podlesny wrote:
> > On 2008-02-05 13:34, Arjan van de Ven wrote:
> > [...]
> > >>  1) To have compiled it I had to replace
> > >> global_flush_tlb() call with __flush_tlb_all() and still
> > >> guessing was it(?) a correct replacment at all :-)
> > > 
> > > it is not; 
> > 
> > I see, thanks. What would be the correct one? ;-)
> 
> global_flush_tlb() would be the correct one.

... except that that function got absorbed into the functions that would
otherwise require this guy to be called (which is a needed step to do more
selective clflushes for the specific range rather than wholesale wbinvd's
that flush all 12MB of your cache while you only need to flush 4KB... but
only the other function knew the exact range of stuff to flush)
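
To put rough numbers on the range-versus-wholesale point: the sketch below counts how many cache lines a byte range touches. It is plain arithmetic, not kernel code; `cache_lines_in_range()` is a hypothetical helper, and the 64-byte line size used in the note after it is an assumption (the real value is CPU-specific).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Number of cache lines touched by the byte range [addr, addr + len).
 * line_size must be a power of two.
 */
static size_t cache_lines_in_range(uintptr_t addr, size_t len,
				   size_t line_size)
{
	uintptr_t first, last;

	assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
	if (len == 0)
		return 0;

	/* Round the first and last byte down to their line boundaries. */
	first = addr & ~(uintptr_t)(line_size - 1);
	last = (addr + len - 1) & ~(uintptr_t)(line_size - 1);
	return (size_t)((last - first) / line_size) + 1;
}
```

Assuming 64-byte lines, an aligned 4KB page covers 64 lines, whereas a 12MB cache holds 196608 lines, so a ranged flush touches a tiny fraction of what a full write-back and invalidate discards.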

> 


-- 
If you want to reach me at my work email, use [EMAIL PROTECTED]
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org


Re: issue with patch "x86: no CPA on iounmap"

2008-02-04 Thread Arjan van de Ven

Ingo Molnar wrote:
> * Arjan van de Ven <[EMAIL PROTECTED]> wrote:
>
> > Siddha, Suresh B wrote:
> >> This is wrt to x86 git commit f56d005d30342a45d8af2b75e82200f09600
> >>  "x86: no CPA on iounmap"
> >>
> >> This can use performance issue. When a GART driver unmaps a RAM page,
> >
> > thinking about this some more...
> >
> > afaik the gart driver doesn't use ioremap
> >
> > (and it does caching control explicitly, and sets its pages back to
> > cached)
>
> there are many GART drivers, and the method used depends on the GART
> driver. The following GART drivers still use ioremap in one way or
> another:
>
>  drivers/char/agp/amd-k7-agp.c
>  drivers/char/agp/ati-agp.c
>  drivers/char/agp/generic.c
>  drivers/char/agp/sworks-agp.c
>  drivers/char/drm/radeon_cp.c
>
> the method used is in all cases the same: they use __get_free_page() to
> pick up a general RAM page, they do SetPageReserved() and then they use
> ioremap_nocache() to map it non-cached, and then they also program the
> GART to access those pages.
>
> when the GART code deinits, it does an iounmap() on those pages, unmaps
> it from the GART hardware itself, does a ClearPageReserved() and does
> __free_page() to put the page into the general page pool again. So
> Suresh is right: these pages are currently marked UC at this point and
> we need to mark them cacheable.
>
> we could do this automatically in iounmap() upon seeing a page_is_ram()
> that has PageReserved set. Or we could stick in a set_memory_wb() into
> the deinit [and ioremap_nocache()-failure] sequence.
>
> Since we treat PageReserved pages specially in ioremap() already [we
> allow them, despite them being listed in the e820 map], i think the more
> robust solution is to recognize them in iounmap() as well - this way it
> cannot be forgotten accidentally. (and UC pages in the buddy are _hard_
> to notice after the fact) There is no aliasing danger i believe: IO bars
> should never be marked as general RAM in the e820.


agreed, esp for .25

it's sort of a weird case of ioremap() use; I wonder if longer term we need
to have a different sort of interface for this kind of use...


Re: [Cbe-oss-dev] LIO Target iSCSI/SE PS3-Linux / FC8 builds

2008-02-04 Thread Bart Van Assche
On Feb 4, 2008 5:44 PM, Marc Dietrich
<[EMAIL PROTECTED]> wrote:
> ...
> Anyway, heres a quick ugly fix for the ARCH detection code, tested on ps3.
> ...

Architecture detection is indeed broken in LIO. Would it be possible
to use the standard config.guess script instead of the custom LIO arch
detection script ? config.guess is included with a.o. automake and
libtool.

Bart Van Assche.


Re: [Scst-devel] Integration of SCST in the mainstream Linux kernel

2008-02-04 Thread Tomasz Chmielewski

James Bottomley wrote:

> These are both features being independently worked on, are they not?
> Even if they weren't, the combination of the size of SCST in kernel plus
> the problem of having to find a migration path for the current STGT
> users still looks to me to involve the greater amount of work.


I don't want to be mean, but does anyone actually use STGT in
production? Seriously?

In the latest development version of STGT, it's only possible to stop
the tgtd target daemon using KILL / 9 signal - which also means all
iSCSI initiator connections are corrupted when tgtd target daemon is
started again (kernel upgrade, target daemon upgrade, server reboot etc.).

Imagine you have to reboot all your NFS clients when you reboot your NFS
server. Not only that - your data is probably corrupted, or at least the
filesystem deserves checking...


--
Tomasz Chmielewski
http://wpkg.org





Re: [RFC] ext3: per-process soft-syncing data=ordered mode

2008-02-04 Thread Al Boldi
Jan Kara wrote:
> On Sat 02-02-08 00:26:00, Al Boldi wrote:
> > Chris Mason wrote:
> > > Al, could you please compare the write throughput from vmstat for the
> > > data=ordered vs data=writeback runs?  I would guess the data=ordered
> > > one has a lower overall write throughput.
> >
> > That's what I would have guessed, but it's actually going up 4x fold for
> > mysql from 559mb to 2135mb, while the db-size ends up at 549mb.
>
>   So you say we write 4-times as much data in ordered mode as in writeback
> mode. Hmm, probably possible because we force all the dirty data to disk
> when committing a transation in ordered mode (and don't do this in
> writeback mode). So if the workload repeatedly dirties the whole DB, we
> are going to write the whole DB several times in ordered mode but in
> writeback mode we just keep the data in memory all the time. But this is
> what you ask for if you mount in ordered mode so I wouldn't consider it a
> bug.

Ok, maybe not a bug, but a bit inefficient.  Check out this workload:

sync;

while :; do
  dd < /dev/full > /mnt/sda2/x.dmp bs=1M count=20
  rm -f /mnt/sda2/x.dmp
  usleep 1
done

vmstat 1 ( with mount /dev/sda2 /mnt/sda2 -o data=writeback) << note io-bo >>

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 2  0      0 293008   5232  57436    0    0     0     0   18   206  4 80 16  0
 1  0      0 282840   5232  67620    0    0     0     0   18   238  3 81 16  0
 1  0      0 297032   5244  53364    0    0     0   152   21   211  4 79 17  0
 1  0      0 285236   5244  65224    0    0     0     0   18   232  4 80 16  0
 1  0      0 299464   5244  50880    0    0     0     0   18   222  4 80 16  0
 1  0      0 290156   5244  60176    0    0     0     0   18   236  3 80 17  0
 0  0      0 302124   5256  47788    0    0     0   152   21   213  4 80 16  0
 1  0      0 292180   5256  58248    0    0     0     0   18   239  3 81 16  0
 1  0      0 287452   5256  62444    0    0     0     0   18   202  3 80 17  0
 1  0      0 293016   5256  57392    0    0     0     0   18   250  4 80 16  0
 0  0      0 302052   5256  47788    0    0     0     0   19   194  3 81 16  0
 1  0      0 297536   5268  52928    0    0     0   152   20   233  4 79 17  0
 1  0      0 286468   5268  63872    0    0     0     0   18   212  3 81 16  0
 1  0      0 301572   5268  48812    0    0     0     0   18   267  4 79 17  0
 1  0      0 292636   5268  57776    0    0     0     0   18   208  4 80 16  0
 1  0      0 302124   5280  47788    0    0     0   152   21   237  4 80 16  0
 1  0      0 291436   5280  58976    0    0     0     0   18   205  3 81 16  0
 1  0      0 302068   5280  47788    0    0     0     0   18   234  3 81 16  0
 1  0      0 293008   5280  57388    0    0     0     0   18   221  4 79 17  0
 1  0      0 297288   5292  52532    0    0     0   156   22   233  2 81 16  1
 1  0      0 294676   5292  55724    0    0     0     0   19   199  3 81 16  0


vmstat 1 (with mount /dev/sda2 /mnt/sda2 -o data=ordered)

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 2  0      0 291052   5156  59016    0    0     0     0   19   223  3 82 15  0
 1  0      0 291408   5156  58704    0    0     0     0   18   218  3 81 16  0
 1  0      0 291888   5156  58276    0    0     0    20   23   229  3 80 17  0
 1  0      0 300764   5168  49472    0    0     0 12864   91   235  3 69 13 15
 1  0      0 300740   5168  49456    0    0     0     0   19   215  3 80 17  0
 1  0      0 301088   5168  49044    0    0     0     0   18   241  4 80 16  0
 1  0      0 298220   5168  51872    0    0     0     0   18   225  3 81 16  0
 0  1      0 289168   5168  60752    0    0     0 12712   45   237  3 77 15  5
 1  0      0 300260   5180  49852    0    0     0   152   68   211  4 72 15  9
 1  0      0 298616   5180  51460    0    0     0     0   18   237  3 81 16  0
 1  0      0 296988   5180  53092    0    0     0     0   18   223  3 81 16  0
 1  0      0 296608   5180  53480    0    0     0     0   18   223  3 81 16  0
 0  0      0 301640   5192  48036    0    0     0 12868   93   206  4 67 13 16
 0  0      0 301624   5192  48036    0    0     0     0   21   218  3 81 16  0
 0  0      0 301600   5192  48036    0    0     0     0   18   212  3 81 16  0
 0  0      0 301584   5192  48036    0    0     0     0   18   209  4 80 16  0
 0  0      0 301568   5192  48036    0    0     0     0   18   208  3 81 16  0
 1  0      0 285520   5204  64548    0    0     0 12864   95   216  3 69 13 15
 2  0      0 285124   5204  64924    0    0     0     0   18   222  4 80 16  0
 1  0      0 283612   5204  66392    0    0     0     0   18   231  3 81 16  0
 1  0      0 284216   5204  65736    0    0     0     0   18   218  4 80 16  0
 0  1      0 289160   5204  60752    0    0     0 12712   56   213  3 74 15  8
 1 

Re: issue with patch "x86: no CPA on iounmap"

2008-02-04 Thread Ingo Molnar

* Arjan van de Ven <[EMAIL PROTECTED]> wrote:

> Siddha, Suresh B wrote:
>> This is wrt to x86 git commit f56d005d30342a45d8af2b75e82200f09600
>>  "x86: no CPA on iounmap"
>>
>> This can use performance issue. When a GART driver unmaps a RAM page,
>
> thinking about this some more...
>
> afaik the gart driver doesn't use ioremap
>
> (and it does caching control explicitly, and sets its pages back to 
> cached)

there are many GART drivers, and the method used depends on the GART 
driver. The following GART drivers still use ioremap in one way or 
another:

 drivers/char/agp/amd-k7-agp.c
 drivers/char/agp/ati-agp.c
 drivers/char/agp/generic.c
 drivers/char/agp/sworks-agp.c
 drivers/char/drm/radeon_cp.c

the method used is in all cases the same: they use __get_free_page() to 
pick up a general RAM page, they do SetPageReserved() and then they use 
ioremap_nocache() to map it non-cached, and then they also program the 
GART to access those pages.

when the GART code deinits, it does an iounmap() on those pages, unmaps 
it from the GART hardware itself, does a ClearPageReserved() and does 
__free_page() to put the page into the general page pool again. So 
Suresh is right: these pages are currently marked UC at this point and 
we need to mark them cacheable.

we could do this automatically in iounmap() upon seeing a page_is_ram() 
that has PageReserved set. Or we could stick in a set_memory_wb() into 
the deinit [and ioremap_nocache()-failure] sequence.

Since we treat PageReserved pages specially in ioremap() already [we 
allow them, despite them being listed in the e820 map], i think the more 
robust solution is to recognize them in iounmap() as well - this way it 
cannot be forgotten accidentally. (and UC pages in the buddy are _hard_ 
to notice after the fact) There is no aliasing danger i believe: IO bars 
should never be marked as general RAM in the e820.
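
The first option (recognize such pages in iounmap() itself) can be modelled in a few lines of userspace C. This is a toy model under stated assumptions, not kernel code: `page_sketch`, `cache_attr` and `iounmap_fixup_sketch()` are hypothetical names, and the two boolean fields merely stand in for what page_is_ram() and PageReserved would report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cache attributes: write-back (cacheable) and uncached. */
enum cache_attr { ATTR_WB, ATTR_UC };

/* Toy stand-in for the state the real check would look up per page. */
struct page_sketch {
	bool is_ram;		/* models page_is_ram() */
	bool reserved;		/* models PageReserved */
	enum cache_attr attr;	/* current attribute of the kernel mapping */
};

/*
 * Models the proposed iounmap() behaviour: a reserved RAM page that was
 * mapped uncached is switched back to write-back before it can return
 * to the page allocator.  Returns true if the attribute was changed.
 */
static bool iounmap_fixup_sketch(struct page_sketch *pg)
{
	assert(pg != NULL);
	if (pg->is_ram && pg->reserved && pg->attr != ATTR_WB) {
		pg->attr = ATTR_WB;
		return true;
	}
	return false;
}
```

The appeal of this placement is that every driver following the GART pattern gets the fix without changing its deinit path, while non-RAM ranges such as IO BARs are left untouched.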

Ingo


Re: [PATCH 02/22 -v7] Add basic support for gcc profiler instrumentation

2008-02-04 Thread Paul E. McKenney
On Mon, Feb 04, 2008 at 05:41:40PM -0500, Mathieu Desnoyers wrote:
> * Steven Rostedt ([EMAIL PROTECTED]) wrote:
> > 
> > On Mon, 4 Feb 2008, Paul E. McKenney wrote:
> > > OK, will see what I can do...
> > >
> > > > On Sat, 2 Feb 2008, Paul E. McKenney wrote:
> > > >
> > > > > Yep, you have dependencies, so something like the following:
> > > > >
> > > > > initial state:
> > > > >
> > > > >   struct foo {
> > > > >   int a;
> > > > >   };
> > > > >   struct foo x = { 0 };
> > > > >   struct foo y = { 0 };
> > > > > >   struct foo *global_p = &y;
> > > > >   /* other variables are appropriately declared auto variables */
> > > > >
> > > > >   /* No kmalloc() or kfree(), hence no RCU grace periods. */
> > > > >   /* In the terminology of http://lwn.net/Articles/262464/, we */
> > > > >   /* are doing only publish-subscribe, nothing else. */
> > > > >
> > > > > writer:
> > > > >
> > > > >   x.a = 1;
> > > > >   smp_wmb();  /* or smp_mb() */
> > > > > >   global_p = &x;
> > > > >
> > > > > reader:
> > > > >
> > > > >   p = global_p;
> > > > >   ta = p->a;
> > > > >
> > > > > > Both Alpha and aggressive compiler optimizations can result in the reader
> > > > > > seeing the new value of the pointer (&x) but the old value of the field
> > > > > > (0).  Strange but true.  The fix is as follows:
> > > > >
> > > > > reader:
> > > > >
> > > > >   p = global_p;
> > > > >   smp_read_barrier_depends();  /* or use rcu_dereference() */
> > > > >   ta = p->a;
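
For readers outside the kernel, the same publish-subscribe pattern can be approximated with C11 atomics. This is a rough analogue under assumptions, not the kernel primitives: the release store plays the role of smp_wmb() followed by the pointer assignment, and the consume load plays the role of rcu_dereference(). Compilers commonly implement `memory_order_consume` as acquire, which is stronger than strictly needed. The function names `writer()` and `reader()` are chosen for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct foo {
	int a;
};

static struct foo x = { 0 };
static struct foo y = { 0 };
static _Atomic(struct foo *) global_p = &y;

/* Writer: initialize the structure, then publish the pointer. */
static void writer(void)
{
	x.a = 1;
	/* Release ordering keeps the store to x.a before the publication. */
	atomic_store_explicit(&global_p, &x, memory_order_release);
}

/* Reader: fetch the pointer with dependency ordering, then dereference. */
static int reader(void)
{
	struct foo *p = atomic_load_explicit(&global_p, memory_order_consume);

	assert(p != NULL);
	return p->a;
}
```

A reader that observes `&x` through this load is guaranteed to see `x.a == 1`, which is exactly the guarantee the unfixed reader lacks.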
> > > > >
> > > > > So how can this happen?  First note that if smp_read_barrier_depends()
> > > > > was unnecessary in this case, it would be unnecessary in all cases.
> > > > >
> > > > > > Second, let's start with the compiler.  Suppose that a highly optimizing
> > > > > > compiler notices that in almost all cases, the reader finds p==global_p.
> > > > > > Suppose that this compiler also notices that one of the registers (say
> > > > > > r1) almost always contains this expected value of global_p, and that
> > > > > > cache pressure ensures that an actual load from global_p almost always
> > > > > > generates an expensive cache miss.  Such a compiler would be within its
> > > > > > rights (as defined by the C standard) to generate code assuming that r1
> > > > > > already had the right value, while also generating code to validate this
> > > > > > assumption, perhaps as follows:
> > > > > >
> > > > > >   r2 = global_p;  /* high latency, other things complete meanwhile */
> > > > > >   ta = r1->a;
> > > > > >   if (r1 != r2)
> > > > > >           ta = r2->a;
> > > > > >
> > > > > Now consider the following sequence of events on a superscalar CPU:
> > > >
> > > > I think you missed one step here (causing my confusion). I don't want to
> > > > assume so I'll try to put in the missing step:
> > > >
> > > > writer: r1 = p;  /* happens to use r1 to store parameter p */
> > >
> > > You lost me on this one...  The writer has only the following three steps:
> > 
> > You're right. I meant "writer:  r1 = x;"
> > 
> > >
> > > writer:
> > >
> > >   x.a = 1;
> > >   smp_wmb();  /* or smp_mb() */
> > > >   global_p = &x;
> > >
> > > Where did the "r1 = p" come from?  For that matter, where did "p" come
> > > from?
> > >
> > > > >   reader: r2 = global_p; /* issued, has not yet completed. */
> > > > >   reader: ta = r1->a; /* which gives zero. */
> > > > >   writer: x.a = 1;
> > > > >   writer: smp_wmb();
> > > > > >   writer: global_p = &x;
> > > > > >   reader: r2 = global_p; /* this instruction now completes */
> > > > > >   reader: if (r1 != r2) /* and these are equal, so we keep bad ta! */
> > > >
> > > > Is that the case?
> > >
> > > Ah!  Please note that I am doing something unusual here in that I am
> > > working with global variables, as opposed to the normal RCU practice of
> > > dynamically allocating memory.  So "x" is just a global struct, not a
> > > pointer to a struct.
> > >
> > 
> > But lets look at a simple version of my original code anyway ;-)
> > 
> > Writer:
> > 
> > void add_op(struct myops *x) {
> > /* x->next may be garbage here */
> > x->next = global_p;
> > smp_wmb();
> > global_p = x;
> > }
> > 
> > Reader:
> > 
> > void read_op(void)
> > {
> > struct myops *p = global_p;
> > 
> > while (p != NULL) {
> > p->func();
> > p = p->next;
> > /* if p->next is garbage we crash */
> > }
> > }
> > 
> > 
> > Here, we are missing the read_barrier_depends(). Lets look at the Alpha
> > cache issue:
> > 
> > 
> > reader reads the new version of global_p, and then reads the next
> > pointer. But since the next pointer is on a different cacheline than
> > global_p, it may have somehow had that in its cache still. So it uses the
> > old next pointer which contains the garbage.
> > 
> > Is that correct?
> > 
> > But I will have to admit, that I 

Re: [PATCH 02/22 -v7] Add basic support for gcc profiler instrumentation

2008-02-04 Thread Paul E. McKenney
On Mon, Feb 04, 2008 at 05:03:47PM -0500, Steven Rostedt wrote:
> 
> On Mon, 4 Feb 2008, Paul E. McKenney wrote:
> > OK, will see what I can do...
> >
> > > On Sat, 2 Feb 2008, Paul E. McKenney wrote:
> > >
> > > > Yep, you have dependencies, so something like the following:
> > > >
> > > > initial state:
> > > >
> > > > struct foo {
> > > > int a;
> > > > };
> > > > struct foo x = { 0 };
> > > > struct foo y = { 0 };
> > > > > struct foo *global_p = &y;
> > > > /* other variables are appropriately declared auto variables */
> > > >
> > > > /* No kmalloc() or kfree(), hence no RCU grace periods. */
> > > > /* In the terminology of http://lwn.net/Articles/262464/, we */
> > > > /* are doing only publish-subscribe, nothing else. */
> > > >
> > > > writer:
> > > >
> > > > x.a = 1;
> > > > smp_wmb();  /* or smp_mb() */
> > > > > global_p = &x;
> > > >
> > > > reader:
> > > >
> > > > p = global_p;
> > > > ta = p->a;
> > > >
> > > > > Both Alpha and aggressive compiler optimizations can result in the reader
> > > > > seeing the new value of the pointer (&x) but the old value of the field
> > > > > (0).  Strange but true.  The fix is as follows:
> > > >
> > > > reader:
> > > >
> > > > p = global_p;
> > > > smp_read_barrier_depends();  /* or use rcu_dereference() */
> > > > ta = p->a;
> > > >
> > > > So how can this happen?  First note that if smp_read_barrier_depends()
> > > > was unnecessary in this case, it would be unnecessary in all cases.
> > > >
> > > > Second, let's start with the compiler.  Suppose that a highly optimizing
> > > > compiler notices that in almost all cases, the reader finds p==global_p.
> > > > Suppose that this compiler also notices that one of the registers (say
> > > > r1) almost always contains this expected value of global_p, and that
> > > > cache pressure ensures that an actual load from global_p almost always
> > > > generates an expensive cache miss.  Such a compiler would be within its
> > > > rights (as defined by the C standard) to generate code assuming that r1
> > > > already had the right value, while also generating code to validate this
> > > > assumption, perhaps as follows:
> > > >
> > > > > r2 = global_p;  /* high latency, other things complete meanwhile */
> > > > > ta = r1->a;
> > > > if (r1 != r2)
> > > > ta = r2->a;
> > > >
> > > > Now consider the following sequence of events on a superscalar CPU:
> > >
> > > I think you missed one step here (causing my confusion). I don't want to
> > > assume so I'll try to put in the missing step:
> > >
> > >   writer: r1 = p;  /* happens to use r1 to store parameter p */
> >
> > You lost me on this one...  The writer has only the following three steps:
> 
> You're right. I meant "writer:  r1 = x;"

OK, I understand.  You are correct, it would make more sense at the machine
level for the writer to do something like:

writer:

r1 = &x;
r1->a = 1;
smp_wmb();  /* or smp_mb() */
global_p = r1;
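
For anyone who wants to play with this outside the kernel, the same
publish-subscribe pattern can be approximated in userspace with C11 atomics.
This is only a sketch under assumptions: memory_order_release stands in for
smp_wmb() followed by the pointer store, memory_order_consume stands in for
the load followed by smp_read_barrier_depends() (current compilers generally
treat consume as acquire), and the helper names publish_x() and read_a() are
made up for the example.

```c
#include <stdatomic.h>

struct foo {
	int a;
};

static struct foo x = { 0 };
static struct foo y = { 0 };
/* Initially points at y; the writer later publishes x. */
static _Atomic(struct foo *) global_p = &y;

/* Writer: initialize the field first, then publish the pointer. */
static void publish_x(void)
{
	x.a = 1;
	/* Release ordering plays the role of smp_wmb() before the store. */
	atomic_store_explicit(&global_p, &x, memory_order_release);
}

/* Reader: fetch the pointer, then dereference it. */
static int read_a(void)
{
	/* Consume ordering plays the role of smp_read_barrier_depends(). */
	struct foo *p = atomic_load_explicit(&global_p, memory_order_consume);

	return p->a;
}
```

A reader that runs before publish_x() sees y and gets 0; a reader that sees
the published pointer is guaranteed to see x.a == 1 as well.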

> > writer:
> >
> > x.a = 1;
> > smp_wmb();  /* or smp_mb() */
> > global_p = &x;
> >
> > Where did the "r1 = p" come from?  For that matter, where did "p" come
> > from?
> >
> > > > reader: r2 = global_p; /* issued, has not yet completed. */
> > > > reader: ta = r1->a; /* which gives zero. */
> > > > writer: x.a = 1;
> > > > writer: smp_wmb();
> > > > writer: global_p = &x;
> > > > reader: r2 = global_p; /* this instruction now completes */
> > > > reader: if (r1 != r2) /* and these are equal, so we keep bad ta! */
> > >
> > > Is that the case?
> >
> > Ah!  Please note that I am doing something unusual here in that I am
> > working with global variables, as opposed to the normal RCU practice of
> > dynamically allocating memory.  So "x" is just a global struct, not a
> > pointer to a struct.
> 
> But lets look at a simple version of my original code anyway ;-)

Fair enough!  ;-)

> Writer:
> 
> void add_op(struct myops *x) {
>   /* x->next may be garbage here */
>   x->next = global_p;
>   smp_wmb();
>   global_p = x;
> }
> 
> Reader:
> 
> void read_op(void)
> {
>   struct myops *p = global_p;
> 
>   while (p != NULL) {
>   p->func();
>   p = p->next;
>   /* if p->next is garbage we crash */
>   }
> }
> 
> 
> Here, we are missing the read_barrier_depends(). Lets look at the Alpha
> cache issue:
> 
> 
> reader reads the new version of global_p, and then reads the next
> pointer. But since the next pointer is on a different cacheline than
> global_p, it may have somehow had that in it's cache still. So it uses the
> old next pointer which contains the garbage.
> 
> Is that correct?

Indeed!  Changing the reader to be as follows should fix it:

Reader:

void read_op(void)
{
struct myops *p = 
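
As a stand-alone illustration of the list pattern discussed above (not the
code from this message), here is a userspace sketch using C11 atomics. The
assumptions: a single writer, memory_order_release standing in for smp_wmb(),
and memory_order_consume standing in for the dependency barrier. The callback
count_hit() and the return value of read_op() are hypothetical additions so
that the behaviour can be observed.

```c
#include <stdatomic.h>
#include <stddef.h>

struct myops {
	void (*func)(void);
	struct myops *next;
};

static _Atomic(struct myops *) global_p = NULL;
static int hits;

/* Sample callback, only here so the walk has something to call. */
static void count_hit(void)
{
	hits++;
}

/* Writer: link the new element in front, then publish it. */
static void add_op(struct myops *x)
{
	x->next = atomic_load_explicit(&global_p, memory_order_relaxed);
	/* Release ordering stands in for smp_wmb(). */
	atomic_store_explicit(&global_p, x, memory_order_release);
}

/* Reader: walks the list and returns the number of callbacks invoked. */
static int read_op(void)
{
	/* Consume ordering stands in for the dependency barrier. */
	struct myops *p = atomic_load_explicit(&global_p, memory_order_consume);
	int calls = 0;

	while (p != NULL) {
		p->func();
		calls++;
		p = p->next;
	}
	return calls;
}
```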

> global_flush_tlb() would be the correct one.

2008-02-04 Thread Igor M Podlesny
On 2008-02-05 13:53, Zachary Amsden wrote:
> On Tue, 2008-02-05 at 13:44 +0700, Igor M Podlesny wrote:
>> On 2008-02-05 13:34, Arjan van de Ven wrote:
>> [...]
>> >>   1) To have compiled it I had to replace global_flush_tlb()
>> >> call with __flush_tlb_all() and still guessing was it(?) a correct
>> >> replacment at all :-)
>> > 
>> > it is not; 
>> 
>>  I see, thanks. What would be the correct one? ;-)
> 
> global_flush_tlb() would be the correct one.
> 
Looking at the kernel's patch I don't think so:

-void global_flush_tlb(void)
-{
-   struct list_head l;
-   struct page *pg, *next;
-
-   BUG_ON(irqs_disabled());
-
-   spin_lock_irq(_lock);
-   list_replace_init(_list, );
-   spin_unlock_irq(_lock);
-   flush_map();
-   list_for_each_entry_safe(pg, next, &l, lru) {
-   list_del(&pg->lru);
-   clear_bit(PG_arch_1, &pg->flags);
-   if (PageReserved(pg) || !cpu_has_pse || page_private(pg) != 0)
-   continue;
-   ClearPagePrivate(pg);
-   __free_page(pg);
-   }
-}
-
-EXPORT_SYMBOL(global_flush_tlb);

-- 


Re: [PATCH] at91_mci: minor cleanup

2008-02-04 Thread Pierre Ossman
On Wed, 30 Jan 2008 17:45:48 +0100
Nicolas Ferre <[EMAIL PROTECTED]> wrote:

> From: Marc Pignat <[EMAIL PROTECTED]>
> 
> MMC_POWER_ON is a noop, no need to set the power pin again.
> 
> Signed-off-by: Marc Pignat <[EMAIL PROTECTED]>
> Signed-off-by: Nicolas Ferre <[EMAIL PROTECTED]>
> ---

Perhaps also a WARN_ON() or something in the default case to catch bad
invocations?

Rgds
Pierre


signature.asc
Description: PGP signature


Re: FW: 2.6.24 breaks BIOS updates on all Dell machines

2008-02-04 Thread Greg KH
On Tue, Jan 29, 2008 at 11:15:22PM +0100, Jean Delvare wrote:
> >So, I'm all for reverting this patch.
> >
> >And then, feel free to revisit the problem by proposing something that
> >doesn't break existing users of the interface.
> 
> I'm a bit confused. It seems to me that the "class devices" are named
> differently in recent kernels. The i2c-dev class devices were originally
> showing as i2c-%d in their parent device directories (causing the
> collision), and now show as i2c-dev:i2c-%d. This suggests that the
> collision the patch above was trying to solve is in fact already fixed
> (by prefixing the device name with the class name). The good news is
> that it would mean that we can just revert the patch in question...
> 
> But quite frankly I'm not really sure, the class devices look different
> on every kernel I looked at, depending on the version and whether
> CONFIG_SYSFS_DEPRECATED is set or not.

The naming is different depending on that sysfs variable, yes.  But it
should be consistent other than that.  If not, please let me know.

And yes, we did have to add the ":" a while ago to handle the namespace
collisions we were having.

thanks,

greg k-h


Re: > best asked at one of the nvidia forums, not on lkml...

2008-02-04 Thread Zachary Amsden

On Tue, 2008-02-05 at 13:44 +0700, Igor M Podlesny wrote:
> On 2008-02-05 13:34, Arjan van de Ven wrote:
> [...]
> >>1) To have compiled it I had to replace global_flush_tlb()
> >> call with __flush_tlb_all() and still guessing was it(?) a correct
> >> replacment at all :-)
> > 
> > it is not; 
> 
>   I see, thanks. What would be the correct one? ;-)

global_flush_tlb() would be the correct one.



> best asked at one of the nvidia forums, not on lkml...

2008-02-04 Thread Igor M Podlesny
On 2008-02-05 13:34, Arjan van de Ven wrote:
[...]
>>  1) To have compiled it I had to replace global_flush_tlb()
>> call with __flush_tlb_all() and still guessing was it(?) a correct
>> replacment at all :-)
> 
> it is not; 

I see, thanks. What would be the correct one? ;-)
>> 
>>  2) When loading it emits such messages:
>> 
>>  nvidia: Unknown symbol change_page_attr
>>  nvidia: Unknown symbol init_mm
>> 
>>  Can it be quick and easy solved?
> 
> best asked at one of the nvidia forums, not on lkml...
> they need to adjust a few API calls, it's not hard work but they need
> to do that (their driver isn't open source)

You're right.

-- 


Re: [mm] Crashkernel memory reservation fails with 2.6.24-rc8-mm1

2008-02-04 Thread Sachin P. Sant

Sachin P. Sant wrote:

Bernhard Walle wrote:
* Vivek Goyal <[EMAIL PROTECTED]> [2008-02-04 19:38]:  
Bernahard, any idea who is the competitor here? 

Hm ..., can you boot the kernel without crashkernel= and provide the
/proc/iomem?   

Attached is the /proc/iomem output with and without crashkernel
parameter.

Adding debug gives this extra information.

hm, page 02a9d000 reserved twice.
crashkernel reservation failed - memory is in use

Will try to add more debug statements.

Hm . no problem with 2.6.24-mm1.

early res: 4 [9dc00-a0bff] EBDA
early res: 5 [8000-11fff] PGTABLE
Reserving 128MB of memory at 32MB for crashkernel (System RAM: 9088MB)
[e200-e21f] PMD ->81000120 on node 0

# cat /proc/iomem
..
008b8000-0099dc8b : Kernel bss
 0200-09ff : Crash kernel
c7fcae00-c7fcf7ff : ACPI Tables
.

Thanks
-Sachin



Re: NVIDIA's Linux x86 Display Driver fresh driver isn't compatible anymore

2008-02-04 Thread Arjan van de Ven
On Tue, 05 Feb 2008 13:28:40 +0700
Igor M Podlesny <[EMAIL PROTECTED]> wrote:

> On 2008-02-05 12:32, Igor M Podlesny wrote:
> > On 2008-02-04 20:27, Andrew Morton wrote:
> >> On Mon, 04 Feb 2008 20:16:48 +0700 Igor M Podlesny
> >> <[EMAIL PROTECTED]> wrote:
> > [...]
> >>>Now I can say that both 2.6.24-mm1 and 2.6.24-git11 do NOT
> >>> "see" any of mine LVM-2 disks. pvscan, for e.g., finds nothing at
> >>> all.
> >> 
> >> You may find that you need to update your lvm userspace tools.
> > 
> > You're right; I've updated my initrd with fresh lvm
> > userspace-counterpart and now the problem has been fixed. Sorry for
> > groundless alert.
> > 
> > Thanks!
> 
>   But as russian proverb says, trouble never comes alone. :-)
> NVIDIA's fresh driver isn't compatible anymore:
> 
>   1) To have compiled it I had to replace global_flush_tlb()
> call with __flush_tlb_all() and still guessing was it(?) a correct
> replacment at all :-)

it is not; 
> 
>   2) When loading it emits such messages:
> 
>   nvidia: Unknown symbol change_page_attr
>   nvidia: Unknown symbol init_mm
> 
>   Can it be quick and easy solved?

best asked at one of the nvidia forums, not on lkml...
they need to adjust a few API calls, it's not hard work but they need
to do that (their driver isn't open source)


-- 
If you want to reach me at my work email, use [EMAIL PROTECTED]
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org


Re: [BUG] 2.6.24-git6 soft lockup detected while running libhugetlbfs

2008-02-04 Thread Kamalesh Babulal
Ingo Molnar wrote:
> * Kamalesh Babulal <[EMAIL PROTECTED]> wrote:
> 
>> The CONFIG_NO_HZ is not set and the system seems not be truly locked 
>> up ,btw wc -l of the softlockup messages is around 108 times, while 
>> running the libhugetlbfs only and this is reproducible with the 
>> 2.6.24-git7 also.
> 
> Peter just fixed a handful of bugs in this area - does the patch below 
> help?
> 
>   Ingo
> 
> -->
> Subject: debug: softlockup looping fix
> From: Peter Zijlstra <[EMAIL PROTECTED]>
> 
> Rafael J. Wysocki reported weird, multi-seconds delays during
> suspend/resume and bisected it back to:
> 
>   commit 82a1fcb90287052aabfa235e7ffc693ea003fe69
>   Author: Ingo Molnar <[EMAIL PROTECTED]>
>   Date:   Fri Jan 25 21:08:02 2008 +0100
> 
>   softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks
> 
> fix it:
> 
>  - restore the old wakeup mechanism
>  - fix break usage in do_each_thread() { } while_each_thread().
>  - fix the hotplug switch stmt, a fall-through case was broken.
> 
> Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
> Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
> ---
>  kernel/softlockup.c |   30 --
>  1 file changed, 20 insertions(+), 10 deletions(-)
> 
> Index: linux/kernel/softlockup.c
> ===
> --- linux.orig/kernel/softlockup.c
> +++ linux/kernel/softlockup.c
> @@ -101,6 +101,10 @@ void softlockup_tick(void)
> 
>   now = get_timestamp(this_cpu);
> 
> + /* Wake up the high-prio watchdog task every second: */
> + if (now > (touch_timestamp + 1))
> + wake_up_process(per_cpu(watchdog_task, this_cpu));
> +
>   /* Warn about unreasonable delays: */
>   if (now <= (touch_timestamp + softlockup_thresh))
>   return;
> @@ -191,11 +195,11 @@ static void check_hung_uninterruptible_t
>   read_lock(_lock);
>   do_each_thread(g, t) {
>   if (!--max_count)
> - break;
> + goto unlock;
>   if (t->state & TASK_UNINTERRUPTIBLE)
>   check_hung_task(t, now);
>   } while_each_thread(g, t);
> -
> + unlock:
>   read_unlock(_lock);
>  }
> 
> @@ -218,14 +222,19 @@ static int watchdog(void *__bind_cpu)
>* debug-printout triggers in softlockup_tick().
>*/
>   while (!kthread_should_stop()) {
> + set_current_state(TASK_INTERRUPTIBLE);
>   touch_softlockup_watchdog();
> - msleep_interruptible(1);
> + schedule();
> +
> + if (kthread_should_stop())
> + break;
> 
>   if (this_cpu != check_cpu)
>   continue;
> 
>   if (sysctl_hung_task_timeout_secs)
>   check_hung_uninterruptible_tasks(this_cpu);
> +
>   }
> 
>   return 0;
> @@ -259,13 +268,6 @@ cpu_callback(struct notifier_block *nfb,
>   wake_up_process(per_cpu(watchdog_task, hotcpu));
>   break;
>  #ifdef CONFIG_HOTPLUG_CPU
> - case CPU_UP_CANCELED:
> - case CPU_UP_CANCELED_FROZEN:
> - if (!per_cpu(watchdog_task, hotcpu))
> - break;
> - /* Unbind so it can run.  Fall thru. */
> - kthread_bind(per_cpu(watchdog_task, hotcpu),
> -  any_online_cpu(cpu_online_map));
>   case CPU_DOWN_PREPARE:
>   case CPU_DOWN_PREPARE_FROZEN:
>   if (hotcpu == check_cpu) {
> @@ -275,6 +277,14 @@ cpu_callback(struct notifier_block *nfb,
>   check_cpu = any_online_cpu(temp_cpu_online_map);
>   }
>   break;
> +
> + case CPU_UP_CANCELED:
> + case CPU_UP_CANCELED_FROZEN:
> + if (!per_cpu(watchdog_task, hotcpu))
> + break;
> + /* Unbind so it can run.  Fall thru. */
> + kthread_bind(per_cpu(watchdog_task, hotcpu),
> +  any_online_cpu(cpu_online_map));
>   case CPU_DEAD:
>   case CPU_DEAD_FROZEN:
>   p = per_cpu(watchdog_task, hotcpu);
Hi Ingo,

Thanks for the patch. The softlockup is not always reproducible: I tried six
rounds without the patch and was not able to reproduce it. It has not been
seen with 2.6.24-git8 and later, hopefully because Peter's patch is already
in those git snapshots.
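
The "fix break usage in do_each_thread() { } while_each_thread()" item in the
quoted patch comes down to the macro pair expanding to a for loop with a
nested do/while, so a plain break leaves only the inner loop. A small
userspace sketch of the effect follows; do_each_item()/while_each_item() and
the GROUPS/MEMBERS sizes are hypothetical stand-ins, not the kernel macros.

```c
/*
 * Hypothetical stand-ins for do_each_thread()/while_each_thread():
 * an outer for loop over groups wrapping an inner do/while over members.
 */
#define GROUPS  3
#define MEMBERS 4

#define do_each_item(g, t) \
	for ((g) = 0; (g) < GROUPS; (g)++) { (t) = 0; do

#define while_each_item(g, t) \
	while (++(t) < MEMBERS); }

/* Visits items until the budget runs out, leaving with "break". */
static int visit_with_break(int budget)
{
	int g, t, visited = 0;

	do_each_item(g, t) {
		if (!--budget)
			break;		/* leaves only the inner do/while */
		visited++;
	} while_each_item(g, t);

	return visited;
}

/* Same walk, but "goto" leaves both loops at once. */
static int visit_with_goto(int budget)
{
	int g, t, visited = 0;

	do_each_item(g, t) {
		if (!--budget)
			goto out;
		visited++;
	} while_each_item(g, t);
out:
	return visited;
}
```

With a budget of 3, the goto variant stops after two items, while the break
variant drops out of the first group only, lets the counter go negative and
then walks every remaining item.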

-- 
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.


NVIDIA's Linux x86 Display Driver fresh driver isn't compatible anymore

2008-02-04 Thread Igor M Podlesny
On 2008-02-05 12:32, Igor M Podlesny wrote:
> On 2008-02-04 20:27, Andrew Morton wrote:
>> On Mon, 04 Feb 2008 20:16:48 +0700 Igor M Podlesny <[EMAIL PROTECTED]> wrote:
> [...]
>>>Now I can say that both 2.6.24-mm1 and 2.6.24-git11 do NOT "see" any
>>> of mine LVM-2 disks. pvscan, for e.g., finds nothing at all.
>> 
>> You may find that you need to update your lvm userspace tools.
> 
>   You're right; I've updated my initrd with fresh lvm
> userspace-counterpart and now the problem has been fixed. Sorry for
> groundless alert.
> 
>   Thanks!

But as the Russian proverb says, trouble never comes alone. :-) NVIDIA's
fresh driver isn't compatible anymore:

1) To compile it I had to replace the global_flush_tlb() call with
__flush_tlb_all(), and I am still guessing whether that was a correct
replacement at all :-)

2) When loading it emits such messages:

nvidia: Unknown symbol change_page_attr
nvidia: Unknown symbol init_mm

Can it be solved quickly and easily?

-- 
End of message. Next message?


[patch] x86: add code to dump the (kernel) page tables for visual inspection

2008-02-04 Thread Arjan van de Ven
Subject: x86: add code to dump the (kernel) page tables for visual inspection by kernel developers
From: Arjan van de Ven <[EMAIL PROTECTED]>

This patch adds code to the kernel to have an (optional)
/proc/kernel_page_tables debug file that basically dumps the kernel
pagetables; this allows us kernel developers to verify that nothing fishy is
going on and that the various mappings are set up correctly. This was quite
useful in finding various change_page_attr() bugs, and is very likely to be
useful in the future as well.

Signed-off-by: Arjan van de Ven <[EMAIL PROTECTED]>
---
 arch/x86/Kconfig.debug|   11 +
 arch/x86/mm/Makefile_64   |1 
 arch/x86/mm/dump_pagetables.c |  301 ++
 3 files changed, 313 insertions(+)

Index: linux.trees.git/arch/x86/Kconfig.debug
===
--- linux.trees.git.orig/arch/x86/Kconfig.debug
+++ linux.trees.git/arch/x86/Kconfig.debug
@@ -70,6 +70,17 @@ config DEBUG_PER_CPU_MAPS
 
  Say N if unsure.
 
+config X86_PTDUMP
+   bool "Export kernel pagetable layout to userspace in /proc"
+   depends on X86_64
+   help
+ Say Y here if you want to show the kernel pagetable layout in
+ a /proc file. This information is only useful for kernel developers
+ who are working in architecture specific areas of the kernel.
+ It is probably not a good idea to enable this feature in a production
+ kernel.
+ If in doubt, say "N"
+
 config DEBUG_RODATA
bool "Write protect kernel read-only data structures"
default y
Index: linux.trees.git/arch/x86/mm/Makefile_64
===
--- linux.trees.git.orig/arch/x86/mm/Makefile_64
+++ linux.trees.git/arch/x86/mm/Makefile_64
@@ -7,3 +7,4 @@ obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpag
 obj-$(CONFIG_NUMA) += numa_64.o
 obj-$(CONFIG_K8_NUMA) += k8topology_64.o
 obj-$(CONFIG_ACPI_NUMA) += srat_64.o
+obj-$(CONFIG_X86_PTDUMP) += dump_pagetables.o
Index: linux.trees.git/arch/x86/mm/dump_pagetables.c
===
--- /dev/null
+++ linux.trees.git/arch/x86/mm/dump_pagetables.c
@@ -0,0 +1,301 @@
+/*
+ * Debug helper to dump the current kernel pagetables of the system
+ * so that we can see what the various memory ranges are set to.
+ *
+ * (C) Copyright 2008 Intel Corporation
+ *
+ * Author: Arjan van de Ven <[EMAIL PROTECTED]>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+#include 
+#include 
+#include 
+
+#include 
+
+/*
+ * The dumper groups pagetable entries of the same type into one, and for
+ * that it needs to keep some state when walking, and flush this state
+ * when a "break" in the continuity is found.
+ */
+struct pg_state {
+   int level;
+   pgprot_t current_prot;
+   unsigned long start_address;
+   unsigned long current_address;
+   int printed_vmalloc;
+   int printed_modules;
+   int printed_vmemmap;
+   int printed_highmap;
+};
+
+/* Multipliers for offsets within the PTEs */
+#define LEVEL_4_MULT (PAGE_SIZE)
+#define LEVEL_3_MULT (512UL * LEVEL_4_MULT)
+#define LEVEL_2_MULT (512UL * LEVEL_3_MULT)
+#define LEVEL_1_MULT (512UL * LEVEL_2_MULT)
+
+
+/*
+ * Print a readable form of a pgprot_t to the seq_file
+ */
+static void printk_prot(struct seq_file *m, pgprot_t prot, int level)
+{
+   unsigned long pr = pgprot_val(prot);
+
+   if (pr & _PAGE_USER)
+   seq_printf(m, "USR ");
+   else
+   seq_printf(m, "");
+   if (pr & _PAGE_RW)
+   seq_printf(m, "RW ");
+   else
+   seq_printf(m, "ro ");
+   if (pr & _PAGE_PWT)
+   seq_printf(m, "PWT ");
+   else
+   seq_printf(m, "");
+   if (pr & _PAGE_PCD)
+   seq_printf(m, "PCD ");
+   else
+   seq_printf(m, "");
+
+   /* Bit 7 has a different meaning on level 3 vs 4 */
+   if (level <= 3) {
+   if (pr & _PAGE_PSE)
+   seq_printf(m, "PSE ");
+   else
+   seq_printf(m, "");
+   } else {
+   if (pr & _PAGE_PAT)
+   seq_printf(m, "pat ");
+   else
+   seq_printf(m, "");
+   }
+   if (pr & _PAGE_GLOBAL)
+   seq_printf(m, "GLB ");
+   else
+   seq_printf(m, "");
+   if (pr & _PAGE_NX)
+   seq_printf(m, "NX ");
+   else
+   seq_printf(m, "x  ");
+}
+
+/*
+ * Sign-extend the 48 bit address to 64 bit
+ */
+static unsigned long sign_extend(unsigned long u)
+{
+   if (u>>47)
+   u = u | (0xffffUL << 48);
+   return u;
+}
+
+/*
+ * This 
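
The address arithmetic used by the dumper can be tried out in userspace. A
minimal sketch, assuming 4 KiB pages and 512-entry tables at every level; the
names sign_extend48() and table_address() are hypothetical and not part of
the patch, and the multipliers simply mirror the LEVEL_*_MULT defines above.

```c
#include <stdint.h>

/* Bytes covered by one entry at each level (assumes 4 KiB pages). */
#define LEVEL_4_MULT 4096ULL                  /* PTE: one page */
#define LEVEL_3_MULT (512ULL * LEVEL_4_MULT)  /* 2 MiB         */
#define LEVEL_2_MULT (512ULL * LEVEL_3_MULT)  /* 1 GiB         */
#define LEVEL_1_MULT (512ULL * LEVEL_2_MULT)  /* 512 GiB       */

/* Sign-extend a 48-bit virtual address to 64 bits. */
static uint64_t sign_extend48(uint64_t u)
{
	if (u >> 47)			/* bit 47 set: upper half */
		u |= (uint64_t)0xffff << 48;
	return u;
}

/* Virtual address that corresponds to a set of table indices. */
static uint64_t table_address(unsigned l1, unsigned l2,
			      unsigned l3, unsigned l4)
{
	return sign_extend48(l1 * LEVEL_1_MULT + l2 * LEVEL_2_MULT +
			     l3 * LEVEL_3_MULT + l4 * LEVEL_4_MULT);
}
```

For example, top-level index 256 is the first entry of the upper half of the
address space, which sign-extends to 0xffff800000000000.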

SLUB: Support for statistics to help analyze allocator behavior

2008-02-04 Thread Christoph Lameter
The statistics provided here allow the monitoring of allocator behavior
at the cost of some (minimal) loss of performance. Counters are placed in
SLUB's per cpu data structure that is already written to by other code.
 
The per cpu structure may be extended by the statistics to be more than 
one cacheline which will increase the cache footprint of SLUB.

That is why there is a compile option to enable/disable the inclusion of
the statistics module.

The slabinfo tool is enhanced to support these statistics via the following options:

-D  Switches the line of information displayed for a slab from size
mode to activity mode.

-A  Sorts the slabs displayed by activity. This allows the display of
the slabs most important to the performance of a certain load.

-r  Report option will report detailed statistics on

Example (tbench load):

slabinfo -AD    -> Shows the most active slabs

Name                   Objects      Alloc       Free   %Fast
skbuff_fclone_cache         33  111953835  111953835  99  99
:0000192                  2666    5283688    5281047  99  99
:0001024                   849    5247230    5246389  83  83
vm_area_struct            1349     119642     118355  91  22
:0004096                    15      66753      66751  98  98
:0000064                  2067      25297      23383  98  78
dentry                   10259      28635      18464  91  45
:0000080                 11004      18950       8089  98  98
:0000096                  1703      12358      10784  99  98
:0000128                   762      10582       9875  94  18
:0000512                   184       9807       9647  95  81
:0002048                   479       9669       9195  83  65
anon_vma                   777       9461       9002  99  71
kmalloc-8                 6492       9981       5624  99  97
:0000768                   258       7174       6931  58  15

So the skbuff_fclone_cache is of highest importance for the tbench load.
Pretty high load on the 192 sized slab. Look for the aliases

slabinfo -a | grep 000192
:0000192 <- xfs_btree_cur filp kmalloc-192 uid_cache tw_sock_TCP request_sock_TCPv6 tw_sock_TCPv6 skbuff_head_cache xfs_ili

Likely skbuff_head_cache.


Looking into the statistics of the skbuff_fclone_cache is possible through

slabinfo skbuff_fclone_cache    -> -r option implied if cache name is mentioned


 Usual output ...

Slab Perf Counter          Alloc       Free %Al %Fr
---------------------------------------------------
Fastpath               111953360  111946981  99  99
Slowpath                    1044       7423   0   0
Page Alloc                   272        264   0   0
Add partial                   25        325   0   0
Remove partial                86        264   0   0
RemoteObj/SlabFrozen         350       4832   0   0
Total                  111954404  111954404

Flushes       49 Refill        0
Deactivate Full=325(92%) Empty=0(0%) ToHead=24(6%) ToTail=1(0%)

Looks good because the fastpath is overwhelmingly taken.


skbuff_head_cache:

Slab Perf Counter          Alloc       Free %Al %Fr
---------------------------------------------------
Fastpath                 5297262    5259882  99  99
Slowpath                    4477      39586   0   0
Page Alloc                   937        824   0   0
Add partial                    0       2515   0   0
Remove partial              1691        824   0   0
RemoteObj/SlabFrozen        2621       9684   0   0
Total                    5301739    5299468

Deactivate Full=2620(100%) Empty=0(0%) ToHead=0(0%) ToTail=0(0%)

Less good because the proportion of slowpath is a bit higher here.



Descriptions of the output:

Total:          The total number of allocations and frees that occurred for a
                slab.

Fastpath:       The number of allocations/frees that used the fastpath.

Slowpath:       Other allocations/frees (those that did not use the fastpath).

Page Alloc:     Number of calls to the page allocator as a result of slowpath
                processing.

Add Partial:    Number of slabs added to the partial list through free or
                alloc (occurs during cpuslab flushes).

Remove Partial: Number of slabs removed from the partial list as a result of
                allocations retrieving a partial slab or by a free freeing
                the last object of a slab.

RemoteObj/Froz: How many times remotely freed objects were encountered when a
                slab was about to be deactivated. Frozen: How many times
                free was able to skip list processing because the slab was in
                use as the cpuslab of another processor.

Flushes:        Number of times the cpuslab was flushed on request
                (kmem_cache_shrink, may result from races in __slab_alloc).

Refill:         Number of times we were able to refill the cpuslab from
                remotely freed objects for the same slab.

Deactivate:     Statistics on how slabs were deactivated. Shows how they were
                put onto the partial list.
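
As a worked example of how the %Al and %Fr columns relate to the raw
counters, here is a minimal sketch. It assumes the percentages are computed
with truncating integer division, which is consistent with the figures shown
above but has not been checked against slabinfo.c; the helper name percent()
is made up for the example.

```c
/* Integer percentage of part in total, as in the %Al / %Fr columns. */
static unsigned long long percent(unsigned long long part,
				  unsigned long long total)
{
	/* Guard against an idle slab with no events at all. */
	return total ? part * 100 / total : 0;
}
```

For skbuff_fclone_cache, percent(111953360, 111954404) gives 99 for the
fastpath allocations, and percent(1044, 111954404) gives 0 for the slowpath.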


Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 Documentation/vm/slabinfo.c |  149 
 

Unable to access PCMCIA with O2 Micro OZ711MP1/MS1

2008-02-04 Thread Mader, Alexander (N-MSR)

Hello,

after contacting linux-pcmcia and some search I am approaching lkml.

There seems to be a problem accessing PCMCIA cards with O2 Micro 
OZ711MP1/MS1 Controller.


On a Fujitsu Siemens Celsius H240 a 2.6.22-3-amd64 kernel from Debian 
testing is in use. PCMCIA utilities for Linux 2.6 version 014-4 are 
installed. I could supply the output of lspci and lshal.


When I insert a CardBus adapter, for instance a CNET CNF401 fast 
ethernet, everything works fine and the adapter becomes available to the 
system -- in this example the NIC with working LAN access. The lspci and 
lshal output then reflects the new hardware.


When I insert a PCMCIA adapter, for instance a 3com 3CCE589EC, a 3com 
3C589C or a compact flash adapter, almost nothing happens: In the case 
of the 3com I just get (dmesg):


pccard: PCMCIA card inserted into slot 0

In the case of the compact flash adapter at the first insert dmesg gives:

pccard: PCMCIA card inserted into slot 0
cs: memory probe 0x0c-0x0f: excluding 0xc-0xf
cs: memory probe 0x6000-0x60ff: excluding 0x6000-0x60ff
cs: memory probe 0x8800-0x8fff: excluding 0x8800-0x8fff
cs: memory probe 0xa000-0xa0ff: excluding 0xa000-0xa0ff
cs: memory probe 0xf030-0xf03f: excluding 0xf030-0xf03f

Later on dmesg just issues "pccard: PCMCIA card inserted into slot 0" 
when the same card is inserted no matter how often.


In the PCMCIA cases the lshal output doesn't change, but in the lspci 
output the original line changes from:


BridgeCtl: Parity- SERR- ISA- VGA- MAbort- >Reset+ 16bInt+
PostWrite+  ^^

to:

BridgeCtl: Parity- SERR- ISA- VGA- MAbort- >Reset- 16bInt+
PostWrite+  ^^

In the case of the compact flash adapters neither "modprobe ide_cs" nor 
"modprobe pata_pcmcia" yield anything -- not even in dmesg.


As I do not want to access a SmartCard I did not try the o2scr driver 
from gna.org/projects/o2scr.


If I could supply more data I would like to do so on your request.

Best regards, Alexander.



Re: [PATCH] mmu notifiers #v5

2008-02-04 Thread Christoph Lameter
On Tue, 5 Feb 2008, Andrea Arcangeli wrote:

> On Mon, Feb 04, 2008 at 11:09:01AM -0800, Christoph Lameter wrote:
> > On Sun, 3 Feb 2008, Andrea Arcangeli wrote:
> > 
> > > > Right but that pin requires taking a refcount which we cannot do.
> > > 
> > > GRU can use my patch without the pin. XPMEM obviously can't use my
> > > patch as my invalidate_page[s] are under the PT lock (a feature to fit
> > > GRU/KVM in the simplest way), this is why an incremental patch adding
> > > invalidate_range_start/end would be required to support XPMEM too.
> > 
> > Doesnt the kernel in some situations release the page before releasing the 
> > pte lock? Then there will be an external pte pointing to a page that may 
> > now have a different use. Its really bad if that pte does allow writes.
> 
> Sure the kernel does that most of the time, which is for example why I
> had to use invalidate_page instead of invalidate_pages inside
> zap_pte_range. Zero problems with that (this is also the exact reason
> why I mentioned the tlb flushing code would need changes to convert
> some page in pages).

Zero problems only if you find having a single callout for every page 
acceptable. So the invalidate_range in your patch is only working 
sometimes. And even if it works then it has to be used on 2M range. Seems 
to be a bit fragile and needlessly complex.

"conversion of some page in pages"? A proposal to defer the freeing of the 
pages until after the pte_unlock?



Re: [PATCH] Whine about suspicious return values from module's ->init() hook

2008-02-04 Thread Rusty Russell
On Tuesday 05 February 2008 14:53:18 Andrew Morton wrote:
> On Tue, 5 Feb 2008 14:43:31 +1100 Rusty Russell <[EMAIL PROTECTED]> wrote:
> > On Tuesday 05 February 2008 02:42:15 Alexey Dobriyan wrote:
> > > One head-scratching session could be noticeably shorter with this
> > > patch...
> > >
> > > Signed-off-by: Alexey Dobriyan <[EMAIL PROTECTED]>
> >
> > If we want to prevent > 0 returns, let's just BUG_ON().
>
> That risks killing previously-working setups.  WARN_ON is sufficient.

I disagree.  WARN_ON is useful for developers, but they can handle BUG_ON, too.

If we were in freeze, I'd say WARN_ON.

Even better, audit them all, then BUG_ON.  Alexey?
Rusty.


[PATCH] RUSAGE_THREAD

2008-02-04 Thread Sripathi Kodi
Hi Andrew,

This adds the RUSAGE_THREAD option for the getrusage system call.
This is essentially Roland's patch from http://lkml.org/lkml/2008/1/18/589,
but the line about RUSAGE_LWP has been removed, as suggested
by Ulrich and Christoph.

Thanks,
Sripathi.

This adds the RUSAGE_THREAD option for the getrusage system call.

Signed-off-by: Roland McGrath <[EMAIL PROTECTED]>
Signed-off-by: Sripathi Kodi <[EMAIL PROTECTED]>
---
 include/linux/resource.h |1 +
 kernel/sys.c |   31 ++-
 2 files changed, 23 insertions(+), 9 deletions(-)

diff -uprN linux-2.6.24_org/include/linux/resource.h linux-2.6.24/include/linux/resource.h
--- linux-2.6.24_org/include/linux/resource.h   2008-02-05 10:13:04.0 +0530
+++ linux-2.6.24/include/linux/resource.h   2008-02-05 10:14:59.0 +0530
@@ -19,6 +19,7 @@ struct task_struct;
 #define RUSAGE_SELF     0
 #define RUSAGE_CHILDREN (-1)
 #define RUSAGE_BOTH     (-2)    /* sys_wait4() uses this */
+#define RUSAGE_THREAD   1       /* only the calling thread */
 
 struct rusage {
struct timeval ru_utime;/* user time used */
diff -uprN linux-2.6.24_org/kernel/sys.c linux-2.6.24/kernel/sys.c
--- linux-2.6.24_org/kernel/sys.c   2008-02-05 10:13:02.0 +0530
+++ linux-2.6.24/kernel/sys.c   2008-02-05 10:13:21.0 +0530
@@ -1554,6 +1554,19 @@ out:
  *
  */
 
+static void accumulate_thread_rusage(struct task_struct *t, struct rusage *r,
+cputime_t *utimep, cputime_t *stimep)
+{
+   *utimep = cputime_add(*utimep, t->utime);
+   *stimep = cputime_add(*stimep, t->stime);
+   r->ru_nvcsw += t->nvcsw;
+   r->ru_nivcsw += t->nivcsw;
+   r->ru_minflt += t->min_flt;
+   r->ru_majflt += t->maj_flt;
+   r->ru_inblock += task_io_get_inblock(t);
+   r->ru_oublock += task_io_get_oublock(t);
+}
+
 static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
 {
struct task_struct *t;
@@ -1563,6 +1576,11 @@ static void k_getrusage(struct task_stru
memset((char *) r, 0, sizeof *r);
utime = stime = cputime_zero;
 
+   if (who == RUSAGE_THREAD) {
+   accumulate_thread_rusage(p, r, &utime, &stime);
+   goto out;
+   }
+
rcu_read_lock();
if (!lock_task_sighand(p, &flags)) {
rcu_read_unlock();
@@ -1595,14 +1613,7 @@ static void k_getrusage(struct task_stru
r->ru_oublock += p->signal->oublock;
t = p;
do {
-   utime = cputime_add(utime, t->utime);
-   stime = cputime_add(stime, t->stime);
-   r->ru_nvcsw += t->nvcsw;
-   r->ru_nivcsw += t->nivcsw;
-   r->ru_minflt += t->min_flt;
-   r->ru_majflt += t->maj_flt;
-   r->ru_inblock += task_io_get_inblock(t);
-   r->ru_oublock += task_io_get_oublock(t);
+   accumulate_thread_rusage(t, r, &utime, &stime);
t = next_thread(t);
} while (t != p);
break;
@@ -1614,6 +1625,7 @@ static void k_getrusage(struct task_stru
unlock_task_sighand(p, &flags);
rcu_read_unlock();
 
+out:
cputime_to_timeval(utime, &r->ru_utime);
cputime_to_timeval(stime, &r->ru_stime);
 }
@@ -1627,7 +1639,8 @@ int getrusage(struct task_struct *p, int
 
 asmlinkage long sys_getrusage(int who, struct rusage __user *ru)
 {
-   if (who != RUSAGE_SELF && who != RUSAGE_CHILDREN)
+   if (who != RUSAGE_SELF && who != RUSAGE_CHILDREN &&
+   who != RUSAGE_THREAD)
return -EINVAL;
return getrusage(current, who, ru);
 }
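
For reference, a minimal userspace sketch of how the new option might be
called. It assumes a kernel with this patch applied and a C library that
passes the `who` argument through unchanged. thread_rusage() and
print_thread_cpu_time() are hypothetical helper names, and the fallback
#define simply mirrors the value 1 from the patch above.

```c
#define _GNU_SOURCE		/* glibc exposes RUSAGE_THREAD only with this */
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>
#include <sys/time.h>

/* Fallback for older headers; 1 is the value defined in the patch. */
#ifndef RUSAGE_THREAD
#define RUSAGE_THREAD 1
#endif

/*
 * Fill *ru with the calling thread's usage.
 * Returns 0 on success, -1 on failure (errno is EINVAL on kernels
 * that do not know RUSAGE_THREAD).
 */
int thread_rusage(struct rusage *ru)
{
	assert(ru != NULL);
	return getrusage(RUSAGE_THREAD, ru);
}

/* Hypothetical helper: print user/system CPU time of the calling thread. */
int print_thread_cpu_time(void)
{
	struct rusage ru;

	if (thread_rusage(&ru) != 0) {
		perror("getrusage(RUSAGE_THREAD)");
		return -1;
	}
	printf("user %ld.%06lds sys %ld.%06lds\n",
	       (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec,
	       (long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec);
	return 0;
}
```

Calling print_thread_cpu_time() from two threads with different workloads
should show different figures, whereas RUSAGE_SELF reports the total for
the whole thread group.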


Re: > You may find that you need to update your lvm userspace tools.

2008-02-04 Thread Andrew Morton
On Tue, 05 Feb 2008 12:32:11 +0700 Igor M Podlesny <[EMAIL PROTECTED]> wrote:

> On 2008-02-04 20:27, Andrew Morton wrote:
> > On Mon, 04 Feb 2008 20:16:48 +0700 Igor M Podlesny <[EMAIL PROTECTED]> 
> > wrote:
> [...]
> >>Now I can say that both 2.6.24-mm1 and 2.6.24-git11 do NOT "see" any
> >> of mine LVM-2 disks. pvscan, for e.g., finds nothing at all.
> > 
> > You may find that you need to update your lvm userspace tools.
> 
>   You're right; I've updated my initrd with fresh lvm
> userspace-counterpart and now the problem has been fixed. Sorry for
> groundless alert.
> 

No, breakage of a userspace interface is considered a serious regression.

If this was deliberate and utterly unavoidable, well, that's bad but sometimes
these things happen.  We do prefer to go through elaborate notification
processes to minimise the disruption, which afaik did not happen here.

If, however, the breakage was unintentional then we should find the
cause and fix it asap, and backport the fix into 2.6.24.1.

Please tell us what version of the userspace tools you were previously running.

Could someone in dm-devel land please get involved?


Re: [PATCH 5/6] Add OF-tree support to RapidIO controller driver.

2008-02-04 Thread Stephen Rothwell
On Wed, 30 Jan 2008 18:30:52 +0800 Zhang Wei <[EMAIL PROTECTED]> wrote:
>
> -void fsl_rio_setup(int law_start, int law_size)
> +int fsl_rio_setup(struct of_device *dev)
>  {

> + if (!dev->node) {
> + dev_err(&dev->dev, "Device OF-Node is NULL");
> + return -EFAULT;

Probably -EINVAL would be better. Here and all the other -EFAULTs.

> + aw = *(u32 *)of_get_property(dev->node, "#address-cells", NULL);
> + sw = *(u32 *)of_get_property(dev->node, "#size-cells", NULL);

What happens if either of these properties is missing?
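
To illustrate the concern: of_get_property() returns NULL when the
property is absent, so the casts above would dereference a NULL pointer.
A minimal userspace sketch of the checked form follows. struct node,
get_property() and read_cells() are hypothetical stand-ins, not the real
OF API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a device-tree node: a tiny property table. */
struct prop {
	const char *name;
	uint32_t value;
};

struct node {
	const struct prop *props;
	size_t nprops;
};

/* Hypothetical stand-in for of_get_property(): NULL if the name is absent. */
const void *get_property(const struct node *np, const char *name)
{
	size_t i;

	for (i = 0; i < np->nprops; i++)
		if (strcmp(np->props[i].name, name) == 0)
			return &np->props[i].value;
	return NULL;
}

/*
 * Read one u32 cell count. A missing property is reported as -EINVAL
 * instead of being dereferenced.
 */
int read_cells(const struct node *np, const char *name, uint32_t *out)
{
	const uint32_t *p = get_property(np, name);

	if (!p)
		return -EINVAL;
	*out = *p;
	return 0;
}
```

With this shape the caller can bail out of setup with an error when
"#address-cells" or "#size-cells" is not present.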

> +static struct of_device_id fsl_of_rio_rpn_ids[] = {

This should be "const" please.

-- 
Cheers,
Stephen Rothwell                    [EMAIL PROTECTED]
http://www.canb.auug.org.au/~sfr/



Re: [2.6.24 REGRESSION] BUG: Soft lockup - with VFS

2008-02-04 Thread Andrew Morton
On Mon, 28 Jan 2008 09:31:43 +0100 "Oliver Pinter (Pintér Olivér)" <[EMAIL PROTECTED]> wrote:

> hi all!
> 
> in the 2.6.24 become i some soft lockups with usb-phone, when i pluged
> in the mobile, then the vfs-layer crashed. am afternoon can i the
> .config send, and i bisected the kernel, when i have time.
> 
> pictures from crash:
> http://students.zipernowsky.hu/~oliverp/kernel/regression_2624/

It looks like selinux's file_has_perm() is doing spin_lock() on an
uninitialised (or already locked) spinlock.




[git pull] agp patches for 2.6.25

2008-02-04 Thread Dave Airlie

Hi Linus,

Please pull the 'agp-patches' branch from
ssh://master.kernel.org/pub/scm/linux/kernel/git/airlied/agp-2.6.git agp-patches

It adds initial support for chipset flushing along with some intel pciids.

Dave.

 arch/x86/pci/i386.c |2 +-
 drivers/char/agp/agp.h  |6 +-
 drivers/char/agp/amd-k7-agp.c   |4 -
 drivers/char/agp/backend.c  |2 +-
 drivers/char/agp/compat_ioctl.c |4 +
 drivers/char/agp/compat_ioctl.h |2 +
 drivers/char/agp/frontend.c |   13 ++-
 drivers/char/agp/generic.c  |7 +
 drivers/char/agp/intel-agp.c|  305 +++
 include/linux/agp_backend.h |1 +
 include/linux/agpgart.h |1 +
 11 files changed, 276 insertions(+), 71 deletions(-)

commit bc894606e8843808c232319f69c26c18f6eaa662
Author: Dave Airlie <[EMAIL PROTECTED]>
Date:   Tue Feb 5 15:05:23 2008 +1000

agp: remove flush_agp_mappings calls from new flush handling code

Signed-off-by: Dave Airlie <[EMAIL PROTECTED]>

commit f011ae7437761dc071b4154cabb0041df041a7c0
Author: Dave Airlie <[EMAIL PROTECTED]>
Date:   Fri Jan 25 11:23:04 2008 +1000

intel-agp: introduce IS_I915 and do some cleanups..

Add a new IS_I915 and also do some checkpatch whitespace cleanups.

Signed-off-by: Dave Airlie <[EMAIL PROTECTED]>

commit 9119f85a0cdbac0397b39fa198866bf530cfab8b
Author: Zhenyu Wang <[EMAIL PROTECTED]>
Date:   Wed Jan 23 15:49:26 2008 +1000

[intel_agp] fix name for G35 chipset

Change origin chipset name i965G_1 to market name G35.

Signed-off-by: Zhenyu Wang <[EMAIL PROTECTED]>
Signed-off-by: Dave Airlie <[EMAIL PROTECTED]>

commit 4d64dd9e5d96cdcfa8dee91c7848341718c77444
Author: Dave Airlie <[EMAIL PROTECTED]>
Date:   Wed Jan 23 15:34:29 2008 +1000

intel-agp: fixup resource handling in flush code.

The flush code resource handling was having problems where some BIOS
reserve the resource in a pnp block and some don't.

Also there was a bug in that configure was being called at resume
and resetting some of the structs.

Signed-off-by: Dave Airlie <[EMAIL PROTECTED]>

commit 4e8b6e25943a22036a6b704ebef634c7dec4c10e
Author: Zhenyu Wang <[EMAIL PROTECTED]>
Date:   Wed Jan 23 14:54:37 2008 +1000

intel-agp: add new chipset ID

This one adds new pci ids for Intel integrated graphics chipset, with gtt
table access change on it and new gtt table size definition.

Signed-off-by: Zhenyu Wang <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Dave Airlie <[EMAIL PROTECTED]>

commit 91d361c279b66ce4d617d544641d5f70b27c401a
Author: Julia Lawall <[EMAIL PROTECTED]>
Date:   Wed Dec 5 13:55:36 2007 -0800

agp: remove unnecessary pci_dev_put

pci_get_class implicitly does a pci_dev_put on its second argument, so
pci_dev_put is only needed if there is a break out of the loop.

The semantic match detecting this problem is as follows:

// <smpl>
@@
expression dev;
expression E;
@@

* pci_dev_put(dev)
  ... when != dev = E
(
* pci_get_device(...,dev)
|
* pci_get_device_reverse(...,dev)
|
* pci_get_subsys(...,dev)
|
* pci_get_class(...,dev)
)
// </smpl>

Signed-off-by: Julia Lawall <[EMAIL PROTECTED]>
Cc: Dave Jones <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Dave Airlie <[EMAIL PROTECTED]>

commit 62f29babbc60ab572d3cecda981931d3a66123d6
Author: [EMAIL PROTECTED] <[EMAIL PROTECTED]>
Date:   Wed Dec 5 13:55:36 2007 -0800

agp: remove uid comparison as security check

In the face of containers and user namespaces, a uid==0 check for
security is not safe.  Switch to a capability check.

I'm not sure I picked the right capability, but this being AGP
CAP_SYS_RAWIO seemed to make sense.

Signed-off-by: Serge Hallyn <[EMAIL PROTECTED]>
Signed-off-by: Dave Airlie <[EMAIL PROTECTED]>

commit 1fa4db7d308da04f6644c5cb8eed244c200d4ed5
Author: Andrew Morton <[EMAIL PROTECTED]>
Date:   Thu Nov 29 10:00:48 2007 +1000

fix AGP warning

drivers/char/agp/intel-agp.c: In function 'intel_i965_g33_setup_chipset_flush':
drivers/char/agp/intel-agp.c:872: warning: right shift count >= width of type

I wish the agp code wasn't written in a 10,000-column xterm :(

Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Dave Airlie <[EMAIL PROTECTED]>

commit 2162e6a2b0cd5acbb9bd8a3c94e1c1269b078295
Author: Dave Airlie <[EMAIL PROTECTED]>
Date:   Wed Nov 21 16:36:31 2007 +1000

agp/intel: Add chipset flushing support for i8xx chipsets.

This is a bit of a large hammer but it makes sure the chipset is flushed
by writing out 1k of data to an uncached page. We may be able to get better
information in the future on how to do this better.

Signed-off-by: Dave Airlie 

Re: [PATCH 2/2] Kprobes: Move kprobes examples to samples/

2008-02-04 Thread Abhishek Sagar
On 2/5/08, Ananth N Mavinakayanahalli <[EMAIL PROTECTED]> wrote:

> + * Build and insert the kernel module as done in the kprobe example.
> + * You will see the trace data in /var/log/messages and on the console
> + * whenever sys_open() returns a negative value.

A passing observation: "sys_open" should be replaced with "do_fork",
whose return value is not checked at all.

--
Regards,
Abhishek


Re: [linux-pm] Re: Small pm documentation cleanups

2008-02-04 Thread Nigel Cunningham

Hi.

Rafael J. Wysocki wrote:

Len, please pick this up, thanks.

On Tuesday, 5 of February 2008, Pavel Machek wrote:

Small documentation fixes/additions that accumulated in my tree.

Signed-off-by: Pavel Machek <[EMAIL PROTECTED]>


Acked-by: Rafael J. Wysocki <[EMAIL PROTECTED]>


diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index cf38689..3be3328 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -147,8 +147,10 @@ and is between 256 and 4096 characters. 
 			default: 0
 
 	acpi_sleep=	[HW,ACPI] Sleep options

-   Format: { s3_bios, s3_mode }
-   See Documentation/power/video.txt
+   Format: { s3_bios, s3_mode, s3_beep }
+   See Documentation/power/video.txt for s3_bios and s3_mode.
+   s3_beep is for debugging; it beeps on PC speaker as soon as
+   kernel's real-mode entry point is called.


s/kernel's/the kernel's/

 
 	acpi_sci=	[HW,ACPI] ACPI System Control Interrupt trigger mode

Format: { level | edge | high | low }
diff --git a/Documentation/power/swsusp.txt b/Documentation/power/swsusp.txt
index aea7e92..d3e5e4e 100644
--- a/Documentation/power/swsusp.txt
+++ b/Documentation/power/swsusp.txt
@@ -386,6 +386,11 @@ before suspending; then remount them aft
 There is a work-around for this problem.  For more information, see
 Documentation/usb/persist.txt.
 
+Q: Can I suspend-to-disk using a swap partition under LVM?

+
+A: No. You can suspend successfully, but you'll not be able to
+resume. uswsusp should be able to work with LVM, see suspend.sf.net.


s/LVM, see/LVM. See/


+
 Q: I upgraded the kernel from 2.6.15 to 2.6.16. Both kernels were
 compiled with the similar configuration files. Anyway I found that
 suspend to disk (and resume) is much slower on 2.6.16 compared to
diff --git a/drivers/acpi/hardware/hwsleep.c b/drivers/acpi/hardware/hwsleep.c
index fd1c4ba..058d0be 100644
--- a/drivers/acpi/hardware/hwsleep.c
+++ b/drivers/acpi/hardware/hwsleep.c
@@ -286,13 +286,13 @@ acpi_status asmlinkage acpi_enter_sleep_
}
 
 	/*

+* 1) Disable/Clear all GPEs
 * 2) Enable all wakeup GPEs
 */
status = acpi_hw_disable_all_gpes();
if (ACPI_FAILURE(status)) {
return_ACPI_STATUS(status);
}
-
acpi_gbl_system_awake_and_running = FALSE;
 
 	status = acpi_hw_enable_all_wakeup_gpes();

diff --git a/drivers/acpi/sleep/main.c b/drivers/acpi/sleep/main.c
index 485de13..56e09cf 100644
--- a/drivers/acpi/sleep/main.c
+++ b/drivers/acpi/sleep/main.c
@@ -170,7 +170,7 @@ static int acpi_pm_enter(suspend_state_t
/* Reprogram control registers and execute _BFS */
acpi_leave_sleep_state_prep(acpi_state);
 
-	/* ACPI 3.0 specs (P62) says that it's the responsabilty

+   /* ACPI 3.0 specs (P62) says that it's the responsibilty


s/responsibilty/responsibility/


 * of the OSPM to clear the status bit [ implying that the
 * POWER_BUTTON event should not reach userspace ]
 */
diff --git a/kernel/power/Kconfig b/kernel/power/Kconfig
index ef9b802..b275ffb 100644
--- a/kernel/power/Kconfig
+++ b/kernel/power/Kconfig
@@ -74,14 +74,14 @@ config PM_TRACE_RTC
RTC across reboots, so that you can debug a machine that just hangs
during suspend (or more commonly, during resume).
 
-	To use this debugging feature you should attempt to suspend the machine,

-   then reboot it, then run
+   To use this debugging feature you should attempt to suspend the
+   machine, then reboot it, then run


More readable would be "machine, reboot it and then run"

 
 		dmesg -s 100 | grep 'hash matches'
 
 	CAUTION: this option will cause your machine's real-time clock to be

set to an invalid time after a resume.
 
 config PM_SLEEP_SMP

bool
depends on SMP
@@ -123,7 +129,8 @@ config HIBERNATION
  called "hibernation" in user interfaces.  STD checkpoints the
  system and powers it off; and restores that checkpoint on reboot.
 
-	  You can suspend your machine with 'echo disk > /sys/power/state'.
+	  You can suspend your machine with 'echo disk > /sys/power/state' 
+	  after placing resume=/dev/swappartition on kernel command line.


s/on kernel/on the kernel/

Maybe add "in your bootloader's configuration file"?


  Alternatively, you can use the additional userland tools available
  from .
 
diff --git a/kernel/power/main.c b/kernel/power/main.c

index 6a6d5eb..d3df5af 100644
--- a/kernel/power/main.c
+++ b/kernel/power/main.c
@@ -223,6 +226,7 @@ void __attribute__ ((weak)) arch_suspend
  * @state: state to enter
  *
  * This function should be called after devices have been suspended.
+ * May not sleep.
  */
 static int suspend_enter(suspend_state_t state)
 {
@@ -250,6 

> You may find that you need to update your lvm userspace tools.

2008-02-04 Thread Igor M Podlesny
On 2008-02-04 20:27, Andrew Morton wrote:
> On Mon, 04 Feb 2008 20:16:48 +0700 Igor M Podlesny <[EMAIL PROTECTED]> wrote:
[...]
>>Now I can say that both 2.6.24-mm1 and 2.6.24-git11 do NOT "see" any
>> of mine LVM-2 disks. pvscan, for e.g., finds nothing at all.
> 
> You may find that you need to update your lvm userspace tools.

You're right; I've updated my initrd with fresh lvm
userspace-counterpart and now the problem has been fixed. Sorry for
groundless alert.

Thanks!

-- 
End of message. Next message?


Re: [PATCH] mmu notifiers #v5

2008-02-04 Thread Andrea Arcangeli
On Mon, Feb 04, 2008 at 11:09:01AM -0800, Christoph Lameter wrote:
> On Sun, 3 Feb 2008, Andrea Arcangeli wrote:
> 
> > > Right but that pin requires taking a refcount which we cannot do.
> > 
> > GRU can use my patch without the pin. XPMEM obviously can't use my
> > patch as my invalidate_page[s] are under the PT lock (a feature to fit
> > GRU/KVM in the simplest way), this is why an incremental patch adding
> > invalidate_range_start/end would be required to support XPMEM too.
> 
> Doesnt the kernel in some situations release the page before releasing the 
> pte lock? Then there will be an external pte pointing to a page that may 
> now have a different use. Its really bad if that pte does allow writes.

Sure the kernel does that most of the time, which is for example why I
had to use invalidate_page instead of invalidate_pages inside
zap_pte_range. Zero problems with that (this is also the exact reason
why I mentioned the tlb flushing code would need changes to convert
some page in pages).


[PATCH] [POWERPC] Use a sensible default for clock_getres() in the vdso.

2008-02-04 Thread Tony Breeds
On Sun, Jan 27, 2008 at 07:32:59PM +0530, Sripathi Kodi wrote:
> Hi Paul,
> 
> On PPC, I see a disparity between clock_getres implementations in the
> vdso and syscall. I am using a IBM Openpower hardware and 2.6.24 kernel
> with CONFIG_HIGH_RES_TIMERS=y. 
> 
> clock_getres call for CLOCK_REALTIME returns 1 millisecond. However,
> when I edit arch/powerpc/kernel/vdso*/gettimeofday.S to force it to use 
> sys_clock_getres, I get 1 nanosecond resolution. The code in vdso seems
> to be returning some pre-defined (incorrect) variables.
> 
> Could you please let me know the reason for this? Is it something that
> should be fixed in vdso?

Can you try the attached patch and see it if works for you?

From: Tony Breeds <[EMAIL PROTECTED]>
Subject: [PATCH] [POWERPC] Use a sensible default for clock_getres() in the vdso.

This ensures that the syscall and the (fast) vdso versions of clock_getres()
will return the same resolution.

Signed-off-by: Tony Breeds <[EMAIL PROTECTED]>
---
 arch/powerpc/kernel/asm-offsets.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index ed083fe..e6e4928 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 #ifdef CONFIG_PPC64
 #include 
 #include 
@@ -312,7 +313,7 @@ int main(void)
DEFINE(CLOCK_REALTIME, CLOCK_REALTIME);
DEFINE(CLOCK_MONOTONIC, CLOCK_MONOTONIC);
DEFINE(NSEC_PER_SEC, NSEC_PER_SEC);
-   DEFINE(CLOCK_REALTIME_RES, TICK_NSEC);
+   DEFINE(CLOCK_REALTIME_RES, (KTIME_MONOTONIC_RES).tv64);
 
 #ifdef CONFIG_BUG
DEFINE(BUG_ENTRY_SIZE, sizeof(struct bug_entry));
-- 
1.5.4
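
The disparity Sripathi describes can be checked from userspace with
something like the sketch below. It assumes a 64-bit Linux system where
struct timespec matches the layout expected by the raw clock_getres
system call; the function names are made up for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Resolution as reported through the C library (may be served by the vdso). */
int res_via_libc(clockid_t clk, struct timespec *ts)
{
	return clock_getres(clk, ts);
}

/* Resolution as reported by the raw system call, bypassing the vdso. */
int res_via_syscall(clockid_t clk, struct timespec *ts)
{
	return (int)syscall(SYS_clock_getres, clk, ts);
}

/* Returns 1 if both paths agree, 0 if they differ, -1 on error. */
int clock_getres_paths_agree(clockid_t clk)
{
	struct timespec a, b;

	if (res_via_libc(clk, &a) != 0 || res_via_syscall(clk, &b) != 0)
		return -1;
	return a.tv_sec == b.tv_sec && a.tv_nsec == b.tv_nsec;
}
```

A return value of 0 from clock_getres_paths_agree(CLOCK_REALTIME) would
correspond to the mismatch reported above.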


Yours Tony

  linux.conf.au        http://linux.conf.au/ || http://lca2008.linux.org.au/
  Jan 28 - Feb 02 2008 The Australian Linux Technical Conference!



[PATCH 2/2] Kprobes: Move kprobes examples to samples/

2008-02-04 Thread Ananth N Mavinakayanahalli
From: Ananth N Mavinakayanahalli <[EMAIL PROTECTED]>

Move kprobes examples from Documentation/kprobes.txt to under samples/.
Patch originally by Randy Dunlap.

o Updated the patch to apply on 2.6.24-mm1
o Modified examples code to build on multiple architectures. Currently,
  the examples code works for x86 and powerpc
o Cleaned up unneeded #includes
o Cleaned up Kconfig per Sam Ravnborg's suggestions to fix build break
  on archs that don't have kretprobes
o Implemented suggestions by Mathieu Desnoyers on CONFIG_KRETPROBES
o Included Andrew Morton's cleanup based on x86-git

Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]>
Signed-off-by: Ananth N Mavinakayanahalli <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
Acked-by: Mathieu Desnoyers <[EMAIL PROTECTED]>
---
 Documentation/kprobes.txt   |  235 
 samples/Kconfig |   11 +
 samples/Makefile|2 
 samples/kprobes/Makefile|5 
 samples/kprobes/jprobe_example.c|   68 ++
 samples/kprobes/kprobe_example.c|   91 +
 samples/kprobes/kretprobe_example.c |   95 ++
 7 files changed, 276 insertions(+), 231 deletions(-)

Index: linux-2.6.24/Documentation/kprobes.txt
===
--- linux-2.6.24.orig/Documentation/kprobes.txt
+++ linux-2.6.24/Documentation/kprobes.txt
@@ -193,7 +193,8 @@ code mapping.
 The Kprobes API includes a "register" function and an "unregister"
 function for each type of probe.  Here are terse, mini-man-page
 specifications for these functions and the associated probe handlers
-that you'll write.  See the latter half of this document for examples.
+that you'll write.  See the files in the samples/kprobes/ sub-directory
+for examples.
 
 4.1 register_kprobe
 
@@ -421,249 +422,15 @@ e. Watchpoint probes (which fire on data
 
 8. Kprobes Example
 
-Here's a sample kernel module showing the use of kprobes to dump a
-stack trace and selected i386 registers when do_fork() is called.
-- cut here -
-/*kprobe_example.c*/
-#include 
-#include 
-#include 
-#include 
-
-/*For each probe you need to allocate a kprobe structure*/
-static struct kprobe kp;
-
-/*kprobe pre_handler: called just before the probed instruction is executed*/
-int handler_pre(struct kprobe *p, struct pt_regs *regs)
-{
-   printk("pre_handler: p->addr=0x%p, eip=%lx, eflags=0x%lx\n",
-   p->addr, regs->eip, regs->eflags);
-   dump_stack();
-   return 0;
-}
-
-/*kprobe post_handler: called after the probed instruction is executed*/
-void handler_post(struct kprobe *p, struct pt_regs *regs, unsigned long flags)
-{
-   printk("post_handler: p->addr=0x%p, eflags=0x%lx\n",
-   p->addr, regs->eflags);
-}
-
-/* fault_handler: this is called if an exception is generated for any
- * instruction within the pre- or post-handler, or when Kprobes
- * single-steps the probed instruction.
- */
-int handler_fault(struct kprobe *p, struct pt_regs *regs, int trapnr)
-{
-   printk("fault_handler: p->addr=0x%p, trap #%dn",
-   p->addr, trapnr);
-   /* Return 0 because we don't handle the fault. */
-   return 0;
-}
-
-static int __init kprobe_init(void)
-{
-   int ret;
-   kp.pre_handler = handler_pre;
-   kp.post_handler = handler_post;
-   kp.fault_handler = handler_fault;
-   kp.symbol_name = "do_fork";
-
-   ret = register_kprobe(&kp);
-   if (ret < 0) {
-   printk("register_kprobe failed, returned %d\n", ret);
-   return ret;
-   }
-   printk("kprobe registered\n");
-   return 0;
-}
-
-static void __exit kprobe_exit(void)
-{
-   unregister_kprobe(&kp);
-   printk("kprobe unregistered\n");
-}
-
-module_init(kprobe_init)
-module_exit(kprobe_exit)
-MODULE_LICENSE("GPL");
-- cut here -
-
-You can build the kernel module, kprobe-example.ko, using the following
-Makefile:
-- cut here -
-obj-m := kprobe-example.o
-KDIR := /lib/modules/$(shell uname -r)/build
-PWD := $(shell pwd)
-default:
-   $(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules
-clean:
-   rm -f *.mod.c *.ko *.o
-- cut here -
-
-$ make
-$ su -
-...
-# insmod kprobe-example.ko
-
-You will see the trace data in /var/log/messages and on the console
-whenever do_fork() is invoked to create a new process.
+See samples/kprobes/kprobe_example.c
 
 9. Jprobes Example
 
-Here's a sample kernel module showing the use of jprobes to dump
-the arguments of do_fork().
-- cut here -
-/*jprobe-example.c */
-#include 
-#include 
-#include 
-#include 
-#include 
-
-/*
- * Jumper probe for do_fork.
- * Mirror principle enables access to arguments of the probed routine
- * from the probe handler.
- */
-
-/* Proxy routine having the same arguments as actual do_fork() routine */
-long jdo_fork(unsigned long clone_flags, unsigned long stack_start,
- struct pt_regs *regs, 

[PATCH 1/2] Kprobes: Indicate kretprobe support in Kconfig

2008-02-04 Thread Ananth N Mavinakayanahalli
From: Ananth N Mavinakayanahalli <[EMAIL PROTECTED]>

This patch adds CONFIG_HAVE_KRETPROBES to the arch/<arch>/Kconfig file
for relevant architectures with kprobes support. This facilitates easy
handling of in-kernel modules (like samples/kprobes/kretprobe_example.c)
that depend on kretprobes being present in the kernel.

Updated to apply on 2.6.24-mm1. Thanks to Sam Ravnborg for helping
make the patch more lean.

Per Mathieu's suggestion, added CONFIG_KRETPROBES and fixed up
dependencies.

Signed-off-by: Ananth N Mavinakayanahalli <[EMAIL PROTECTED]>
Acked-by: Mathieu Desnoyers <[EMAIL PROTECTED]>
---
 arch/Kconfig  |7 +++
 arch/ia64/Kconfig |1 +
 arch/powerpc/Kconfig  |1 +
 arch/s390/Kconfig |1 +
 arch/x86/Kconfig  |1 +
 include/asm-ia64/kprobes.h|1 -
 include/asm-powerpc/kprobes.h |1 -
 include/asm-x86/kprobes.h |1 -
 include/linux/kprobes.h   |6 +++---
 kernel/kprobes.c  |9 +++--
 10 files changed, 17 insertions(+), 12 deletions(-)

Index: linux-2.6.24/arch/Kconfig
===
--- linux-2.6.24.orig/arch/Kconfig
+++ linux-2.6.24/arch/Kconfig
@@ -27,5 +27,12 @@ config KPROBES
  for kernel debugging, non-intrusive instrumentation and testing.
  If in doubt, say "N".
 
+config KRETPROBES
+   def_bool y
+   depends on KPROBES && HAVE_KRETPROBES
+
 config HAVE_KPROBES
def_bool n
+
+config HAVE_KRETPROBES
+   def_bool n
Index: linux-2.6.24/arch/ia64/Kconfig
===
--- linux-2.6.24.orig/arch/ia64/Kconfig
+++ linux-2.6.24/arch/ia64/Kconfig
@@ -17,6 +17,7 @@ config IA64
select ARCH_SUPPORTS_MSI
select HAVE_OPROFILE
select HAVE_KPROBES
+   select HAVE_KRETPROBES
default y
help
  The Itanium Processor Family is Intel's 64-bit successor to
Index: linux-2.6.24/arch/powerpc/Kconfig
===
--- linux-2.6.24.orig/arch/powerpc/Kconfig
+++ linux-2.6.24/arch/powerpc/Kconfig
@@ -89,6 +89,7 @@ config PPC
default y
select HAVE_OPROFILE
select HAVE_KPROBES
+   select HAVE_KRETPROBES
 
 config EARLY_PRINTK
bool
Index: linux-2.6.24/arch/x86/Kconfig
===
--- linux-2.6.24.orig/arch/x86/Kconfig
+++ linux-2.6.24/arch/x86/Kconfig
@@ -20,6 +20,7 @@ config X86
def_bool y
select HAVE_OPROFILE
select HAVE_KPROBES
+   select HAVE_KRETPROBES
 
 config GENERIC_LOCKBREAK
def_bool n
Index: linux-2.6.24/include/asm-ia64/kprobes.h
===
--- linux-2.6.24.orig/include/asm-ia64/kprobes.h
+++ linux-2.6.24/include/asm-ia64/kprobes.h
@@ -82,7 +82,6 @@ struct kprobe_ctlblk {
struct prev_kprobe prev_kprobe[ARCH_PREV_KPROBE_SZ];
 };
 
-#define ARCH_SUPPORTS_KRETPROBES
 #define kretprobe_blacklist_size 0
 
 #define SLOT0_OPCODE_SHIFT (37)
Index: linux-2.6.24/include/asm-powerpc/kprobes.h
===
--- linux-2.6.24.orig/include/asm-powerpc/kprobes.h
+++ linux-2.6.24/include/asm-powerpc/kprobes.h
@@ -80,7 +80,6 @@ typedef unsigned int kprobe_opcode_t;
 #define is_trap(instr) (IS_TW(instr) || IS_TWI(instr))
 #endif
 
-#define ARCH_SUPPORTS_KRETPROBES
 #define flush_insn_slot(p) do { } while (0)
 #define kretprobe_blacklist_size 0
 
Index: linux-2.6.24/include/asm-x86/kprobes.h
===
--- linux-2.6.24.orig/include/asm-x86/kprobes.h
+++ linux-2.6.24/include/asm-x86/kprobes.h
@@ -42,7 +42,6 @@ typedef u8 kprobe_opcode_t;
: (((unsigned long)current_thread_info()) + THREAD_SIZE \
   - (unsigned long)(ADDR)))
 
-#define ARCH_SUPPORTS_KRETPROBES
 #define flush_insn_slot(p) do { } while (0)
 
 extern const int kretprobe_blacklist_size;
Index: linux-2.6.24/include/linux/kprobes.h
===
--- linux-2.6.24.orig/include/linux/kprobes.h
+++ linux-2.6.24/include/linux/kprobes.h
@@ -125,11 +125,11 @@ struct jprobe {
 DECLARE_PER_CPU(struct kprobe *, current_kprobe);
 DECLARE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk);
 
-#ifdef ARCH_SUPPORTS_KRETPROBES
+#ifdef CONFIG_KRETPROBES
 extern void arch_prepare_kretprobe(struct kretprobe_instance *ri,
   struct pt_regs *regs);
 extern int arch_trampoline_kprobe(struct kprobe *p);
-#else /* ARCH_SUPPORTS_KRETPROBES */
+#else /* CONFIG_KRETPROBES */
 static inline void arch_prepare_kretprobe(struct kretprobe *rp,
struct pt_regs *regs)
 {
@@ -138,7 +138,7 @@ static inline int arch_trampoline_kprobe
 {
return 0;
 }
-#endif /* ARCH_SUPPORTS_KRETPROBES */
+#endif 

Re: CPU hotplug and IRQ affinity with 2.6.24-rt1

2008-02-04 Thread Gregory Haskins
>>> On Mon, Feb 4, 2008 at  9:51 PM, in message
<[EMAIL PROTECTED]>, Daniel Walker
<[EMAIL PROTECTED]> wrote: 
> I get the following when I tried it,
> 
> BUG: sleeping function called from invalid context bash(5126) at
> kernel/rtmutex.c:638
> in_atomic():1 [0001], irqs_disabled():1

Hi Daniel,
  Can you try this patch and let me know if it fixes your problem?

---

use rcu for root-domain kfree

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>

diff --git a/kernel/sched.c b/kernel/sched.c
index e6ad493..77e86c1 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -339,6 +339,7 @@ struct root_domain {
atomic_t refcount;
cpumask_t span;
cpumask_t online;
+   struct rcu_head rcu;

/*
 * The "RT overload" flag: it gets set if a CPU has more than
@@ -6222,6 +6223,12 @@ sd_parent_degenerate(struct sched_domain *sd, struct sched_domain *parent)
return 1;
 }

+/* rcu callback to free a root-domain */
+static void rq_free_root(struct rcu_head *rcu)
+{
+   kfree(container_of(rcu, struct root_domain, rcu));
+}
+
 static void rq_attach_root(struct rq *rq, struct root_domain *rd)
 {
unsigned long flags;
@@ -6241,7 +6248,7 @@ static void rq_attach_root(struct rq *rq, struct root_domain *rd)
cpu_clear(rq->cpu, old_rd->online);

if (atomic_dec_and_test(&old_rd->refcount))
-   kfree(old_rd);
+   call_rcu(&old_rd->rcu, rq_free_root);
}

atomic_inc(&rd->refcount);
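
The idea in the patch above, deferring the kfree() through a callback
that recovers the enclosing object from an embedded head, can be sketched
in plain userspace C. Everything here is a stand-in: struct cb_head,
defer_free() and run_deferred() are hypothetical and only imitate the
shape of struct rcu_head and call_rcu(); no real grace period is involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel macro of the same name. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in for struct rcu_head: just the callback slot. */
struct cb_head {
	void (*func)(struct cb_head *head);
};

/* Reduced stand-in for struct root_domain. */
struct root_domain {
	int refcount;
	struct cb_head rcu;	/* embedded, as in the patch */
};

int freed_count;	/* lets a caller observe that the callback ran */

/* Callback: recover the enclosing object from the embedded head, free it. */
void rq_free_root(struct cb_head *head)
{
	struct root_domain *rd = container_of(head, struct root_domain, rcu);

	free(rd);
	freed_count++;
}

/* Stand-in for call_rcu(): record the callback; nothing is freed yet. */
void defer_free(struct cb_head *head, void (*func)(struct cb_head *))
{
	assert(func != NULL);
	head->func = func;
}

/* Stand-in for the end of a grace period: run the recorded callback. */
void run_deferred(struct cb_head *head)
{
	head->func(head);
}
```

The point of the indirection is that the caller only records a pointer
while it is in atomic context; the actual free happens later, from a
context where it is allowed.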



Re: [Scst-devel] Integration of SCST in the mainstream Linux kernel

2008-02-04 Thread James Bottomley

On Tue, 2008-02-05 at 05:43 +0100, Matteo Tescione wrote:
> Hi all,
> And sorry for intrusion, i am not a developer but i work everyday with iscsi
> and i found it fantastic.
> Altough Aoe, Fcoe and so on could be better, we have to look in real world
> implementations what is needed *now*, and if we look at vmware world,
> virtual iron, microsoft clustering etc, the answer is iSCSI.
> And now, SCST is the best open-source iSCSI target. So, from an end-user
> point of view, what are the really problems to not integrate scst in the
> mainstream kernel?

The fact that your last statement is conjecture.  It's definitely untrue
for non-IB networks, and the jury is still out on IB networks.

James




Re: [PATCH][drivers/misc/thinkpad_acpi.c] duplicate test if (level & TP_EC_FAN_FULLSPEED)

2008-02-04 Thread Henrique de Moraes Holschuh
On Tue, 05 Feb 2008, Roel Kluin wrote:
> Roland Dreier wrote:
> >  > /* safety net should the EC not support AUTO
> >  >  * or FULLSPEED mode bits and just ignore them */
> >  > if (level & TP_EC_FAN_FULLSPEED)
> >  > level |= 7; /* safety min speed 7 */
> >  > else if (level & TP_EC_FAN_FULLSPEED)
> >  > level |= 4; /* safety min speed 4 */
> >  > 
> >  > Note the duplicate test 'if (level & TP_EC_FAN_FULLSPEED)'. should
> >  > this be replaced by
> > 
> > Actually I suspect one of the two tests should be against TP_EC_FAN_AUTO
> > (based on the comment).
> 
> Thanks Roland, for your info
> 
> based on the comments in commit eaa7571b2d1a08873e4bdd8e6db3431df61cd9ad,
> I think this should be modified like below:
> 
> ACPI: thinkpad-acpi: add a safety net for TPEC fan control mode
> The Linux ThinkPad community is not positive that all ThinkPads that do
> HFSP EC fan control do implement full-speed and auto modes, some of the
> earlier ones supporting HFSP might not.
> 
> If the EC ignores the AUTO or FULL-SPEED bits, it will pay attention to the
> lower three bits that set the fan level. And as thinkpad-acpi was leaving
> these set to zero, it would stop(!) the fan, which is Not A Good Thing.
> So, as a safety net, we now make sure to also set the fan level part of the
> HFSP register to speed 7 for full-speed, and a minimum of speed 4 for auto
> mode.
> --
> second TP_EC_FAN_FULLSPEED should be TP_EC_FAN_AUTO
> 
> 
> Signed-off-by: Roel Kluin <[EMAIL PROTECTED]>
> ---
> diff --git a/drivers/misc/thinkpad_acpi.c b/drivers/misc/thinkpad_acpi.c
> index cf56647..3c323fe 100644
> --- a/drivers/misc/thinkpad_acpi.c
> +++ b/drivers/misc/thinkpad_acpi.c
> @@ -4138,7 +4138,7 @@ static int fan_set_level(int level)
>* or FULLSPEED mode bits and just ignore them */
>   if (level & TP_EC_FAN_FULLSPEED)
>   level |= 7; /* safety min speed 7 */
> - else if (level & TP_EC_FAN_FULLSPEED)
> + else if (level & TP_EC_FAN_AUTO)
>   level |= 4; /* safety min speed 4 */
>  
>   if (!acpi_ec_write(fan_status_offset, level))

ACK.  This needs to be sent to stable as well.  I think both 2.6.22 and
2.6.23 need this patch.
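
For reference, the corrected safety net from the patch above can be sketched
as a standalone function. This is only an illustration: the bit values given
to TP_EC_FAN_FULLSPEED and TP_EC_FAN_AUTO are assumptions (they are not quoted
in this thread), and fan_level_with_safety_net() is a hypothetical name, not
the driver's fan_set_level().

```c
/* Bit values are assumptions for this sketch, not taken from the thread. */
enum {
	TP_EC_FAN_FULLSPEED = 0x40,	/* assumed: "full speed" mode bit */
	TP_EC_FAN_AUTO      = 0x80,	/* assumed: "auto" mode bit */
};

/*
 * Return the value that would be written to the fan register. The mode
 * bits are kept, and the low three bits get a non-zero fallback speed in
 * case the EC ignores the mode bits.
 */
int fan_level_with_safety_net(int level)
{
	if (level & TP_EC_FAN_FULLSPEED)
		level |= 7;	/* safety min speed 7 */
	else if (level & TP_EC_FAN_AUTO)
		level |= 4;	/* safety min speed 4 */
	return level;
}
```

With the duplicated test, the second branch could never run, so a request for
auto mode would have left the low three bits at zero.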

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh


Re: [PATCH] SH/Dreamcast - fix regressions, whitespace and memory leaks in Maple Bus driver

2008-02-04 Thread Greg KH
On Mon, Feb 04, 2008 at 10:37:37PM +0000, Adrian McMenamin wrote:
> 
> On Mon, 2008-02-04 at 08:14 -0800, Greg KH wrote:
> > On Mon, Feb 04, 2008 at 08:27:55AM +0000, Adrian McMenamin wrote:
> > > 
> > > On Sun, 2008-02-03 at 21:29 -0800, Greg KH wrote:
> > > > On Sun, Feb 03, 2008 at 08:00:47PM +0000, Adrian McMenamin wrote:
> > > > > From: Adrian McMenamin
> > > > > 
> > > > > This patch fixes the regression noted here:
> > > > > http://lkml.org/lkml/2008/1/26/189 as well as whitespace issues in the
> > > > > previous commit of this driver and the memory leaks noted here:
> > > > > http://lkml.org/lkml/2008/2/2/143 (as well as one or two other minor
> > > > > cleanups).
> > > > 
> > > > Which portion of the patch fixes the kobject WARN_ON()?
> > > 
> > > 
> > > 
> > > + if (mdev->registered == 0) {
> > > > + retval = device_register(&mdev->dev);
> > > + if (retval) {
> > > + printk(KERN_INFO
> > > + "Maple bus: Attempt to register device"
> > > + " (%x, %x) failed.\n",
> > > + mdev->port, mdev->unit);
> > > + maple_free_dev(mdev);
> > > + mdev = NULL;
> > > + return;
> > > + }
> > > + mdev->registered = 1;
> > > + }
> > >  }
> > > 
> > > 
> > > Specifically the check on mdev->registered
> > 
> > So the code path could cause devices to be registered more than once?
> > That seems broken, as no other bus that I know of needs such a check :(
> > 
> > Is there a way to fix the root problem here, instead of this type of
> > change?
> > 
> 
> The hardware is very flaky. If I add in delays to the bus start, it will
> detect the devices, but it's not brilliant. Registering an empty device
> got round that problem, at the price of testing for the earlier
> registration.

That sounds like you are just papering over the problem.  Just delay
and let the hardware settle down if needed :)

thanks,

greg k-h


Re: [Scst-devel] Integration of SCST in the mainstream Linux kernel

2008-02-04 Thread Matteo Tescione
Hi all,
And sorry for the intrusion; I am not a developer, but I work every day with
iSCSI and I find it fantastic.
Although AoE, FCoE and so on could be better, we have to look at what
real-world implementations need *now*, and if we look at the VMware world,
Virtual Iron, Microsoft clustering etc., the answer is iSCSI.
And now, SCST is the best open-source iSCSI target. So, from an end-user
point of view, what are the real problems with integrating SCST into the
mainstream kernel?

Just my two cents,
--
So long and thank for all the fish
--
#Matteo Tescione
#RMnet srl


> 
> 
> On Mon, 4 Feb 2008, Matt Mackall wrote:
>> 
>> But ATAoE is boring because it's not IP. Which means no routing,
>> firewalls, tunnels, congestion control, etc.
> 
> The thing is, that's often an advantage. Not just for performance.
> 
>> NBD and iSCSI (for all its hideous growths) can take advantage of these
>> things.
> 
> .. and all this could equally well be done by a simple bridging protocol
> (completely independently of any AoE code).
> 
> The thing is, iSCSI does things at the wrong level. It *forces* people to
> use the complex protocols, when it's known that a lot of people don't
> want it. 
> 
> Which is why these AoE and FCoE things keep popping up.
> 
> It's easy to bridge ethernet and add a new layer on top of AoE if you need
> it. In comparison, it's *impossible* to remove an unnecessary layer from
> iSCSI.
> 
> This is why "simple and low-level is good". It's always possible to build
> on top of low-level protocols, while it's generally never possible to
> simplify overly complex ones.
> 
> Linus
> 




Re: [git pull] x86 arch updates for v2.6.25

2008-02-04 Thread Andrew Morton
On Mon, 4 Feb 2008 20:11:03 -0800 Phil Oester <[EMAIL PROTECTED]> wrote:

> On Mon, Feb 04, 2008 at 07:27:53PM -0800, Linus Torvalds wrote:
> > kgdb? Not so interesting. We have many more hard problems happening at 
> > user sites, not in developer hands.
> 
> FWIW, I'm not a fulltime developer by any means, but on occasion
> I have fixed a few bugs in the netfilter area of the kernel.
> And in almost all cases, I used kgdb in my debugging and testing

   ^^^
> of fixes.  

yup.

> In doing so, it was a bit of a PITA to find/patch kgdb into the
> kernel, and having it as a configurable option would have saved
> me some time and effort and made the process much smoother.
> 
> So perhaps someone else out there would find it similarly useful,
> and the extra time it takes to find/patch/compile kgdb in is
> precluding them from participating?  Why would we ever want to do
> that?

I used kgdb continuously for 4-5 years until it broke.  I don't think I
ever used it much for "debugging" as such.  I used it more for general
observation of what's going on in the kernel.  And for _confirmation_ of
what's going on (ie: testing that the actual state matches the expected
state).

I'd end up doing my development with the assumption that kgdb was present. 
One example: rather than putting printks all over the place to ensure that
the right thing was happening at the right time I'd instead add code like

void foo(void)
{
}

...
	if (expr)
		foo();

then, when the testcase was up and running and in steady state, break in
and put a breakpoint on foo().  Continue, wait for the breakpoint then go
in and observe locals, globals, data structures, etc.

It's hard to describe (and remember!).  But the presence of the debugger as
a development (not debugging) tool changes the way you do development a bit.
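
The empty-function trick described above can be sketched in plain C. The
names debug_hook() and check_state() are hypothetical, the noinline attribute
and the empty asm statement are GCC/Clang extensions, and both are assumptions
added here so that an optimizing compiler keeps the call; the message itself
only shows an empty foo().

```c
/* Empty on purpose: exists only so a debugger can break on it. */
__attribute__((noinline)) void debug_hook(void)
{
	/* Keeps the compiler from discarding calls to an empty function. */
	__asm__ volatile("");
}

/* Returns 1 if the hook was called (state mismatch), 0 otherwise. */
int check_state(int expected, int actual)
{
	if (expected != actual) {
		debug_hook();	/* breakpoint target, e.g. "break debug_hook" */
		return 1;
	}
	return 0;
}
```

Once the workload is in steady state, a breakpoint on debug_hook() stops
execution only when the interesting condition holds, which is the point of the
pattern: no printk output to wade through, and full access to locals and data
structures at the moment of interest.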


Re: issue with patch "x86: no CPA on iounmap"

2008-02-04 Thread Arjan van de Ven

Siddha, Suresh B wrote:

This is wrt to x86 git commit f56d005d30342a45d8af2b75e82200f09600
"x86: no CPA on iounmap"

This can cause a performance issue. When a GART driver unmaps a RAM page,


thinking about this some more...

afaik the gart driver doesn't use ioremap

(and it does caching control explicitly, and sets its pages back to cached)



Re: [PATCH] 2.6.24-mm1 section type conflict cleanup

2008-02-04 Thread Kamalesh Babulal
Sam Ravnborg wrote:
> On Mon, Feb 04, 2008 at 09:52:23PM +0530, Kamalesh Babulal wrote:
>> Hi Andrew,
>>
>> The 2.6.24-mm1 kernel build fails at many places with section type
>> conflict build error.
> 
> What arch?
> We have troubles with powerpc as pointed out by Al in another thread.
> 
>   Sam
Hi Sam,

This clean up is done for the powerpc, sorry forgot to mention it.

-- 
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.


[GIT PULL] please pull infiniband.git

2008-02-04 Thread Roland Dreier
Linus, please pull from

master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This tree is also available from kernel.org mirrors at:

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This will get a second batch of InfiniBand/RDMA patches.  In addition
to the usual motley crew of changes, this pull includes a new driver
for NetEffect RNICs in drivers/infiniband/hw/nes.  The code could use
some further cleaning, but I don't think it's worth holding off on the
merge.

David Dillow (1):
  IB/srp: Retry stale connections

Eli Cohen (2):
  IB/mthca: Remove checks for srq->first_free < 0
  IB/ib_mthca: Pre-link receive WQEs in Tavor mode

Glenn Streiff (1):
  RDMA/nes: Add a driver for NetEffect RNICs

Hoang-Nam Nguyen (1):
  IB/ehca: Add PMA support

Jack Morgenstein (2):
  IB/mthca: Don't read reserved fields in mthca_QUERY_ADAPTER()
  mlx4_core: Don't read reserved fields in mlx4_QUERY_ADAPTER()

Joachim Fenkes (2):
  IB/ehca: Prevent sending UD packets to QP0
  IB/ehca: Update sma_attr also in case of disruptive config change

Olaf Kirch (1):
  IB/mthca: Return proper error codes from mthca_fmr_alloc()

Or Gerlitz (3):
  IPoIB: Handle bonding failover race for connected neighbours too
  IPoIB: Remove a misleading debug print
  IB/fmr_pool: Allocate page list for pool FMRs only when caching enabled

Roland Dreier (4):
  mlx4_core: Fix more section mismatches
  IB/mthca: Fix and simplify page size calculation in mthca_reg_phys_mr()
  IB/mlx4: Actually print out the driver version
  IB: Avoid marking __devinitdata as const

Sean Hefty (1):
  IB/cm: Add interim support for routed paths

 MAINTAINERS  |   10 +
 drivers/infiniband/Kconfig   |2 +-
 drivers/infiniband/Makefile  |1 +
 drivers/infiniband/core/cm.c |   89 +-
 drivers/infiniband/core/fmr_pool.c   |7 +-
 drivers/infiniband/hw/ehca/ehca_classes.h|1 +
 drivers/infiniband/hw/ehca/ehca_irq.c|2 +
 drivers/infiniband/hw/ehca/ehca_iverbs.h |5 +
 drivers/infiniband/hw/ehca/ehca_main.c   |2 +-
 drivers/infiniband/hw/ehca/ehca_reqs.c   |4 +
 drivers/infiniband/hw/ehca/ehca_sqp.c|   91 +
 drivers/infiniband/hw/mlx4/main.c|   10 +-
 drivers/infiniband/hw/mthca/mthca_cmd.c  |   11 +-
 drivers/infiniband/hw/mthca/mthca_main.c |5 +-
 drivers/infiniband/hw/mthca/mthca_mr.c   |8 +-
 drivers/infiniband/hw/mthca/mthca_provider.c |   22 +-
 drivers/infiniband/hw/mthca/mthca_qp.c   |   13 +-
 drivers/infiniband/hw/mthca/mthca_srq.c  |   47 +-
 drivers/infiniband/hw/nes/Kconfig|   16 +
 drivers/infiniband/hw/nes/Makefile   |3 +
 drivers/infiniband/hw/nes/nes.c  | 1152 
 drivers/infiniband/hw/nes/nes.h  |  560 
 drivers/infiniband/hw/nes/nes_cm.c   | 3088 
 drivers/infiniband/hw/nes/nes_cm.h   |  433 +++
 drivers/infiniband/hw/nes/nes_context.h  |  193 ++
 drivers/infiniband/hw/nes/nes_hw.c   | 3080 
 drivers/infiniband/hw/nes/nes_hw.h   | 1206 
 drivers/infiniband/hw/nes/nes_nic.c  | 1703 +++
 drivers/infiniband/hw/nes/nes_user.h |  112 +
 drivers/infiniband/hw/nes/nes_utils.c|  917 ++
 drivers/infiniband/hw/nes/nes_verbs.c| 3917 ++
 drivers/infiniband/hw/nes/nes_verbs.h|  169 ++
 drivers/infiniband/ulp/ipoib/ipoib_main.c|   19 +-
 drivers/infiniband/ulp/srp/ib_srp.c  |   53 +-
 drivers/infiniband/ulp/srp/ib_srp.h  |1 +
 drivers/net/mlx4/fw.c|6 -
 drivers/net/mlx4/fw.h|3 -
 drivers/net/mlx4/main.c  |   11 +-
 drivers/net/mlx4/mr.c|2 +-
 39 files changed, 16848 insertions(+), 126 deletions(-)
 create mode 100644 drivers/infiniband/hw/nes/Kconfig
 create mode 100644 drivers/infiniband/hw/nes/Makefile
 create mode 100644 drivers/infiniband/hw/nes/nes.c
 create mode 100644 drivers/infiniband/hw/nes/nes.h
 create mode 100644 drivers/infiniband/hw/nes/nes_cm.c
 create mode 100644 drivers/infiniband/hw/nes/nes_cm.h
 create mode 100644 drivers/infiniband/hw/nes/nes_context.h
 create mode 100644 drivers/infiniband/hw/nes/nes_hw.c
 create mode 100644 drivers/infiniband/hw/nes/nes_hw.h
 create mode 100644 drivers/infiniband/hw/nes/nes_nic.c
 create mode 100644 drivers/infiniband/hw/nes/nes_user.h
 create mode 100644 drivers/infiniband/hw/nes/nes_utils.c
 create mode 100644 drivers/infiniband/hw/nes/nes_verbs.c
 create mode 100644 drivers/infiniband/hw/nes/nes_verbs.h

Re: [PATCH 07/22] ide-tape: struct idetape_tape_t: shorten member names v2

2008-02-04 Thread Borislav Petkov
On Tue, Feb 05, 2008 at 02:23:21AM +0100, Bartlomiej Zolnierkiewicz wrote:
> On Monday 04 February 2008, Borislav Petkov wrote:
> > Shorten some member names not too aggressively since this driver might be
> > gone anyway soon.
> > 
> > Signed-off-by: Borislav Petkov <[EMAIL PROTECTED]>
> > ---
> >  drivers/ide/ide-tape.c |  210 ++--
> >  1 files changed, 113 insertions(+), 97 deletions(-)
> > 
> > diff --git a/drivers/ide/ide-tape.c b/drivers/ide/ide-tape.c
> > index 126e8a9..0b5ccce 100644
> > --- a/drivers/ide/ide-tape.c
> > +++ b/drivers/ide/ide-tape.c
> 
> [...]
> 
> > @@ -1583,7 +1579,8 @@ static void idetape_create_read_cmd(idetape_tape_t *tape, idetape_pc_t *pc, unsi
> > pc->bh = bh;
> > atomic_set(&bh->b_count, 0);
> > pc->buffer = NULL;
> > -   pc->request_transfer = pc->buffer_size = length * tape->tape_block_size;
> > +   pc->buffer_size = length * tape->blk_size;
> > +   pc->request_transfer = length * tape->blk_size;
> > if (pc->request_transfer == tape->stage_size)
> > set_bit(PC_DMA_RECOMMENDED, &pc->flags);
> >  }
> > @@ -1621,7 +1618,8 @@ static void idetape_create_write_cmd(idetape_tape_t *tape, idetape_pc_t *pc, uns
> > pc->b_data = bh->b_data;
> > pc->b_count = atomic_read(&bh->b_count);
> > pc->buffer = NULL;
> > -   pc->request_transfer = pc->buffer_size = length * tape->tape_block_size;
> > +   pc->request_transfer = length * tape->blk_size;
> > +   pc->buffer_size = length * tape->blk_size;
> > if (pc->request_transfer == tape->stage_size)
> > set_bit(PC_DMA_RECOMMENDED, &pc->flags);
> >  }
> 
> for some reason gcc doesn't seem to optimize the new code as well as
> the old one (=> driver size goes up instead of staying unchanged)
> 
> interdiff between original patch and merged version:
> 
> diff -u b/drivers/ide/ide-tape.c b/drivers/ide/ide-tape.c
> --- b/drivers/ide/ide-tape.c
> +++ b/drivers/ide/ide-tape.c
> @@ -324,7 +324,7 @@
>   /* Current character device data transfer direction */
>   u8 chrdev_dir;
>  
> - /* tape block size, usu. 512 or 1024 bytes */
> + /* tape block size, usually 512 or 1024 bytes */
>   unsigned short blk_size;
>   int user_bs_factor;
>  
> @@ -1580,8 +1580,8 @@
>   pc->bh = bh;
>   atomic_set(&bh->b_count, 0);
>   pc->buffer = NULL;
> - pc->buffer_size = length * tape->blk_size;
> - pc->request_transfer = length * tape->blk_size;
> + pc->buffer_size = length * tape->blk_size;
> + pc->request_transfer = pc->buffer_size;
>   if (pc->request_transfer == tape->stage_size)
>   set_bit(PC_DMA_RECOMMENDED, &pc->flags);
>  }
> @@ -1619,8 +1619,8 @@
>   pc->b_data = bh->b_data;
>   pc->b_count = atomic_read(&bh->b_count);
>   pc->buffer = NULL;
> - pc->request_transfer = length * tape->blk_size;
> - pc->buffer_size = length * tape->blk_size;
> + pc->buffer_size = length * tape->blk_size;
> + pc->request_transfer = pc->buffer_size;
>   if (pc->request_transfer == tape->stage_size)
>   set_bit(PC_DMA_RECOMMENDED, &pc->flags);
>  }

Yeah, I did that only because checkpatch.pl complained that multiple
assignments should be avoided. Now it looks kinda dumb that way, apart from
improving readability, so converting it to the form that generates the smaller
binary would be reason enough to ignore checkpatch.pl in that case.
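
The two forms under discussion can be put side by side in a small standalone
sketch. struct pc_sketch is a hypothetical stand-in for the two idetape_pc_t
fields involved, and the function names are made up for illustration; nothing
here measures generated code size, it only shows that both forms store the
same values.

```c
/* Stand-in for the two fields of idetape_pc_t touched by the patch. */
struct pc_sketch {
	int request_transfer;
	int buffer_size;
};

/* Form from the original patch: the product is written out twice. */
void set_sizes_two_products(struct pc_sketch *pc, int length, int blk_size)
{
	pc->buffer_size = length * blk_size;
	pc->request_transfer = length * blk_size;
}

/* Form from the merged version: compute once, then copy the field. */
void set_sizes_copy(struct pc_sketch *pc, int length, int blk_size)
{
	pc->buffer_size = length * blk_size;
	pc->request_transfer = pc->buffer_size;
}
```

Both satisfy checkpatch.pl's rule against chained assignments; the second
gives the compiler an explicit hint that the value can be reused.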

-- 
Regards/Gruß,
Boris.


Re: [BUG] 2.6.24 refuses to boot - ATA problem?

2008-02-04 Thread Gene Heskett
On Monday 04 February 2008, Mark Lord wrote:
>Gene Heskett wrote:
>> On Sunday 03 February 2008, Ingo Molnar wrote:
>>> * Gene Heskett <[EMAIL PROTECTED]> wrote:
>>>> I believe its the same, but lemme paste it for sure, yes:
>>>> [   26.339926] ENABLING IO-APIC IRQs
>>>> [   26.340119] ..TIMER: vector=0x31 apic1=0 pin1=0 apic2=-1 pin2=-1
>>>> [   26.350129] ..MP-BIOS bug: 8254 timer not connected to IO-APIC
>>>> [   26.350182] ...trying to set up timer (IRQ0) through the 8259A ... failed.
>>>> [   26.350185] ...trying to set up timer as Virtual Wire IRQ... failed.
>>>> [   26.360186] ...trying to set up timer as ExtINT IRQ... works.
>>>>
>>>> The third line is the only line that makes it to the screen during the
>>>> boot trace.
>>>>
>>>> Now, what does this tell us?
>>>
>>> the question would be:
>>>
>>> - if you remove the acpi_use_timer_override boot flag
>>> - and if you boot a kernel with this hack applied
>>>
>>> => do those weird PATA failures come back?
>>>
>>> If the failues do _not_ come back then the problem is somehow
>>> affected/worked-around by the IO-APIC code that generates the above 4
>>> lines. If the failures are still the same then the above 4 lines are
>>> really just an uninteresting side-effect of the acpi_use_timer_override
>>> flag - and the real side-effects (that fixes PATA on your box) are to be
>>> found elsewhere.
>>>
>>> Sadly, the latter variant is the expected answer.
>>>
>>> Ingo
>>
>> And at this point, I can't tell.  This reboot was from a cold start,
>> without the argument, and cold by long enough to make the rounds about the
>> house and pick up a beer, but not take my evening pillbox.  A minute cold,
>> maybe 2 max. The log is clean since except for a kudzu nag of some sort:
>
>..
>
>Just to muddy your observations:  it is quite possible that a cold
> (power-off) reboot may be required to properly observe what happens here.
>
Precisely why I've now done that twice, without using the extra argument.
No recurrence dammit.

>Cheers



-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
He who makes a beast of himself gets rid of the pain of being a man.
-- Dr. Johnson


Re: issue with patch "x86: no CPA on iounmap"

2008-02-04 Thread Arjan van de Ven

Siddha, Suresh B wrote:

This is wrt to x86 git commit f56d005d30342a45d8af2b75e82200f09600
"x86: no CPA on iounmap"

This can cause a performance issue. When a GART driver unmaps a RAM page,
which was mapped as UC, this commit will still retain the UC attribute
on the kernel identity mapping. This can cause mysterious performance issues
if this freed page gets used by the kernel later.

For now we should change the attribute during iounmap and in future PAT
infrastructure will have necessary hooks to avoid the aliasing issues.



this is a hard one; because the flipside is someone ioremapping the same page
twice (which is not impossible if there are two 1k BARs)... and then one of
them gets iounmapped, which would turn the page cacheable again.
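
The hazard described above can be modelled with a toy page structure. This is
purely an illustration: struct page_sketch and the sketch_* functions are
hypothetical, and the counted variant is not something proposed in this
thread, only one conceivable way to show why resetting the attribute on every
unmap is unsafe.

```c
enum cache_attr { ATTR_CACHED, ATTR_UNCACHED };

/* Toy model of one page: its attribute and the live uncached mappings. */
struct page_sketch {
	enum cache_attr attr;
	int uc_mappings;
};

void sketch_ioremap(struct page_sketch *p)
{
	p->uc_mappings++;
	p->attr = ATTR_UNCACHED;
}

/* Naive unmap: always restores the cached attribute. */
void sketch_iounmap_naive(struct page_sketch *p)
{
	p->uc_mappings--;
	p->attr = ATTR_CACHED;
}

/* Counted unmap: restores the attribute only when the last mapping goes. */
void sketch_iounmap_counted(struct page_sketch *p)
{
	if (--p->uc_mappings == 0)
		p->attr = ATTR_CACHED;
}
```

With two mappings of the same page, the naive variant marks the page cached
while one uncached mapping is still live, which is the aliasing problem; never
resetting the attribute avoids that but leaves freed RAM pages uncached.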


Re: [PATCH] enclosure: add support for enclosure services

2008-02-04 Thread James Bottomley
On Mon, 2008-02-04 at 19:28 -0800, Luben Tuikov wrote:
> --- On Mon, 2/4/08, James Bottomley <[EMAIL PROTECTED]> wrote:
> > On Mon, 2008-02-04 at 18:01 -0800, Luben Tuikov wrote:
> > > --- On Mon, 2/4/08, James Bottomley
> > <[EMAIL PROTECTED]> wrote:
> > > > > > The enclosure misc device is really
> > just a
> > > > library providing
> > > > > > sysfs
> > > > > > support for physical enclosure devices
> > and their
> > > > > > components.
> > > > > 
> > > > > Who is the target audience/user of those
> > facilities?
> > > > > a) The kernel itself needing to read/write
> > SES pages?
> > > > 
> > > > That depends on the enclosure integration, but
> > right at the
> > > > moment, it
> > > > doesn't
> > > 
> > > Yes, I didn't suspect so.
> > > 
> > > > 
> > > > > b) A user space application using sysfs to
> > read/write
> > > > >SES pages?
> > > > 
> > > > Not an application so much as a user.  The idea
> > of sysfs is
> > > > to allow
> > > > users to get and set the information in addition
> > to
> > > > applications.
> > > 
> > > Exactly the same argument stands for a user-space
> > > application with a user-space library.
> > > 
> > > This is the classical case of where it is better to
> > > do this in user-space as opposed to the kernel.
> > > 
> > > The kernel provides capability to access the SES
> > > device.  The user space application and library
> > > provide interpretation and control.  Thus if the
> > > enclosure were upgraded, one doesn't need to
> > > upgrade their kernel in order to utilize the new
> > > capabilities of the SES device.  Plus upgrading
> > > a user-space application is a lot easier than
> > > the kernel (and no reboot necessary).
> > 
> > The implementation is modular, so it's remove and
> > insert ...
> 
> I guess the same could be said for STGT and SCST, right?

You mean both of their kernel pieces are modular?  That's correct.

> LOL, no seriously, this is unnecessary kernel bloat,
> or rather at the wrong place (see below).
> 
> > 
> > > Consider another thing: vendors would really like
> > > unprecedented access to the SES device in the
> > enclosure
> > > so as your ses/enclosure code keeps state it would
> > > get out of sync when vendor user-space enclosure
> > > applications access (and modify) the SES device's
> > > pages.
> > 
> > The state model doesn't assume nothing else will alter
> > the state.
> 
> But it would be trivial exercise to show that an
> inconsistent state can be had by modifying pages
> of the SES device directly from userspace bypassing
> your implementation.

I don't think so ... if you actually look at the code, you'll see it
doesn't really have persistent state for the enclosure.

> > > You can test this yourself: submit a patch
> > > that removes SES /dev/sgX support; advertise your
> > > ses/class solution and watch the fun.
> > > 
> > > > > At the moment SES device management is done
> > via
> > > > > an application (user-space) and a user-space
> > library
> > > > > used by the application and /dev/sgX to send
> > SCSI
> > > > > commands to the SES device.
> > > > 
> > > > I must have missed that when I was looking for
> > > > implementations; what's
> > > > the URL?
> > > 
> > > I'm not aware of any GPLed ones.  That doesn't
> > > necessarily mean that the best course of action is
> > > to bloat the kernel.  You can move your ses/enclosure
> > > stuff to a user space application library
> > > and thus start a GPLed one.
> > 
> > Certainly ... patches welcome.
> 
> I've none at the moment, plus I don't think you'd be
> the point of contact for a user-space SES library.
> Unless of course you've already started something up
> on sourceforge.
> 
> Really, such an effort already exists: it is called
> sg_ses(8).
> 
> > 
> > > > But, if we have non-scsi enclosures to integrate,
> > that
> > > > makes it harder
> > > > for a user application because it has to know all
> > the
> > > > implementations.
> > > 
> > > So does the kernel.  And as I pointed out above, it
> > > is a lot easier to upgrade a user-space application
> > and
> > > library than it is to upgrade a new kernel and having
> > > to reboot the computer to run the new kernel.
> > 
> > No, think again ... it's easy for SES based enclosures
> > because they have
> > a SCSI transport.  We have no transport for SGPIO based
> > enclosures nor
> > for any of the other more esoteric ones.
> 
> Yes, for which the transport layer, implements the
> scsi device node for the SES device.  It doesn't really
> matter if the SCSI commands sent to the SES device go
> over SGPIO or FC or SAS or Bluetooth or I2C, etc, the
> transport layer can implement that and present the
> /dev/sgX node.

But it does matter if the enclosure device doesn't speak SCSI.  SGPIO
isn't a SCSI protocol ... it's a general purpose serial bus protocol.
It's pretty simple and register based, but it might (or might not) be
accessible via a SCSI bridge.

> Case in point: the protocol FW running on the ASIC
> provides this 

Re: CPU hotplug and IRQ affinity with 2.6.24-rt1

2008-02-04 Thread Max Krasnyansky


Daniel Walker wrote:
> On Mon, Feb 04, 2008 at 03:35:13PM -0800, Max Krasnyanskiy wrote:
> >> This is just an FYI. As part of the "Isolated CPU extensions" thread Daniel
> >> suggested that I check out the latest RT kernels. So I did, or at least
> >> tried to, and immediately spotted a couple of issues.
> >>
> >> The machine I'm running it on is:
> >>  HP xw9300, Dual Opteron, NUMA
> >>
> >> It looks like with the -rt kernel IRQ affinity masks are ignored on that
> >> system, i.e. I write 1 to, let's say, /proc/irq/23/smp_affinity but the
> >> interrupts keep coming to CPU1. Vanilla 2.6.24 does not have that issue.
> 
> I tried this, and it works according to /proc/interrupts .. Are you
> looking at the interrupt threads affinity ?
Nope. I'm looking at /proc/interrupts, i.e. the interrupt count keeps
incrementing for cpu1 even though the affinity mask is set to 1.

IRQ thread affinity was btw set to 3 which is probably wrong.
To clarify, by default after reboot:
- IRQ affinity set 3, IRQ thread affinity set to 3
- User writes 1 into /proc/irq/N/smp_affinity
- IRQ affinity is now set to 1, IRQ thread affinity is still set to 3

It'd still work I guess but does not seem right. Ideally IRQ thread affinity
should have changed as well.
We could of course just have some user-space tool that adjusts both.
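
As a rough idea of what such a user-space helper would have to produce, the
sketch below builds the /proc/irq/N/smp_affinity path and the hexadecimal CPU
mask that gets written to it (bit n set means CPU n is allowed, so 1 is CPU0
only and 3 is CPU0 plus CPU1). The function names are hypothetical, and
adjusting the IRQ thread's own affinity is not covered here.

```c
#include <stdio.h>

/* Build the path of the affinity file for a given IRQ number. */
int irq_affinity_path(char *buf, size_t len, int irq)
{
	return snprintf(buf, len, "/proc/irq/%d/smp_affinity", irq);
}

/* Format a CPU bitmask (bit n = CPU n) as the hex string for that file. */
int irq_affinity_mask(char *buf, size_t len, unsigned int cpu_mask)
{
	return snprintf(buf, len, "%x", cpu_mask);
}
```

A tool would then open the path for writing and write the mask string,
checking both steps for errors since the kernel may reject a mask.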

Looks like Greg already replied to the cpu hotplug issue. For me it did not
oops. It just got stuck, probably because it could not move an IRQ due to
broken IRQ affinity logic.

Max


Re: [git pull] x86 arch updates for v2.6.25

2008-02-04 Thread Phil Oester
On Mon, Feb 04, 2008 at 07:27:53PM -0800, Linus Torvalds wrote:
> kgdb? Not so interesting. We have many more hard problems happening at 
> user sites, not in developer hands.

FWIW, I'm not a fulltime developer by any means, but on occasion
I have fixed a few bugs in the netfilter area of the kernel.
And in almost all cases, I used kgdb in my debugging and testing
of fixes.  

In doing so, it was a bit of a PITA to find/patch kgdb into the
kernel, and having it as a configurable option would have saved
me some time and effort and made the process much smoother.

So perhaps someone else out there would find it similarly useful,
and the extra time it takes to find/patch/compile kgdb in is
precluding them from participating?  Why would we ever want to do
that?

Phil


Re: [PATCH] badness() dramatically overcounts memory

2008-02-04 Thread Balbir Singh
Jeff Davis wrote:
> In oom_kill.c, one of the badness calculations is wildly inaccurate. If
> memory is shared among child processes, that same memory will be counted
> for each child, effectively multiplying the memory penalty by N, where N
> is the number of children.
> 
> This makes it almost certain that the parent will always be chosen as
> the victim of the OOM killer (assuming any substantial amount memory
> shared among the children), even if the parent and children are well
> behaved and have a reasonable and unchanging VM size.
> 
> Usually this does not actually alleviate the memory pressure because the
> truly bad process is completely unrelated; and the OOM killer must later
> kill the truly bad process.
> 
> This trivial patch corrects the calculation so that it does not count a
> child's shared memory against the parent.
> 

Hi, Jeff,

1. grep on the kernel source tells me that shared_vm is incremented only in
   vm_stat_account(), which is a NO-OP if CONFIG_PROC_FS is not defined.
2. How have you tested these patches? One way to do it would be to use the
   memory controller and set a small limit on the control group. A memory
   intensive application will soon see an OOM.

I do need to look at OOM kill sanity, my colleagues using the memory controller
have reported wrong actions taken by the OOM killer, but I am yet to analyze
them.

The interesting thing is the use of total_vm and not the RSS which is used as
the basis by the OOM killer. I need to read/understand the code a bit more.
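
The overcounting Jeff describes can be illustrated with a deliberately
simplified model. This is not the kernel's badness() function: the struct,
the function names and the plain summation are assumptions made for the
example, and all sizes are in pages.

```c
/* Simplified per-task numbers, in pages. */
struct task_sketch {
	unsigned long total_vm;		/* total virtual memory of the task */
	unsigned long shared_vm;	/* part of total_vm that is shared */
};

/* Parent score when every child's full size is added. */
unsigned long score_counting_shared(const struct task_sketch *parent,
				    const struct task_sketch *children, int n)
{
	unsigned long points = parent->total_vm;

	for (int i = 0; i < n; i++)
		points += children[i].total_vm;
	return points;
}

/* Parent score when each child's shared memory is left out. */
unsigned long score_excluding_shared(const struct task_sketch *parent,
				     const struct task_sketch *children, int n)
{
	unsigned long points = parent->total_vm;

	for (int i = 0; i < n; i++)
		points += children[i].total_vm - children[i].shared_vm;
	return points;
}
```

For a parent of 1000 pages with ten children of 1000 pages each, 900 of them
shared, the first function gives 11000 and the second 2000, so the same 900
shared pages account for most of the parent's penalty in the first case.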

-- 
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL


Re: Integrating cpusets and cpu isolation [was Re: [CPUISOL] CPU isolation extensions]

2008-02-04 Thread Max Krasnyansky


Paul Jackson wrote:
> Max K wrote:
>>> And for another thing, we already declare externs in cpumask.h for
>>> the other, more widely used, cpu_*_map variables cpu_possible_map,
>>> cpu_online_map, and cpu_present_map.
>> Well, to address #2 and #3 isolated map will need to be exported as well.
>> Those other maps do not really have much to do with the scheduler code.
>> That's why I think either kernel/cpumask.c or kernel/cpu.c is a better place 
>> for them.
> 
> Well, if you need it to be exported for #2 or #3, then that's ok
> by me - export it.
> 
> I'm unaware of any kernel/cpumask.c.  If you meant lib/cpumask.c, then
> I'd prefer you not put it there, as lib/cpumask.c just contains the
> implementation details of the abstract data type cpumask_t, not any of
> its uses.  If you mean kernel/cpuset.c, then that's not a good choice
> either, as that just contains the implementation details of the cpuset
> subsystem.  You should usually define such things in one of the files
> using it, and unless there is clearly a -better- place to move the
> definition, it's usually better to just leave it where it is.

I was thinking of creating the new file kernel/cpumask.c. But it probably does
not make sense just for the masks. I'm now thinking kernel/cpu.c is the best
place for it. It contains all the cpu hotplug logic that deals with those maps;
at the very top it has stuff like

/* Serializes the updates to cpu_online_map, cpu_present_map */
static DEFINE_MUTEX(cpu_add_remove_lock);

So it seems to make sense to keep the maps in there.

Max


Re: [PATCH] Whine about suspicious return values from module's ->init() hook

2008-02-04 Thread Andrew Morton
On Tue, 5 Feb 2008 14:43:31 +1100 Rusty Russell <[EMAIL PROTECTED]> wrote:

> On Tuesday 05 February 2008 02:42:15 Alexey Dobriyan wrote:
> > One head-scratching session could be noticeably shorter with this patch...
> >
> > Signed-off-by: Alexey Dobriyan <[EMAIL PROTECTED]>
> 
> If we want to prevent > 0 returns, let's just BUG_ON().
> 

That risks killing previously-working setups.  WARN_ON is sufficient.
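
As a rough userspace illustration of the policy being argued for here (warn
about a positive return value but let the caller carry on), assuming the usual
convention that ->init() returns 0 on success and a negative errno on failure.
The helper names are hypothetical and this is not the code from Alexey's patch:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper, not a kernel function: classify the value
 * returned by a module's ->init() hook.  0 means success and a
 * negative value is taken as -errno; anything positive is suspicious.
 */
static int init_ret_is_suspicious(int ret)
{
	return ret > 0;
}

/* Warn and carry on (the WARN_ON-style policy) instead of aborting. */
static int check_init_ret(const char *name, int ret)
{
	if (init_ret_is_suspicious(ret))
		fprintf(stderr, "%s: ->init() suspiciously returned %d\n",
			name, ret);
	return ret;
}
```

A BUG_ON-style policy would abort at the point of the fprintf() instead, which
is what would take down setups that happened to work before.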


Re: T61P sound issue

2008-02-04 Thread Andrew Morton
On Mon, 4 Feb 2008 19:40:38 -0800 Andrew Morton <[EMAIL PROTECTED]> wrote:

> > Here's lspci |grep -i audio:
> > Intel Corporation 82801H (ICH8 Family) HD Audio Controller (rev 03)
> > 
> > Running linux-2.6.24 stable. If there's a known bug, I could try to dig
> > more on it and get more info.
> 
> It works for me.  http://userweb.kernel.org/~akpm/config-t61p.txt

Well it sort-of works.  I started kde (fc8 install) and ran

play /usr/share/sounds/KDE_Close_Window.wav

and a prompt appeared titled "Error - artsmessage" with content "Sound
server fatal error: cpu overload, aborting".

After OKing that, the `play' command still works.

What could cause such a thing??


Re: [PATCH] Whine about suspicious return values from module's ->init() hook

2008-02-04 Thread Rusty Russell
On Tuesday 05 February 2008 02:42:15 Alexey Dobriyan wrote:
> One head-scratching session could be noticeably shorter with this patch...
>
> Signed-off-by: Alexey Dobriyan <[EMAIL PROTECTED]>

If we want to prevent > 0 returns, let's just BUG_ON().

Thanks,
Rusty.


Re: [PATCH 5/5] typesafe: TIMER_INITIALIZER and setup_timer

2008-02-04 Thread Rusty Russell
On Tuesday 05 February 2008 01:57:37 Al Viro wrote:
> On Mon, Feb 04, 2008 at 11:19:44PM +1100, Rusty Russell wrote:
> > This patch lets timer callback functions have their natural type
> > (ie. exactly match the data pointer type); it allows the old "unsigned
> > long data" type as well.
> >
> > Downside: if you use the old "unsigned long" callback type, you won't
> > get a warning if your data is not an unsigned long, due to the cast.
>
> No.  There's much saner way to do that and it does not involve any gccisms
> at all.  I'd posted such patches quite a while ago; normal C constructs
> are quite sufficient, TYVM.

If you're referring to your 1 Dec 2006 posting, it was enlightening, but I was 
unable to find any patches.

I'd be interested in seeing what you ended up with; I agree it'd be nice to 
kill that unsigned long.

Thanks,
Rusty.


Re: T61P sound issue

2008-02-04 Thread Andrew Morton
On Sun, 27 Jan 2008 08:49:29 -0500 "Felipe Balbi" <[EMAIL PROTECTED]> wrote:

If a bug report fell in a forest, would ...

> Hi all,
> 
> Could anyone make T61P's ICH8 sound controller to work properly?

ooh, a fellow t61p owner.  How's suspend and resume working?

> Here's lspci |grep -i audio:
> Intel Corporation 82801H (ICH8 Family) HD Audio Controller (rev 03)
> 
> Running linux-2.6.24 stable. If there's a known bug, I could try to dig
> more on it and get more info.

It works for me.  http://userweb.kernel.org/~akpm/config-t61p.txt

> Attached is my .config and dmesg output

If it's still busted please cc the "SOUND" developers (from ./MAINTAINERS)
on the reply, thanks.


> Call Trace:
>[] __report_bad_irq+0x30/0x72
>  [] note_interrupt+0x22a/0x26b
>  [] handle_fasteoi_irq+0xa9/0xd0
>  [] do_IRQ+0x6c/0xd5
>  [] ret_from_intr+0x0/0xa
>  [] lapic_next_event+0x0/0xa
>  [] __do_softirq+0x5a/0xce
>  [] tick_program_event+0x31/0x4d
>  [] call_softirq+0x1c/0x28
>  [] do_softirq+0x2c/0x7d
>  [] irq_exit+0x3f/0x84
>  [] smp_apic_timer_interrupt+0x3f/0x53
>  [] apic_timer_interrupt+0x66/0x70
>   
> handlers:
> [] (usb_hcd_irq+0x0/0x52)
> Disabling IRQ #19
> 

(everyone gets this btw - some weird bluetooth-vs-usb thing which we don't
know how to fix).



Re: CPU hotplug and IRQ affinity with 2.6.24-rt1

2008-02-04 Thread Gregory Haskins
Hi Daniel,

  See inline...

>>> On Mon, Feb 4, 2008 at  9:51 PM, in message
<[EMAIL PROTECTED]>, Daniel Walker
<[EMAIL PROTECTED]> wrote: 
> On Mon, Feb 04, 2008 at 03:35:13PM -0800, Max Krasnyanskiy wrote:
>> This is just an FYI. As part of the "Isolated CPU extensions" thread Daniel
>> suggest for me to check out latest RT kernels. So I did or at least tried to
>> and immediately spotted a couple of issues.
>>
>> The machine I'm running it on is:
>>  HP xw9300, Dual Opteron, NUMA
>>
>> It looks like with -rt kernel IRQ affinity masks are ignored on that 
>> system. ie I write 1 to lets say /proc/irq/23/smp_affinity but the 
>> interrupts keep coming to CPU1. Vanilla 2.6.24 does not have that issue.
> 
> I tried this, and it works according to /proc/interrupts .. Are you
> looking at the interrupt threads affinity?
> 
>> Also the first thing I tried was to bring CPU1 off-line. Thats the fastest 
>> way to get irqs, soft-irqs, timers, etc of a CPU. But the box hung 
>> completely. It also managed to mess up my ext3 filesystem to the point 
>> where it required manual fsck (have not see that for a couple of
>> years now). I tried the same thing (ie echo 0 > 
>> /sys/devices/cpu/cpu1/online) from the console. It hang again with the 
>> message that looked something like:
>>  CPU1 is now off-line
>>  Thread IRQ-23 is on CPU1 ...
> 
> I get the following when I tried it,
> 
> BUG: sleeping function called from invalid context bash(5126) at
> kernel/rtmutex.c:638
> in_atomic():1 [0001], irqs_disabled():1
> Pid: 5126, comm: bash Not tainted 2.6.24-rt1 #1
>  [] show_trace_log_lvl+0x1d/0x3a
>  [] show_trace+0x12/0x14
>  [] dump_stack+0x6c/0x72
>  [] __might_sleep+0xe8/0xef
>  [] __rt_spin_lock+0x24/0x59
>  [] rt_spin_lock+0x8/0xa
>  [] kfree+0x2c/0x8d

Doh!  This is my bug.  I'll have to come up with a good way to free that memory
under atomic, or do this another way.  Stay tuned.

>  [] rq_attach_root+0x67/0xba
>  [] cpu_attach_domain+0x2b6/0x2f7
>  [] detach_destroy_domains+0x23/0x37
>  [] update_sched_domains+0x2d/0x40
>  [] notifier_call_chain+0x2b/0x55
>  [] __raw_notifier_call_chain+0x19/0x1e
>  [] _cpu_down+0x84/0x24c
>  [] cpu_down+0x28/0x3a
>  [] store_online+0x27/0x5a
>  [] sysdev_store+0x20/0x25
>  [] sysfs_write_file+0xad/0xde
>  [] vfs_write+0x82/0xb8
>  [] sys_write+0x3d/0x61
>  [] sysenter_past_esp+0x5f/0x85
>  ===
> ---
> | preempt count: 0001 ]
> | 1-level deep critical section nesting:
> 
> .. []  __spin_lock_irqsave+0x14/0x3b
> .[] ..   ( <= rq_attach_root+0x12/0xba)
> 
> Which is clearly a problem .. 
> 
> (I added linux-rt-users to the CC)
> 
> Daniel
> -
> To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html




Re: [git pull] x86 arch updates for v2.6.25

2008-02-04 Thread Linus Torvalds


On Tue, 5 Feb 2008, Maxim Levitsky wrote:
> 
> The x86 tree was merged several times, but I don't see kgdb included in 
> latest mainline -git.
> 
> So just one question: will it be included or not?

I won't even consider pulling it unless it's offered as a separate tree, 
not mixed up with other things. At that point I can give a look.

That said, I explained to Ingo why I'm not particularly interested in it. 
I don't think that "developer-centric" debugging is really even remotely 
our problem, and that I'm personally a lot more interested in 
infrastructure that helps normal users give better bug-reports. And kgdb 
isn't even _remotely_ it.

So I'd merge a patch that puts oops information (or the whole console 
printout) in the Intel management stuff in a heartbeat. That code is 
likely much grottier than any kgdb thing will ever be (Intel really 
screwed up the interface and made it some insane XML thing), but it's also 
fundamentally more important - if it means that normal users can give oops 
reports after they happened in X (or, these days, probably more commonly 
during suspend/resume) and the machine just died.

kgdb? Not so interesting. We have many more hard problems happening at 
user sites, not in developer hands.

Linus


Re: [PATCH] enclosure: add support for enclosure services

2008-02-04 Thread Luben Tuikov
--- On Mon, 2/4/08, James Bottomley <[EMAIL PROTECTED]> wrote:
> On Mon, 2008-02-04 at 18:01 -0800, Luben Tuikov wrote:
> > --- On Mon, 2/4/08, James Bottomley <[EMAIL PROTECTED]> wrote:
> > > > > The enclosure misc device is really just a library providing
> > > > > sysfs support for physical enclosure devices and their
> > > > > components.
> > > > 
> > > > Who is the target audience/user of those facilities?
> > > > a) The kernel itself needing to read/write SES pages?
> > > 
> > > That depends on the enclosure integration, but right at the
> > > moment, it doesn't
> > 
> > Yes, I didn't suspect so.
> > 
> > > 
> > > > b) A user space application using sysfs to read/write
> > > >    SES pages?
> > > 
> > > Not an application so much as a user.  The idea of sysfs is
> > > to allow users to get and set the information in addition to
> > > applications.
> > 
> > Exactly the same argument stands for a user-space
> > application with a user-space library.
> > 
> > This is the classical case of where it is better to
> > do this in user-space as opposed to the kernel.
> > 
> > The kernel provides capability to access the SES
> > device.  The user space application and library
> > provide interpretation and control.  Thus if the
> > enclosure were upgraded, one doesn't need to
> > upgrade their kernel in order to utilize the new
> > capabilities of the SES device.  Plus upgrading
> > a user-space application is a lot easier than
> > the kernel (and no reboot necessary).
> 
> The implementation is modular, so it's remove and insert ...

I guess the same could be said for STGT and SCST, right?

LOL, no seriously, this is unnecessary kernel bloat,
or rather at the wrong place (see below).

> 
> > Consider another thing: vendors would really like
> > unprecedented access to the SES device in the enclosure
> > so as your ses/enclosure code keeps state it would
> > get out of sync when vendor user-space enclosure
> > applications access (and modify) the SES device's
> > pages.
> 
> The state model doesn't assume nothing else will alter the state.

But it would be a trivial exercise to show that an
inconsistent state can be had by modifying pages
of the SES device directly from userspace bypassing
your implementation.

> 
> > You can test this yourself: submit a patch
> > that removes SES /dev/sgX support; advertise your
> > ses/class solution and watch the fun.
> > 
> > > > At the moment SES device management is done via
> > > > an application (user-space) and a user-space library
> > > > used by the application and /dev/sgX to send SCSI
> > > > commands to the SES device.
> > > 
> > > I must have missed that when I was looking for
> > > implementations; what's the URL?
> > 
> > I'm not aware of any GPLed ones.  That doesn't
> > necessarily mean that the best course of action is
> > to bloat the kernel.  You can move your ses/enclosure
> > stuff to a user space application library
> > and thus start a GPLed one.
> 
> Certainly ... patches welcome.

I've none at the moment, plus I don't think you'd be
the point of contact for a user-space SES library.
Unless of course you've already started something up
on sourceforge.

Really, such an effort already exists: it is called
sg_ses(8).

> 
> > > But, if we have non-scsi enclosures to integrate, that
> > > makes it harder for a user application because it has to
> > > know all the implementations.
> > 
> > So does the kernel.  And as I pointed out above, it
> > is a lot easier to upgrade a user-space application and
> > library than it is to upgrade a new kernel and having
> > to reboot the computer to run the new kernel.
> 
> No, think again ... it's easy for SES based enclosures because they have
> a SCSI transport.  We have no transport for SGPIO based enclosures nor
> for any of the other more esoteric ones.

Yes, for which the transport layer implements the
SCSI device node for the SES device.  It doesn't really
matter if the SCSI commands sent to the SES device go
over SGPIO or FC or SAS or Bluetooth or I2C, etc.; the
transport layer can implement that and present the
/dev/sgX node.

Case in point: the protocol FW running on the ASIC
provides this capability, so really the LLDD would
only see the pure SCSI SES or processor device and
register that with the kernel.  At which point no new
kernel bloat is required.

Your code doesn't quite do that at the moment as it
actually goes further to read and present SES pages.
Ideally it would simply provide capability for transport
layers to register a SCSI device of type SES, or processor.

Architecturally, the LLDD/transport layer would register
the SGPIO device on one end with the SGPIO layer and on
the other end as a SCSI SES/processor device.  After that
sg_ses(8) or sglib fits the bill for user space applications.

> That's not to say it can't be done, but it does mean that it can't be
> completely userspace.

See previous paragraph.

> 

Re: [2.6.24-mm1] TCP/IPv6 connect() oopses at twothirdsMD4Transform()

2008-02-04 Thread Tetsuo Handa
Hello.

> random: revert braindamage that snuck into checkpatch cleanup
> 
> Signed-off-by: Matt Mackall <[EMAIL PROTECTED]>

Yes. It solved the oops.

Thank you.


Re: CPU hotplug and IRQ affinity with 2.6.24-rt1

2008-02-04 Thread Daniel Walker
On Mon, Feb 04, 2008 at 03:35:13PM -0800, Max Krasnyanskiy wrote:
> This is just an FYI. As part of the "Isolated CPU extensions" thread Daniel
> suggest for me to check out latest RT kernels. So I did or at least tried to
> and immediately spotted a couple of issues.
>
> The machine I'm running it on is:
>   HP xw9300, Dual Opteron, NUMA
>
> It looks like with -rt kernel IRQ affinity masks are ignored on that 
> system. ie I write 1 to lets say /proc/irq/23/smp_affinity but the 
> interrupts keep coming to CPU1. Vanilla 2.6.24 does not have that issue.

I tried this, and it works according to /proc/interrupts .. Are you
looking at the interrupt threads affinity?

> Also the first thing I tried was to bring CPU1 off-line. Thats the fastest 
> way to get irqs, soft-irqs, timers, etc of a CPU. But the box hung 
> completely. It also managed to mess up my ext3 filesystem to the point 
> where it required manual fsck (have not see that for a couple of
> years now). I tried the same thing (ie echo 0 > 
> /sys/devices/cpu/cpu1/online) from the console. It hang again with the 
> message that looked something like:
>   CPU1 is now off-line
>   Thread IRQ-23 is on CPU1 ...

I get the following when I tried it,

BUG: sleeping function called from invalid context bash(5126) at
kernel/rtmutex.c:638
in_atomic():1 [0001], irqs_disabled():1
Pid: 5126, comm: bash Not tainted 2.6.24-rt1 #1
 [] show_trace_log_lvl+0x1d/0x3a
 [] show_trace+0x12/0x14
 [] dump_stack+0x6c/0x72
 [] __might_sleep+0xe8/0xef
 [] __rt_spin_lock+0x24/0x59
 [] rt_spin_lock+0x8/0xa
 [] kfree+0x2c/0x8d
 [] rq_attach_root+0x67/0xba
 [] cpu_attach_domain+0x2b6/0x2f7
 [] detach_destroy_domains+0x23/0x37
 [] update_sched_domains+0x2d/0x40
 [] notifier_call_chain+0x2b/0x55
 [] __raw_notifier_call_chain+0x19/0x1e
 [] _cpu_down+0x84/0x24c
 [] cpu_down+0x28/0x3a
 [] store_online+0x27/0x5a
 [] sysdev_store+0x20/0x25
 [] sysfs_write_file+0xad/0xde
 [] vfs_write+0x82/0xb8
 [] sys_write+0x3d/0x61
 [] sysenter_past_esp+0x5f/0x85
 ===
---
| preempt count: 0001 ]
| 1-level deep critical section nesting:

.. []  __spin_lock_irqsave+0x14/0x3b
.[] ..   ( <= rq_attach_root+0x12/0xba)

Which is clearly a problem .. 

(I added linux-rt-users to the CC)

Daniel


[PATCH] x86: dubious one-bit signed bitfield in cpuidle.h

2008-02-04 Thread Harvey Harrison
This is a minimal, stupid fix: move to an unsigned bitfield.  These errors
are hiding sparse warnings in any file that includes it; below is a
sample.

  CHECK   arch/x86/kernel/acpi/cstate.c
include/linux/cpuidle.h:82:17: error: dubious one-bit signed bitfield
  CHECK   arch/x86/kernel/acpi/processor.c
include/linux/cpuidle.h:82:17: error: dubious one-bit signed bitfield
  CHECK   arch/x86/kernel/cpu/cpufreq/powernow-k7.c
include/linux/cpuidle.h:82:17: error: dubious one-bit signed bitfield
  CHECK   arch/x86/kernel/cpu/cpufreq/powernow-k8.c
include/linux/cpuidle.h:82:17: error: dubious one-bit signed bitfield
  CHECK   arch/x86/kernel/cpu/cpufreq/longhaul.c
include/linux/cpuidle.h:82:17: error: dubious one-bit signed bitfield
  CHECK   arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
include/linux/cpuidle.h:82:17: error: dubious one-bit signed bitfield

Signed-off-by: Harvey Harrison <[EMAIL PROTECTED]>
---
 include/linux/cpuidle.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index c4e0016..b0fd85a 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -79,7 +79,7 @@ struct cpuidle_state_kobj {
 };
 
 struct cpuidle_device {
-   int enabled:1;
+   unsigned int enabled:1;
    unsigned int cpu;
 
    int last_residency;
-- 
1.5.4.rc5.1138.g2602
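
For anyone wondering why sparse calls this dubious: a signed one-bit field
consists of a sign bit only, so on the usual two's complement compilers it can
represent 0 and -1 but not 1. A small userspace sketch, with made-up struct and
function names (these are not the cpuidle.h definitions); what a signed field
reads back after storing 1 is implementation-defined, so the sketch only
relies on it not being 1:

```c
#include <assert.h>

/* Hypothetical stand-ins for the flag before and after the patch. */
struct flag_signed   { signed int   enabled:1; };
struct flag_unsigned { unsigned int enabled:1; };

/* Store v in a signed one-bit field and read it back. */
static int read_back_signed(int v)
{
	struct flag_signed f;

	f.enabled = v;		/* only 0 and -1 are representable */
	return f.enabled;
}

/* Same round trip through an unsigned one-bit field. */
static int read_back_unsigned(int v)
{
	struct flag_unsigned f;

	f.enabled = v;		/* 0 and 1 are representable */
	return f.enabled;
}
```

A truth test such as `if (dev->enabled)` behaves the same either way; a
comparison like `enabled == 1` is where the signed version would be expected
to go wrong.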





Re: Integrating cpusets and cpu isolation [was Re: [CPUISOL] CPU isolation extensions]

2008-02-04 Thread Paul Jackson
Max K wrote:
> > And for another thing, we already declare externs in cpumask.h for
> > the other, more widely used, cpu_*_map variables cpu_possible_map,
> > cpu_online_map, and cpu_present_map.
> Well, to address #2 and #3 isolated map will need to be exported as well.
> Those other maps do not really have much to do with the scheduler code.
> That's why I think either kernel/cpumask.c or kernel/cpu.c is a better place 
> for them.

Well, if you need it to be exported for #2 or #3, then that's ok
by me - export it.

I'm unaware of any kernel/cpumask.c.  If you meant lib/cpumask.c, then
I'd prefer you not put it there, as lib/cpumask.c just contains the
implementation details of the abstract data type cpumask_t, not any of
its uses.  If you mean kernel/cpuset.c, then that's not a good choice
either, as that just contains the implementation details of the cpuset
subsystem.  You should usually define such things in one of the files
using it, and unless there is clearly a -better- place to move the
definition, it's usually better to just leave it where it is.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.940.382.4214


[PATCH] x86: fix sparse warnings in powernow-k8.c

2008-02-04 Thread Harvey Harrison
sparse errors from include/linux/cpuidle.h currently are hiding these:

arch/x86/kernel/cpu/cpufreq/powernow-k8.c:830:7: warning: symbol 'hi' shadows an earlier one
arch/x86/kernel/cpu/cpufreq/powernow-k8.c:824:6: originally declared here
arch/x86/kernel/cpu/cpufreq/powernow-k8.c:830:15: warning: symbol 'lo' shadows an earlier one
arch/x86/kernel/cpu/cpufreq/powernow-k8.c:824:14: originally declared here

Signed-off-by: Harvey Harrison <[EMAIL PROTECTED]>
---
 arch/x86/kernel/cpu/cpufreq/powernow-k8.c |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/cpufreq/powernow-k8.c b/arch/x86/kernel/cpu/cpufreq/powernow-k8.c
index a052273..5affe91 100644
--- a/arch/x86/kernel/cpu/cpufreq/powernow-k8.c
+++ b/arch/x86/kernel/cpu/cpufreq/powernow-k8.c
@@ -827,7 +827,6 @@ static int fill_powernow_table_pstate(struct powernow_k8_data *data, struct cpuf
 
for (i = 0; i < data->acpi_data.state_count; i++) {
u32 index;
-   u32 hi = 0, lo = 0;
 
index = data->acpi_data.states[i].control & HW_PSTATE_MASK;
if (index > data->max_hw_pstate) {
-- 
1.5.4.rc5.1138.g2602





Re: [git pull] x86 arch updates for v2.6.25

2008-02-04 Thread Maxim Levitsky
On Wednesday, 30 January 2008 03:15:50 Ingo Molnar wrote:
> 
> Linus, please pull the latest x86 git tree from:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86.git
> 
> Find the shortlog attached below.
> 
> Most of the changes we have described here:
> 
> http://lkml.org/lkml/2008/1/21/230
> 
> It's not a small merge, it consists of 908 commits from 96 individual 
> arch/x86 developers (!):
> 
> 671 files changed, 42791 insertions(+), 38967 deletions(-)
> 
> so here are a few highlevel comments as well, in addition to the 
> shortlog:
> 
> - a number of core files are changed as well: most notably percpu,
>   debugging details, timers, the firewire remote debugging patch and ...
>   the KGDB remote debugging stub in kernel/kgdb.c.
> 
> - we tested KGDB to be merge-worthy within the x86 architecture (the 
>   only supported architecture for now) and it's better to have 
>   kernel/kgdb.c than arch/x86/kernel/kgdb.c. The code is reasonably 
>   clean and the user-space exposure is small - the only real exposure is 
>   the decades-old remote GDB protocol. We are happy to fix up any 
>   further cleanliness comments that people might have - but we really 
>   wanted to start somewhere and get this thing moving. As an added 
>   bonus: finally a kernel debugger that can be read without puking too
>   much ;-) [anyone remember KDB?]
> 

The x86 tree was merged several times, but I don't see kgdb included in 
latest mainline -git.

So just one question: will it be included or not?

Best regards,
Maxim Levitsky


[PATCH] x86: fix sparse error in traps_32.c

2008-02-04 Thread Harvey Harrison
arch/x86/kernel/traps_32.c:1193:31: error: dubious one-bit signed bitfield

This was being used to ensure the proper alignment of the FXSAVE/FXRSTOR data.
This would create a sparse error in the _correct_ cases, hiding further
warnings.  Use BUILD_BUG_ON instead.

Signed-off-by: Harvey Harrison <[EMAIL PROTECTED]>
---
 arch/x86/kernel/traps_32.c |   15 +--
 1 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/traps_32.c b/arch/x86/kernel/traps_32.c
index 2f94f69..47ca425 100644
--- a/arch/x86/kernel/traps_32.c
+++ b/arch/x86/kernel/traps_32.c
@@ -1182,17 +1182,12 @@ void __init trap_init(void)
 #endif
set_trap_gate(19,&simd_coprocessor_error);
 
+   /*
+* Verify that the FXSAVE/FXRSTOR data will be 16-byte aligned.
+* Generate a build-time error if the alignment is wrong.
+*/
+   BUILD_BUG_ON(offsetof(struct task_struct, thread.i387.fxsave) & 15);
if (cpu_has_fxsr) {
-   /*
-* Verify that the FXSAVE/FXRSTOR data will be 16-byte aligned.
-* Generates a compile-time "error: zero width for bit-field" if
-* the alignment is wrong.
-*/
-   struct fxsrAlignAssert {
-   int _:!(offsetof(struct task_struct,
-   thread.i387.fxsave) & 15);
-   };
-
printk(KERN_INFO "Enabling fast FPU save and restore... ");
set_in_cr4(X86_CR4_OSFXSR);
printk("done.\n");
-- 
1.5.4.rc5.1138.g2602
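
The idea behind the change can be tried outside the kernel. The old trick,
`int _:!(cond)`, declares a one-bit signed bitfield whenever the alignment is
right, which is exactly what sparse complains about. The sketch below is a
userspace imitation under stated assumptions: SKETCH_BUILD_BUG_ON is a
stand-in written for this example (a negative array size when the condition is
true), and struct fake_thread and struct fake_fxsave are hypothetical layouts,
not the real task_struct. It relies on the GCC-style aligned attribute.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Stand-in for a build-time assertion: if cond is non-zero the array
 * size becomes -1 and compilation fails; otherwise it is a no-op.
 */
#define SKETCH_BUILD_BUG_ON(cond) ((void)sizeof(char[1 - 2 * !!(cond)]))

/* Hypothetical 16-byte-aligned save area (GCC/clang attribute). */
struct fake_fxsave {
	unsigned char bytes[512];
} __attribute__((aligned(16)));

/* Hypothetical container; the compiler pads after 'pad' as needed. */
struct fake_thread {
	long pad;
	struct fake_fxsave fxsave;
};

/* Returns the offset after checking it at build time. */
static size_t fxsave_offset(void)
{
	SKETCH_BUILD_BUG_ON(offsetof(struct fake_thread, fxsave) & 15);
	return offsetof(struct fake_thread, fxsave);
}
```

Dropping the aligned attribute from struct fake_fxsave should turn the check
into a compile error on typical ABIs, since the member would then sit right
after the long.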





Re: [2.6.24-mm1] TCP/IPv6 connect() oopses at twothirdsMD4Transform()

2008-02-04 Thread Matt Mackall

On Mon, 2008-02-04 at 17:36 -0800, Andrew Morton wrote:
> On Tue, 05 Feb 2008 10:28:43 +0900 Tetsuo Handa <[EMAIL PROTECTED]> wrote:
> 
> > Hello.
> > 
> > Kernel config is at http://I-love.SAKURA.ne.jp/tmp/config-2.6.24-mm1
> > 
> > 2.6.24 works fine.

> err, Matt?

random: revert braindamage that snuck into checkpatch cleanup

Signed-off-by: Matt Mackall <[EMAIL PROTECTED]>

diff -r 50a6e531a9f2 drivers/char/random.c
--- a/drivers/char/random.c Mon Feb 04 20:23:02 2008 -0600
+++ b/drivers/char/random.c Mon Feb 04 20:28:08 2008 -0600
@@ -1306,7 +1306,7 @@
  * Rotation is separate from addition to prevent recomputation
  */
 #define ROUND(f, a, b, c, d, x, s) \
-   (a += f(b, c, d) + in[x], a = (a << s) | (a >> (32 - s)))
+   (a += f(b, c, d) + x, a = (a << s) | (a >> (32 - s)))
 #define K1 0
 #define K2 013240474631UL
 #define K3 015666365641UL

-- 
Mathematics is the supreme nostalgia of our time.
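
A userspace sketch of why the one-token difference matters, assuming (this is
not shown in the patch itself) that callers hand ROUND the already-loaded
input word plus a constant, something like `in[0] + K1`. With the `in[x]` form
such a call would expand to `in[in[0] + K1]`, that is, input data used as an
array index, which would be consistent with the reported oops. The F function
and the helper names below are illustrative assumptions:

```c
#include <assert.h>
#include <stdint.h>

/* Boolean mixing function in the style of MD4's first round (assumed). */
#define F(x, y, z) ((z) ^ ((x) & ((y) ^ (z))))

/* The macro as restored by the patch: x is a value, not an index. */
#define ROUND(f, a, b, c, d, x, s) \
	(a += f(b, c, d) + x, a = (a << s) | (a >> (32 - s)))

/* One round where the caller passes the loaded word itself. */
static uint32_t one_round(uint32_t a, uint32_t b, uint32_t c, uint32_t d,
			  const uint32_t *in)
{
	ROUND(F, a, b, c, d, in[0] + 0 /* stand-in for K1 */, 3);
	return a;
}

static uint32_t demo_round(void)
{
	const uint32_t in[4] = { 1, 0, 0, 0 };

	/* a = 1 + F(0, 0, 0) + in[0] = 2, then rotated left by 3 = 16 */
	return one_round(1, 0, 0, 0, in);
}
```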



What id does "current->pid" indicate?

2008-02-04 Thread Tetsuo Handa
Hello.

I found that "current->pid", "task_pid_vnr(current)" and
"task_pid_nr(current)" are all used in kernel 2.6.24.

According to include/linux/pid.h,
"task_pid_nr()" is the global id and "task_xid_vnr()" is the virtual id.
But what id does "current->pid" indicate?
Is "current->pid" equivalent to "task_pid_nr(current)"?

Regards.


Re: [PATCH] enclosure: add support for enclosure services

2008-02-04 Thread James Bottomley

On Mon, 2008-02-04 at 18:01 -0800, Luben Tuikov wrote:
> --- On Mon, 2/4/08, James Bottomley <[EMAIL PROTECTED]> wrote:
> > > > The enclosure misc device is really just a library providing
> > > > sysfs support for physical enclosure devices and their
> > > > components.
> > > 
> > > Who is the target audience/user of those facilities?
> > > a) The kernel itself needing to read/write SES pages?
> > 
> > That depends on the enclosure integration, but right at the
> > moment, it doesn't
> 
> Yes, I didn't suspect so.
> 
> > 
> > > b) A user space application using sysfs to read/write
> > >    SES pages?
> > 
> > Not an application so much as a user.  The idea of sysfs is
> > to allow users to get and set the information in addition to
> > applications.
> 
> Exactly the same argument stands for a user-space
> application with a user-space library.
> 
> This is the classical case of where it is better to
> do this in user-space as opposed to the kernel.
> 
> The kernel provides capability to access the SES
> device.  The user space application and library
> provide interpretation and control.  Thus if the
> enclosure were upgraded, one doesn't need to
> upgrade their kernel in order to utilize the new
> capabilities of the SES device.  Plus upgrading
> a user-space application is a lot easier than
> the kernel (and no reboot necessary).

The implementation is modular, so it's remove and insert ...

> Consider another thing: vendors would really like
> unprecedented access to the SES device in the enclosure
> so as your ses/enclosure code keeps state it would
> get out of sync when vendor user-space enclosure
> applications access (and modify) the SES device's
> pages.

The state model doesn't assume nothing else will alter the state.

> You can test this yourself: submit a patch
> that removes SES /dev/sgX support; advertise your
> ses/class solution and watch the fun.
> 
> > > At the moment SES device management is done via
> > > an application (user-space) and a user-space library
> > > used by the application and /dev/sgX to send SCSI
> > > commands to the SES device.
> > 
> > I must have missed that when I was looking for
> > implementations; what's the URL?
> 
> I'm not aware of any GPLed ones.  That doesn't
> necessarily mean that the best course of action is
> to bloat the kernel.  You can move your ses/enclosure
> stuff to a user space application library
> and thus start a GPLed one.

Certainly ... patches welcome.

> > But, if we have non-scsi enclosures to integrate, that
> > makes it harder
> > for a user application because it has to know all the
> > implementations.
> 
> So does the kernel.  And as I pointed out above, it
> is a lot easier to upgrade a user-space application and
> library than it is to upgrade a new kernel and having
> to reboot the computer to run the new kernel.

No, think again ... it's easy for SES based enclosures because they have
a SCSI transport.  We have no transport for SGPIO based enclosures nor
for any of the other more esoteric ones.

That's not to say it can't be done, but it does mean that it can't be
completely userspace.

> > A sysfs framework on the other hand is a universal known
> > thing for the
> > user applications.
> 
> So would a user-space ses library, a la libses.so.
> 
> > > One could have a very good argument to not bloat
> > > the kernel with this but leave it to a user-space
> > > application and a library to do all this and
> > > communicate with the SES device via the kernel's
> > /dev/sgX.
> > 
> > The same thing goes for other esoteric SCSI infrastructure
> > pieces like
> > cd changers.  On the whole, given that ATA is asking for
> > enclosure
> > management in kernel, it makes sense to consolidate the
> > infrastructure
> > and a ses ULD is a very good test bed.
> 
> What is wrong with exporting the SES device as /dev/sgX
> and having a user-space application and library to
> do all this?

How do you transport the enclosure commands over /dev/sgX?  Only SES has
SCSI command encapsulation ... the rest won't even be SCSI targets ...

James




Re: [Scst-devel] Integration of SCST in the mainstream Linux kernel

2008-02-04 Thread Chris Weiss
On Feb 4, 2008 11:30 AM, Douglas Gilbert <[EMAIL PROTECTED]> wrote:
> Alan Cox wrote:
> >> better. So for example, I personally suspect that ATA-over-ethernet is way
> >> better than some crazy SCSI-over-TCP crap, but I'm biased for simple and
> >> low-level, and against those crazy SCSI people to begin with.
> >
> > Current ATAoE isn't. It can't support NCQ. A variant that did NCQ and IP
> > would probably trash iSCSI for latency if nothing else.
>
> And a variant that doesn't do ATA or IP:
> http://www.fcoe.com/
>

however, and interestingly enough, the open-fcoe software target
depends on scst (for now anyway)


Re: [PATCH] enclosure: add support for enclosure services

2008-02-04 Thread Luben Tuikov
--- On Mon, 2/4/08, James Bottomley <[EMAIL PROTECTED]> wrote:
> > > The enclosure misc device is really just a
> library providing
> > > sysfs
> > > support for physical enclosure devices and their
> > > components.
> > 
> > Who is the target audience/user of those facilities?
> > a) The kernel itself needing to read/write SES pages?
> 
> That depends on the enclosure integration, but right at the
> moment, it
> doesn't

Yes, I didn't suspect so.

> 
> > b) A user space application using sysfs to read/write
> >SES pages?
> 
> Not an application so much as a user.  The idea of sysfs is
> to allow
> users to get and set the information in addition to
> applications.

Exactly the same argument stands for a user-space
application with a user-space library.

This is the classical case of where it is better to
do this in user-space as opposed to the kernel.

The kernel provides capability to access the SES
device.  The user space application and library
provide interpretation and control.  Thus if the
enclosure were upgraded, one doesn't need to
upgrade their kernel in order to utilize the new
capabilities of the SES device.  Plus upgrading
a user-space application is a lot easier than
the kernel (and no reboot necessary).
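
A minimal sketch of the user-space side of that split, assuming the library only needs to build a SCSI RECEIVE DIAGNOSTIC RESULTS command and hand it to the kernel through the SG_IO ioctl on /dev/sgX (the ioctl call itself is not shown). The helper name build_receive_diag_cdb and the CDB_LEN constant are hypothetical, not taken from any existing library:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RECEIVE_DIAGNOSTIC_RESULTS 0x1c /* SPC opcode */
#define CDB_LEN 6                       /* hypothetical name for the 6-byte CDB size */

/*
 * Fill a 6-byte CDB asking the device for one diagnostic page.
 * page_code selects the SES page; alloc_len is the size of the
 * reply buffer the caller will pass along with the command.
 */
static void build_receive_diag_cdb(uint8_t cdb[CDB_LEN], uint8_t page_code,
                                   uint16_t alloc_len)
{
    assert(cdb != NULL);

    cdb[0] = RECEIVE_DIAGNOSTIC_RESULTS;
    cdb[1] = 0x01;                      /* PCV: the page code field is valid */
    cdb[2] = page_code;
    cdb[3] = (uint8_t)(alloc_len >> 8); /* allocation length, big endian */
    cdb[4] = (uint8_t)(alloc_len & 0xff);
    cdb[5] = 0;                         /* control byte */
}
```

Interpreting the returned page is then entirely the library's job, which is the division of labour argued for above.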

Consider another thing: vendors would really like
unprecedented access to the SES device in the enclosure
so as your ses/enclosure code keeps state it would
get out of sync when vendor user-space enclosure
applications access (and modify) the SES device's
pages.

You can test this yourself: submit a patch
that removes SES /dev/sgX support; advertise your
ses/class solution and watch the fun.

> > At the moment SES device management is done via
> > an application (user-space) and a user-space library
> > used by the application and /dev/sgX to send SCSI
> > commands to the SES device.
> 
> I must have missed that when I was looking for
> implementations; what's
> the URL?

I'm not aware of any GPLed ones.  That doesn't
necessarily mean that the best course of action is
to bloat the kernel.  You can move your ses/enclosure
stuff to a user space application library
and thus start a GPLed one.

> But, if we have non-scsi enclosures to integrate, that
> makes it harder
> for a user application because it has to know all the
> implementations.

So does the kernel.  And as I pointed out above, it
is a lot easier to upgrade a user-space application and
library than it is to upgrade a new kernel and having
to reboot the computer to run the new kernel.

> A sysfs framework on the other hand is a universal known
> thing for the
> user applications.

So would a user-space ses library, a la libses.so.

> > One could have a very good argument to not bloat
> > the kernel with this but leave it to a user-space
> > application and a library to do all this and
> > communicate with the SES device via the kernel's
> /dev/sgX.
> 
> The same thing goes for other esoteric SCSI infrastructure
> pieces like
> cd changers.  On the whole, given that ATA is asking for
> enclosure
> management in kernel, it makes sense to consolidate the
> infrastructure
> and a ses ULD is a very good test bed.

What is wrong with exporting the SES device as /dev/sgX
and having a user-space application and library to
do all this?

Luben



Re: brk randomization breaks columns

2008-02-04 Thread Jiri Kosina
[ some CCs added ]

On Mon, 4 Feb 2008, Pavel Machek wrote:

> [EMAIL PROTECTED]:~$ strace columns-bin
> execve("/usr/local/bin/columns-bin", ["columns-bin"], [/* 31 vars */])
> = 0
> old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
> -1, 0) = 0xb7f78000
> mprotect(0xb7f79000, 21406, PROT_READ|PROT_WRITE|PROT_EXEC) = 0
> mprotect(0x8048000, 31345, PROT_READ|PROT_WRITE|PROT_EXEC) = 0
> stat("/etc/ld.so.cache", {st_mode=S_IFREG|0644, st_size=106939, ...})
> = 0
> open("/etc/ld.so.cache", O_RDONLY)  = 3
> old_mmap(NULL, 106939, PROT_READ, MAP_SHARED, 3, 0) = 0xb7f5d000
> close(3)= 0
> stat("/etc/ld.so.preload", 0xbf87f348)  = -1 ENOENT (No such file or
> directory)
> open("/home/pavel/lib/libc.so.5", O_RDONLY) = -1 ENOENT (No such file
> or directory)
> open("/lib/libc.so.5", O_RDONLY)= 3
> read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\240\32"...,
> 4096) = 4096
> old_mmap(NULL, 786432, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
> 0xb7e9d000
> old_mmap(0xb7e9d000, 552787, PROT_READ|PROT_EXEC,
> MAP_PRIVATE|MAP_FIXED, 3, 0) = 0xb7e9d000
> old_mmap(0xb7f24000, 21848, PROT_READ|PROT_WRITE,
> MAP_PRIVATE|MAP_FIXED, 3, 0x86000) = 0xb7f24000
> old_mmap(0xb7f2a000, 204908, PROT_READ|PROT_WRITE,
> MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7f2a000
> close(3)= 0
> mprotect(0xb7e9d000, 552787, PROT_READ|PROT_WRITE|PROT_EXEC) = 0
> munmap(0xb7f5d000, 106939)  = 0
> mprotect(0x8048000, 31345, PROT_READ|PROT_EXEC) = 0
> mprotect(0xb7e9d000, 552787, PROT_READ|PROT_EXEC) = 0
> mprotect(0xb7f79000, 21406, PROT_READ|PROT_EXEC) = 0
> personality(PER_LINUX)  = 4194304
> geteuid()   = 1000
> getuid()= 1000
> getgid()= 1002
> getegid()   = 1002
> brk(0x8054098)  = 0x8054098
> brk(0x8055000)  = 0x8055000
> --- SIGSEGV (Segmentation fault) @ 0 (0) ---
> +++ killed by SIGSEGV (core dumped) +++
> Process 1517 detached
> [EMAIL PROTECTED]:~$
> columns die due to
> Feb  4 12:29:32 amd kernel: columns-bin[4535]: segfault at 8052000 ip 
> b7f08a9a sp bfb79628 error 6 in
> libc.so.5.4.33[b7e99000+87000]
> Just before death, 
> [EMAIL PROTECTED]:~# cat /proc/4537/maps
> 08048000-0805 r-xp  08:04 246209 /usr/local/bin/columns-bin
> 0805-08051000 rwxp 7000 08:04 246209 /usr/local/bin/columns-bin
> 08051000-08052000 rwxp 08051000 00:00 0
> b7f0-b7f87000 r-xp  08:04 373330 /lib/libc.so.5.4.33
> b7f87000-b7f8d000 rwxp 00086000 08:04 373330 /lib/libc.so.5.4.33
> b7f8d000-b7fc rwxp b7f8d000 00:00 0
> b7fdb000-b7fdc000 rwxp b7fdb000 00:00 0
> b7fdc000-b7fe2000 r-xp  08:04 373339 /lib/ld-linux.so.1.9.11
> b7fe2000-b7fe3000 rwxp 5000 08:04 373339 /lib/ld-linux.so.1.9.11
> bface000-bfae3000 rwxp bffeb000 00:00 0  [stack]
> e000-f000 r-xp  00:00 0  [vdso]
> [EMAIL PROTECTED]:~#

Actually, this clearly shows that either the prehistoric libc.so.5 or the 
program itself is broken. 

- as you can easily see by repeated invocation of your program, the 
  arguments to brk() are always the same, no matter to what offset the brk 
  start gets randomized.

- i.e. the arguments passed to brk() strace shows clearly indicate that 
  the binary (or library) is assuming that brk starts in the very next
  page after code+bss (i.e. at the page following 0x08052000). That is wrong. 
  The program then accesses unmapped memory, which causes the segfault.

> ...which is strange. Columns asked for brk, but kernel assigned it no
> heap. No wonder columns are crashing.

Now, you are right that the return value from brk() is bogus in these 
cases. The patch below should make it behave, as you can easily check with 
strace, right? Does anyone have any comments regarding this patch please?

Still, it will probably not fix your particular program crashes, just 
because it will always assume that brk starts immediately after the end of 
the bss, which is plain wrong and has never been assured. Could you please 
check whether there is any compat-* package available for your 
distribution that upgrades libc.so.5 to a fixed version?

Thanks.


From: Jiri Kosina <[EMAIL PROTECTED]>

brk: check the lower bound properly

There is a check in sys_brk(), that tries to make sure that we do not 
underflow the area that is dedicated to brk heap.

The check is however wrong, as it assumes that brk area starts immediately 
after the end of the code (+bss), which is wrong for example in 
environments with randomized brk start. The proper way is to check whether 
the address is not below the start_brk address.

Signed-off-by: Jiri Kosina <[EMAIL PROTECTED]>

diff --git a/mm/mmap.c b/mm/mmap.c
index 8295577..1c3b48f 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -241,7 +241,7 @@ asmlinkage unsigned long 
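
The hunk itself is cut off in this archive. As a rough user-space sketch of the idea in the changelog, assuming a simplified layout structure (struct mm_layout and both helper names are hypothetical and only mirror the fields mentioned above), the two lower-bound checks compare as follows:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the per-process layout. */
struct mm_layout {
    unsigned long end_code;  /* end of code (+bss), as the old check assumed */
    unsigned long start_brk; /* real start of the heap, possibly randomized */
};

/* Old idea: anything at or above the end of the code is accepted. */
static bool brk_ok_assuming_heap_follows_code(const struct mm_layout *mm,
                                              unsigned long new_brk)
{
    return new_brk >= mm->end_code;
}

/* Idea from the changelog: never accept a break below start_brk. */
static bool brk_ok_checked_against_start_brk(const struct mm_layout *mm,
                                             unsigned long new_brk)
{
    assert(mm->start_brk >= mm->end_code);
    return new_brk >= mm->start_brk;
}
```

With a randomized start_brk well above end_code, a request just past the bss passes the first check but fails the second, which is the case the changelog describes.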

[PATCH] sata_nv: fix ATAPI issues with memory over 4GB (v7)

2008-02-04 Thread Robert Hancock
This fixes some problems with ATAPI devices on nForce4 controllers in ADMA mode
on systems with memory located above 4GB. We need to delay setting the 64-bit
DMA mask until the PRD table and padding buffer are allocated so that they don't
get allocated above 4GB and break legacy mode (which is needed for ATAPI
devices). Also, if either port is in ATAPI mode we need to set the DMA mask
for the PCI device to 32-bit to ensure that the IOMMU code properly bounces
requests above 4GB, as it appears setting the bounce limit does not guarantee
that we will not try to map requests above this point.

Reported to fix https://bugzilla.redhat.com/show_bug.cgi?id=351451

Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>

--- linux-2.6.24/drivers/ata/sata_nv.c  2008-01-24 16:58:37.0 -0600
+++ linux-2.6.24edit/drivers/ata/sata_nv.c  2008-01-29 18:39:37.0 -0600
@@ -247,6 +247,7 @@ struct nv_adma_port_priv {
void __iomem*ctl_block;
void __iomem*gen_block;
void __iomem*notifier_clear_block;
+   u64 adma_dma_mask;
u8  flags;
int last_issue_ncq;
 };
@@ -715,9 +716,10 @@ static int nv_adma_slave_config(struct s
 {
struct ata_port *ap = ata_shost_to_port(sdev->host);
struct nv_adma_port_priv *pp = ap->private_data;
+   struct nv_adma_port_priv *port0, *port1;
+   struct scsi_device *sdev0, *sdev1;
struct pci_dev *pdev = to_pci_dev(ap->host->dev);
-   u64 bounce_limit;
-   unsigned long segment_boundary;
+   unsigned long segment_boundary, flags;
unsigned short sg_tablesize;
int rc;
int adma_enable;
@@ -729,6 +731,8 @@ static int nv_adma_slave_config(struct s
/* Not a proper libata device, ignore */
return rc;
 
+   spin_lock_irqsave(ap->lock, flags);
+
if (ap->link.device[sdev->id].class == ATA_DEV_ATAPI) {
/*
 * NVIDIA reports that ADMA mode does not support ATAPI 
commands.
@@ -737,7 +741,6 @@ static int nv_adma_slave_config(struct s
 * Restrict DMA parameters as required by the legacy interface
 * when an ATAPI device is connected.
 */
-   bounce_limit = ATA_DMA_MASK;
segment_boundary = ATA_DMA_BOUNDARY;
/* Subtract 1 since an extra entry may be needed for padding, 
see
   libata-scsi.c */
@@ -748,7 +751,6 @@ static int nv_adma_slave_config(struct s
adma_enable = 0;
nv_adma_register_mode(ap);
} else {
-   bounce_limit = *ap->dev->dma_mask;
segment_boundary = NV_ADMA_DMA_BOUNDARY;
sg_tablesize = NV_ADMA_SGTBL_TOTAL_LEN;
adma_enable = 1;
@@ -774,12 +776,49 @@ static int nv_adma_slave_config(struct s
if (current_reg != new_reg)
pci_write_config_dword(pdev, NV_MCP_SATA_CFG_20, new_reg);
 
-   blk_queue_bounce_limit(sdev->request_queue, bounce_limit);
+   port0 = ap->host->ports[0]->private_data;
+   port1 = ap->host->ports[1]->private_data;
+   sdev0 = ap->host->ports[0]->link.device[0].sdev;
+   sdev1 = ap->host->ports[1]->link.device[0].sdev;
+   if ((port0->flags & NV_ADMA_ATAPI_SETUP_COMPLETE) ||
+   (port1->flags & NV_ADMA_ATAPI_SETUP_COMPLETE)) {
+   /** We have to set the DMA mask to 32-bit if either port is in
+   ATAPI mode, since they are on the same PCI device which is
+   used for DMA mapping. If we set the mask we also need to set
+   the bounce limit on both ports to ensure that the block
+   layer doesn't feed addresses that cause DMA mapping to
+   choke. If either SCSI device is not allocated yet, it's OK
+   since that port will discover its correct setting when it
+   does get allocated.
+   Note: Setting 32-bit mask should not fail. */
+   if (sdev0)
+   blk_queue_bounce_limit(sdev0->request_queue,
+  ATA_DMA_MASK);
+   if (sdev1)
+   blk_queue_bounce_limit(sdev1->request_queue,
+  ATA_DMA_MASK);
+
+   pci_set_dma_mask(pdev, ATA_DMA_MASK);
+   } else {
+   /** This shouldn't fail as it was set to this value before */
+   pci_set_dma_mask(pdev, pp->adma_dma_mask);
+   if (sdev0)
+   blk_queue_bounce_limit(sdev0->request_queue,
+  pp->adma_dma_mask);
+   if (sdev1)
+   blk_queue_bounce_limit(sdev1->request_queue,
+  pp->adma_dma_mask);
+   }
+
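
The patch is cut off here in the archive. The mask selection it describes can be sketched in isolation, assuming a simplified per-port state (struct port_state, SKETCH_DMA_MASK_32BIT and choose_dma_mask are hypothetical names, not the driver's):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_DMA_MASK_32BIT 0xffffffffULL /* stand-in for the legacy 32-bit limit */

/* Hypothetical, reduced view of one port's state. */
struct port_state {
    bool atapi_setup_complete; /* port has been switched to legacy/ATAPI mode */
};

/*
 * Both ports share one PCI device, so a single mask must satisfy both:
 * fall back to 32-bit if either port is in ATAPI mode, otherwise use
 * the mask that was negotiated for ADMA.
 */
static uint64_t choose_dma_mask(const struct port_state *port0,
                                const struct port_state *port1,
                                uint64_t adma_dma_mask)
{
    assert(port0 != NULL && port1 != NULL);

    if (port0->atapi_setup_complete || port1->atapi_setup_complete)
        return SKETCH_DMA_MASK_32BIT;
    return adma_dma_mask;
}
```

The same value would then be applied both as the PCI DMA mask and as the bounce limit of each allocated request queue, as the comment in the hunk above explains.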

Re: [2.6.24-mm1] TCP/IPv6 connect() oopses at twothirdsMD4Transform()

2008-02-04 Thread Andrew Morton
On Tue, 05 Feb 2008 10:28:43 +0900 Tetsuo Handa <[EMAIL PROTECTED]> wrote:

> Hello.
> 
> Kernel config is at http://I-love.SAKURA.ne.jp/tmp/config-2.6.24-mm1
> 
> 2.6.24 works fine.

Thanks for testing and reporting.  It really helps.

> Regards.
> --
> BUG: unable to handle kernel paging request at 25476bec
> IP: [] twothirdsMD4Transform+0x78/0x37c
> *pde =  
> Oops:  [#1] SMP DEBUG_PAGEALLOC
> last sysfs file: 
> /sys/devices/pci:00/:00:10.0/host0/target0:0:1/0:0:1:0/type
> Modules linked in: nfsd lockd sunrpc exportfs pcnet32
> 
> Pid: 2148, comm: a.out Not tainted (2.6.24-mm1 #1)
> EIP: 0060:[] EFLAGS: 00010286 CPU: 0
> EIP is at twothirdsMD4Transform+0x78/0x37c
> EAX: 00084000 EBX: 0800 ECX: 8000 EDX: db45ddec
> ESI:  EDI: 52806380 EBP: db45dddc ESP: db45ddc8
>  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> Process a.out (pid: 2148, ti=db45d000 task=deaf9250 task.ti=db45d000)
> Stack: 8000 def6ef9c 6380 c0759d60 db45de1c db45de28 c0211fd2 
> 0040 
>c0759d40    0100 52806380 1f2e00ba 
> fffa249f 
>5a696b37 8dbe1970 cf7579d0 3b0cc350 a54b10a8 def6e9a0  
> def6ef8c 
> Call Trace:
>  [] ? secure_tcpv6_sequence_number+0x58/0x7a
>  [] ? tcp_v6_connect+0x46d/0x4e3
>  [] ? lock_sock_nested+0x56/0x5e
>  [] ? inet_stream_connect+0x1c/0x163
>  [] ? inet_stream_connect+0x92/0x163
>  [] ? sys_connect+0x72/0x98
>  [] ? lock_release_holdtime+0x4e/0x54
>  [] ? do_page_fault+0x1c5/0x3fc
>  [] ? __lock_release+0x4b/0x51
>  [] ? do_page_fault+0x1c5/0x3fc
>  [] ? sys_socketcall+0x6f/0x15e
>  [] ? restore_nocheck+0x12/0x15
>  [] ? syscall_call+0x7/0xb
>  ===
> Code: 31 c1 03 0c ba 8b 7a 0c 01 ce 8b 4d ec c1 c6 0b 31 d9 21 f1 31 d9 03 0c 
> ba 8b 7a 10 01 c8 8b 4d ec c1 c0 13 31 f1 21 c1 33 4d ec <03> 0c ba 8b 7a 14 
> 01 cb 89 c1 c1 c3 03 31 f1 21 d9 31 f1 03 0c 
> EIP: [] twothirdsMD4Transform+0x78/0x37c SS:ESP 0068:db45ddc8
> ---[ end trace 160518059a282c77 ]---

err, Matt?


[2.6.24-mm1] TCP/IPv6 connect() oopses at twothirdsMD4Transform()

2008-02-04 Thread Tetsuo Handa
Hello.

Kernel config is at http://I-love.SAKURA.ne.jp/tmp/config-2.6.24-mm1

2.6.24 works fine.

Regards.
--
BUG: unable to handle kernel paging request at 25476bec
IP: [] twothirdsMD4Transform+0x78/0x37c
*pde =  
Oops:  [#1] SMP DEBUG_PAGEALLOC
last sysfs file: 
/sys/devices/pci:00/:00:10.0/host0/target0:0:1/0:0:1:0/type
Modules linked in: nfsd lockd sunrpc exportfs pcnet32

Pid: 2148, comm: a.out Not tainted (2.6.24-mm1 #1)
EIP: 0060:[] EFLAGS: 00010286 CPU: 0
EIP is at twothirdsMD4Transform+0x78/0x37c
EAX: 00084000 EBX: 0800 ECX: 8000 EDX: db45ddec
ESI:  EDI: 52806380 EBP: db45dddc ESP: db45ddc8
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process a.out (pid: 2148, ti=db45d000 task=deaf9250 task.ti=db45d000)
Stack: 8000 def6ef9c 6380 c0759d60 db45de1c db45de28 c0211fd2 0040 
   c0759d40    0100 52806380 1f2e00ba fffa249f 
   5a696b37 8dbe1970 cf7579d0 3b0cc350 a54b10a8 def6e9a0  def6ef8c 
Call Trace:
 [] ? secure_tcpv6_sequence_number+0x58/0x7a
 [] ? tcp_v6_connect+0x46d/0x4e3
 [] ? lock_sock_nested+0x56/0x5e
 [] ? inet_stream_connect+0x1c/0x163
 [] ? inet_stream_connect+0x92/0x163
 [] ? sys_connect+0x72/0x98
 [] ? lock_release_holdtime+0x4e/0x54
 [] ? do_page_fault+0x1c5/0x3fc
 [] ? __lock_release+0x4b/0x51
 [] ? do_page_fault+0x1c5/0x3fc
 [] ? sys_socketcall+0x6f/0x15e
 [] ? restore_nocheck+0x12/0x15
 [] ? syscall_call+0x7/0xb
 ===
Code: 31 c1 03 0c ba 8b 7a 0c 01 ce 8b 4d ec c1 c6 0b 31 d9 21 f1 31 d9 03 0c 
ba 8b 7a 10 01 c8 8b 4d ec c1 c0 13 31 f1 21 c1 33 4d ec <03> 0c ba 8b 7a 14 01 
cb 89 c1 c1 c3 03 31 f1 21 d9 31 f1 03 0c 
EIP: [] twothirdsMD4Transform+0x78/0x37c SS:ESP 0068:db45ddc8
---[ end trace 160518059a282c77 ]---


Re: data corruption with dmcrypt/LUKS

2008-02-04 Thread Christoph Anton Mitterer
On Mon, 2008-02-04 at 10:17 +0100, Milan Broz wrote:
> Yes, so if you hit this with 2.6.24 too is very important to sent OOps
> log to identify problem (or link to screen snapshot, digital camera
> snapshot or so).
I did about 5 complete tests today and dozens of mkfs.ext3's but I
wasn't able to reproduce any of the two errors... very very strange.
(used the same sequence of commands, with and without using the
USB-stick)...
I'll do some other tests tomorrow because these problems were real and I
cannot believe, that they're simply gone...

And IMHO hardware problems are still very unlikely, or am I wrong?

Anyway,.. is there anybody who made deeper tests of dmcrypt? I mean real
massive tests perhaps with different filesystems and so on?
What are your experiences at Redhat?

Best wishes,
Chris




Re: Commit f06e4ec breaks vmware

2008-02-04 Thread Zachary Amsden

On Mon, 2008-02-04 at 16:36 +0100, Ingo Molnar wrote:
> * Jeff Chua <[EMAIL PROTECTED]> wrote:
> 
> > On Feb 4, 2008 10:53 PM, Ingo Molnar <[EMAIL PROTECTED]> wrote:
> > > > commit 8d947344c47a40626730bb80d136d8daac9f2060
> > > > Author: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
> > > > Date:   Wed Jan 30 13:31:12 2008 +0100
> > > >
> > > > x86: change write_idt_entry signature
> > >
> > > does the patch below ontop of x86.git#mm fix this?
> > 
> > 
> > > 32-bit or 64-bit guest kernel?
> > 
> > 32-bit.
> > 
> > Yep, this fixed the problem.
> 
> great! I've added:
> 
>Tested-by: Jeff Chua <[EMAIL PROTECTED]>
> 
> to the commit message as well, if you dont mind. Full patch is below.

Acked-by: Zachary Amsden <[EMAIL PROTECTED]>

Thanks, Ingo!



Re: Small pm documentation cleanups

2008-02-04 Thread Rafael J. Wysocki
Len, please pick this up, thanks.

On Tuesday, 5 of February 2008, Pavel Machek wrote:
> 
> Small documentation fixes/additions that accumulated in my tree.
> 
> Signed-off-by: Pavel Machek <[EMAIL PROTECTED]>

Acked-by: Rafael J. Wysocki <[EMAIL PROTECTED]>

> diff --git a/Documentation/kernel-parameters.txt 
> b/Documentation/kernel-parameters.txt
> index cf38689..3be3328 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -147,8 +147,10 @@ and is between 256 and 4096 characters. 
>   default: 0
>  
>   acpi_sleep= [HW,ACPI] Sleep options
> - Format: { s3_bios, s3_mode }
> - See Documentation/power/video.txt
> + Format: { s3_bios, s3_mode, s3_beep }
> + See Documentation/power/video.txt for s3_bios and 
> s3_mode.
> + s3_beep is for debugging; it beeps on PC speaker as 
> soon as
> + kernel's real-mode entry point is called.
>  
>   acpi_sci=   [HW,ACPI] ACPI System Control Interrupt trigger mode
>   Format: { level | edge | high | low }
> diff --git a/Documentation/power/swsusp.txt b/Documentation/power/swsusp.txt
> index aea7e92..d3e5e4e 100644
> --- a/Documentation/power/swsusp.txt
> +++ b/Documentation/power/swsusp.txt
> @@ -386,6 +386,11 @@ before suspending; then remount them aft
>  There is a work-around for this problem.  For more information, see
>  Documentation/usb/persist.txt.
>  
> +Q: Can I suspend-to-disk using a swap partition under LVM?
> +
> +A: No. You can suspend successfully, but you'll not be able to
> +resume. uswsusp should be able to work with LVM, see suspend.sf.net.
> +
>  Q: I upgraded the kernel from 2.6.15 to 2.6.16. Both kernels were
>  compiled with the similar configuration files. Anyway I found that
>  suspend to disk (and resume) is much slower on 2.6.16 compared to
> diff --git a/drivers/acpi/hardware/hwsleep.c b/drivers/acpi/hardware/hwsleep.c
> index fd1c4ba..058d0be 100644
> --- a/drivers/acpi/hardware/hwsleep.c
> +++ b/drivers/acpi/hardware/hwsleep.c
> @@ -286,13 +286,13 @@ acpi_status asmlinkage acpi_enter_sleep_
>   }
>  
>   /*
> +  * 1) Disable/Clear all GPEs
>* 2) Enable all wakeup GPEs
>*/
>   status = acpi_hw_disable_all_gpes();
>   if (ACPI_FAILURE(status)) {
>   return_ACPI_STATUS(status);
>   }
> -
>   acpi_gbl_system_awake_and_running = FALSE;
>  
>   status = acpi_hw_enable_all_wakeup_gpes();
> diff --git a/drivers/acpi/sleep/main.c b/drivers/acpi/sleep/main.c
> index 485de13..56e09cf 100644
> --- a/drivers/acpi/sleep/main.c
> +++ b/drivers/acpi/sleep/main.c
> @@ -170,7 +170,7 @@ static int acpi_pm_enter(suspend_state_t
>   /* Reprogram control registers and execute _BFS */
>   acpi_leave_sleep_state_prep(acpi_state);
>  
> - /* ACPI 3.0 specs (P62) says that it's the responsabilty
> + /* ACPI 3.0 specs (P62) says that it's the responsibilty
>* of the OSPM to clear the status bit [ implying that the
>* POWER_BUTTON event should not reach userspace ]
>*/
> diff --git a/kernel/power/Kconfig b/kernel/power/Kconfig
> index ef9b802..b275ffb 100644
> --- a/kernel/power/Kconfig
> +++ b/kernel/power/Kconfig
> @@ -74,14 +74,14 @@ config PM_TRACE_RTC
>   RTC across reboots, so that you can debug a machine that just hangs
>   during suspend (or more commonly, during resume).
>  
> - To use this debugging feature you should attempt to suspend the machine,
> - then reboot it, then run
> + To use this debugging feature you should attempt to suspend the
> + machine, then reboot it, then run
>  
>   dmesg -s 100 | grep 'hash matches'
>  
>   CAUTION: this option will cause your machine's real-time clock to be
>   set to an invalid time after a resume.
>  
>  config PM_SLEEP_SMP
>   bool
>   depends on SMP
> @@ -123,7 +129,8 @@ config HIBERNATION
> called "hibernation" in user interfaces.  STD checkpoints the
> system and powers it off; and restores that checkpoint on reboot.
>  
> -   You can suspend your machine with 'echo disk > /sys/power/state'.
> +   You can suspend your machine with 'echo disk > /sys/power/state' 
> +   after placing resume=/dev/swappartition on kernel command line.
> Alternatively, you can use the additional userland tools available
> from .
>  
> diff --git a/kernel/power/main.c b/kernel/power/main.c
> index 6a6d5eb..d3df5af 100644
> --- a/kernel/power/main.c
> +++ b/kernel/power/main.c
> @@ -223,6 +226,7 @@ void __attribute__ ((weak)) arch_suspend
>   *   @state: state to enter
>   *
>   *   This function should be called after devices have been suspended.
> + *   May not sleep.
>   */
>  static int suspend_enter(suspend_state_t state)
>  {
> @@ -250,6 +255,8 @@ static int 

[PATCH 1/2] kvm: move address_mask define to static function

2008-02-04 Thread Harvey Harrison
Signed-off-by: Harvey Harrison <[EMAIL PROTECTED]>
---
 arch/x86/kvm/x86_emulate.c |   22 --
 1 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/x86_emulate.c b/arch/x86/kvm/x86_emulate.c
index 7958600..649e14d 100644
--- a/arch/x86/kvm/x86_emulate.c
+++ b/arch/x86/kvm/x86_emulate.c
@@ -427,11 +427,21 @@ static u16 twobyte_table[256] = {
 })
 
 /* Access/update address held in a register, based on addressing mode. */
-#define address_mask(reg)  \
-   ((c->ad_bytes == sizeof(unsigned long)) ?   \
-   (reg) : ((reg) & ((1UL << (c->ad_bytes << 3)) - 1)))
+static inline unsigned long ad_mask(struct decode_cache *c)
+{
+   return (1UL << (c->ad_bytes << 3)) - 1;
+}
+
+static unsigned long address_mask(struct decode_cache *c, unsigned long reg)
+{
+   if (c->ad_bytes == sizeof(unsigned long))
+   return reg;
+   else
+   return reg & ad_mask(c);
+}
+
 #define register_address(base, reg) \
-   ((base) + address_mask(reg))
+   ((base) + address_mask(c, reg))
 #define register_address_increment(reg, inc)\
do {\
/* signed type ensures sign extension to long */\
@@ -1393,7 +1403,7 @@ special_insn:
1,
(c->d & ByteOp) ? 1 : c->op_bytes,
c->rep_prefix ?
-   address_mask(c->regs[VCPU_REGS_RCX]) : 1,
+   address_mask(c, c->regs[VCPU_REGS_RCX]) : 1,
(ctxt->eflags & EFLG_DF),
register_address(ctxt->es_base,
 c->regs[VCPU_REGS_RDI]),
@@ -1409,7 +1419,7 @@ special_insn:
0,
(c->d & ByteOp) ? 1 : c->op_bytes,
c->rep_prefix ?
-   address_mask(c->regs[VCPU_REGS_RCX]) : 1,
+   address_mask(c, c->regs[VCPU_REGS_RCX]) : 1,
(ctxt->eflags & EFLG_DF),
register_address(c->override_base ?
*c->override_base :
-- 
1.5.4.rc5.1138.g2602
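
For experimenting with the masking outside the emulator, here is a stand-alone sketch that takes the address size directly instead of a struct decode_cache pointer (that parameter change is an assumption made only to keep the example self-contained):

```c
#include <assert.h>

/* Mask covering the low ad_bytes bytes of a register. */
static unsigned long ad_mask(unsigned int ad_bytes)
{
    /* A shift by the full width of unsigned long would be undefined. */
    assert(ad_bytes > 0 && ad_bytes < sizeof(unsigned long));
    return (1UL << (ad_bytes << 3)) - 1;
}

/* Truncate reg to the current address size; full width needs no mask. */
static unsigned long address_mask(unsigned int ad_bytes, unsigned long reg)
{
    if (ad_bytes == sizeof(unsigned long))
        return reg;
    return reg & ad_mask(ad_bytes);
}
```

The explicit full-width test is what keeps the shift in ad_mask() within range, which is why the function form retains it rather than masking unconditionally.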




[PATCH 2/2] kvm: replace more defines with functions

2008-02-04 Thread Harvey Harrison
register_address
register_address_increment
jmp_rel

Have a struct decode_cache parameter added instead of having 'c' in
the macro.

Signed-off-by: Harvey Harrison <[EMAIL PROTECTED]>
---
 arch/x86/kvm/x86_emulate.c |   92 ++--
 1 files changed, 46 insertions(+), 46 deletions(-)

diff --git a/arch/x86/kvm/x86_emulate.c b/arch/x86/kvm/x86_emulate.c
index 649e14d..1c0502a 100644
--- a/arch/x86/kvm/x86_emulate.c
+++ b/arch/x86/kvm/x86_emulate.c
@@ -440,25 +440,25 @@ static unsigned long address_mask(struct decode_cache *c, 
unsigned long reg)
return reg & ad_mask(c);
 }
 
-#define register_address(base, reg) \
-   ((base) + address_mask(c, reg))
-#define register_address_increment(reg, inc)\
-   do {\
-   /* signed type ensures sign extension to long */\
-   int _inc = (inc);   \
-   if (c->ad_bytes == sizeof(unsigned long))   \
-   (reg) += _inc;  \
-   else\
-   (reg) = ((reg) &\
-~((1UL << (c->ad_bytes << 3)) - 1)) |  \
-   (((reg) + _inc) &   \
-((1UL << (c->ad_bytes << 3)) - 1));\
-   } while (0)
+static unsigned long register_address(struct decode_cache *c,
+ unsigned long base, unsigned long reg)
+{
+   return base + address_mask(c, reg);
+}
 
-#define JMP_REL(rel)   \
-   do {\
-   register_address_increment(c->eip, rel);\
-   } while (0)
+static void register_address_increment(struct decode_cache *c,
+  unsigned long *reg, int inc)
+{
+   if (c->ad_bytes == sizeof(unsigned long))
+   *reg += inc;
+   else
+   *reg = (*reg & ~ad_mask(c)) | ((*reg + inc) & ad_mask(c));
+}
+
+static void jmp_rel(struct decode_cache *c, int rel)
+{
+   register_address_increment(c, &c->eip, rel);
+}
 
 static int do_fetch_insn_byte(struct x86_emulate_ctxt *ctxt,
  struct x86_emulate_ops *ops,
@@ -994,8 +994,8 @@ static inline void emulate_push(struct x86_emulate_ctxt 
*ctxt)
c->dst.type  = OP_MEM;
c->dst.bytes = c->op_bytes;
c->dst.val = c->src.val;
-   register_address_increment(c->regs[VCPU_REGS_RSP], -c->op_bytes);
-   c->dst.ptr = (void *) register_address(ctxt->ss_base,
+   register_address_increment(c, &c->regs[VCPU_REGS_RSP], -c->op_bytes);
+   c->dst.ptr = (void *) register_address(c, ctxt->ss_base,
   c->regs[VCPU_REGS_RSP]);
 }
 
@@ -1005,13 +1005,13 @@ static inline int emulate_grp1a(struct x86_emulate_ctxt 
*ctxt,
struct decode_cache *c = &ctxt->decode;
int rc;
 
-   rc = ops->read_std(register_address(ctxt->ss_base,
+   rc = ops->read_std(register_address(c, ctxt->ss_base,
c->regs[VCPU_REGS_RSP]),
   &c->dst.val, c->dst.bytes, ctxt->vcpu);
if (rc != 0)
return rc;
 
-   register_address_increment(c->regs[VCPU_REGS_RSP], c->dst.bytes);
+   register_address_increment(c, &c->regs[VCPU_REGS_RSP], c->dst.bytes);
 
return 0;
 }
@@ -1122,9 +1122,9 @@ static inline int emulate_grp45(struct x86_emulate_ctxt 
*ctxt,
if (rc != 0)
return rc;
}
-   register_address_increment(c->regs[VCPU_REGS_RSP],
+   register_address_increment(c, &c->regs[VCPU_REGS_RSP],
   -c->dst.bytes);
-   rc = ops->write_emulated(register_address(ctxt->ss_base,
+   rc = ops->write_emulated(register_address(c, ctxt->ss_base,
c->regs[VCPU_REGS_RSP]), &c->dst.val,
c->dst.bytes, ctxt->vcpu);
if (rc != 0)
@@ -1371,19 +1371,19 @@ special_insn:
c->dst.type  = OP_MEM;
c->dst.bytes = c->op_bytes;
c->dst.val = c->src.val;
-   register_address_increment(c->regs[VCPU_REGS_RSP],
+   register_address_increment(c, &c->regs[VCPU_REGS_RSP],
   -c->op_bytes);
c->dst.ptr = (void *) register_address(
-   ctxt->ss_base, c->regs[VCPU_REGS_RSP]);
+   c, ctxt->ss_base, c->regs[VCPU_REGS_RSP]);
break;
case 0x58 ... 
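
The remainder of the diff is missing from this archive. The wrap-around behaviour of the new register_address_increment() can be checked with a stand-alone sketch that takes the address size as a plain integer rather than a struct decode_cache pointer (an assumption made only so the example builds on its own):

```c
#include <assert.h>

/* Mask covering the low ad_bytes bytes of a register. */
static unsigned long ad_mask(unsigned int ad_bytes)
{
    assert(ad_bytes > 0 && ad_bytes < sizeof(unsigned long));
    return (1UL << (ad_bytes << 3)) - 1;
}

/*
 * Add inc to *reg, but let only the low ad_bytes bytes change.
 * inc is signed so that negative steps (for example a push)
 * sign-extend to the full register width before the addition.
 */
static void register_address_increment(unsigned int ad_bytes,
                                       unsigned long *reg, int inc)
{
    if (ad_bytes == sizeof(unsigned long))
        *reg += inc;
    else
        *reg = (*reg & ~ad_mask(ad_bytes)) |
               ((*reg + inc) & ad_mask(ad_bytes));
}
```

With a 16-bit address size, incrementing past 0xffff wraps the low half to zero and leaves the upper bits of the register untouched.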

Re: Integration of SCST in the mainstream Linux kernel

2008-02-04 Thread Linus Torvalds


On Mon, 4 Feb 2008, Jeff Garzik wrote:
> 
> Both of these are easily handled if the server is 100% in charge of managing
> the filesystem _metadata_ and data.  That's what I meant by complete control.
> 
> i.e. it not ext3 or reiserfs or vfat, its a block device or 1000GB file
> managed by a userland process.

Oh ok.

Yes, if you bring the filesystem into user mode too, then the problems go 
away - because now your NFSD can interact directly with the filesystem 
without any kernel/usermode abstraction layer rules in between. So that 
has all the same properties as moving NFSD entirely into the kernel.

Linus


issue with patch "x86: no CPA on iounmap"

2008-02-04 Thread Siddha, Suresh B
This is with regard to x86 git commit f56d005d30342a45d8af2b75e82200f09600
"x86: no CPA on iounmap"

This can cause a performance issue. When a GART driver unmaps a RAM page
that was mapped as UC, this commit still retains the UC attribute
on the kernel identity mapping. This can cause mysterious performance issues
if the freed page is later used by the kernel.

For now we should change the attribute during iounmap; in the future, the PAT
infrastructure will have the necessary hooks to avoid the aliasing issues.

thanks,
suresh


Re: [PATCH] per-process securebits

2008-02-04 Thread Ismail Dönmez
At Monday 04 February 2008 around 18:45:24 Serge E. Hallyn wrote:
> Quoting Andrew G. Morgan ([EMAIL PROTECTED]):
> > -BEGIN PGP SIGNED MESSAGE-
> > Hash: SHA1
> >
> > Ismail Dönmez wrote:
> > | What I meant to ask was what does "per-process securebits" bring as
> > | extra.
> >
> > It allows you to create a legacy free process tree. For example, a
> > chroot, or container (which Serge can obviously explain in more detail),
>
> (Just to give my thoughts on securebits and containers)
>
> A container is a set of processes which has its own private namespaces
> for all or most resources - for instance it sees only processes in its
> own pid namespace, and its first process, which it sees as pid 1, is
> known as some other pid, maybe 3459, to the rest of the system.
>
> We tend to talk about 'system containers' versus 'application
> containers'.  A system container would be like a vserver or openvz
> instance, something which looks like a separate machine.  I was
> going to say I don't imagine per-process securebits being useful
> there, but actually since a system container doesn't need to do any
> hardware setup it actually might be a much easier start for a full
> SECURE_NOROOT distro than a real machine.  Heck, on a real machine init
> and a few legacy daemons could run in the init namespace, while users
> log in and apache etc run in a SECURE_NOROOT container.
>
> But I especially like the thought of for instance postfix running in a
> carefully crafted application container (with its own virtual network
> card and limited file tree and no visibility of other processes) with
> SECURE_NOROOT on.

This is really interesting security-wise; it will be nice to see how it can be 
implemented in real life.
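
As a rough illustration of how a process might inspect its own securebits from
user space, here is a minimal sketch. It assumes the `prctl()` interface found
in later mainline kernels (`PR_GET_SECUREBITS` and `SECBIT_NOROOT` from
`<linux/securebits.h>`), which may differ from the patch under discussion; the
function name `noroot_is_set()` is made up for this example.

```c
#include <sys/prctl.h>
#include <linux/securebits.h>

/*
 * Returns 1 if SECURE_NOROOT is set for the calling process, 0 if it is
 * clear, and -1 if the kernel rejects PR_GET_SECUREBITS.
 */
static int noroot_is_set(void)
{
    int bits = prctl(PR_GET_SECUREBITS, 0, 0, 0, 0);

    if (bits < 0)
        return -1;
    return (bits & SECBIT_NOROOT) ? 1 : 0;
}
```

A container setup tool could call something like this after configuring a
process tree to confirm that the flag was actually inherited.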

Thanks for the explanation and the implementation ;-)

Regards,
ismail

-- 
Never learn by your mistakes, if you do you may never dare to try again.


Re: cups slow on linux-2.6.24

2008-02-04 Thread Jeff Chua



On Feb 5, 2008 4:17 AM, Jozsef Kadlecsik <[EMAIL PROTECTED]> wrote:

Actively closed connections are not handled properly, i.e. the initiator 
of the active close should not be taken into account. So could you try 
the patch below? Does it just suppress the 'invalid packet ignored' 
and all other kernel messages, or does it both suppress them and 
produce normal printing speed?


Jozsef,

Amazing! You fixed it. No more 'invalid packet ignored', and speed is back to 
normal (continues after approx. 20 seconds of pausing after 503 prints).


I used the latest git, and had to modify your patch slightly to make it 
work (changing "conntrack" to "ct").



Thank you for fixing this.

Jeff


Here's your patch modified so it'll apply to the latest git.

--- a/net/netfilter/nf_conntrack_proto_tcp.c.org  2008-02-05 08:29:39 +0800
+++ a/net/netfilter/nf_conntrack_proto_tcp.c      2008-02-05 08:28:05 +0800
@@ -125,7 +125,7 @@
  * CLOSE_WAIT: ACK seen (after FIN)
  * LAST_ACK:   FIN seen (after FIN)
  * TIME_WAIT:  last ACK seen
- * CLOSE:  closed connection
+ * CLOSE:  closed connection (RST)
  *
  * LISTEN state is not used.
  *
@@ -824,9 +824,23 @@
case TCP_CONNTRACK_SYN_SENT:
if (old_state < TCP_CONNTRACK_TIME_WAIT)
break;
-   if ((ct->proto.tcp.seen[!dir].flags & IP_CT_TCP_FLAG_CLOSE_INIT)
-   || (ct->proto.tcp.last_dir == dir
-   && ct->proto.tcp.last_index == TCP_RST_SET)) {
+   /* RFC 1122: "When a connection is closed actively,
+* it MUST linger in TIME-WAIT state for a time 2xMSL
+* (Maximum Segment Lifetime). However, it MAY accept
+* a new SYN from the remote TCP to reopen the connection
+* directly from TIME-WAIT state, if..."
+* We ignore the conditions because we are in the
+* TIME-WAIT state anyway.
+*
+* Handle aborted connections: we and the server
+* think there is an existing connection but the client
+* aborts it and starts a new one.
+*/
+   if (((ct->proto.tcp.seen[dir].flags
+ | ct->proto.tcp.seen[!dir].flags)
+& IP_CT_TCP_FLAG_CLOSE_INIT)
+   || (ct->proto.tcp.last_dir == dir
+   && ct->proto.tcp.last_index == TCP_RST_SET)) {
/* Attempt to reopen a closed/aborted connection.
 * Delete this connection and look up again. */
write_unlock_bh(&tcp_lock);
@@ -838,15 +852,23 @@
case TCP_CONNTRACK_IGNORE:
/* Ignored packets:
 *
+* Our connection entry may be out of sync, so ignore
+* packets which may signal the real connection between
+* the client and the server.
+*
 * a) SYN in ORIGINAL
 * b) SYN/ACK in REPLY
 * c) ACK in reply direction after initial SYN in original.
+*
+		 * If the ignored packet is invalid, the receiver will send 
+		 * a RST we'll catch below.

 */
if (index == TCP_SYNACK_SET
&& ct->proto.tcp.last_index == TCP_SYN_SET
&& ct->proto.tcp.last_dir != dir
&& ntohl(th->ack_seq) == ct->proto.tcp.last_end) {
-   /* This SYN/ACK acknowledges a SYN that we earlier
+   /* b) This SYN/ACK acknowledges a SYN that we earlier
 * ignored as invalid. This means that the client and
 * the server are both in sync, while the firewall is
 * not. We kill this session and block the SYN/ACK so
@@ -924,8 +946,7 @@

ct->proto.tcp.state = new_state;
if (old_state != new_state
-   && (new_state == TCP_CONNTRACK_FIN_WAIT
-   || new_state == TCP_CONNTRACK_CLOSE))
+   && new_state == TCP_CONNTRACK_FIN_WAIT)
ct->proto.tcp.seen[dir].flags |= IP_CT_TCP_FLAG_CLOSE_INIT;
timeout = ct->proto.tcp.retrans >= nf_ct_tcp_max_retrans
  && tcp_timeouts[new_state] > nf_ct_tcp_timeout_max_retrans
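
The key change in the SYN_SENT hunk above is that a close initiated by either
direction now allows the connection to be reopened, where the old check only
looked at the opposite direction. A minimal user-space sketch of the two
conditions follows. The `sketch_` names, the struct layout and the constant
values are illustrative stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the conntrack definitions. */
#define SKETCH_FLAG_CLOSE_INIT 0x04
enum sketch_index { SKETCH_SYN_SET, SKETCH_RST_SET };

struct sketch_tcp_state {
    unsigned char seen_flags[2];   /* per-direction flags */
    int last_dir;                  /* direction of the last packet */
    enum sketch_index last_index;  /* kind of the last packet */
};

/* Old check: only a close initiated by the opposite direction counts. */
static bool sketch_may_reopen_old(const struct sketch_tcp_state *s, int dir)
{
    assert(dir == 0 || dir == 1);
    return (s->seen_flags[!dir] & SKETCH_FLAG_CLOSE_INIT)
        || (s->last_dir == dir && s->last_index == SKETCH_RST_SET);
}

/* New check: a close initiated by either direction counts. */
static bool sketch_may_reopen_new(const struct sketch_tcp_state *s, int dir)
{
    assert(dir == 0 || dir == 1);
    return ((s->seen_flags[dir] | s->seen_flags[!dir]) & SKETCH_FLAG_CLOSE_INIT)
        || (s->last_dir == dir && s->last_index == SKETCH_RST_SET);
}
```

In this model, a SYN from the side that itself started the close is rejected by
the old check and accepted by the new one, which is the case the patch appears
to address.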


Re: [PATCH 14/22] ide-tape: cleanup and fix comments

2008-02-04 Thread Bartlomiej Zolnierkiewicz
On Monday 04 February 2008, Borislav Petkov wrote:
> Also, remove redundant ones and cleanup whitespace.
> 
> Signed-off-by: Borislav Petkov <[EMAIL PROTECTED]>

I fixed a few minor issues while merging the patch:

> ---
>  drivers/ide/ide-tape.c |  725 +++
>  1 files changed, 293 insertions(+), 432 deletions(-)
> 
> diff --git a/drivers/ide/ide-tape.c b/drivers/ide/ide-tape.c
> index 175d507..a80f8d9 100644
> --- a/drivers/ide/ide-tape.c
> +++ b/drivers/ide/ide-tape.c

[...]

>  /*
> - *   DSC polling parameters.
> + * DSC polling parameters.
>   *
> - *   Polling for DSC (a single bit in the status register) is a very
> - *   important function in ide-tape. There are two cases in which we
> - *   poll for DSC:
> + * Polling for DSC (a single bit in the status register) is a very important
> + * function in ide-tape. There are two cases in which we poll for DSC:
>   *
> - *   1.  Before a read/write packet command, to ensure that we
> - *   can transfer data from/to the tape's data buffers, without
> - *   causing an actual media access. In case the tape is not
> - *   ready yet, we take out our request from the device
> - *   request queue, so that ide.c will service requests from
> - *   the other device on the same interface meanwhile.
>   *
> - *   2.  After the successful initialization of a "media access
> - *   packet command", which is a command which can take a long
> - *   time to complete (it can be several seconds or even an hour).
> + * 1. Before a read/write packet command, to ensure that we can transfer data
> + *    from/to the tape's data buffers, without causing an actual media access.
> + *    In case the tape is not ready yet, we take out our request from the device
> + *    request queue, so that ide.c could service requests from the other device
> + *    on the same interface in the meantime.
>   *
> - *   Again, we postpone our request in the middle to free the bus
> - *   for the other device. The polling frequency here should be
> - *   lower than the read/write frequency since those media access
> - *   commands are slow. We start from a "fast" frequency -
> - *   IDETAPE_DSC_MA_FAST (one second), and if we don't receive DSC
> - *   after IDETAPE_DSC_MA_THRESHOLD (5 minutes), we switch it to a
> - *   lower frequency - IDETAPE_DSC_MA_SLOW (1 minute).
> + * 2. After the successful initialization of a "media access packet command",
> + *    which is a command that can take a long time to complete (the interval can
> + *    range from several seconds to even an hour).
>   *
> - *   We also set a timeout for the timer, in case something goes wrong.
> - *   The timeout should be longer then the maximum execution time of a
> - *   tape operation.
> - */
> - 
> -/*
> - *   DSC timings.
> + * Again, we postpone our request in the middle to free the bus for the other
> + * device. The polling frequency here should be lower than the read/write
> + * frequency since those media access commands are slow. We start from a "fast"
> + * frequency - IDETAPE_DSC_MA_FAST (one second), and if we don't receive DSC
> + * after IDETAPE_DSC_MA_THRESHOLD (5 minutes), we switch it to a lower frequency
> + * - IDETAPE_DSC_MA_SLOW (1 minute). We also set a timeout for the timer, in

the above paragraph is true only for point 2.
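
As a minimal sketch of the point 2 polling schedule described in that comment:
poll once per second at first, then drop to once per minute if DSC has not been
seen after five minutes. The `SKETCH_` constants mirror the values stated in
the comment, expressed in seconds; the names and the function are illustrative
and are not the driver's actual code.

```c
/* Values taken from the comment text; units are seconds. */
#define SKETCH_DSC_MA_FAST       1u
#define SKETCH_DSC_MA_SLOW       60u
#define SKETCH_DSC_MA_THRESHOLD  (5u * 60u)

/* Polling interval to use after waiting waited_secs for DSC. */
static unsigned int sketch_dsc_poll_interval(unsigned int waited_secs)
{
    if (waited_secs < SKETCH_DSC_MA_THRESHOLD)
        return SKETCH_DSC_MA_FAST;   /* still in the "fast" phase */
    return SKETCH_DSC_MA_SLOW;       /* long-running command: back off */
}
```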

[...]

> @@ -703,8 +654,8 @@ static void idetape_analyze_error(ide_drive_t *drive, u8 *sense)
>   }
>  
>   /*
> -  * If error was the result of a zero-length read or write command,
> -  * with sense key=5, asc=0x22, ascq=0, let it slide.  Some drives
> +  * If error was the result of a zero-length read or write command, with
> +  * sense key=5, asc=0x22, ascq=0, let it slide.  Some drives
>* (i.e. Seagate STT3401A Travan) don't support 0-length read/writes.
>*/
>   if ((pc->c[0] == READ_6 || pc->c[0] == WRITE_6)

[...]

> @@ -1070,25 +1012,24 @@ static ide_startstop_t idetape_pc_intr(ide_drive_t *drive)
>   if (pc->flags & PC_FL_DMA_IN_PROGRESS) {
>   if (hwif->ide_dma_end(drive) || (stat & ERR_STAT)) {
>   /*
> -  * A DMA error is sometimes expected. For example,
> -  * if the tape is crossing a filemark during a
> -  * READ command, it will issue an irq and position
> -  * itself before the filemark, so that only a partial
> -  * data transfer will occur (which causes the DMA
> -  * error). In that case, we will later ask the tape
> -  * how much bytes of the original request were
> -  * actually transferred (we can't receive that
> -  * information from the DMA engine on most chipsets).
> +  * A DMA error is sometimes 

Re: [PATCH 02/22] ide-tape: remove struct idetape_read_position_result_t

2008-02-04 Thread Bartlomiej Zolnierkiewicz
On Monday 04 February 2008, Borislav Petkov wrote:
> There should be no functional changes resulting from this patch.
> 
> Signed-off-by: Borislav Petkov <[EMAIL PROTECTED]>
> ---
>  drivers/ide/ide-tape.c |   49 +--
>  1 files changed, 18 insertions(+), 31 deletions(-)
> 
> diff --git a/drivers/ide/ide-tape.c b/drivers/ide/ide-tape.c
> index 442d71c..c8c57ab 100644
> --- a/drivers/ide/ide-tape.c
> +++ b/drivers/ide/ide-tape.c

[...]

>   if (!tape->pc->error) {
> - result = (idetape_read_position_result_t *) tape->pc->buffer;
> - debug_log(DBG_SENSE, "BOP - %s\n", result->bop ? "Yes" : "No");
> - debug_log(DBG_SENSE, "EOP - %s\n", result->eop ? "Yes" : "No");
> + debug_log(DBG_SENSE, "BOP - %s\n",
> + !!(readpos[0] & 0x80) ? "Yes" : "No");
> + debug_log(DBG_SENSE, "EOP - %s\n",
> + !!(readpos[0] & 0x40) ? "Yes" : "No");
> +
> + if (!!(readpos[0] & 0x4)) {
> + printk(KERN_INFO "ide-tape: Block location is unknown "
> + "to the tape\n");

I removed needless "!!" while merging the patch.
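
To illustrate why the "!!" is not needed: in a conditional expression or an
`if`, any non-zero value already counts as true, so masking the bit is enough.
The sketch below decodes the three bits of the first READ POSITION byte that
the quoted hunk tests (0x80, 0x40 and 0x4); the struct and function names are
made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

struct sketch_readpos_flags {
    bool bop;       /* bit 7: beginning of partition */
    bool eop;       /* bit 6: end of partition */
    bool unknown;   /* bit 2: block location unknown to the tape */
};

/* Converting to bool already maps any non-zero value to true. */
static struct sketch_readpos_flags sketch_decode_readpos(uint8_t byte0)
{
    struct sketch_readpos_flags f;

    f.bop = byte0 & 0x80;
    f.eop = byte0 & 0x40;
    f.unknown = byte0 & 0x04;
    return f;
}

/* Same result with or without "!!" in front of the masked value. */
static const char *sketch_yes_no(int masked)
{
    return masked ? "Yes" : "No";
}
```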


[PATCH] ppc: fix #ifdef-s in mediabay driver (take 2)

2008-02-04 Thread Bartlomiej Zolnierkiewicz

* Replace incorrect CONFIG_BLK_DEV_IDE #ifdef in
  check_media_bay() by CONFIG_MAC_FLOPPY one.

* Replace incorrect CONFIG_BLK_DEV_IDE #ifdef-s by
  CONFIG_BLK_DEV_IDE_PMAC ones.

* check_media_bay() is used only by drivers/block/swim3.c
  so make this function available only if CONFIG_MAC_FLOPPY
  is defined.

* check_media_bay_by_base() and media_bay_set_ide_infos()
  are used only by drivers/ide/ppc/pmac.c so make these
  functions available only if CONFIG_BLK_DEV_IDE_PMAC is defined.

v2:
* Remove ifdefs from function prototypes. (Andrew Morton)

Cc: Benjamin Herrenschmidt <[EMAIL PROTECTED]>
Cc: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Bartlomiej Zolnierkiewicz <[EMAIL PROTECTED]>
---
 drivers/macintosh/mediabay.c   |   46 ++---
 include/asm-powerpc/mediabay.h |8 +++
 2 files changed, 25 insertions(+), 29 deletions(-)

Index: b/drivers/macintosh/mediabay.c
===
--- a/drivers/macintosh/mediabay.c
+++ b/drivers/macintosh/mediabay.c
@@ -78,12 +78,14 @@ struct media_bay_info {
int cached_gpio;
int sleeping;
struct semaphore lock;
-#ifdef CONFIG_BLK_DEV_IDE
+#ifdef CONFIG_BLK_DEV_IDE_PMAC
void __iomem *cd_base;
-   int cd_index;
int cd_irq;
int cd_retry;
 #endif
+#if defined(CONFIG_BLK_DEV_IDE_PMAC) || defined(CONFIG_MAC_FLOPPY)
+   int cd_index;
+#endif
 };
 
 #define MAX_BAYS   2
@@ -91,7 +93,7 @@ struct media_bay_info {
 static struct media_bay_info media_bays[MAX_BAYS];
 int media_bay_count = 0;
 
-#ifdef CONFIG_BLK_DEV_IDE
+#ifdef CONFIG_BLK_DEV_IDE_PMAC
 /* check the busy bit in the media-bay ide interface
(assumes the media-bay contains an ide device) */
 #define MB_IDE_READY(i) ((readb(media_bays[i].cd_base + 0x70) & 0x80) == 0)
@@ -401,7 +403,7 @@ static void poll_media_bay(struct media_
set_mb_power(bay, id != MB_NO);
bay->content_id = id;
if (id == MB_NO) {
-#ifdef CONFIG_BLK_DEV_IDE
+#ifdef CONFIG_BLK_DEV_IDE_PMAC
bay->cd_retry = 0;
 #endif
printk(KERN_INFO "media bay %d is empty\n", bay->index);
@@ -414,9 +416,9 @@ static void poll_media_bay(struct media_
}
 }
 
+#ifdef CONFIG_MAC_FLOPPY
 int check_media_bay(struct device_node *which_bay, int what)
 {
-#ifdef CONFIG_BLK_DEV_IDE
int i;
 
for (i=0; istate = mb_resetting;
MBDBG("mediabay%d: waiting reset (kind:%d)\n", i, bay->content_id);
break;
-   
case mb_resetting:
if (bay->content_id != MB_CD) {
MBDBG("mediabay%d: bay is up (kind:%d)\n", i, bay->content_id);
bay->state = mb_up;
break;
}
-#ifdef CONFIG_BLK_DEV_IDE
+#ifdef CONFIG_BLK_DEV_IDE_PMAC
MBDBG("mediabay%d: waiting IDE reset (kind:%d)\n", i, bay->content_id);
bay->ops->un_reset_ide(bay);
bay->timer = msecs_to_jiffies(MB_IDE_WAIT);
@@ -536,16 +535,14 @@ static void media_bay_step(int i)
 #else
printk(KERN_DEBUG "media-bay %d is ide (not compiled in kernel)\n", i);
set_mb_power(bay, 0);
-#endif /* CONFIG_BLK_DEV_IDE */
+#endif /* CONFIG_BLK_DEV_IDE_PMAC */
break;
-   
-#ifdef CONFIG_BLK_DEV_IDE
+#ifdef CONFIG_BLK_DEV_IDE_PMAC
case mb_ide_resetting:
bay->timer = msecs_to_jiffies(MB_IDE_TIMEOUT);
bay->state = mb_ide_waiting;
MBDBG("mediabay%d: waiting IDE ready (kind:%d)\n", i, bay->content_id);
break;
-   
case mb_ide_waiting:
if (bay->cd_base == NULL) {
bay->timer = 0;
@@ -587,11 +584,10 @@ static void media_bay_step(int i)
bay->timer = 0;
}
break;
-#endif /* CONFIG_BLK_DEV_IDE */
-
+#endif /* CONFIG_BLK_DEV_IDE_PMAC */
case mb_powering_down:
bay->state = mb_empty;
-#ifdef CONFIG_BLK_DEV_IDE
+#ifdef CONFIG_BLK_DEV_IDE_PMAC
if (bay->cd_index >= 0) {
printk(KERN_DEBUG "Unregistering mb %d ide, index:%d\n", i,
   bay->cd_index);
@@ -607,7 +603,7 @@ static void media_bay_step(int i)
bay->content_id = MB_NO;
}
}
-#endif /* CONFIG_BLK_DEV_IDE */
+#endif /* CONFIG_BLK_DEV_IDE_PMAC */
MBDBG("mediabay%d: end of power down\n", i);
break;
}
@@ -739,7 +735,7 @@ static int 

Re: [PATCH 07/22] ide-tape: struct idetape_tape_t: shorten member names v2

2008-02-04 Thread Bartlomiej Zolnierkiewicz
On Monday 04 February 2008, Borislav Petkov wrote:
> Shorten some member names not too aggressively since this driver might be gone
> anyway soon.
> 
> Signed-off-by: Borislav Petkov <[EMAIL PROTECTED]>
> ---
>  drivers/ide/ide-tape.c |  210 ++--
>  1 files changed, 113 insertions(+), 97 deletions(-)
> 
> diff --git a/drivers/ide/ide-tape.c b/drivers/ide/ide-tape.c
> index 126e8a9..0b5ccce 100644
> --- a/drivers/ide/ide-tape.c
> +++ b/drivers/ide/ide-tape.c

[...]

> @@ -1583,7 +1579,8 @@ static void idetape_create_read_cmd(idetape_tape_t *tape, idetape_pc_t *pc, unsi
>   pc->bh = bh;
>   atomic_set(&bh->b_count, 0);
>   pc->buffer = NULL;
> - pc->request_transfer = pc->buffer_size = length * tape->tape_block_size;
> + pc->buffer_size = length * tape->blk_size;
> + pc->request_transfer = length * tape->blk_size;
>   if (pc->request_transfer == tape->stage_size)
>   set_bit(PC_DMA_RECOMMENDED, &pc->flags);
>  }
> @@ -1621,7 +1618,8 @@ static void idetape_create_write_cmd(idetape_tape_t *tape, idetape_pc_t *pc, uns
>   pc->b_data = bh->b_data;
>   pc->b_count = atomic_read(&bh->b_count);
>   pc->buffer = NULL;
> - pc->request_transfer = pc->buffer_size = length * tape->tape_block_size;
> + pc->request_transfer = length * tape->blk_size;
> + pc->buffer_size = length * tape->blk_size;
>   if (pc->request_transfer == tape->stage_size)
>   set_bit(PC_DMA_RECOMMENDED, &pc->flags);
>  }

for some reason gcc doesn't seem to optimize the new code as well as
the old one (=> driver size goes up instead of staying unchanged)

interdiff between original patch and merged version:

diff -u b/drivers/ide/ide-tape.c b/drivers/ide/ide-tape.c
--- b/drivers/ide/ide-tape.c
+++ b/drivers/ide/ide-tape.c
@@ -324,7 +324,7 @@
/* Current character device data transfer direction */
u8 chrdev_dir;
 
-   /* tape block size, usu. 512 or 1024 bytes */
+   /* tape block size, usually 512 or 1024 bytes */
unsigned short blk_size;
int user_bs_factor;
 
@@ -1580,8 +1580,8 @@
pc->bh = bh;
atomic_set(&bh->b_count, 0);
pc->buffer = NULL;
-   pc->buffer_size = length * tape->blk_size;
-   pc->request_transfer= length * tape->blk_size;
+   pc->buffer_size = length * tape->blk_size;
+   pc->request_transfer = pc->buffer_size;
if (pc->request_transfer == tape->stage_size)
set_bit(PC_DMA_RECOMMENDED, &pc->flags);
 }
@@ -1619,8 +1619,8 @@
pc->b_data = bh->b_data;
pc->b_count = atomic_read(&bh->b_count);
pc->buffer = NULL;
-   pc->request_transfer= length * tape->blk_size;
-   pc->buffer_size = length * tape->blk_size;
+   pc->buffer_size = length * tape->blk_size;
+   pc->request_transfer = pc->buffer_size;
if (pc->request_transfer == tape->stage_size)
set_bit(PC_DMA_RECOMMENDED, &pc->flags);
 }


ide-tape redux (was: Re:)

2008-02-04 Thread Bartlomiej Zolnierkiewicz

Hi Borislav,

On Monday 04 February 2008, Borislav Petkov wrote:
> Hi Bart,
> 
> here are the pending ide-tape patches reworked which incorporate all review
> points raised so far. Several new patches are appended to the original series
> which I thought would be reasonable to submit along with the others. Also,
> I've applied "ide-tape: dump gcw fields on error in idetape_identify_device()"
> which is #12 and which you can simply ignore. Furthermore, #32 from the original
> series got split up into the different logical changes it dealt with, as you
> requested.

Thanks!  [ Reviewing was so much easier. ]

>  Documentation/feature-removal-schedule.txt |   14 +-
>  drivers/ide/ide-tape.c | 2764 +---
>  2 files changed, 1325 insertions(+), 1453 deletions(-)

applied #1-7, #9-10, #13-22 (+queued all of them for 2.6.25)

w.r.t. #8 I'm waiting for Jens to comment on blk_{get,put}_request() approach

w.r.t. #11 ide-tape uses char devices and supports DSC so it is not as obvious
as in the ide-floppy case that all atomic bitops can just be removed (an extra
audit and some time in -mm are required) so please resync/resubmit

#12 is already in Linus' tree

Bart


Re: [git pull] SLUB updates for 2.6.25

2008-02-04 Thread Christoph Lameter
On Tue, 5 Feb 2008, Nick Piggin wrote:

> Anyway, not saying the operations are useless, but they should be
> made available to core kernel and implemented per-arch. (if they are
> found to be useful)

The problem is to establish the usefulness. These measures may bring 1-2% 
in a pretty unstable operation mode assuming that the system is doing 
repetitive work. The micro optimizations seem to be often drowned out 
by small other changes to the system.

There is the danger that a gain is seen that is not due to the patch but 
due to other changes coming about because code is moved since patches 
change execution paths.

Plus they may be only possible on a specific architecture. I know that our 
IA64 hardware has special measures ensuring certain behavior of atomic ops 
etc, I guess Intel has similar tricks up their sleeve. At 8p there are 
likely increasing problems with lock starvation where your ticketlock 
helps. That is why I thought we had better defer the stuff until there is some 
more evidence that these are useful.

I got particularly nervous about these changes after I saw small 
performance drops due to the __unlock patch on the dual quad. That should 
have been a consistent gain.




Re: Integration of SCST in the mainstream Linux kernel

2008-02-04 Thread Matt Mackall

On Mon, 2008-02-04 at 16:24 -0800, Linus Torvalds wrote:
> 
> On Mon, 4 Feb 2008, Matt Mackall wrote:
> > 
> > But ATAoE is boring because it's not IP. Which means no routing,
> > firewalls, tunnels, congestion control, etc.
> 
> The thing is, that's often an advantage. Not just for performance.
> 
> > NBD and iSCSI (for all its hideous growths) can take advantage of these
> > things.
> 
> .. and all this could equally well be done by a simple bridging protocol 
> (completely independently of any AoE code).
> 
> The thing is, iSCSI does things at the wrong level. It *forces* people to 
> use the complex protocols, when it's known that a lot of people don't 
> want it. 

I frankly think NBD is at a pretty comfortable level. It's internally
very simple (and hardware-agnostic). And moderately easy to do in
silicon.

But I'm not going to defend iSCSI. I worked on the first implementation
(what became the Cisco iSCSI driver) and I have no love for iSCSI at
all. It should have been (and started out as) a nearly trivial
encapsulation of SCSI over TCP much like ATA over Ethernet but quickly
lost the plot when committees got ahold of it.

-- 
Mathematics is the supreme nostalgia of our time.



Re: v2.6.24-mm1 lockdep.c warning

2008-02-04 Thread Andrew Morton
On Mon, 04 Feb 2008 20:26:00 -0400
Kevin Winchester <[EMAIL PROTECTED]> wrote:

> Found this in my dmesg:
> 
> [   10.671500] ------------[ cut here ]------------
> [   10.671500] WARNING: at kernel/lockdep.c:2037 trace_hardirqs_on+0xba/0x113()
> [   10.671500] Pid: 0, comm: swapper Not tainted 2.6.24-mm1 #2
> [   10.671500]  [] warn_on_slowpath+0x3c/0x4c
> [   10.671500]  [] ? check_usage_forwards+0x19/0x3b
> [   10.671500]  [] ? mark_lock+0x1ab/0x3ae
> [   10.671500]  [] ? ata_hsm_qc_complete+0xbc/0xc2
> [   10.671500]  [] ? _spin_unlock_irq+0x22/0x42
> [   10.671500]  [] trace_hardirqs_on+0xba/0x113
> [   10.671500]  [] _spin_unlock_irq+0x22/0x42
> [   10.671500]  [] hpet_rtc_interrupt+0xe8/0x299
> [   10.671500]  [] handle_IRQ_event+0x1a/0x46
> [   10.671500]  [] handle_edge_irq+0xa6/0x102
> [   10.671500]  [] ? handle_edge_irq+0x0/0x102
> [   10.671500]  [] do_IRQ+0x87/0xb0
> [   10.671500]  [] common_interrupt+0x2e/0x34
> [   10.671500]  [] ? default_idle+0x45/0x72
> [   10.671500]  [] ? default_idle+0x0/0x72
> [   10.671500]  [] cpu_idle+0x73/0xa3
> [   10.671500]  [] rest_init+0x61/0x63
> [   10.671500]  ===
> [   10.671500] ---[ end trace e5cdd42f557be0f0 ]---
> 
yup, I hit that too.  It looks like Ingo has decided to merge the
below into mainline.  I'll put it in the hot-fixes directory.


From: Andrew Morton <[EMAIL PROTECTED]>

Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 include/asm-generic/rtc.h |   11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff -puN include/linux/rtc.h~hpet-borkage-fix include/linux/rtc.h
diff -puN include/asm-generic/rtc.h~hpet-borkage-fix include/asm-generic/rtc.h
--- a/include/asm-generic/rtc.h~hpet-borkage-fix
+++ a/include/asm-generic/rtc.h
@@ -35,10 +35,11 @@
 static inline unsigned char rtc_is_updating(void)
 {
unsigned char uip;
+   unsigned long flags;
 
-   spin_lock_irq(&rtc_lock);
+   spin_lock_irqsave(&rtc_lock, flags);
uip = (CMOS_READ(RTC_FREQ_SELECT) & RTC_UIP);
-   spin_unlock_irq(&rtc_lock);
+   spin_unlock_irqrestore(&rtc_lock, flags);
return uip;
 }
 
@@ -46,6 +47,8 @@ static inline unsigned int get_rtc_time(
 {
unsigned long uip_watchdog = jiffies;
unsigned char ctrl;
+   unsigned long flags;
+
 #ifdef CONFIG_MACH_DECSTATION
unsigned int real_year;
 #endif
@@ -72,7 +75,7 @@ static inline unsigned int get_rtc_time(
 * RTC has RTC_DAY_OF_WEEK, we ignore it, as it is only updated
 * by the RTC when initially set to a non-zero value.
 */
-   spin_lock_irq(&rtc_lock);
+   spin_lock_irqsave(&rtc_lock, flags);
time->tm_sec = CMOS_READ(RTC_SECONDS);
time->tm_min = CMOS_READ(RTC_MINUTES);
time->tm_hour = CMOS_READ(RTC_HOURS);
@@ -83,7 +86,7 @@ static inline unsigned int get_rtc_time(
real_year = CMOS_READ(RTC_DEC_YEAR);
 #endif
ctrl = CMOS_READ(RTC_CONTROL);
-   spin_unlock_irq(&rtc_lock);
+   spin_unlock_irqrestore(&rtc_lock, flags);
 
if (!(ctrl & RTC_DM_BINARY) || RTC_ALWAYS_BCD)
{
_
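
For readers wondering why the switch to the irqsave variants matters: the trace
above shows the lock being released from within an interrupt handler
(hpet_rtc_interrupt), where interrupts are already disabled. An unlock that
unconditionally re-enables interrupts changes that state behind the caller's
back, while save/restore puts back whatever was there. The following is only a
user-space model of that difference; the `sketch_` functions and the global
flag are stand-ins and do not use any real kernel API.

```c
#include <stdbool.h>

/* Models the CPU's "interrupts enabled" flag. */
static bool sketch_irqs_enabled = true;

/* Model of spin_lock_irq()/spin_unlock_irq(): disable, then always enable. */
static void sketch_read_irq_variant(void)
{
    sketch_irqs_enabled = false;           /* lock: disable interrupts */
    /* ... read the RTC registers here ... */
    sketch_irqs_enabled = true;            /* unlock: enable unconditionally */
}

/* Model of spin_lock_irqsave()/spin_unlock_irqrestore(). */
static void sketch_read_irqsave_variant(void)
{
    bool flags = sketch_irqs_enabled;      /* lock: remember previous state */

    sketch_irqs_enabled = false;
    /* ... read the RTC registers here ... */
    sketch_irqs_enabled = flags;           /* unlock: restore previous state */
}
```

Called with the flag already cleared, as in a hard interrupt handler, the first
variant leaves interrupts enabled on return and the second leaves them
disabled, which is the behaviour the caller expects.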
