Re: [patch 2/3] only allow nonlinear vmas for ram backed filesystems

2007-03-25 Thread Peter Zijlstra
On Sun, 2007-03-25 at 16:00 -0800, Andrew Morton wrote:
> On Sat, 24 Mar 2007 23:09:19 +0100 Miklos Szeredi <[EMAIL PROTECTED]> wrote:
> 
> > Dirty page accounting/limiting doesn't work for nonlinear mappings,
> 
> Doesn't it?  iirc the problem is that we don't correctly re-clean the ptes
> while starting writeout.  And the dirty-page accounting is in fact correct
> (it'd darn well better be).

If we do not re-protect the pages on writeout, we'll decrement the dirty
count but not get a fault on re-dirty. Hence the dirty count will
actually skew.

In order to make page_mkclean() work for nonlinear vmas we need to do a
full pte scan for each invocation (we could perhaps only scan 1 in n
times to try and limit the damage) and that hurts. This will basically
render it useless.

The other solution is adding rmap information to nonlinear vmas but
doubling the memory overhead for nonlinear mappings was not deemed a
good idea.

> > so
> > for non-ram backed filesystems emulate with linear mappings.  This
> > retains ABI compatibility with previous kernels at minimal code cost.
> > 
> > All known users of nonlinear mappings actually use tmpfs, so this
> > shouldn't have any negative effect.
> 
> Unless someone is using remap_file_pages() against an ext3 file, in which
> case their application stops working?

it'll work up to a certain point (when you hit the max vma count). When
this issue first came up a few weeks ago, nobody knew of any non tmpfs
users.

> That would be a problem.  These guys:
> http://www.technovelty.org/code/linux/fremap.html, for example, will be in
> for a little surprise.

Its an example, it doesn't show if there are actually users of this, but
point taken. We actually could make this very example work by noting
that they map PROT_READ only.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: drivers/pci/probe.c patch in 2.6.20.4 causes 'cannot adjust BAR0 (not I/O)' on NVidia MCP51

2007-03-25 Thread Jan Beulich
>>> Greg KH <[EMAIL PROTECTED]> 25.03.07 18:11 >>>
>On Sun, Mar 25, 2007 at 08:23:23AM -0700, Kevin P. Fleming wrote:
>> I just upgraded from 2.6.20.2 to 2.6.20.4 on my Compaq V6000 laptop,
>> which has an NVidia core chipset. It has the MCP51 and uses it for PATA
>> and SATA.
>> 
>> Booting the 2.6.20.4 kernel causes two messages (and a kernel lockup)
>> like this:
>> 
>> :00:0d.0: cannot adjust BAR0 (not I/O)
>> :00:0d.0: cannot adjust BAR1 (not I/O)
>> 
>> Booting without ACPI, without APIC, without LAPIC makes no usable
>> difference (although sometimes I will also receive a message about BAR2).
>> 
>> This patch:
>> 
>>
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=ed8ccee0918ad063a4741c0656fda783e02df627;hp=9e5755bce00bb563739aeb0f09932a1907521167

>> 
>> is the cause... backing it out results in a working 2.6.20.4 kernel on
>> my laptop.
>> 
>> I'll be happy to provide any assistance I can debugging this problem.
>
>Jan, any thoughts about this?
>
>Should this be backed out of the -stable releases?

I don't think so - the message (as Alan also said) tells the user that the
intended adjustment is *not* being done, kind of as a warning. I would
agree, though, that the flags adjustment (namely, adding
IORESOURCE_IO) is somewhat questionable; this isn't a change the
patch did, though, it was that way already before. But otoh with the
legacy flag set in progif, these BARs *should* be I/O ones...

Jan

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [BUG] Patch "CPU hotplug: call check_tsc_sync_source() with irqs off" breaks some drivers

2007-03-25 Thread Ingo Molnar

* Nicolas Boichat <[EMAIL PROTECTED]> wrote:

> > I found out which commit seems to cause these bugs:
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=d04f41e35343f1d788551fd3f753f51794f4afcf
> > 
> > The latest GIT without this commit works fine, but doesn't with it.
> 
> Sorry about blaming this commit. The problems happen randomly (about 1 
> reboot over 2 is ok, at least with 2.6.21-rc4). I'll run more tests 
> and post the results later.

is this the 32-bit kernel? If yes then does commit 
4edc5db83f574dfcc8be35b7b96760ded543b360 (included in -rc5) fix it?

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: HELP!!! SD and suspend damage i-node.

2007-03-25 Thread Sergey Smirnov
Suspend damage filesystem each times on 4G SD.
I use bluetooth. But corruption doesn't depend on it.
I also test 2.6.21-rc4-git7 kernel. The same problem.
512Mb SD works normal.
Pavel Machek wrote:
> Hi!
> 
>> I use 2.6.20.2 kernel with ext3 rootfs on 4Gb SD on sharp zaurus sl-750 
>> (PXA255).
>> After suspend/resume filesystem stay clean. But some i-nodes become broken.
>> Some files looks like block device or pipe with strange permissions, owner 
>> etc.
>> I'm sure that there is no bad blocks on SD.
>> I'll send any additional information. Just say me what you need to help me.
>> I had i lot of tries. Apply patches, remove its, change fstype etc.
>> output of fsck -f
> 
> How repeatable is the corruption? I have c3000 here, and it corrupts
> memory during system, say during one in 5 suspends. Longer suspends
> seem to lead to 'more' corruption.
> 
> It did not damage my filesystem, yet.
> 
> Do you use bluetooth?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [3/5] 2.6.21-rc4: known regressions (v2)

2007-03-25 Thread Frédéric RISS
Le dimanche 25 mars 2007 à 23:34 +0200, Frédéric Riss a écrit :
> However, as I pointed out in the initial report, the MacMini doesn't
> come out of suspend to ram because a commit in another merged patchset
> broke it. I tracked it down to:
> 
> commit e9e2cdb412412326c4827fc78ba27f410d837e6e
> parent 79bf2bb335b85db25d27421c798595a2fa2a0e82 
> Author: Thomas Gleixner <[EMAIL PROTECTED]>
> Date:   Fri Feb 16 01:28:04 2007 -0800
> 
> [PATCH] clockevents: i386 drivers
> 
> This patch has already been mentioned in regression reports, but AFAICS
> not related to suspend issues.
> 
> To be totally clear about what works and what doesn't:
> 
> 79bf2bb335b85db25d27421c798595a2fa2a0e82 
>+ cherry-pick f3ccb06f3b8e0cf42b579db21f3ca7f17fcc3f38  ==> works
> 
> e9e2cdb412412326c4827fc78ba27f410d837e6e
>+ cherry-pick f3ccb06f3b8e0cf42b579db21f3ca7f17fcc3f38  ==> broken
> 
> To try to get more information, I commented the call to
> do_suspend_lowlevel in drivers/acpi/sleep/main.c and used
> CONFIG_DISABLE_CONSOLE_SUSPEND. Interestingly, the suspend/resume cycle
> completes correctly in this mode.

Additional data point: I just tried with -rc5 and the issue is still
present. The config I used for this test defines neither NO_HZ nor
HIGH_RES_TIMERS.

Fred.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[patch -mm] i386: use pte_update_defer in ptep_test_and_clear_{dirty,young}

2007-03-25 Thread David Rientjes
Date: Sun, 25 Mar 2007 23:07:43 -0800
From: Zachary Amsden <[EMAIL PROTECTED]>

If you actually clear the bit, you need to:

+ pte_update_defer(vma->vm_mm, addr, ptep);

The reason is, when updating PTEs, the hypervisor must be notified.  
Using
atomic operations to do this is fine for all hypervisors I am aware of.
However, for hypervisors which shadow page tables, if these PTE
modifications are not trapped, you need a post-modification call to 
fulfill
the update of the shadow page table.

Cc: Zachary Amsden <[EMAIL PROTECTED]>
Cc: Hugh Dickins <[EMAIL PROTECTED]>
Signed-off-by: David Rientjes <[EMAIL PROTECTED]>
---
 include/asm-i386/pgtable.h |   26 ++
 1 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/include/asm-i386/pgtable.h b/include/asm-i386/pgtable.h
--- a/include/asm-i386/pgtable.h
+++ b/include/asm-i386/pgtable.h
@@ -287,18 +287,24 @@ do {  
\
 static inline int ptep_test_and_clear_dirty(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep)
 {
-   if (!pte_dirty(*ptep))
-   return 0;
-   return test_and_clear_bit(_PAGE_BIT_DIRTY, >pte_low);
+   int ret = 0;
+   if (pte_dirty(*ptep))
+   ret = test_and_clear_bit(_PAGE_BIT_DIRTY, >pte_low);
+   if (ret)
+   pte_update_defer(vma->vm_mm, addr, ptep);
+   return ret;
 }
 
 #define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
 static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep)
 {
-   if (!pte_young(*ptep))
-   return 0;
-   return test_and_clear_bit(_PAGE_BIT_ACCESSED, >pte_low);
+   int ret = 0;
+   if (pte_young(*ptep))
+   ret = test_and_clear_bit(_PAGE_BIT_ACCESSED, >pte_low);
+   if (ret)
+   pte_update_defer(vma->vm_mm, addr, ptep);
+   return ret;
 }
 
 /*
@@ -317,10 +323,8 @@ do {   
\
 ({ \
int __dirty;\
__dirty = ptep_test_and_clear_dirty((vma), (address), (ptep));  \
-   if (__dirty) {  \
-   pte_update_defer((vma)->vm_mm, (address), (ptep));  \
+   if (__dirty)\
flush_tlb_page(vma, address);   \
-   }   \
__dirty;\
 })
 
@@ -329,10 +333,8 @@ do {   
\
 ({ \
int __young;\
__young = ptep_test_and_clear_young((vma), (address), (ptep));  \
-   if (__young) {  \
-   pte_update_defer((vma)->vm_mm, (address), (ptep));  \
+   if (__young)\
flush_tlb_page(vma, address);   \
-   }   \
__young;\
 })
 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch -mm 1/2] i386: add ptep_test_and_clear_{dirty,young}

2007-03-25 Thread David Rientjes
On Sun, 25 Mar 2007, Zachary Amsden wrote:

> If you actually clear the bit, you need to:
> 
> + pte_update_defer(vma->vm_mm, addr, ptep);
> 
> The reason is, when updating PTEs, the hypervisor must be notified.  Using
> atomic operations to do this is fine for all hypervisors I am aware of.
> However, for hypervisors which shadow page tables, if these PTE modifications
> are not trapped, you need a post-modification call to fulfill the update of
> the shadow page table.
> 

Then why was ptep_test_and_clear_{dirty,young} ever removed in the first 
place??  To gain the optimization of one fewer branch and introduce a hack 
to advertise it's existance so the generic header file doesn't include its 
own version?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch -mm 1/2] i386: add ptep_test_and_clear_{dirty,young}

2007-03-25 Thread Hugh Dickins
On Sun, 25 Mar 2007, Zachary Amsden wrote:
> 
> If you actually clear the bit, you need to:
> 
> + pte_update_defer(vma->vm_mm, addr, ptep);
> 
> The reason is, when updating PTEs, the hypervisor must be notified.  Using
> atomic operations to do this is fine for all hypervisors I am aware of.
> However, for hypervisors which shadow page tables, if these PTE modifications
> are not trapped, you need a post-modification call to fulfill the update of
> the shadow page table.

Thanks for the very rapid response.

So, David just needs to move the pte_update_defer out of
ptep_clear_flush_* and into ptep_test_and_clear_*?

That leaves me wondering why you deleted ptep_test_and_clear_*
(while leaving their __HAVE_ARCHes) in the first place?

Hugh
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch -mm 1/2] i386: add ptep_test_and_clear_{dirty,young}

2007-03-25 Thread Zachary Amsden

Hugh Dickins wrote:

On Sun, 25 Mar 2007, David Rientjes wrote:
  

Add ptep_test_and_clear_{dirty,young} to i386.  They advertise that they
have it and there is at least one place where it needs to be called
without the page table lock: to clear the accessed bit on write to



Without the page table lock??

  

/proc/pid/clear_refs.

ptep_clear_flush_{dirty,young} are updated to use the new functions.  The
overall net effect to current users of ptep_clear_flush_{dirty,young} is
that we introduce an additional branch.



We need to Cc Zach on this: git blame indicates it was he who replaced
i386's ptep_test_and_clear_{dirty,young} by that "We don't actually
have these" comment - it looks a bit as if what you want to do might
violate the assumptions he wants to make, but I don't grasp it.

Hugh

  

Cc: Hugh Dickins <[EMAIL PROTECTED]>
Cc: Ingo Molnar <[EMAIL PROTECTED]>
Signed-off-by: David Rientjes <[EMAIL PROTECTED]>
---
 include/asm-i386/pgtable.h |   25 +
 1 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/include/asm-i386/pgtable.h b/include/asm-i386/pgtable.h
--- a/include/asm-i386/pgtable.h
+++ b/include/asm-i386/pgtable.h
@@ -283,12 +283,23 @@ do {  
\
}   \
 } while (0)
 
-/*

- * We don't actually have these, but we want to advertise them so that
- * we can encompass the flush here.
- */
 #define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_DIRTY
+static inline int ptep_test_and_clear_dirty(struct vm_area_struct *vma,
+   unsigned long addr, pte_t *ptep)
+{
+   if (!pte_dirty(*ptep))
+   return 0;

+	return test_and_clear_bit(_PAGE_BIT_DIRTY, >pte_low);

+}
+



If you actually clear the bit, you need to:

+ pte_update_defer(vma->vm_mm, addr, ptep);

The reason is, when updating PTEs, the hypervisor must be notified.  
Using atomic operations to do this is fine for all hypervisors I am 
aware of.  However, for hypervisors which shadow page tables, if these 
PTE modifications are not trapped, you need a post-modification call to 
fulfill the update of the shadow page table.



 #define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
+static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
+   unsigned long addr, pte_t *ptep)
+{
+   if (!pte_young(*ptep))
+   return 0;
+   return test_and_clear_bit(_PAGE_BIT_ACCESSED, >pte_low);
+}



Same here.

Hugh, thanks for the cc.

Zach
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Linux-VServer example results for sharing vs. separate mappings ...

2007-03-25 Thread Balbir Singh

Andrew Morton wrote:

On Mon, 26 Mar 2007 08:06:07 +0530 Balbir Singh <[EMAIL PROTECTED]> wrote:


Andrew Morton wrote:

Don't we break the global LRU with this scheme?

Sure, but that's deliberate!

(And we don't have a global LRU - the LRUs are per-zone).


Yes, true. But if we use zones for containers and say we have 400
of them, with all of them under limit. When the system wants
to reclaim memory, we might not end up reclaiming the best pages.
Am I missing something?


If a zone is under its min_pages limit, it needs reclaim.  Who/when/why
that reclaim is run doesn't really matter.

Yeah, we might run into some scaling problems with that many zones. 
They're unlikely to be unfixable.




ok.




b) Create a new memory abstraction, call it the "software zone", which
   is mostly decoupled from the present "hardware zones".  Most of the MM
   is reworked to use "software zones".  The "software zones" are
   runtime-resizeable, and obtain their pages via some means from the
   hardware zones.  A container uses a software zone.


I think the problem would be figuring out where to allocate memory from?
What happens if a software zone spans across many hardware zones?

Yes, that would be the tricky part.  But we generally don't care what
physical zone user pages come from, apart from NUMA optimisation.


The reclaim mechanism proposed *does not impact the non-container users*.

Yup.  Let's keep plugging away with Pavel's approach, see where it gets us.


Yes, we have some changes that we've made to the reclaim logic, we hope
to integrate a page cache controller soon. We are also testing the
patches. Hopefully soon enough, they'll be in a good state and we can
request you to merge the containers and the rss limit (plus page cache)
controller soon.


Now I'm worried again.  This separation between "rss controller" and
"pagecache" is largely alien to memory reclaim.  With physical containers
these new concepts (and their implementations) don't need to exist - it is
already all implemented.

Designing brand-new memory reclaim machinery in mid-2007 sounds like a very
bad idea.   But let us see what it looks like.



I did not mean to worry you again :-) We do not plan to implement brand
new memory reclaim, we intend to modify some bits and pieces for per
container reclaim. We believe at this point that all the necessary
infrastructure is largely present in container_isolate_pages(). Adding
a page cache controller should not require core-mm surgery, just the
accounting bits.

We basically agree that designing a brand new reclaim machinery is a bad
idea, non-container users will not be impacted. Only container driver
reclaim (caused by a container being at it's limit), will see some change
in reclaim behaviour and we shall try and restrict the changes to as
small as possible.

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch -mm 1/2] i386: add ptep_test_and_clear_{dirty,young}

2007-03-25 Thread Hugh Dickins
On Sun, 25 Mar 2007, David Rientjes wrote:
> Add ptep_test_and_clear_{dirty,young} to i386.  They advertise that they
> have it and there is at least one place where it needs to be called
> without the page table lock: to clear the accessed bit on write to

Without the page table lock??

> /proc/pid/clear_refs.
> 
> ptep_clear_flush_{dirty,young} are updated to use the new functions.  The
> overall net effect to current users of ptep_clear_flush_{dirty,young} is
> that we introduce an additional branch.

We need to Cc Zach on this: git blame indicates it was he who replaced
i386's ptep_test_and_clear_{dirty,young} by that "We don't actually
have these" comment - it looks a bit as if what you want to do might
violate the assumptions he wants to make, but I don't grasp it.

Hugh

> 
> Cc: Hugh Dickins <[EMAIL PROTECTED]>
> Cc: Ingo Molnar <[EMAIL PROTECTED]>
> Signed-off-by: David Rientjes <[EMAIL PROTECTED]>
> ---
>  include/asm-i386/pgtable.h |   25 +
>  1 files changed, 17 insertions(+), 8 deletions(-)
> 
> diff --git a/include/asm-i386/pgtable.h b/include/asm-i386/pgtable.h
> --- a/include/asm-i386/pgtable.h
> +++ b/include/asm-i386/pgtable.h
> @@ -283,12 +283,23 @@ do {
> \
>   }   \
>  } while (0)
>  
> -/*
> - * We don't actually have these, but we want to advertise them so that
> - * we can encompass the flush here.
> - */
>  #define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_DIRTY
> +static inline int ptep_test_and_clear_dirty(struct vm_area_struct *vma,
> + unsigned long addr, pte_t *ptep)
> +{
> + if (!pte_dirty(*ptep))
> + return 0;
> + return test_and_clear_bit(_PAGE_BIT_DIRTY, >pte_low);
> +}
> +
>  #define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
> +static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
> + unsigned long addr, pte_t *ptep)
> +{
> + if (!pte_young(*ptep))
> + return 0;
> + return test_and_clear_bit(_PAGE_BIT_ACCESSED, >pte_low);
> +}
>  
>  /*
>   * Rules for using ptep_establish: the pte MUST be a user pte, and
> @@ -305,9 +316,8 @@ do {  
> \
>  #define ptep_clear_flush_dirty(vma, address, ptep)   \
>  ({   \
>   int __dirty;\
> - __dirty = pte_dirty(*(ptep));   \
> + __dirty = ptep_test_and_clear_dirty((vma), (address), (ptep));  \
>   if (__dirty) {  \
> - clear_bit(_PAGE_BIT_DIRTY, &(ptep)->pte_low);   \
>   pte_update_defer((vma)->vm_mm, (address), (ptep));  \
>   flush_tlb_page(vma, address);   \
>   }   \
> @@ -318,9 +328,8 @@ do {  
> \
>  #define ptep_clear_flush_young(vma, address, ptep)   \
>  ({   \
>   int __young;\
> - __young = pte_young(*(ptep));   \
> + __young = ptep_test_and_clear_young((vma), (address), (ptep));  \
>   if (__young) {  \
> - clear_bit(_PAGE_BIT_ACCESSED, &(ptep)->pte_low);\
>   pte_update_defer((vma)->vm_mm, (address), (ptep));  \
>   flush_tlb_page(vma, address);   \
>   }   \
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: rSDl cpu scheduler version 0.34-test patch

2007-03-25 Thread Ingo Molnar

* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> > The patches available also include a rollup of sched: accurate user 
> > accounting as this code touches the same area and it is most 
> > convenient to include them together.
> 
> as i mentioned it before, please keep this one separate, as we want to 
> apply it first.

i.e. have the patch below the first patch of the series. Thanks,

Ingo

--->
Subject: [patch] sched: accurate user accounting
From: Con Kolivas <[EMAIL PROTECTED]>

Currently we only do cpu accounting to userspace based on what is 
actually happening precisely on each tick. The accuracy of that 
accounting gets progressively worse the lower HZ is. As we already keep 
accounting of nanosecond resolution we can accurately track user cpu, 
nice cpu and idle cpu if we move the accounting to update_cpu_clock with 
a nanosecond cpu_usage_stat entry. This increases overhead slightly but 
avoids the problem of tick aliasing errors making accounting unreliable.

Remove the now defunct Documentation/cpu-load.txt file.

Signed-off-by: Con Kolivas <[EMAIL PROTECTED]>
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 Documentation/cpu-load.txt  |  113 
 include/linux/kernel_stat.h |3 +
 include/linux/sched.h   |2 
 kernel/sched.c  |   58 +-
 kernel/timer.c  |5 -
 5 files changed, 60 insertions(+), 121 deletions(-)

Index: linux/Documentation/cpu-load.txt
===
--- linux.orig/Documentation/cpu-load.txt
+++ /dev/null
@@ -1,113 +0,0 @@
-CPU load
-
-
-Linux exports various bits of information via `/proc/stat' and
-`/proc/uptime' that userland tools, such as top(1), use to calculate
-the average time system spent in a particular state, for example:
-
-$ iostat
-Linux 2.6.18.3-exp (linmac) 02/20/2007
-
-avg-cpu:  %user   %nice %system %iowait  %steal   %idle
-  10.010.002.925.440.00   81.63
-
-...
-
-Here the system thinks that over the default sampling period the
-system spent 10.01% of the time doing work in user space, 2.92% in the
-kernel, and was overall 81.63% of the time idle.
-
-In most cases the `/proc/stat' information reflects the reality quite
-closely, however due to the nature of how/when the kernel collects
-this data sometimes it can not be trusted at all.
-
-So how is this information collected?  Whenever timer interrupt is
-signalled the kernel looks what kind of task was running at this
-moment and increments the counter that corresponds to this tasks
-kind/state.  The problem with this is that the system could have
-switched between various states multiple times between two timer
-interrupts yet the counter is incremented only for the last state.
-
-
-Example

-
-If we imagine the system with one task that periodically burns cycles
-in the following manner:
-
- time line between two timer interrupts
-|--|
- ^^
- |_ something begins working  |
-  |_ something goes to sleep
- (only to be awaken quite soon)
-
-In the above situation the system will be 0% loaded according to the
-`/proc/stat' (since the timer interrupt will always happen when the
-system is executing the idle handler), but in reality the load is
-closer to 99%.
-
-One can imagine many more situations where this behavior of the kernel
-will lead to quite erratic information inside `/proc/stat'.
-
-
-/* gcc -o hog smallhog.c */
-#include 
-#include 
-#include 
-#include 
-#define HIST 10
-
-static volatile sig_atomic_t stop;
-
-static void sighandler (int signr)
-{
- (void) signr;
- stop = 1;
-}
-static unsigned long hog (unsigned long niters)
-{
- stop = 0;
- while (!stop && --niters);
- return niters;
-}
-int main (void)
-{
- int i;
- struct itimerval it = { .it_interval = { .tv_sec = 0, .tv_usec = 1 },
- .it_value = { .tv_sec = 0, .tv_usec = 1 } };
- sigset_t set;
- unsigned long v[HIST];
- double tmp = 0.0;
- unsigned long n;
- signal (SIGALRM, );
- setitimer (ITIMER_REAL, , NULL);
-
- hog (ULONG_MAX);
- for (i = 0; i < HIST; ++i) v[i] = ULONG_MAX - hog (ULONG_MAX);
- for (i = 0; i < HIST; ++i) tmp += v[i];
- tmp /= HIST;
- n = tmp - (tmp / 3.0);
-
- sigemptyset ();
- sigaddset (, SIGALRM);
-
- for (;;) {
- hog (n);
- sigwait (, );
- }
- return 0;
-}
-
-
-References
---
-
-http://lkml.org/lkml/2007/2/12/6
-Documentation/filesystems/proc.txt (1.8)
-
-
-Thanks
---
-
-Con Kolivas, Pavel Machek
Index: linux/include/linux/kernel_stat.h
===
--- linux.orig/include/linux/kernel_stat.h
+++ linux/include/linux/kernel_stat.h

RE: [2.6 patch] arch/cris/arch-v32/kernel/fasttimer.c: remove kernel 2.0 code

2007-03-25 Thread Mikael Starvik
Yes, I agree totally. 

Signed-off-by: Mikael Starvik <[EMAIL PROTECTED]>

-Original Message-
From: Adrian Bunk [mailto:[EMAIL PROTECTED] 
Sent: Monday, March 26, 2007 6:08 AM
To: Mikael Starvik
Cc: dev-etrax; linux-kernel@vger.kernel.org
Subject: [2.6 patch] arch/cris/arch-v32/kernel/fasttimer.c: remove kernel
2.0 code


This patch removes code for kernel 2.0 (sic).

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---

 arch/cris/arch-v32/kernel/fasttimer.c |   36 ++
 1 file changed, 3 insertions(+), 33 deletions(-)

--- linux-2.6.21-rc4-mm1/arch/cris/arch-v32/kernel/fasttimer.c.old
2007-03-25 19:50:26.0 +0200
+++ linux-2.6.21-rc4-mm1/arch/cris/arch-v32/kernel/fasttimer.c  2007-03-25
19:52:36.0 +0200
@@ -602,38 +602,14 @@
 }
 
 #ifdef CONFIG_PROC_FS
-static int proc_fasttimer_read(char *buf, char **start, off_t offset, int
len
-#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,2,0)
-   ,int *eof, void *data_unused
-#else
-,int unused
-#endif
-   );
-#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,2,0)
-static struct proc_dir_entry *fasttimer_proc_entry;
-#else
-static struct proc_dir_entry fasttimer_proc_entry =
-{
-  0, 9, "fasttimer",
-  S_IFREG | S_IRUGO, 1, 0, 0,
-  0, NULL /* ops -- default to array */,
-  _fasttimer_read /* get_info */,
-};
-#endif
-#endif /* CONFIG_PROC_FS */
 
-#ifdef CONFIG_PROC_FS
+static struct proc_dir_entry *fasttimer_proc_entry;
 
 /* This value is very much based on testing */
 #define BIG_BUF_SIZE (500 + NUM_TIMER_STATS * 300)
 
-static int proc_fasttimer_read(char *buf, char **start, off_t offset, int
len
-#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,2,0)
-   ,int *eof, void *data_unused
-#else
-,int unused
-#endif
-   )
+static int proc_fasttimer_read(char *buf, char **start, off_t offset, int
len,
+  int *eof, void *data_unused)
 {
   unsigned long flags;
   int i = 0;
@@ -808,9 +784,7 @@
 
   memcpy(buf, bigbuf + offset, len);
   *start = buf;
-#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,2,0)
   *eof = 1;
-#endif
 
   return len;
 }
@@ -974,12 +948,8 @@
 printk("fast_timer_init()\n");
 
 #ifdef CONFIG_PROC_FS
-#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,2,0)
if ((fasttimer_proc_entry = create_proc_entry( "fasttimer", 0, 0 )))
  fasttimer_proc_entry->read_proc = proc_fasttimer_read;
-#else
-proc_register_dynamic(_root, _proc_entry);
-#endif
 #endif /* PROC_FS */
 if(request_irq(TIMER_INTR_VECT, timer_trig_interrupt, IRQF_DISABLED,
"fast timer int", NULL))

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: rSDl cpu scheduler version 0.34-test patch

2007-03-25 Thread Ingo Molnar

* Con Kolivas <[EMAIL PROTECTED]> wrote:

> The patches available also include a rollup of sched: accurate user 
> accounting as this code touches the same area and it is most 
> convenient to include them together.

as i mentioned it before, please keep this one separate, as we want to 
apply it first.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 3/3] add the 'menu' cpuidle governor

2007-03-25 Thread Shaohua Li
Hi,
On Sat, 2007-03-24 at 03:47 -0400, Adam Belay wrote:
> This patch adds the 'menu' governor, as was described in my first email.
> 

> +/**
> + * menu_select - selects the next idle state to enter
> + * @dev: the CPU
> + */
> +static int menu_select(struct cpuidle_device *dev)
> +{
> + struct menu_device *data = &__get_cpu_var(menu_devices);
> + int i, expected_us, max_state = dev->state_count;
> +
> + /* discard BM history because it is sticky */
> + cpuidle_get_bm_activity();
Why discard BM history here? This way the next bm check almost always
return 0.
BTW, bm activity is global (Not cpu specific), we'd better account it
system wide.

> + /* determine the expected residency time */
> + expected_us = (s32) ktime_to_ns(tick_nohz_get_sleep_length()) / 1000;
> + expected_us = min(expected_us, data->break_last_us);
> +
> + /* determine the maximum state compatible with current BM status */
> + if (cpuidle_get_bm_activity())
> + data->bm_elapsed_us = 0;
> + if (data->bm_elapsed_us <= data->bm_holdoff_us)
> + max_state = data->deepest_bm_state + 1;
> +
> + /* find the deepest idle state that satisfies our constraints */
> + for (i = 1; i < max_state; i++) {
> + struct cpuidle_state *s = >states[i];
> + if (s->target_residency > expected_us)
> + break;
> + if (s->exit_latency > system_latency_constraint())
> + break;
> + }
> +
> + data->last_state_idx = i - 1;
> + data->idle_jiffies = tick_nohz_get_idle_jiffies();
> + return i - 1;
> +}
> +
> +/**
> + * menu_reflect - attempts to guess what happened after entry
> + * @dev: the CPU
> + *
> + * NOTE: it's important to be fast here because this operation will add to
> + *   the overall exit latency.
> + */
> +static void menu_reflect(struct cpuidle_device *dev)
> +{
> + struct menu_device *data = &__get_cpu_var(menu_devices);
> + int last_idx = data->last_state_idx;
> + int measured_us = cpuidle_get_last_residency(dev);
> + struct cpuidle_state *target = >states[last_idx];
> +
> + /*
> +  * Ugh, this idle state doesn't support residency measurements, so we
> +  * are basically lost in the dark.  As a compromise, assume we slept
> +  * for one full standard timer tick.  However, be aware that this
> +  * could potentially result in a suboptimal state transition.
> +  */
> + if (!(target->flags & CPUIDLE_FLAG_TIME_VALID))
> + measured_us = USEC_PER_SEC / HZ;
> +
> + data->bm_elapsed_us += measured_us;
> + data->break_elapsed_us += measured_us;
See the system state: idle->running->idle
Looks the bm_elapsed_us and break_elapsed_us account ingored the running
state between the two idles. Eg, the 'running' might generate a lot of
bm activity, then maybe we should reset bm_elapsed_us in the next
'idle'.

Thanks,
Shaohua
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [3/6] 2.6.21-rc4: known regressions

2007-03-25 Thread Jeff Chua

On 3/26/07, Adrian Bunk <[EMAIL PROTECTED]> wrote:


> Resume from RAM (s2ram) still broke (tried with or without
> CONFIG_NO_HZ). Suspend to RAM seems ok, but upon resume, the screen
> will only display "inu" and only after pressing the power button will
> the system return to console. But "date" still doesn't advance.

This might be related to the following regression:

Subject: first disk access after resume takes several minutes
 ('date' does not advance after resume from RAM, CONFIG_NO_HZ=n)
References : http://lkml.org/lkml/2007/3/8/117
 http://lkml.org/lkml/2007/3/25/20
Submitter  : Michael S. Tsirkin <[EMAIL PROTECTED]>
Handled-By : Thomas Gleixner <[EMAIL PROTECTED]>
 Ingo Molnar <[EMAIL PROTECTED]>
Status : problem is being debugged


Adrian,

It's related. I tested without CONFIG_HPET_TIMER, and now my X60 can
suspend and resume from RAM (s2ram). Even better, it works
with/without CONFIG_NO_HZ.

But, suspend to disk still broke with CONFIG_NO_HZ set.

Thanks,
Jeff.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] sched: accurate user accounting

2007-03-25 Thread Mike Galbraith
On Mon, 2007-03-26 at 08:11 +0300, Al Boldi wrote:

> > +   /* Sanity check. It should never go backwards or ruin accounting
> > */ +   if (unlikely(now < p->last_ran))
> > +   goto out_set;
> 
> If sched_clock() goes backwards, why not fix it, instead of hacking around 
> it?

When tasks change cpu, timestamp correction is attempted, but isn't
perfect (perfection: submit patch to read remote clock - people likely
toast poor submitter's buns very thoroughly).  Timewarps happen, and
even on Intel processors there seems to be very small difference in TSCs
in the same package.  (that's what instrumentation here said must be
true, possible booboos aside)

-Mike

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Linux-VServer example results for sharing vs. separate mappings ...

2007-03-25 Thread Andrew Morton
On Mon, 26 Mar 2007 08:06:07 +0530 Balbir Singh <[EMAIL PROTECTED]> wrote:

> Andrew Morton wrote:
> >> Don't we break the global LRU with this scheme?
> > 
> > Sure, but that's deliberate!
> > 
> > (And we don't have a global LRU - the LRUs are per-zone).
> > 
> 
> Yes, true. But if we use zones for containers and say we have 400
> of them, with all of them under limit. When the system wants
> to reclaim memory, we might not end up reclaiming the best pages.
> Am I missing something?

If a zone is under its min_pages limit, it needs reclaim.  Who/when/why
that reclaim is run doesn't really matter.

Yeah, we might run into some scaling problems with that many zones. 
They're unlikely to be unfixable.


> >>> b) Create a new memory abstraction, call it the "software zone", which
> >>>is mostly decoupled from the present "hardware zones".  Most of the MM
> >>>is reworked to use "software zones".  The "software zones" are
> >>>runtime-resizeable, and obtain their pages via some means from the
> >>>hardware zones.  A container uses a software zone.
> >>>
> >> I think the problem would be figuring out where to allocate memory from?
> >> What happens if a software zone spans across many hardware zones?
> > 
> > Yes, that would be the tricky part.  But we generally don't care what
> > physical zone user pages come from, apart from NUMA optimisation.
> > 
> >> The reclaim mechanism proposed *does not impact the non-container users*.
> > 
> > Yup.  Let's keep plugging away with Pavel's approach, see where it gets us.
> > 
> 
> Yes, we have some changes that we've made to the reclaim logic, we hope
> to integrate a page cache controller soon. We are also testing the
> patches. Hopefully soon enough, they'll be in a good state and we can
> request you to merge the containers and the rss limit (plus page cache)
> controller soon.

Now I'm worried again.  This separation between "rss controller" and
"pagecache" is largely alien to memory reclaim.  With physical containers
these new concepts (and their implementations) don't need to exist - it is
already all implemented.

Designing brand-new memory reclaim machinery in mid-2007 sounds like a very
bad idea.   But let us see what it looks like.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Patch 3/7] integrity: EVM as an integrity service provider

2007-03-25 Thread Andrew Morton
On Sun, 25 Mar 2007 23:13:02 -0400 Mimi Zohar <[EMAIL PROTECTED]> wrote:

> On Sun, 2007-03-25 at 00:16 -0800, Andrew Morton wrote:
> > On Fri, 23 Mar 2007 12:09:36 -0400 Mimi Zohar <[EMAIL PROTECTED]> wrote:
> > 
> > > +++ linux-2.6.21-rc4-mm1/security/evm/Kconfig
> > > @@ -0,0 +1,17 @@
> > > +config INTEGRITY_EVM
> > > + boolean "EVM support"
> > > + depends on INTEGRITY && KEYS
> > > + select CRYPTO_HMAC
> > > + select CRYPTO_MD5
> > > + select CRYPTO_SHA1
> > > + default 0
> > > + help
> > > +   The Extended Verification Module is an integrity provider.
> > > +   An extensible set of extended attributes, as defined in
> > > +   /etc/evm.conf, are HMAC protected against modification
> > > +   using the TPM's KERNEL ROOT KEY, if configured, or with a
> > > +   pass-phrase.  Possible extended attributes include authenticity,
> > > +   integrity, and revision level.
> > > +
> > > +   If you are unsure how to answer this question, answer N.
> > > +
> > 
> > Is no dependency upon TPM needed?
> 
> It's obviously preferable to have and use a TPM, but if one is not
> available you can use a pass-phrase.
> 

So it will compile and run OK with CONFIG_TPM=n?  And with
CONFIG_INTEGRITY_EVM=y, CONFIG_TPM=m?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] sched: accurate user accounting

2007-03-25 Thread Al Boldi
Con Kolivas wrote:
>
> Ok this one is heavily tested. Please try it when you find the time.

It's better, but still skewed.  Try two chew.c's; they account 80% each.

> ---
> Currently we only do cpu accounting to userspace based on what is
> actually happening precisely on each tick. The accuracy of that
> accounting gets progressively worse the lower HZ is. As we already keep
> accounting of nanosecond resolution we can accurately track user cpu,
> nice cpu and idle cpu if we move the accounting to update_cpu_clock with
> a nanosecond cpu_usage_stat entry.

That's great and much needed, but this is still probed; so what's wrong with 
doing it in-lined?

> This increases overhead slightly but
> avoids the problem of tick aliasing errors making accounting unreliable.

Higher scheduling accuracy may actually offset any overhead incurred, so it's 
well worth it; and if it's in-lined it should mean even less overhead.

> +   /* Sanity check. It should never go backwards or ruin accounting
> */ +   if (unlikely(now < p->last_ran))
> +   goto out_set;

If sched_clock() goes backwards, why not fix it, instead of hacking around 
it?


Thanks!

--
Al

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: rSDl cpu scheduler version 0.34-test patch

2007-03-25 Thread Mike Galbraith
On Mon, 2007-03-26 at 11:00 +1000, Con Kolivas wrote:
> This is just for testing at the moment! The reason is the size of this patch.

(no testing done yet, but I have a couple comments)

> In the interest of evolution, I've taken the RSDL cpu scheduler and increased 
> the resolution of the task timekeeping to nanosecond resolution.

+   /* All the userspace visible cpu accounting is done here */
+   time_diff = now - p->last_ran;
...
+   /* cpu scheduler quota accounting is performed here */
+   if (p->policy != SCHED_FIFO)
+   p->time_slice -= time_diff;

If we still have any jiffies resolution clocks out there, this could be
a bit problematic.

+static inline void enqueue_pulled_task(struct rq *src_rq, struct rq *rq,
+  struct task_struct *p)
+{
+   int queue_prio;
+
+   p->array = rq->active; <== set
+   if (!rt_task(p)) {
+   if (p->rotation == src_rq->prio_rotation) {
+   if (p->array == src_rq->expired) { <== evaluate
+   queue_expired(p, rq);
+   goto out_queue;
+   }
+   if (p->time_slice < 0)
+   task_new_array(p, rq);
+   } else
+   task_new_array(p, rq);
+   }
+   queue_prio = next_entitled_slot(p, rq);

(bug aside, this special function really shouldn't exist imho, because
there's nothing special going on.  we didn't need it before to do the
same thing, so we shouldn't need it now.)

+static void recalc_task_prio(struct task_struct *p, struct rq *rq)
+{
+   struct prio_array *array = rq->active;
+   int queue_prio;
+
+   if (p->rotation == rq->prio_rotation) {
+   if (p->array == array) {
+   if (p->time_slice > 0)
+   return;
+   p->time_slice = p->quota;
+   } else if (p->array == rq->expired) {
+   queue_expired(p, rq);
+   return;
+   } else
+   task_new_array(p, rq);
+   } else

Dequeueing a task still leaves a stale p->array laying around to be
possibly evaluated later.  try_to_wake_up() doesn't currently evaluate
and set p->rotation (but should per design doc), so when you get here, a
cross-cpu waking task won't continue it's rotation.  If it did evaluate
and set, recalc_task_prio() would evaluate the guaranteed to fail these
tests array pointer, so the task will still not continue it's rotation.
Stale pointers are evil.

-Mike

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Use attribute groups in struct device_type

2007-03-25 Thread Dmitry Torokhov
Hi Greg,

On Saturday 10 March 2007 02:22, Greg KH wrote:
> On Sat, Mar 10, 2007 at 02:12:04AM -0500, Dmitry Torokhov wrote:
> > On Saturday 10 March 2007 01:55, Greg KH wrote:
> > > On Fri, Mar 09, 2007 at 10:54:43PM -0800, Greg KH wrote:
> > > > On Sat, Mar 10, 2007 at 01:37:34AM -0500, Dmitry Torokhov wrote:
> > > > > Greg,
> > > > > 
> > > > > Please consider applying the patch below. It switches struct 
> > > > > device_type
> > > > > to using attribute groups which os more flexible. I am using it in my
> > > > > input class_device -> device conversion (which is 99% done btw).
> > > > 
> > > > Argh, I never sent you my version of that, did I?  Very sorry about
> > > > that, I was working on fixing up the device namespace issue first, which
> > > > isn't done yet :(
> > > > 
> > > > Anyway, my patch that did that is below, feel free to use it or not if
> > > > you want.
> > > > 
> > > > > I looked through -mm and the latest git and there does not seem to be
> > > > > any users of struct device_type yet...
> > > > 
> > > > Yes, the input patch below uses it and I have a block-device patch from
> > > > Kay in my tree that Andrew doesn't pull from (as it's usually really
> > > > messed up and I know to hide this kind of breakage from him...)
> > > 
> > > Oops, that patch didn't use it, this follow-on patch from Kay uses them.
> > > 
> > 
> > Ok, so input portion in your tree does not use type->attrs so we don't
> > have a conflict here. Unless my patch messes up Kay's blockdev patch
> > badly I'd like you to accept it. Input uses 3 attribute groups and I
> > don't want to open-code their creation/removal.
> 
> I'll take your patch and see if it messes up Kay's.  If it does, I'm
> sure he will fix it up for me later :)
> 

Here is the patch converting input core to struct device. It is a bit
different from yours as it does full conversion of handles into devices
(instead of just associating dynamically created device structure with
handles). The patch depends on bunch of other changes in my tree and
so will not apply to mainline so if you'd like to try it just pull
from my tree and then apply. I am interested whether new udev will
handle this.

I would also like to put it in my tree but it depends on the patch
adding attribute groups to the struct device_type which is in your
domain. What would be the best way for getting this to -mm?   

-- 
Dmitry

Input: convert from class devices to regular devices

Signed-off-by: Dmitry Torokhov <[EMAIL PROTECTED]>
---

 drivers/input/evdev.c|   84 +--
 drivers/input/input.c|  135 +--
 drivers/input/joydev.c   |   84 +--
 drivers/input/mousedev.c |  204 ++-
 drivers/input/tsdev.c|   66 ++-
 include/linux/input.h|   19 ++--
 6 files changed, 313 insertions(+), 279 deletions(-)

Index: work/drivers/input/input.c
===
--- work.orig/drivers/input/input.c
+++ work/drivers/input/input.c
@@ -443,7 +443,7 @@ static int input_attach_handler(struct i
printk(KERN_ERR
"input: failed to attach handler %s to device %s, "
"error: %d\n",
-   handler->name, kobject_name(>cdev.kobj), error);
+   handler->name, kobject_name(>dev.kobj), error);
 
return error;
 }
@@ -528,7 +528,7 @@ static void input_seq_print_bitmap(struc
 static int input_devices_seq_show(struct seq_file *seq, void *v)
 {
struct input_dev *dev = container_of(v, struct input_dev, node);
-   const char *path = kobject_get_path(>cdev.kobj, GFP_KERNEL);
+   const char *path = kobject_get_path(>dev.kobj, GFP_KERNEL);
struct input_handle *handle;
 
seq_printf(seq, "I: Bus=%04x Vendor=%04x Product=%04x Version=%04x\n",
@@ -683,15 +683,17 @@ static inline int input_proc_init(void) 
 static inline void input_proc_exit(void) { }
 #endif
 
-#define INPUT_DEV_STRING_ATTR_SHOW(name)   
\
-static ssize_t input_dev_show_##name(struct class_device *dev, char *buf)  
\
-{  
\
-   struct input_dev *input_dev = to_input_dev(dev);
\
-   
\
-   return scnprintf(buf, PAGE_SIZE, "%s\n",
\
-input_dev->name ? input_dev->name : "");   
\
-}  
\
-static CLASS_DEVICE_ATTR(name, S_IRUGO, input_dev_show_##name, NULL);
+#define INPUT_DEV_STRING_ATTR_SHOW(name)   \
+static ssize_t input_dev_show_##name(struct device *dev,   \
+struct device_attribute *attr, \
+  

Re: [BUG] Patch "CPU hotplug: call check_tsc_sync_source() with irqs off" breaks some drivers

2007-03-25 Thread Nicolas Boichat
Nicolas Boichat wrote:
> Hello,
> 
> I'm running a Macbook Pro first generation (Core Duo, so x86).
> 
> I ran accross these two problems while upgrading from 2.6.21-rc3 to the
> current git HEAD:
>  1. appletouch cannot initialize the device properly at boot time (the
> module is automatically loaded by Gentoo), I have to reload the module
> to get it to work.
>  2. ALSA hda_intel (patch_sigmatel) fails to read properly the subsystem
> id. "head  /proc/asound/card0/codec#0" returns:
>> Codec: SigmaTel STAC9221 A1
>> Address: 0
>> Vendor Id: 0x83847680
>> Subsystem Id: 0x100
>> Revision Id: 0x103401
> while I expect this subsystem id: 0x106b0200.
> This is due to a read failure in sound/pci/hda/hda_codec.c at line 553
> (I have to reboot on OS X to get the id correct again, and it seems to
> happen quite randomly, so I'm not absolutely certain this bug is
> related, but it's something i have never seen before...)
> 
> I found out which commit seems to cause these bugs:
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=d04f41e35343f1d788551fd3f753f51794f4afcf
> 
> The latest GIT without this commit works fine, but doesn't with it.

Sorry about blaming this commit. The problems happen randomly (about 1
reboot over 2 is ok, at least with 2.6.21-rc4). I'll run more tests and
post the results later.

Best regards,

Nicolas

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] fat/vfat: optionally ignore system timezone offset when reading/writing timestamps

2007-03-25 Thread Hiroyuki Machida
HI OGAWA-san and Paul-san,

Sorry late response,

I'm not famillar with recent fat code, but code itself looks good for
just turn on/off time adjusting. On the other hand, I feel we need more 
consideration on use cases/requirements. I feel that turning off
time adjustment is a just ad-hoc solution to issues like Paul san 
brought up.

Thanks,
Hiroyuki
 

OGAWA Hirofumi <[EMAIL PROTECTED]> wrote on 2007/03/19 23:31:06:
> Paul Collins <[EMAIL PROTECTED]> writes:
> 
> > Hello,
> 
> Hello,
> 
> > Here is a patch that adds a mount option named "posixtime" that, when
> > enabled, causes the fat/vfat code to not adjust timestamps as they are
> > read/written to/from disk.  The intent of the adjustment as performed
> > by the existing code appears to be to present correct timestamps to
> > Windows and friends, which treat the timestamp values as local time.
> >
> > However, the systems that I use to update my FAT32 filesystems are all
> > running Linux, and as such will process POSIX timestamps correctly.
> > (The filesystems are on disks in digital audio players, which do not
> > process timestamps except to check if they have changed.)  Due to the
> > aforementioned adjustment on read/write, after a Daylight Savings
> > transition I must update the file timestamps on the filesystems so
> > that rsync does not needlessly re-copy files.
> >
> > Please review and consider applying this patch.
> 
> Thanks. Looks good to me as start. IIRC, Machida-san did this few
> years ago. Machida-san, do you have any comment?
> 
> > diff --git a/fs/fat/dir.c b/fs/fat/dir.c
> > index c16af24..d6c0a7f 100644
> > --- a/fs/fat/dir.c
> > +++ b/fs/fat/dir.c
> > @@ -1064,7 +1064,7 @@ int fat_alloc_new_dir(struct inode *dir, struct 
> timespec *ts)
> >goto error_free;
> >   }
> >
> > - fat_date_unix2dos(ts->tv_sec, , );
> > + fat_date_unix2dos(ts->tv_sec, , , sbi->options.adjust);
> >
> >   de = (struct msdos_dir_entry *)bhs[0]->b_data;
> >   /* filling the new directory slots ("." and ".." entries) */
> > diff --git a/fs/fat/inode.c b/fs/fat/inode.c
> > index a9e4688..02d9225 100644
> > --- a/fs/fat/inode.c
> > +++ b/fs/fat/inode.c
> > @@ -374,17 +374,20 @@ static int fat_fill_inode(struct inode *inode, 
> struct msdos_dir_entry *de)
> >   inode->i_blocks = ((inode->i_size + (sbi->cluster_size - 1))
> >& ~((loff_t)sbi->cluster_size - 1)) >> 9;
> >   inode->i_mtime.tv_sec =
> > -  date_dos2unix(le16_to_cpu(de->time), le16_to_cpu(de->date));
> > +  date_dos2unix(le16_to_cpu(de->time), le16_to_cpu(de->date),
> > + sbi->options.adjust);
> >   inode->i_mtime.tv_nsec = 0;
> >   if (sbi->options.isvfat) {
> >int secs = de->ctime_cs / 100;
> >int csecs = de->ctime_cs % 100;
> >inode->i_ctime.tv_sec  =
> > date_dos2unix(le16_to_cpu(de->ctime),
> > -  le16_to_cpu(de->cdate)) + secs;
> > +  le16_to_cpu(de->cdate),
> > +  sbi->options.adjust) + secs;
> >inode->i_ctime.tv_nsec = csecs * 1000;
> >inode->i_atime.tv_sec =
> > -   date_dos2unix(0, le16_to_cpu(de->adate));
> > +   date_dos2unix(0, le16_to_cpu(de->adate),
> > +  sbi->options.adjust);
> >inode->i_atime.tv_nsec = 0;
> >   } else
> >inode->i_ctime = inode->i_atime = inode->i_mtime;
> > @@ -592,11 +595,14 @@ retry:
> >   raw_entry->attr = fat_attr(inode);
> >   raw_entry->start = cpu_to_le16(MSDOS_I(inode)->i_logstart);
> >   raw_entry->starthi = cpu_to_le16(MSDOS_I(inode)->i_logstart >> 16);
> > - fat_date_unix2dos(inode->i_mtime.tv_sec, _entry->time, 
> _entry->date);
> > + fat_date_unix2dos(inode->i_mtime.tv_sec, _entry->time, 
> _entry->date,
> > + sbi->options.adjust);
> >   if (sbi->options.isvfat) {
> >__le16 atime;
> > - 
> fat_date_unix2dos(inode->i_ctime.tv_sec,_entry->ctime,_entry->cd
> ate);
> > -  fat_date_unix2dos(inode->i_atime.tv_sec,,_entry->adate);
> > + 
> fat_date_unix2dos(inode->i_ctime.tv_sec,_entry->ctime,_entry->cd
> ate,
> > +  sbi->options.adjust);
> > +  fat_date_unix2dos(inode->i_atime.tv_sec,,_entry->adate,
> > +  sbi->options.adjust);
> >raw_entry->ctime_cs = (inode->i_ctime.tv_sec & 1) * 100 +
> > inode->i_ctime.tv_nsec / 1000;
> >   }
> > @@ -854,7 +860,7 @@ enum {
> >   Opt_charset, Opt_shortname_lower, Opt_shortname_win95,
> >   Opt_shortname_winnt, Opt_shortname_mixed, Opt_utf8_no, Opt_utf8_yes,
> >   Opt_uni_xl_no, Opt_uni_xl_yes, Opt_nonumtail_no, Opt_nonumtail_yes,
> > - Opt_obsolate, Opt_flush, Opt_err,
> > + Opt_obsolate, Opt_flush, Opt_posixtime, Opt_err,
> >  };
> >
> >  static match_table_t fat_tokens = {
> > @@ -887,6 +893,7 @@ static match_table_t fat_tokens = {
> >   {Opt_obsolate, "cvf_options=%100s"},
> >   {Opt_obsolate, "posix"},
> >   {Opt_flush, "flush"},
> > + {Opt_posixtime, "posixtime"},
> >   {Opt_err, NULL},
> >  };
> >  static match_table_t msdos_tokens = {
> > @@ -950,6 +957,7 @@ static int parse_options(char *options, int 
> is_vfat, int silent, int *debug,
> >   opts->utf8 = opts->unicode_xlate = 0;
> >   

Re: [2.6.22 patch] more scheduled OSS driver removal

2007-03-25 Thread Lee Revell

On 3/25/07, Richard Knutsson <[EMAIL PROTECTED]> wrote:

Adrian Bunk wrote:
> This patch contains the scheduled removal of OSS drivers that:
> - have ALSA drivers for the same hardware without known regressions and
> - whose Kconfig options have been removed in 2.6.20.
>
Sorry for this late respond

Is sound/oss/emu10k1/ really ready to be replaced? According to a friend
of mine, who got the hardware, the ALSA still lacks feature provided by
the OSS-driver. He mentioned some sort of programmable memory (forgot
the name for that) with whom you can run small programs to manipulate
the sounds (adding echo and such).


He's referring to EMU DSP manager.  (emudspmgr I think it's called).
The ALSA equivalent is ld10k1.  It might not be as mature as the OSS
equivalent, but are we really going to keep obsolete drivers in the
kernel just because the userspace tools for the old driver are more
mature?


Also, he got problems with getting
more accurate control of the volume (guess this is more a userland issue).
He seems to have done his homework but only got to pages where other
peoples had the same problem.



I guess he's referring to the well known "Master volume only controls
front output" problem.  This really does need to be resolved, as many
other ALSA drivers are effected.

Lee
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC: 2.6 patch] sisfb_get_fix(): remove the unused second parameter

2007-03-25 Thread Adrian Bunk
Most likely this patch isn't correct but points to a bug.

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---

 drivers/video/sis/sis_main.c |   11 ---
 drivers/video/sis/sis_main.h |4 ++--
 2 files changed, 6 insertions(+), 9 deletions(-)

--- linux-2.6.21-rc4-mm1/drivers/video/sis/sis_main.h.old   2007-03-26 
00:26:53.0 +0200
+++ linux-2.6.21-rc4-mm1/drivers/video/sis/sis_main.h   2007-03-26 
00:27:10.0 +0200
@@ -672,8 +672,8 @@
 SISINITSTATIC int sisfb_init(void);
 
 /* fbdev routines */
-static int sisfb_get_fix(struct fb_fix_screeninfo *fix, int con,
-   struct fb_info *info);
+static int sisfb_get_fix(struct fb_fix_screeninfo *fix,
+ struct fb_info *info);
 
 static int sisfb_ioctl(struct fb_info *info, unsigned int cmd,
unsigned long arg);
--- linux-2.6.21-rc4-mm1/drivers/video/sis/sis_main.c.old   2007-03-26 
00:25:33.0 +0200
+++ linux-2.6.21-rc4-mm1/drivers/video/sis/sis_main.c   2007-03-26 
00:26:35.0 +0200
@@ -1430,11 +1430,8 @@
if((err = sisfb_do_set_var(>var, 1, info)))
return err;
 
-#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,10)
-   sisfb_get_fix(>fix, info->currcon, info);
-#else
-   sisfb_get_fix(>fix, -1, info);
-#endif
+   sisfb_get_fix(>fix, info);
+
return 0;
 }
 
@@ -1853,7 +1850,7 @@
 }
 
 static int
-sisfb_get_fix(struct fb_fix_screeninfo *fix, int con, struct fb_info *info)
+sisfb_get_fix(struct fb_fix_screeninfo *fix, struct fb_info *info)
 {
struct sis_video_info *ivideo = (struct sis_video_info *)info->par;
 
@@ -6386,7 +6383,7 @@
sis_fb_info->fix = ivideo->sisfb_fix;
sis_fb_info->screen_base = ivideo->video_vbase + 
ivideo->video_offset;
sis_fb_info->fbops = _ops;
-   sisfb_get_fix(_fb_info->fix, -1, sis_fb_info);
+   sisfb_get_fix(_fb_info->fix, sis_fb_info);
sis_fb_info->pseudo_palette = ivideo->pseudo_palette;
 
fb_alloc_cmap(_fb_info->cmap, 256 , 0);

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC: 2.6 patch] drivers/scsi/ips.c: remove kernel 2.4 code

2007-03-25 Thread Adrian Bunk
This patch removes some kernel 2.4 code.

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---

 drivers/scsi/ips.c |  145 +
 drivers/scsi/ips.h |   44 -
 2 files changed, 20 insertions(+), 169 deletions(-)

--- linux-2.6.21-rc4-mm1/drivers/scsi/ips.h.old 2007-03-25 20:10:28.0 
+0200
+++ linux-2.6.21-rc4-mm1/drivers/scsi/ips.h 2007-03-25 20:22:28.0 
+0200
@@ -58,10 +58,6 @@
/*
 * Some handy macros
 */
-   #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,4,20) || defined CONFIG_HIGHIO
-  #define IPS_HIGHIO
-   #endif
-
#define IPS_HA(x)   ((ips_ha_t *) x->hostdata)
#define IPS_COMMAND_ID(ha, scb) (int) (scb - ha->scbs)
#define IPS_IS_TROMBONE(ha) (((ha->device_id == 
IPS_DEVICEID_COPPERHEAD) && \
@@ -84,38 +80,8 @@
 #define IPS_SGLIST_SIZE(ha)   (IPS_USE_ENH_SGLIST(ha) ? \
  sizeof(IPS_ENH_SG_LIST) : 
sizeof(IPS_STD_SG_LIST))
 
-   #if LINUX_VERSION_CODE < KERNEL_VERSION(2,4,4)
-  #define pci_set_dma_mask(dev,mask) ( mask > 0x ? 1:0 )
-  #define scsi_set_pci_device(sh,dev) (0)
-   #endif
-
-   #ifndef IRQ_NONE
-  typedef void irqreturn_t;
-  #define IRQ_NONE
-  #define IRQ_HANDLED
-  #define IRQ_RETVAL(x)
-   #endif
-   
-   #if LINUX_VERSION_CODE < KERNEL_VERSION(2,5,0)
-  #define IPS_REGISTER_HOSTS(SHT)  
scsi_register_module(MODULE_SCSI_HA,SHT)
-  #define IPS_UNREGISTER_HOSTS(SHT)
scsi_unregister_module(MODULE_SCSI_HA,SHT)
-  #define IPS_ADD_HOST(shost,device)
-  #define IPS_REMOVE_HOST(shost)
-  #define IPS_SCSI_SET_DEVICE(sh,ha)   scsi_set_pci_device(sh, 
(ha)->pcidev)
-  #define IPS_PRINTK(level, pcidev, format, arg...) \
-printk(level "%s %s:" format , "ips" , \
-(pcidev)->slot_name , ## arg)
-  #define scsi_host_alloc(sh,size) scsi_register(sh,size)
-  #define scsi_host_put(sh) scsi_unregister(sh)
-   #else
-  #define IPS_REGISTER_HOSTS(SHT)  (!ips_detect(SHT))
-  #define IPS_UNREGISTER_HOSTS(SHT)
-  #define IPS_ADD_HOST(shost,device)   do { scsi_add_host(shost,device); 
scsi_scan_host(shost); } while (0)
-  #define IPS_REMOVE_HOST(shost)   scsi_remove_host(shost)
-  #define IPS_SCSI_SET_DEVICE(sh,ha)   do { } while (0)
-  #define IPS_PRINTK(level, pcidev, format, arg...) \
+  #define IPS_PRINTK(level, pcidev, format, arg...) \
 dev_printk(level , &((pcidev)->dev) , format , ## arg)
-   #endif
 
#define MDELAY(n)   \
do {\
@@ -134,7 +100,7 @@
#define pci_dma_hi32(a) ((a >> 16) >> 16)
#define pci_dma_lo32(a) (a & 0x)
 
-   #if (BITS_PER_LONG > 32) || (defined CONFIG_HIGHMEM64G && defined 
IPS_HIGHIO)
+   #if (BITS_PER_LONG > 32) || defined(CONFIG_HIGHMEM64G)
   #define IPS_ENABLE_DMA64(1)
#else
   #define IPS_ENABLE_DMA64(0)
@@ -451,16 +417,10 @@
/*
 * Scsi_Host Template
 */
-#if LINUX_VERSION_CODE < KERNEL_VERSION(2,5,0)
-   static int ips_proc24_info(char *, char **, off_t, int, int, int);
-   static void ips_select_queue_depth(struct Scsi_Host *, struct scsi_device 
*);
-   static int ips_biosparam(Disk *disk, kdev_t dev, int geom[]);
-#else
static int ips_proc_info(struct Scsi_Host *, char *, char **, off_t, int, 
int);
static int ips_biosparam(struct scsi_device *sdev, struct block_device 
*bdev,
sector_t capacity, int geom[]);
static int ips_slave_configure(struct scsi_device *SDptr);
-#endif
 
 /*
  * Raid Command Formats
--- linux-2.6.21-rc4-mm1/drivers/scsi/ips.c.old 2007-03-25 20:11:39.0 
+0200
+++ linux-2.6.21-rc4-mm1/drivers/scsi/ips.c 2007-03-25 20:20:35.0 
+0200
@@ -211,19 +211,6 @@
 #warning "This driver has only been tested on the x86/ia64/x86_64 platforms"
 #endif
 
-#if LINUX_VERSION_CODE <= KERNEL_VERSION(2,5,0)
-#include 
-#include "sd.h"
-#define IPS_LOCK_SAVE(lock,flags) spin_lock_irqsave(_request_lock,flags)
-#define IPS_UNLOCK_RESTORE(lock,flags) 
spin_unlock_irqrestore(_request_lock,flags)
-#ifndef __devexit_p
-#define __devexit_p(x) x
-#endif
-#else
-#define IPS_LOCK_SAVE(lock,flags) do{spin_lock(lock);(void)flags;}while(0)
-#define IPS_UNLOCK_RESTORE(lock,flags) 
do{spin_unlock(lock);(void)flags;}while(0)
-#endif
-
 #define IPS_DMA_DIR(scb) ((!scb->scsi_cmd || ips_is_passthru(scb->scsi_cmd) || 
\
  DMA_NONE == scb->scsi_cmd->sc_data_direction) ? \
  PCI_DMA_BIDIRECTIONAL : \
@@ -381,24 +368,13 @@
.eh_abort_handler   = ips_eh_abort,
.eh_host_reset_handler  = ips_eh_reset,
.proc_name  = "ips",
-#if LINUX_VERSION_CODE > KERNEL_VERSION(2,5,0)
.proc_info  = ips_proc_info,
.slave_configure= 

[2.6 patch] drivers/media/video/bt8xx/bttv-cards.c: fix off-by-one

2007-03-25 Thread Adrian Bunk
This patch fixes an off-by-one error spotted by the Coverity checker.

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---
--- linux-2.6.21-rc4-mm1/drivers/media/video/bt8xx/bttv-cards.c.old 
2007-03-26 00:00:07.0 +0200
+++ linux-2.6.21-rc4-mm1/drivers/media/video/bt8xx/bttv-cards.c 2007-03-26 
00:00:55.0 +0200
@@ -2983,7 +2983,7 @@
bttv_tvcards[btv->c.type].gpiomask = (UNSET != gpiomask) ? gpiomask : 
gpiobits;
printk(KERN_INFO "bttv%d: gpio config override: mask=0x%x, mux=",
   btv->c.nr,bttv_tvcards[btv->c.type].gpiomask);
-   for (i = 0; i < 5; i++) {
+   for (i = 0; i < 4; i++) {
printk("%s0x%x", i ? "," : "", 
bttv_tvcards[btv->c.type].gpiomux[i]);
}
printk("\n");

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[2.6 patch] drivers/video/sis/: remove more kernel 2.4 code

2007-03-25 Thread Adrian Bunk
This patch removes the remaining kernel 2.4 code.

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---

 drivers/video/sis/osdef.h|5 -
 drivers/video/sis/sis.h  |   50 
 drivers/video/sis/sis_main.c |  105 ++-
 3 files changed, 22 insertions(+), 138 deletions(-)

--- linux-2.6.21-rc4-mm1/drivers/video/sis/osdef.h.old  2007-03-25 
20:41:24.0 +0200
+++ linux-2.6.21-rc4-mm1/drivers/video/sis/osdef.h  2007-03-25 
20:41:38.0 +0200
@@ -58,9 +58,6 @@
 #define SIS_LINUX_KERNEL   /* Linux kernel framebuffer */
 #undef  SIS_XORG_XF86  /* XFree86/X.org */
 
-#undef SIS_LINUX_KERNEL_24
-#undef SIS_LINUX_KERNEL_26
-
 #ifdef OutPortByte
 #undef OutPortByte
 #endif
@@ -100,8 +97,6 @@
 #define SIS315H
 #endif
 
-#define SIS_LINUX_KERNEL_26
-
 #if !defined(SIS300) && !defined(SIS315H)
 #warning Neither CONFIG_FB_SIS_300 nor CONFIG_FB_SIS_315 is set
 #warning sisfb will not work!
--- linux-2.6.21-rc4-mm1/drivers/video/sis/sis.h.old2007-03-25 
20:43:55.0 +0200
+++ linux-2.6.21-rc4-mm1/drivers/video/sis/sis.h2007-03-25 
20:47:29.0 +0200
@@ -27,11 +27,7 @@
 #include 
 
 #include "osdef.h"
-#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,5,0)
 #include 
-#else
-#include 
-#endif
 
 #include "vgatypes.h"
 #include "vstruct.h"
@@ -40,12 +36,8 @@
 #define VER_MINOR  8
 #define VER_LEVEL  9
 
-#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,5,0)
 #include 
-#define SIS_PCI_GET_CLASS(a, b) pci_get_class(a, b)
-#define SIS_PCI_GET_DEVICE(a,b,c) pci_get_device(a,b,c)
-#define SIS_PCI_GET_SLOT(a,b) pci_get_slot(a,b)
-#define SIS_PCI_PUT_DEVICE(a) pci_dev_put(a)
+
 #ifdef CONFIG_COMPAT
 #if LINUX_VERSION_CODE <= KERNEL_VERSION(2,6,10)
 #include 
@@ -55,18 +47,7 @@
 #define SIS_NEW_CONFIG_COMPAT
 #endif
 #endif /* CONFIG_COMPAT */
-#else  /* 2.4 */
-#define SIS_PCI_GET_CLASS(a, b) pci_find_class(a, b)
-#define SIS_PCI_GET_DEVICE(a,b,c) pci_find_device(a,b,c)
-#define SIS_PCI_GET_SLOT(a,b) pci_find_slot(a,b)
-#define SIS_PCI_PUT_DEVICE(a)
-#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,4,19)
-#ifdef __x86_64__  /* Shouldn't we check for CONFIG_IA32_EMULATION here? */
-#include 
-#define SIS_OLD_CONFIG_COMPAT
-#endif
-#endif
-#endif /* 2.4 */
+
 #if LINUX_VERSION_CODE > KERNEL_VERSION(2,6,8)
 #define SIS_IOTYPE1 void __iomem
 #define SIS_IOTYPE2 __iomem
@@ -498,26 +479,8 @@
 
struct fb_var_screeninfo default_var;
 
-#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,5,0)
struct fb_fix_screeninfo sisfb_fix;
u32 pseudo_palette[17];
-#endif
-
-#if LINUX_VERSION_CODE < KERNEL_VERSION(2,5,0)
-   struct display   sis_disp;
-   struct display_switchsisfb_sw;
-   struct {
-   u16 red, green, blue, pad;
-   }   sis_palette[256];
-   union {
-#ifdef FBCON_HAS_CFB16
-   u16 cfb16[16];
-#endif
-#ifdef FBCON_HAS_CFB32
-   u32 cfb32[16];
-#endif
-   }   sis_fbcon_cmap;
-#endif
 
struct sisfb_monitor {
u16 hmin;
@@ -538,10 +501,6 @@
 
int mni;/* Mode number index */
 
-#if LINUX_VERSION_CODE < KERNEL_VERSION(2,5,0)
-   int currcon;
-#endif
-
unsigned long   video_size;
unsigned long   video_base;
unsigned long   mmio_size;
@@ -578,9 +537,6 @@
int sisfb_tvplug;
int sisfb_tvstd;
int sisfb_nocrt2rate;
-#if LINUX_VERSION_CODE < KERNEL_VERSION(2,5,0)
-   int sisfb_inverse;
-#endif
 
u32 heapstart;  /* offset  */
SIS_IOTYPE1 *sisfb_heap_start;  /* address */
@@ -646,9 +602,7 @@
int modechanged;
unsigned char   modeprechange;
 
-#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,5,0)
u8  sisfb_lastrates[128];
-#endif
 
int newrom;
int haveXGIROM;
--- linux-2.6.21-rc4-mm1/drivers/video/sis/sis_main.c.old   2007-03-25 
20:45:12.0 +0200
+++ linux-2.6.21-rc4-mm1/drivers/video/sis/sis_main.c   2007-03-25 
20:54:46.0 +0200
@@ -1948,7 +1948,7 @@
default:return NULL;
}
for(i = 0; i < nbridgenum; i++) {
-   if((pdev = SIS_PCI_GET_DEVICE(PCI_VENDOR_ID_SI,
+   if((pdev = pci_get_device(PCI_VENDOR_ID_SI,
nbridgeids[nbridgeidx+i], NULL)))
break;
}
@@ -4613,9 +4613,9 @@
unsigned short temp;
int ret = 0;
 
-   while((pdev = SIS_PCI_GET_CLASS(PCI_CLASS_BRIDGE_HOST, pdev))) {
+   while((pdev = pci_get_class(PCI_CLASS_BRIDGE_HOST, pdev))) {
temp = pdev->vendor;
-   SIS_PCI_PUT_DEVICE(pdev);
+   pci_dev_put(pdev);
if(temp == pcivendor) {
ret = 1;
break;
@@ 

[2.6 patch] drivers/scsi/pcmcia/nsp_cs.c: remove kernel 2.4 code

2007-03-25 Thread Adrian Bunk
This patch removes kernel 2.4 code and some ancient CVS tags.

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---

 drivers/scsi/pcmcia/nsp_cs.c |  159 +--
 drivers/scsi/pcmcia/nsp_cs.h |8 -
 2 files changed, 5 insertions(+), 162 deletions(-)

--- linux-2.6.21-rc4-mm1/drivers/scsi/pcmcia/nsp_cs.h.old   2007-03-25 
19:29:38.0 +0200
+++ linux-2.6.21-rc4-mm1/drivers/scsi/pcmcia/nsp_cs.h   2007-03-25 
19:33:31.0 +0200
@@ -10,8 +10,6 @@
 
 =*/
 
-/* $Id: nsp_cs.h,v 1.19 2003/08/18 11:09:19 elca Exp $ */
-
 #ifndef  __nsp_cs__
 #define  __nsp_cs__
 
@@ -227,13 +225,7 @@
 typedef struct scsi_info_t {
struct pcmcia_device*p_dev;
struct Scsi_Host  *host;
-#if (LINUX_VERSION_CODE >= KERNEL_VERSION(2,5,74))
dev_node_t node;
-#else
-   intndev;
-   dev_node_t node[8];
-   struct bus_operations *bus;
-#endif
intstop;
 } scsi_info_t;
 
--- linux-2.6.21-rc4-mm1/drivers/scsi/pcmcia/nsp_cs.c.old   2007-03-25 
19:30:03.0 +0200
+++ linux-2.6.21-rc4-mm1/drivers/scsi/pcmcia/nsp_cs.c   2007-03-25 
19:40:56.0 +0200
@@ -25,8 +25,6 @@
 
 ***/
 
-/* $Id: nsp_cs.c,v 1.23 2003/08/18 11:09:19 elca Exp $ */
-
 #include 
 #include 
 #include 
@@ -59,7 +57,7 @@
 #include "nsp_cs.h"
 
 MODULE_AUTHOR("YOKOTA Hiroshi <[EMAIL PROTECTED]>");
-MODULE_DESCRIPTION("WorkBit NinjaSCSI-3 / NinjaSCSI-32Bi(16bit) PCMCIA SCSI 
host adapter module $Revision: 1.23 $");
+MODULE_DESCRIPTION("WorkBit NinjaSCSI-3 / NinjaSCSI-32Bi(16bit) PCMCIA SCSI 
host adapter module");
 MODULE_SUPPORTED_DEVICE("sd,sr,sg,st");
 #ifdef MODULE_LICENSE
 MODULE_LICENSE("GPL");
@@ -83,10 +81,6 @@
.proc_name   = "nsp_cs",
.proc_info   = nsp_proc_info,
.name= "WorkBit NinjaSCSI-3/32Bi(16bit)",
-#if (LINUX_VERSION_CODE < KERNEL_VERSION(2,5,0))
-   .detect  = nsp_detect_old,
-   .release = nsp_release_old,
-#endif
.info= nsp_info,
.queuecommand= nsp_queuecommand,
 /* .eh_abort_handler= nsp_eh_abort,*/
@@ -97,9 +91,6 @@
.sg_tablesize= SG_ALL,
.cmd_per_lun = 1,
.use_clustering  = DISABLE_CLUSTERING,
-#if (LINUX_VERSION_CODE < KERNEL_VERSION(2,5,2))
-   .use_new_eh_code = 1,
-#endif
 };
 
 static nsp_hw_data nsp_data_base; /* attach <-> detect glue */
@@ -1313,11 +1304,7 @@
nsp_hw_data *data_b = _data_base, *data;
 
nsp_dbg(NSP_DEBUG_INIT, "this_id=%d", sht->this_id);
-#if (LINUX_VERSION_CODE > KERNEL_VERSION(2,5,73))
host = scsi_host_alloc(_driver_template, sizeof(nsp_hw_data));
-#else
-   host = scsi_register(sht, sizeof(nsp_hw_data));
-#endif
if (host == NULL) {
nsp_dbg(NSP_DEBUG_INIT, "host failed");
return NULL;
@@ -1354,37 +1341,6 @@
return host; /* detect done. */
 }
 
-#if (LINUX_VERSION_CODE < KERNEL_VERSION(2,5,0))
-static int nsp_detect_old(struct scsi_host_template *sht)
-{
-   if (nsp_detect(sht) == NULL) {
-   return 0;
-   } else {
-   //MOD_INC_USE_COUNT;
-   return 1;
-   }
-}
-
-
-static int nsp_release_old(struct Scsi_Host *shpnt)
-{
-   //nsp_hw_data *data = (nsp_hw_data *)shpnt->hostdata;
-
-   /* PCMCIA Card Service dose same things below. */
-   /* So we do nothing.   */
-   //if (shpnt->irq) {
-   //  free_irq(shpnt->irq, data->ScsiInfo);
-   //}
-   //if (shpnt->io_port) {
-   //  release_region(shpnt->io_port, shpnt->n_io_port);
-   //}
-
-   //MOD_DEC_USE_COUNT;
-
-   return 0;
-}
-#endif
-
 /**/
 /* return info string*/
 /**/
@@ -1403,19 +1359,9 @@
nsp_dbg(NSP_DEBUG_PROC, "buffer=0x%p pos=0x%p length=%d 
%d\n", buffer, pos, length,  length - (pos - buffer));\
} \
} while(0)
-static int
-nsp_proc_info(
-#if (LINUX_VERSION_CODE > KERNEL_VERSION(2,5,73))
-   struct Scsi_Host *host,
-#endif
-   char  *buffer,
-   char **start,
-   off_t  offset,
-   intlength,
-#if !(LINUX_VERSION_CODE > KERNEL_VERSION(2,5,73))
-   inthostno,
-#endif
-   intinout)
+
+static int nsp_proc_info(struct Scsi_Host *host, char *buffer, char **start,
+off_t offset, int length, int inout)
 {
int id;
char *pos = buffer;
@@ -1423,24 +1369,13 @@
int speed;
unsigned long flags;
nsp_hw_data *data;
-#if !(LINUX_VERSION_CODE > 

[2.6 patch] arch/cris/arch-v32/kernel/fasttimer.c: remove kernel 2.0 code

2007-03-25 Thread Adrian Bunk
This patch removes code for kernel 2.0 (sic).

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---

 arch/cris/arch-v32/kernel/fasttimer.c |   36 ++
 1 file changed, 3 insertions(+), 33 deletions(-)

--- linux-2.6.21-rc4-mm1/arch/cris/arch-v32/kernel/fasttimer.c.old  
2007-03-25 19:50:26.0 +0200
+++ linux-2.6.21-rc4-mm1/arch/cris/arch-v32/kernel/fasttimer.c  2007-03-25 
19:52:36.0 +0200
@@ -602,38 +602,14 @@
 }
 
 #ifdef CONFIG_PROC_FS
-static int proc_fasttimer_read(char *buf, char **start, off_t offset, int len
-#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,2,0)
-   ,int *eof, void *data_unused
-#else
-,int unused
-#endif
-   );
-#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,2,0)
-static struct proc_dir_entry *fasttimer_proc_entry;
-#else
-static struct proc_dir_entry fasttimer_proc_entry =
-{
-  0, 9, "fasttimer",
-  S_IFREG | S_IRUGO, 1, 0, 0,
-  0, NULL /* ops -- default to array */,
-  _fasttimer_read /* get_info */,
-};
-#endif
-#endif /* CONFIG_PROC_FS */
 
-#ifdef CONFIG_PROC_FS
+static struct proc_dir_entry *fasttimer_proc_entry;
 
 /* This value is very much based on testing */
 #define BIG_BUF_SIZE (500 + NUM_TIMER_STATS * 300)
 
-static int proc_fasttimer_read(char *buf, char **start, off_t offset, int len
-#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,2,0)
-   ,int *eof, void *data_unused
-#else
-,int unused
-#endif
-   )
+static int proc_fasttimer_read(char *buf, char **start, off_t offset, int len,
+  int *eof, void *data_unused)
 {
   unsigned long flags;
   int i = 0;
@@ -808,9 +784,7 @@
 
   memcpy(buf, bigbuf + offset, len);
   *start = buf;
-#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,2,0)
   *eof = 1;
-#endif
 
   return len;
 }
@@ -974,12 +948,8 @@
 printk("fast_timer_init()\n");
 
 #ifdef CONFIG_PROC_FS
-#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,2,0)
if ((fasttimer_proc_entry = create_proc_entry( "fasttimer", 0, 0 )))
  fasttimer_proc_entry->read_proc = proc_fasttimer_read;
-#else
-proc_register_dynamic(_root, _proc_entry);
-#endif
 #endif /* PROC_FS */
 if(request_irq(TIMER_INTR_VECT, timer_trig_interrupt, IRQF_DISABLED,
"fast timer int", NULL))

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[2.6 patch] drivers/scsi/nsp32.c: remove kernel 2.4 code

2007-03-25 Thread Adrian Bunk
This patch removes kernel 2.4 code.

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---

 drivers/scsi/nsp32.c |  109 +--
 1 file changed, 13 insertions(+), 96 deletions(-)

--- linux-2.6.21-rc4-mm1/drivers/scsi/nsp32.c.old   2007-03-25 
20:27:34.0 +0200
+++ linux-2.6.21-rc4-mm1/drivers/scsi/nsp32.c   2007-03-25 20:31:59.0 
+0200
@@ -49,10 +49,6 @@
 #include 
 #include 
 
-#if (LINUX_VERSION_CODE < KERNEL_VERSION(2,6,0))
-# include 
-#endif
-
 #include "nsp32.h"
 
 
@@ -199,17 +195,9 @@
 static void __exitexit_nsp32  (void);
 
 /* struct struct scsi_host_template */
-#if (LINUX_VERSION_CODE > KERNEL_VERSION(2,5,73))
 static int nsp32_proc_info   (struct Scsi_Host *, char *, char **, 
off_t, int, int);
-#else
-static int nsp32_proc_info   (char *, char **, off_t, int, int, int);
-#endif
 
-#if (LINUX_VERSION_CODE > KERNEL_VERSION(2,5,73))
 static int nsp32_detect  (struct pci_dev *pdev);
-#else
-static int nsp32_detect  (struct scsi_host_template *);
-#endif
 static int nsp32_queuecommand(struct scsi_cmnd *,
void (*done)(struct scsi_cmnd *));
 static const char *nsp32_info(struct Scsi_Host *);
@@ -296,15 +284,7 @@
.eh_abort_handler   = nsp32_eh_abort,
.eh_bus_reset_handler   = nsp32_eh_bus_reset,
.eh_host_reset_handler  = nsp32_eh_host_reset,
-#if (LINUX_VERSION_CODE < KERNEL_VERSION(2,5,74))
-   .detect = nsp32_detect,
-   .release= nsp32_release,
-#endif
-#if (LINUX_VERSION_CODE < KERNEL_VERSION(2,5,2))
-   .use_new_eh_code= 1,
-#else
 /* .highmem_io = 1, */
-#endif
 };
 
 #include "nsp32_io.h"
@@ -1210,13 +1190,9 @@
unsigned long flags;
int ret;
int handled = 0;
-
-#if (LINUX_VERSION_CODE > KERNEL_VERSION(2,5,0))
struct Scsi_Host *host = data->Host;
+
spin_lock_irqsave(host->host_lock, flags);
-#else
-   spin_lock_irqsave(_request_lock, flags);
-#endif
 
/*
 * IRQ check, then enable IRQ mask
@@ -1480,11 +1456,7 @@
nsp32_write2(base, IRQ_CONTROL, 0);
 
  out2:
-#if (LINUX_VERSION_CODE > KERNEL_VERSION(2,5,0))
spin_unlock_irqrestore(host->host_lock, flags);
-#else
-   spin_unlock_irqrestore(_request_lock, flags);
-#endif
 
nsp32_dbg(NSP32_DEBUG_INTR, "exit");
 
@@ -1499,28 +1471,15 @@
nsp32_dbg(NSP32_DEBUG_PROC, "buffer=0x%p pos=0x%p 
length=%d %d\n", buffer, pos, length,  length - (pos - buffer));\
} \
} while(0)
-static int nsp32_proc_info(
-#if (LINUX_VERSION_CODE > KERNEL_VERSION(2,5,73)) 
-   struct Scsi_Host *host,
-#endif
-   char *buffer,
-   char**start,
-   off_t offset,
-   int   length,
-#if !(LINUX_VERSION_CODE > KERNEL_VERSION(2,5,73)) 
-   int   hostno,
-#endif
-   int   inout)
+
+static int nsp32_proc_info(struct Scsi_Host *host, char *buffer, char **start,
+  off_t offset, int length, int inout)
 {
char *pos = buffer;
int   thislength;
unsigned long flags;
nsp32_hw_data*data;
-#if (LINUX_VERSION_CODE > KERNEL_VERSION(2,5,73)) 
int   hostno;
-#else
-   struct Scsi_Host *host;
-#endif
unsigned int  base;
unsigned char mode_reg;
int   id, speed;
@@ -1531,15 +1490,7 @@
return -EINVAL;
}
 
-#if (LINUX_VERSION_CODE > KERNEL_VERSION(2,5,73)) 
hostno = host->host_no;
-#else
-   /* search this HBA host */
-   host = scsi_host_hn_get(hostno);
-   if (host == NULL) {
-   return -ESRCH;
-   }
-#endif
data = (nsp32_hw_data *)host->hostdata;
base = host->io_port;
 
@@ -2674,17 +2625,7 @@
  * 0x900-0xbff: (map same 0x800-0x8ff I/O port image repeatedly)
  * 0xc00-0xfff: CardBus status registers
  */
-#if (LINUX_VERSION_CODE > KERNEL_VERSION(2,5,73))
-#define DETECT_OK 0
-#define DETECT_NG 1
-#define PCIDEVpdev
 static int nsp32_detect(struct pci_dev *pdev)
-#else
-#define DETECT_OK 1
-#define DETECT_NG 0
-#define PCIDEV(data->Pci)
-static int nsp32_detect(struct scsi_host_template *sht)
-#endif
 {
struct Scsi_Host *host; /* registered host structure */
struct resource  *res;
@@ -2697,11 +2638,7 @@
/*
 * register this HBA as SCSI device
 */
-#if (LINUX_VERSION_CODE > KERNEL_VERSION(2,5,73))
host = scsi_host_alloc(_template, sizeof(nsp32_hw_data));
-#else
-   host = scsi_register(sht, sizeof(nsp32_hw_data));
-#endif
if (host == NULL) {
nsp32_msg (KERN_ERR, "failed to scsi register");
goto err;
@@ -2719,9 +2656,6 @@
host->unique_id = 

Re: [3/6] 2.6.21-rc4: known regressions

2007-03-25 Thread Adrian Bunk
On Mon, Mar 26, 2007 at 09:25:51AM +0800, Jeff Chua wrote:
> On 3/19/07, Adrian Bunk <[EMAIL PROTECTED]> wrote:
> 
> >Subject: ThinkPad doesn't resume from suspend to RAM
> >References : http://lkml.org/lkml/2007/2/27/80
> > http://lkml.org/lkml/2007/2/28/348
> >Submitter  : Jens Axboe <[EMAIL PROTECTED]>
> > Jeff Chua <[EMAIL PROTECTED]>
> >Status : unknown
> >
> >
> >Subject: suspend to disk hangs
> >References : http://lkml.org/lkml/2007/3/6/142
> >Submitter  : Jeff Chua <[EMAIL PROTECTED]>
> >Status : unknown
> 
> The good news is on 2.6.21-rc5, suspend to disk, and resume from disk works.
> But, it only works with CONFIG_NO_HZ unset.

Thanks for this information.

> Setting CONFIG_NO_HZ will cause suspend to disk to hang just before
> saving memory to disk.

Thanks, noted.

> Resume from RAM (s2ram) still broke (tried with or without
> CONFIG_NO_HZ). Suspend to RAM seems ok, but upon resume, the screen
> will only display "inu" and only after pressing the power button will
> the system return to console. But "date" still doesn't advance.

This might be related to the following regression:

Subject: first disk access after resume takes several minutes
 ('date' does not advance after resume from RAM, CONFIG_NO_HZ=n)
References : http://lkml.org/lkml/2007/3/8/117
 http://lkml.org/lkml/2007/3/25/20
Submitter  : Michael S. Tsirkin <[EMAIL PROTECTED]>
Handled-By : Thomas Gleixner <[EMAIL PROTECTED]>
 Ingo Molnar <[EMAIL PROTECTED]>
Status : problem is being debugged

> Thanks,
> Jeff.

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][Patch 3/6] integrity: EVM as an integrity service provider

2007-03-25 Thread Herbert Xu
Mimi Zohar <[EMAIL PROTECTED]> wrote:
>
> +int update_file_hash(struct dentry *dentry, struct file *f,
> +struct hash_desc *desc)
> +{

...

> +   while (offset < i_size) {
> +   rbuf_len = kernel_read(file, offset, rbuf, PAGE_SIZE);
> +   if (rbuf_len <= 0)
> +   break;
> +   offset += rbuf_len;
> +   sg[0].page = virt_to_page(rbuf);
> +   sg[0].offset = ((long)rbuf & ~PAGE_MASK);
> +   sg[0].length = rbuf_len;
> +
> +   crypto_hash_update(desc, sg, rbuf_len);

You should check for errors here and on all calls to crypto_hash_*
that return errors.

> +int evm_calc_hash(struct dentry *dentry, struct file *file, char *digest,
> + int xattr_type)
> +{
> +   struct crypto_hash *tfm;
> +   struct hash_desc desc;
> +   int error = 0;
> +
> +   if (!dentry && !file)
> +   return -ENOENT;
> +
> +   tfm = crypto_alloc_hash(evm_hash, 0, CRYPTO_ALG_ASYNC);
> +   if (!tfm) {

That should be !IS_ERR(tfm).

> +static int __init init_evm(void)
> +{
> +   int error;
> +
> +   if (strncmp(evm_hash, "sha1", 4) == 0) {
> +   evm_hash_type = EVM_TYPE_SHA1;
> +   hash_digest_size = SHA1_DIGEST_SIZE;
> +   hash_str_size = SHA1_STR_SIZE;

This looks suspect.  Can you guarantee that only md5/sha1 are used?
If not you should be getting these from the tfm.  If yes then you
shouldn't allow arbitrary strings to be used.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: I/O memory barriers vs SMP memory barriers

2007-03-25 Thread Paul E. McKenney
On Sun, Mar 25, 2007 at 11:38:43PM +0200, Lennert Buytenhek wrote:
> On Sun, Mar 25, 2007 at 02:15:42PM -0700, Paul E. McKenney wrote:
> 
> > > > [ background: On ARM, SMP synchronisation does need barriers but device
> > > >   synchronisation does not.  The question is that given this, whether
> > > >   mb() and friends can be NOPs on ARM or not (i.e. whether mb() is
> > > >   supposed to sync against other CPUs or not, or whether only smp_mb()
> > > >   can be used for this.)  ]
> > > 
> > > H...
> > > 
> > > [snip]
> > 
> > 3.  Orders memory accesses and device accesses, but not necessarily
> > the union of the two -- mb(), rmb(), wmb().
> 
> If mb/rmb/wmb are required to order normal memory accesses, that means
> that the change made in commit 9623b3732d11b0a18d9af3419f680d27ea24b014
> to always define mb/rmb/wmb as barrier() on ARM systems was wrong.

This was on UP ARM systems, right?  Assuming that ARM CPUs respect the
usual CPU-self-consistency semantics, and given the background that
device accesses are ordered, then it might well be OK to have mb/rmb/wmb
be barrier() on UP ARM systems.

Most likely not on SMP ARM systems, however.

> Does everybody agree on these semantics, though?  At least David seems
> to think that mb/rmb/wmb aren't required to order normal memory accesses
> against each other..

Not on UP.  On SMP, ordering is (almost certainly) required.

> > 4.  Orders only device accesses, which is what seems to be looked
> > for here.
> 
> Yes.  (As above, on ARM, SMP synchronisation does need barriers but
> device synchronisation does not.  If mb/rmb/wmb were only required to
> synchronise device accesses, they could have been regular compiler
> barriers on ARM, but if they are also required to synchronise normal
> memory accesses against each other, they have to map to hardware
> barriers.)

Again, for kernels built for UP, you might well be able to make the
mb() primitives be barrier().  I don't see it for SMP, though.

Thanx, Paul
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][Patch 1/4] kprobe fast unregistration

2007-03-25 Thread Masami Hiramatsu
Hi Christoph and Anil,

Thank you for your comments.

Christoph Hellwig wrote:
> Speeding up the unregistration is a very good idea, but this interface
> is rather horrible.  It's almost a receipe for users to get it wrong.
Keshavamurthy, Anil S wrote:
> I agree with Christop that the interface is horrible and error prone.

OK, I agree. I had chosen a confusable name.

> However, I see the use case where people want to disable the probes quickly 
> and
> would like to reenable them again. Looking closely at your patch,
> I think this can be acheived.

Thank you.

> Here is my suggestion.
> 
>> Here is an example code.
>> --
>> struct kprobes *p;
>> for_each_probe(p) {
>>  unregister_kprobe_fast(p);
> ^^
> Change this to disable_kprobe(p), which is essentially the same as
> what you have implemented. And also provide an opposite function
> to reenable_kprobe(p) which enables the disabled probe again.

I'd like to change that to prepare_to_unregister_kprobe(p) instead of
disable_kprobe(p).

I think Josh and other people want interfaces to disable/reenable all
probes at once when the sysrq is pressed.
So, IMHO, these interfaces should use a global (and per-cpu?) flag which
controls whether kprobes calls user-defined handler or not, instead of
self-modifying.
For example,

if (p && p->pre_handler && kprobe_enable) {
   ^
   ...
}

But, I think this would be another story.
I'd like to discuss this topic in other mails.

>> }
>> commit_kprobes();
>   ^^
> Change this to unregister_disabled_kprobes(), which essentially 
> unregisters all the disabled probes.

And also, I'd like to change it to unregister_prepared_kprobes().
What would you think about this idea?

Thanks,


> 
> Thanks,
> Anil Keshavamurthy
> 
-- 
Masami HIRAMATSU
Linux Technology Center
Hitachi, Ltd., Systems Development Laboratory
E-mail: [EMAIL PROTECTED]


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Fix sudden warps in mousedev

2007-03-25 Thread Dmitry Torokhov
On Sunday 25 March 2007 14:19, Pete Zaitcev wrote:
> On Sun, 25 Mar 2007 01:34:02 -0400, Dmitry Torokhov <[EMAIL PROTECTED]> wrote:
> 
> > > +    * Without this, a touchpad may report an unchanged 
> > > position,
> > > +    * then a sync. The input_event() eats the position 
> > > report, but
> > > +    * lets the sync through. We increment pkt_count and 
> > > leave 
> > > +    * a stale position in the ring. If a future reference to 
> > > fx(2)
> > > +    * hits the stale position, a large dx is reported, and 
> > > the
> > > +    * pointer warps across the screen.
> > > +    */
> > > +   dev = mousedev->handle.dev;
> > > +   fx(0) = dev->abs[ABS_X];
> > > +   fy(0) = dev->abs[ABS_Y];
> > 
> > I do not like input hanlders poking into input devices... Can't we just
> > reset pkt_count at the beginning of the touch to get rid of stale data?
> 
> The pkt_count is zero at the moment of assignment above.

Riiight... Of course you are right...

I tried to reproduce warping on console but could not for some reason. Could you
please try the patch below and tell me if it fixes the problem for you?

-- 
Dmitry

Signed-off-by: Dmitry Torokhov <[EMAIL PROTECTED]>
---

 drivers/input/mousedev.c |   51 +++
 1 files changed, 26 insertions(+), 25 deletions(-)

Index: work/drivers/input/mousedev.c
===
--- work.orig/drivers/input/mousedev.c
+++ work/drivers/input/mousedev.c
@@ -124,32 +124,33 @@ static void mousedev_touchpad_event(stru
int size, tmp;
enum { FRACTION_DENOM = 128 };
 
-   if (mousedev->touch) {
-   size = dev->absmax[ABS_X] - dev->absmin[ABS_X];
-   if (size == 0)
-   size = 256 * 2;
-
-   switch (code) {
-   case ABS_X:
-   fx(0) = value;
-   if (mousedev->pkt_count >= 2) {
-   tmp = ((value - fx(2)) * (256 * 
FRACTION_DENOM)) / size;
-   tmp += mousedev->frac_dx;
-   mousedev->packet.dx = tmp / 
FRACTION_DENOM;
-   mousedev->frac_dx = tmp - 
mousedev->packet.dx * FRACTION_DENOM;
-   }
-   break;
+   switch (code) {
+   case ABS_X:
+   fx(0) = value;
+   if (mousedev->touch && mousedev->pkt_count >= 2) {
+   size = dev->absmax[ABS_X] - dev->absmin[ABS_X];
+   if (size == 0)
+   size = 256 * 2;
+   tmp = ((value - fx(2)) * (256 * 
FRACTION_DENOM)) / size;
+   tmp += mousedev->frac_dx;
+   mousedev->packet.dx = tmp / FRACTION_DENOM;
+   mousedev->frac_dx = tmp - mousedev->packet.dx * 
FRACTION_DENOM;
+   }
+   break;
 
-   case ABS_Y:
-   fy(0) = value;
-   if (mousedev->pkt_count >= 2) {
-   tmp = -((value - fy(2)) * (256 * 
FRACTION_DENOM)) / size;
-   tmp += mousedev->frac_dy;
-   mousedev->packet.dy = tmp / 
FRACTION_DENOM;
-   mousedev->frac_dy = tmp - 
mousedev->packet.dy * FRACTION_DENOM;
-   }
-   break;
-   }
+   case ABS_Y:
+   fy(0) = value;
+   if (mousedev->touch && mousedev->pkt_count >= 2) {
+   /* use X size to keep the same scale */
+   size = dev->absmax[ABS_X] - dev->absmin[ABS_X];
+   if (size == 0)
+   size = 256 * 2;
+   tmp = -((value - fy(2)) * (256 * 
FRACTION_DENOM)) / size;
+   tmp += mousedev->frac_dy;
+   mousedev->packet.dy = tmp / FRACTION_DENOM;
+   mousedev->frac_dy = tmp - mousedev->packet.dy * 
FRACTION_DENOM;
+   }
+   break;
}
 }
 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Patch 3/7] integrity: EVM as an integrity service provider

2007-03-25 Thread Mimi Zohar
On Sun, 2007-03-25 at 00:16 -0800, Andrew Morton wrote:
> On Fri, 23 Mar 2007 12:09:36 -0400 Mimi Zohar <[EMAIL PROTECTED]> wrote:
> 
> > +++ linux-2.6.21-rc4-mm1/security/evm/Kconfig
> > @@ -0,0 +1,17 @@
> > +config INTEGRITY_EVM
> > +   boolean "EVM support"
> > +   depends on INTEGRITY && KEYS
> > +   select CRYPTO_HMAC
> > +   select CRYPTO_MD5
> > +   select CRYPTO_SHA1
> > +   default 0
> > +   help
> > + The Extended Verification Module is an integrity provider.
> > + An extensible set of extended attributes, as defined in
> > + /etc/evm.conf, are HMAC protected against modification
> > + using the TPM's KERNEL ROOT KEY, if configured, or with a
> > + pass-phrase.  Possible extended attributes include authenticity,
> > + integrity, and revision level.
> > +
> > + If you are unsure how to answer this question, answer N.
> > +
> 
> Is no dependency upon TPM needed?

It's obviously preferable to have and use a TPM, but if one is not
available you can use a pass-phrase.

Mimi Zoharar


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: drivers/pci/probe.c patch in 2.6.20.4 causes 'cannot adjust BAR0 (not I/O)' on NVidia MCP51

2007-03-25 Thread Adrian Bunk
On Sun, Mar 25, 2007 at 07:25:46PM -0700, Kevin P. Fleming wrote:
> Adrian Bunk wrote:
> > It also adds PCI_BASE_ADDRESS_SPACE_IO to the flags which it didn't 
> > without the patch.
> 
> As an experiment I modified 2.6.20.4 to _only_ remove that value from
> the combined value for the flags and it did not help in any noticeable
> way. I can reliably boot and operate the machine with the original patch
> reversed, though.

OK thanks, then it's something else.

I see only one more thing that might make any difference:
Perhaps we are not seeing the problematic operation due to the different 
loglevel?

Please change in the third printk() the patch adds the KERN_INFO to 
KERN_WARNING.

If you get a "...trying to change BAR..." before it hangs, remove the 
pci_write_config_dword() line - this should then fix it.

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 17/21] MSI: Clear the irq_desc's msi pointer on free

2007-03-25 Thread Eric W. Biederman
Michael Ellerman <[EMAIL PROTECTED]> writes:

> We don't call dynamic_irq_cleanup(), so it never gets done. Perhaps we
> should be using your dynamic_irq_init/cleanup.

It depends.  If you are going through another irq controller etc.
dynamic_irq_cleanup is probably excessive.

>> Therefore you are doing this too late and we need to ensure the
>> architecture code does this in arch_teardown_msi_irq.
>
> As long as the arch teardown routine somehow calls dynamic_irq_cleanup()
> it should be fine. But I guess it's probably safer to just have all
> archs do set_msi_irq(irq, NULL) in the teardown.

Yes.  That sounds correct.

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 17/21] MSI: Clear the irq_desc's msi pointer on free

2007-03-25 Thread Michael Ellerman
On Thu, 2007-03-22 at 21:00 -0600, Eric W. Biederman wrote:
> Michael Ellerman <[EMAIL PROTECTED]> writes:
> 
> > On Thu, 2007-03-22 at 08:23 -0600, Eric W. Biederman wrote:
> >> Michael Ellerman <[EMAIL PROTECTED]> writes:
> >> 
> >> > Currently we never clear the msi_desc pointer in the irq_desc. This
> >> > leaves us with a pointer to free'ed memory hanging around. No one seems
> >> > to have hit this, so presumably other parts of the code are protecting
> >> > us from ever using the stale pointer .. or we're just lucky, we should
> >> > still clear it.
> >> 
> >> Hmm.  Maybe.  Currently this is done in dynamic_irq_cleanup,
> >> at least for everything except sparc64.
> >
> > OK, I missed that. I still think we should do it here, otherwise there's
> > a window, however small, where the msi_desc pointer is pointing at freed
> > memory.
> 
> After following the code through the current cleanup happens before you are
> proposing, and in fact the irq is return to the set of irq's that can
> be allocated before you are calling set_irq_msi(irq, NULL).

We don't call dynamic_irq_cleanup(), so it never gets done. Perhaps we
should be using your dynamic_irq_init/cleanup.

> Therefore you are doing this too late and we need to ensure the
> architecture code does this in arch_teardown_msi_irq.

As long as the arch teardown routine somehow calls dynamic_irq_cleanup()
it should be fine. But I guess it's probably safer to just have all
archs do set_msi_irq(irq, NULL) in the teardown.

cheers

-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person


signature.asc
Description: This is a digitally signed message part


Re: Linux-VServer example results for sharing vs. separate mappings ...

2007-03-25 Thread Balbir Singh

Andrew Morton wrote:

Don't we break the global LRU with this scheme?


Sure, but that's deliberate!

(And we don't have a global LRU - the LRUs are per-zone).



Yes, true. But if we use zones for containers and say we have 400
of them, with all of them under limit. When the system wants
to reclaim memory, we might not end up reclaiming the best pages.
Am I missing something?


b) Create a new memory abstraction, call it the "software zone", which
   is mostly decoupled from the present "hardware zones".  Most of the MM
   is reworked to use "software zones".  The "software zones" are
   runtime-resizeable, and obtain their pages via some means from the
   hardware zones.  A container uses a software zone.


I think the problem would be figuring out where to allocate memory from?
What happens if a software zone spans across many hardware zones?


Yes, that would be the tricky part.  But we generally don't care what
physical zone user pages come from, apart from NUMA optimisation.


The reclaim mechanism proposed *does not impact the non-container users*.


Yup.  Let's keep plugging away with Pavel's approach, see where it gets us.



Yes, we have some changes that we've made to the reclaim logic, we hope
to integrate a page cache controller soon. We are also testing the
patches. Hopefully soon enough, they'll be in a good state and we can
request you to merge the containers and the rss limit (plus page cache)
controller soon.


--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][Patch 1/4] kprobe fast unregistration

2007-03-25 Thread Frank Ch. Eigler

Hi -

"Keshavamurthy, Anil S" <[EMAIL PROTECTED]> writes:

> [...]
> > Really?  What possible problems can occur?  The worst that occurs
> > to me is that if someone forgets to call the commit function, the
> > kprobes will still be disabled, but memory won't be recycled for a
> > while.  [...]
>
> Yes, Have you looked at the code? 

A little, but we were talking more about the interface than the
implementation.

> If someone forgets to call the commit function, the kprobe will be
> disabled and yes the memory won't be recycled but the worst problem
> is that if the probe is on a module function then that module can't
> be unloaded at all [...]

I believe there is already a kprobes patch in the queue for
enumerating active probes in some /proc file.  Should a module be
locked into memory for such a reason, finding the culprit should not
be difficult.

> Hence, my suggestion would be to call them as disable_kprobe()
> (instead of unregister_kprobes_fast() which is confusing and error
> prone) and also to provide an opposite function to reenable_kprobe()
> and finally provide unregister_disabled_kprobes() which is
> essentially the same as commit_kprobes().

One problem with this idea is that if the unregister_fast()=disable()
is to become reversible, then the renamed commit_kprobes() will no
longer be indempotent.  There can no longer be a single system-wide
deferred-kprobe-cleanup list, since individual kprobes clients might
want to reinstate their probes in the future.

> > Would it be possible to allay even that concern with an automated
> > deferred/periodic commit?
> > 
> I would recomand that users call unregister_disabled_kprobes() explictly.

But this would solve both problems (memory leaks and outstanding
reference counts on modules).  In this variant,
unregister_kprobes_fast could replace unregister_kprobes outright, and
the (builtin deferred) commit function would need not be exported.

- FChE
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: drivers/pci/probe.c patch in 2.6.20.4 causes 'cannot adjust BAR0 (not I/O)' on NVidia MCP51

2007-03-25 Thread Kevin P. Fleming
Adrian Bunk wrote:
> It also adds PCI_BASE_ADDRESS_SPACE_IO to the flags which it didn't 
> without the patch.

As an experiment I modified 2.6.20.4 to _only_ remove that value from
the combined value for the flags and it did not help in any noticeable
way. I can reliably boot and operate the machine with the original patch
reversed, though.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [2.6.22 patch] more scheduled OSS driver removal

2007-03-25 Thread Adrian Bunk
On Mon, Mar 26, 2007 at 03:26:19AM +0200, Richard Knutsson wrote:
> Adrian Bunk wrote:
> >This patch contains the scheduled removal of OSS drivers that:
> >- have ALSA drivers for the same hardware without known regressions and
> >- whose Kconfig options have been removed in 2.6.20.
> >  
> Sorry for this late respond
> 
> Is sound/oss/emu10k1/ really ready to be replaced? According to a friend 
> of mine, who got the hardware, the ALSA still lacks feature provided by 
> the OSS-driver. He mentioned some sort of programmable memory (forgot 
> the name for that) with whom you can run small programs to manipulate 
> the sounds (adding echo and such). Also, he got problems with getting 
> more accurate control of the volume (guess this is more a userland issue).
> He seems to have done his homework but only got to pages where other 
> peoples had the same problem.

Which bug numbers in the ALSA bugtracking system [1] are these?

If there are regressions in an ALSA driver compared to the corresponding 
OSS driver, the goal is to get the ALSA driver fixed.

Current number of regressions in the ALSA emu10k1 driver people gave me 
numbers of open ALSA bugs for:
  0

> If this is really implemented, please let me know so I can forward it to 
> him.
>...

You can monitor the bugs in the ALSA bugtracking system describing the 
problems your friend has.

> Thanks and good night
> Richard Knutsson

cu
Adrian

[1] http://bugtrack.alsa-project.org/alsa-bug/

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] tifm_sd: add missing \n

2007-03-25 Thread Alex Dubov
In fact, this is fixed in already queued patch set.

--- Daniel Drake <[EMAIL PROTECTED]> wrote:

> Noticed this in user logs.
> 
> Signed-off-by: Daniel Drake <[EMAIL PROTECTED]>
> 
> Index: linux/drivers/mmc/tifm_sd.c
> ===
> --- linux.orig/drivers/mmc/tifm_sd.c
> +++ linux/drivers/mmc/tifm_sd.c
> @@ -676,7 +676,7 @@ static void tifm_sd_abort(unsigned long 
>   struct tifm_sd *host = (struct tifm_sd*)data;
>  
>   printk(KERN_ERR DRIVER_NAME
> -": card failed to respond for a long period of time");
> +": card failed to respond for a long period of time\n");
>  
>   tifm_sd_terminate(host);
>   tifm_eject(host->dev);
> 



 

Food fight? Enjoy some healthy debate 
in the Yahoo! Answers Food & Drink Q
http://answers.yahoo.com/dir/?link=list=396545367
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [2.6.22 patch] more scheduled OSS driver removal

2007-03-25 Thread Richard Knutsson

Adrian Bunk wrote:

This patch contains the scheduled removal of OSS drivers that:
- have ALSA drivers for the same hardware without known regressions and
- whose Kconfig options have been removed in 2.6.20.
  

Sorry for this late respond

Is sound/oss/emu10k1/ really ready to be replaced? According to a friend 
of mine, who got the hardware, the ALSA still lacks feature provided by 
the OSS-driver. He mentioned some sort of programmable memory (forgot 
the name for that) with whom you can run small programs to manipulate 
the sounds (adding echo and such). Also, he got problems with getting 
more accurate control of the volume (guess this is more a userland issue).
He seems to have done his homework but only got to pages where other 
peoples had the same problem.


If this is really implemented, please let me know so I can forward it to 
him. Also, any good site on ALSA with detailed information on this stuff 
would be much appreciated!


Thanks and good night
Richard Knutsson

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] move die notifier handling to common code

2007-03-25 Thread Paul Mackerras
Christoph Hellwig writes:

> This patch moves the die notifier handling to common code.  Previous
> various architectures had exactly the same code for it.  Note that
> the new code is compiled unconditionally, this should be understood
> as an appel to the other architecture maintainer to implement
> support for it aswell (aka sprinkling a notify_die or two in the
> proper place)

Seems OK, although I think kprobes should not be using notify_die.
The set of events that kprobes is interested in has no intersection at
all with the set that any other consumer of the notify_die events is
interested in, on any architecture.

Furthermore, the multiplexing of the kprobes events through notify_die
really serves no useful purpose.  It just means that
kprobe_exceptions_notify has to demultiplex the events with a switch
statement.  There is no significant common code for all events in
kprobe_exceptions_notify, just a simple check whether the event
happened in user mode.

However, all that is in arch code so can be changed per-arch if
desired.

Paul.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [3/6] 2.6.21-rc4: known regressions

2007-03-25 Thread Jeff Chua

On 3/19/07, Adrian Bunk <[EMAIL PROTECTED]> wrote:


Subject: ThinkPad doesn't resume from suspend to RAM
References : http://lkml.org/lkml/2007/2/27/80
 http://lkml.org/lkml/2007/2/28/348
Submitter  : Jens Axboe <[EMAIL PROTECTED]>
 Jeff Chua <[EMAIL PROTECTED]>
Status : unknown


Subject: suspend to disk hangs
References : http://lkml.org/lkml/2007/3/6/142
Submitter  : Jeff Chua <[EMAIL PROTECTED]>
Status : unknown


The good news is on 2.6.21-rc5, suspend to disk, and resume from disk works.
But, it only works with CONFIG_NO_HZ unset.

Setting CONFIG_NO_HZ will cause suspend to disk to hang just before
saving memory to disk.

Resume from RAM (s2ram) still broke (tried with or without
CONFIG_NO_HZ). Suspend to RAM seems ok, but upon resume, the screen
will only display "inu" and only after pressing the power button will
the system return to console. But "date" still doesn't advance.

Thanks,
Jeff.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: RSDL 0.31 causes slowdown

2007-03-25 Thread Con Kolivas
On Saturday 24 March 2007 04:57, Tim Chen wrote:
> On Fri, 2007-03-23 at 13:40 +1100, Con Kolivas wrote:
> > Volanomark is a purely yield() semantic dependant workload (as
> > discussed many times previously). In the earlier form of RSDL I
> > softened the effect of sched_yield but other changes since then have
> > made that softness bordering on a noop. Obviously when sched_yield is
> > relied upon that will not be enough. Extending the rr interval simply
> > makes the yield slightly more effective and is not the proper
> > workaround. Since expiration of arrays is a regular frequent
> > occurrence in RSDL then changing yield semantics back to expiration
> > should cause a massive improvement in these values, without making the
> > yields as long as in mainline. It's impossible to know exactly what
> > the final result will be since java uses this timing sensitive yield
> > for locking but we can improve it drastically from this. I'll make a
> > patch soon to change yield again.
>
> Con,
>
> The new RSDL 0.33 has fully recovered the loss in performance for
> Volanomark.  The throughput for Volanomark is at the same level as
> mainline 2.6.21-rc4 kernel.
>
> Tim

Thanks very much for testing. I'm quite happy with the yield semantics staying 
the way they are in rSDl 0.33+.

-- 
-ck
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 00/22 take 3] UBI: Unsorted Block Images

2007-03-25 Thread Jörn Engel
On Mon, 26 March 2007 01:21:25 +0100, David Woodhouse wrote:
> On Mon, 2007-03-26 at 02:01 +0200, Jörn Engel wrote:
> > You can on NAND.  ECC is done in software.  And for a data structure as
> > simple as the 'tally', foregoing ECC is not a huge problem - most
> > bitflips are easily detected and the remaining only cause off-by-a-few
> > on the erase count. 
> 
> You're only allowed a limited number of write cycles to each page
> though. So you can't just clear the bits in a 2112-byte page one at a
> time; typically when you clear the fifth bit, the contents of the whole
> page become undefined until the next erase cycle.

That limitation stems from ECC and ECC is done in software.  Currently
everyone and his dog is doing ECC in chunks of 256 bytes on NAND.  So
your minimum write size is 256 bytes _if you care about ECC_.  If you
don't care, you can write single bits on NAND, just as you can on NOR.

Controlling ECC in software means we are quite flexible.  Given
sufficient incentive, we can change the rules quite significantly.

Jörn

-- 
You can't tell where a program is going to spend its time. Bottlenecks
occur in surprising places, so don't try to second guess and put in a
speed hack until you've proven that's where the bottleneck is.
-- Rob Pike
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Opps in 2.6.21-rc4 nfsd

2007-03-25 Thread Neil Brown
On Monday March 26, [EMAIL PROTECTED] wrote:
> 
> I'll try to have a patch for you to try on 2.6.21rc4 shortly.
> 

This patch should remove the Oops, but I have a suspicion that
something else is wrong, so I'd be very interested if you could run
with this for a few days and report the results.

NeilBrown

Signed-off-by: Neil Brown <[EMAIL PROTECTED]>

### Diffstat output
 ./fs/nfsd/nfsfh.c |   37 +++--
 1 file changed, 35 insertions(+), 2 deletions(-)

diff .prev/fs/nfsd/nfsfh.c ./fs/nfsd/nfsfh.c
--- .prev/fs/nfsd/nfsfh.c   2007-03-26 10:51:01.0 +1000
+++ ./fs/nfsd/nfsfh.c   2007-03-26 11:04:20.0 +1000
@@ -324,7 +324,7 @@ fh_compose(struct svc_fh *fhp, struct sv
 *
 */
 
-   u8 version = 1;
+   u8 version;
u8 fsid_type = 0;
struct inode * inode = dentry->d_inode;
struct dentry *parent = dentry->d_parent;
@@ -342,12 +342,45 @@ fh_compose(struct svc_fh *fhp, struct sv
 * the reference filehandle (if it is in the same export)
 * or the export options.
 */
+ retry:
+   version = 1;
if (ref_fh && ref_fh->fh_export == exp) {
version = ref_fh->fh_handle.fh_version;
if (version == 0xca)
fsid_type = FSID_DEV;
-   else
+   else {
fsid_type = ref_fh->fh_handle.fh_fsid_type;
+   /* need to check that this type works for this
+* export point.  As the lookup was done with
+* help from user-space, there is no guarantee
+*/
+   if (ref_fh == fhp)
+   fh_put(ref_fh);
+   ref_fh = NULL;
+   switch(fsid_type) {
+   case FSID_DEV:
+   if (!old_valid_dev(ex_dev))
+   goto retry;
+   break;
+   case FSID_NUM:
+   if (! (exp->ex_flags & NFSEXP_FSID))
+   goto retry;
+   break;
+   case FSID_MAJOR_MINOR:
+   case FSID_ENCODE_DEV:
+   break;
+   case FSID_UUID8:
+   case FSID_UUID16:
+   if (!root_export)
+   goto retry;
+   /* fall through */
+   case FSID_UUID4_INUM:
+   case FSID_UUID16_INUM:
+   if (exp->ex_uuid == NULL)
+   goto retry;
+   break;
+   }
+   }
/* We know this version/type works for this export
 * so there is no need for further checks.
 */
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


rSDl cpu scheduler version 0.34-test patch

2007-03-25 Thread Con Kolivas
This is just for testing at the moment! The reason is the size of this patch.

In the interest of evolution, I've taken the RSDL cpu scheduler and increased 
the resolution of the task timekeeping to nanosecond resolution. This removes 
the need for the runqueue rotation component entirely out of RSDL. The design 
basically is mostly unchanged, minus over 150 lines of code for the rotation, 
yet should be slightly better performing. It should be indistinguishable in 
usage from v0.33.

Other changes from v0.33:
-rr interval was not being properly scaled with HZ
-fix possible race in checking task_queued in task_running_tick
-scale down rr interval for niced tasks if HZ can tolerate it
-cull list_splice_tail

What does this mean for the next version of RSDL?

Assuming all works as expected on these test patches, it will be cleanest to 
submit a new series of patches for -mm with the renamed Staircase-Deadline 
scheduler and new documentation (when it's done).


So for testing here are full rollups for 2.6.20.4 and 2.6.21-rc4:
http://ck.kolivas.org/patches/staircase-deadline/2.6.20.4-sd-0.34-test.patch
http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc4-sd-0.34-test.patch

The patches available also include a rollup of sched: accurate user accounting 
as this code touches the same area and it is most convenient to include them 
together.

(incrementals in each subdir of staircase-deadline/ for those interested).

Thanks Mike for continuing to attempt to use the cluebat on me on this one. 
>From the start I wasn't sure if this was necessary or not but ends up being 
less code than RSDL.

While I'm still far from being well, luckily I am in much better shape to be 
able to spend the time at the pc to have done this. Thanks to all those who 
expressed their concern.

-- 
-ck
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Opps in 2.6.21-rc4 nfsd

2007-03-25 Thread Neil Brown
On Friday March 23, [EMAIL PROTECTED] wrote:
> Adrian et al,
> 
> Under 2.6.20.2+3 I get a similar crash and more frequently.  At least
> under 2.6.21rc4 I only get this about once every two hours under heavy
> nfs testing.

That is very odd.  The code where it is crashing in the trace you
provide does not exist in 2.6.20 at all.

Do you have an Oops trace from when it crashed under 2.6.20.x ??

I'll try to have a patch for you to try on 2.6.21rc4 shortly.

What version of nfs-utils are you using?

Thanks,
NeilBrown
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: (usagi-core 32640) Re: [linux-usb-devel] [PATCH 0/2] [SERIAL] [USB] fixed to skip NULL entry in struct serial usb_serial_port.

2007-03-25 Thread Noriaki TAKAMIYA
Hi,

>> Sun, 25 Mar 2007 08:55:21 -0700
>> [Subject: (usagi-core 32640) Re: [linux-usb-devel] [PATCH 0/2] [SERIAL] 
>> [USB] fixed to skip NULL entry in struct serial usb_serial_port.]
>> Greg KH <[EMAIL PROTECTED]> wrote...

> >   Yes, this problem was already fixed.
> 
> Great, thanks for testing.  So I guess both of these patches are no
> longer necessary, right?

  I think so. But I wonder if usb_get_serial_port_data() should check
  the argument for the other drivers.

  Regards,

--
Noriaki TAKAMIYA
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[patch -mm 1/2] i386: add ptep_test_and_clear_{dirty,young}

2007-03-25 Thread David Rientjes
Add ptep_test_and_clear_{dirty,young} to i386.  They advertise that they
have it and there is at least one place where it needs to be called
without the page table lock: to clear the accessed bit on write to
/proc/pid/clear_refs.

ptep_clear_flush_{dirty,young} are updated to use the new functions.  The
overall net effect to current users of ptep_clear_flush_{dirty,young} is
that we introduce an additional branch.

Cc: Hugh Dickins <[EMAIL PROTECTED]>
Cc: Ingo Molnar <[EMAIL PROTECTED]>
Signed-off-by: David Rientjes <[EMAIL PROTECTED]>
---
 include/asm-i386/pgtable.h |   25 +
 1 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/include/asm-i386/pgtable.h b/include/asm-i386/pgtable.h
--- a/include/asm-i386/pgtable.h
+++ b/include/asm-i386/pgtable.h
@@ -283,12 +283,23 @@ do {  
\
}   \
 } while (0)
 
-/*
- * We don't actually have these, but we want to advertise them so that
- * we can encompass the flush here.
- */
 #define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_DIRTY
+static inline int ptep_test_and_clear_dirty(struct vm_area_struct *vma,
+   unsigned long addr, pte_t *ptep)
+{
+   if (!pte_dirty(*ptep))
+   return 0;
+   return test_and_clear_bit(_PAGE_BIT_DIRTY, >pte_low);
+}
+
 #define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
+static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
+   unsigned long addr, pte_t *ptep)
+{
+   if (!pte_young(*ptep))
+   return 0;
+   return test_and_clear_bit(_PAGE_BIT_ACCESSED, >pte_low);
+}
 
 /*
  * Rules for using ptep_establish: the pte MUST be a user pte, and
@@ -305,9 +316,8 @@ do {
\
 #define ptep_clear_flush_dirty(vma, address, ptep) \
 ({ \
int __dirty;\
-   __dirty = pte_dirty(*(ptep));   \
+   __dirty = ptep_test_and_clear_dirty((vma), (address), (ptep));  \
if (__dirty) {  \
-   clear_bit(_PAGE_BIT_DIRTY, &(ptep)->pte_low);   \
pte_update_defer((vma)->vm_mm, (address), (ptep));  \
flush_tlb_page(vma, address);   \
}   \
@@ -318,9 +328,8 @@ do {
\
 #define ptep_clear_flush_young(vma, address, ptep) \
 ({ \
int __young;\
-   __young = pte_young(*(ptep));   \
+   __young = ptep_test_and_clear_young((vma), (address), (ptep));  \
if (__young) {  \
-   clear_bit(_PAGE_BIT_ACCESSED, &(ptep)->pte_low);\
pte_update_defer((vma)->vm_mm, (address), (ptep));  \
flush_tlb_page(vma, address);   \
}   \
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[patch -mm 2/2] smaps: use ptep_test_and_clear_young

2007-03-25 Thread David Rientjes
Use arch-specified ptep_test_and_clear_young() to clear the pte accessed
bits for /proc/pid/clear_refs.  This avoids a race condition if a pte is
modified between pte_mkold() and set_pte_at().

Cc: Hugh Dickins <[EMAIL PROTECTED]>
Signed-off-by: David Rientjes <[EMAIL PROTECTED]>
---
 fs/proc/task_mmu.c |5 +
 1 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -277,10 +277,7 @@ static void clear_refs_one_pmd(struct vm_area_struct *vma, 
pmd_t *pmd,
continue;
 
/* Clear accessed and referenced bits. */
-   if (pte_young(ptent)) {
-   ptent = pte_mkold(ptent);
-   set_pte_at(vma->vm_mm, addr, pte, ptent);
-   }
+   ptep_test_and_clear_young(vma, addr, pte);
ClearPageReferenced(page);
}
pte_unmap_unlock(pte - 1, ptl);
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Wrong IDE cable detection in libata [Re: 2.6.21-rc4-mm1]

2007-03-25 Thread J.A. Magallón
On Mon, 19 Mar 2007 20:56:23 -0800, Andrew Morton <[EMAIL PROTECTED]> wrote:

> 
> Temporarily at
> 
>   http://userweb.kernel.org/~akpm/2.6.21-rc4-mm1/
> 

Libata seems to misdetect my cable.
I have double-checked and the cable is 80 pin...

ata1 is PATA ICH5 bus 1 with DVD-RW + ZIP and 40 pin cable
ata2 is PATA ICH5 bus 2 with extra HD + DVD and 80 pin cable
ata3 is real SATA ICH5 with boot HD

(mm, I chaged bios settings to get the box booting from the SATA disk)

werewolf:~# lsscsi
[0:0:0:0]cd/dvd  HL-DT-ST DVDRAM GSA-H10N  JL12  /dev/.tmp-11-0
[0:0:1:0]diskIOMEGA   ZIP 250  51.G  /dev/sda
[1:0:0:0]diskATA  ST3120022A   3.06  /dev/sdb
[1:0:1:0]cd/dvd  TOSHIBA  DVD-ROM SD-M1712 1004  /dev/sr1
[2:0:0:0]diskATA  ST3200822AS  3.01  /dev/sdc

ata_piix :00:1f.1: version 2.10ac1
ACPI: PCI Interrupt :00:1f.1[A] -> GSI 18 (level, low) -> IRQ 18
PCI: Setting latency timer of device :00:1f.1 to 64
ata1: PATA max UDMA/133 cmd 0x000101f0 ctl 0x000103f6 bmdma 0x0001f000 irq 14
ata2: PATA max UDMA/133 cmd 0x00010170 ctl 0x00010376 bmdma 0x0001f008 irq 15
scsi0 : ata_piix
ata1.00: ATAPI, max UDMA/33
ata1.01: ATAPI, max MWDMA0, CDB intr
ata1.00: configured for UDMA/33
ata1.01: configured for PIO3
scsi1 : ata_piix
ata2.00: ATA-6: ST3120022A, 3.06, max UDMA/100
ata2.00: 234441648 sectors, multi 16: LBA48
ata2.01: ATAPI, max UDMA/33
ata2.00: limited to UDMA/33 due to 40-wire cable<===
ata2.00: configured for UDMA/33
ata2.01: configured for UDMA/33
scsi 0:0:0:0: CD-ROMHL-DT-ST DVDRAM GSA-H10N  JL12 PQ: 0 ANSI: 5
sr0: scsi3-mmc drive: 48x/48x writer dvd-ram cd/rw xa/form2 cdda tray
Uniform CD-ROM driver Revision: 3.20
sr 0:0:0:0: Attached scsi CD-ROM sr0
scsi 0:0:1:0: Direct-Access IOMEGA   ZIP 250  51.G PQ: 0 ANSI: 5
sd 0:0:1:0: [sda] Attached SCSI removable disk
scsi 1:0:0:0: Direct-Access ATA  ST3120022A   3.06 PQ: 0 ANSI: 5
sd 1:0:0:0: [sdb] 234441648 512-byte hardware sectors (120034 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support 
DPO or FUA
sd 1:0:0:0: [sdb] 234441648 512-byte hardware sectors (120034 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support 
DPO or FUA
 sdb: sdb1
sd 1:0:0:0: [sdb] Attached SCSI disk
scsi 1:0:1:0: CD-ROMTOSHIBA  DVD-ROM SD-M1712 1004 PQ: 0 ANSI: 5
sr1: scsi3-mmc drive: 48x/48x cd/rw xa/form2 cdda tray
sr 1:0:1:0: Attached scsi CD-ROM sr1

Any ideas ?

--
J.A. Magallon  \   Software is like sex:
 \ It's better when it's free
Mandriva Linux release 2007.1 (Cooker) for i586
Linux 2.6.20-jam05 (gcc 4.1.2 20070302 (prerelease) (4.1.2-1mdv2007.1)) #1 SMP 
PREEMPT
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] INPUT: Delete unused source file drivers/input/power.c.

2007-03-25 Thread Dmitry Torokhov
On Sunday 25 March 2007 18:39, Andrew Morton wrote:
> On Sun, 25 Mar 2007 16:28:39 -0400 (EDT) "Robert P. J. Day" <[EMAIL 
> PROTECTED]> wrote:
> 
> > Delete the never-compiled source file drivers/input/power.c, and
> > remove its entry from the corresponding Makefile, as there is no
> > Kconfig file that refers to the config option INPUT_POWER.
> 
> Please cc the maintainer when he's this easy to find ;)
> 
> There are significant-looking patches against input.c in the git-input
> tree - if it's presently dead I suspect that it'll be coming back to
> life soon.
> 

Actually it is just me being anal and trying to keep even dead modules
compilable... I think it is time for power.c to go. Now that we have ACPI
using input layer to report buttons/lid I don't see it coming back.

-- 
Dmitry
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 00/22 take 3] UBI: Unsorted Block Images

2007-03-25 Thread David Woodhouse
On Mon, 2007-03-26 at 02:01 +0200, Jörn Engel wrote:
> You can on NAND.  ECC is done in software.  And for a data structure as
> simple as the 'tally', foregoing ECC is not a huge problem - most
> bitflips are easily detected and the remaining only cause off-by-a-few
> on the erase count. 

You're only allowed a limited number of write cycles to each page
though. So you can't just clear the bits in a 2112-byte page one at a
time; typically when you clear the fifth bit, the contents of the whole
page become undefined until the next erase cycle.

-- 
dwmw2

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 00/22 take 3] UBI: Unsorted Block Images

2007-03-25 Thread Jörn Engel
On Mon, 26 March 2007 00:46:33 +0100, David Woodhouse wrote:
> On Mon, 2007-03-26 at 00:55 +0200, Jörn Engel wrote:
> > > although, since you can flip bits to 1 without requireing an erase you 
> > [ vice versa.  you can flip bits to 0 without erasing. ]
> 
> And on NAND flash you can't just do it in multiple cycles one bit at a
> time. The 'tally' trick isn't viable there.

You can on NAND.  ECC is done in software.  And for a data structure as
simple as the 'tally', foregoing ECC is not a huge problem - most
bitflips are easily detected and the remaining only cause off-by-a-few
on the erase count.

On NOR with transparent (hardware) ECC you can't.

Jörn

-- 
Homo Sapiens is a goal, not a description.
-- unknown
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] disable barriers in dm-crypt [was Re: BUG: Files corrupt after moving LVM volume to USB disk]

2007-03-25 Thread Marti Raudsepp

On 3/23/07, Milan Broz <[EMAIL PROTECTED]> wrote:

Disable barriers in dm-crypt because of current workqueue processing
can reorder requests.

--- linux-2.6.20.1.orig/drivers/md/dm-crypt.c   2007-03-23 15:07:11.0 
+0100
+++ linux-2.6.20.1/drivers/md/dm-crypt.c2007-03-23 15:09:59.0 
+0100
+   if (bio_barrier(bio))
+   return -EOPNOTSUPP;
+


Sorry, this doesn't help either. ext3 is still slow, reiserfs still fails.

(And sorry for the double e-mail, I forgot to reply to all)

Marti Raudsepp
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] add file position info to proc

2007-03-25 Thread Neil Brown
On Sunday March 25, [EMAIL PROTECTED] wrote:
> On Sat, 24 Mar 2007 23:04:09 +0100 Miklos Szeredi <[EMAIL PROTECTED]> wrote:
> 
> > This patch adds support for finding out the current file position,
> > open flags and possibly other info in the future.
> > 
> > These new entries are added:
> > 
> >   /proc/PID/fdinfo/FD
> >   /proc/PID/task/TID/fdinfo/FD
> > 
> > For each fd the information is provided in the following format:
> > 
> > pos:1234
> > flags:  012
> 
> I've seen the idea mentioned once or twice, but not with any great
> enthusiasm.  Why does Linux want this feature?

Same reason we want ltrace and lsof.  To be able to look inside a
process and see what it is doing.
e.g. how far has that program got through reading that enormous file?
is it really making progress, or is it going much more slowly than I
would expect and so should I look more deeply.

I'm in favour.

NeilBrown
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/3] only allow nonlinear vmas for ram backed filesystems

2007-03-25 Thread Andrew Morton
On Sat, 24 Mar 2007 23:09:19 +0100 Miklos Szeredi <[EMAIL PROTECTED]> wrote:

> Dirty page accounting/limiting doesn't work for nonlinear mappings,

Doesn't it?  iirc the problem is that we don't correctly re-clean the ptes
while starting writeout.  And the dirty-page accounting is in fact correct
(it'd darn well better be).

> so
> for non-ram backed filesystems emulate with linear mappings.  This
> retains ABI compatibility with previous kernels at minimal code cost.
> 
> All known users of nonlinear mappings actually use tmpfs, so this
> shouldn't have any negative effect.

Unless someone is using remap_file_pages() against an ext3 file, in which
case their application stops working?

That would be a problem.  These guys:
http://www.technovelty.org/code/linux/fremap.html, for example, will be in
for a little surprise.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] sched: accurate user accounting

2007-03-25 Thread Con Kolivas
On Monday 26 March 2007 09:01, Con Kolivas wrote:
> On Monday 26 March 2007 03:14, malc wrote:
> > On Mon, 26 Mar 2007, Con Kolivas wrote:
> > > On Monday 26 March 2007 01:19, malc wrote:
> > Erm... i just looked at the code and suddenly it stopped making any sense
> > at all:
> >
> >  p->last_ran = rq->most_recent_timestamp = now;
> >  /* Sanity check. It should never go backwards or ruin accounting
> > */ if (unlikely(now < p->last_ran))
> >  return;
> >  time_diff = now - p->last_ran;
> >
> > First `now' is assigned to `p->last_ran' and the very next line
> > compares those two values, and then the difference is taken.. I quite
> > frankly am either very tired or fail to see the point.. time_diff is
> > either always zero or there's always a race here.
>
> Bah major thinko error on my part! That will teach me to post patches
> untested at 1:30 am. I'll try again shortly sorry.

Ok this one is heavily tested. Please try it when you find the time.

---
Currently we only do cpu accounting to userspace based on what is 
actually happening precisely on each tick. The accuracy of that 
accounting gets progressively worse the lower HZ is. As we already keep 
accounting of nanosecond resolution we can accurately track user cpu, 
nice cpu and idle cpu if we move the accounting to update_cpu_clock with 
a nanosecond cpu_usage_stat entry. This increases overhead slightly but 
avoids the problem of tick aliasing errors making accounting unreliable.

Remove the now defunct Documentation/cpu-load.txt file.

Signed-off-by: Con Kolivas <[EMAIL PROTECTED]>

---
 Documentation/cpu-load.txt  |  113 
 include/linux/kernel_stat.h |3 +
 include/linux/sched.h   |2 
 kernel/sched.c  |   58 +-
 kernel/timer.c  |5 -
 5 files changed, 60 insertions(+), 121 deletions(-)

Index: linux-2.6.21-rc4-acct/include/linux/kernel_stat.h
===
--- linux-2.6.21-rc4-acct.orig/include/linux/kernel_stat.h  2007-03-26 
00:56:05.0 +1000
+++ linux-2.6.21-rc4-acct/include/linux/kernel_stat.h   2007-03-26 
00:56:25.0 +1000
@@ -16,11 +16,14 @@
 
 struct cpu_usage_stat {
cputime64_t user;
+   cputime64_t user_ns;
cputime64_t nice;
+   cputime64_t nice_ns;
cputime64_t system;
cputime64_t softirq;
cputime64_t irq;
cputime64_t idle;
+   cputime64_t idle_ns;
cputime64_t iowait;
cputime64_t steal;
 };
Index: linux-2.6.21-rc4-acct/include/linux/sched.h
===
--- linux-2.6.21-rc4-acct.orig/include/linux/sched.h2007-03-26 
00:56:05.0 +1000
+++ linux-2.6.21-rc4-acct/include/linux/sched.h 2007-03-26 00:57:01.0 
+1000
@@ -882,7 +882,7 @@ struct task_struct {
int __user *clear_child_tid;/* CLONE_CHILD_CLEARTID */
 
unsigned long rt_priority;
-   cputime_t utime, stime;
+   cputime_t utime, utime_ns, stime;
unsigned long nvcsw, nivcsw; /* context switch counts */
struct timespec start_time;
 /* mm fault and swap info: this can arguably be seen as either mm-specific or 
thread-specific */
Index: linux-2.6.21-rc4-acct/kernel/sched.c
===
--- linux-2.6.21-rc4-acct.orig/kernel/sched.c   2007-03-26 00:56:05.0 
+1000
+++ linux-2.6.21-rc4-acct/kernel/sched.c2007-03-26 09:38:50.0 
+1000
@@ -89,6 +89,7 @@ unsigned long long __attribute__((weak))
  */
 #define NS_TO_JIFFIES(TIME)((TIME) / (10 / HZ))
 #define JIFFIES_TO_NS(TIME)((TIME) * (10 / HZ))
+#define JIFFY_NS   JIFFIES_TO_NS(1)
 
 /*
  * These are the 'tuning knobs' of the scheduler:
@@ -3017,8 +3018,59 @@ EXPORT_PER_CPU_SYMBOL(kstat);
 static inline void
 update_cpu_clock(struct task_struct *p, struct rq *rq, unsigned long long now)
 {
-   p->sched_time += now - p->last_ran;
-   p->last_ran = rq->most_recent_timestamp = now;
+   struct cpu_usage_stat *cpustat = _this_cpu.cpustat;
+   cputime64_t time_diff;
+
+   /* Sanity check. It should never go backwards or ruin accounting */
+   if (unlikely(now < p->last_ran))
+   goto out_set;
+   /* All the userspace visible cpu accounting is done here */
+   time_diff = now - p->last_ran;
+   p->sched_time += time_diff;
+   if (p != rq->idle) {
+   cputime_t utime_diff = time_diff;
+
+   if (TASK_NICE(p) > 0) {
+   cpustat->nice_ns = cputime64_add(cpustat->nice_ns,
+time_diff);
+   if (cpustat->nice_ns > JIFFY_NS) {
+   cpustat->nice_ns =
+   cputime64_sub(cpustat->nice_ns,
+  

Re: [PATCH 00/22 take 3] UBI: Unsorted Block Images

2007-03-25 Thread David Woodhouse
On Mon, 2007-03-26 at 00:55 +0200, Jörn Engel wrote:
> > although, since you can flip bits to 1 without requireing an erase you 
> [ vice versa.  you can flip bits to 0 without erasing. ]

And on NAND flash you can't just do it in multiple cycles one bit at a
time. The 'tally' trick isn't viable there.

-- 
dwmw2

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] add file position info to proc

2007-03-25 Thread Andrew Morton
On Sat, 24 Mar 2007 23:04:09 +0100 Miklos Szeredi <[EMAIL PROTECTED]> wrote:

> This patch adds support for finding out the current file position,
> open flags and possibly other info in the future.
> 
> These new entries are added:
> 
>   /proc/PID/fdinfo/FD
>   /proc/PID/task/TID/fdinfo/FD
> 
> For each fd the information is provided in the following format:
> 
> pos:  1234
> flags:012

I've seen the idea mentioned once or twice, but not with any great
enthusiasm.  Why does Linux want this feature?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/3] remove throttle_vm_writeout()

2007-03-25 Thread Andrew Morton
On Sat, 24 Mar 2007 22:57:34 +0100 Miklos Szeredi <[EMAIL PROTECTED]> wrote:

> From: Miklos Szeredi <[EMAIL PROTECTED]>
> 
> Remove this function.  It's purpose was to limit the global number of
> writeback pages from submitted by direct reclaim.  But this is equally
> well accomplished by limited queue lengths.  When this function was
> added, the device queues had much larger default lengths (8192
> requests, now it's 128), causing problems.

This changelog is wrong.

Yes, the _default_ depth of CFQ was decreased.  But it is trivial for the
user to inrcrease the queue depth again, and we'd prefer that the VM not
shit itself in response.

Plus, more significantly, the queue is per-disk.  A system with many disks
will easily be able to cover all memory with under-writeback pages. 
Which is why the level of under-writeback memory is controlled at a higher
level.

These variables are why we must not reply upon per-queue throttling in
VFS/MM/VM.


All that being said, the patch (with a new, correct changlog) is OK.  This
is because we now control the amount of dirty memory in the machine by
running balance_dirty_pages() at first-write-fault time.  So we can no
longer get into the situation where all memory is dirty+writeback.


That being said, this patch is only correct if we don't apply "[patch 1/3]
fix illogical behavior in balance_dirty_pages()".  If we _do_ apply that
patch then we can again get all memory under-writeback and we again need
throttle_vm_writeout().


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] fix illogical behavior in balance_dirty_pages()

2007-03-25 Thread Andrew Morton
On Sat, 24 Mar 2007 22:55:29 +0100 Miklos Szeredi <[EMAIL PROTECTED]> wrote:

> This is a slightly different take on the fix for the deadlock in fuse
> with dirty balancing.  David Chinner convinced me, that per-bdi
> counters are too expensive, and that it's not worth trying to account
> the number of pages under writeback, as they will be limited by the
> queue anyway.
> 
> 
> From: Miklos Szeredi <[EMAIL PROTECTED]>
> 
> Current behavior of balance_dirty_pages() is to try to start writeout
> into the specified queue for at least "write_chunk" number of pages.
> If "write_chunk" pages have been submitted, then return.
> 
> However if there are less than "write_chunk" dirty pages for this
> queue, then it doesn't return, waiting for the global dirty+writeback
> counters to subside, but without doing any actual work.
> 
> This is illogical behavior: it allows more dirtyings while there are
> dirty pages, but stops further dirtying completely if there are no
> more dirty pages.

That behaviour is perfectly logical.  It prevents the number of
dirty+writeback pages from exceeding dirty_ratio.

> It also makes a deadlock possible when one filesystem is writing data
> through another, and the balance_dirty_pages() for the lower
> filesystem is stalling the writeback for the upper filesystem's
> data (*).

I still don't understand this one.  I got lost when belatedly told that
i_mutex had something to do with it.

> So the exit condition should instead be:
> 
>   submitted at least "write_chunk" number of pages
> OR
> submitted ALL the dirty pages destined for this backing dev
>   AND
> backing dev is not congested
> 
> To do this, introduce a new counter in writeback_control, which counts
> the number of dirty pages encountered during writeback.  This includes
> all dirty pages, even those which are already under writeback but have
> been dirtied again, and those which have been skipped due to having
> locked buffers.
> 
> If this counter is zero after trying to submit some pages for
> writeback, and the backing dev is uncongested, then don't wait any
> more.  After this, newly dirtied pages can quickly be written back to
> this backing dev.
> 
> If there are globally no more pages to submit for writeback
> (nr_reclaimable == 0), then also don't wait for ever, only while this
> backing dev is congested.
> 
> (*) For more info on this deadlock, see the following discussions:
> 
>   http://lkml.org/lkml/2007/3/1/9
>   http://lkml.org/lkml/2007/3/12/16
> 
> Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]>
> ---
> 
> Index: linux/include/linux/writeback.h
> ===
> --- linux.orig/include/linux/writeback.h  2007-03-24 22:06:56.0 
> +0100
> +++ linux/include/linux/writeback.h   2007-03-24 22:29:02.0 +0100
> @@ -44,6 +44,7 @@ struct writeback_control {
>   long nr_to_write;   /* Write this many pages, and decrement
>  this for each page written */
>   long pages_skipped; /* Pages which were not written */
> + long nr_dirty;  /* Number of dirty pages encountered */
>  
>   /*
>* For a_ops->writepages(): is start or end are non-zero then this is
> Index: linux/mm/page-writeback.c
> ===
> --- linux.orig/mm/page-writeback.c2007-03-24 22:06:56.0 +0100
> +++ linux/mm/page-writeback.c 2007-03-24 22:29:02.0 +0100
> @@ -207,7 +207,15 @@ static void balance_dirty_pages(struct a
>* written to the server's write cache, but has not yet
>* been flushed to permanent storage.
>*/
> - if (nr_reclaimable) {
> + if (!nr_reclaimable) {
> + /*
> +  * If there's nothing more to write back and this queue
> +  * is uncongested,  then it is possible to quickly
> +  * write out some more data, so let's not wait
> +  */
> + if (!bdi_write_congested(bdi))
> + break;
> + } else {

This says "if there are no dirty pages in the machine at all, then go back
and dirty some more pages, regardless of the present number of
under-writeback pages".

>   writeback_inodes();
>   get_dirty_limits(_thresh,
>   _thresh, mapping);
> @@ -220,6 +228,14 @@ static void balance_dirty_pages(struct a
>   pages_written += write_chunk - wbc.nr_to_write;
>   if (pages_written >= write_chunk)
>   break;  /* We've done our duty */
> +
> + /*
> +  * If there are no more dirty pages for this backing
> +  * backing dev, and the queue is not congested, 

Linux 2.6.21-rc5

2007-03-25 Thread Linus Torvalds

There's various fixes here, ranging from some architecture updates (ia64, 
ARM, MIPS, SH, Sparc64) to KVM, networking and network drivers.

And random one-liners.

But probably more important, and likely much more visible to most people 
is the fixes for the fallout from the hrtimers and no-HZ changes, and some 
of the ACPI regressions.

Those timer changes ended up much more painful than anybody wished for, 
but big thanks to Thomas Gleixner for being on it like a weasel on a dead 
rat, and the regression list has kept shrinking.

So if you have reported a regression in the 2.6.21-rc series, please check 
2.6.21-rc5, and update your report as appropriate (whether fixed or "still 
problems with xyzzy").

Linus

---
Adrian Bunk (3):
  X86_P4_CLOCKMOD must select CPU_FREQ_TABLE
  [X25] x25_forward_call(): fix NULL dereferences
  drivers/video/s3fb.c: fix a use-before-check

Akira Iguchi (1):
  drivers/ata/Kconfig: PATA_SCC depends on wrong platform

Alan Stern (1):
  usblp: quirk flag and device entry for Seiko Epson M129C printer

Alan Tyson (1):
  [CIFS] reset mode when client notices that ATTR_READONLY is no longer set

Alessandro Zummo (1):
  pata_ixp4xx_cf: fix interrupt

Alexandr Andreev (1):
  x86-64: wire up compat sched_rr_get_interval(2)

Alexey Dobriyan (1):
  [NET]: Copy mac_len in skb_clone() as well

Alexey Starikovskiy (1):
  ACPI: resolve HP nx6125 S3 immediate wakeup regression

Andi Kleen (4):
  x86-64: Update defconfig
  i386: Update defconfig
  i386: Enforce GPLness of VMI ROM
  x86: Export _proxy_pda for gcc 4.2

Andrew Johnson (1):
  swsusp: fix suspend when console is in VT_AUTO+KD_GRAPHICS mode

Andrew Morton (2):
  machzwd warning fix
  "ext[34]: EA block reference count racing fix" performance fix

Andy Isaacson (1):
  fix read past end of array in md/linear.c

Ankita Garg (1):
  oom fix: prevent oom from killing a process with children/sibling 
unkillable

Anton Blanchard (2):
  [POWERPC] Bypass hcall stats until cpu features have run
  Fix return code in pci-skeleton.c

Antonino A. Daplas (2):
  savagefb: Fix black screen on load in Savage IX
  savagefb: Fix compile error if debugging is enabled

Arnaud Patard (1):
  spi_s3c24xx.c: warning fix

Atsushi Nemoto (5):
  [MIPS] Check FCSR for pending interrupts, alternative version
  [MIPS] FPU ownership management & preemption fixes
  SPI: at25: do not use pointer before assignment
  [MIPS] Qemu: Fix Symmetric Uniprocessor support.
  [MIPS] SPARSEMEM: The first pfn of zone should be min_low_pfn, not 0.

Avi Kivity (4):
  KVM: Unset kvm_arch_ops if arch module loading failed
  KVM: Fix guest sysenter on vmx
  KVM: MMU: Fix guest writes to nonpae pde
  KVM: MMU: Fix host memory corruption on i386 with >= 4GB ram

Bartlomiej Zolnierkiewicz (2):
  ide: don't allow DMA to be enabled if CONFIG_IDEDMA_{ICS,PCI}_AUTO=n
  ide: remove CONFIG_IDEDMA_{ICS,PCI}_AUTO config options

Bernhard Walle (3):
  Initialise SAK member for each virtual console to prevent oops
  Fix wrong /proc/iomem on SGI Altix
  [IA64] Fix wrong /proc/iomem on SGI Altix

Brice Goglin (4):
  myri10ge: Serverworks HT2100 provides aligned PCIe completion
  myri10ge: update wcfifo and intr_coal_delay default values
  myri10ge: fix management of >4kB allocated pages
  myri10ge: update driver version to 1.3.0-1.226

Bryan O'Sullivan (1):
  IB/ipath: Check return value of lookup_one_len

Chris Lesiak (1):
  spi: destroy workqueue after spi_unregister_master

Cyrill V. Gorcunov (1):
  i2c/ds1374: Check workqueue creation status

Dale Farnsworth (1):
  mv643xx_eth: add mv643xx_eth_shutdown function

Dan Williams (1):
  [ARM] 4271/1: iop32x: fix ep80219 detection (support iq80219 platforms)

Daniel Mack (1):
  ide: remove static prototypes from 
include/asm-mips/mach-au1x00/au1xxx_ide.h

Danny Kukawka (1):
  add Fujitsu Siemens Tablet PC devices to 8250_pnp.c

Dave Jones (1):
  [NET]: fix up misplaced inlines.

David Brownell (4):
  reduce pnp syslog spam
  rm pointless dmaengine exports
  gpio_direction_output() needs an initial value
  gpio_direction_output-needs-an-initial-value fix

David Howells (4):
  FRV: fix unannotated variable declarations
  NOMMU: supply get_unmapped_area() to fix NOMMU SYSV SHM
  NOMMU: make SYSV SHM nattch work correctly
  FDPIC: fix the /proc/pid/stat representation of executable boundaries

David S. Miller (3):
  [SPARC64]: Get DEBUG_PAGEALLOC working again.
  [SPARC64]: Use Kconfig.preempt
  [SPARC64]: store-init needs trailing membar.

Deepak Saxena (1):
  [MIPS] Make MIPS udelay() preempt safe under DEBUG_PREEMPT

Divy Le Ray (5):
  cxgb3 - fix ethtool cmd on multiple queues port
  cxgb3 - Auto-load FW if mismatch detected
  cxgb3 - Fix potential MAC hang
  cxgb3 - T3B2 

Re: [patch] sched: accurate user accounting

2007-03-25 Thread Con Kolivas
On Monday 26 March 2007 03:14, malc wrote:
> On Mon, 26 Mar 2007, Con Kolivas wrote:
> > On Monday 26 March 2007 01:19, malc wrote:
> >> On Mon, 26 Mar 2007, Con Kolivas wrote:
> >>> So before we go any further with this patch, can you try the following
> >>> one and see if this simple sanity check is enough?
> >>
> >> Sure (compiling the kernel now), too bad old axiom that testing can not
> >> confirm absence of bugs holds.
> >>
> >> I have one nit and one request from clarification. Question first (i
> >> admit i haven't looked at the surroundings of the patch maybe things
> >> would have been are self evident if i did):
> >>
> >> What this patch amounts to is that the accounting logic is moved from
> >> timer interrupt to the place where scheduler switches task (or something
> >> to that effect)?
> >
> > Both the scheduler tick and context switch now. So yes it adds overhead
> > as I said, although we already do update_cpu_clock on context switch, but
> > it's not this complex.
> >
> >> [..snip..]
> >>
> >>>  * These are the 'tuning knobs' of the scheduler:
> >>> @@ -3017,8 +3018,53 @@ EXPORT_PER_CPU_SYMBOL(kstat);
> >>> static inline void
> >>> update_cpu_clock(struct task_struct *p, struct rq *rq, unsigned long
> >>> long now) {
> >>> - p->sched_time += now - p->last_ran;
> >>> + struct cpu_usage_stat *cpustat = _this_cpu.cpustat;
> >>> + cputime64_t time_diff;
> >>> +
> >>>   p->last_ran = rq->most_recent_timestamp = now;
> >>> + /* Sanity check. It should never go backwards or ruin accounting */
> >>> + if (unlikely(now < p->last_ran))
> >>> + return;
> >>> + time_diff = now - p->last_ran;
> >>
> >> A nit. Anything wrong with:
> >>
> >> time_diff = now - p->last_ran;
> >> if (unlikeley (LESS_THAN_ZERO (time_diff))
> >>  return;
> >
> > Does LESS_THAN_ZERO work on a cputime64_t on all arches? I can't figure
> > that out just by looking myself which is why I did it the other way.
>
> I have no idea what type cputime64_t really is, so used this imaginary
> LESS_THAN_ZERO thing.
>
> Erm... i just looked at the code and suddenly it stopped making any sense
> at all:
>
>  p->last_ran = rq->most_recent_timestamp = now;
>  /* Sanity check. It should never go backwards or ruin accounting
> */ if (unlikely(now < p->last_ran))
>  return;
>  time_diff = now - p->last_ran;
>
> First `now' is assigned to `p->last_ran' and the very next line
> compares those two values, and then the difference is taken.. I quite
> frankly am either very tired or fail to see the point.. time_diff is
> either always zero or there's always a race here.

Bah major thinko error on my part! That will teach me to post patches untested 
at 1:30 am. I'll try again shortly sorry.

-- 
-ck
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 00/22 take 3] UBI: Unsorted Block Images

2007-03-25 Thread Jörn Engel
On Sun, 25 March 2007 13:49:58 -0800, David Lang wrote:
> On Sun, 25 Mar 2007, Jörn Engel wrote:
> 
> >Logical volume management can just as easily move its management
> >information into a table, instead of having it spread across all blocks.
> >Blocks can keep their original size.  Since you have to scan flash
> >anyway, you can also scan for a table, compare a magical number and do
> >some extra check to protect yourself against a UBI image inside some
> >logical volume.  No big deal.

[ This was not a request for UBI to be changed.  The only purpose was to
illustrate that LogFS is not broken.  The previous thread suggested
otherwise and I just couldn't leave it at that. ]

> if you are being paranoid about write cycles putting the write count in the 
> block you are writing avoids doing an erase/write elsewhere
>
> although, since you can flip bits to 1 without requireing an erase you 
[ vice versa.  you can flip bits to 0 without erasing. ]
> could sacrafice some space and say that your table has a normal counter for 
> the number of times the block has been erased, but a 'tally counter' where 
> you turn one bit on each time you erase the block, and when you fill up the 
> tally block you re-write the entire table, clearing all the tallys. if you 
> have relativly large eraseblocks it seems like you could afford to 
> sacrafice the space in your master table to avoid erases of it

Or you could have a table and any number of updates to it.  Erase one
block, append a small update marker to the table.  There are plenty of
options.  All have in common that code would be more complicated.

Another advantage is that erase counts don't get reset if the race
against a power failure during erase is lost.

Whether the advantaves of power-of-two blocksizes and safe erasecounts
are worth it, I leave for others to decide.

Jörn

-- 
Fools ignore complexity.  Pragmatists suffer it.
Some can avoid it.  Geniuses remove it.
-- Perlis's Programming Proverb #58, SIGPLAN Notices, Sept.  1982
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: debug rsdl 0.33

2007-03-25 Thread Con Kolivas
On Monday 26 March 2007 08:49, Con Kolivas wrote:
> On Monday 26 March 2007 04:28, Torsten Kaiser wrote:
> > On 3/24/07, Con Kolivas <[EMAIL PROTECTED]> wrote:
> > >  kernel/sched.c |   51
> > > +++ 1 file changed, 51
> > > insertions(+)
> >
> > 2.6.21-rc4-mm1 also fails for me.
> >
> > I tried pure 2.6.21-rc4-mm1, +hotfixes, +hotfixes+rsdl33 and at last
> > also added above debug patch.
>
> Thank you very much for the effort!
>
> > The oops from with the debug-patch added:
> > [   65.426126] Freeing unused kernel memory: 312k freed
> > (on the console the system is starting up, getting until "Letting udev
> > process events ...")
> > [   66.665611] Unable to handle kernel NULL pointer dereference at
> > 0020 RIP:
> > [   66.682030]  [] __sched_text_start+0x4dc/0xa0e
>
> The debug patch didn't do anything. This means it is not an unset bitmap
> problem at all otherwise it should have self corrected itself.
>
> > The system in x86_64, two 2218 on a MCP55 nvidia chipset.
> >
> > 2.6.21-rc3-mm1 works fine.
> >
> > (gdb) list *0x8026167c
> > 0x8026167c is in schedule (kernel/sched.c:3619).
>
>   next = list_entry(queue->next, struct task_struct, run_list);
>   rq->prio_level = idx;
>
> > 3614/*
> > 3615 * When the task is chosen it is checked to see if its
> > quota has been
> > 3616 * added to this runqueue level which is only performed
> > once per 3617 * level per major rotation for each running
> > task. 3618 */
> > 3619if (next->rotation != rq->prio_rotation) {
>
> Urgh. Dereferencing there? That can only be next that's deferencing meaning
> the run_list entry is bogus. That should only ever be done under runqueue
> lock so I have a race somewhere where it's not. Time for more looking.

This is about the only place I can see the run_list is looked at unlocked. Can
you see if this simple patch helps? The debug patch is unnecessary now.

Thanks!

--
Ensure checking task_queued() is only done under runqueue lock.

Signed-off-by: Con Kolivas <[EMAIL PROTECTED]>

---
 kernel/sched.c |   10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

Index: linux-2.6.21-rc4-mm1/kernel/sched.c
===
--- linux-2.6.21-rc4-mm1.orig/kernel/sched.c2007-03-26 08:54:15.0 
+1000
+++ linux-2.6.21-rc4-mm1/kernel/sched.c 2007-03-26 08:55:21.0 +1000
@@ -3421,16 +3421,16 @@ static inline void rotate_runqueue_prior
 
 static void task_running_tick(struct rq *rq, struct task_struct *p, int tick)
 {
-   if (unlikely(!task_queued(p))) {
-   /* Task has expired but was not scheduled yet */
-   set_tsk_need_resched(p);
-   return;
-   }
/* SCHED_FIFO tasks never run out of timeslice. */
if (unlikely(p->policy == SCHED_FIFO))
return;
 
spin_lock(>lock);
+   if (unlikely(!task_queued(p))) {
+   /* Task has expired but was not scheduled off yet */
+   set_tsk_need_resched(p);
+   goto out_unlock;
+   }
/*
 * Accounting is performed by both the task and the runqueue. This
 * allows frequently sleeping tasks to get their proper quota of


-- 
-ck
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [NFS] [2.6 patch] remove nfs4_acl_add_ace()

2007-03-25 Thread J. Bruce Fields
On Sun, Mar 25, 2007 at 04:58:32PM +0200, Adrian Bunk wrote:
> nfs4_acl_add_ace() can now be removed.
> 
> Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

Acked-by: J. Bruce Fields <[EMAIL PROTECTED]>

--b.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: debug rsdl 0.33

2007-03-25 Thread Con Kolivas
On Monday 26 March 2007 04:28, Torsten Kaiser wrote:
> On 3/24/07, Con Kolivas <[EMAIL PROTECTED]> wrote:
> >  kernel/sched.c |   51
> > +++ 1 file changed, 51
> > insertions(+)
>
> 2.6.21-rc4-mm1 also fails for me.
>
> I tried pure 2.6.21-rc4-mm1, +hotfixes, +hotfixes+rsdl33 and at last
> also added above debug patch.

Thank you very much for the effort!
>
> The oops from with the debug-patch added:
> [   65.426126] Freeing unused kernel memory: 312k freed
> (on the console the system is starting up, getting until "Letting udev
> process events ...")
> [   66.665611] Unable to handle kernel NULL pointer dereference at
> 0020 RIP:
> [   66.682030]  [] __sched_text_start+0x4dc/0xa0e

The debug patch didn't do anything. This means it is not an unset bitmap 
problem at all otherwise it should have self corrected itself.

> The system in x86_64, two 2218 on a MCP55 nvidia chipset.
>
> 2.6.21-rc3-mm1 works fine.
>
> (gdb) list *0x8026167c
> 0x8026167c is in schedule (kernel/sched.c:3619).

next = list_entry(queue->next, struct task_struct, run_list);
rq->prio_level = idx;

> 3614/*
> 3615 * When the task is chosen it is checked to see if its
> quota has been
> 3616 * added to this runqueue level which is only performed
> once per 3617 * level per major rotation for each running task.
> 3618 */
> 3619if (next->rotation != rq->prio_rotation) {

Urgh. Dereferencing there? That can only be next that's deferencing meaning 
the run_list entry is bogus. That should only ever be done under runqueue 
lock so I have a race somewhere where it's not. Time for more looking.

> Torsten

Thanks!

-- 
-ck
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] tifm_sd: add missing \n

2007-03-25 Thread Daniel Drake
Noticed this in user logs.

Signed-off-by: Daniel Drake <[EMAIL PROTECTED]>

Index: linux/drivers/mmc/tifm_sd.c
===
--- linux.orig/drivers/mmc/tifm_sd.c
+++ linux/drivers/mmc/tifm_sd.c
@@ -676,7 +676,7 @@ static void tifm_sd_abort(unsigned long 
struct tifm_sd *host = (struct tifm_sd*)data;
 
printk(KERN_ERR DRIVER_NAME
-  ": card failed to respond for a long period of time");
+  ": card failed to respond for a long period of time\n");
 
tifm_sd_terminate(host);
tifm_eject(host->dev);
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] INPUT: Delete unused source file drivers/input/power.c.

2007-03-25 Thread Andrew Morton
On Sun, 25 Mar 2007 16:28:39 -0400 (EDT) "Robert P. J. Day" <[EMAIL PROTECTED]> 
wrote:

> Delete the never-compiled source file drivers/input/power.c, and
> remove its entry from the corresponding Makefile, as there is no
> Kconfig file that refers to the config option INPUT_POWER.

Please cc the maintainer when he's this easy to find ;)

There are significant-looking patches against input.c in the git-input
tree - if it's presently dead I suspect that it'll be coming back to
life soon.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] aoe, tb0219: fix late spin_lock_init() and friends

2007-03-25 Thread Andrew Morton
On Sun, 25 Mar 2007 23:02:21 +0400 Alexey Dobriyan <[EMAIL PROTECTED]> wrote:

> Some drivers do register_chrdev() before lock or semaphore used in
> corresponding file_operations is initialized.
> 
> Signed-off-by: Alexey Dobriyan <[EMAIL PROTECTED]>
> ---
> 
>  drivers/block/aoe/aoechr.c |4 ++--
>  drivers/char/tb0219.c  |4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> --- a/drivers/block/aoe/aoechr.c
> +++ b/drivers/block/aoe/aoechr.c
> @@ -246,13 +246,13 @@ aoechr_init(void)
>  {
>   int n, i;
>  
> + sema_init(_sema, 0);
> + spin_lock_init(_lock);
>   n = register_chrdev(AOE_MAJOR, "aoechr", _fops);
>   if (n < 0) { 
>   printk(KERN_ERR "aoe: can't register char device\n");
>   return n;
>   }
> - sema_init(_sema, 0);
> - spin_lock_init(_lock);
>   aoe_class = class_create(THIS_MODULE, "aoe");
>   if (IS_ERR(aoe_class)) {
>   unregister_chrdev(AOE_MAJOR, "aoechr");
> --- a/drivers/char/tb0219.c
> +++ b/drivers/char/tb0219.c
> @@ -287,6 +287,8 @@ static int __devinit tb0219_probe(struct
>  {
>   int retval;
>  
> + spin_lock_init(_lock);
> +
>   if (request_mem_region(TB0219_START, TB0219_SIZE, "TB0219") == NULL)
>   return -EBUSY;
>  
> @@ -304,8 +306,6 @@ static int __devinit tb0219_probe(struct
>   return retval;
>   }
>  
> - spin_lock_init(_lock);
> -
>   old_machine_restart = _machine_restart;
>   _machine_restart = tb0219_restart;
>  

These locks can and should be initialised at compile-time.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH]fdomain.c: get rid of unused stuff

2007-03-25 Thread Andrew Morton
On Sun, 25 Mar 2007 12:11:23 -0400 (EDT) Parag Warudkar <[EMAIL PROTECTED]> 
wrote:

> fdomain.c uses the below stuff only if PCMCIA is not defined.
> This causes unused variables to be defined when PCMCIA is not defined. 
> Wrap variables and functions around #ifndef PCMCIA appropriately to avoid 
> this.

Your email client performs space-stuffing and hence mangles patches.

> 4 less compiler warnings.

Please quote the warnings in the changelog when removing them.


> Compile tested on i386.
> 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] MSR: Add support for safe variants

2007-03-25 Thread Andrew Morton
On Sun, 25 Mar 2007 14:18:23 +0200 Jean Delvare <[EMAIL PROTECTED]> wrote:

> Add support for _safe (exception handled) variants of rdmsr_on_cpu 
> and wrmsr_on_cpu.  This is needed for the upcoming coretemp hardware
> monitoring driver, which might step into non-existing (poorly
> documented) MSR.

Crappy changelog.  What do these functions do?  What is "safe" about them? 
This is described in neither the changelog nor the code comments, but it
should be described in both.

Also, Andi presently has several MSR patches queued (most prominently
ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt-current/patches/rwmsr-on-cpu)
but I'm not including them in -mm due to rejects.  If I get a properly
changlogged and commented version of this patch I can merge it, but it
might get wrecked again when the x86_64 tree gets sorted out.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 00/22 take 3] UBI: Unsorted Block Images

2007-03-25 Thread David Lang

On Sun, 25 Mar 2007, Jörn Engel wrote:


On Wed, 21 March 2007 12:25:34 +0100, Thomas Gleixner wrote:

On Wed, 2007-03-21 at 12:05 +0100, Jörn Engel wrote:


Also LogFS currently requires erasesizes of 2^n.


Last time I talked to you about that, you said it would be possible and
fixable.


Actually, no.  LogFS is not broken, there is nothing to fix.

And there is no fundamental reason why UBI should export blocks with
non-power-of-two sizes.  UBI currently consists of two parts that are
intimately intertwined in the current implementation, but have
relatively little connection otherwise.

1. Logical volume management.
2. Static volumes.

Logical volume management can just as easily move its management
information into a table, instead of having it spread across all blocks.
Blocks can keep their original size.  Since you have to scan flash
anyway, you can also scan for a table, compare a magical number and do
some extra check to protect yourself against a UBI image inside some
logical volume.  No big deal.


if you are being paranoid about write cycles putting the write count in the 
block you are writing avoids doing an erase/write elsewhere


although, since you can flip bits to 1 without requireing an erase you could 
sacrafice some space and say that your table has a normal counter for the number 
of times the block has been erased, but a 'tally counter' where you turn one bit 
on each time you erase the block, and when you fill up the tally block you 
re-write the entire table, clearing all the tallys. if you have relativly large 
eraseblocks it seems like you could afford to sacrafice the space in your master 
table to avoid erases of it


IIRC someone said that the count per block was 128 bits? if so you could have a 
master table with a 32 bit integer (4B writes) + 96 bits of tally so that you 
would only have to re-write the table when a block on the flash has been erased 
96 times. with this approach the wear on your table is unlikely to be a factor 
until the point where you are loosing a lot of other eraseblocks due to wear 
(at which point you could shift to a different block for your table, or retire 
the flash)


David Lang

Re: debug rsdl 0.33

2007-03-25 Thread Andrew Morton
On Sun, 25 Mar 2007 19:28:57 +0100 "Torsten Kaiser" <[EMAIL PROTECTED]> wrote:

> On 3/24/07, Con Kolivas <[EMAIL PROTECTED]> wrote:
> >  kernel/sched.c |   51 +++
> >  1 file changed, 51 insertions(+)
> 
> 2.6.21-rc4-mm1 also fails for me.
> 
> I tried pure 2.6.21-rc4-mm1, +hotfixes, +hotfixes+rsdl33 and at last
> also added above debug patch.
> 
> The oops from with the debug-patch added:
> [   65.426126] Freeing unused kernel memory: 312k freed
> (on the console the system is starting up, getting until "Letting udev
> process events ...")
> [   66.665611] Unable to handle kernel NULL pointer dereference at
> 0020 RIP:
> [   66.682030]  [] __sched_text_start+0x4dc/0xa0e
> [   66.707402] PGD 0
> [   66.713473] Oops:  [1] SMP
> [   66.722968] last sysfs file:
> devices/pci:00/:00:05.0/host2/target2:0:0/2:0:0:0/type
> [   66.747954] CPU 0
> [   66.754025] Modules linked in:
> [   66.763209] Pid: 1200, comm: udevd Not tainted 2.6.21-rc4-mm1 #4
> [   66.781162] RIP: 0010:[]  []
> __sched_text_start+0x4dc/0xa0e
> [   66.807236] RSP: 0018:81007d38fe78  EFLAGS: 00010082
> [   66.823115] RAX: ffd0 RBX: 008c RCX: 
> 058e
> [   66.844439] RDX:  RSI: 000c RDI: 
> 
> [   66.865767] RBP: 81007d38ff08 R08: 0064 R09: 
> 810001014a58
> [   66.887092] R10: 001c R11: 0246 R12: 
> 810001013700
> [   66.908418] R13: 810001014198 R14: 0001 R15: 
> 000f859461fc
> [   66.929745] FS:  2b67df90e6d0() GS:807aa000()
> knlGS:
> [   66.953950] CS:  0010 DS:  ES:  CR0: 8005003b
> [   66.971126] CR2: 0020 CR3: 00201000 CR4: 
> 06e0
> [   66.992451] Process udevd (pid: 1200, threadinfo 81007d38e000,
> task 81007e354100)
> [   67.016915] Stack:  04b0 
>  81007e354100
> [   67.041097]  ffd0 81007e354298 81011d420680
> 802234b1
> [   67.063407]  0001  
> 0246
> [   67.085149] Call Trace:
> [   67.093037]  [] filp_close+0x71/0x90
> [   67.108397]  [] do_exit+0x7e7/0x800
> [   67.123495]  [] do_group_exit+0x82/0x90
> [   67.139634]  [] system_call+0x7e/0x83
> [   67.155277]
> [   67.159739]
> [   67.159740] Code: 48 39 48 50 0f 84 8b 00 00 00 48 c7 40 40 00 00 00 00 8b 
> 52
> [   67.186877] RIP  [] __sched_text_start+0x4dc/0xa0e
> [   67.205919]  RSP 
> [   67.216348] CR2: 0020
> [   67.226260] Fixing recursive fault but reboot is needed!

We've seen multiple reports of this.

For some reason we've managed to confuse kallsyms too.

> The system in x86_64, two 2218 on a MCP55 nvidia chipset.
> 
> 2.6.21-rc3-mm1 works fine.
> 
> (gdb) list *0x8026167c
> 0x8026167c is in schedule (kernel/sched.c:3619).
> 3614/*
> 3615 * When the task is chosen it is checked to see if its
> quota has been
> 3616 * added to this runqueue level which is only performed once 
> per
> 3617 * level per major rotation for each running task.
> 3618 */
> 3619if (next->rotation != rq->prio_rotation) {
> 3620/* Task has moved during major rotation */
> 3621task_new_array(next, rq);
> 3622if (!entitled_slot(next->static_prio, idx))
> 3623exchange_slot(next, rq);
> 
> 

Ah, that helps, thanks.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: apm.c: get rid of unused variables

2007-03-25 Thread Andrew Morton
On Sun, 25 Mar 2007 13:07:41 -0400 "Parag Warudkar" <[EMAIL PROTECTED]> wrote:

> Separate Kconfig patch below and also attached.

Please avoid sending two copies of a patch in the same email like this.

Because the result applies happily with `patch --dry-run' then makes a mess
when you try to apply it for real.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: I/O memory barriers vs SMP memory barriers

2007-03-25 Thread Paul E. McKenney
On Fri, Mar 23, 2007 at 01:43:53PM +, David Howells wrote:
> 
> [Resend - this time with a comma in the addresses, not a dot]
> 
> Lennert Buytenhek <[EMAIL PROTECTED]> wrote:
> 
> > [ background: On ARM, SMP synchronisation does need barriers but device
> >   synchronisation does not.  The question is that given this, whether
> >   mb() and friends can be NOPs on ARM or not (i.e. whether mb() is
> >   supposed to sync against other CPUs or not, or whether only smp_mb()
> >   can be used for this.)  ]
> 
> H...
> 
> I see your problem.  I think the right way to deal with this is to get rid of
> mb(), rmb(), wmb() and read_barrier_depends() and replace them with io_mb(),
> io_rmb(), ...

We will get combinatorial explosion if we aren't -extremely- careful:

1.  Orders only normal memory accesses, which is all that is required
of smp_*().

2.  Orders both normal and device accesses -- mmiowb().

3.  Orders memory accesses and device accesses, but not necessarily
the union of the two -- mb(), rmb(), wmb().

4.  Orders only device accesses, which is what seems to be looked
for here.

Thanx, Paul

> I think that there are only two places you should be using explicit memory
> barriers:
> 
>  (1) To control inter-CPU effects on an SMP system.
> 
>  (2) To control CPU vs device effects.
> 
> > On Thu, Mar 22, 2007 at 04:17:44PM +, Catalin Marinas wrote:
> > 
> > > Is the requirement for mb() to act correctly in the SMP case as well?
> > 
> > That's what the docs seem to suggest.  A couple of snippets from
> > memory-barriers.txt:
> > 
> > [1]  A write memory barrier gives a guarantee that all the STORE operations
> >  specified before the barrier will appear to happen before all the STORE
> >  operations specified after the barrier with respect to the other
> >  components of the system.
> > 
> > [2]  A read barrier is a data dependency barrier plus a guarantee that all 
> > the
> >  LOAD operations specified before the barrier will appear to happen 
> > before
> >  all the LOAD operations specified after the barrier with respect to the
> >  other components of the system.
> > 
> > [3] TYPEMANDATORY   SMP CONDITIONAL
> > === === ===
> > GENERAL mb()smp_mb()
> > WRITE   wmb()   smp_wmb()
> > READrmb()   smp_rmb()
> > DATA DEPENDENCY read_barrier_depends()  smp_read_barrier_depends()
> > 
> > [4]  Mandatory barriers should not be used to control SMP effects,
> >  since mandatory barriers unnecessarily impose overhead on UP
> >  systems.
> > 
> > Note the wording of 'other components of the system' in [1] and [2] --
> > the way I read it, this includes devices as well as other CPUs.
> 
> Yes, but I suppose which "other components" may depend on the class of barrier
> used.
> 
> > [4] says that mandatory barriers (i.e. from [3]: mb(), wmb(), rmb(),
> > read_barrier_depends()) SHOULD not be used to control SMP effects, but
> > it does not say that they MUST not.
> 
> As it stands, mb() is a superset of smp_mb(), and rmb() of smp_rmb(), etc.,
> so, yes, currently, mb() implies smp_mb().  However, mb() shouldn't be used if
> smb_mb() is sufficient as that may impact performance on a UP system.
> 
> Really, mb() should only be used with respect to I/O.
> 
> > > The memory-barriers.txt doc says that smp_* must be used for the SMP
> > > case.
> > 
> > The exact wording is:
> > 
> > [!] Note that SMP memory barriers _must_ be used to control the
> > ordering of references to shared memory on SMP systems, though
> > the use of locking instead is sufficient.
> > 
> > This can IMHO be interpreted in two ways:
> > 1. If you want to control ordering of references to shared memory on
> >SMP systems, you must use SMP memory barriers and not any other kind
> >of memory barrier.
> 
> If the shared memory is purely an inter-CPU effect, yes.  If the shared memory
> is actually a device with side effects, then I/O safe memory barriers are
> required - mb() and co.  Note that there must _also_ be safety wrt to other
> CPUs in the system, as other CPUs may also try to access the device.
> 
> > 2. If you want to control ordering of references to shared memory on
> >SMP systems, you must use memory barriers, and the SMP memory barrier
> >is the most appropriate barrier type to use.
> 
> You may use locking instead to control inter-CPU effects.  Locks imply one-way
> permeable SMP-class memory barriers.
> 
> > I'm thinking that [2] is what was intended.  [1] doesn't seem consistent
> > with the rest of the document, but if [1] _is_ what is what was intended,
> > we're off the hook and mb() and friends can be NOPs on ARM.  (But it'd
> > probably still need a thorough audit... :-/ )
> 

Re: I/O memory barriers vs SMP memory barriers

2007-03-25 Thread Lennert Buytenhek
On Sun, Mar 25, 2007 at 02:15:42PM -0700, Paul E. McKenney wrote:

> > > [ background: On ARM, SMP synchronisation does need barriers but device
> > >   synchronisation does not.  The question is that given this, whether
> > >   mb() and friends can be NOPs on ARM or not (i.e. whether mb() is
> > >   supposed to sync against other CPUs or not, or whether only smp_mb()
> > >   can be used for this.)  ]
> > 
> > H...
> > 
> > [snip]
> 
> 3.Orders memory accesses and device accesses, but not necessarily
>   the union of the two -- mb(), rmb(), wmb().

If mb/rmb/wmb are required to order normal memory accesses, that means
that the change made in commit 9623b3732d11b0a18d9af3419f680d27ea24b014
to always define mb/rmb/wmb as barrier() on ARM systems was wrong.

Does everybody agree on these semantics, though?  At least David seems
to think that mb/rmb/wmb aren't required to order normal memory accesses
against each other..


> 4.Orders only device accesses, which is what seems to be looked
>   for here.

Yes.  (As above, on ARM, SMP synchronisation does need barriers but
device synchronisation does not.  If mb/rmb/wmb were only required to
synchronise device accesses, they could have been regular compiler
barriers on ARM, but if they are also required to synchronise normal
memory accesses against each other, they have to map to hardware
barriers.)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [3/5] 2.6.21-rc4: known regressions (v2)

2007-03-25 Thread Frédéric Riss
Le vendredi 23 mars 2007 à 19:50 +0100, Adrian Bunk a écrit :
> Subject: MacMini: doesn't come out of suspend to ram
> References : http://lkml.org/lkml/2007/3/21/374
> Submitter  : Frédéric RISS <[EMAIL PROTECTED]>
>  Tino Keitel <[EMAIL PROTECTED]>
> Caused-By  : Bob Moore <[EMAIL PROTECTED]>
>  commit c5a7156959e89b32260ad6072bbf5077bcdfbeee
> Status : unknown

I spent some time this weekend investigating this issue more thoroughly.
In fact the regression caused by this commit has been corrected by
f3ccb06f3b8e0cf42b579db21f3ca7f17fcc3f38 (ACPI: Disable wake GPEs only
once.)  

However, as I pointed out in the initial report, the MacMini doesn't
come out of suspend to ram because a commit in another merged patchset
broke it. I tracked it down to:

commit e9e2cdb412412326c4827fc78ba27f410d837e6e
parent 79bf2bb335b85db25d27421c798595a2fa2a0e82 
Author: Thomas Gleixner <[EMAIL PROTECTED]>
Date:   Fri Feb 16 01:28:04 2007 -0800

[PATCH] clockevents: i386 drivers

This patch has already been mentioned in regression reports, but AFAICS
not related to suspend issues.

To be totally clear about what works and what doesn't:

79bf2bb335b85db25d27421c798595a2fa2a0e82 
   + cherry-pick f3ccb06f3b8e0cf42b579db21f3ca7f17fcc3f38  ==> works

e9e2cdb412412326c4827fc78ba27f410d837e6e
   + cherry-pick f3ccb06f3b8e0cf42b579db21f3ca7f17fcc3f38  ==> broken

To try to get more information, I commented the call to
do_suspend_lowlevel in drivers/acpi/sleep/main.c and used
CONFIG_DISABLE_CONSOLE_SUSPEND. Interestingly, the suspend/resume cycle
completes correctly in this mode.

Fred.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [2.6 patch] block2mtd_paramline[] mustn't be __initdata

2007-03-25 Thread Jörn Engel
On Sun, 25 March 2007 16:58:05 +0200, Adrian Bunk wrote:
> 
> block2mtd_paramline[] is used in the non-__init block2mtd_setup()
> 
> Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>
Acked-By: Jörn Engel <[EMAIL PROTECTED]>

Adrian, can you put me on Cc: next time?

> ---
> --- linux-2.6.21-rc4-mm1/drivers/mtd/devices/block2mtd.c.old  2007-03-25 
> 15:56:10.0 +0200
> +++ linux-2.6.21-rc4-mm1/drivers/mtd/devices/block2mtd.c  2007-03-25 
> 15:56:31.0 +0200
> @@ -423,7 +423,7 @@
>  
>  #ifndef MODULE
>  static int block2mtd_init_called = 0;
> -static __initdata char block2mtd_paramline[80 + 12]; /* 80 for device, 12 
> for erase size */
> +static char block2mtd_paramline[80 + 12]; /* 80 for device, 12 for erase 
> size */
>  #endif

Jörn

-- 
Das Aufregende am Schreiben ist es, eine Ordnung zu schaffen, wo
vorher keine existiert hat.
-- Doris Lessing
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 00/22 take 3] UBI: Unsorted Block Images

2007-03-25 Thread Jörn Engel
On Wed, 21 March 2007 12:25:34 +0100, Thomas Gleixner wrote:
> On Wed, 2007-03-21 at 12:05 +0100, Jörn Engel wrote:
> > 
> > Also LogFS currently requires erasesizes of 2^n.
> 
> Last time I talked to you about that, you said it would be possible and
> fixable.

Actually, no.  LogFS is not broken, there is nothing to fix.

And there is no fundamental reason why UBI should export blocks with
non-power-of-two sizes.  UBI currently consists of two parts that are
intimately intertwined in the current implementation, but have
relatively little connection otherwise.

1. Logical volume management.
2. Static volumes.

Logical volume management can just as easily move its management
information into a table, instead of having it spread across all blocks.
Blocks can keep their original size.  Since you have to scan flash
anyway, you can also scan for a table, compare a magical number and do
some extra check to protect yourself against a UBI image inside some
logical volume.  No big deal.

Static volumes can keep a header inside their volumes.  The tiny
first-stage bootloader is currently scanning flash and can continue to
do so.  But at least this header no longer causes trouble for LogFS or
any other UBI user.

UBI is just as broken as LogFS is.  It works with every user in mainline
(which comes down to JFFS2).  LogFS works with every MTD device in
mainline.  The only combination that doesn't work is LogFS on UBI - due
to deliberate design decisions on both sides.

Jörn

-- 
Joern's library part 8:
http://citeseer.ist.psu.edu/plank97tutorial.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[patch resend v4] update ctime and mtime for mmaped write

2007-03-25 Thread Miklos Szeredi
From: Miklos Szeredi <[EMAIL PROTECTED]>

Changes:
v4:
 o rename test_clear_page_modified to page_mkclean_noprot
 o clean up page_mkclean_noprot
 o don't set AS_CMTIME from fault handler, since that also sets the PTE dirty
 o only update c/mtime in munmap, if file is not mapped any more
 o cleanups
v3:
 o rename is_page_modified to test_clear_page_modified
v2:
 o set AS_CMTIME flag in clear_page_dirty_for_io() too
 o don't clear AS_CMTIME in file_update_time()
 o check the dirty bit in the page tables
v1:
 o moved check from __fput() to remove_vma(), which is more logical
 o changed set_page_dirty() to set_page_dirty_mapping in hugetlb.c
 o cleaned up #ifdef CONFIG_BLOCK mess

This patch makes writing to shared memory mappings update st_ctime and
st_mtime as defined by SUSv3:

   The st_ctime and st_mtime fields of a file that is mapped with
   MAP_SHARED and PROT_WRITE shall be marked for update at some point
   in the interval between a write reference to the mapped region and
   the next call to msync() with MS_ASYNC or MS_SYNC for that portion
   of the file by any process. If there is no such call and if the
   underlying file is modified as a result of a write reference, then
   these fields shall be marked for update at some time after the
   write reference.

A new address_space flag is introduced: AS_CMTIME.  This is set each
time dirtyness from the page table is transferred to the page or if
the page is dirtied without the page table being set dirty.

Note, the flag is set unconditionally, even if the page is already
dirty.  This is important, because the page might have been dirtied
earlier by a non-mmap write.

Msync checks this flag and also dirty bit in the page tables, because
the data might change again after an msync(MS_ASYNC), while the page
is already dirty and read-write.  This also makes the time updating
work for memory backed filesystems such as tmpfs.

This implementation walks the pages in the synced range, and uses rmap
to find all the ptes for each page.  Non-linear vmas are ignored,
since the ptes can only be found by scanning the whole vma, which is
very inefficient.

As an optimization, if dirty pages are accounted, then only walk the
dirty pages, since the clean pages necessarily have clean ptes.  This
doesn't work for memory backed filesystems, where no dirty accounting
is done.

An alternative implementation could check for all intersecting vmas in
the mapping and walk the page tables for each.  This would probably be
more efficient for memory backed filesystems and if the number of
dirty pages is near the total number of pages in the range.

In munmap if there are no more memory mappings of this file, and the
AS_CMTIME flag has been set, the file times are updated.

Fixes Novell Bugzilla #206431.

Inspired by Peter Staubach's patch and the resulting comments.

Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]>
---

Index: linux-2.6.21-rc4-mm1/include/linux/pagemap.h
===
--- linux-2.6.21-rc4-mm1.orig/include/linux/pagemap.h   2007-03-25 
19:00:06.0 +0200
+++ linux-2.6.21-rc4-mm1/include/linux/pagemap.h2007-03-25 
19:00:35.0 +0200
@@ -19,6 +19,7 @@
  */
 #defineAS_EIO  (__GFP_BITS_SHIFT + 0)  /* IO error on async 
write */
 #define AS_ENOSPC  (__GFP_BITS_SHIFT + 1)  /* ENOSPC on async write */
+#define AS_CMTIME  (__GFP_BITS_SHIFT + 2)  /* ctime/mtime update needed */
 
 static inline void mapping_set_error(struct address_space *mapping, int error)
 {
Index: linux-2.6.21-rc4-mm1/include/linux/mm.h
===
--- linux-2.6.21-rc4-mm1.orig/include/linux/mm.h2007-03-25 
19:00:06.0 +0200
+++ linux-2.6.21-rc4-mm1/include/linux/mm.h 2007-03-25 19:00:36.0 
+0200
@@ -808,6 +808,7 @@ int redirty_page_for_writepage(struct wr
struct page *page);
 int FASTCALL(set_page_dirty(struct page *page));
 int set_page_dirty_lock(struct page *page);
+int set_page_dirty_mapping(struct page *page);
 int clear_page_dirty_for_io(struct page *page);
 
 extern unsigned long do_mremap(unsigned long addr,
Index: linux-2.6.21-rc4-mm1/mm/memory.c
===
--- linux-2.6.21-rc4-mm1.orig/mm/memory.c   2007-03-25 19:00:06.0 
+0200
+++ linux-2.6.21-rc4-mm1/mm/memory.c2007-03-25 19:00:36.0 +0200
@@ -676,7 +676,7 @@ static unsigned long zap_pte_range(struc
anon_rss--;
else {
if (pte_dirty(ptent))
-   set_page_dirty(page);
+   set_page_dirty_mapping(page);
if (pte_young(ptent))
SetPageReferenced(page);
file_rss--;
@@ -954,7 +954,7 @@ struct page 

Re: [patch 3/3] update ctime and mtime for mmaped write

2007-03-25 Thread Miklos Szeredi
> A few comments..

Thanks for reviewing.

> > Index: linux-2.6.21-rc4-mm1/mm/rmap.c
> > ===
> > --- linux-2.6.21-rc4-mm1.orig/mm/rmap.c 2007-03-24 19:03:11.0 
> > +0100
> > +++ linux-2.6.21-rc4-mm1/mm/rmap.c  2007-03-24 19:34:30.0 +0100
> > @@ -507,6 +507,43 @@ int page_mkclean(struct page *page)
> >  EXPORT_SYMBOL_GPL(page_mkclean);
> >  
> >  /**
> > + * test_clear_page_modified - check and clear the dirty bit for all 
> > mappings of a page
> > + * @page:  the page to check
> > + */
> > +bool test_clear_page_modified(struct page *page)
> > +{
> > +   struct address_space *mapping = page->mapping;
> 
> page_mapping(page)? Otherwise that BUG_ON(!mapping) a few lines down
> isn't of much use.

OK, removed BUG_ON().  This is called with page locked and mapping
checked, so there's no need to use page_mapping().

> > +   spin_lock(>i_mmap_lock);
> > +   vma_prio_tree_foreach(vma, , >i_mmap, pgoff, pgoff) {
> > +   if (vma->vm_flags & VM_SHARED) {
> > +   struct mm_struct *mm = vma->vm_mm;
> > +   unsigned long addr = vma_address(page, vma);
> > +   pte_t *pte;
> > +   spinlock_t *ptl;
> > +
> > +   if (addr != -EFAULT &&
> > +   (pte = page_check_address(page, mm, addr, ))) {
> > +   if (ptep_clear_flush_dirty(vma, addr, pte))
> > +   modified = true;
> > +   pte_unmap_unlock(pte, ptl);
> > +   }
> 
> Its against coding style to do assignments in conditionals.

OK, cleaned up.

> > +   if (page_test_and_clear_dirty(page))
> > +   modified = true;
> > +   return modified;
> > +}
> 
> Why not parametrize page_mkclean() to conditionally wrprotect clean
> pages? Something like:

Well, I don't really like this, because for msync, there's really no
need to do the complex ptep operations.  Using ptep_clear_flush_dirty()
can save a couple of cycles.

> > +   mapping = vma->vm_file->f_mapping;
> > +   modified = test_and_clear_bit(AS_CMTIME, >flags);
> > +
> > +   pagevec_init(, 0);
> > +   index = linear_page_index(vma, start);
> > +   end_index = linear_page_index(vma, end);
> > +   while (index < end_index) {
> > +   int i;
> > +   int nr_pages = min(end_index - index, (pgoff_t) PAGEVEC_SIZE);
> > +
> > +   if (mapping_cap_account_dirty(mapping))
> > +   nr_pages = pagevec_lookup_tag(, mapping, ,
> > +   PAGECACHE_TAG_DIRTY, nr_pages);
> > +   else
> > +   nr_pages = pagevec_lookup(, mapping, index,
> > + nr_pages);
> > +   if (!nr_pages)
> > +   break;
> > +
> > +   for (i = 0; i < nr_pages; i++) {
> > +   struct page *page = pvec.pages[i];
> > +
> > +   /* Skip pages which are just being read */
> > +   if (!PageUptodate(page))
> > +   continue;
> > +
> > +   lock_page(page);
> > +   index = page->index + 1;
> > +   if (page->mapping == mapping &&
> > +   test_clear_page_modified(page)) {
> 
> page_mkclean(page, 0)
> 
> > +   set_page_dirty(page);
> 
> set_page_dirty_mapping() ?

That would cause the file times to be updated twice, which is wrong.

But if AS_CMTIME were tested/cleared after walking the pages, this
would work.  Fixed.

Also realized, that setting AS_CMTIME from the page fault can also
cause a double update, since the PTE dirty bit is also set there.

And also realized, that doing the file update from munmap can also
cause a double update, if this is not the last mapping of the inode.

Fixed all these in the updated patch.

Thanks,
Miklos
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] riscom8: fix use of deprecated functions

2007-03-25 Thread Joerg Roedel
From: Joerg Roedel <[EMAIL PROTECTED]>

This patch replaces the deprecated functions in drivers/char/riscom8.c
and fixes the compile warnings they produced.

Signed-off-by: Joerg Roedel <[EMAIL PROTECTED]>

diff --git a/drivers/char/riscom8.c b/drivers/char/riscom8.c
index 7014525..54cb8cb 100644
--- a/drivers/char/riscom8.c
+++ b/drivers/char/riscom8.c
@@ -226,13 +226,13 @@ static void __init rc_init_CD180(struct riscom_board 
const * bp)
 {
unsigned long flags;

-   save_flags(flags); cli();
+   local_irq_save(flags);
rc_out(bp, RC_CTOUT, 0);   /* Clear timeout 
*/
rc_wait_CCR(bp);   /* Wait for CCR ready
*/
rc_out(bp, CD180_CCR, CCR_HARDRESET);  /* Reset CD180 chip  
*/
-   sti();
+   local_irq_enable();
rc_long_delay(HZ/20);  /* Delay 0.05 sec
*/
-   cli();
+   local_irq_disable();
rc_out(bp, CD180_GIVR, RC_ID); /* Set ID for this chip  
*/
rc_out(bp, CD180_GICR, 0); /* Clear all bits
*/
rc_out(bp, CD180_PILR1, RC_ACK_MINT);  /* Prio for modem intr   
*/
@@ -243,7 +243,7 @@ static void __init rc_init_CD180(struct riscom_board const 
* bp)
rc_out(bp, CD180_PPRH, (RC_OSCFREQ/(100/RISCOM_TPS)) >> 8);
rc_out(bp, CD180_PPRL, (RC_OSCFREQ/(100/RISCOM_TPS)) & 0xff);

-   restore_flags(flags);
+   local_irq_restore(flags);
 }
 
 /* Main probing routine, also sets irq. */
@@ -826,7 +826,7 @@ static int rc_setup_port(struct riscom_board *bp, struct 
riscom_port *port)
port->xmit_buf = (unsigned char *) tmp;
}

-   save_flags(flags); cli();
+   local_irq_save(flags);

if (port->tty) 
clear_bit(TTY_IO_ERROR, >tty->flags);
@@ -838,7 +838,7 @@ static int rc_setup_port(struct riscom_board *bp, struct 
riscom_port *port)
rc_change_speed(bp, port);
port->flags |= ASYNC_INITIALIZED;

-   restore_flags(flags);
+   local_irq_restore(flags);
return 0;
 }
 
@@ -949,19 +949,19 @@ static int block_til_ready(struct tty_struct *tty, struct 
file * filp,
 */
retval = 0;
add_wait_queue(>open_wait, );
-   cli();
+   local_irq_disable();
if (!tty_hung_up_p(filp))
port->count--;
-   sti();
+   local_irq_enable();
port->blocked_open++;
while (1) {
-   cli();
+   local_irq_disable();
rc_out(bp, CD180_CAR, port_No(port));
CD = rc_in(bp, CD180_MSVR) & MSVR_CD;
rc_out(bp, CD180_MSVR, MSVR_RTS);
bp->DTR &= ~(1u << port_No(port));
rc_out(bp, RC_DTR, bp->DTR);
-   sti();
+   local_irq_enable();
set_current_state(TASK_INTERRUPTIBLE);
if (tty_hung_up_p(filp) ||
!(port->flags & ASYNC_INITIALIZED)) {
@@ -1034,7 +1034,7 @@ static void rc_close(struct tty_struct * tty, struct file 
* filp)
if (!port || rc_paranoia_check(port, tty->name, "close"))
return;

-   save_flags(flags); cli();
+   local_irq_save(flags);
if (tty_hung_up_p(filp))
goto out;

@@ -1101,7 +1101,7 @@ static void rc_close(struct tty_struct * tty, struct file 
* filp)
}
port->flags &= ~(ASYNC_NORMAL_ACTIVE|ASYNC_CLOSING);
wake_up_interruptible(>close_wait);
-out:   restore_flags(flags);
+out:   local_irq_restore(flags);
 }
 
 static int rc_write(struct tty_struct * tty, 
@@ -1120,34 +1120,34 @@ static int rc_write(struct tty_struct * tty,
if (!tty || !port->xmit_buf)
return 0;
 
-   save_flags(flags);
+   local_irq_save(flags);
while (1) {
-   cli();  
+   local_irq_disable();
c = min_t(int, count, min(SERIAL_XMIT_SIZE - port->xmit_cnt - 1,
  SERIAL_XMIT_SIZE - port->xmit_head));
if (c <= 0) {
-   restore_flags(flags);
+   local_irq_restore(flags);
break;
}
 
memcpy(port->xmit_buf + port->xmit_head, buf, c);
port->xmit_head = (port->xmit_head + c) & (SERIAL_XMIT_SIZE-1);
port->xmit_cnt += c;
-   restore_flags(flags);
+   local_irq_restore(flags);
 
buf += c;
count -= c;
total += c;
}
 
-   cli();
+   local_irq_disable();
if (port->xmit_cnt && !tty->stopped && !tty->hw_stopped &&
!(port->IER & IER_TXRDY)) {
port->IER |= IER_TXRDY;
rc_out(bp, CD180_CAR, port_No(port));
rc_out(bp, CD180_IER, 

Re: [patch 1/3] fix illogical behavior in balance_dirty_pages()

2007-03-25 Thread Miklos Szeredi
> for (;;) {
> struct writeback_control wbc = {
> .bdi= bdi,
> .sync_mode  = WB_SYNC_NONE,
> .older_than_this = NULL,
> .nr_to_write= write_chunk,
> .range_cyclic   = 1,
> };
> 
> get_dirty_limits(_thresh, _thresh,
> _thresh, bdi);
> bdi_nr_reclaimable = bdi_stat(bdi, BDI_DIRTY) +
> bdi_stat(bdi, BDI_UNSTABLE);
> (A) if (bdi_nr_reclaimable + bdi_stat(bdi, BDI_WRITEBACK) <=
> bdi_thresh)
> break;
> 
> /* Note: nr_reclaimable denotes nr_dirty + nr_unstable.
>  * Unstable writes are a feature of certain networked
>  * filesystems (i.e. NFS) in which data may have been
>  * written to the server's write cache, but has not yet
>  * been flushed to permanent storage.
>  */
> (B)  if (bdi_nr_reclaimable) {
> writeback_inodes();
> 
> get_dirty_limits(_thresh, _thresh,
>_thresh, bdi);
> bdi_nr_reclaimable = bdi_stat(bdi, BDI_DIRTY) +
> bdi_stat(bdi, BDI_UNSTABLE);
> (C) if (bdi_nr_reclaimable + bdi_stat(bdi, BDI_WRITEBACK) 
> <=
> bdi_thresh)
> break;
> 
> pages_written += write_chunk - wbc.nr_to_write;
> if (pages_written >= write_chunk)
> break;  /* We've done our duty */
> }
> congestion_wait(WRITE, HZ/10);
> }
> 
> I'm thinking that if bdi_nr_reclaimable == 0, A reduces to
> bdi_stat(bdi, BDI_WRITEBACK) <= bdi_thresh and we're still out of the
> loop, no?
> 

Yep, sorry.

I'll review this patch, there seem to be odd things, like calculating
dirty_thresh, then not ever using it...

Btw, would you mind measuring the performance of my patch with the
slow/fast device mix, to see how it compares to your patchset and the
vanila kernel?

Thanks,
Miklos
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [3/5] 2.6.21-rc4: known regressions (v2)

2007-03-25 Thread Eric W. Biederman
"Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:

> On Sunday, 25 March 2007 14:56, Eric W. Biederman wrote:
>> "Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:
>> 
>> > Yes, in kernel/power/disk.c:power_down() .
>> >
>> > Please comment out the disable_nonboot_cpus() in there and retest (but
> please
>> > test the latest Linus' tree).
>> 
>> 
>> 
>> Why do we even need a disable_nonboot_cpus in that path?  machine_shutdown
>> on i386 and x86_64 should take care of that.  Further the code that computes
>> the boot cpu is bogus (not all architectures require cpu == 0 to be
>> the boot cpu), and disabling non boot cpus appears to be a strong
>> x86ism, in the first place.
>
> Yes.
>  
>> If the only reason for disable_nonboot_cpus there is to avoid the
>> WARN_ON in init_low_mappings() we should seriously consider killing
>> it.
>
> We have considered it, but no one was sure that it was a good idea.

The problem with the current init_low_mappings is that it hacks the
current page table.  If we can instead use a different page table
the code becomes SMP safe.

I have extracted the patch that addresses this from the relocatable
patchset and appended it for sparking ideas.  It goes a little
farther than we need to solve this issue but the basics are there.

>> If we can wait for 2.6.22 the relocatable x86_64 patchset that 
>> Andi has queued, has changes that kill the init_low_mapping() hack.
>
> I think we should kill the WARN_ON() right now, perhaps replacing it with
> a FIXME comment.

Reasonable.

>> I'm not very comfortable with calling cpu_down in a common code path
>> right now either.  I'm fairly certain we still don't have that
>> correct.  So if we confine the mess that is cpu_down to #if
>> defined(CPU_HOTPLUG) && defined(CONFIG_EXPERIMENTAL) I don't care.
>> If we start using it everywhere I'm very nervous.
>> migration when bringing a cpu down is strongly racy, and I don't think
>> we actually put cpus to sleep properly either.
>
> I'm interested in all of the details, please.  I seriously consider dropping
> cpu_up()/cpu_down() from the suspend code paths.


So I'm not certain if in a multiple cpu context we can avoid all of the
issues with cpu hotplug but there is a reasonable chance so I will
explain as best I can.

Yanking the appropriate code out of linuxbios the way a processor should stop
itself is to send an INIT IPI to itself.  This puts a cpu into an optimized
wait for startup IPI state where it is otherwise disabled.  This is the state
any sane BIOS will put the cpus into before control is handed off to the kernel.

> static inline void stop_this_cpu(void)
> {
> unsigned apicid;
> apicid = lapicid();
> 
> /* Send an APIC INIT to myself */
> lapic_write(LAPIC_ICR2, SET_LAPIC_DEST_FIELD(apicid));
> lapic_write(LAPIC_ICR, LAPIC_INT_LEVELTRIG | LAPIC_INT_ASSERT | 
> LAPIC_DM_INIT);
> /* Wait for the ipi send to finish */
> lapic_wait_icr_idle();
> 
> /* Deassert the APIC INIT */
> lapic_write(LAPIC_ICR2, SET_LAPIC_DEST_FIELD(apicid));
> lapic_write(LAPIC_ICR,  LAPIC_INT_LEVELTRIG | LAPIC_DM_INIT);
> /* Wait for the ipi send to finish */
> lapic_wait_icr_idle();
> 
> /* If I haven't halted spin forever */
> for(;;) {
> hlt();
> }
> }

I'm not certain what to do with the interrupt races.  But I will see
if I can explain what I know.



- Most ioapics are buggy.
- Most ioapics do not follow pci-ordering rules with respect to
  interrupt message deliver so ensuring all in-flight irqs have
  arrived somewhere is very hard.
- To avoid bugs we always limit ourselves to reprogramming the ioapics
  in the interrupt handler, and not considering an interrupt
  successfully reprogrammed until we have received an irq in the new
  location.
- On x86 we have two basic interrupt handling modes.
  o logical addressing with lowest priority delivery.
  o physical addressing with delivery to a single cpu.
- With logical addressing as long as the cpu is not available for
  having an interrupt delivered to it the interrupt will be
  never be delivered to a particular cpu.  Ideally we also update
  the mask in the ioapic to not target that cpu.
- With physical addressing targeting a single cpu we need to reprogram
  the ioapics not to target that specific cpu.  This needs to happen
  in the interrupt handler and we need to wait for the next interrupt
  before we tear down our data structures for handling the interrupt.

  The current cpu hotplug code attempts to reprogram the ioapics from
  process context which is just wrong.

Now as part of suspend/resume I think we should be programming the
hardware not to generate interrupts in the first place at the actual
hardware devices so we can likely avoid all of the code that
reprograms interrupts while they are active.  If we can use things
like pci ordering rules to ensure the device will never fire the
interrupt until resumed we 

[PATCH] INPUT: Delete unused source file drivers/input/power.c.

2007-03-25 Thread Robert P. J. Day

Delete the never-compiled source file drivers/input/power.c, and
remove its entry from the corresponding Makefile, as there is no
Kconfig file that refers to the config option INPUT_POWER.

Signed-off-by: Robert P. J. Day <[EMAIL PROTECTED]>

---

diff --git a/drivers/input/Makefile b/drivers/input/Makefile
index da575de..b4cd106 100644
--- a/drivers/input/Makefile
+++ b/drivers/input/Makefile
@@ -13,7 +13,6 @@ obj-$(CONFIG_INPUT_MOUSEDEV)  += mousedev.o
 obj-$(CONFIG_INPUT_JOYDEV) += joydev.o
 obj-$(CONFIG_INPUT_EVDEV)  += evdev.o
 obj-$(CONFIG_INPUT_TSDEV)  += tsdev.o
-obj-$(CONFIG_INPUT_POWER)  += power.o
 obj-$(CONFIG_INPUT_EVBUG)  += evbug.o

 obj-$(CONFIG_INPUT_KEYBOARD)   += keyboard/
diff --git a/drivers/input/power.c b/drivers/input/power.c
deleted file mode 100644
index ee82464..000
--- a/drivers/input/power.c
+++ /dev/null
@@ -1,166 +0,0 @@
-/*
- * $Id: power.c,v 1.10 2001/09/25 09:17:15 vojtech Exp $
- *
- *  Copyright (c) 2001 "Crazy" James Simmons
- *
- *  Input driver Power Management.
- *
- *  Sponsored by Transvirtual Technology.
- */
-
-/*
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
- *
- * Should you need to contact me, the author, you can do so by
- * e-mail - mail your message to <[EMAIL PROTECTED]>.
- */
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-static struct input_handler power_handler;
-
-/*
- * Power management can't be done in a interrupt context. So we have to
- * use keventd.
- */
-static int suspend_button_pushed = 0;
-static void suspend_button_task_handler(void *data)
-{
-udelay(200); /* debounce */
-suspend_button_pushed = 0;
-}
-
-static DECLARE_WORK(suspend_button_task, suspend_button_task_handler, NULL);
-
-static void power_event(struct input_handle *handle, unsigned int type,
-   unsigned int code, int down)
-{
-   struct input_dev *dev = handle->dev;
-
-   printk("Entering power_event\n");
-
-   if (type == EV_PWR) {
-   switch (code) {
-   case KEY_SUSPEND:
-   printk("Powering down entire device\n");
-
-   if (!suspend_button_pushed) {
-   suspend_button_pushed = 1;
-   schedule_work(_button_task);
-   }
-   break;
-   case KEY_POWER:
-   /* Hum power down the machine. */
-   break;
-   default:
-   return;
-   }
-   }
-
-   if (type == EV_KEY) {
-   switch (code) {
-   case KEY_SUSPEND:
-   printk("Powering down input device\n");
-   /* This is risky. See pm.h for details. */
-   if (dev->state != PM_RESUME)
-   dev->state = PM_RESUME;
-   else
-   dev->state = PM_SUSPEND;
-   pm_send(dev->pm_dev, dev->state, dev);
-   break;
-   case KEY_POWER:
-   /* Turn the input device off completely ? */
-   break;
-   default:
-   return;
-   }
-   }
-   return;
-}
-
-static struct input_handle *power_connect(struct input_handler *handler,
- struct input_dev *dev,
- const struct input_device_id *id)
-{
-   struct input_handle *handle;
-
-   if (!(handle = kzalloc(sizeof(struct input_handle), GFP_KERNEL)))
-   return NULL;
-
-   handle->dev = dev;
-   handle->handler = handler;
-
-   input_open_device(handle);
-
-   printk(KERN_INFO "power.c: Adding power management to input layer\n");
-   return handle;
-}
-
-static void power_disconnect(struct input_handle *handle)
-{
-   input_close_device(handle);
-   kfree(handle);
-}
-
-static const struct input_device_id power_ids[] = {
-   {
-   .flags = 

Re: [patch 0/7] Add IRQF_IRQPOLL_IRQ flag to allow irqpoll

2007-03-25 Thread Bernhard Walle
* Guennadi Liakhovetski <[EMAIL PROTECTED]> [2007-03-24 00:21]:
> On Fri, 23 Mar 2007, Bernhard Walle wrote:
> 
> > * Guennadi Liakhovetski <[EMAIL PROTECTED]> [2007-03-23 23:15]:
> > > On Fri, 23 Mar 2007, Bernhard Walle wrote:
> > > 
> > > > irqpoll is broken on some architectures that don't use the IRQ 0 for 
> > > > the timer
> > > > interrupt like IA64. This patch adds a IRQF_IRQPOLL_IRQ flag.
> > > 
> > > emn, please, cannot we come up with a better name - one that doesn't have 
> > > "IRQ" 3 (!) times in it?...
> > 
> > Well, it was suggested by Andrew. What's your suggestion?
> 
> Purely esthetically (ehm, am I saying that?...) your original suggestion 
> to use IRQF_IRQPOLL looks better to me. Or even IRQF_POLL, but, maybe, 
> it's a bit too much of a simplification.

I personally don't care very much about the naming. So I think it's
Andrews final decision ...


Thanks,
   Bernhard
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[BUG]pci layer may report wrong iomem resources data to drivers

2007-03-25 Thread thomas schorpp

lo,

aic7xxx driver mmio / dma on x86_64 and some pre 2.6.19.5 ix86 linux kernels are 
broken here with the adaptec asc19160 PCI/32 scsi hba pci card.


well, i've got several live cd systems < 2.6.19.5i386 like whax 3.0 and knoppix 
that oops
and hang boot in aic7xxx init, only one booting here so far is knoppix 5.2,

the latest unofficial debian stable 2.6.8-12-amd64-generic, which says ACPI: 
PCI interrupt :00:06.0[A] -> GSI 17 (level, low) -> IRQ 17

aic7xxx: PCI0:6:0 MEM region 0x0 unavailable. Cannot memory map device.
but works in PIO mode,

a debian etch 2.6.18-4-amd64 x86_64 which says:

SCSI subsystem initialized
GSI 16 sharing vector 0xA9 and IRQ 16
ACPI: PCI Interrupt :00:06.0[A] -> GSI 17 (level, low) -> IRQ 169
BUG: soft lockup detected on CPU#0!

Call Trace:
 [] softlockup_tick+0xdb/0xed
[] update_process_times+0x42/0x68
[] smp_local_timer_interrupt+0x23/0x47
[] smp_apic_timer_interrupt+0x41/0x47
[] apic_timer_interrupt+0x66/0x6c
 [] pci_conf1_write+0x0/0xc9
[] :aic7xxx:ahc_pci_test_register_access+0xc2/0x391
[] :aic7xxx:ahc_pci_test_register_access+0x4f/0x391
[] :aic7xxx:ahc_pci_map_registers+0x1bb/0x239
[] :aic7xxx:ahc_pci_config+0x4c/0x12d0
[] pcibios_set_master+0x1e/0x84
[] :aic7xxx:ahc_linux_pci_dev_probe+0x13e/0x213
[] pci_device_probe+0xdf/0x147
[] driver_probe_device+0x52/0xa8
[] __driver_attach+0x0/0x9a
[] __driver_attach+0x50/0x9a
[] __driver_attach+0x0/0x9a
[] bus_for_each_dev+0x43/0x6e
[] bus_add_driver+0x7e/0x130
[] __pci_register_driver+0x57/0x7d
[] :aic7xxx:ahc_linux_pci_init+0x17/0x21
[] :aic7xxx:ahc_linux_init+0x325/0x336
[] default_wake_function+0x0/0xe
[] __down_read+0x12/0x9a
[] __link_module+0x0/0x25
[] __up_read+0x13/0x8a
[] sys_init_module+0x16cc/0x1882
[] system_call+0x7e/0x83

BUG: soft lockup detected on CPU#0!

which is not surprising, since the aic driver has a thread 4E4 blocking test 
function:


I can fix the mmio check not to
hang, but the card won't actually work mmio until whatever's assigning
the BAR above 32 bits is fixed (that could either be a kernel PCI bug or
a BIOS bug).

[EMAIL PROTECTED]

a kernel.org 2.6.20 with K8 config set but built in a 32Bit debian sid 
environment, but works ok:

tom1:~# uname -a
Linux tom1.schorpp.dyndns.dk 2.6.20 #1 PREEMPT Mon Feb 5 11:21:13 CET 2007 i686 
GNU/Linux

tom1:~# lspci -vvv -s 00:06.0
00:06.0 SCSI storage controller: Adaptec AIC-7892B U160/m (rev 02)
   Subsystem: Adaptec 19160 Ultra160 SCSI Controller
   Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- 
Stepping- SERR+ FastB2B-
   Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR- and finally the latest kernel.org 2.6.20.4 AMD K8 and .21-rc4 git built on debian amd64 etch userland that 
hangs boot on aic7xxx init without magic sysreq keys functionality and oops print triggering, 
(so the softlockup detection does not work without SMP config set, too):


Loading iSCSI transport class v2.0-724.
ACPI: PCI Interrupt :00:06.0[A] -> GSI 17 (level, low) -> IRQ 17
... Kernel alive - Kernel direct mapping tables up to 1 @ 8000-d000

i've dangerously commented out the blocking while(!ahc_is_paused) and got the 
driver to work in PIO mode so far.

tom1:~# lspci -vvv -s 00:06.0
00:06.0 SCSI storage controller: Adaptec AIC-7892B U160/m (rev 02)
  Subsystem: Adaptec 19160 Ultra160 SCSI Controller
  Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Step 
ping- SERR+ FastB2B-
  Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR-   Status: D0 PME-Enable- DSel=0 DScale=0 PME- 



the driver asks the pci layer for the iomem resource correctly (checked 0x1000 
len) with:

request_mem_region(start, 0x1000, "aic7xxx")

which returns a invalid >32Bit address and the 64Bit pci mem resources flag is 
set.
the driver shortens the return value then: ahc->platform_data->mem_busaddr is 
u32.

on later freeing [   49.278810] Trying to free nonexistent resource 

warning occours respectively.

i've tried to change containers to u64 but then the above allocation call fails 
with NULL return.

theres no reason to agree about a mb bios issue since the mb bios does not care about OS kernels 
and since some 32bit linux versions/kernel configs work and winxp_x64 kernel works fine with mmio.


we're not sure how to fix that:


The problem you seem to have is that your system is reporting a BAR
beyond 32 bits (4GB) which the card physically can't use.  This could be
because of a BIOS misconfiguration or because there's a bug in the PCI
subsystem somewhere.

James


on what circumstances can a >= 2.6.19.5 32bit kernel pci layer decode/report a correct 
iomem address and a < or 64bit x86_64 configured/built not?


working in to linux pci hal next, some change there must have caused the issue.
it seems fixed for 32bit kernels but not for 64bit.

y
tom

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the 

Re: [PATCH] Fix race between attach_task and cpuset_exit

2007-03-25 Thread Paul Jackson
> How about using a local variable like ref_count and using
> 
> ref_count = atomic_dec_and_test(>count); This will avoid the two
> atomic operations, atomic_dec() and atomic_read() below.

This would also seem to address the race I just noticed in my previous
reply.

Though I would suggest that we use the same code pattern in both
cpuset_exit and attach_task -- either both use cpuset_to_be_released,
or both use ref_count.

I tend to prefer the cpuset_to_be_released form - seems a bit more
explicitly clear to me.  But vatsa's doing the patch - he gets the
last vote.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Fix race between attach_task and cpuset_exit

2007-03-25 Thread Paul Jackson
> + task_lock(tsk);
>   cs = tsk->cpuset;
>   tsk->cpuset = _cpuset;  /* the_top_cpuset_hack - see above */
> + atomic_dec(>count);
> + task_unlock(tsk);
>  
>   if (notify_on_release(cs)) {
>   char *pathbuf = NULL;
>  
>   mutex_lock(_mutex);
> - if (atomic_dec_and_test(>count))
> + if (!atomic_read(>count))
>   check_for_release(cs, );

Is there perhaps another race here?  Could it happen that:
 1) the atomic_dec() lowers the count to say one (any value > zero)
 2) after we drop the task lock, some other task or tasks decrement
the count to zero
 3) we catch that zero when we atomic_read the count, and issue a spurious
check_for_release().

I'm thinking that we should use the same oldcs_tobe_released logic
here as we used in attach_task, so that we do an atomic_dec_and_test()
inside the task lock, and if that hit zero, then we know that our
pointer to this cpuset is the last remaining reference, so we can
release that pointer at our convenience, knowing no one else can
reference or mess with that cpuset any more.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kernel 2.6.20 takes a long time to load

2007-03-25 Thread Thomas Gleixner
On Sun, 2007-03-25 at 14:22 +0200, Helmut Auer wrote:
> Hello List,
> 
> The kernel 2.6.20 makes a long delay during booting ( about 15 seconds ) 
> after showing:
> 
> NET: Registered protocol family 2
> 
> then it goes along with:
> 
> IP route cache hash table entries: 2048 (order: 1, 8192 bytes)
> TCP established hash table entries: 8192 (order: 4, 98304 bytes)
> ...
> 
> I am using the kernel parameter lpj to avoid calculating processor speed 
> during kernel load.
> If I do not use the lpj parameter, the detection of cpu speed takes about 10 
> seconds, but then 
> there is no delay after:
> NET: Registered protocol family 2
> 
> Any hints what I can try to get a fast booting kernel ?
> I am using 2.6.20 with some gentoo patches ( plain 2.6.20-4 has the same 
> problem ).
> 
> My .config is here:
> http://www.helmutauer.de/.config-2.6.20.3.gz

Can you put up a complete bootlog too ?

tglx



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [BUG] __copy_to_user_inatomic broken on non Pentium machines

2007-03-25 Thread Thomas Gleixner
On Sun, 2007-03-25 at 11:14 -0700, Linus Torvalds wrote:
> > Environment: Pre Pentium systems, (boot_cpu_data.wp_works_ok == 0)
> 
> This shouldn't be "pre-pentium", afaik. WP-works-ok on i486 too. I think 
> only the original i386 had this bug ("feature").
>
> But I agree, it does seem to be broken on such machines (I assume you 
> don't actually have one, but just tested by forcing it by hand ;)

Yes, it's a genuine i386 embedded system and AFAIK the same feature is
available on 486 clones. i386 and Co are still in used in the embedded
space.

tglx




-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


  1   2   3   4   5   >