date:20061218

Re: GPL only modules

2006-12-18 Thread Giacomo A. Catenazzi

Linus Torvalds wrote:
> 
> On Mon, 18 Dec 2006, Alexandre Oliva wrote:
>>> In other words, in the GPL, "Program" does NOT mean "binary". Never has.
>> Agreed.  So what?  How does this relate with the point above?
>>
>> The binary is a Program, as much as the sources are a Program.  Both
>> forms are subject to copyright law and to the license, in spite of
>> http://www.fsfla.org/?q=en/node/128#1
> 
> Here's how it relates:
>  - if a program is not a "derived work" of the C library, then it's not 
>    "the program" as defined by the GPLv2 AT ALL.
> 
> In other words, it doesn't matter ONE WHIT whether you use "ld --static" 
> or "ld" or "mkisofs" - if the program isn't (by copyright law) derived 
> from glibc, then EVEN IF glibc was under the GPLv2, it would IN NO WAY 
> AFFECT THE RESULTING BINARY.

I really don't agree.  It seems you are confusing the source with the
binary application.

The source surely is not derived: you can link *any* libc to your
program.

But a binary is different.

Let's start with your example about books: you write a book and you hold
the copyright on the text, but if you publish it with publisher X, he
may use his own font.  You can read the book, or scan it to extract the
text (I hope fair use allows it), but you cannot copy the book pages:
they contain your text, but also the copyrighted font.  The publisher
should check that the two licenses are compatible, just as a user who
links with a new library should.

For a binary, it is the same.  You can extract the libraries and the rest
of the program (better done from the sources), but as long as it is one
binary, it is a new mixed entity.

It is not only linking, it is mixing bytes!  Some parts of the library are
linked statically, and there are references to them in the static part of
the program.  It is a mix, and as long as the two parts are mixed (not
only linked) you should follow both licenses when copying!

Choose any dynamically linked program on your machine and try to link it
with another libc (one not directly derived from glibc)... you will see
how hard it is, and how different it is from an "aggregation".  And
dynamic linking is only the last step of "merging" the two binaries.

Other libraries tend to be more "dynamic", but glibc mixes too much.

In other words, given source A and library B: the binary C is derived
from both A and B, but surely A is not derived from B.  So IMHO (IANAL),
we should not confuse the sources and the binary in these arguments, and
so should not simply call either one "the program".

ciao
cate

> 
> And I'm simply claiming that a binary doesn't become "derived from" by any 
> action of linking.
> 
> Even if you link using "ld", even if it's static, the binary is not 
> "derived from". It's an aggregate.
> 
> "Derivation" has nothing to do with "linking". Either it's derived or it 
> is not, and "linking" simply doesn't matter. It doesn't matter whether 
> it's static or dynamic. That's a detail that simply doesn't have anything 
> at all to do with "derivative work".
> 
> THAT is my point. 
> 
> Static vs dynamic matters for whether it's an AGGREGATE work. Clearly, 
> static linking aggregates the library with the other program in the same 
> binary. There's no question about that. And that _does_ have meaning from 
> a copyright law angle, since if you don't have permission to ship 
> aggregate works under the license, then you can't ship said binary. It's 
> just a non-issue in the specific case of the GPLv2.
> 
> In the presence of dynamic linking the binary isn't even an aggregate 
> work.
> 
> THAT is the difference between static and dynamic. A simple command line 
> flag to the linker shouldn't really reasonably be considered to change 
> "derivation" status.
> 
> Either something is derived, or it's not. If it's derived, "ld", 
> "mkisofs", "putting them close together" or "shipping them on totally 
> separate CD's" doesn't matter. It's still derived.
> 
>   Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[git pull] more drm patches for 2.6.20

2006-12-18 Thread Dave Airlie



Hi Linus,

This is just a bunch of minor patches and fixes for the drm tree.
The biggest change is to the intel driver to fix up some tearing issues,
and a small update to the radeon bounds check to fix r300 issue.

The rest are just cleanups and comment fixes..

Dave.

 drivers/char/drm/drmP.h         |    7 -
 drivers/char/drm/drm_lock.c     |    2
 drivers/char/drm/drm_stub.c     |   12 ++
 drivers/char/drm/drm_sysfs.c    |    8 +-
 drivers/char/drm/i915_irq.c     |  199 +++
 drivers/char/drm/r128_drm.h     |    3 -
 drivers/char/drm/r128_drv.h     |    3 -
 drivers/char/drm/r128_state.c   |    3 -
 drivers/char/drm/r300_cmdbuf.c  |   32 +-
 drivers/char/drm/radeon_drv.h   |   15 +++
 drivers/char/drm/radeon_irq.c   |    4 -
 drivers/char/drm/radeon_mem.c   |    4 -
 drivers/char/drm/radeon_state.c |   13 +--
 drivers/char/drm/savage_bci.c   |    4 -
 14 files changed, 190 insertions(+), 119 deletions(-)

commit f9841a8d6018f8bcba77e75c9e368d94f1f22933
Author: Jean Delvare <[EMAIL PROTECTED]>
Date:   Tue Dec 19 18:04:33 2006 +1100

drm: Stop defining pci_pretty_name

drm drivers no longer use pci_pretty_name so we can stop defining it.

Signed-off-by: Jean Delvare <[EMAIL PROTECTED]>
Signed-off-by: Dave Airlie <[EMAIL PROTECTED]>

commit 83a9e29b0fd753c28e3979d638a8ebfd3f6ebc96
Author: Dave Airlie <[EMAIL PROTECTED]>
Date:   Tue Dec 19 17:56:14 2006 +1100

drm: r128: comment aligment with drm git

Align some r128 license comments

Signed-off-by: Dave Airlie <[EMAIL PROTECTED]>

commit 0c4dd906a220fac7997048178ee4f5d8c378b38b
Author: Dave Airlie <[EMAIL PROTECTED]>
Date:   Tue Dec 19 17:49:44 2006 +1100

drm: make kernel context switch same as for drm git tree.

Signed-off-by: Dave Airlie <[EMAIL PROTECTED]>

commit 94bb598e6b7d68690426f4c7c4385823951861eb
Author: Dave Airlie <[EMAIL PROTECTED]>
Date:   Tue Dec 19 17:49:08 2006 +1100

drm: fixup comment header style

Signed-off-by: Dave Airlie <[EMAIL PROTECTED]>

commit 183b4aeefa1ff8e0a792b95d5d56f0994d022449
Author: Eric Anholt <[EMAIL PROTECTED]>
Date:   Tue Dec 19 17:20:02 2006 +1100

drm: savage: compat fix from drm git.

Signed-off-by: Dave Airlie <[EMAIL PROTECTED]>

commit 1d6bb8e51dba3db1c15575901022fe72d363e5a4
Author: =?utf-8?q?Michel_D=C3=A4nzer?= <[EMAIL PROTECTED]>
Date:   Fri Dec 15 18:54:35 2006 +1100

drm: Unify radeon offset checking.

Replace r300_check_offset() with generic radeon_check_offset(), which
doesn't reject valid offsets when the framebuffer area is at the very end
of the card's 32 bit address space. Make radeon_check_and_fixup_offset()
use radeon_check_offset() as well.

This fixes https://bugs.freedesktop.org/show_bug.cgi?id=7697 .

commit 3188a24c256bae0ed93d81d82db1f1bb6060d727
Author: =?utf-8?q?Michel_D=C3=A4nzer?= <[EMAIL PROTECTED]>
Date:   Mon Dec 11 18:32:27 2006 +1100

i915_vblank_tasklet: Try harder to avoid tearing.

Previously, if there were several buffer swaps scheduled for the same
vertical blank, all but the first blit emitted stood a chance of
exhibiting tearing. In order to avoid this, split the blits along slices
of each output top to bottom.

Signed-off-by: Dave Airlie <[EMAIL PROTECTED]>

commit 2c3f0eddfbd7f5c7a5450de287bad805722888c3
Author: Jeff Garzik <[EMAIL PROTECTED]>
Date:   Sat Dec 9 10:50:22 2006 +1100

DRM: handle pci_enable_device failure

Signed-off-by: Jeff Garzik <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Dave Airlie <[EMAIL PROTECTED]>

commit 94f060bd0f78814f4daf8c7942bd710af52c7d6f
Author: Akinobu Mita <[EMAIL PROTECTED]>
Date:   Sat Dec 9 10:49:47 2006 +1100

drm: fix return value check

class_create() and class_device_create() return error code as a pointer on
failure.  These return values need to be checked by IS_ERR().

Signed-off-by: Akinobu Mita <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Dave Airlie <[EMAIL PROTECTED]>
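
The last commit above relies on the kernel convention of returning an error code encoded as a pointer. The following is a minimal userspace sketch of that convention, not the kernel's own code: err_ptr(), is_err() and ptr_err() only mimic the idea behind ERR_PTR()/IS_ERR()/PTR_ERR(), and demo_class_create() and demo_init() are hypothetical names invented for illustration. It shows why a plain NULL check misses this kind of failure.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Largest errno value encoded in a pointer (an assumption of this sketch). */
#define MAX_ERRNO 4095

/* Encode a negative errno value in the top MAX_ERRNO addresses. */
static void *err_ptr(long err)
{
	assert(err < 0 && err >= -MAX_ERRNO);
	return (void *)(intptr_t)err;
}

/* True if ptr is an encoded error rather than a usable object. */
static int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Recover the negative errno value from an encoded pointer. */
static long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Hypothetical constructor: reports failure as an encoded -ENOMEM. */
static void *demo_class_create(int fail)
{
	static int dummy_class;

	return fail ? err_ptr(-ENOMEM) : (void *)&dummy_class;
}

/* Returns 0 on success or a negative errno value on failure. */
static int demo_init(int fail)
{
	void *cls = demo_class_create(fail);

	/* The failure value is non-NULL, so "if (!cls)" would not catch it. */
	if (is_err(cls))
		return (int)ptr_err(cls);
	return 0;
}
```

Calling demo_init(1) returns -ENOMEM even though the pointer handed back by demo_class_create(1) is not NULL, which is the case the IS_ERR() check is meant to cover.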

Re: [KORG] Re: kernel.org lies about latest -mm kernel

2006-12-18 Thread J.H.

On Tue, 2006-12-19 at 07:46 +0100, Willy Tarreau wrote:
> On Sun, Dec 17, 2006 at 04:42:56PM -0800, J.H. wrote:
> > On Mon, 2006-12-18 at 00:37 +0200, Matti Aarnio wrote:
> > > On Sun, Dec 17, 2006 at 10:23:54AM -0800, Randy Dunlap wrote:
> > > > J.H. wrote:
> > > ...
> > > > >The root cause boils down to with git, gitweb and the normal mirroring
> > > > >on the frontend machines our basic working set no longer stays resident
> > > > >in memory, which is forcing more and more to actively go to disk
> > > > >causing a much higher I/O load.  You have the added problem that one of
> > > > >the frontend machines is getting hit harder than the other due to
> > > > >several factors: various DNS servers not round robining, people
> > > > >explicitly hitting [git|mirrors|www|etc]1 instead of 2 for whatever
> > > > >reason and probably several other factors we aren't aware of.  This has
> > > > >caused the average load on that machine to hover around 150-200 and if
> > > > >for whatever reason we have to take one of the machines down the load
> > > > >on the remaining machine will skyrocket to 2000+.
> > > 
> > > Relying on DNS and clients doing round-robin load-balancing is doomed.
> > > 
> > > You really, REALLY, need external L4 load-balancer switches.
> > > (And installation help from somebody who really knows how to do this
> > > kind of services on a cluster.)
> > 
> > While this is a really good idea when you have systems that are all in a
> > single location, with a single uplink and what not - this isn't the case
> > with kernel.org.  Our machines are currently in three separate
> > facilities in the US (spanning two different states), with us working on
> > a fourth in Europe.
> 
> On multi-site setups, you have to rely on DNS, but the DNS should not
> announce the servers themselves, but the local load balancers, each of
> which knows other sites.
> 
> While people often find it dirty, there's no problem forwarding a
> request from one site to another via the internet as long as there
> are big pipes. Generally, I play with weights to slightly smooth
> the load and reduce the bandwidth usage on the pipe (eg: 2/3 local,
> 1/3 remote).
> 
> With LVS, you can even use the tunneling mode, with which the request
> comes to LB on site A, is forwarded to site B via the net, but the data
> returns from site B to the client.
> 
> If the frontend machines are not taken off-line too often, it should
> be no big deal for them to handle something such as LVS, and would
> help spreading the load.
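
The "2/3 local, 1/3 remote" weighting mentioned above can be sketched with a smooth weighted round-robin scheduler. This is only an illustration under assumptions: it is not the LVS implementation, and struct site, pick_site() and the site names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One backend site with a static weight (names and weights are illustrative). */
struct site {
	const char *name;
	int weight;   /* share of requests this site should receive */
	int current;  /* running counter used by the scheduler */
};

/*
 * Smooth weighted round-robin: every pick adds each site's weight to its
 * counter, chooses the site with the largest counter, then subtracts the
 * total weight from the winner.  Over each cycle of total-weight picks,
 * every site is chosen exactly `weight` times, and the picks of a heavy
 * site are spread out instead of being sent back to back.
 */
static struct site *pick_site(struct site *sites, size_t n)
{
	struct site *best = NULL;
	int total = 0;
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++) {
		sites[i].current += sites[i].weight;
		total += sites[i].weight;
		if (best == NULL || sites[i].current > best->current)
			best = &sites[i];
	}
	best->current -= total;
	return best;
}
```

With weights 2 (local) and 1 (remote), successive calls return local, remote, local and then repeat, so two thirds of the requests stay on the local site.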

I'll have to look into it - but by and large the round robining tends to
work.  Specifically as I am writing this the machines are both pushing
right around 150mbps, however the load on zeus1 is 170 vs. zeus2's 4.
Also when we peak the bandwidth we do use every last kb we can get our
hands on, so doing any tunneling takes just that much bandwidth away
from the total.

Number of processes running

process    #1    #2
rsync     162    69
http      734   642
ftp       353   190

as a quick snapshot.  I would agree with HPA's recent statement - that
people who are mirroring against kernel.org have probably hard coded the
first machine into their scripts, combine that with a few dns servers
that don't honor or deal with round robining and you have the extra load
on the first machine vs. the second.

> 
> > > > >Since it's apparent not everyone is aware of what we are doing, I'll
> > > > >mention briefly some of the bigger points.
> > > ...
> > > > >- We've cut back on the number of ftp and rsync users to the machines.
> > > > >Basically we are cutting back where we can in an attempt to keep the
> > > > >load from spiraling out of control, this helped a bit when we recently
> > > > >had to take one of the machines down and instead of loads spiking into
> > > > >the 2000+ range we peaked at about 500-600 I believe.
> > > 
> > > How about having filesystems mounted with "noatime" ?
> > > Or do you already do that ?
> > 
> > We've been doing that for over a year.
> 
> Couldn't we temporarily *cut* the services one after the other on www1
> to find which ones are the most I/O consuming, and see which ones can
> coexist without bad interaction ?
> 
> Also, I see that keepalive is still enabled on apache, I guess there
> are thousands of processes and that apache is eating gigs of RAM by
> itself. I strongly suggest disabling keepalive there.
> 
> > - John
> 
> Just my 2 cents,
> Willy


Re: [RFC][PATCH] Fix area->nr_free-- went (-1) issue in buddy system

2006-12-18 Thread Aubrey


On 12/19/06, Nick Piggin <[EMAIL PROTECTED]> wrote:
> Hi Aubery!
>
> That's right. I guess you can either align your zone sizes (must be
> aligned to MAX_ORDER size), or add the zone check in page_is_buddy.

Adding the zone check in page_is_buddy fixes the problem.
Thanks again, :)

-Aubrey
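
The zone check discussed above can be illustrated with a small userspace model. This is a sketch under assumptions, not the kernel's page_is_buddy(): struct demo_page, buddy_index() and demo_page_is_buddy() are hypothetical stand-ins that keep only the fields needed for the test.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for struct page: just what the buddy test needs. */
struct demo_page {
	int zone_id;   /* which zone owns this page frame */
	int order;     /* order of the free block headed by this page */
	int is_free;   /* non-zero if this page heads a free block */
};

/* Index of the buddy of the block that starts at page_idx. */
static size_t buddy_index(size_t page_idx, int order)
{
	return page_idx ^ ((size_t)1 << order);
}

/*
 * A candidate is a mergeable buddy only if it heads a free block of the
 * same order AND lives in the same zone.  Without the zone comparison, a
 * zone boundary that is not aligned to the largest block size lets a
 * block merge with a page from the neighbouring zone, and the per-zone
 * free counters then no longer match what each zone really holds.
 */
static int demo_page_is_buddy(const struct demo_page *page,
			      const struct demo_page *buddy, int order)
{
	assert(order >= 0);
	if (page->zone_id != buddy->zone_id)
		return 0;
	return buddy->is_free && buddy->order == order;
}
```

For example, if zone 0 ends at page 2 and zone 1 starts at page 3, the buddy of page 2 at order 0 is page 3; the zone comparison rejects that merge even when both pages are free.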

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Peter Zijlstra

On Mon, 2006-12-18 at 11:18 -0800, Linus Torvalds wrote:

> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index d8a842a..3f9061e 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -448,7 +448,7 @@ static int page_mkclean_one(struct page 
> > goto unlock;
> >  
> > entry = ptep_get_and_clear(mm, address, pte);
> > -   entry = pte_mkclean(entry);
> > +   /*entry = pte_mkclean(entry);*/
> > entry = pte_wrprotect(entry);
> > ptep_establish(vma, address, pte, entry);
> > lazy_mmu_prot_update(entry);
> 
> The above patch is bad. It's always going to hide the bug, but it hides it 
> by just not doing anything at all. 

Not quite, it does wrprotect still, so further updates will trigger the
do_wp_page() path and call set_page_dirty().

So we could make 'something' that would keep the tracking working and
not create corruption, say something like this:

However I'll try and figure out how we get so terribly confused on the
PG_dirty state that we have to clean it and fall back to pte_dirty. That
is the real issue we have.

---
 include/linux/rmap.h |6 ++
 mm/page-writeback.c  |3 ++-
 mm/rmap.c|   23 ++-
 3 files changed, 26 insertions(+), 6 deletions(-)

Index: linux-2.6-git/mm/rmap.c
===
--- linux-2.6-git.orig/mm/rmap.c2006-12-18 11:06:29.0 +0100
+++ linux-2.6-git/mm/rmap.c 2006-12-19 08:33:57.0 +0100
@@ -428,7 +428,8 @@ int page_referenced(struct page *page, i
return referenced;
 }
 
-static int page_mkclean_one(struct page *page, struct vm_area_struct *vma)
+static int page_mkcw_one(struct page *page,
+struct vm_area_struct *vma, int make_clean)
 {
struct mm_struct *mm = vma->vm_mm;
unsigned long address;
@@ -448,7 +449,8 @@ static int page_mkclean_one(struct page 
goto unlock;
 
entry = ptep_get_and_clear(mm, address, pte);
-   entry = pte_mkclean(entry);
+   if (make_clean)
+   entry = pte_mkclean(entry);
entry = pte_wrprotect(entry);
ptep_establish(vma, address, pte, entry);
lazy_mmu_prot_update(entry);
@@ -460,7 +462,8 @@ out:
return ret;
 }
 
-static int page_mkclean_file(struct address_space *mapping, struct page *page)
+static int page_mkcw_file(struct address_space *mapping,
+ struct page *page, int make_clean)
 {
pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
struct vm_area_struct *vma;
@@ -478,7 +481,7 @@ static int page_mkclean_file(struct addr
return ret;
 }
 
-int page_mkclean(struct page *page)
+static int page_mkcw(struct page *page, int make_clean)
 {
int ret = 0;
 
@@ -487,12 +490,22 @@ int page_mkclean(struct page *page)
if (page_mapped(page)) {
struct address_space *mapping = page_mapping(page);
if (mapping)
-   ret = page_mkclean_file(mapping, page);
+   ret = page_mkcw_file(mapping, page, make_clean);
}
 
return ret;
 }
 
+int page_mkclean(struct page *page)
+{
+   return page_mkcw(page, 1);
+}
+
+int page_wrprotect(struct page *page)
+{
+   return page_mkcw(page, 0);
+}
+
 /**
  * page_set_anon_rmap - setup new anonymous rmap
  * @page:  the page to add the mapping to
Index: linux-2.6-git/include/linux/rmap.h
===
--- linux-2.6-git.orig/include/linux/rmap.h 2006-12-19 08:31:59.0 +0100
+++ linux-2.6-git/include/linux/rmap.h  2006-12-19 08:32:28.0 +0100
@@ -110,6 +110,7 @@ unsigned long page_address_in_vma(struct
  * returns the number of cleaned PTEs.
  */
 int page_mkclean(struct page *);
+int page_wrprotect(struct page *);
 
 #else  /* !CONFIG_MMU */
 
@@ -125,6 +126,11 @@ static inline int page_mkclean(struct pa
return 0;
 }
 
+static inline int page_wrprotect(struct page *page)
+{
+   return 0;
+}
+
 
 #endif /* CONFIG_MMU */
 
Index: linux-2.6-git/mm/page-writeback.c
===
--- linux-2.6-git.orig/mm/page-writeback.c  2006-12-19 08:24:48.0 +0100
+++ linux-2.6-git/mm/page-writeback.c   2006-12-19 08:31:43.0 +0100
@@ -872,7 +872,8 @@ int test_clear_page_dirty(struct page *p
 * page is locked, which pins the address_space
 */
if (mapping_cap_account_dirty(mapping)) {
-   page_mkclean(page);
+   if (page_wrprotect(page))
+   set_page_dirty(page);
dec_zone_page_state(page, NR_FILE_DIRTY);
}
return 1;





Re: [PATCH] microcode: Fix mc_cpu_notifier section warning

2006-12-18 Thread Jean Delvare

Hi Tigran,

On Mon, 18 Dec 2006 10:04:39 +0000 (GMT), Tigran Aivazian wrote:
> Ok, your patch is correct, although I assume you realize that it does 
> nothing --- both the function and the data it operates on are inside 
> CONFIG_HOTPLUG_CPU and checking include/linux/init.h I see that 
> __cpuinitdata is nothing in this case. E.g. msr_class_cpu_notifier in the 
> msr driver isn't declared __cpuinitdata...

I don't see anything in arch/i386/kernel/microcode.c depending on
CONFIG_HOTPLUG_CPU (in 2.6.20-rc1), sorry.

> But to tidy up one should add __cpuinitdata as you suggest (to guard for 
> the case if these two slip out of CONFIG_HOTPLUG_CPU, although they are 
> meaningless if cpu hotplug support is not configured in).
> 
> Kind regards
> Tigran
> 
> On Sun, 17 Dec 2006, Jean Delvare wrote:
> 
> > Structure mc_cpu_notifier references a __cpuinit function, but
> > isn't declared __cpuinitdata itself:
> >
> > WARNING: arch/i386/kernel/microcode.o - Section mismatch: reference
> > to .init.text: from .data after 'mc_cpu_notifier' (at offset 0x118)
> >
> > Signed-off-by: Jean Delvare <[EMAIL PROTECTED]>
> > ---
> > arch/i386/kernel/microcode.c |2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > --- linux-2.6.20-rc1.orig/arch/i386/kernel/microcode.c  2006-12-15 09:05:20.0 +0100
> > +++ linux-2.6.20-rc1/arch/i386/kernel/microcode.c   2006-12-17 15:23:40.0 +0100
> > @@ -722,7 +722,7 @@
> > return NOTIFY_OK;
> > }
> >
> > -static struct notifier_block mc_cpu_notifier = {
> > +static struct notifier_block __cpuinitdata mc_cpu_notifier = {
> > .notifier_call = mc_cpu_callback,
> > };

-- 
Jean Delvare

Re: [PATCH] libata-scsi: ata_task_ioctl should return ATA registers from sense data

2006-12-18 Thread Tejun Heo

David Milburn wrote:
> User applications using the HDIO_DRIVE_TASK ioctl through libata
> expect specific ATA registers to be returned to userspace. Verified
> that ata_task_ioctl correctly returns register values to the
> smartctl application.
> 
> Signed-off-by: David Milburn <[EMAIL PROTECTED]>
Acked-by: Tejun Heo <[EMAIL PROTECTED]>

-- 
tejun

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Linus Torvalds

On Tue, 19 Dec 2006, Nick Piggin wrote:
> 
> I wouldn't have thought it becomes clean by dropping it ;) Is this a
> trick question? My answer is that we clean a page by taking some
> action such that the underlying data matches the data in RAM...

Sure.

> We don't "drop" any data until it has been cleaned (again, ignoring
> things like truncate for a minute). That's a bug!

Actually, it's the other way around. We have to drop the dirty bits BEFORE 
cleaning. If we clean first, and _then_ drop the dirty bits, THAT is a 
bug, because the dirty bits can now refer to _new_ dirty data that didn't 
get written out.

So the proper sequence is _literally_ to mark the page clean FIRST. Drop 
all the dirty bits, but not the _data_ obviously (ie you have a reference 
to the page). And _then_ you do the writeout to actually clean the data 
itself.

So you actually state it exactly the wrong way around.

We MUST clear the dirty bits before we do the IO that actually cleans the 
data. Exactly because if new writes keep on happening, if we do it in the 
other order, we'll drop dirty data on the floor.
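
The ordering argument can be made concrete with a small deterministic model. This is a sketch under assumptions, not kernel code: struct demo_page, the global `disk` and the demo_* helpers are hypothetical, and the "race" is simulated by placing a write between the two halves of the writeout.

```c
#include <assert.h>

struct demo_page {
	int data;   /* in-memory contents */
	int dirty;  /* non-zero if memory may differ from disk */
};

static int disk;  /* the on-disk copy */

/* A writer stores new data and marks the page dirty. */
static void demo_write(struct demo_page *p, int value)
{
	p->data = value;
	p->dirty = 1;
}

/* First half of the writeout: sample the data that will go to disk. */
static int demo_io_start(struct demo_page *p, int clear_first)
{
	if (clear_first)
		p->dirty = 0;   /* correct order: clear before sampling */
	return p->data;
}

/* Second half of the writeout: the sampled data reaches the disk. */
static void demo_io_finish(struct demo_page *p, int snapshot, int clear_first)
{
	disk = snapshot;
	if (!clear_first)
		p->dirty = 0;   /* wrong order: also wipes newer writes */
}

/* Returns 1 if a write that arrives during the I/O is lost. */
static int demo_race_loses_data(int clear_first)
{
	struct demo_page p = { 0, 0 };
	int snap;

	demo_write(&p, 1);
	assert(p.dirty);                /* a write always leaves the page dirty */
	snap = demo_io_start(&p, clear_first);
	demo_write(&p, 2);              /* new write while the I/O is in flight */
	demo_io_finish(&p, snap, clear_first);

	/* Lost data: the page claims to be clean but the disk is stale. */
	return !p.dirty && disk != p.data;
}
```

With clear_first set, the second write re-dirties the page and a later writeout will pick it up; with the dirty bit cleared after the I/O, the page ends up marked clean while the disk still holds the old value.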

> > In no other circumstance do we ever want to clear a dirty bit, as far as I
> > can tell. 
> 
> Exactly. And that is exactly what try_to_free_buffers is doing now.
> 
> I still think you should have a look at the patch.

I claim that dropping dirty bits AFTER the IO is always wrong. 
Try_to_free_buffers() must never touch the dirty bits at all, because by 
definition that thing happens after the IO has actually been done.

And yes, I looked at your patch. And it looks a million times cleaner 
than Andrew's patch. However, it's already been tested multiple times, and 
totally REMOVING the "clear_page_dirty()" from try_to_free_buffers() still 
resulted in the corruption.

That said, I think your patch is worth it just as a cleanup. Much nicer 
than Andrew's code, also from a naming standpoint. So I'm not actually 
disagreeing about the patch itself, but I _am_ saying that I don't 
actually see the point of ever moving the dirty bits around.

So I repeat: we have the case where we really want to _remove_ the dirty 
bits (because we're going to write the current state of the page to disk, 
and we need to clear the dirty bits BEFORE we do that). That's the one 
that makes sense, and that's the code we want to run before doing IO. It's 
the "clear_dirty_bits_for_io()" case.

The code that doesn't make sense is the "shuffle the dirty bits around" 
case. In other words: when does it actually make sense to call your 
(well-implemented, don't get me wrong) "test_clear_page_dirty_sync_ptes()"
function? It doesn't _fix_ anything. It just shuffles dirty bits from one 
place to another. What was the point again?

If the point is "try_to_free_buffers()", then my argument was that I had a 
much simpler solution: "Just don't do that then". My simple patch sadly 
didn't fix the data corruption, so the data corruption comes from 
something ELSE than try_to_free_buffers().

Linus

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Peter Zijlstra

On Tue, 2006-12-19 at 15:36 +1100, Nick Piggin wrote:

> plain text document attachment (fs-fix.patch)
> Index: linux-2.6/fs/buffer.c
> ===
> --- linux-2.6.orig/fs/buffer.c    2006-12-19 15:15:46.0 +1100
> +++ linux-2.6/fs/buffer.c 2006-12-19 15:36:01.0 +1100
> @@ -2852,7 +2852,17 @@ int try_to_free_buffers(struct page *pag
>* This only applies in the rare case where try_to_free_buffers
>* succeeds but the page is not freed.
>*/
> - clear_page_dirty(page);
> +
> + /*
> +  * If the page has been dirtied via the user mappings, then
> +  * clean buffers does not indicate the page data is actually
> +  * clean! Only clear the page dirty bit if there are no dirty
> +  * ptes either.
> +  *
> +  * If there are dirty ptes, then the page must be uptodate, so
> +  * the above concern does not apply.
> +  */
> + clear_page_dirty_sync_ptes(page);
>   }
>  out:
>   if (buffers_to_free) {
> Index: linux-2.6/include/linux/page-flags.h
> ===
> --- linux-2.6.orig/include/linux/page-flags.h 2006-12-19 15:17:18.0 +1100
> +++ linux-2.6/include/linux/page-flags.h  2006-12-19 15:34:24.0 +1100
> @@ -254,6 +254,7 @@ static inline void SetPageUptodate(struc
>  struct page; /* forward declaration */
>  
>  int test_clear_page_dirty(struct page *page);
> +int test_clear_page_dirty_sync_ptes(struct page *page);
>  int test_clear_page_writeback(struct page *page);
>  int test_set_page_writeback(struct page *page);
>  
> @@ -262,6 +263,11 @@ static inline void clear_page_dirty(stru
>   test_clear_page_dirty(page);
>  }
>  
> +static inline void clear_page_dirty_sync_ptes(struct page *page)
> +{
> + test_clear_page_dirty_sync_ptes(page);
> +}
> +
>  static inline void set_page_writeback(struct page *page)
>  {
>   test_set_page_writeback(page);
> Index: linux-2.6/mm/page-writeback.c
> ===
> --- linux-2.6.orig/mm/page-writeback.c    2006-12-19 15:17:53.0 +1100
> +++ linux-2.6/mm/page-writeback.c 2006-12-19 15:33:29.0 +1100
> @@ -844,9 +844,10 @@ EXPORT_SYMBOL(set_page_dirty_lock);
>  
>  /*
>   * Clear a page's dirty flag, while caring for dirty memory accounting. 
> + * Does not clear pte dirty bits.
>   * Returns true if the page was previously dirty.
>   */
> -int test_clear_page_dirty(struct page *page)
> +static int test_clear_page_dirty_leave_ptes(struct page *page)
>  {
>   struct address_space *mapping = page_mapping(page);
>   unsigned long flags;
> @@ -862,10 +863,8 @@ int test_clear_page_dirty(struct page *p
>* We can continue to use `mapping' here because the
>* page is locked, which pins the address_space
>*/
> - if (mapping_cap_account_dirty(mapping)) {
> - page_mkclean(page);
> + if (mapping_cap_account_dirty(mapping))
>   dec_zone_page_state(page, NR_FILE_DIRTY);
> - }
>   return 1;
>   }
>   write_unlock_irqrestore(&mapping->tree_lock, flags);
> @@ -873,9 +872,43 @@ int test_clear_page_dirty(struct page *p
>   }
>   return TestClearPageDirty(page);
>  }
> +
> +/*
> + * As above, but does clear dirty bits from ptes
> + */
> +int test_clear_page_dirty(struct page *page)
> +{
> + struct address_space *mapping = page_mapping(page);
> +
> + if (test_clear_page_dirty_leave_ptes(page)) {
> + if (mapping_cap_account_dirty(mapping))
> + page_mkclean(page);
> + return 1;
> + }
> + return 0;
> +}
>  EXPORT_SYMBOL(test_clear_page_dirty);
>  
>  /*
> + * As above, but redirties page if any dirty ptes are found (and then only
> + * if the mapping accounts dirty pages, otherwise dirty ptes are left dirty
> + * but the page is cleaned).
> + */
> +int test_clear_page_dirty_sync_ptes(struct page *page)
> +{
> + struct address_space *mapping = page_mapping(page);
> +
> + if (test_clear_page_dirty_leave_ptes(page)) {
> + if (mapping_cap_account_dirty(mapping)) {
> + if (page_mkclean(page))
> + set_page_dirty(page);
> + }
> + return 1;
> + }
> + return 0;
> +}
> +
> +/*
>   * Clear a page's dirty flag, while caring for dirty memory accounting.
>   * Returns true if the page was previously dirty.
>   *

Hmm, not quite; it certainly looks better than the extra ,[01] tagged on to
test_clear_page_dirty() though. Although I would have expected it the
other way around -

Re: [patch] lockdep: more unlock-on-error fixes

2006-12-18 Thread Jarek Poplawski

On Mon, Dec 18, 2006 at 03:39:36PM +0100, Ingo Molnar wrote:
> 
> * Jarek Poplawski <[EMAIL PROTECTED]> wrote:
> 
> > Hello,
> > 
> > If any of these proposals should be omitted or separated let me know.
> 
> thanks for the fixes, they look good to me. I have reorganized the 
> __lock_acquire() changes a bit. Plus i dropped the check_locks_freed() 
> changes: there's no reason lockdep should be using 'raw' irq flags 
> saving - these functions are not part of the irq-flags tracing code so 
> they dont /need/ to be raw.

I'm not 100% convinced - now trace_hardirqs_off/on is
done only for lockdep reasons, so it is like a self-check.
But it's probably a matter of taste.

...
> Index: linux/kernel/lockdep.c
> ===
> --- linux.orig/kernel/lockdep.c
> +++ linux/kernel/lockdep.c
...
> @@ -2210,19 +2214,24 @@ out_calc_hash:
>   if (!chain_head && ret != 2)
>   if (!check_prevs_add(curr, hlock))
>   return 0;
> - graph_unlock();
> - }
> + } else
> + /* after lookup_chain_cache(): */
> + if (unlikely(!debug_locks))
> + return 0;
> +
>   curr->lockdep_depth++;
>   check_chain_key(curr);
>   if (unlikely(curr->lockdep_depth >= MAX_LOCK_DEPTH)) {
> - debug_locks_off();
> + debug_locks_off_graph_unlock();
>   printk("BUG: MAX_LOCK_DEPTH too low!\n");
>   printk("turning off the locking correctness validator.\n");
>   return 0;
>   }
> +
>   if (unlikely(curr->lockdep_depth > max_lockdep_depth))
>   max_lockdep_depth = curr->lockdep_depth;
>  
> + graph_unlock();
>   return 1;
>  }

Sorry, but that's not good... There may be no lock held
at all here (e.g. trylock != 0 || check != 2).

Jarek P.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Aiee, killing interrupt handler!

2006-12-18 Thread Chuck Ebbert

In-Reply-To: <[EMAIL PROTECTED]>

On Tue, 19 Dec 2006 12:06:20 +0800, Hawk Xu wrote:

> Our server(running Oracle 10g) is having a kernel panic problem:
<> 
> Process swapper (pid: 0, threadinfo 80582000, task 80464300)
> Stack: 0296 8013f325 81007f7f54d0 0100
>0001 000e 8053e098 8013f3a5
>81007f7f54d0 810002c10a20

You need to post the entire oops message, not just the last part.  It should
start with "BUG". And using a more recent kernel would be a good idea.

> And, we have these error messages in the /var/log/kernel file:
> 
> Dec  7 17:19:09 kf85-1 kernel: set_local_var[9683]: segfault at
> fffc rip 55f41d69 rsp c4e8 error 6
> Dec  7 17:27:44 kf85-1 kernel: set_local_var[12020]: segfault at
> fffc rip 55f41d69 rsp b978 error 6

32-bit Oracle on 64-bit kernel?  If so, it's probably not going to work.

-- 
MBTI: IXTP

Re: [linux-cifs-client] Re: 2.6.19.1 bug? tar: file changed as we read it

2006-12-18 Thread Chuck Ebbert

In-Reply-To: <[EMAIL PROTECTED]>

On Mon, 18 Dec 2006 11:22:36 -0500, simo wrote:

> > With cifs, a directory search shows different sizes but opening
> > them by name gives identical contents:
> > 
> > $ ll ipt_dscp* ipt_DSCP*
> > -r 1 me me 1581 Jan 28  2004 ipt_dscp.c
> > -r 1 me me 2753 Jan 29  2004 ipt_DSCP.c
> > $ ll ipt_dscp.c ipt_DSCP.c
> > -r 1 me me 1581 Jan 28  2004 ipt_dscp.c
> > -r 1 me me 1581 Jan 28  2004 ipt_DSCP.c
> > $ diff ipt_dscp.c ipt_DSCP.c
> > $
> > 
> > So where is the bug? On the server?
> 
> What is the server?
> Samba? Which version?

Samba 2.2.3.

> Do you use unix extensions? Or "case sensitive = yes" ?

No UNIX extensions.  Not case sensitive.

So this is kind of expected, but the smbfs and cifs clients for Linux have
subtly different behavior.

-- 
MBTI: IXTP

Re: -mm merge plans for 2.6.20

2006-12-18 Thread Nick Piggin

Dave Jones wrote:
> On Tue, Dec 19, 2006 at 04:20:37PM +1100, Nick Piggin wrote:
>  > Dave Jones wrote:
>  >
>  > > Eeek! page_mapcount(page) went negative! (-2)
>  >
>  > Hmm, probably happened once before, too.
>
> You're right. Going back further in the log, I noticed
> that it had happened again exactly at the time that cron restarted vpnc.
> The first time, the flags were different..
>
>  Dec  4 00:01:03 firewall kernel: Eeek! page_mapcount(page) went negative! (-1)
>  Dec  4 00:01:03 firewall kernel:   page->flags = 400
>  Dec  4 00:01:03 firewall kernel:   page->count = 1
>  Dec  4 00:01:03 firewall kernel:   page->mapping = 

Still reserved, with a NULL mapping. I'd say it could be the same page.

>  > >   page->flags = 404
>  >
>  > What's that? PG_referenced|PG_reserved? So I'd say it is likely
>  > that some driver has got its refcounting wrong.
>
> At the time that it bit me, here's what was loaded..
>
> tun ipt_MASQUERADE iptable_nat ip_nat ipt_LOG xt_limit ipv6
> ip_conntrack_netbios_ns ipt_REJECT xt_state ip_conntrack nfnetlink xt_tcpudp
> iptable_filter ip_tables x_tables video sbs i2c_ec button battery asus_acpi ac
> parport_pc lp parport pcspkr ide_cd i2c_viapro i2c_core cdrom 3c59x via_rhine
> via_ircc mii irda crc_ccitt serio_raw dm_snapshot dm_zero dm_mirror dm_mod ext3
> jbd ehci_hcd ohci_hcd uhci_hcd
>
> The scary ones (i2c, irda) weren't in use at all, and had never been opened afaik,
> so the potential for those to be corrupting memory is slim, but not out of the
> question. (Why the hell asus_acpi is loaded is a mystery, this isn't an Asus,
> or a laptop. Probably dumb initscripts).

OK that could be useful if I do some grepping and see which ones are
setting PG_reserved.

>  > And I see we've got another report for 2.6.19.1 from Chris, which
>  > is equally vague.
>
> I'll be moving that box to 2.6.19.x at some point real soon, so I'll holler
> if I see it again on a later kernel.
>
>  > IMO the pattern is much too consistent to be able to attribute
>  > them all to hardware problems. And considering it takes so long
>  > for these things to appear, can we get something like the attached
>  > patch upstream at least until we manage to stamp them out?
>
> Sounds like a good idea to me.
>
> ACKed-by: Dave Jones <[EMAIL PROTECTED]>

Thanks.

>  > Any other debugging info we can add?
>
> Would it be useful to print the pfn of the page ?
> In cases like mine, where it bit twice before it killed the box, it
> might be interesting to see if it's always the same page.  Not sure
> what that would prove/disprove though.

Might help. I guess the site where it is allocated from might be
another one, although I'm hoping that if we know what ->nopage is
being used then we'll be able to track it. OTOH it may be using
remap_pfn_range from fops->mmap, rather than nopage... I wonder
how we could get at that info? vma->vm_file->f_op->mmap?

--
SUSE Labs, Novell Inc.
Index: linux-2.6/include/linux/rmap.h
===
--- linux-2.6.orig/include/linux/rmap.h 2006-12-04 19:56:17.0 +1100
+++ linux-2.6/include/linux/rmap.h  2006-12-19 16:14:30.0 +1100
@@ -72,7 +72,7 @@ void __anon_vma_link(struct vm_area_stru
 void page_add_anon_rmap(struct page *, struct vm_area_struct *, unsigned long);
 void page_add_new_anon_rmap(struct page *, struct vm_area_struct *, unsigned long);
 void page_add_file_rmap(struct page *);
-void page_remove_rmap(struct page *);
+void page_remove_rmap(struct page *, struct vm_area_struct *);

 /**
  * page_dup_rmap - duplicate pte mapping to a page
Index: linux-2.6/mm/filemap_xip.c
===
--- linux-2.6.orig/mm/filemap_xip.c 2006-12-04 19:07:10.0 +1100
+++ linux-2.6/mm/filemap_xip.c  2006-12-19 16:14:30.0 +1100
@@ -189,7 +189,7 @@ __xip_unmap (struct address_space * mapp
/* Nuke the page table entry. */
flush_cache_page(vma, address, pte_pfn(*pte));
pteval = ptep_clear_flush(vma, address, pte);
-   page_remove_rmap(page);
+   page_remove_rmap(page, vma);
dec_mm_counter(mm, file_rss);
BUG_ON(pte_dirty(pteval));
pte_unmap_unlock(pte, ptl);
Index: linux-2.6/mm/fremap.c
===
--- linux-2.6.orig/mm/fremap.c  2006-12-04 19:56:20.0 +1100
+++ linux-2.6/mm/fremap.c   2006-12-19 16:14:30.0 +1100
@@ -33,7 +33,7 @@ static int zap_pte(struct mm_struct *mm,
if (page) {
if (pte_dirty(pte))
set_page_dirty(page);
-   page_remove_rmap(page);
+   page_remove_rmap(page, vma);
page_cache_release(page);
}
} else {
Index: linux-2.6/mm/memory.c

linus' git repo down?

2006-12-18 Thread Robert P. J. Day


  for the last couple of days, i've been unable to pull from linus'
2.6 repository.  i consistently get:

$ git pull
fatal: unexpected EOF
Fetch failure: 
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
No changes.

even after several retries.  i can clone it from scratch, i just can't
update from it.  thoughts?

rday

Re: [KORG] Re: kernel.org lies about latest -mm kernel

2006-12-18 Thread J.H.

On Tue, 2006-12-19 at 07:34 +0100, Willy Tarreau wrote:
> On Sat, Dec 16, 2006 at 11:30:34AM -0800, J.H. wrote:
> (...)
> > Since it's apparent not everyone is aware of what we are doing, I'll
> > mention briefly some of the bigger points.
> > 
> > - We have contacted HP to see if we can get additional hardware, mind
> > you though this is a long term solution and will take time, but if our
> > request is approved it will double the number of machines kernel.org
> > runs.
> 
> Just an evil suggestion, but if you contact someone other than HP, they
> might be _very_ interested in taking HP's place and providing whatever
> you need to get their name on www.kernel.org. Sun and IBM do such
> monster machines too. That would not be very kind to HP, but it might
> help getting hardware faster.

I leave the actual hardware acquisitions up to HPA, I just try to keep
the machines up and running without too many problems.  HP has been
incredibly supportive of kernel.org in the past and I for one have been
very appreciative of their hardware and would love to continue working
with them.

> 
> > - Gitweb is causing us no end of headache, there are (known to me
> > anyway) two different things happening on that.  I am looking at Jeff
> > Garzik's suggested caching mechanism as a temporary stop-gap, with an
> > eye more on doing a rather heavy re-write of gitweb itself to include
> > semi-intelligent caching.  I've already started in on the latter - and I
> > just about have the caching layer put in.  But this is still at least a
> > week out before we could even remotely consider deploying it.
> 
> Couldn't we disable gitweb for as long as we don't get newer machines ?
> I've been using it in the past, but it was just a convenience. If needed,
> we can explode all the recent patches with a "git-format-patch -k -m" in a
> directory.

I've mentioned this to the other admins, and the consensus was that
suggesting it would cause quite an outcry. If the general consensus is
to disable gitweb until we can get it under control, we would consider
doing that.

> 
> > - We've cut back on the number of ftp and rsync users to the machines.
> > Basically we are cutting back where we can in an attempt to keep the
> > load from spiraling out of control, this helped a bit when we recently
> > had to take one of the machines down and instead of loads spiking into
> > the 2000+ range we peaked at about 500-600 I believe.
> 
> I did not imagine FTP and rsync being so much used !

On average we are moving anywhere from 400-600 Mbps between the two
machines; on release days we max out both of the connections at 1 Gbps
each and have seen that draw last for 48 hours.  For instance, when FC6
was released, we moved 13 TBytes of data in the first 12 hours or so.

> 
> > So we know the problem is there, and we are working on it - we are
> > getting e-mails about it if not daily then every other day or so.  If
> > there are suggestions we are willing to hear them - but the general
> > feeling with the admins is that we are probably hitting the biggest
> > problems already.
> 
> BTW, yesterday my 2.4 patches were not published, but I noticed that
> they were not even signed nor bzipped on hera. At first I simply thought
> it was related, but right now I have a doubt. Maybe the automatic script
> has temporarily been disabled on hera too ?

The script that deals with the uploads also deals with the packaging -
so yes the problem is related.

> 
> > - John 'Warthog9' Hawley
> > Kernel.org Admin
> 
> Thanks for keeping us informed !
> Willy

Doing what I can :-)

- John


Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Nick Piggin


Linus Torvalds wrote:
>
> On Tue, 19 Dec 2006, Nick Piggin wrote:
>
>> We never want to drop dirty data! (ignoring the truncate case, which is
>> handled privately by truncate anyway)
>
> Bzzt.
>
> SURE we do.
>
> We absolutely do want to drop dirty data in the writeout path.
>
> How do you think dirty data ever _becomes_ clean data?

I wouldn't have thought it becomes clean by dropping it ;) Is this a
trick question? My answer is that we clean a page by taking some
action such that the underlying data matches the data in RAM...

We don't "drop" any data until it has been cleaned (again, ignoring
things like truncate for a minute); doing so is a bug! And
try_to_free_buffers() is called from places outside the writeout path.
This is our bug (or at least, one of our bugs that appears to have the
same triggers and symptoms as people are reporting).

[...]

> In no other circumstance do we ever want to clear a dirty bit, as far as I
> can tell.

Exactly. And that is exactly what try_to_free_buffers is doing now.

I still think you should have a look at the patch.

--
SUSE Labs, Novell Inc.

Re: [KORG] Re: kernel.org lies about latest -mm kernel

2006-12-18 Thread Willy Tarreau

On Sun, Dec 17, 2006 at 04:42:56PM -0800, J.H. wrote:
> On Mon, 2006-12-18 at 00:37 +0200, Matti Aarnio wrote:
> > On Sun, Dec 17, 2006 at 10:23:54AM -0800, Randy Dunlap wrote:
> > > J.H. wrote:
> > ...
> > > >The root cause boils down to with git, gitweb and the normal mirroring
> > > >on the frontend machines our basic working set no longer stays resident
> > > >in memory, which is forcing more and more to actively go to disk causing
> > > >a much higher I/O load.  You have the added problem that one of the
> > > >frontend machines is getting hit harder than the other due to several
> > > >factors: various DNS servers not round robining, people explicitly
> > > >hitting [git|mirrors|www|etc]1 instead of 2 for whatever reason and
> > > >probably several other factors we aren't aware of.  This has caused the
> > > >average load on that machine to hover around 150-200 and if for whatever
> > > >reason we have to take one of the machines down the load on the
> > > >remaining machine will skyrocket to 2000+.  
> > 
> > Relying on DNS and clients doing round-robin load-balancing is doomed.
> > 
> > You really, REALLY, need external L4 load-balancer switches.
> > (And installation help from somebody who really knows how to do this
> > kind of services on a cluster.)
> 
> While this is a really good idea when you have systems that are all in a
> single location, with a single uplink and what not - this isn't the case
> with kernel.org.  Our machines are currently in three separate
> facilities in the US (spanning two different states), with us working on
> a fourth in Europe.

On multi-site setups, you have to rely on DNS, but the DNS should not
announce the servers themselves, but the local load balancers, each of
which knows other sites.

While people often find it dirty, there's no problem forwarding a
request from one site to another via the internet as long as there
are big pipes. Generally, I play with weights to slightly smooth
the load and reduce the bandwidth usage on the pipe (eg: 2/3 local,
1/3 remote).
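
As a rough illustration of such weighting (this is not how LVS itself
implements its schedulers), a smooth weighted round-robin picker could
look like the sketch below. The struct site type, pick_site() and the
2:1 weights are assumptions made up for this example.

```c
#include <stddef.h>

/* One candidate site; all names here are hypothetical. */
struct site {
	const char *name;
	int weight;   /* relative share of requests, e.g. 2 for local */
	int current;  /* running credit maintained by pick_site() */
};

/*
 * Smooth weighted round-robin: add each site's weight to its credit,
 * pick the site with the highest credit, then charge the winner the
 * sum of all weights. Returns NULL only when n is 0.
 */
static struct site *pick_site(struct site *sites, size_t n)
{
	struct site *best = NULL;
	int total = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		sites[i].current += sites[i].weight;
		total += sites[i].weight;
		if (best == NULL || sites[i].current > best->current)
			best = &sites[i];
	}
	if (best != NULL)
		best->current -= total;
	return best;
}
```

With sites[] = { { "local", 2, 0 }, { "remote", 1, 0 } }, repeated calls
hand two requests out of every three to the local site and interleave
the remote ones instead of sending them in bursts.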

With LVS, you can even use the tunneling mode, with which the request
comes to the LB on site A, is forwarded to site B via the net, but the
data returns from site B directly to the client.

If the frontend machines are not taken off-line too often, it should
be no big deal for them to handle something such as LVS, and it would
help spreading the load.

> > > >Since it's apparent not everyone is aware of what we are doing, I'll
> > > >mention briefly some of the bigger points.
> > ...
> > > >- We've cut back on the number of ftp and rsync users to the machines.
> > > >Basically we are cutting back where we can in an attempt to keep the
> > > >load from spiraling out of control, this helped a bit when we recently
> > > >had to take one of the machines down and instead of loads spiking into
> > > >the 2000+ range we peaked at about 500-600 I believe.
> > 
> > How about having filesystems mounted with "noatime" ?
> > Or do you already do that ?
> 
> We've been doing that for over a year.

Couldn't we temporarily *cut* the services one after the other on www1
to find which ones are the most I/O consuming, and see which ones can
coexist without bad interaction ?

Also, I see that keepalive is still enabled on apache, I guess there
are thousands of processes and that apache is eating gigs of RAM by
itself. I strongly suggest disabling keepalive there.

> - John

Just my 2 cents,
Willy


Re: -mm merge plans for 2.6.20

2006-12-18 Thread Dave Jones

On Tue, Dec 19, 2006 at 04:20:37PM +1100, Nick Piggin wrote:
 > Dave Jones wrote:
 > 
 > > Eeek! page_mapcount(page) went negative! (-2)
 > 
 > Hmm, probably happened once before, too.

You're right. Going back further in the log, I noticed
that it had happened again exactly at the time that cron restarted vpnc.
The first time, the flags were different..

 Dec  4 00:01:03 firewall kernel: Eeek! page_mapcount(page) went negative! (-1)
 Dec  4 00:01:03 firewall kernel:   page->flags = 400
 Dec  4 00:01:03 firewall kernel:   page->count = 1
 Dec  4 00:01:03 firewall kernel:   page->mapping = 

 > >   page->flags = 404
 > 
 > What's that? PG_referenced|PG_reserved? So I'd say it is likely
 > that some driver has got its refcounting wrong.
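
For reference, the two flag values above can be decoded with a small
user-space sketch. It assumes the values are printed in hex and that
PG_referenced is bit 2 and PG_reserved is bit 10, as in the 2.6.19-era
include/linux/page-flags.h; flag_set() and describe_flags() are names
made up for this illustration, not kernel functions.

```c
#include <stdio.h>

/* Bit positions assumed from the 2.6.19-era include/linux/page-flags.h;
 * check them against the kernel actually running on the box. */
#define PG_REFERENCED_BIT 2
#define PG_RESERVED_BIT   10

/* Returns nonzero if bit `bit` is set in `flags`. */
static int flag_set(unsigned long flags, int bit)
{
	return (int)((flags >> bit) & 1UL);
}

/* Print which of the two flags of interest are set in a page->flags value. */
static void describe_flags(unsigned long flags)
{
	printf("%#lx:%s%s\n", flags,
	       flag_set(flags, PG_REFERENCED_BIT) ? " PG_referenced" : "",
	       flag_set(flags, PG_RESERVED_BIT) ? " PG_reserved" : "");
}
```

Under those assumptions, 404 has both PG_referenced and PG_reserved set,
while the earlier 400 has only PG_reserved.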

At the time that it bit me, here's what was loaded..

tun ipt_MASQUERADE iptable_nat ip_nat ipt_LOG xt_limit ipv6
ip_conntrack_netbios_ns ipt_REJECT xt_state ip_conntrack nfnetlink xt_tcpudp
iptable_filter ip_tables x_tables video sbs i2c_ec button battery asus_acpi ac
parport_pc lp parport pcspkr ide_cd i2c_viapro i2c_core cdrom 3c59x via_rhine
via_ircc mii irda crc_ccitt serio_raw dm_snapshot dm_zero dm_mirror dm_mod ext3
jbd ehci_hcd ohci_hcd uhci_hcd

The scary ones (i2c, irda) weren't in use at all, and had never been opened afaik,
so the potential for those to be corrupting memory is slim, but not out of the
question. (Why the hell asus_acpi is loaded is a mystery, this isn't an Asus,
or a laptop. Probably dumb initscripts).

 > And I see we've got another report for 2.6.19.1 from Chris, which
 > is equally vague.

I'll be moving that box to 2.6.19.x at some point real soon, so I'll holler
if I see it again on a later kernel.

 > IMO the pattern is much too consistent to be able to attribute
 > them all to hardware problems. And considering it takes so long
 > for these things to appear, can we get something like the attached
 > patch upstream at least until we manage to stamp them out?

Sounds like a good idea to me.

ACKed-by: Dave Jones <[EMAIL PROTECTED]>

 > Any other debugging info we can add?

Would it be useful to print the pfn of the page ?
In cases like mine, where it bit twice before it killed the box, it
might be interesting to see if it's always the same page.  Not sure
what that would prove/disprove though.

Dave

-- 
http://www.codemonkey.org.uk

Re: [PATCH] procfs: export context switch counts in /proc/*/stat

2006-12-18 Thread Benjamin LaHaise

On Mon, Dec 18, 2006 at 11:50:08PM +, David Wragg wrote:
> This patch (against 2.6.19/2.6.19.1) adds the four context switch
> values (voluntary context switches, involuntary context switches, and
> the same values accumulated from terminated child processes) to the
> end of /proc/*/stat, similarly to min_flt, maj_flt and the time used
> values.

Please put these into new files, as the stat files in /proc are 
horribly overloaded and have always been somewhat problematic 
when it comes to changing how things are reported due to internal 
changes to the kernel.  Cheers,
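
To illustrate why the positional format is awkward for consumers, here
is a sketch of what a user-space reader of /proc/<pid>/stat has to do
just to fetch one value. It relies only on the documented layout (pid,
then the command name in parentheses, then space-separated fields);
stat_field() is a hypothetical helper written for this example.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Fetch the 1-based field `n` (n >= 3) of a /proc/<pid>/stat line.
 * Fields 1 and 2 are the pid and the command name in parentheses; the
 * name may itself contain spaces or ')', so counting starts after the
 * last ')'. Returns 0 and stores the value in *out on success, or -1 if
 * the line has fewer than n fields. Non-numeric fields such as the
 * state character are stored as 0.
 */
static int stat_field(const char *line, int n, long *out)
{
	const char *p = strrchr(line, ')');
	int field = 2;

	if (p == NULL || n < 3)
		return -1;
	p++;
	for (;;) {
		while (*p == ' ')
			p++;
		if (*p == '\0' || *p == '\n')
			return -1;
		if (++field == n) {
			*out = strtol(p, NULL, 10);
			return 0;
		}
		while (*p != ' ' && *p != '\0')
			p++;
	}
}
```

Every value appended to the end of the line becomes one more magic
index for code like this, whereas a separate file can be read without
knowing anything about the fields that happen to precede it.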

-ben
-- 
"Time is of no importance, Mr. President, only life is important."
Don't Email: <[EMAIL PROTECTED]>.

Re: [PATCH] RTC classdev: Add sysfs support for wakeup alarm (r/w)

2006-12-18 Thread Paul Sokolovsky

Hello David,

Tuesday, December 19, 2006, 2:59:11 AM, you wrote:

> On Monday 18 December 2006 4:54 pm, David Brownell wrote:

>> > http://handhelds.org/cgi-bin/cvsweb.cgi/linux/kernel26/drivers/rtc/rtc-sa1100.c.diff?r1=1.5=1.6=h
>> 
>> That patch you applied looks right to me -- why don't you forward it
>> to Alessandro as a bugfix for 2.6.20-rc2, and save me the effort?

> Actually, correction:  it'd be correct if you ripped out the buggy
> calls to manage the irq wake mechanism.  A later message will show
> how those need to work.  (The IRQ framework will give one helpful
> hint when it warns about mismatched enable/disable calls ...)

  Do you mean the enable_irq_wake()/disable_irq_wake() calls? In what
way are they buggy? The only "bug" with them I see is that they are not
implemented for PXA, which just once again reminds us that mach-pxa is
a real misfit in the ARM family (its own DMA API instead of fitting in
with the generic ARM one, no cpufreq support in mainline, and a few
other unimplemented APIs). That's of course pretty sad, as apparently
PXA was/still is the most popular CPU for the consumer market (well, at
least in the "something like a real computer" category) ;-(.

  But those calls are apparently still needed, even if you say that
wakeup stuff should be handled in generic manner, as PM feature, and
on device level. After all, what drivers will do to actually enable
wakeup for a given device? I hope we don't speak about using
CPU-specific registers in reusable device drivers for that.

  This is a pretty interesting topic for us, and so far in the
handhelds.org ports we don't handle dynamic wakeup configuration at
all, so I eagerly await your samples. In the meantime, I went and hacked
up .set_wake methods for PXA's irq_chips. And that's when I got an idea
why it might not have been implemented at all - PXA27x's model of wakeup
sources is a bit weird compared with the nice and clean PXA25x one ;-).
That's still no reason to give up on those calls altogether - after
all, even a "least common denominator" implementation will give good
value. I still need to test what I've put together, though.

> - Dave

-- 
Best regards,
 Paul  mailto:[EMAIL PROTECTED]


Re: Linux disk performance.

2006-12-18 Thread Nick Piggin


Manish Regmi wrote:
>
>> Nick Piggin:
>>
>> but
>> they look like they might be a (HZ quantised) delay coming from
>> block layer plugging.
>
> Sorry, I didn't understand what you mean.


When you submit a request to an empty block device queue, it can
get "plugged" for a number of timer ticks before any IO is actually
started. This is done for efficiency reasons and is independent of
the IO scheduler used.
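
The "HZ quantised" part can be seen from how the delay is derived in
blk_queue_make_request() (the context lines of the patch at the end of
this message). A user-space sketch of the same arithmetic follows;
unplug_delay_ticks() and ticks_to_usecs() are hypothetical helper names,
and the result is only the nominal delay, not a measurement.

```c
/*
 * Same computation as the context lines in blk_queue_make_request():
 * 3 ms expressed in timer ticks, rounded down, but never less than
 * one tick.
 */
static unsigned long unplug_delay_ticks(unsigned long hz)
{
	unsigned long delay = (3 * hz) / 1000;

	if (delay == 0)
		delay = 1;
	return delay;
}

/* Convert a tick count back to microseconds for a given HZ. */
static unsigned long ticks_to_usecs(unsigned long ticks, unsigned long hz)
{
	return ticks * 1000000UL / hz;
}
```

With HZ=100 or HZ=250 the 3 ms target rounds down to zero ticks and is
bumped up to one tick, i.e. nominally 10 ms or 4 ms; only at HZ=1000
does it come out as 3 ms.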

Use the noop IO scheduler, as well as the attached patch, and let's
see what your numbers look like.

Thanks,
Nick

--
SUSE Labs, Novell Inc.
Index: linux-2.6/block/ll_rw_blk.c
===
--- linux-2.6.orig/block/ll_rw_blk.c 2006-12-19 17:35:00.0 +1100
+++ linux-2.6/block/ll_rw_blk.c 2006-12-19 17:35:53.0 +1100
@@ -226,6 +226,8 @@ void blk_queue_make_request(request_queu
q->unplug_delay = (3 * HZ) / 1000;  /* 3 milliseconds */
if (q->unplug_delay == 0)
q->unplug_delay = 1;
+   q->unplug_delay = 0;
+   q->unplug_thresh = 0;
 
INIT_WORK(&q->unplug_work, blk_unplug_work, q);

Re: [take28-resend_2->0 0/8] kevent: Generic event handling mechanism.

2006-12-18 Thread Evgeniy Polyakov

On Mon, Dec 18, 2006 at 10:21:34PM -0800, Ulrich Drepper ([EMAIL PROTECTED]) wrote:
> Evgeniy Polyakov wrote:
> >I've uploaded the latest changes to the homepage.
> 
> Thanks.  But could you now update the patch so that it can be compiled 
> with the current upstream kernel?  At least  has 
> problems because of file->st accesses.

file->st is only defined when poll/select events are enabled; if
CONFIG_KEVENT_POLL is not set at compile time, the functions in
linux/kevent.h become empty stubs:

#ifdef CONFIG_KEVENT_POLL
static inline void kevent_init_file(struct file *file)
{
	kevent_storage_init(file, &file->st);
}

static inline void kevent_cleanup_file(struct file *file)
{
	kevent_storage_fini(&file->st);
}
#else
static inline void kevent_init_file(struct file *file) {}
static inline void kevent_cleanup_file(struct file *file) {}
#endif

What error messages do you see, and what are your kevent-related config
settings?

> -- 
> ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, 
> CA ❖

-- 
Evgeniy Polyakov

Re: GPL only modules

2006-12-18 Thread D. Hazelton

On Monday 18 December 2006 12:16, David Schwartz wrote:
> Combined responses to save bandwidth and reduce the number of times people
> have to press "d".
>
> > Agreed. You missed the point.
>
> I don't understand how you could lead with "agreed" and then proceed to
> completely ignore the entire point I just made.

I *initially* thought you had missed the point. After your later post 
clarifying things I saw that my statement had been in error and that I did 
agree with you completely.

> > Since the Linux Kernel header files
> > contain a
> > chunk of the source code for the kernel in the form of the macros
> > for locking
> > et. al. then using the headers - including that code in your
> > module - makes
> > it a derivative work.
>
> No, it does not. The header files are purely functional and not expressive in
> this case. Copyright only protects one choice among many equally-practical
> choices for expressing the same idea or performing the same function.

In this case, well. We aren't talking Copyright, but the license under which 
the software is distributed. According to the USPTO placing a statement such 
as (c) 2006 Pornrat Watanabe on a work you have created automatically places 
it under a copyright. The kernel source code, copyrighted as it is, is then 
distributed under the terms of the GNU GPL. 

Using the code from the header files may not make the module a derivative, but 
it is including parts of a copyrighted work. By *NOT* complying with the 
license under which said copyrighted work is distributed, you are giving up 
your rights under the license.

This doesn't negate any problems with people making Blob drivers, because, as 
you pointed out, under the same laws they aren't a derivative work, which 
means that that clause of the license doesn't apply. Now if the GPL contained 
a clause specifically defining what it considered a derivative work things 
would be different.

> > Actually, thinking about it, the way a Linux driver module works actually
> > seems to make *ANY* driver a derivative work, because they are
> > loaded into
> > the kernels memory space and cannot function without having that done.
>
> If every practical way of expressing an idea contains something, then that
> something is *not* protectable when used to express an idea of that kind.

Not what I was saying. There are any number of ways to make a driver 
function - the FUSE system has shown that clearly. But by making that driver 
one that is loaded directly into the kernel's memory space...

It's that act that *I* *FEEL* makes it a derivative work.

> > *IF* the "Usermode Driver" interface that is being worked on ever proves
> > useful then, and only then, could you consider it *NOT* a
> > derivative work.
> > Because then the only thing it is using *IS* an interface, not complete
> > chunks of the source as generated when the pre-processor finishes running
> > through the file.
>
> No, you have it completely backwards.

No, you missed my point. I was saying that the Usermode Driver interface would 
make the current style of kernel modules fully derivative works. This being 
because they are using an open system interface and *NOT* including code 
distributed with the kernel.

> If a usermode driver interface was equally practical to develop a
> particular type of driver, then using the kernel headers would make the
> driver a derivative work. Because, in that case, the choice to use the
> kernel headers would be a creative choice -- one chosen method among many
> equally practical ones.

And this is what I was saying. Perhaps I didn't state it in clear and concise 
English.

> Copyright only protects creative choices, not purely functional ones.
>
> "A Linux 2.6 driver for the ATI X800 graphics chipset" is an idea. If the
> only reasonably practical way to express that idea is with the Linux kernel
> header files, then using the Linux kernel header files is scenes a faire,
> not protected content.

Okay. I understood this back at the start of your reply.

> DS

Okay, after a lot of thought and me realizing some mistakes I had made in 
interpreting the law and legal precedents I see we are on the same page.

DRH

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Linus Torvalds



On Tue, 19 Dec 2006, Nick Piggin wrote:
> 
> We never want to drop dirty data! (ignoring the truncate case, which is
> handled privately by truncate anyway)

Bzzt.

SURE we do.

We absolutely do want to drop dirty data in the writeout path.

How do you think dirty data ever _becomes_ clean data?

In other words, yes, we _do_ want to test-and-clear all the pgtable bits 
_and_ the PG_dirty bit. We want to do it for:
 - writeout
 - truncate
 - possibly a "drop" event (which could be a case for a journal entry that 
   becomes stale due to being replaced or something - kind of "truncate" 
   on metadata)

because all of those events _literally_ turn dirty state into clean 
state.

In no other circumstance do we ever want to clear a dirty bit, as far as I 
can tell. 
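
The test-and-clear idea described above can be sketched as a userspace toy
model. This is only an illustration under stated assumptions: struct toy_page
and the toy_* functions are hypothetical names, not kernel structures or APIs,
and a single boolean stands in for each of the page-table dirty bit and the
PG_dirty flag.

```c
#include <stdbool.h>

/* Toy stand-in for a page with one mapping; NOT the kernel's struct page. */
struct toy_page {
    bool pte_dirty;   /* models the dirty bit in a page-table entry */
    bool pg_dirty;    /* models the PG_dirty page flag */
};

/*
 * Test-and-clear every dirty indication on the page.
 * Returns true if the page was dirty in either place; the caller then
 * decides what happens to the data (write it, or drop it on truncate).
 */
bool toy_test_and_clear_dirty(struct toy_page *page)
{
    bool was_dirty = page->pte_dirty || page->pg_dirty;

    page->pte_dirty = false;
    page->pg_dirty = false;
    return was_dirty;
}

/*
 * Writeout: the data reaches disk, so dirty state legitimately becomes
 * clean state. Returns 1 if a write was issued, 0 if the page was clean.
 */
int toy_writeout(struct toy_page *page, int *pages_written)
{
    if (toy_test_and_clear_dirty(page)) {
        (*pages_written)++;   /* stands in for submitting the I/O */
        return 1;
    }
    return 0;
}
```

Calling toy_writeout() twice on a dirty page issues one write and then finds
the page clean, which is the "dirty becomes clean" transition in miniature.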

Linus

Re: [KORG] Re: kernel.org lies about latest -mm kernel

2006-12-18 Thread Willy Tarreau

On Sat, Dec 16, 2006 at 11:30:34AM -0800, J.H. wrote:
(...)
> Since it's apparent not everyone is aware of what we are doing, I'll
> mention briefly some of the bigger points.
> 
> - We have contacted HP to see if we can get additional hardware, mind
> you though this is a long term solution and will take time, but if our
> request is approved it will double the number of machines kernel.org
> runs.

Just an evil suggestion, but if you contact someone other than HP, they
might be _very_ interested in taking HP's place and providing whatever
you need to get their name on www.kernel.org. Sun and IBM make such
monster machines too. That would not be very kind to HP, but it might
help getting hardware faster.

> - Gitweb is causing us no end of headache, there are (known to me
> anyway) two different things happening on that.  I am looking at Jeff
> Garzik's suggested caching mechanism as a temporary stop-gap, with an
> eye more on doing a rather heavy re-write of gitweb itself to include
> semi-intelligent caching.  I've already started in on the latter - and I
> just about have the caching layer put in.  But this is still at least a
> week out before we could even remotely consider deploying it.

Couldn't we disable gitweb until we get newer machines?
I've been using it in the past, but it was just a convenience. If needed,
we can explode all the recent patches with a "git-format-patch -k -m" in a
directory.

> - We've cut back on the number of ftp and rsync users to the machines.
> Basically we are cutting back where we can in an attempt to keep the
> load from spiraling out of control, this helped a bit when we recently
> had to take one of the machines down and instead of loads spiking into
> the 2000+ range we peaked at about 500-600 I believe.

I did not imagine FTP and rsync were used so much!

> So we know the problem is there, and we are working on it - we are
> getting e-mails about it if not daily then every other day or so.  If
> there are suggestions we are willing to hear them - but the general
> feeling with the admins is that we are probably hitting the biggest
> problems already.

BTW, yesterday my 2.4 patches were not published, but I noticed that
they were not even signed nor bzipped on hera. At first I simply thought
it was related, but right now I have a doubt. Maybe the automatic script
has been temporarily disabled on hera too?

> - John 'Warthog9' Hawley
> Kernel.org Admin

Thanks for keeping us informed !
Willy


Re: [RFC][PATCH] Fix area->nr_free-- went (-1) issue in buddy system

2006-12-18 Thread Nick Piggin


Hi Aubrey!

Aubrey wrote:
> Hi Nick,
>
> Thanks for your reply again, ;-).
>
> On 12/19/06, Nick Piggin <[EMAIL PROTECTED]> wrote:
>> This should not happen because the pages are checked to ensure they are
>> from the same zone before merging.
>
> How? page_is_buddy() only check if the buddy has the buddy flag and
> has the same order.
> Where can I find the same zone is checked?

Ah OK, you're using 2.6.16? Later kernels have a check for this. I
guess you could backport it?

>> What kind of system do you have? What is the dmesg and the .config?
>
> I'm using the blackfin uclinux. dmesg and .config is attached.
>
>> It could be that the zones are not properly aligned and
>> CONFIG_HOLES_IN_ZONE is not set.
>
> I changed the code in paging_init(), see below:
> -
> #if 0
>    zones_size[ZONE_DMA] = (end_mem - PAGE_OFFSET) >> PAGE_SHIFT;
>    zones_size[ZONE_NORMAL] = 0;
> #else
>    zones_size[ZONE_DMA] = (end_mem/2 - PAGE_OFFSET) >> PAGE_SHIFT;
>    zones_size[ZONE_NORMAL] = (end_mem/2 - PAGE_OFFSET) >> PAGE_SHIFT;
> #endif
> -
> This is only what I did the change. I also suspect the zones are not
> properly aligned, But how to align it? I think our system doesn't need
> CONFIG_HOLES_IN_ZONE.


That's right. I guess you can either align your zone sizes (must be
aligned to MAX_ORDER size), or add the zone check in page_is_buddy.

Hope that works.
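
The two options mentioned above (aligning the zone sizes, or checking the zone
before merging) can be sketched in plain C. This is a rough illustration, not
kernel code: the toy_* names are hypothetical, the largest buddy block is
assumed to span 2^(max_order - 1) pages, and the value of max_order is left as
a parameter because it depends on the kernel configuration.

```c
/* Assumed for illustration: the largest buddy block is 2^(max_order - 1) pages. */
unsigned long toy_max_block_pages(unsigned int max_order)
{
    return 1UL << (max_order - 1);
}

/* Round a zone size (in pages) down to a whole number of largest-order blocks. */
unsigned long toy_align_zone_pages(unsigned long pages, unsigned int max_order)
{
    unsigned long block = toy_max_block_pages(max_order);

    return pages & ~(block - 1);
}

/*
 * Option two: refuse to merge two page frames that sit on different sides
 * of a zone boundary. Returns 1 when both frames are in the same zone.
 */
int toy_same_zone(unsigned long pfn_a, unsigned long pfn_b,
                  unsigned long zone_boundary_pfn)
{
    return (pfn_a < zone_boundary_pfn) == (pfn_b < zone_boundary_pfn);
}
```

With a hypothetical max_order of 11, a 4992-page zone would be rounded down to
4096 pages, and frames 4991 and 4992 would not be merged across a boundary at
frame 4992.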

Nick

--
SUSE Labs, Novell Inc.

Re: [PATCH] incorrect direct io error handling

2006-12-18 Thread Dmitriy Monakhov

"Chen, Kenneth W" <[EMAIL PROTECTED]> writes:

> Dmitriy Monakhov wrote on Monday, December 18, 2006 5:23 AM
>> This patch is the result of a discussion started a week ago here:
>> http://lkml.org/lkml/2006/12/11/66
>> changes from original patch:
>>  - Update wrong comments about i_mutex locking.
>>  - Add BUG_ON(!mutex_is_locked(..)) for non blkdev. 
>>  - vmtruncate call only for non blockdev
>> LOG:
>> If generic_file_direct_write() fails (ENOSPC condition) inside 
>> __generic_file_aio_write_nolock(), it may have instantiated
>> a few blocks outside i_size. fsck will then complain about a wrong i_size
>> (ext2, ext3 and reiserfs interpret a difference between i_size and the
>> biggest block as an error). After fsck fixes the error, i_size will be
>> increased to the biggest block, but these blocks contain garbage from the
>> previous write attempt. This is not an information leak, but it is silent
>> file data corruption. This issue affects filesystems regardless of the
>> values of blocksize or pagesize.
>> We need to truncate any block beyond i_size after a write has failed, as is
>> done in the generic_file_buffered_write() error path. If the host is
>> !S_ISBLK, i_mutex is always held inside generic_file_aio_write_nolock() and
>> we may safely call vmtruncate().
>> Some filesystems (XFS at least) may call generic_file_direct_write()
>> directly with i_mutex not held. There is no general scenario in this case.
>> Such filesystems have to handle generic_file_direct_write() errors in their
>> own specific way (place).
>>   
>
>
> I'm puzzled that if ext2 is able to instantiate some blocks, then why does it
> return no space error?  Where is the error coming from?
generic_file_aio_write_nolock()
  ->generic_file_direct_write()
    ->generic_file_direct_IO()
      ->ext2_direct_IO(WRITE,...)
        ->blockdev_direct_IO( ,ext2_get_block,...)
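
The failure mode described in the patch log can be modelled in a few lines of
userspace C. This is a minimal sketch under stated assumptions: struct
toy_inode and the toy_* functions are hypothetical, sizes are counted in whole
blocks, and -1 stands in for -ENOSPC. It only illustrates why trimming back to
i_size after a failed write keeps the two counters consistent.

```c
/* Toy model of an inode; sizes are in blocks, not bytes. NOT kernel code. */
struct toy_inode {
    long i_size_blocks;      /* blocks covered by the recorded file size */
    long allocated_blocks;   /* blocks actually instantiated on disk */
};

/*
 * Simulate an appending direct write of `want` blocks when only `space_left`
 * blocks are free. On failure the blocks instantiated so far stay allocated
 * while i_size is unchanged, which is the problem case from the patch log.
 */
long toy_direct_write(struct toy_inode *inode, long want, long space_left)
{
    if (want > space_left) {
        inode->allocated_blocks += space_left;  /* partial instantiation */
        return -1;                              /* stands in for -ENOSPC */
    }
    inode->allocated_blocks += want;
    inode->i_size_blocks = inode->allocated_blocks;
    return want;
}

/* Trim anything instantiated beyond i_size (the role vmtruncate() plays in the patch). */
void toy_trim_beyond_isize(struct toy_inode *inode)
{
    if (inode->allocated_blocks > inode->i_size_blocks)
        inode->allocated_blocks = inode->i_size_blocks;
}

/* Write, and clean up after a failure so no stale blocks remain. */
long toy_write_with_cleanup(struct toy_inode *inode, long want, long space_left)
{
    long written = toy_direct_write(inode, want, space_left);

    if (written < 0)
        toy_trim_beyond_isize(inode);
    return written;
}
```

Without the cleanup step, a failed write on a 10-block file with 3 free blocks
leaves 13 blocks allocated against an i_size of 10; with it, both stay at 10.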


Re: Linux disk performance.

2006-12-18 Thread Manish Regmi

On 12/18/06, Erik Mouw <[EMAIL PROTECTED]> wrote:
<...snip...>

>
> > But isn't O_DIRECT supposed to bypass buffering in Kernel?
>
> It is.
>
> > Doesn't it directly write to disk?
>
> Yes, but it still uses an IO scheduler.

Ok. but i also tried with noop to turn off disk scheduling effects.
There were still timing differences. Usually i get 3100 microseconds
but up to 2 microseconds at certain intervals. I am just using
gettimeofday between two writes to read the timing.

> In your first message you mentioned you were using an ancient 2.6.10
> kernel. That kernel uses the anticipatory IO scheduler. Update to the
> latest stable kernel (2.6.19.1 at time of writing) and it will default
> to the CFQ scheduler which has a smoother writeout, plus you can give
> your process a different IO scheduling class and level (see
> Documentation/block/ioprio.txt).

Thanks... i will try with CFQ.

Nick Piggin:
> but
> they look like they might be a (HZ quantised) delay coming from
> block layer plugging.

Sorry i didn't understand what you mean.

To minimise scheduling effects i tried giving it maximum priority.
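
For reference, the gettimeofday-based measurement mentioned above might look
roughly like the following. This is a sketch, not the poster's actual test
program: elapsed_usec() and time_operation_usec() are hypothetical helper
names, and the operation being timed (for example a single O_DIRECT write) is
passed in as a callback.

```c
#include <stddef.h>
#include <sys/time.h>

/* Elapsed microseconds between two gettimeofday() samples. */
long long elapsed_usec(const struct timeval *start, const struct timeval *end)
{
    long long sec = (long long)end->tv_sec - (long long)start->tv_sec;
    long long usec = (long long)end->tv_usec - (long long)start->tv_usec;

    return sec * 1000000LL + usec;
}

/* Time one call of a caller-supplied operation, e.g. a single write(). */
long long time_operation_usec(void (*op)(void *), void *arg)
{
    struct timeval before, after;

    gettimeofday(&before, NULL);
    op(arg);
    gettimeofday(&after, NULL);
    return elapsed_usec(&before, &after);
}
```

Note that gettimeofday() follows wall-clock time, so a clock adjustment during
the run can distort individual samples.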

--
---
regards
Manish Regmi

---
UNIX without a C Compiler is like eating Spaghetti with your mouth
sewn shut. It just doesn't make sense.

Re: [take28-resend_2->0 0/8] kevent: Generic event handling mechanism.

2006-12-18 Thread Ulrich Drepper


Evgeniy Polyakov wrote:

I've uploaded the latest changes to the homepage.


Thanks.  But could you now update the patch so that it can be compiled 
with the current upstream kernel?  At least  has 
problems because of file->st accesses.


--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

Re: -mm merge plans for 2.6.20

2006-12-18 Thread Nick Piggin


Dave Jones wrote:
> Eeek! page_mapcount(page) went negative! (-2)

Hmm, probably happened once before, too.

>   page->flags = 404


What's that? PG_referenced|PG_reserved? So I'd say it is likely
that some driver has got its refcounting wrong.
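
Reading the flags value by hand can be sketched as below, assuming the value
is hexadecimal (the debugging printk uses %lx). The bit numbers for
PG_referenced and PG_reserved are assumptions taken from the guess in this
message and should be checked against include/linux/page-flags.h for the
kernel in question; toy_decode_flags() is a hypothetical helper.

```c
#include <stddef.h>

/*
 * Assumed bit numbers, chosen so that their combination gives 0x404;
 * verify against the kernel's own page-flags.h before relying on them.
 */
enum { TOY_PG_REFERENCED = 2, TOY_PG_RESERVED = 10 };

/* Store the indices of the set bits of `flags` in `bits`; return how many were stored. */
size_t toy_decode_flags(unsigned long flags, unsigned int *bits, size_t max_bits)
{
    size_t n = 0;
    unsigned int bit;

    for (bit = 0; bit < sizeof(flags) * 8 && n < max_bits; bit++) {
        if (flags & (1UL << bit))
            bits[n++] = bit;
    }
    return n;
}
```

Decoding 0x404 this way reports bits 2 and 10, matching the two flags named
above under the stated assumption.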

Unfortunately, this debugging output is almost useless when it
comes to trying to track down the problem any further.

And I see we've got another report for 2.6.19.1 from Chris, which
is equally vague.

IMO the pattern is much too consistent to be able to attribute
them all to hardware problems. And considering it takes so long
for these things to appear, can we get something like the attached
patch upstream at least until we manage to stamp them out? Any
other debugging info we can add?

--
SUSE Labs, Novell Inc.
Index: linux-2.6/include/linux/rmap.h
===
--- linux-2.6.orig/include/linux/rmap.h 2006-12-04 19:56:17.0 +1100
+++ linux-2.6/include/linux/rmap.h  2006-12-19 16:14:30.0 +1100
@@ -72,7 +72,7 @@ void __anon_vma_link(struct vm_area_stru
 void page_add_anon_rmap(struct page *, struct vm_area_struct *, unsigned long);
 void page_add_new_anon_rmap(struct page *, struct vm_area_struct *, unsigned 
long);
 void page_add_file_rmap(struct page *);
-void page_remove_rmap(struct page *);
+void page_remove_rmap(struct page *, struct vm_area_struct *);
 
 /**
  * page_dup_rmap - duplicate pte mapping to a page
Index: linux-2.6/mm/filemap_xip.c
===
--- linux-2.6.orig/mm/filemap_xip.c 2006-12-04 19:07:10.0 +1100
+++ linux-2.6/mm/filemap_xip.c  2006-12-19 16:14:30.0 +1100
@@ -189,7 +189,7 @@ __xip_unmap (struct address_space * mapp
/* Nuke the page table entry. */
flush_cache_page(vma, address, pte_pfn(*pte));
pteval = ptep_clear_flush(vma, address, pte);
-   page_remove_rmap(page);
+   page_remove_rmap(page, vma);
dec_mm_counter(mm, file_rss);
BUG_ON(pte_dirty(pteval));
pte_unmap_unlock(pte, ptl);
Index: linux-2.6/mm/fremap.c
===
--- linux-2.6.orig/mm/fremap.c  2006-12-04 19:56:20.0 +1100
+++ linux-2.6/mm/fremap.c   2006-12-19 16:14:30.0 +1100
@@ -33,7 +33,7 @@ static int zap_pte(struct mm_struct *mm,
if (page) {
if (pte_dirty(pte))
set_page_dirty(page);
-   page_remove_rmap(page);
+   page_remove_rmap(page, vma);
page_cache_release(page);
}
} else {
Index: linux-2.6/mm/memory.c
===
--- linux-2.6.orig/mm/memory.c  2006-12-04 19:56:21.0 +1100
+++ linux-2.6/mm/memory.c   2006-12-19 16:14:30.0 +1100
@@ -681,7 +681,7 @@ static unsigned long zap_pte_range(struc
mark_page_accessed(page);
file_rss--;
}
-   page_remove_rmap(page);
+   page_remove_rmap(page, vma);
tlb_remove_page(tlb, page);
continue;
}
@@ -1576,7 +1576,7 @@ gotten:
page_table = pte_offset_map_lock(mm, pmd, address, );
if (likely(pte_same(*page_table, orig_pte))) {
if (old_page) {
-   page_remove_rmap(old_page);
+   page_remove_rmap(old_page, vma);
if (!PageAnon(old_page)) {
dec_mm_counter(mm, file_rss);
inc_mm_counter(mm, anon_rss);
Index: linux-2.6/mm/rmap.c
===
--- linux-2.6.orig/mm/rmap.c2006-12-04 19:56:21.0 +1100
+++ linux-2.6/mm/rmap.c 2006-12-19 16:20:13.0 +1100
@@ -47,6 +47,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -567,7 +568,7 @@ void page_add_file_rmap(struct page *pag
  *
  * The caller needs to hold the pte lock.
  */
-void page_remove_rmap(struct page *page)
+void page_remove_rmap(struct page *page, struct vm_area_struct *vma)
 {
if (atomic_add_negative(-1, >_mapcount)) {
if (unlikely(page_mapcount(page) < 0)) {
@@ -575,6 +576,9 @@ void page_remove_rmap(struct page *page)
printk (KERN_EMERG "  page->flags = %lx\n", 
page->flags);
printk (KERN_EMERG "  page->count = %x\n", 
page_count(page));
printk (KERN_EMERG "  page->mapping = %p\n", 
page->mapping);
+   print_symbol (KERN_EMERG "  vma->vm_ops = %s\n", 
(unsigned long)vma->vm_ops);
+

Re: [RFC][PATCH] Fix area->nr_free-- went (-1) issue in buddy system

2006-12-18 Thread Aubrey


Hi Nick,

Thanks for your reply again, ;-).

On 12/19/06, Nick Piggin <[EMAIL PROTECTED]> wrote:
> This should not happen because the pages are checked to ensure they are
> from the same zone before merging.

How? page_is_buddy() only checks whether the buddy has the buddy flag and
has the same order.
Where can I find the check for the same zone?

> What kind of system do you have? What is the dmesg and the .config?

I'm using the blackfin uclinux. dmesg and .config are attached.

> It could be that the zones are not properly aligned and CONFIG_HOLES_IN_ZONE
> is not set.

I changed the code in paging_init(), see below:
-
#if 0
   zones_size[ZONE_DMA] = (end_mem - PAGE_OFFSET) >> PAGE_SHIFT;
   zones_size[ZONE_NORMAL] = 0;
#else
   zones_size[ZONE_DMA] = (end_mem/2 - PAGE_OFFSET) >> PAGE_SHIFT;
   zones_size[ZONE_NORMAL] = (end_mem/2 - PAGE_OFFSET) >> PAGE_SHIFT;
#endif
-
This is the only change I made. I also suspect the zones are not
properly aligned, but how do I align them? I think our system doesn't need
CONFIG_HOLES_IN_ZONE.

Thanks,
-Aubrey
root:~> dmesg
Linux version 2.6.16.27-ADI-2006R2 ([EMAIL PROTECTED]) (gcc version 4.1.1 (ADI 
06R2)) #2 Tue Dec 19 14:03:41 CST 2006
Blackfin support (C) 2004-2006 Analog Devices, Inc.
Compiled for ADSP-BF537 Rev. 0.2
Blackfin uClinux support by http://blackfin.uclinux.org/
Processor Speed: 500 MHz core clock and 100 Mhz System Clock
Board Memory: 48MB
Kernel Managed Memory: 48MB
Memory map:
  text  = 0x1000-0x0011cd18
  init  = 0x0011d000-0x0012b9b0
  data  = 0x0012bfdc-0x00151aa4
  stack = 0x0012c000-0x0012e000
  bss   = 0x00151ab0-0x001600b4
  available = 0x001600b4-0x0270
  rootfs= 0x0270-0x02f0
  DMA Zone  = 0x02f0-0x0300
On node 0 totalpages: 9984
  DMA zone: 4992 pages, LIFO batch:0
  DMA32 zone: 0 pages, LIFO batch:0
  Normal zone: 4992 pages, LIFO batch:0
  HighMem zone: 0 pages, LIFO batch:0
Instruction Cache Enabled
Data Cache Enabled (write-through)
Hardware Trace Enabled
Built 1 zonelists
Kernel command line: root=/dev/mtdblock0 rw mem=48M 
ip=192.168.0.8:192.168.0.2:10.99.22.1:255.255.255.0:BF537:eth0:off
Configuring Blackfin Priority Driven Interrupts
PID hash table entries: 256 (order: 8, 4096 bytes)
Dentry cache hash table entries: 8192 (order: 3, 32768 bytes)
Inode-cache hash table entries: 4096 (order: 2, 16384 bytes)
Physical pages: 2700
Memory available: 38144k/47952k RAM, (58k init code, 1135k kernel code, 57k 
data, 1024k dma)
Blackfin Scratchpad data SRAM: 4 KB
Blackfin DATA_A SRAM: 16 KB
Blackfin DATA_B SRAM: 16 KB
Blackfin Instruction SRAM: 48 KB
Calibrating delay loop... 995.32 BogoMIPS (lpj=1990656)
Security Framework v1.0.0 initialized
Capability LSM initialized
Mount-cache hash table entries: 512
NET: Registered protocol family 16
Blackfin DMA Controller
stamp_init(): registering device resources
NET: Registered protocol family 23
io scheduler noop registered
io scheduler anticipatory registered (default)
io scheduler cfq registered
Real Time Clock Driver v1.10e
Dynamic Power Management Controller Driver v0.1: major=10, minor = 254
Blackfin BF5xx serial driver version 2.00 (DMA mode)
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
Blackfin mac net device registered
uclinux[mtd]: RAM probe address=0x270 size=0x80
Creating 1 MTD partitions on "RAM":
0x-0x0080 : "ROMfs"
uclinux[mtd]: set ROMfs:EXT2  to be root filesystem
NET: Registered protocol family 2
IP route cache hash table entries: 512 (order: -1, 2048 bytes)
TCP established hash table entries: 2048 (order: 1, 8192 bytes)
TCP bind hash table entries: 2048 (order: 1, 8192 bytes)
TCP: Hash tables configured (established 2048 bind 2048)
TCP reno registered
TCP bic registered
NET: Registered protocol family 1
NET: Registered protocol family 17
IrCOMM protocol (Dag Brattli)
IP-Config: Gateway not on directly connected network.
VFS: Mounted root (ext2 filesystem).
Freeing unused kernel memory: 56k freed (0x11d000 - 0x12a000)
ttyS0 at irq = 18 is a builtin Blackfin UART
dma_alloc_init: dma_page @ 0x0242d000 - 256 pages at 0x02f0
root:~> cat /proc/buddyinfo 
Node 0, zone      DMA  4  3  1  2  1  1  0  1  1  2  1  1  0  0
Node 0, zone   Normal  3  1  2  2  1  1  0  0  4294967295  0  0  0  1  0


kerne.config
Description: application/config

Re: RFC: PCI quirks update for 2.6.16

2006-12-18 Thread Sergio Monteiro Basto

On Mon, 2006-12-11 at 00:47 +0100, Adrian Bunk wrote:
> So we have the following situation:
> - 2.6.16- 2.6.16.16 : problems for Chris
>   (and possibly many other people)
> - 2.6.16.17 - 2.6.16.35 : problems for many other people
>   (I remember 4-5 bug reports in the kernel
>Bugzilla alone)
> 
> The fix in 2.6.19 was considered suboptimal, and Alan's patch for
> fixing 
> this whole issue more properly is currently not even in your tree.  

Right,
The people behind those 4-5 bug reports should test Alan's patch.
The whole problem is detecting the correct devices that should be quirked.
In 2.6.16 it was all of them (PCI_VENDOR_ID_VIA, PCI_ANY_ID); in 2.6.16.17,
just some.
It is still questionable whether this quirk is for on-board VIA devices when
interrupts are in PIC mode, or for all interrupt modes (historically, before
the patch made it apply to both IO-APIC and PIC mode, it was just for PIC
mode, but at that time IO-APIC wasn't common on PCs).
So with Alan's patch the question is whether a device needs to be quirked
and isn't.
Those 4-5 reports will answer the question: they need the VIA quirks, and we
want to know whether the patch does the right job.
My laptop, which needs the quirks and on which I could test, is not available
right now, and I am too busy to test on it, sorry.

Thanks,
-- 
Sérgio M.B.


Re: [PATCH] incorrect direct io error handling

2006-12-18 Thread Dmitriy Monakhov

David Chinner <[EMAIL PROTECTED]> writes:

> On Mon, Dec 18, 2006 at 04:22:44PM +0300, Dmitriy Monakhov wrote:
>> diff --git a/mm/filemap.c b/mm/filemap.c
>> index 8332c77..7c571dd 100644
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -2044,8 +2044,9 @@ generic_file_direct_write(struct kiocb *
>>  /*
>>   * Sync the fs metadata but not the minor inode changes and
>>   * of course not the data as we did direct DMA for the IO.
>> - * i_mutex is held, which protects generic_osync_inode() from
>> - * livelocking.  AIO O_DIRECT ops attempt to sync metadata here.
>> + * i_mutex may not being held (XFS does this), if so some specific 
>> locking
>> + * ordering must protect generic_osync_inode() from livelocking.
>> + * AIO O_DIRECT ops attempt to sync metadata here.
>>   */
>>  if ((written >= 0 || written == -EIOCBQUEUED) &&
>>  ((file->f_flags & O_SYNC) || IS_SYNC(inode))) {
>> @@ -2279,6 +2280,17 @@ __generic_file_aio_write_nolock(struct k
>>  
>>  written = generic_file_direct_write(iocb, iov, _segs, pos,
>>  ppos, count, ocount);
>> +/*
>> + * If host is not S_ISBLK generic_file_direct_write() may 
>> + * have instantiated a few blocks outside i_size  files
>> + * Trim these off again.
>> + */
>> +if (unlikely(written < 0) && !S_ISBLK(inode->i_mode)) {
>> +loff_t isize = i_size_read(inode);
>> +if (pos + count > isize)
>> +vmtruncate(inode, isize);
>> +}
>> +
>>  if (written < 0 || written == count)
>>  goto out;
>
> You comment in the first hunk that i_mutex may not be held here,
> but there's no comment in __generic_file_aio_write_nolock() that the
> i_mutex must be held for !S_ISBLK devices.
Anyone may call generic_file_direct_write() directly with i_mutex not 
held. 
>
>> @@ -2341,6 +2353,13 @@ ssize_t generic_file_aio_write_nolock(st
>>  ssize_t ret;
>>  
>>  BUG_ON(iocb->ki_pos != pos);
>> +/*
>> + *  generic_file_buffered_write() may be called inside 
>> + *  __generic_file_aio_write_nolock() even in case of
>> + *  O_DIRECT for non S_ISBLK files. So i_mutex must be held.
>> + */
>> +if (!S_ISBLK(inode->i_mode))
>> +BUG_ON(!mutex_is_locked(>i_mutex));
>>  
>>  ret = __generic_file_aio_write_nolock(iocb, iov, nr_segs,
>>  >ki_pos);
>
> I note that you comment here in generic_file_aio_write_nolock(),
> but it's not immediately obvious that this is refering to the
> vmtruncate() call in __generic_file_aio_write_nolock().
This is not about vmtruncate(). __generic_file_aio_write_nolock() may 
call generic_file_buffered_write() even in the O_DIRECT case for !S_ISBLK, and 
generic_file_buffered_write() has documented locking rules (i_mutex held).
IMHO it is important to document this explicitly. And once we know
that i_mutex is always held, vmtruncate() may be safely called.
>
> IOWs, wouldn't it be better to put this comment and check in
> __generic_file_aio_write_nolock() directly above the vmtruncate()
> call that cares about this?
>
>> @@ -2383,8 +2402,8 @@ ssize_t generic_file_aio_write(struct ki
>>  EXPORT_SYMBOL(generic_file_aio_write);
>>  
>>  /*
>> - * Called under i_mutex for writes to S_ISREG files.   Returns -EIO if 
>> something
>> - * went wrong during pagecache shootdown.
>> + * May be called without i_mutex for writes to S_ISREG files. XFS does this.
>> + * Returns -EIO if something went wrong during pagecache shootdown.
>>   */
>
> Not sure you need to say "XFS does this" - other filesystems may do this
> in the future.
Yes, but there are multiple comments about "reiserfs does this" in fs/buffer.c

>
> Cheers,
>
> Dave.
> -- 
> Dave Chinner
> Principal Engineer
> SGI Australian Software Group

Re: Task watchers v2

2006-12-18 Thread Paul Jackson

Matt wrote:
> - Task watchers can actually improve kernel performance slightly (up to
> 2% in extremely fork-heavy workloads for instance).

Nice.

Could you explain why?

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401

[PATCH] noinitramfs: correct initcall return type

2006-12-18 Thread Randy Dunlap

From: Randy Dunlap <[EMAIL PROTECTED]>

Use expected function return type to fix warning.
init/noinitramfs.c:42: warning: initialization from incompatible pointer type

Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]>
---
 init/noinitramfs.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- linux-2.6.20-rc1-mm1.orig/init/noinitramfs.c
+++ linux-2.6.20-rc1-mm1/init/noinitramfs.c
@@ -29,7 +29,7 @@
 /*
  * Create a simple rootfs that is similar to the default initramfs
  */
-static void __init default_rootfs(void)
+static int __init default_rootfs(void)
 {
int mkdir_err = sys_mkdir("/dev", 0755);
int err = sys_mknod((const char __user *) "/dev/console",
@@ -38,5 +38,6 @@ static void __init default_rootfs(void)
if (err == -EROFS)
printk("Warning: Failed to create a rootfs\n");
mkdir_err = sys_mkdir("/root", 0700);
+   return 0;
 }
 rootfs_initcall(default_rootfs);
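
The warning fixed above comes from registering a void function where a
function returning int is expected. A userspace imitation of the idea, with
hypothetical toy_* names and a plain array standing in for the kernel's
initcall table, might look like this:

```c
/* Every registered init function must match this type: no arguments, int result. */
typedef int (*toy_initcall_t)(void);

/* An init function with the expected signature; 0 reports success. */
int toy_default_rootfs(void)
{
    /* real setup work would go here */
    return 0;
}

/* A second init function that reports failure, for comparison. */
int toy_failing_init(void)
{
    return -1;
}

/* Run each init function in order and count how many reported failure. */
int toy_run_initcalls(const toy_initcall_t *calls, int count)
{
    int failures = 0;
    int i;

    for (i = 0; i < count; i++) {
        if (calls[i]() != 0)
            failures++;
    }
    return failures;
}
```

Placing a `void (*)(void)` function in an array of toy_initcall_t would draw
the same kind of incompatible pointer type diagnostic from the compiler.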


---

[patch] search a little harder for mkimage

2006-12-18 Thread Mike Frysinger


this small patch checks to see if `${CROSS_COMPILE}mkimage` exists and,
if not, falls back to the standard `mkimage`

the Blackfin toolchain includes mkimage, but we don't want to collide in
namespace with anything in the user's system setup, so we prefix it with
our toolchain name
-mike
Check to see if the mkimage tool is part of the cross-compile toolchain.

Signed-off-by: Mike Frysinger <[EMAIL PROTECTED]>

--- a/linux-2.6/scripts/mkuboot.sh
+++ b/linux-2.6/scripts/mkuboot.sh
@@ -4,12 +4,15 @@
 # Build U-Boot image when `mkimage' tool is available.
 #
 
-MKIMAGE=$(type -path mkimage)
+MKIMAGE=$(type -path ${CROSS_COMPILE}mkimage)
 
 if [ -z "${MKIMAGE}" ]; then
-	# Doesn't exist
-	echo '"mkimage" command not found - U-Boot images will not be built' >&2
-	exit 0;
+	MKIMAGE=$(type -path mkimage)
+	if [ -z "${MKIMAGE}" ]; then
+		# Doesn't exist
+		echo '"mkimage" command not found - U-Boot images will not be built' >&2
+		exit 0;
+	fi
 fi
 
 # Call "mkimage" to create U-Boot image

[PATCH 4/5] Break init() in two parts to avoid MODPOST warnings

2006-12-18 Thread Vivek Goyal



o init() is a non __init function in .text section but it calls many
  functions which are in .init.text section. Hence MODPOST generates lots
  of cross reference warnings on i386 if compiled with CONFIG_RELOCATABLE=y

WARNING: vmlinux - Section mismatch: reference to .init.text:smp_prepare_cpus 
from .text between 'init' (at offset 0xc0101049) and 'rest_init'
WARNING: vmlinux - Section mismatch: reference to .init.text:migration_init 
from .text between 'init' (at offset 0xc010104e) and 'rest_init'
WARNING: vmlinux - Section mismatch: reference to .init.text:spawn_ksoftirqd 
from .text between 'init' (at offset 0xc0101053) and 'rest_init'

o This patch breaks init() into two parts. One part can go in the
  .init.text section and can be freed; the other part has to be
  non-__init (init_post()). Now init() calls init_post(), and init_post()
  does not call any functions present in .init sections, which gets
  rid of the warnings.
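
The shape of the split can be illustrated outside the kernel. The following is
a sketch under assumptions: the toy_* names are hypothetical, no linker
sections are involved, and `__attribute__((noinline))` (a GCC/Clang extension)
is used only to show how the second half is kept as a separate function rather
than being folded into its caller.

```c
/* Keep the helper as a separate function so it is not inlined into its caller. */
#define toy_noinline __attribute__((noinline))

static int toy_setup_done;          /* set by the one-time setup code */
static int toy_init_memory_freed;   /* models "the init section has been discarded" */

/* One-time setup: the kind of code that could live in a discardable init section. */
void toy_setup_once(void)
{
    toy_setup_done = 1;
}

/*
 * Runs after init memory is gone, so it must not end up inside the
 * discardable part; hence it is kept out of line.
 */
toy_noinline int toy_init_post(void)
{
    toy_init_memory_freed = 1;   /* stands in for free_initmem() */
    return toy_setup_done ? 0 : -1;
}

/* First half: does the one-time setup, then hands over to the second half. */
int toy_init(void)
{
    toy_setup_once();
    return toy_init_post();
}
```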

Signed-off-by: Vivek Goyal <[EMAIL PROTECTED]>
---

 init/main.c |   81 +---
 1 file changed, 45 insertions(+), 36 deletions(-)

diff -puN init/main.c~brea-init-in-two-parts-to-avoid-warnings init/main.c
--- linux-2.6.19-rc1-reloc/init/main.c~brea-init-in-two-parts-to-avoid-warnings 
2006-12-15 14:09:03.0 +0530
+++ linux-2.6.19-rc1-reloc-root/init/main.c 2006-12-15 14:09:03.0 
+0530
@@ -716,7 +716,49 @@ static void run_init_process(char *init_
kernel_execve(init_filename, argv_init, envp_init);
 }
 
-static int init(void * unused)
+/* This is a non __init function. Force it to be noinline otherwise gcc
+ * makes it inline to init() and it becomes part of init.text section
+ */
+static int noinline init_post(void)
+{
+   free_initmem();
+   unlock_kernel();
+   mark_rodata_ro();
+   system_state = SYSTEM_RUNNING;
+   numa_default_policy();
+
+   if (sys_open((const char __user *) "/dev/console", O_RDWR, 0) < 0)
+   printk(KERN_WARNING "Warning: unable to open an initial 
console.\n");
+
+   (void) sys_dup(0);
+   (void) sys_dup(0);
+
+   if (ramdisk_execute_command) {
+   run_init_process(ramdisk_execute_command);
+   printk(KERN_WARNING "Failed to execute %s\n",
+   ramdisk_execute_command);
+   }
+
+   /*
+* We try each of these until one succeeds.
+*
+* The Bourne shell can be used instead of init if we are
+* trying to recover a really broken machine.
+*/
+   if (execute_command) {
+   run_init_process(execute_command);
+   printk(KERN_WARNING "Failed to execute %s.  Attempting "
+   "defaults...\n", execute_command);
+   }
+   run_init_process("/sbin/init");
+   run_init_process("/etc/init");
+   run_init_process("/bin/init");
+   run_init_process("/bin/sh");
+
+   panic("No init found.  Try passing init= option to kernel.");
+}
+
+static int __init init(void * unused)
 {
lock_kernel();
/*
@@ -764,39 +806,6 @@ static int init(void * unused)
 * we're essentially up and running. Get rid of the
 * initmem segments and start the user-mode stuff..
 */
-   free_initmem();
-   unlock_kernel();
-   mark_rodata_ro();
-   system_state = SYSTEM_RUNNING;
-   numa_default_policy();
-
-   if (sys_open((const char __user *) "/dev/console", O_RDWR, 0) < 0)
-   printk(KERN_WARNING "Warning: unable to open an initial 
console.\n");
-
-   (void) sys_dup(0);
-   (void) sys_dup(0);
-
-   if (ramdisk_execute_command) {
-   run_init_process(ramdisk_execute_command);
-   printk(KERN_WARNING "Failed to execute %s\n",
-   ramdisk_execute_command);
-   }
-
-   /*
-* We try each of these until one succeeds.
-*
-* The Bourne shell can be used instead of init if we are 
-* trying to recover a really broken machine.
-*/
-   if (execute_command) {
-   run_init_process(execute_command);
-   printk(KERN_WARNING "Failed to execute %s.  Attempting "
-   "defaults...\n", execute_command);
-   }
-   run_init_process("/sbin/init");
-   run_init_process("/etc/init");
-   run_init_process("/bin/init");
-   run_init_process("/bin/sh");
-
-   panic("No init found.  Try passing init= option to kernel.");
+   init_post();
+   return 0;
 }
_

[PATCH 5/5] i386: Fix memory hotplug related MODPOST generated warning

2006-12-18 Thread Vivek Goyal



o Fix modpost generated warning.

WARNING: vmlinux - Section mismatch: reference to .init.text: from .text
between 'add_one_highpage_hotplug' (at offset 0xc0113d3f) and 'online_page'

Signed-off-by: Vivek Goyal <[EMAIL PROTECTED]>
---

 arch/i386/mm/init.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff -puN arch/i386/mm/init.c~i386-memory-hotplug-related-warnings 
arch/i386/mm/init.c
--- 
linux-2.6.19-rc1-reloc/arch/i386/mm/init.c~i386-memory-hotplug-related-warnings 
2006-12-15 14:09:05.0 +0530
+++ linux-2.6.19-rc1-reloc-root/arch/i386/mm/init.c 2006-12-15 
14:09:05.0 +0530
@@ -283,7 +283,7 @@ void __init add_one_highpage_init(struct
SetPageReserved(page);
 }
 
-static int add_one_highpage_hotplug(struct page *page, unsigned long pfn)
+static int __meminit add_one_highpage_hotplug(struct page *page, unsigned long 
pfn)
 {
free_new_highpage(page);
totalram_pages++;
@@ -300,7 +300,7 @@ static int add_one_highpage_hotplug(stru
  * has been added dynamically that would be
  * onlined here is in HIGHMEM
  */
-void online_page(struct page *page)
+void __meminit online_page(struct page *page)
 {
ClearPageReserved(page);
add_one_highpage_hotplug(page, page_to_pfn(page));
_

[PATCH 3/5] i386: move startup_32() in text.head section

2006-12-18 Thread Vivek Goyal



o Entry startup_32 was in the .text section but also accesses some init
  data, which prompts MODPOST to generate warnings at build time.

WARNING: vmlinux - Section mismatch: reference to .init.data:boot_params from
.text between '_text' (at offset 0xc0100029) and 'startup_32_smp'
WARNING: vmlinux - Section mismatch: reference to .init.data:boot_params from
.text between '_text' (at offset 0xc0100037) and 'startup_32_smp'
WARNING: vmlinux - Section mismatch: reference to
.init.data:init_pg_tables_end from .text between '_text' (at offset
0xc0100099) and 'startup_32_smp'

o Can't move startup_32 to .init.text as this entry point has to be at the
  start of bzImage. Hence moved startup_32 to a new section, .text.head, and
  instructed MODPOST not to generate warnings if init data is being
  accessed from the .text.head section. This code has been audited.

o SMP boot up code (startup_32_smp) can go into .init.text if CPU hotplug
  is not supported. Otherwise it generates more warnings

WARNING: vmlinux - Section mismatch: reference to .init.data:new_cpu_data from
.text between 'checkCPUtype' (at offset 0xc0100126) and 'is486'
WARNING: vmlinux - Section mismatch: reference to .init.data:new_cpu_data from
.text between 'checkCPUtype' (at offset 0xc0100130) and 'is486'
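
The whitelist rule described above can be sketched as follows. This is an illustration of the rule as stated in the description, not the actual hunk; the function name is hypothetical, and modpost's real check lives in secref_whitelist() and takes more context into account.

```c
#include <string.h>

/*
 * Sketch of the whitelist rule: references from .text.head into
 * .init.data or .init.text are deliberately allowed.
 * Hypothetical helper, not modpost code.
 */
int text_head_ref_allowed(const char *fromsec, const char *tosec)
{
    if (strcmp(fromsec, ".text.head") != 0)
        return 0;
    return strcmp(tosec, ".init.data") == 0 ||
           strcmp(tosec, ".init.text") == 0;
}
```

References from plain .text into init sections are still reported; only the audited .text.head entry code is exempt.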

Signed-off-by: Vivek Goyal <[EMAIL PROTECTED]>
---

 arch/i386/kernel/head.S|   17 ++---
 arch/i386/kernel/vmlinux.lds.S |7 ++-
 scripts/mod/modpost.c  |   10 +-
 3 files changed, 29 insertions(+), 5 deletions(-)

diff -puN 
arch/i386/kernel/head.S~i386-reloc-kernel-move-startup_32-in-text-head 
arch/i386/kernel/head.S
--- 
linux-2.6.19-rc1-reloc/arch/i386/kernel/head.S~i386-reloc-kernel-move-startup_32-in-text-head
   2006-12-15 14:09:01.0 +0530
+++ linux-2.6.19-rc1-reloc-root/arch/i386/kernel/head.S 2006-12-15 
14:09:01.0 +0530
@@ -53,6 +53,7 @@
  * any particular GDT layout, because we load our own as soon as we
  * can.
  */
+.section .text.head
 ENTRY(startup_32)
 
 #ifdef CONFIG_PARAVIRT
@@ -141,16 +142,25 @@ page_pde_offset = (__PAGE_OFFSET >> 20);
jb 10b
movl %edi,(init_pg_tables_end - __PAGE_OFFSET)
 
-#ifdef CONFIG_SMP
xorl %ebx,%ebx  /* This is the boot CPU (BSP) */
jmp 3f
-
 /*
  * Non-boot CPU entry point; entered from trampoline.S
  * We can't lgdt here, because lgdt itself uses a data segment, but
  * we know the trampoline has already loaded the boot_gdt_table GDT
  * for us.
+ *
+ * If cpu hotplug is not supported then this code can go in init section
+ * which will be freed later
  */
+
+#ifdef CONFIG_HOTPLUG_CPU
+.section .text
+#else
+.section .init.text
+#endif
+
+#ifdef CONFIG_SMP
 ENTRY(startup_32_smp)
cld
movl $(__BOOT_DS),%eax
@@ -208,8 +218,8 @@ ENTRY(startup_32_smp)
xorl %ebx,%ebx
incl %ebx
 
-3:
 #endif /* CONFIG_SMP */
+3:
 
 /*
  * Enable paging
@@ -492,6 +502,7 @@ ignore_int:
 #endif
iret
 
+.section .text
 #ifdef CONFIG_PARAVIRT
 startup_paravirt:
cld
diff -puN 
arch/i386/kernel/vmlinux.lds.S~i386-reloc-kernel-move-startup_32-in-text-head 
arch/i386/kernel/vmlinux.lds.S
--- 
linux-2.6.19-rc1-reloc/arch/i386/kernel/vmlinux.lds.S~i386-reloc-kernel-move-startup_32-in-text-head
2006-12-15 14:09:01.0 +0530
+++ linux-2.6.19-rc1-reloc-root/arch/i386/kernel/vmlinux.lds.S  2006-12-15 
14:09:01.0 +0530
@@ -37,9 +37,14 @@ SECTIONS
 {
   . = LOAD_OFFSET + LOAD_PHYSICAL_ADDR;
   phys_startup_32 = startup_32 - LOAD_OFFSET;
+
+  .text.head : AT(ADDR(.text.head) - LOAD_OFFSET) {
+   _text = .;  /* Text and read-only data */
+   *(.text.head)
+  } :text = 0x9090
+
   /* read-only */
   .text : AT(ADDR(.text) - LOAD_OFFSET) {
-   _text = .;  /* Text and read-only data */
*(.text)
SCHED_TEXT
LOCK_TEXT
diff -puN scripts/mod/modpost.c~i386-reloc-kernel-move-startup_32-in-text-head 
scripts/mod/modpost.c
--- 
linux-2.6.19-rc1-reloc/scripts/mod/modpost.c~i386-reloc-kernel-move-startup_32-in-text-head
 2006-12-15 14:09:01.0 +0530
+++ linux-2.6.19-rc1-reloc-root/scripts/mod/modpost.c   2006-12-15 
14:09:01.0 +0530
@@ -623,11 +623,19 @@ static int secref_whitelist(const char *
if (f1 && f2)
return 1;
 
-   /* Whitelist all references from .pci_fixup section if vmlinux */
+   /* Whitelist all references from .pci_fixup section if vmlinux
+* Whitelist all references from .text.head to .init.data if vmlinux
+* Whitelist all references from .text.head to .init.text if vmlinux
+*/
if (is_vmlinux(modname)) {
if ((strcmp(fromsec, ".pci_fixup") == 0) &&
(strcmp(tosec, ".init.text") == 0))
return 1;
+
+   if ((strcmp(fromsec, ".text.head") == 0) &&
+   ((strcmp(tosec, ".init.data") == 0) ||
+

[PATCH 1/5] i386: cpu hotplug/smpboot misc MODPOST warning fixes

2006-12-18 Thread Vivek Goyal


o Misc smpboot/cpu hotplug path cleanups. I did those to suppress the 
  warnings generated by MODPOST. These warnings are visible only 
  if CONFIG_RELOCATABLE=y.
 
o CONFIG_RELOCATABLE compiles the kernel with --emit-relocs option. This
  option retains relocation information in vmlinux file and MODPOST
  is quick to spit out "Section mismatch" warnings. 

o This patch fixes some of those warnings. Many of the functions in the
  smpboot case are of type __devinit and they in turn access text/data which
  is of type __cpuinit. Now if CONFIG_HOTPLUG=y and CONFIG_HOTPLUG_CPU=n
  then we end up in cases where a function in the .text segment is calling
  another function in the .init.text segment and MODPOST emits a warning.

WARNING: vmlinux - Section mismatch: reference to .init.text:identify_cpu from 
.text between 'smp_store_cpu_info' (at offset 0xc011020d) and 'do_boot_cpu'
WARNING: vmlinux - Section mismatch: reference to .init.text:init_gdt from 
.text between 'do_boot_cpu' (at offset 0xc01102ca) and '__cpu_up'
WARNING: vmlinux - Section mismatch: reference to .init.text:print_cpu_info 
from .text between 'do_boot_cpu' (at offset 0xc01105d0) and '__cpu_up'

o It also fixes the issues where CONFIG_HOTPLUG_CPU=y and start_secondary()
  is calling smp_callin() which in-turn calls synchronize_tsc_ap() which is
  of type __init. This should have meant broken CPU hotplug.

WARNING: vmlinux - Section mismatch: reference to .init.data: from .text 
between 'start_secondary' (at offset 0xc011603f) and 'initialize_secondary'
WARNING: vmlinux - Section mismatch: reference to .init.data: from .text 
between 'MP_processor_info' (at offset 0xc0116a4f) and 'mp_register_lapic'
WARNING: vmlinux - Section mismatch: reference to .init.data: from .text 
between 'MP_processor_info' (at offset 0xc0116a4f) and 'mp_register_lapic'
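
The CONFIG_HOTPLUG=y, CONFIG_HOTPLUG_CPU=n case described above can be modelled with a small toy. It assumes that __devinit collapses to __init only when CONFIG_HOTPLUG is off, and __cpuinit only when CONFIG_HOTPLUG_CPU is off, which is how the description reads. All names below are hypothetical and exist only for this sketch.

```c
#include <string.h>

enum toy_annot { TOY_PLAIN, TOY_INIT, TOY_DEVINIT, TOY_CPUINIT };

/* Toy model: which text section a function lands in for a given config. */
const char *toy_text_section(enum toy_annot a, int hotplug, int hotplug_cpu)
{
    switch (a) {
    case TOY_INIT:
        return ".init.text";
    case TOY_DEVINIT:
        return hotplug ? ".text" : ".init.text";
    case TOY_CPUINIT:
        return hotplug_cpu ? ".text" : ".init.text";
    default:
        return ".text";
    }
}

/* 1 if a call from 'caller' to 'callee' goes from .text into .init.text. */
int toy_call_mismatch(enum toy_annot caller, enum toy_annot callee,
                      int hotplug, int hotplug_cpu)
{
    const char *from = toy_text_section(caller, hotplug, hotplug_cpu);
    const char *to = toy_text_section(callee, hotplug, hotplug_cpu);

    return strcmp(from, ".text") == 0 && strcmp(to, ".init.text") == 0;
}
```

With hotplug=1 and hotplug_cpu=0, a __devinit caller of a __cpuinit callee is flagged; converting the caller to __cpuinit, as the patch does, puts both in the same section.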

Signed-off-by: Vivek Goyal <[EMAIL PROTECTED]>
---

 arch/i386/kernel/mpparse.c |8 
 arch/i386/kernel/setup.c   |2 +-
 arch/i386/kernel/smpboot.c |   10 +-
 3 files changed, 10 insertions(+), 10 deletions(-)

diff -puN 
arch/i386/kernel/mpparse.c~i386-cpu-hotplug-make-new-cpu-data-devinitdata 
arch/i386/kernel/mpparse.c
--- 
linux-2.6.19-rc1-reloc/arch/i386/kernel/mpparse.c~i386-cpu-hotplug-make-new-cpu-data-devinitdata
2006-12-15 14:08:55.0 +0530
+++ linux-2.6.19-rc1-reloc-root/arch/i386/kernel/mpparse.c  2006-12-15 
14:08:55.0 +0530
@@ -36,7 +36,7 @@
 
 /* Have we found an MP table */
 int smp_found_config;
-unsigned int __initdata maxcpus = NR_CPUS;
+unsigned int __cpuinitdata maxcpus = NR_CPUS;
 
 /*
  * Various Linux-internal data structures created from the
@@ -102,9 +102,9 @@ static int __init mpf_checksum(unsigned 
  */
 
 static int mpc_record; 
-static struct mpc_config_translation *translation_table[MAX_MPC_ENTRY] __initdata;
+static struct mpc_config_translation *translation_table[MAX_MPC_ENTRY] __cpuinitdata;
 
-static void __devinit MP_processor_info (struct mpc_config_processor *m)
+static void __cpuinit MP_processor_info (struct mpc_config_processor *m)
 {
int ver, apicid;
physid_mask_t phys_cpu;
@@ -822,7 +822,7 @@ void __init mp_register_lapic_address(u6
Dprintk("Boot CPU = %d\n", boot_cpu_physical_apicid);
 }
 
-void __devinit mp_register_lapic (u8 id, u8 enabled)
+void __cpuinit mp_register_lapic (u8 id, u8 enabled)
 {
struct mpc_config_processor processor;
int boot_cpu = 0;
diff -puN 
arch/i386/kernel/setup.c~i386-cpu-hotplug-make-new-cpu-data-devinitdata 
arch/i386/kernel/setup.c
--- 
linux-2.6.19-rc1-reloc/arch/i386/kernel/setup.c~i386-cpu-hotplug-make-new-cpu-data-devinitdata
  2006-12-15 14:08:55.0 +0530
+++ linux-2.6.19-rc1-reloc-root/arch/i386/kernel/setup.c2006-12-15 
14:10:35.0 +0530
@@ -77,7 +77,7 @@ extern struct resource code_resource;
 extern struct resource data_resource;
 
 /* cpu data as detected by the assembly code in head.S */
-struct cpuinfo_x86 new_cpu_data __initdata = { 0, 0, 0, 0, -1, 1, 0, 0, -1 };
+struct cpuinfo_x86 new_cpu_data __cpuinitdata = { 0, 0, 0, 0, -1, 1, 0, 0, -1 };
 /* common cpu data for all cpus */
 struct cpuinfo_x86 boot_cpu_data __read_mostly = { 0, 0, 0, 0, -1, 1, 0, 0, -1 };
 EXPORT_SYMBOL(boot_cpu_data);
diff -puN 
arch/i386/kernel/smpboot.c~i386-cpu-hotplug-make-new-cpu-data-devinitdata 
arch/i386/kernel/smpboot.c
--- 
linux-2.6.19-rc1-reloc/arch/i386/kernel/smpboot.c~i386-cpu-hotplug-make-new-cpu-data-devinitdata
2006-12-15 14:08:55.0 +0530
+++ linux-2.6.19-rc1-reloc-root/arch/i386/kernel/smpboot.c  2006-12-15 
14:08:55.0 +0530
@@ -159,7 +159,7 @@ void __init smp_alloc_memory(void)
  * a given CPU
  */
 
-static void __devinit smp_store_cpu_info(int id)
+static void __cpuinit smp_store_cpu_info(int id)
 {
struct cpuinfo_x86 *c = cpu_data + id;
 
@@ -364,7 +364,7 @@ extern void calibrate_delay(void);
 
 static atomic_t init_deasserted;
 
-static void __devinit smp_callin(void)
+static void

[PATCH 2/5] Convert some functions to __init to avoid MODPOST warnings

2006-12-18 Thread Vivek Goyal



o Some functions should have been in init sections, as they are called
  only once. Put them in init sections. Otherwise MODPOST generates a warning,
  as these functions are placed in .text and they end up accessing something
  in init sections.

WARNING: vmlinux - Section mismatch: reference to .init.text:migration_init
from .text between 'do_pre_smp_initcalls' (at offset 0xc01000d1) and
'run_init_process'

Signed-off-by: Vivek Goyal <[EMAIL PROTECTED]>
---

 arch/i386/kernel/acpi/boot.c   |2 +-
 arch/i386/kernel/acpi/earlyquirk.c |2 +-
 arch/i386/kernel/setup.c   |2 +-
 init/main.c|2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff -puN 
arch/i386/kernel/acpi/boot.c~convert-some-functions-to-init-to-avoid-warnings 
arch/i386/kernel/acpi/boot.c
--- 
linux-2.6.19-rc1-reloc/arch/i386/kernel/acpi/boot.c~convert-some-functions-to-init-to-avoid-warnings
2006-12-15 14:08:59.0 +0530
+++ linux-2.6.19-rc1-reloc-root/arch/i386/kernel/acpi/boot.c2006-12-15 
14:08:59.0 +0530
@@ -333,7 +333,7 @@ acpi_parse_ioapic(acpi_table_entry_heade
 /*
  * Parse Interrupt Source Override for the ACPI SCI
  */
-static void acpi_sci_ioapic_setup(u32 gsi, u16 polarity, u16 trigger)
+static void __init acpi_sci_ioapic_setup(u32 gsi, u16 polarity, u16 trigger)
 {
if (trigger == 0)   /* compatible SCI trigger is level */
trigger = 3;
diff -puN 
arch/i386/kernel/acpi/earlyquirk.c~convert-some-functions-to-init-to-avoid-warnings
 arch/i386/kernel/acpi/earlyquirk.c
--- 
linux-2.6.19-rc1-reloc/arch/i386/kernel/acpi/earlyquirk.c~convert-some-functions-to-init-to-avoid-warnings
  2006-12-15 14:08:59.0 +0530
+++ linux-2.6.19-rc1-reloc-root/arch/i386/kernel/acpi/earlyquirk.c  
2006-12-15 14:08:59.0 +0530
@@ -50,7 +50,7 @@ static int __init check_bridge(int vendo
return 0;
 }
 
-static void check_intel(void)
+static void __init check_intel(void)
 {
u16 vendor, device;
 
diff -puN 
arch/i386/kernel/setup.c~convert-some-functions-to-init-to-avoid-warnings 
arch/i386/kernel/setup.c
--- 
linux-2.6.19-rc1-reloc/arch/i386/kernel/setup.c~convert-some-functions-to-init-to-avoid-warnings
2006-12-15 14:08:59.0 +0530
+++ linux-2.6.19-rc1-reloc-root/arch/i386/kernel/setup.c2006-12-15 
14:08:59.0 +0530
@@ -495,7 +495,7 @@ static void set_mca_bus(int x) { }
 #endif
 
 /* Overridden in paravirt.c if CONFIG_PARAVIRT */
-char * __attribute__((weak)) memory_setup(void)
+char * __init __attribute__((weak)) memory_setup(void)
 {
return machine_specific_memory_setup();
 }
diff -puN init/main.c~convert-some-functions-to-init-to-avoid-warnings 
init/main.c
--- 
linux-2.6.19-rc1-reloc/init/main.c~convert-some-functions-to-init-to-avoid-warnings
 2006-12-15 14:08:59.0 +0530
+++ linux-2.6.19-rc1-reloc-root/init/main.c 2006-12-15 14:09:57.0 
+0530
@@ -698,7 +698,7 @@ static void __init do_basic_setup(void)
do_initcalls();
 }
 
-static void do_pre_smp_initcalls(void)
+static void __init do_pre_smp_initcalls(void)
 {
extern int spawn_ksoftirqd(void);
 #ifdef CONFIG_SMP
_

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Nick Piggin


Linus Torvalds wrote:
> On Mon, 18 Dec 2006, Peter Zijlstra wrote:
>
>> This should be safe; page_mkclean walks the rmap and flips the pte's
>> under the pte lock and records the dirty state while iterating.
>> Concurrent faults will either do set_page_dirty() before we get around
>> to doing it or vice versa, but dirty state is not lost.
>
> Ok, I really liked this patch, but the more I thought about it, the more I
> started to doubt the reasons for liking it.

Well this implements my suggestion to redirty the page if there were dirty
ptes. I think it is a good fix (whether or not it fixes Andrei's bug, it
does fix a bug), though maybe _slightly_ suboptimal.

> I think we have some core fundamental problem here that this patch is
> needed at all.
>
> So let's think about this: we apparently have two cases of
> "clear_page_dirty()":
>
>  - the one that really wants to clear the bit unconditionally (Andrew
>    calls this the "must_clean_ptes" case, which I personally find to be a
>    really confusing name, but whatever)
>
>  - the other case. The case that doesn't want to really clear the pte
>    dirty bits.

I don't think this characterises it correctly. Think about how it worked
before the page_mkclean went in there.

We really _never_ want to just clear pte dirty bits, because that would be
a data loss situation[*]. The only reason we clear PG_dirty is because some
filesystem may have cleaned each buffer without realising it has cleaned
the whole page. But if you have a dirty pte, then all bets are off: a
buffer with a clear dirty bit can not be considered clean.

Before the dirty page tracking, it was fine to clear PG_dirty here, because
we would pick up the pte dirty info later on. After the page dirty tracking,
clearing pte dirty is a bug here, and re-accounting the dirty page is
arguably the minimal fix.

[*] except in the truncate case where we are happy to throw out dirty data,
but in that case there would be no ptes anyway.

The only thing I would suggest is not applying Andrew's patch at all, and
do the special casing in try_to_free_buffers(). I've attached a patch for
comments.


> and I thought your patch made sense, because it saved away the pte state
> in the page dirty state, and that matches my mental model, but the more I
> think about it, the less sense that whole "the other case" situation makes
> AT ALL.
>
> Why does "the other case" exist at all? If you want to clear the dirty
> page flag, what is _ever_ the reason for not wanting to drop PTE dirty
> information? In other words, what possible reason can there ever be for
> saying "I want this page to be clean", while at the same time saying "but
> if it was dirty in the page tables, don't forget about that state".


We never want to drop dirty data! (ignoring the truncate case, which is
handled privately by truncate anyway)

This whole exercise is not about cleaning or dirtying or forgetting the actual
*data* in the page. It is about bringing the pagecache's notion of whether
the page is dirty or clean in line with the (more uptodate) filesystem's
notion.

After dirty write accounting, we also threw in "the virtual memory manager's
notion", but got that case slightly wrong.

As unlikely as this race is for SMP systems, I think it is easily possible
for PREEMPT kernels. And they have featured in all bug reports, AFAIKS.
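
To make the rule concrete, here is a toy model of "only clear the page dirty bit if there are no dirty ptes either". The struct and function are hypothetical stand-ins for illustration, not the kernel's struct page and not the attached patch; clearing the pte bits is modelled loosely on what page_mkclean does.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PTES 4

/* Toy stand-in for a page with a few user mappings. */
struct toy_page {
    bool pg_dirty;                 /* pagecache's notion of dirty */
    bool pte_dirty[TOY_MAX_PTES];  /* per-mapping dirty bits */
    size_t nr_ptes;
};

/*
 * Clear the page dirty flag after the filesystem cleaned the buffers,
 * but fold any dirty pte back into the page flag so that no dirty
 * state is lost. Returns true if the page ends up clean.
 */
bool toy_clear_dirty_sync_ptes(struct toy_page *page)
{
    bool pte_was_dirty = false;

    for (size_t i = 0; i < page->nr_ptes; i++) {
        if (page->pte_dirty[i]) {
            pte_was_dirty = true;
            page->pte_dirty[i] = false; /* dirty state moves to the page */
        }
    }
    page->pg_dirty = pte_was_dirty;
    return !page->pg_dirty;
}
```

A page whose buffers are all clean but which has one dirty pte stays dirty in this model, which is the case the plain clear_page_dirty() call gets wrong.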

--
SUSE Labs, Novell Inc.
Index: linux-2.6/fs/buffer.c
===
--- linux-2.6.orig/fs/buffer.c  2006-12-19 15:15:46.0 +1100
+++ linux-2.6/fs/buffer.c   2006-12-19 15:36:01.0 +1100
@@ -2852,7 +2852,17 @@ int try_to_free_buffers(struct page *pag
 * This only applies in the rare case where try_to_free_buffers
 * succeeds but the page is not freed.
 */
-   clear_page_dirty(page);
+
+   /*
+* If the page has been dirtied via the user mappings, then
+* clean buffers does not indicate the page data is actually
+* clean! Only clear the page dirty bit if there are no dirty
+* ptes either.
+*
+* If there are dirty ptes, then the page must be uptodate, so
+* the above concern does not apply.
+*/
+   clear_page_dirty_sync_ptes(page);
}
 out:
if (buffers_to_free) {
Index: linux-2.6/include/linux/page-flags.h
===
--- linux-2.6.orig/include/linux/page-flags.h   2006-12-19 15:17:18.0 
+1100
+++ linux-2.6/include/linux/page-flags.h2006-12-19 15:34:24.0 
+1100
@@ -254,6 +254,7 @@ static inline void SetPageUptodate(struc
 struct page;   /* forward declaration */
 
 int test_clear_page_dirty(struct page *page);
+int test_clear_page_dirty_sync_ptes(struct page *page);
 int test_clear_page_writeback(struct page *page);
 int test_set_page_writeback(struct page *page);
 
@@ -262,6

Re: [take28-resend_2->0 0/8] kevent: Generic event handling mechanism.

2006-12-18 Thread Evgeniy Polyakov

On Mon, Dec 18, 2006 at 07:47:21PM -0800, Ulrich Drepper ([EMAIL PROTECTED]) 
wrote:
> It would help if one could actually get hold of the changes.
> 
> Neither at home nor on my gmail account did I get them all.  The gmane 
> also only has 5 of the 9 mails or so.  Your archive only has sources 
> from a couple of versions back.

Kernel archive contains all changes as far as I can see at
marc.theaimsgroup.com and lkml.org

I've uploaded the latest changes to the homepage.

> -- 
> ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, 
> CA ❖

-- 
Evgeniy Polyakov

Re: xfslogd-spinlock bug?

2006-12-18 Thread David Chinner

On Tue, Dec 19, 2006 at 01:52:29PM +1100, David Chinner wrote:
> On Tue, Dec 19, 2006 at 12:39:46AM +0100, Haar János wrote:
> > From: "David Chinner" <[EMAIL PROTECTED]>
> > > #define POISON_FREE 0x6b
> > >
> > > Can you confirm that you are running with CONFIG_DEBUG_SLAB=y?
> > 
> > Yes, i build with this option enabled.

..

> FWIW, I've run XFSQA twice now on a scsi disk with slab debugging turned
> on and I haven't seen this problem. I'm not sure how to track down
> the source of the problem without a test case, but as a quick test, can
> you try the following patch?

Third try and I got a crash on a poisoned object:

[1]kdb> md8c40 e0300d7d5100
0xe0300d7d5100 5a2cf071    q.,Z
0xe0300d7d5110 5a2cf071 6b6b6b6b6b6b6b6b   q.,Z
0xe0300d7d5120 e039eb7b6320 6b6b6b6b6b6b6b6bc{.9...
0xe0300d7d5130 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   
0xe0300d7d5140 6b6b6b6f6b6b6b6b 6b6b6b6b6b6b6b6b   okkk
0xe0300d7d5150 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   
0xe0300d7d5160 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   
0xe0300d7d5170 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   
0xe0300d7d5180 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   
0xe0300d7d5190 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   
0xe0300d7d51a0 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   
0xe0300d7d51b0 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   
0xe0300d7d51c0 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   
0xe0300d7d51d0 6b6b6b6b6b6b6b6b a56b6b6b6b6b6b6b   kkk.
0xe0300d7d51e0 5a2cf071 a00100468c30   q.,Z0.F.
[1]kdb> mds 0xe0300d7d51e0
0xe0300d7d51e0 5a2cf071   q.,Z
0xe0300d7d51e8 a00100468c30 xfs_inode_item_destroy+0x30

So the use-after-free here is on an inode item. You're tripping
over a buffer item.

Unfortunately, it is not the same problem - the problem I've just
hit is to do with a QA test that does a forced shutdown on an active
filesystem, and:

[1]kdb> xmount 0xe0304393e238
.
flags 0x440010 

The filesystem was being shutdown so xfs_inode_item_destroy() just
frees the inode log item without removing it from the AIL. I'll fix that
and see if I have any luck.

So I'd still try that patch I sent in the previous email...

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

unreapable zombies, maybe futex+ptrace+exit

2006-12-18 Thread Albert Cahalan


I have a fun little test program for people to try. It creates zombies
that persist until reboot, despite being reparented to init. Sometimes
it creates processes that block SIGKILL, sit around with pending SIGKILL,
or both.

You'll want:

a. either assembly skills or the ability to run 32-bit x86 code
b. the procps-3.2.7 release, so you can easily view the results
c. the strace program, or some other ptrace-based debugger
d. a recent kernel -- updated Fedora 5 or mainline 2.6.19 will do

Compile like this:
gcc -m32 -std=gnu99 -O2 -o cloninator cloninator.c

Run like this:
strace -f -F ./cloninator

Let the program run for a bit, then do one of a few fun things:

a. hit ^C to stop it
b. run "killall -9 cloninator" to stop it
c. send SIGKILL to the process group (use the negative process group ID as PID)
d. send SIGKILL to all your processes (use -1 as PID)

View the results:
ps -Ccloninator -mwostat,ppid,pid,tid,nlwp,pending,sigmask,sigignore,caught,wch

I suggest trying other debuggers. Under a debugger I can't share,
thousands of messed-up zombies get created in under a minute.
With strace, you'll probably get a half dozen after a couple tries.
You might try gdb, fenris, nightview, and anything else which
uses ptrace to observe something. (Ideas?) Be sure to specify any
options needed to follow child processes; you may need to comment
out the CLONE_VFORK case for wimpy debuggers.

BTW, we can probably now answer this question:

$ egrep -i 'todo.*safe' kernel/*.c
kernel/exit.c:  // TODO: is this safe?
kernel/exit.c:  // TODO: is this safe?

///

/* Header names are missing from this copy; the code shown below
 * needs at least the following. */
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

static void early_write(int fd, const void *buf, size_t count)
{
#if 0
   unsigned long eax = __NR_write;
   // push and pop because -fPIC probably needs ebx for the GOT base pointer
   __asm__ __volatile__(
   "push %%ebx ; push %1 ; pop %%ebx ; int $0x80; pop %%ebx"
   :"=a"(eax)
   :"r"(fd),"c"(buf),"d"(count),"0"(eax)
   :"memory"
   );
#endif
}

static void p_str(char *s)
{
   size_t count = strlen(s);
   early_write(STDERR_FILENO,s,count);
}

static void p_hex(unsigned long u)
{
   char buf[9];
   char x[] = "0123456789abcdef";
   char *s = buf;
   s[8] = '\0';
   int i = 8;
   while(i--)
   buf[7-i] = x[(u>>(i*4))&15];
   early_write(STDERR_FILENO,buf,8);
}

static void p_dec(unsigned long u)
{
   char buf[11];
   char *s = buf+10;
   *s-- = '\0';
   int count = 0;
   while(u || !count)
   {
   *s-- = u%10 + '0';
   u /= 10;
   count++;
   }
   early_write(STDERR_FILENO,s+1,count);
}


#define FUTEX_WAIT  0
#define FUTEX_WAKE  1


typedef int lock_t;

#define LOCK_INITIALIZER 0

static inline void init_lock(lock_t* l) { *l = 0; }

// lock_add performs an atomic add and returns the resulting value
static inline int lock_add(lock_t* l, int val)
{
   int result = val;
   __asm__ __volatile__ (
   "lock; xaddl %1, %0;"
   : "=m" (*l), "=r" (result)
   : "1" (result), "m" (*l)
   : "memory");
   return result + val; // Return the value written to memory
}

// lock_bts_high_bit atomically tests and sets the high bit and returns
// true if the bit was clear initially
static inline bool lock_bts_high_bit(lock_t* l)
{
   bool result;
   __asm__ __volatile__ (
   "lock; btsl $31, %0;\n\t"
   "setnc %1;"
   : "=m" (*l), "=q" (result)
   : "m" (*l)
   : "memory");
   return result;
}

static int futex(int* uaddr, int op, int val, const struct timespec* timeout,
                 int* uaddr2, int val3)
{
   (void)timeout;
   (void)uaddr2;
   (void)val3;
   int eax = __NR_futex;
   __asm__ __volatile__(
   "push %%ebx ; push %1 ; pop %%ebx ; int $0x80; pop %%ebx"
   :"=a"(eax)
   :"r"(uaddr),"c"(op),"d"(val),"0"(eax)
   :"memory"
   );
   return eax;
}


// lock will wait for and lock a mutex
static void lock(lock_t* l)
{
   // Check the mutex and set held bit
   if (lock_bts_high_bit(l))
   {
   // Got the mutex
   return;
   }

   // Increment wait count
   lock_add(l, 1);

   while (true)
   {
   // Check the mutex and set held bit
   if (lock_bts_high_bit(l))
   {
   // Got the mutex, decrement wait count
   lock_add(l, -1);
   return;
   }

   int val = *l;
   // Ensure the mutex wasn't given up since the check
   if (!(val & 0x80000000))
   continue;
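
The lock word scheme used above (bit 31 means "held", the low bits count waiters) can also be sketched portably with C11 atomics instead of x86 inline assembly. This is a minimal sketch under that reading of the code, with hypothetical toy_* names; the unlock helper is an assumption, since no unlock path is shown, and the futex wait/wake calls are left out.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_HELD_BIT 0x80000000u   /* bit 31: mutex is held */

typedef atomic_uint toy_lock_t;

/* Atomically add val and return the new value, like lock_add(). */
unsigned toy_lock_add(toy_lock_t *l, unsigned val)
{
    return atomic_fetch_add(l, val) + val;
}

/* Atomically set bit 31; true if it was clear, i.e. we took the lock. */
bool toy_lock_try(toy_lock_t *l)
{
    return !(atomic_fetch_or(l, TOY_HELD_BIT) & TOY_HELD_BIT);
}

/* Assumed unlock: drop the held bit, return the remaining waiter count. */
unsigned toy_unlock(toy_lock_t *l)
{
    return atomic_fetch_and(l, ~TOY_HELD_BIT) & ~TOY_HELD_BIT;
}
```

A nonzero return from toy_unlock() is the point where a real implementation would issue FUTEX_WAKE.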

Re: [RFC][PATCH] Fix area->nr_free-- went (-1) issue in buddy system

2006-12-18 Thread Nick Piggin


Aubrey wrote:
> Hi all,
>
> When I set up two zones (NORMAL and DMA) in my system, I got the
> following weird result from /proc/buddyinfo.
>
> root:~> cat /proc/buddyinfo
> Node 0, zone  DMA  2  1  2  1  1  0  0  1  1  2  2  0  0  0
> Node 0, zone   Normal  1  1  1  1  1  1  0  0 4294967295  0 4294967295  2  0  0
>
> As you see, two area->nr_free went to -1.
>
> After digging into the code, I found the problem is in the function
> __free_one_page(), when the kernel boot-up calls free_all_bootmem(). With
> two zones set up, it's possible that the NORMAL zone merges a block whose
> order = 8 for the first time (at this time
> zone[NORMAL]->free_area[8].nr_free = 0) and finds its buddy in the DMA
> zone. So the two blocks will be merged and area->nr_free goes to -1.


This should not happen because the pages are checked to ensure they are
from the same zone before merging.

What kind of system do you have? What is the dmesg and the .config? It
could be that the zones are not properly aligned and CONFIG_HOLES_IN_ZONE
is not set.
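
For reference, a small sketch of the two mechanisms in play. The buddy of a block is found by flipping one bit of its page index, a merge is only legal when both blocks are in the same zone, and an unsigned counter decremented below zero wraps to the 4294967295 seen in the output above. The toy_* names and the zone_of[] table are hypothetical, for illustration only.

```c
#include <stdint.h>

/* Index of the buddy of the block starting at page_idx, for a given order. */
unsigned long toy_buddy_index(unsigned long page_idx, unsigned int order)
{
    return page_idx ^ (1UL << order);
}

/*
 * Two blocks may only be merged if both sit in the same zone.
 * zone_of[] is a hypothetical per-page zone id table.
 */
int toy_can_merge(const int *zone_of, unsigned long page_idx,
                  unsigned int order)
{
    return zone_of[page_idx] == zone_of[toy_buddy_index(page_idx, order)];
}

/* Decrementing a 32-bit unsigned counter that is already 0 wraps around. */
uint32_t toy_dec_nr_free(uint32_t nr_free)
{
    return nr_free - 1u;
}
```

If the zone check is skipped, or the zone boundary is not aligned to the largest block size, a merge can decrement nr_free in a zone that never counted the buddy.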

--
SUSE Labs, Novell Inc.

Re: GPL only modules

2006-12-18 Thread Daniel Barkalow

On Mon, 18 Dec 2006, Linus Torvalds wrote:

> Static vs dynamic matters for whether it's an AGGREGATE work. Clearly, 
> static linking aggregates the library with the other program in the same 
> binary. There's no question about that. And that _does_ have meaning from 
> a copyright law angle, since if you don't have permission to ship 
> aggregate works under the license, then you can't ship said binary. It's 
> just a non-issue in the specific case of the GPLv2.

Under US law, the distinction is between works that are copyrightable 
themselves as "derivative works" and works that are derived from others, 
but aren't copyrightable. Provided you're allowed to ship aggregate works, 
the question is whether the output of "ld" is a copyrightable work 
distinct from the inputs.

I'd agree that "ar", like "mkisofs", doesn't create a derived work, but I 
think that "objcopy" does create a derived work, and "ld" does too, by 
virtue of modifying the objects it takes to resolve symbols. Now, you 
could distribute to somebody an ar archive of your program, and the 
recipient (given fair use rights to the copy of the program they received) 
could do "gcc program.a -o program" to link it. But I don't think you 
automatically get the right (under the "mere aggregation" permission) to 
distribute the result of relocating the symbols of gnutls around those of 
your program and vice versa, along with modifying the references to 
external symbols from each of these to point to specific locations.

-Daniel
*This .sig left intentionally blank*

Re: GPL only modules

2006-12-18 Thread D. Hazelton

On Monday 18 December 2006 20:35, David Schwartz wrote:
> > For both static and dynamic linking, you might claim the output is an
> > aggregate, but that doesn't matter.  What matters is whether or not
> > the output is a work based on the program, and whether the "mere
> > aggregation" paragraph kicks in.
> >
> > If the output is not an aggregate, which is quite likely to be
> > the case for dynamic linking, and quite possibly also for many static
> > linking cases, then the "mere aggregation" paragraph of clause 2 does
> > not kick in.
> >
> > If the output is indeed an aggregate, as it may quite likely be in the
> > case of static linking, then the "mere aggregation" considerations of
> > clause 2 may kick in and enable the 'anything else' to not be brought
> > under the scope of the license.  You still need permission to
> > distribute the whole.  The GPL asserts its non-interference with your
> > ability to distribute the separate portion separately, under whatever
> > license you can, as long as it's not a derived work from the GPL
> > portion.
>
> No!
>
> It makes no difference whether the "mere aggregation" paragraph kicks in
> because the "mere aggregation" paragraph is *explaining* the *law*. What
> matters is what the law actually *says*.
>
> We are talking about what works are within the GPL's scope. The text of the
> GPL does not matter because the GPL does not set its own scope, copyright
> law does.
>
> The GPL could say that if you ever see the source code to a GPL'd work,
> every work you ever write must be placed under the GPL. But that wouldn't
> make it true, because that would be a requirement outside the GPL's scope.
>
> We are talking about what works are inside the GPL's legal scope, and in that
> case, nothing the GPL says can enlarge the scope.
>
> DS


Actually, after rereading the GPLv2 because of this discussion I came to a 
most surprising conclusion. While there are *IMPLICIT* and *EXPLICIT* 
copyrights on the code, they have no bearing on the text of the GPL.

The GPL is a License that covers how the code may be used, modified and 
distributed. This is the reason that the FSF people had to make the big 
exception for Bison, because the parser skeleton is such an integral part of 
Bison (Bison itself, IIRC, uses the same skeleton, modified, as part of the 
program) that truthfully, any parser built using Bison is a derivative work 
of code released under the GPL.

That said, since there is a distribution, use and modification license on the 
Linux Kernel - the GPLv2 - there are those extra restrictions on the code 
*OUTSIDE* the copyright rules.

DRH

[-mm patch] drivers/pci/quirks.c: cleanup

2006-12-18 Thread Adrian Bunk

This patch contains the following cleanups:
- move all EXPORT_SYMBOL's directly below the code they are exporting
- move all DECLARE_PCI_FIXUP_*'s directly below the functions they
  are calling

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---

 drivers/pci/pci.c|4 
 drivers/pci/quirks.c |   42 +-
 2 files changed, 17 insertions(+), 29 deletions(-)

--- linux-2.6.20-rc1-mm1/drivers/pci/quirks.c.old   2006-12-19 
04:12:39.0 +0100
+++ linux-2.6.20-rc1-mm1/drivers/pci/quirks.c   2006-12-19 04:59:22.0 
+0100
@@ -61,7 +61,8 @@ DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_I
 
 This appears to be BIOS not version dependent. So presumably there is a 
 chipset level fix */
-int isa_dma_bridge_buggy;  /* Exported */
+int isa_dma_bridge_buggy;
+EXPORT_SYMBOL(isa_dma_bridge_buggy);
 
 static void __devinit quirk_isa_dma_hangs(struct pci_dev *dev)
 {
@@ -83,6 +84,7 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_NE
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_NEC, PCI_DEVICE_ID_NEC_CBUS_3,   quirk_isa_dma_hangs );
 
 int pci_pci_problems;
+EXPORT_SYMBOL(pci_pci_problems);
 
 /*
  * Chipsets where PCI->PCI transfers vanish or hang
@@ -94,6 +96,8 @@ static void __devinit quirk_nopcipci(str
pci_pci_problems |= PCIPCI_FAIL;
}
 }
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_SI,  PCI_DEVICE_ID_SI_5597,  quirk_nopcipci );
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_SI,  PCI_DEVICE_ID_SI_496,   quirk_nopcipci );
 
 static void __devinit quirk_nopciamd(struct pci_dev *dev)
 {
@@ -105,9 +109,6 @@ static void __devinit quirk_nopciamd(str
pci_pci_problems |= PCIAGP_FAIL;
}
 }
-
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_SI,  PCI_DEVICE_ID_SI_5597,  quirk_nopcipci );
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_SI,  PCI_DEVICE_ID_SI_496,   quirk_nopcipci );
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_8151_0,   quirk_nopciamd );
 
 /*
@@ -1122,6 +1123,14 @@ static void quirk_sis_96x_smbus(struct p
pci_write_config_byte(dev, 0x77, val & ~0x10);
pci_read_config_byte(dev, 0x77, );
 }
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SI, PCI_DEVICE_ID_SI_961,   quirk_sis_96x_smbus );
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SI, PCI_DEVICE_ID_SI_962,   quirk_sis_96x_smbus );
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SI, PCI_DEVICE_ID_SI_963,   quirk_sis_96x_smbus );
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SI, PCI_DEVICE_ID_SI_LPC,   quirk_sis_96x_smbus );
+DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_SI, PCI_DEVICE_ID_SI_961,   quirk_sis_96x_smbus );
+DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_SI, PCI_DEVICE_ID_SI_962,   quirk_sis_96x_smbus );
+DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_SI, PCI_DEVICE_ID_SI_963,   quirk_sis_96x_smbus );
+DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_SI, PCI_DEVICE_ID_SI_LPC,   quirk_sis_96x_smbus );
 
 /*
  * ... This is further complicated by the fact that some SiS96x south
@@ -1158,6 +1167,8 @@ static void quirk_sis_503(struct pci_dev
 */
dev->device = devid;
 }
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SI, PCI_DEVICE_ID_SI_503,   quirk_sis_503 );
+DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_SI, PCI_DEVICE_ID_SI_503,   quirk_sis_503 );
 
 static void __init quirk_sis_96x_compatible(struct pci_dev *dev)
 {
@@ -1170,8 +1181,6 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_S
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SI, PCI_DEVICE_ID_SI_651,   quirk_sis_96x_compatible );
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SI, PCI_DEVICE_ID_SI_735,   quirk_sis_96x_compatible );
 
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SI, PCI_DEVICE_ID_SI_503,   quirk_sis_503 );
-DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_SI, PCI_DEVICE_ID_SI_503,   quirk_sis_503 );
 /*
  * On ASUS A8V and A8V Deluxe boards, the onboard AC97 audio controller
  * and MC97 modem controller are disabled when a second PCI soundcard is
@@ -1202,21 +1211,8 @@ static void asus_hides_ac97_lpc(struct p
}
 }
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_VIA,PCI_DEVICE_ID_VIA_8237, asus_hides_ac97_lpc );
-
-
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SI, PCI_DEVICE_ID_SI_961,   quirk_sis_96x_smbus );
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SI, PCI_DEVICE_ID_SI_962,   quirk_sis_96x_smbus );
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SI, PCI_DEVICE_ID_SI_963,   quirk_sis_96x_smbus );
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SI, PCI_DEVICE_ID_SI_LPC,   quirk_sis_96x_smbus );
-
 DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_VIA,PCI_DEVICE_ID_VIA_8237, asus_hides_ac97_lpc );
 
-
-DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_SI, PCI_DEVICE_ID_SI_961,   quirk_sis_96x_smbus );
-DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_SI, PCI_DEVICE_ID_SI_962,

[2.6 patch] drivers/atm/fore200e.c: cleanups

2006-12-18 Thread Adrian Bunk

This patch contains the following transformations from custom functions 
to standard kernel version:
- fore200e_kmalloc() -> kzalloc()
- fore200e_kfree() -> kfree()
- fore200e_swap() -> cpu_to_be32()

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---
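
A minimal user-space sketch of why the fore200e_swap() -> cpu_to_be32()
replacement listed above should be behaviour-preserving: both swap bytes on a
little-endian host and do nothing on a big-endian one. It assumes htonl() as
the user-space counterpart of cpu_to_be32(); swab32_sketch(),
host_is_little_endian() and fore200e_swap_sketch() are hypothetical names
used for illustration only, not driver code.

```c
#include <stdint.h>
#include <arpa/inet.h>  /* htonl(): user-space counterpart of cpu_to_be32() */

/* Unconditional 32-bit byte swap, standing in for the kernel's swab32(). */
static uint32_t swab32_sketch(uint32_t x)
{
    return ((x & 0x000000ffu) << 24) |
           ((x & 0x0000ff00u) <<  8) |
           ((x & 0x00ff0000u) >>  8) |
           ((x & 0xff000000u) >> 24);
}

/* Runtime endianness probe; the driver tested __LITTLE_ENDIAN at build time. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Same decision as the removed helper: swap only on little-endian hosts. */
static uint32_t fore200e_swap_sketch(uint32_t in)
{
    return host_is_little_endian() ? swab32_sketch(in) : in;
}
```

On either endianness, fore200e_swap_sketch(v) and htonl(v) should return the
same value for any 32-bit v.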

 drivers/atm/fore200e.c |  166 -
 1 file changed, 68 insertions(+), 98 deletions(-)

--- linux-2.6.20-rc1-mm1/drivers/atm/fore200e.c.old 2006-12-19 04:29:36.0 +0100
+++ linux-2.6.20-rc1-mm1/drivers/atm/fore200e.c 2006-12-19 04:34:53.0 +0100
@@ -172,25 +172,6 @@ fore200e_irq_itoa(int irq)
 }
 
 
-static void*
-fore200e_kmalloc(int size, gfp_t flags)
-{
-void *chunk = kzalloc(size, flags);
-
-if (!chunk)
-   printk(FORE200E "kmalloc() failed, requested size = %d, flags = 0x%x\n",size, flags);
-
-return chunk;
-}
-
-
-static void
-fore200e_kfree(void* chunk)
-{
-kfree(chunk);
-}
-
-
 /* allocate and align a chunk of memory intended to hold the data behing exchanged
between the driver and the adapter (using streaming DVMA) */
 
@@ -206,7 +187,7 @@ fore200e_chunk_alloc(struct fore200e* fo
 chunk->align_size = size;
 chunk->direction  = direction;
 
-chunk->alloc_addr = fore200e_kmalloc(chunk->alloc_size, GFP_KERNEL | GFP_DMA);
+chunk->alloc_addr = kzalloc(chunk->alloc_size, GFP_KERNEL | GFP_DMA);
 if (chunk->alloc_addr == NULL)
return -ENOMEM;
 
@@ -228,7 +209,7 @@ fore200e_chunk_free(struct fore200e* for
 {
 fore200e->bus->dma_unmap(fore200e, chunk->dma_addr, chunk->dma_size, chunk->direction);
 
-fore200e_kfree(chunk->alloc_addr);
+kfree(chunk->alloc_addr);
 }
 
 
@@ -882,7 +863,7 @@ fore200e_sba_detect(const struct fore200
return NULL;
 }
 
-fore200e = fore200e_kmalloc(sizeof(struct fore200e), GFP_KERNEL);
+fore200e = kzalloc(sizeof(struct fore200e), GFP_KERNEL);
 if (fore200e == NULL)
return NULL;
 
@@ -1505,7 +1486,7 @@ fore200e_open(struct atm_vcc *vcc)
 
 spin_unlock_irqrestore(>q_lock, flags);
 
-fore200e_vcc = fore200e_kmalloc(sizeof(struct fore200e_vcc), GFP_ATOMIC);
+fore200e_vcc = kzalloc(sizeof(struct fore200e_vcc), GFP_ATOMIC);
 if (fore200e_vcc == NULL) {
vc_map->vcc = NULL;
return -ENOMEM;
@@ -1526,7 +1507,7 @@ fore200e_open(struct atm_vcc *vcc)
if (fore200e->available_cell_rate < vcc->qos.txtp.max_pcr) {
up(>rate_sf);
 
-   fore200e_kfree(fore200e_vcc);
+   kfree(fore200e_vcc);
vc_map->vcc = NULL;
return -EAGAIN;
}
@@ -1554,7 +1535,7 @@ fore200e_open(struct atm_vcc *vcc)
 
fore200e->available_cell_rate += vcc->qos.txtp.max_pcr;
 
-   fore200e_kfree(fore200e_vcc);
+   kfree(fore200e_vcc);
return -EINVAL;
 }
 
@@ -1630,7 +1611,7 @@ fore200e_close(struct atm_vcc* vcc)
 clear_bit(ATM_VF_PARTIAL,>flags);
 
 ASSERT(fore200e_vcc);
-fore200e_kfree(fore200e_vcc);
+kfree(fore200e_vcc);
 }
 
 
@@ -1831,7 +1812,7 @@ fore200e_getstats(struct fore200e* fore2
 u32 stats_dma_addr;
 
 if (fore200e->stats == NULL) {
-   fore200e->stats = fore200e_kmalloc(sizeof(struct stats), GFP_KERNEL | GFP_DMA);
+   fore200e->stats = kzalloc(sizeof(struct stats), GFP_KERNEL | GFP_DMA);
if (fore200e->stats == NULL)
return -ENOMEM;
 }
@@ -2002,17 +1983,6 @@ fore200e_setloop(struct fore200e* fore20
 }
 
 
-static inline unsigned int
-fore200e_swap(unsigned int in)
-{
-#if defined(__LITTLE_ENDIAN)
-return swab32(in);
-#else
-return in;
-#endif
-}
-
-
 static int
 fore200e_fetch_stats(struct fore200e* fore200e, struct sonet_stats __user *arg)
 {
@@ -2021,19 +1991,19 @@ fore200e_fetch_stats(struct fore200e* fo
 if (fore200e_getstats(fore200e) < 0)
return -EIO;
 
-tmp.section_bip = fore200e_swap(fore200e->stats->oc3.section_bip8_errors);
-tmp.line_bip= fore200e_swap(fore200e->stats->oc3.line_bip24_errors);
-tmp.path_bip= fore200e_swap(fore200e->stats->oc3.path_bip8_errors);
-tmp.line_febe   = fore200e_swap(fore200e->stats->oc3.line_febe_errors);
-tmp.path_febe   = fore200e_swap(fore200e->stats->oc3.path_febe_errors);
-tmp.corr_hcs= fore200e_swap(fore200e->stats->oc3.corr_hcs_errors);
-tmp.uncorr_hcs  = fore200e_swap(fore200e->stats->oc3.ucorr_hcs_errors);
-tmp.tx_cells= fore200e_swap(fore200e->stats->aal0.cells_transmitted)  +
- fore200e_swap(fore200e->stats->aal34.cells_transmitted) +
- fore200e_swap(fore200e->stats->aal5.cells_transmitted);
-tmp.rx_cells= fore200e_swap(fore200e->stats->aal0.cells_received) +
- fore200e_swap(fore200e->stats->aal34.cells_received)+
- fore200e_swap(fore200e->stats->aal5.cells_received);
+tmp.section_bip = cpu_to_be32(fore200e->stats->oc3.section_bip8_errors);
+tmp.line_bip=

[2.6 patch] drivers/atm/Kconfig: remove dead ATM_TNETA1570 option

2006-12-18 Thread Adrian Bunk

This patch removes the unconverted ATM_TNETA1570 option that also lacks 
any code in the kernel.

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

--- linux-2.6.20-rc1-mm1/drivers/atm/Kconfig.old    2006-12-19 04:42:00.0 +0100
+++ linux-2.6.20-rc1-mm1/drivers/atm/Kconfig    2006-12-19 04:42:14.0 +0100
@@ -167,10 +167,6 @@
  Note that extended debugging may create certain race conditions
  itself. Enable this ONLY if you suspect problems with the driver.
 
-#   bool 'Rolfs TI TNETA1570' CONFIG_ATM_TNETA1570 y
-#   if [ "$CONFIG_ATM_TNETA1570" = "y" ]; then
-#  bool '  Enable extended debugging' CONFIG_ATM_TNETA1570_DEBUG n
-#   fi
 config ATM_NICSTAR
tristate "IDT 77201 (NICStAR) (ForeRunnerLE)"
depends on PCI && ATM && !64BIT


[2.6 patch] powerpc: remove the broken Gemini support

2006-12-18 Thread Adrian Bunk

On Sat, Nov 25, 2006 at 12:49:35AM +0100, Adrian Bunk wrote:
> I just saw the commit message below.
> 
> There seems to have been some although unmerged work on APUS support by 
> Roman, but I didn't find any recent work on bringing the GEMINI support 
> back into life.
> 
> Is this a wrong impression, or would a patch to remove it be OK?
>...

Zero feedback, patch to remove it below.

cu
Adrian


<--  snip  -->


This patch removes the broken Gemini support.

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---

 arch/powerpc/kernel/head_32.S  |5 
 arch/powerpc/platforms/embedded6xx/Kconfig |9 
 arch/ppc/Kconfig   |9 
 arch/ppc/boot/simple/Makefile  |4 
 arch/ppc/boot/simple/misc.c|   15 
 arch/ppc/configs/gemini_defconfig  |  618 -
 arch/ppc/kernel/head.S |   18 
 arch/ppc/platforms/Makefile|1 
 arch/ppc/platforms/gemini.h|  165 -
 arch/ppc/platforms/gemini_pci.c|   41 -
 arch/ppc/platforms/gemini_prom.S   |   90 ---
 arch/ppc/platforms/gemini_serial.h |   40 -
 arch/ppc/platforms/gemini_setup.c  |  577 ---
 arch/ppc/syslib/Makefile   |1 
 arch/ppc/xmon/start.c  |5 
 include/asm-ppc/serial.h   |2 
 16 files changed, 2 insertions(+), 1598 deletions(-)

--- linux-2.6.20-rc1-mm1/arch/powerpc/kernel/head_32.S.old  2006-12-19 03:57:49.0 +0100
+++ linux-2.6.20-rc1-mm1/arch/powerpc/kernel/head_32.S  2006-12-19 03:58:30.0 +0100
@@ -344,12 +344,7 @@
 /* System reset */
 /* core99 pmac starts the seconary here by changing the vector, and
putting it back to what it was (unknown_exception) when done.  */
-#if defined(CONFIG_GEMINI) && defined(CONFIG_SMP)
-   . = 0x100
-   b   __secondary_start_gemini
-#else
EXCEPTION(0x100, Reset, unknown_exception, EXC_XFER_STD)
-#endif
 
 /* Machine check */
 /*
--- linux-2.6.20-rc1-mm1/arch/powerpc/platforms/embedded6xx/Kconfig.old 2006-12-19 03:59:10.0 +0100
+++ linux-2.6.20-rc1-mm1/arch/powerpc/platforms/embedded6xx/Kconfig 2006-12-19 03:59:26.0 +0100
@@ -104,15 +104,6 @@
 config PAL4
bool "SBS-Palomar4"
 
-config GEMINI
-   bool "Synergy-Gemini"
-   select PPC_INDIRECT_PCI
-   depends on BROKEN
-   help
- Select Gemini if configuring for a Synergy Microsystems' Gemini
- series Single Board Computer.  More information is available at:
- .
-
 config EST8260
bool "EST8260"
---help---
--- linux-2.6.20-rc1-mm1/arch/ppc/Kconfig.old   2006-12-19 03:59:37.0 +0100
+++ linux-2.6.20-rc1-mm1/arch/ppc/Kconfig   2006-12-19 03:59:48.0 +0100
@@ -670,15 +670,6 @@
 config PAL4
bool "SBS-Palomar4"
 
-config GEMINI
-   bool "Synergy-Gemini"
-   depends on BROKEN
-   select PPC_INDIRECT_PCI
-   help
- Select Gemini if configuring for a Synergy Microsystems' Gemini
- series Single Board Computer.  More information is available at:
- .
-
 config EST8260
bool "EST8260"
---help---
--- linux-2.6.20-rc1-mm1/arch/ppc/boot/simple/Makefile.old  2006-12-19 03:59:58.0 +0100
+++ linux-2.6.20-rc1-mm1/arch/ppc/boot/simple/Makefile  2006-12-19 04:00:20.0 +0100
@@ -116,10 +116,6 @@
  extra.o-$(CONFIG_CHESTNUT):= misc-chestnut.o
  end-$(CONFIG_CHESTNUT):= chestnut
 
-  zimage-$(CONFIG_GEMINI)  := zImage-STRIPELF
-zimageinitrd-$(CONFIG_GEMINI)  := zImage.initrd-STRIPELF
- end-$(CONFIG_GEMINI)  := gemini
-
  extra.o-$(CONFIG_KATANA)  := misc-katana.o
  end-$(CONFIG_KATANA)  := katana
cacheflag-$(CONFIG_KATANA)  := -include $(clear_L2_L3)
--- linux-2.6.20-rc1-mm1/arch/ppc/boot/simple/misc.c.old    2006-12-19 04:00:33.0 +0100
+++ linux-2.6.20-rc1-mm1/arch/ppc/boot/simple/misc.c    2006-12-19 04:01:18.0 +0100
@@ -42,14 +42,11 @@
 #endif
 
 /* Will / Can the user give input?
- * Val Henson has requested that Gemini doesn't wait for the
- * user to edit the cmdline or not.
  */
 #if (defined(CONFIG_SERIAL_8250_CONSOLE) \
|| defined(CONFIG_VGA_CONSOLE) \
|| defined(CONFIG_SERIAL_MPC52xx_CONSOLE) \
-   || defined(CONFIG_SERIAL_MPSC_CONSOLE)) \
-   && !defined(CONFIG_GEMINI)
+   || defined(CONFIG_SERIAL_MPSC_CONSOLE))
 #define INTERACTIVE_CONSOLE1
 #endif
 
@@ -178,16 +175,6 @@
 
if (keyb_present)
CRT_tstc();  /* Forces keyboard to be initialized */
-#ifdef CONFIG_GEMINI
-   /*
-* If cmd_line is empty and cmd_preset is not, copy cmd_preset
-* to cmd_line.  This way we can override cmd_preset

OSS driver removal, 3rd round

2006-12-18 Thread Adrian Bunk

Now that the second round of removing options for OSS drivers where ALSA 
drivers without regressions exist for the same hardware got included in 
Linus' tree, it's time for a third round amongst the remaining drivers.


Removing OSS drivers where ALSA drivers for the same hardware exist has
two reasons:

1. remove obsolete and mostly unmaintained code
2. get bugs in the ALSA drivers reported that weren't previously
   reported due to the possible workaround of using the OSS drivers


The list below divides the OSS drivers into the following three
categories:
1. ALSA drivers for the same hardware
2. ALSA drivers for the same hardware with known problems
3. no ALSA drivers for the same hardware


The proposed timeline is:
- 2.6.20: let the drivers under 1. in the list below depend on
  OSS_OBSOLETE
- 2.6.22: remove the options depending on OSS_OBSOLETE
- 2.6.24: remove the code for the drivers that were depending on
  OSS_OBSOLETE from the kernel tree


To make a long story short:

If you are using an OSS driver because the ALSA driver doesn't work
equally well on your hardware listed under 1. below, send me an email
with a bug number in the ALSA bug tracking system now.


A small FAQ:

Q: But OSS is kewl and ALSA sucks!
A: The decision for the OSS->ALSA move was four years ago.
   If ALSA sucks, please help to improve ALSA.

Q: What about the OSS emulation in ALSA?
A: The OSS emulation in ALSA is not affected by my patches
   (and it's not in any way scheduled for removal).


Please review the following list:


1. ALSA drivers for the same hardware

DMASOUND_PMAC
SOUND_ES1371


2. ALSA drivers for the same hardware with known problems

SOUND_CS4232
- ALSA #1520 (Soundchip was not detected on HP Omnibook 5700 CTX)

SOUND_ICH
- Alan Cox:
  ALSA driver lacks "support for AC97 wired touchscreens and the like"

SOUND_SSCAPE
- ALSA #2234 (driver does not find Soundscape Elite)

SOUND_TRIDENT
- maintainer of the OSS driver wants his driver to stay

SOUND_VIA82CXXX
- ALSA #1906 (1-second overruns reported by arecord,
  complete system hang with jackd -d alsa)


3. no ALSA drivers for the same hardware

DMASOUND_ATARI
DMASOUND_PAULA
DMASOUND_Q40
SOUND_AEDSP16
SOUND_AU1550_AC97
SOUND_BCM_CS4297A
SOUND_HAL2
SOUND_KAHLUA
SOUND_MSNDCLAS
SOUND_MSNDPIN
SOUND_MSS (also due to SOUND_PSS, SOUND_TRIX and perhaps SOUND_AEDSP16)
SOUND_PAS
SOUND_PSS
SOUND_SB (also due to SOUND_KAHLUA, SOUND_PAS and perhaps SOUND_AEDSP16)
SOUND_SH_DAC_AUDIO
SOUND_TRIX
SOUND_VIDC
SOUND_VRC5477
SOUND_VWSND
SOUND_WAVEARTIST



Re: [take28-resend_2->0 0/8] kevent: Generic event handling mechanism.

2006-12-18 Thread Randy Dunlap

On Mon, 18 Dec 2006 19:47:21 -0800 Ulrich Drepper wrote:

> It would help if one could actually get hold of the changes.
> 
> Neither at home nor on my gmail account did I get them all.  The gmane 
> also only has 5 of the 9 mails or so.  Your archive only has sources 
> from a couple of versions back.

It looks like they are all archived at marc.theaimsgroup.com.
Try this:
http://marc.theaimsgroup.com/?l=linux-netdev=2=1=take28-resend_2=t

But it would be Good for them to have a home.

---
~Randy

Aiee, killing interrupt handler!

2006-12-18 Thread Hawk Xu


Hi!

Our server (running Oracle 10g) is having a kernel panic problem:

Process swapper (pid: 0, threadinfo 80582000, task 80464300)
Stack: 0296 8013f325 81007f7f54d0 0100
  0001 000e 8053e098 8013f3a5
  81007f7f54d0 810002c10a20
Call Trace:  {group_send_sig_info+85}
{send_group_sig_info+53}
  {it_real_fn+0} {it_real_fn+22}
  {run_timer_softirq+383}
{profile_pc+32}
  {__do_softirq+113}
{do_softirq+53}
  {apic_timer_interrupt+99}  
{kernel_thread+130}
  {default_idle+0}
{default_idle+32}
  {cpu_idle+74} {start_kernel+469}
  {_sinittext+579}
Code: 80 3f 00 7e f9 e9 4a fd ff ff e8 b0 25 ec ff e9 74 fd ff ff
console shuts up ...
<0>Kernel panic - not syncing: Aiee, killing interrupt handler!


And, we have these error messages in the /var/log/kernel file:

Dec  7 17:19:09 kf85-1 kernel: set_local_var[9683]: segfault at
fffc rip 55f41d69 rsp c4e8 error 6
Dec  7 17:27:44 kf85-1 kernel: set_local_var[12020]: segfault at
fffc rip 55f41d69 rsp b978 error 6
Dec  7 17:29:39 kf85-1 kernel: dbi[12608]: segfault at 
rip 080ecea8 rsp a0b0 error 4
Dec 14 14:00:39 kf85-1 kernel: set_local_var[1886]: segfault at
fffc rip 55f41d69 rsp b358 error 6
Dec 15 10:03:17 kf85-1 kernel: set_local_var[2459]: segfault at
fffc rip 55f41d69 rsp c2e8 error 6
Dec 15 10:36:27 kf85-1 kernel: modeling[12173] trap bounds rip:806aec8
rsp:9820 error:0
Dec 15 10:51:49 kf85-1 kernel: modeling[14405]: segfault at
0008 rip 56b97e8c rsp aa78 error 6
Dec 15 11:09:14 kf85-1 kernel: set_local_var[20817]: segfault at
fffc rip 55f41d69 rsp c928 error 6
Dec 15 11:16:29 kf85-1 kernel: set_local_var[21760]: segfault at
fffc rip 55f41d69 rsp bd98 error 6
Dec 15 15:10:52 kf85-1 kernel: rtdb_server[17604] trap bounds
rip:80f5247 rsp:5b9f9040 error:0
Dec 15 15:11:01 kf85-1 kernel: rtdb_server[18631] trap bounds
rip:80f5247 rsp:58905040 error:0
Dec 15 15:11:16 kf85-1 kernel: rtdb_server[18718] trap bounds
rip:80f5247 rsp:59300040 error:0
Dec 15 15:11:23 kf85-1 kernel: rtdb_server[18762] trap bounds
rip:80f5247 rsp:59106040 error:0
Dec 15 15:14:17 kf85-1 kernel: rtdb_server[18869] trap bounds
rip:80f5247 rsp:5b10a040 error:0
Dec 15 15:14:22 kf85-1 kernel: rtdb_server[19567] trap bounds
rip:80f5247 rsp:59106040 error:0
Dec 15 15:14:32 kf85-1 kernel: rtdb_server[19586] trap bounds
rip:80f5247 rsp:57903040 error:0
Dec 15 15:48:30 kf85-1 kernel: set_local_var[2430]: segfault at
fffc rip 55f41d69 rsp c7f8 error 6
Dec 15 16:16:17 kf85-1 kernel: GFileManager[10453]: segfault at
3135 rip 574d5f99 rsp 597b3158 error 6


The kernel version is 2.6.12.5.  The kernel panic happened 3 times
last week, and we don't know whether there is any relationship
between the kernel panic and the error messages in the kernel log file.
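
As a reading aid for the "segfault at ... error N" lines above: on x86 the
number after "error" is the page-fault error code, whose low three bits say
what kind of access faulted. A minimal sketch of decoding them follows; the
helper names are hypothetical and only these three bits are covered. It does
not apply to the "trap bounds" lines, which report a different trap.

```c
/* Bit 0: set = protection violation, clear = page not present. */
static int fault_is_protection(unsigned long err)
{
    return (err & 0x1) != 0;
}

/* Bit 1: set = the faulting access was a write, clear = a read. */
static int fault_is_write(unsigned long err)
{
    return (err & 0x2) != 0;
}

/* Bit 2: set = the fault happened in user mode, clear = kernel mode. */
static int fault_is_user(unsigned long err)
{
    return (err & 0x4) != 0;
}
```

Read this way, "error 6" is a user-mode write to a page that is not present,
and "error 4" is a user-mode read of a page that is not present.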

That's all we know for now. The server is in Nanjing, which is 1000
kilometers south of us, and we are not allowed to access it.

Any help would be great!


Best regards,

hxu


Re: [take28-resend_2->0 0/8] kevent: Generic event handling mechanism.

2006-12-18 Thread Ulrich Drepper


It would help if one could actually get hold of the changes.

Neither at home nor on my gmail account did I get them all.  The gmane 
also only has 5 of the 9 mails or so.  Your archive only has sources 
from a couple of versions back.


--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

Re: GPL only modules

2006-12-18 Thread D. Hazelton

On Monday 18 December 2006 14:41, Alexandre Oliva wrote:
> On Dec 17, 2006, Kyle Moffett <[EMAIL PROTECTED]> wrote:
> > On the other hand, certain projects like OpenAFS, while not license-
> > compatible, are certainly not derivative works.
>
> Certainly a big chunk of OpenAFS might not be, just like a big chunk
> of other non-GPL drivers for Linux.
>
> But what about the glue code?  Can that be defended as not a derived
> work, such that it doesn't have to be GPL?

That has never been an issue, really. It's what 99% of the binary drivers 
believe - hence the reason that there is the user-compiled component to all 
of them.

> If not, can the whole containing both the non-derivative work and the
> source code providing the glue without which the whole wouldn't
> fulfill its intended purpose be regarded as a mere aggregate, and thus
> not be subject to the requirement that the whole be released under the
> GPL?

The view that binary vendors take is "Linking does not create a derived 
work" - regardless of the fact that you cannot have a complete compiled 
program or module *WITHOUT* that linking. However I have a feeling that the 
lawyers in the employ of the companies that ship BLOB drivers say that all 
they need to do to comply with the GPL is to ship the glue-code in source 
form.

And I have to admit that this does seem to comply with the GPL - to the 
letter, if not the spirit.

DRH

[RFC][PATCH] Fix area->nr_free-- went (-1) issue in buddy system

2006-12-18 Thread Aubrey


Hi all,

When I set up two zones (NORMAL and DMA) in my system, I got the
following weird result from /proc/buddyinfo.
-
root:~> cat /proc/buddyinfo
Node 0, zone  DMA  2  1  2  1  1  0  0  1  1  2  2  0  0  0
Node 0, zone   Normal  1  1  1  1  1  1  0  0 4294967295  0 4294967295  2  0  0
-

As you can see, two area->nr_free values went to -1.

After digging into the code, I found that the problem is in the function
__free_one_page(), reached when the kernel calls free_all_bootmem() during
boot. With two zones set up, it's possible for the NORMAL zone to free a
block of order 8 for the first time (at which point
zone[NORMAL]->free_area[8].nr_free = 0) and find its buddy in the DMA zone.
The two blocks are then merged and area->nr_free, decremented from 0, goes
to -1 (shown as 4294967295 above).
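
A minimal sketch of the arithmetic behind the 4294967295 entries, assuming
nr_free is an unsigned 32-bit counter on this system (uint32_t stands in for
it here). nr_free_dec() and nr_free_dec_guarded() are hypothetical helper
names, not kernel functions.

```c
#include <stdint.h>

/* Unguarded decrement, like a bare area->nr_free--: wraps around at 0. */
static uint32_t nr_free_dec(uint32_t nr_free)
{
    return (uint32_t)(nr_free - 1u);
}

/* Guarded decrement: leave the counter alone when it is already 0. */
static uint32_t nr_free_dec_guarded(uint32_t nr_free)
{
    return nr_free > 0 ? (uint32_t)(nr_free - 1u) : 0u;
}
```

With these, nr_free_dec(0) yields 4294967295 while nr_free_dec_guarded(0)
stays at 0.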

My proposed patch is as follows:


Signed-off-by: Aubrey Li <[EMAIL PROTECTED]>
---
--- page_alloc.c.orig   2006-12-19 10:45:25.0 +0800
+++ page_alloc.c        2006-12-19 10:44:48.0 +0800
@@ -407,7 +407,8 @@ static inline void __free_one_page(struc

list_del(>lru);
area = zone->free_area + order;
-   area->nr_free--;
+   if (area->nr_free > 0)
+   area->nr_free--;
rmv_page_order(buddy);
combined_idx = __find_combined_index(page_idx, order);
page = page + (combined_idx - page_idx);

Any comments?

Thanks,
-Aubrey

Re: xfslogd-spinlock bug?

2006-12-18 Thread David Chinner

On Tue, Dec 19, 2006 at 12:39:46AM +0100, Haar János wrote:
> From: "David Chinner" <[EMAIL PROTECTED]>
> > On Mon, Dec 18, 2006 at 09:17:50AM +0100, Haar János wrote:
> > > From: "David Chinner" <[EMAIL PROTECTED]>
> > > > Ok, I've never heard of a problem like this before and you are doing
> > > > something that very few ppl are doing (i.e. XFS on NBD). I'd start
> > > > Hence  I'd start by suspecting a bug in the NBD driver.
> > >
> > > Ok, if you have right, this also can be in context with the following
> issue:
> > >
> > > http://download.netcenter.hu/bughunt/20061217/messages.txt   (10KB)
> >
> > Which appears to be a crash in wake_up_process() when doing memory
> > reclaim (waking the xfsbufd).
> 
> Sorry, can you translate it to "poor mans language"? :-)
> This is a different bug?

Don't know - it's a different crash, but once again one that I've
never heard of occurring before.

> > Ok, I've found this pattern:
> >
> > #define POISON_FREE 0x6b
> >
> > Can you confirm that you are running with CONFIG_DEBUG_SLAB=y?
> 
> Yes, i build with this option enabled.
> Is this wrong?

No, but it does slow your machine down.

> > If so, we have a use after free occurring here and it would also
> > explain why no-one has reported it before.
> >
> > FWIW, can you turn on CONFIG_XFS_DEBUG=y and see if that triggers
> > a different bug check prior to the above dump?
> 
> [EMAIL PROTECTED] linux-2.6.19]# make bzImage
> scripts/kconfig/conf -s arch/x86_64/Kconfig
> .config:7:warning: trying to assign nonexistent symbol XFS_DEBUG
> 
> I have missed something?

No - I forgot that config option doesn't exist in mainline XFS - it's
only in the dev tree.

FWIW, I've run XFSQA twice now on a scsi disk with slab debugging turned
on and I haven't seen this problem. I'm not sure how to track down
the source of the problem without a test case, but as a quick test, can
you try the following patch?

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


---
 fs/xfs/linux-2.6/xfs_buf.c |4 
 1 file changed, 4 insertions(+)

Index: 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_buf.c
===
--- 2.6.x-xfs-new.orig/fs/xfs/linux-2.6/xfs_buf.c   2006-12-19 12:22:54.0 +1100
+++ 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_buf.c    2006-12-19 13:48:36.937118569 +1100
@@ -942,11 +942,14 @@ xfs_buf_unlock(
 /*
  * Pinning Buffer Storage in Memory
  * Ensure that no attempt to force a buffer to disk will succeed.
+ * Hold the buffer so we don't attempt to free it while it
+ * is pinned.
  */
 void
 xfs_buf_pin(
xfs_buf_t   *bp)
 {
+   xfs_buf_hold(bp);
atomic_inc(&bp->b_pin_count);
XB_TRACE(bp, "pin", (long)bp->b_pin_count.counter);
 }
@@ -958,6 +961,7 @@ xfs_buf_unpin(
if (atomic_dec_and_test(&bp->b_pin_count))
wake_up_all(&bp->b_waiters);
XB_TRACE(bp, "unpin", (long)bp->b_pin_count.counter);
+   xfs_buf_rele(bp);
 }
 
 int
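
The idea behind the test patch above, as far as it can be read from the diff:
holding a reference for as long as a buffer is pinned means that a release
from another path cannot free the buffer early. A minimal single-threaded
sketch follows, assuming plain int counters in place of the kernel's atomic
types; struct buf_sketch and the buf_*() helpers are hypothetical stand-ins,
not the XFS functions.

```c
/* Toy buffer: a reference count, a pin count, and a "was freed" flag. */
struct buf_sketch {
    int hold;
    int pin_count;
    int freed;
};

static void buf_hold(struct buf_sketch *bp)
{
    bp->hold++;
}

/* Drop one reference; the buffer is "freed" when the last one goes away. */
static void buf_rele(struct buf_sketch *bp)
{
    if (--bp->hold == 0)
        bp->freed = 1;
}

/* Pinning also takes a reference, as in the patch. */
static void buf_pin(struct buf_sketch *bp)
{
    buf_hold(bp);
    bp->pin_count++;
}

/* Unpinning drops the reference taken by buf_pin(). */
static void buf_unpin(struct buf_sketch *bp)
{
    bp->pin_count--;
    buf_rele(bp);
}

/* The owner releases its reference while the buffer is still pinned. */
static int freed_while_pinned(void)
{
    struct buf_sketch b = { 1, 0, 0 };  /* one reference held by the owner */
    buf_pin(&b);
    buf_rele(&b);
    return b.freed;
}

/* Same sequence, followed by the unpin that drops the last reference. */
static int freed_after_unpin(void)
{
    struct buf_sketch b = { 1, 0, 0 };
    buf_pin(&b);
    buf_rele(&b);
    buf_unpin(&b);
    return b.freed;
}
```

In this model the buffer survives the owner's release while pinned and is
only freed once the unpin drops the last reference.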

Re: schedule_timeout: wrong timeout value

2006-12-18 Thread Robert Hancock


kyle wrote:

> Hi,
>
> Recently my mysql server shows something like:
> Dec 18 18:24:05 sql kernel: schedule_timeout: wrong timeout value
>  from c0284efd
>
> Dec 18 18:24:36 sql last message repeated 19939 times
> Dec 18 18:25:37 sql last message repeated 33392 times
>
> from syslog every 1 or 2 days. Whenever the messages show up, the mysql
> server stops accepting new connections from the same network, and I need
> to restart the mysql service; it then keeps running well for 1-2 days
> until the messages show up again.
>
> The server has been running for over 1 year without any problem; the
> problem started showing up around 2 weeks ago. It's running kernel 2.6.12
> and the mysql server, nothing else. The hardware is a Pentium 4 2.8GHz
> with hyperthreading enabled.
>
> What does the kernel message mean, and why does it make mysql stop
> accepting new connections? Is it a hardware problem, or might upgrading
> the kernel help?
>
> Please CC me if possible. Thank you


The message means that some code in the kernel or in a module passed a 
negative value to schedule_timeout(), which it should never do. The c0284efd 
value is the return address of that call, so it lies inside the function 
that made it. You may be able to look it up in /proc/kallsyms (the 2.6 
replacement for /proc/ksyms) or in the System.map file for your kernel and 
figure out which function that is.
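
The address lookup described above can be sketched as a search for the last
symbol whose start address is less than or equal to the reported address. A
minimal sketch, assuming the symbol table has already been parsed and sorted
by ascending address; struct ksym, ksym_lookup() and the table entries are
hypothetical, and the example addresses and names are made up for
illustration, not taken from any real System.map.

```c
#include <stddef.h>

/* One parsed symbol-table entry (hypothetical layout). */
struct ksym {
    unsigned long addr;
    const char *name;
};

/*
 * Binary search for the entry with the highest start address <= addr.
 * Returns NULL if addr lies below the first symbol. The table must be
 * sorted by ascending address.
 */
static const struct ksym *ksym_lookup(const struct ksym *tab, size_t n,
                                      unsigned long addr)
{
    const struct ksym *best = NULL;
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (tab[mid].addr <= addr) {
            best = &tab[mid];
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return best;
}

/* Made-up entries for illustration only. */
static const struct ksym example_map[] = {
    { 0xc0284000UL, "hypothetical_func_a" },
    { 0xc0284e00UL, "hypothetical_func_b" },
    { 0xc0285200UL, "hypothetical_func_c" },
};
```

With this made-up table, looking up 0xc0284efd selects the second entry,
since its start address is the closest one at or below the reported address.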


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


Odd system lock up

2006-12-18 Thread Erik Ohrnberger

OK, got the 2.6.19 kernel installed and running OK, full libata wrapping of
existing IDE controllers and hard disks.

I'm experiencing some odd, random periodic system lockups without any sort
of debugging information being captured in the system message log.  Perhaps
it's a hard disk that's causing the trouble?

Is there a way to capture which drive might be causing the issue in the
message log?


Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa


> > > If all of test_clear_page_dirty() has been commented out then the page 
> > > will
> > > never become clean hence will never fall out of pagecache, so unless 
> > > Andrei
> > > is doing a reboot before checking for corruption, perhaps the underlying
> > > data on-disk is incorrect, but we can't see it.
> > 
> > if I do a sync and echo 1 > /proc/sys/vm/drop_caches
> 
> OK, that works.
> 
> >  does the reboot is
> > still necesary ?
> 
> It might be necessary to reboot in this case - if we're leaving the
> pagecache dirty, writing to drop_caches won't remove it.  And you probably
> won't be able to get a clean reboot either.
> 
> > > 
> > > Andrei, how _are_ you running this test?What's the exact sequence of 
> > > steps?
> > > 
> > > In particular, are you doing anything which would cause the corrupted file
> > > to be evicted from memory, thus forcing a read from disk?  Such as
> > > unmounting and then remounting the filesystem?
> > 
> > I boot linux, I start rtorrent and start the download, while it's
> > downloading I start evolution and i check my mail(my mbox is very large,
> > several hundered megabytes), I close evolution(I use evolution just to
> > have another application witch uses the filesystem and the memory), I
> > start evolution again. I start firefox. The download is complete.
> > Rtorrent says if the hash is good or not. I do a "unrar t qwe.rar" to
> > test that all 84 downloaded rar files are ok and see the result.
> > 
> > > 
> > > The point of my question is to check that the data is really incorrect
> > > on-disk, or whether it is incorrect in pagecache.

I rebooted and the files are still broken after reboot (tested twice), so
the data is incorrect on disk.

> > > 
> > > Also, it'd be useful if you could determine whether the bug appears with
> > > the ext2 filesystem: do s/ext3/ext2/ in /etc/fstab, or boot with
> > > rootfstype=ext2 if it's the root filesystem.
> > 
> > I will test.

Will test in a couple of hours, I have some work to do...

> 
> ok, thanks.


Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrew Morton

On Tue, 19 Dec 2006 03:44:51 +0200
Andrei Popa <[EMAIL PROTECTED]> wrote:

> On Mon, 2006-12-18 at 17:21 -0800, Andrew Morton wrote:
> > On Mon, 18 Dec 2006 16:57:30 -0800 (PST)
> > Linus Torvalds <[EMAIL PROTECTED]> wrote:
> > 
> > > What happens if you only ifdef out that single thing? 
> > > 
> > > The actual page-cleaning functions make sure to only clear the TAG_DIRTY 
> > > bit _after_ the page has been marked for writeback. Is there some ordering 
> > > constraint there, perhaps?
> > > 
> > > I'm really reaching here. I'm trying to see the pattern, and I'm not 
> > > seeing it. I'm asking you to test things just to get more of a feel for 
> > > what triggers the failure, than because I actually have any kind of idea 
> > > of what the heck is going on.
> > > 
> > > Andrew, Nick, Hugh - any ideas?
> > 
> > If all of test_clear_page_dirty() has been commented out then the page will
> > never become clean hence will never fall out of pagecache, so unless Andrei
> > is doing a reboot before checking for corruption, perhaps the underlying
> > data on-disk is incorrect, but we can't see it.
> 
> if I do a sync and echo 1 > /proc/sys/vm/drop_caches

OK, that works.

>  is the reboot
> still necessary?

It might be necessary to reboot in this case - if we're leaving the
pagecache dirty, writing to drop_caches won't remove it.  And you probably
won't be able to get a clean reboot either.
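
The sync-then-drop sequence discussed here can be sketched in C. This is only an illustration, not code from this thread: the helper name drop_clean_pagecache() and its path parameter are assumptions, chosen so the function can be pointed at a scratch file when trying it out. On a real system the path would be /proc/sys/vm/drop_caches and the write needs root. As said above, only clean pages are dropped, so pages that stay dirty will survive it.

```c
#define _XOPEN_SOURCE 600
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Flush dirty data with sync(), then write "1" to the given control
 * file.  Hypothetical helper: pass "/proc/sys/vm/drop_caches" (as root)
 * to ask the kernel to drop clean pagecache pages.
 * Returns 0 on success, -1 on failure.
 */
int drop_clean_pagecache(const char *path)
{
	FILE *f;
	int ok;

	assert(path != NULL);
	sync();			/* start writeback of dirty pages first */

	f = fopen(path, "w");
	if (f == NULL)
		return -1;
	ok = (fputs("1\n", f) >= 0);
	if (fclose(f) != 0)	/* the write may only fail at flush time */
		ok = 0;
	return ok ? 0 : -1;
}
```

A caller would check the return value and treat -1 as "nothing was dropped", for example when not running as root.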

> > 
> > Andrei, how _are_ you running this test?  What's the exact sequence of
> > steps?
> > 
> > In particular, are you doing anything which would cause the corrupted file
> > to be evicted from memory, thus forcing a read from disk?  Such as
> > unmounting and then remounting the filesystem?
> 
> I boot linux, I start rtorrent and start the download, while it's
> downloading I start evolution and I check my mail (my mbox is very large,
> several hundred megabytes), I close evolution (I use evolution just to
> have another application which uses the filesystem and the memory), I
> start evolution again. I start firefox. The download is complete.
> Rtorrent says if the hash is good or not. I do a "unrar t qwe.rar" to
> test that all 84 downloaded rar files are ok and see the result.
> 
> > 
> > The point of my question is to check that the data is really incorrect
> > on-disk, or whether it is incorrect in pagecache.
> > 
> > Also, it'd be useful if you could determine whether the bug appears with
> > the ext2 filesystem: do s/ext3/ext2/ in /etc/fstab, or boot with
> > rootfstype=ext2 if it's the root filesystem.
> 
> I will test.

ok, thanks.

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa

On Mon, 2006-12-18 at 16:57 -0800, Linus Torvalds wrote:
> 
> On Tue, 19 Dec 2006, Andrei Popa wrote:
> > > > 
> > > > nope, no file corruption at all.
> > > 
> > > Ok. That's interesting, but I think you actually #ifdef'ed out too 
> > > much:
> > > 
> > > It was really just the _inner_ "if (mapping_cap_account_dirty(.." 
> > > statement that I meant you should remove.
> > > 
> > > Can you try that too?
> > 
> > I have file corruption: "Hash check on download completion found bad
> > chunks, consider using "safe_sync"."
> 
> Ok, that's interesting.
> 
> So it doesn't seem to be the call to page_mkclean() itself that causes 
> corruption. It looks like Peter's hunch that maybe there's some bug in 
> PG_dirty handling _itself_ might be an idea..
> 
> And the reason it only started happening now is that it may just have been 
> _hidden_ by the fact that while we kept the dirty bits in the page tables, 
> we'd end up writing the dirty page _despite_ having lost the PG_dirty bit. 
> So if it's some bad interaction between writable mappings and some other 
> part of the system, we just didn't see it earlier, exactly because we had 
> _lots_ of dirty bits, and it was enough that _one_ of them was right.
> 
> If you didn't see corruption when you #ifdef'ed out too much of the 
> test_clear_page_dirty() function (the _whole_ TestClearPageDirty() 
> if-statement), but you get it when you just comment out the stuff that 
> does the page_mkclean(), that's interesting.
> 
> I'm left looking at the "radix_tree_tag_clear()" in 
> test_clear_page_dirty().
> 
> What happens if you only ifdef out that single thing? 

I have file corruption.

> 
> The actual page-cleaning functions make sure to only clear the TAG_DIRTY 
> bit _after_ the page has been marked for writeback. Is there some ordering 
> constraint there, perhaps?
> 
> I'm really reaching here. I'm trying to see the pattern, and I'm not 
> seeing it. I'm asking you to test things just to get more of a feel for 
> what triggers the failure, than because I actually have any kind of idea 
> of what the heck is going on.
> 
> Andrew, Nick, Hugh - any ideas?
> 
>   Linus


diff --git a/fs/buffer.c b/fs/buffer.c
index d1f1b54..263f88e 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *pag
int ret = 0;
 
BUG_ON(!PageLocked(page));
-   if (PageWriteback(page))
+   if (PageDirty(page) || PageWriteback(page))
return 0;
 
if (mapping == NULL) {  /* can this still happen? */
@@ -2845,22 +2845,6 @@ int try_to_free_buffers(struct page *pag
spin_lock(&mapping->private_lock);
ret = drop_buffers(page, &buffers_to_free);
spin_unlock(&mapping->private_lock);
-   if (ret) {
-   /*
-* If the filesystem writes its buffers by hand (eg ext3)
-* then we can have clean buffers against a dirty page.  We
-* clean the page here; otherwise later reattachment of buffers
-* could encounter a non-uptodate page, which is unresolvable.
-* This only applies in the rare case where try_to_free_buffers
-* succeeds but the page is not freed.
-*
-* Also, during truncate, discard_buffer will have marked all
-* the page's buffers clean.  We discover that here and clean
-* the page also.
-*/
-   if (test_clear_page_dirty(page))
-   task_io_account_cancelled_write(PAGE_CACHE_SIZE);
-   }
 out:
if (buffers_to_free) {
struct buffer_head *bh = buffers_to_free;
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 0f05cab..2d8 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1245,7 +1245,7 @@ retry:
wait_on_page_writeback(page);
 
if (PageWriteback(page) ||
-   !test_clear_page_dirty(page)) {
+   !test_clear_page_dirty(page, 0)) {
unlock_page(page);
break;
}
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 1387749..da2bdb1 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -484,7 +484,7 @@ static int fuse_commit_write(struct file
spin_unlock(&fc->lock);
 
if (offset == 0 && to == PAGE_CACHE_SIZE) {
-   clear_page_dirty(page);
+   clear_page_dirty(page, 0);
SetPageUptodate(page);
}
}
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index ed2c223..9f82cd0 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct
 
 static void truncate_huge_page(struct page *page)
 {
-   clear_page_dirty(page);
+   clear_page_dirty(page, 0);

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa

On Mon, 2006-12-18 at 17:21 -0800, Andrew Morton wrote:
> On Mon, 18 Dec 2006 16:57:30 -0800 (PST)
> Linus Torvalds <[EMAIL PROTECTED]> wrote:
> 
> > What happens if you only ifdef out that single thing? 
> > 
> > The actual page-cleaning functions make sure to only clear the TAG_DIRTY 
> > bit _after_ the page has been marked for writeback. Is there some ordering 
> > constraint there, perhaps?
> > 
> > I'm really reaching here. I'm trying to see the pattern, and I'm not 
> > seeing it. I'm asking you to test things just to get more of a feel for 
> > what triggers the failure, than because I actually have any kind of idea 
> > of what the heck is going on.
> > 
> > Andrew, Nick, Hugh - any ideas?
> 
> If all of test_clear_page_dirty() has been commented out then the page will
> never become clean hence will never fall out of pagecache, so unless Andrei
> is doing a reboot before checking for corruption, perhaps the underlying
> data on-disk is incorrect, but we can't see it.

If I do a sync and echo 1 > /proc/sys/vm/drop_caches, is the reboot
still necessary?

> 
> Andrei, how _are_ you running this test?  What's the exact sequence of
> steps?
> 
> In particular, are you doing anything which would cause the corrupted file
> to be evicted from memory, thus forcing a read from disk?  Such as
> unmounting and then remounting the filesystem?

I boot linux, I start rtorrent and start the download, while it's
downloading I start evolution and I check my mail (my mbox is very large,
several hundred megabytes), I close evolution (I use evolution just to
have another application which uses the filesystem and the memory), I
start evolution again. I start firefox. The download is complete.
Rtorrent says if the hash is good or not. I do a "unrar t qwe.rar" to
test that all 84 downloaded rar files are ok and see the result.

> 
> The point of my question is to check that the data is really incorrect
> on-disk, or whether it is incorrect in pagecache.
> 
> Also, it'd be useful if you could determine whether the bug appears with
> the ext2 filesystem: do s/ext3/ext2/ in /etc/fstab, or boot with
> rootfstype=ext2 if it's the root filesystem.

I will test.

> 
> Thanks.


RE: GPL only modules

2006-12-18 Thread David Schwartz


Combined responses:

> So therefore I don't think you can reasonably claim that static
> vs. dynamic linking is only a technical difference.  There are clearly
> other differences when it comes to distribution of the resulting
> binaries.

We're only talking about the special case of GPL'd works. You can download a
million copies of a GPL'd work from a server run by a family member across
the room. You can then delete one copy for each copy you distribute in the
form of a statically linked work.

Issues of copying don't apply to GPL'd works unless you have no access to
the source code. Otherwise, someone else can copy you as many works as you
want with the source code, and you can use first sale to transfer every one
of them.

> I personally would think that a mechanical process of modification
> *does* create a derived work, but it would take a court of law or a
> legislature to make an authoritative decision, I guess.

Under at least United States law, copyright protects creative expression
and only creative expression. In other jurisdictions, there are other types
of rights similar to copyright that one can obtain by means of hard work,
for example database compilation rights. They are usually legally distinct
from copyright and grant different rights with different rules.

DS



RE: GPL only modules

2006-12-18 Thread David Schwartz


> For both static and dynamic linking, you might claim the output is an
> aggregate, but that doesn't matter.  What matters is whether or not
> the output is a work based on the program, and whether the "mere
> aggregation" paragraph kicks in.
>
> If the output is not an aggregate, which is quite likely to be
> the case for dynamic linking, and quite possibly also for many static
> linking cases, then the "mere aggregation" paragraph of clause 2 does
> not kick in.
>
> If the output is indeed an aggregate, as it may quite likely be in the
> case of static linking, then the "mere aggregation" considerations of
> clause 2 may kick in and enable the 'anything else' to not be brought
> under the scope of the license.  You still need permission to
> distribute the whole.  The GPL asserts its non-interference with your
> ability to distribute the separate portion separately, under whatever
> license you can, as long as it's not a derived work from the GPL
> portion.

No!

It makes no difference whether the "mere aggregation" paragraph kicks in
because the "mere aggregation" paragraph is *explaining* the *law*. What
matters is what the law actually *says*.

We are talking about what works are within the GPL's scope. The text of the
GPL does not matter because the GPL does not set its own scope, copyright
law does.

The GPL could say that if you ever see the source code to a GPL'd work,
every work you ever write must be placed under the GPL. But that wouldn't
make it true, because that would be a requirement outside the GPL's scope.

We are talking about what works are inside the GPL's legal scope, and in that
case, nothing the GPL says can enlarge the scope.

DS




RE: GPL only modules

2006-12-18 Thread David Schwartz


> > It's also not clear that an aggregate work is in fact
> > a single work for any legal purpose other than the aggregator's claim to
> > copyright.

> Not sure what you're trying to say there - what are we talking about
> here other than the copyright?

We are talking about two different possible copyright claims. One is the
person who aggregates the works who may try to claim a "compilation
copyright" in the aggregate. The other is the authors of the original works
who may try to claim that the aggregate is a derivative work.

> First sale has nothing to do with this. First sale applies to the
> redistribution or resale of copies you have purchased, not with the
> right to make additional copies.

First sale is exactly what this is about. Nobody needs to make "additional
copies" of the Linux kernel because I can download a thousand of them from a
computer operated by the guy in the office down the hall from me.


> > ... For copyright law purposes, it is not a work because no creative
> > input was needed to produce it beyond what was used to create the works
> > from which it was formed.

> Selection and organization are potentially creative. The Act
> distinguishes between derivative works, compilations, and collective
> works. A derivative work is a work "based on" the original work; a
> compilation is a work formed by "collecting and assembling"
> preexisting works "in such a way that the resulting work as a whole
> constitutes an original work of authorship." A "collective work" is any
> work formed by assembling independent works into a whole. All
> compilations are collective works, but not all collective works are
> compilations. Derivative works have nothing to do with aggregation.

Good, so we agree that an aggregate is not a derivative work. That means it
doesn't have to be GPL'd even if some of its component works are GPL'd.

> > I recently bought two DVDs as a present for a friend of mine. I put the
> > two DVDs in one box and shipped them to him. Just because the two DVDs
> > are in one box does not make them a derivative work for copyright
> > purposes because no creative input went in to them. I can even staple
> > the two DVDs together if I want. I also don't need any special
> > permission to ship the two of them together to my friend, first sale
> > covers that. The right to ship each individual work is all that's
> > needed to ship the aggregate.

> First sale is separate from Copyright. You have the right to ship
> them, but not to make copies of them. You can't for instance, ship
> your friend a single DVD that combines the contents of the two you
> bought. That's not unlike the distinction GPLv3 makes between
> "propagating" and "conveying".

I don't see why you can't distribute a single DVD that combines the contents
of the two you bought, so long as you destroy the originals. There is no
issue about the number of copies with the GPL because you can download any
number of copies of a GPL'd work from someone else who provides you with
source.

> > Now, if I wanted to write my own story with elements from the content of
> > both DVDs, that would be a derivative work because the combination
> > itself is done in a creative way.

> If it just rearranged the pieces, it would not be a derivative work,
> it would be a compilation. If you transformed the pieces, it might be
> a derivative work (depending on the nature of the transformation).

I think it depends upon how small the pieces are. If you rearranged them
creatively, and the result was in effect a single work, I think it would be
a derivative work.

> > No automated, mechanical process can create a derivative work of
> > software. (With a few exceptions not relevant here.)

> The truth of that statement depends on exactly what you mean by "an
> automated, mechanical process". There are mechanical processes that
> would simply produce the original work itself, not a derivative (e.g.,
> changing the type from Courier to Times). There are other mechanical
> processes that would produce a collective work (e.g., inserting after
> each line of the program a statement indicating whether or not it was
> valid C). There are other mechanical processes that would create a
> derivative work (e.g., a paraphrasing tool). It depends on the nature
> of the mechanical process; that is, the decision, by you, to apply a
> particular mechanical process is itself creative. But, perhaps that's
> what you meant by your "few exceptions".

I mean that you can't link together a bunch of works that would otherwise be
independent and get a derivative work as a result. Linking combines
mechanistically, not creatively, so it aggregates.

DS



Re: [linux-pm] OOPS: divide error while s2dsk (2.6.20-rc1-mm1)

2006-12-18 Thread Andrew Morton

On Mon, 18 Dec 2006 17:18:12 -0800 (PST)
David Rientjes <[EMAIL PROTECTED]> wrote:

> On Mon, 18 Dec 2006, Andrew Morton wrote:
> 
> > diff -puN mm/vmscan.c~shrink_all_memory-fix-lru_pages-handling mm/vmscan.c
> > --- a/mm/vmscan.c~shrink_all_memory-fix-lru_pages-handling
> > +++ a/mm/vmscan.c
> > @@ -1484,6 +1484,16 @@ static unsigned long shrink_all_zones(un
> > return ret;
> >  }
> >  
> > +static unsigned long count_lru_pages(void)
> > +{
> > +   struct zone *zone;
> > +   unsigned long ret = 0;
> > +
> > +   for_each_zone(zone);
> > +   ret += zone->nr_active + zone->nr_inactive;
> > +   return ret;
> > +}
> > +
> >  /*
> >   * Try to free `nr_pages' of memory, system-wide, and return the number of
> >   * freed pages.
> 
> There's an extra semicolon there

Sigh.  coding-while-diseased.

> that results in only the final zone being 
> used.
> 

Actually it'll go oops.  Fixed, thanks.
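
Why it oopses rather than counting only the last zone can be shown with a small user-space sketch. The names below (struct toy_zone, for_each_toy_zone) are made up for the illustration and are not the kernel's definitions; the only assumption carried over is that the iteration macro expands to a for loop that ends when the iterator becomes NULL. With a stray semicolon the loop body is empty, so the loop runs to the end and the statement that follows sees a NULL iterator.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct zone; illustrative only. */
struct toy_zone {
	unsigned long nr_active;
	unsigned long nr_inactive;
	struct toy_zone *next;
};

/* Toy stand-in for for_each_zone(): stops when the iterator is NULL. */
#define for_each_toy_zone(zone, head) \
	for ((zone) = (head); (zone) != NULL; (zone) = (zone)->next)

/* Correct form: the statement after the macro is the loop body. */
unsigned long count_lru_pages(struct toy_zone *head)
{
	struct toy_zone *zone;
	unsigned long ret = 0;

	for_each_toy_zone(zone, head)
		ret += zone->nr_active + zone->nr_inactive;
	return ret;
}

/*
 * Buggy form: the ';' is the whole loop body.  The loop runs to
 * completion, and the iterator is returned as it stands afterwards.
 */
struct toy_zone *iterator_after_empty_loop(struct toy_zone *head)
{
	struct toy_zone *zone;

	for_each_toy_zone(zone, head);	/* stray semicolon */
	return zone;			/* NULL: dereferencing it would crash */
}
```

In the toy version the second function only returns the pointer, so it is safe to run; the patched kernel function went on to dereference it.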

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrew Morton

On Mon, 18 Dec 2006 16:57:30 -0800 (PST)
Linus Torvalds <[EMAIL PROTECTED]> wrote:

> What happens if you only ifdef out that single thing? 
> 
> The actual page-cleaning functions make sure to only clear the TAG_DIRTY 
> bit _after_ the page has been marked for writeback. Is there some ordering 
> constraint there, perhaps?
> 
> I'm really reaching here. I'm trying to see the pattern, and I'm not 
> seeing it. I'm asking you to test things just to get more of a feel for 
> what triggers the failure, than because I actually have any kind of idea 
> of what the heck is going on.
> 
> Andrew, Nick, Hugh - any ideas?

If all of test_clear_page_dirty() has been commented out then the page will
never become clean hence will never fall out of pagecache, so unless Andrei
is doing a reboot before checking for corruption, perhaps the underlying
data on-disk is incorrect, but we can't see it.

Andrei, how _are_ you running this test?  What's the exact sequence of steps?

In particular, are you doing anything which would cause the corrupted file
to be evicted from memory, thus forcing a read from disk?  Such as
unmounting and then remounting the filesystem?

The point of my question is to check that the data is really incorrect
on-disk, or whether it is incorrect in pagecache.
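
One way to make that check for a single file without unmounting is sketched below. It uses a different technique from the remount suggested above: posix_fadvise() with POSIX_FADV_DONTNEED asks the kernel to drop the file's cached pages, and the file is then read again and compared. The function names are hypothetical, and the sketch assumes the pages are clean at that point. Dirty pages are not dropped by this call, so in an experiment where pages never become clean a reboot would still be the reliable test.

```c
#define _XOPEN_SOURCE 600
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Simple rolling hash over the whole file; returns -1 on error. */
static int file_hash(int fd, uint32_t *hash)
{
	unsigned char buf[4096];
	ssize_t n, i;
	uint32_t h = 0;

	if (lseek(fd, 0, SEEK_SET) == (off_t)-1)
		return -1;
	while ((n = read(fd, buf, sizeof(buf))) > 0)
		for (i = 0; i < n; i++)
			h = h * 31 + buf[i];
	if (n < 0)
		return -1;
	*hash = h;
	return 0;
}

/*
 * Returns 1 if the file reads the same before and after asking the
 * kernel to drop its cached pages, 0 if the two reads differ, and
 * -1 on error.  Hypothetical helper, not from the thread.
 */
int cached_matches_disk(const char *path)
{
	uint32_t before, after;
	int fd, ret = -1;

	assert(path != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	if (file_hash(fd, &before) == 0 &&
	    fsync(fd) == 0 &&
	    posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED) == 0 &&
	    file_hash(fd, &after) == 0)
		ret = (before == after);
	close(fd);
	return ret;
}
```

A result of 0 would suggest that the cached copy and the on-disk copy disagree; a result of 1 says nothing about whether both are equally wrong.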

Also, it'd be useful if you could determine whether the bug appears with
the ext2 filesystem: do s/ext3/ext2/ in /etc/fstab, or boot with
rootfstype=ext2 if it's the root filesystem.

Thanks.

Re: [linux-pm] OOPS: divide error while s2dsk (2.6.20-rc1-mm1)

2006-12-18 Thread David Rientjes

On Mon, 18 Dec 2006, Andrew Morton wrote:

> diff -puN mm/vmscan.c~shrink_all_memory-fix-lru_pages-handling mm/vmscan.c
> --- a/mm/vmscan.c~shrink_all_memory-fix-lru_pages-handling
> +++ a/mm/vmscan.c
> @@ -1484,6 +1484,16 @@ static unsigned long shrink_all_zones(un
>   return ret;
>  }
>  
> +static unsigned long count_lru_pages(void)
> +{
> + struct zone *zone;
> + unsigned long ret = 0;
> +
> + for_each_zone(zone);
> + ret += zone->nr_active + zone->nr_inactive;
> + return ret;
> +}
> +
>  /*
>   * Try to free `nr_pages' of memory, system-wide, and return the number of
>   * freed pages.

There's an extra semicolon there that results in only the final zone being 
used.

David

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Gene Heskett

On Monday 18 December 2006 18:48, Andrei Popa wrote:
>On Mon, 2006-12-18 at 14:32 -0800, Linus Torvalds wrote:
>> On Mon, 18 Dec 2006, Andrei Popa wrote:
>> > > This should be fairly easy to test: just change every single ", 1"
>> > > case in the patch to ", 0".
>> > >
>> > > What happens for you in that case?
>> >
>> > I have file corruption.
>>
>> Magic. And btw, _thanks_ for being such a great tester.
>>
>> So now I have one more thing for you to try, if you can bother:
>>
>> There's exactly two call sites that call "page_mkclean()" (and that is
>> the only thing in turn that calls "page_mkclean_one()", which we
>> already determined will cause the corruption).
>>
>> Both of them do
>>
>>  if (mapping_cap_account_dirty(mapping)) {
>>  ..
>>
>> things, although they do slightly different things inside that if in
>> your patched kernel.
>>
>> Can you just TOTALLY DISABLE that case for the test_clear_page_dirty()
>> case? Just do an "#if 0 .. #endif" around that whole if-statement,
>> leaving the _only_ thing that actually calls "page_mkclean()" to be
>> the "clear_page_dirty_for_io()" call.
>>
>> Do you still see corruption?
>
>nope, no file corruption at all.
>
Goody I says to nobody in particular, I'll go build this...
>
>diff --git a/fs/buffer.c b/fs/buffer.c
>index d1f1b54..263f88e 100644
>--- a/fs/buffer.c
>+++ b/fs/buffer.c
>@@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *pag
>   int ret = 0;
>
>   BUG_ON(!PageLocked(page));
>-  if (PageWriteback(page))
>+  if (PageDirty(page) || PageWriteback(page))
>   return 0;
>
>   if (mapping == NULL) {  /* can this still happen? */
>@@ -2845,22 +2845,6 @@ int try_to_free_buffers(struct page *pag
>   spin_lock(&mapping->private_lock);
>   ret = drop_buffers(page, &buffers_to_free);
>   spin_unlock(&mapping->private_lock);
>-  if (ret) {
>-  /*
>-   * If the filesystem writes its buffers by hand (eg ext3)
>-   * then we can have clean buffers against a dirty page.  We
>-   * clean the page here; otherwise later reattachment of buffers
>-   * could encounter a non-uptodate page, which is unresolvable.
>-   * This only applies in the rare case where try_to_free_buffers
>-   * succeeds but the page is not freed.
>-   *
>-   * Also, during truncate, discard_buffer will have marked all
>-   * the page's buffers clean.  We discover that here and clean
>-   * the page also.
>-   */
>-  if (test_clear_page_dirty(page))
>-  task_io_account_cancelled_write(PAGE_CACHE_SIZE);
>-  }
> out:
>   if (buffers_to_free) {
>   struct buffer_head *bh = buffers_to_free;
>diff --git a/fs/cifs/file.c b/fs/cifs/file.c
>index 0f05cab..2d8 100644
>--- a/fs/cifs/file.c
>+++ b/fs/cifs/file.c
>@@ -1245,7 +1245,7 @@ retry:
>   wait_on_page_writeback(page);
>
>   if (PageWriteback(page) ||
>-  !test_clear_page_dirty(page)) {
>+  !test_clear_page_dirty(page, 0)) {
>   unlock_page(page);
>   break;
>   }
>diff --git a/fs/fuse/file.c b/fs/fuse/file.c
>index 1387749..da2bdb1 100644
>--- a/fs/fuse/file.c
>+++ b/fs/fuse/file.c
>@@ -484,7 +484,7 @@ static int fuse_commit_write(struct file
>   spin_unlock(&fc->lock);
>
>   if (offset == 0 && to == PAGE_CACHE_SIZE) {
>-  clear_page_dirty(page);
>+  clear_page_dirty(page, 0);
>   SetPageUptodate(page);
>   }
>   }
>diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
>index ed2c223..9f82cd0 100644
>--- a/fs/hugetlbfs/inode.c
>+++ b/fs/hugetlbfs/inode.c
>@@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct
>
> static void truncate_huge_page(struct page *page)
> {
>-  clear_page_dirty(page);
>+  clear_page_dirty(page, 0);
>   ClearPageUptodate(page);
>   remove_from_page_cache(page);
>   put_page(page);
>diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
>index b1a1c72..5e29b37 100644
>--- a/fs/jfs/jfs_metapage.c
>+++ b/fs/jfs/jfs_metapage.c
>@@ -773,7 +773,7 @@ #if MPS_PER_PAGE == 1
>
>   /* Retest mp->count since we may have released page lock */
>   if (test_bit(META_discard, &mp->flag) && !mp->count) {
>-  clear_page_dirty(page);
>+  clear_page_dirty(page, 0);
>   ClearPageUptodate(page);
>   }
> #else
>diff --git a/fs/reiserfs/stree.c b/fs/reiserfs/stree.c
>index 47e7027..a97e198 100644
>--- a/fs/reiserfs/stree.c
>+++ b/fs/reiserfs/stree.c
>@@ -1459,7 +1459,7 @@ static void unmap_buffers(struct page *p
>   bh = next;
>   } while (bh != head);
>   if

Re: [PATCH, RFC] reimplement flush_workqueue()

2006-12-18 Thread Andrew Morton

On Tue, 19 Dec 2006 03:43:19 +0300
Oleg Nesterov <[EMAIL PROTECTED]> wrote:

> On 12/18, Andrew Morton wrote:
> >
> > On Mon, 18 Dec 2006 01:34:16 +0300
> > Oleg Nesterov <[EMAIL PROTECTED]> wrote:
> > 
> > > NOTE: I removed 'int cpu' parameter, flush_workqueue() locks/unlocks
> > > workqueue_mutex unconditionally. It may be restored, but I think it
> > > doesn't make much sense, we take the mutex for the very short time,
> > > and the code becomes simpler.
> > > 
> > 
> > Taking workqueue_mutex() unconditionally in flush_workqueue() means
> > that we'll deadlock if a single-threaded workqueue callback handler calls
> > flush_workqueue().
> 
> Well. But flush_workqueue() drops workqueue_mutex before going to sleep ?
> 
>   flush_workqueue(single_threaded_wq);
>   ...
>   mutex_lock(&workqueue_mutex);
>   ...
>   mutex_unlock(&workqueue_mutex);
>   wait_for_completion();
>   handler runs,
>   calls flush_workqueue(),
>   workqueue_mutex is free

Oh.  OK.  In that case we can switch to preempt_disable() for the
cpu-hotplug holdoff.  Sometime.

> > It's an idiotic thing to do, but I think I spotted a site last week which
> > does this.  scsi?  Not sure..
> 
> Ok, it is time to sleep. I'll look tomorrow and re-send if
> flush_cpu_workqueue() really needs "bool workqueue_mutex_is_locked"
> parameter.

Hopefully not.

Re: [PATCH] RTC classdev: Add sysfs support for wakeup alarm (r/w)

2006-12-18 Thread David Brownell

On Monday 18 December 2006 4:54 pm, David Brownell wrote:

> > http://handhelds.org/cgi-bin/cvsweb.cgi/linux/kernel26/drivers/rtc/rtc-sa1100.c.diff?r1=1.5=1.6=h
> 
> That patch you applied looks right to me -- why don't you forward it
> to Alessandro as a bugfix for 2.6.20-rc2, and save me the effort?

Actually, correction:  it'd be correct if you ripped out the buggy
calls to manage the irq wake mechanism.  A later message will show
how those need to work.  (The IRQ framework will give one helpful
hint when it warns about mismatched enable/disable calls ...)

- Dave

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Linus Torvalds

On Tue, 19 Dec 2006, Andrei Popa wrote:
> > > 
> > > nope, no file corruption at all.
> > 
> > Ok. That's interesting, but I think you actually #ifdef'ed out too 
> > much:
> > 
> > It was really just the _inner_ "if (mapping_cap_account_dirty(.." 
> > statement that I meant you should remove.
> > 
> > Can you try that too?
> 
> I have file corruption: "Hash check on download completion found bad
> chunks, consider using "safe_sync"."

Ok, that's interesting.

So it doesn't seem to be the call to page_mkclean() itself that causes 
corruption. It looks like Peter's hunch that maybe there's some bug in 
PG_dirty handling _itself_ might be an idea..

And the reason it only started happening now is that it may just have been 
_hidden_ by the fact that while we kept the dirty bits in the page tables, 
we'd end up writing the dirty page _despite_ having lost the PG_dirty bit. 
So if it's some bad interaction between writable mappings and some other 
part of the system, we just didn't see it earlier, exactly because we had 
_lots_ of dirty bits, and it was enough that _one_ of them was right.
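
The "one of them was right" argument can be pictured with a toy model. Nothing below is kernel code: struct toy_page and both functions are invented for the illustration, and the model assumes only what the paragraph above says, namely that writeback used to happen if any dirty bit survived, whereas now the page flag alone decides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_MAPPINGS 4

/* Toy page: one PG_dirty-like flag plus one dirty bit per mapping. */
struct toy_page {
	bool pg_dirty;
	bool pte_dirty[TOY_NR_MAPPINGS];
};

/* Earlier behaviour in this model: any surviving dirty bit is enough. */
bool written_with_pte_bits(const struct toy_page *p)
{
	size_t i;

	assert(p != NULL);
	if (p->pg_dirty)
		return true;
	for (i = 0; i < TOY_NR_MAPPINGS; i++)
		if (p->pte_dirty[i])
			return true;
	return false;
}

/* Current behaviour in this model: only the page flag is consulted. */
bool written_with_page_flag_only(const struct toy_page *p)
{
	assert(p != NULL);
	return p->pg_dirty;
}
```

A page whose flag was lost but which still has one dirty mapping bit gets written in the first model and is silently skipped in the second, which is the kind of hidden bug being guessed at here.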

If you didn't see corruption when you #ifdef'ed out too much of the 
test_clear_page_dirty() function (the _whole_ TestClearPageDirty() 
if-statement), but you get it when you just comment out the stuff that 
does the page_mkclean(), that's interesting.

I'm left looking at the "radix_tree_tag_clear()" in 
test_clear_page_dirty().

What happens if you only ifdef out that single thing? 

The actual page-cleaning functions make sure to only clear the TAG_DIRTY 
bit _after_ the page has been marked for writeback. Is there some ordering 
constraint there, perhaps?

I'm really reaching here. I'm trying to see the pattern, and I'm not 
seeing it. I'm asking you to test things just to get more of a feel for 
what triggers the failure, than because I actually have any kind of idea 
of what the heck is going on.

Andrew, Nick, Hugh - any ideas?

Linus

Re: [PATCH] RTC classdev: Add sysfs support for wakeup alarm (r/w)

2006-12-18 Thread David Brownell

Hi Paul,

On Monday 18 December 2006 3:58 pm, Paul Sokolovsky wrote:
> Monday, December 18, 2006, 6:28:58 AM, you wrote:
> > On Sunday 17 December 2006 11:30 am, Paul Sokolovsky wrote:
> 
> >> Small battery-powered systems, like PDAs, need a way to be
> >> suspended most of the time and woken up just from time to time to
> >> process pending tasks. 
> 
> > Sounds like you're thinking of this from a userspace perspective...
> 
> > Could you share some examples of such "pending tasks"?
> 
>   Well, the actual usecase, which triggered me to hack that, was a
> need to write a "burn out" test script for suspend/resume for a
> battery-powered ARM device (PDA), which would do suspend/resume cycle
> thousands of times. And wakeup alarm is obvious, if not only, source of
> automated resume events.

I like how you think ... SCRIPTED TESTING for suspend/resume!!
Can you clone yourself?  Soon, and massively?  :)

Though I confess I've used that example myself.  I think if you scan LKML
you'll find an "rtcwake" program (userspace) I've used on several different
ARMs (not PXA though), and even x86 PCs (using a new RTC driver, but getting
the usual headaches from ACPI S3 resume failures on most systems).


>   Of course, I started by trying existing solutions - e.g. there's an
> "atd" implementation which uses /dev/rtc, but I found it having awful
> latency (>2s), then I tried to write simple C app to set alarm via
> ioctl(), just to find alarm IRQs are shutdown on its exit.
> 
>   But anyway, I'm that kind of guy who thinks that debugging and
> diagnostics are important things for *production* system,

You'd find violent agreement from anyone who's spent time trying
to support all this fancy technology.  Heck, even toasters have
problems sometimes.


> >> Obvious way to achieve this is to use timer, or 
> >> alarm, wakeup. Unfortunately, this matter is a bit confusing in Linux.
> >> There's only one "good" "supported" way to set alarm - via ioctl() on
> >> an RTC device fd. Unfortunately, this alarm is not persistent - as soon
> >> as fd is closed, alarm is discharged.
> 
> > I don't think that's true in general.  Most RTCs don't even care
> > whether userspace did an open() or close().  I see the S3C one does,
> > and that explicitly leaves the alarm active. 
> 
> > But I see that only the SA1100/PXA and SH RTCs turn off all IRQs
> > after RTC_WKALM_* requests ... that's a distinct minority.
> 
>   Oh my! I couldn't even think this can be idiosyncrasy of specific
> implementation. Oh, what a world... ;-)

Yeah, just think how bad it was _before_ the RTC class framework
existed.   I think I counted at least half a dozen implementations,
with minimal API overlap.

One thing we're missing now is RTC conformance tests.  They'd have
turned up some of these issues pretty quickly, if they were at all
good.

 
> > So judging implementations as votes ... only two implementations
> > that implement the RTC_WKALM_* call follow that rule, and most
> > don't.  However, a few implementations ignore rtc_wkalrm.enabled,
> > or otherwise mistreat that flag (e.g. rtc-ds1553 doesn't disable
> > AIE when enabled==0), so it's clear there are some issues there.
> 
> > My vote would be that closing the FD should not turn off the alarm.
> > It's supposed to be a one-shot deal anyway.
> 
>   I would agree with such behavior. But what's clear that the
> behavior, whatever it is, should be consistent across implementations,
> or it's just a mess ;-(.

Yep.  I've been known to submit patches to improve that, even ones
to the RTC framework.  :)
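
For readers who have not used the interface under discussion, here is a rough userspace sketch of setting a one-shot wakeup alarm with the RTC_WKALM_SET ioctl from <linux/rtc.h>. The helper names are made up for illustration, and the device path is whatever the caller passes (e.g. /dev/rtc). Whether the alarm stays armed after close() depends on the driver, which is exactly the inconsistency being argued about here.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/rtc.h>
#include <string.h>
#include <sys/ioctl.h>
#include <time.h>
#include <unistd.h>

/* Fill an rtc_wkalrm for a one-shot alarm at UTC time 'when'. */
static int fill_wkalrm(struct rtc_wkalrm *alm, time_t when)
{
	const struct tm *tm = gmtime(&when);

	assert(alm != NULL);
	if (!tm)
		return -1;
	memset(alm, 0, sizeof(*alm));
	alm->enabled = 1;	/* ask the driver to arm the alarm IRQ */
	alm->time.tm_sec  = tm->tm_sec;
	alm->time.tm_min  = tm->tm_min;
	alm->time.tm_hour = tm->tm_hour;
	alm->time.tm_mday = tm->tm_mday;
	alm->time.tm_mon  = tm->tm_mon;
	alm->time.tm_year = tm->tm_year;
	return 0;
}

/* Set the alarm on device 'dev', then close the fd. Returns 0 or -1. */
static int set_wake_alarm(const char *dev, time_t when)
{
	struct rtc_wkalrm alm;
	int fd, ret;

	if (fill_wkalrm(&alm, when) < 0)
		return -1;
	fd = open(dev, O_RDONLY);
	if (fd < 0)
		return -1;
	ret = ioctl(fd, RTC_WKALM_SET, &alm);
	close(fd);	/* alarm survival past this point is driver-dependent */
	return ret < 0 ? -1 : 0;
}
```

A caller would do something like set_wake_alarm("/dev/rtc", time(NULL) + 60) and then suspend; on drivers that disable alarm IRQs on release, the wakeup would not fire.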

 
> > And also, that someone audits the drivers/rtc code to make sure that
> > alarm-capable drivers handle the rtc_wkalrm.enabled flag correctly;
> > your patch sort of presumes that will happen, anyway.
> 
>   Yes, I mentioned, that for PXA/SA, my patch becomes actually useful
> only after applying your patch (plus, with fixed TODO: here's what
> I applied to handhelds.org tree:
> http://handhelds.org/cgi-bin/cvsweb.cgi/linux/kernel26/drivers/rtc/rtc-sa1100.c.diff?r1=1.5=1.6=h
> ).
> 
>   That of course doesn't mean sysfs alarm support patch depends on
> rtc-sa1100.c patch in any way (it's just PXA/SA won't actually wake up,
> but sysfs patch for showing/storing alarm properties obviously doesn't
> depend on any specific implementation).

Hmm, you're in a position to test, so I may send you an update to try.

That patch you applied looks right to me -- why don't you forward it
to Alessandro as a bugfix for 2.6.20-rc2, and save me the effort?

 
> >> Implement "alarm" attribute group for RTC classdevs. At this time,
> >> add "since_epoch", "wakeup_enabled", and "pending" attributes. First
> >> two support both read and write.
> 
> > I think you shouldn't add this group unless the RTC has methods
> > to read and write the alarm; there are RTCs that don't have that
> > feature.
> 
> > Also, I'd rather see a much simpler interface.  Like a single
> > "alarm" attribute.  It would display as the empty string

Re: [PATCH, RFC rc1-mm1] implement flush_work()

2006-12-18 Thread Andrew Morton

On Mon, 18 Dec 2006 23:17:14 +0300
Oleg Nesterov <[EMAIL PROTECTED]> wrote:

> Add ->current_work to the "struct cpu_workqueue_struct", it points to
> currently running "struct queue_work". When flush_work(work) detects
> ->current_work == work, it inserts a barrier at the _head_ of ->worklist
> (and thus right _after_ that work) and waits for completion. This means
> that the next work fired on that CPU will be this barrier, or another
> barrier queued by concurrent flush_work(), so the caller of flush_work()
> will be woken before any "regular" work has a chance to run.
> 
> Since __queue_work() does both set_wq_data() and list_add_tail() atomically
> under cwq->lock, flush_work() can remove the pending work from queue when
> it sees "get_wq_data(work) == cwq".

Seems sane.
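
The head-insertion idea quoted above can be tried out on a toy queue. This is a sketch under stated assumptions, not the patch itself: struct toy_cwq, TOY_BARRIER and all toy_* helpers are invented for illustration, works are plain integer ids, and the "remove a still-pending work from the queue" half of flush_work() is left out.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_QUEUE_MAX 8
#define TOY_BARRIER   (-1)	/* marker value for a barrier entry */

/* Toy per-CPU queue: items[0] is the next work to run. */
struct toy_cwq {
	int items[TOY_QUEUE_MAX];
	size_t len;
	int current_work;	/* id of the work running now, 0 if idle */
};

static void toy_queue_tail(struct toy_cwq *q, int work)
{
	assert(q->len < TOY_QUEUE_MAX);
	q->items[q->len++] = work;
}

static void toy_queue_head(struct toy_cwq *q, int work)
{
	assert(q->len < TOY_QUEUE_MAX);
	for (size_t i = q->len; i > 0; i--)
		q->items[i] = q->items[i - 1];
	q->items[0] = work;
	q->len++;
}

/* Pop the next entry and mark it as running; returns 0 if the queue is empty. */
static int toy_run_next(struct toy_cwq *q)
{
	int work;

	if (q->len == 0) {
		q->current_work = 0;
		return 0;
	}
	work = q->items[0];
	for (size_t i = 1; i < q->len; i++)
		q->items[i - 1] = q->items[i];
	q->len--;
	q->current_work = work;
	return work;
}

/* Returns 1 if a barrier was queued (the caller would then wait), else 0. */
static int toy_flush_work(struct toy_cwq *q, int work)
{
	if (q->current_work != work)
		return 0;
	toy_queue_head(q, TOY_BARRIER);	/* runs right after 'work' */
	return 1;
}
```

With works 1, 2, 3 queued and 1 running, toy_flush_work(&q, 1) makes the barrier the next entry popped, ahead of 2 and 3, so the waiter is woken before any regular work runs.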

> NOTE: flush_work() doesn't like no-auto-release works. Unless they go away
> we can fix this later or add the "don't do this" comment.

Yes, let's make the _NAR stuff go away please.  It's fairly
straightforward, and is on my todo list somewhere.

Re: [linux-pm] OOPS: divide error while s2dsk (2.6.20-rc1-mm1)

2006-12-18 Thread Rafael J. Wysocki

On Tuesday, 19 December 2006 00:17, Andrew Morton wrote:
> On Mon, 18 Dec 2006 23:38:23 +0100
> "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> 
> > > > Looks like we have a problem with slab shrinking here.
> > > > 
> > > > Could you please use gdb to check what exactly is at shrink_slab+0x9e?
> > > 
> > > Sure, but not till Friday, sorry (I am away).
> > 
> > I reproduced this on one box, but then it turned out that EIP was at line 195
> > of mm/vmscan.c where there was
> > 
> > do_div(delta, lru_pages + 1);
> 
> That implies that we passed it lru_pages=-1.
> 
> Presumably the logic in
> vmscanc-account-for-memory-already-freed-in-seeking-to.patch caused that.
> 
> > Well, I have no idea how this can lead to a divide error (lru_pages is
> > unsigned).
> > 
> > I'm unable to reproduce this on another i386 box, so it seems to be somewhat
> > configuration specific.
> > 
> 
> There is one wart in shrink_all_memory() and I think we should fix that in
> 2.6.20.
> 
> Please check the below.

Fine by me.

> I'll drop vmscanc-account-for-memory-already-freed-in-seeking-to.patch.  It
> has other stuff in it which we might still need.  But altering
> sc->swap_cluster_max in that manner looks odd.

Agreed.

Greetings,
Rafael

Re: libata and sata?

2006-12-18 Thread John Richard Moser



Alan wrote:
>> I no longer have two kernels to test through; I can't tell if the speed
>> is back or not.  Nothing in dmesg tells me if SATA is using DMA or
>> 32-bit IO support though, so I don't know... lack of knowledge over here
>> is killing me for troubleshooting this on my own.
> 
> The dmesg message shows the mode selected. It should be the highest speed
> but in one or two cases it selects UDMA33 only. I've fixed one of those
> caused by us relying on a bit not defined in older controllers. We've
> still got a case in the newer chips where BIOS setup doesn't set the
> flags properly. Old IDE has a hackish workaround for that and I'll
> probably end up porting it over.
> 
> 

It seems the highest speed here is UDMA/133.  That should be right...

I've let this go for now; except someone just brought up that copying
from one SATA drive to another slows Ubuntu to a crawl (which is what
I'm using, hence my dmesg should be relevant).  On my end I'm not
noticing; VLC used to hang the system horribly while trying to read like
20M videos (hard disk light on the whole time), now it behaves.


[   25.411977] sata_via :00:0f.0: version 2.0
[   25.411992] ACPI: PCI Interrupt :00:0f.0[B] -> Link [ALKA] -> GSI 20 (level, low) -> IRQ 18
[   25.412004] sata_via :00:0f.0: routed to hard irq line 11
[   25.412057] ata3: SATA max UDMA/133 cmd 0x9400 ctl 0x9802 bmdma 0xA400 irq 18
[   25.412363] ata4: SATA max UDMA/133 cmd 0x9C00 ctl 0xA002 bmdma 0xA408 irq 18
[   25.412380] scsi2 : sata_via
[   25.415286] 8139cp: 10/100 PCI Ethernet driver v1.3 (Mar 22, 2004)
[   25.598514] ieee1394: Host added: ID:BUS[0-00:1023]  GUID[]
[   25.613389] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[   25.738290] usb 2-1: device not accepting address 2, error -71
[   25.764951] ata3.00: ATA-7, max UDMA/133, 240121728 sectors: LBA
[   25.764954] ata3.00: ata3: dev 0 multi count 16
[   25.765730] ata3.00: configured for UDMA/133
[   25.765741] scsi3 : sata_via
[   25.967113] ata4: SATA link down 1.5 Gbps (SStatus 0 SControl 300)
[   25.977712] ATA: abnormal status 0x7F on port 0x9C07
[   25.977852] scsi 2:0:0:0: Direct-Access ATA  Maxtor 6Y120M0 YAR5 PQ: 0 ANSI: 5

-- 
We will enslave their women, eat their children and rape their
cattle!
  -- Bosc, Evil alien overlord from the fifth dimension

Re: Linux 2.6.18.6

2006-12-18 Thread Chris Wright

diff --git a/Makefile b/Makefile
index 85d8009..c8b2f7e 100644
--- a/Makefile
+++ b/Makefile
@@ -1,7 +1,7 @@
 VERSION = 2
 PATCHLEVEL = 6
 SUBLEVEL = 18
-EXTRAVERSION = .5
+EXTRAVERSION = .6
 NAME=Avast! A bilge rat!
 
 # *DOCUMENTATION*
diff --git a/arch/arm/kernel/calls.S b/arch/arm/kernel/calls.S
index 3173924..e8f7436 100644
--- a/arch/arm/kernel/calls.S
+++ b/arch/arm/kernel/calls.S
@@ -331,6 +331,19 @@
CALL(sys_mbind)
 /* 320 */  CALL(sys_get_mempolicy)
CALL(sys_set_mempolicy)
+   CALL(sys_openat)
+   CALL(sys_mkdirat)
+   CALL(sys_mknodat)
+/* 325 */  CALL(sys_fchownat)
+   CALL(sys_futimesat)
+   CALL(sys_fstatat64)
+   CALL(sys_unlinkat)
+   CALL(sys_renameat)
+/* 330 */  CALL(sys_linkat)
+   CALL(sys_symlinkat)
+   CALL(sys_readlinkat)
+   CALL(sys_fchmodat)
+   CALL(sys_faccessat)
 #ifndef syscalls_counted
 .equ syscalls_padding, ((NR_syscalls + 3) & ~3) - NR_syscalls
 #define syscalls_counted
diff --git a/arch/m32r/kernel/entry.S b/arch/m32r/kernel/entry.S
index ac6d840..5b01fd2 100644
--- a/arch/m32r/kernel/entry.S
+++ b/arch/m32r/kernel/entry.S
@@ -23,35 +23,35 @@
  * updated in fork.c:copy_thread, signal.c:do_signal,
  * ptrace.c and ptrace.h
  *
- * M32Rx/M32R2 M32R
- *   @(sp)  - r4   ditto
- *   @(0x04,sp) - r5   ditto
- *   @(0x08,sp) - r6   ditto
- *   @(0x0c,sp) - *pt_regs ditto
- *   @(0x10,sp) - r0   ditto
- *   @(0x14,sp) - r1   ditto
- *   @(0x18,sp) - r2   ditto
- *   @(0x1c,sp) - r3   ditto
- *   @(0x20,sp) - r7   ditto
- *   @(0x24,sp) - r8   ditto
- *   @(0x28,sp) - r9   ditto
- *   @(0x2c,sp) - r10  ditto
- *   @(0x30,sp) - r11  ditto
- *   @(0x34,sp) - r12  ditto
- *   @(0x38,sp) - syscall_nr   ditto
- *   @(0x3c,sp) - acc0h@(0x3c,sp) - acch
- *   @(0x40,sp) - acc0l@(0x40,sp) - accl
- *   @(0x44,sp) - acc1h@(0x44,sp) - dummy_acc1h
- *   @(0x48,sp) - acc1l@(0x48,sp) - dummy_acc1l
- *   @(0x4c,sp) - psw  ditto
- *   @(0x50,sp) - bpc  ditto
- *   @(0x54,sp) - bbpswditto
- *   @(0x58,sp) - bbpc ditto
- *   @(0x5c,sp) - spu (cr3)ditto
- *   @(0x60,sp) - fp (r13) ditto
- *   @(0x64,sp) - lr (r14) ditto
- *   @(0x68,sp) - spi (cr2)ditto
- *   @(0x6c,sp) - orig_r0  ditto
+ * M32R/M32Rx/M32R2
+ *   @(sp)  - r4
+ *   @(0x04,sp) - r5
+ *   @(0x08,sp) - r6
+ *   @(0x0c,sp) - *pt_regs
+ *   @(0x10,sp) - r0
+ *   @(0x14,sp) - r1
+ *   @(0x18,sp) - r2
+ *   @(0x1c,sp) - r3
+ *   @(0x20,sp) - r7
+ *   @(0x24,sp) - r8
+ *   @(0x28,sp) - r9
+ *   @(0x2c,sp) - r10
+ *   @(0x30,sp) - r11
+ *   @(0x34,sp) - r12
+ *   @(0x38,sp) - syscall_nr
+ *   @(0x3c,sp) - acc0h
+ *   @(0x40,sp) - acc0l
+ *   @(0x44,sp) - acc1h; ISA_DSP_LEVEL2 only
+ *   @(0x48,sp) - acc1l; ISA_DSP_LEVEL2 only
+ *   @(0x4c,sp) - psw
+ *   @(0x50,sp) - bpc
+ *   @(0x54,sp) - bbpsw
+ *   @(0x58,sp) - bbpc
+ *   @(0x5c,sp) - spu (cr3)
+ *   @(0x60,sp) - fp (r13)
+ *   @(0x64,sp) - lr (r14)
+ *   @(0x68,sp) - spi (cr2)
+ *   @(0x6c,sp) - orig_r0
  */
 
 #include 
@@ -95,17 +95,10 @@
 #define R11(reg)   @(0x30,reg)
 #define R12(reg)   @(0x34,reg)
 #define SYSCALL_NR(reg)@(0x38,reg)
-#if defined(CONFIG_ISA_M32R2) && defined(CONFIG_ISA_DSP_LEVEL2)
 #define ACC0H(reg) @(0x3C,reg)
 #define ACC0L(reg) @(0x40,reg)
 #define ACC1H(reg) @(0x44,reg)
 #define ACC1L(reg) @(0x48,reg)
-#elif defined(CONFIG_ISA_M32R2) || defined(CONFIG_ISA_M32R)
-#define ACCH(reg)  @(0x3C,reg)
-#define ACCL(reg)  @(0x40,reg)
-#else
-#error unknown isa configuration
-#endif
 #define PSW(reg)   @(0x4C,reg)
 #define BPC(reg)   @(0x50,reg)
 #define BBPSW(reg) @(0x54,reg)
diff --git a/arch/x86_64/kernel/setup.c b/arch/x86_64/kernel/setup.c
index 34afad7..ffcb9e4 100644
--- a/arch/x86_64/kernel/setup.c
+++ b/arch/x86_64/kernel/setup.c
@@ -1010,7 +1010,10 @@ static void __cpuinit init_intel(struct cpuinfo_x86 *c)
if ((c->x86 == 0xf && c->x86_model >= 0x03) ||
(c->x86 == 0x6 && c->x86_model >= 0x0e))
set_bit(X86_FEATURE_CONSTANT_TSC, &c->x86_capability);
-   set_bit(X86_FEATURE_SYNC_RDTSC, &c->x86_capability);
+   if (c->x86 == 15)
+   set_bit(X86_FEATURE_SYNC_RDTSC, &c->x86_capability);
+   else
+

Linux 2.6.18.6

2006-12-18 Thread Chris Wright

We (the -stable team) are announcing the release of the 2.6.18.6 kernel.
An assortment of important fixes with one security related fix that is
associated with less common bluetooth hardware:

1dca7c28: Bluetooth: Add packet size checks for CAPI messages (CVE-2006-6106)

The diffstat and short summary of the fixes are below.

I'll also be replying to this message with a copy of the patch between
2.6.18.5 and 2.6.18.6, as it is small enough to do so.

The updated 2.6.18.y git tree can be found at:  
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.18.y.git 
and can be browsed at the normal kernel.org git web browser:
www.kernel.org/git/ 

thanks,
-chris


 Makefile                                      |    2
 arch/arm/kernel/calls.S                       |   13 +
 arch/m32r/kernel/entry.S                      |   65 +++---
 arch/x86_64/kernel/setup.c                    |    5 +-
 drivers/ieee1394/ohci1394.c                   |   21 ++--
 drivers/md/dm-crypt.c                         |    6 +-
 drivers/md/dm-snap.c                          |    1
 drivers/media/dvb/frontends/lgdt330x.c        |    6 --
 drivers/media/video/tuner-simple.c            |    2
 drivers/media/video/tuner-types.c             |   14 -
 drivers/net/bonding/bond_main.c               |    2
 drivers/net/forcedeth.c                       |    3 +
 drivers/net/sunhme.c                          |    5 ++
 fs/compat.c                                   |    2
 include/asm-arm/unistd.h                      |   13 +
 include/asm-m32r/ptrace.h                     |   28 +--
 include/asm-m32r/sigcontext.h                 |   13 -
 kernel/softirq.c                              |    2
 net/bluetooth/cmtp/capi.c                     |   39 +--
 net/bridge/netfilter/ebtables.c               |   54 +
 net/ieee80211/softmac/ieee80211softmac_scan.c |    2
 net/ipv4/netfilter/ip_tables.c                |    5 +-
 net/ipv4/route.c                              |    2
 net/ipv4/xfrm4_policy.c                       |    2
 net/irda/irttp.c                              |    4 -
 net/sched/act_gact.c                          |    4 -
 net/sched/act_police.c                        |   26 --
 27 files changed, 199 insertions(+), 142 deletions(-)

Summary of changes from v2.6.18.5 to v2.6.18.6


Al Viro (4):
  EBTABLES: Fix wraparounds in ebt_entries verification.
  EBTABLES: Verify that ebt_entries have zero ->distinguisher.
  EBTABLES: Deal with the worst-case behaviour in loop checks.
  EBTABLES: Prevent wraparounds in checks for entry components' sizes.

Andrey Mirkin (1):
  skip data conversion in compat_sys_mount when data_page is NULL

Andy Gospodarek (1):
  bonding: incorrect bonding state reported via ioctl

Arjan van de Ven (1):
  x86-64: Mark rdtsc as sync only for netburst, not for core2

Chris Wright (1):
  Linux 2.6.18.6

Christophe Saout (1):
  dm crypt: Fix data corruption with dm-crypt over RAID5

Daniel Barkalow (1):
  forcedeth: Disable INTx when enabling MSI in forcedeth

David Miller (3):
  PKT_SCHED act_gact: division by zero
  XFRM: Use output device disable_xfrm for forwarded packets
  IPSEC: Fix inetpeer leak in ipv4 xfrm dst entries.

Hans Verkuil (1):
  V4L: Fix broken TUNER_LG_NTSC_TAPE radio support

Hirokazu Takata (1):
  m32r: make userspace headers platform-independent

Jeet Chaudhuri (1):
  IrDA: Incorrect TTP header reservation

Jurij Smakov (1):
  SUNHME: Fix for sunhme failures on x86

Marcel Holtmann (1):
  Bluetooth: Add packet size checks for CAPI messages (CVE-2006-6106)

Michael Buesch (1):
  softmac: remove netif_tx_disable when scanning

Michael Krufky (1):
  DVB: lgdt330x: fix signal / lock status detection bug

Milan Broz (1):
  dm snapshot: fix freeing pending exception

Patrick McHardy (2):
  NET_SCHED: policer: restore compatibility with old iproute binaries
  NETFILTER: ip_tables: revision support for compat code

Russell King (1):
  ARM: Add sys_*at syscalls

Stefan Richter (1):
  ieee1394: ohci1394: add PPC_PMAC platform code to driver probe

Zachary Amsden (1):
  softirq: remove BUG_ONs which can incorrectly trigger


Re: [PATCH, RFC] reimplement flush_workqueue()

2006-12-18 Thread Oleg Nesterov

On 12/18, Andrew Morton wrote:
>
> On Mon, 18 Dec 2006 01:34:16 +0300
> Oleg Nesterov <[EMAIL PROTECTED]> wrote:
> 
> > NOTE: I removed 'int cpu' parameter, flush_workqueue() locks/unlocks
> > workqueue_mutex unconditionally. It may be restored, but I think it
> > doesn't make much sense, we take the mutex for the very short time,
> > and the code becomes simpler.
> > 
> 
> Taking workqueue_mutex() unconditionally in flush_workqueue() means
> that we'll deadlock if a single-threaded workqueue callback handler calls
> flush_workqueue().

Well. But flush_workqueue() drops workqueue_mutex before going to sleep ?

flush_workqueue(single_threaded_wq);
...
mutex_lock(&workqueue_mutex);
...
mutex_unlock(&workqueue_mutex);
wait_for_completion();
handler runs,
calls flush_workqueue(),
workqueue_mutex is free
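
The sequence above, reduced to a toy model. Everything here is hypothetical scaffolding (struct toy_mutex and the toy_* helpers are not kernel API); the "mutex" is a flag and the wait is simulated by simply running the nested flush inline. It only illustrates why dropping the mutex before waiting matters.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy non-recursive mutex: trylock fails if it is already held. */
struct toy_mutex {
	bool held;
};

static bool toy_trylock(struct toy_mutex *m)
{
	if (m->held)
		return false;
	m->held = true;
	return true;
}

static void toy_unlock(struct toy_mutex *m)
{
	assert(m->held);
	m->held = false;
}

/*
 * Model one flush whose wait lets a handler run a second, nested flush.
 * 'drop_before_wait' selects whether the mutex is released before waiting.
 * Returns true if the nested flush could take the mutex (no deadlock).
 */
static bool toy_nested_flush_ok(bool drop_before_wait)
{
	struct toy_mutex m = { false };
	bool nested_ok;

	if (!toy_trylock(&m))		/* outer flush takes the mutex */
		return false;
	if (drop_before_wait)
		toy_unlock(&m);		/* released before sleeping */

	nested_ok = toy_trylock(&m);	/* handler's own flush */
	if (nested_ok)
		toy_unlock(&m);

	if (!drop_before_wait)
		toy_unlock(&m);		/* outer flush releases only now */
	return nested_ok;
}
```

In this model toy_nested_flush_ok(true) succeeds and toy_nested_flush_ok(false) does not, which corresponds to the two readings of the code being discussed.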

> It's an idiotic thing to do, but I think I spotted a site last week which
> does this.  scsi?  Not sure..

Ok, it is time to sleep. I'll look tomorrow and re-send if flush_cpu_workqueue()
really needs "bool workqueue_mutex_is_locked" parameter.

Oleg.


Re: GPL only modules

2006-12-18 Thread Paul Mackerras

Linus Torvalds writes:
> 
> 
> On Tue, 19 Dec 2006, Paul Mackerras wrote:
> >
> > There is in fact a pretty substantial non-technical difference between
> > static and dynamic linking.  If I create a binary by static linking
> > and I include some library, and I distribute that binary to someone
> > else, the recipient doesn't need to have a separate copy of the
> > library, because they get one in the binary.
> 
> I agree, and I do agree that it's a real difference. 
> 
> I personally think that it's the "aggregation" issue, not a "derivation" 
> issue, but I'll freely admit that it's just my personal view of the 
> situation.

I think the critical issue is whether any human creativity is required
to establish derivation.

Clearly there is some modification and adaptation that ld does to a
library in the process of linking it into a binary, and ld is unlike
mkisofs or gzip in that you can't extract the library in its original
form (or any form suitable for linking with another program) from the
output of ld --static.

The question is whether it matters that the process that ld does is
mechanical in nature.  This is possibly an area where you'll get a
different answer in different jurisdictions.  I believe that in the
US, some creative input is required to establish copyright, whereas in
Australia, only "effort" is needed.  I don't know whether that affects
the definition of "derived work".

I personally would think that a mechanical process of modification
*does* create a derived work, but it would take a court of law or a
legislature to make an authoritative decision, I guess.

Paul.

Re: 2.6.20-rc1-mm1

2006-12-18 Thread Andrew Morton

On Mon, 18 Dec 2006 16:29:02 -0800
Randy Dunlap <[EMAIL PROTECTED]> wrote:

> On Thu, 14 Dec 2006 22:59:13 -0800 Andrew Morton wrote:
> 
> Got this on booting up on x86_64 test box.
> Didn't happen on next boot.
> 
> 
> BUG: scheduling while atomic: hald-addon-stor/0x2000/3300
> 
> Call Trace:
>  [] show_trace+0x34/0x47
>  [] dump_stack+0x12/0x17
>  [] __sched_text_start+0x5d/0x7ba
>  [] __cond_resched+0x1c/0x44
>  [] cond_resched+0x29/0x30
>  [] __reacquire_kernel_lock+0x26/0x44
>  [] thread_return+0xac/0xea
>  [] __cond_resched+0x1c/0x44
>  [] cond_resched+0x29/0x30
>  [] wait_for_completion+0x17/0xd2
>  [] blk_execute_rq+0x98/0xb8
>  [] scsi_execute+0xd4/0xf1
>  [] scsi_execute_req+0xb9/0xde
>  [] scsi_test_unit_ready+0x39/0x75
>  [] sd_media_changed+0x40/0x87
>  [] check_disk_change+0x1f/0x76
>  [] sd_open+0x80/0x113
>  [] do_open+0x9f/0x2a7
>  [] blkdev_open+0x2e/0x5d
>  [] __dentry_open+0xd9/0x1a7
>  [] do_filp_open+0x2a/0x38
>  [] do_sys_open+0x44/0xc8
>  [] system_call+0x7e/0x83

Bit 29 of current->preempt_count got set.  I don't think there's any way in
which that can happen normally.  So probably some hardware or software
error reached out and flipped that bit.
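
For reference, a mask with only bit 29 set is 0x20000000. The value in the quoted BUG line is shorter than that, which may just be digits lost in the archive; that is a guess. A small helper for checking which single bit is set in such a value (toy_single_set_bit() is an illustrative name, not a kernel function):

```c
#include <assert.h>

/* Index of the only set bit in v, or -1 if v has zero or several bits set. */
static int toy_single_set_bit(unsigned int v)
{
	int bit = 0;

	if (v == 0 || (v & (v - 1)) != 0)
		return -1;
	assert(v != 0);
	while ((v & 1u) == 0) {
		v >>= 1;
		bit++;
	}
	return bit;
}
```

A lone flipped bit is what one would expect from a stray write or a memory error, as opposed to a miscounted nesting level, which would change the low bits.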

FYI: [patch 2.6.20-rc1 0/6] I2C driver model updates, part II

2006-12-18 Thread David Brownell

This is just a heads-up to the folk who read LKML more than more specialized
Linux lists ... there's work afoot to clean up the I2C core and make it fit
the driver model better.  (Some would say "overdue work...".)

The most interesting/useful part (IMO) is summarized in the appended message;
you can read list archives starting at:

  http://lists.lm-sensors.org/pipermail/i2c/2006-December/000633.html

The "part I" stuff gets rid of i2c_adapter.dev (it was pretty pointless),
which opens the door to eliminating even more of the redundancy between
i2c-core and the driver model.  It also lets I2C device drivers use the
standard driver model suspend()/resume()/shutdown mechanisms.  Innocuous,
despite the number of i2c_adapter.dev users that needed to change.

The "part II" bits basically leave "legacy" I2C drivers alone; they add
support for "new style" drivers.  The difference is that legacy drivers
create their own device nodes, and their probe() routines look at busses;
while new style drivers work like any other driver in current Linux,
with probe() being handed a pre-created device node (i2c_client).
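
To make the "new style" idea concrete, here is a toy userspace model of binding drivers to pre-declared devices. None of these names come from the patches: struct toy_board_info, struct toy_driver and toy_bind_all() are hypothetical, and matching by type string is an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical board-level declaration: which chip sits where. */
struct toy_board_info {
	int bus;		/* bus number */
	int addr;		/* device address on that bus */
	const char *type;	/* chip type, matched against driver names */
};

/* Hypothetical driver: probe() is handed an already-declared device. */
struct toy_driver {
	const char *name;
	int (*probe)(const struct toy_board_info *dev);
};

/* Bind each declared device to the driver of the same name.
 * Returns the number of successful probes. */
static size_t toy_bind_all(const struct toy_board_info *devs, size_t ndevs,
			   const struct toy_driver *drvs, size_t ndrvs)
{
	size_t bound = 0;

	assert(devs != NULL && drvs != NULL);
	for (size_t i = 0; i < ndevs; i++)
		for (size_t j = 0; j < ndrvs; j++)
			if (strcmp(devs[i].type, drvs[j].name) == 0 &&
			    drvs[j].probe(&devs[i]) == 0)
				bound++;
	return bound;
}

/* Example probe: accept any device on a non-negative bus number. */
static int toy_rtc_probe(const struct toy_board_info *dev)
{
	return dev->bus >= 0 ? 0 : -1;
}
```

The point of the model is that the driver never scans a bus: the device list exists before probe() is called, so no probing heuristics are needed.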

I think the plan is to get this into MM soonish; "part I" being nearly
ready, and "part II" probably still needs a bit of tweaking.  (I have
my own notions, and suspect at least one person on LKML will have some
opinions to share...)  There are already OpenFirmware conversion patches,
and driver/platform conversions will as usual take a while to sort out.

- Dave



--  Forwarded Message  --

Subject: [patch 2.6.20-rc1 0/6] I2C driver model updates, part II
Date: Monday 18 December 2006 2:59 pm
From: David Brownell <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]

As promised, the second set of patches ... adding "new style" driver
probe support.  This will help at least

 - Embedded I2C in general (which need IRQs, and board-specific config)
 - OMAP (where I2C can't use SMBUS_QUICK)
 - RTCs (which for various reasons often can't use SMBUS_QUICK probing)
 - OpenFirmware (which provides tables of I2C devices)
 - DVB (I'm told the probing is problematic there too)

These patches apply after the latest "part I" patches (sent to this list,
with three updates posted after feedback from Jean).  The patches are:

 - Support probe(), and hotplug/coldplug
 - Support remove()
 - Document the "new style" driver model probe() and remove()
 - Drive new style driver binding by {bus#, dev#, info} device declarations
 - Update the i2c-omap adapter driver to use bus# matching chip docs
 - Update one OMAP board, and a driver, to use the new infrastructure

I expect that last patch will get split in two; for this series it's
best viewed as an example.

- Dave


---

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa

On Mon, 2006-12-18 at 16:04 -0800, Linus Torvalds wrote:
> 
> On Tue, 19 Dec 2006, Andrei Popa wrote:
> > > 
> > > There's exactly two call sites that call "page_mkclean()" (and that is
> > > the only thing in turn that calls "page_mkclean_one()", which we already
> > > determined will cause the corruption).
> > >
> > > Can you just TOTALLY DISABLE that case for the test_clear_page_dirty()
> > > case? Just do an "#if 0 .. #endif" around that whole if-statement,
> > > leaving the _only_ thing that actually calls "page_mkclean()" to be the
> > > "clear_page_dirty_for_io()" call.
> > > 
> > > Do you still see corruption?
> > 
> > nope, no file corruption at all.
> 
> Ok. That's interesting, but I think you actually #ifdef'ed out too 
> much:
> 
> > +
> > +#if 0
> > if (TestClearPageDirty(page)) {
> > radix_tree_tag_clear(>page_tree,
> > page_index(page), PAGECACHE_TAG_DIRTY);
> > @@ -866,11 +868,19 @@ int test_clear_page_dirty(struct page *p
> >  * page is locked, which pins the address_space
> >  */
> > if (mapping_cap_account_dirty(mapping)) {
> > -   page_mkclean(page);
> > +   int cleaned = page_mkclean(page);
> > +   if (!must_clean_ptes && cleaned){
> > +   WARN_ON(1);
> > +   set_page_dirty(page);
> > +   }
> > +
> > dec_zone_page_state(page, NR_FILE_DIRTY);
> > }
> > return 1;
> > }
> > +
> > +#endif
> > +
> 
> It was really just the _inner_ "if (mapping_cap_account_dirty(.." 
> statement that I meant you should remove.
> 
> Can you try that too?

I have file corruption: "Hash check on download completion found bad
chunks, consider using "safe_sync"."


diff --git a/fs/buffer.c b/fs/buffer.c
index d1f1b54..263f88e 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *pag
int ret = 0;
 
BUG_ON(!PageLocked(page));
-   if (PageWriteback(page))
+   if (PageDirty(page) || PageWriteback(page))
return 0;
 
if (mapping == NULL) {  /* can this still happen? */
@@ -2845,22 +2845,6 @@ int try_to_free_buffers(struct page *pag
spin_lock(&mapping->private_lock);
ret = drop_buffers(page, &buffers_to_free);
spin_unlock(&mapping->private_lock);
-   if (ret) {
-   /*
-* If the filesystem writes its buffers by hand (eg ext3)
-* then we can have clean buffers against a dirty page.  We
-* clean the page here; otherwise later reattachment of buffers
-* could encounter a non-uptodate page, which is unresolvable.
-* This only applies in the rare case where try_to_free_buffers
-* succeeds but the page is not freed.
-*
-* Also, during truncate, discard_buffer will have marked all
-* the page's buffers clean.  We discover that here and clean
-* the page also.
-*/
-   if (test_clear_page_dirty(page))
-   task_io_account_cancelled_write(PAGE_CACHE_SIZE);
-   }
 out:
if (buffers_to_free) {
struct buffer_head *bh = buffers_to_free;
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 0f05cab..2d8 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1245,7 +1245,7 @@ retry:
wait_on_page_writeback(page);
 
if (PageWriteback(page) ||
-   !test_clear_page_dirty(page)) {
+   !test_clear_page_dirty(page, 0)) {
unlock_page(page);
break;
}
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 1387749..da2bdb1 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -484,7 +484,7 @@ static int fuse_commit_write(struct file
spin_unlock(>lock);
 
if (offset == 0 && to == PAGE_CACHE_SIZE) {
-   clear_page_dirty(page);
+   clear_page_dirty(page, 0);
SetPageUptodate(page);
}
}
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index ed2c223..9f82cd0 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct
 
 static void truncate_huge_page(struct page *page)
 {
-   clear_page_dirty(page);
+   clear_page_dirty(page, 0);
ClearPageUptodate(page);
remove_from_page_cache(page);
put_page(page);
diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
index b1a1c72..5e29b37 100644
--- a/fs/jfs/jfs_metapage.c
+++ b/fs/jfs/jfs_metapage.c
@@ -773,7 +773,7 @@ #if MPS_PER_PAGE == 1
 
/* Retest mp->count since we may have released page

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Linus Torvalds

On Tue, 19 Dec 2006, Andrei Popa wrote:
> 
> the corrupted file has a chunk full of zeros
> 
> http://193.226.119.62/corruption0.jpg
> http://193.226.119.62/corruption1.jpg

Thanks. Yup, filled with zeroes, and the corruption stops (but does _not_ 
start) at a page boundary.

That _does_ look very much like it was filled in linearly, then written 
out to disk when it was in the middle of the page, and then we simply lost 
the further writes that should also have gone on to that page. All 
consistent with dropping a dirty bit somewhere in the middle of the page 
updates.

Which we kind of knew must be the issue anyway, but it's good to know that 
the corruption pattern is consistent with what we're trying to figure out.
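
The pattern described above (a zero-filled run that ends on a page boundary but does not start on one) can be checked mechanically on a copy of a corrupted file. A sketch, assuming a 4096-byte page size; the names toy_run, toy_longest_zero_run() and toy_page_aligned() are made up for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u	/* assumed page size */

/* Half-open byte range [start, end). */
struct toy_run {
	size_t start;
	size_t end;
};

/* Find the longest run of zero bytes; returns false if there is none. */
static bool toy_longest_zero_run(const unsigned char *buf, size_t len,
				 struct toy_run *out)
{
	size_t best_start = 0, best_len = 0, i = 0;

	assert(out != NULL);
	while (i < len) {
		size_t s;

		if (buf[i] != 0) {
			i++;
			continue;
		}
		s = i;
		while (i < len && buf[i] == 0)
			i++;
		if (i - s > best_len) {
			best_len = i - s;
			best_start = s;
		}
	}
	if (best_len == 0)
		return false;
	out->start = best_start;
	out->end = best_start + best_len;
	return true;
}

/* True if the byte offset falls exactly on a page boundary. */
static bool toy_page_aligned(size_t off)
{
	return off % TOY_PAGE_SIZE == 0;
}
```

A run with an unaligned start and an aligned end matches the "written out mid-page, later writes lost" reading; legitimately zero regions in the data would need to be ruled out separately.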

Linus

Re: [PATCH, RFC] reimplement flush_workqueue()

2006-12-18 Thread Andrew Morton

On Mon, 18 Dec 2006 01:34:16 +0300
Oleg Nesterov <[EMAIL PROTECTED]> wrote:

> Remove ->remove_sequence, ->insert_sequence, and ->work_done from
> struct cpu_workqueue_struct. To implement flush_workqueue() we can
> queue a barrier work on each CPU and wait for its completion.

Seems sensible.  I seem to recall considering doing it that way when I
initially implemented flush_workqueue(), but I don't recall why I didn't do
this.  hmm.

> We don't need to worry about CPU going down while we are sleeping
> on the completion. take_over_work() will move this work on another
> CPU, and the handler will wake us up eventually.
> 
> NOTE: I removed 'int cpu' parameter, flush_workqueue() locks/unlocks
> workqueue_mutex unconditionally. It may be restored, but I think it
> doesn't make much sense, we take the mutex for the very short time,
> and the code becomes simpler.
> 

Taking workqueue_mutex() unconditionally in flush_workqueue() means
that we'll deadlock if a single-threaded workqueue callback handler calls
flush_workqueue().

It's an idiotic thing to do, but I think I spotted a site last week which
does this.  scsi?  Not sure..

Re: 2.6.20-rc1-mm1

2006-12-18 Thread Randy Dunlap

On Thu, 14 Dec 2006 22:59:13 -0800 Andrew Morton wrote:

Got this on booting up on x86_64 test box.
Didn't happen on next boot.


BUG: scheduling while atomic: hald-addon-stor/0x2000/3300

Call Trace:
 [] show_trace+0x34/0x47
 [] dump_stack+0x12/0x17
 [] __sched_text_start+0x5d/0x7ba
 [] __cond_resched+0x1c/0x44
 [] cond_resched+0x29/0x30
 [] __reacquire_kernel_lock+0x26/0x44
 [] thread_return+0xac/0xea
 [] __cond_resched+0x1c/0x44
 [] cond_resched+0x29/0x30
 [] wait_for_completion+0x17/0xd2
 [] blk_execute_rq+0x98/0xb8
 [] scsi_execute+0xd4/0xf1
 [] scsi_execute_req+0xb9/0xde
 [] scsi_test_unit_ready+0x39/0x75
 [] sd_media_changed+0x40/0x87
 [] check_disk_change+0x1f/0x76
 [] sd_open+0x80/0x113
 [] do_open+0x9f/0x2a7
 [] blkdev_open+0x2e/0x5d
 [] __dentry_open+0xd9/0x1a7
 [] do_filp_open+0x2a/0x38
 [] do_sys_open+0x44/0xc8
 [] system_call+0x7e/0x83
 [<2b6c5bb34580>]

---
~Randy
kconfig:  http://oss.oracle.com/~rdunlap/configs/config-2620-rc1mm1
full log:  http://oss.oracle.com/~rdunlap/logs/2620-rc1mm1.out

Re: [PATCH] Fix sparsemem on Cell

2006-12-18 Thread KAMEZAWA Hiroyuki

On Mon, 18 Dec 2006 15:16:20 -0800
Dave Hansen <[EMAIL PROTECTED]> wrote:

> enum context
> {
> EARLY,
> HOTPLUG
> };
I like this :)

Thanks,
-Kame


Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa

On Mon, 2006-12-18 at 14:45 -0800, Linus Torvalds wrote:
> 
> On Mon, 18 Dec 2006, Alessandro Suardi wrote:
> > 
> > No idea whether this can be a data point or not, but
> > here it goes... my P2P box is about to turn 5 days old
> > while running nonstop one or both of aMule 2.1.3 and
> > BitTorrent 4.4.0 on ext3 mounted w/default options
> > on both IDE and USB disks. Zero corruption.
> > 
> > AMD K7-800, 512MB RAM, PREEMPT/UP kernel,
> > 2.6.19-git20 on top of up-to-date FC6.
> 
> It _looks_ like PREEMPT/SMP is one common configuration.
> 
> It might also be that the blocksize of the filesystem matters. 4kB 
> filesystems are fundamentally simpler than 1kB filesystems, for example. 
> You can tell at least with "/sbin/dumpe2fs -h /dev/..." or something.
> 
> Andrei - one thing that might be interesting to see: when corruption 
> occurs, can you get the corrupted file somehow? And compare it with a 
> known-good copy to see what the corruption looks like?

the corrupted file has a chunk full of zeros

http://193.226.119.62/corruption0.jpg
http://193.226.119.62/corruption1.jpg




Re: [solved] Yenta Cardbus allocation failure

2006-12-18 Thread Markus Rechberger


I kept investigating and found the cause, though I'm not sure the
solution is acceptable.

seems like the memory range gets preallocated in setup-bus.c, and
CARDBUS_MEM_SIZE defines that size.

I changed
#define CARDBUS_MEM_SIZE	(32*1024*1024)
to
#define CARDBUS_MEM_SIZE	(48*1024*1024)

and now the system is able to allocate the resources for the 3rd
pci/pcmcia function.
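
The arithmetic behind the change: lspci reports three functions with a 16M memory region each, so they need 48 MiB in total, which cannot fit in a 32 MiB window. A small sketch of that check follows. `regions_fit` and `MiB` are made-up names for illustration, and the window is assumed to start on a 16 MiB boundary so that alignment padding can be ignored.

```c
#include <assert.h>
#include <stddef.h>

#define MiB (1024UL * 1024UL)

/*
 * Return 1 if `count` regions of `region_size` bytes each fit into a
 * window of `window_size` bytes, 0 otherwise. Alignment padding is
 * ignored; that only holds when all regions share one power-of-two
 * size and the window base is aligned to it (an assumption here).
 */
static int regions_fit(unsigned long window_size, unsigned long region_size,
		       size_t count)
{
	assert(region_size > 0);
	return count <= window_size / region_size;
}
```

With these assumptions, `regions_fit(32 * MiB, 16 * MiB, 3)` is false and `regions_fit(48 * MiB, 16 * MiB, 3)` is true, which matches the third function failing before the change and succeeding after it.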

Can anyone please have a closer look at it too? I think the whole
implementation isn't really good there..

so this is the new output of iomem:
$ cat /proc/iomem
...
3000-35ff : PCI Bus #02
 3000-32ff : PCI CardBus #03
3600-360003ff : :00:1f.1
3900-3bff : PCI CardBus #03
 3900-39ff : :03:00.0
 3a00-3aff : :03:00.1
 3b00-3bff : :03:00.2 <- this one failed to allocate previously
3c00-3eff : PCI CardBus #07
4100-43ff : PCI CardBus #07
...

and lspci:
$ lspci -vvv
03:00.0 Multimedia video controller: Conexant CX23880/1/2/3 PCI Video
and Audio Decoder (rev 05)
   Subsystem: Yuan Yuan Enterprise Co., Ltd. Unknown device 1788
   Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
   Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium

TAbort- SERR- 
   Interrupt: pin A routed to IRQ 10
   Region 0: Memory at 3900 (32-bit, non-prefetchable)
[disabled] [size=16M]
   Capabilities: [44] Vital Product Data
   Capabilities: [4c] Power Management version 2
   Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
   Status: D0 PME-Enable- DSel=0 DScale=0 PME-

03:00.1 Multimedia controller: Conexant CX23880/1/2/3 PCI Video and
Audio Decoder [Audio Port] (rev 05)
   Subsystem: Yuan Yuan Enterprise Co., Ltd. Unknown device 1788
   Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
   Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium

TAbort- SERR- 
   Interrupt: pin A routed to IRQ 10
   Region 0: Memory at 3a00 (32-bit, non-prefetchable)
[disabled] [size=16M]
   Capabilities: [4c] Power Management version 2
   Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
   Status: D0 PME-Enable- DSel=0 DScale=0 PME-

03:00.2 Multimedia controller: Conexant CX23880/1/2/3 PCI Video and
Audio Decoder [MPEG Port] (rev 05)
   Subsystem: Yuan Yuan Enterprise Co., Ltd. Unknown device 1788
   Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
   Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium

TAbort- SERR- 
   Interrupt: pin A routed to IRQ 10
NEW --> Region 0: Memory at 3b00 (32-bit, non-prefetchable)
[disabled] [size=16M]
   Capabilities: [4c] Power Management version 2
   Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
   Status: D0 PME-Enable- DSel=0 DScale=0 PME-


thanks,
Markus

On 12/12/06, Markus Rechberger <[EMAIL PROTECTED]> wrote:

Hi,

I've got a PCMCIA Hybrid TV tuner, but when I plug it in it fails to
allocate resources for the 3rd PCI function.
I already searched with google and someone implemented an option

parm:   override_bios:yenta ignore bios resource allocation (uint)

in yenta_socket, though this doesn't seem to work out with that device.

Any idea how that problem can be solved?

So here are some logs

lspci:
:03:00.0 Multimedia video controller: Conexant CX23880/1/2/3 PCI
Video and Audio Decoder (rev 05)
Subsystem: Yuan Yuan Enterprise Co., Ltd.: Unknown device 1788
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium
>TAbort- SERR- TAbort- SERR- TAbort- SERR- 

Re: Open letter to Linux kernel developers (was Re: Binary Drivers)

2006-12-18 Thread Jesper Juhl


On 18/12/06, Hannu Savolainen <[EMAIL PROTECTED]> wrote:

Marek Wawrzyczny wrote:
> Dear Linux Kernel ML,
>
> I am writing as a Linux-only user of over 2 years to express my concern with
> the recent proposal to block out closed source modules from the kernel.
>
> While, I understand and share your sentiments over open source software and
> drivers. I fear however, that trying to steamroll the industry into
> developing open source drivers by banning closed source drivers is going to
> have a completely different result. They will simply abandon Linux support
> for some of their products altogether.
>
As a developer of some "closed source" drivers I can confirm that this
is exactly the case. I would never consider open sourcing my work just
because somebody is pointing pistol to my neck. I would leave the whole
IT business and start doing something else rather than accept this kind
of mafia-like negotiation methods.



Why is this dead horse still kicking?
Linus already spoke on this issue (
http://lkml.org/lkml/2006/12/13/370 ,
http://lkml.org/lkml/2006/12/14/218 ) and Greg KH already withdrew his
patch ( http://lkml.org/lkml/2006/12/14/63 ), so could we please just
let this dead horse rest in peace?

--
Jesper Juhl <[EMAIL PROTECTED]>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please  http://www.expita.com/nomime.html

Re: [patch 2/2] agpgart - Remove unnecessary flushes.

2006-12-18 Thread Dave Jones

On Fri, Dec 08, 2006 at 07:24:37PM +0100, Thomas Hellström wrote:
 > This patch is to speed up flipping of pages in and out of the AGP 
 > aperture as needed by the new drm memory manager.
 > 
 > A number of global cache flushes are removed as well as some PCI posting 
 > flushes.
 > The following guidelines have been used:
 > 
 > 1) Memory that is only mapped uncached and that has been subject to a 
 > global cache flush after the mapping was changed to uncached does not 
 > need any more cache flushes. Neither before binding to the aperture nor 
 > after unbinding.
 > 
 > 2) Only do one PCI posting flush after a sequence of writes modifying 
 > page entries in the GATT.
 > 
 > Patch against davej's agpgart.git

I looked at applying this one to agpgart.git, as it's less controversial
than the other patch. However,..

- MIME : just say no. I had to hand fix up a few things before git would
  even see that I was feeding it a diff.
- No Signed-off-by: line.
- The diff adds trailing whitespace. This makes git sad also.
  (This problem also affects the other diff, which is possibly why...)
- Finally..
   error: patch failed: drivers/char/agp/generic.c:1076
   error: drivers/char/agp/generic.c: patch does not apply
   error: patch failed: drivers/char/agp/intel-agp.c:256
   error: drivers/char/agp/intel-agp.c: patch does not apply

  Perhaps this diff should have been [1/2] instead.

Can you fix those up, and resend this one?

The other diff I want to chew over some more before applying, especially
after Arjan's comments.

Dave

-- 
http://www.codemonkey.org.uk

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Linus Torvalds



On Tue, 19 Dec 2006, Andrei Popa wrote:
> > 
> > There's exactly two call sites that call "page_mkclean()" (and that is the 
> > only thing in turn that calls "page_mkclean_one()", which we already 
> > determined will cause the corruption). 
> >
> > Can you just TOTALLY DISABLE that case for the test_clear_page_dirty() 
> > case? Just do an "#if 0 .. #endif" around that whole if-statement, leaving 
> > the _only_ thing that actually calls "page_mkclean()" to be the 
> > "clear_page_dirty_for_io()" call.
> > 
> > Do you still see corruption?
> 
> nope, no file corruption at all.

Ok. That's interesting, but I think you actually #ifdef'ed out too 
much:

> +
> +#if 0
>   if (TestClearPageDirty(page)) {
>   radix_tree_tag_clear(&mapping->page_tree,
>   page_index(page), PAGECACHE_TAG_DIRTY);
> @@ -866,11 +868,19 @@ int test_clear_page_dirty(struct page *p
>* page is locked, which pins the address_space
>*/
>   if (mapping_cap_account_dirty(mapping)) {
> - page_mkclean(page);
> + int cleaned = page_mkclean(page);
> + if (!must_clean_ptes && cleaned){
> + WARN_ON(1);
> + set_page_dirty(page);
> + }
> +
>   dec_zone_page_state(page, NR_FILE_DIRTY);
>   }
>   return 1;
>   }
> +
> +#endif
> +

It was really just the _inner_ "if (mapping_cap_account_dirty(.." 
statement that I meant you should remove.

Can you try that too?

Linus

[PATCH 1/1] Char: isicom, correct probing/removing

2006-12-18 Thread Jiri Slaby

isicom, correct probing/removing

Don't forget to decrease card_count in fail paths and in remove function.
Also null board->base in such cases to point out, that this structure is
unused and thus can be reassigned.

Signed-off-by: Jiri Slaby <[EMAIL PROTECTED]>

---
commit ab95fdae2db7f8fded639796814079441f04a3e2
tree 07b12dfe0e0c1e79c79aac160a5ccd24e2cfa3d3
parent f2aae537dbeeed215a444f386f0cf6dd93a463fd
author Jiri Slaby <[EMAIL PROTECTED]> Tue, 19 Dec 2006 00:55:13 +0100
committer Jiri Slaby <[EMAIL PROTECTED]> Tue, 19 Dec 2006 00:55:13 +0100

 drivers/char/isicom.c |   16 +++++++++-------
 1 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/char/isicom.c b/drivers/char/isicom.c
index d99a73e..dd361ff 100644
--- a/drivers/char/isicom.c
+++ b/drivers/char/isicom.c
@@ -1747,7 +1747,7 @@ end:
 /*
  * Insmod can set static symbols so keep these static
  */
-static int card;
+static unsigned int card_count;
 
 static int __devinit isicom_probe(struct pci_dev *pdev,
const struct pci_device_id *ent)
@@ -1757,7 +1757,7 @@ static int __devinit isicom_probe(struct pci_dev *pdev,
u8 pciirq;
struct isi_board *board = NULL;
 
-   if (card >= BOARD_COUNT)
+   if (card_count >= BOARD_COUNT)
goto err;
 
ioaddr = pci_resource_start(pdev, 3);
@@ -1775,7 +1775,7 @@ static int __devinit isicom_probe(struct pci_dev *pdev,
board->index = index;
board->base = ioaddr;
board->irq = pciirq;
-   card++;
+   card_count++;
 
pci_set_drvdata(pdev, board);
 
@@ -1785,7 +1785,7 @@ static int __devinit isicom_probe(struct pci_dev *pdev,
"will be disabled.\n", board->base, board->base + 15,
index + 1);
retval = -EBUSY;
-   goto err;
+   goto errdec;
}
 
retval = request_irq(board->irq, isicom_interrupt,
@@ -1814,8 +1814,10 @@ errunri:
free_irq(board->irq, board);
 errunrr:
pci_release_region(pdev, 3);
-err:
+errdec:
board->base = 0;
+   card_count--;
+err:
return retval;
 }
 
@@ -1829,6 +1831,8 @@ static void __devexit isicom_remove(struct pci_dev *pdev)
 
free_irq(board->irq, board);
pci_release_region(pdev, 3);
+   board->base = 0;
+   card_count--;
 }
 
 static int __init isicom_init(void)
@@ -1836,8 +1840,6 @@ static int __init isicom_init(void)
int retval, idx, channel;
struct isi_port *port;
 
-   card = 0;
-
for(idx = 0; idx < BOARD_COUNT; idx++) {
port = &isi_ports[idx * 16];
isi_card[idx].ports = port;
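
The fix above follows the usual goto-unwind pattern: a failure after the slot has been claimed jumps to a label that releases it, while a failure before that skips the release. Below is a stripped-down userspace model of just that bookkeeping. `model_probe`, `model_remove`, the `region_busy` flag and the `-ENODEV` default are hypothetical and not taken from the driver.

```c
#include <assert.h>
#include <errno.h>

#define BOARD_COUNT 4

struct board {
	unsigned long base;	/* 0 means "slot unused" */
};

static struct board boards[BOARD_COUNT];
static unsigned int card_count;

/*
 * Model of a probe routine. `region_busy` simulates a resource
 * request that fails after the slot has already been claimed.
 */
static int model_probe(unsigned int index, unsigned long ioaddr,
		       int region_busy)
{
	int retval = -ENODEV;	/* hypothetical default error */
	struct board *board;

	if (card_count >= BOARD_COUNT || index >= BOARD_COUNT)
		goto err;	/* nothing claimed yet, nothing to undo */

	board = &boards[index];
	board->base = ioaddr;
	card_count++;

	if (region_busy) {
		retval = -EBUSY;
		goto errdec;
	}
	return 0;

errdec:
	/* Undo the claim so the slot can be reassigned later. */
	board->base = 0;
	card_count--;
err:
	return retval;
}

/* Model of the remove path: release the slot and drop the count. */
static void model_remove(unsigned int index)
{
	assert(index < BOARD_COUNT && boards[index].base != 0);
	boards[index].base = 0;
	card_count--;
}
```

After a failed `model_probe()` the count and the slot are back to their initial state, which is the invariant the patch restores.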

Re: [patch] debugging feature: SysRq-Q to print timers

2006-12-18 Thread Andrew Morton

On Mon, 18 Dec 2006 18:45:49 -0500
Dave Jones <[EMAIL PROTECTED]> wrote:

> On Mon, Dec 18, 2006 at 03:31:03PM -0800, Andrew Morton wrote:
>  > On Sat, 16 Dec 2006 08:56:58 +0100
>  > Ingo Molnar <[EMAIL PROTECTED]> wrote:
>  > 
>  > > ->
>  > > Subject: [patch] debugging feature: SysRq-Q to print timers
>  > > From: Ingo Molnar <[EMAIL PROTECTED]>
>  > > 
>  > > add SysRq-Q to print pending timers and other timer info.
>  > 
>  > I must say that I've never needed this feature or /proc/timer-list, and I
>  > don't recall ever having seen anyone request it, nor get themselves into a
>  > situation where they needed it.
> 
> /proc/timer-list is useful for profiling applications doing excessive wakeups.
> With the move towards being tickless, this is more important than ever,
> and giving users the right tools to find these problems themselves is 
> important.
> 

oic.  Nobody ever tells me squat.  

Your explanation doesn't explain why we need this info in a sysrq
triggerable form.

And what about /proc/timer-stat?



Re: GPL only modules

2006-12-18 Thread Linus Torvalds

On Tue, 19 Dec 2006, Paul Mackerras wrote:
>
> There is in fact a pretty substantial non-technical difference between
> static and dynamic linking.  If I create a binary by static linking
> and I include some library, and I distribute that binary to someone
> else, the recipient doesn't need to have a separate copy of the
> library, because they get one in the binary.

I agree, and I do agree that it's a real difference. 

I personally think that it's the "aggregation" issue, not a "derivation" 
issue, but I'll freely admit that it's just my personal view of the 
situation.

> In other words, static linking gives the recipient a "free" copy of
> the library, but dynamic linking doesn't.  That is why some companies'
> legal guidelines have quite different rules about the distribution of
> binaries, depending on whether they are statically or dynamically
> linked.

Yes. There is no doubt at all that regardless of anything else, if you 
link statically, you at the VERY LEAST need to have the right to 
distribute the library as part of an "aggregate work". 

> So therefore I don't think you can reasonably claim that static
> vs. dynamic linking is only a technical difference.  There are clearly
> other differences when it comes to distribution of the resulting
> binaries.

Yes. And I have actually talked about this exact issue - even in the 
absence of any "derivation" from the library, the fact that static linking 
includes a _copy_ of the library does mean that you have to have the right 
to distribute that particular copy. 

Now, under the GPL, aggregate distribution is allowed, but you still do 
need to follow the other GPL rules (ie you would need to distribute 
sources for the library - even if you don't necessarily distribute sources 
to the binary you linked _with_).

So there's no question that "dynamic linking" simplifies issues, by virtue 
of not even distributing any library code at all. I absolutely agree about 
that part.

Linus

Re: [PATCH] RTC classdev: Add sysfs support for wakeup alarm (r/w)

2006-12-18 Thread Paul Sokolovsky

Hello David,

Monday, December 18, 2006, 6:28:58 AM, you wrote:

> On Sunday 17 December 2006 11:30 am, Paul Sokolovsky wrote:

>> Small battery-powered systems, like PDAs, need a way to be
>> suspended most of the time and woken up just from time to time to
>> process pending tasks. 

> Sounds like you're thinking of this from a userspace perspective...

> Could you share some examples of such "pending tasks"?

  Well, the actual use case which prompted me to hack this up was the
need to write a "burn out" test script for suspend/resume on a
battery-powered ARM device (PDA), one that would run the suspend/resume
cycle thousands of times. A wakeup alarm is the obvious, if not the only,
source of automated resume events.

  Of course, I started by trying existing solutions - e.g. there's an
"atd" implementation which uses /dev/rtc, but I found it had awful
latency (>2s). Then I tried to write a simple C app to set the alarm via
ioctl(), only to find that alarm IRQs are shut down on its exit.

  But anyway, I'm the kind of guy who thinks that debugging and
diagnostics are important things for a *production* system, so such
sysfs alarm support is just as precious as /proc/interrupts and /proc/dma
(ah, the ARM Linux maintainer declined to fix broken /proc/dma support for
ARM, I forgot ;-( ).

[]

>> Obvious way to achieve this is to use timer, or 
>> alarm, wakeup. Unfortunately, this matter is bit confusing in Linux.
>> There's only one "good" "supported" way to set alarm - via ioctl() on
>> an RTC device fd. Unfortunately, this alarm is not persistent - as soon
>> as fd is closed, alarm is discharged.

> I don't think that's true in general.  Most RTCs don't even care
> whether userspace did an open() or close().  I see the S3C one does,
> and that explicitly leaves the alarm active. 

> But I see that only the SA1100/PXA and SH RTCs turn off all IRQs
> after RTC_WKALM_* requests ... that's a distinct minority.

  Oh my! I never even imagined this could be an idiosyncrasy of a
specific implementation. Oh, what a world... ;-)

> So judging implementations as votes ... only two implementations
> that implement the RTC_WKALM_* call follow that rule, and most
> don't.  However, a few implementations ignore rtc_wkalrm.enabled,
> or otherwise mistreat that flag (e.g. rtc-ds1553 doesn't disable
> AIE when enabled==0), so it's clear there are some issues there.

> My vote would be that closing the FD should not turn off the alarm.
> It's supposed to be a one-shot deal anyway.

  I would agree with such behavior. But what's clear is that the
behavior, whatever it is, should be consistent across implementations,
or it's just a mess ;-(.

> And also, that someone audits the drivers/rtc code to make sure that
> alarm-capable drivers handle the rtc_wkalrm.enabled flag correctly;
> your patch sort of presumes that will happen, anyway.

  Yes, I mentioned, that for PXA/SA, my patch becomes actually useful
only after applying your patch (plus, with fixed TODO: here's what
I applied to handhelds.org tree:
http://handhelds.org/cgi-bin/cvsweb.cgi/linux/kernel26/drivers/rtc/rtc-sa1100.c.diff?r1=1.5=1.6=h
).

  That of course doesn't mean sysfs alarm support patch depends on
rtc-sa1100.c patch in any way (it's just PXA/SA won't actually wake up,
but sysfs patch for showing/storing alarm properties obviously doesn't
depend on any specific implementation).

>   And hmm, it'd
> be good to have rtctest.c (in Documentation/rtc.txt) test for that...
> it doesn't actually use RTC_WKALM_* calls, so it's too easy for folk
> to goof up their implementations.

>> Formal part
>> ===
>> 
>> Implement "alarm" attribute group for RTC classdevs. At this time,
>> add "since_epoch", "wakeup_enabled", and "pending" attributes. First
>> two support both read and write.

> I think you shouldn't add this group unless the RTC has methods
> to read and write the alarm; there are RTCs that don't have that
> feature.

> Also, I'd rather see a much simpler interface.  Like a single
> "alarm" attribute.  It would display as the empty string unless
> it was enabled, in which case the alarm time would show.  To
> disable it, write an empty string; to enable an alarm, just write
> a valid time (in the future).  The other parameters aren't needed;
> "wakeup" is PM infrastructure (/sys/devices/.../power/wakeup),
> since it's easy to have an alarm that's not wakeup-capable.

  Yes, both of these are, or may be, true. That was really a draft,
initial version. I probably don't have knowledge/resources to make it
"right", but it would be nice if it prompted someone with more
experience/resources to tweak in such support, as well as the problems
outlined above...
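
The single "alarm" attribute David suggests above has simple show/store semantics: reading gives an empty string unless the alarm is enabled, writing an empty string disables it, and writing a valid future time enables it. The following is a userspace model of just those rules, not sysfs code. `alarm_state`, `alarm_show`, `alarm_store`, the seconds-since-epoch text format and the `-EINVAL` return for past times are all assumptions for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical in-memory stand-in for an RTC's alarm state. */
struct alarm_state {
	int enabled;
	long long when;	/* assumed format: seconds since the epoch */
};

/* "show": empty string when disabled, otherwise the alarm time. */
static int alarm_show(const struct alarm_state *a, char *buf, size_t len)
{
	assert(a && buf);
	if (!a->enabled) {
		if (len > 0)
			buf[0] = '\0';
		return 0;
	}
	return snprintf(buf, len, "%lld\n", a->when);
}

/* "store": empty input disables; a time in the future enables. */
static int alarm_store(struct alarm_state *a, const char *buf, long long now)
{
	char *end;
	long long t;

	assert(a && buf);
	if (buf[0] == '\0' || buf[0] == '\n') {
		a->enabled = 0;
		return 0;
	}

	errno = 0;
	t = strtoll(buf, &end, 10);
	if (errno || end == buf || (*end != '\0' && *end != '\n'))
		return -EINVAL;	/* not a number */
	if (t <= now)
		return -EINVAL;	/* assumption: past times are rejected */

	a->when = t;
	a->enabled = 1;
	return 0;
}
```

In this model a rejected write leaves the previous alarm untouched, which seems the least surprising choice for a one-shot alarm.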

> - Dave

-- 
Best regards,
 Paul  mailto:[EMAIL PROTECTED]

