date:20070730

Re: [PATCH] hugetlbfs read() support

2007-07-30 Thread dean gaudet

On Thu, 19 Jul 2007, Bill Irwin wrote:

> On Thu, Jul 19, 2007 at 10:07:59AM -0700, Nishanth Aravamudan wrote:
> > But I do think a second reason to do this is to make hugetlbfs behave
> > like a normal fs -- that is read(), write(), etc. work on files in the
> > mountpoint. But that is simply my opinion.
> 
> Mine as well.

ditto.  here's a few other things i've run into recently:

it should be possible to use cp(1) to load large datasets into a 
hugetlbfs.

it should be possible to use ftruncate() on hugetlbfs files.  (on a tmpfs 
it's req'd to extend the file before mmaping... on hugetlbfs it returns 
EINVAL or somesuch and mmap just magically extends files.)

it should be possible to statfs() and get usage info... this works only if 
you mount with size=N.
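
For reference, the tmpfs-style sequence mentioned above (extend first, then map) can be sketched in user space as follows. This is a minimal sketch, not hugetlbfs-specific code; the function name extend_then_map and the path passed to it are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: extend a file with ftruncate(), then mmap() it and
 * store through the mapping.  Returns 0 on success, -1 on any failure.
 */
int extend_then_map(const char *path, size_t len)
{
	int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;

	/* On tmpfs or a regular file, skipping this step leaves the file
	 * empty, and touching the mapping would then raise SIGBUS. */
	if (ftruncate(fd, (off_t)len) != 0) {
		close(fd);
		return -1;
	}

	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		close(fd);
		return -1;
	}

	memset(p, 0x5a, len);			/* store through the mapping */
	int ok = (p[len - 1] == 0x5a);		/* read the last byte back */

	munmap(p, len);
	close(fd);
	unlink(path);
	return ok ? 0 : -1;
}
```

The complaint above is that hugetlbfs rejects the ftruncate() step, so code written this way for tmpfs cannot be reused unchanged.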

-dean

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch] add kdump_after_notifier

2007-07-30 Thread Takenori Nagano

Eric W. Biederman wrote:
> Vivek Goyal <[EMAIL PROTECTED]> writes:
> 
>>> Bernhard's idea (kdump uses panic_notifier) is very good for me. But it
>>> isn't good for kdump user, because they want to take a dump ASAP when
>>> panicked.
>>>
>> This one is better than registering kdump as one of the users of a
>> panic_notifier() list. 
>>
>> I think if there are any crash specific actions, they should be taken care
>> in next kernel while it is booting.
>>
>> If something is really very time critical, and has to be done immediately
>> after panic (I am not sure how can one ensure that given the fact any number
>> of users can register on panic_notifier_list and you are not sure about your
>> order in the list and when one will get the control), then probably that
>> piece of code should be in kernel and called before crash_kexec().
>>
>> What is that specific piece of action which you can't do in second kernel?
>>
>> Eric, do you have any thoughts on this. I think these guys are referring
>> to failover problem where immediately after panic() they want to send
>> message to other node.
> 
> My thoughts are roughly the same as they were last time this was suggested.
> I think adding a notifier to the kexec on panic path is a bad idea.
> This functionality sounds wrong, because it makes it hard to ensure
> reliability of the kexec on panic code path.  We are still doing too
> much on it as it stands. The working assumption on that code path
> needs to be the kernel is broken. Anything else is just asking for
> trouble.
> 
> Currently we do have a hook in place for code to be called. It is called
> the purgatory section of /sbin/kexec.  And it's user space so you can
> do whatever you want there.  Or you can wait until the second kernel
> gets more fully booted.
> 
> If we really need to do something in the kernel we can patch the kernel
> to make a function call from crash_kexec.  We don't need any notifiers
> to do this.
> 
> A further problem with notifiers is they mess up the state we would
> like to debug.  Which again makes them a problem.
> 
> 
> So at least until a specific case is made for a specific piece of code
> to get in I am totally opposed to the idea.

Hi all,

IMHO, most users don't use kdump; kdump users are only kernel developers and
enterprise users. I think enterprise users want the notifier function, because
they use drivers and software (hardware monitoring drivers, clustering
software, heartbeat drivers, etc.) to raise their system availability.

Some popular distributors added a dump function to their own kernels. We can
use panic_notifier with LKCD (http://lkcd.sourceforge.net/), and diskdump
(http://sourceforge.net/projects/lkdump) provides its own notifier function,
disk_dump_notifier.

Now that kdump has been merged into the mainline kernel, some distributors
have chosen kdump. I think kdump is better than the other dump functions, but
kdump has no notifier function. This is a large problem for enterprise users.

Solutions
 1: my patch
 2: Bernhard's idea
 3: add kdump_notifier_list

I think my patch is better than the other solutions, because it has very
little impact. Vivek, Eric, what do you think?
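
To make the ordering concern from the quoted text concrete, here is a small user-space model of a priority-ordered notifier chain. It is only a sketch: it is not the kernel's notifier API, and every name in it (struct notifier, chain_register, chain_call, note_failover, note_dump) is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space model; not the kernel's notifier API. */
struct notifier {
	int (*call)(void *data);	/* nonzero return stops the chain */
	int priority;			/* higher priority runs first */
	struct notifier *next;
};

/* Insert so the list stays sorted by descending priority. */
void chain_register(struct notifier **head, struct notifier *n)
{
	while (*head && (*head)->priority >= n->priority)
		head = &(*head)->next;
	n->next = *head;
	*head = n;
}

/* Run the callbacks in list order; return how many were invoked. */
int chain_call(struct notifier *head, void *data)
{
	int ran = 0;

	for (; head; head = head->next) {
		ran++;
		if (head->call(data))
			break;
	}
	return ran;
}

/* Example callbacks: each appends a one-letter tag to a caller-supplied log. */
int note_failover(void *data)
{
	strcat(data, "F");
	return 0;
}

int note_dump(void *data)
{
	strcat(data, "D");
	return 0;
}
```

In this model a callback registered later with a higher priority runs ahead of one registered earlier, and a callback that returns nonzero (or never returns) keeps everything behind it from running. That is the uncertainty the quoted objection points at.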

Thanks.


Re: inotify and /proc/

2007-07-30 Thread Chris Friesen


Joseph Pingenot wrote:

> While we're on the subject, is there some way to receive notification
>   that some aspect of a process changes (in this case, stopping using
>   CPU, but not exiting)?

For some internal stuff a while back I did a patch that allows any
process to register for status change notifications.  Basically, the
registered process gets put on a list and gets the same notifications
(killed/stopped/exited, etc.) as the real parent.

Useful for this type of thing.

Chris

[PATCH] fs/gfs2: mark struct *_operations const

2007-07-30 Thread Denis Cheng

these struct *_operations are all method tables, thus should be const.
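
As a small illustration of the point, here is a user-space sketch of a const method table. The names (demo_operations, demo_ops, demo_table, demo_dispatch) are hypothetical and not taken from GFS2.

```c
#include <stddef.h>

/* Hypothetical method table; the names are illustrative, not from GFS2. */
struct demo_operations {
	int (*get)(int key);
	const char *name;
};

int demo_get(int key)
{
	return key * 2;
}

/*
 * A const object: a store such as "demo_ops.get = NULL;" is rejected at
 * compile time, and the toolchain may place the table in read-only data.
 */
const struct demo_operations demo_ops = {
	.get = demo_get,
	.name = "demo",
};

/*
 * Array of pointers to const tables.  The pointed-to tables are const,
 * but the array slots themselves stay writable unless the array is
 * declared "const struct demo_operations *const demo_table[]".
 */
const struct demo_operations *demo_table[] = {
	NULL,
	&demo_ops,
};

/* Dispatch through the table, returning -1 for an empty or invalid slot. */
int demo_dispatch(unsigned int type, int key)
{
	if (type >= sizeof(demo_table) / sizeof(demo_table[0]) || !demo_table[type])
		return -1;
	return demo_table[type]->get(key);
}
```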

Signed-off-by: Denis Cheng <[EMAIL PROTECTED]>
---
 fs/gfs2/eaops.c        |    8 ++++----
 fs/gfs2/eaops.h        |    4 ++--
 fs/gfs2/glock.c        |    2 +-
 fs/gfs2/ops_dentry.c   |    3 +--
 fs/gfs2/ops_dentry.h   |    2 +-
 fs/gfs2/ops_vm.c       |    4 ++--
 fs/gfs2/ops_vm.h       |    4 ++--
 include/linux/dcache.h |    2 +-
 include/linux/mm.h     |    2 +-
 9 files changed, 15 insertions(+), 16 deletions(-)

diff --git a/fs/gfs2/eaops.c b/fs/gfs2/eaops.c
index 1ab3e9d..aa8dbf3 100644
--- a/fs/gfs2/eaops.c
+++ b/fs/gfs2/eaops.c
@@ -200,28 +200,28 @@ static int security_eo_remove(struct gfs2_inode *ip, struct gfs2_ea_request *er)
 	return gfs2_ea_remove_i(ip, er);
 }
 
-static struct gfs2_eattr_operations gfs2_user_eaops = {
+static const struct gfs2_eattr_operations gfs2_user_eaops = {
 	.eo_get = user_eo_get,
 	.eo_set = user_eo_set,
 	.eo_remove = user_eo_remove,
 	.eo_name = "user",
 };
 
-struct gfs2_eattr_operations gfs2_system_eaops = {
+const struct gfs2_eattr_operations gfs2_system_eaops = {
 	.eo_get = system_eo_get,
 	.eo_set = system_eo_set,
 	.eo_remove = system_eo_remove,
 	.eo_name = "system",
 };
 
-static struct gfs2_eattr_operations gfs2_security_eaops = {
+static const struct gfs2_eattr_operations gfs2_security_eaops = {
 	.eo_get = security_eo_get,
 	.eo_set = security_eo_set,
 	.eo_remove = security_eo_remove,
 	.eo_name = "security",
 };
 
-struct gfs2_eattr_operations *gfs2_ea_ops[] = {
+const struct gfs2_eattr_operations *gfs2_ea_ops[] = {
 	NULL,
 	&gfs2_user_eaops,
 	&gfs2_system_eaops,
diff --git a/fs/gfs2/eaops.h b/fs/gfs2/eaops.h
index 508b4f7..da2f7fb 100644
--- a/fs/gfs2/eaops.h
+++ b/fs/gfs2/eaops.h
@@ -22,9 +22,9 @@ struct gfs2_eattr_operations {
 
 unsigned int gfs2_ea_name2type(const char *name, const char **truncated_name);
 
-extern struct gfs2_eattr_operations gfs2_system_eaops;
+extern const struct gfs2_eattr_operations gfs2_system_eaops;
 
-extern struct gfs2_eattr_operations *gfs2_ea_ops[];
+extern const struct gfs2_eattr_operations *gfs2_ea_ops[];
 
 #endif /* __EAOPS_DOT_H__ */
 
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 3f0974e..00fe234 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -2095,7 +2095,7 @@ static int gfs2_glock_seq_show(struct seq_file *file, void *iter_ptr)
 	return 0;
 }
 
-static struct seq_operations gfs2_glock_seq_ops = {
+static const struct seq_operations gfs2_glock_seq_ops = {
 	.start = gfs2_glock_seq_start,
 	.next  = gfs2_glock_seq_next,
 	.stop  = gfs2_glock_seq_stop,
diff --git a/fs/gfs2/ops_dentry.c b/fs/gfs2/ops_dentry.c
index 793e334..1bdf016 100644
--- a/fs/gfs2/ops_dentry.c
+++ b/fs/gfs2/ops_dentry.c
@@ -108,8 +108,7 @@ static int gfs2_dhash(struct dentry *dentry, struct qstr *str)
 	return 0;
 }
 
-struct dentry_operations gfs2_dops = {
+const struct dentry_operations gfs2_dops = {
 	.d_revalidate = gfs2_drevalidate,
 	.d_hash = gfs2_dhash,
 };
-
diff --git a/fs/gfs2/ops_dentry.h b/fs/gfs2/ops_dentry.h
index 5caa3db..668a6bc 100644
--- a/fs/gfs2/ops_dentry.h
+++ b/fs/gfs2/ops_dentry.h
@@ -12,6 +12,6 @@
 
 #include 
 
-extern struct dentry_operations gfs2_dops;
+extern const struct dentry_operations gfs2_dops;
 
 #endif /* __OPS_DENTRY_DOT_H__ */
diff --git a/fs/gfs2/ops_vm.c b/fs/gfs2/ops_vm.c
index 927d739..baa3b20 100644
--- a/fs/gfs2/ops_vm.c
+++ b/fs/gfs2/ops_vm.c
@@ -159,11 +159,11 @@ out:
 	return ret;
 }
 
-struct vm_operations_struct gfs2_vm_ops_private = {
+const struct vm_operations_struct gfs2_vm_ops_private = {
 	.fault = gfs2_private_fault,
 };
 
-struct vm_operations_struct gfs2_vm_ops_sharewrite = {
+const struct vm_operations_struct gfs2_vm_ops_sharewrite = {
 	.fault = gfs2_sharewrite_fault,
 };
 
diff --git a/fs/gfs2/ops_vm.h b/fs/gfs2/ops_vm.h
index 4ae8f43..cfefd2d 100644
--- a/fs/gfs2/ops_vm.h
+++ b/fs/gfs2/ops_vm.h
@@ -12,7 +12,7 @@
 
 #include 
 
-extern struct vm_operations_struct gfs2_vm_ops_private;
-extern struct vm_operations_struct gfs2_vm_ops_sharewrite;
+extern const struct vm_operations_struct gfs2_vm_ops_private;
+extern const struct vm_operations_struct gfs2_vm_ops_sharewrite;
 
 #endif /* __OPS_VM_DOT_H__ */
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index aab53df..9cd948e 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -104,7 +104,7 @@ struct dentry {
 	struct list_head d_subdirs;	/* our children */
 	struct list_head d_alias;	/* inode alias list */
 	unsigned long d_time;		/* used by d_revalidate */
-	struct dentry_operations *d_op;
+	const struct dentry_operations *d_op;
 	struct super_block *d_sb;	/* The root of the dentry tree */
 	void *d_fsdata;			/* fs-specific data */
 #ifdef CONFIG_PROFILING
diff --git a/include/linux/mm.h

[rfc] balance-on-fork NUMA placement

2007-07-30 Thread Nick Piggin

Hi,

I haven't given this idea testing yet, but I just wanted to get some
opinions on it first. NUMA placement still isn't ideal (eg. tasks with
a memory policy will not do any placement, and process migrations of
course will leave the memory behind...), but it does give a bit more
chance for the memory controllers and interconnects to get evenly
loaded.

The primary reason for currently doing balance on fork is to improve
the NUMA placement of user memory, so on the basis that is useful, I
think it should be useful for kernel memory too?

---
NUMA balance-on-fork code is in a good position to allocate all of a new
process's memory on a chosen node. However, it really only starts allocating
on the correct node after the process starts running.

task and thread structures, stack, mm_struct, vmas, page tables etc. are
all allocated on the parent's node.

This patch uses memory policies to attempt to improve this. It requires
that we ask the scheduler to suggest the child's new CPU earlier in the
fork, but that is not a fundamental difference.
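
The excerpt does not show the bodies of mpol_prefer_cpu_start() and mpol_prefer_cpu_end(), so the following is only a guess at the general shape of such a bracket, written as a user-space sketch with hypothetical names: save the current preference, override it while the child's structures are allocated, then restore it.

```c
/* Hypothetical user-space model; -1 means "no preference, allocate locally". */
static int preferred_node = -1;

/* Begin a bracket: prefer "node" and return the previous setting. */
int prefer_node_start(int node)
{
	int old = preferred_node;

	preferred_node = node;
	return old;
}

/* End a bracket: restore the setting returned by prefer_node_start(). */
void prefer_node_end(int old)
{
	preferred_node = old;
}

/* Pick the node an allocation would come from, given the caller's local node. */
int pick_alloc_node(int local_node)
{
	return preferred_node >= 0 ? preferred_node : local_node;
}
```

In the patch, the bracket opens before dup_task_struct() and the new fork_mpol label suggests it is closed on the error path as well; the sketch only models the save, override and restore steps.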



Index: linux-2.6/include/linux/mempolicy.h
===
--- linux-2.6.orig/include/linux/mempolicy.h
+++ linux-2.6/include/linux/mempolicy.h
@@ -141,6 +141,8 @@ void mpol_free_shared_policy(struct shar
 struct mempolicy *mpol_shared_policy_lookup(struct shared_policy *sp,
unsigned long idx);
 
+extern int mpol_prefer_cpu_start(int cpu);
+extern void mpol_prefer_cpu_end(int arg);
 extern void numa_default_policy(void);
 extern void numa_policy_init(void);
 extern void mpol_rebind_policy(struct mempolicy *pol, const nodemask_t *new);
@@ -227,6 +229,15 @@ mpol_shared_policy_lookup(struct shared_
 #define vma_policy(vma) NULL
 #define vma_set_policy(vma, pol) do {} while(0)
 
+static inline int mpol_prefer_cpu_start(int cpu)
+{
+   return 0;
+}
+
+static inline void mpol_prefer_cpu_end(int arg)
+{
+}
+
 static inline void numa_policy_init(void)
 {
 }
Index: linux-2.6/include/linux/sched.h
===
--- linux-2.6.orig/include/linux/sched.h
+++ linux-2.6/include/linux/sched.h
@@ -1460,6 +1460,7 @@ extern void FASTCALL(wake_up_new_task(st
 #else
  static inline void kick_process(struct task_struct *tsk) { }
 #endif
+extern int sched_fork_suggest_cpu(int clone_flags);
 extern void sched_fork(struct task_struct *p, int clone_flags);
 extern void sched_dead(struct task_struct *p);
 
@@ -1782,6 +1783,7 @@ static inline unsigned int task_cpu(cons
 }
 
 extern void set_task_cpu(struct task_struct *p, unsigned int cpu);
+extern void __set_task_cpu(struct task_struct *p, unsigned int cpu);
 
 #else
 
@@ -1794,6 +1796,10 @@ static inline void set_task_cpu(struct t
 {
 }
 
+static inline void __set_task_cpu(struct task_struct *p, unsigned int cpu)
+{
+}
+
 #endif /* CONFIG_SMP */
 
 #ifdef HAVE_ARCH_PICK_MMAP_LAYOUT
Index: linux-2.6/kernel/fork.c
===
--- linux-2.6.orig/kernel/fork.c
+++ linux-2.6/kernel/fork.c
@@ -964,6 +964,7 @@ static struct task_struct *copy_process(
int __user *child_tidptr,
struct pid *pid)
 {
+   int cpu, mpol_arg;
int retval;
struct task_struct *p = NULL;
 
@@ -989,10 +990,13 @@ static struct task_struct *copy_process(
if (retval)
goto fork_out;
 
+   cpu = sched_fork_suggest_cpu(clone_flags);
+   mpol_arg = mpol_prefer_cpu_start(cpu);
+
retval = -ENOMEM;
p = dup_task_struct(current);
if (!p)
-   goto fork_out;
+   goto fork_mpol;
 
rt_mutex_init_task(p);
 
@@ -1183,7 +1187,7 @@ static struct task_struct *copy_process(
 	INIT_LIST_HEAD(&p->ptrace_children);
 	INIT_LIST_HEAD(&p->ptrace_list);
 
-	/* Perform scheduler related setup. Assign this task to a CPU. */
+	/* Perform scheduler related setup. */
 	sched_fork(p, clone_flags);
 
 	/* Need tasklist lock for parent etc handling! */
@@ -1193,6 +1197,7 @@ static struct task_struct *copy_process(
p->ioprio = current->ioprio;
 
/*
+* Assign this task to a CPU.
 * The task hasn't been attached yet, so its cpus_allowed mask will
 * not be changed, nor will its assigned CPU.
 *
@@ -1202,9 +1207,10 @@ static struct task_struct *copy_process(
 * parent's CPU). This avoids alot of nasty races.
 */
 	p->cpus_allowed = current->cpus_allowed;
-	if (unlikely(!cpu_isset(task_cpu(p), p->cpus_allowed) ||
-			!cpu_online(task_cpu(p))))
-		set_task_cpu(p, smp_processor_id());
+	if (unlikely(!cpu_isset(cpu, p->cpus_allowed) ||
+			!cpu_online(cpu)))
+		cpu = smp_processor_id();
+	__set_task_cpu(p, cpu);
 
/* CLONE_PARENT re-uses the old

Re: [SPARC32] NULL pointer derefference

2007-07-30 Thread Mark Fortescue


Hi David,

> One possible issue is sequencing, perhaps the stack argument copy
> is occurring before the new context is setup properly on sun4c.

I think it is something related to this but too much has changed for me to
work out what is going on. At present, I don't have a good enough
understanding of the virtual memory system and how it interacts with the
sun4c mmu.

The original code did a job lot of pte stuff in install_arg_page. The new
code seems to replace this using get_user_pages but I have not worked out
how get_user_pages gets to the point at which it allocates pte's, i.e.
maps the stack memory it is about to put the arguments into.

> Another issue might be the new flush_cache_page() call in this
> new code in fs/exec.c, there are now cases where flush_cache_page()
> will be called on kernel addresses, and sun4c's implementation might
> not like that at all.

I commented out the flush_cache_page call made in the new code. This had no
effect on the problem. Other tests have shown it is breaking earlier than
this.

I am going to try to narrow down exactly where the pointer gets messed up
as this should help.


Regards
Mark Fortescue.

Re: [patch 20/26] saa7134: fix thread shutdown handling

2007-07-30 Thread Greg KH

On Tue, Jul 31, 2007 at 02:05:48AM -0300, Mauro Carvalho Chehab wrote:
> Hi Greg,
> 
> On Mon, 2007-07-30 at 21:33 -0700, Greg KH wrote:
> > plain text document attachment
> > (saa7134-fix-thread-shutdown-handling.patch)
> > -stable review patch.  If anyone has any objections, please let us know.
> > 
> > --
> > 
> > From: Jeff Mahoney <[EMAIL PROTECTED]>
> > 
> > This patch changes the test for the thread pid from >= 0 to > 0.
> > 
> > When the saa8134 driver initialization fails after a certain point, it goes
>  ^^^
> The patch is OK. however, the driver name is saa7134.

Heh, thanks, I've fixed the text of the patch.

greg k-h

Re: [-mm PATCH 6/9] Memory controller add per container LRU and reclaim (v4)

2007-07-30 Thread YAMAMOTO Takashi

> +unsigned long mem_container_isolate_pages(unsigned long nr_to_scan,
> +					struct list_head *dst,
> +					unsigned long *scanned, int order,
> +					int mode, struct zone *z,
> +					struct mem_container *mem_cont,
> +					int active)
> +{
> +	unsigned long nr_taken = 0;
> +	struct page *page;
> +	unsigned long scan;
> +	LIST_HEAD(mp_list);
> +	struct list_head *src;
> +	struct meta_page *mp;
> +
> +	if (active)
> +		src = &mem_cont->active_list;
> +	else
> +		src = &mem_cont->inactive_list;
> +
> +	for (scan = 0; scan < nr_to_scan && !list_empty(src); scan++) {
> +		mp = list_entry(src->prev, struct meta_page, lru);

what prevents another thread from freeing mp here?

> +		spin_lock(&mem_cont->lru_lock);
> +		if (mp)
> +			page = mp->page;
> +		spin_unlock(&mem_cont->lru_lock);
> +		if (!mp)
> +			continue;
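
The window in question can be shown with a small user-space sketch in which a pthread mutex stands in for lru_lock and the struct names are hypothetical. Here the tail entry is looked up and dereferenced inside a single critical section. This only illustrates the concern and is not a proposed fix for the patch.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for meta_page and the per-container LRU. */
struct entry {
	struct entry *prev, *next;
	int page;			/* stands in for mp->page */
};

struct lru {
	pthread_mutex_t lock;		/* stands in for lru_lock */
	struct entry head;		/* sentinel of a circular list */
};

void lru_init(struct lru *l)
{
	pthread_mutex_init(&l->lock, NULL);
	l->head.prev = l->head.next = &l->head;
}

/* Add an entry at the front of the list. */
void lru_add(struct lru *l, struct entry *e)
{
	pthread_mutex_lock(&l->lock);
	e->next = l->head.next;
	e->prev = &l->head;
	l->head.next->prev = e;
	l->head.next = e;
	pthread_mutex_unlock(&l->lock);
}

/*
 * Look up the tail entry and read its payload under one lock hold, so a
 * remover that also takes the lock cannot free the entry between the
 * lookup and the dereference.  Returns 0 and stores the payload in
 * *page, or -1 if the list is empty.
 */
int lru_peek_tail(struct lru *l, int *page)
{
	int ret = -1;

	pthread_mutex_lock(&l->lock);
	if (l->head.prev != &l->head) {
		*page = l->head.prev->page;
		ret = 0;
	}
	pthread_mutex_unlock(&l->lock);
	return ret;
}
```

In the quoted code, list_entry(src->prev, ...) runs before spin_lock(), which is the gap this sketch closes.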

YAMAMOTO Takashi

Re: regression on HP zx1 platform from ACPI autoload modules patches

2007-07-30 Thread Tony Luck

> > During bootup it panics with:
> >
> > Kernel panic - not syncing: Unable to find SBA IOMMU: Try a generic or DIG kernel

I think that the fix for this has already gone into Linus's tree (on
Friday).  The commit SHA1 for the patch that should fix this is:
8c8eb78f673c07b60f31751e1e47ac367c60c6b7

If latest GIT tree still has this problem, then please let us know.

-Tony

Re: [SPARC32] NULL pointer derefference

2007-07-30 Thread Ollie Wild

On 7/29/07, Mark Fortescue <[EMAIL PROTECTED]> wrote:
> Hi All,
>
> Unfortunately Sparc32 sun4c low level memory management appears to be
> incompatible with commit b6a2fea39318e43fee84fa7b0b90d68bed92d2ba
> mm: variable length argument support.

I feel like I ought to help out with this since it's my change which
broke things, but I don't have access to a Sparc32 box.  Does anyone
have a remotely rebootable machine I can use?

Ollie

Re: [PATCH 2/2] Introduce CONFIG_SUSPEND (updated)

2007-07-30 Thread Len Brown

On Sunday 29 July 2007 17:27, Rafael J. Wysocki wrote:
> Still, there are many other files in which CONFIG_PM can be replaced
> with CONFIG_PM_SLEEP or even with CONFIG_SUSPEND, but they can be updated in
> the future.

There is #ifdef CONFIG_PM
around all the .suspend and .resume methods.

Technically they are PM_DEVICE_STATES or something,
that could really be under PM, and both SUSPEND and HIBERNATE
would depend on PM_DEVICE_STATES, but it is also possible to have
PM_DEVICE_STATES without SUSPEND and HIBERNATE.

oh no, I just suggested more instead of fewer config options:-O

but i agree, this can wait a bit...

-Len

Re: [patch 20/26] saa7134: fix thread shutdown handling

2007-07-30 Thread Mauro Carvalho Chehab

Hi Greg,

On Mon, 2007-07-30 at 21:33 -0700, Greg KH wrote:
> plain text document attachment
> (saa7134-fix-thread-shutdown-handling.patch)
> -stable review patch.  If anyone has any objections, please let us know.
> 
> --
> 
> From: Jeff Mahoney <[EMAIL PROTECTED]>
> 
> This patch changes the test for the thread pid from >= 0 to > 0.
> 
> When the saa8134 driver initialization fails after a certain point, it goes
   ^^^
The patch is OK. however, the driver name is saa7134.

-- 
Cheers,
Mauro


Re: TCP SACK issue, hung connection, tcpdump included

2007-07-30 Thread Darryl L. Miles



I've been able to capture a tcpdump from both ends during the problem
and it's my belief there is a bug in 2.6.20.1 (at the client side) in
that it issues a SACK option for an old sequence range that lies before
the window currently being advertised.  This is the most concerning issue
as the integrity of the sequence numbers doesn't seem right (to my
limited understanding anyhow).

There is another concern: why did the SERVER perform a retransmission in
the first place, when the tcpdump shows the ack covering it has been seen?



I have made available the full dumps at:

http://darrylmiles.org/snippets/lkml/20070731/


There are some changes in 2.6.22 that appear to affect TCP SACK handling;
do they fix a known issue ?




This sequence is interesting from the client side:

03:58:56.419034 IP SERVER.ssh > CLIENT.43726: . 26016:27464(1448) ack 4239 win 2728  # S1
03:58:56.419100 IP CLIENT.43726 > SERVER.ssh: . ack 27464 win 501  # C1
03:58:56.422019 IP SERVER.ssh > CLIENT.43726: P 27464:28176(712) ack 4239 win 2728  # S2
03:58:56.422078 IP CLIENT.43726 > SERVER.ssh: . ack 28176 win 501  # C2

The above 4 packets look as expected to me.  Then we suddenly see a
retransmission of 26016:27464.

03:58:56.731597 IP SERVER.ssh > CLIENT.43726: . 26016:27464(1448) ack 4239 win 2728  # S3

So the client, instead of discarding the retransmission of the duplicate
segment, issues a SACK.

03:58:56.731637 IP CLIENT.43726 > SERVER.ssh: . ack 28176 win 501  # C3

In response to this the server is confused ???  It responds to
sack{26016:27464} but the client is also saying "wnd 28176".  Wouldn't
the server expect "wnd < 26016" if there is a segment to retransmit ?

03:58:57.322800 IP SERVER.ssh > CLIENT.43726: . 26016:27464(1448) ack 4239 win 2728  # S4





Now viewed from the server side:

03:58:56.365655 IP SERVER.ssh > CLIENT.43726: . 26016:27464(1448) ack 4239 win 2728  # S1
03:58:56.365662 IP SERVER.ssh > CLIENT.43726: P 27464:28176(712) ack 4239 win 2728  # S2
03:58:56.374633 IP CLIENT.43726 > SERVER.ssh: . ack 24144 win 488  # propagation delay
03:58:56.381630 IP CLIENT.43726 > SERVER.ssh: . ack 25592 win 501  # propagation delay
03:58:56.384503 IP CLIENT.43726 > SERVER.ssh: . ack 26016 win 501  # propagation delay
03:58:56.462583 IP CLIENT.43726 > SERVER.ssh: . ack 27464 win 501  # C1
03:58:56.465707 IP CLIENT.43726 > SERVER.ssh: . ack 28176 win 501  # C2

The above packets are just as expected.

03:58:56.678546 IP SERVER.ssh > CLIENT.43726: . 26016:27464(1448) ack 4239 win 2728  # S3

I guess the above packet is indeed a retransmission of "# S1" but why
was it retransmitted, when we can clearly see "# C1" above acks this
segment ?  It is not even as if the retransmission escaped before the
kernel had time to process the ack, as 200ms elapsed.  CONCERN NUMBER TWO

03:58:56.774778 IP CLIENT.43726 > SERVER.ssh: . ack 28176 win 501  # C3

CONCERN NUMBER ONE, why in response to that escaped retransmission was a
SACK the appropriate response ?  When at the time the client sent the
SACK it had received all data up to 28176, a fact it continues to
advertise in the "# C3" packet above.

There is nothing wrong in the CLIENT expecting to see a retransmission
of that segment at this point in time; that is an expected circumstance.
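
One possible reading of the "# C3" packet is that it is a duplicate SACK (D-SACK, RFC 2883), where the receiver uses the first SACK block to report a segment it has received twice; such a block lies at or below the cumulative ACK. The sketch below encodes only that classification rule. The struct and function names are hypothetical, and it does not claim to show what either kernel actually did.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; sequence numbers are compared modulo 2^32. */
struct sack_block {
	uint32_t left;	/* first sequence number covered */
	uint32_t right;	/* one past the last sequence number covered */
};

/* True if a precedes b in 32-bit sequence space. */
bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/*
 * RFC 2883, simplest case only: if the first SACK block lies at or below
 * the cumulative ACK, it reports a duplicate segment rather than data
 * received above a hole.  The case where the first block is contained in
 * the second block is not handled here.
 */
bool first_block_is_dsack(uint32_t cum_ack, struct sack_block first)
{
	return !seq_before(cum_ack, first.right);
}
```

With the numbers from the trace, cum_ack = 28176 and a first block of 26016:27464 classify as a D-SACK under this rule, which would make "# C3" consistent with the client having already received everything up to 28176.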


03:58:57.269529 IP SERVER.ssh > CLIENT.43726: . 26016:27464(1448) ack 4239 win 2728  # S4






Darryl

Re: [PATCH] add check do_direct_IO() return val

2007-07-30 Thread Badari


Joe Jin wrote:
>> Hmm.. in this config file, what's causing DIO to panic? Which test is actually
>> passing a faulty buffer?
>
> By my testing, defining just job3 and job10 will also get the panic, but if
> only one of them is present, the panic will not appear. The faulty buffer may
> be passed by mmap.

Thanks. Looks like my machine crashed too while running the tests. I
will take a look.


Thanks,
Badari


Re: [PATCH] flush icache before set_pte take6. [4/4] optimization for cpus other than montecito

2007-07-30 Thread KAMEZAWA Hiroyuki

On Tue, 31 Jul 2007 13:29:32 +0900
KAMEZAWA Hiroyuki <[EMAIL PROTECTED]> wrote:

> On Mon, 30 Jul 2007 22:15:50 -0600
> "David Mosberger-Tang" <[EMAIL PROTECTED]> wrote:
> 
> > This seems crazy to me.  Flushing should occur according to the
> > *architecture*, not model-by-model.  Even if we happen to get "lucky"
> > on pre-Montecito CPUs, that doesn't justify such ugly hacks.  
> 

BTW, how is the quality of patch [2/4]?

If it is good enough, I would like it to be merged, please.
-Kame


Re: [PATCH 0/5] x86_64 EFI support -v3

2007-07-30 Thread Eric W. Biederman

"Huang, Ying" <[EMAIL PROTECTED]> writes:

> Looking forward to your comments,

After reading through the patches, I don't see any compelling
reason to use the efi runtime support.  It looks like we get
the interesting support for efi without it: the graphics
and the memory map.  We have ACPI, which can already handle
this case.

Further the code has dead code sitting in the patches that you never
intend to use. EFI_MEMMAP.

The boot protocols are now out of sync for arch/i386 and arch/x86_64
for no obvious reason.

Using efi_set_virtual means kdump doesn't work which means that no
one is going to use this in a prebuilt kernel.

So mostly this patchset looks like a bad idea.

Eric

Re: [patch 00/26] 2.6.21.7 -stable review

2007-07-30 Thread Greg KH

On Mon, Jul 30, 2007 at 09:30:47PM -0700, Greg KH wrote:
> This is the start of the stable review cycle for the 2.6.21.7 release.
> There are 26 patches in this series, all will be posted as a response to
> this one.  If anyone has any issues with these being applied, please let
> us know.  If anyone is a maintainer of the proper subsystem, and wants
> to add a Signed-off-by: line to the patch, please respond with it.

Rolled up patch is at:
kernel.org/pub/linux/kernel/v2.6/stable-testing/patch-2.6.21.7-rc1.gz

thanks,

greg k-h

Re: [PATCH 2/2] Debug handling of early spurious interrupts

2007-07-30 Thread Andrew Morton

On Tue, 31 Jul 2007 11:25:26 +0900 Fernando Luis Vázquez Cao <[EMAIL PROTECTED]> wrote:

> > runtime.  Some drivers don't get used by many people and users of some
> > architectures (esp embedded) tend to lag kernel.org by a long time.  So it
> > could be years before all the fallout from this change is finally wrapped
> > up.
> Yes, that is a big concern. However, the same embedded people is
> starting to use both kexec and kdump, so they may suffer the issues we
> are trying to weed out anyway, even if these patches are not applied.
> The difference is that with this new functionality it is possible to
> catch potential problems relatively easily, because any incorrect
> behaviour this may cause will be easily reproducible and, in most cases,
> will reveal itself early at boot time.
> 
> As things stand now, I guess we will keep seeing occasional crashes and
> strange behaviour in kexec-booted kernels, which in some cases will be
> due to incorrect handling of spurious interrupts. Besides, such problems
> are really difficult to reproduce because, commonly, we would need to
> hit an obscure corner case.

Please have a think about what we can do to aid this debugging.  For
example, add an unconditional printk into request_irq() for now which tells
us what irq is being registered - a print_symbol() of the irq_handler_t
would be pretty good, although I suspect there are a lot of interrupt
handlers with the same name.



Re: [PATCH 1/5] x86_64 EFI support -v3: EFI base support

2007-07-30 Thread Eric W. Biederman

"Huang, Ying" <[EMAIL PROTECTED]> writes:

> Changelog between v2 and v3:
>
> 1. The EFI callwrapper is re-implemented in assembler.
>
> ---
>
> This patch adds basic support for EFI x86_64 system. The main file of
> the patch is the addition of efi.c for x86_64. This file is modeled
> after the EFI IA32 avatar. EFI initialization is implemented in
> efi.c. Some x86_64 specifics are worth noting here. On x86_64,
> parameters passed to UEFI firmware services need to follow the UEFI
> calling convention. For this purpose, a set of functions named
> lin2win<N> (<N> is the number of parameters) are implemented. EFI
> function calls are wrapped before calling the firmware service.


Since the code to generate the e820 map from the efi memory map has
been added to elilo (odd but ok) why does this patch continue to
have code for playing with the efi memory map?

Eric

Re: [PATCH 5/5] x86_64 EFI support -v3: EFI document

2007-07-30 Thread Eric W. Biederman

"Huang, Ying" <[EMAIL PROTECTED]> writes:

> This patch adds document for EFI x86_64 support. The boot parameters
> added are documented in Documentation/i386/zero-page.txt. The setup
> and operation guide of EFI based system is documented in
> Documentation/x86_64/uefi.txt.
>
> Signed-off-by: Chandramouli Narayanan <[EMAIL PROTECTED]>
> Signed-off-by: Huang Ying <[EMAIL PROTECTED]>
>
> ---
>
>  i386/zero-page.txt |   18 ++
>  x86_64/uefi.txt|   42 ++
>  2 files changed, 56 insertions(+), 4 deletions(-)
>
> Index: linux-2.6.23-rc1/Documentation/i386/zero-page.txt
> ===
> --- linux-2.6.23-rc1.orig/Documentation/i386/zero-page.txt 2007-07-30
> 11:28:45.0 +0800
> +++ linux-2.6.23-rc1/Documentation/i386/zero-page.txt 2007-07-30
> 11:29:28.0 +0800
> @@ -31,11 +31,11 @@
>   0xb0 - 0x13f   Free. Add more parameters here if you really need them.
>   0x140- 0x1be   EDID_INFO Video mode setup
>  
> -0x1c4   unsigned long   EFI system table pointer
> -0x1c8   unsigned long   EFI memory descriptor size
> -0x1cc   unsigned long   EFI memory descriptor version
> +0x1c4   unsigned long   EFI system table pointer*
> +0x1c8   unsigned long   EFI memory descriptor size*
> +0x1cc   unsigned long   EFI memory descriptor version*
>  0x1d0   unsigned long   EFI memory descriptor map pointer
> -0x1d4   unsigned long   EFI memory descriptor map size
> +0x1d4   unsigned long   EFI memory descriptor map size*
>  0x1e0   unsigned long   ALT_MEM_K, alternative mem check, in Kb
>  0x1e4   unsigned long   Scratch field for the kernel setup code
>  0x1e8   char   number of entries in E820MAP (below)
> @@ -87,3 +87,13 @@
>  0x2d0 - 0xd00   E820MAP
>  0xd00 - 0xeff   EDDBUF (edd.S) for disk signature read sector
>  0xd00 - 0xeeb   EDDBUF (edd.S) for edd data
> +
> +Changes for x86_64 implementation:
> +-
> +For alignment purposes, the following parameters are rearranged.
> +
> +0x1b8   unsigned long   EFI system table pointer
> +0x1c0   unsigned long   EFI Loader signature
> +0x1c4   unsigned long   EFI memory descriptor size
> +0x1c8   unsigned long   EFI memory descriptor version
> +0x1cc   unsigned long   EFI memory descriptor map size

Huh?  It is the same protocol.  Unless there are specific issues such
as pointers being too small, we should remain 100% the same for both
arch/i386 and arch/x86_64.  This variation looks like a serious
bug.

Eric

Re: [2.6.22] negative time jump

2007-07-30 Thread Vasily Averin

john stultz wrote:
> On 7/29/07, Vasily Averin <[EMAIL PROTECTED]> wrote:
>> I've investigated why my testnode freezes. When I found that node is freezed
>> again I've started to press Sysrq keys and noticed the following negative 
>> time jump.
>>
>> Could anybody please help me to understand the reasons of this issue?
>>
>> --- VvS comment: some pre-history: node boot
>> Jul 27 13:58:10 ts28 Linux version 2.6.22 ([EMAIL PROTECTED]) (gcc version 
>> 3.4.6
>> 20060404 (Red Hat 3.4.6-3)) #11 SMP Fri Jul 27 12:47:45 MSD 2007
>> Jul 27 13:58:10 ts28 Command line: ro root=LABEL=/1 console=ttyS0,115200
>> console=tty debug silencelevel=8 [EMAIL PROTECTED] acpc=noirq clocksource=tsc
> 
> clocksource=tsc?
> 
> I suspect you're forcing the clocksource as its not selected by
> default. Could you provide dmesg output without that option. It might
> shed some light as to why the clocksource isn't chosen.

The default clocksource was acpi-pm, but the node behaved similarly when I
used that clocksource (please look at the following logs).

Originally I was investigating a SATA-related issue and noticed something
strange with the timers. When I reproduced the situation again I pressed the
Alt+Sysrq+Q keys and noticed that it shows an incorrect time (it shows 431968
sec since boot, but according to the serial console timestamps it should be
~445954 sec). Then you can see that the time jumped back: the next timestamp
is 431965 sec, 3 sec earlier.

> It may very well be your TSCs are not synched or are otherwise not
> reliable for timekeeping , and thus time is not consistent between
> cpus.

However, IMHO that cannot explain the time loops (~4400 sec) that I'm
observing right now.

thank you,
Vasily Averin

PS. You can look at the other logs and find more details in my attachments to
http://bugzilla.kernel.org/show_bug.cgi?id=8650

Jul 12 09:00:16 ts28 Linux version 2.6.22 ([EMAIL PROTECTED]) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)) #4 SMP Thu Jul 12 07:58:27 MSD 2007
Jul 12 09:00:16 ts28 Command line: ro root=LABEL=/1 console=ttyS0,115200 console=tty debug silencelevel=8 [EMAIL PROTECTED] acpi=noirq
Jul 12 09:00:16 ts28 BIOS-provided physical RAM map:
Jul 12 09:00:16 ts28  BIOS-e820:  - 0009d800 (usable)
Jul 12 09:00:16 ts28  BIOS-e820: 0009d800 - 000a (reserved)
Jul 12 09:00:16 ts28  BIOS-e820: 000f - 0010 (reserved)
Jul 12 09:00:16 ts28  BIOS-e820: 0010 - dfee (usable)
Jul 12 09:00:16 ts28  BIOS-e820: dfee - dfee3000 (ACPI NVS)
Jul 12 09:00:16 ts28  BIOS-e820: dfee3000 - dfef (ACPI data)
Jul 12 09:00:16 ts28  BIOS-e820: dfef - dff0 (reserved)
Jul 12 09:00:16 ts28  BIOS-e820: fec0 - 0001 (reserved)
Jul 12 09:00:16 ts28 Entering add_active_range(0, 0, 157) 0 entries of 3200 used
Jul 12 09:00:16 ts28 Entering add_active_range(0, 256, 917216) 1 entries of 3200 used
Jul 12 09:00:16 ts28 end_pfn_map = 1048576
Jul 12 09:00:16 ts28 DMI 2.3 present.
Jul 12 09:00:16 ts28 ACPI: RSDP 000F6980, 0014 (r0 VIAK8 )
Jul 12 09:00:16 ts28 ACPI: RSDT DFEE3000, 002C (r1 VIAK8  AWRDACPI 42302E31 AWRD0)
Jul 12 09:00:16 ts28 ACPI: FACP DFEE3040, 0074 (r1 VIAK8  AWRDACPI 42302E31 AWRD0)
Jul 12 09:00:16 ts28 ACPI: DSDT DFEE30C0, 4A4C (r1 VIAK8  AWRDACPI 1000 MSFT  10E)
Jul 12 09:00:16 ts28 ACPI: FACS DFEE, 0040
Jul 12 09:00:16 ts28 ACPI: APIC DFEE7B40, 0068 (r1 VIAK8  AWRDACPI 42302E31 AWRD0)
Jul 12 09:00:16 ts28 Scanning NUMA topology in Northbridge 24
Jul 12 09:00:16 ts28 Number of nodes 2
Jul 12 09:00:16 ts28 Node 0 MemBase  Limit dfee
Jul 12 09:00:16 ts28 Entering add_active_range(0, 0, 157) 0 entries of 3200 used
Jul 12 09:00:16 ts28 Entering add_active_range(0, 256, 917216) 1 entries of 3200 used
Jul 12 09:00:16 ts28 Skipping disabled node 1
Jul 12 09:00:16 ts28 NUMA: Using 63 for the hash shift.
Jul 12 09:00:16 ts28 Using node hash shift of 63
Jul 12 09:00:16 ts28 Bootmem setup node 0 -dfee
Jul 12 09:00:16 ts28 Zone PFN ranges:
Jul 12 09:00:16 ts28   DMA 0 -> 4096
Jul 12 09:00:16 ts28   DMA324096 ->  1048576
Jul 12 09:00:16 ts28   Normal1048576 ->  1048576
Jul 12 09:00:16 ts28 early_node_map[2] active PFN ranges
Jul 12 09:00:16 ts28 0:0 ->  157
Jul 12 09:00:16 ts28 0:  256 ->   917216
Jul 12 09:00:16 ts28 On node 0 totalpages: 917117
Jul 12 09:00:16 ts28   DMA zone: 56 pages used for memmap
Jul 12 09:00:16 ts28 
Jul 12 09:00:16 ts28   DMA zone: 2018 pages reserved
Jul 12 09:00:16 ts28   DMA zone: 1923 pages, LIFO batch:0
Jul 12 09:00:16 ts28   DMA32 zone: 12484 pages used for memmap
Jul 12 09:00:16 ts28   DMA32 zone: 900636 pages, LIFO batch:31
Jul 12 09:00:16 ts28   Normal zone: 0 pages used for memmap
Jul 12 09:00:16 ts28 ACPI: PM-Timer IO Port: 0x4008
Jul 12 09:00:16 ts28 ACPI: Local APIC address 0xfee0
Jul 12 09:00:16 ts28

[patch 24/26] sky2: workaround for lost IRQ

2007-07-30 Thread Greg KH

-stable review patch.  If anyone has any objections, please let us know.

--

This patch restores a couple of workarounds from 2.6.16:
 * restart transmit moderation timer in case it expires during IRQ routine
 * default to having 10 HZ watchdog timer.
At this point it is more important not to hang than to worry about the
power cost.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
Cc: Jeff Garzik <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>
---
 drivers/net/sky2.c |9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

--- linux-2.6.21.6.orig/drivers/net/sky2.c
+++ linux-2.6.21.6/drivers/net/sky2.c
@@ -95,7 +95,7 @@ static int disable_msi = 0;
 module_param(disable_msi, int, 0);
 MODULE_PARM_DESC(disable_msi, "Disable Message Signaled Interrupt (MSI)");
 
-static int idle_timeout = 0;
+static int idle_timeout = 100;
 module_param(idle_timeout, int, 0);
 MODULE_PARM_DESC(idle_timeout, "Watchdog timer for lost interrupts (ms)");
 
@@ -2433,6 +2433,13 @@ static int sky2_poll(struct net_device *
 
work_done = sky2_status_intr(hw, work_limit);
if (work_done < work_limit) {
+   /* Bug/Errata workaround?
+* Need to kick the TX irq moderation timer.
+*/
+   if (sky2_read8(hw, STAT_TX_TIMER_CTRL) == TIM_START) {
+   sky2_write8(hw, STAT_TX_TIMER_CTRL, TIM_STOP);
+   sky2_write8(hw, STAT_TX_TIMER_CTRL, TIM_START);
+   }
netif_rx_complete(dev0);
 
sky2_read32(hw, B0_Y2_SP_LISR);

-- 

Re: [PATCH 4/5] x86_64 EFI support -v3: EFI framebuffer driver

2007-07-30 Thread Eric W. Biederman

"Huang, Ying" <[EMAIL PROTECTED]> writes:

> This patch adds Graphics Output Protocol support to the kernel.
> UEFI2.0 spec deprecates Universal Graphics Adapter (UGA) protocol and
> only Graphics Output Protocol (GOP) is produced. Therefore, the boot
> loader needs to query the UEFI firmware with appropriate Output
> Protocol and pass the video information to the kernel. As a result of
> GOP protocol, an EFI framebuffer driver is needed for displaying
> console messages. The patch adds a EFI framebuffer driver. The EFI
> frame buffer driver in this patch is based on the Intel Mac
> framebuffer driver.
>
> The ELILO bootloader takes care of passing the video information as
> appropriate for EFI firmware.

Am I correct in understanding that you are not using any of the efi
runtime service infrastructure you have built up in other patches?

So far this looks like the only useful piece of the patchset.
If this doesn't use the runtime services I don't see any point
in having them.

Eric

Re: [PATCH] add check do_direct_IO() return val

2007-07-30 Thread Joe Jin

> Hmm.. in this config file, whats causing DIO to panic ? Which test actually
> passing faulty buffer ?
> 

In my testing, defining just job3 and job10 also triggers the panic, but with
only one of them the panic does not appear. The faulty buffer may be passed
by mmap.

Thanks,
Joe

[patch 23/26] NTP: remove clock_was_set() call to prevent deadlock

2007-07-30 Thread Greg KH

-stable review patch.  If anyone has any objections, please let us know.

--

The clock_was_set() call in second_overflow() which happens only when
leap seconds are inserted / deleted is wrong in two aspects:

1. it results in a call to on_each_cpu() with interrupts disabled
2. it is a potential deadlock source vs. call_lock in smp_call_function()

The only possible side effect of the removal might be, that an absolute
CLOCK_REALTIME timer fires 1 second too late, in the rare case of leap
second deletion and an absolute CLOCK_REALTIME timer which expires in
the affected time frame. It will never fire too early.

This was probably observed by the reporter of a June 30th -> July 1st
hang: http://lkml.org/lkml/2007/7/3/

A similar problem was observed by Dave Jones, who provided a screen shot
with a lockdep back trace, which made it possible to analyse the problem.

Signed-off-by: Thomas Gleixner <[EMAIL PROTECTED]>
Cc: john stultz <[EMAIL PROTECTED]>
Cc: Dave Jones <[EMAIL PROTECTED]>
Cc: Ingo Molnar <[EMAIL PROTECTED]>
Cc: Vincent Fortier <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>

---
 kernel/time/ntp.c |2 --
 1 file changed, 2 deletions(-)

--- linux-2.6.21.6.orig/kernel/time/ntp.c
+++ linux-2.6.21.6/kernel/time/ntp.c
@@ -120,7 +120,6 @@ void second_overflow(void)
 */
time_interpolator_update(-NSEC_PER_SEC);
time_state = TIME_OOP;
-   clock_was_set();
printk(KERN_NOTICE "Clock: inserting leap second "
"23:59:60 UTC\n");
}
@@ -135,7 +134,6 @@ void second_overflow(void)
 */
time_interpolator_update(NSEC_PER_SEC);
time_state = TIME_WAIT;
-   clock_was_set();
printk(KERN_NOTICE "Clock: deleting leap second "
"23:59:59 UTC\n");
}

-- 

[patch 25/26] V4L: bttv: fix v4l1 api usage breaking the driver

2007-07-30 Thread Greg KH

-stable review patch.  If anyone has any objections, please let us know.

--

From: Trent Piepho <[EMAIL PROTECTED]>

If one uses a V4L *one* application, such as vlc or mplayer's v4l driver, as
the first user after the driver is loaded, the driver wedges itself and will
never capture properly.  Even if one uses a V4L2 application later, it still
won't work.

If one uses a V4L *two* application first, such as tvtime or mplayer's v4l2
driver, then the driver will be ok.  One can then run a V4L1 application, and
it will work.

It turns out the problem is with norm changing and the crop support that was
added in 2.6.21.  The driver defaults to PAL, and keeps the last norm it was
set to across opens.  If one changes the norm via V4L1, the cropping
parameters are not reset like they should be, and they'll remain broken across
device opens.

This patch removes the direct setting of btv->tvnorm in the V4L1 ioctl
VIDIOCSCHAN handler.  The norm is set via the existing call to set_input(),
which calls set_tvnorm(), which will reset the cropping values now that it is
able to detect the norm change.
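
To make the ordering problem concrete, here is a minimal user-space sketch.
It is not the driver code: struct fake_btv, fake_set_tvnorm() and the two
fake_set_input_*() helpers are hypothetical names, and the "crop reset" is
modelled as simply recording which norm the crop was computed for. It assumes,
as the description above suggests, that set_tvnorm() notices a change by
comparing the requested norm with the stored one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of the per-card state. */
struct fake_btv {
	unsigned int tvnorm;    /* norm currently programmed */
	unsigned int crop_norm; /* norm the cropping values were computed for */
};

/* Resets the crop only when the requested norm differs from the stored one. */
static void fake_set_tvnorm(struct fake_btv *btv, unsigned int norm)
{
	assert(btv != NULL);
	if (norm != btv->tvnorm)
		btv->crop_norm = norm;	/* stands in for the crop reset */
	btv->tvnorm = norm;
}

/* Old V4L1 path: the norm is stored first, so the change is invisible. */
static void fake_set_input_old(struct fake_btv *btv, unsigned int norm)
{
	btv->tvnorm = norm;
	fake_set_tvnorm(btv, btv->tvnorm);
}

/* Patched path: the new norm is passed down and compared before storing. */
static void fake_set_input_new(struct fake_btv *btv, unsigned int norm)
{
	fake_set_tvnorm(btv, norm);
}
```

Starting from norm 0, switching to norm 1 through fake_set_input_old() leaves
crop_norm at 0, while fake_set_input_new() updates it to 1.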

Signed-off-by: Trent Piepho <[EMAIL PROTECTED]>
Signed-off-by: Michael Krufky <[EMAIL PROTECTED]>
Signed-off-by: Mauro Carvalho Chehab <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>
---
(cherry picked from commit 333408f21590d50397f3004e3f87070fa8f52c51)

 drivers/media/video/bt8xx/bttv-driver.c |   13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

--- linux-2.6.21.6.orig/drivers/media/video/bt8xx/bttv-driver.c
+++ linux-2.6.21.6/drivers/media/video/bt8xx/bttv-driver.c
@@ -1313,7 +1313,7 @@ set_tvnorm(struct bttv *btv, unsigned in
 
 /* Call with btv->lock down. */
 static void
-set_input(struct bttv *btv, unsigned int input)
+set_input(struct bttv *btv, unsigned int input, unsigned int norm)
 {
unsigned long flags;
 
@@ -1332,7 +1332,7 @@ set_input(struct bttv *btv, unsigned int
}
audio_input(btv,(input == bttv_tvcards[btv->c.type].tuner ?
   TVAUDIO_INPUT_TUNER : TVAUDIO_INPUT_EXTERN));
-   set_tvnorm(btv,btv->tvnorm);
+   set_tvnorm(btv, norm);
i2c_vidiocschan(btv);
 }
 
@@ -1423,7 +1423,7 @@ static void bttv_reinit_bt848(struct btt
 
init_bt848(btv);
btv->pll.pll_current = -1;
-   set_input(btv,btv->input);
+   set_input(btv, btv->input, btv->tvnorm);
 }
 
 static int get_control(struct bttv *btv, struct v4l2_control *c)
@@ -1993,8 +1993,7 @@ static int bttv_common_ioctls(struct btt
return 0;
}
 
-   btv->tvnorm = v->norm;
-   set_input(btv,v->channel);
+   set_input(btv, v->channel, v->norm);
mutex_unlock(&btv->lock);
return 0;
}
@@ -2130,7 +2129,7 @@ static int bttv_common_ioctls(struct btt
if (*i > bttv_tvcards[btv->c.type].video_inputs)
return -EINVAL;
mutex_lock(&btv->lock);
-   set_input(btv,*i);
+   set_input(btv, *i, btv->tvnorm);
mutex_unlock(&btv->lock);
return 0;
}
@@ -4762,7 +4761,7 @@ static int __devinit bttv_probe(struct p
bt848_hue(btv,32768);
bt848_sat(btv,32768);
audio_mute(btv, 1);
-   set_input(btv,0);
+   set_input(btv, 0, btv->tvnorm);
bttv_crop_reset(&btv->crop[0], btv->tvnorm);
btv->crop[1] = btv->crop[0]; /* current = default */
disclaim_vbi_lines(btv);

-- 

[patch 26/26] V4L: cx88-blackbird: fix vidioc_g_tuner never ending list of tuners

2007-07-30 Thread Greg KH

-stable review patch.  If anyone has any objections, please let us know.

--

From: Jelle Foks <[EMAIL PROTECTED]>

v4l-info and other programs would loop indefinitely while querying the
tuners for cx88-blackbird cards.

The cause was that vidioc_g_tuner didn't return an error value for
t->index != 0, making the application think there is a never ending
list of tuners...

This patch adds the same index check as done in vidioc_g_tuner() in
cx88-video.
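
As an illustration of why the missing check makes callers spin, here is a
small user-space sketch. It is not the driver or any real application:
struct fake_tuner, the two fake_g_tuner*() handlers and count_tuners() are
hypothetical, and the loop carries an artificial cap so that the unchecked
case terminates.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the tuner query argument: only index matters. */
struct fake_tuner {
	unsigned int index;
};

/* Models the handler before the patch: every index "succeeds". */
static int fake_g_tuner_unchecked(struct fake_tuner *t)
{
	(void)t;
	return 0;
}

/* Models the patched handler: only tuner 0 exists. */
static int fake_g_tuner(struct fake_tuner *t)
{
	if (t->index != 0)
		return -EINVAL;
	return 0;
}

/*
 * Models an application enumerating tuners: bump the index until the
 * handler reports an error. 'limit' is a safety cap for this sketch only.
 */
static unsigned int count_tuners(int (*g_tuner)(struct fake_tuner *),
				 unsigned int limit)
{
	struct fake_tuner t;
	unsigned int n = 0;

	assert(g_tuner != NULL);
	for (t.index = 0; t.index < limit; t.index++) {
		if (g_tuner(&t) != 0)
			break;
		n++;
	}
	return n;
}
```

With the patched handler the loop stops after one tuner; with the unchecked
handler it only stops because of the cap.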

Signed-off-by: Jelle Foks <[EMAIL PROTECTED]>
Signed-off-by: Michael Krufky <[EMAIL PROTECTED]>
Signed-off-by: Mauro Carvalho Chehab <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>
---
(cherry picked from commit f057131fb6eb2c45f6023e3da41ccd6e4e71aee9)

 drivers/media/video/cx88/cx88-blackbird.c |2 ++
 1 file changed, 2 insertions(+)

--- linux-2.6.21.6.orig/drivers/media/video/cx88/cx88-blackbird.c
+++ linux-2.6.21.6/drivers/media/video/cx88/cx88-blackbird.c
@@ -1034,6 +1034,8 @@ static int vidioc_g_tuner (struct file *
 
if (unlikely(UNSET == core->tuner_type))
return -EINVAL;
+   if (0 != t->index)
+   return -EINVAL;
 
strcpy(t->name, "Television");
t->type   = V4L2_TUNER_ANALOG_TV;

Please read the FAQ at  http://www.tux.org/lkml/

[patch 20/26] saa7134: fix thread shutdown handling

2007-07-30 Thread Greg KH

-stable review patch.  If anyone has any objections, please let us know.

--

From: Jeff Mahoney <[EMAIL PROTECTED]>

This patch changes the test for the thread pid from >= 0 to > 0.

When the saa7134 driver initialization fails after a certain point, it goes
through the complete shutdown process for the driver.  Part of shutting it
down includes tearing down the thread for tv audio.

The test for tearing down the thread tests for >= 0.  Since the dev
structure is kzalloc'd, the test will always be true if we haven't tried to
start the thread yet.  We end up waiting on pid 0 to complete, which will
never happen, so we lock up.
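
The effect of the zero-filled allocation can be shown with a short user-space
sketch. The struct and helper names below are hypothetical, and calloc()
stands in for kzalloc(); the point is only that a never-started thread has
pid 0, which passes a ">= 0" test but not a "> 0" test.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical miniature of the per-device thread bookkeeping. */
struct fake_thread {
	int pid;
	int shutdown;
};

struct fake_dev {
	struct fake_thread thread;
};

/* Zero-filled allocation, analogous to kzalloc() in the driver. */
static struct fake_dev *fake_dev_alloc(void)
{
	return calloc(1, sizeof(struct fake_dev));
}

/* Old test: true even when the thread was never started (pid == 0). */
static int must_wait_old(const struct fake_dev *dev)
{
	assert(dev != NULL);
	return dev->thread.pid >= 0;
}

/* New test: only true once a real pid has been recorded. */
static int must_wait_new(const struct fake_dev *dev)
{
	assert(dev != NULL);
	return dev->thread.pid > 0;
}
```

On a freshly allocated device must_wait_old() returns 1 and must_wait_new()
returns 0, which is the difference between waiting forever and skipping the
teardown.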

This bug was observed in Novell Bugzilla 284718, when request_irq() failed.

Signed-off-by: Jeff Mahoney <[EMAIL PROTECTED]>
Acked-by: Mauro Carvalho Chehab <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>
---

 drivers/media/video/saa7134/saa7134-tvaudio.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- linux-2.6.21.6.orig/drivers/media/video/saa7134/saa7134-tvaudio.c
+++ linux-2.6.21.6/drivers/media/video/saa7134/saa7134-tvaudio.c
@@ -1006,7 +1006,7 @@ int saa7134_tvaudio_init2(struct saa7134
 int saa7134_tvaudio_fini(struct saa7134_dev *dev)
 {
/* shutdown tvaudio thread */
-   if (dev->thread.pid >= 0) {
+   if (dev->thread.pid > 0) {
dev->thread.shutdown = 1;
wake_up_interruptible(&dev->thread.wq);
wait_for_completion(&dev->thread.exit);

Please read the FAQ at  http://www.tux.org/lkml/

[patch 21/26] serial: clear proper MPSC interrupt cause bits

2007-07-30 Thread Greg KH

-stable review patch.  If anyone has any objections, please let us know.

--

From: Jay Lubomirski <[EMAIL PROTECTED]>

The interrupt clearing code in mpsc_sdma_intr_ack() mistakenly clears the
interrupt for both controllers instead of just the one it's supposed to.
This can result in the other controller appearing to hang because its
interrupt was effectively lost.

So, don't clear the interrupt cause bits for both MPSC controllers when
clearing the interrupt for one of them.  Just clear the one that is
supposed to be cleared.
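
The difference between the two writes can be sketched in user space. This is
an illustration only: it assumes, as the patch's use of writeb() at an offset
of port.line implies, that each controller owns one byte of the cause
register; the array and the ack_all()/ack_one() helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CAUSE_REG_BYTES 4	/* a 32-bit register viewed as four bytes */

/* Old behaviour: a 32-bit write of 0 wipes every controller's byte. */
static void ack_all(uint8_t *cause)
{
	assert(cause != NULL);
	memset(cause, 0, CAUSE_REG_BYTES);
}

/* New behaviour: only the byte belonging to 'line' is cleared. */
static void ack_one(uint8_t *cause, unsigned int line)
{
	assert(cause != NULL);
	if (line < CAUSE_REG_BYTES)
		cause[line] = 0;
}
```

If both controllers have pending cause bits, ack_one(reg, 0) leaves the bits
of controller 1 intact, whereas ack_all(reg) discards them.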

Signed-off-by: Jay Lubomirski <[EMAIL PROTECTED]>
Acked-by: Mark A. Greer <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>
---

 drivers/serial/mpsc.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- linux-2.6.21.6.orig/drivers/serial/mpsc.c
+++ linux-2.6.21.6/drivers/serial/mpsc.c
@@ -502,7 +502,8 @@ mpsc_sdma_intr_ack(struct mpsc_port_info
 
if (pi->mirror_regs)
pi->shared_regs->SDMA_INTR_CAUSE_m = 0;
-   writel(0, pi->shared_regs->sdma_intr_base + SDMA_INTR_CAUSE);
+   writeb(0x00, pi->shared_regs->sdma_intr_base + SDMA_INTR_CAUSE +
+  pi->port.line);
return;
 }
 

Please read the FAQ at  http://www.tux.org/lkml/

[patch 22/26] i386: fix infinite loop with singlestep int80 syscalls

2007-07-30 Thread Greg KH

-stable review patch.  If anyone has any objections, please let us know.

--

The commit 635cf99a80f4ebee59d70eb64bb85ce829e4591f introduced a
regression.  Executing a ptrace single step after certain int80
accesses will infinitely loop and never advance the PC.

The TIF_SINGLESTEP check should be done on the return from the syscall
and not before it.

The new test case is below:

/* Test whether singlestep through an int80 syscall works.
 */
#define _GNU_SOURCE
/* Header names were stripped from this copy; the list below is what the
 * code requires. */
#include <stdio.h>
#include <string.h>
#include <signal.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <sys/user.h>

static int child, status;
static struct user_regs_struct regs;

static void do_child()
{
	char str[80] = "child: int80 test\n";

	ptrace(PTRACE_TRACEME, 0, 0, 0);
	kill(getpid(), SIGUSR1);
	write(fileno(stdout),str,strlen(str));
	asm ("int $0x80" : : "a" (20)); /* getpid */
}

static void do_parent()
{
	unsigned long eip, expected = 0;
again:
	waitpid(child, &status, 0);
	if (WIFEXITED(status) || WIFSIGNALED(status))
		return;

	if (WIFSTOPPED(status)) {
		ptrace(PTRACE_GETREGS, child, 0, &regs);
		eip = regs.eip;
		if (expected)
			fprintf(stderr, "child stop @ %08lx, expected %08lx %s\n",
				eip, expected,
				eip == expected ? "" : " <== ERROR");

		if (*(unsigned short *)eip == 0x80cd) {
			fprintf(stderr, "int 0x80 at %08x\n", (unsigned int)eip);
			expected = eip + 2;
		} else
			expected = 0;

		ptrace(PTRACE_SINGLESTEP, child, NULL, NULL);
	}
	goto again;
}

int main(int argc, char * const argv[])
{
	child = fork();
	if (child)
		do_parent();
	else
		do_child();
	return 0;
}


Signed-off-by: Jason Wessel <[EMAIL PROTECTED]>
Cc: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Cc: Chuck Ebbert <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>
---
 arch/i386/kernel/entry.S |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

--- linux-2.6.21.6.orig/arch/i386/kernel/entry.S
+++ linux-2.6.21.6/arch/i386/kernel/entry.S
@@ -371,10 +371,6 @@ ENTRY(system_call)
CFI_ADJUST_CFA_OFFSET 4
SAVE_ALL
GET_THREAD_INFO(%ebp)
-   testl $TF_MASK,PT_EFLAGS(%esp)
-   jz no_singlestep
-   orl $_TIF_SINGLESTEP,TI_flags(%ebp)
-no_singlestep:
# system call tracing in operation / emulation
/* Note, _TIF_SECCOMP is bit number 8, and so it needs testw and not testb */
testw $(_TIF_SYSCALL_EMU|_TIF_SYSCALL_TRACE|_TIF_SECCOMP|_TIF_SYSCALL_AUDIT),TI_flags(%ebp)
@@ -389,6 +385,10 @@ syscall_exit:
# setting need_resched or sigpending
# between sampling and the iret
TRACE_IRQS_OFF
+   testl $TF_MASK,PT_EFLAGS(%esp)  # If tracing set singlestep flag on exit
+   jz no_singlestep
+   orl $_TIF_SINGLESTEP,TI_flags(%ebp)
+no_singlestep:
movl TI_flags(%ebp), %ecx
testw $_TIF_ALLWORK_MASK, %cx   # current->work
jne syscall_exit_work

-- 

[patch 17/26] audit: fix oops removing watch if audit disabled

2007-07-30 Thread Greg KH

-stable review patch.  If anyone has any objections, please let us know.

--

From: Tony Jones <[EMAIL PROTECTED]>

Removing a watched file will oops if audit is disabled (auditctl -e 0).

To reproduce:
- auditctl -e 1
- touch /tmp/foo
- auditctl -w /tmp/foo
- auditctl -e 0
- rm /tmp/foo (or mv)

Signed-off-by: Tony Jones <[EMAIL PROTECTED]>
Cc: Al Viro <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>
---

 kernel/auditfilter.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- linux-2.6.21.6.orig/kernel/auditfilter.c
+++ linux-2.6.21.6/kernel/auditfilter.c
@@ -905,7 +905,7 @@ static void audit_update_watch(struct au
 
/* If the update involves invalidating rules, do the inode-based
 * filtering now, so we don't omit records. */
-   if (invalidating &&
+   if (invalidating && current->audit_context &&
audit_filter_inodes(current, current->audit_context) == AUDIT_RECORD_CONTEXT)
audit_set_auditable(current->audit_context);
 

-- 

[patch 18/26] POWERPC: Fix subtle FP state corruption bug in signal return on SMP

2007-07-30 Thread Greg KH

-stable review patch.  If anyone has any objections, please let us know.

--

This fixes a bug which can cause corruption of the floating-point state
on return from a signal handler.  If we have a signal handler that has
used the floating-point registers, and it happens to context-switch to
another task while copying the interrupted floating-point state from the
user stack into the thread struct (e.g. because of a page fault, or
because it gets preempted), the context switch code will think that the
FP registers contain valid FP state that needs to be copied into the
thread_struct, and will thus overwrite the values that the signal return
code has put into the thread_struct.

This can occur because we clear the MSR bits that indicate the presence
of valid FP state after copying the state into the thread_struct.  To fix
this we just move the clearing of the MSR bits to before the copy.  A
similar potential problem also occurs with the Altivec state, and this
fixes that in the same way.
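
The ordering issue can be modelled in a few lines of user-space C. This is a
deliberately simplified sketch, not the kernel code: struct fake_task and the
helpers are hypothetical, a single int stands in for the FP register file,
and the "context switch" is simulated by an explicit call placed where a
page fault or preemption could occur.

```c
#include <assert.h>
#include <stddef.h>

struct fake_task {
	int fp_live;	/* models the MSR bit: hardware regs hold valid state */
	int hw_fpr;	/* models the live FP register contents */
	int thread_fpr;	/* models the copy kept in the thread struct */
};

/* Models the context-switch path: live state is flushed to the thread struct. */
static void fake_context_switch(struct fake_task *t)
{
	if (t->fp_live)
		t->thread_fpr = t->hw_fpr;
}

/* Old order: copy first, clear the flag afterwards. */
static void restore_copy_then_clear(struct fake_task *t, int user_fpr)
{
	assert(t != NULL);
	t->thread_fpr = user_fpr;
	fake_context_switch(t);	/* switch happens around the copy */
	t->fp_live = 0;
}

/* New order: clear the flag first, so a switch cannot overwrite the copy. */
static void restore_clear_then_copy(struct fake_task *t, int user_fpr)
{
	assert(t != NULL);
	t->fp_live = 0;
	t->thread_fpr = user_fpr;
	fake_context_switch(t);
}
```

With the old order the value restored from the user stack is replaced by the
stale register contents; with the new order it survives the switch.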

Signed-off-by: Paul Mackerras <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>
---
 arch/powerpc/kernel/signal_64.c |   10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

--- linux-2.6.21.6.orig/arch/powerpc/kernel/signal_64.c
+++ linux-2.6.21.6/arch/powerpc/kernel/signal_64.c
@@ -177,6 +177,13 @@ static long restore_sigcontext(struct pt
 */
discard_lazy_cpu_state();
 
+   /*
+* Force reload of FP/VEC.
+* This has to be done before copying stuff into current->thread.fpr/vr
+* for the reasons explained in the previous comment.
+*/
+   regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC);
+
err |= __copy_from_user(&current->thread.fpr, &sc->fp_regs, FP_REGS_SIZE);
 
 #ifdef CONFIG_ALTIVEC
@@ -198,9 +205,6 @@ static long restore_sigcontext(struct pt
current->thread.vrsave = 0;
 #endif /* CONFIG_ALTIVEC */
 
-   /* Force reload of FP/VEC */
-   regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC);
-
return err;
 }
 

-- 

[patch 19/26] mm: kill validate_anon_vma to avoid mapcount BUG

2007-07-30 Thread Greg KH

-stable review patch.  If anyone has any objections, please let us know.

--

From: Hugh Dickins <[EMAIL PROTECTED]>

validate_anon_vma gave a useful check on the integrity of the anon_vma list
when Andrea was developing obj rmap; but it was not enabled in SLES9
itself, nor in mainline, until Nick changed commented-out RMAP_DEBUG to
configurable CONFIG_DEBUG_VM in 2.6.17.  Now Petr Vandrovec reports that
its BUG_ON(mapcount > 10) can easily crash a CONFIG_DEBUG_VM=y system.

That limit was just an arbitrary number to protect against an infinite
loop.  We could raise it to something enormous (depending on sizeof struct
vma and size of memory?); but I rather think validate_anon_vma has outlived
its usefulness, and is better just removed - which gives a magnificent
performance boost to anything like Petr's test program ;)

Of course, a very long anon_vma list is bad news for preemption latency,
and I believe there has been one recent report of such: let's not forget
that, but validate_anon_vma only makes it worse not better.

Signed-off-by: Hugh Dickins <[EMAIL PROTECTED]>
Cc: Petr Vandrovec <[EMAIL PROTECTED]>
Acked-by: Nick Piggin <[EMAIL PROTECTED]>
Cc: Andrea Arcangeli <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>
---

 mm/rmap.c |   24 +---
 1 file changed, 1 insertion(+), 23 deletions(-)

--- linux-2.6.21.6.orig/mm/rmap.c
+++ linux-2.6.21.6/mm/rmap.c
@@ -53,24 +53,6 @@
 
 struct kmem_cache *anon_vma_cachep;
 
-static inline void validate_anon_vma(struct vm_area_struct *find_vma)
-{
-#ifdef CONFIG_DEBUG_VM
-   struct anon_vma *anon_vma = find_vma->anon_vma;
-   struct vm_area_struct *vma;
-   unsigned int mapcount = 0;
-   int found = 0;
-
-   list_for_each_entry(vma, &anon_vma->head, anon_vma_node) {
-   mapcount++;
-   BUG_ON(mapcount > 10);
-   if (vma == find_vma)
-   found = 1;
-   }
-   BUG_ON(!found);
-#endif
-}
-
 /* This must be called under the mmap_sem. */
 int anon_vma_prepare(struct vm_area_struct *vma)
 {
@@ -121,10 +103,8 @@ void __anon_vma_link(struct vm_area_stru
 {
struct anon_vma *anon_vma = vma->anon_vma;
 
-   if (anon_vma) {
+   if (anon_vma)
list_add_tail(&vma->anon_vma_node, &anon_vma->head);
-   validate_anon_vma(vma);
-   }
 }
 
 void anon_vma_link(struct vm_area_struct *vma)
@@ -134,7 +114,6 @@ void anon_vma_link(struct vm_area_struct
if (anon_vma) {
spin_lock(&anon_vma->lock);
list_add_tail(&vma->anon_vma_node, &anon_vma->head);
-   validate_anon_vma(vma);
spin_unlock(&anon_vma->lock);
}
 }
@@ -148,7 +127,6 @@ void anon_vma_unlink(struct vm_area_stru
return;
 
spin_lock(&anon_vma->lock);
-   validate_anon_vma(vma);
list_del(&vma->anon_vma_node);
 
/* We must garbage collect the anon_vma if it's empty */

-- 

[patch 14/26] sched: fix next_interval determination in idle_balance()

2007-07-30 Thread Greg KH

-stable review patch.  If anyone has any objections, please let us know.

--

From: Christoph Lameter <[EMAIL PROTECTED]>

Fix massive SMP imbalance on NUMA nodes observed on 2.6.21.5 with CFS. 
(and later on reproduced without CFS as well).

The intervals of domains that do not have SD_BALANCE_NEWIDLE must be 
considered for the calculation of the time of the next balance. 
Otherwise we may defer rebalancing forever and nodes might stay idle for 
very long times.

Siddha also spotted that the conversion of the balance interval to
jiffies is missing. Fix that too.
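
For readers who want to see the arithmetic, here is a user-space sketch of
the calculation described above. It is not the scheduler code: FAKE_HZ is an
assumed tick rate, fake_time_after() and fake_msecs_to_jiffies() are
simplified stand-ins for the kernel helpers of similar name, and struct
fake_domain keeps only the two fields involved.

```c
#include <assert.h>
#include <stddef.h>

#define FAKE_HZ 250UL	/* assumed tick rate for this sketch */

/* Wraparound-safe "a is after b", same idea as the kernel's time_after(). */
static int fake_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Simplified milliseconds-to-ticks conversion, rounding up. */
static unsigned long fake_msecs_to_jiffies(unsigned long ms)
{
	return (ms * FAKE_HZ + 999UL) / 1000UL;
}

struct fake_domain {
	unsigned long last_balance;	/* in ticks */
	unsigned long balance_interval;	/* in milliseconds */
};

/* Earliest due time over all domains, starting from a default deadline. */
static unsigned long next_balance_time(const struct fake_domain *sd, size_t n,
				       unsigned long dflt)
{
	unsigned long next = dflt;
	size_t i;

	assert(sd != NULL || n == 0);
	for (i = 0; i < n; i++) {
		unsigned long due = sd[i].last_balance +
			fake_msecs_to_jiffies(sd[i].balance_interval);

		if (fake_time_after(next, due))
			next = due;
	}
	return next;
}
```

With FAKE_HZ at 250, an 8 ms interval is 2 ticks, so a domain last balanced
at tick 1000 is due at tick 1002; adding the raw millisecond value would have
given 1008 instead.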

From: Srivatsa Vaddagiri <[EMAIL PROTECTED]>

also continue the loop if !(sd->flags & SD_LOAD_BALANCE).

Tested-by: Paul E. McKenney <[EMAIL PROTECTED]>

It did in fact trigger under all three of mainline, CFS, and -rt 
including CFS -- see below for a couple of emails from last Friday 
giving results for these three on the AMD box (where it happened) and on 
a single-quad NUMA-Q system (where it did not, at least not with such 
severity).

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>
---
 kernel/sched.c |   22 +-
 1 file changed, 13 insertions(+), 9 deletions(-)

--- linux-2.6.21.6.orig/kernel/sched.c
+++ linux-2.6.21.6/kernel/sched.c
@@ -2831,17 +2831,21 @@ static void idle_balance(int this_cpu, s
unsigned long next_balance = jiffies + 60 *  HZ;
 
for_each_domain(this_cpu, sd) {
-   if (sd->flags & SD_BALANCE_NEWIDLE) {
+   unsigned long interval;
+
+   if (!(sd->flags & SD_LOAD_BALANCE))
+   continue;
+
+   if (sd->flags & SD_BALANCE_NEWIDLE)
/* If we've pulled tasks over stop searching: */
pulled_task = load_balance_newidle(this_cpu,
-   this_rq, sd);
-   if (time_after(next_balance,
- sd->last_balance + sd->balance_interval))
-   next_balance = sd->last_balance
-   + sd->balance_interval;
-   if (pulled_task)
-   break;
-   }
+   this_rq, sd);
+
+   interval = msecs_to_jiffies(sd->balance_interval);
+   if (time_after(next_balance, sd->last_balance + interval))
+   next_balance = sd->last_balance + interval;
+   if (pulled_task)
+   break;
}
if (!pulled_task)
/*

-- 

[patch 15/26] posix-timers: Prevent softirq starvation by small intervals and SIG_IGN

2007-07-30 Thread Greg KH

-stable review patch.  If anyone has any objections, please let us know.

--

posix-timers which deliver an ignored signal are currently rearmed in
the timer softirq: This is necessary because the timer needs to be
delivered again when SIG_IGN is removed. This is not a problem, when
the interval is reasonable.

With high resolution timers enabled one might arm a posix timer with a
very small interval and ignore the signal. This might lead to a
softirq starvation when the interval is so small that the timer is
requeued onto the softirq pending list right away.

This problem was pointed out by Jan Kiszka. Thanks Jan !

The correct solution would be to stop the timer, when the signal is
ignored and rearm it when SIG_IGN is removed. Unfortunately this
requires modification in sigaction and involves non trivial sighand
locking. It's too late in the release cycle for such a change.

For now we just keep the timer running and enforce that the timer only
fires every jiffie. This does not break anything as we keep the
overrun counter correct. It adds a little inaccuracy to the
timer_gettime() interface, but...
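
The effect of moving "now" ahead by one jiffy can be sketched in userspace as follows. This is a simplified model, not the kernel's hrtimer_forward(): it assumes HZ = 250, works on plain nanosecond integers, and the names model_timer, model_forward() and rearm_ignored() are hypothetical. The point it shows is that a sub-jiffy interval gets its next expiry pushed at least one jiffy out, while the skipped periods still land in the overrun count.

```c
#include <assert.h>
#include <stdint.h>

#define HZ 250                       /* assumed tick rate */
#define NSEC_PER_SEC 1000000000LL
#define JIFFY_NS (NSEC_PER_SEC / HZ) /* one jiffy in nanoseconds */

/* Hypothetical minimal timer: absolute expiry and period, both in ns. */
struct model_timer {
    int64_t expires;
    int64_t interval;
};

/*
 * Simplified model of forwarding a periodic timer: move expires strictly
 * past now in whole periods and return how many periods were added.
 */
static uint64_t model_forward(struct model_timer *t, int64_t now)
{
    int64_t delta = now - t->expires;
    uint64_t orun;

    assert(t->interval > 0);
    if (delta < 0)
        return 0;               /* not yet expired, nothing to do */

    orun = (uint64_t)(delta / t->interval) + 1;
    t->expires += (int64_t)orun * t->interval;
    return orun;
}

/*
 * Rearm path for an ignored signal: intervals shorter than a jiffy are
 * forwarded as if it were already one jiffy later.
 */
static uint64_t rearm_ignored(struct model_timer *t, int64_t now)
{
    if (t->interval < JIFFY_NS)
        now += JIFFY_NS;
    return model_forward(t, now);
}
```

In this model a 1 us timer that expires at time 0 is rearmed for roughly 4 ms later with about 4000 overruns recorded, instead of being requeued 1 us later.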

The more complex change is necessary anyway to fix another shortcoming
of the current implementation, which I discovered while looking
at this problem: A pending signal is discarded when SIG_IGN is set. In
case that a posixtimer signal is pending then it is discarded as well,
but when SIG_IGN is removed later nothing rearms the timer. This is
not new, it's that way since posix timers have been merged. So nothing
to worry about right now.

I have a working solution to fix all of this, but the impact is too
large for both stable and 2.6.22. I'm going to send it out for review
in the next days.

This should go into 2.6.21.stable as well.

Signed-off-by: Thomas Gleixner <[EMAIL PROTECTED]>
Acked-by: Ingo Molnar <[EMAIL PROTECTED]>
Cc: Jan Kiszka <[EMAIL PROTECTED]>
Cc: Ulrich Drepper <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>

---
 kernel/posix-timers.c |   35 +--
 1 file changed, 33 insertions(+), 2 deletions(-)

--- linux-2.6.21.6.orig/kernel/posix-timers.c
+++ linux-2.6.21.6/kernel/posix-timers.c
@@ -354,9 +354,40 @@ static enum hrtimer_restart posix_timer_
 * it should be restarted.
 */
if (timr->it.real.interval.tv64 != 0) {
+   ktime_t now = hrtimer_cb_get_time(timer);
+
+   /*
+* FIXME: What we really want, is to stop this
+* timer completely and restart it in case the
+* SIG_IGN is removed. This is a non trivial
+* change which involves sighand locking
+* (sigh !), which we don't want to do late in
+* the release cycle.
+*
+* For now we just let timers with an interval
+* less than a jiffie expire every jiffie to
+* avoid softirq starvation in case of SIG_IGN
+* and a very small interval, which would put
+* the timer right back on the softirq pending
+* list. By moving now ahead of time we trick
+* hrtimer_forward() to expire the timer
+* later, while we still maintain the overrun
+* accuracy, but have some inconsistency in
+* the timer_gettime() case. This is at least
+* better than a starved softirq. A more
+* complex fix which solves also another related
+* inconsistency is already in the pipeline.
+*/
+#ifdef CONFIG_HIGH_RES_TIMERS
+   {
+   ktime_t kj = ktime_set(0, NSEC_PER_SEC / HZ);
+
+   if (timr->it.real.interval.tv64 < kj.tv64)
+   now = ktime_add(now, kj);
+   }
+#endif
timr->it_overrun +=
-   hrtimer_forward(timer,
-   hrtimer_cb_get_time(timer),
+   hrtimer_forward(timer, now,
timr->it.real.interval);
ret = HRTIMER_RESTART;
++timr->it_requeue_pending;

-- 

[patch 16/26] FUTEX: Restore the dropped ESRCH fix

2007-07-30 Thread Greg KH

-stable review patch.  If anyone has any objections, please let us know.

--

The return value of futex_find_get_task() needs to be -ESRCH in case
that the search fails. This was part of the original futex fixes and 
got accidentally dropped, when the futex-tidy-up patch was split out.

Without it, a failed search results in a NULL pointer dereference.

Restore it.
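
For readers unfamiliar with the idiom, the following userspace sketch re-creates the ERR_PTR()/IS_ERR()/PTR_ERR() convention. It is an approximation for illustration: the MAX_ERRNO value, the task table, and the helpers find_get_task() and lookup_owner() are assumptions, not the kernel's futex code. It shows why a caller that tests IS_ERR() must never be handed NULL on failure: NULL is not an error pointer, so it would be treated as a valid task and dereferenced.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095  /* assumed size of the error-pointer range */

/* Userspace re-creations of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical task table standing in for the pid lookup. */
struct task { int pid; int refcount; };
static struct task table[] = { {1, 0}, {42, 0} };

/* Returns a referenced task, or ERR_PTR(-ESRCH); never NULL. */
static struct task *find_get_task(int pid)
{
    for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        if (table[i].pid == pid) {
            table[i].refcount++;
            return &table[i];
        }
    }
    return ERR_PTR(-ESRCH);
}

/* Caller in the style of lookup_pi_state(): checks IS_ERR(), not NULL. */
static int lookup_owner(int pid)
{
    struct task *p = find_get_task(pid);

    if (IS_ERR(p))
        return (int)PTR_ERR(p);
    assert(p != NULL);  /* would fire if the lookup returned NULL */
    return p->pid;
}
```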

Signed-off-by: Thomas Gleixner <[EMAIL PROTECTED]>
Cc: Ingo Molnar <[EMAIL PROTECTED]>
Cc: Ulrich Drepper <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>
---
 kernel/futex.c |   14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

--- linux-2.6.21.6.orig/kernel/futex.c
+++ linux-2.6.21.6/kernel/futex.c
@@ -390,14 +390,12 @@ static struct task_struct * futex_find_g
 
rcu_read_lock();
p = find_task_by_pid(pid);
-   if (!p)
-   goto out_unlock;
-   if ((current->euid != p->euid) && (current->euid != p->uid)) {
-   p = NULL;
-   goto out_unlock;
-   }
-   get_task_struct(p);
-out_unlock:
+
+   if (!p || ((current->euid != p->euid) && (current->euid != p->uid)))
+   p = ERR_PTR(-ESRCH);
+   else
+   get_task_struct(p);
+
rcu_read_unlock();
 
return p;

-- 

[patch 13/26] hugetlb: fix get_policy for stacked shared memory files

2007-07-30 Thread Greg KH

-stable review patch.  If anyone has any objections, please let us know.

--

From: Adam Litke <[EMAIL PROTECTED]>

Here's another breakage as a result of shared memory stacked files :(

The NUMA policy for a VMA is determined by checking the following (in the
order given):

1) vma->vm_ops->get_policy() (if defined)
2) vma->vm_policy (if defined)
3) task->mempolicy (if defined)
4) Fall back to default_policy

By switching to stacked files for shared memory, get_policy() is now always
set to shm_get_policy which is a wrapper function.  This causes us to stop
at step 1, which yields NULL for hugetlb instead of task->mempolicy which
was the previous (and correct) result.

This patch modifies the shm_get_policy() wrapper to maintain steps 1-3 for
the wrapped vm_ops.
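
The lookup order can be modeled in userspace as below. This is a sketch under stated assumptions: the struct layouts are reduced to the fields that matter, and wrapped_get_policy() and effective_policy() are hypothetical names, not functions from ipc/shm.c. The wrapper performs steps 1-3 itself, so a wrapped file without its own get_policy() no longer short-circuits the chain with NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-ins for the kernel structures involved. */
struct mempolicy { int mode; };
struct vma { struct mempolicy *vm_policy; };
struct task { struct mempolicy *mempolicy; };

/* Stand-in for the wrapped file's vm_ops; get_policy may be NULL. */
struct vm_ops {
    struct mempolicy *(*get_policy)(struct vma *vma, unsigned long addr);
};

static struct mempolicy default_policy = { 0 };

/* Steps 1-3, in the order the patched wrapper applies them. */
static struct mempolicy *wrapped_get_policy(const struct vm_ops *inner,
                                            struct vma *vma,
                                            unsigned long addr,
                                            const struct task *cur)
{
    assert(inner && vma && cur);
    if (inner->get_policy)
        return inner->get_policy(vma, addr);
    if (vma->vm_policy)
        return vma->vm_policy;
    return cur->mempolicy;
}

/* Step 4: fall back to the default policy when nothing was found. */
static struct mempolicy *effective_policy(const struct vm_ops *inner,
                                          struct vma *vma,
                                          unsigned long addr,
                                          const struct task *cur)
{
    struct mempolicy *pol = wrapped_get_policy(inner, vma, addr, cur);

    return pol ? pol : &default_policy;
}
```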

(akpm: the refcounting of mempolicies is busted and this patch does nothing to
improve it)

Signed-off-by: Adam Litke <[EMAIL PROTECTED]>
Acked-by: William Irwin <[EMAIL PROTECTED]>
Cc: dean gaudet <[EMAIL PROTECTED]>
Cc: Christoph Lameter <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>
---

 ipc/shm.c |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- linux-2.6.21.6.orig/ipc/shm.c
+++ linux-2.6.21.6/ipc/shm.c
@@ -254,8 +254,10 @@ struct mempolicy *shm_get_policy(struct 
 
if (sfd->vm_ops->get_policy)
pol = sfd->vm_ops->get_policy(vma, addr);
-   else
+   else if (vma->vm_policy)
pol = vma->vm_policy;
+   else
+   pol = current->mempolicy;
return pol;
 }
 #endif

-- 

[patch 12/26] dm crypt: fix remove first_clone

2007-07-30 Thread Greg KH

-stable review patch.  If anyone has any objections, please let us know.

--

From: Olaf Kirch <[EMAIL PROTECTED]>

Get rid of first_clone in dm-crypt

This gets rid of first_clone, which is not really needed.  Apparently, cloned
bios used to share their bvec some time way in the past - this is no longer
the case.  Contrarily, this even hurts us if we try to create a clone off
first_clone after it has completed, and crypt_endio has destroyed its bvec.

Signed-off-by: Olaf Kirch <[EMAIL PROTECTED]>
Signed-off-by: Alasdair G Kergon <[EMAIL PROTECTED]>
Cc: Jens Axboe <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Linus Torvalds <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>
---
Gitweb: 
http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=2f9941b6c55d70103c1bc3f2c7676acd9f20bf8a

 drivers/md/dm-crypt.c |   34 ++
 1 file changed, 6 insertions(+), 28 deletions(-)

--- linux-2.6.21.6.orig/drivers/md/dm-crypt.c
+++ linux-2.6.21.6/drivers/md/dm-crypt.c
@@ -33,7 +33,6 @@
 struct crypt_io {
struct dm_target *target;
struct bio *base_bio;
-   struct bio *first_clone;
struct work_struct work;
atomic_t pending;
int error;
@@ -380,9 +379,8 @@ static int crypt_convert(struct crypt_co
  * This should never violate the device limitations
  * May return a smaller bio when running out of pages
  */
-static struct bio *
-crypt_alloc_buffer(struct crypt_io *io, unsigned int size,
-   struct bio *base_bio, unsigned int *bio_vec_idx)
+static struct bio *crypt_alloc_buffer(struct crypt_io *io, unsigned int size,
+ unsigned int *bio_vec_idx)
 {
struct crypt_config *cc = io->target->private;
struct bio *clone;
@@ -390,12 +388,7 @@ crypt_alloc_buffer(struct crypt_io *io, 
gfp_t gfp_mask = GFP_NOIO | __GFP_HIGHMEM;
unsigned int i;
 
-   if (base_bio) {
-   clone = bio_alloc_bioset(GFP_NOIO, base_bio->bi_max_vecs, cc->bs);
-   __bio_clone(clone, base_bio);
-   } else
-   clone = bio_alloc_bioset(GFP_NOIO, nr_iovecs, cc->bs);
-
+   clone = bio_alloc_bioset(GFP_NOIO, nr_iovecs, cc->bs);
if (!clone)
return NULL;
 
@@ -498,9 +491,6 @@ static void dec_pending(struct crypt_io 
if (!atomic_dec_and_test(&io->pending))
return;
 
-   if (io->first_clone)
-   bio_put(io->first_clone);
-
bio_endio(io->base_bio, io->base_bio->bi_size, io->error);
 
mempool_free(io, cc->io_pool);
@@ -618,8 +608,7 @@ static void process_write(struct crypt_i
 * so repeat the whole process until all the data can be handled.
 */
while (remaining) {
-   clone = crypt_alloc_buffer(io, base_bio->bi_size,
-  io->first_clone, &bvec_idx);
+   clone = crypt_alloc_buffer(io, base_bio->bi_size, &bvec_idx);
if (unlikely(!clone)) {
dec_pending(io, -ENOMEM);
return;
@@ -635,21 +624,11 @@ static void process_write(struct crypt_i
}
 
clone->bi_sector = cc->start + sector;
-
-   if (!io->first_clone) {
-   /*
-* hold a reference to the first clone, because it
-* holds the bio_vec array and that can't be freed
-* before all other clones are released
-*/
-   bio_get(clone);
-   io->first_clone = clone;
-   }
-
remaining -= clone->bi_size;
sector += bio_sectors(clone);
 
-   /* prevent bio_put of first_clone */
+   /* Grab another reference to the io struct
+* before we kick off the request */
if (remaining)
atomic_inc(&io->pending);
 
@@ -965,7 +944,6 @@ static int crypt_map(struct dm_target *t
io = mempool_alloc(cc->io_pool, GFP_NOIO);
io->target = ti;
io->base_bio = bio;
-   io->first_clone = NULL;
io->error = io->post_process = 0;
atomic_set(&io->pending, 0);
kcryptd_queue_io(io);

-- 

[patch 11/26] dm crypt: fix avoid cloned bio ref after free

2007-07-30 Thread Greg KH

-stable review patch.  If anyone has any objections, please let us know.

--

From: Olaf Kirch <[EMAIL PROTECTED]>

Do not access the bio after generic_make_request

We should never access a bio after generic_make_request - there's no guarantee
it still exists.

Signed-off-by: Olaf Kirch <[EMAIL PROTECTED]>
Signed-off-by: Alasdair G Kergon <[EMAIL PROTECTED]>
Cc: Jens Axboe <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Linus Torvalds <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>
---
Gitweb: 
http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=98221eb757de03d9aa6262b1eded2be708640ccc

 drivers/md/dm-crypt.c |5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

--- linux-2.6.21.6.orig/drivers/md/dm-crypt.c
+++ linux-2.6.21.6/drivers/md/dm-crypt.c
@@ -655,9 +655,12 @@ static void process_write(struct crypt_i
 
generic_make_request(clone);
 
+   /* Do not reference clone after this - it
+* may be gone already. */
+
/* out of memory -> run queues */
if (remaining)
-   congestion_wait(bio_data_dir(clone), HZ/100);
+   congestion_wait(WRITE, HZ/100);
}
 }
 

-- 

Re: [PATCH 3/5] x86_64 EFI support -v3: EFI runtime support

2007-07-30 Thread Eric W. Biederman

"Huang, Ying" <[EMAIL PROTECTED]> writes:

> This patch adds runtime service support for EFI x86_64 system.
>
> The EFI support for emergency_restart and RTC clock is added. The EFI
> based implementation and legacy BIOS or CMOS based implementation are
> put in separate functions and are chosen based on the value of
> efi_enabled.

The patches to the reboot path are wrong (see below).

Why do we need to do this anyway?  Why do we need any EFI runtime
support?  We already have ACPI.  Isn't that good enough to abstract
out the runtime parts of the hardware?

Why do we need to replace working code that directly talks to the
architecturally defined hardware, with firmware calls?

What is the point?  What is the advantage?

The disadvantage of having more code to maintain and having
to deal with more BIOS bugs should be obvious.

> Signed-off-by: Chandramouli Narayanan <[EMAIL PROTECTED]>
> Signed-off-by: Huang Ying <[EMAIL PROTECTED]>
>
> ---
>
>  reboot.c |   11 ++-
>  time.c   |   47 +++
>  2 files changed, 41 insertions(+), 17 deletions(-)
>
> Index: linux-2.6.23-rc1/arch/x86_64/kernel/reboot.c
> ===
> --- linux-2.6.23-rc1.orig/arch/x86_64/kernel/reboot.c 2007-07-23
> 04:41:00.0 +0800
> +++ linux-2.6.23-rc1/arch/x86_64/kernel/reboot.c 2007-07-30 09:26:56.0
> +0800
> @@ -9,6 +9,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -117,7 +118,7 @@
>   pci_iommu_shutdown();
>  }
>  
> -void machine_emergency_restart(void)
> +static inline void bios_emergency_restart(void)
>  {
>   int i;
>  
> @@ -145,6 +146,14 @@
>   }  
>  }
>  
> +void machine_emergency_restart(void)
> +{
> + if (efi_enabled)
> + efi_emergency_restart();
> + else
> + bios_emergency_restart();
> +}
> +

EFI is a BIOS, so renaming the current machine_emergency_restart to
bios_emergency_restart is a misnomer, especially since it pounds the
hardware, not the firmware.
Second, we already have a perfectly capable mechanism in the
reboot_type variable, so you should just need to add one more type and
handle this properly.


>  void machine_restart(char * __unused)
>  {
>   printk("machine restart\n");
> Index: linux-2.6.23-rc1/arch/x86_64/kernel/time.c
> ===
> --- linux-2.6.23-rc1.orig/arch/x86_64/kernel/time.c 2007-07-23
> 04:41:00.0 +0800
> +++ linux-2.6.23-rc1/arch/x86_64/kernel/time.c 2007-07-30 09:42:12.0
> +0800
> @@ -27,6 +27,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #ifdef CONFIG_ACPI
>  #include /* for PM timer frequency */
> @@ -89,13 +90,6 @@
>   unsigned char control, freq_select;
>  
>  /*
> - * IRQs are disabled when we're called from the timer interrupt,
> - * no need for spin_lock_irqsave()
> - */
> -
> - spin_lock(&rtc_lock);
> -
> -/*
>   * Tell the clock it's being set and stop it.
>   */
>  
> @@ -143,14 +137,26 @@
>   CMOS_WRITE(control, RTC_CONTROL);
>   CMOS_WRITE(freq_select, RTC_FREQ_SELECT);
>  
> - spin_unlock(&rtc_lock);
> -
>   return retval;
>  }
>  
>  int update_persistent_clock(struct timespec now)
>  {
> - return set_rtc_mmss(now.tv_sec);
> + int retval;
> +
> +/*
> + * IRQs are disabled when we're called from the timer interrupt,
> + * no need for spin_lock_irqsave()
> + */
> +
> + spin_lock(&rtc_lock);
> + if (efi_enabled)
> + retval = efi_set_rtc_mmss(now.tv_sec);
> + else
> + retval = set_rtc_mmss(now.tv_sec);
> + spin_unlock(&rtc_lock);
> +
> + return retval;
>  }
>  
>  void main_timer_handler(void)
> @@ -195,14 +201,11 @@
>   return IRQ_HANDLED;
>  }
>  
> -unsigned long read_persistent_clock(void)
> +unsigned long read_cmos_clock(void)
>  {
>   unsigned int year, mon, day, hour, min, sec;
> - unsigned long flags;
>   unsigned century = 0;
>  
> - spin_lock_irqsave(&rtc_lock, flags);
> -
>   do {
>   sec = CMOS_READ(RTC_SECONDS);
>   min = CMOS_READ(RTC_MINUTES);
> @@ -217,8 +220,6 @@
>  #endif
>   } while (sec != CMOS_READ(RTC_SECONDS));
>  
> - spin_unlock_irqrestore(&rtc_lock, flags);
> -
>   /*
>* We know that x86-64 always uses BCD format, no need to check the
>* config register.
> @@ -246,6 +247,20 @@
>   return mktime(year, mon, day, hour, min, sec);
>  }
>  
> +unsigned long read_persistent_clock(void)
> +{
> + unsigned long flags, retval;
> +
> + spin_lock_irqsave(&rtc_lock, flags);
> + if (efi_enabled)
> + retval = efi_get_time();
> + else
> + retval = read_cmos_clock();
> + spin_unlock_irqrestore(&rtc_lock, flags);
> +
> + return retval;
> +}
> +
>  /* calibrate_cpu is used on systems with fixed rate TSCs to determine
>   * processor frequency */
>  #define TICK_COUNT 1

[patch 09/26] dm crypt: disable barriers

2007-07-30 Thread Greg KH

-stable review patch.  If anyone has any objections, please let us know.

--

From: Milan Broz <[EMAIL PROTECTED]>

Disable barriers in dm-crypt because the current workqueue processing can
reorder requests.

This must be addressed later, but for now disabling barriers is needed to
prevent data corruption.

Signed-off-by: Milan Broz <[EMAIL PROTECTED]>
Signed-off-by: Alasdair G Kergon <[EMAIL PROTECTED]>
Cc: Jens Axboe <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Linus Torvalds <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>
---
Gitweb: 
http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=9c89f8be1a7d14aad9d2c3f7d90d7d88f82c61e2

 drivers/md/dm-crypt.c |3 +++
 1 file changed, 3 insertions(+)

--- linux-2.6.21.6.orig/drivers/md/dm-crypt.c
+++ linux-2.6.21.6/drivers/md/dm-crypt.c
@@ -954,6 +954,9 @@ static int crypt_map(struct dm_target *t
struct crypt_config *cc = ti->private;
struct crypt_io *io;
 
+   if (bio_barrier(bio))
+   return -EOPNOTSUPP;
+
io = mempool_alloc(cc->io_pool, GFP_NOIO);
io->target = ti;
io->base_bio = bio;

-- 

[patch 10/26] dm crypt: fix call to clone_init

2007-07-30 Thread Greg KH

-stable review patch.  If anyone has any objections, please let us know.

--

From: Olaf Kirch <[EMAIL PROTECTED]>

Call clone_init early

We need to call clone_init as early as possible - at least before calling
bio_put(clone) in any error path.  Otherwise, the destructor will try to
dereference bi_private, which may still be NULL.

Signed-off-by: Olaf Kirch <[EMAIL PROTECTED]>
Signed-off-by: Alasdair G Kergon <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Linus Torvalds <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>
---
Gitweb: 
http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=027581f3515b5ec2218847dab578afa439a9d6b9

 drivers/md/dm-crypt.c |   12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

--- linux-2.6.21.6.orig/drivers/md/dm-crypt.c
+++ linux-2.6.21.6/drivers/md/dm-crypt.c
@@ -107,6 +107,8 @@ struct crypt_config {
 
 static struct kmem_cache *_crypt_io_pool;
 
+static void clone_init(struct crypt_io *, struct bio *);
+
 /*
  * Different IV generation algorithms:
  *
@@ -379,9 +381,10 @@ static int crypt_convert(struct crypt_co
  * May return a smaller bio when running out of pages
  */
 static struct bio *
-crypt_alloc_buffer(struct crypt_config *cc, unsigned int size,
+crypt_alloc_buffer(struct crypt_io *io, unsigned int size,
struct bio *base_bio, unsigned int *bio_vec_idx)
 {
+   struct crypt_config *cc = io->target->private;
struct bio *clone;
unsigned int nr_iovecs = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
gfp_t gfp_mask = GFP_NOIO | __GFP_HIGHMEM;
@@ -396,7 +399,7 @@ crypt_alloc_buffer(struct crypt_config *
if (!clone)
return NULL;
 
-   clone->bi_destructor = dm_crypt_bio_destructor;
+   clone_init(io, clone);
 
/* if the last bio was not complete, continue where that one ended */
clone->bi_idx = *bio_vec_idx;
@@ -562,6 +565,7 @@ static void clone_init(struct crypt_io *
clone->bi_end_io  = crypt_endio;
clone->bi_bdev= cc->dev->bdev;
clone->bi_rw  = io->base_bio->bi_rw;
+   clone->bi_destructor = dm_crypt_bio_destructor;
 }
 
 static void process_read(struct crypt_io *io)
@@ -585,7 +589,6 @@ static void process_read(struct crypt_io
}
 
clone_init(io, clone);
-   clone->bi_destructor = dm_crypt_bio_destructor;
clone->bi_idx = 0;
clone->bi_vcnt = bio_segments(base_bio);
clone->bi_size = base_bio->bi_size;
@@ -615,7 +618,7 @@ static void process_write(struct crypt_i
 * so repeat the whole process until all the data can be handled.
 */
while (remaining) {
-   clone = crypt_alloc_buffer(cc, base_bio->bi_size,
+   clone = crypt_alloc_buffer(io, base_bio->bi_size,
   io->first_clone, &bvec_idx);
if (unlikely(!clone)) {
dec_pending(io, -ENOMEM);
@@ -631,7 +634,6 @@ static void process_write(struct crypt_i
return;
}
 
-   clone_init(io, clone);
clone->bi_sector = cc->start + sector;
 
if (!io->first_clone) {

-- 

[patch 08/26] md: Fix bug in error handling during raid1 repair.

2007-07-30 Thread Greg KH

-stable review patch.  If anyone has any objections, please let us know.

--

From: Mike Accetta <[EMAIL PROTECTED]>

If raid1/repair (which reads all blocks and fixes any differences
it finds) hits a read error, it doesn't reset the bio for writing
before writing correct data back, so the read error isn't fixed,
and the device probably gets a zero-length write which it might
complain about.

Signed-off-by: Neil Brown <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>
---

 drivers/md/raid1.c |   21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff .prev/drivers/md/raid1.c ./drivers/md/raid1.c
--- linux-2.6.21.6.orig/drivers/md/raid1.c
+++ linux-2.6.21.6/drivers/md/raid1.c
@@ -1240,17 +1240,24 @@ static void sync_request_write(mddev_t *
}
r1_bio->read_disk = primary;
for (i=0; i<mddev->raid_disks; i++)
-   if (r1_bio->bios[i]->bi_end_io == end_sync_read &&
-   test_bit(BIO_UPTODATE, &r1_bio->bios[i]->bi_flags)) {
+   if (r1_bio->bios[i]->bi_end_io == end_sync_read) {
int j;
int vcnt = r1_bio->sectors >> (PAGE_SHIFT- 9);
struct bio *pbio = r1_bio->bios[primary];
struct bio *sbio = r1_bio->bios[i];
-   for (j = vcnt; j-- ; )
-   if (memcmp(page_address(pbio->bi_io_vec[j].bv_page),
-  page_address(sbio->bi_io_vec[j].bv_page),
-  PAGE_SIZE))
-   break;
+
+   if (test_bit(BIO_UPTODATE, &sbio->bi_flags)) {
+   for (j = vcnt; j-- ; ) {
+   struct page *p, *s;
+   p = pbio->bi_io_vec[j].bv_page;
+   s = sbio->bi_io_vec[j].bv_page;
+   if (memcmp(page_address(p),
+  page_address(s),
+  PAGE_SIZE))
+   break;
+   }
+   } else
+   j = 0;
if (j >= 0)
mddev->resync_mismatches += r1_bio->sectors;
if (j < 0 || test_bit(MD_RECOVERY_CHECK, &mddev->recovery)) {

-- 

[patch 05/26] pi-futex: Fix exit races and locking problems

2007-07-30 Thread Greg KH


-stable review patch.  If anyone has any objections, please let us know.

--

From: Alexey Kuznetsov <[EMAIL PROTECTED]>
1. New entries can be added to tsk->pi_state_list after task completed
   exit_pi_state_list(). The result is memory leakage and deadlocks.

2. handle_mm_fault() is called under spinlock. The result is obvious.

3. results in self-inflicted deadlock inside glibc.
   Sometimes futex_lock_pi returns -ESRCH, when it is not expected
   and glibc enters to for(;;) sleep() to simulate deadlock. This problem
   is quite obvious and I think the patch is right. Though it looks like
   each "if" in futex_lock_pi() got some stupid special case "else if". :-)

4. sometimes futex_lock_pi() returns -EDEADLK,
   when nobody has the lock. The reason is also obvious (see comment
   in the patch), but correct fix is far beyond my comprehension.
   I guess someone already saw this, the chunk:

if (rt_mutex_trylock(_state->pi_mutex))
ret = 0;

   is obviously from the same opera. But it does not work, because the
   rtmutex is really taken at this point: wake_futex_pi() of previous
   owner reassigned it to us. My fix works. But it looks very stupid.
   I would think about removal of shift of ownership in wake_futex_pi()
   and making all the work in context of process taking lock.

From: Thomas Gleixner <[EMAIL PROTECTED]>

Fix 1) Avoid the tasklist lock variant of the exit race fix by adding
an additional state transition to the exit code.

This fixes also the issue, when a task with recursive segfaults
is not able to release the futexes.

Fix 2) Cleanup the lookup_pi_state() failure path and solve the -ESRCH
problem finally.

Fix 3) Solve the fixup_pi_state_owner() problem which needs to do the fixup
in the lock protected section by using the in_atomic userspace access
functions.

This removes also the ugly lock drop / unqueue inside of fixup_pi_state()

Fix 4) Fix a stale lock in the error path of futex_wake_pi()

Added some error checks for verification.

The -EDEADLK problem is solved by the rtmutex fixups.
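
The additional exit state transition from Fix 1 boils down to a three-way decision on the owner's task flags. The sketch below models it in userspace; the flag values are illustrative, and owner_exit_status() is a hypothetical helper, not a function in kernel/futex.c. An owner that is exiting but has not finished its PI cleanup yields -EAGAIN so the caller retries, and only a fully cleaned-up owner yields -ESRCH.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative flag values; only their distinctness matters here. */
#define PF_EXITING    0x4u  /* task has entered the exit path */
#define PF_EXITPIDONE 0x8u  /* task has finished its pi_state cleanup */

/*
 * Returns 0 if a new pi_state may be attached to the owner, -EAGAIN if
 * the owner is mid-exit and the lookup should be retried, or -ESRCH if
 * the owner has already completed its cleanup.
 */
static int owner_exit_status(unsigned int flags)
{
    /* Model assumption: the "done" flag is only set while exiting. */
    assert(!(flags & PF_EXITPIDONE) || (flags & PF_EXITING));

    if (!(flags & PF_EXITING))
        return 0;
    return (flags & PF_EXITPIDONE) ? -ESRCH : -EAGAIN;
}
```

In the patch itself this check is done while holding the owner's pi_lock, which this sketch does not model.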

Cc: Alexey Kuznetsov <[EMAIL PROTECTED]>
Signed-off-by: Thomas Gleixner <[EMAIL PROTECTED]>
Acked-by: Ingo Molnar <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>

---
 include/linux/sched.h |1 
 kernel/exit.c |   22 +
 kernel/futex.c|  191 +-
 3 files changed, 150 insertions(+), 64 deletions(-)

--- linux-2.6.21.6.orig/kernel/futex.c
+++ linux-2.6.21.6/kernel/futex.c
@@ -396,10 +396,6 @@ static struct task_struct * futex_find_g
p = NULL;
goto out_unlock;
}
-   if (p->exit_state != 0) {
-   p = NULL;
-   goto out_unlock;
-   }
get_task_struct(p);
 out_unlock:
rcu_read_unlock();
@@ -467,7 +463,7 @@ lookup_pi_state(u32 uval, struct futex_h
struct futex_q *this, *next;
struct list_head *head;
struct task_struct *p;
-   pid_t pid;
+   pid_t pid = uval & FUTEX_TID_MASK;
 
head = &hb->chain;
 
@@ -485,6 +481,8 @@ lookup_pi_state(u32 uval, struct futex_h
return -EINVAL;
 
WARN_ON(!atomic_read(&pi_state->refcount));
+   WARN_ON(pid && pi_state->owner &&
+   pi_state->owner->pid != pid);
 
atomic_inc(&pi_state->refcount);
me->pi_state = pi_state;
@@ -495,15 +493,33 @@ lookup_pi_state(u32 uval, struct futex_h
 
/*
 * We are the first waiter - try to look up the real owner and attach
-* the new pi_state to it, but bail out when the owner died bit is set
-* and TID = 0:
+* the new pi_state to it, but bail out when TID = 0
 */
-   pid = uval & FUTEX_TID_MASK;
-   if (!pid && (uval & FUTEX_OWNER_DIED))
+   if (!pid)
return -ESRCH;
p = futex_find_get_task(pid);
-   if (!p)
-   return -ESRCH;
+   if (IS_ERR(p))
+   return PTR_ERR(p);
+
+   /*
+* We need to look at the task state flags to figure out,
+* whether the task is exiting. To protect against the do_exit
+* change of the task flags, we do this protected by
+* p->pi_lock:
+*/
+   spin_lock_irq(&p->pi_lock);
+   if (unlikely(p->flags & PF_EXITING)) {
+   /*
+* The task is on the way out. When PF_EXITPIDONE is
+* set, we know that the task has finished the
+* cleanup:
+*/
+   int ret = (p->flags & PF_EXITPIDONE) ? -ESRCH : -EAGAIN;
+
+   spin_unlock_irq(&p->pi_lock);
+   put_task_struct(p);
+   return ret;
+   }
 
pi_state = alloc_pi_state();
 
@@

[patch 06/26] hpt366: disallow Ultra133 for HPT374

2007-07-30 Thread Greg KH


-stable review patch.  If anyone has any objections, please let us know.

--

Eliminate UltraATA/133 support for HPT374 -- the chip isn't capable of this mode
according to the manual, and doesn't even seem to tolerate 66 MHz DPLL clock...

Signed-off-by: Sergei Shtylyov <[EMAIL PROTECTED]>
Cc: Geller Sandor <[EMAIL PROTECTED]>
Signed-off-by: Bartlomiej Zolnierkiewicz <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>
---
 drivers/ide/pci/hpt366.c |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

--- linux-2.6.21.6.orig/drivers/ide/pci/hpt366.c
+++ linux-2.6.21.6/drivers/ide/pci/hpt366.c
@@ -1,5 +1,5 @@
 /*
- * linux/drivers/ide/pci/hpt366.c  Version 1.03May 4, 2007
+ * linux/drivers/ide/pci/hpt366.c  Version 1.04Jun 4, 2007
  *
  * Copyright (C) 1999-2003 Andre Hedrick <[EMAIL PROTECTED]>
  * Portions Copyright (C) 2001 Sun Microsystems, Inc.
@@ -106,7 +106,8 @@
  *   switch  to calculating  PCI clock frequency based on the chip's base DPLL
  *   frequency
  * - switch to using the  DPLL clock and enable UltraATA/133 mode by default on
- *   anything  newer than HPT370/A
+ *   anything  newer than HPT370/A (except HPT374 that is not capable of this
+ *   mode according to the manual)
  * - fold PCI clock detection and DPLL setup code into init_chipset_hpt366(),
  *   also fixing the interchanged 25/40 MHz PCI clock cases for HPT36x chips;
  *   unify HPT36x/37x timing setup code and the speedproc handlers by joining
@@ -365,7 +366,6 @@ static u32 sixty_six_base_hpt37x[] = {
 };
 
 #define HPT366_DEBUG_DRIVE_INFO0
-#define HPT374_ALLOW_ATA133_6  1
 #define HPT371_ALLOW_ATA133_6  1
 #define HPT302_ALLOW_ATA133_6  1
 #define HPT372_ALLOW_ATA133_6  1
@@ -450,7 +450,7 @@ static struct hpt_info hpt370a __devinit
 
 static struct hpt_info hpt374 __devinitdata = {
.chip_type  = HPT374,
-   .max_mode   = HPT374_ALLOW_ATA133_6 ? 4 : 3,
+   .max_mode   = 3,
.dpll_clk   = 48,
.settings   = hpt37x_settings
 };

-- 

[patch 07/26] md: Fix two raid10 bugs.

2007-07-30 Thread Greg KH


-stable review patch.  If anyone has any objections, please let us know.

--

1/ When resyncing a degraded raid10 which has more than 2 copies of each block,
  garbage can get synced on top of good data.

2/ We round the wrong way in part of the device size calculation, which
  can cause confusion.

Signed-off-by: Neil Brown <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>
---

 drivers/md/raid10.c |    6 ++++++
 1 file changed, 6 insertions(+)

diff .prev/drivers/md/raid10.c ./drivers/md/raid10.c
--- linux-2.6.21.6.orig/drivers/md/raid10.c
+++ linux-2.6.21.6/drivers/md/raid10.c
@@ -1867,6 +1867,7 @@ static sector_t sync_request(mddev_t *md
int d = r10_bio->devs[i].devnum;
bio = r10_bio->devs[i].bio;
bio->bi_end_io = NULL;
+   clear_bit(BIO_UPTODATE, &bio->bi_flags);
if (conf->mirrors[d].rdev == NULL ||
test_bit(Faulty, &conf->mirrors[d].rdev->flags))
continue;
@@ -2037,6 +2038,11 @@ static int run(mddev_t *mddev)
/* 'size' is now the number of chunks in the array */
/* calculate "used chunks per device" in 'stride' */
stride = size * conf->copies;
+
+   /* We need to round up when dividing by raid_disks to
+* get the stride size.
+*/
+   stride += conf->raid_disks - 1;
sector_div(stride, conf->raid_disks);
mddev->size = stride  << (conf->chunk_shift-1);
 

-- 
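
The second fix in the patch above is the usual round-up-before-divide idiom.
A minimal userspace sketch of it, assuming made-up values for the chunk
count, copies and disk count. The helper name stride_chunks() is
hypothetical, and plain '/' stands in for the kernel's sector_div():

```c
#include <assert.h>
#include <stdint.h>

/*
 * Chunks each device must hold: ceil(size * copies / raid_disks).
 * Hypothetical helper; not part of drivers/md/raid10.c.
 */
static uint64_t stride_chunks(uint64_t size, unsigned copies,
                              unsigned raid_disks)
{
    uint64_t stride = size * copies;

    assert(raid_disks != 0);
    stride += raid_disks - 1;   /* bias so the division rounds up */
    return stride / raid_disks;
}
```

With 10 chunks, 2 copies and 3 disks, truncating division would give 6
chunks per device, which leaves room for only 18 of the 20 chunk copies;
rounding up gives 7.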

[patch 03/26] rt-mutex: Fix stale return value

2007-07-30 Thread Greg KH


-stable review patch.  If anyone has any objections, please let us know.

--

Alexey Kuznetsov found some problems in the pi-futex code. 

The major problem is a stale return value in rt_mutex_slowlock():

When the pi chain walk returns -EDEADLK, but the waiter was woken up 
during the phases where the locks were dropped, the rtmutex could be
acquired, but due to the stale return value -EDEADLK returned to the
caller.

Reset the return value in the woken up path.

Cc: Alexey Kuznetsov <[EMAIL PROTECTED]>
Signed-off-by: Thomas Gleixner <[EMAIL PROTECTED]>
Acked-by: Ingo Molnar <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>

---
 kernel/rtmutex.c |   11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

--- linux-2.6.21.6.orig/kernel/rtmutex.c
+++ linux-2.6.21.6/kernel/rtmutex.c
@@ -659,9 +659,16 @@ rt_mutex_slowlock(struct rt_mutex *lock,
 * all over without going into schedule to try
 * to get the lock now:
 */
-   if (unlikely(!waiter.task))
+   if (unlikely(!waiter.task)) {
+   /*
+* Reset the return value. We might
+* have returned with -EDEADLK and the
+* owner released the lock while we
+* were walking the pi chain.
+*/
+   ret = 0;
continue;
-
+   }
if (unlikely(ret))
break;
}

-- 
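
The stale return value described above can be modelled outside the kernel.
The following is only a sketch under stated assumptions: struct attempt and
slowlock_model() are hypothetical names, each array entry stands for one
pass through the retry loop, and no real locking is involved.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* One pass through the retry loop (hypothetical model, not kernel code). */
struct attempt {
    bool got_lock;  /* try-lock succeeds at the top of this pass */
    int  walk_ret;  /* result of the pi chain walk on this pass */
    bool woken;     /* waiter was woken while the locks were dropped */
};

static int slowlock_model(const struct attempt *a, size_t n,
                          bool reset_on_wakeup)
{
    int ret = 0;

    assert(a != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (a[i].got_lock)
            break;              /* lock acquired, leave the loop */
        ret = a[i].walk_ret;
        if (a[i].woken) {
            if (reset_on_wakeup)
                ret = 0;        /* the fix: drop the stale error */
            continue;           /* retry without scheduling */
        }
        if (ret)
            break;
    }
    return ret;
}
```

Feeding it a pass that reports -EDEADLK with a wakeup, followed by a pass
that takes the lock, returns -EDEADLK without the reset and 0 with it.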

[patch 04/26] rt-mutex: Fix chain walk early wakeup bug

2007-07-30 Thread Greg KH


-stable review patch.  If anyone has any objections, please let us know.

--

Alexey Kuznetsov found some problems in the pi-futex code. 

One of the root causes is:

When a wakeup happens, we do not stop the chain walk, so we
follow a non-existing locking chain.

Drop out when this happens.

Cc: Alexey Kuznetsov <[EMAIL PROTECTED]>
Signed-off-by: Thomas Gleixner <[EMAIL PROTECTED]>
Acked-by: Ingo Molnar <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>

---
 kernel/rtmutex.c |   13 +++++++++++++
 1 file changed, 13 insertions(+)

--- linux-2.6.21.6.orig/kernel/rtmutex.c
+++ linux-2.6.21.6/kernel/rtmutex.c
@@ -212,6 +212,19 @@ static int rt_mutex_adjust_prio_chain(st
if (!waiter || !waiter->task)
goto out_unlock_pi;
 
+   /*
+* Check the orig_waiter state. After we dropped the locks,
+* the previous owner of the lock might have released the lock
+* and made us the pending owner:
+*/
+   if (orig_waiter && !orig_waiter->task)
+   goto out_unlock_pi;
+
+   /*
+* Drop out, when the task has no waiters. Note,
+* top_waiter can be NULL, when we are in the deboosting
+* mode!
+*/
if (top_waiter && (!task_has_pi_waiters(task) ||
   top_waiter != task_top_pi_waiter(task)))
goto out_unlock_pi;

-- 

[patch 02/26] sparsemem: fix oops in x86_64 show_mem

2007-07-30 Thread Greg KH


-stable review patch.  If anyone has any objections, please let us know.

--

We aren't sampling for holes in memory. Thus we encounter a section hole with
empty section map pointer for SPARSEMEM and OOPs for show_mem. This issue
has been seen in 2.6.21, current git and current mm. This patch is for
2.6.21 stable. It was tested against sparsemem.

Previous to commit f0a5a58aa812b31fd9f197c4ba48245942364eae memory_present
was called for node_start_pfn to node_end_pfn. This would cover the hole(s)
with reserved pages and valid sections. Most SPARSEMEM supported arches
do a pfn_valid check in show_mem before computing the page structure address.

This issue was brought to my attention on IRC by Arnaldo Carvalho de Melo at
[EMAIL PROTECTED] Thanks to Arnaldo for testing.

Signed-off-by: Bob Picco <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>
---

 arch/x86_64/mm/init.c |    2 ++
 1 file changed, 2 insertions(+)

--- linux-2.6.21.6.orig/arch/x86_64/mm/init.c
+++ linux-2.6.21.6/arch/x86_64/mm/init.c
@@ -72,6 +72,8 @@ void show_mem(void)
 
for_each_online_pgdat(pgdat) {
for (i = 0; i < pgdat->node_spanned_pages; ++i) {
+   if (!pfn_valid(pgdat->node_start_pfn + i))
+   continue;
page = pfn_to_page(pgdat->node_start_pfn + i);
total++;
if (PageReserved(page))

-- 
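
The hole-skipping check in the patch above can be illustrated in userspace.
This is a sketch only: struct page_model and count_pages() are hypothetical,
and a NULL map entry stands in for a pfn for which pfn_valid() would return
false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for struct page; only the bit the loop looks at. */
struct page_model {
    bool reserved;
};

/*
 * Walk a node's spanned range and count pages, skipping holes.
 * A NULL entry models a pfn with no backing struct page.
 */
static void count_pages(struct page_model *const *map, size_t spanned,
                        size_t *total, size_t *reserved)
{
    assert(map != NULL && total != NULL && reserved != NULL);
    for (size_t i = 0; i < spanned; i++) {
        if (map[i] == NULL)
            continue;           /* hole: do not dereference */
        (*total)++;
        if (map[i]->reserved)
            (*reserved)++;
    }
}
```

Without the NULL check the loop would dereference the hole entries, which
is the userspace analogue of the oops the patch description mentions.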

[patch 00/26] 2.6.21.7 -stable review

2007-07-30 Thread Greg KH

Very sorry for the long delay in getting these out, it should be the
last 2.6.21-stable release, unless there are some patches that people
point out to us that deserve a new .21.y release.

This is the start of the stable review cycle for the 2.6.21.7 release.
There are 26 patches in this series, all will be posted as a response to
this one.  If anyone has any issues with these being applied, please let
us know.  If anyone is a maintainer of the proper subsystem, and wants
to add a Signed-off-by: line to the patch, please respond with it.

These patches are sent out with a number of different people on the
Cc: line.  If you wish to be a reviewer, please email [EMAIL PROTECTED]
to add your name to the list.  If you want to be off the reviewer list,
also email us.

Responses should be made by August 2, 2007, 00:00:00 UTC.  Anything
received after that time might be too late.

thanks,

the -stable release team

[patch 01/26] BNX2: Fix netdev watchdog on 5708.

2007-07-30 Thread Greg KH


-stable review patch.  If anyone has any objections, please let us know.

--

There's a bug in the driver that only initializes half of the context
memory on the 5708.  Surprisingly, this works most of the time except
for some occasional netdev watchdogs when sending a lot of 64-byte
packets.  This fix is to add the missing code to initialize the 2nd
half of the context memory.

Update version to 1.5.8.2.

Signed-off-by: Michael Chan <[EMAIL PROTECTED]>
Signed-off-by: David S. Miller <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>

---
 drivers/net/bnx2.c |   25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

--- linux-2.6.21.6.orig/drivers/net/bnx2.c
+++ linux-2.6.21.6/drivers/net/bnx2.c
@@ -54,8 +54,8 @@
 
 #define DRV_MODULE_NAME  "bnx2"
 #define PFX DRV_MODULE_NAME": "
-#define DRV_MODULE_VERSION "1.5.8.1"
-#define DRV_MODULE_RELDATE "May 7, 2007"
+#define DRV_MODULE_VERSION "1.5.8.2"
+#define DRV_MODULE_RELDATE "June 5, 2007"
 
 #define RUN_AT(x) (jiffies + (x))
 
@@ -1550,6 +1550,7 @@ bnx2_init_context(struct bnx2 *bp)
vcid = 96;
while (vcid) {
u32 vcid_addr, pcid_addr, offset;
+   int i;
 
vcid--;
 
@@ -1570,16 +1571,20 @@ bnx2_init_context(struct bnx2 *bp)
pcid_addr = vcid_addr;
}
 
-   REG_WR(bp, BNX2_CTX_VIRT_ADDR, 0x00);
-   REG_WR(bp, BNX2_CTX_PAGE_TBL, pcid_addr);
+   for (i = 0; i < (CTX_SIZE / PHY_CTX_SIZE); i++) {
+   vcid_addr += (i << PHY_CTX_SHIFT);
+   pcid_addr += (i << PHY_CTX_SHIFT);
+
+   REG_WR(bp, BNX2_CTX_VIRT_ADDR, 0x00);
+   REG_WR(bp, BNX2_CTX_PAGE_TBL, pcid_addr);
+
+   /* Zero out the context. */
+   for (offset = 0; offset < PHY_CTX_SIZE; offset += 4)
+   CTX_WR(bp, 0x00, offset, 0);
 
-   /* Zero out the context. */
-   for (offset = 0; offset < PHY_CTX_SIZE; offset += 4) {
-   CTX_WR(bp, 0x00, offset, 0);
+   REG_WR(bp, BNX2_CTX_VIRT_ADDR, vcid_addr);
+   REG_WR(bp, BNX2_CTX_PAGE_TBL, pcid_addr);
}
-
-   REG_WR(bp, BNX2_CTX_VIRT_ADDR, vcid_addr);
-   REG_WR(bp, BNX2_CTX_PAGE_TBL, pcid_addr);
}
 }
 

-- 
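
To see which context pages the new inner loop in the patch above touches,
here is a small userspace sketch. The constant values are assumptions chosen
so that a context spans two physical pages, matching the "half of the
context memory" wording; ctx_pages() is a hypothetical helper and no
registers are written.

```c
#include <assert.h>
#include <stdint.h>

#define PHY_CTX_SHIFT 6                     /* assumed value */
#define PHY_CTX_SIZE  (1u << PHY_CTX_SHIFT)
#define CTX_SIZE      (2u * PHY_CTX_SIZE)   /* assumed: two pages per context */

/*
 * Record the page addresses visited for one context, using the same
 * address stepping as the patched loop. Returns the number of pages.
 */
static unsigned ctx_pages(uint32_t vcid_addr, uint32_t *out, unsigned max)
{
    unsigned n = 0;

    for (unsigned i = 0; i < CTX_SIZE / PHY_CTX_SIZE; i++) {
        vcid_addr += (i << PHY_CTX_SHIFT);
        assert(n < max);
        out[n++] = vcid_addr;   /* the driver would zero this page here */
    }
    return n;
}
```

Under these assumptions a context starting at 0x1000 yields two pages,
0x1000 and 0x1040, where the old code only initialized the first.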

Re: [PATCH] flush icache before set_pte take6. [4/4] optimization for cpus other than montecito

2007-07-30 Thread KAMEZAWA Hiroyuki

On Mon, 30 Jul 2007 22:15:50 -0600
"David Mosberger-Tang" <[EMAIL PROTECTED]> wrote:

> This seems crazy to me.  Flushing should occur according to the
> *architecture*, not model-by-model.  Even if we happen to get "lucky"
> on pre-Montecito CPUs, that doesn't justify such ugly hacks.  

I'm not sure this can happen before Montecito because L1 was write-through
and L2 was mixed. 

> Or you really want to debug this *again* come next CPU?

No. 
I should add RFC to this patch. I just want to hear opinions.
This is why I separated this patch. I can drop this.

Thanks,
-Kame





 
>   --david
> 
> On 7/30/07, KAMEZAWA Hiroyuki <[EMAIL PROTECTED]> wrote:
> >
> > Add "L2 cache is separated? check flag" as read_mostly global variable.
> >
> > This adds one memory reference to a global variable on page faults of an
> > "executable" map in do_wp_page() (page copy case), on file-mapped page
> > faults, and in some system calls which change memory maps. But that is not
> > as bad as calling sync_icache_dcache on architectures which don't need it.
> >
> > Signed-off-by: KAMEZAWA Hiroyuki <[EMAIL PROTECTED]>
> >
> >
> > ---
> >  arch/ia64/kernel/setup.c   |    7 +++++++
> >  include/asm-ia64/pgtable.h |    3 ++-
> >  2 files changed, 9 insertions(+), 1 deletion(-)
> >
> > Index: linux-2.6.23-rc1.test/arch/ia64/kernel/setup.c
> > ===
> > --- linux-2.6.23-rc1.test.orig/arch/ia64/kernel/setup.c
> > +++ linux-2.6.23-rc1.test/arch/ia64/kernel/setup.c
> > @@ -106,6 +106,8 @@ struct io_space io_space[MAX_IO_SPACES];
> >  EXPORT_SYMBOL(io_space);
> >  unsigned int num_io_spaces;
> >
> > +int separated_l2_icache_dcache __read_mostly;
> > +
> >  /*
> >   * "flush_icache_range()" needs to know what processor dependent stride size to use
> >   * when it makes i-cache(s) coherent with d-caches.
> > @@ -718,6 +720,11 @@ get_model_name(__u8 family, __u8 model)
> > printk(KERN_ERR
> >        "%s: Table overflow. Some processor model information will be missing\n",
> >__FUNCTION__);
> > +   /* Montecito has separated L2 Icache and Dcache. This requires
> > +  synchronize Icache and Dcache before set_pte() */
> > +   if (family == 0x20)
> > +   separated_l2_icache_dcache = 1;
> > +
> > return "Unknown";
> >  }
> >
> > Index: linux-2.6.23-rc1.test/include/asm-ia64/pgtable.h
> > ===
> > --- linux-2.6.23-rc1.test.orig/include/asm-ia64/pgtable.h
> > +++ linux-2.6.23-rc1.test/include/asm-ia64/pgtable.h
> > @@ -489,9 +489,10 @@ extern struct page *zero_page_memmap_ptr
> >   * as an executable pte.
> >   */
> >  extern void __sync_icache_dcache(pte_t pte);
> > +extern int separated_l2_icache_dcache;
> >  static inline void sync_icache_dcache(pte_t pte)
> >  {
> > -   if (pte_exec(pte))
> > +   if (pte_exec(pte) && separated_l2_icache_dcache)
> > __sync_icache_dcache(pte);
> >  }
> >  #define __HAVE_ARCH_SYNC_ICACHE_DCACHE
> >
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-ia64" in
> > the body of a message to [EMAIL PROTECTED]
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >
> 
> 
> -- 
> Mosberger Consulting LLC, http://www.mosberger-consulting.com/
> 


Re: [rfc] direct IO submission and completion scalability issues

2007-07-30 Thread Nick Piggin

On Mon, Jul 30, 2007 at 01:35:19PM -0700, Suresh B wrote:
> On Mon, Jul 30, 2007 at 11:20:04AM -0700, Christoph Lameter wrote:
> > On Fri, 27 Jul 2007, Siddha, Suresh B wrote:
> 
> > > Observation #2: This introduces some migration overhead during IO
> > > submission. With the current prototype, every incoming IO request results
> > > in an IPI and context switch(to kblockd thread) on the interrupt
> > > processing cpu. This issue needs to be addressed and main challenge to
> > > address is the efficient mechanism of doing this IO migration(how much
> > > batching to do and when to send the migrate request?), so that we don't
> > > delay the IO much and at the same point, don't cause much overhead during
> > > migration.
> > 
> > Right.
> 
> So any suggestions for making this clean and acceptable to everyone?

It is obviously a good idea to hand over the IO at the point which
requires the least number of cachelines to be moved, and I think doing
it in the block layer is right. Mostly you have to convince the block
and driver maintainers I guess.

The scheduler really should be made interrupt-load aware anyway, so I
don't have a problem with changing that; or scheduling kblockd at a
higher priority, but I don't know if SCHED_FIFO is a good idea. Couldn't
it be done in a softirq instead?

Latency for IO migration could be the most difficult problem to solve
really. You don't give much details of the workload, profiles, etc... I
hope this is for a real world test? Can the locking be improved in simpler
ways first?

Just some random questions...

It looks like the main source of cacheline bouncing you're eliminating
is from the initial starting of IO from an empty queue (ie. unplug).
From then on, the submission is driven by completion, right?

Why is the queue allowed to go empty in the first place in an IO critical
workload?

Are you loading up each CPU with as many disks as it can possibly handle
plus a few more? If so, is that realistic? (I honestly don't know).

You say that you'd like to do this for direct IO only, but if it is more
efficient, why not for buffered IO as well? (or is it not more efficient
for buffered IO? if not, why?)

AFAIKS, you'd still have significant queue_lock contention from other
CPUs inserting requests into the list? What IO scheduler are you using?
I assume noop... as a crazy experiment, what happens if you create per-cpu
request queues?


[GIT PULL] sh updates for 2.6.23-rc2

2007-07-30 Thread Paul Mundt

Please pull from:

master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh-2.6.23.git

Which contains:

Adrian McMenamin (1):
  sh: Fix Dreamcast DMA issues.

David McCullough (2):
  sh: arch/sh/boot - fix shell usage
  sh: fix get_wchan() for SH kernels without framepointers

Magnus Damm (4):
  sh: remove old broken pint code
  sh: remove support for sh73180 and solution engine 73180
  sh: sh-sci - fix SH7708 support
  sh: remove support for sh7300 and solution engine 7300

Markus Brunner (1):
  rtc: rtc-sh: Correct sh_rtc_set_time() for some SH-3 parts.

Paul Mundt (8):
  sh: Add kmap_coherent()/kunmap_coherent() interface for SH-4.
  sh: Reclaim beginning of P3 space for vmalloc area.
  sh: Kill the rest of the SE73180 cruft.
  sh: Silence sq compile warning on sh4 nommu.
  sh: Restrict DSP support to specific CPUs.
  sh: Kill off virt_to_bus()/bus_to_virt().
  sh: Add sh to the CC_OPTIMIZE_FOR_SIZE dependencies.
  sh: Fix fs.h removal from mm.h regressions.

 arch/sh/Kconfig|   33 +-
 arch/sh/Makefile   |2 -
 arch/sh/boards/se/7300/Makefile|5 -
 arch/sh/boards/se/7300/io.c|  268 
 arch/sh/boards/se/7300/irq.c   |   40 --
 arch/sh/boards/se/7300/setup.c |   74 --
 arch/sh/boards/se/73180/Makefile   |5 -
 arch/sh/boards/se/73180/io.c   |  268 
 arch/sh/boards/se/73180/irq.c  |  136 
 arch/sh/boards/se/73180/setup.c|   75 ---
 arch/sh/boot/Makefile  |7 +-
 arch/sh/boot/compressed/Makefile   |7 +-
 arch/sh/configs/se7300_defconfig   |  696 
 arch/sh/configs/se73180_defconfig  |  648 --
 arch/sh/drivers/dma/dma-api.c  |7 +-
 arch/sh/kernel/cpu/irq/Makefile|1 -
 arch/sh/kernel/cpu/irq/ipr.c   |2 +-
 arch/sh/kernel/cpu/irq/pint.c  |  220 --
 arch/sh/kernel/cpu/sh3/Makefile|4 +-
 .../cpu/sh3/{clock-sh7300.c => clock-sh7710.c} |   26 +-
 arch/sh/kernel/cpu/sh3/setup-sh7300.c  |   43 --
 arch/sh/kernel/cpu/sh4/probe.c |6 -
 arch/sh/kernel/cpu/sh4/sq.c|   18 +-
 arch/sh/kernel/cpu/sh4a/Makefile   |2 -
 arch/sh/kernel/cpu/sh4a/clock-sh73180.c|   81 ---
 arch/sh/kernel/cpu/sh4a/setup-sh73180.c|   43 --
 arch/sh/kernel/init_task.c |2 +-
 arch/sh/kernel/process.c   |7 +-
 arch/sh/kernel/setup.c |3 +-
 arch/sh/kernel/sys_sh.c|1 +
 arch/sh/kernel/timers/timer-tmu.c  |3 +-
 arch/sh/kernel/vsyscall/vsyscall.c |1 +
 arch/sh/mm/Kconfig |   14 +-
 arch/sh/mm/cache-sh4.c |   14 -
 arch/sh/mm/pg-sh4.c|   76 +--
 arch/sh/tools/mach-types   |2 -
 drivers/rtc/rtc-sh.c   |1 +
 drivers/serial/sh-sci.c|9 +-
 drivers/serial/sh-sci.h|   66 +--
 include/asm-sh/bugs.h  |4 +-
 include/asm-sh/cpu-sh3/freq.h  |4 -
 include/asm-sh/cpu-sh3/mmu_context.h   |1 -
 include/asm-sh/cpu-sh3/timer.h |3 +-
 include/asm-sh/cpu-sh4/freq.h  |2 +-
 include/asm-sh/dma-mapping.h   |8 +-
 include/asm-sh/dma.h   |1 +
 include/asm-sh/fixmap.h|8 +-
 include/asm-sh/floppy.h|4 +-
 include/asm-sh/io.h|4 -
 include/asm-sh/pgtable.h   |6 +-
 include/asm-sh/processor.h |4 +-
 include/asm-sh/se7300.h|   64 --
 include/asm-sh/se73180.h   |   66 --
 include/asm-sh/ubc.h   |3 +-
 init/Kconfig   |2 +-
 sound/sh/aica.h|2 +-
 56 files changed, 138 insertions(+), 2964 deletions(-)

Re: [PATCH 0/5] x86_64 EFI support -v3

2007-07-30 Thread Eric W. Biederman

"Huang, Ying" <[EMAIL PROTECTED]> writes:

> Following sets of patches add EFI/UEFI (Unified Extensible Firmware
> Interface) support to x86_64 architecture. The patches have been
> tested against 2.6.23-rc1 kernel on Intel platforms with EFI1.10 and
> UEFI2.0 firmware.
>
> UEFI specification can be found here: http://www.uefi.org

Is it still the case that you must sign a license promising not to
implement it if you read the specification?

> For booting the UEFI x86_64 enabled kernel, the machine with EFI/UEFI
> firmware and the support of bootloader is required. Detailed usage
> guide can be found in Documentation/x86_64/uefi.txt, which is added in
> the patch: efi-doc.patch.
>
> Issues _not_ addressed (per feedback from Eric Biederman)

Thank you for acknowledging them.

I would really prefer to see something start simple and obviously
correct and grow (typical unix/linux development) rather than
attempt to use all of the cool efi features at once.

> - Virtual mode support is still retained in this patch. There is at
>   least one EFI call in the fast path: efi_set_rtc_mmss, which must
>   complete as soon as possible.

Bogus.  You are setting the wall clock time in the granularity
of a second.  Yes we can achieve high accuracy by setting things
as soon after the change as we can.  I don't see a couple of
extra microseconds being a big deal here.  If a couple of
extra microseconds are a big deal we shouldn't be going through
efi to perform this logic in the first place.  This is x86
and we know the hardware programming interface.

Why in the world are we going through efi for real time clock
operations anyway?  That seems completely silly.

> - The variable efi_enabled is used across architectures if
>  CONFIG_EFI option is enabled. The i386 code also uses this variable.
>  This is something that can be revisited with code consolidation
>  across architectures.

Fix it first. arch/i386/ efi support is horrible, and shows what happens
when things are not done properly the first time.  Later doesn't happen.
With the paravirt logic we have a lot of operations properly split out
already.  Figure out how to use them.

Ok. Looking for what more I can tear into.

Eric

Re: [PATCH] flush icache before set_pte take6. [4/4] optimization for cpus other than montecito

2007-07-30 Thread David Mosberger-Tang

This seems crazy to me.  Flushing should occur according to the
*architecture*, not model-by-model.  Even if we happen to get "lucky"
on pre-Montecito CPUs, that doesn't justify such ugly hacks.  Or you
really want to debug this *again* come next CPU?

  --david

On 7/30/07, KAMEZAWA Hiroyuki <[EMAIL PROTECTED]> wrote:
>
> Add "L2 cache is separated? check flag" as read_mostly global variable.
>
> This adds one memory reference to a global variable on page faults of an
> "executable" map in do_wp_page() (page copy case), on file-mapped page
> faults, and in some system calls which change memory maps. But that is not
> as bad as calling sync_icache_dcache on architectures which don't need it.
>
> Signed-off-by: KAMEZAWA Hiroyuki <[EMAIL PROTECTED]>
>
>
> ---
>  arch/ia64/kernel/setup.c   |    7 +++++++
>  include/asm-ia64/pgtable.h |    3 ++-
>  2 files changed, 9 insertions(+), 1 deletion(-)
>
> Index: linux-2.6.23-rc1.test/arch/ia64/kernel/setup.c
> ===
> --- linux-2.6.23-rc1.test.orig/arch/ia64/kernel/setup.c
> +++ linux-2.6.23-rc1.test/arch/ia64/kernel/setup.c
> @@ -106,6 +106,8 @@ struct io_space io_space[MAX_IO_SPACES];
>  EXPORT_SYMBOL(io_space);
>  unsigned int num_io_spaces;
>
> +int separated_l2_icache_dcache __read_mostly;
> +
>  /*
>   * "flush_icache_range()" needs to know what processor dependent stride size to use
>   * when it makes i-cache(s) coherent with d-caches.
> @@ -718,6 +720,11 @@ get_model_name(__u8 family, __u8 model)
> printk(KERN_ERR
>        "%s: Table overflow. Some processor model information will be missing\n",
>__FUNCTION__);
> +   /* Montecito has separated L2 Icache and Dcache. This requires
> +  synchronize Icache and Dcache before set_pte() */
> +   if (family == 0x20)
> +   separated_l2_icache_dcache = 1;
> +
> return "Unknown";
>  }
>
> Index: linux-2.6.23-rc1.test/include/asm-ia64/pgtable.h
> ===
> --- linux-2.6.23-rc1.test.orig/include/asm-ia64/pgtable.h
> +++ linux-2.6.23-rc1.test/include/asm-ia64/pgtable.h
> @@ -489,9 +489,10 @@ extern struct page *zero_page_memmap_ptr
>   * as an executable pte.
>   */
>  extern void __sync_icache_dcache(pte_t pte);
> +extern int separated_l2_icache_dcache;
>  static inline void sync_icache_dcache(pte_t pte)
>  {
> -   if (pte_exec(pte))
> +   if (pte_exec(pte) && separated_l2_icache_dcache)
> __sync_icache_dcache(pte);
>  }
>  #define __HAVE_ARCH_SYNC_ICACHE_DCACHE
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-ia64" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


-- 
Mosberger Consulting LLC, http://www.mosberger-consulting.com/

[GIT PULL] sh64 updates for 2.6.23-rc2

2007-07-30 Thread Paul Mundt

Please pull from:

master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh64-2.6.git

Which contains:

Paul Mundt (3):
  sh64: Fix fs.h removal from mm.h regressions.
  sh64: Fix irq_intc build failure.
  sh64: Kill off virt_to_bus()/bus_to_virt().

 arch/sh64/Kconfig  |3 +++
 arch/sh64/kernel/init_task.c   |2 +-
 arch/sh64/kernel/irq_intc.c|1 +
 arch/sh64/kernel/pci-dma.c |4 ++--
 arch/sh64/kernel/process.c |1 +
 arch/sh64/kernel/sys_sh64.c|1 +
 arch/sh64/lib/dbg.c|1 +
 include/asm-sh64/dma-mapping.h |8 
 include/asm-sh64/io.h  |4 
 9 files changed, 14 insertions(+), 11 deletions(-)

RE: [PATCH 0/4][RFC] lro: Generic Large Receive Offload for TCP traffic

2007-07-30 Thread Leonid Grossman



> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:netdev-
> [EMAIL PROTECTED] On Behalf Of Andrew Gallatin
> Sent: Monday, July 30, 2007 10:43 AM
> To: Linas Vepstas
> Cc: Jan-Bernd Themann; netdev; Thomas Klein; Jeff Garzik; Jan-Bernd
> Themann; linux-kernel; linux-ppc; Christoph Raisch; Marcus Eder;
> Stefan Roscher; David Miller
> Subject: Re: [PATCH 0/4][RFC] lro: Generic Large Receive Offload for
> TCP traffic
> 
> 
> Here is a quick reply before something more official can
> be written up:
> 
> Linas Vepstas wrote:
> 
>  > -- what is LRO?
> 
> Large Receive Offload
> 
>  > -- Basic principles of operation?
> 
> LRO is analogous to a receive side version of TSO.  The NIC (or
> driver) merges several consecutive segments from the same connection,
> fixing up checksums, etc.  Rather than up to 45 separate 1500 byte
> frames (meaning up to 45 trips through the network stack), the driver
> merges them into one 65212 byte frame.  It currently works only
> with TCP over IPv4.
> 
> LRO was, AFAIK, first thought of by Neterion.  They had a paper about
> it at OLS2005.
> http://www.linuxinsight.com/files/ols2005/grossman-reprint.pdf
> 
>  > -- Can I use it in my driver?
> 
> Yes, it can be used in any driver.
> 
>  > -- Does my hardware have to have some special feature before I can
>  >    use it?
> 
> No.

Ditto. LRO hw assists (or full LRO offload, meaning that the merge
happens in the ASIC but the merge criteria are set by the host) are
quite beneficial, especially as the number of LRO connections gets very
large (Neterion has IP around fw/hw LRO and will release full hw LRO
offload in the next ASIC), but as Andrew indicated sw-only LRO
implementation can be done for any NIC and provides good results -
especially for non-jumbo workloads. 
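
The 45-frame, 65212-byte figure quoted above can be reproduced with a little
arithmetic. The quoted message does not say how it was derived; the sketch
below assumes a 20-byte IPv4 header and a 20-byte TCP header carrying the
12-byte timestamp option, and the helper names are hypothetical.

```c
#include <assert.h>

/* Assumed header sizes; the quoted mail does not state them. */
enum {
    IPV4_HDR   = 20,
    TCP_HDR    = 20,
    TCP_TS_OPT = 12,    /* timestamp option, padded */
    ALL_HDRS   = IPV4_HDR + TCP_HDR + TCP_TS_OPT
};

/* TCP payload bytes carried by one frame at the given MTU. */
static unsigned tcp_payload(unsigned mtu)
{
    assert(mtu > ALL_HDRS);
    return mtu - ALL_HDRS;
}

/* Length of one merged packet: all payloads plus a single set of headers. */
static unsigned merged_len(unsigned mtu, unsigned frames)
{
    return frames * tcp_payload(mtu) + ALL_HDRS;
}
```

Under these assumptions each 1500-byte frame carries 1448 payload bytes, so
45 frames merge into 45 * 1448 + 52 = 65212 bytes. A 46th frame would push
the total to 66660, past the 65535 limit of the 16-bit IPv4 length field.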

> 
>  > -- What sort of performance improvement does it provide? Throughput?
>  >    Latency? CPU usage? How does it affect DMA allocation? Does it
>  >    improve only a certain type of traffic (large/small packets, etc.)
> 
> The benefit is directly proportional to the packet rate.
> 
> See my reply to the previous RFC for performance information.  The
> executive summary is that for the myri10ge 10GbE driver on low end
> hardware with 1500b frames, I've seen it increase throughput by a
> factor of nearly 2.5x, while at the same time reducing CPU utilization
> by 17%.  The effect for jumbo frames is less dramatic, but still
> impressive (1.10x, 14% CPU reduction)
> 
> You can achieve better speedups if your driver receives into
> high-order pages.
> 
>  > -- Example code? What's the API? How should my driver use it?
> 
> The 3/4 in this patch showed an example of converting a driver
> to use LRO for skb based receive buffers.   I'm working on
> a patch for myri10ge that shows how you would use it in a driver
> which receives into pages.
> 
> Cheers,
> 
> Drew
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: CONFIG_SUSPEND? (was: Re: [GIT PATCH] ACPI patches for 2.6.23-rc1)

2007-07-30 Thread david


On Mon, 30 Jul 2007, Len Brown wrote:

> On Saturday 28 July 2007 12:55, Linus Torvalds wrote:
>
>> So I think the real issue is that we allow that
>> "suspend_devices_and_enter()" code to be compiled without HOTPLUG_CPU in
>> the first place. It's not supposed to work that way.
>
> I don't see how CONFIG_HOTPLUG_CPU justifies its own existence.
> This e-mail thread would have never happened if it were simply included
> in CONFIG_SMP, always.
>
> I agree, of course, that ACPI should never have had to work-around
> this by selecting HOTPLUG_CPU.  But even though it is now done at
> the right layer, I don't see why PM should have to
> be bothered with selecting HOTPLUG_CPU either --
> it should just come with SMP.


why do you need hotplug just because you have multiple cpus? if you never 
have any intention of using suspend and your hardware doesn't support 
hotplugging, why should you have to include the code for it?

David Lang

Re: [2/3] 2.6.23-rc1: known regressions v3

2007-07-30 Thread Alexey Starikovskiy

Meelis Roos wrote:
>>> Subject : New ACPI error/warning with Linus' latest GIT
>>> References  : http://lkml.org/lkml/2007/7/26/395
>>> Last known good : ?
>>> Submitter   : Ismail Dönmez <[EMAIL PROTECTED]>
>>> Caused-By   : ?
>>> Handled-By  : ?
>>> Status  : unknown
>> This started to happen after the second ACPI merge which was for 2.6.23-rc2.
> 
> It appeared after new Embedded Controller code was merged into ACPI. It 
> might as well be just a debug message or a reminder to add support for 
> new queries (whatever these are), but I do not know. The message itself 
> seems harmless.
> 
This _is_ a debug message. The EC asks us to perform a query which was never
defined in the DSDT. Previously I thought it would be a rare error report,
but now it seems that every machine has at least one unregistered query...
It is not a functional regression, as we just ignored errors from query
execution before.

Len already has a patch to remove this printk.

Regards,
Alex.

Re: inotify and /proc/

2007-07-30 Thread Al Viro

On Mon, Jul 30, 2007 at 10:56:27PM -0500, Joseph Pingenot wrote:
> >From Al Viro on Tuesday, 31 July, 2007:
> >On Mon, Jul 30, 2007 at 10:40:59PM -0500, Joseph Pingenot wrote:
> >> I'm trying to implement pwait.  It blocks until a specified PID exits,
> >>   and then it exits.
> >er... ptrace(2)?
> 
> Should work for most common usage scenarios, although I suspect that it
>   won't work for processes owned by another user (at least, I hope
>   it wouldn't).
> 
> What is dangerous about inotify on a proc file?

Playing with the lifetime rules, for starters...

Re: Update: ide problems: 2.6.22-git17 working, 2.6.23-rc1* is not:

2007-07-30 Thread Danny ter Haar

Quoting Len Brown ([EMAIL PROTECTED]):
> Hmmm, okay, the "big hammer" works.  Please see
> if any of these smaller hammers work:
> 
> pnpacpi=off
Went out for dinner, machine was frozen on return.
One less on the checklist ..

-- 

Re: inotify and /proc/

2007-07-30 Thread Joseph Pingenot

>From Al Viro on Tuesday, 31 July, 2007:
>On Mon, Jul 30, 2007 at 10:40:59PM -0500, Joseph Pingenot wrote:
>> I'm trying to implement pwait.  It blocks until a specified PID exits,
>>   and then it exits.
>er... ptrace(2)?

Should work for most common usage scenarios, although I suspect that it
  won't work for processes owned by another user (at least, I hope
  it wouldn't).

What is dangerous about inotify on a proc file?

-- 
[EMAIL PROTECTED]///
"There is also an entire branch in the physical therapy field dedicated
 to the treatment of little-finger injuries caused by excessive Emacs
 use."  --Linux Weekly News editor (http://lwn.net/Articles/206916/)
///260 IATL / The University of Iowa / Iowa City, IA  52242///

[PATCH] create CONFIG_SUSPEND_UP_POSSIBLE

2007-07-30 Thread Len Brown

From: Linus Torvalds <[EMAIL PROTECTED]>

Without this change, it is possible to build CONFIG_HIBERNATE
on all !SMP architectures, but not necessarily their SMP versions.

I don't know for sure if the architecture list under SUSPEND_UP_POSSIBLE
is correct.  For now it simply matches the list for SUSPEND_SMP_POSSIBLE.

Signed-off-by: Len Brown <[EMAIL PROTECTED]>
---
 Kconfig |7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/power/Kconfig b/kernel/power/Kconfig
index 412859f..ccf6576 100644
--- a/kernel/power/Kconfig
+++ b/kernel/power/Kconfig
@@ -72,6 +72,11 @@ config PM_TRACE
    CAUTION: this option will cause your machine's real-time clock to be
    set to an invalid time after a resume.
 
+config SUSPEND_UP_POSSIBLE
+   bool
+   depends on (X86 && !X86_VOYAGER) || (PPC64 && (PPC_PSERIES || PPC_PMAC))
+   default y
+
 config SUSPEND_SMP_POSSIBLE
    bool
    depends on (X86 && !X86_VOYAGER) || (PPC64 && (PPC_PSERIES || PPC_PMAC))
@@ -92,7 +97,7 @@ config PM_SLEEP
 config SUSPEND
    bool "Suspend to RAM and standby"
    depends on PM
-   depends on !SMP || SUSPEND_SMP_POSSIBLE
+   depends on (!SMP && SUSPEND_UP_POSSIBLE) || (SMP && SUSPEND_SMP_POSSIBLE)
    default y
    ---help---
      Allow the system to enter sleep states in which main memory is

Re: Kernel Build Failure

2007-07-30 Thread Randy Dunlap

On Mon, 30 Jul 2007 22:31:00 -0500 Chris Holvenstot wrote:

> I am having a problem, likely caused by the vacuum between my ears, but
> I thought that a few of you would find it recreational to tear up an
> inexperienced user for doing dumb things.  
> 
> I have what is quickly becoming an "un-ubuntu" 7.04 system - as time goes
> on I seem to be replacing the standard distribution with "alternative"
> components - this past weekend I replaced the kernel because the one
> shipped with Ubuntu was not cutting it with the new SATA drives I
> recently purchased.
> 
> So I downloaded the 2.6.22.1 kernel sources and set out to build a new
> kernel.  For the most part I left things pretty much as they shipped
> when running the menuconfig tool and an hour or so later I had a
> functioning kernel which did a decent job supporting my new hard drives
> (I still have a slight problem with soft resets which seems to go away
> when I turn NCQ down to zero)
> 
> While I was researching the SATA issue I noted a comment on the mailing
> list that having a few more people build and run your RC1 kernels would
> be desirable.  So I said, why not?   I may still be new to Linux and C
> but I was an MVS Sysprog and assembly language programmer for 25 years
> and if nothing else I have strong troubleshooting skills and an
> understanding of how an operating system works.
> 
> My target system for this activity was my "multi-purpose" AMD64 X2 based
> home server - in addition to being my desktop, it also functions as the
> family web server and even runs the sewing machine via an instance of
> Windows running under VirtualBox.  The box is up 24x7 and gets a good
> work out on a variety of tasks, but if it goes down so be it.  
> 
> However, I think that my process for building an RC1 kernel must be
> flawed, and as much as I poked around I can not spot what I am doing
> wrong.  
> 
> I take a virgin set of 2.6.22.1 sources.  
> 
> To this I added the 2.6.23-rc1 patch set.  The patch verified and
> applied without generating any error messages.
> 
> The only change I made with the menuconfig tool from 2.6.22.1 was to
> change the generated kernel name.
> 
> I ran make and received the following error:
> 
> CC [M]  net/netfilter/nf_conntrack_proto_sctp.o
> net/netfilter/nf_conntrack_proto_sctp.c: In function ‘sctp_new’:
> net/netfilter/nf_conntrack_proto_sctp.c:436: error: implicit declaration
> of function ‘DEBUGP’
> make[2]: *** [net/netfilter/nf_conntrack_proto_sctp.o] Error 1
> make[1]: *** [net/netfilter] Error 2
> make: *** [net] Error 
> 
> Any suggestions as to what I might be doing wrong would be appreciated.

Apply 2.6.23-rc1 to 2.6.22, not to 2.6.22.1.  The 2.6.2x.y -stable
series is a different branch of development.

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

Re: [PATCH 0/2] Introduce CONFIG_HIBERNATION and CONFIG_SUSPEND (updated)

2007-07-30 Thread Len Brown

On Sunday 29 July 2007 20:21, Linus Torvalds wrote:
> 
> Ok, I took this, and modified Len's patch to re-introduce ACPI_SLEEP on 
> top of it (I took the easy way out, and just made PM_SLEEP imply 
> ACPI_SLEEP, which should make everything come out right. I could have 
> dropped ACPI_SLEEP entirely in favour of PM_SLEEP, but that would have 
> implied changing more of Len's patch than I was really comfy with).
> 
> Len, Rafael, please do check that the end result looks ok. 

SUSPEND depends only on (!SMP || SUSPEND_SMP_POSSIBLE).
This means that while we limit the architectures it can build on
if they are SMP, it can build on any !SMP architecture --
which probably isn't what we want.

I think the right way to go is your SUSPEND_UP_POSSIBLE suggestion.
Honestly, I thought it was overly verbose when I first read it,
but I like it better now, especially since it works;-)
I'll reply w/ an incremental patch.
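
The difference between the two "depends on" expressions can be checked by
enumerating the inputs. What follows is a minimal user-space sketch, not
kernel or Kconfig code: the function names suspend_dep_old(),
suspend_dep_new(), check_suspend_dep() and print_suspend_dep_table() are made
up for illustration, and each bool simply stands for whether the
corresponding Kconfig symbol is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Existing rule: depends on !SMP || SUSPEND_SMP_POSSIBLE */
bool suspend_dep_old(bool smp, bool smp_possible)
{
    return !smp || smp_possible;
}

/* Suggested rule:
 * (!SMP && SUSPEND_UP_POSSIBLE) || (SMP && SUSPEND_SMP_POSSIBLE) */
bool suspend_dep_new(bool smp, bool up_possible, bool smp_possible)
{
    return (!smp && up_possible) || (smp && smp_possible);
}

/* Assert the two properties discussed in the mail. */
void check_suspend_dep(void)
{
    /* UP build on an architecture without suspend support:
     * the old rule lets it through, the new rule does not. */
    assert(suspend_dep_old(false, false));
    assert(!suspend_dep_new(false, false, false));

    /* SMP builds: both rules give the same answer. */
    for (int up = 0; up <= 1; up++)
        for (int sp = 0; sp <= 1; sp++)
            assert(suspend_dep_old(true, sp) == suspend_dep_new(true, up, sp));
}

/* Print both rules for every combination of the three symbols. */
void print_suspend_dep_table(void)
{
    printf("SMP UP_POSSIBLE SMP_POSSIBLE | old new\n");
    for (int smp = 0; smp <= 1; smp++)
        for (int up = 0; up <= 1; up++)
            for (int sp = 0; sp <= 1; sp++)
                printf("%3d %11d %12d | %3d %3d\n", smp, up, sp,
                       suspend_dep_old(smp, sp),
                       suspend_dep_new(smp, up, sp));
}
```

Calling check_suspend_dep() and then print_suspend_dep_table() from a main()
shows that the two rules only differ in the !SMP rows, which is the case
described above.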

> I suspect ACPI could now take the PM_SLEEP/SUSPEND/HIBERNATE details into 
> account, and that some of the code is not necessary when HIBERNATE is not 
> selected, for example, but I'm not at all sure that it's worth it being 
> very fine-grained.

As you know, I don't think that it is worth dedicated config options
to save 16KB on an SMP+ACPI kernel.  The prospect of adding code to
slice that 16KB into finer grain savings seems even less worthwhile.

-Len

Re: [PATCH] Remove fs.h from mm.h

2007-07-30 Thread Paul Mundt

On Mon, Jul 30, 2007 at 02:36:13AM +0400, Alexey Dobriyan wrote:
> 0) Remove fs.h from mm.h. For this,
> 1) Uninline vma_wants_writenotify(). It's pretty huge anyway.
> 2) Add back fs.h or less bloated headers (err.h) to files that need it.
> 
sh ended up breaking all over the place, and sh64 in a few others.
I'll roll the fixes in to my git trees, but here they are for reference.

--

 arch/sh/kernel/init_task.c |2 +-
 arch/sh/kernel/process.c   |1 +
 arch/sh/kernel/sys_sh.c|1 +
 arch/sh/kernel/vsyscall/vsyscall.c |1 +
 arch/sh/mm/pg-sh4.c|1 +
 arch/sh64/kernel/init_task.c   |2 +-
 arch/sh64/kernel/process.c |1 +
 arch/sh64/kernel/sys_sh64.c|1 +
 arch/sh64/lib/dbg.c|1 +
 9 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/sh/kernel/init_task.c b/arch/sh/kernel/init_task.c
index 44053ea..4b449c4 100644
--- a/arch/sh/kernel/init_task.c
+++ b/arch/sh/kernel/init_task.c
@@ -3,7 +3,7 @@
 #include 
 #include 
 #include 
-
+#include 
 #include 
 #include 
 
diff --git a/arch/sh/kernel/process.c b/arch/sh/kernel/process.c
index 6334a4c..44ebe06 100644
--- a/arch/sh/kernel/process.c
+++ b/arch/sh/kernel/process.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/arch/sh/kernel/sys_sh.c b/arch/sh/kernel/sys_sh.c
index 76b1bc7..024ce5d 100644
--- a/arch/sh/kernel/sys_sh.c
+++ b/arch/sh/kernel/sys_sh.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/arch/sh/kernel/vsyscall/vsyscall.c b/arch/sh/kernel/vsyscall/vsyscall.c
index 2aa9438..95f4de0 100644
--- a/arch/sh/kernel/vsyscall/vsyscall.c
+++ b/arch/sh/kernel/vsyscall/vsyscall.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Should the kernel map a VDSO page into processes and pass its
diff --git a/arch/sh/mm/pg-sh4.c b/arch/sh/mm/pg-sh4.c
index df69da9..f4810aa 100644
--- a/arch/sh/mm/pg-sh4.c
+++ b/arch/sh/mm/pg-sh4.c
@@ -8,6 +8,7 @@
  */
 #include 
 #include 
+#include 
 #include 
 #include 
 
diff --git a/arch/sh64/kernel/init_task.c b/arch/sh64/kernel/init_task.c
index de2d07d..deee8bf 100644
--- a/arch/sh64/kernel/init_task.c
+++ b/arch/sh64/kernel/init_task.c
@@ -14,7 +14,7 @@
 #include 
 #include 
 #include 
-
+#include 
 #include 
 #include 
 
diff --git a/arch/sh64/kernel/process.c b/arch/sh64/kernel/process.c
index 1b89c9d..ceb9458 100644
--- a/arch/sh64/kernel/process.c
+++ b/arch/sh64/kernel/process.c
@@ -21,6 +21,7 @@
  * This file handles the architecture-dependent parts of process handling..
  */
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/arch/sh64/kernel/sys_sh64.c b/arch/sh64/kernel/sys_sh64.c
index 19126da..b7f18e2 100644
--- a/arch/sh64/kernel/sys_sh64.c
+++ b/arch/sh64/kernel/sys_sh64.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/arch/sh64/lib/dbg.c b/arch/sh64/lib/dbg.c
index 4310fc8..97816e0 100644
--- a/arch/sh64/lib/dbg.c
+++ b/arch/sh64/lib/dbg.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 typedef u64 regType_t;

Re: CONFIG_SUSPEND? (was: Re: [GIT PATCH] ACPI patches for 2.6.23-rc1)

2007-07-30 Thread Len Brown

On Saturday 28 July 2007 12:55, Linus Torvalds wrote:

> So I think the real issue is that we allow that 
> "suspend_devices_and_enter()" code to be compiled without HOTPLUG_CPU in 
> the first place. It's not supposed to work that way.

I don't see how CONFIG_HOTPLUG_CPU justifies its own existence.
This e-mail thread would have never happened if it were simply included
in CONFIG_SMP, always.

I agree, of course, that ACPI should never have had to work-around
this by selecting HOTPLUG_CPU.  But even though it is now done at
the right layer, I don't see why PM should have to
be bothered with selecting HOTPLUG_CPU either --
it should just come with SMP.

-Len

Marvell 7042 (sata_mv) fails to initialize drive

2007-07-30 Thread Markus Gutschke

I just tried hooking up a Hitachi 1TB SATA-II drive to a Marvell 7042 
based controller, and the most recent Linux kernel (2.6.23-rc1) fails to 
properly initialize the interface. Here are the relevant kernel messages:



kernel: [43.312417] sata_mv :06:00.0: version 0.81
kernel: [43.312752] ACPI: PCI Interrupt Link [APC5] enabled at IRQ 16
kernel: [43.312757] ACPI: PCI Interrupt :06:00.0[A] -> Link [APC5] -> GSI 16 (level, low) -> IRQ 16
kernel: [43.312788] sata_mv :06:00.0: Applying 60X1C0 workarounds to unknown rev
kernel: [43.314443] sata_mv :06:00.0: Gen-IIE 32 slots 4 ports SCSI mode IRQ via INTx
kernel: [43.314535] scsi0 : sata_mv
kernel: [43.314581] scsi1 : sata_mv
kernel: [43.314614] scsi2 : sata_mv
kernel: [43.314640] scsi3 : sata_mv
kernel: [43.314660] ata1: SATA max UDMA/133 cmd 0x ctl 0xc20003522120 bmdma 0x irq 16
kernel: [43.314663] ata2: SATA max UDMA/133 cmd 0x ctl 0xc20003524120 bmdma 0x irq 16
kernel: [43.314666] ata3: SATA max UDMA/133 cmd 0x ctl 0xc20003526120 bmdma 0x irq 16
kernel: [43.314669] ata4: SATA max UDMA/133 cmd 0x ctl 0xc20003528120 bmdma 0x irq 16
kernel: [53.409086] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
kernel: [59.741602] ata1: EH pending after completion, repeating EH (cnt=4)
kernel: [59.777642] ata2: SATA link down (SStatus 0 SControl 300)
kernel: [59.809752] ata3: SATA link down (SStatus 0 SControl 300)
kernel: [59.841740] ata4: SATA link down (SStatus 0 SControl 300)


The kernel never even registers the drive as an available disk device,
whereas everything appears to work fine if I connect the disk to one of
the other controllers (JMicron AHCI, or NVidia sata_nv) on this motherboard.


As I have two of those disks (in a RAID-1 array) and multiple 
independent controllers, it is relatively easy for me to do some testing 
here. The worst case scenario is that I need to wait a couple of hours 
for the array to rebuild itself after I am done experimenting.


Let me know if there is anything I can do to help you diagnose the root
cause, or whether this is a known bug and you don't need any help
testing at this point.



Markus

Re: inotify and /proc/

2007-07-30 Thread Al Viro

On Mon, Jul 30, 2007 at 10:40:59PM -0500, Joseph Pingenot wrote:
> I'm trying to implement pwait.  It blocks until a specified PID exits,
>   and then it exits.

er... ptrace(2)?

Re: inotify and /proc/

2007-07-30 Thread Kyle McMartin

On Mon, Jul 30, 2007 at 10:40:59PM -0500, Joseph Pingenot wrote:
> From Al Viro on Tuesday, 31 July, 2007:
> >On Mon, Jul 30, 2007 at 10:31:13PM -0500, Joseph Pingenot wrote:
> >> >From Joseph Pingenot on Monday, 30 July, 2007:
> >> >From Al Viro on Tuesday, 31 July, 2007:
> >> >>On Mon, Jul 30, 2007 at 09:16:16PM -0500, Joseph Pingenot wrote:
> >> >>> I was trying to use inotify to watch process changes (especially 
> >> >>> process
> >> >>>   termination) by watching /proc/.
> >> >>> Sadly, although I could see something reading various files, nothing
> >> >>>   was issued when the process I was watching exited and the directory
> >> >>>   went away.
> >> >>> Is this intentional, or a bug?
> >> >>It's a bug you intend to introduce in your program...  IOW, don't
> >> >>do that.
> >> >More background, please?
> >> >What's the way to check for a process exiting without spinning?
> >> I should also specify that the process being waited on is not a
> >>   child process-it's just some other process on the system.
> >Umm...  Any details on intended use?  IOW, is that "I want to write
> >an utility that would wait for given PID to exit, just for the hell
> >of it" or is there something you are trying to implement using that?
> 
> I'm trying to implement pwait.  It blocks until a specified PID exits,
>   and then it exits.
>

ptrace with options set to PTRACE_O_TRACEEXIT?

--Kyle
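
For reference, the ptrace suggestion could look roughly like the sketch
below. It is only an illustration under stated assumptions, not a finished
pwait: the function name pwait_ptrace() is made up, the first SIGSTOP seen
is assumed to be the one caused by PTRACE_ATTACH, and PTRACE_O_TRACEEXEC is
set in addition to PTRACE_O_TRACEEXIT so that an execve() in the target
shows up as a ptrace event instead of a plain SIGTRAP that would otherwise
be forwarded. As noted elsewhere in the thread, attaching is expected to
fail for processes the caller is not allowed to trace.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Block until `pid` terminates.  Returns 0 once it is gone, -1 on error
 * (errno is left as set by the failing call).
 */
int pwait_ptrace(pid_t pid)
{
    int attach_stop_seen = 0;
    int status;

    assert(pid > 0);

    /* Become the tracer; the kernel sends the target a SIGSTOP. */
    if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1)
        return -1;

    for (;;) {
        if (waitpid(pid, &status, __WALL) == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (WIFEXITED(status) || WIFSIGNALED(status))
            return 0;               /* the target has terminated */
        if (!WIFSTOPPED(status))
            continue;

        long sig = WSTOPSIG(status);

        if ((status >> 16) != 0) {
            /* ptrace event stop (PTRACE_EVENT_EXIT, PTRACE_EVENT_EXEC):
             * not a real signal, so nothing is injected on resume. */
            sig = 0;
        } else if (sig == SIGSTOP && !attach_stop_seen) {
            /* Assumed to be the stop caused by PTRACE_ATTACH. */
            attach_stop_seen = 1;
            if (ptrace(PTRACE_SETOPTIONS, pid, NULL,
                       (void *)(long)(PTRACE_O_TRACEEXIT |
                                      PTRACE_O_TRACEEXEC)) == -1 &&
                errno != ESRCH)
                return -1;
            sig = 0;
        }

        /* Resume the target, forwarding any genuine signal to it. */
        if (ptrace(PTRACE_CONT, pid, NULL, (void *)sig) == -1 &&
            errno != ESRCH)
            return -1;
    }
}
```

A side effect worth keeping in mind: while this runs, the target has a
tracer attached, so nothing else (gdb, strace) can attach to it, and it is
briefly stopped at attach time.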

Re: [RFC 12/26] ext2 white-out support

2007-07-30 Thread Theodore Tso

On Mon, Jul 30, 2007 at 06:13:35PM +0200, Jan Blunck wrote:
> Introduce white-out support to ext2.
> 
> Known Bugs:
> - Needs a reserved inode number for white-outs

You picked different reserved inodes for the ext2 and ext3
filesystems.  That's good for a NACK right there.  The codepoints
(i.e., reserved inode numbers, feature bit masks, etc.) for ext2,
ext3, and ext4 MUST not overlap.  After all, someone might use tune2fs
-j to convert an ext2 filesystem to ext3, and it's REALLY BAD that
you're using a reserved inode of 7 for ext2, and 9 for ext3.

Also, I note that you have created a new INCOMPAT feature flag support
for whiteouts.  That's really unfortunate; we try to avoid introducing
incompatible feature flags unless absolutely necessary; note that even
adding a COMPAT feature flag means that you need a new version of
e2fsprogs if you want e2fsck to be willing to touch that filesystem.

So --- if you're looking for a way to add whiteout support to
ext2/ext3 without needing a feature bit, here's how.  We allocate a
new inode flag in struct ext3_inode.i_flags:

#define EXT2_WHTOUT_FL   0x0004

We also allocate a new field in the ext2 superblock to store the
"whiteout inode".  (Please coordinate with me so it's a superblock
field not in use by ext3/ext4, and so it's reserved so that no one
else uses it.)  The superblock field, call it s_whtout_ino, stores the
inode number for the "white out inode".

When you create a new whiteout file, the code checks sb->s_whtout_ino,
and if it is zero, it allocates a new inode, creates it as a
zero-length regular file (i_mode |= S_IFREG) with the EXT2_WHTOUT_FL
flag set in the inode, and then stores the inode number in
sb->s_whtout_ino.  If sb->s_whtout_ino is non-zero, you must read in
the inode and make sure that the EXT2_WHTOUT_FL is set.  If it is not,
then allocate a new whiteout inode as described previously.  Then link
the inode into the directory as before.

When reading an inode, if the EXT2_WHTOUT_FL flag is set, then set the
in-memory mode of the inode to be S_IFWHT.  

That's pretty much about it.  For cleanliness' sake, it would be good
if ext2_delete_inode cleared sb->s_whtout_ino once the last whiteout link
has been deleted, but strictly speaking it's not necessary.  If you do
it this way, the filesystem is completely backwards compatible; the
whiteout files will just appear to be links to a normal zero-length file.

I wouldn't bother with setting the directory type field to be DT_WHT,
given that they will never be returned to userspace anyway.

Regards,

- Ted

Re: [PATCH] add check do_direct_IO() return val

2007-07-30 Thread Badari


Joe Jin wrote:
Well, I'm having a heck of a time getting this to fail.  It looks  
possible, though.  Joe, were you guys able to narrow it down to a  
reproducible test case?  Do you have any oops output messages from  
the crashes?



Zach, it is easy to reproduce with fio using the following config file:

# cat jobfile
[global]
bs=8k
iodepth=1024
iodepth_batch=60
randrepeat=1
size=1m
directory=/home/oracle
numjobs=20
[job1]
ioengine=sync
bs=1k
direct=1
rw=randread
filename=file1:file2
[job2]
ioengine=libaio
rw=randwrite
direct=1
filename=file1:file2
[job3]
bs=1k
ioengine=posixaio
rw=randwrite
direct=1
filename=file1:file2
[job4]
ioengine=splice
direct=1
rw=randwrite
filename=file1:file2
[job5]
bs=1k
ioengine=sync
rw=randread
filename=file1:file2
[job7]
ioengine=libaio
rw=randwrite
filename=file1:file2
[job8]
ioengine=posixaio
rw=randwrite
filename=file1:file2
[job9]
ioengine=splice
rw=randwrite
filename=file1:file2
[job10]
ioengine=mmap
rw=randwrite
bs=1k
filename=file1:file2
[job11]
ioengine=mmap
rw=randwrite
direct=1
filename=file1:file2

Hmm.. in this config file, what's causing DIO to panic? Which test is actually
passing a faulty buffer?

Thanks,
Badari

Re: inotify and /proc/

2007-07-30 Thread Joseph Pingenot

>From Al Viro on Tuesday, 31 July, 2007:
>On Mon, Jul 30, 2007 at 10:31:13PM -0500, Joseph Pingenot wrote:
>> >From Joseph Pingenot on Monday, 30 July, 2007:
>> >From Al Viro on Tuesday, 31 July, 2007:
>> >>On Mon, Jul 30, 2007 at 09:16:16PM -0500, Joseph Pingenot wrote:
>> >>> I was trying to use inotify to watch process changes (especially process
>> >>>   termination) by watching /proc/.
>> >>> Sadly, although I could see something reading various files, nothing
>> >>>   was issued when the process I was watching exited and the directory
>> >>>   went away.
>> >>> Is this intentional, or a bug?
>> >>It's a bug you intend to introduce in your program...  IOW, don't
>> >>do that.
>> >More background, please?
>> >What's the way to check for a process exiting without spinning?
>> I should also specify that the process being waited on is not a
>>   child process-it's just some other process on the system.
>Umm...  Any details on intended use?  IOW, is that "I want to write
>an utility that would wait for given PID to exit, just for the hell
>of it" or is there something you are trying to implement using that?

I'm trying to implement pwait.  It blocks until a specified PID exits,
  and then it exits.

You can use it to do other stuff after a program finishes.

While we're on the subject, is there some way to receive notification
  that some aspect of a process has changed (in this case, that it has
  stopped using CPU, but has not exited)?

Thanks for taking the time to help me figure this out.

-Joseph

-- 
[EMAIL PROTECTED]///
"There is also an entire branch in the physical therapy field dedicated
 to the treatment of little-finger injuries caused by excessive Emacs
 use."  --Linux Weekly News editor (http://lwn.net/Articles/206916/)
///260 IATL / The University of Iowa / Iowa City, IA  52242///

Re: inotify and /proc/

2007-07-30 Thread Al Viro

On Mon, Jul 30, 2007 at 10:31:13PM -0500, Joseph Pingenot wrote:
> >From Joseph Pingenot on Monday, 30 July, 2007:
> >From Al Viro on Tuesday, 31 July, 2007:
> >>On Mon, Jul 30, 2007 at 09:16:16PM -0500, Joseph Pingenot wrote:
> >>> I was trying to use inotify to watch process changes (especially process
> >>>   termination) by watching /proc/.
> >>> Sadly, although I could see something reading various files, nothing
> >>>   was issued when the process I was watching exited and the directory
> >>>   went away.
> >>> Is this intentional, or a bug?
> >>It's a bug you intend to introduce in your program...  IOW, don't
> >>do that.
> >More background, please?
> >What's the way to check for a process exiting without spinning?
> 
> I should also specify that the process being waited on is not a
>   child process-it's just some other process on the system.

Umm...  Any details on intended use?  IOW, is that "I want to write
an utility that would wait for given PID to exit, just for the hell
of it" or is there something you are trying to implement using that?

Re: [-mm PATCH 4/9] Memory controller memory accounting (v4)

2007-07-30 Thread YAMAMOTO Takashi

> + lock_meta_page(page);
> + /*
> +  * Check if somebody else beat us to allocating the meta_page
> +  */
> + race_mp = page_get_meta_page(page);
> + if (race_mp) {
> + kfree(mp);
> + mp = race_mp;
> + atomic_inc(&mp->ref_cnt);
> + res_counter_uncharge(&mem->res, 1);
> + goto done;
> + }

i think you need css_put here.

YAMAMOTO Takashi

[Fwd: [PlanetCCRMA] atl1 driver; sleeping function]

2007-07-30 Thread Fernando Lopez-Lezcano

Hi Ingo, I'm forwarding this report from a Planet CCRMA user; this is
happening to him with 2.6.21.6-rt21...

-- Fernando

-------- Forwarded Message --------
From: Matt Barber
To: [EMAIL PROTECTED]
Subject: [PlanetCCRMA] atl1 driver; sleeping function
Date: Mon, 30 Jul 2007 06:09:58 -0400

Hello,

I'm getting a set of BUG messages in my dmesg with the newest ccrma
kernel.  This is a new box, so I haven't tried the older ccrma
kernels, but the bugs aren't there with Fedora stock.  They look like
this (probably at least a hundred more by now):

BUG: sleeping function called from invalid context IRQ-219(2243) at
kernel/rtmutex.c:613
in_atomic():0 [], irqs_disabled():1
 [] dump_trace+0x64/0x105
 [] show_trace_log_lvl+0x18/0x2c
 [] show_trace+0xf/0x11
 [] dump_stack+0x12/0x14
 [] __rt_spin_lock+0x21/0x3d
 [] atl1_xmit_frame+0x66f/0x6c6 [atl1]
 [] dev_hard_start_xmit+0x1c6/0x225
 [] __qdisc_run+0xb7/0x1cf
 [] dev_queue_xmit+0x14a/0x239
 [] ip_output+0x207/0x243
 [] ip_queue_xmit+0x3b2/0x402
 [] tcp_transmit_skb+0x6e5/0x713
 [] tcp_send_ack+0xeb/0xef
 [] tcp_rcv_established+0x52a/0x7ff
 [] tcp_v4_do_rcv+0x1bf/0x494
 [] tcp_v4_rcv+0x863/0x8d6
 [] ip_local_deliver+0x18f/0x23d
 [] ip_rcv+0x41d/0x456
 [] netif_receive_skb+0x2cc/0x35e
 [] process_backlog+0x76/0xc9
 [] net_rx_action+0xa7/0x1a5
 [] ___do_softirq+0xfe/0x214
 [] do_softirq_from_hardirq+0x48/0x61
 [] do_irqd+0x21a/0x282
 [] kthread+0xb0/0xd8
 [] kernel_thread_helper+0x7/0x10
 ===
printk: 6 messages suppressed.
network driver disabled raw interrupts: atl1_xmit_frame+0x0/0x6c6 [atl1]
BUG: sleeping function called from invalid context firefox-bin(17517)
at kernel/rtmutex.c:613
in_atomic():0 [], irqs_disabled():1
 [] dump_trace+0x64/0x105
 [] show_trace_log_lvl+0x18/0x2c
 [] show_trace+0xf/0x11
 [] dump_stack+0x12/0x14
 [] __rt_spin_lock+0x21/0x3d
 [] atl1_xmit_frame+0x66f/0x6c6 [atl1]
 [] dev_hard_start_xmit+0x1c6/0x225
 [] __qdisc_run+0xb7/0x1cf
 [] dev_queue_xmit+0x14a/0x239
 [] ip_output+0x207/0x243
 [] ip_queue_xmit+0x3b2/0x402
 [] tcp_transmit_skb+0x6e5/0x713
 [] tcp_push_one+0xb3/0xd8
 [] tcp_sendmsg+0x7c8/0x9f9
 [] inet_sendmsg+0x3b/0x45
 [] sock_sendmsg+0xd0/0xeb
 [] sys_sendto+0x11b/0x13b
 [] sys_send+0x37/0x3b
 [] sys_socketcall+0x14a/0x261
 [] syscall_call+0x7/0xb
 [] 0xb7fd8410
 ===
network driver disabled raw interrupts: atl1_xmit_frame+0x0/0x6c6 [atl1]
network driver disabled raw interrupts: atl1_xmit_frame+0x0/0x6c6 [atl1]
network driver disabled raw interrupts: atl1_xmit_frame+0x0/0x6c6 [atl1]
BUG: sleeping function called from invalid context IRQ-219(2243) at
kernel/rtmutex.c:613
in_atomic():0 [], irqs_disabled():1
 [] dump_trace+0x64/0x105
 [] show_trace_log_lvl+0x18/0x2c
 [] show_trace+0xf/0x11
 [] dump_stack+0x12/0x14
 [] __rt_spin_lock+0x21/0x3d
 [] atl1_xmit_frame+0x66f/0x6c6 [atl1]
 [] dev_hard_start_xmit+0x1c6/0x225
 [] __qdisc_run+0xb7/0x1cf
 [] dev_queue_xmit+0x14a/0x239
 [] ip_output+0x207/0x243
 [] ip_queue_xmit+0x3b2/0x402
 [] tcp_transmit_skb+0x6e5/0x713
 [] tcp_send_ack+0xeb/0xef
 [] tcp_rcv_established+0x52a/0x7ff
 [] tcp_v4_do_rcv+0x1bf/0x494
 [] tcp_v4_rcv+0x863/0x8d6
 [] ip_local_deliver+0x18f/0x23d
 [] ip_rcv+0x41d/0x456
 [] netif_receive_skb+0x2cc/0x35e
 [] process_backlog+0x76/0xc9
 [] net_rx_action+0xa7/0x1a5
 [] ___do_softirq+0xfe/0x214
 [] do_softirq_from_hardirq+0x48/0x61
 [] do_irqd+0x21a/0x282
 [] kthread+0xb0/0xd8
 [] kernel_thread_helper+0x7/0x10
 ===
printk: 14 messages suppressed.
network driver disabled raw interrupts: atl1_xmit_frame+0x0/0x6c6 [atl1]
BUG: sleeping function called from invalid context firefox-bin(17517)
at kernel/rtmutex.c:613
in_atomic():0 [], irqs_disabled():1
 [] dump_trace+0x64/0x105
 [] show_trace_log_lvl+0x18/0x2c
 [] show_trace+0xf/0x11
 [] dump_stack+0x12/0x14
 [] __rt_spin_lock+0x21/0x3d
 [] atl1_xmit_frame+0x66f/0x6c6 [atl1]
 [] dev_hard_start_xmit+0x1c6/0x225
 [] __qdisc_run+0xb7/0x1cf
 [] dev_queue_xmit+0x14a/0x239
 [] ip_output+0x207/0x243
 [] ip_queue_xmit+0x3b2/0x402
 [] tcp_transmit_skb+0x6e5/0x713
 [] tcp_push_one+0xb3/0xd8
 [] tcp_sendmsg+0x7c8/0x9f9
 [] inet_sendmsg+0x3b/0x45
 [] sock_sendmsg+0xd0/0xeb
 [] sys_sendto+0x11b/0x13b
 [] sys_send+0x37/0x3b
 [] sys_socketcall+0x14a/0x261
 [] syscall_call+0x7/0xb
 [] 0xb7fd8410
 ===
BUG: sleeping function called from invalid context IRQ-219(2243) at
kernel/rtmutex.c:613
in_atomic():0 [], irqs_disabled():1
 [] dump_trace+0x64/0x105
 [] show_trace_log_lvl+0x18/0x2c
 [] show_trace+0xf/0x11
 [] dump_stack+0x12/0x14
 [] __rt_spin_lock+0x21/0x3d
 [] atl1_xmit_frame+0x66f/0x6c6 [atl1]
 [] dev_hard_start_xmit+0x1c6/0x225
 [] __qdisc_run+0xb7/0x1cf
 [] dev_queue_xmit+0x14a/0x239
 [] ip_output+0x207/0x243
 [] ip_queue_xmit+0x3b2/0x402
 [] tcp_transmit_skb+0x6e5/0x713
 [] __tcp_push_pending_frames+0x6ec/0x7af
 [] tcp_rcv_established+0x107/0x7ff
 [] tcp_v4_do_rcv+0x1bf/0x494
 [] tcp_v4_rcv+0x863/0x8d6
 []

Re: [PATCH] sysrq: add a show-stacktrace-on-all-cpus command

2007-07-30 Thread Avi Kivity

Andrew Morton wrote:
> On Sun, 29 Jul 2007 13:29:26 +0300
> Avi Kivity <[EMAIL PROTECTED]> wrote:
>
>   
>> If a cpu is spinning in the kernel but still responding to interrupts,
>> pressing sysrq-y will show you where it's spinning.
>>
>> Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
>>
>> diff --git a/drivers/char/sysrq.c b/drivers/char/sysrq.c
>> index 39cc318..1dda709 100644
>> --- a/drivers/char/sysrq.c
>> +++ b/drivers/char/sysrq.c
>> @@ -45,6 +45,8 @@ int __read_mostly __sysrq_enabled = 1;
>>  
>>  static int __read_mostly sysrq_always_enabled;
>>  
>> +static spinlock_t show_stack_lock = SPIN_LOCK_UNLOCKED;
>> 
>
> Use DEFINE_SPINLOCK to avoid confusing lockdep.
>
>   

Okay.

>>  int sysrq_on(void)
>>  {
>>  return __sysrq_enabled || sysrq_always_enabled;
>> @@ -309,6 +311,26 @@ static struct sysrq_key_op sysrq_unrt_op = {
>>  .enable_mask= SYSRQ_ENABLE_RTNICE,
>>  };
>>  
>> +static void show_cpu_stack(void *garbage)
>> +{
>> +spin_lock(&show_stack_lock);
>> +printk("CPU%d stacktrace:\n", raw_smp_processor_id());
>> +dump_stack();
>> +sysrq_handle_showregs(0, NULL);
>> +spin_unlock(&show_stack_lock);
>> +}
>> +
>> +static void sysrq_show_stacks(int key, struct tty_struct *tty)
>> +{
>> +on_each_cpu(show_cpu_stack, NULL, 0, 1);
>> +}
>> +
>> +static struct sysrq_key_op sysrq_show_stacks_op = {
>> +.handler    = sysrq_show_stacks,
>> +.help_msg   = "stacktraces-on-all-cpus(Y)",
>> +.action_msg = "Stack traces on all cpus",
>> +};
>> +
>>  /* Key Operations table and lock */
>>  static DEFINE_SPINLOCK(sysrq_key_table_lock);
>>  
>> @@ -356,7 +378,7 @@ static struct sysrq_key_op *sysrq_key_table[36] = {
>>  &sysrq_showstate_blocked_op,/* w */
>>  /* x: May be registered on ppc/powerpc for xmon */
>>  NULL,   /* x */
>> -NULL,   /* y */
>> +&sysrq_show_stacks_op,  /* y */
>>  NULL    /* z */
>>  };
>>  
>> 
>
> but, but..  sysrq handlers called from hard IRQ.  Are we sure that none of
> the drivers which call into the sysrq code do so with hard IRQs disabled?
>
> Because if we call on_each_cpu() with hard IRQs disabled, the various
> implementations will emit loud warnings due to the deadlockability.
>
>   

Ah, I tested this with /proc/sysrq-trigger.

Maybe we should temporarily enable irqs here; if you press this key,
your system is already on the way out anyway.  But perhaps that would
entrap curious admins checking out what the keys do.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.


Re: inotify and /proc/

2007-07-30 Thread Al Viro

On Mon, Jul 30, 2007 at 10:25:21PM -0500, Joseph Pingenot wrote:
> >From Al Viro on Tuesday, 31 July, 2007:
> >On Mon, Jul 30, 2007 at 09:16:16PM -0500, Joseph Pingenot wrote:
> >> I was trying to use inotify to watch process changes (especially process
> >>   termination) by watching /proc/.
> >> Sadly, although I could see something reading various files, nothing
> >>   was issued when the process I was watching exited and the directory
> >>   went away.
> >> Is this intentional, or a bug?
> >It's a bug you intend to introduce in your program...  IOW, don't
> >do that.
> 
> More background, please?
> 
> What's the way to check for a process exiting without spinning?

Depends on what that process is and how it is related to watching
one...

Re: inotify and /proc/

2007-07-30 Thread Joseph Pingenot

>From Joseph Pingenot on Monday, 30 July, 2007:
>From Al Viro on Tuesday, 31 July, 2007:
>>On Mon, Jul 30, 2007 at 09:16:16PM -0500, Joseph Pingenot wrote:
>>> I was trying to use inotify to watch process changes (especially process
>>>   termination) by watching /proc/.
>>> Sadly, although I could see something reading various files, nothing
>>>   was issued when the process I was watching exited and the directory
>>>   went away.
>>> Is this intentional, or a bug?
>>It's a bug you intend to introduce in your program...  IOW, don't
>>do that.
>More background, please?
>What's the way to check for a process exiting without spinning?

I should also specify that the process being waited on is not a
  child process-it's just some other process on the system.
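
For illustration only, a minimal sketch of one way to wait for an
unrelated process to exit without spinning. It relies on pidfd_open(2),
which exists only on Linux 5.3 and later, so it is much newer than the
kernels discussed in this thread and is not something the posters
suggested. The helper name wait_for_pid_exit() and its return convention
are assumptions made up for this example.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <poll.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: sleep in poll() until process `pid` exits or
 * `timeout_ms` elapses (-1 waits forever).
 * Returns 1 if the process exited, 0 on timeout, -1 on error (errno set).
 */
static int wait_for_pid_exit(pid_t pid, int timeout_ms)
{
#ifdef SYS_pidfd_open
	struct pollfd pfd;
	int fd, rc, saved;

	/* A pidfd refers to the process itself, so PID reuse cannot fool us. */
	fd = (int)syscall(SYS_pidfd_open, pid, 0);
	if (fd < 0)
		return -1;

	pfd.fd = fd;
	pfd.events = POLLIN;	/* the pidfd becomes readable on exit */
	pfd.revents = 0;
	do {
		rc = poll(&pfd, 1, timeout_ms);
	} while (rc < 0 && errno == EINTR);

	saved = errno;
	close(fd);
	errno = saved;
	if (rc < 0)
		return -1;
	return rc > 0 ? 1 : 0;
#else
	/* Headers too old to know the syscall number. */
	(void)pid;
	(void)timeout_ms;
	errno = ENOSYS;
	return -1;
#endif
}
```

A caller would pass the PID of the process being watched, for example
wait_for_pid_exit(target_pid, -1), and treat a return of -1 with errno
set to ENOSYS as "kernel too old, fall back to something else".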

-- 
[EMAIL PROTECTED]///
"There is also an entire branch in the physical therapy field dedicated
 to the treatment of little-finger injuries caused by excessive Emacs
 use."  --Linux Weekly News editor (http://lwn.net/Articles/206916/)
///260 IATL / The University of Iowa / Iowa City, IA  52242///

Kernel Build Failure

2007-07-30 Thread Chris Holvenstot

I am having a problem, likely caused by the vacuum between my ears, but
I thought that a few of you would find it recreational to tear up an
inexperienced user for doing dumb things.  

I have what is quickly becoming an "un-ubuntu" 7.04 system - as time goes
on I seem to be replacing the standard distribution with "alternative"
components - this past weekend I replaced the kernel because the one
shipped with Ubuntu was not cutting it with the new SATA drives I
recently purchased.

So I downloaded the 2.6.22.1 kernel sources and set out to build a new
kernel.  For the most part I left things pretty much as they shipped
when running the menuconfig tool and an hour or so later I had a
functioning kernel which did a decent job supporting my new hard drives
(I still have a slight problem with soft resets which seems to go away
when I turn NCQ down to zero)

While I was researching the SATA issue I noted a comment on the mailing
list that having a few more people build and run your RC1 kernels would
be desirable.  So I said, why not?   I may still be new to Linux and C
but I was an MVS Sysprog and assembly language programmer for 25 years
and if nothing else I have strong troubleshooting skills and an
understanding of how an operating system works.

My target system for this activity was my "multi-purpose" AMD64 X2 based
home server - in addition to being my desktop, it also functions as the
family web server and even runs the sewing machine via an instance of
Windows running under VirtualBox.  The box is up 24x7 and gets a good
work out on a variety of tasks, but if it goes down so be it.  

However, I think that my process for building an RC1 kernel must be
flawed, and as much as I poked around I can not spot what I am doing
wrong.  

I take a virgin set of 2.6.22.1 sources.  

To this I added the 2.6.23-rc1 patch set.  The patch verified and
applied without generating any error messages.

The only change I made with the menuconfig tool from 2.6.22.1 was to
change the generated kernel name.

I ran make and received the following error:

CC [M]  net/netfilter/nf_conntrack_proto_sctp.o
net/netfilter/nf_conntrack_proto_sctp.c: In function ‘sctp_new’:
net/netfilter/nf_conntrack_proto_sctp.c:436: error: implicit declaration
of function ‘DEBUGP’
make[2]: *** [net/netfilter/nf_conntrack_proto_sctp.o] Error 1
make[1]: *** [net/netfilter] Error 2
make: *** [net] Error 

Any suggestions as to what I might be doing wrong would be appreciated.

Thanks

Chris







Re: inotify and /proc/

2007-07-30 Thread Joseph Pingenot

>From Al Viro on Tuesday, 31 July, 2007:
>On Mon, Jul 30, 2007 at 09:16:16PM -0500, Joseph Pingenot wrote:
>> I was trying to use inotify to watch process changes (especially process
>>   termination) by watching /proc/.
>> Sadly, although I could see something reading various files, nothing
>>   was issued when the process I was watching exited and the directory
>>   went away.
>> Is this intentional, or a bug?
>It's a bug you intend to introduce in your program...  IOW, don't
>do that.

More background, please?

What's the way to check for a process exiting without spinning?

-Joseph

-- 
[EMAIL PROTECTED]///
"There is also an entire branch in the physical therapy field dedicated
 to the treatment of little-finger injuries caused by excessive Emacs
 use."  --Linux Weekly News editor (http://lwn.net/Articles/206916/)
///260 IATL / The University of Iowa / Iowa City, IA  52242///

[PATCH 4/5] x86_64 EFI support -v3: EFI framebuffer driver

2007-07-30 Thread Huang, Ying

This patch adds Graphics Output Protocol support to the kernel.
UEFI2.0 spec deprecates Universal Graphics Adapter (UGA) protocol and
only Graphics Output Protocol (GOP) is produced. Therefore, the boot
loader needs to query the UEFI firmware with appropriate Output
Protocol and pass the video information to the kernel. As a result of
GOP protocol, an EFI framebuffer driver is needed for displaying
console messages. The patch adds an EFI framebuffer driver. The EFI
frame buffer driver in this patch is based on the Intel Mac
framebuffer driver.

The ELILO bootloader takes care of passing the video information as
appropriate for EFI firmware.

Signed-off-by: Chandramouli Narayanan <[EMAIL PROTECTED]>
Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---

 drivers/video/Kconfig   |   11 ++
 drivers/video/Makefile  |1 
 drivers/video/efifb.c   |  242 
 include/linux/screen_info.h |1 
 4 files changed, 255 insertions(+)

Index: linux-2.6.23-rc1/include/linux/screen_info.h
===
--- linux-2.6.23-rc1.orig/include/linux/screen_info.h   2007-07-30 
15:46:55.0 +0800
+++ linux-2.6.23-rc1/include/linux/screen_info.h2007-07-30 
15:48:14.0 +0800
@@ -62,6 +62,7 @@
 #define VIDEO_TYPE_EGAC        0x21    /* EGA in Color Mode */
 #define VIDEO_TYPE_VGAC        0x22    /* VGA+ in Color Mode */
 #define VIDEO_TYPE_VLFB        0x23    /* VESA VGA in graphic mode */
+#define VIDEO_TYPE_EFI         0x24    /* EFI graphic mode */
 
 #define VIDEO_TYPE_PICA_S3     0x30    /* ACER PICA-61 local S3 video */
 #define VIDEO_TYPE_MIPS_G364   0x31    /* MIPS Magnum 4000 G364 video */
Index: linux-2.6.23-rc1/drivers/video/Kconfig
===
--- linux-2.6.23-rc1.orig/drivers/video/Kconfig 2007-07-30 15:46:55.0 
+0800
+++ linux-2.6.23-rc1/drivers/video/Kconfig  2007-07-31 10:17:10.0 
+0800
@@ -605,6 +605,17 @@
  You will get a boot time penguin logo at no additional cost. Please
  read . If unsure, say Y.
 
+config FB_EFI
+   bool "EFI-based Framebuffer Support"
+   depends on (FB = y) && X86_64 && EFI
+   select FB_CFB_FILLRECT
+   select FB_CFB_COPYAREA
+   select FB_CFB_IMAGEBLIT
+   help
+ This is the EFI frame buffer device driver. If the firmware on
+ your platform is UEFI2.0, select Y to add support for
+ Graphics Output Protocol for early console messages to appear.
+
 config FB_IMAC
bool "Intel-based Macintosh Framebuffer Support"
depends on (FB = y) && X86 && EFI
Index: linux-2.6.23-rc1/drivers/video/Makefile
===
--- linux-2.6.23-rc1.orig/drivers/video/Makefile2007-07-30 
15:46:55.0 +0800
+++ linux-2.6.23-rc1/drivers/video/Makefile 2007-07-30 15:48:14.0 
+0800
@@ -118,6 +118,7 @@
 # Platform or fallback drivers go here
 obj-$(CONFIG_FB_VESA) += vesafb.o
 obj-$(CONFIG_FB_IMAC) += imacfb.o
+obj-$(CONFIG_FB_EFI)  += efifb.o
 obj-$(CONFIG_FB_VGA16)+= vga16fb.o
 obj-$(CONFIG_FB_OF)   += offb.o
 
Index: linux-2.6.23-rc1/drivers/video/efifb.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.23-rc1/drivers/video/efifb.c  2007-07-30 15:48:14.0 
+0800
@@ -0,0 +1,242 @@
+/*
+ * framebuffer driver for EFI/UEFI based system
+ *
+ * (c) 2006 Edgar Hucek <[EMAIL PROTECTED]>
+ * Original efi driver written by Gerd Knorr <[EMAIL PROTECTED]>
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+static struct fb_var_screeninfo efifb_defined __initdata = {
+   .activate   = FB_ACTIVATE_NOW,
+   .height = -1,
+   .width  = -1,
+   .right_margin   = 32,
+   .upper_margin   = 16,
+   .lower_margin   = 4,
+   .vsync_len  = 4,
+   .vmode  = FB_VMODE_NONINTERLACED,
+};
+
+static struct fb_fix_screeninfo efifb_fix __initdata = {
+   .id = "EFI VGA",
+   .type   = FB_TYPE_PACKED_PIXELS,
+   .accel  = FB_ACCEL_NONE,
+   .visual = FB_VISUAL_TRUECOLOR,
+};
+
+static int efifb_setcolreg(unsigned regno, unsigned red, unsigned green,
+  unsigned blue, unsigned transp,
+  struct fb_info *info)
+{
+   /*
+*  Set a single color register. The values supplied are
+*  already rounded down to the hardware's capabilities
+*  (according to the

Re: inotify and /proc/

2007-07-30 Thread Al Viro

On Mon, Jul 30, 2007 at 09:16:16PM -0500, Joseph Pingenot wrote:
> I was trying to use inotify to watch process changes (especially process
>   termination) by watching /proc/.
> 
> Sadly, although I could see something reading various files, nothing
>   was issued when the process I was watching exited and the directory
>   went away.
> 
> Is this intentional, or a bug?

It's a bug you intend to introduce in your program...  IOW, don't
do that.

[PATCH 2/5] x86_64 EFI support -v3: EFI boot support

2007-07-30 Thread Huang, Ying

This patch adds boot support for EFI x86_64 systems.

The boot parameter setup file is updated for x86_64 EFI support. An x86_64
EFI boot loader must conform to the EFI boot parameter offsets defined
in the file include/asm-x86_64/bootsetup.h (the x86_64 patches
submitted to the ELILO bootloader conform to this).

The EFI to E820 memory map conversion is done in the bootloader. The ELILO
bootloader's x86_64 support has been updated to pass the E820 map to the
kernel. To support the EFI runtime service code memory segment, a new E820
memory segment type named E820_RUNTIME_CODE is added. The NX bit is turned
off for the EFI runtime service area so that EFI runtime code is
executable.
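
As a rough userspace illustration of the new segment type, the sketch
below mirrors the type-to-name mapping that the e820.c hunk in this
patch extends. The values 2 through 5 are taken from the e820.h hunk;
E820_RAM = 1 is the usual kernel value and is not shown in the patch
context. The function name e820_type_name() is hypothetical and exists
only for this example.

```c
/* E820 segment types; 5 is the type this patch introduces. */
enum {
	E820_RAM          = 1,	/* not in the quoted hunk, usual kernel value */
	E820_RESERVED     = 2,
	E820_ACPI         = 3,
	E820_NVS          = 4,
	E820_RUNTIME_CODE = 5	/* efi runtime code */
};

/* Hypothetical helper returning the resource name used for a type. */
static const char *e820_type_name(unsigned int type)
{
	switch (type) {
	case E820_RAM:          return "System RAM";
	case E820_ACPI:         return "ACPI Tables";
	case E820_NVS:          return "ACPI Non-volatile Storage";
	case E820_RUNTIME_CODE: return "EFI runtime code";
	default:                return "reserved";	/* includes E820_RESERVED */
	}
}
```

Anything the table does not recognise, including E820_RESERVED itself,
falls through to "reserved", which matches the default case in the
patched switch statement.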

Signed-off-by: Chandramouli Narayanan <[EMAIL PROTECTED]>
Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---

 arch/x86_64/kernel/aperture.c  |5 +
 arch/x86_64/kernel/e820.c  |8 
 arch/x86_64/kernel/setup.c |   16 +++-
 arch/x86_64/mm/init.c  |   30 --
 include/asm-x86_64/bootsetup.h |6 ++
 include/asm-x86_64/e820.h  |1 +
 6 files changed, 55 insertions(+), 11 deletions(-)

Index: linux-2.6.23-rc1/include/asm-x86_64/bootsetup.h
===
--- linux-2.6.23-rc1.orig/include/asm-x86_64/bootsetup.h2007-07-30 
14:51:31.0 +0800
+++ linux-2.6.23-rc1/include/asm-x86_64/bootsetup.h 2007-07-30 
14:53:58.0 +0800
@@ -17,6 +17,12 @@
 #define APM_BIOS_INFO (*(struct apm_bios_info *) (PARAM+0x40))
 #define DRIVE_INFO (*(struct drive_info_struct *) (PARAM+0x80))
 #define SYS_DESC_TABLE (*(struct sys_desc_table_struct*)(PARAM+0xa0))
+#define EFI_SYSTAB (*((unsigned long *)(PARAM+0x1b8)))
+#define EFI_LOADER_SIG ((unsigned char *)(PARAM+0x1c0))
+#define EFI_MEMDESC_SIZE (*((unsigned int *) (PARAM+0x1c4)))
+#define EFI_MEMDESC_VERSION (*((unsigned int *) (PARAM+0x1c8)))
+#define EFI_MEMMAP_SIZE (*((unsigned int *) (PARAM+0x1cc)))
+#define EFI_MEMMAP (*((unsigned long *)(PARAM+0x1d0)))
 #define MOUNT_ROOT_RDONLY (*(unsigned short *) (PARAM+0x1F2))
 #define RAMDISK_FLAGS (*(unsigned short *) (PARAM+0x1F8))
 #define SAVED_VIDEO_MODE (*(unsigned short *) (PARAM+0x1FA))
Index: linux-2.6.23-rc1/include/asm-x86_64/e820.h
===
--- linux-2.6.23-rc1.orig/include/asm-x86_64/e820.h 2007-07-30 
14:51:31.0 +0800
+++ linux-2.6.23-rc1/include/asm-x86_64/e820.h  2007-07-30 14:53:58.0 
+0800
@@ -19,6 +19,7 @@
 #define E820_RESERVED  2
 #define E820_ACPI  3
 #define E820_NVS   4
+#define E820_RUNTIME_CODE  5   /* efi runtime code */
 
 #ifndef __ASSEMBLY__
 struct e820entry {
Index: linux-2.6.23-rc1/arch/x86_64/kernel/aperture.c
===
--- linux-2.6.23-rc1.orig/arch/x86_64/kernel/aperture.c 2007-07-30 
14:51:31.0 +0800
+++ linux-2.6.23-rc1/arch/x86_64/kernel/aperture.c  2007-07-30 
14:53:58.0 +0800
@@ -94,6 +94,11 @@
printk("Aperture pointing to e820 RAM. Ignoring.\n");
return 0; 
} 
+   if (e820_any_mapped(aper_base, aper_base + aper_size,
+   E820_RUNTIME_CODE)) {
+   printk("Aperture pointing to runtime code. Ignoring.\n");
+   return 0;
+   }
return 1;
 } 
 
Index: linux-2.6.23-rc1/arch/x86_64/kernel/e820.c
===
--- linux-2.6.23-rc1.orig/arch/x86_64/kernel/e820.c 2007-07-30 
14:51:31.0 +0800
+++ linux-2.6.23-rc1/arch/x86_64/kernel/e820.c  2007-07-30 14:53:58.0 
+0800
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -206,6 +207,7 @@
case E820_RAM:  res->name = "System RAM"; break;
case E820_ACPI: res->name = "ACPI Tables"; break;
case E820_NVS:  res->name = "ACPI Non-volatile Storage"; break;
+   case E820_RUNTIME_CODE: res->name = "EFI runtime code"; break;
default:res->name = "reserved";
}
res->start = e820.map[i].addr;
@@ -376,6 +378,9 @@
case E820_NVS:
printk("(ACPI NVS)\n");
break;
+   case E820_RUNTIME_CODE:
+   printk("(runtime code)\n");
+   break;
default:printk("type %u\n", e820.map[i].type);
break;
}
@@ -648,6 +653,9 @@
} else if (*p == '$') {
    start_at = memparse(p+1, &p);
add_memory_region(start_at, mem_size, E820_RESERVED);
+   } else if (*p == '%') {
+   start_at = memparse(p+1, &p);
+   add_memory_region(start_at, mem_size, E820_RUNTIME_CODE);
} else {
end_user_pfn = (mem_size >> PAGE_SHIFT);

[PATCH 3/5] x86_64 EFI support -v3: EFI runtime support

2007-07-30 Thread Huang, Ying

This patch adds runtime service support for EFI x86_64 systems.

EFI support for emergency_restart and the RTC clock is added. The EFI
based implementation and the legacy BIOS or CMOS based implementation are
put in separate functions and are chosen based on the value of
efi_enabled.
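
The selection described above can be sketched in plain userspace C as
follows. This is only an illustration of the dispatch pattern: the
*_stub functions, the last_backend variable and the function name
update_persistent_clock_sketch() are all hypothetical, and the real
kernel code holds rtc_lock around the call, which is only indicated by
comments here.

```c
/* Stand-in for the kernel's efi_enabled flag, set during early boot. */
static int efi_enabled;

/* Records which backend ran last, so the sketch has something to observe. */
enum rtc_backend { RTC_BACKEND_NONE, RTC_BACKEND_CMOS, RTC_BACKEND_EFI };
static enum rtc_backend last_backend = RTC_BACKEND_NONE;

/* Hypothetical stub for the legacy CMOS path. */
static int set_rtc_mmss_stub(unsigned long nowtime)
{
	(void)nowtime;
	last_backend = RTC_BACKEND_CMOS;
	return 0;
}

/* Hypothetical stub for the EFI runtime service path. */
static int efi_set_rtc_mmss_stub(unsigned long nowtime)
{
	(void)nowtime;
	last_backend = RTC_BACKEND_EFI;
	return 0;
}

/* The caller picks the backend once, based on efi_enabled. */
static int update_persistent_clock_sketch(unsigned long now_sec)
{
	int retval;

	/* kernel: lock taken here, so both backends run under rtc_lock */
	if (efi_enabled)
		retval = efi_set_rtc_mmss_stub(now_sec);
	else
		retval = set_rtc_mmss_stub(now_sec);
	/* kernel: lock released here */

	return retval;
}
```

Moving the locking into the common caller, as the patch does, means
neither backend has to know about the lock.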

Signed-off-by: Chandramouli Narayanan <[EMAIL PROTECTED]>
Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---

 reboot.c |   11 ++-
 time.c   |   47 +++
 2 files changed, 41 insertions(+), 17 deletions(-)

Index: linux-2.6.23-rc1/arch/x86_64/kernel/reboot.c
===
--- linux-2.6.23-rc1.orig/arch/x86_64/kernel/reboot.c   2007-07-23 
04:41:00.0 +0800
+++ linux-2.6.23-rc1/arch/x86_64/kernel/reboot.c2007-07-30 
09:26:56.0 +0800
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -117,7 +118,7 @@
pci_iommu_shutdown();
 }
 
-void machine_emergency_restart(void)
+static inline void bios_emergency_restart(void)
 {
int i;
 
@@ -145,6 +146,14 @@
}  
 }
 
+void machine_emergency_restart(void)
+{
+   if (efi_enabled)
+   efi_emergency_restart();
+   else
+   bios_emergency_restart();
+}
+
 void machine_restart(char * __unused)
 {
printk("machine restart\n");
Index: linux-2.6.23-rc1/arch/x86_64/kernel/time.c
===
--- linux-2.6.23-rc1.orig/arch/x86_64/kernel/time.c 2007-07-23 
04:41:00.0 +0800
+++ linux-2.6.23-rc1/arch/x86_64/kernel/time.c  2007-07-30 09:42:12.0 
+0800
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #ifdef CONFIG_ACPI
 #include   /* for PM timer frequency */
@@ -89,13 +90,6 @@
unsigned char control, freq_select;
 
 /*
- * IRQs are disabled when we're called from the timer interrupt,
- * no need for spin_lock_irqsave()
- */
-
-   spin_lock(&rtc_lock);
-
-/*
  * Tell the clock it's being set and stop it.
  */
 
@@ -143,14 +137,26 @@
CMOS_WRITE(control, RTC_CONTROL);
CMOS_WRITE(freq_select, RTC_FREQ_SELECT);
 
-   spin_unlock(&rtc_lock);
-
return retval;
 }
 
 int update_persistent_clock(struct timespec now)
 {
-   return set_rtc_mmss(now.tv_sec);
+   int retval;
+
+/*
+ * IRQs are disabled when we're called from the timer interrupt,
+ * no need for spin_lock_irqsave()
+ */
+
+   spin_lock(&rtc_lock);
+   if (efi_enabled)
+   retval = efi_set_rtc_mmss(now.tv_sec);
+   else
+   retval = set_rtc_mmss(now.tv_sec);
+   spin_unlock(&rtc_lock);
+
+   return retval;
 }
 
 void main_timer_handler(void)
@@ -195,14 +201,11 @@
return IRQ_HANDLED;
 }
 
-unsigned long read_persistent_clock(void)
+unsigned long read_cmos_clock(void)
 {
unsigned int year, mon, day, hour, min, sec;
-   unsigned long flags;
unsigned century = 0;
 
-   spin_lock_irqsave(&rtc_lock, flags);
-
do {
sec = CMOS_READ(RTC_SECONDS);
min = CMOS_READ(RTC_MINUTES);
@@ -217,8 +220,6 @@
 #endif
} while (sec != CMOS_READ(RTC_SECONDS));
 
-   spin_unlock_irqrestore(&rtc_lock, flags);
-
/*
 * We know that x86-64 always uses BCD format, no need to check the
 * config register.
@@ -246,6 +247,20 @@
return mktime(year, mon, day, hour, min, sec);
 }
 
+unsigned long read_persistent_clock(void)
+{
+   unsigned long flags, retval;
+
+   spin_lock_irqsave(&rtc_lock, flags);
+   if (efi_enabled)
+   retval = efi_get_time();
+   else
+   retval = read_cmos_clock();
+   spin_unlock_irqrestore(&rtc_lock, flags);
+
+   return retval;
+}
+
 /* calibrate_cpu is used on systems with fixed rate TSCs to determine
  * processor frequency */
 #define TICK_COUNT 1

[PATCH 5/5] x86_64 EFI support -v3: EFI document

2007-07-30 Thread Huang, Ying

This patch adds documentation for EFI x86_64 support. The boot parameters
added are documented in Documentation/i386/zero-page.txt. The setup
and operation guide for EFI based systems is documented in
Documentation/x86_64/uefi.txt.
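
To make the x86_64 offsets concrete, here is a small sketch that lays
the EFI boot parameters out as a C struct and checks the offsets at
compile time. The offsets and field widths follow the macros added to
include/asm-x86_64/bootsetup.h in patch 2/5. The struct and field names
are invented for this example, and the 4-byte width of the loader
signature is an assumption inferred from the gap between 0x1c0 and
0x1c4.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the start of the boot parameter page. */
struct efi_boot_params_sketch {
	uint8_t  pad[0x1b8];		/* earlier parameters, not modelled */
	uint64_t efi_systab;		/* 0x1b8: EFI system table pointer */
	uint8_t  efi_loader_sig[4];	/* 0x1c0: loader signature (width assumed) */
	uint32_t efi_memdesc_size;	/* 0x1c4: memory descriptor size */
	uint32_t efi_memdesc_version;	/* 0x1c8: memory descriptor version */
	uint32_t efi_memmap_size;	/* 0x1cc: memory descriptor map size */
	uint64_t efi_memmap;		/* 0x1d0: memory descriptor map pointer */
};

/* Fail the build if the layout drifts from the documented offsets. */
_Static_assert(offsetof(struct efi_boot_params_sketch, efi_systab) == 0x1b8,
	       "EFI system table pointer offset");
_Static_assert(offsetof(struct efi_boot_params_sketch, efi_loader_sig) == 0x1c0,
	       "EFI loader signature offset");
_Static_assert(offsetof(struct efi_boot_params_sketch, efi_memdesc_size) == 0x1c4,
	       "EFI memory descriptor size offset");
_Static_assert(offsetof(struct efi_boot_params_sketch, efi_memdesc_version) == 0x1c8,
	       "EFI memory descriptor version offset");
_Static_assert(offsetof(struct efi_boot_params_sketch, efi_memmap_size) == 0x1cc,
	       "EFI memory map size offset");
_Static_assert(offsetof(struct efi_boot_params_sketch, efi_memmap) == 0x1d0,
	       "EFI memory map pointer offset");
```

The layout shows why the parameters were rearranged: starting the
8-byte system table pointer at 0x1b8 keeps both 64-bit fields naturally
aligned without any padding between members.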

Signed-off-by: Chandramouli Narayanan <[EMAIL PROTECTED]>
Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---

 i386/zero-page.txt |   18 ++
 x86_64/uefi.txt|   42 ++
 2 files changed, 56 insertions(+), 4 deletions(-)

Index: linux-2.6.23-rc1/Documentation/i386/zero-page.txt
===
--- linux-2.6.23-rc1.orig/Documentation/i386/zero-page.txt  2007-07-30 
11:28:45.0 +0800
+++ linux-2.6.23-rc1/Documentation/i386/zero-page.txt   2007-07-30 
11:29:28.0 +0800
@@ -31,11 +31,11 @@
  0xb0 - 0x13f  Free. Add more parameters here if you really need them.
  0x140- 0x1be  EDID_INFO Video mode setup
 
-0x1c4  unsigned long   EFI system table pointer
-0x1c8  unsigned long   EFI memory descriptor size
-0x1cc  unsigned long   EFI memory descriptor version
+0x1c4  unsigned long   EFI system table pointer*
+0x1c8  unsigned long   EFI memory descriptor size*
+0x1cc  unsigned long   EFI memory descriptor version*
 0x1d0  unsigned long   EFI memory descriptor map pointer
-0x1d4  unsigned long   EFI memory descriptor map size
+0x1d4  unsigned long   EFI memory descriptor map size*
 0x1e0  unsigned long   ALT_MEM_K, alternative mem check, in Kb
 0x1e4  unsigned long   Scratch field for the kernel setup code
 0x1e8  char            number of entries in E820MAP (below)
@@ -87,3 +87,13 @@
 0x2d0 - 0xd00  E820MAP
 0xd00 - 0xeff  EDDBUF (edd.S) for disk signature read sector
 0xd00 - 0xeeb  EDDBUF (edd.S) for edd data
+
+Changes for x86_64 implementation:
+-
+For alignment purposes, the following parameters are rearranged.
+
+0x1b8  unsigned long   EFI system table pointer
+0x1c0  unsigned long   EFI Loader signature
+0x1c4  unsigned long   EFI memory descriptor size
+0x1c8  unsigned long   EFI memory descriptor version
+0x1cc  unsigned long   EFI memory descriptor map size
Index: linux-2.6.23-rc1/Documentation/x86_64/uefi.txt
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.23-rc1/Documentation/x86_64/uefi.txt  2007-07-30 
11:29:28.0 +0800
@@ -0,0 +1,42 @@
+General note on [U]EFI x86_64 support
+-
+
+This provides documentation on [U]EFI support for x86_64 architecture.
+The nomenclature EFI and UEFI are used interchangeably in this document.
+
+Although the tools below are _not_ needed for building the kernel,
+the needed bootloader support and associated tools for x86_64 platforms
+with EFI firmware and specifications are listed below.
+
+1. UEFI specification:  http://www.uefi.org
+
+2. Booting EFI64 enabled kernel requires boot loader support.
+Patches to elilo and gnu-efi library with x86_64 support and documentation
+have been submitted to respective project maintainers.
+   elilo: http://sourceforge.net/projects/elilo
+   gnu-efi library: http://sourceforge.net/projects/gnu-efi/
+   gnu-efi-3.0d release now supports [U]EFI x86_64.
+
+3. The tool to convert ELF to PE-COFF image:
+   binutils-2.17.50.0.14 supports Intel64 EFI.
+   see http://www.kernel.org/pub/linux/devel/binutils/
+   [ elilo/gnu-efi with x86_64 support need this binutils support ]
+
+4. x86_64 platform with EFI/UEFI firmware.
+
+Mechanics:
+-
+- Apply the EFI64 kernel patches and build with the following configuration.
+   CONFIG_EFI=y
+   CONFIG_FB_EFI=y
+   CONFIG_FRAMEBUFFER_CONSOLE=y
+   CONFIG_EFI_VARS=y
+
+- Create a VFAT partition on the disk
+- Copy the following to the VFAT partition:
+   elilo bootloader with x86_64 support and elilo configuration file
+   efi64 kernel image and initrd. Instructions on building elilo
+   and its dependencies can be found in the elilo sourceforge project.
+- Boot to EFI shell and invoke elilo choosing efi64 kernel image
+- On UEFI2.0 firmware systems, pass vga=normal for boot messages to show up on
+  the console. You can pass along the 'resume' boot option to test suspend/resume.

[PATCH 0/5] x86_64 EFI support -v3

2007-07-30 Thread Huang, Ying

The following set of patches adds EFI/UEFI (Unified Extensible Firmware
Interface) support to the x86_64 architecture. The patches have been
tested against the 2.6.23-rc1 kernel on Intel platforms with EFI1.10 and
UEFI2.0 firmware.

UEFI specification can be found here: http://www.uefi.org

Booting the UEFI x86_64 enabled kernel requires a machine with EFI/UEFI
firmware and bootloader support. A detailed usage
guide can be found in Documentation/x86_64/uefi.txt, which is added in
the patch: efi-doc.patch.

Issues _not_ addressed (per feedback from Eric Biederman)

- Virtual mode support is still retained in this patch. At least one
  EFI call is on a fast path: efi_set_rtc_mmss, which must
  complete as soon as possible.

- The variable efi_enabled is used across architectures if the
  CONFIG_EFI option is enabled. The i386 code also uses this variable.
  This is something that can be revisited with code consolidation
  across architectures.

Looking forward to your comments,

Best Regards,
Huang Ying

[PATCH 1/5] x86_64 EFI support -v3: EFI base support

2007-07-30 Thread Huang, Ying

Changelog between v2 and v3:

1. The EFI callwrapper is re-implemented in assembler.

---

This patch adds basic support for EFI x86_64 systems. The main part of
the patch is the addition of efi.c for x86_64. This file is modeled
after its EFI IA32 counterpart. EFI initialization is implemented in
efi.c. Some x86_64 specifics are worth noting here. On x86_64,
parameters passed to UEFI firmware services need to follow the UEFI
calling convention. For this purpose, a set of functions named
lin2winN (N is the number of parameters) is implemented. EFI
function calls are wrapped before calling the firmware service.
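
To show what such a wrapper has to achieve, here is a small userspace
sketch of calling a function that uses the Microsoft x64 convention
from code built with the System V convention. It does not reproduce the
assembler wrappers in this patch. Instead it assumes GCC or Clang on
x86_64 and uses their ms_abi function attribute so the compiler emits
the conversion. The names FIRMWARE_ABI, fake_service() and call_fw2()
are hypothetical.

```c
#include <stdint.h>

/* Only x86_64 has the two conventions; elsewhere the macro is empty. */
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#define FIRMWARE_ABI __attribute__((ms_abi))
#else
#define FIRMWARE_ABI
#endif

/* Pointer to a two-argument service that uses the firmware convention. */
typedef uint64_t (FIRMWARE_ABI *fw_call2_t)(uint64_t, uint64_t);

/* Stand-in for a firmware service, compiled with the firmware convention. */
static uint64_t FIRMWARE_ABI fake_service(uint64_t a, uint64_t b)
{
	return a * 1000 + b;
}

/*
 * Rough analogue of a two-argument wrapper: the caller uses the System V
 * convention (arguments in RDI, RSI), and the compiler moves them to
 * RCX, RDX and reserves the 32-byte shadow space the callee expects.
 */
static uint64_t call_fw2(fw_call2_t fn, uint64_t arg1, uint64_t arg2)
{
	return fn(arg1, arg2);
}
```

Calling call_fw2(fake_service, 3, 4) passes both arguments through the
conversion and gets the service's result back unchanged. Hand-written
assembler wrappers do the same register shuffling explicitly, which is
the approach available when the compiler has no such attribute.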

Signed-off-by: Chandramouli Narayanan <[EMAIL PROTECTED]>
Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---

 arch/x86_64/Kconfig   |   13 
 arch/x86_64/kernel/Makefile   |1 
 arch/x86_64/kernel/efi.c  |  507 ++
 arch/x86_64/kernel/efi_callwrap.S |   69 +
 include/asm-x86_64/eficallwrap.h  |   27 ++
 include/linux/efi.h   |1 
 6 files changed, 618 insertions(+)

Index: linux-2.6.23-rc1/include/asm-x86_64/eficallwrap.h
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.23-rc1/include/asm-x86_64/eficallwrap.h   2007-07-30 
15:35:48.0 +0800
@@ -0,0 +1,27 @@
+/*
+ *  Copyright (C) 2007 Intel Corp
+ * Bibo Mao <[EMAIL PROTECTED]>
+ * Huang Ying <[EMAIL PROTECTED]>
+ *
+ *  Function calling ABI conversion from SYSV to Windows for x86_64
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ *  GNU General Public License for more details.
+ *
+ */
+extern efi_status_t lin2win0(void *fp);
+extern efi_status_t lin2win1(void *fp, u64 arg1);
+extern efi_status_t lin2win2(void *fp, u64 arg1, u64 arg2);
+extern efi_status_t lin2win3(void *fp, u64 arg1, u64 arg2, u64 arg3);
+extern efi_status_t lin2win4(void *fp, u64 arg1, u64 arg2, u64 arg3, u64 arg4);
+extern efi_status_t lin2win5(void *fp, u64 arg1, u64 arg2, u64 arg3,
+u64 arg4, u64 arg5);
+extern efi_status_t lin2win6(void *fp, u64 arg1, u64 arg2, u64 arg3,
+u64 arg4, u64 arg5, u64 arg6);
Index: linux-2.6.23-rc1/arch/x86_64/kernel/efi.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.23-rc1/arch/x86_64/kernel/efi.c   2007-07-30 15:40:49.0 
+0800
@@ -0,0 +1,507 @@
+/*
+ * Extensible Firmware Interface
+ *
+ * Based on Extensible Firmware Interface Specification version 1.0
+ *
+ * Copyright (C) 1999 VA Linux Systems
+ * Copyright (C) 1999 Walt Drummond <[EMAIL PROTECTED]>
+ * Copyright (C) 1999-2002 Hewlett-Packard Co.
+ * David Mosberger-Tang <[EMAIL PROTECTED]>
+ * Stephane Eranian <[EMAIL PROTECTED]>
+ * Copyright (C) 2005-2008 Intel Co.
+ * Fenghua Yu <[EMAIL PROTECTED]>
+ * Bibo Mao <[EMAIL PROTECTED]>
+ * Chandramouli Narayanan <[EMAIL PROTECTED]>
+ *
+ * Code to convert EFI to E820 map has been implemented in elilo bootloader
+ * based on a EFI patch by Edgar Hucek. Based on the E820 map, the page table
+ * is setup appropriately for EFI runtime code.
+ * - mouli 06/14/2007.
+ *
+ * All EFI Runtime Services are not implemented yet as EFI only
+ * supports physical mode addressing on SoftSDV. This is to be fixed
+ * in a future version.  --drummond 1999-07-20
+ *
+ * Implemented EFI runtime services and virtual mode calls.  --davidm
+ *
+ * Goutham Rao: <[EMAIL PROTECTED]>
+ * Skip non-WB memory and ignore empty memory ranges.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct efi efi;
+EXPORT_SYMBOL(efi);
+
+struct efi efi_phys __initdata;
+struct efi_memory_map memmap;
+static efi_system_table_t efi_systab __initdata;
+
+static unsigned long efi_rt_eflags;
+/* efi_rt_lock protects efi physical mode call */
+static spinlock_t efi_rt_lock = SPIN_LOCK_UNLOCKED;
+static pgd_t save_pgd;
+
+static efi_status_t _efi_get_time(efi_time_t *tm, efi_time_cap_t *tc)
+{
+   return lin2win2((void *)efi.systab->runtime->get_time,
+   (u64)tm, (u64)tc);
+}
+
+static efi_status_t _efi_set_time(efi_time_t *tm)
+{
+   return lin2win1((void *)efi.systab->runtime->set_time, (u64)tm);
+}
+
+static efi_status_t _efi_get_wakeup_time(efi_bool_t *enabled,
+

Re: [ck] Re: SD still better than CFS for 3d ?

2007-07-30 Thread Matthew Hawkins

On 7/31/07, Roland Dreier <[EMAIL PROTECTED]> wrote:
>  >  Fuck you Martin!
>
> I think you meant to yell at Matthew, not Martin ;)

What's amusing about this is he's yelling at me for something I didn't
do, can't even get my name right, and has the audacity to claim that
*I* am the one looking like a fool!  While we're descending into
primary school theatrics, may I just say "takes one to know one" ;-)

I took the time to track down what caused a breakage - in an "illegal
binary driver" (not against the law here, though defamation certainly
is...) no less.  And contacted the vendor (separately).  Other people
on desktop machines with an ATI card using the fglrx driver may have
been interested to know that they can't do the benchmarking some
people here on lkml and -mm are asking for with a current 2.6.23 git
kernel, hence my post.

Martin's cleanup patch is good and I never claimed otherwise, I just
said the comment on the commit was a bad call (as there are users of
that interface).  Certainly ATI should fix their dodgy drivers.
That's been the cry of the community for a long time...

-- 
Matt

inotify and /proc/

2007-07-30 Thread Joseph Pingenot

I was trying to use inotify to watch process changes (especially process
  termination) by watching /proc/.

Sadly, although I could see something reading various files, nothing
  was issued when the process I was watching exited and the directory
  went away.

Is this intentional, or a bug?

-Joseph
-- 
[EMAIL PROTECTED]///
"There is also an entire branch in the physical therapy field dedicated
 to the treatment of little-finger injuries caused by excessive Emacs
 use."  --Linux Weekly News editor (http://lwn.net/Articles/206916/)
///260 IATL / The University of Iowa / Iowa City, IA  52242///
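
For readers who want to reproduce the experiment, here is a minimal sketch of the kind of watch described above. It assumes a Linux system with glibc's inotify wrappers; the helper name wait_for_inotify_event() is hypothetical and not from the original post. As far as I know, the inotify(7) manual page states that pseudo-filesystems such as /proc are not monitorable with inotify, because events are only generated for changes made through the filesystem API, so a timeout is the expected outcome when a watched process exits.

```c
#include <assert.h>
#include <poll.h>
#include <sys/inotify.h>
#include <unistd.h>

/*
 * Hypothetical helper: watch `path` for any inotify event and wait up to
 * `timeout_ms` milliseconds.
 * Returns 1 if an event arrived, 0 on timeout, -1 on error.
 */
static int wait_for_inotify_event(const char *path, int timeout_ms)
{
    struct pollfd pfd;
    int fd, rc;

    assert(path != NULL);

    fd = inotify_init();
    if (fd < 0)
        return -1;

    /* The watch itself may succeed even where no events are ever sent. */
    if (inotify_add_watch(fd, path, IN_ALL_EVENTS) < 0) {
        close(fd);
        return -1;
    }

    pfd.fd = fd;
    pfd.events = POLLIN;
    rc = poll(&pfd, 1, timeout_ms);   /* readable fd means an event is queued */
    close(fd);

    if (rc < 0)
        return -1;
    return rc > 0 ? 1 : 0;
}
```

Calling wait_for_inotify_event() on a /proc directory for a process that is about to exit, with a generous timeout, is one way to check whether any event is delivered on a given kernel.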

Re: [TULIP] Need new maintainer

2007-07-30 Thread david


On Mon, 30 Jul 2007, Valerie Henson wrote:

> On Mon, Jul 30, 2007 at 03:31:58PM -0400, Kyle McMartin wrote:
> > On Mon, Jul 30, 2007 at 01:04:13PM -0600, Valerie Henson wrote:
> > > The Tulip network driver needs a new maintainer!  I no longer have
> > > time to maintain the Tulip network driver and I'm stepping down.  Jeff
> > > Garzik would be happy to get volunteers.
> >
> > Since I already take care of a major consumer of these devices (parisc,
> > which pretty much all have tulip) I'm willing to take care of this.
> > Alternately, Grant is probably willing.
>
> And I coulda handed you a suitcase full of cards and I missed my
> chance!
>
> It's fine by me, although Jeff is the final arbiter.
>
> Thanks!
>
> -VAL


when the maintainership gets settled send me a message and I can send you 
a couple of different cards (including a d-link 4-port card)


David Lang

[PATCH] Check patch reports multiple var with a function assignment

2007-07-30 Thread Jason Wessel

Running checkpatch.pl produces a warning when it should not.  I believe 
it can be fixed by adding to the regular expression, but feel free to 
fix it another way as I may not know all the cases this is trying to catch.


-- check patch output --
WARNING: declaring multiple variables together should be avoided
#451: FILE: drivers/serial/8250.c:1685:
+   unsigned char lsr = serial_inp(up, UART_LSR);

-- end check patch output --

Signed-off-by: Jason Wessel <[EMAIL PROTECTED]>

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 73751ab..32c6d74 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -825,8 +825,8 @@ sub process {

# check for multiple declarations, allowing for a function declaration
# continuation.
-   if ($line =~ /^.\s*$Type\s+$Ident(?:\s*=[^,{]*)?\s*,\s*$Ident.*/ &&
-   $line !~ /^.\s*$Type\s+$Ident(?:\s*=[^,{]*)?\s*,\s*$Type\s*$Ident.*/) {
+   if ($line =~ /^.\s*$Type\s+$Ident(?:\s*=[^,{\(]*)?\s*,\s*$Ident.*/ &&
+   $line !~ /^.\s*$Type\s+$Ident(?:\s*=[^,{\(]*)?\s*,\s*$Type\s*$Ident.*/) {
   WARN("declaring multiple variables together should be avoided\n" . $herecurr);

   }
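
To see why excluding "(" from the initializer character class helps, here is a rough sketch of the check using POSIX extended regular expressions in C rather than Perl. The TYPE and IDENT macros are drastically simplified stand-ins for checkpatch's $Type and $Ident (an assumption for illustration only), and the helper names line_matches() and warns() are hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stdio.h>

/* Simplified stand-ins for checkpatch's $Type and $Ident (assumptions). */
#define TYPE  "(unsigned[[:space:]]+)?(char|int|long)"
#define IDENT "[A-Za-z_][A-Za-z0-9_]*"

/* Initializer character class before and after the patch. */
#define OLD_INIT "[^,{]*"
#define NEW_INIT "[^,{(]*"

/* Return true if the POSIX extended regex `pattern` matches `line`. */
static bool line_matches(const char *pattern, const char *line)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && line != NULL);
    rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
    assert(rc == 0);
    rc = regexec(&re, line, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/*
 * Model of the two-part test: warn when the "multiple declaration"
 * pattern matches and the "function prototype continuation" pattern
 * does not. `init` selects the old or new initializer class.
 */
static bool warns(const char *init, const char *line)
{
    char multi[512], proto[512];

    snprintf(multi, sizeof(multi),
             "^.[[:space:]]*" TYPE "[[:space:]]+" IDENT
             "([[:space:]]*=%s)?[[:space:]]*,[[:space:]]*" IDENT ".*", init);
    snprintf(proto, sizeof(proto),
             "^.[[:space:]]*" TYPE "[[:space:]]+" IDENT
             "([[:space:]]*=%s)?[[:space:]]*,[[:space:]]*" TYPE
             "[[:space:]]*" IDENT ".*", init);

    return line_matches(multi, line) && !line_matches(proto, line);
}
```

With the old class, the initializer is allowed to run through "serial_inp(up" up to the comma inside the call, so the comma between the arguments is mistaken for a declarator separator. With the new class the initializer stops at "(", so the line no longer matches, while a genuine "int a = 1, b;" still does.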


Re: [PATCH] Remove fs.h from mm.h

2007-07-30 Thread Bryan Wu

On Mon, 2007-07-30 at 23:43 +0400, Alexey Dobriyan wrote:
> On Mon, Jul 30, 2007 at 12:42:07PM +0800, Bryan Wu wrote:
> > Can I do something to help this regression testing?
> >
> > Please feel free to ask me.
> 
> Sorry, blackfin toolchain doesn't like me, so I can't test this myself.
> Check current -git if I screwed up anything.
> 
Oh, do you need to use the blackfin toolchain? Actually, it is very simple to
set it up on your machine.
Please get the latest binary toolchain here:
http://blackfin.uclinux.org/gf/download/frsrelease/344/3180/blackfin-toolchain-uclinux-SVN.tar.bz2
http://blackfin.uclinux.org/gf/download/frsrelease/344/3181/blackfin-toolchain-linux-uclibc-SVN.tar.bz2

 - untar these 2 tarballs
 - add the path to your environment variables
 - you are then ready to compile the kernel with the blackfin cross toolchain

> It still takes too much time from clean git pull to final patch, so
> sending it to you would  increase risk that someone will touch core
> headers nontrivially invalidating all work.

Do I need to generate the patch? I will take a look at the latest git tree.

Many thanks, we hope you can add our blackfin to your cross-build check.

Best Regards,
- Bryan Wu

Re: bonnie++ benchmarks for ext2,ext3,ext4,jfs,reiserfs,xfs,zfs on software raid 5

2007-07-30 Thread Theodore Tso

On Mon, Jul 30, 2007 at 09:39:39PM +0200, Miklos Szeredi wrote:
> > Extrapolating these %cpu number makes ZFS the fastest.
> > 
> > Are you sure these numbers are correct?
> 
> Note, that %cpu numbers for fuse filesystems are inherently skewed,
> because the CPU usage of the filesystem process itself is not taken
> into account.
> 
> So the numbers are not all that good, but according to the zfs-fuse
> author it hasn't been optimized yet, so they may improve.

Also, something which is data i/o intensive is going to be the best
case for a FUSE filesystem.  If you try something which is much more
metadata intensive (i.e., lots of file creates and deletes, chmods,
etc.) like say with a Postmark benchmark, you would almost certainly
get very different results.  That's not to say that bonnie++
benchmarks aren't useful, but when doing comparisons between
filesystems, it's a good idea to use a wide variety of benchmarks to
avoid getting potentially misleading results.

- Ted

[PATCH] flush icache before set_pte take6. [4/4] optimization for cpus other than montecito

2007-07-30 Thread KAMEZAWA Hiroyuki


Add a flag that records whether the L2 cache is split into separate Icache
and Dcache, as a __read_mostly global variable.

This adds one memory reference to a global variable on page faults for
"executable" mappings in do_wp_page() (the page copy case), on file-mapped
page faults, and in some system calls which change memory mappings. That is
still cheaper than calling sync_icache_dcache() on CPUs which don't need it.

Signed-off-by: KAMEZAWA Hiroyuki <[EMAIL PROTECTED]>


---
 arch/ia64/kernel/setup.c   |7 +++
 include/asm-ia64/pgtable.h |3 ++-
 2 files changed, 9 insertions(+), 1 deletion(-)

Index: linux-2.6.23-rc1.test/arch/ia64/kernel/setup.c
===
--- linux-2.6.23-rc1.test.orig/arch/ia64/kernel/setup.c
+++ linux-2.6.23-rc1.test/arch/ia64/kernel/setup.c
@@ -106,6 +106,8 @@ struct io_space io_space[MAX_IO_SPACES];
 EXPORT_SYMBOL(io_space);
 unsigned int num_io_spaces;
 
+int separated_l2_icache_dcache __read_mostly;
+
 /*
  * "flush_icache_range()" needs to know what processor dependent stride size to use
  * when it makes i-cache(s) coherent with d-caches.
@@ -718,6 +720,11 @@ get_model_name(__u8 family, __u8 model)
printk(KERN_ERR
   "%s: Table overflow. Some processor model information will be missing\n",
   __FUNCTION__);
+   /* Montecito has separated L2 Icache and Dcache. This requires
+  synchronize Icache and Dcache before set_pte() */
+   if (family == 0x20)
+   separated_l2_icache_dcache = 1;
+
return "Unknown";
 }
 
Index: linux-2.6.23-rc1.test/include/asm-ia64/pgtable.h
===
--- linux-2.6.23-rc1.test.orig/include/asm-ia64/pgtable.h
+++ linux-2.6.23-rc1.test/include/asm-ia64/pgtable.h
@@ -489,9 +489,10 @@ extern struct page *zero_page_memmap_ptr
  * as an executable pte.
  */
 extern void __sync_icache_dcache(pte_t pte);
+extern int separated_l2_icache_dcache;
 static inline void sync_icache_dcache(pte_t pte)
 {
-   if (pte_exec(pte))
+   if (pte_exec(pte) && separated_l2_icache_dcache)
__sync_icache_dcache(pte);
 }
 #define __HAVE_ARCH_SYNC_ICACHE_DCACHE
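
The effect of the new flag can be sketched in plain userspace C. This is only a model under stated assumptions: model_detect_cpu(), gated_sync(), model_slow_sync() and slow_path_calls are hypothetical names, and the slow path just counts calls instead of flushing anything. Only the flag name and the family value 0x20 come from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the flag added by the patch; set once during CPU detection. */
static int separated_l2_icache_dcache;

/* Counts how often the (modelled) expensive synchronization runs. */
static int slow_path_calls;

/* Stand-in for __sync_icache_dcache(); the real one flushes cache lines. */
static void model_slow_sync(void)
{
    slow_path_calls++;
}

/* Stand-in for the detection hunk: family 0x20 is Montecito in the patch. */
static void model_detect_cpu(unsigned int family)
{
    if (family == 0x20)
        separated_l2_icache_dcache = 1;
}

/* Stand-in for the inline sync_icache_dcache() after this patch. */
static void gated_sync(bool pte_is_exec)
{
    /* One extra global read; the slow path runs only when both hold. */
    if (pte_is_exec && separated_l2_icache_dcache)
        model_slow_sync();
}
```

On a model CPU with family 0x1f, gated_sync(true) never reaches the slow path; after model_detect_cpu(0x20), it does so only for executable ptes.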


[PATCH] flush icache before set_pte take6. [3/4] add montecito brand name

2007-07-30 Thread KAMEZAWA Hiroyuki

Add the brand name "Montecito" to cpuinfo.

Signed-off-by: KAMEZAWA Hiroyuki <[EMAIL PROTECTED]>

---
 arch/ia64/kernel/setup.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux-2.6.23-rc1.test/arch/ia64/kernel/setup.c
===
--- linux-2.6.23-rc1.test.orig/arch/ia64/kernel/setup.c
+++ linux-2.6.23-rc1.test/arch/ia64/kernel/setup.c
@@ -705,7 +705,8 @@ get_model_name(__u8 family, __u8 model)
case 0: memcpy(brand, "McKinley", 9); break;
case 1: memcpy(brand, "Madison", 8); break;
case 2: memcpy(brand, "Madison up to 9M cache", 23); break;
-   }
+   } else if (family == 0x20)
+   memcpy(brand, "Montecito", 10);
}
for (i = 0; i < MAX_BRANDS; i++)
if (strcmp(brandname[i], brand) == 0)


[PATCH] flush icache before set_pte take6. [2/4] sync icache dcache

2007-07-30 Thread KAMEZAWA Hiroyuki

flush icache for ia64 take6.
This patch is against 2.6.23-rc1.

Changes V5 -> V6:
  - no changes. (added new patches to the patch set)

Changes V4 -> V5:
  - removed sync_icache_dcache from do_wp_page() page reuse case.

Changes v3 -> v4:
  - avoid implementing flush_(i)cache_pages().
  - added sync_icache_dcache() call.
  - change Documentation/cachetlb.txt

The current ia64 kernel flushes the icache in lazy_mmu_prot_update() *after*
set_pte(). This is wrong. This patch removes lazy_mmu_prot_update() and
adds sync_icache_dcache(). sync_icache_dcache() is called before set_pte()
when necessary and synchronizes the icache with the dcache (fc.i instruction).

This patch fixes the SIGILL problem on NFS/ia64.

About Icache-Dcache inconsistency on ia64
 - When a cache line is modified, Icache and Dcache are purged.

 - When the I-cache misses, it accesses only the lower-layer cache (or memory).
   If that lower layer is not up to date, the I-cache will see
   stale information. To avoid this, Icache-Dcache synchronization (fc.i)
   is necessary. (Icache-Dcache synchronization means making the Dcache and the
   lower-layer unified cache (or memory) consistent.)

Details:
 - In general, cache flushing macros are used for virtually tagged caches.
   IA64 has physically tagged caches but doesn't guarantee consistency
   between Icache and Dcache. So a new macro, sync_icache_dcache(), is added.
   It is a no-op on other archs.
 - sync_icache_dcache() only does work if the pte is executable.
 - sync_icache_dcache() must be called before set_pte().
 - A page which is consistent is marked with PG_arch_1.

About changes in generic code:
 - do_wp_page() needs to sync the newly copied page.
   Here, lazy_mmu_prot_update() was done before set_pte().
   This was because a SIGILL in Java was reported and a quick
   fix was applied.

 - do_anonymous_page(): newly installed anon pages don't contain any
   instructions when set_pte() is executed, so icache-dcache
   synchronization is not necessary.

 - __do_fault() needs to sync the newly installed page.

 - handle_pte_fault() just changes the access bit, so there is no need to sync.

 - remove_migration_pte() needs to sync the newly installed page.

 - change_pte_range() needs to sync icache-dcache. When a user writes
   instructions into a page and changes its protection to
   executable, the page should be synced.

 - hugetlb_change_protection(): the cache may be expired already, but
   it is safe to sync the Icache before set_pte().

 - page_mkclean_one(): no need to sync icache-dcache. There is no page
   contents modification and no protection change.

Thanks to Zoltan Menyhart for his advice.

Signed-off-by: KAMEZAWA Hiroyuki <[EMAIL PROTECTED]>

---
 Documentation/cachetlb.txt|   11 +++
 arch/ia64/mm/init.c   |6 ++
 include/asm-generic/pgtable.h |8 
 include/asm-ia64/pgtable.h|   15 ++-
 mm/hugetlb.c  |3 +--
 mm/memory.c   |7 ++-
 mm/migrate.c  |2 +-
 mm/mprotect.c |2 +-
 mm/rmap.c |1 -
 9 files changed, 28 insertions(+), 27 deletions(-)

Index: linux-2.6.23-rc1.test/include/asm-generic/pgtable.h
===
--- linux-2.6.23-rc1.test.orig/include/asm-generic/pgtable.h
+++ linux-2.6.23-rc1.test/include/asm-generic/pgtable.h
@@ -124,14 +124,14 @@ static inline void ptep_set_wrprotect(st
 #define pgd_offset_gate(mm, addr)  pgd_offset(mm, addr)
 #endif
 
-#ifndef __HAVE_ARCH_LAZY_MMU_PROT_UPDATE
-#define lazy_mmu_prot_update(pte)  do { } while (0)
-#endif
-
 #ifndef __HAVE_ARCH_MOVE_PTE
 #define move_pte(pte, prot, old_addr, new_addr)(pte)
 #endif
 
+#ifndef __HAVE_ARCH_SYNC_ICACHE_DCACHE
+#define sync_icache_dcache(pte)do {} while (0)
+#endif
+
 /*
  * A facility to provide lazy MMU batching.  This allows PTE updates and
  * page invalidations to be delayed until a call to leave lazy MMU mode
Index: linux-2.6.23-rc1.test/include/asm-ia64/pgtable.h
===
--- linux-2.6.23-rc1.test.orig/include/asm-ia64/pgtable.h
+++ linux-2.6.23-rc1.test/include/asm-ia64/pgtable.h
@@ -484,11 +484,18 @@ extern struct page *zero_page_memmap_ptr
 #endif
 
 /*
- * IA-64 doesn't have any external MMU info: the page tables contain all the necessary
- * information.  However, we use this routine to take care of any (delayed) i-cache
- * flushing that may be necessary.
+ * IA-64 doesn't guarantee Icache is consistent with Dcache. For ensure
+ * Icache consistency, we have to synchronize them before setting pte
+ * as an executable pte.
  */
-extern void lazy_mmu_prot_update (pte_t pte);
+extern void

[PATCH] flush icache before set_pte take6. [1/4] migration fix

2007-07-30 Thread KAMEZAWA Hiroyuki

During migration, a new page should be cache flushed before set_pte()
on archs which have virtually-tagged caches.

V5 -> V6:
   * no changes (added new patches to the patch set)
V4 -> V5:
   * changed flush_icache_page to flush_cache_page.

Signed-off-by: KAMEZAWA Hiroyuki <[EMAIL PROTECTED]>

---
 mm/migrate.c |1 +
 1 file changed, 1 insertion(+)

Index: linux-2.6.23-rc1.test/mm/migrate.c
===
--- linux-2.6.23-rc1.test.orig/mm/migrate.c
+++ linux-2.6.23-rc1.test/mm/migrate.c
@@ -172,6 +172,7 @@ static void remove_migration_pte(struct 
pte = pte_mkold(mk_pte(new, vma->vm_page_prot));
if (is_write_migration_entry(entry))
pte = pte_mkwrite(pte);
+   flush_cache_page(vma, addr, pte_pfn(pte));
set_pte_at(mm, addr, ptep, pte);
 
if (PageAnon(new))


Re: [PATCH 2/2] Debug handling of early spurious interrupts

2007-07-30 Thread Fernando Luis Vázquez Cao

On Mon, 2007-07-30 at 11:22 -0700, Andrew Morton wrote: 
> On Mon, 30 Jul 2007 18:58:14 +0900
> Fernando Luis Vázquez Cao <[EMAIL PROTECTED]> wrote:
> 
> > > 
> > > So bad things might happen because of this change.  And if they do, they
> > > will take a long time to be discovered, because non-shared interrupt
> > > handlers tend to dwell in crufty old drivers which not many people use.
> > > 
> > > So I'm wondering if it would be more prudent to only do all this for
> > > shared handlers?
> > 
> > I have been testing these patches on all my machines, which span three
> > different architectures: i386, x86_64 (EM64T and Opteron), and ia64.
> > 
> > The good news is that nothing catastrophic happened and, in the process,
> > I managed to find some somewhat buggy interrupt handlers.
> > 
> > I will present a brief summary of my findings here.
> 
> That's quite a lot of breakage for such a small sample of machines.  I do
> suspect that if we were to merge this lot into mainline, all sorts of bad
> stuff would happen.
Yup.

> otoh, as you point out, pretty much everything which goes wrong is due to
> incorrect or dubious driver behaviour, and there is value in weeding these
> things out. 
> 
> But the problem with this process is that we're weeding things out at
> runtime.  Some drivers don't get used by many people and users of some
> architectures (esp embedded) tend to lag kernel.org by a long time.  So it
> could be years before all the fallout from this change is finally wrapped
> up.
Yes, that is a big concern. However, the same embedded people are
starting to use both kexec and kdump, so they may suffer the issues we
are trying to weed out anyway, even if these patches are not applied.
The difference is that with this new functionality it is possible to
catch potential problems relatively easily, because any incorrect
behaviour this may cause will be easily reproducible and, in most cases,
will reveal itself early at boot time.

As things stand now, I guess we will keep seeing occasional crashes and
strange behaviour in kexec-booted kernels, which in some cases will be
due to incorrect handling of spurious interrupts. Besides, such problems
are really difficult to reproduce because, commonly, we would need to
hit an obscure corner case.

> > If we find drivers that are not fixable we should probably consider this
> > new behaviour on a per-driver basis, as Andrew suggested.
> 
> We haven't found any such yet, have we?
Not yet, fortunately.

> > This would
> > probably require passing a new flag into request_irq. Besides, when such
> > a driver is detected we should emit a warning that it may not work
> > properly in a kdump kernel.
> > 
> > I would appreciate your comments on this.
> 
> Oh well, let's just keep maintaining it in -mm for now, gather more
> information so that we can make a more informed decision later on.
Makes sense. I will look into all the issues I have found so far, do
some more testing (I think a new approach is needed to speed up testing
of these issues...), and get back to you.

Thank you.

Fernando

