Re: + elf-loader-crash-while-zero-filling-bss.patch added to -mm tree

2008-02-13 Thread Daniel Jacobowitz
On Wed, Feb 13, 2008 at 12:15:06AM -0800, [EMAIL PROTECTED] wrote:
> Subject: Elf loader crash while zero-filling .bss
> From: "Abel Bernabeu" <[EMAIL PROTECTED]>
> 
> I've finally found a solution for the crash in load_binary_elf I
> reported last week:
> 
> http://lkml.org/lkml/2008/1/30/171
> 
> The attached patch solves my problem.
> 
> set_brk(start, end) allocs just page aligned regions (by "collapsing" both
> extremes to the start of the page in which they lie)...  That means that
> even if both pointers are not equal there is still some chance that
> set_brk has allocated no space at all because ELF_PAGEALIGN(elf_bss) ==
> ELF_PAGEALIGN(elf_brk).
> 
> So the condition was not correct.

This patch is wrong.

ELF_PAGEALIGN rounds up to the end of the page, not down to the start
of the page.  If elf_bss is in the middle of a page, set_brk allocates
any additional pages after the one already allocated.  elf_bss is the
start of the area that needs to be zero initialized, elf_brk is its
end.  So if elf_bss != elf_brk then there's garbage mapped in BSS
from the file and if you don't clear it some of your zero-initialized
variables won't be zero initialized at all.

In the linked message, set_brk is passed elf_bss so its actual
arguments are set_brk (0xa3801, 0x000a4ec8).  It should map one
page.  0xa3801 should be an already mapped page, and clear_user should
succeed in clearing it.
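
The arithmetic can be checked with a small sketch.  It assumes 4 KiB
pages (the real ELF_MIN_ALIGN is architecture dependent), and the helper
names page_align_up, page_align_down and pages_mapped are made up for
illustration; they only model the rounding, not the kernel's set_brk.

```c
/* Assumption: 4 KiB pages; the real ELF_MIN_ALIGN depends on the architecture. */
#define SKETCH_PAGE_SIZE 4096UL

/* Round up to the next page boundary, the behavior described for ELF_PAGEALIGN. */
unsigned long page_align_up(unsigned long addr)
{
    return (addr + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/* Round down to the start of the containing page (what the patch assumed). */
unsigned long page_align_down(unsigned long addr)
{
    return addr & ~(SKETCH_PAGE_SIZE - 1);
}

/* Number of new pages a set_brk(start, end)-style call would map. */
unsigned long pages_mapped(unsigned long start, unsigned long end)
{
    unsigned long s = page_align_up(start);
    unsigned long e = page_align_up(end);

    return e > s ? (e - s) / SKETCH_PAGE_SIZE : 0;
}
```

With the values from the linked message, page_align_up(0xa3801) is
0xa4000 and page_align_up(0xa4ec8) is 0xa5000, so exactly one new page
is mapped; the bytes from 0xa3801 up to 0xa4000 sit in the page that was
already mapped from the file and still have to be cleared.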

-- 
Daniel Jacobowitz
CodeSourcery
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: PAGE_SIZE Availability Inconsistency

2007-03-08 Thread Daniel Jacobowitz
On Thu, Mar 08, 2007 at 04:08:52PM +, Christoph Hellwig wrote:
> No, no no.  We should never export PAGE_SIZE.  We might export NBPG
> as deprecated symbol for gdb if it really needs it, but that should
> happen only on a.out systems, and it should be a true constant,
> not depending on PAGE_SIZE.
> 
> I've Cc'ed the gdb list on whether they have any comments on this
> issue.

Sounds reasonable.  I do not believe that GDB has any dependence on
PAGE_SIZE; bfd (i.e. both gdb and binutils) uses NBPG on a large number
of systems.  Looks like i386, alpha, m68k, s390, vax - but don't quote
me on that, I had to guess from the configure script.

-- 
Daniel Jacobowitz
CodeSourcery


Re: Contents of core dumps

2007-01-03 Thread Daniel Jacobowitz
On Tue, Jan 02, 2007 at 08:57:21PM -0800, David Miller wrote:
> So I'd say we should just put this change in, as-is.  It fixes bugs,
> and in all the time that has passed since my initial posting there
> has not been any serious dissent.

Fine with me.  In that case, I will wait until the kernel is fixed,
verify it, and then probably adjust the GDB test to pass on either
patched or unpatched kernels.

-- 
Daniel Jacobowitz
CodeSourcery


Contents of core dumps (was: Re: fs/binfmt_elf.c:maydump())

2007-01-02 Thread Daniel Jacobowitz
[Please CC, I am not subscribed to lkml.]

On Thu, Apr 06, 2006 at 10:18:07PM -0700, David S. Miller wrote:
> How about something like the following patch?  If it's executable
> and not written to, skip it.  This would skip the main executable
> image and all text segments of the shared libraries mapped in.

I've been going through GDB test failures (... again...) and I'm down
to a respectably small number on x86_64, but this is one of the
remaining ones.  I don't suppose there's been any change since we
discussed this in April?

A refresher for those following along: there's a GDB test that mmaps a
file using MAP_PRIVATE and PROT_WRITE.  It expects the contents to end
up in the core dump.  Right now, they don't.  I can fix the test by
making sure it writes to the mapping, but before I change the test,
I want to raise the question of what _should_ be in a core dump.
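
For anyone who wants to reproduce the situation outside the testsuite,
here is a minimal sketch of the two variants.  It is not the GDB test
itself; the function name map_private and its touch parameter are made
up for illustration.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map `len` bytes of `path` with MAP_PRIVATE and PROT_READ|PROT_WRITE.
 * If `touch` is nonzero, store the first byte back to itself so that the
 * first page gets a private copy; the file on disk is not changed.
 * Returns the mapping, or MAP_FAILED on error.
 */
void *map_private(const char *path, size_t len, int touch)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return MAP_FAILED;

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return MAP_FAILED;

    if (touch && len > 0) {
        volatile char *c = p;
        c[0] = c[0];  /* a store, even of the same value, triggers copy-on-write */
    }
    return p;
}
```

With touch set to 0 this is the case the test currently exercises (a
writable private mapping that was never written); with touch set to 1 it
is the workaround I mentioned.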

I took a peek at what Solaris includes in core dumps.  They offer
(not surprisingly) a pile of configuration options.  The default is
just about everything except for file-backed shared memory and some
symbol table data - it includes text segments, rodata, anonymous shared
memory, file backed mappings, et cetera.  I guess that's another
argument in favor of dumping more.  Then you can control it globally,
per process, et cetera.

http://src.opensolaris.org/source/xref/loficc/crypto/usr/src/uts/common/sys/corectl.h

I also checked an AIX manual since there was a reference to SA_FULLDUMP
in the GDB test:

 By default, the user data, anonymously mapped regions, and vm_infox
 structures are not included in a core dump. This partial core dump
 includes the current thread stack, the thread thrdctx structures, the
 user structure, and the state of the registers at the time of the
 fault. A partial core dump contains sufficient information for a stack
 traceback. The size of a core dump can also be limited by the setrlimit
 or setrlimit64 subroutine.

 To enable a full core dump, set the SA_FULLDUMP flag in the sigaction
 subroutine for the signal that is to generate a full core dump. If this
 flag is set when the core is dumped, the user data section, vm_infox,
 and anonymously mapped region structures are included in the core dump.

Not really sure what that translates to, but it's less than what
Solaris dumps, I think.

Does Linux need knobs for this?

> 
> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> index 537893a..9ec5c2b 100644
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -1167,8 +1167,10 @@ static int maydump(struct vm_area_struct
>  	if (vma->vm_flags & VM_SHARED)
>  		return vma->vm_file->f_dentry->d_inode->i_nlink == 0;
>  
> -	/* If it hasn't been written to, don't write it out */
> -	if (!vma->anon_vma)
> +	/* If it is executable and hasn't been written to,
> +	 * don't write it out.
> +	 */
> +	if ((vma->vm_flags & VM_EXEC) && !vma->anon_vma)
>  		return 0;
>  
>  	return 1;
> 
> 
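
To spell out what the quoted hunk changes, here is a userspace model of
the decision before and after.  Everything in it is a stand-in: the
struct, the SK_VM_* values and the function names are hypothetical, and
only the lines visible in the hunk are modeled; whatever precedes them in
the real function is left out.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the fields the hunk looks at. */
#define SK_VM_SHARED 0x1u
#define SK_VM_EXEC   0x2u

struct vma_sketch {
    unsigned int flags;   /* SK_VM_* bits */
    bool has_anon_vma;    /* true once a private page has been written */
    unsigned int nlink;   /* link count of the backing file */
};

/* Before the patch: any private mapping that was never written is skipped. */
bool maydump_before(const struct vma_sketch *v)
{
    if (v->flags & SK_VM_SHARED)
        return v->nlink == 0;
    if (!v->has_anon_vma)
        return false;
    return true;
}

/* With the patch: only unwritten *executable* mappings are skipped. */
bool maydump_after(const struct vma_sketch *v)
{
    if (v->flags & SK_VM_SHARED)
        return v->nlink == 0;
    if ((v->flags & SK_VM_EXEC) && !v->has_anon_vma)
        return false;
    return true;
}
```

The mapping from the GDB test is private, not executable and unwritten,
so it is skipped by the first function and dumped by the second, while
unwritten text segments are skipped by both.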

-- 
Daniel Jacobowitz
CodeSourcery


[PATCH] Re: [Bug 7210] New: Clone flag CLONE_PARENT_TIDPTR leaves invalid results in memory.

2007-01-02 Thread Daniel Jacobowitz
From: Daniel Jacobowitz <[EMAIL PROTECTED]>

Do not implement CLONE_PARENT_SETTID until we know that clone will succeed.
If we do it too early NPTL's data structures temporarily reference a
nonexistent TID.

Signed-off-by: Daniel Jacobowitz <[EMAIL PROTECTED]>

---
On Tue, Sep 26, 2006 at 08:59:15PM -0700, Linus Torvalds wrote:
> 
> 
> On Tue, 26 Sep 2006, Roland McGrath wrote:
> >
> > It can go last, right before return, after unlock.
> > Userland only cares that parent_tidptr set before parent syscall returns,
> > and child_tidptr set before child returns.
> 
> Ok, as long as people are sure, I don't care. Then we have to just ignore 
> the error, though, since we can't recover (we've already "exposed" the 
> child on the task lists).
> 
> I don't think it's a big deal. Ignoring the error just means that if you 
> pass in an invalid ptr, it's as if the bit to set that value wasn't set. 
> Not a problem.
> 
> Especially if there is a test-program, can we just have a patch to try 
> that has been verified? It _sounded_ like somebody actually had a program 
> that could trigger this with some horrid code that sent signals and cloned 
> all the time?

I never got back to you about this...

Refresher, if there isn't enough above: CLONE_PARENT_SETTID is
currently implemented right after a TID is assigned.  There's a lot of
clone left to go at that point including a check for pending signals
which can lead to clone failing.  This leaves a TID in NPTL's thread
list which doesn't correspond to a thread.

On Sunday I found another place where this is a problem, besides the
process-global UID stuff in glibc.  GDB tries to attach to the
nonexistent thread and gets upset.  I've made it cope, but at the same
time it provides a convenient test case.

Without the attached patch, tls.exp in the GDB testsuite would
intermittently report that it could not attach to a thread - always
within half an hour.  With the patch it ran for four hours without
a problem.

 kernel/fork.c |   13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

Index: linux-source-2.6.18/kernel/fork.c
===
--- linux-source-2.6.18.orig/kernel/fork.c  2007-01-02 13:45:28.0 -0500
+++ linux-source-2.6.18/kernel/fork.c   2007-01-02 13:52:09.0 -0500
@@ -1012,10 +1012,6 @@ static struct task_struct *copy_process(
 	delayacct_tsk_init(p);	/* Must remain after dup_task_struct() */
 	copy_flags(clone_flags, p);
 	p->pid = pid;
-	retval = -EFAULT;
-	if (clone_flags & CLONE_PARENT_SETTID)
-		if (put_user(p->pid, parent_tidptr))
-			goto bad_fork_cleanup_delays_binfmt;
 
 	INIT_LIST_HEAD(&p->children);
 	INIT_LIST_HEAD(&p->sibling);
@@ -1251,6 +1247,14 @@ static struct task_struct *copy_process(
 	total_forks++;
 	spin_unlock(&current->sighand->siglock);
 	write_unlock_irq(&tasklist_lock);
+
+	/*
+	 * Now that we know the fork has succeeded, record the new
+	 * TID.  It's too late to back out if this fails.
+	 */
+	if (clone_flags & CLONE_PARENT_SETTID)
+		put_user(p->pid, parent_tidptr);
+
 	proc_fork_connector(p);
 	return p;
 
@@ -1281,7 +1285,6 @@ bad_fork_cleanup_policy:
 bad_fork_cleanup_cpuset:
 #endif
 	cpuset_exit(p);
-bad_fork_cleanup_delays_binfmt:
 	delayacct_tsk_free(p);
 	if (p->binfmt)
 		module_put(p->binfmt->module);

-- 
Daniel Jacobowitz
CodeSourcery


Re: 2.6.13 SMP on AMD Athlon64 X2 + FC4: PS/2 keyboard b0rken; taskset/sched_setaffinity() saves the day!

2005-09-06 Thread Daniel Jacobowitz
On Tue, Sep 06, 2005 at 11:10:29PM +0200, Frank van Maarseveen wrote:
> While playing with a new AMD Athlon64 X2 3800+ (i386) the keyboard goes
> wild for 10 (20?) seconds, behaves normally for 10 (20?) seconds, and
> then goes wild again: when "wild", every keypress results in a random
> number of repeats, e.g.:
> 
> $ pppsss aaxxxuuu
> bash: pppsss: command not found
> $
> $
> $
> $
> $
> $
> $
> $
> 
> Upgrading Xorg to xorg-x11-6.8.2-37.FC4.45 did not help.
> 
> Booting with "nosmp" seems to fix it. And this _seems_ to fix it too:
> 
> taskset -p 1 `ps axo comm,pid|awk '$1=="X"{print $2}'`
> 
> I haven't seen this problem on the console.

This is probably the same problem as the earlier one you reported.  If
you take a look at bugzilla, you'll see that the normal manifestation
is messed up key repeat rates...

-- 
Daniel Jacobowitz
CodeSourcery, LLC


Re: 2.6.13 SMP on Athlon X2: nanosleep returning waay to soon, clock_gettime(CLOCK_REALTIME...) proceeding too fast

2005-09-04 Thread Daniel Jacobowitz
On Sun, Sep 04, 2005 at 01:39:15PM +0200, Frank van Maarseveen wrote:
> After replacing the kernel on a fresh FC4 install with a stock 2.6.13
> (using gcc 3.2) and my own config it appears that the clock is going too
> fast: it gains at least an hour every 12 hours or so. FC4 kernel (rpm:
> kernel-2.6.11-1.1369_FC4) seems ok

Mind sticking this information in bugzilla.kernel.org, bug 5105?

> annotated output:
> 
>   CPU0 CPU1   Total
> ---
>  1  0 + 251 = 251
>  2  0 + 251 = 251
>  3  0 + 251 = 251
>  4  0 + 251 = 251
>  5  0 + 251 = 251
>  6  52 + 196 = 248<== (?)
>  7  251 + 0 = 251
>  8  251 + 0 = 251
>  9  251 + 0 = 251
> 10  251 + 0 = 251
> 11  251 + 0 = 251
> 12  251 + 0 = 251
> 13  251 + 0 = 251
> 14  251 + 0 = 251
> 15  251 + 0 = 251
> 16  147 + 1 = 148 <==
> 17  0 + 252 = 252

Hmm, very interesting.


-- 
Daniel Jacobowitz
CodeSourcery, LLC


Fix 32-bit thread debugging on x86_64

2005-07-31 Thread Daniel Jacobowitz
The IA32 ptrace emulation currently returns the wrong registers for
fs/gs; it's returning what x86_64 calls gs_base.  We need regs.gsindex
in order for GDB to correctly locate the TLS area.  Without this patch,
the 32-bit GDB testsuite bombs on a 64-bit kernel.  With it, results
look about like I'd expect, although there are still a handful of
kernel-related failures (vsyscall related?).

Signed-off-by: Daniel Jacobowitz <[EMAIL PROTECTED]>

diff -r -p -u z/linux-2.6.11/arch/x86_64/ia32/ptrace32.c linux-2.6.11/arch/x86_64/ia32/ptrace32.c
--- linux-2.6.12.3.orig/arch/x86_64/ia32/ptrace32.c 2005-03-02 02:37:52.0 -0500
+++ linux-2.6.12.3/arch/x86_64/ia32/ptrace32.c  2005-07-31 15:29:48.0 -0400
@@ -43,11 +43,11 @@ static int putreg32(struct task_struct *
 	switch (regno) {
 	case offsetof(struct user32, regs.fs):
 		if (val && (val & 3) != 3) return -EIO;
-		child->thread.fs = val & 0xffff;
+		child->thread.fsindex = val & 0xffff;
 		break;
 	case offsetof(struct user32, regs.gs):
 		if (val && (val & 3) != 3) return -EIO;
-		child->thread.gs = val & 0xffff;
+		child->thread.gsindex = val & 0xffff;
 		break;
 	case offsetof(struct user32, regs.ds):
 		if (val && (val & 3) != 3) return -EIO;
@@ -138,10 +138,10 @@ static int getreg32(struct task_struct *
 
 	switch (regno) {
 	case offsetof(struct user32, regs.fs):
-		*val = child->thread.fs;
+		*val = child->thread.fsindex;
 		break;
 	case offsetof(struct user32, regs.gs):
-		*val = child->thread.gs;
+		*val = child->thread.gsindex;
 		break;
 	case offsetof(struct user32, regs.ds):
 		*val = child->thread.ds;
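
For readers unfamiliar with the check in the putreg32 context lines, here
is a standalone sketch of it.  The function name set_selector_index is
made up, and the code assumes segment selectors are 16 bits wide; it
mirrors the validation only, not the kernel's thread structure.

```c
#include <errno.h>
#include <stdint.h>

/*
 * A zero selector is accepted; any other selector must have both RPL
 * bits set, i.e. request user privilege level 3.  On success the low
 * 16 bits are stored in *index and 0 is returned; otherwise -EIO.
 */
int set_selector_index(uint32_t val, uint16_t *index)
{
    if (val && (val & 3) != 3)
        return -EIO;
    *index = (uint16_t)(val & 0xffff);
    return 0;
}
```

The value stored this way is the selector index that the debugger reads
back, which is different from the segment base address.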

-- 
Daniel Jacobowitz
CodeSourcery, LLC


Re: [PATCH] x86-64: ptrace ia32 BP fix

2005-07-05 Thread Daniel Jacobowitz
On Tue, Jul 05, 2005 at 02:31:15AM -0700, Roland McGrath wrote:
> 
> When the 32-bit vDSO is used to make a system call, the %ebp register for
> the 6th syscall arg has to be loaded from the user stack (where it's pushed
> by the vDSO user code).  The native i386 kernel always does this before
> stopping for syscall tracing, so %ebp can be seen and modified via ptrace
> to access the 6th syscall argument.  The x86-64 kernel fails to do this,
> presenting the stack address to ptrace instead.  This makes the %rbp value
> seen by 64-bit ptrace of a 32-bit process, and the %ebp value seen by a
> 32-bit caller of ptrace, both differ from the native i386 behavior.
> 
> This patch fixes the problem by putting the word loaded from the user stack
> into %rbp before calling syscall_trace_enter, and reloading the 6th syscall
> argument from there afterwards (so ptrace can change it).  This makes the
> behavior match that of i386 kernels.

Wouldn't this botch a debugger that supports both backtracing and
PTRACE_SYSCALL, when stopped in a syscall?  We have unwind information
for the vDSO and it's not going to tell us that the kernel has done
something clever to the value of %ebp.


-- 
Daniel Jacobowitz
CodeSourcery, LLC


Re: strange incremental patch size [2.6.12-rc2 to 2.6.12-rc3]

2005-04-21 Thread Daniel Jacobowitz
On Thu, Apr 21, 2005 at 12:32:59PM +0200, Maciej Soltysiak wrote:
> Hi,
> 
> These are the sizes of rc2 and rc3 patches
> 
> # ls -la patch-2.6.12*
> -rw-r--r--  1 root src 18011382 Apr  4 18:50 patch-2.6.12-rc2
> -rw-r--r--  1 root src 19979854 Apr 21 02:29 patch-2.6.12-rc3
> 
> Let us make an incremental patch from rc2 to rc3
> 
> # interdiff patch-2.6.12-rc2 patch-2.6.12-rc3 >x
> 
> Let us see how big it is.
> # ls -ld x
> -rw-r--r--  1 root src 37421924 Apr 21 12:28 x
> 
> How come interdiff from rc2 (18MB) to rc3 (20MB) gave me
> 37MB worth of patch-code ? I would expect something about
> 2MB but 40MB ?

Try interdiff -p1?
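
In case it helps, here is a rough Python sketch of what ignoring the first
path component means when file names from two patches are compared.  It only
illustrates the idea behind a -p1 style option; strip_components and the
example paths are made up for illustration and are not taken from interdiff
itself.

```python
def strip_components(path: str, count: int) -> str:
    """Drop the first `count` leading components of a slash-separated path."""
    parts = path.split("/")
    # Keep the path unchanged if stripping would leave nothing behind.
    if count <= 0 or count >= len(parts):
        return path
    return "/".join(parts[count:])


# Hypothetical file names as they might appear in two patch headers.
old_name = "linux-2.6.12-rc2/kernel/sched.c"
new_name = "linux-2.6.12-rc3/kernel/sched.c"

# Compared verbatim the names differ; with one component ignored they match.
print(old_name == new_name)  # False
print(strip_components(old_name, 1) == strip_components(new_name, 1))  # True
```

If the two patches name the same file differently, a tool comparing them
verbatim has no way to pair the hunks up, which is one possible way to end
up with output larger than either input.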

-- 
Daniel Jacobowitz
CodeSourcery, LLC


Re: [PATCH x86_64] Live Patching Function on 2.6.11.7

2005-04-17 Thread Daniel Jacobowitz
On Mon, Apr 18, 2005 at 01:19:57PM +0900, Takashi Ikebe wrote:
> The GDB-based approach does not seem to fit our requirements. GDB (ptrace)
> based functions basically need to be done while the target process is stopped.
> In addition, the current PTRACE_PEEK/POKE* requests allow us to copy only a
> *word* at a time...

While true, this is easily fixable.  There is even an interface
precedent on OpenBSD (and possibly other platforms as well).
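
To make the word-size limitation concrete, here is a rough Python sketch of
what a debugger has to do to copy an arbitrary buffer with word-sized reads
only: one call per word, plus trimming at both ends.  peek_word is a
hypothetical stand-in for a ptrace(PTRACE_PEEKDATA, ...) call, and the
8-byte little-endian word is an assumption for illustration.

```python
from typing import Callable

WORD_SIZE = 8  # assumed word size in bytes (e.g. a 64-bit target)


def read_memory(peek_word: Callable[[int], int], addr: int, length: int) -> bytes:
    """Copy `length` bytes starting at `addr` using only word-sized reads."""
    start = addr - (addr % WORD_SIZE)  # round down to a word boundary
    out = bytearray()
    cur = start
    while cur < addr + length:
        # One call per word: this is the per-word overhead being discussed.
        out += peek_word(cur).to_bytes(WORD_SIZE, "little")
        cur += WORD_SIZE
    offset = addr - start
    return bytes(out[offset:offset + length])


# Fake target memory so the sketch can run without a traced process.
memory = bytes(range(64))
calls = []


def fake_peek(address: int) -> int:
    calls.append(address)
    return int.from_bytes(memory[address:address + WORD_SIZE], "little")


data = read_memory(fake_peek, 3, 10)
print(data == memory[3:13], len(calls))  # True 2
```

A bulk-transfer request would replace the whole loop with a single call that
takes an address, a length and a buffer.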

-- 
Daniel Jacobowitz
CodeSourcery, LLC


Re: [PATCH] i386 & x86_64: Live Patching Funcion on 2.6.11.7

2005-04-17 Thread Daniel Jacobowitz
On Mon, Apr 18, 2005 at 10:41:23AM +0900, Takashi Ikebe wrote:
> Daniel-san,
> The GDB-based approach does not seem to fit our requirements. GDB (ptrace)
> based functions basically need to be done while the target process is stopped.
> In our experience, a single patching run sometimes involves dozens to
> hundreds of patches, and in that case the GDB-based approach reduces the
> target process's availability.

That's right, it does require the target process be stopped.  If it
isn't stopped how do you know it isn't executing the same instruction
you're currently patching?

Even with hundreds of kilobytes of patch, I have trouble imagining this
takes a substantial amount of time.

-- 
Daniel Jacobowitz
CodeSourcery, LLC


Re: [PATCH] i386 & x86_64: Live Patching Funcion on 2.6.11.7

2005-04-17 Thread Daniel Jacobowitz
On Sat, Apr 16, 2005 at 11:44:39PM -0700, David S. Miller wrote:
> 
> Takashi-san, have you ever investigated using kprobes to
> implement this feature?  It seems a perfect fit, and would
> allow support on several architectures other than just x86
> and x86_64.
> 
> If kprobes does not meet your needs completely, it could
> be trivially extended to do so.
> 
> I think implementing something like this from scratch is
> not a good idea when we have much of the needed logic and
> infrastructure already.

Takashi-san's description was not very clear, but it sounds like it's a
patching mechanism for userspace applications - not for kernel space.
So kprobes would not be a good fit.

If I'm right, I'm not sure why some of the bits of it were done
separately instead of via the existing ptrace mechanism.  And GDB
would appreciate a mechanism for mmap/munmap/mprotect in a debugged
process, also.

-- 
Daniel Jacobowitz
CodeSourcery, LLC


Re: [RFC] FUSE permission modell (Was: fuse review bits)

2005-04-11 Thread Daniel Jacobowitz
On Mon, Apr 11, 2005 at 09:56:29PM +0200, Miklos Szeredi wrote:
> Well the sanity check on the "server" side is always enforced.  You
> can't "trick" sftp or ftp to not check permissions.  So checking on
> the "client" side too (where the fuse daemon is running) makes no
> sense, does it?

That argument doesn't make much sense to me.  But we're at the end of
my useful contributions to this discussion; I'm going to be quiet now
and hope some folks who know more about filesystems have more useful
responses.

-- 
Daniel Jacobowitz
CodeSourcery, LLC


Re: [RFC] FUSE permission modell (Was: fuse review bits)

2005-04-11 Thread Daniel Jacobowitz
On Mon, Apr 11, 2005 at 09:10:46PM +0200, Miklos Szeredi wrote:
> > Root squashing is actually a much less obnoxious restriction.  It means
> > that local uid 0 doesn't automatically correspond to remote uid 0.
> 
> I don't agree that it's less obnoxious.  Root squashing and a
> restricted directory (-rwx------) would have exactly the same effect:
> root is denied all access.

That's considerably less obnoxious, because such directories are
comparatively rare; most files, root can still read.  There are still
a couple of unintuitive cases where root has less privilege than a
particular non-root user, of course.  But your model normally gives
root fewer privileges than the user that mounted the FS.

> > But why does the kernel need to know anything about this?  Why can't
> > the userspace library present the permissions appropriately to the
> > kernel?
> 
> That is exactly what you should do if you use the default_permissions
> options.  You set the file mode, and the kernel checks the permission.

So why not make default_permissions a feature of the userspace?

> > I'm going to be pretty confused if I see a mode 666 file that I
> > can't even read.  So will various programs.
> 
> How would you get such a file?  I don't understand.

The permissions exposed by the FUSE layer apparently don't correspond
to what local users can do with them.  That's the problem here.  It may
be that I'm completely misunderstanding you - but from what you've
described, the userspace daemon can mark a file's permissions as 666,
and then with allow_other and allow_root off no one else will be able
to read it, despite those permissions.
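
A toy model of the mismatch, in Python.  Both functions are simplified
sketches written for illustration, not the actual kernel or FUSE code:
mode_allows_read is the ordinary mode-bit check (ignoring the root
override), and fuse_default_allows models the default as described in this
thread, where only the mounting user gets in unless allow_other or
allow_root is set.

```python
from typing import Set


def mode_allows_read(mode: int, file_uid: int, file_gid: int,
                     uid: int, gids: Set[int]) -> bool:
    """Classic Unix check: pick owner, group or other bits, then test read."""
    if uid == file_uid:
        return bool(mode & 0o400)
    if file_gid in gids:
        return bool(mode & 0o040)
    return bool(mode & 0o004)


def fuse_default_allows(uid: int, mount_owner: int,
                        allow_other: bool = False,
                        allow_root: bool = False) -> bool:
    """Model of the described policy: who may touch the mount at all."""
    if uid == mount_owner or allow_other:
        return True
    return allow_root and uid == 0


# A file the daemon reports as mode 666, owned by uid 1000 on uid 1000's mount.
mode, owner, group = 0o666, 1000, 1000
other_user = 1001

print(mode_allows_read(mode, owner, group, other_user, {1001}))  # True
print(fuse_default_allows(other_user, mount_owner=1000))         # False
```

The first answer is what ls and every program calling stat() will conclude;
the second is what actually happens on open().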

> > Except for the allow_root bits, I think that having userspace handle
> > the issue entirely would cover both objections.
> 
> If I want to allow unprivileged users to be able to mount their
> filesystems, then handling everything in userspace is not an option.
> For example if you could mount a filesystem in which files have
> user=root instead of your own user ID, you could probably confuse some
> applications running as root, and cause information leak.  That's
> exactly why allow_root and allow_other are disabled for normal users.
> 
> The only safe option that I can imagine is that the kernel will reset
> the user and group fields of the file attributes.  This would again
> require a kernel option, but would be far less useful IMO.

I think we've got a boundary problem here.  You are exposing some
arbitrary, user-supplied values in the permissions, and then performing
sanity checks at access time; I'm suggesting performing the sanity
checking on the other side, when the permissions are supplied to the
kernel by the daemon.

Why would it be less useful to show files that have been "created" by a
user as owned by that user?  Or files that the user has requested no
other users be able to write as unwritable by group/other?  Sure, it
makes your tarfs a little less mapped onto the tar file.  But that's
one of the recurring objections to implementing archivers as
filesystems: the ownership in the archive is _not_ relevant to the
mounted copy.
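
Roughly what I have in mind, as a Python sketch.  sanitize_attrs is a
hypothetical hook applied when the daemon hands attributes to the kernel;
the exact policy (force ownership to the mounting user, drop group/other
bits on a private mount) is an assumption about what such a check could
look like, not existing FUSE behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attrs:
    mode: int  # permission bits only, e.g. 0o644
    uid: int
    gid: int


def sanitize_attrs(attrs: Attrs, mount_uid: int, mount_gid: int,
                   private: bool = True) -> Attrs:
    """Rewrite daemon-supplied attributes so they match what users can do."""
    mode = attrs.mode & 0o777  # never pass through setuid/setgid/sticky bits
    if private:
        mode &= 0o700          # nobody else gets in, so do not claim otherwise
    # Files always appear owned by the user who mounted the filesystem.
    return Attrs(mode=mode, uid=mount_uid, gid=mount_gid)


# A tar member recorded as root-owned, mode 4755, seen on uid 1000's mount.
result = sanitize_attrs(Attrs(0o4755, 0, 0), mount_uid=1000, mount_gid=1000)
print(oct(result.mode), result.uid, result.gid)  # 0o700 1000 1000
```

With the check done at this boundary, what stat() reports and what access
checks enforce cannot drift apart.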

-- 
Daniel Jacobowitz
CodeSourcery, LLC


Re: [RFC] FUSE permission modell (Was: fuse review bits)

2005-04-11 Thread Daniel Jacobowitz
On Mon, Apr 11, 2005 at 07:22:57PM +0100, Jamie Lokier wrote:
> >   1) Only allow mount over a directory for which the user has write
> >  access (and is not sticky)
> 
> Seems good - but why not sticky?  Mounting a user filesystem in
> /tmp/user-xxx/my-mount-point seems not unreasonable - provided the
> administrator can delete the directory (which is possible with
> detachable mount points).

Because then they could mount over /tmp.  "and (is not sticky || is
owned by the user)" may be more appropriate.
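
Spelled out as a Python sketch (may_mount_over and its parameters are
hypothetical names for illustration; the real check would live wherever the
mount permission is decided):

```python
import stat


def may_mount_over(dir_mode: int, dir_uid: int, uid: int, can_write: bool) -> bool:
    """Proposed rule: writable, and (not sticky or owned by the user)."""
    sticky = bool(dir_mode & stat.S_ISVTX)
    return can_write and (not sticky or dir_uid == uid)


TMP_MODE = stat.S_IFDIR | 0o1777   # world-writable and sticky, like /tmp
OWN_MODE = stat.S_IFDIR | 0o1700   # sticky, but owned by the user

print(may_mount_over(TMP_MODE, dir_uid=0, uid=1000, can_write=True))     # False
print(may_mount_over(OWN_MODE, dir_uid=1000, uid=1000, can_write=True))  # True
```

The first case is the /tmp problem; the second is a user's own directory
that merely happens to have the sticky bit set.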


-- 
Daniel Jacobowitz
CodeSourcery, LLC


Re: [RFC] FUSE permission modell (Was: fuse review bits)

2005-04-11 Thread Daniel Jacobowitz
On Mon, Apr 11, 2005 at 05:56:09PM +0200, Miklos Szeredi wrote:
> > >   3) No other user should have access to files under the mount, not
> > >  even root[5]
> > 
> > > [5] Obviously root cannot be restricted, but accidental access to
> > > private data is still a good idea.  E.g. root squashing by NFS servers
> > > has a similar effect.
> > 
> > Could you explain a little more?  I don't see the point in denying
> > access to root, but I also can't tell from your explanation whether you
> > do or not.
> 
> Fuse by default does.  This can be disabled by one of two mount
> options: "allow_other" and "allow_root".  The former implies the
> latter.  These mount options are only allowed for mounting by root, but
> this can be relaxed with a configuration option.

So the behavior that Christoph was objecting to here is in fact
configurable?

> > I don't really see the point of this restriction, anyway.  Could you
> > explain why this shouldn't be a matter of policy, and kept out of the
> > kernel?  Have the userspace file servers default to putting restrictive
> > permissions on mounts unless requested otherwise.
> 
> That's an option.  However you can't restrict root that way, and you
> need an extra directory, since permissions on the mountpoint are
> ignored after the mount.

No, you need the userspace daemon to set the permissions on the root
directory of the new mount restrictively.  What am I missing?

> Restricting root is needed, so that a sysadmin won't accidentally go
> into a user's private mount (e.g. sshfs to some machine to which the
> sysadmin otherwise has no access).  Root can still gain access by
> doing 'su me', but at least he will have a bad conscience.  This is
> not such a stupid idea as it first sounds IMO, and by default all NFS
> servers exhibit a similar behavior (root squashing).

Root squashing is actually a much less obnoxious restriction.  It means
that local uid 0 doesn't automatically correspond to remote uid 0.
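
In other words, something like this toy mapping (Python, purely
illustrative; squash_root is a made-up name and 65534 is only an assumed
anonymous uid, the real value being a matter of server configuration):

```python
NOBODY_UID = 65534  # assumed anonymous uid; actual value is server configuration


def squash_root(local_uid: int, anon_uid: int = NOBODY_UID) -> int:
    """Map the client's uid to the uid the server will act as."""
    return anon_uid if local_uid == 0 else local_uid


print(squash_root(0))     # 65534
print(squash_root(1000))  # 1000
```

Root still gets whatever the anonymous user is allowed, rather than being
locked out of the mount altogether.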

> > >   4) Access should not be further restricted for the owner of the
> > >  mount, even if permission bits, uid or gid would suggest
> > >  otherwise
> > 
> > Similar questions.
> 
> This behavior can be disabled by the "default_permissions" mount
> option (which is not privileged, since it adds restrictions).  A FUSE
> filesystem mounted by root (and not for private purposes) would
> normally be done with "allow_other,default_permissions".

But why does the kernel need to know anything about this?  Why can't
the userspace library present the permissions appropriately to the
kernel?  I'm going to be pretty confused if I see a mode 666 file that
I can't even read.  So will various programs.

Except for the allow_root bits, I think that having userspace handle
the issue entirely would cover both objections.

> Does this answer your questions?

More or less.

-- 
Daniel Jacobowitz
CodeSourcery, LLC


Re: [RFC] FUSE permission modell (Was: fuse review bits)

2005-04-11 Thread Daniel Jacobowitz
On Mon, Apr 11, 2005 at 04:43:32PM +0200, Miklos Szeredi wrote:
>   3) No other user should have access to files under the mount, not
>  even root[5]

> [5] Obviously root cannot be restricted, but accidental access to
> private data is still a good idea.  E.g. root squashing by NFS servers
> has a similar effect.

Could you explain a little more?  I don't see the point in denying
access to root, but I also can't tell from your explanation whether you
do or not.

If I mount a filesystem using ssh, I want to be able to "sudo cp
foo.txt /etc" and not get an inexplicable permissions error.

I don't really see the point of this restriction, anyway.  Could you
explain why this shouldn't be a matter of policy, and kept out of the
kernel?  Have the userspace file servers default to putting restrictive
permissions on mounts unless requested otherwise.

I can think of plenty of uses for this.

>   4) Access should not be further restricted for the owner of the
>  mount, even if permission bits, uid or gid would suggest
>  otherwise

Similar questions.

-- 
Daniel Jacobowitz
CodeSourcery, LLC


Re: Linux-2.6.11 can't disable CAD

2005-04-08 Thread Daniel Jacobowitz
On Thu, Apr 07, 2005 at 04:50:32PM -0400, Richard B. Johnson wrote:
> On Thu, 7 Apr 2005, Jan Harkes wrote:
> 
> >On Thu, Apr 07, 2005 at 11:16:14AM -0400, Richard B. Johnson wrote:
> >>In the not-too-distant past, one could disable Ctl-Alt-DEL.
> >>Can't do it anymore.
> >...
> >>Observe that reboot() returns 0 and `strace` understands what
> >>parameters were passed. The result is that, if I hit Ctl-Alt-Del,
> >>`init` will still execute the shutdown-order (INIT 0).
> >
> >Actually, if CAD is enabled in the kernel, it will just reboot.
> >If CAD is disabled in the kernel a SIGINT is sent to pid 1 (/sbin/init).
> >
> 
> No, that's not how it ever worked. There are parameters that are
> available in the reboot-system call that define the operation that
> will occur when the 3-finger salute occurs.
> 
> Execute man 2 reboot.

Take your own advice.  From the man page:

   LINUX_REBOOT_CMD_CAD_ON
          (RB_ENABLE_CAD, 0x89abcdef).  CAD is enabled.  This means
          that the CAD keystroke will immediately cause the action
          associated with LINUX_REBOOT_CMD_RESTART.

   LINUX_REBOOT_CMD_CAD_OFF
          (RB_DISABLE_CAD, 0).  CAD is disabled.  This means that the CAD
          keystroke will cause a SIGINT signal to be sent to init
          (process 1), whereupon this process may decide upon a
          proper action (maybe: kill all processes, sync, reboot).
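
Or, as a toy Python model of the two modes described there.  The constant
values are the ones quoted from the man page above; on_cad_keystroke and its
return strings are invented for illustration, and nothing here calls
reboot(2).

```python
LINUX_REBOOT_CMD_CAD_ON = 0x89ABCDEF
LINUX_REBOOT_CMD_CAD_OFF = 0


def on_cad_keystroke(cad_mode: int) -> str:
    """Return what happens on Ctrl-Alt-Del for the given CAD mode."""
    if cad_mode == LINUX_REBOOT_CMD_CAD_ON:
        return "kernel restarts immediately"
    if cad_mode == LINUX_REBOOT_CMD_CAD_OFF:
        return "kernel sends SIGINT to init (pid 1)"
    raise ValueError(f"not a CAD mode: {cad_mode:#x}")


print(on_cad_keystroke(LINUX_REBOOT_CMD_CAD_OFF))
# kernel sends SIGINT to init (pid 1)
```

So with CAD "disabled", whatever happens next is init's decision, not the
kernel's.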

-- 
Daniel Jacobowitz
CodeSourcery, LLC


Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

2005-04-08 Thread Daniel Jacobowitz
On Thu, Apr 07, 2005 at 11:02:22PM -0700, David S. Miller wrote:
> On Fri, 08 Apr 2005 09:19:39 +0400
> Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
> 
> > > I know, the same thing holds for most architectures, including i386.
> > > However, this is not an issue for uni-processor kernels anywhere else,
> > > so what's so special about MIPS?
> > 
> > > Do i386 or ppc have cached and uncached memory?
> 
> Yes, they do.
> 
> > No, i386, ppc and others do not require sync on uncached memory access,
> > and only instruction not data cache sync on SMP.
> 
> On MIPS, all the MIPS atomic operations will operate on cached memory.
> And as far as a uniprocessor cpu is concerned, updating the cache is
> all that matters.
> 
> In fact, this SYNC instruction seems unnecessary even on SMP.  If the
> cache is updated, it is part of the coherent memory space and thus
> MOESI main bus SMP cache coherency transactions will see the updated
> value.  When another processor does a "read-to-share" or "read-to-own"
> request on the main bus, the processor which did the atomic OP will
> provide the correct data from its cache in response to that transaction.
> 
> So what you have to do is show me an example where the MIPS kernel can
> do an atomic.h operation on uncached memory.  I even think that is
> invalid, come to think of it.

It better be...

My impression is that the MIPS story isn't so simple, because the
architecture only offers very weak coherency guarantees.  Most of the
SMP implementations offer strong coherency in practice, but at least
one (RM9000) doesn't.

-- 
Daniel Jacobowitz
CodeSourcery, LLC


Re: Linux-2.6.11 can't disable CAD

2005-04-08 Thread Daniel Jacobowitz
On Thu, Apr 07, 2005 at 04:50:32PM -0400, Richard B. Johnson wrote:
> On Thu, 7 Apr 2005, Jan Harkes wrote:
> 
> > On Thu, Apr 07, 2005 at 11:16:14AM -0400, Richard B. Johnson wrote:
> > > In the not-too distant past, one could disable Ctl-Alt-DEL.
> > > Can't do it anymore.
> > ...
> > > Observe that reboot() returns 0 and `strace` understands what
> > > parameters were passed. The result is that, if I hit Ctl-Alt-Del,
> > > `init` will still execute the shutdown-order (INIT 0).
> > 
> > Actually, if CAD is enabled in the kernel, it will just reboot.
> > If CAD is disabled in the kernel a SIGINT is sent to pid 1 (/sbin/init).
> > 
> 
> No, that's not how it ever worked. There are parameters that are
> available in the reboot-system call that define the operation that
> will occur when the 3-finger salute occurs.
> 
> Execute man 2 reboot.

Take your own advice.  From the man page:

   LINUX_REBOOT_CMD_CAD_ON
          (RB_ENABLE_CAD, 0x89abcdef).  CAD is enabled.  This means
          that the CAD keystroke will immediately cause the action
          associated with LINUX_REBOOT_CMD_RESTART.

   LINUX_REBOOT_CMD_CAD_OFF
          (RB_DISABLE_CAD, 0).  CAD is disabled. This means that the CAD
          keystroke will cause a SIGINT signal to be sent to init
          (process 1), whereupon this process may decide upon a
          proper action (maybe: kill all processes, sync, reboot).
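
To make the two modes concrete, here is a minimal C sketch of how a
program could select them through the glibc reboot() wrapper.  The
helper names cad_command() and set_cad() are hypothetical; only
reboot(), RB_ENABLE_CAD and RB_DISABLE_CAD come from <sys/reboot.h>.
The call needs CAP_SYS_BOOT, so this is an illustration under those
assumptions rather than something to run casually.

```c
#include <sys/reboot.h>   /* reboot(), RB_ENABLE_CAD, RB_DISABLE_CAD (Linux) */

/* Hypothetical helper: pick the reboot(2) command for the wanted CAD mode. */
unsigned int cad_command(int enable)
{
    return enable ? (unsigned int)RB_ENABLE_CAD
                  : (unsigned int)RB_DISABLE_CAD;
}

/*
 * Hypothetical helper: apply the CAD mode.  Returns 0 on success, -1 with
 * errno set otherwise (for example EPERM without CAP_SYS_BOOT).  With CAD
 * disabled, the keystroke makes the kernel send SIGINT to pid 1 instead of
 * restarting the machine.
 */
int set_cad(int enable)
{
    return reboot((int)cad_command(enable));
}
```

An init-like process would call set_cad(0) once at startup and install
a SIGINT handler that decides what the keystroke should do.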

-- 
Daniel Jacobowitz
CodeSourcery, LLC


Re: Do not misuse Coverity please (Was: sound/oss/cs46xx.c: fix a check after use)

2005-03-29 Thread Daniel Jacobowitz
On Mon, Mar 28, 2005 at 10:23:48PM -0800, Andrew Morton wrote:
> >  > -int old=card->amplifier;
> >  > +int old;
> >  >  if(!card)
> >  >  {
> >  >  CS_DBGOUT(CS_ERROR, 2, printk(KERN_INFO 
> >  >  "cs46xx: amp_hercules() called before initialized.\n"));
> >  >  return;
> >  >  }
> >  > +old = card->amplifier;

> No, there is a third case: the pointer can be NULL, but the compiler
> happened to move the dereference down to after the check.
> 
> If the optimiser is later changed, or if someone tries to compile the code
> with -O0, it will oops.

The thing GCC is most likely to do with this code is discard the NULL
check entirely and leave only the oops; the "if (!card)" can not be
reached without passing through "card->amplifier", and a pointer which
is dereferenced can not be NULL in a valid program.
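
For reference, a minimal sketch of the ordering the patch restores:
test the pointer first, dereference it afterwards.  struct card and
read_amplifier() are hypothetical stand-ins for the driver code, not
the cs46xx source.

```c
#include <stddef.h>

/* Hypothetical stand-in for the driver's card structure. */
struct card {
    int amplifier;
};

/*
 * Check first, dereference second.  Returns 0 and stores the old
 * amplifier state in *old, or -1 if card is NULL.
 */
int read_amplifier(const struct card *card, int *old)
{
    if (!card)
        return -1;          /* nothing has been dereferenced yet */

    *old = card->amplifier; /* card is known to be non-NULL here */
    return 0;
}
```

With the load placed after the test, the compiler has no earlier
dereference from which to conclude that the check is dead code.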

-- 
Daniel Jacobowitz
CodeSourcery, LLC


Re: [CHECKER] inconsistent NFS stat cache (NFS on ext3, 2.6.11)

2005-03-13 Thread Daniel Jacobowitz
On Sun, Mar 13, 2005 at 07:50:09PM -0500, Trond Myklebust wrote:
> Sorry, but you should _never_ have gotten an ESTALE error if the file
> was not in use when you deleted the old copy of glibc. A fresh call to
> open() will always result in a new lookup of the filehandle.
> What may have happened in the case of the EIO error is that you may have
> raced: i.e. a client starts reading the file while it is being copied
> to.

It is in a separate root filesystem, currently not used by anything on
the target.  It is likely to be in cache, but I can absolutely
guarantee it isn't open.  Hmm, server is x86_64 2.6.7, client is 2.6.10
MIPS.  I should upgrade them and see if that helps.

Unfortunately I haven't found any smaller testcases than installing an
entire root FS.

-- 
Daniel Jacobowitz
CodeSourcery, LLC


Re: [CHECKER] inconsistent NFS stat cache (NFS on ext3, 2.6.11)

2005-03-13 Thread Daniel Jacobowitz
On Sun, Mar 13, 2005 at 03:42:29PM -0500, Trond Myklebust wrote:
> On Sun, 13.03.2005 at 15:04 (-0500), Daniel Jacobowitz wrote:
> 
> > I can't find any documentation about this, but it seems like the same
> > problem that has been causing me headaches lately; when I replace glibc
> > from the server side of an nfsroot, the client has a couple of
> > variously wrong reads before it sees the new files.  If it breaks NFS
> > so badly, why is it the default for the Linux NFS server?
> 
> No, that's a very different issue: you are violating the NFS cache
> consistency rules if you are changing a file that is being held open by
> other machines.
> The correct way to do the above is to use GNU install with the '-b'
> option: that will rename the version of glibc that is in use, and then
> install the new glibc in a different inode.

[closed and/or irrelevant lists removed from CC:]

No, the copy of glibc in question is not in use at the time.  The next
attempt to open it on the client will sometimes generate a "stale NFS
handle" message, or if the open succeeds a read will sometimes return
EIO.  But it sounds like this is a different problem than the original
poster was testing for.
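
For readers unfamiliar with the backup-style replacement quoted above,
here is a rough C sketch of the idea: the old file keeps its inode
under a backup name, and the new contents land in a fresh inode.
replace_with_backup() and the "~" suffix are illustrative assumptions,
not how GNU install is actually implemented.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Replace `path` with `data`.  The existing file is renamed to "<path>~",
 * so anyone holding it open keeps a valid inode, and the new contents are
 * written to a newly created file.  Returns 0 on success, -1 on failure.
 */
int replace_with_backup(const char *path, const char *data)
{
    char backup[4096];
    size_t len = strlen(data);
    int fd;

    snprintf(backup, sizeof backup, "%s~", path);
    if (rename(path, backup) != 0)
        return -1;

    /* O_EXCL: the name must be free now, so this creates a new inode. */
    fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0644);
    if (fd < 0)
        return -1;
    if (write(fd, data, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```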

I'm still curious about the answer to my question above :-)

-- 
Daniel Jacobowitz
CodeSourcery, LLC


Re: More trouble with i386 EFLAGS and ptrace

2005-03-13 Thread Daniel Jacobowitz
On Sun, Mar 13, 2005 at 12:27:58AM -0800, Roland McGrath wrote:
> This patch further cleans up the appearance of TF in eflags when ptrace is
> involved.  With this, PTRACE_SINGLESTEP will not cause TF to appear in
> eflags as seen by PTRACE_GETREGS and the like, when the instruction faulted
> for some reason other than the single-step trap.
> 
> This moves the check added by Dan's patch from setup_sigcontext to
> handle_signal.  This is a cosmetic difference, but I think it makes more
> sense to consolidate all the "reset registers to canonical state" work in
> the same place (i.e. put it with the syscall rollback code), separate from
> the signal handler setup.  The change that matters is moving the similar
> check out of do_debug, where it only covers the case of a single-step trap.
> Instead, it goes into the ptrace_signal_deliver macro, which is called
> before the ptrace stop for whatever signal results from whatever kind of
> fault in that instruction (or asynchronous signal).  With that, the
> handle_signal check is still needed only for the case of PTRACE_SINGLESTEP
> with a handled signal.
> 
> 
> Thanks,
> Roland

Thanks, looks right to me!


-- 
Daniel Jacobowitz
CodeSourcery, LLC


Re: [CHECKER] inconsistent NFS stat cache (NFS on ext3, 2.6.11)

2005-03-13 Thread Daniel Jacobowitz
On Sun, Mar 13, 2005 at 12:04:27AM -0500, Trond Myklebust wrote:
> On Sat, 12.03.2005 at 03:56 (-0800), Junfeng Yang wrote:
> > Hi,
> > 
> > We checked NFS on top of ext3 using FiSC (our file system model checker)
> > and found a case where NFS stat cache can contain inconsistent entries.
> > 
> > Basically, to trigger this inconsistency, just do the following steps:
> > 1. create a file A1, write a few bytes to it, so A1 is 4 words
> > 2. create a hard link A2, pointing to A1
> > 3. stat on A2. A2's size is 4 words
> > 4. truncate A1 to a larger size, write a few bytes at the end. now it's
> > 1031 words.
> > 5. stat on A2. its size is still 4 words, but it should be 1031 words
> > 
> > We have a test case to re-create this warning.  You can download it at
> > http://fisc.stanford.edu/bug16/crash.c.  It includes some sudo commands
> > to mount nfs partitions, which you might want to change according to your
> > local settings.
> > 
> > cat /etc/exports shows:
> > /mnt/sbd0-export  localhost(rw,sync)
> > /mnt/sbd1-export  localhost(rw,sync)
> > 
> > Let me know if you have any problems reproducing the warning. We'd
> > appreciate any confirmations/clarifications.
> > 
> 
> This is a known problem. Turn off the (default - grrr) subtree checking
> export option on the server, and it will all work properly. The subtree
> checking option violates the NFS standards for filehandle generation in
> so many ways, that it isn't even funny.

I can't find any documentation about this, but it seems like the same
problem that has been causing me headaches lately; when I replace glibc
from the server side of an nfsroot, the client has a couple of
variously wrong reads before it sees the new files.  If it breaks NFS
so badly, why is it the default for the Linux NFS server?
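
The five reproduction steps quoted above can be sketched in C roughly
as follows.  This is not the crash.c from the report; the function
name and the `dir` parameter are hypothetical, and sizes are taken in
bytes (an assumption, since the report says "words").  On a local
filesystem the link should always see the new size; pointing `dir` at
an NFS mount is what would exercise the client's stat cache.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns 1 if the hard link reports the new size, 0 if it reports a
 * stale size, -1 if any system call fails.
 */
int link_size_is_consistent(const char *dir)
{
    char a1[4096], a2[4096];
    struct stat st;
    int fd, ok = -1;

    snprintf(a1, sizeof a1, "%s/A1", dir);
    snprintf(a2, sizeof a2, "%s/A2", dir);
    unlink(a1);                                        /* start clean */
    unlink(a2);

    fd = open(a1, O_CREAT | O_EXCL | O_RDWR, 0600);    /* step 1 */
    if (fd < 0)
        return -1;
    if (write(fd, "abcd", 4) != 4)
        goto out;
    if (link(a1, a2) != 0)                             /* step 2 */
        goto out;
    if (stat(a2, &st) != 0 || st.st_size != 4)         /* step 3 */
        goto out;
    if (ftruncate(fd, 1030) != 0)                      /* step 4 */
        goto out;
    if (pwrite(fd, "x", 1, 1030) != 1)
        goto out;
    if (stat(a2, &st) != 0)                            /* step 5 */
        goto out;
    ok = (st.st_size == 1031);
out:
    close(fd);
    unlink(a1);
    unlink(a2);
    return ok;
}
```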

-- 
Daniel Jacobowitz
CodeSourcery, LLC


Re: More trouble with i386 EFLAGS and ptrace

2005-03-08 Thread Daniel Jacobowitz
On Mon, Mar 07, 2005 at 01:29:12PM -0800, Roland McGrath wrote:
> > Is this semantically different from the patch I posted, i.e. is there
> > any case which one of them covers and not the other?
> 
> Yes, the second case that I described when I said there were two cases!
> (Sheesh.)

Calm down, there were already two cases.  I reread your message and
couldn't pick out the answer, or I wouldn't have asked.

>  To repeat, when the process was doing PTRACE_SINGLESTEP and then
> stops on some other signal rather than because of the single-step trap
> (e.g. single-stepping an instruction that faults), ptrace will show TF set
> in its registers.  With my patch, it will show TF clear.

I can reproduce this problem with the patch that Linus committed, so
you should probably update your patch for a current snapshot and nag
him about it.

> > That is an inability to set breakpoints in the vsyscall page.  Andrew
> > told me (last May, wow) that he thought this worked in Fedora, but I
> > haven't seen any signs of the code.  It would certainly be a Good Thing
> > if it is possible!
> 
> Fedora kernels use a normal mapping (with randomized location) for the
> page, rather than the fixed high address in the vanilla kernel.  The
> FIXADDR_USER_START area is globally mapped in a special way not using
> normal vma data structures, and is permanently read-only in all tasks.  
> COW via ptrace works normally for Fedora's flavor, but no writing is ever
> possible to the fixmap page.

Blech.  I assume that there is no way to map a normal VMA over top of
the fixed page, for a particular process?  This makes debugging the
vsyscall DSO a real pain.

-- 
Daniel Jacobowitz
CodeSourcery, LLC


Re: More trouble with i386 EFLAGS and ptrace

2005-03-06 Thread Daniel Jacobowitz
On Sun, Mar 06, 2005 at 07:16:37PM -0800, Roland McGrath wrote:
> > I think mine is more correct; the problem doesn't occur because the
> > debugger cancelled a signal, it occurs because a bogus TF bit was saved
> > to the signal context.  I like keeping solutions close to their
> > problems.  But that's just aesthetic.
> 
> I understand the scenario.  Understanding how it comes about made me
> recognize there is another scenario that is also handled wrong.  
> I didn't say the second scenario was what you are seeing.
> 
> Dan's patch covers the case of PTRACE_SINGLESTEP called to deliver a signal
> that has a handler to run.  That's because there TF is set after the ptrace
> stop, when it's resuming.  This is a "normalize register state" operation.
> I think it would be a little clearer to do this in handle_signal where the
> similar case of tweaking register state to back up a system call is done.
> 
> The patch I posted moves the resetting of TF from the trap handler to
> ptrace_signal_deliver.  This is necessary to ensure that TF is not shown as
> set in the registers retrieved by the debugger when the process stops for
> something other than the single-step trap requested by PTRACE_SINGLESTEP.

Is this semantically different from the patch I posted, i.e. is there
any case which one of them covers and not the other?

> Here is a patch that does both of those things.  This had no effect on any
> of the gdb testsuite cases (for good or ill) aside from sigstep.exp, and:
> 
> $ grep 'FAIL.*sigstep' testsuite/gdb.sum
> KFAIL: gdb.base/sigstep.exp: finish from handleri; leave handler (could not 
> set breakpoint) (PRMS: gdb/1736)
> 
> I don't know what that one is about, but it was KFAIL before the change too.

That is an inability to set breakpoints in the vsyscall page.  Andrew
told me (last May, wow) that he thought this worked in Fedora, but I
haven't seen any signs of the code.  It would certainly be a Good Thing
if it is possible!

> 

-- 
Daniel Jacobowitz
CodeSourcery, LLC


Re: More trouble with i386 EFLAGS and ptrace

2005-03-06 Thread Daniel Jacobowitz
On Sun, Mar 06, 2005 at 01:22:25PM -0800, Roland McGrath wrote:
> > I _think_ your test-case would work right if you just moved that code from
> > the special-case in do_debug(), and moved it to the top of
> > setup_sigcontext() instead. I've not tested it, though, and haven't really 
> > given it any "deep thought". Maybe somebody smarter can say "yeah, that's 
> > obviously the right thing to do" or "no, that won't work because.."
> 
> Indeed, this is what my original changes for this did, before you started
> cleaning things up to be nice to TF users other than PTRACE_SINGLESTEP. 
> 
> I note, btw, that the x86_64 code is still at that prior stage.  So I think
> it doesn't have this new wrinkle, but it also doesn't have the advantages
> of the more recent i386 changes.  Once we're sure about the i386 state, we
> should update the x86_64 code to match.
> 
> I'm not sure what kind of smart this makes me, but I'll say that your plan
> would work and no, it's obviously not the right thing to do. ;-) I haven't
> tested the following, not having tracked down the specific problem case you
> folks are talking about.  But I think this is the right solution.  The
> difference is that when we stop for some signal and report to the debugger,
> the debugger looking at our registers will see TF clear instead of set,
> before it decides whether to continue us with the signal or what to do.
> With the change you suggested, (I think) if the debugger decides to eat the
> signal and resume, we would get a spurious single-step trap after executing
> the next instruction, instead of resuming normally as requested.

Roland, the sigstep.exp test in the GDB testsuite will show this
problem; if your patch monotonically improves GDB HEAD testsuite
results and removes all the FAILs for sigstep.exp, then it's probably
equivalent to the one I just posted for this testcase.

I think mine is more correct; the problem doesn't occur because the
debugger cancelled a signal, it occurs because a bogus TF bit was saved
to the signal context.  I like keeping solutions close to their
problems.  But that's just aesthetic.

-- 
Daniel Jacobowitz
CodeSourcery, LLC


Re: More trouble with i386 EFLAGS and ptrace

2005-03-06 Thread Daniel Jacobowitz
On Sun, Mar 06, 2005 at 12:03:22PM -0800, Linus Torvalds wrote:
> I _think_ your test-case would work right if you just moved that code from
> the special-case in do_debug(), and moved it to the top of
> setup_sigcontext() instead. I've not tested it, though, and haven't really 
> given it any "deep thought". Maybe somebody smarter can say "yeah, that's 
> obviously the right thing to do" or "no, that won't work because.."

I bought it, but the GDB testsuite didn't.  Both copies seem to be
necessary; there's generally no signal handler for SIGTRAP, so moving
it disables the test in the most common case.  I didn't poke at it long
enough to figure out what the failing case was, but it introduced a
different situation which could leave TF enabled. This, however,
worked:

If a debugger set the TF bit, make sure to clear it when creating a
signal context.  Otherwise, TF will be incorrectly restored by
sigreturn.

Signed-off-by: Daniel Jacobowitz <[EMAIL PROTECTED]>

===== arch/i386/kernel/signal.c 1.53 vs edited =====
--- 1.53/arch/i386/kernel/signal.c	2005-01-31 01:20:14 -05:00
+++ edited/arch/i386/kernel/signal.c	2005-03-06 15:36:41 -05:00
@@ -277,6 +277,18 @@
 {
 	int tmp, err = 0;
 
+	/*
+	 * If TF is set due to a debugger (PT_DTRACE), clear the TF
+	 * flag so that register information in the sigcontext is
+	 * correct.
+	 */
+	if (unlikely(regs->eflags & TF_MASK)) {
+		if (likely(current->ptrace & PT_DTRACE)) {
+			current->ptrace &= ~PT_DTRACE;
+			regs->eflags &= ~TF_MASK;
+		}
+	}
+
 	tmp = 0;
 	__asm__("movl %%gs,%0" : "=r"(tmp): "0"(tmp));
 	err |= __put_user(tmp, (unsigned int __user *)&sc->gs);
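
As a userspace illustration of what the added hunk does, here is a
small model of the same flag logic.  struct task_model and
clear_debugger_tf() are hypothetical; TF is bit 8 of EFLAGS, while the
PT_DTRACE value used here is only an assumed placeholder.  It is a
sketch of the decision, not kernel code.

```c
/* TF is bit 8 of EFLAGS; the PT_DTRACE value is an assumption. */
#define TF_MASK   0x00000100UL
#define PT_DTRACE 0x00000002UL

/* Hypothetical stand-in for the task's ptrace flags plus saved registers. */
struct task_model {
    unsigned long eflags;
    unsigned long ptrace;
};

/*
 * If TF is set and the debugger set it (PT_DTRACE), drop both before the
 * registers would be copied into the sigcontext.  A TF bit set by the
 * application itself is left alone.  Returns 1 if TF was cleared.
 */
int clear_debugger_tf(struct task_model *t)
{
    if ((t->eflags & TF_MASK) && (t->ptrace & PT_DTRACE)) {
        t->ptrace &= ~PT_DTRACE;
        t->eflags &= ~TF_MASK;
        return 1;
    }
    return 0;
}
```

Because the saved context no longer carries TF, sigreturn has nothing
bogus to restore.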

-- 
Daniel Jacobowitz
CodeSourcery, LLC


Re: More trouble with i386 EFLAGS and ptrace

2005-03-06 Thread Daniel Jacobowitz
On Sun, Mar 06, 2005 at 02:38:41PM -0500, Daniel Jacobowitz wrote:
> The reason this happens is that when the inferior hits a breakpoint, the
> first thing GDB will do is remove the breakpoint, single-step past it, and
> reinsert it.  So GDB does a PTRACE_SINGLESTEP, and the kernel invokes the
> signal handler (without single-step - good so far).  When the signal handler
> returns, we've lost track of the fact that ptrace set the single-step flag,
> however.  So the single-step completes and returns SIGTRAP to GDB.  GDB is
> expecting a SIGTRAP and reinserts the breakpoint.  Then it resumes the
> inferior, but now the trap flag is set in $eflags.  So, oops, the continue
> acts like a step instead.

Eh, I got the event sequence wrong as usual, but the basic description
is right.

- Original SIGTRAP at breakpoint
- user says "cont"
- GDB tries to singlestep past the breakpoint - PTRACE_SINGLESTEP, no
  signal
- GDB receives SIGALRM at the same PC
- GDB tries to singlestep past the breakpoint - PTRACE_SINGLESTEP,
  SIGALRM
- GDB receives SIGTRAP at the first instruction of the handler
- GDB reinserts the breakpoint at line 18.  This is a "step-resume"
  breakpoint - we were stepping, we were interrupted by a signal.
- GDB issues PTRACE_CONT, no signal
- GDB receives SIGTRAP at the sigreturn location - this is the
  step-resume breakpoint.
- GDB removes that and issues PTRACE_SINGLESTEP, no signal - It
  is trying again to get past the breakpoint location so that it
  can honor the user's "cont" request.
- GDB receives SIGTRAP at the instruction after the breakpoint.
- GDB reinserts the original breakpoint and issues PTRACE_CONT.

All of this is what's supposed to happen.  The executable should be
running free now until it hits the breakpoint again.

- GDB receives an unexpected SIGTRAP at the next instruction (the
  second instruction after the original breakpoint).

If your compiler uses only two instructions for the loop, you might not
see this.  gcc -O0 will use three by default.  Just stick something
else in the loop.

> What to do?  We need to know when we restore the trap bit in sigreturn
> whether it was set by ptrace or by the application (possibly including by
> the signal handler).

If I'm following this right, then the saved value of eflags in the
signal handler should not contain the trap bit at this point.  It does,
though.  It's hard to see this in GDB, because the CFI does not express
%eflags, so "print $eflags" won't track up the stack.  I don't think
there's a handy dwarf register number for it at the moment.  But you
can print out the struct sigcontext by hand once you locate it on the
stack.

-- 
Daniel Jacobowitz
CodeSourcery, LLC


More trouble with i386 EFLAGS and ptrace

2005-03-06 Thread Daniel Jacobowitz
It looks like the changes to preserve eflags when single-stepping don't work
right with signals.  Take this test case:


#include <signal.h>
#include <unistd.h>

volatile int done;

void handler (int sig)
{
  done = 1;
}

int main()
{
  while (1)
    {
      done = 0;
      signal (SIGALRM, handler);
      alarm (1);
      while (!done);
    }
}


And this GDB session:

(gdb) b 18
Breakpoint 1 at 0x804840d: file test.c, line 18.
(gdb) r
Starting program: /home/drow/eflags/test

Breakpoint 1, main () at test.c:18
18	      while (!done);
(gdb) p/x $eflags
$1 = 0x200217
(gdb) c
Continuing.

Program received signal SIGTRAP, Trace/breakpoint trap.
0x08048414 in main () at test.c:18
18	      while (!done);
(gdb) p/x $eflags
$2 = 0x200302

There's an implied delay before the "c" which is long enough for the signal
handler to become pending.

The reason this happens is that when the inferior hits a breakpoint, the
first thing GDB will do is remove the breakpoint, single-step past it, and
reinsert it.  So GDB does a PTRACE_SINGLESTEP, and the kernel invokes the
signal handler (without single-step - good so far).  When the signal handler
returns, we've lost track of the fact that ptrace set the single-step flag,
however.  So the single-step completes and returns SIGTRAP to GDB.  GDB is
expecting a SIGTRAP and reinserts the breakpoint.  Then it resumes the
inferior, but now the trap flag is set in $eflags.  So, oops, the continue
acts like a step instead.
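
To make the two $eflags values in the session easier to compare, here is a
small sketch that tests the trap flag.  It assumes only the architectural
i386 bit layout (TF is bit 8, 0x100; IF is bit 9, 0x200).  The macro and
function names are invented for this illustration and are not the kernel's
or GDB's.

```c
#include <stdint.h>

/* Architectural i386 EFLAGS bits.  The names are local to this sketch.  */
#define SKETCH_EFLAGS_TF 0x00000100u /* trap (single-step) flag, bit 8 */
#define SKETCH_EFLAGS_IF 0x00000200u /* interrupt enable flag, bit 9 */

/* Return nonzero if the trap flag is set in an EFLAGS value.  */
static int
sketch_tf_is_set (uint32_t eflags)
{
  return (eflags & SKETCH_EFLAGS_TF) != 0;
}
```

Applied to the session, 0x200217 (printed before the "c") has TF clear,
while 0x200302 (printed after the unexpected SIGTRAP) has TF set.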

What to do?  We need to know when we restore the trap bit in sigreturn
whether it was set by ptrace or by the application (possibly including by
the signal handler).

Andrew, serious kudos for GDB's sigstep.exp, which uncovered this problem
(through a much more complicated test - I may add the smaller one).

-- 
Daniel Jacobowitz
CodeSourcery, LLC


Re: More trouble with i386 EFLAGS and ptrace

2005-03-06 Thread Daniel Jacobowitz
On Sun, Mar 06, 2005 at 12:03:22PM -0800, Linus Torvalds wrote:
> I _think_ your test-case would work right if you just moved that code from
> the special-case in do_debug(), and moved it to the top of
> setup_sigcontext() instead. I've not tested it, though, and haven't really
> given it any deep thought. Maybe somebody smarter can say "yeah, that's
> obviously the right thing to do" or "no, that won't work because.."

I bought it, but the GDB testsuite didn't.  Both copies seem to be
necessary; there's generally no signal handler for SIGTRAP, so moving
it disables the test in the most common case.  I didn't poke at it long
enough to figure out what the failing case was, but it introduced a
different situation which could leave TF enabled. This, however,
worked:

If a debugger set the TF bit, make sure to clear it when creating a
signal context.  Otherwise, TF will be incorrectly restored by
sigreturn.

Signed-off-by: Daniel Jacobowitz <[EMAIL PROTECTED]>

===== arch/i386/kernel/signal.c 1.53 vs edited =====
--- 1.53/arch/i386/kernel/signal.c	2005-01-31 01:20:14 -05:00
+++ edited/arch/i386/kernel/signal.c	2005-03-06 15:36:41 -05:00
@@ -277,6 +277,18 @@
 {
 	int tmp, err = 0;
 
+	/*
+	 * If TF is set due to a debugger (PT_DTRACE), clear the TF
+	 * flag so that register information in the sigcontext is
+	 * correct.
+	 */
+	if (unlikely(regs->eflags & TF_MASK)) {
+		if (likely(current->ptrace & PT_DTRACE)) {
+			current->ptrace &= ~PT_DTRACE;
+			regs->eflags &= ~TF_MASK;
+		}
+	}
+
 	tmp = 0;
 	__asm__("movl %%gs,%0" : "=r"(tmp): "0"(tmp));
 	err |= __put_user(tmp, (unsigned int __user *)&sc->gs);
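
For readers who want to see the effect without a kernel, the following is a
simplified user-space model of the idea, not kernel code.  It assumes that
sigreturn restores the saved flags verbatim and that the debugger's TF is
marked by a PT_DTRACE-style bit.  All names and the PT_DTRACE value are
placeholders for this sketch.

```c
#include <stdint.h>

#define SKETCH_TF_MASK   0x00000100u /* i386 trap flag, bit 8 */
#define SKETCH_PT_DTRACE 0x00000002u /* placeholder "TF set by debugger" bit */

struct sketch_task
{
  uint32_t eflags;
  uint32_t ptrace;
};

struct sketch_sigcontext
{
  uint32_t eflags;
};

/* Save the flags for the handler.  With WITH_FIX set, first drop a TF
   that was set on behalf of the debugger, as the patch does.  */
static void
sketch_setup_sigcontext (struct sketch_task *t, struct sketch_sigcontext *sc,
                         int with_fix)
{
  if (with_fix && (t->eflags & SKETCH_TF_MASK)
      && (t->ptrace & SKETCH_PT_DTRACE))
    {
      t->ptrace &= ~SKETCH_PT_DTRACE;
      t->eflags &= ~SKETCH_TF_MASK;
    }
  sc->eflags = t->eflags;
}

/* Model of sigreturn: the saved flags are restored verbatim.  */
static void
sketch_sigreturn (struct sketch_task *t, const struct sketch_sigcontext *sc)
{
  t->eflags = sc->eflags;
}

/* Single-step into a signal handler, let the handler run with TF clear,
   then return from it.  Returns the flags the task resumes with.  */
static uint32_t
sketch_step_into_handler (int with_fix)
{
  struct sketch_task t = { 0x00000202u, 0 };
  struct sketch_sigcontext sc;

  /* The debugger requests a single step: TF is set and marked as its own.  */
  t.eflags |= SKETCH_TF_MASK;
  t.ptrace |= SKETCH_PT_DTRACE;

  sketch_setup_sigcontext (&t, &sc, with_fix);
  t.eflags &= ~SKETCH_TF_MASK; /* the handler itself runs without TF */
  sketch_sigreturn (&t, &sc);
  return t.eflags;
}
```

In this model, sketch_step_into_handler (0) comes back with TF still set
(the stray single-step), and sketch_step_into_handler (1) comes back with
TF clear.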

-- 
Daniel Jacobowitz
CodeSourcery, LLC


Re: More trouble with i386 EFLAGS and ptrace

2005-03-06 Thread Daniel Jacobowitz
On Sun, Mar 06, 2005 at 01:22:25PM -0800, Roland McGrath wrote:
> > I _think_ your test-case would work right if you just moved that code from
> > the special-case in do_debug(), and moved it to the top of
> > setup_sigcontext() instead. I've not tested it, though, and haven't really
> > given it any deep thought. Maybe somebody smarter can say "yeah, that's
> > obviously the right thing to do" or "no, that won't work because.."
>
> Indeed, this is what my original changes for this did, before you started
> cleaning things up to be nice to TF users other than PTRACE_SINGLESTEP.
>
> I note, btw, that the x86_64 code is still at that prior stage.  So I think
> it doesn't have this new wrinkle, but it also doesn't have the advantages
> of the more recent i386 changes.  Once we're sure about the i386 state, we
> should update the x86_64 code to match.
>
> I'm not sure what kind of smart this makes me, but I'll say that your plan
> would work and no, it's obviously not the right thing to do. ;-) I haven't
> tested the following, not having tracked down the specific problem case you
> folks are talking about.  But I think this is the right solution.  The
> difference is that when we stop for some signal and report to the debugger,
> the debugger looking at our registers will see TF clear instead of set,
> before it decides whether to continue us with the signal or what to do.
> With the change you suggested, (I think) if the debugger decides to eat the
> signal and resume, we would get a spurious single-step trap after executing
> the next instruction, instead of resuming normally as requested.

Roland, the sigstep.exp test in the GDB testsuite will show this
problem; if your patch monotonically improves GDB HEAD testsuite
results and removes all the FAILs for sigstep.exp, then it's probably
equivalent to the one I just posted for this testcase.

I think mine is more correct; the problem doesn't occur because the
debugger cancelled a signal, it occurs because a bogus TF bit was saved
to the signal context.  I like keeping solutions close to their
problems.  But that's just aesthetic.

-- 
Daniel Jacobowitz
CodeSourcery, LLC


Re: More trouble with i386 EFLAGS and ptrace

2005-03-06 Thread Daniel Jacobowitz
On Sun, Mar 06, 2005 at 07:16:37PM -0800, Roland McGrath wrote:
> > I think mine is more correct; the problem doesn't occur because the
> > debugger cancelled a signal, it occurs because a bogus TF bit was saved
> > to the signal context.  I like keeping solutions close to their
> > problems.  But that's just aesthetic.
>
> I understand the scenario.  Understanding how it comes about made me
> recognize there is another scenario that is also handled wrong.
> I didn't say the second scenario was what you are seeing.
>
> Dan's patch covers the case of PTRACE_SINGLESTEP called to deliver a signal
> that has a handler to run.  That's because there TF is set after the ptrace
> stop, when it's resuming.  This is a "normalize register state" operation.
> I think it would be a little clearer to do this in handle_signal where the
> similar case of tweaking register state to back up a system call is done.
>
> The patch I posted moves the resetting of TF from the trap handler to
> ptrace_signal_deliver.  This is necessary to ensure that TF is not shown as
> set in the registers retrieved by the debugger when the process stops for
> something other than the single-step trap requested by PTRACE_SINGLESTEP.

Is this semantically different from the patch I posted, i.e. is there
any case which one of them covers and not the other?

> Here is a patch that does both of those things.  This had no effect on any
> of the gdb testsuite cases (for good or ill) aside from sigstep.exp, and:
>
> $ grep 'FAIL.*sigstep' testsuite/gdb.sum
> KFAIL: gdb.base/sigstep.exp: finish from handleri; leave handler (could not
> set breakpoint) (PRMS: gdb/1736)
>
> I don't know what that one is about, but it was KFAIL before the change too.

That is an inability to set breakpoints in the vsyscall page.  Andrew
told me (last May, wow) that he thought this worked in Fedora, but I
haven't seen any signs of the code.  It would certainly be a Good Thing
if it is possible!

-- 
Daniel Jacobowitz
CodeSourcery, LLC


Re: ARM undefined symbols. Again.

2005-02-25 Thread Daniel Jacobowitz
On Fri, Feb 25, 2005 at 08:23:49PM +0000, Russell King wrote:
> On Fri, Feb 25, 2005 at 11:59:01AM -0800, Linus Torvalds wrote:
> > On Fri, 25 Feb 2005, Russell King wrote:
> > > So, what's happening about this?
> > 
> > Btw, is there any real reason why the ARM _tools_ can't just be fixed? I 
> > don't see why this isn't a tools bug?
> 
> It is a tools bug.  But the issue is that *all* versions of binutils
> currently available which are kernel-capable (since the inclusion of
> the kbuild .incbin requirement on binutils) have this bug, with the
> exception of maybe CVS versions.
> 
> We can't say "you must use the current CVS binutils to build the
> kernel" because that's not a sane toolchain base to build products
> on.
> 
> I've been wanting to see a version of binutils released pretty damn
> quick so I can say "kernel only builds with latest toolchain" but
> I suspect even that's going to be seen as being unreasonable.

Not sure who you asked, but since I run the binutils releases...

I am fairly positive that this bug has been fixed in the binutils CVS:

2004-07-02  Nick Clifton  <[EMAIL PROTECTED]>

	* config/tc-arm.c (md_apply_fix3:BFD_RELOC_ARM_IMMEDIATE): Do not
	allow values which have come from undefined symbols.
	Always consider this fixup to have been processed as a reloc
	cannot be generated for it.

I know several ARM kernel developers who are using tools with this
patch applied already.  Also, I anticipate the release of binutils 2.16
including the fix in about a month.

> And yes, the toolchain peoples point of view is "fix the kernel".

Huh?  Obviously the kernel isn't broken, unless you're talking about
the kallsyms checks now.

-- 
Daniel Jacobowitz
CodeSourcery, LLC


Re: [PATCH] Consolidate compat_sys_waitid

2005-02-15 Thread Daniel Jacobowitz
On Tue, Feb 15, 2005 at 02:01:49PM +1100, Stephen Rothwell wrote:
> Hi all,
> 
> This patch does:
>   - consolidate the three implementations of compat_sys_waitid
> (some were called sys32_waitid).
>   - adds sys_waitid syscall to ppc
>   - adds sys_waitid and compat_sys_waitid syscalls to ppc64
> 
> Parisc seemed to assume the existence of compat_sys_waitid.  The MIPS
> syscall tables have me confused and may need updating.  I have arbitrarily
> chosen the next available syscall number on ppc and ppc64, I hope this is
> correct.

I posted a (not-consolidated) sys32_waitid to the MIPS list on Sunday.
The syscall tables should confuse you :-)  N32 needs to use compat
versions of most structures, but not siginfo_t.  O32 needs to use
compat versions of everything.  Your new version can replace the
sys32_waitid from my patch, but not sysn32_waitid.

Ralf, I'll let you sort it out :-)

-- 
Daniel Jacobowitz
CodeSourcery, LLC


Blocking behavior changed for pipes in 2.6.11-rc3

2005-02-11 Thread Daniel Jacobowitz
This program [cribbed loosely from tst-cancel17.c in glibc] has changed
behavior with the recent pipe changes.  It used to block, which makes sense.
It gets the maximum buffer size for the pipe (or a page if that's larger),
and writes that many bytes plus two to it.  It reads one back.  The write
"shouldn't" have room to finish.

Checking the POSIX language for _PC_PIPE_BUF I think this is OK - it doesn't
say that no more bytes than that can be written at once, just that this is
the maximum which are guaranteed to be written atomically.  So I'm guessing
this change is a feature, not a bug.  Right?
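
As a rough way to see how much a pipe accepts before a writer would block,
something like the following sketch can be used.  The helper name is
hypothetical.  It switches the write end to O_NONBLOCK and writes 512-byte
chunks (the POSIX minimum for PIPE_BUF) until the write is refused, so the
result is specific to the kernel it runs on.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: count how many bytes a fresh pipe accepts before a
   non-blocking write is refused.  Returns -1 if the pipe cannot be set up.  */
static long
sketch_pipe_fill_count (void)
{
  int fds[2];
  char chunk[512]; /* POSIX guarantees PIPE_BUF is at least 512 */
  long total = 0;

  if (pipe (fds) != 0)
    return -1;

  int flags = fcntl (fds[1], F_GETFL);
  if (flags == -1 || fcntl (fds[1], F_SETFL, flags | O_NONBLOCK) == -1)
    {
      close (fds[0]);
      close (fds[1]);
      return -1;
    }

  memset (chunk, 0, sizeof chunk);
  for (;;)
    {
      ssize_t n = write (fds[1], chunk, sizeof chunk);
      if (n > 0)
        total += n;
      else if (n == -1 && errno == EINTR)
        continue; /* interrupted before writing anything; retry */
      else
        break; /* EAGAIN means the pipe is full; stop on other errors too */
    }

  close (fds[0]);
  close (fds[1]);
  return total;
}
```

If the value returned is larger than the length the test program writes
(the larger of _PC_PIPE_BUF and the page size, plus two), that write has
room to complete without blocking.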

[snip]

#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

void *
tf (void *fd)
{
  int *fds = fd;
  char mem[1];
  read (fds[0], mem, 1);
  return NULL;
}

int
main (void)
{
  pthread_t th;
  int len;
  int fds[2];

  if (pipe (fds) != 0)
    {
      puts ("pipe failed");
      return 1;
    }

  size_t len2 = fpathconf (fds[1], _PC_PIPE_BUF);
  size_t page_size = sysconf (_SC_PAGESIZE);
  len2 = (len2 < page_size ? page_size : len2) + 1 + 1;
  char *mem2 = malloc (len2);

  pthread_create (&th, NULL, tf, fds);
  write (fds[1], mem2, len2);

  return 0;
}

[/snip]

-- 
Daniel Jacobowitz
CodeSourcery, LLC


Re: 2.6.11-rc3: Kylix application no longer works?

2005-02-09 Thread Daniel Jacobowitz
On Tue, Feb 08, 2005 at 06:10:18PM -0800, Andrew Morton wrote:
> We could just remove the printk and stick a comment over it.  If the
> application later tries to access the not-there pages then it'll just
> fault.
> 
> However I worry if there is some way in which we can leave unzeroed memory
> accessible to the application, although it's hard to see how that could
> happen.
> 
> Daniel, Pavel cruelly chopped you off the Cc when replying.  What's your
> diagnosis on the below?

It's asking for a lot of unwritable zeroed space.  See this:

>   LOAD   0x00 0x08048000 0x08048000 0xb7354 0x1b7354 R E 0x1000
>   LOAD   0x0b7354 0x08200354 0x08200354 0x1e3e4 0x1f648 RW  0x1000

0xb7354 is the size to map from the file; 0x1b7354 is the size to map
in memory.  We're supposed to zero-fill the rest.  Now that I think
about it I can see why this is a problem - the kernel probably assumes
that any segment with MemSiz > FileSiz will be writable.  Certainly
it's a bit weird for the app to request unwritable zeroed pages.

clear_user's probably not the right way to provide the extra zeroing.
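
The arithmetic behind that reading of the program headers can be sketched
as follows.  The struct and function names are invented for illustration.
Only the flag value 0x2 for a writable segment (PF_W) comes from the ELF
specification.

```c
#include <stdint.h>

#define SKETCH_PF_W 0x2u /* ELF segment flag: writable */

/* The fields of a PT_LOAD program header that matter here.  */
struct sketch_load_segment
{
  uint32_t vaddr;
  uint32_t filesz;
  uint32_t memsz;
  uint32_t flags;
};

/* Bytes past the file-backed part that the loader must provide as zeroes.  */
static uint32_t
sketch_zero_fill_bytes (const struct sketch_load_segment *seg)
{
  return seg->memsz > seg->filesz ? seg->memsz - seg->filesz : 0;
}

/* Nonzero if the segment needs zero fill but is not mapped writable,
   which is the unusual case described above.  */
static int
sketch_needs_unwritable_zero_fill (const struct sketch_load_segment *seg)
{
  return sketch_zero_fill_bytes (seg) != 0
         && (seg->flags & SKETCH_PF_W) == 0;
}
```

For the two LOAD lines quoted above, the first (R E) segment needs
0x1b7354 - 0xb7354 = 0x100000 bytes of zero fill and is not writable; the
second (RW) segment needs 0x1f648 - 0x1e3e4 = 0x1264 bytes and is.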

-- 
Daniel Jacobowitz
CodeSourcery, LLC


Re: 2.6.11-rc3: Kylix application no longer works?

2005-02-08 Thread Daniel Jacobowitz
On Tue, Feb 08, 2005 at 06:51:06PM +0100, Pavel Machek wrote:
> Hi!
> 
> > I wonder if reverting the patch will restore the old behaviour?
> 
> This seems to be minimal fix to get Kylix application back to the
> working state... Maybe it is good idea for 2.6.11?

Why does clearing the BSS fail?  Are the program headers bogus?
(readelf -l).

-- 
Daniel Jacobowitz
CodeSourcery, LLC


Re: kernel CVS troubles with cvsps

2005-01-25 Thread Daniel Jacobowitz
On Tue, Jan 25, 2005 at 05:42:03PM +0100, Andrea Arcangeli wrote:
> Any help is appreciated. I'm just starting to look more seriously into
> this since I've some tools that depend on cvsps to work and kernel
> CVS is the only fully coherent linearized source of info in open format
> (rest is either a proprietary format or unusable because out of
> synchrony because not linearized).  Until now I hoped that by waiting it
> would automatically fixup, but it didn't yet ;).

FYI, I haven't tried using cvsps on the kernel CVS, but I used to use it on
GCC - and it fell down like this on a constant basis.

You might want to take a look at 'xcvs', by Jun Sun.  It's much more
reliable and does everything I used to use cvsps for.  And generally
faster too.

-- 
Daniel Jacobowitz


Re: forestalling GNU incompatibility - proposal for binary relative dynamic linking

2005-01-24 Thread Daniel Jacobowitz
On Mon, Jan 24, 2005 at 03:53:11PM -0800, Edward Peschko wrote:
> On Mon, Jan 24, 2005 at 03:38:49PM -0800, Richard Henderson wrote:
> > On Mon, Jan 24, 2005 at 03:16:36PM -0800, Edward Peschko wrote:
> > > cool.. any chance for some syntactic sugar so me (and other 
> > > users/vendors) wouldn't need to change any of their build scripts 
> > > and compilation processes?
> > 
> > Uh, like what?  That's about as simple as you can get.
> > 
> > 
> > r~
> 
> I don't understand. 
> 
> Which is simpler, changing an environmental variable, or adding extra 
> CFLAGS to every single compile and recompiling?
> 
> In addition, in your --rpath example, the relative pathing is hardcoded
> into the executable, whereas with "*" you could modify the runtime behavior
> of the executable at runtime. I suppose you could change this with chrpath,
> but why bother? What if you want to test out two versions of relative
> libraries side by side? 

You might want to take a look at Richard's suggestion again.  The
string '$ORIGIN' gets hardcoded into the binary and handled by the
dynamic linker.

But really, RPATH is a good solution to almost no problems.

-- 
Daniel Jacobowitz


Re: seccomp for 2.6.11-rc1-bk8

2005-01-24 Thread Daniel Jacobowitz
On Sun, Jan 23, 2005 at 07:34:24AM +0000, David Wagner wrote:
> Chris Wright wrote:
> >* David Wagner ([EMAIL PROTECTED]) wrote:
> >> There is a simple tweak to ptrace which fixes that: one could add an
> >> API to specify a set of syscalls that ptrace should not trap on.  To get
> >> seccomp-like semantics, the user program could specify {read,write}, but
> >> if the user program ever wants to change its policy, it could change that
> >> set.  Solaris /proc (which is what is used for tracing) has this feature.
> >> I coded up such an extension to ptrace semantics a long time ago, and
> >> it seemed to work fine for me, though of course I am not a ptrace expert.
> >
> >Hmm, yeah, that'd be nice.  That only leaves the issue of tracer dying
> >(say from that crazy oom killer ;-).
> 
> Yes, I also implemented a ptrace option which causes the child to be
> slaughtered if the parent dies for any reason.  I could dig up the code,
> but I don't recall it being very hard.  This was ages ago (a 2.0.x kernel)
> and I have no idea what might have changed.  Also, I am definitely not a
> guru on kernel internals, so it is always possible I missed something.
> But, at least on the surface this doesn't seem hard to implement.

Maybe it's time to resubmit both of these.  OTOH, maybe it's time to do
something more drastic to ptrace to untangle it from signals...

> A third thing I implemented was an option which would cause ptrace() to be
> inherited across forks.  The way that strace does this (last I looked)
> is an unreliable abomination: when it sees a request to call fork(), it
> sets a breakpoint at the next instruction after the fork() by re-writing
> the code of the parent, then when that breakpoint triggers it attaches to
> the child, restores the parent's code, and lets them continue executing.
> This is icky, and I have little confidence in its security to prevent
> children from escaping a ptrace() jail, so I added a feature to ptrace()
> that remedies the situation.

This has since been done in 2.5.x; see PTRACE_EVENT_FORK.  GDB even
uses it nowadays.  I'm not sure if strace does.
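
For readers who have not used it, here is a rough sketch of how a tracer might request and recognize fork events on Linux. It assumes the tracee is already attached and stopped, and that the glibc <sys/ptrace.h> definitions are available. The helper names trace_forks() and is_fork_event() are invented for this example, and error handling is left to the caller.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Ask the kernel to attach automatically to children the tracee creates
 * with fork().  The tracee must already be traced and stopped.
 * Returns -1 and sets errno on failure.
 */
long trace_forks(pid_t pid)
{
    return ptrace(PTRACE_SETOPTIONS, pid, NULL,
                  (void *)(long)PTRACE_O_TRACEFORK);
}

/*
 * Nonzero if a status returned by waitpid() reports a fork event:
 * a SIGTRAP stop with PTRACE_EVENT_FORK in the next-higher byte.
 */
int is_fork_event(int status)
{
    return WIFSTOPPED(status) &&
           (status >> 8) == (SIGTRAP | (PTRACE_EVENT_FORK << 8));
}
```

After is_fork_event() fires, the tracer can read the new child's pid with PTRACE_GETEVENTMSG. The child starts out already traced, so no breakpoint has to be patched into the parent.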

-- 
Daniel Jacobowitz


Re: New topic (PowerPC Linux PCI HELL)

2000-09-13 Thread Daniel Jacobowitz

On Wed, Sep 13, 2000 at 05:29:58PM -0700, Andre Hedrick wrote:
> 
> Okay who can teach me how to force hooks and ram this down the PPC
> 
> pci_write_config_word(dev, PCI_COMMAND, 0x05);
> 
> I have all the addresses registered.
> My new PPC G3 (7600/132) toy is not allowing IO's on PCI cards to come
> alive.  Thus I get some of the most beautiful lockups ever.
> I suspect that this needs to be handled down in the arch.
> 
> ./linux/arch/ppc/kernel/{chrp_pci.c|mbx_pci.c|pmac_pci.c|prep_pci.c}
> 
> Basically I can not get the IO's active, regardless of BIOS on the card.
> Yes this is the old trick that used to work of making ix86 cards run in
> non ix86-pci slots.
> 
> Here is the fun part, I have a native mac/ppc Ultra-66 card that is fine
> under Mac OS, but the IO's are not enabled in linux and it crashes like a big
> dog also.

I'm going to bet you need to look at Michel Lanners' (did I spell that
right this time?) PCI patches.  For instance, I've always needed this
hideous patch on my 7300/200 to get a Promise Ultra66 card to work:

diff -ur merging-bk/drivers/block/ide-pci.c work-bk/drivers/block/ide-pci.c
--- merging-bk/drivers/block/ide-pci.c  Tue Apr  4 22:19:16 2000
+++ work-bk/drivers/block/ide-pci.c Thu Mar  9 15:33:25 2000
@@ -468,6 +468,15 @@
 		printk("%s: error accessing PCI regs\n", d->name);
 		return;
 	}
+#ifdef __powerpc__
+	if (!(pcicmd & PCI_COMMAND_IO)) {	/* is device disabled? */
+		pci_write_config_word(dev, PCI_COMMAND, pcicmd | PCI_COMMAND_IO);
+		if (pci_read_config_word(dev, PCI_COMMAND, &pcicmd)) {
+			printk("%s: error accessing PCI regs\n", d->name);
+			return;
+		}
+	}
+#endif
 	if (!(pcicmd & PCI_COMMAND_IO)) {	/* is device disabled? */
 		/*
 		 * PnP BIOS was *supposed* to have set this device up for us,
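
The difference between this patch and writing a bare 0x05 may be easier to see in isolation. The patch ORs the I/O enable bit into whatever the command register already holds, while 0x05 (I/O plus bus master) overwrites the register and so clears memory decoding if it was on. The sketch below is user-space arithmetic only. The bit values are the standard PCI command register bits as defined in the kernel headers, and enable_io() is a hypothetical helper made up for this note.

```c
#include <stdint.h>

/* Standard PCI command register bits (config space offset 0x04). */
#define PCI_COMMAND_IO      0x1   /* respond to I/O space accesses */
#define PCI_COMMAND_MEMORY  0x2   /* respond to memory space accesses */
#define PCI_COMMAND_MASTER  0x4   /* allow the device to master the bus */

/*
 * Hypothetical helper: return the command word with I/O decoding
 * switched on and every other bit left as it was.
 */
uint16_t enable_io(uint16_t pcicmd)
{
    return (uint16_t)(pcicmd | PCI_COMMAND_IO);
}
```

For a device that came up with memory decoding and bus mastering enabled (0x0006), enable_io() gives 0x0007, whereas writing 0x05 would leave memory decoding off.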


Dan

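/--------------------------------\  /--------------------------------\
|       Daniel Jacobowitz        |__|        SCS Class of 2002       |
|   Debian GNU/Linux Developer    __    Carnegie Mellon University   |
|       [EMAIL PROTECTED]        |  |       [EMAIL PROTECTED]        |
\--------------------------------/  \--------------------------------/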