Re: Possible bug in bc(1)

2016-02-20 Thread Wolfgang Petzold

Hi,

actually, if you "ignore \ within numbers", your
input reads 2*12*1, doesn't it?
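
A minimal sketch in C of that reading, assuming the lexer simply drops every backslash that is immediately followed by a newline. The helper name join_continued is hypothetical and is not taken from the bc sources:

```c
#include <stddef.h>

/*
 * Copy src to dst, dropping every backslash-newline pair.
 * Hypothetical helper for illustration only; dst must hold at
 * least strlen(src) + 1 bytes.
 */
static void
join_continued(const char *src, char *dst)
{
	size_t i = 0, j = 0;

	while (src[i] != '\0') {
		if (src[i] == '\\' && src[i + 1] == '\n') {
			i += 2;		/* skip the continuation */
			continue;
		}
		dst[j++] = src[i++];
	}
	dst[j] = '\0';
}
```

Fed the two input lines from the report, "2*1\\\n2*1\n", it yields the single line "2*12*1\n", and 2*12*1 is 24.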

W.

Am 20.02.2016 um 12:07 schrieb Ruslan Makhmatkhanov:
> Hello,
> 
> I'm getting a strange result from something that looks like valid input:
> 
> [rm@smsh-zfs ~]> bc
> 2*1\
> 2*1
> 24
> 
> I'd expect the output to look like this:
> 2*1\
> 2
> 2*1
> 2
> 
> What the bc(1) man page says about the backslash is:
> "The sequence ‘\’ is ignored within numbers."
> 
> So it looks like it isn't actually ignored, or am I missing something?
> 
> Thanks for clarification.
> 

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Re: Possible bug in or around posix_fadvise after r292326

2016-01-05 Thread Konstantin Belousov

On Mon, Jan 04, 2016 at 10:05:21PM -0800, Benno Rice wrote:
> Hi Konstantin,
> 
> I recently updated my dev box to r292962. After doing this I attempted to set 
> up PostgreSQL 9.4. When I ran initdb the last phase hung. Using procstat -kk 
> I found it appeared to be stuck in a loop inside a posix_fadvise syscall. I 
> could not ^C or ^Z the initdb process. I could kill it but a subsequent 
> attempt to rm -rf the /usr/local/pgsql/data directory also got stuck and was 
> unkillable by any means. Rebooting allowed me to remove the directory but the 
> initdb process still hung when I re-ran it.
> 
> I tried PostgreSQL 9.3 with similar results.
> 
> Looking at the source code for initdb I found that it calls posix_fadvise 
> like so[1]:
> 
>  /*
>   * We do what pg_flush_data() would do in the backend: prefer to use
>   * sync_file_range, but fall back to posix_fadvise.  We ignore errors
>   * because this is only a hint.
>   */
>  #if defined(HAVE_SYNC_FILE_RANGE)
>  (void) sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE);
>  #elif defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_DONTNEED)
>  (void) posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
>  #else
>  #error PG_FLUSH_DATA_WORKS should not have been defined
>  #endif
> 
> Looking for recent commits involving POSIX_FADV_DONTNEED I found r292326:
> 
> https://svnweb.freebsd.org/changeset/base/292326 
> 
> 
> Backing this revision out allowed the initdb process to complete.
> 
> My current theory is that somehow we're getting ENOLCK or EAGAIN from the 
> BUF_TIMELOCK call in bnoreuselist:
> 
> https://svnweb.freebsd.org/base/head/sys/kern/vfs_subr.c?view=annotate#l1676 
> 
> 
> Leading to an infinite loop in vop_stdadvise:
> 
> https://svnweb.freebsd.org/base/head/sys/kern/vfs_default.c?annotate=292373#l1083
>  
> 
> 
> I haven't managed to dig any deeper than that yet.
> 
> Is there any other information I could give you to help narrow this down?

I do not see this issue locally.

When the hang in initdb occurs, what is the state of the initdb thread
which performs advise()?  Is it in the "brlsfl" sleep, or is the thread
running?

If the buffer lock is not available, and this is the cause of the
ENOLCK/EAGAIN, then the question is who owns the corresponding buffer lock.
You could get an overview of the system state with the 'ps' command in ddb,
and 'show alllocks' would list the owner, unless the buffer was async.

Also, I do not quite understand the behaviour of SIGINT/SIGKILL.  Could
it be that the process was not killed by SIGKILL either?  That would be
consistent with the vnode lock still being owned and preventing accesses.

Re: Possible bug in or around posix_fadvise after r292326

2016-01-05 Thread Konstantin Belousov

On Tue, Jan 05, 2016 at 01:07:40PM +0200, Konstantin Belousov wrote:
> On Mon, Jan 04, 2016 at 10:05:21PM -0800, Benno Rice wrote:
> > Hi Konstantin,
> > 
> > I recently updated my dev box to r292962. After doing this I attempted to 
> > set up PostgreSQL 9.4. When I ran initdb the last phase hung. Using 
> > procstat -kk I found it appeared to be stuck in a loop inside a 
> > posix_fadvise syscall. I could not ^C or ^Z the initdb process. I could 
> > kill it but a subsequent attempt to rm -rf the /usr/local/pgsql/data 
> > directory also got stuck and was unkillable by any means. Rebooting allowed 
> > me to remove the directory but the initdb process still hung when I re-ran 
> > it.
> > 
> > I tried PostgreSQL 9.3 with similar results.
> > 
> > Looking at the source code for initdb I found that it calls posix_fadvise 
> > like so[1]:
> > 
> >  /*
> >   * We do what pg_flush_data() would do in the backend: prefer to use
> >   * sync_file_range, but fall back to posix_fadvise.  We ignore errors
> >   * because this is only a hint.
> >   */
> >  #if defined(HAVE_SYNC_FILE_RANGE)
> >  (void) sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE);
> >  #elif defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_DONTNEED)
> >  (void) posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
> >  #else
> >  #error PG_FLUSH_DATA_WORKS should not have been defined
> >  #endif
> > 
> > Looking for recent commits involving POSIX_FADV_DONTNEED I found r292326:
> > 
> > https://svnweb.freebsd.org/changeset/base/292326 
> > 
> > 
> > Backing this revision out allowed the initdb process to complete.
> > 
> > My current theory is that somehow we're getting ENOLCK or EAGAIN from 
> > the BUF_TIMELOCK call in bnoreuselist:
> > 
> > https://svnweb.freebsd.org/base/head/sys/kern/vfs_subr.c?view=annotate#l1676
> >  
> > 
> > 
> > Leading to an infinite loop in vop_stdadvise:
> > 
> > https://svnweb.freebsd.org/base/head/sys/kern/vfs_default.c?annotate=292373#l1083
> >  
> > 
> > 
> > I haven't managed to dig any deeper than that yet.
> > 
> > Is there any other information I could give you to help narrow this down?
> 
> I do not see this issue locally.
> 
> When the hang in initdb occurs, what is the state of the initdb thread
> which performs advise()?  Is it in the "brlsfl" sleep, or is the thread
> running?
> 
> If the buffer lock is not available, and this is the cause of the
> ENOLCK/EAGAIN, then the question is who owns the corresponding buffer lock.
> You could get an overview of the system state with the 'ps' command in ddb,
> and 'show alllocks' would list the owner, unless the buffer was async.
> 
> Also, I do not quite understand the behaviour of SIGINT/SIGKILL.  Could
> it be that the process was not killed by SIGKILL either?  That would be
> consistent with the vnode lock still being owned and preventing accesses.

Just in case, if this is due to the quadratic loop behaviour, there
is no need to restart from the very start in the new stdadvise(DONTNEED)
implementation.  So regardless of the answers to the questions above,
you might also try this patch.

diff --git a/sys/kern/vfs_default.c b/sys/kern/vfs_default.c
index fd83f87..3da8618 100644
--- a/sys/kern/vfs_default.c
+++ b/sys/kern/vfs_default.c
@@ -1080,15 +1080,9 @@ vop_stdadvise(struct vop_advise_args *ap)
 		bsize = vp->v_bufobj.bo_bsize;
 		startn = ap->a_start / bsize;
 		endn = ap->a_end / bsize;
-		for (;;) {
-			error = bnoreuselist(&bo->bo_clean, bo, startn, endn);
-			if (error == EAGAIN)
-				continue;
+		error = bnoreuselist(&bo->bo_clean, bo, startn, endn);
+		if (error == 0)
 			error = bnoreuselist(&bo->bo_dirty, bo, startn, endn);
-			if (error == EAGAIN)
-				continue;
-			break;
-		}
 		BO_RUNLOCK(bo);
 		VOP_UNLOCK(vp, 0);
 		break;
diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c
index ace97e8..8cac32f 100644
--- a/sys/kern/vfs_subr.c
+++ b/sys/kern/vfs_subr.c
@@ -1670,6 +1670,7 @@ bnoreuselist(struct bufv *bufv, struct bufobj *bo, daddr_t startn, daddr_t endn)
 	ASSERT_BO_LOCKED(bo);
 
 	for (lblkno = startn;; lblkno++) {
+again:
 		bp = BUF_PCTRIE_LOOKUP_GE(&bufv->bv_root, lblkno);
 		if (bp == NULL || bp->b_lblkno >= endn)
 			break;
@@ -1677,7 +1678,9 @@ bnoreuselist(struct bufv *bufv, struct bufobj *bo, daddr_t startn, daddr_t endn)
 		    LK_INTERLOCK, BO_LOCKPTR(bo), "brlsfl", 0, 0);
 		if (error != 0) {
 			BO_RLOCK(bo);
-   return

Re: Possible bug in or around posix_fadvise after r292326

2016-01-05 Thread Peter Holm

On Tue, Jan 05, 2016 at 01:07:40PM +0200, Konstantin Belousov wrote:
> On Mon, Jan 04, 2016 at 10:05:21PM -0800, Benno Rice wrote:
> > Hi Konstantin,
> > 
> > I recently updated my dev box to r292962. After doing this I attempted to 
> > set up PostgreSQL 9.4. When I ran initdb the last phase hung. Using 
> > procstat -kk I found it appeared to be stuck in a loop inside a 
> > posix_fadvise syscall. I could not ^C or ^Z the initdb process. I could 
> > kill it but a subsequent attempt to rm -rf the /usr/local/pgsql/data 
> > directory also got stuck and was unkillable by any means. Rebooting allowed 
> > me to remove the directory but the initdb process still hung when I re-ran 
> > it.
> > 
> > I tried PostgreSQL 9.3 with similar results.
> > 
> > Looking at the source code for initdb I found that it calls posix_fadvise 
> > like so[1]:
> > 
> >  /*
> >   * We do what pg_flush_data() would do in the backend: prefer to use
> >   * sync_file_range, but fall back to posix_fadvise.  We ignore errors
> >   * because this is only a hint.
> >   */
> >  #if defined(HAVE_SYNC_FILE_RANGE)
> >  (void) sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE);
> >  #elif defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_DONTNEED)
> >  (void) posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
> >  #else
> >  #error PG_FLUSH_DATA_WORKS should not have been defined
> >  #endif
> > 
> > Looking for recent commits involving POSIX_FADV_DONTNEED I found r292326:
> > 
> > https://svnweb.freebsd.org/changeset/base/292326 
> > 
> > 
> > Backing this revision out allowed the initdb process to complete.
> > 
> > My current theory is that somehow we're getting ENOLCK or EAGAIN from 
> > the BUF_TIMELOCK call in bnoreuselist:
> > 
> > https://svnweb.freebsd.org/base/head/sys/kern/vfs_subr.c?view=annotate#l1676
> >  
> > 
> > 
> > Leading to an infinite loop in vop_stdadvise:
> > 
> > https://svnweb.freebsd.org/base/head/sys/kern/vfs_default.c?annotate=292373#l1083
> >  
> > 
> > 
> > I haven't managed to dig any deeper than that yet.
> > 
> > Is there any other information I could give you to help narrow this down?
> 
> I do not see this issue locally.
> 

I do:

(kgdb) f 9
#9  0x80ac7956 in vop_stdadvise (ap=0xfe081dc6d930) at ../../../kern/vfs_default.c:1087
1087                    error = bnoreuselist(&bo->bo_dirty, bo, startn, endn);
(kgdb) l
1082            endn = ap->a_end / bsize;
1083            for (;;) {
1084                    error = bnoreuselist(&bo->bo_clean, bo, startn, endn);
1085                    if (error == EAGAIN)
1086                            continue;
1087                    error = bnoreuselist(&bo->bo_dirty, bo, startn, endn);
1088                    if (error == EAGAIN)
1089                            continue;
1090                    break;
1091            }
(kgdb) info loc
vp = (struct vnode *) 0xf8008bdaa9c0
bo = (struct bufobj *) 0xf8008bdaab28
startn = 0x0
endn = 0x
start = 0x0
end = 0x8000
bsize = 0x8000
error = 0x0
(kgdb)

https://people.freebsd.org/~pho/stress/log/kostik855.txt
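
The loop in the listing restarts both list walks from startn every time bnoreuselist() returns EAGAIN. Below is a user-space model of why that can get expensive, next to the alternative of resuming at the block that failed. The names scan, restart_cost and resume_cost are hypothetical, and the model assumes that each block's lock attempt fails exactly once before succeeding; that is an assumption made for the sake of the example, not something the kgdb output shows.

```c
#include <errno.h>
#include <stddef.h>

#define NBLK 64

/*
 * Walk blocks from 'from' to NBLK - 1, counting every block visited.
 * failed[i] != 0 means block i has already failed its one lock attempt.
 */
static int
scan(int *failed, size_t from, size_t *stopped, unsigned long *visits)
{
	size_t i;

	for (i = from; i < NBLK; i++) {
		(*visits)++;
		if (!failed[i]) {
			failed[i] = 1;	/* first attempt: lock not available */
			*stopped = i;
			return (EAGAIN);
		}
	}
	return (0);
}

/* Restart from block 0 after every EAGAIN, like the for (;;) loop. */
static unsigned long
restart_cost(void)
{
	int failed[NBLK] = { 0 };
	size_t stopped = 0;
	unsigned long visits = 0;

	while (scan(failed, 0, &stopped, &visits) == EAGAIN)
		continue;
	return (visits);
}

/* Resume at the block that failed instead of starting over. */
static unsigned long
resume_cost(void)
{
	int failed[NBLK] = { 0 };
	size_t stopped = 0;
	unsigned long visits = 0;

	while (scan(failed, stopped, &stopped, &visits) == EAGAIN)
		continue;
	return (visits);
}
```

Under that assumption, with 64 blocks the restarting variant visits 2144 blocks in total and the resuming one 128. If a lock attempt never succeeds, neither variant terminates, which is the infinite-loop case suspected earlier in the thread.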

- Peter

Re: Possible bug in or around posix_fadvise after r292326

2016-01-04 Thread Benno Rice


> On Jan 4, 2016, at 22:05, Benno Rice  wrote:
> 
> Hi Konstantin,
> 
> I recently updated my dev box to r292962. After doing this I attempted to set 
> up PostgreSQL 9.4. When I ran initdb the last phase hung. Using procstat -kk 
> I found it appeared to be stuck in a loop inside a posix_fadvise syscall. I 
> could not ^C or ^Z the initdb process. I could kill it but a subsequent 
> attempt to rm -rf the /usr/local/pgsql/data directory also got stuck and was 
> unkillable by any means. Rebooting allowed me to remove the directory but the 
> initdb process still hung when I re-ran it.

[snip]

> 
> I haven’t managed to dig any deeper than that yet.
> 
> Is there any other information I could give you to help narrow this down?

Rebooted with a WITNESS kernel and got the following LOR after initdb started:

https://gist.github.com/jeamland/69a07c4523f0dea4c26c 




Re: Possible bug in or around posix_fadvise after r292326

2016-01-04 Thread Adrian Chadd

+1, saw this locally on my up to date amd64 laptop. :(


-a

Re: Possible bug in softfloat

2014-11-30 Thread Julian Elischer


On 11/29/14, 4:59 AM, Adrian Chadd wrote:

You can easily fire up a mips32 / mips64 emulator build - cross-build
a world+kernel, build an image, then run qemu-devel to boot it.

https://wiki.freebsd.org/FreeBSD/MipsEmulation

You should be able to get a 32 bit soft-float mips environment inside
there which you can use to trigger it.
(And also run whatever other floating point validation suite you may have.)


I suspect this is one for bde to look at.





-adrian


On 28 November 2014 at 11:07, Steve Kargl
s...@troutmask.apl.washington.edu wrote:

On Fri, Nov 28, 2014 at 10:54:25AM -0800, Adrian Chadd wrote:

On 28 November 2014 at 10:34, Steve Kargl
s...@troutmask.apl.washington.edu wrote:

In a thread on comp.lang.c, it was pointed out that softfloat
has a bug and in checking src/lib/libc/softfloat I confirmed
the issue is present in FreeBSD.  What I have not confirmed
is whether or not it is possible to hit this bug.  In fact,
it may only hit arm and mips.  Anyway, here's the patch

So we should just commit this?


I suspect the answer is yes, but I have no idea on how
to trigger this code path.  I also have no access to
arm or mips hardware where the problem may manifest only.

It may also be appropriate to have someone else confirm
that the patch is indeed correct.

--
steve


Re: Possible bug in softfloat

2014-11-28 Thread Adrian Chadd

On 28 November 2014 at 10:34, Steve Kargl
s...@troutmask.apl.washington.edu wrote:
 In a thread on comp.lang.c, it was pointed out that softfloat
 has a bug and in checking src/lib/libc/softfloat I confirmed
 the issue is present in FreeBSD.  What I have not confirmed
 is whether or not it is possible to hit this bug.  In fact,
 it may only hit arm and mips.  Anyway, here's the patch

So we should just commit this?


-a



 Index: softfloat/bits64/softfloat-macros
 ===
 --- softfloat/bits64/softfloat-macros   (revision 275211)
 +++ softfloat/bits64/softfloat-macros   (working copy)
 @@ -157,7 +157,7 @@
          z0 = a0>>count;
      }
      else {
 -        z1 = ( count < 64 ) ? ( a0>>( count & 63 ) ) : 0;
 +        z1 = ( count < 128 ) ? ( a0>>( count & 63 ) ) : 0;
          z0 = 0;
      }
      *z1Ptr = z1;

 --
 Steve

Re: Possible bug in softfloat

2014-11-28 Thread Steve Kargl

On Fri, Nov 28, 2014 at 10:54:25AM -0800, Adrian Chadd wrote:
 On 28 November 2014 at 10:34, Steve Kargl
 s...@troutmask.apl.washington.edu wrote:
  In a thread on comp.lang.c, it was pointed out that softfloat
  has a bug and in checking src/lib/libc/softfloat I confirmed
  the issue is present in FreeBSD.  What I have not confirmed
  is whether or not it is possible to hit this bug.  In fact,
  it may only hit arm and mips.  Anyway, here's the patch
 
 So we should just commit this?
 

I suspect the answer is yes, but I have no idea how
to trigger this code path.  I also have no access to
arm or mips hardware, which may be the only place the problem manifests.

It may also be appropriate to have someone else confirm
that the patch is indeed correct.
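
One way to exercise this code path without arm or mips hardware is to lift the routine into a host program. The sketch below follows the shape of the 128-bit right shift that the patch touches, with the proposed count < 128 test and uint64_t in place of softfloat's bits64. The name shift128_right_fixed is made up for this example, and the branches other than the patched one are written from the routine's expected behaviour, so they should be checked against the tree before relying on them.

```c
#include <stdint.h>

/*
 * Shift the 128-bit value a0:a1 right by count bits (count >= 0).
 * Bits shifted off are lost; counts of 128 or more give zero.
 * Hypothetical host-side transcription, not the in-tree macro file.
 */
static void
shift128_right_fixed(uint64_t a0, uint64_t a1, int count,
    uint64_t *z0Ptr, uint64_t *z1Ptr)
{
	uint64_t z0, z1;
	int negCount = (-count) & 63;

	if (count == 0) {
		z1 = a1;
		z0 = a0;
	} else if (count < 64) {
		z1 = (a0 << negCount) | (a1 >> count);
		z0 = a0 >> count;
	} else {
		/* The unpatched test was count < 64, never true in this branch. */
		z1 = (count < 128) ? (a0 >> (count & 63)) : 0;
		z0 = 0;
	}
	*z1Ptr = z1;
	*z0Ptr = z0;
}
```

With a0 = 0x8000000000000000, a1 = 0 and count = 64, the patched test gives z1 = 0x8000000000000000; with the original count < 64 test the same call would give z1 = 0.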

-- 
steve

Re: Possible bug in softfloat

2014-11-28 Thread Adrian Chadd

You can easily fire up a mips32 / mips64 emulator build - cross-build
a world+kernel, build an image, then run qemu-devel to boot it.

https://wiki.freebsd.org/FreeBSD/MipsEmulation

You should be able to get a 32 bit soft-float mips environment inside
there which you can use to trigger it.
(And also run whatever other floating point validation suite you may have.)



-adrian


On 28 November 2014 at 11:07, Steve Kargl
s...@troutmask.apl.washington.edu wrote:
 On Fri, Nov 28, 2014 at 10:54:25AM -0800, Adrian Chadd wrote:
 On 28 November 2014 at 10:34, Steve Kargl
 s...@troutmask.apl.washington.edu wrote:
  In a thread on comp.lang.c, it was pointed out that softfloat
  has a bug and in checking src/lib/libc/softfloat I confirmed
  the issue is present in FreeBSD.  What I have not confirmed
  is whether or not it is possible to hit this bug.  In fact,
  it may only hit arm and mips.  Anyway, here's the patch

 So we should just commit this?


 I suspect the answer is yes, but I have no idea on how
 to trigger this code path.  I also have no access to
 arm or mips hardware where the problem may manifest only.

 It may also be appropriate to have someone else confirm
 that the patch is indeed correct.

 --
 steve

Re: Possible bug in NFSv4 with krb5p security?

2013-03-25 Thread Andrey Simonenko

On Wed, Feb 20, 2013 at 06:29:07PM -0500, Rick Macklem wrote:
 Andrey Simonenko wrote:
  
  Another variant. This is a program that can be used for verifying
  correctness of the function, just change PWBUF_SIZE_* values and put
  some printfs to see when buffer is reallocated. sizehint is updated
  only when malloc() succeeded.
  
  -
  static int
  getpwnam_r_func(const char *name, uid_t *uidp)
  {
  #define PWBUF_SIZE_INI (2 * MAXLOGNAME + MAXPATHLEN + _PASSWORD_LEN)
  #define PWBUF_SIZE_INC 128
  
  static size_t sizehint = PWBUF_SIZE_INI;
  
  struct passwd pwd;
  struct passwd *pw;
  char *buf;
  size_t size;
  int error;
  char lname[MAXLOGNAME];
  char bufs[PWBUF_SIZE_INI];
  
  strncpy(lname, name, sizeof(lname));
  
  buf = bufs;
  size = sizeof(bufs);
  for (;;) {
  error = getpwnam_r(lname, &pwd, buf, size, &pw);
  if (buf != bufs)
  free(buf);
  if (pw != NULL) {
  *uidp = pw->pw_uid;
  return (GSS_S_COMPLETE);
  } else if (error != ERANGE || size > SIZE_MAX - PWBUF_SIZE_INC)
  return (GSS_S_FAILURE);
  if (size != sizehint)
  size = sizehint;
  else
  size += PWBUF_SIZE_INC;
  buf = malloc(size);
  if (buf == NULL)
  return (GSS_S_FAILURE);
  sizehint = size;
  }
  }
  
 All looks fine to me. (Before my mailer messed with the whitespace;-)
 
 Thanks, rick
 ps: I will commit it in April, unless someone else does so sooner.

I thought about this approach again and concluded that it is wrong.
Using a static buffer first and then allocating memory on subsequent
calls can cause a slowdown, depending of course on the number of entries
and the database backend in use.  The libc code for getpwnam() allocates
memory for the buffer and does not free it on exit from the function,

If the code above or any modification of it is going to be
committed to the source base (by you or by another committer),
then I ask and require that my name and email address not be mentioned
in either the source file or the commit log message.

An appropriate commit log message for the code above could be the
following:

--
Since FreeBSD does not support sysconf(_SC_GETPW_R_SIZE_MAX), allocate
a buffer of sufficient size for getpwnam_r() as suggested in the EXAMPLES
section of the SUSv4 documentation for the getpwnam_r() function.
--

since this documentation has similar code.
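
The sizing step that the commit message refers to can be sketched as follows. The helper name pwbuf_initial_size and the 1024-byte fallback are assumptions chosen for illustration; the only behaviour relied on is that sysconf() returns -1 when the limit is indeterminate or unsupported.

```c
#include <stddef.h>
#include <unistd.h>

/*
 * Suggest an initial buffer size for getpwnam_r().
 * sysconf() returns -1 when there is no fixed limit or the name is
 * not supported, so a fallback (1024 here, an arbitrary choice) is
 * used in that case.
 */
static size_t
pwbuf_initial_size(void)
{
	long n;

	n = sysconf(_SC_GETPW_R_SIZE_MAX);
	if (n <= 0)
		return (1024);
	return ((size_t)n);
}
```

The value is only a starting point; the caller still has to grow the buffer and retry when getpwnam_r() reports ERANGE.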

Re: Possible bug in NFSv4 with krb5p security?

2013-03-25 Thread Rick Macklem

Andrey Simonenko wrote:
 On Wed, Feb 20, 2013 at 06:29:07PM -0500, Rick Macklem wrote:
  Andrey Simonenko wrote:
  
   Another variant. This is a program that can be used for verifying
   correctness of the function, just change PWBUF_SIZE_* values and
   put
   some printfs to see when buffer is reallocated. sizehint is
   updated
   only when malloc() succeeded.
  
   -
   static int
   getpwnam_r_func(const char *name, uid_t *uidp)
   {
   #define PWBUF_SIZE_INI (2 * MAXLOGNAME + MAXPATHLEN +
   _PASSWORD_LEN)
   #define PWBUF_SIZE_INC 128
  
 static size_t sizehint = PWBUF_SIZE_INI;
  
 struct passwd pwd;
 struct passwd *pw;
 char *buf;
 size_t size;
 int error;
 char lname[MAXLOGNAME];
 char bufs[PWBUF_SIZE_INI];
  
 strncpy(lname, name, sizeof(lname));
  
 buf = bufs;
 size = sizeof(bufs);
 for (;;) {
 error = getpwnam_r(lname, &pwd, buf, size, &pw);
 if (buf != bufs)
 free(buf);
 if (pw != NULL) {
 *uidp = pw->pw_uid;
 return (GSS_S_COMPLETE);
 } else if (error != ERANGE || size > SIZE_MAX - PWBUF_SIZE_INC)
 return (GSS_S_FAILURE);
 if (size != sizehint)
 size = sizehint;
 else
 size += PWBUF_SIZE_INC;
 buf = malloc(size);
 if (buf == NULL)
 return (GSS_S_FAILURE);
 sizehint = size;
 }
   }
  
  All looks fine to me. (Before my mailer messed with the
  whitespace;-)
 
  Thanks, rick
  ps: I will commit it in April, unless someone else does so sooner.
 
 I was thinking about this approach once again and made a conclusion
 that
 it is wrong. Using static buffer at first and then allocate memory for
 next calls can cause slowdown, depends on number of entries and used
 database backend of course. The libc code for getpwnam() allocates
 memory for the buffer and does not free it on exit from the function,
 
Not sure what you were saying by the last sentence? Using a static
buffer here would make the code unsafe for multiple threads. Although
the gssd is currently single threaded, that might change someday, maybe?
(Using the static as a hint should be safe for multiple threads, since
 it is just a hint.)

 If the above written code or any of its modification is going to be
 committed to the source base (by you or by some another committer),
 then I ask and require to not write/mention my name and email address
 neither in source file nor in commit log message.
 
Ok, that's your choice.

I think the code is fine, since the likelihood
of the first getpwnam_r() with the buffer on the stack failing is nearly
0, given that the buffer is much bigger than 128. (Although I think a loop on
ERANGE should be in the code, just making the buffer a lot bigger than
128 will fix the problem for 99.9% of the cases.)
The only change I was planning to make to the above was moving the free(buf)
down until after pwd has been used, so it is safe for fields like pw_name,
which I think will live in buf. That way the code won't be broken if/when
someone uses it for pw_name.
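
A sketch of that ordering, assuming the caller wants the login name as well as the uid: the buffer is released only after everything that may point into it has been copied out. The function name lookup_user, its signature, the 1024-byte stack buffer and the 128-byte growth step are all hypothetical, and error handling is reduced to 0 / -1.

```c
#include <errno.h>
#include <pwd.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/*
 * Look up name; on success store the uid in *uidp and copy pw_name
 * into namebuf (truncated to namelen - 1 bytes). Returns 0 or -1.
 */
static int
lookup_user(const char *name, uid_t *uidp, char *namebuf, size_t namelen)
{
	struct passwd pwd, *pw;
	char stackbuf[1024], *buf = stackbuf;
	size_t size = sizeof(stackbuf);
	int error, ret = -1;

	if (namelen == 0)
		return (-1);
	for (;;) {
		error = getpwnam_r(name, &pwd, buf, size, &pw);
		if (pw != NULL) {
			/* pw_name may live inside buf: copy before freeing. */
			*uidp = pw->pw_uid;
			strncpy(namebuf, pw->pw_name, namelen - 1);
			namebuf[namelen - 1] = '\0';
			ret = 0;
			break;
		}
		if (error != ERANGE || size > SIZE_MAX - 128)
			break;
		if (buf != stackbuf)
			free(buf);
		size += 128;
		buf = malloc(size);
		if (buf == NULL)
			return (-1);
	}
	if (buf != stackbuf)
		free(buf);
	return (ret);
}
```

Since the single free() sits after the loop, any later addition that reads other string fields of the entry can go next to the pw_name copy without touching the buffer management.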

rick

 Appropriate commit log message for the above written code can be the
 following:
 
 --
 Since FreeBSD does not support sysconf(_SC_GETPW_R_SIZE_MAX), then
 allocate
 a buffer of sufficient size for getpwnam_r() as it is suggested in
 EXAMPLES
 of SUSv4 documentation for the getpwnam_r() function.
 --
 
 since this documentation has similar code.

Re: Possible bug in NFSv4 with krb5p security?

2013-02-20 Thread Andrey Simonenko

On Tue, Feb 19, 2013 at 08:52:49PM -0500, Rick Macklem wrote:
  
  I cannot find how to get information about maximum buffer size for
  the getpwnam_r() function. This information should be returned by
  sysconf(_SC_GETPW_R_SIZE_MAX), but since it does not work on FreeBSD
  it is necessary to guess its size. Original value is 128 and it works
  for somebody, 1024 works for your environment, but it can fail for
  another environment.
  
  SUSv4 specifies "Storage referenced by the structure is allocated from
  the memory provided with the buffer parameter", but then tells about
  groups in EXAMPLE for getpwnam_r(): "Note that
  sysconf(_SC_GETPW_R_SIZE_MAX) may return -1 if there is no hard limit
  on the size of the buffer needed to store all the groups returned."
  
  malloc() can give overhead, but that function can try to call
  getpwnam_r()
  with buffer allocated from stack and if getpwnam_r() failed with
  ERANGE
  use dynamically allocated buffer.
  
  #define PWBUF_SIZE_INI (2 * MAXLOGNAME + 2 * MAXPATHLEN +
  _PASSWORD_LEN + 1)
  #define PWBUF_SIZE_INC 128
  
  char bufs[2 * MAXLOGNAME + MAXPATHLEN + PASSWORD_LEN + 1 + 32];
  
  error = getpwnam_r(lname, &pwd, bufs, sizeof(bufs), &pw);
  if (pw != NULL) {
  *uidp = pw->pw_uid;
  return (GSS_S_COMPLETE);
  } else if (error != ERANGE)
  return (GSS_S_FAILURE);
  
  size = PWBUF_SIZE_INI;
  for (;;) {
  size += PWBUF_SIZE_INC;
  buf = malloc(size);
  if (buf == NULL)
  return (GSS_S_FAILURE);
  error = getpwnam_r(lname, &pwd, buf, size, &pw);
  free(buf);
  if (pw != NULL) {
  *uidp = pw->pw_uid;
  return (GSS_S_COMPLETE);
  } else {
  if (error == ERANGE &&
  size <= SIZE_MAX - PWBUF_SIZE_INC)
  continue;
  return (GSS_S_FAILURE);
  }
  }
 
 Just my opinion, but I think the above is a good approach.
 (ie. First trying a fairly large buffer on the stack that
  will succeed 99.99% of the time, but check for ERANGE and
  loop trying progressively larger malloc'd buffers until
  it stops reporting ERANGE.)
 
 I suspect the overheads behind getpwnam_r() are larger than
 the difference between using a buffer on the stack vs malloc,
 so I think it should use a fairly large buffer the first time.
 
 Personally, I might have coded it as a single do { } while(),
 with the first attempt in it, but that's just personal stylistic
 taste. (You can check for buf != bufs before doing a free() of it.)
 And, if you wanted to be clever, the code could use a static bufsiz_hint,
 which is set to the largest size needed so far and that is used as
 the initial malloc size. That way it wouldn't loop as much for a
 site with huge passwd entries. (An entire bio in the GECOS field or ???)
 

Thanks for the review.

Another variant.  This is a program that can be used for verifying
correctness of the function, just change PWBUF_SIZE_* values and put
some printfs to see when buffer is reallocated.  sizehint is updated
only when malloc() succeeded.

-
#include <sys/param.h>
#include <sys/limits.h>

#include <gssapi/gssapi.h>

#include <errno.h>
#include <limits.h>
#include <pwd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static int
getpwnam_r_func(const char *name, uid_t *uidp)
{
#define PWBUF_SIZE_INI (2 * MAXLOGNAME + MAXPATHLEN + _PASSWORD_LEN)
#define PWBUF_SIZE_INC 128

	static size_t sizehint = PWBUF_SIZE_INI;

	struct passwd pwd;
	struct passwd *pw;
	char *buf;
	size_t size;
	int error;
	char lname[MAXLOGNAME];
	char bufs[PWBUF_SIZE_INI];

	strncpy(lname, name, sizeof(lname));

	buf = bufs;
	size = sizeof(bufs);
	for (;;) {
		error = getpwnam_r(lname, &pwd, buf, size, &pw);
		if (buf != bufs)
			free(buf);
		if (pw != NULL) {
			*uidp = pw->pw_uid;
			return (GSS_S_COMPLETE);
		} else if (error != ERANGE || size > SIZE_MAX - PWBUF_SIZE_INC)
			return (GSS_S_FAILURE);
		if (size != sizehint)
			size = sizehint;
		else
			size += PWBUF_SIZE_INC;
		buf = malloc(size);
		if (buf == NULL)
			return (GSS_S_FAILURE);
		sizehint = size;
	}
}

int
main(void)
{
	const char *str[] = { "man", "root", "q", "bin", NULL };
	uid_t uid;
	u_int i;

	for (i = 0; str[i] != NULL; ++i) {
		printf("%-20s\t", str[i]);
		if (getpwnam_r_func(str[i], &uid) == GSS_S_COMPLETE)
			printf("%u\n", uid);
		else
			printf("not found\n");
	}
	return (0);
}
-

Re: Possible bug in NFSv4 with krb5p security?

2013-02-20 Thread Rick Macklem

Andrey Simonenko wrote:
 On Tue, Feb 19, 2013 at 08:52:49PM -0500, Rick Macklem wrote:
  
   I cannot find how to get information about maximum buffer size for
   the getpwnam_r() function. This information should be returned by
   sysconf(_SC_GETPW_R_SIZE_MAX), but since it does not work on
   FreeBSD
   it is necessary to guess its size. Original value is 128 and it
   works
   for somebody, 1024 works for your environment, but it can fail for
   another environment.
  
   SUSv4 specifies Storage referenced by the structure is allocated
   from
   the memory provided with the buffer parameter, but then tells
   about
   groups
   in EXAMPLE for getpwnam_r() Note that
   sysconf(_SC_GETPW_R_SIZE_MAX)
   may
   return -1 if there is no hard limit on the size of the buffer
   needed
   to
   store all the groups returned.
  
   malloc() can give overhead, but that function can try to call
   getpwnam_r()
   with buffer allocated from stack and if getpwnam_r() failed with
   ERANGE
   use dynamically allocated buffer.
  
   #define PWBUF_SIZE_INI (2 * MAXLOGNAME + 2 * MAXPATHLEN +
   _PASSWORD_LEN + 1)
   #define PWBUF_SIZE_INC 128
  
   char bufs[2 * MAXLOGNAME + MAXPATHLEN + PASSWORD_LEN + 1 + 32];
  
   error = getpwnam_r(lname, &pwd, bufs, sizeof(bufs), &pw);
   if (pw != NULL) {
   *uidp = pw->pw_uid;
   return (GSS_S_COMPLETE);
   } else if (error != ERANGE)
   return (GSS_S_FAILURE);
  
   size = PWBUF_SIZE_INI;
   for (;;) {
   size += PWBUF_SIZE_INC;
   buf = malloc(size);
   if (buf == NULL)
   return (GSS_S_FAILURE);
   error = getpwnam_r(lname, &pwd, buf, size, &pw);
   free(buf);
   if (pw != NULL) {
   *uidp = pw->pw_uid;
   return (GSS_S_COMPLETE);
   } else {
   if (error == ERANGE &&
   size <= SIZE_MAX - PWBUF_SIZE_INC)
   continue;
   return (GSS_S_FAILURE);
   }
   }
 
  Just my opinion, but I think the above is a good approach.
  (ie. First trying a fairly large buffer on the stack that
   will succeed 99.99% of the time, but check for ERANGE and
   loop trying progressively larger malloc'd buffers until
   it stops reporting ERANGE.)
 
  I suspect the overheads behind getpwnam_r() are larger than
  the difference between using a buffer on the stack vs malloc,
  so I think it should use a fairly large buffer the first time.
 
  Personally, I might have coded it as a single do { } while(),
  with the first attempt in it, but that's just personal stylistic
  taste. (You can check for buf != bufs before doing a free() of it.)
  And, if you wanted to be clever, the code could use a static
  bufsiz_hint,
  which is set to the largest size needed so far and that is used as
  the initial malloc size. That way it wouldn't loop as much for a
  site with huge passwd entries. (An entire bio in the GECOS field or
  ???)
 
 
 Thanks for the review.
 
 Another variant. This is a program that can be used for verifying
 correctness of the function, just change PWBUF_SIZE_* values and put
 some printfs to see when buffer is reallocated. sizehint is updated
 only when malloc() succeeded.
 
 -
 #include <sys/param.h>
 #include <sys/limits.h>
 
 #include <gssapi/gssapi.h>
 
 #include <errno.h>
 #include <limits.h>
 #include <pwd.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
 
 static int
 getpwnam_r_func(const char *name, uid_t *uidp)
 {
 #define PWBUF_SIZE_INI (2 * MAXLOGNAME + MAXPATHLEN + _PASSWORD_LEN)
 #define PWBUF_SIZE_INC 128
 
 static size_t sizehint = PWBUF_SIZE_INI;
 
 struct passwd pwd;
 struct passwd *pw;
 char *buf;
 size_t size;
 int error;
 char lname[MAXLOGNAME];
 char bufs[PWBUF_SIZE_INI];
 
 strncpy(lname, name, sizeof(lname));
 
 buf = bufs;
 size = sizeof(bufs);
 for (;;) {
 error = getpwnam_r(lname, pwd, buf, size, pw);
 if (buf != bufs)
 free(buf);
 if (pw != NULL) {
 *uidp = pw-pw_uid;
 return (GSS_S_COMPLETE);
 } else if (error != ERANGE || size  SIZE_MAX - PWBUF_SIZE_INC)
 return (GSS_S_FAILURE);
 if (size != sizehint)
 size = sizehint;
 else
 size += PWBUF_SIZE_INC;
 buf = malloc(size);
 if (buf == NULL)
 return (GSS_S_FAILURE);
 sizehint = size;
 }
 }
 
All looks fine to me. (Before my mailer messed with the whitespace;-)

Thanks, rick
ps: I will commit it in April, unless someone else does so sooner.

 int
 main(void)
 {
     const char *str[] = { "man", "root", "q", "bin", NULL };
     uid_t uid;
     u_int i;

     for (i = 0; str[i] != NULL; ++i) {
         printf("%-20s\t", str[i]);
         if (getpwnam_r_func(str[i], &uid) == GSS_S_COMPLETE)
             printf("%u\n", uid);
         else
             printf("not found\n");
     }
     return (0);
 }
 -
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: Possible bug in NFSv4 with krb5p security?

2013-02-19 Thread Andrey Simonenko

On Tue, Feb 19, 2013 at 12:06:13AM +0800, Elias Martenson wrote:
 
 You were right, the problem was in pname_to_uid.c. In it, the following
 code can be found:
 
 char lname[MAXLOGNAME + 1], buf[1024];

 /* some code snipped for brevity... */

 getpwnam_r(lname, &pwd, buf, sizeof(buf), &pw);
 if (pw) {
     *uidp = pw->pw_uid;
     return (GSS_S_COMPLETE);
 } else {
     return (GSS_S_FAILURE);
 }
 
 As it turns out, the getpwnam_r() call fails with ERANGE (I had to check
 the return value from getpwnam_r() in order to determine this, as pw is set
 to NULL both if there was an error or if the user name can't be found).
 
 Now, increasing the size of buf to 1024 solved the problem, and now the
 lookup works correctly.
 
 I wrote a small test program that issued the same call to getpwnam_r() and
 it worked. Until I su'ed to root, and then it failed.
 
 It seems as though the buffer needs to be bigger if you're root. I have no
 idea why, but there you have it. Problem solved.

It can require a bigger buffer, since root can read the pw_passwd field
of struct passwd.

Since sysconf(_SC_GETPW_R_SIZE_MAX) does not work on FreeBSD, the buffer
for the getpwnam_r() call should have at least (2 * MAXLOGNAME + 2 * MAXPATHLEN +
_PASSWORD_LEN + 1) bytes (it is unclear how much is required for pw_gecos).
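
As a rough sketch of the sizing problem described above (this is not code
from the FreeBSD tree): a caller can ask sysconf(_SC_GETPW_R_SIZE_MAX) for a
hint and fall back to a guess when it returns -1, which is what the thread
reports for FreeBSD. The helper name pwbuf_initial_size() and the
PWBUF_FALLBACK value are made up for illustration.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical fallback, used when the system gives no usable hint. */
#define PWBUF_FALLBACK 1024

/*
 * Return a starting buffer size for getpwnam_r().  sysconf() may return
 * -1 (no hard limit, or the variable is not supported), so the value is
 * only a hint and the caller still has to handle ERANGE.
 */
static size_t
pwbuf_initial_size(void)
{
    long n;

    n = sysconf(_SC_GETPW_R_SIZE_MAX);
    if (n <= 0)
        return (PWBUF_FALLBACK);
    return ((size_t)n);
}
```

Either way the result is only a starting size; a large pw_gecos can still
push a single entry past it.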

This buffer can be dynamically reallocated until getpwnam_r() stops
returning ERANGE (the following code has not been compiled or verified):

#define PWBUF_SIZE_INI (2 * MAXLOGNAME + 2 * MAXPATHLEN + _PASSWORD_LEN + 1)
#define PWBUF_SIZE_INC 128

size = PWBUF_SIZE_INI;
for (;;) {
    size += PWBUF_SIZE_INC;
    buf = malloc(size);
    if (buf == NULL)
        return (GSS_S_FAILURE);
    error = getpwnam_r(lname, &pwd, buf, size, &pw);
    free(buf);
    if (pw != NULL) {
        *uidp = pw->pw_uid;
        return (GSS_S_COMPLETE);
    } else {
        /* Retry only on ERANGE, and only while size can still grow. */
        if (error == ERANGE &&
            size <= SIZE_MAX - PWBUF_SIZE_INC)
            continue;
        return (GSS_S_FAILURE);
    }
}

Re: Possible bug in NFSv4 with krb5p security?

2013-02-19 Thread Elias Mårtenson

On 19 February 2013 17:31, Andrey Simonenko <si...@comsys.ntu-kpi.kiev.ua> wrote:

It can require bigger buffer, since root can get the pw_password field
 in the struct passwd{}.

 Since sysconf(_SC_GETPW_R_SIZE_MAX) does not work on FreeBSD, the buffer
 for getpwnam_r() call should have at least (2 * MAXLOGNAME + 2 *
 MAXPATHLEN +
 _PASSWORD_LEN + 1) bytes (it is unclear how much is required for pw_gecos).

 This buffer can be dynamically reallocated until getpwnam_r() is not
 return ERANGE error (the following code has not been compiled and
 verified):


Is this really a better solution than to aim high right away? A series of
malloc() calls should certainly have much higher overhead than the previous
stack-allocated solution.

A better compromise would be to do the lookup in a separate function, that
allocates the buffer using alloca() instead, yes?

Regards,
Elias

Re: Possible bug in NFSv4 with krb5p security?

2013-02-19 Thread Andrey Simonenko

On Tue, Feb 19, 2013 at 05:35:50PM +0800, Elias Martenson wrote:
 On 19 February 2013 17:31, Andrey Simonenko
 <si...@comsys.ntu-kpi.kiev.ua> wrote:
 
 It can require bigger buffer, since root can get the pw_password field
  in the struct passwd{}.
 
  Since sysconf(_SC_GETPW_R_SIZE_MAX) does not work on FreeBSD, the buffer
  for getpwnam_r() call should have at least (2 * MAXLOGNAME + 2 *
  MAXPATHLEN +
  _PASSWORD_LEN + 1) bytes (it is unclear how much is required for pw_gecos).
 
  This buffer can be dynamically reallocated until getpwnam_r() is not
  return ERANGE error (the following code has not been compiled and
  verified):
 
 
 Is this really a better solution than to aim high right away? A series of
 malloc() calls should certainly have much higher overhead than the previous
 stack-allocated solution.
 
 A better compromise would be to do the lookup in a separate function, that
 allocates the buffer using alloca() instead, yes?

I cannot find a way to get the maximum buffer size for
the getpwnam_r() function.  This information should be returned by
sysconf(_SC_GETPW_R_SIZE_MAX), but since it does not work on FreeBSD
it is necessary to guess the size.  The original value is 128 and it works
for some people, 1024 works for your environment, but it can fail in
another environment.

SUSv4 specifies "Storage referenced by the structure is allocated from
the memory provided with the buffer parameter", but then talks about groups
in the EXAMPLE for getpwnam_r(): "Note that sysconf(_SC_GETPW_R_SIZE_MAX) may
return -1 if there is no hard limit on the size of the buffer needed to
store all the groups returned."

malloc() can add overhead, but the function can first call getpwnam_r()
with a buffer allocated on the stack and, if getpwnam_r() fails with ERANGE,
fall back to a dynamically allocated buffer.

#define PWBUF_SIZE_INI (2 * MAXLOGNAME + 2 * MAXPATHLEN + _PASSWORD_LEN + 1)
#define PWBUF_SIZE_INC 128

char bufs[2 * MAXLOGNAME + MAXPATHLEN + _PASSWORD_LEN + 1 + 32];

/* First try the stack buffer, which should be enough in most cases. */
error = getpwnam_r(lname, &pwd, bufs, sizeof(bufs), &pw);
if (pw != NULL) {
    *uidp = pw->pw_uid;
    return (GSS_S_COMPLETE);
} else if (error != ERANGE)
    return (GSS_S_FAILURE);

/* The stack buffer was too small: retry with growing malloc'd buffers. */
size = PWBUF_SIZE_INI;
for (;;) {
    size += PWBUF_SIZE_INC;
    buf = malloc(size);
    if (buf == NULL)
        return (GSS_S_FAILURE);
    error = getpwnam_r(lname, &pwd, buf, size, &pw);
    free(buf);
    if (pw != NULL) {
        *uidp = pw->pw_uid;
        return (GSS_S_COMPLETE);
    } else {
        if (error == ERANGE &&
            size <= SIZE_MAX - PWBUF_SIZE_INC)
            continue;
        return (GSS_S_FAILURE);
    }
}

Re: Possible bug in NFSv4 with krb5p security?

2013-02-19 Thread Rick Macklem

Andrey Simonenko wrote:
 On Tue, Feb 19, 2013 at 05:35:50PM +0800, Elias Martenson wrote:
  On 19 February 2013 17:31, Andrey Simonenko
  <si...@comsys.ntu-kpi.kiev.ua> wrote:
 
  It can require bigger buffer, since root can get the pw_password
  field
   in the struct passwd{}.
  
   Since sysconf(_SC_GETPW_R_SIZE_MAX) does not work on FreeBSD, the
   buffer
   for getpwnam_r() call should have at least (2 * MAXLOGNAME + 2 *
   MAXPATHLEN +
   _PASSWORD_LEN + 1) bytes (it is unclear how much is required for
   pw_gecos).
  
   This buffer can be dynamically reallocated until getpwnam_r() is
   not
   return ERANGE error (the following code has not been compiled and
   verified):
  
 
  Is this really a better solution than to aim high right away? A
  series of
  malloc() calls should certainly have much higher overhead than the
  previous
  stack-allocated solution.
 
  A better compromise would be to do the lookup in a separate
  function, that
  allocates the buffer using alloca() instead, yes?
 
 I cannot find how to get information about maximum buffer size for
 the getpwnam_r() function. This information should be returned by
 sysconf(_SC_GETPW_R_SIZE_MAX), but since it does not work on FreeBSD
 it is necessary to guess its size. Original value is 128 and it works
 for somebody, 1024 works for your environment, but it can fail for
 another environment.
 
 SUSv4 specifies Storage referenced by the structure is allocated from
 the memory provided with the buffer parameter, but then tells about
 groups
 in EXAMPLE for getpwnam_r() Note that sysconf(_SC_GETPW_R_SIZE_MAX)
 may
 return -1 if there is no hard limit on the size of the buffer needed
 to
 store all the groups returned.
 
 malloc() can give overhead, but that function can try to call
 getpwnam_r()
 with buffer allocated from stack and if getpwnam_r() failed with
 ERANGE
 use dynamically allocated buffer.
 
 #define PWBUF_SIZE_INI (2 * MAXLOGNAME + 2 * MAXPATHLEN + _PASSWORD_LEN + 1)
 #define PWBUF_SIZE_INC 128

 char bufs[2 * MAXLOGNAME + MAXPATHLEN + _PASSWORD_LEN + 1 + 32];

 error = getpwnam_r(lname, &pwd, bufs, sizeof(bufs), &pw);
 if (pw != NULL) {
     *uidp = pw->pw_uid;
     return (GSS_S_COMPLETE);
 } else if (error != ERANGE)
     return (GSS_S_FAILURE);

 size = PWBUF_SIZE_INI;
 for (;;) {
     size += PWBUF_SIZE_INC;
     buf = malloc(size);
     if (buf == NULL)
         return (GSS_S_FAILURE);
     error = getpwnam_r(lname, &pwd, buf, size, &pw);
     free(buf);
     if (pw != NULL) {
         *uidp = pw->pw_uid;
         return (GSS_S_COMPLETE);
     } else {
         if (error == ERANGE &&
             size <= SIZE_MAX - PWBUF_SIZE_INC)
             continue;
         return (GSS_S_FAILURE);
     }
 }

Just my opinion, but I think the above is a good approach.
(ie. First trying a fairly large buffer on the stack that
 will succeed 99.99% of the time, but check for ERANGE and
 loop trying progressively larger malloc'd buffers until
 it stops reporting ERANGE.)

I suspect the overheads behind getpwnam_r() are larger than
the difference between using a buffer on the stack vs malloc,
so I think it should use a fairly large buffer the first time.

Personally, I might have coded it as a single do { } while(),
with the first attempt in it, but that's just personal stylistic
taste. (You can check for buf != bufs before doing a free() of it.)
And, if you wanted to be clever, the code could use a static bufsiz_hint,
which is set to the largest size needed so far and that is used as
the initial malloc size. That way it wouldn't loop as much for a
site with huge passwd entries. (An entire bio in the GECOS field or ???)

Btw, the same fix is needed in gssd.c, where it calls
getpwuid_r(). { Interesting that for Elias's case, it
must work for 128, although the getpwnam_r() didn't quite fit
in 128. }
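
For the getpwuid_r() case mentioned above, the same retry-on-ERANGE idea
might look roughly like the sketch below. This is not the gssd.c code: the
helper name uid_to_gid_retry(), the PWBUF_START value and the choice to
double the buffer (instead of adding PWBUF_SIZE_INC) are assumptions made
only for illustration.

```c
#include <sys/types.h>

#include <errno.h>
#include <pwd.h>
#include <stdint.h>
#include <stdlib.h>

#define PWBUF_START 128    /* hypothetical starting size */

/*
 * Look up uid with getpwuid_r(), doubling the buffer while the call
 * reports ERANGE.  Returns 0 and stores the primary gid if the entry
 * exists, 1 if there is no such uid, and -1 on any other failure.
 */
static int
uid_to_gid_retry(uid_t uid, gid_t *gidp)
{
    struct passwd pwd, *pw;
    char *buf, *nbuf;
    size_t size = PWBUF_START;
    int error;

    buf = malloc(size);
    if (buf == NULL)
        return (-1);
    for (;;) {
        pw = NULL;
        error = getpwuid_r(uid, &pwd, buf, size, &pw);
        if (error != ERANGE)
            break;
        if (size > SIZE_MAX / 2) {
            error = ENOMEM;     /* cannot grow any further */
            break;
        }
        size *= 2;
        nbuf = realloc(buf, size);
        if (nbuf == NULL) {
            error = ENOMEM;
            break;
        }
        buf = nbuf;
    }
    if (pw != NULL)
        *gidp = pw->pw_gid;     /* copy out before the buffer is freed */
    free(buf);
    if (pw != NULL)
        return (0);
    return (error == 0 ? 1 : -1);
}
```

Starting at 128 bytes on purpose makes the retry path easy to exercise on a
system where an entry does not fit.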

Also, FYI, kuserok.c uses a 2048 byte buffer and doesn't check
for ERANGE.

rick


Re: Possible bug in NFSv4 with krb5p security?

2013-02-18 Thread Elias Mårtenson

On 17 February 2013 22:58, Rick Macklem rmack...@uoguelph.ca wrote:

I think the Makefiles are in the kerberos5 directory.

 Since the only function you care about is the one in
 kerberos5/lib/libgssapi_krb5/pname_to_uid.c, I'd
 just put a copy of that file in usr.sbin/gssd and
 modify the Makefile there to compile it and link
 its .o into gssd, avoiding rebuilding any libraries.

 I'd put a couple of fprintf(stderr, ...) in it and
 then run gssd -d and see what it says.

 Just how I'd attack it, rick


Good news! The problem is solved!

You were right, the problem was in pname_to_uid.c. In it, the following
code can be found:

char lname[MAXLOGNAME + 1], buf[1024];

/* some code snipped for brevity... */

getpwnam_r(lname, &pwd, buf, sizeof(buf), &pw);
if (pw) {
    *uidp = pw->pw_uid;
    return (GSS_S_COMPLETE);
} else {
    return (GSS_S_FAILURE);
}

As it turns out, the getpwnam_r() call fails with ERANGE (I had to check
the return value from getpwnam_r() in order to determine this, as pw is set
to NULL both if there was an error or if the user name can't be found).
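
To illustrate the point about the return value, here is a small sketch that
separates "no such user" from "buffer too small" and from other errors. The
names pw_status, pw_classify() and pw_lookup() are invented for this example
and do not exist in pname_to_uid.c. It assumes the POSIX convention that
getpwnam_r() returns 0 with a NULL result when the name is simply not found.

```c
#include <sys/types.h>

#include <errno.h>
#include <pwd.h>
#include <stddef.h>

enum pw_status { PW_FOUND, PW_NOT_FOUND, PW_BUF_TOO_SMALL, PW_ERROR };

/* Interpret the (return value, result pointer) pair of getpwnam_r(). */
static enum pw_status
pw_classify(int error, const struct passwd *result)
{
    if (result != NULL)
        return (PW_FOUND);
    if (error == 0)
        return (PW_NOT_FOUND);      /* no error, just no such user */
    if (error == ERANGE)
        return (PW_BUF_TOO_SMALL);  /* caller should retry, larger buffer */
    return (PW_ERROR);
}

/* Look up name using the caller's buffer and report what happened. */
static enum pw_status
pw_lookup(const char *name, char *buf, size_t buflen, uid_t *uidp)
{
    struct passwd pwd, *pw = NULL;
    int error;

    error = getpwnam_r(name, &pwd, buf, buflen, &pw);
    if (pw != NULL)
        *uidp = pw->pw_uid;
    return (pw_classify(error, pw));
}
```

With something like this, a failing lookup can at least log which of the
three cases it hit instead of silently returning GSS_S_FAILURE.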

Now, increasing the size of buf to 1024 solved the problem, and now the
lookup works correctly.

I wrote a small test program that issued the same call to getpwnam_r() and
it worked. Until I su'ed to root, and then it failed.

It seems as though the buffer needs to be bigger if you're root. I have no
idea why, but there you have it. Problem solved.

Should this be fixed in the main codebase?

Oh, and thanks so much to all of you for being patient with me while
solving this. I really appreciate it. Also, I'd like to say that the code
base was quite pleasant to work with. Thanks for that too. :-)

Regards,
Elias

Re: Possible bug in NFSv4 with krb5p security?

2013-02-18 Thread Elias Mårtenson

On 19 February 2013 00:06, Elias Mårtenson loke...@gmail.com wrote:

char lname[MAXLOGNAME + 1], buf[1024];


Oops. Here I am, replying to myself.

The above is a typo. That's my modified code. In the original source, buf
is 128 bytes in size.

Regards,
Elias

Re: Possible bug in NFSv4 with krb5p security?

2013-02-18 Thread Rick Macklem

Elias Martenson wrote:
 On 17 February 2013 22:58, Rick Macklem rmack...@uoguelph.ca wrote:
 
 I think the Makefiles are in the kerberos5 directory.
 
  Since the only function you care about is the one in
  kerberos5/lib/libgssapi_krb5/pname_to_uid.c, I'd
  just put a copy of that file in usr.sbin/gssd and
  modify the Makefile there to compile it and link
  its .o into gssd, avoiding rebuilding any libraries.
 
  I'd put a couple of fprintf(stderr, ...) in it and
  then run gssd -d and see what it says.
 
  Just how I'd attack it, rick
 
 
 Good news! The problem is solved!
 
 You were right, the problem was in pname_to_uid.c. In it, the
 following
 code can be found:
 
 char lname[MAXLOGNAME + 1], buf[1024];

 /* some code snipped for brevity... */

 getpwnam_r(lname, &pwd, buf, sizeof(buf), &pw);
 if (pw) {
     *uidp = pw->pw_uid;
     return (GSS_S_COMPLETE);
 } else {
     return (GSS_S_FAILURE);
 }
 
 As it turns out, the getpwnam_r() call fails with ERANGE (I had to
 check
 the return value from getpwnam_r() in order to determine this, as pw
 is set
 to NULL both if there was an error or if the user name can't be
 found).
 
 Now, increasing the size of buf to 1024 solved the problem, and now
 the
 lookup works correctly.
 
 I wrote a small test program that issued the same call to getpwnam_r()
 and
 it worked. Until I su'ed to root, and then it failed.
 
 It seems as though the buffer needs to be bigger if you're root. I
 have no
 idea why, but there you have it. Problem solved.
 
 Should this be fixed in the main codebase?
 
Yes, I would definitely say so.

I won't be able to do a commit until April, but maybe someone else
can do a commit sooner?

 Oh, and thanks so much to all of you for being patient with me while
 solving this. I really appreciate it. Also, I'd like to say that the
 code
 base was quite pleasant to work with. Thanks for that too. :-)
 
And thanks for working through this, so we now have a fix, rick

 Regards,
 Elias

Re: Possible bug in NFSv4 with krb5p security?

2013-02-17 Thread Elias Mårtenson

On 17 February 2013 02:17, Doug Rabson d...@rabson.org wrote:


 I think it was Rick that mentioned the patch. I would apply the patch and
 rebuild your kernel in the interests of changing as little as possible
 while debugging the original issue.


Fair enough. I did this. Thanks.

Now, I'm sorry for asking something that should be obvious, but how can I
rebuild crypto/heimdal? There is no Makefile in this directory, but when I
did make world it did build it. So how does this actually work? Is there
a special Makefile somewhere else that I should use? I need to be able to
rebuild these things without having to do a full make world, which is the
only way I figured out so far.

(of course, I could do an automake/configure/make sequence, but it seems as
though the official FreeBSD build doesn't do this (I couldn't find any
config.log file dropped from the configure script)).

Regards,
Elias

Re: Possible bug in NFSv4 with krb5p security?

2013-02-17 Thread Rick Macklem

Elias Martenson wrote:
 On 17 February 2013 02:17, Doug Rabson d...@rabson.org wrote:
 
 
  I think it was Rick that mentioned the patch. I would apply the
  patch and
  rebuild your kernel in the interests of changing as little as
  possible
  while debugging the original issue.
 
 
 Fair enough. I did this. Thanks.
 
 Now, I'm sorry for asking something that should be obvious, but how
 can I
 rebuild crypto/heimdal?
I think the Makefiles are in the kerberos5 directory.

Since the only function you care about is the one in
kerberos5/lib/libgssapi_krb5/pname_to_uid.c, I'd
just put a copy of that file in usr.sbin/gssd and
modify the Makefile there to compile it and link
its .o into gssd, avoiding rebuilding any libraries.

I'd put a couple of fprintf(stderr, ...) in it and
then run gssd -d and see what it says.

Just how I'd attack it, rick

 There is no Makefile in this directory, but
 when I
 did make world it did build it. So how does this actually work? Is
 there
 a special Makefile somewhere else that I should use? I need to be able
 to
 rebuild these things without having to do a full make world, which
 is the
 only way I figured out so far.
 
 (of course, I could do a automake/configure/make sequence, but it
 seems as
 though the official FreeBSD build doesn't do this (I couldn't find any
 config.log file dropped from the configure script)).
 
 Regards,
 Elias

Re: Possible bug in NFSv4 with krb5p security?

2013-02-16 Thread Elias Mårtenson

OK, here I am replying to my own email. I just want to mention that I
removed the ports version of Heimdal, but with no change in behaviour.


On 16 February 2013 09:38, Elias Mårtenson loke...@gmail.com wrote:


 On 16 Feb, 2013 1:42 AM, Benjamin Kaduk ka...@mit.edu wrote:
 
  And yet one more thing: Heimdal ships with its own version of
 libgssapi. I
  can link gssd to it, but it won't run properly (it hangs pretty early).
 
  I have forgotten: you are using Heimdal from ports, not from the base
 system?  I remember it being easy to get into subtly-broken configurations
 when both a ports and a base version are present.

 I am indeed using Heimdal from ports. This machine is also the KDC. I
 wasn't aware that there was a non-ports version available.

 What do you suggest I do? Simply remove the one from ports? Do I have to
 do something to activate the other one?

 (I have a hard time checking this as I am nowhere near the computers now)


Re: Possible bug in NFSv4 with krb5p security?

2013-02-16 Thread Doug Rabson

This may be a stupid question but does the user 'elias' exist in the local
password database?

If you are using heimdal from the base distribution and you have source,
you should be able to build them with debug information which may help.
When I was writing gssd, I mostly ran it under gdb to debug problems like
this. To build something in the base with debug information, go to the
directory in the source tree for that component and type something like
'make DEBUG_FLAGS=-g clean all install'.


On 16 February 2013 09:24, Elias Mårtenson loke...@gmail.com wrote:

 OK, here I am replying to my own email. I just want to mention that I
 removed the ports version of Heimdal, but with no change in behaviour.


 On 16 February 2013 09:38, Elias Mårtenson loke...@gmail.com wrote:

 
  On 16 Feb, 2013 1:42 AM, Benjamin Kaduk ka...@mit.edu wrote:
  
   And yet one more thing: Heimdal ships with its own version of
  libgssapi. I
   can link gssd to it, but it won't run properly (it hangs pretty
 early).
  
   I have forgotten: you are using Heimdal from ports, not from the base
  system?  I remember it being easy to get into subtly-broken
 configurations
  when both a ports and a base version are present.
 
  I am indeed using Heimdal from ports. This machine is also the KDC. I
  wasn't aware that there was a non-ports version available.
 
  What do you suggest I do? Simply remove the one from ports? Do I have to
  do something to activate the other one?
 
  (I have a hard time checking this as I am nowhere near the computers now)
 

Re: Possible bug in NFSv4 with krb5p security?

2013-02-16 Thread Elias Mårtenson

On 16 February 2013 18:58, Doug Rabson d...@rabson.org wrote:

 This may be a stupid question but does the user 'elias' exist in the local
 password database?

 If you are using heimdal from the base distribution and you have source,
 you should be able to build them with debug information which may help.
 When I was writing gssd, I mostly ran it under gdb to debug problems like
 this. To build something in the base with debug information, go to the
 directory in the source tree for that component and type something like
 'make DEBUG_FLAGS=-g clean all install'.


No worries. I do have that user, and everything else (specifically single
sign-on ssh) works with it. I agree that if I did not have that user, the
behaviour I see would be neatly explained.

When it comes to gssd, I've got its behaviour pretty well nailed down. It
does what it's supposed to do.

However, when I tried rebuilding libgssapi.so.10, I ended up with gssd
hanging when it used the new library. I have no idea why.

Would it be wise to upgrade from 9.1-RELEASE to something newer? I've seen
references to 10-CURRENT. I'd like to be debugging the latest version of
everything, but this machine also needs to serve as a fileserver for my
home office, so some degree of stability is needed.

Regards,
Elias

Re: Possible bug in NFSv4 with krb5p security?

2013-02-16 Thread Doug Rabson

On 16 February 2013 13:57, Elias Mårtenson loke...@gmail.com wrote:

 On 16 February 2013 18:58, Doug Rabson d...@rabson.org wrote:

 This may be a stupid question but does the user 'elias' exist in the
 local password database?

 If you are using heimdal from the base distribution and you have source,
 you should be able to build them with debug information which may help.
 When I was writing gssd, I mostly ran it under gdb to debug problems like
 this. To build something in the base with debug information, go to the
 directory in the source tree for that component and type something like
 'make DEBUG_FLAGS=-g clean all install'.


 No worries. I do have that user (and everything else, specifically single
 sign-on ssh) works with it. I do agree that if I had not that user, the
 behaviour I see would be neatly explained.

 When it comes to gssd, I've got its behaviour pretty well nailed down. It
 does what it's supposed to do.

 However, when I tried rebuilding libgssapi.so.10, I ended up with gssd
 hanging when it used the new library. I have no idea why.

 Would it be wise to upgrade from 9.1-RELEASE to something newer? I've seen
 references to 10-CURRENT. I'd like to be debugging the latest version of
 everything, but this machine also needs to serve as a fileserver for my
 home office, so some degree of stability is needed.


I don't think much (if anything) has changed with gssd between 9.1 and
current. When your gssd hangs, you can try to get a stack trace using gdb's
attach command.



 Regards,
 Elias


Re: Possible bug in NFSv4 with krb5p security?

2013-02-16 Thread Elias Mårtenson

On 17 February 2013 00:03, Doug Rabson d...@rabson.org wrote:

 I don't think much (if anything) has changed with gssd between 9.1 and
 current. When your gssd hangs, you can try to get a stack trace using gdb's
 attach command.

Fair enough. However, when it hangs, I have at least a 50% chance of
hitting the gssd-related kernel panic. Should I apply the patch you gave
me, or would you suggest an upgrade?

Re: Possible bug in NFSv4 with krb5p security?

2013-02-16 Thread Doug Rabson

On 16 February 2013 16:18, Elias Mårtenson loke...@gmail.com wrote:

 On 17 February 2013 00:03, Doug Rabson d...@rabson.org wrote:

 I don't think much (if anything) has changed with gssd between 9.1 and
 current. When your gssd hangs, you can try to get a stack trace using gdb's
 attach command.

 Fair enough. However, when it hangs, I have at least a 50% chance of
 hitting the gssd-related kernel panic. Should I apply the patch you gave
 me, or would you suggest an upgrade?


I think it was Rick that mentioned the patch. I would apply the patch and
rebuild your kernel in the interests of changing as little as possible
while debugging the original issue.

Re: Possible bug in NFSv4 with krb5p security?

2013-02-16 Thread Rick Macklem

Doug Rabson wrote:
 On 16 February 2013 13:57, Elias Mårtenson loke...@gmail.com wrote:
 
  On 16 February 2013 18:58, Doug Rabson d...@rabson.org wrote:
 
  This may be a stupid question but does the user 'elias' exist in
  the
  local password database?
 
  If you are using heimdal from the base distribution and you have
  source,
  you should be able to build them with debug information which may
  help.
  When I was writing gssd, I mostly ran it under gdb to debug
  problems like
  this. To build something in the base with debug information, go to
  the
  directory in the source tree for that component and type something
  like
  'make DEBUG_FLAGS=-g clean all install'.
 
 
  No worries. I do have that user (and everything else, specifically
  single
  sign-on ssh) works with it. I do agree that if I had not that user,
  the
  behaviour I see would be neatly explained.
 
  When it comes to gssd, I've got its behaviour pretty well nailed
  down. It
  does what it's supposed to do.
 
  However, when I tried rebuilding libgssapi.so.10, I ended up with
  gssd
  hanging when it used the new library. I have no idea why.
 
  Would it be wise to upgrade from 9.1-RELEASE to something newer?
  I've seen
  references to 10-CURRENT. I'd like to be debugging the latest
  version of
  everything, but this machine also needs to serve as a fileserver for
  my
  home office, so some degree of stability is needed.
 
 
 I don't think much (if anything) has changed with gssd between 9.1 and
 current.
Nothing that would affect this, as far as I know. The changes were done
to add support for searching for credential cache files with different
names. This should affect client side behaviour only and only if the
new command line options are used to enable this code.

 When your gssd hangs, you can try to get a stack trace using
 gdb's
 attach command.
 
 
 
  Regards,
  Elias
 

Re: Possible bug in NFSv4 with krb5p security?

2013-02-15 Thread Elias Mårtenson

On 14 February 2013 07:42, Rick Macklem rmack...@uoguelph.ca wrote:

 Elias Martenson wrote:
  Secondly, what if the issue is gssd not correctly mapping the
  principals to Unix usernames? How can I determine if this is the case?
  There seem to be no logging options for gssd (-d does absolutely
  nothing other than prevent the process from detaching. It still
  doesn't log anything).
 
 Yep. I added a few cases that output debugging, but they're all on the
 client side. (I wasn't the original author of this gssd.)

 You could easily add some. It's the function with pname_to_uid in it
 that does the translation. It basically does a gss_pname_to_uid()
 followed by a getpwuid() to do the translation from principal name
 to uid + gid list. If this fails, then it maps uid == 65534, which
 is usually nobody. (Why does the code have 65534 hardwired in it?
 I have no idea.;-)

 Just add fprintf()s and run it with -d to see what it is doing.

 If the initiator principal is nfs/client-host.domain it will get
 mapped to nobody as above.


Thank you. I did exactly that and I found out some more.

The problem occurs in file gss.c, in the
function gssd_pname_to_uid_1_svc(). This function is responsible for taking
a principal and returning the Unix user ID that this principal corresponds
to. I did confirm that this function is called with elias@REALM, which is
the correct principal. It then calls the libgssapi function
gss_pname_to_uid() which does the actual lookup.

The problem is that after the lookup (which succeeds by the way), it
returns user ID 0 (i.e. root, what!?). Of course, this uid later gets
mapped to nobody, resulting in the behaviour that I see.

I tried to add more debugging information in libgssapi.so.10, but if I just
try to add some printf() statements, the entire thing hangs. I'm not sure
how to proceed from there.

Oh, and the libgssapi function gss_pname_to_uid() actually delegates the
actual lookup to a function that depends on what security mechanism is in
place. My printf()'s (that caused the hang) attempted to print what
mechanism was actually used.

And yet one more thing: Heimdal ships with its own version of libgssapi. I
can link gssd to it, but it won't run properly (it hangs pretty early).

Does anyone have any idea what might be going on here?

Regards,
Elias

Re: Possible bug in NFSv4 with krb5p security?

2013-02-15 Thread Benjamin Kaduk


On Sat, 16 Feb 2013, Elias Mårtenson wrote:



Thank you. I did exactly that and I found out some more.

The problem occurs in file gss.c, in the
function gssd_pname_to_uid_1_svc(). This function is responsible for taking
a principal and returning the Unix user ID that this principal corresponds
to. I did confirm that this function is called with elias@REALM, which is
the correct principal. It then calls the libgssapi function
gss_pname_to_uid() which does the actual lookup.

The problem is that after the lookup (which succeeds by the way), it
returns user ID 0 (i.e. root, what!?). Of course, this uid later gets
mapped to nobody, resulting in the behaviour that I see.

I tried to add more debugging information in libgssapi.so.10, but if I just
try to add some printf() statements, the entire thing hangs. I'm not sure
how to proceed from there.

Oh, and the libgssapi function gss_pname_to_uid() actually delegates the
actual lookup to a function that depends on what security mechanism is in
place. My printf()'s (that caused the hang) attempted to print what
mechanism was actually used.


Unless things are very messed up, it should be using the krb5 mechanism, 
which I believe will boil down to krb5_aname_to_localname, per 
heimdal/lib/gssapi/krb5/pname_to_uid.c.  I'm not sure how this would end 
up with success but uid 0, though.
Do you have the default realm set in krb5.conf?  Having it set to a 
different value than the realm of elias@REALM could result in strange 
behavior.



And yet one more thing: Heimdal ships with its own version of libgssapi. I
can link gssd to it, but it won't run properly (it hangs pretty early).


I have forgotten: you are using Heimdal from ports, not from the base 
system?  I remember it being easy to get into subtly-broken configurations 
when both a ports and a base version are present.


-Ben Kaduk

Re: Possible bug in NFSv4 with krb5p security?

2013-02-15 Thread Rick Macklem

Benjamin Kaduk wrote:
 On Sat, 16 Feb 2013, Elias Mårtenson wrote:
 
 
  Thank you. I did exactly that and I found out some more.
 
  The problem occurs in file gss.c, in the
  function gssd_pname_to_uid_1_svc(). This function is responsible for
  taking
  a principal and returning the Unix user ID that this principal
  corresponds
  to. I did confirm that this function is called with elias@REALM,
  which is
  the correct principal. It then calls the libgssapi function
  gss_pname_to_uid() which does the actual lookup.
 
  The problem is that after the lookup (which succeeds by the way), it
  returns user ID 0 (i.e. root, what!?). Of course, this uid later
  gets
  mapped to nobody, resulting in the behaviour that I see.
 
  I tried to add more debugging information in libgssapi.so.10, but if
  I just
  try to add some printf() statements, the entire thing hangs. I'm not
  sure
  how to proceed from there.
 
  Oh, and the libgssapi function gss_pname_to_uid() actually delegates
  the
  actual lookup to a function that depends on what security mechanism
  is in
  place. My printf()'s (that caused the hang) attempted to print what
  mechanism was actually used.
 
 Unless things are very messed up, it should be using the krb5
 mechanism,
 which I believe will boil down to krb5_aname_to_localname, per
 heimdal/lib/gssapi/krb5/pname_to_uid.c. I'm not sure how this would
 end
 up with success but uid 0, though.
 Do you have the default realm set in krb5.conf? Having it set to a
 different value than the realm of elias@REALM could result in strange
 behavior.
 
  And yet one more thing: Heimdal ships with its own version of
  libgssapi. I
  can link gssd to it, but it won't run properly (it hangs pretty
  early).
 
 I have forgotten: you are using Heimdal from ports, not from the base
 system? I remember it being easy to get into subtly-broken
 configurations
 when both a ports and a base version are present.
 
 -Ben Kaduk
Well, here's the aname_to_localname function sources. After this, it just
calls getpwnam_r() to get the password database entry for the name.
I've put *** in front of what I suspect is causing your problem.
I have no idea when there is a name_string.len == 2 with root as the
second string. Maybe Benjamin knows?

krb5_error_code KRB5_LIB_FUNCTION
krb5_aname_to_localname (krb5_context context,
                         krb5_const_principal aname,
                         size_t lnsize,
                         char *lname)
{
    krb5_error_code ret;
    krb5_realm *lrealms, *r;
    int valid;
    size_t len;
    const char *res;

    ret = krb5_get_default_realms (context, &lrealms);
    if (ret)
        return ret;

    valid = 0;
    for (r = lrealms; *r != NULL; ++r) {
        if (strcmp (*r, aname->realm) == 0) {
            valid = 1;
            break;
        }
    }
    krb5_free_host_realm (context, lrealms);
    if (valid == 0)
        return KRB5_NO_LOCALNAME;

    if (aname->name.name_string.len == 1)
        res = aname->name.name_string.val[0];
*** else if (aname->name.name_string.len == 2
             && strcmp (aname->name.name_string.val[1], "root") == 0) {
        krb5_principal rootprinc;
        krb5_boolean userok;

        res = "root";

        ret = krb5_copy_principal(context, aname, &rootprinc);
        if (ret)
            return ret;

        userok = krb5_kuserok(context, rootprinc, res);
        krb5_free_principal(context, rootprinc);
        if (!userok)
            return KRB5_NO_LOCALNAME;

    } else
        return KRB5_NO_LOCALNAME;

    len = strlen (res);
    if (len >= lnsize)
        return ERANGE;
    strlcpy (lname, res, lnsize);

    return 0;
}

I've never seen Kerberos map to root like the above would for
name.name_string.len == 2, but I'm guessing that's how you get
uid == 0.
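
For anyone following along, the branch marked *** can be made concrete with
a minimal, self-contained sketch of just the decision logic. This is not the
Heimdal code: struct sketch_principal and sketch_aname_to_localname() are
hypothetical names invented for illustration, the list of default realms is
passed in rather than read from krb5.conf, and the krb5_kuserok() check is
deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a Kerberos principal: realm plus components. */
struct sketch_principal {
    const char *realm;
    size_t ncomp;            /* number of name components in use */
    const char *comp[4];
};

/*
 * Returns 0 and fills lname on success, -1 where the real function would
 * return KRB5_NO_LOCALNAME or ERANGE. Assumption: no krb5_kuserok() check.
 */
static int
sketch_aname_to_localname(const struct sketch_principal *p,
    const char *const *default_realms, char *lname, size_t lnsize)
{
    const char *res;
    int valid = 0;

    assert(p != NULL && default_realms != NULL && lname != NULL);

    /* The principal's realm must be one of the local default realms. */
    for (; *default_realms != NULL; ++default_realms) {
        if (strcmp(*default_realms, p->realm) == 0) {
            valid = 1;
            break;
        }
    }
    if (!valid)
        return -1;

    if (p->ncomp == 1)
        res = p->comp[0];                 /* user@REALM -> user */
    else if (p->ncomp == 2 && strcmp(p->comp[1], "root") == 0)
        res = "root";                     /* anything/root@REALM -> root */
    else
        return -1;                        /* e.g. nfs/host@REALM */

    if (strlen(res) >= lnsize)
        return -1;
    strcpy(lname, res);                   /* length was checked above */
    return 0;
}
```

With REALM as the only default realm, elias@REALM yields elias, a
two-component principal such as foo/root@REALM yields root, and
nfs/client-host.domain@REALM yields no local name at all.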

rick

Re: Possible bug in NFSv4 with krb5p security?

2013-02-15 Thread Elias Mårtenson

On 16 Feb, 2013 8:57 AM, Rick Macklem rmack...@uoguelph.ca wrote:

 Benjamin Kaduk wrote:
  On Sat, 16 Feb 2013, Elias Mårtenson wrote:
 
  
   Thank you. I did exactly that and I found out some more.
  
   The problem occurs in file gss.c, in the
   function gssd_pname_to_uid_1_svc(). This function is responsible for
   taking
   a principal and returning the Unix user ID that this principal
   corresponds
   to. I did confirm that this function is called with elias@REALM,
   which is
   the correct principal. It then calls the libgssapi function
   gss_pname_to_uid() which does the actual lookup.
  
   The problem is that after the lookup (which succeeds by the way), it
   returns user ID 0 (i.e. root, what!?). Of course, this uid later
   gets
   mapped to nobody, resulting in the behaviour that I see.
  
   I tried to add more debugging information in libgssapi.so.10, but if
   I just
   try to add some printf() statements, the entire thing hangs. I'm not
   sure
   how to proceed from there.
  
   Oh, and the libgssapi function gss_pname_to_uid() actually delegates
   the
   actual lookup to a function that depends on what security mechanism
   is in
   place. My printf()'s (that caused the hang) attempted to print what
   mechanism was actually used.
 
  Unless things are very messed up, it should be using the krb5
  mechanism,
  which I believe will boil down to krb5_aname_to_localname, per
  heimdal/lib/gssapi/krb5/pname_to_uid.c. I'm not sure how this would
  end
  up with success but uid 0, though.
  Do you have the default realm set in krb5.conf? Having it set to a
  different value than the realm of elias@REALM could result in strange
  behavior.
 
   And yet one more thing: Heimdal ships with its own version of
   libgssapi. I
   can link gssd to it, but it won't run properly (it hangs pretty
   early).
 
  I have forgotten: you are using Heimdal from ports, not from the base
  system? I remember it being easy to get into subtly-broken
  configurations
  when both a ports and a base version are present.
 
  -Ben Kaduk
 Well, here's the aname_to_localname function sources. After this, it just
 calls getpwnam_r() to get the password database entry for the name.
 I've put *** in front of what I suspect is causing your problem.
 I have no idea when there is a name_string.len == 2 with root as the
 second string. Maybe Benjamin knows?

 krb5_error_code KRB5_LIB_FUNCTION
 krb5_aname_to_localname (krb5_context context,
                          krb5_const_principal aname,
                          size_t lnsize,
                          char *lname)
 {
     krb5_error_code ret;
     krb5_realm *lrealms, *r;
     int valid;
     size_t len;
     const char *res;

     ret = krb5_get_default_realms (context, &lrealms);
     if (ret)
         return ret;

     valid = 0;
     for (r = lrealms; *r != NULL; ++r) {
         if (strcmp (*r, aname->realm) == 0) {
             valid = 1;
             break;
         }
     }
     krb5_free_host_realm (context, lrealms);
     if (valid == 0)
         return KRB5_NO_LOCALNAME;

     if (aname->name.name_string.len == 1)
         res = aname->name.name_string.val[0];
 *** else if (aname->name.name_string.len == 2
              && strcmp (aname->name.name_string.val[1], "root") == 0) {
         krb5_principal rootprinc;
         krb5_boolean userok;

         res = "root";

         ret = krb5_copy_principal(context, aname, &rootprinc);
         if (ret)
             return ret;

         userok = krb5_kuserok(context, rootprinc, res);
         krb5_free_principal(context, rootprinc);
         if (!userok)
             return KRB5_NO_LOCALNAME;

     } else
         return KRB5_NO_LOCALNAME;

     len = strlen (res);
     if (len >= lnsize)
         return ERANGE;
     strlcpy (lname, res, lnsize);

     return 0;
 }

 I've never seen Kerberos map to root like the above would for
 name.name_string.len == 2, but I'm guessing that's how you get
 uid == 0.

Sorry for bad formatting, I'm typing this on my phone.

Wouldn't the case you quoted cover the case where you gave a principal name
like foo/root which would be mapped to root?

I've never seen such principals, but it makes sense based on what I see in
the code.

Regards,
Elias

Re: Possible bug in NFSv4 with krb5p security?

2013-02-15 Thread Elias Mårtenson

On 16 Feb, 2013 1:42 AM, Benjamin Kaduk ka...@mit.edu wrote:

 And yet one more thing: Heimdal ships with its own version of libgssapi. I
 can link gssd to it, but it won't run properly (it hangs pretty early).

 I have forgotten: you are using Heimdal from ports, not from the base
 system?  I remember it being easy to get into subtly-broken configurations
 when both a ports and a base version are present.

I am indeed using Heimdal from ports. This machine is also the KDC. I
wasn't aware that there was a non-ports version available.

What do you suggest I do? Simply remove the one from ports? Do I have to do
something to activate the other one?

(I have a hard time checking this as I am nowhere near the computers now)

Re: Possible bug in NFSv4 with krb5p security?

2013-02-14 Thread Elias Mårtenson

Thank you for your help. I'm currently in the process of analysing what is
happening inside gssd during these operations. I'll get back later with a
summary of my findings.

However, I have found a real bug this time. An honest to FSM kernel crash.
This is how I reproduced it:

  - Kill gssd
  - Attempt to mount a kerberised NFS mount from the Linux machine
  - The mount attempt will hang because gssd isn't running
  - While the mount is hung, start gssd
  - Kernel crash

What should I do about that one?

Regards,
Elias

Re: Possible bug in NFSv4 with krb5p security?

2013-02-14 Thread Rick Macklem

Elias Martenson wrote:
 Thank you for your help. I'm currently in the process of analysing
 what is happening inside gssd during these operations. I'll get back
 later with a summary of my findings.
 
 
 However, I have found a real bug this time. An honest to FSM kernel
 crash. This is how I reproduced it:
 
 
 - Kill gssd
 - Attempt to mount a kerberised NFS mount from the Linux machine
 - The mount attempt will hang because gssd isn't running
 - While the mount is hung, start gssd
 - Kernel crash
 
 
 What should I do about that one?
 
There was a patch applied to head about 2 months ago (r244370) to
stop crashes when the gssd was restarted. The patch is also here:
 http://people.freebsd.org/~rmacklem/kgssapi.patch

I don't remember if you mentioned which kernel version you are
running, but this would also be in stable/9 of about 6 weeks ago,
but not in 9.1-release.

If your system has this patch and still crashes, please email
the backtrace for the crash. You can take a photo of it, if it
is a screen console.

rick

 
 Regards,
 Elias

Re: Possible bug in NFSv4 with krb5p security?

2013-02-13 Thread Elias Mårtenson

Thanks for the information. I was looking a bit further into the tcpdump
log, and this is what happens:

Here are some relevant packets:

Frame 115:
  NULL call establishing a mutual context(?)
    GSS-API:
      Kerberos AP-REQ:
        Ticket: Server Name (Principal): nfs/domainname
Frame 117:
  NULL reply to packet 115
Frame 169:
  NFSV4 Call. OPEN DH:.../foo.txt
    Credentials:
      Procedure: RPCSEC_GSS_DATA
      GSS Context: (reference to the context created in the request from
        frame 115)
    Verifier:
      GSS Token: (some long sequence of binary data)
      GSS-API: krb5_blob (more binary data)
Frame 170:
  NFSV4 Reply
    OPEN Status: NFS4ERR_ACCESS

The weird thing here is that the OPEN request in packet 169 refers to a
context that was created for the principal name nfs/domainname (i.e. the
service principal). Am I correct in my understanding that this context is
different from the user principal that is accessing the data? And if so,
where are the details regarding that principal? Is it in the verifier part
in packet 169?

Secondly, what if the issue is gssd not correctly mapping the principals to
Unix usernames? How can I determine if this is the case? There seem to be
no logging options for gssd (-d does absolutely nothing other than prevent
the process from detaching. It still doesn't log anything).

Regards,
Elias

On 13 February 2013 06:47, Rick Macklem rmack...@uoguelph.ca wrote:

 Elias Martenson wrote:
  On 12 February 2013 23:20, Rick Macklem  rmack...@uoguelph.ca 
  wrote:
 
 
 
 
 
 
  There is (in case you missed it on google):
  http://code.google.com/p/macnfsv4/wiki/FreeBSD8KerberizedNFSSetup
  (Nothing much has changed since FreeBSD8, except the name of the
  client
  side patch for host based initiator credentials in the keytab file.)
  I was hoping others would add/update the wiki and it would eventually
  become FreeBSD doc, but that hasn't happened.
 
 
 
 
  Thank you for the link. I have indeed found that, and I have followed
  it to the letter.
 
 
  I have set up the exact same thing from Ubuntu machines as well as from
  Solaris, and I do have a fairly good understanding of Kerberos.
  FreeBSD however, is pretty new to me.
 
 
  Other than that, the various discussions in the archive on this list
  may help. Unfortunately, I don't know of an easy way to figure out
  what is busted. I always suggest looking at the packets in wireshark,
  but for some reason, I get the impression that folk don't like
  doing this? It is what I do first when I run into NFS issues.
 
 
 
  I've been looking at the dumps using Wireshark. Well, I had to drop
  down the security since everything is encrypted when using krb5p. I do
  get the same errors using sec=krb5.
 
 
  When looking at this trace, I see a normal OPEN request followed by a
  NFS4ERR_ACCESS as a reply. The Kerberos credentials are of course
  encrypted, so I can't really say anything about that part.
 
 Well, it sounds like you are doing all the right stuff, so I don't know
 why it is returning EACCES?

 I'm not a ZFS person, so I never test that. If you have a UFS file system
 you could export for testing, that might be worth a try. ZFS likes to do
 things differently;-)

 You can look at the authentication stuff in the RPC header:
 Actually, the credentials in the RPC header aren't encrypted, although
 they are binary data. It's been a while since I looked at the RFC, but
 the authenticator is basically:
 - an RPCSEC_GSS version (must be 1 for FreeBSD to be able to use it)
 - a type that will be DATA in this case
 - a credential handle (just a blob of bits the server uses as a shorthand
 for the principal)
 - a sequence# used to subvert replay attempts

 Then the authentication verifier is an encrypted checksum of the above,
 that the server uses to verify it.

 All the Kerberos stuff happens via NFS Null RPCs, where the GSSAPI tokens
 are passed as data (a Null RPC has no arguments) and the credential handle
 and a session key get established. (The Kerberos ticket is inside the
 GSSAPI
 token for the first of these Null RPCs.)

 
  Note that NFS4 with Kerberos security never uses the user ID numbers.
  They purely use the Kerberos principals for authorisation. This is
  different from the default sys security model that blindly uses user
  ID's.
 
 Yep, of course. The Kerberos user principal name@REALM is translated
 to a uid + gid list by the gssd via a lookup of name as the username.
 The uid + gid list is then associated with that credential handle I
 mentioned
 above.

 
 
 
   nfscbd_enable=YES
  Needed for client side only, and only if delegations are
  enabled at the server and you want the client to acquire
  delegations. (Delegations are not enabled by default on the
  FreeBSD NFSv4 server.)
 
 
 
  Noted. Thank you. I will change this.
 
 
 
   rpc_lockd_enable=YES
   rpc_statd_enable=YES
  You shouldn't need rpc_lockd or rpc_statd for NFSv4,
  since they are only used for NFSv3.
 
 
 
  Good point. I'll disable

Re: Possible bug in NFSv4 with krb5p security?

2013-02-13 Thread Rick Macklem

Elias Martenson wrote:
 Thanks for the information. I was looking a bit further into the
 tcpdump
 log, and this is what happens:
 
 Here are some relevant packets:
 
  115 
 NULL call establishing a mutual context(?)
 GSS-API:
 Kerberos AP-REQ:
 Ticket: Server Name (Principal): nfs/domainname
  117 
 NULL reply to packet 115
  169 
 NFSV4 Call. OPEN DH:.../foo.txt
 Credentials:
 Procedure: RPCSEC_GSS_DATA
 GSS Context: (reference to the context created in the request from
 frame 115)
 Verifier:
 GSS Token: (some long sequence of binary data)
 GSS-API: krb5_blob (more binary data)
  170 
 NFSV4 Reply
 OPEN Status: NFS4ERR_ACCESS
 =
 
 The weird thing here is that the OPEN request in packet 169 refers to
 a
 context that was created for the principal name nfs/domainname (i.e.
 the
 service principal).
Normally the client will get a service ticket for nfs/server-host.domain using
initiator credentials that are either user (via their TGT) or
nfs/client-host.domain via its keytab entry. The important part is what the
initiator principal is. (Watching what gets logged in your KDC's log as you
do the mount and access attempt might answer this if the wireshark trace 
doesn't.)

 Am I correct in my understanding that this context
 is
 different from the user principal that is accessing the data?
Ok, it sounds like the Linux client might be using a host based initiator
credential for this request, but I'm not sure. (This is the credential where
the initiator principal was nfs/client-host.domain.)

Most NFSv4 clients use a host based initiator credential in a keytab file
for system operations. (The FreeBSD client can only do this if it is patched.)
(By system operations, I am referring to NFSv4 Operations that are related to
 maintenance of lock state: SetClientID, SetClientIDConfirm, Renew,
 ReleaseLockOwner and maybe a couple of others.)
This is done so that the mount doesn't break when a user's TGT expires and also
because few Kerberos setups have a user principal for root and root is usually
doing the mounts.

Now, what I think the client should be doing is a separate Null RPC sequence
for each user (like elias) when that user first attempts to access files
within the mount. The user must have a valid TGT at this point for it to work.
Then it should use the credential handle acquired from that exchange for file
accesses (including Opens) for that user.

If wireshark understands the RPCSEC_GSS authenticator, try and look for the
credential handle field and also look for a separate Null RPC sequence for
user principal elias or whoever you are logged in as when trying to open
the file.

If you are correct and the Linux client is using the credential handle it
acquired for nfs/client-host.domain, then the FreeBSD gssd would map that to
nobody and it would explain what your problem is. I would consider that
a Linux client bug. Linux clients have been successfully tested against
the FreeBSD server in the past (and did not do the above), but things
change/break.

 And if so, where are the details regarding that principal? Is it in the
 verifier part in packet 169?
 
There is a credential handle (I might not have the correct name since I
haven't looked at the RFC in years) which identifies which principal it
is referred to. (The same credential handle is in the reply to the Null RPC
for the RPCSEC_GSS init.) So you need to find the Null RPC sequence that
creates the particular credential handle (just bits that are the same) used
in the RPCSEC_GSS data authenticator. The principal is in the ticket inside
the GSSAPI token in the Null RPC data.
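
For readers who want to do this matching on raw bytes rather than by eye in
wireshark, here is a minimal sketch of pulling the handle out of an
RPCSEC_GSS credential body. The field order used (version, gss_proc,
seq_num, service, then a length-prefixed opaque handle, all big-endian XDR)
is what I recall from RFC 2203 and should be checked against the RFC. The
names sketch_gss_cred, sketch_parse_cred() and sketch_handle_matches() are
hypothetical, as is the 64-byte cap on the handle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_HANDLE 64   /* arbitrary cap chosen for this sketch */

struct sketch_gss_cred {
    uint32_t version;     /* RPCSEC_GSS version, expected to be 1 */
    uint32_t gss_proc;    /* 0 = DATA, 1 = INIT, 2 = CONTINUE_INIT, 3 = DESTROY */
    uint32_t seq_num;     /* sequence number, guards against replay */
    uint32_t service;     /* 1 = none, 2 = integrity, 3 = privacy */
    uint32_t handle_len;
    uint8_t handle[SKETCH_MAX_HANDLE];
};

/* Read one big-endian 32-bit XDR word. */
static uint32_t
sketch_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
        ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Parse a credential body; returns 0 on success, -1 if it is malformed. */
static int
sketch_parse_cred(const uint8_t *buf, size_t len, struct sketch_gss_cred *out)
{
    assert(buf != NULL && out != NULL);

    if (len < 20)                       /* five fixed 4-byte words */
        return -1;
    out->version = sketch_be32(buf);
    out->gss_proc = sketch_be32(buf + 4);
    out->seq_num = sketch_be32(buf + 8);
    out->service = sketch_be32(buf + 12);
    out->handle_len = sketch_be32(buf + 16);
    if (out->handle_len > SKETCH_MAX_HANDLE || out->handle_len > len - 20)
        return -1;
    memcpy(out->handle, buf + 20, out->handle_len);
    return 0;
}

/* Compare the credential's handle with one seen in a Null RPC init reply. */
static int
sketch_handle_matches(const struct sketch_gss_cred *c, const uint8_t *h,
    size_t hlen)
{
    return c->handle_len == hlen && memcmp(c->handle, h, hlen) == 0;
}
```

The idea is to parse the credential of the failing OPEN, then compare its
handle against the handle bytes returned by each Null RPC init exchange in
the capture; the exchange that matches is the one whose ticket names the
initiator principal.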

 Secondly, what if the issue is gssd not correctly mapping the principals to
 Unix usernames? How can I determine if this is the case? There seem to be
 no logging options for gssd (-d does absolutely nothing other than prevent
 the process from detaching. It still doesn't log anything).
 
Yep. I added a few cases that output debugging, but they're all on the
client side. (I wasn't the original author of this gssd.)

You could easily add some. It's the function with pname_to_uid in it
that does the translation. It basically does a gss_pname_to_uid()
followed by a getpwuid() to do the translation from principal name
to uid + gid list. If this fails, then it maps uid == 65534, which
is usually nobody. (Why does the code have 65534 hardwired in it?
I have no idea.;-)

Just add fprintf()s and run it with -d to see what it is doing.

If the initiator principal is nfs/client-host.domain it will get
mapped to nobody as above.
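
As a rough illustration of the kind of debugging output meant here, the
following sketch shows a name-to-uid lookup that falls back to 65534 and
reports what it did on stderr. It is not the gssd code: the function name
sketch_name_to_uid() is hypothetical, and getpwnam() is used as a stand-in
for the gss_pname_to_uid() step, so it takes a local user name rather than
a GSS-API principal.

```c
#include <assert.h>
#include <pwd.h>
#include <stdio.h>
#include <sys/types.h>

#define SKETCH_NOBODY_UID 65534   /* the hardwired fallback mentioned above */

/*
 * Map a local user name to a uid, logging the outcome when debug != 0.
 * Assumption: getpwnam() stands in for the real principal lookup.
 */
static uid_t
sketch_name_to_uid(const char *name, int debug)
{
    struct passwd *pw;

    assert(name != NULL);

    pw = getpwnam(name);
    if (pw == NULL) {
        if (debug)
            fprintf(stderr, "sketch: no passwd entry for \"%s\", "
                "falling back to uid %d\n", name, SKETCH_NOBODY_UID);
        return (uid_t)SKETCH_NOBODY_UID;
    }
    if (debug)
        fprintf(stderr, "sketch: \"%s\" -> uid %lu\n", name,
            (unsigned long)pw->pw_uid);
    return pw->pw_uid;
}
```

Printing both the success and the fallback case matters here, since the
symptom (everything done as nobody) looks the same whether the lookup fails
or returns an unexpected uid.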

Good luck with it, rick

 Regards,
 Elias
 
 On 13 February 2013 06:47, Rick Macklem rmack...@uoguelph.ca wrote:
 
  Elias Martenson wrote:
   On 12 February 2013 23:20, Rick Macklem  rmack...@uoguelph.ca 
   wrote:
  
  
  
  
  
  
   There is (in case you missed it on google):
   http://code.google.com/p/macnfsv4/wiki/FreeBSD8KerberizedNFSSetup
   (Nothing much has changed since FreeBSD8, except the name of the

Re: Possible bug in NFSv4 with krb5p security?

2013-02-12 Thread Rick Macklem

Elias Martenson wrote:
First of all, I used the "bug" word in the subject, and I'm not doing that
lightly. I fully understand that the initial reaction to such a claim is "he
did something wrong", and frankly, that's what I'm hoping.

I've spent the last two weeks trying to get an NFS share working with krb5p
security from a FreeBSD server to OSX and Ubuntu clients. I've followed all
the documentation, read everything Google could find for me, asked on the
IRC channel and even asked on Stackexchange, all to no avail.

There is (in case you missed it on google):
http://code.google.com/p/macnfsv4/wiki/FreeBSD8KerberizedNFSSetup
(Nothing much has changed since FreeBSD8, except the name of the client
side patch for host based initiator credentials in the keytab file.)
I was hoping others would add/update the wiki and it would eventually
become FreeBSD doc, but that hasn't happened.

Feel free to add to the wiki. All you need is a google login.

Other than that, the various discussions in the archive on this list
may help. Unfortunately, I don't know of an easy way to figure out
what is busted. I always suggest looking at the packets in wireshark,
but for some reason, I get the impression that folk don't like
doing this? It is what I do first when I run into NFS issues.

In all my reading, something struck me as odd: Nowhere did I find any
indication that anyone has actually set this up on 9.1-CURRENT. After
receiving zero replies on Stackexchange I started to think that perhaps
this is actually a bug.

Now, after all this talk, please let me explain what I've done. Most of
this text is taken verbatim from my Stackexchange question here:

http://serverfault.com/questions/477118/permissions-are-not-taking-effect-with-kerberised-nfsv4-on-freebsd

Problem summary
===============

My goal is to achieve the following:

- Files served from the FreeBSD system
- The only security model should be krb5p
- Clients are Linux (Ubuntu) and OSX

The problem that I'm facing is that even though the Kerberos authentication
works, all accesses are performed using the user "nobody".

I can see the permissions when I do "ls -l". Even the user mapping works
correctly, but unless "nobody" has permission to do anything with the
files, I get a "permission denied".

Here's an example interaction from the client (Ubuntu in this case, but the
same thing happens from OSX). In this example, /export/shared/testshare is
the shared directory from the FreeBSD server:

(I have changed the actual domain name to `domain` and the Kerberos realm
name to `REALM`)

    $ kinit
    Password for elias@REALM:
    $ klist
    Ticket cache: FILE:/tmp/krb5cc_1000_GBjtDP
    Default principal: elias@REALM

    Valid starting       Expires              Service principal
    09/02/2013 09:40:47  10/02/2013 09:40:44  krbtgt/REALM@REALM
    $ sudo mount -t nfs4 -osec=krb5p,vers=4 lion:/export/shared/testshare /mnt
    $ ls -l /mnt
    total 4
    -rw-r--r-- 1 nobody nogroup 5 Feb 7 18:17 bar.txt
    -rw------- 1 elias  nogroup 4 Feb 5 23:09 foo.txt
    $ cat /mnt/bar.txt
    blah
    $ echo foo > /mnt/bar.txt
    bash: /mnt/bar.txt: Permission denied
    $ cat /mnt/foo.txt
    cat: /mnt/foo.txt: Permission denied
    $ klist
    Ticket cache: FILE:/tmp/krb5cc_1000_GBjtDP
    Default principal: elias@REALM

    Valid starting       Expires              Service principal
    09/02/2013 09:40:47  10/02/2013 09:40:44  krbtgt/REALM@REALM
    09/02/2013 09:41:56  10/02/2013 09:40:44  nfs/lion.domain@REALM

Server configuration
====================

I have had quite some problems in finding a comprehensive guide to setting
up NFSv4 on FreeBSD. This is somewhat surprising in itself as I have found
the information on how to do things in FreeBSD to be very good.

Here are the relevant lines in /etc/rc.conf:

rpcbind_enable="YES"
nfs_server_enable="YES"
nfsv4_server_enable="YES"
nfsuserd_enable="YES"
nfscbd_enable="YES"

Needed for client side only, and only if delegations are
enabled at the server and you want the client to acquire
delegations. (Delegations are not enabled by default on the
FreeBSD NFSv4 server.)

mountd_enable="YES"
gssd_enable="YES"
rpc_lockd_enable="YES"
rpc_statd_enable="YES"

You shouldn't need rpc_lockd or rpc_statd for NFSv4,
since they are only used for NFSv3.

zfs_enable="YES"

Here is the content of /etc/exports:

/export/shared/testshare -sec=krb5p
V4: / -sec=krb5p

Another interesting aspect is that when I used `tcpdump` to record the NFS
network traffic between the client and the server, I saw NFS3 packets
together with the NFS4 packets. Both of these packet types contained
encrypted data, so I still think Kerberos was used, but given the
configuration above, I would have expected there to be nothing but NFS4
traffic.

I don't know why a Linux client would mix NFSv3 RPCs with NFSv4 ones,
but you can look at a raw packet capture done by tcpdump in wireshark.
(Unlike tcpdump, wireshark understands NFS and RPC packets, so you can
look at what is there.) You might have been seeing rpc.lockd/rpc.statd

Re: Possible bug in NFSv4 with krb5p security?

2013-02-12 Thread Elias Mårtenson

On 12 February 2013 23:20, Rick Macklem rmack...@uoguelph.ca wrote:

There is (in case you missed it on google):
 http://code.google.com/p/macnfsv4/wiki/FreeBSD8KerberizedNFSSetup
 (Nothing much has changed since FreeBSD8, except the name of the client
  side patch for host based initiator credentials in the keytab file.)
 I was hoping others would add/update the wiki and it would eventually
 become FreeBSD doc, but that hasn't happened.


Thank you for the link. I have indeed found that, and I have followed it to
the letter.

I have set up the exact same thing from Ubuntu machines as well as from
Solaris, and I do have a fairly good understanding of Kerberos. FreeBSD
however, is pretty new to me.


 Other than that, the various discussions in the archive on this list
 may help. Unfortunately, I don't know of an easy way to figure out
 what is busted. I always suggest looking at the packets in wireshark,
 but for some reason, I get the impression that folk don't like
 doing this? It is what I do first when I run into NFS issues.


I've been looking at the dumps using Wireshark. Well, I had to drop down
the security since everything is encrypted when using krb5p. I do get the
same errors using sec=krb5.

When looking at this trace, I see a normal OPEN request followed by a
NFS4ERR_ACCESS as a reply. The Kerberos credentials are of course
encrypted, so I can't really say anything about that part.

Note that NFS4 with Kerberos security never uses the user ID numbers. It
relies purely on the Kerberos principals for authorisation. This is different
from the default sys security model that blindly uses user IDs.


  nfscbd_enable=YES
 Needed for client side only, and only if delegations are
 enabled at the server and you want the client to acquire
 delegations. (Delegations are not enabled by default on the
 FreeBSD NFSv4 server.)


Noted. Thank you. I will change this.


  rpc_lockd_enable=YES
  rpc_statd_enable=YES
 You shouldn't need rpc_lockd or rpc_statd for NFSv4,
 since they are only used for NFSv3.


Good point. I'll disable those too.


 I don't know why a Linux client would mix NFSv3 RPCs with NFSv4 ones,


It was suggested that I set vfs.nfsd.server_min_nfsvers to 4 in order to
completely disable NFS versions below 4. I did this and got rid of the stray
NFS3 requests. It didn't solve the original problem though.


  If anyone is able to confirm whether or not this actually has been
  tested
  in 9.1-CURRENT, I'd appreciate it. Also, if not, then I'd love to know
  where I should start looking for a solution. I'm experienced in system
  level programming (having worked on Solaris at Sun in a previous
  life), but
  a pointer where to start would be helpful.
 
 Usually, when everything is being done by nobody, it indicates that
 the mapping between uid, gids <-> name@domain isn't working correctly.
 (Looking at the packets in wireshark, you need to look at the attributes
  called Owner and Owner_Group to see what they are being set to.)


I actually doubt this. First of all, I have the correct idmapd setup
working from FreeBSD to Ubuntu (I can see that since I can see the correct
user names in ls even though the user ID's differ). On OSX I haven't got
it to work yet.

But, the behaviour is the same on both systems.

This is actually expected, as the permission checks are orthogonal to the
ID mapping.


 The most common problem (since you do have nfsuserd running on the server)
 is for the domain spec to be different between client and server.
 FreeBSD's nfsuserd defaults to the domain part of the machine's hostname.
 Linux's rpc.idmapd sets the domain from /etc/idmapd.conf (at least I think
 that's what it is called) and many distros ship with it set to my.domain
 or something like that.


Correct. I have set this correctly. I know this, since once I did, ls
started giving me the correct user names.


 As such, I'd start by checking the Linux client and seeing what it has for
 the domain spec. in /etc/idmapd.conf.

 If you want to override the default for FreeBSD, there is a command line
 option for nfsuserd to do this. I can't remember what it is, but man
 nfsuserd
 will give you the answer and it can be set in /etc/rc.conf using
 nfsuserd_flags.


Thank you. I'm definitely willing to double-check this and I am not going
to claim to know exactly what's going on in the realms of NFS security. :-)


 If this is configured correctly, then looking at the packets in wireshark
 is
 the best starting point w.r.t. figuring out what is broken. I do limited
 testing of this and it works for me. I don't know how many others use it,
 although some definitely have fun getting it working. (Usually it is the
 Kerberos part on the client side that causes the most grief.)


It certainly is fun. But it gets frustrating when one fights a single
problem for weeks on end.

Far too few shops use Kerberos though.

Regards,
Elias
___
freebsd-current@freebsd.org mailing list

Re: Possible bug in NFSv4 with krb5p security?

2013-02-12 Thread Rick Macklem

Elias Martenson wrote:
 On 12 February 2013 23:20, Rick Macklem <rmack...@uoguelph.ca> wrote:
 
 There is (in case you missed it on google):
 http://code.google.com/p/macnfsv4/wiki/FreeBSD8KerberizedNFSSetup
 (Nothing much has changed since FreeBSD8, except the name of the
 client
 side patch for host based initiator credentials in the keytab file.)
 I was hoping others would add/update the wiki and it would eventually
 become FreeBSD doc, but that hasn't happened.
 
 
 
 
 Thank you for the link. I have indeed found that, and I have followed
 it to the letter.
 
 
 I have set up the exact same thing from Ubuntu machines as well as from
 Solaris, and I do have a fairly good understanding of Kerberos.
 FreeBSD however, is pretty new to me.
 
 
 Other than that, the various discussions in the archive on this list
 may help. Unfortunately, I don't know of an easy way to figure out
 what is busted. I always suggest looking at the packets in wireshark,
 but for some reason, I get the impression that folk don't like
 doing this? It is what I do first when I run into NFS issues.
 
 
 
 I've been looking at the dumps using Wireshark. Well, I had to drop
 down the security since everything is encrypted when using krb5p. I do
 get the same errors using sec=krb5.
 
 
 When looking at this trace, I see a normal OPEN request followed by a
 NFS4ERR_ACCESS as a reply. The Kerberos credentials are of course
 encrypted, so I can't really say anything about that part.
 
Well, it sounds like you are doing all the right stuff, so I don't know
why it is returning EACCES?

I'm not a ZFS person, so I never test that. If you have a UFS file system
you could export for testing, that might be worth a try. ZFS likes to do
things differently;-)

You can look at the authentication stuff in the RPC header:
Actually, the credentials in the RPC header aren't encrypted, although
they are binary data. It's been a while since I looked at the RFC, but
the authenticator is basically:
- an RPCSEC_GSS version (must be 1 for FreeBSD to be able to use it)
- a type that will be DATA in this case
- a credential handle (just a blob of bits the server uses as a shorthand
for the principal)
- a sequence# used to subvert replay attempts

Then the authentication verifier is an encrypted checksum of the above,
that the server uses to verify it.
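
For anyone following along in wireshark, the credential fields listed above can be sketched as a C struct. This is only an illustration based on my reading of RFC 2203, not the FreeBSD kernel's definitions; every identifier below is made up for the example. The RFC also carries a service field (none, integrity, privacy), which is what distinguishes krb5, krb5i and krb5p on the wire.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Procedure values as given in RFC 2203; the names are hypothetical. */
enum gss_proc_sketch {
	SK_GSS_DATA = 0,		/* ordinary NFS RPC */
	SK_GSS_INIT = 1,		/* context setup, carried by the Null RPC */
	SK_GSS_CONTINUE_INIT = 2,
	SK_GSS_DESTROY = 3
};

/* Service values as given in RFC 2203; the names are hypothetical. */
enum gss_service_sketch {
	SK_SVC_NONE = 1,		/* sec=krb5 */
	SK_SVC_INTEGRITY = 2,		/* sec=krb5i */
	SK_SVC_PRIVACY = 3		/* sec=krb5p */
};

/* Illustrative in-memory view of the RPCSEC_GSS credential. */
struct gss_cred_sketch {
	uint32_t version;		/* must be 1 */
	uint32_t gss_proc;		/* DATA for normal requests */
	uint32_t seq_num;		/* replay protection */
	uint32_t service;		/* none / integrity / privacy */
	const unsigned char *handle;	/* opaque server-issued handle */
	size_t handle_len;
};

/* Returns 1 if the credential looks like a normal data request. */
int
cred_is_plain_data(const struct gss_cred_sketch *c)
{
	return (c->version == 1 && c->gss_proc == SK_GSS_DATA &&
	    c->handle != NULL && c->handle_len > 0);
}
```

A request whose credential fails a check like this would be the first thing to look for in the trace, before worrying about the verifier.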

All the Kerberos stuff happens via NFS Null RPCs, where the GSSAPI tokens
are passed as data (a Null RPC has no arguments) and the credential handle
and a session key get established. (The Kerberos ticket is inside the GSSAPI
token for the first of these Null RPCs.)

 
 Note that NFS4 with Kerberos security never uses the user ID numbers.
 They purely use the Kerberos principals for authorisation. This is
 different from the default sys security model that blindly uses user
 ID's.
 
Yep, of course. The Kerberos user principal name@REALM is translated
to a uid + gid list by the gssd via a lookup of name as the username.
The uid + gid list is then associated with that credential handle I mentioned
above.
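
The first step of that translation, taking the user part of name@REALM, can be sketched like this. It is a hypothetical helper written for illustration, not the gssd code, and it ignores principals with an instance component such as name/host@REALM.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the user part of "name@REALM" into out (of size outlen).
 * Returns 0 on success, or -1 if there is no '@', the name is empty,
 * or the name does not fit.  Hypothetical helper, not gssd code.
 */
int
principal_user_part(const char *principal, char *out, size_t outlen)
{
	const char *at = strchr(principal, '@');
	size_t n;

	if (at == NULL || at == principal)
		return (-1);
	n = (size_t)(at - principal);
	if (n >= outlen)
		return (-1);
	memcpy(out, principal, n);
	out[n] = '\0';
	return (0);
}
```

The resulting name is what would then be handed to a password database lookup such as getpwnam(3) to get the uid, so a principal whose name part has no matching local account is one way to end up with unexpected credentials on the server.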

 
 
 
  nfscbd_enable=YES
 Needed for client side only, and only if delegations are
 enabled at the server and you want the client to acquire
 delegations. (Delegations are not enabled by default on the
 FreeBSD NFSv4 server.)
 
 
 
 Noted. Thank you. I will change this.
 
 
 
  rpc_lockd_enable=YES
  rpc_statd_enable=YES
 You shouldn't need rpc_lockd or rpc_statd for NFSv4,
 since they are only used for NFSv3.
 
 
 
 Good point. I'll disable those too.
 
 
 
 I don't know why a Linux client would mix NFSv3 RPCs with NFSv4 ones,
 
 
 It was suggested that I set vfs.nfsd.server_min_nfsvers to 4 in order to
 completely disable NFS versions below 4. I did this and got rid of the
 stray NFSv3 requests. It didn't solve the original problem though.
 
 
 
  If anyone is able to confirm whether or not this actually has been
  tested
  in 9.1-CURRENT, I'd appreciate it. Also, if not, then I'd love to
  know
  where I should start looking for a solution. I'm experienced in
  system
  level programming (having worked on Solaris at Sun in a previous
  life), but
  a pointer where to start would be helpful.
 
 Usually, when everything is being done by nobody, it indicates that
 the mapping between uid, gids <-> name@domain isn't working
 correctly.
 (Looking at the packets in wireshark, you need to look at the
 attributes
 called Owner and Owner_Group to see what they are being set to.)
 
 
 
 I actually doubt this. First of all, I have the correct idmapd setup
 working from FreeBSD to Ubuntu (I can see that since I can see the
 correct user names in ls even though the user ID's differ). On OSX I
 haven't got it to work yet.
 
 
 But, the behaviour is the same on both systems.
 
 
 This is actually expected, as the permission checks are orthogonal to
 the ID mapping.
 
 
 The most common problem (since you do have nfsuserd running on the
 server)
 is for the domain spec to be different between client and server.
 FreeBSD's

Re: A possible bug?

2003-06-04 Thread Donn Miller



Zbynek Houska wrote:
Dear all,

  when I tried to mount an ISO image over the network (using Samba), my
computer unexpectedly crashed.
 I issued this command: mdconfig -a -t vnode -f
/path/to/my/file/mounted/on-local-machine
 and at that point the kernel crashed: no ping response, nothing at all. I was
connected to this machine via ssh.
I use 5.0-RELEASE with a GENERIC kernel on a P3/1 GHz, 128 MB RAM.
It happens again when I issue the same command, so I tried adding
dumpon=Yes to my /etc/rc.conf, but nothing was written to /var/crash.
So I enclose the message from the screen:
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x0
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc1c50d4a
stack pointer           = 0x10:0xcd1b8718
frame pointer           = 0x10:0xcd1b8738
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 549 (mdconfig)
trap number             = 12
panic: page fault

I'm seeing the same thing with 5.1-RC1.  I tried to mount an ISO image
with mdconfig -a -t vnode -f isofile -u 0 and FreeBSD immediately
panicked.  The ISO file resided on a Samba mount, which I had mounted
with mount_smbfs.  I'll try cp-ing the file to my local UFS filesystem,
and then try mdconfig, and see if I get the panic again.

___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to [EMAIL PROTECTED]

A possible bug?

2003-05-28 Thread Zbynek Houska

Dear all,

  when I tried to mount an ISO image over the network (using Samba), my
computer unexpectedly crashed.

 I issued this command: mdconfig -a -t vnode -f
/path/to/my/file/mounted/on-local-machine

 and at that point the kernel crashed: no ping response, nothing at all. I was
connected to this machine via ssh.
I use 5.0-RELEASE with a GENERIC kernel on a P3/1 GHz, 128 MB RAM.

It happens again when I issue the same command, so I tried adding
dumpon=Yes to my /etc/rc.conf, but nothing was written to /var/crash.
So I enclose the message from the screen:

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x0
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc1c50d4a
stack pointer           = 0x10:0xcd1b8718
frame pointer           = 0x10:0xcd1b8738
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 549 (mdconfig)
trap number             = 12
panic: page fault


I hope someone can explain to me what happened.

Zbynek 

---
Zbynek Houska
IT Dept.
Foxconn CZ
Pardubice
Czech Republic

phone: (+420)466057289


FW: Re: Error with post 1.1 release Postfix and Cyrus -Possible Bug in VM system

2002-05-30 Thread David W. Chapman Jr.

Do we have anyone working on the VM system that could look at this?

- Forwarded message from Wietse Venema [EMAIL PROTECTED] -

Date: Thu, 30 May 2002 12:49:10 -0400 (EDT)
Reply-To: Postfix users [EMAIL PROTECTED]
From: [EMAIL PROTECTED] (Wietse Venema)
To: Postfix users [EMAIL PROTECTED]
Subject: Re: Error with post 1.1 release Postfix and Cyrus
X-Mailer: ELM [version 2.4ME+ PL82 (25)]
Sender: [EMAIL PROTECTED]

You're somehow still running qmgr code that speaks the protocol
from before 20020514.

To find the file,

# find / \( -name qmgr -o -name nqmgr \) -ls

But you may not find this file.

After upgrading Postfix I very, very, occasionally find that FreeBSD
will execute a new process from an old file that was just replaced.

Postfix always installs executables by using mv newfile oldfile.
At this time, the old file may still be executing, and the parent
process is always executing (the Postfix master daemon).

I suspect an obscure VM system bug. postfix reload does not seem
to cure this condition. The problem goes away after postfix stop
then postfix start, which terminates the parent process.
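
The semantics that installing with mv relies on can be checked from user space. The sketch below is only an illustration with placeholder file names: a descriptor opened before rename(2) should keep seeing the old file, while a fresh open of the same path should see the new one. By the same logic, an exec after the rename would be expected to pick up the new file, which is why the behaviour described above looks like a bug.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write text to path, replacing previous contents.  Returns 0 on success. */
static int
write_file(const char *path, const char *text)
{
	size_t len = strlen(text);
	int fd, ok;

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return (-1);
	ok = (write(fd, text, len) == (ssize_t)len);
	close(fd);
	return (ok ? 0 : -1);
}

/* Read the first byte from fd, or return -1 on error or end of file. */
static int
first_byte(int fd)
{
	char c;

	return (read(fd, &c, 1) == 1 ? (unsigned char)c : -1);
}

/*
 * Returns 1 if a descriptor opened before rename() still sees the old
 * contents while a fresh open() of the same path sees the new ones.
 */
int
rename_keeps_old_inode(const char *oldpath, const char *newpath)
{
	int before, after, a, b;

	if (write_file(oldpath, "O") != 0 || write_file(newpath, "N") != 0)
		return (0);
	before = open(oldpath, O_RDONLY);	/* stands in for the running binary */
	if (before < 0 || rename(newpath, oldpath) != 0) {	/* mv newfile oldfile */
		if (before >= 0)
			close(before);
		unlink(oldpath);
		unlink(newpath);
		return (0);
	}
	after = open(oldpath, O_RDONLY);	/* stands in for a later exec */
	a = first_byte(before);
	b = (after >= 0) ? first_byte(after) : -1;
	close(before);
	if (after >= 0)
		close(after);
	unlink(oldpath);
	return (a == 'O' && b == 'N');
}
```

Both paths need to be on the same filesystem for rename() to succeed, just as with the mv in the install step.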

This has happened to me only twice over the past year. My server
and workstations run FreeBSD versions 4.1 - 4.4. I haven't found
the time and energy to debug this.

Wietse
-
To unsubscribe, send mail to [EMAIL PROTECTED] with content
(not subject): unsubscribe postfix-users

- End forwarded message -

-- 
David W. Chapman Jr.
[EMAIL PROTECTED]   Raintree Network Services, Inc. www.inethouston.net
[EMAIL PROTECTED]   FreeBSD Committer www.FreeBSD.org

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message

Re: Possible bug in /sys/i386/pci/pci_cfgreg.c

2002-04-14 Thread M. Warner Losh


I talked with klaus via IRC on #newcard on Friday.  Turns out that the
'0' in question isn't in INTLINE, but rather part of the PIR table
listing which interrupts are valid.  I have a patch in my local tree
that I hope to commit shortly.

I thought about fixing the powerof2 macro, but since it was last
changed in 1994 (likely earlier than that, since this was in file rev
1.1), I took the coward's way out and just fixed where we used it.

Warner


Re: Possible bug in /sys/i386/pci/pci_cfgreg.c

2002-04-12 Thread M. Warner Losh


In message: Pine.GSO.4.44.0204121353110.3002-10@sunhalle19
Klaus Leibrandt [EMAIL PROTECTED] writes:
: Hi Folks.
: 
: I spent some time tracking down my CardBus problem and found the
: following:
: 
: In /sys/i386/pci/pci_cfgreg.c:
: In Function: static int pci_cfgintr_linked(struct PIR_entry *pe, int pin):
: 
: There you can find this code (my changes included):
: 
:   /* link destination mapped to a unique interrupt? */
: 
: /*if (powerof2(pi->irqs)) { // I changed this line to the next
: line of code */
:  /* the reason is given below */
: 
: /* On my system the CardBus bridge has no interrupt assigned (lazy BIOS, I
: would say).
:    So pi->irqs is 0. powerof2 returns true (which is wrong, as shown
: below).
:    Then ffs returns 0; that - 1 gives an irq of -1, which is totally wrong.
:    The if clause must evaluate to false if there is no interrupt assigned,
:    otherwise the system will panic during boot, as I experienced. */
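
The failure mode described in the quoted comment can be reproduced in user space. This is a minimal sketch, assuming the macro has the classic BSD shape shown below; the macro and function names are made up for the example, and the guard is just one possible fix at the call site, not necessarily the change that was made in the tree.

```c
#include <assert.h>
#include <strings.h>	/* ffs() */

/* Same shape as the classic BSD powerof2() macro; note it is true for 0. */
#define	POWEROF2_SKETCH(x)	((((x) - 1) & (x)) == 0)

/*
 * Hypothetical helper: return the IRQ number if exactly one bit is set
 * in the mask, or 255 ("no IRQ") otherwise, including for an empty mask.
 */
int
unique_irq_from_mask(unsigned int irqs)
{
	if (irqs != 0 && POWEROF2_SKETCH(irqs))
		return (ffs((int)irqs) - 1);
	return (255);
}
```

Without the irqs != 0 test, an empty mask passes the power-of-two check and ffs(0) - 1 yields -1, which matches the bogus IRQ described above.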

Ah, 0 is a *BOGUS* way of saying no interrupt assigned, per the PCI
spec.  However, lots of BIOSes use it, so its meaning must be defined
in some other document I've never laid eyes on since more and more
places are using it.

0 should be mapped in the i386 level.

Try the following patch and let me know how it works for you.  There
is also some code now in dev/pci/pci.c that should be removed (since
it is only a subset of this stuff):

 if (cfg->intpin > 0 && cfg->intline != 255
-#ifdef __i386__
-    && cfg->intline != 0
-#endif
 ) {

if you have a new enough kernel.

Warner

Index: pci_cfgreg.c
===
RCS file: /cache/ncvs/src/sys/i386/pci/pci_cfgreg.c,v
retrieving revision 1.83
diff -u -r1.83 pci_cfgreg.c
--- pci_cfgreg.c 16 Mar 2002 23:02:41 -  1.83
+++ pci_cfgreg.c 12 Apr 2002 15:26:06 -
@@ -178,6 +178,7 @@
 u_int32_t
 pci_cfgregread(int bus, int slot, int func, int reg, int bytes)
 {
+uint32_t line, pin;
 #ifdef APIC_IO
 /*
  * If we are using the APIC, the contents of the intline register will probably
@@ -186,7 +187,6 @@
  * attempts to read them and translate to our private vector numbers.
  */
 if ((reg == PCIR_INTLINE) && (bytes == 1)) {
-   int pin, line;
 
pin = pci_do_cfgregread(bus, slot, func, PCIR_INTPIN, 1);
line = pci_do_cfgregread(bus, slot, func, PCIR_INTLINE, 1);
@@ -217,6 +217,18 @@
}
return(line);
 }
+#else
+/*
+ * Some BIOS writers seem to want to ignore the spec and put
+ * 0 in the intline rather than 255 to indicate none.  The rest of
+ * the code uses 255 as an invalid IRQ.
+ */
+if (reg == PCIR_INTLINE && bytes == 1) {
+   line = pci_do_cfgregread(bus, slot, func, PCIR_INTLINE, 1);
+   pin = pci_do_cfgregread(bus, slot, func, PCIR_INTPIN, 1);
+   if (pin != 0 && (line == 0 || line >= 128))
+   return (255);
+}
 #endif /* APIC_IO */
 return(pci_do_cfgregread(bus, slot, func, reg, bytes));
 }
@@ -394,14 +406,6 @@
(pci_get_slot(*childp) == device) &&
(pci_get_intpin(*childp) == matchpin)) {
irq = pci_get_irq(*childp);
-   /*
-* Some BIOS writers seem to want to ignore the spec and put
-* 0 in the intline rather than 255 to indicate none.  Once
-* we've found one that matches, we break because there can
-* be no others (which is why test looks a little odd).
-*/
-   if (irq == 0)
-   irq = 255;
if (irq != 255)
PRVERB(("pci_cfgintr_search: linked (%x) to configured irq %d at %d:%d:%d\n",
  pe->pe_intpin[pin - 1].link, irq,
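
The intline check that the patch adds for the non-APIC case boils down to a small pure function. The sketch below assumes the comparison on line is >= 128; the function and constant names are invented for the example and do not appear in pci_cfgreg.c.

```c
#include <assert.h>
#include <stdint.h>

#define	NO_IRQ_SKETCH	255	/* value the rest of the code treats as "no IRQ" */

/*
 * If the device has an interrupt pin but the BIOS left 0 (or something
 * >= 128) in the interrupt line register, report "no IRQ" instead.
 */
uint32_t
normalize_intline(uint32_t pin, uint32_t line)
{
	if (pin != 0 && (line == 0 || line >= 128))
		return (NO_IRQ_SKETCH);
	return (line);
}
```

Doing this once where the register is read means callers only ever need to test for 255.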


Re: A possible bug in the interrupt thread preemption code [Was:

2001-02-26 Thread Maxim Sobolev


John Baldwin wrote:

 On 22-Feb-01 Maxim Sobolev wrote:
  John Baldwin wrote:
 
  A recursive sched_lock?  Erm, well, stick these options in your kernel
  config:
 
  options KTR
  options KTR_EXTEND
  options KTR_COMPILE=KTR_LOCK
  options KTR_MASK=KTR_MASK
 
  Bah, it doesn't even compile with these options:
  cc -c -pipe -O -march=pentium -Wall -Wredundant-decls -Wnested-externs
  -Wstrict-prototypes  -Wmissing-prototypes -Wpointer-arith -Winline
  -Wcast-qual
  -fformat-extensions -ansi  -nostdinc -I-  -I. -I../.. -I../../dev
  -I../../../include -I../../contrib/dev/acpica/Subsystem/Include  -D_KERNEL
  -include
  opt_global.h -elf  -mpreferred-stack-boundary=2  ../../kern/kern_ktr.c
  ../../kern/kern_ktr.c: In function `__Tunable_ktr_mask':
  ../../kern/kern_ktr.c:95: `KTR_MASK' undeclared (first use in this function)
  ../../kern/kern_ktr.c:95: (Each undeclared identifier is reported only once
  ../../kern/kern_ktr.c:95: for each function it appears in.)
  *** Error code 1
  1 error

 Oh, whoops, that should be:

 options KTR_MASK=KTR_LOCK

Update: I'm still unable to boot the kernel on my machine, even into single-user
mode. The following is a backtrace from ddb (after commenting out enable_intr()
in trap.c::trap(), as usual):

Fatal trap 9: general protection fault while in kernel mode
instruction pointer = 0x8:0xc0265e36
stack pointer  = 0x10:0xc3577f50
frame pointer  = 0x10:0xc3577f64
code segment  = base 0x0, limit 0xf, type 0x1b
   = DPL 0, pres 1, def32 1, gran 1
processor flags  = resume, IOPL = 0
current process  = 16 (irq14: ata0)
kernel: type 9 trap, code=0
Stopped at sw1b+0x7c: ltr %si
db> trace
sw1b(c0147c74, c0147c74, 0, c32c1da0, c3577f94) at sw1b+0x7c
ithread_loop(c0741c00, c3577fa8) at ithread_loop+0x67b
fork_exit(c0147c74, c0741c00, c3577fa8) at fork_exit+0xd6
fork_trampoline() at fork_trampoline+0x8

-Maxim



A possible bug in the interrupt thread preemption code [Was: Kernel panic in irq14: ata0]

2001-02-22 Thread Maxim Sobolev


Dag-Erling Smorgrav wrote:

 Maxim Sobolev [EMAIL PROTECTED] writes:
  It's not an ata specific problem, but rather a problem of all ISA
  devices (I have an ISA based ata controller).

 I don't think it has anything to do with ISA. I've had similar
 problems on a PCI-only system (actually, PCI+EISA motherboard with no
 EISA cards) with no ATA devices (disks, CD-ROM and streamer are all
 SCSI).

 Considering that backing out rev 1.14 of ithread.c eliminates the
 panics, and that that revision is supposed to enable interrupt thread
 preemption, and that the crashed kernels show signs of stack smashing,
 I'd say the cause is probably a bug in the preemption code.

Update: the bug is still here, as of -current from 22 Feb. However, this time
it doesn't even let me boot into single-user mode, with the following panic message:
kernel trap 12 with interrupts disabled
panic: mutex sched lock recursed at ../../kern/kern_synch.c:872

syncing disks...

-Maxim




Re: A possible bug in the interrupt thread preemption code [Was:

2001-02-22 Thread Dag-Erling Smorgrav


John Baldwin [EMAIL PROTECTED] writes:
 On 22-Feb-01 Maxim Sobolev wrote:
  Dag-Erling Smorgrav wrote:
  
  Maxim Sobolev [EMAIL PROTECTED] writes:
   It's not an ata specific problem, but rather a problem of all ISA
   devices (I have an ISA based ata controller).
 
  I don't think it has anything to do with ISA. I've had similar
  problems on a PCI-only system (actually, PCI+EISA motherboard with no
  EISA cards) with no ATA devices (disks, CD-ROM and streamer are all
  SCSI).
 
  Considering that backing out rev 1.14 of ithread.c eliminates the
  panics, and that that revision is supposed to enable interrupt thread
  preemption, and that the crashed kernels show signs of stack smashing,
  I'd say the cause is probably a bug in the preemption code.
  
  Update: the bug is still here, as of -current from 22 Feb. However, this time
  it doesn't even let me boot into single-user mode, with the following panic message:
  kernel trap 12 with interrupts disabled
  panic: mutex sched lock recursed at ../../kern/kern_synch.c:872
 
 E.  That would be something that is leaking sched_lock.  Hmm...

I have another sched_lock-related problem which showed up over the
weekend. Starting StarOffice 5.2 invariably triggers the following
panic:

root@aes /var/crash# gdb -k
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd".
(kgdb) source ~des/kgdb
(kgdb) kernel 0
IdlePTD 3526656
initial pcb at 2cb980
panicstr: from debugger
panic messages:
---
panic: mutex sched lock not owned at ../../posix4/ksched.c:215
panic: from debugger
Uptime: 3m37s

dumping to dev ad0b, offset 262528
dump ata0: resetting devices .. done
127 126 125 124 123 122 121 120 119 118 117 116 115 114 113 112 111 110 109 108 107 
106 105 104 103 102 101 100 99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 
80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54 53 52 
51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 
22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
---
#0  dumpsys () at ../../kern/kern_shutdown.c:476
476 if (dumping++) {
(kgdb) where
#0  dumpsys () at ../../kern/kern_shutdown.c:476
#1  0xc0187a04 in boot (howto=260) at ../../kern/kern_shutdown.c:319
#2  0xc0187dd9 in panic (fmt=0xc02521b4 "from debugger")
at ../../kern/kern_shutdown.c:569
#3  0xc011cdad in db_panic (addr=-1071459127, have_addr=0, count=-1,
modif=0xc879cd9c "") at ../../ddb/db_command.c:433
#4  0xc011cd4b in db_command (last_cmdp=0xc0285420, cmd_table=0xc0285280,
aux_cmd_tablep=0xc02b68bc) at ../../ddb/db_command.c:333
#5  0xc011ce12 in db_command_loop () at ../../ddb/db_command.c:455
#6  0xc011f07f in db_trap (type=3, code=0) at ../../ddb/db_trap.c:71
#7  0xc022d258 in kdb_trap (type=3, code=0, regs=0xc879ce9c)
at ../../i386/i386/db_interface.c:164
#8  0xc023a098 in trap (frame={tf_fs = -1060962280, tf_es = -932118512,
  tf_ds = -1060962288, tf_edi = -1071197888, tf_esi = 256,
  tf_ebp = -931541272, tf_isp = -931541304, tf_ebx = 514,
  tf_edx = -1071149169, tf_ecx = -1070757120, tf_eax = 18, tf_trapno = 3,
  tf_err = 0, tf_eip = -1071459127, tf_cs = 8, tf_eflags = 70,
  tf_esp = -1071149185, tf_ss = -1071240285}) at ../../i386/i386/trap.c:615
#9  0xc022d4c9 in Debugger (msg=0xc0262ba3 "panic") at machine/cpufunc.h:60
#10 0xc0187dd0 in panic (fmt=0xc0261a48 "mutex %s not owned at %s:%d")
at ../../kern/kern_shutdown.c:567
#11 0xc0180c89 in _mtx_assert (m=0xc02e3e20, what=1,
file=0xc026d140 "../../posix4/ksched.c", line=215)
---Type <return> to continue, or q <return> to quit---
at ../../kern/kern_mutex.c:611
#12 0xc01f0d51 in ksched_yield (ret=0xc8712f24, ksched=0xc0a97660)
at ../../posix4/ksched.c:215
#13 0xc01f100b in sched_yield (p=0xc8712dc0, uap=0xc879cf80)
at ../../posix4/p1003_1b.c:225
#14 0xc023b239 in syscall (frame={tf_fs = 47, tf_es = 47, tf_ds = 47,
  tf_edi = -1077939044, tf_esi = 706867048, tf_ebp = -1077939116,
  tf_isp = -931541036, tf_ebx = 714966384, tf_edx = 1, tf_ecx = 134979841,
  tf_eax = 158, tf_trapno = 22, tf_err = 2, tf_eip = 717073383,
  tf_cs = 31, tf_eflags = 514, tf_esp = -1077939144, tf_ss = 47})
at ../../i386/i386/trap.c:1191
#15 0xc022dbe3 in Xint0x80_syscall ()
#16 0x2a182a9e in ?? ()
#17 0x2a18a328 in ?? ()
#18 0x2a057f6b in ?? ()
#19 0x2a057eb5 in ?? ()
#20 0x28f5e2a9 in ?? ()
#21 0x28191db5 in ?? ()
#22 0x80513a3 in ?? ()
#23 0x28f55eab in ?? ()
#24 0x80512da in ?? ()
#25 0x2a059cf1 in ?? ()
#26 0x2a181e35 in ?? ()
---Type <return> to continue, or q <return> to quit---
#27 0x2ab551eb in ?? ()
(kgdb)

DES
-- 
Dag-Erling Smorgrav - [EMAIL PROTECTED]


Re: A possible bug in the interrupt thread preemption code [Was:

2001-02-22 Thread Maxim Sobolev


John Baldwin wrote:

 On 22-Feb-01 Maxim Sobolev wrote:
  Dag-Erling Smorgrav wrote:
 
  Maxim Sobolev [EMAIL PROTECTED] writes:
   It's not an ata specific problem, but rather a problem of all ISA
   devices (I have an ISA based ata controller).
 
  I don't think it has anything to do with ISA. I've had similar
  problems on a PCI-only system (actually, PCI+EISA motherboard with no
  EISA cards) with no ATA devices (disks, CD-ROM and streamer are all
  SCSI).
 
  Considering that backing out rev 1.14 of ithread.c eliminates the
  panics, and that that revision is supposed to enable interrupt thread
  preemption, and that the crashed kernels show signs of stack smashing,
  I'd say the cause is probably a bug in the preemption code.
 
  Update: the bug is still here, as of -current from 22 Feb. However, this time
  it doesn't even let me boot into single-user mode, with the following panic message:
  kernel trap 12 with interrupts disabled
  panic: mutex sched lock recursed at ../../kern/kern_synch.c:872

 E.  That would be something that is leaking sched_lock.  Hmm...

 Got a backtrace?  What is really annoying is that preemption has been in the
 kernel since Feb 1.  I just accidentally turned it off in the ithread code
 reorganization and then turned it back on.  It was off for a few hours after
 only being on for 2 weeks, and now everyone magically has problems.

Here it is (from DDB):
panic(c027de93,c0297409,c027f878,368,80286)
_mtx_assert(c02ea000,9,c027f878,368,80286)
mi_switch(c32c5da0,3,c02cea44,c357be98)
ithread_schedule(c0747c00,1)
sched_ithd(e)
Xresume14()
--- interrupt, eip = 0xc025b60f, esp = 0x80296, ebp = 0xc357bf08 ---
trap(18, 10, 10,c01597b6,20)
calltrap()
--- trap 0x9, eip = 0xc025a5de, esp = 0xc357bf50, ebp = 0xc357bf64 ---
sw1b(c0146cbc,c0146cbc,c32c5da0,c357bf94)
ithread_loop(c0747c00,c357bfa8)
fork_exit(c0146cbc,c0747c00,c357bfa8)
fork_trampoline()

-Maxim



Re: A possible bug in the interrupt thread preemption code [Was:

2001-02-22 Thread John Baldwin



On 22-Feb-01 Maxim Sobolev wrote:
 John Baldwin wrote:
 
 On 22-Feb-01 Maxim Sobolev wrote:
  Dag-Erling Smorgrav wrote:
 
  Maxim Sobolev [EMAIL PROTECTED] writes:
   It's not an ata specific problem, but rather a problem of all ISA
   devices (I have an ISA based ata controller).
 
  I don't think it has anything to do with ISA. I've had similar
  problems on a PCI-only system (actually, PCI+EISA motherboard with no
  EISA cards) with no ATA devices (disks, CD-ROM and streamer are all
  SCSI).
 
  Considering that backing out rev 1.14 of ithread.c eliminates the
  panics, and that that revision is supposed to enable interrupt thread
  preemption, and that the crashed kernels show signs of stack smashing,
  I'd say the cause is probably a bug in the preemption code.
 
  Update: the bug is still here, as of -current from 22 Feb. However, this
  time
  it doesn't even let me boot into single-user mode, with the following panic message:
  kernel trap 12 with interrupts disabled
  panic: mutex sched lock recursed at ../../kern/kern_synch.c:872

 E.  That would be something that is leaking sched_lock.  Hmm...

 Got a backtrace?  What is really annoying is that preemption has been in the
 kernel since Feb 1.  I just accidentally turned it off in the ithread code
 reorganization and then turned it back on.  It was off for a few hours after
 only being on for 2 weeks, and now everyone magically has problems.
 
 Here it is (from DDB):
 panic(c027de93,c0297409,c027f878,368,80286)
 _mtx_assert(c02ea000,9,c027f878,368,80286)
 mi_switch(c32c5da0,3,c02cea44,c357be98)
 ithread_schedule(c0747c00,1)
 sched_ithd(e)
 Xresume14()
 --- interrupt, eip = 0xc025b60f, esp = 0x80296, ebp = 0xc357bf08 ---
 trap(18, 10, 10,c01597b6,20)
 calltrap()
 --- trap 0x9, eip = 0xc025a5de, esp = 0xc357bf50, ebp = 0xc357bf64 ---
 sw1b(c0146cbc,c0146cbc,c32c5da0,c357bf94)
 ithread_loop(c0747c00,c357bfa8)
 fork_exit(c0146cbc,c0747c00,c357bfa8)
 fork_trampoline()

*sigh*  This is why enabling interrupts in trap() is such a bad idea.  If we
get a trap in the scheduler, then lots of bad crap starts to happen because we
can get an interrupt while we are in a trap. :( Can you compile your kernel with
INVARIANTS on though, as I think the kernel should've panic'd earlier if it is
doing what I think it is doing.  Also, if you are feeling industrious, edit
sys/i386/i386/trap.c and comment out the enable_intr() call near the beginning
of the trap() function right after the printf for 'kernel trap %d with
interrupts disabled'.
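
To make the sequence concrete, here is a toy, single-threaded model of what the backtrace suggests: a trap taken while the scheduler lock is held, an interrupt arriving because trap() re-enabled interrupts, and the interrupt path taking the same non-recursive lock again. None of these names exist in the kernel; it is only a sketch of the reasoning.

```c
#include <assert.h>

/* Toy model of a non-recursive spin lock. */
struct toy_lock {
	int held;	/* 1 while someone owns the lock */
	int recursed;	/* set if acquired again while already held */
};

void
toy_lock_acquire(struct toy_lock *l)
{
	if (l->held)
		l->recursed = 1;	/* the real kernel would panic here */
	l->held = 1;
}

void
toy_lock_release(struct toy_lock *l)
{
	l->held = 0;
}

/* An interrupt that needs the scheduler lock to schedule its thread. */
void
toy_interrupt(struct toy_lock *sched)
{
	toy_lock_acquire(sched);
	toy_lock_release(sched);
}

/*
 * Simulate a trap taken in the middle of a context switch.  If the trap
 * handler enables interrupts, an interrupt can arrive and try to take
 * the scheduler lock a second time.  Returns 1 if the lock recursed.
 */
int
trap_in_scheduler(int trap_enables_interrupts)
{
	struct toy_lock sched = { 0, 0 };

	toy_lock_acquire(&sched);	/* context switch in progress */
	if (trap_enables_interrupts)
		toy_interrupt(&sched);	/* interrupt arrives inside the trap */
	toy_lock_release(&sched);
	return (sched.recursed);
}
```

With interrupts left disabled in the trap handler, the model never recurses on the lock, so the original trap is what gets reported instead.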
 
 -Maxim

-- 

John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/


Re: A possible bug in the interrupt thread preemption code [Was:

2001-02-22 Thread John Baldwin



On 22-Feb-01 Dag-Erling Smorgrav wrote:
 John Baldwin [EMAIL PROTECTED] writes:
 On 22-Feb-01 Maxim Sobolev wrote:
  Dag-Erling Smorgrav wrote:
  
  Maxim Sobolev [EMAIL PROTECTED] writes:
   It's not an ata specific problem, but rather a problem of all ISA
   devices (I have an ISA based ata controller).
 
  I don't think it has anything to do with ISA. I've had similar
  problems on a PCI-only system (actually, PCI+EISA motherboard with no
  EISA cards) with no ATA devices (disks, CD-ROM and streamer are all
  SCSI).
 
  Considering that backing out rev 1.14 of ithread.c eliminates the
  panics, and that that revision is supposed to enable interrupt thread
  preemption, and that the crashed kernels show signs of stack smashing,
  I'd say the cause is probably a bug in the preemption code.
  
  Update: the bug is still here, as of -current from 22 Feb. However, this
  time
  it doesn't even let me boot into single-user mode, with the following panic message:
  kernel trap 12 with interrupts disabled
  panic: mutex sched lock recursed at ../../kern/kern_synch.c:872
 
 E.  That would be something that is leaking sched_lock.  Hmm...
 
 I have another sched_lock-related problem which showed up over the
 weekend. Starting StarOffice 5.2 invariably triggers the following
 panic:
 
 root@aes /var/crash# gdb -k
 GNU gdb 4.18
 Copyright 1998 Free Software Foundation, Inc.
 GDB is free software, covered by the GNU General Public License, and you are
 welcome to change it and/or distribute copies of it under certain conditions.
 Type "show copying" to see the conditions.
 There is absolutely no warranty for GDB.  Type "show warranty" for details.
 This GDB was configured as "i386-unknown-freebsd".
 (kgdb) source ~des/kgdb
 (kgdb) kernel 0
 IdlePTD 3526656
 initial pcb at 2cb980
 panicstr: from debugger
 panic messages:
 ---
 panic: mutex sched lock not owned at ../../posix4/ksched.c:215

Easy enough.  It seems I missed adding sched_lock around a need_resched().
I'll fix in a second..
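
For readers unfamiliar with this kind of assertion, the pattern looks roughly like the toy below: a function that requires the lock to be held, and a caller that either takes it first or trips the check. The names are invented for the example and the real need_resched() and ksched_yield() code is not reproduced here.

```c
#include <assert.h>

/* Toy mutex that only tracks ownership. */
struct toy_mtx {
	int owned;
};

/*
 * Stand-in for a function that must be called with the lock held.
 * Returns 0 on a violation, where the real kernel would panic with
 * "mutex ... not owned"; otherwise sets the flag and returns 1.
 */
int
toy_need_resched(const struct toy_mtx *m, int *flag)
{
	if (!m->owned)
		return (0);
	*flag = 1;
	return (1);
}

/* Caller that optionally wraps the call in lock/unlock. */
int
toy_yield(struct toy_mtx *m, int take_lock)
{
	int flag = 0, ok;

	if (take_lock)
		m->owned = 1;
	ok = toy_need_resched(m, &flag);
	if (take_lock)
		m->owned = 0;
	return (ok);
}
```

The fix described above corresponds to the take_lock case: acquire the lock around the call so the ownership check passes.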
-- 

John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/


Re: A possible bug in the interrupt thread preemption code [Was:

2001-02-22 Thread John Baldwin



On 22-Feb-01 Maxim Sobolev wrote:

> > > Here it is (from DDB):
> > > panic(c027de93,c0297409,c027f878,368,80286)
> > > _mtx_assert(c02ea000,9,c027f878,368,80286)
> > > mi_switch(c32c5da0,3,c02cea44,c357be98)
> > > ithread_schedule(c0747c00,1)
> > > sched_ithd(e)
> > > Xresume14()
> > > --- interrupt, eip = 0xc025b60f, esp = 0x80296, ebp = 0xc357bf08 ---
> > > trap(18, 10, 10,c01597b6,20)
> > > calltrap()
> > > --- trap 0x9, eip = 0xc025a5de, esp = 0xc357bf50, ebp = 0xc357bf64 ---
> > > sw1b(c0146cbc,c0146cbc,c32c5da0,c357bf94)
> > > ithread_loop(c0747c00,c357bfa8)
> > > fork_exit(c0146cbc,c0747c00,c357bfa8)
> > > fork_trampoline()
> >
> > *sigh*  This is why enabling interrupts in trap() is such a bad idea.  If
> > we get a trap in the scheduler, then lots of bad crap starts to happen
> > because we can get an interrupt while we are in a trap. :( Can you compile
> > your kernel with INVARIANTS on though, as I think the kernel should've
> > panic'd earlier if it is doing what I think it is doing.
>
> It already has INVARIANTS, MUTEX_DEBUG, WITNESS and WITNESS_DDB.

Hmm, ouch, you don't want MUTEX_DEBUG, that'll slow your system to a crawl.

> > Also, if you are feeling industrious, edit sys/i386/i386/trap.c and
> > comment out the enable_intr() call near the beginning of the trap()
> > function right after the printf for 'kernel trap %d with interrupts
> > disabled'.
>
> Ok, I'll try that.
>
> -Maxim

It will still panic, just hopefully a better panic.

-- 

John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/


Re: A possible bug in the interrupt thread preemption code [Was:

2001-02-22 Thread Dag-Erling Smorgrav


John Baldwin [EMAIL PROTECTED] writes:
> On 22-Feb-01 Maxim Sobolev wrote:
> > It already has INVARIANTS, MUTEX_DEBUG, WITNESS and WITNESS_DDB.
> Hmm, ouch, you don't want MUTEX_DEBUG, that'll slow your system to a crawl.

For the same reason, you probably want WITNESS_SKIPSPIN.

WITNESS_DDB is a bad idea, BTW, there's a (presumably harmless) lock
order reversal in the FS code that you're practically guaranteed to
hit during boot.

DES
-- 
Dag-Erling Smorgrav - [EMAIL PROTECTED]


Re: A possible bug in the interrupt thread preemption code [Was:

2001-02-22 Thread John Baldwin



On 22-Feb-01 Dag-Erling Smorgrav wrote:
> John Baldwin [EMAIL PROTECTED] writes:
> > On 22-Feb-01 Maxim Sobolev wrote:
> > > It already has INVARIANTS, MUTEX_DEBUG, WITNESS and WITNESS_DDB.
> > Hmm, ouch, you don't want MUTEX_DEBUG, that'll slow your system to a crawl.
>
> For the same reason, you probably want WITNESS_SKIPSPIN.

Not really.  WITNESS doesn't really bog down spin mutexes all that much.  It
has a very simple order checking that is nothing like the order checking for
sleep mutexes.  The killer for MUTEX_DEBUG is that each mtx_init() involves
walking a linked list of _all_ of the mutexes in the system and checking each
one with the one being init'd to check for a duplicate init.
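
As a rough illustration of why that check is expensive, here is a minimal
user-space sketch. It is not the kernel's code: DebugMutex and MutexRegistry
are made-up names, and the only thing taken from the description above is that
every init walks the list of all known mutexes looking for a duplicate.

```cpp
#include <cstddef>
#include <list>

// Hypothetical stand-in for a mutex that is tracked for debugging.
struct DebugMutex {
    const char* name;
};

// Hypothetical registry: every init() compares the new mutex against all
// mutexes registered so far, which is the linear walk described above.
class MutexRegistry {
public:
    // Returns false if this mutex was already initialised (duplicate init).
    bool init(DebugMutex* m) {
        for (DebugMutex* other : all_) {
            ++comparisons_;
            if (other == m) return false;
        }
        all_.push_back(m);
        return true;
    }

    // Total number of pointer comparisons performed so far.
    std::size_t comparisons() const { return comparisons_; }

private:
    std::list<DebugMutex*> all_;
    std::size_t comparisons_ = 0;
};
```

With n mutexes already registered, the next init costs n comparisons, so
initialising n mutexes in a row costs n*(n-1)/2 comparisons in total.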

> WITNESS_DDB is a bad idea, BTW, there's a (presumably harmless) lock
> order reversal in the FS code that you're practically guaranteed to
> hit during boot.

Well, they aren't necessarily harmless, but they've been around for a very long
time, so if they do cause rare lockups, they are rare at least.

> DES
> -- 
> Dag-Erling Smorgrav - [EMAIL PROTECTED]

-- 

John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/


Re: A possible bug in the interrupt thread preemption code [Was:

2001-02-22 Thread Maxim Sobolev


John Baldwin wrote:

> On 22-Feb-01 Maxim Sobolev wrote:
> > > > Here it is (from DDB):
> > > > panic(c027de93,c0297409,c027f878,368,80286)
> > > > _mtx_assert(c02ea000,9,c027f878,368,80286)
> > > > mi_switch(c32c5da0,3,c02cea44,c357be98)
> > > > ithread_schedule(c0747c00,1)
> > > > sched_ithd(e)
> > > > Xresume14()
> > > > --- interrupt, eip = 0xc025b60f, esp = 0x80296, ebp = 0xc357bf08 ---
> > > > trap(18, 10, 10,c01597b6,20)
> > > > calltrap()
> > > > --- trap 0x9, eip = 0xc025a5de, esp = 0xc357bf50, ebp = 0xc357bf64 ---
> > > > sw1b(c0146cbc,c0146cbc,c32c5da0,c357bf94)
> > > > ithread_loop(c0747c00,c357bfa8)
> > > > fork_exit(c0146cbc,c0747c00,c357bfa8)
> > > > fork_trampoline()
> > >
> > > *sigh*  This is why enabling interrupts in trap() is such a bad idea.  If
> > > we get a trap in the scheduler, then lots of bad crap starts to happen
> > > because we can get an interrupt while we are in a trap. :( Can you compile
> > > your kernel with INVARIANTS on though, as I think the kernel should've
> > > panic'd earlier if it is doing what I think it is doing.
> >
> > It already has INVARIANTS, MUTEX_DEBUG, WITNESS and WITNESS_DDB.
>
> Hmm, ouch, you don't want MUTEX_DEBUG, that'll slow your system to a crawl.

It doesn't really matter, because the system can't even boot into single-user
due to the panic.

> > > Also, if you are feeling industrious, edit sys/i386/i386/trap.c and
> > > comment out the enable_intr() call near the beginning of the trap()
> > > function right after the printf for 'kernel trap %d with interrupts
> > > disabled'.
> >
> > Ok, I'll try that.
> >
> > -Maxim
>
> It will still panic, just hopefully a better panic.

I did understand that, but the panic I see after the change is exactly the same
as before. Any other ideas?

-Maxim



Re: A possible bug in the interrupt thread preemption code [Was:

2001-02-22 Thread John Baldwin



On 22-Feb-01 Maxim Sobolev wrote:
> John Baldwin wrote:
>
> > On 22-Feb-01 Maxim Sobolev wrote:
> > > > > Here it is (from DDB):
> > > > > panic(c027de93,c0297409,c027f878,368,80286)
> > > > > _mtx_assert(c02ea000,9,c027f878,368,80286)
> > > > > mi_switch(c32c5da0,3,c02cea44,c357be98)
> > > > > ithread_schedule(c0747c00,1)
> > > > > sched_ithd(e)
> > > > > Xresume14()
> > > > > --- interrupt, eip = 0xc025b60f, esp = 0x80296, ebp = 0xc357bf08 ---
> > > > > trap(18, 10, 10,c01597b6,20)
> > > > > calltrap()
> > > > > --- trap 0x9, eip = 0xc025a5de, esp = 0xc357bf50, ebp = 0xc357bf64 ---
> > > > > sw1b(c0146cbc,c0146cbc,c32c5da0,c357bf94)
> > > > > ithread_loop(c0747c00,c357bfa8)
> > > > > fork_exit(c0146cbc,c0747c00,c357bfa8)
> > > > > fork_trampoline()
> > > >
> > > > *sigh*  This is why enabling interrupts in trap() is such a bad idea.  If
> > > > we get a trap in the scheduler, then lots of bad crap starts to happen
> > > > because we can get an interrupt while we are in a trap. :( Can you compile
> > > > your kernel with INVARIANTS on though, as I think the kernel should've
> > > > panic'd earlier if it is doing what I think it is doing.
> > >
> > > It already has INVARIANTS, MUTEX_DEBUG, WITNESS and WITNESS_DDB.
> >
> > Hmm, ouch, you don't want MUTEX_DEBUG, that'll slow your system to a crawl.
>
> It doesn't really matter, because the system can't even boot into
> single-user due to the panic.
>
> > > > Also, if you are feeling industrious, edit sys/i386/i386/trap.c and
> > > > comment out the enable_intr() call near the beginning of the trap()
> > > > function right after the printf for 'kernel trap %d with interrupts
> > > > disabled'.
> > >
> > > Ok, I'll try that.
> > >
> > > -Maxim
> >
> > It will still panic, just hopefully a better panic.
>
> I did understand that, but the panic I see after the change is exactly the
> same as before. Any other ideas?

A recursive sched_lock?  Erm, well, stick these options in your kernel config:

options KTR
options KTR_EXTEND
options KTR_COMPILE=KTR_LOCK
options KTR_MASK=KTR_MASK

Then when it panics, use the 'show ktr' command to list the mutex operations up
until that point.  Hopefully you can see where it is grabbing sched lock the
first time and then not releasing it.  Also, has the backtrace changed at all?
If not, then you may have commented out the wrong enable_intr(). :)

> -Maxim

-- 

John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/


Re: A possible bug in the interrupt thread preemption code [Was:

2001-02-22 Thread Maxim Sobolev


John Baldwin wrote:

> On 22-Feb-01 Maxim Sobolev wrote:
> > John Baldwin wrote:
> >
> > > On 22-Feb-01 Maxim Sobolev wrote:
> > > > > > Here it is (from DDB):
> > > > > > panic(c027de93,c0297409,c027f878,368,80286)
> > > > > > _mtx_assert(c02ea000,9,c027f878,368,80286)
> > > > > > mi_switch(c32c5da0,3,c02cea44,c357be98)
> > > > > > ithread_schedule(c0747c00,1)
> > > > > > sched_ithd(e)
> > > > > > Xresume14()
> > > > > > --- interrupt, eip = 0xc025b60f, esp = 0x80296, ebp = 0xc357bf08 ---
> > > > > > trap(18, 10, 10,c01597b6,20)
> > > > > > calltrap()
> > > > > > --- trap 0x9, eip = 0xc025a5de, esp = 0xc357bf50, ebp = 0xc357bf64 ---
> > > > > > sw1b(c0146cbc,c0146cbc,c32c5da0,c357bf94)
> > > > > > ithread_loop(c0747c00,c357bfa8)
> > > > > > fork_exit(c0146cbc,c0747c00,c357bfa8)
> > > > > > fork_trampoline()
> > > > >
> > > > > *sigh*  This is why enabling interrupts in trap() is such a bad idea.  If
> > > > > we get a trap in the scheduler, then lots of bad crap starts to happen
> > > > > because we can get an interrupt while we are in a trap. :( Can you compile
> > > > > your kernel with INVARIANTS on though, as I think the kernel should've
> > > > > panic'd earlier if it is doing what I think it is doing.
> > > >
> > > > It already has INVARIANTS, MUTEX_DEBUG, WITNESS and WITNESS_DDB.
> > >
> > > Hmm, ouch, you don't want MUTEX_DEBUG, that'll slow your system to a crawl.
> >
> > It doesn't really matter, because the system can't even boot into
> > single-user due to the panic.
> >
> > > > > Also, if you are feeling industrious, edit sys/i386/i386/trap.c and
> > > > > comment out the enable_intr() call near the beginning of the trap()
> > > > > function right after the printf for 'kernel trap %d with interrupts
> > > > > disabled'.
> > > >
> > > > Ok, I'll try that.
> > > >
> > > > -Maxim
> > >
> > > It will still panic, just hopefully a better panic.
> >
> > I did understand that, but the panic I see after the change is exactly the
> > same as before. Any other ideas?
>
> A recursive sched_lock?  Erm, well, stick these options in your kernel config:
>
> options KTR
> options KTR_EXTEND
> options KTR_COMPILE=KTR_LOCK
> options KTR_MASK=KTR_MASK
>
> Then when it panics, use the 'show ktr' command to list the mutex operations up
> until that point.  Hopefully you can see where it is grabbing sched lock the
> first time and then not releasing it.

Ok, I'll do it and send results later.

> Also, has the backtrace changed at all?
> If not, then you may have commented out the wrong enable_intr(). :)

I did what you suggested. Please see the attached diff.

-Maxim


--- src/sys/i386/i386/trap.c    2001/02/22 16:20:12     1.1
+++ src/sys/i386/i386/trap.c    2001/02/22 16:20:58
@@ -264,7 +264,7 @@
 * We should walk p_heldmtx here and see if any are
 * spin mutexes, and not do this if so.
 */
-   enable_intr();
+/* enable_intr();*/
}
}

Re: A possible bug in the interrupt thread preemption code [Was:

2001-02-22 Thread Maxim Sobolev


John Baldwin wrote:

> On 22-Feb-01 Maxim Sobolev wrote:
> > John Baldwin wrote:
> >
> > > On 22-Feb-01 Maxim Sobolev wrote:
> > > > > > Here it is (from DDB):
> > > > > > panic(c027de93,c0297409,c027f878,368,80286)
> > > > > > _mtx_assert(c02ea000,9,c027f878,368,80286)
> > > > > > mi_switch(c32c5da0,3,c02cea44,c357be98)
> > > > > > ithread_schedule(c0747c00,1)
> > > > > > sched_ithd(e)
> > > > > > Xresume14()
> > > > > > --- interrupt, eip = 0xc025b60f, esp = 0x80296, ebp = 0xc357bf08 ---
> > > > > > trap(18, 10, 10,c01597b6,20)
> > > > > > calltrap()
> > > > > > --- trap 0x9, eip = 0xc025a5de, esp = 0xc357bf50, ebp = 0xc357bf64 ---
> > > > > > sw1b(c0146cbc,c0146cbc,c32c5da0,c357bf94)
> > > > > > ithread_loop(c0747c00,c357bfa8)
> > > > > > fork_exit(c0146cbc,c0747c00,c357bfa8)
> > > > > > fork_trampoline()
> > > > >
> > > > > *sigh*  This is why enabling interrupts in trap() is such a bad idea.  If
> > > > > we get a trap in the scheduler, then lots of bad crap starts to happen
> > > > > because we can get an interrupt while we are in a trap. :( Can you compile
> > > > > your kernel with INVARIANTS on though, as I think the kernel should've
> > > > > panic'd earlier if it is doing what I think it is doing.
> > > >
> > > > It already has INVARIANTS, MUTEX_DEBUG, WITNESS and WITNESS_DDB.
> > >
> > > Hmm, ouch, you don't want MUTEX_DEBUG, that'll slow your system to a crawl.
> >
> > It doesn't really matter, because the system can't even boot into
> > single-user due to the panic.
> >
> > > > > Also, if you are feeling industrious, edit sys/i386/i386/trap.c and
> > > > > comment out the enable_intr() call near the beginning of the trap()
> > > > > function right after the printf for 'kernel trap %d with interrupts
> > > > > disabled'.
> > > >
> > > > Ok, I'll try that.
> > > >
> > > > -Maxim
> > >
> > > It will still panic, just hopefully a better panic.
> >
> > I did understand that, but the panic I see after the change is exactly the
> > same as before. Any other ideas?
>
> A recursive sched_lock?  Erm, well, stick these options in your kernel config:
>
> options KTR
> options KTR_EXTEND
> options KTR_COMPILE=KTR_LOCK
> options KTR_MASK=KTR_MASK

Bah, it doesn't even compile with these options:
cc -c -pipe -O -march=pentium -Wall -Wredundant-decls -Wnested-externs
-Wstrict-prototypes  -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual
-fformat-extensions -ansi  -nostdinc -I-  -I. -I../.. -I../../dev
-I../../../include -I../../contrib/dev/acpica/Subsystem/Include  -D_KERNEL -include
opt_global.h -elf  -mpreferred-stack-boundary=2  ../../kern/kern_ktr.c
../../kern/kern_ktr.c: In function `__Tunable_ktr_mask':
../../kern/kern_ktr.c:95: `KTR_MASK' undeclared (first use in this function)
../../kern/kern_ktr.c:95: (Each undeclared identifier is reported only once
../../kern/kern_ktr.c:95: for each function it appears in.)
*** Error code 1
1 error

-Maxim




Re: A possible bug in the interrupt thread preemption code [Was:

2001-02-22 Thread John Baldwin



On 22-Feb-01 Maxim Sobolev wrote:
> John Baldwin wrote:
>
> > A recursive sched_lock?  Erm, well, stick these options in your kernel
> > config:
> >
> > options KTR
> > options KTR_EXTEND
> > options KTR_COMPILE=KTR_LOCK
> > options KTR_MASK=KTR_MASK
>
> Bah, it doesn't even compile with these options:
> cc -c -pipe -O -march=pentium -Wall -Wredundant-decls -Wnested-externs
> -Wstrict-prototypes  -Wmissing-prototypes -Wpointer-arith -Winline
> -Wcast-qual
> -fformat-extensions -ansi  -nostdinc -I-  -I. -I../.. -I../../dev
> -I../../../include -I../../contrib/dev/acpica/Subsystem/Include  -D_KERNEL
> -include
> opt_global.h -elf  -mpreferred-stack-boundary=2  ../../kern/kern_ktr.c
> ../../kern/kern_ktr.c: In function `__Tunable_ktr_mask':
> ../../kern/kern_ktr.c:95: `KTR_MASK' undeclared (first use in this function)
> ../../kern/kern_ktr.c:95: (Each undeclared identifier is reported only once
> ../../kern/kern_ktr.c:95: for each function it appears in.)
> *** Error code 1
> 1 error

Oh, whoops, that should be:

options KTR_MASK=KTR_LOCK

> -Maxim

-- 

John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/


Re: A possible bug in the interrupt thread preemption code [Was:

2001-02-22 Thread Maxim Sobolev


John Baldwin wrote:

> On 22-Feb-01 Maxim Sobolev wrote:
> > John Baldwin wrote:
> >
> > > On 22-Feb-01 Maxim Sobolev wrote:
> > > > > > Here it is (from DDB):
> > > > > > panic(c027de93,c0297409,c027f878,368,80286)
> > > > > > _mtx_assert(c02ea000,9,c027f878,368,80286)
> > > > > > mi_switch(c32c5da0,3,c02cea44,c357be98)
> > > > > > ithread_schedule(c0747c00,1)
> > > > > > sched_ithd(e)
> > > > > > Xresume14()
> > > > > > --- interrupt, eip = 0xc025b60f, esp = 0x80296, ebp = 0xc357bf08 ---
> > > > > > trap(18, 10, 10,c01597b6,20)
> > > > > > calltrap()
> > > > > > --- trap 0x9, eip = 0xc025a5de, esp = 0xc357bf50, ebp = 0xc357bf64 ---
> > > > > > sw1b(c0146cbc,c0146cbc,c32c5da0,c357bf94)
> > > > > > ithread_loop(c0747c00,c357bfa8)
> > > > > > fork_exit(c0146cbc,c0747c00,c357bfa8)
> > > > > > fork_trampoline()
> > > > >
> > > > > *sigh*  This is why enabling interrupts in trap() is such a bad idea.  If
> > > > > we get a trap in the scheduler, then lots of bad crap starts to happen
> > > > > because we can get an interrupt while we are in a trap. :( Can you compile
> > > > > your kernel with INVARIANTS on though, as I think the kernel should've
> > > > > panic'd earlier if it is doing what I think it is doing.
> > > >
> > > > It already has INVARIANTS, MUTEX_DEBUG, WITNESS and WITNESS_DDB.
> > >
> > > Hmm, ouch, you don't want MUTEX_DEBUG, that'll slow your system to a crawl.
> >
> > It doesn't really matter, because the system can't even boot into
> > single-user due to the panic.
> >
> > > > > Also, if you are feeling industrious, edit sys/i386/i386/trap.c and
> > > > > comment out the enable_intr() call near the beginning of the trap()
> > > > > function right after the printf for 'kernel trap %d with interrupts
> > > > > disabled'.
> > > >
> > > > Ok, I'll try that.
> > > >
> > > > -Maxim
> > >
> > > It will still panic, just hopefully a better panic.
> >
> > I did understand that, but the panic I see after the change is exactly the
> > same as before. Any other ideas?
>
> A recursive sched_lock?  Erm, well, stick these options in your kernel config:
>
> options KTR
> options KTR_EXTEND
> options KTR_COMPILE=KTR_LOCK
> options KTR_MASK=KTR_MASK
>
> Then when it panics, use the 'show ktr' command to list the mutex operations up
> until that point.  Hopefully you can see where it is grabbing sched lock the
> first time and then not releasing it.

Got the following:

724: REL (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:438
723: GOT (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:350
722: REL (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:438
721: GOT (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:350
680: REL (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:438
679: GOT (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:350
569: REL (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:438
568: GOT (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:350
546: REL (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:438
545: GOT (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:350
544: REL (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:438
543: GOT (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:350
515: REL (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:438
366: REL (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:438
365: GOT (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:350
317: REL (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:438
254: REL (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:438
253: GOT (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:350
252: REL (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:438
251: GOT (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:350
194: REL (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:438
193: GOT (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:350
182: REL (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:438
181: GOT (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:350
46: GOT (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:350
1020: REL (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:438
1019: GOT (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:350

-Maxim
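
One possible way to read a listing like the one above is to walk it from
oldest to newest and flag any two neighbouring events that repeat the same
operation (GOT after GOT, or REL after REL). The sketch below does that. It
rests on assumptions: that each line has the "<index>: GOT|REL ..." shape
shown above, and that the listing is printed newest first (the indices count
down and then wrap to 1020). KtrEvent, parse_event and find_unpaired are
made-up names, not part of KTR or DDB.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// One parsed line of the listing, e.g.
// "46: GOT (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:350".
struct KtrEvent {
    int index;      // position in the trace buffer
    bool acquired;  // true for GOT, false for REL
};

// Returns false for lines that do not match the assumed
// "<index>: GOT|REL ..." shape.
bool parse_event(const std::string& line, KtrEvent& ev) {
    std::istringstream in(line);
    char colon = 0;
    std::string op;
    if (!(in >> ev.index >> colon >> op) || colon != ':') return false;
    if (op != "GOT" && op != "REL") return false;
    ev.acquired = (op == "GOT");
    return true;
}

// Walks the listing from oldest to newest (it is assumed to be printed newest
// first) and returns every pair of neighbouring events with the same operation.
std::vector<std::pair<int, int>> find_unpaired(
        const std::vector<std::string>& newest_first) {
    std::vector<std::pair<int, int>> suspects;
    KtrEvent prev{0, false};
    bool have_prev = false;
    for (auto it = newest_first.rbegin(); it != newest_first.rend(); ++it) {
        KtrEvent ev;
        if (!parse_event(*it, ev)) continue;  // skip lines we cannot parse
        if (have_prev && ev.acquired == prev.acquired)
            suspects.push_back(std::make_pair(prev.index, ev.index));
        prev = ev;
        have_prev = true;
    }
    return suspects;
}
```

Fed the last seven lines of the listing above (entries 194 down to 1019), this
reports the pair (46, 181): a GOT at entry 46 followed by another GOT at entry
181 with no REL between them. Whether that is a real missing release or just
an event that was not logged is something the sketch cannot tell.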



stranges in threads implementation... possible bug?

2000-11-24 Thread Sergey Osokin


Hello!
A friend of mine found some strange behaviour in the FreeBSD threads
implementation. Here is some "special" test code:

=
#include <stdio.h>
#include <assert.h>
#include <string>
#include <pthread.h>
#include <unistd.h>
#include <errno.h>

#define Debug(x) printf x

extern "C" {
typedef void *(*_THR_C_FUNC)(void *args);
}
typedef void *(*_THR_FUNC)(void *args);

/*-*/
class Mutex
{
public:
  Mutex() { assert(::pthread_mutex_init(&this->lock_, 0) == 0); }
  ~Mutex (void) { assert(::pthread_mutex_destroy(&this->lock_)==0); }
  int acquire (void) { return ::pthread_mutex_lock(&this->lock_); }
  int release (void) { return ::pthread_mutex_unlock (&this->lock_); }
  pthread_mutex_t lock_;
};

/*-*/
class Condition
{
public:
  Condition (Mutex &m);
  ~Condition (void);
  int wait (void);
  void signal (void);
protected:
  pthread_cond_t cond_;
  Mutex &mutex_;
};

Condition::Condition (Mutex &m) : mutex_ (m)
{
  assert (pthread_cond_init(&this->cond_, 0) == 0);
}

Condition::~Condition (void)
{
  while(::pthread_cond_destroy(&this->cond_) == -1 && errno == EBUSY)
  {
    assert(::pthread_cond_broadcast(&this->cond_) == 0);
#ifdef __linux__
    ::sched_yield ();
#else
    ::pthread_yield();
#endif
  }
}

int Condition::wait (void)
{
  return ::pthread_cond_wait(&this->cond_, &this->mutex_.lock_);
}

void Condition::signal (void)
{
  assert(::pthread_cond_signal(&this->cond_) == 0);
}

/*-*/
class Guard
{
public:
  Guard (Mutex &l);
  ~Guard (void);
private:
  Mutex *lock_;
};
Guard::Guard (Mutex &l)
  : lock_ (&l)
{
  this->lock_->acquire ();
}
Guard::~Guard (void)
{
  this->lock_->release ();
}

/*-*/
class _Base_Thread_Adapter
{
public:
  _Base_Thread_Adapter (_THR_FUNC user_func, void *arg);
  void *invoke (void);
  _THR_C_FUNC entry_point (void) { return entry_point_; }

private:
  _THR_FUNC user_func_;
  void *arg_;
  _THR_C_FUNC entry_point_;
};

extern "C" void * _thread_adapter (void *args)
{
  _Base_Thread_Adapter *thread_args = (_Base_Thread_Adapter*)args;
  void *status = thread_args->invoke ();
  return status;
}

_Base_Thread_Adapter::_Base_Thread_Adapter (_THR_FUNC user_func, void *arg)
  : user_func_ (user_func), arg_ (arg), entry_point_ (_thread_adapter)
{
}

void *
_Base_Thread_Adapter::invoke (void)
{
  void *(*func)(void *) = this->user_func_;
  void *arg = this->arg_;
  delete this;
  return func(arg);
}

/*-*/
class SS {
  public:
void spawn();
static void run();
static void *WThread( void *data );
};
  
/*-*/
static Mutex CMutex;
static Condition Cond(CMutex);
static Mutex m1;

/*-*/
#define REL(m,n) assert(m.release() != -1)
#define ACQ(m,n) assert(m.acquire() != -1)
  
/*-*/
void *
SS::WThread( void *data )
{
Cond.signal();
Debug(("run thread...\n"));
SS::run();
Debug(("thread ended\n"));
return NULL;
}

/*-*/
int thr_create (_THR_FUNC func, void *args)
{
  _Base_Thread_Adapter *thread_args;
  thread_args  = new  _Base_Thread_Adapter(func, args);
  pthread_attr_t attr;
  if (::pthread_attr_init (&attr) != 0)
      return -1;
  ::pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
  pthread_t thr_id;
  assert( ::pthread_create (&thr_id, &attr,
                            thread_args->entry_point(), thread_args) == 0 );
  ::pthread_attr_destroy (&attr);
  return 0;
}
/*-*/
void
SS::spawn()
{
#ifdef BAD
int rc;
Guard guard(m1);   // !!!
#else
Guard guard(m1);   // !!!
int rc;
#endif
pthread_attr_t attr;
if (::pthread_attr_init (&attr) != 0) return;
::pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
thr_create(SS::WThread, (void *)0);
::pthread_attr_destroy (&attr);
ACQ(CMutex, "CMutex");
rc = Cond.wait();
if( rc == -1 )
  Debug(("Cond wait failed: %s\n", strerror(errno)));
REL(CMutex, "CMutex"); 
}

/*-*/
void
SS::run()
{
string s;   // !!!
string s1;  // !!!
sleep(1);
}

/*=*/
static void sp_call(SS *ss)
{
string s;   // !!!
ss->spawn();
}

/*--*/
int main(int argc, char **argv)
{
SS ss;
sp_call(&ss);
sleep(2);
Debug(("Exitting...\n"));
sleep(3);
return 0;
}
=

and here is is a

Re: Possible bug in current?

2000-08-01 Thread Damon M. Conway


Damon Hammis wrote:
> Has anyone else stumbled across this bug in 5.0-CURRENT?  Whenever I try
> to do a tail -f on a text file the system locks up and requires a hard
> reboot.
>
> Anyone else see anything similar?

yes, there is a long discussion on -current about it right now.

damon



Re: Possible bug in current?

2000-08-01 Thread Jim Bloom


Fixed with revision 1.14 sys/kern/kern_event.c.  Update your kernel sources and
the problem should go away.

Jim Bloom
[EMAIL PROTECTED]

"Damon M. Conway" wrote:
 
> Damon Hammis wrote:
> > Has anyone else stumbled across this bug in 5.0-CURRENT?  Whenever I try
> > to do a tail -f on a text file the system locks up and requires a hard
> > reboot.
> >
> > Anyone else see anything similar?
>
> yes, there is a long discussion on -current about it right now.
>
> damon



Re: Possible bug in current?

2000-08-01 Thread Kenneth Wayne Culver


Yeah, it's supposedly fixed now.


=
| Kenneth Culver  | FreeBSD: The best NT upgrade|
| Unix Systems Administrator  | ICQ #: 24767726 |
| and student at The  | AIM: muythaibxr |
| The University of Maryland, | Website: (Under Construction)   |
| College Park.   | http://www.wam.umd.edu/~culverk/|
=

On Tue, 1 Aug 2000, Damon Hammis wrote:

> Has anyone else stumbled across this bug in 5.0-CURRENT?  Whenever I try
> to do a tail -f on a text file the system locks up and requires a hard
> reboot.
>
> Anyone else see anything similar?
>
> --Damon
 
_ _
 |__/|  .~~.
 /o=o'`./  .'
{o__,   \{
  / .  . )\  
  `-` '-' \} 
 .(   _(   )_.' 
'---.~_ _ _| 
 
 
 
 To Unsubscribe: send mail to [EMAIL PROTECTED]
 with "unsubscribe freebsd-questions" in the body of the message
 




Re: Possible bug in netinet6/in6_rmx.c ?

2000-07-07 Thread Jonathan M. Bresler


 
> > By the way, while we are talking about sysctl, I don't suppose you would be
> > willing to review/commit PR 15251? It is a fairly straightforward patch that
>
> I see Jonathan Bresler took it (today).
 

wow dude! put me on the spot or something!

jmb



Re: Possible bug in netinet6/in6_rmx.c ?

2000-07-05 Thread Kelly Yancey


On Tue, 4 Jul 2000, Andrzej Bialecki wrote:

> Yeah, something like that. The question is who is going to fix it? INET6
> issues should probably stay in sync with other BSDs and KAME, and
> therefore IMHO the maintainer of inet6 code should step out and fix
> it... (Hello?? :)
 

  Hmm. Good point.

> > By the way, while we are talking about sysctl, I don't suppose you would be
> > willing to review/commit PR 15251? It is a fairly straightforward patch that
>
> I see Jonathan Bresler took it (today).
 

  Actually, I think it was John Baldwin...too many JB's around here :)

  Kelly

--
Kelly Yancey  -  [EMAIL PROTECTED]  -  Belmont, CA
System Administrator, eGroups.com  http://www.egroups.com/
Maintainer, BSD Driver Database   http://www.posi.net/freebsd/drivers/
Coordinator, Team FreeBSDhttp://www.posi.net/freebsd/Team-FreeBSD/




Re: Possible bug in netinet6/in6_rmx.c ?

2000-07-04 Thread Andrzej Bialecki


On Sun, 2 Jul 2000, Kelly Yancey wrote:

> On Sun, 2 Jul 2000, Andrzej Bialecki wrote:
>
> > Hi,
> >
> > While working on adding dynamic sysctls support, I discovered something
> > that looks like a bug.
> >
> > For kernels that have both INET and INET6, three sysctl entries (rtexpire,
> > rtminexpire, rtmaxcache) are registered twice - both in netinet/in_rmx.c
> > and netinet6/in6_rmx.c.
> >
> > It seems they should be registered only once, within a section that is
> > common to INET and INET6.
> >
> > Andrzej Bialecki
> >
>
>   I think the real problem is that the rtexpire, rtminexpire, and rtmaxcache
> variables are each declared static in netinet/in_rmx.c and again in
> netinet6/in6_rmx.c. Do we really need separate learned route expiration
> times for ip4 and ip6? If the answer is yes, then the solution should be to
> move the ip6 versions under the net.inet.ip6 sysctl tree.
>   Otherwise, as you suggest, rtexpire and friends need to be common (maybe
> directly under net.inet?)

Yeah, something like that. The question is who is going to fix it? INET6
issues should probably stay in sync with other BSDs and KAME, and
therefore IMHO the maintainer of inet6 code should step out and fix
it... (Hello?? :)

>   By the way, while we are talking about sysctl, I don't suppose you would be
> willing to review/commit PR 15251? It is a fairly straightforward patch that

I see Jonathan Bresler took it (today).

Andrzej Bialecki

//  [EMAIL PROTECTED] WebGiro AB, Sweden (http://www.webgiro.com)
// ---
// -- FreeBSD: The Power to Serve. http://www.freebsd.org 
// --- Small  Embedded FreeBSD: http://www.freebsd.org/~picobsd/ 





Re: Possible bug in netinet6/in6_rmx.c ?

2000-07-02 Thread Kelly Yancey


On Sun, 2 Jul 2000, Andrzej Bialecki wrote:

> Hi,
>
> While working on adding dynamic sysctls support, I discovered something
> that looks like a bug.
>
> For kernels that have both INET and INET6, three sysctl entries (rtexpire,
> rtminexpire, rtmaxcache) are registered twice - both in netinet/in_rmx.c
> and netinet6/in6_rmx.c.
>
> It seems they should be registered only once, within a section that is
> common to INET and INET6.
>
> Andrzej Bialecki
>

  I think the real problem is that the rtexpire, rtminexpire, and rtmaxcache
variables are each declared static in netinet/in_rmx.c and again in
netinet6/in6_rmx.c. Do we really need separate learned route expiration
times for ip4 and ip6? If the answer is yes, then the solution should be to
move the ip6 versions under the net.inet.ip6 sysctl tree.
  Otherwise, as you suggest, rtexpire and friends need to be common (maybe
directly under net.inet?)
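
To make the name clash concrete, here is a small user-space sketch. It is not
the kernel's sysctl code: SysctlRegistry is a made-up class, the OID strings
are only illustrative, and the second tree name simply follows the
"net.inet.ip6" suggestion above. It shows that two separate variables cannot
both be registered under one name, while moving one of them to its own subtree
avoids the clash.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical registry keyed by the full sysctl name.
class SysctlRegistry {
public:
    // Returns true when the name was newly registered, false when another
    // variable already owns that name.
    bool add(const std::string& oid, int* value) {
        return nodes_.insert(std::make_pair(oid, value)).second;
    }

    // Number of distinct names registered so far.
    std::size_t size() const { return nodes_.size(); }

private:
    std::map<std::string, int*> nodes_;
};
```

Registering the IPv4 and IPv6 copies of rtexpire under the same name makes the
second add() fail; giving the IPv6 copy a name in its own subtree lets both
coexist.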

  By the way, while we are talking about sysctl, I don't suppose you would be
willing to review/commit PR 15251? It is a fairly straightforward patch that
fixes a number of signed-ness bugs with sysctl as well as fix certain sysctl
variables to use the correct data type (mostly an issue when ints and longs
are different sizes). Thanks,

  Kelly

--
Kelly Yancey  -  [EMAIL PROTECTED]  -  Belmont, CA
System Administrator, eGroups.com  http://www.egroups.com/
Maintainer, BSD Driver Database   http://www.posi.net/freebsd/drivers/
Coordinator, Team FreeBSDhttp://www.posi.net/freebsd/Team-FreeBSD/



