Re: NFS deadlock on 9.2-Beta1

2013-07-25 Thread Michael Tratz

On Jul 24, 2013, at 5:25 PM, Rick Macklem wrote:

> Michael Tratz wrote:
>> Two machines (NFS Server: running ZFS / Client: disk-less), both are
>> running FreeBSD r253506. The NFS client starts to deadlock processes
>> within a few hours. It usually gets worse from there on. The
>> processes stay in "D" state. I haven't been able to reproduce it
>> when I want it to happen. I only have to wait a few hours until the
>> deadlocks occur when traffic to the client machine starts to pick
>> up. The only way to fix the deadlocks is to reboot the client. Even
>> an ls to the path which is deadlocked, will deadlock ls itself. It's
>> totally random what part of the file system gets deadlocked. The NFS
>> server itself has no problem at all to access the files/path when
>> something is deadlocked on the client.
>> 
>> Last night I decided to put an older kernel on the system r252025
>> (June 20th). The NFS server stayed untouched. So far 0 deadlocks on
>> the client machine (it should have deadlocked by now). FreeBSD is
>> working hard like it always does. :-) There are a few changes to the
>> NFS code between the revision which seems to work and Beta1. I
>> haven't tried to narrow down whether one of those commits is causing
>> the problem. Maybe someone has an idea what could be wrong and I can
>> test a patch, or tell me if it's something else, because I'm not a
>> kernel expert. :-)
>> 
> Well, the only NFS client change committed between r252025 and r253506
> is r253124. It fixes a file corruption problem caused by a previous
> commit that delayed the vnode_pager_setsize() call until after the
> nfs node mutex lock was unlocked.
> 
> If you can test with only r253124 reverted to see if that gets rid of
> the hangs, it would be useful, although from the procstats, I doubt it.
> 
>> I have run several procstat -kk on the processes including the ls
>> which deadlocked. You can see them here:
>> 
>> http://pastebin.com/1RPnFT6r
> 
> All the processes you show seem to be stuck waiting for a vnode lock
> or in __umtx_op_wait. (I'm not sure what the latter means.)
> 
> What is missing is what processes are holding the vnode locks and
> what they are stuck on.
> 
> A starting point might be "ps axhl", to see what all the threads
> are doing (particularly the WCHAN for them all). If you can drop into
> the debugger when the NFS mounts are hung and do a "show alllocks",
> that could help. See:
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
> 
> I'll admit I'd be surprised if r253124 caused this, but who knows.
> 
> If there have been changes to your network device driver between
> r252025 and r253506, I'd try reverting those. (If an RPC gets stuck
> waiting for a reply while holding a vnode lock, that would do it.)
> 
> Good luck with it and maybe someone else can think of a commit
> between r252025 and r253506 that could cause vnode locking or network
> problems.
> 
> rick
> 
>> 
>> I have tried to mount the file system with and without nolockd. It
>> didn't make a difference. Other than that it is mounted with:
>> 
>> rw,nfsv3,tcp,noatime,rsize=32768,wsize=32768
>> 
>> Let me know if you need me to do something else or if some other
>> output is required. I would have to go back to the problem kernel
>> and wait until the deadlock occurs to get that information.
>> 

Thanks Rick and Steven for your quick replies.

I spoke too soon regarding r252025 fixing the problem. The same issue started 
to show up after about 1 day and a few hours of uptime.

"ps axhl" shows all of those stuck processes in newnfs.

I recompiled the GENERIC kernel for Beta1 with the debugging options:

http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html

ps and debugging output:

http://pastebin.com/1v482Dfw

(I only listed processes matching newnfs; if you need the whole list, please
let me know.)
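
The filtering mentioned above (processes matching newnfs) can be sketched as a small shell function. This is only an illustration, not what was actually run: it assumes the FreeBSD "ps -l" layout, where the header line starts with UID and the wait channel column is headed MWCHAN (or WCHAN), and the function name stuck_on_wchan is made up.

```sh
#!/bin/sh
# stuck_on_wchan NAME (hypothetical helper): read "ps axhl"-style output on
# stdin and print the PID and command of every line whose wait channel
# column equals NAME.
stuck_on_wchan() {
    awk -v want="$1" '
        # Header lines (assumed to start with UID): locate columns by name,
        # so repeated headers from "ps -h" are handled as well.
        $1 == "UID" {
            for (i = 1; i <= NF; i++) {
                if ($i == "PID") pidcol = i
                if ($i == "MWCHAN" || $i == "WCHAN") wcol = i
                if ($i == "COMMAND") cmdcol = i
            }
            next
        }
        # Data lines: print PID plus the full command when the channel matches.
        wcol && $wcol == want {
            printf "%s", $pidcol
            for (i = cmdcol; i <= NF; i++) printf " %s", $i
            printf "\n"
        }
    '
}

# Usage on the hung client:
#   ps axhl | stuck_on_wchan newnfs
```

Looking the columns up by header name rather than by position keeps the sketch working if the column order differs from what is assumed here.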

The first PID showing that problem is 14001. The "show alllocks" output
certainly has interesting information for that PID.
I looked through the commit history for the files mentioned in the output to
see if anything was obvious to me, but I can't tell. :-)
I hope this information helps you dig deeper into what might be causing
those deadlocks.

I included the pciconf -lv output because you mentioned network device
drivers. It's Intel igb. The same hardware is running a kernel from January
19th, 2013, also as an NFS client. That machine is rock solid. No problems
at all.

I also went back to r251611, which is before r251641 (the NFS FHA changes).
Same problem. Here is another debugging output from that kernel:

http://pastebin.com/ryv8BYc4

If I should test something else or provide some other output, please let me 
know.

Again thank you!

Michael


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-sta

Re: java (openjdk6) segfaults when built with 9-stable clang

2013-07-25 Thread Baptiste Daroussin
On Wed, Jul 24, 2013 at 10:51:00AM +0200, Arnaud Houdelette wrote:
> Hi
> 
> I recently upgraded my home NAS from 9.1-RELEASE to 9-stable (r253470  
> (9.2-BETA1))
> 
> I also upgraded my poudriere building jail. Since then, multimedia/xbmc 
> port fails to build in configure stage : java segfaults (sig11).
> I use WITH_CLANG_IS_CC=YES, for world and build-jails.
> 
> I found the following workarounds:
>   - use an openjdk6 pkg (same version) built previously with the
>     9.1-RELEASE world and clang.
>   - use USE_GCC=YES for the java port.
> 
> It's the only place I use java (openjdk6-b27_4). So I cannot say if
> java works otherwise.
> 
> Is this a java or clang bug ?
> 

Here is the bug

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6636110

Fixed in b27_6

regards,
Bapt




Re: ZFS: can't read MOS of pool

2013-07-25 Thread Trond Endrestøl
On Thu, 25 Jul 2013 20:45+0200, Łukasz Wąsikowski wrote:

> On 2013-07-25 18:40, Trond Endrestøl wrote:
> 
> >> Any hints how to go from here?
> > 
> > I'm only subscribed to freebsd-stable@, but as far as I can tell, no 
> > one from freebsd-fs@ nor freebsd-stable@ has yet replied. I'm only 
> > replying to freebsd-stable@.
> > 
> > First, just some quick questions:
> > 
> > Have you by chance upgraded the pool format without upgrading the boot 
> > blocks? Or was the pool already at 5000?
> > 
> > You didn't mention if you have made any attempt at updating the boot 
> > blocks after playing with ezjail-admin.
> > 
> > Perhaps you should consider updating the boot blocks once more:
> > 
> > 1. Boot from the live CD.
> > 
> > 2. Import the pool read-only without mounting any fs:
> > 
> > zpool import -f -N -o readonly=on klawisz
> > 
> > 3. Mount the root fs read-only:
> > 
> > mount -r -t zfs klawisz/ROOTFS /tmp/zroot
> > 
> > 4. Update the boot blocks from the files stored in the root fs:
> > 
> > gpart bootcode -b /tmp/zroot/boot/pmbr -p /tmp/zroot/boot/gptzfsboot -i 1 ada0
> > 
> > 5. Unmount the root fs:
> > 
> > umount /tmp/zroot
> > 
> > 6. Reboot the system, do NOT export the pool.
> > 
> > Hopefully, the updated gptzfsboot stored in ada0p1 will be able to 
> > read the MOS.
> > 
> > That's all I can think of at the moment.
> > 
> > Best of luck.
> 
> Thank you for your reply. The pool was created with version 5000. I have
> updated the boot blocks before (but I exported the pool afterwards). I
> don't think that ezjail-admin has anything to do with booting.

Never ever leave a pool you intend to boot from in the exported state. 
Period. ;-)

At least that was a big no-no when I started using ZFS a couple of 
years ago. Maybe this has changed since FreeBSD no longer relies on 
the zpool.cache file.

> I did as you suggested and it didn't help; the MOS still can't be read.
> I'm pretty sure I can reproduce this problem. I will try to write a
> detailed guide and post it here.

Reading through your post once more, 
http://pastie.org/private/mtfhkx0wx0vve29xn0plw , I noticed the 
following:

1. The bootfs property is set to klawisz/ROOTFS for the klawisz pool.

2. The mountpoint property is set to / for the klawisz filesystem.

3. The mountpoint property is set to legacy for the klawisz/ROOTFS filesystem.

Maybe item 2 is the cause of all this confusion. I see nothing wrong 
with items 1 and 3.

Perhaps you should reset the mountpoint property for klawisz, using:

zfs set mountpoint=legacy klawisz

At the same time you may let klawisz/ROOTFS inherit the mountpoint 
property from klawisz by running:

zfs inherit mountpoint klawisz/ROOTFS

You should also check the mountpoint properties of all the other 
filesystems you intend to mount automatically.
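
To review every mountpoint in one pass, something like the sketch below might help. It is only an illustration: the function name root_mount_claims is made up, the pool name is the one from this thread, and the script reads properties without changing anything.

```sh
#!/bin/sh
# Hypothetical helper: read "name<TAB>mountpoint" lines on stdin, as printed
# by "zfs get -H -o name,value mountpoint", and print every dataset whose
# mountpoint is "/". Any dataset listed here other than the intended root
# filesystem deserves a closer look.
root_mount_claims() {
    awk -F '\t' '$2 == "/" { print $1 }'
}

# Usage (pool name taken from this thread):
#   zfs get -r -H -o name,value mountpoint klawisz | root_mount_claims
```

With the properties listed in items 1 to 3 above, only klawisz would be printed.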

-- 
+--------------------------+-----------------------------------+
| Vennlig hilsen,          | Best regards,                     |
| Trond Endrestøl,         | Trond Endrestøl,                  |
| IT-ansvarlig,            | System administrator,             |
| Fagskolen Innlandet,     | Gjøvik Technical College, Norway, |
| tlf. mob.   952 62 567,  | Cellular...: +47 952 62 567,      |
| sentralbord 61 14 54 00. | Switchboard: +47 61 14 54 00.     |
+--------------------------+-----------------------------------+

Re: java (openjdk6) segfaults when built with 9-stable clang

2013-07-25 Thread Arnaud Houdelette

On 24/07/2013 10:51, Arnaud Houdelette wrote:
> Hi
>
> I recently upgraded my home NAS from 9.1-RELEASE to 9-stable (r253470
> (9.2-BETA1))
>
> I also upgraded my poudriere building jail. Since then, the
> multimedia/xbmc port fails to build in the configure stage: java
> segfaults (sig11).
>
> I use WITH_CLANG_IS_CC=YES, for world and build-jails.
>
> I found the following workarounds:
>  - use an openjdk6 pkg (same version) built previously with the
>    9.1-RELEASE world and clang.
>  - use USE_GCC=YES for the java port.
>
> It's the only place I use java (openjdk6-b27_4). So I cannot say
> if java works otherwise.
>
> Is this a java or clang bug?
>
> Should I open a PR?


Solved in openjdk6-b27_6.
See http://lists.freebsd.org/pipermail/freebsd-ports/2013-July/085074.html

regards,

Arnaud


Re: ZFS: can't read MOS of pool

2013-07-25 Thread Łukasz Wąsikowski
On 2013-07-25 18:40, Trond Endrestøl wrote:

>> Any hints how to go from here?
> 
> I'm only subscribed to freebsd-stable@, but as far as I can tell, no 
> one from freebsd-fs@ nor freebsd-stable@ has yet replied. I'm only 
> replying to freebsd-stable@.
> 
> First, just some quick questions:
> 
> Have you by chance upgraded the pool format without upgrading the boot 
> blocks? Or was the pool already at 5000?
> 
> You didn't mention if you have made any attempt at updating the boot 
> blocks after playing with ezjail-admin.
> 
> Perhaps you should consider updating the boot blocks once more:
> 
> 1. Boot from the live CD.
> 
> 2. Import the pool read-only without mounting any fs:
> 
> zpool import -f -N -o readonly=on klawisz
> 
> 3. Mount the root fs read-only:
> 
> mount -r -t zfs klawisz/ROOTFS /tmp/zroot
> 
> 4. Update the boot blocks from the files stored in the root fs:
> 
> gpart bootcode -b /tmp/zroot/boot/pmbr -p /tmp/zroot/boot/gptzfsboot -i 1 ada0
> 
> 5. Unmount the root fs:
> 
> umount /tmp/zroot
> 
> 6. Reboot the system, do NOT export the pool.
> 
> Hopefully, the updated gptzfsboot stored in ada0p1 will be able to 
> read the MOS.
> 
> That's all I can think of at the moment.
> 
> Best of luck.

Thank you for your reply. The pool was created with version 5000. I have
updated the boot blocks before (but I exported the pool afterwards). I
don't think that ezjail-admin has anything to do with booting.

I did as you suggested and it didn't help; the MOS still can't be read.
I'm pretty sure I can reproduce this problem. I will try to write a
detailed guide and post it here.

-- 
best regards,
Lukasz Wasikowski

Re: ZFS: can't read MOS of pool

2013-07-25 Thread Trond Endrestøl
On Mon, 22 Jul 2013 18:18+0200, Łukasz Wąsikowski wrote:

> Hi,
> 
> I've got a problem with booting zfs-on-root FreeBSD 9.2-PRERELEASE. I'm
> getting:
> 
> ZFS: i/o error - all block copies unavailable
> ZFS: can't read MOS of pool klawisz
> gptzfsboot: failed to mount default pool klawisz
> 
> Machine is VM running under KVM on Proxmox 2.3-13. VM has 8 GB of RAM,
> 400 GB of local storage with SCSI Controller type: Default (lsi).
> 
> I'm not sure what I did to make this VM unbootable. I've installed
> 9.2-PRERELEASE, did source based upgrade to r253470, mergemaster,
> reinstalled bootcode and rebooted. To this point VM was bootable.
> 
> Then I did installworld from /usr/src to ezjail's basejail (ezjail-admin
> update -i), did mergemaster for jails, install some ports - none of this
> should mess with booting. I rebooted VM and got unbootable system.
> 
> When I boot from liveCD I can import this pool (scrub shows no errors),
> mount it, chroot to it, and work with it. I just can't get it to boot.
> 
> Some information about the system:
> http://pastie.org/private/mtfhkx0wx0vve29xn0plw
> 
> I've tried to downgrade to r252316 - no luck, system is still unbootable.
> 
> Any hints how to go from here?

I'm only subscribed to freebsd-stable@, but as far as I can tell, no 
one from freebsd-fs@ nor freebsd-stable@ has yet replied. I'm only 
replying to freebsd-stable@.

First, just some quick questions:

Have you by chance upgraded the pool format without upgrading the boot 
blocks? Or was the pool already at 5000?

You didn't mention if you have made any attempt at updating the boot 
blocks after playing with ezjail-admin.

Perhaps you should consider updating the boot blocks once more:

1. Boot from the live CD.

2. Import the pool read-only without mounting any fs:

zpool import -f -N -o readonly=on klawisz

3. Mount the root fs read-only:

mount -r -t zfs klawisz/ROOTFS /tmp/zroot

4. Update the boot blocks from the files stored in the root fs:

gpart bootcode -b /tmp/zroot/boot/pmbr -p /tmp/zroot/boot/gptzfsboot -i 1 ada0

5. Unmount the root fs:

umount /tmp/zroot

6. Reboot the system, do NOT export the pool.

Hopefully, the updated gptzfsboot stored in ada0p1 will be able to 
read the MOS.
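
The six steps above can be strung together roughly as follows. This is a sketch, not a tested procedure: the run and update_bootcode names and the DRY_RUN switch are made up for illustration, the mkdir step is an assumption (the mount point may not exist on the live CD), and by default the script only prints the commands.

```sh
#!/bin/sh
# Values taken from this thread; adjust them for another system.
POOL=klawisz
ROOTFS=$POOL/ROOTFS
DISK=ada0
MNT=/tmp/zroot

# Print each command; execute it only when DRY_RUN=0 (default: dry run).
run() {
    echo "+ $*"
    [ "${DRY_RUN:-1}" = 0 ] || return 0
    "$@"
}

update_bootcode() {
    # Step 2: import the pool read-only without mounting any fs.
    run zpool import -f -N -o readonly=on "$POOL" &&
    # Assumption, not in the steps above: make sure the mount point exists.
    run mkdir -p "$MNT" &&
    # Step 3: mount the root fs read-only.
    run mount -r -t zfs "$ROOTFS" "$MNT" &&
    # Step 4: update the boot blocks from the files stored in the root fs.
    run gpart bootcode -b "$MNT/boot/pmbr" -p "$MNT/boot/gptzfsboot" -i 1 "$DISK" &&
    # Step 5: unmount the root fs.
    run umount "$MNT"
    # Step 6 is a plain reboot; there is deliberately no "zpool export" here.
}

# Usage from the live CD, once the printed commands look right:
#   DRY_RUN=0 update_bootcode
```

Chaining the steps with && stops at the first failure, so the boot code is not written if the import or mount did not succeed.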

That's all I can think of at the moment.

Best of luck.

-- 
+--------------------------+-----------------------------------+
| Vennlig hilsen,          | Best regards,                     |
| Trond Endrestøl,         | Trond Endrestøl,                  |
| IT-ansvarlig,            | System administrator,             |
| Fagskolen Innlandet,     | Gjøvik Technical College, Norway, |
| tlf. mob.   952 62 567,  | Cellular...: +47 952 62 567,      |
| sentralbord 61 14 54 00. | Switchboard: +47 61 14 54 00.     |
+--------------------------+-----------------------------------+

Re: zpool on a zvol inside zpool

2013-07-25 Thread Ronald Klop
On Mon, 22 Jul 2013 10:04:19 +0200, Eugene M. Zheganin wrote:

> Hi.
>
> I'm moving some of my geli installation to a new machine. On an old
> machine it was running UFS. I use ZFS on a new machine, but I don't have
> an encrypted main pool (and I don't want to), so I'm kinda considering a
> way where I will make a zpool on a zvol encrypted by geli. Would it be
> completely insane (should I use UFS instead?) or would it be still
> valid?
>
> Thanks.
> Eugene.


I think that depends on your configuration and situation.
If you have a spare disk to use for GELI+UFS, that is simpler to
configure and maintain.
But if you are running a big file server, then the overhead of ZFS+GELI+UFS
might be negligible.
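
For reference, the layout Eugene asks about (a zpool on a geli-encrypted zvol) could be set up roughly as in the sketch below. Every name in it is hypothetical: the parent pool tank, the volume tank/encvol, the 10G size, the inner pool secure, and the run/make_encrypted_pool helpers. By default it only prints the commands.

```sh
#!/bin/sh
# Hypothetical names; none of them come from this thread.
VOL=tank/encvol
SIZE=10G
INNER=secure

# Print each command; execute it only when DRY_RUN=0 (default: dry run).
run() {
    echo "+ $*"
    [ "${DRY_RUN:-1}" = 0 ] || return 0
    "$@"
}

make_encrypted_pool() {
    # 1. Carve a zvol out of the unencrypted main pool.
    run zfs create -V "$SIZE" "$VOL" &&
    # 2. Initialise geli on the zvol (prompts for a passphrase), then attach it.
    run geli init "/dev/zvol/$VOL" &&
    run geli attach "/dev/zvol/$VOL" &&
    # 3. Build the inner pool on the decrypted .eli provider.
    run zpool create "$INNER" "/dev/zvol/$VOL.eli"
}

# Usage, once the printed commands look right:
#   DRY_RUN=0 make_encrypted_pool
```

This only shows the mechanics; whether the extra layer is acceptable depends on the trade-offs discussed above.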


Ronald.


Re: stopping amd causes a freeze

2013-07-25 Thread Konstantin Belousov
On Thu, Jul 25, 2013 at 09:56:59AM +0200, Dominic Fandrey wrote:
> On 22/07/2013 12:07, Konstantin Belousov wrote:
> > On Mon, Jul 22, 2013 at 11:50:24AM +0200, Dominic Fandrey wrote:
> >> ...
> >>
> >> I run amd through sysutils/automounter, which is a scripting solution
> >> that generates an amd.map file based on encountered devices and devd
> >> events. The SIGHUP it sends to amd to tell it the map file was updated
> >> does not cause problems, only a -SIGKILL- SIGTERM may cause the freeze.
> >>
> >> Nothing was mounted (by amd) during the last freeze.
> >>
> >> ...
> > 
> > Are you sure that the machine did not panic? Do you have a serial console?
> > 
> > The amd(8) locks itself into memory, most likely due to the fear of
> > deadlock. There are some known issues with user wirings in stable/9.
> > If the problem you see is indeed due to wiring, you might try to apply
> > r253187-r253191.
> 
> I tried that. Applying the diff was straightforward enough. But the
> resulting kernel panicked as soon as it tried to mount the root fs.
You did not provide any useful information to diagnose the issue.

The patch should keep the KBI compatible, but, just in case, rebuild any
third-party modules you have.

> 
> So I'll wait for the MFC from someone who knows what he/she is doing.

The patch below booted for me, and I ran some sanity-check tests for
mlockall(2), which did not result in misbehaviour either.

Index: kern/vfs_bio.c
===================================================================
--- kern/vfs_bio.c	(revision 253643)
+++ kern/vfs_bio.c	(working copy)
@@ -1614,7 +1614,8 @@ brelse(struct buf *bp)
 					(PAGE_SIZE - poffset) : resid;
 
 				KASSERT(presid >= 0, ("brelse: extra page"));
-				vm_page_set_invalid(m, poffset, presid);
+				if (pmap_page_wired_mappings(m) == 0)
+					vm_page_set_invalid(m, poffset, presid);
 				if (had_bogus)
 					printf("avoided corruption bug in bogus_page/brelse code\n");
 			}
Index: vm/vm_fault.c
===================================================================
--- vm/vm_fault.c	(revision 253643)
+++ vm/vm_fault.c	(working copy)
@@ -286,6 +286,19 @@ RetryFault:;
 		    (u_long)vaddr);
 	}
 
+	if (fs.entry->eflags & MAP_ENTRY_IN_TRANSITION &&
+	    fs.entry->wiring_thread != curthread) {
+		vm_map_unlock_read(fs.map);
+		vm_map_lock(fs.map);
+		if (vm_map_lookup_entry(fs.map, vaddr, &fs.entry) &&
+		    (fs.entry->eflags & MAP_ENTRY_IN_TRANSITION)) {
+			fs.entry->eflags |= MAP_ENTRY_NEEDS_WAKEUP;
+			vm_map_unlock_and_wait(fs.map, 0);
+		} else
+			vm_map_unlock(fs.map);
+		goto RetryFault;
+	}
+
 	/*
 	 * Make a reference to this object to prevent its disposal while we
 	 * are messing with it.  Once we have the reference, the map is free
Index: vm/vm_map.c
===================================================================
--- vm/vm_map.c	(revision 253643)
+++ vm/vm_map.c	(working copy)
@@ -2272,6 +2272,7 @@ vm_map_unwire(vm_map_t map, vm_offset_t start, vm_
 		 * above.)
 		 */
 		entry->eflags |= MAP_ENTRY_IN_TRANSITION;
+		entry->wiring_thread = curthread;
 		/*
 		 * Check the map for holes in the specified region.
 		 * If VM_MAP_WIRE_HOLESOK was specified, skip this check.
@@ -2304,8 +2305,24 @@ done:
 		else
 			KASSERT(result, ("vm_map_unwire: lookup failed"));
 	}
-	entry = first_entry;
-	while (entry != &map->header && entry->start < end) {
+	for (entry = first_entry; entry != &map->header && entry->start < end;
+	    entry = entry->next) {
+		/*
+		 * If VM_MAP_WIRE_HOLESOK was specified, an empty
+		 * space in the unwired region could have been mapped
+		 * while the map lock was dropped for draining
+		 * MAP_ENTRY_IN_TRANSITION.  Moreover, another thread
+		 * could be simultaneously wiring this new mapping
+		 * entry.  Detect these cases and skip any entries
+		 * marked as in transition by us.
+		 */
+		if ((entry->eflags & MAP_ENTRY_IN_TRANSITION) == 0 ||
+		    entry->wiring_thread != curthread) {
+			KASSERT((flags & VM_MAP_WIRE_HOLESOK) != 0,
+			    ("vm_map_unwire: !HOLESOK and new/changed entry"));
+			continue;
+		}
+
 		if (rv == KERN_SUCCESS && (!user_unwire ||
 		    (entry->eflags & MAP_ENTRY_USER_WIRED))) {

Re: stopping amd causes a freeze

2013-07-25 Thread Dominic Fandrey
On 22/07/2013 12:07, Konstantin Belousov wrote:
> On Mon, Jul 22, 2013 at 11:50:24AM +0200, Dominic Fandrey wrote:
>> ...
>>
>> I run amd through sysutils/automounter, which is a scripting solution
>> that generates an amd.map file based on encountered devices and devd
>> events. The SIGHUP it sends to amd to tell it the map file was updated
>> does not cause problems, only a -SIGKILL- SIGTERM may cause the freeze.
>>
>> Nothing was mounted (by amd) during the last freeze.
>>
>> ...
> 
> Are you sure that the machine did not panic? Do you have a serial console?
> 
> The amd(8) locks itself into memory, most likely due to the fear of
> deadlock. There are some known issues with user wirings in stable/9.
> If the problem you see is indeed due to wiring, you might try to apply
> r253187-r253191.

I tried that. Applying the diff was straightforward enough. But the
resulting kernel panicked as soon as it tried to mount the root fs.

So I'll wait for the MFC from someone who knows what he/she is doing.

Regards

-- 
A: Because it fouls the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail? 