Re: FreeBSD 8.2 - active plus inactive memory leak!?

2012-03-07 Thread Luke Marsden
On Wed, 2012-03-07 at 10:23 +0200, Konstantin Belousov wrote:
 On Wed, Mar 07, 2012 at 12:36:21AM +, Luke Marsden wrote:
  I'm trying to confirm that, on a system with no pages swapped out, that
  the following is a true statement:
  
  a page is accounted for in active + inactive if and only if it
  corresponds to one or more of the pages accounted for in the
  resident memory lists of all the processes on the system (as per
  the output of 'top' and 'ps')
 No.
 
 The pages belonging to vnode vm object can be active or inactive or cached
 but not mapped into any process address space.

Thank you, Konstantin.  Does the number of vnodes we've got open on this
machine (272011) fully explain away the memory gap?

Memory gap:
11264M active + 2598M inactive - 9297M sum-of-resident = 4565M

Active vnodes:
vfs.numvnodes: 272011

That gives a lower bound of 17.18KB per vnode (or higher if we take shared
libs, etc. into account); that seems a bit high for a vnode vm object,
doesn't it?
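
For the record, here's that arithmetic as a tiny Python sketch (the figures
are simply the ones quoted above):

# Back-of-the-envelope check of the per-vnode lower bound quoted above.
active_mb   = 11264     # top: Active
inactive_mb = 2598      # top: Inact
resident_mb = 9297      # sum of per-process RES from top/ps
numvnodes   = 272011    # sysctl vfs.numvnodes

gap_mb = active_mb + inactive_mb - resident_mb
print("memory gap:  %d MB" % gap_mb)                                     # 4565 MB
print("lower bound: %.1f KB per vnode" % (gap_mb * 1024.0 / numvnodes))  # ~17.2 KB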

If that doesn't fully explain it, what else might be chewing through
active memory?

Also, when are vnodes freed?

This system does have some tuning...
kern.maxfiles: 100
vm.pmap.pv_entry_max: 73296250

Could that be contributing to so much active + inactive memory (5GB+
more than expected), or do PV entries live in wired (i.e. kernel) memory?


On Tue, 2012-03-06 at 17:48 -0700, Ian Lepore wrote:
 In my experience, the bulk of the memory in the inactive category is
 cached disk blocks, at least for ufs (I think zfs does things
 differently).  On this desktop machine I have 12G physical and
 typically have roughly 11G inactive, and I can unmount one particular
 filesystem where most of my work is done and instantly I have almost
 no inactive and roughly 11G free.

Okay, so this could be UFS disk cache, except that this system is ZFS-on-root
with no UFS filesystems active or mounted.  Can someone confirm that ZFS data
is not being double-cached in active + inactive (+ cache) memory?
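
My understanding is that the ARC lives in wired (kernel) memory rather than
in active/inactive, so it shouldn't be double-counted there.  A minimal
sanity check along those lines, as a Python sketch (assuming the stock
kstat.zfs.misc.arcstats.size and vm.stats.vm.v_wire_count sysctls):

# Sketch: compare ARC size against Wired to see where ZFS caching is accounted.
import subprocess

def sysctl(name):
    return int(subprocess.check_output(["sysctl", "-n", name]).strip())

page_size   = sysctl("hw.pagesize")
arc_bytes   = sysctl("kstat.zfs.misc.arcstats.size")
wired_bytes = sysctl("vm.stats.vm.v_wire_count") * page_size

print("ARC:   %7.1f MB" % (arc_bytes / 1048576.0))
print("Wired: %7.1f MB" % (wired_bytes / 1048576.0))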

Thanks,
Luke

-- 
CTO, Hybrid Logic
+447791750420  |  +1-415-449-1165  | www.hybrid-cluster.com 


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


FreeBSD 8.2 - active plus inactive memory leak!?

2012-03-06 Thread Luke Marsden
Hi all,

I'm having some trouble with some production 8.2-RELEASE servers where
the 'Active' and 'Inact' memory values reported by top don't seem to
correspond with the processes which are running on the machine.  I have
two near-identical machines (with slightly different workloads); on one,
let's call it A, active + inactive is small (6.5G), and on the other (B)
active + inactive is large (13.6G), even though they have almost identical
sums of resident memory (8.3G on A and 9.3G on B).

The only difference is that A has a smaller number of quite long-running
processes (it's hosting a small number of busy sites) and B has a larger
number of more frequently killed/recycled processes (it's hosting a
larger number of quiet sites, so the FastCGI processes get killed and
restarted frequently).  Notably B has many more ZFS filesystems mounted
than A (around 4,000 versus 100).  The machines are otherwise under
similar amounts of load.  I'm hoping the community can help me understand
what's going on with respect to the worryingly large amount of active +
inactive memory on B.

Both machines are ZFS-on-root with FreeBSD 8.2-RELEASE with uptimes
around 5-6 days.  I have recently reduced the ARC cache on both machines
since my previous thread [1] and Wired memory usage is now stable at 6G
on A and 7G on B with an arc_max of 4G on both machines.

Neither of the machines have any swap in use:

Swap: 10G Total, 10G Free

My current (probably quite simplistic) understanding of the FreeBSD
virtual memory system is that, for each process as reported by top:

  * Size corresponds to the total size of all the text pages for the
process (those belonging to code in the binary itself and linked
libraries) plus data pages (including stack and malloc()'d but
not-yet-written-to memory segments).
  * Resident corresponds to a subset of the pages above: those which
actually occupy physical/core memory.  Notably, pages may appear in
Size but not in Resident, e.g. read-only text pages from libraries
which have not been touched yet, or memory which has been malloc()'d
but not yet written to.

My understanding of the values for the system as a whole (the header lines
in 'top') is as follows; a small sysctl sketch for reading the same counters
follows the list:

  * Active and inactive memory are essentially the same thing: resident
memory belonging to processes.  Being on the inactive rather than the
active list simply indicates that the pages in question were used less
recently and are therefore more likely to be swapped out if the
machine comes under memory pressure.
  * Wired is mostly kernel memory.
  * Cache is freed memory which the kernel has decided to keep in case
it corresponds to a useful page in the future; it can be cheaply
evicted onto the free list.
  * Free memory is actually not being used for anything.
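
Here is that sketch -- minimal, and assuming the standard
vm.stats.vm.v_*_count sysctls, which is where top's header numbers come from:

# Sketch: dump the page-queue counters behind top's Mem: header line.
import subprocess

def sysctl(name):
    return int(subprocess.check_output(["sysctl", "-n", name]).strip())

page_size = sysctl("hw.pagesize")
for queue in ("active", "inactive", "wire", "cache", "free"):
    pages = sysctl("vm.stats.vm.v_%s_count" % queue)
    print("%-8s %8.1f MB" % (queue, pages * page_size / 1048576.0))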

It seems that pages which appear on the active + inactive lists must occur
in the resident memory of at least one process (possibly more than one,
since processes can share pages, e.g. read-only shared libraries or a
COW-forked address space).  Conversely, if a page *does not* occur in the
resident memory of any process, it must not occupy any space on the active +
inactive lists.

Therefore the active + inactive memory should always be less than or
equal to the sum of the resident memory of all the processes on the
system, right?

But it's not.  So, I wrote a very simple Python script to add up the
resident memory values in the output from 'top' and, on machine A:

Mem: 3388M Active, 3209M Inact, 6066M Wired, 196K Cache, 11G
Free
There were 246 processes totalling 8271 MB resident memory

Whereas on machine B:

Mem: 11G Active, 2598M Inact, 7177M Wired, 733M Cache, 1619M
Free
There were 441 processes totalling 9297 MB resident memory
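
For reference, the script is nothing clever -- a minimal sketch of the
approach (this version sums the RSS column from ps rather than scraping
top's output, which should give an equivalent total):

# Sketch: sum per-process resident memory, as used for the totals above.
# ps reports RSS in kilobytes; "rss=" suppresses the header line.
import subprocess

out = subprocess.check_output(["ps", "-axo", "rss="])
rss_kb = [int(tok) for tok in out.split()]
print("There were %d processes totalling %d MB resident memory"
      % (len(rss_kb), sum(rss_kb) // 1024))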

Now, on machine A:

3388M active + 3209M inactive - 8271M sum-of-resident = -1674M

I can attribute this negative value to shared libraries between the
running processes (which the sum-of-res is double-counting but active +
inactive is not).  But on machine B:

11264M active + 2598M inactive - 9297M sum-of-resident = 4565M

I'm struggling to explain how, when there are only 9.2G (worst case,
discounting shared pages) of resident processes, the system is using 11G
+ 2598M = 13.8G of memory!

This missing memory is scary, because it seems to be increasing over
time, and eventually when the system runs out of free memory, I'm
certain it will crash in the same way described in my previous thread
[1].

Is my understanding of the virtual memory system badly broken - in which
case please educate me ;-) or is there a real problem here?  If so how
can I dig deeper to help uncover/fix it?

Best Regards,
Luke Marsden

[1] lists.freebsd.org/pipermail/freebsd-fs/2012-February/013775.html
[2] https://gist.github.com/1988153

-- 
CTO, Hybrid Logic
+447791750420  |  +1-415-449-1165  | www.hybrid

Re: FreeBSD 8.2 - active plus inactive memory leak!?

2012-03-06 Thread Luke Marsden
Thanks for your email, Chuck.

  Conversely, if a page *does not* occur in the resident
  memory of any process, it must not occupy any space in the active +
  inactive lists.
 
 Hmm...if a process gets swapped out entirely, the pages for it will be moved 
 to the cache list, flushed, and then reused as soon as the disk I/O 
 completes. 
   But there is a window where the process can be marked as swapped out (and 
 considered no longer resident), but still has some of it's pages in physical 
 memory.

There's no swapping happening on these machines (intentionally so,
because as soon as we hit swap everything goes tits up), so this window
doesn't concern me.

I'm trying to confirm that, on a system with no pages swapped out, the
following is a true statement:

a page is accounted for in active + inactive if and only if it
corresponds to one or more of the pages accounted for in the
resident memory lists of all the processes on the system (as per
the output of 'top' and 'ps')

  Therefore the active + inactive memory should always be less than or
  equal to the sum of the resident memory of all the processes on the
  system, right?
 
 No.  If you've got a lot of process pages shared (ie, a webserver with lots
 of httpd children, or a database pulling in a large common shmem area), then
 your process resident sizes can be very large compared to the system-wide
 active+inactive count.

But that's what I'm saying...

sum(process resident sizes) >= active + inactive

Or as I said it above, equivalently:

active + inactive <= sum(process resident sizes)

The data I've got from this system, and what's killing us, shows the
opposite: active + inactive > sum(process resident sizes) - by over 5GB
now and growing, which is what keeps causing these machines to crash.

In particular:
Mem: 13G Active, 1129M Inact, 7543M Wired, 120M Cache, 1553M Free

But the total sum of resident memories is 9457M (according to summing
the output from ps or top).

13G + 1129M = 14441M (active + inact) > 9457M (sum of res)

That's 4984M out, and that's almost enough to push us over the edge.

If my understanding of VM is correct, I don't see how this can happen.
But it's happening, and it's causing real trouble here because our free
memory keeps hitting zero and then we swap-spiral.

What can I do to investigate this discrepancy?  Are there some tools
that I can use to debug the memory allocated in active to find out
where it's going, if not to resident process memory?
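
The crudest check I can think of from userland is to compute the discrepancy
directly; a Python sketch (assuming the standard vm.stats.vm.v_*_count
sysctls, and that ps reports RSS in KB):

# Sketch: (active + inactive) minus the sum of per-process RSS.
import subprocess

def sysctl(name):
    return int(subprocess.check_output(["sysctl", "-n", name]).strip())

page_size = sysctl("hw.pagesize")
act_inact_mb = (sysctl("vm.stats.vm.v_active_count")
                + sysctl("vm.stats.vm.v_inactive_count")) * page_size / 1048576

rss_kb = [int(tok) for tok in subprocess.check_output(["ps", "-axo", "rss="]).split()]
print("active + inactive: %d MB" % act_inact_mb)
print("sum of RSS:        %d MB" % (sum(rss_kb) / 1024))
print("discrepancy:       %d MB" % (act_inact_mb - sum(rss_kb) / 1024))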

Thanks,
Luke

-- 
CTO, Hybrid Logic
+447791750420  |  +1-415-449-1165  | www.hybrid-cluster.com 


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Another ZFS ARC memory question

2012-03-02 Thread Luke Marsden
...@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64

Thanks!
Luke Marsden

-- 
CTO, Hybrid Logic
+447791750420  |  +1-415-449-1165  | www.hybrid-cluster.com 


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Another ZFS ARC memory question

2012-02-24 Thread Luke Marsden
Hi all,

Just wanted to get your opinion on best practices for ZFS.

We're running 8.2-RELEASE (ZFS v15) in production on 24GB-RAM amd64 machines
but have been having trouble with short spikes in application memory
usage resulting in huge amounts of swapping, bringing the whole machine
to its knees and crashing it hard.  I suspect this is because, when there
is a sudden spike in memory usage, the ZFS ARC reclaim thread is unable
to free system memory fast enough.

This most recently happened yesterday as you can see from the following
munin graphs:

E.g. http://hybrid-logic.co.uk/memory-day.png
 http://hybrid-logic.co.uk/swap-day.png

Our response has been to start limiting the ZFS ARC cache to 4GB on our
production machines - trading performance for stability is fine with me
(and we have L2ARC on SSD so we still get good levels of caching).
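
The limit itself is just the vfs.zfs.arc_max loader tunable.  For anyone who
wants to eyeball at runtime how the ARC is behaving against it, here's a
minimal Python sketch (assuming the stock vfs.zfs.arc_max and
kstat.zfs.misc.arcstats.size sysctls):

# Sketch: compare the configured ARC ceiling with the current ARC size.
import subprocess

def sysctl(name):
    return int(subprocess.check_output(["sysctl", "-n", name]).strip())

arc_max  = sysctl("vfs.zfs.arc_max")
arc_size = sysctl("kstat.zfs.misc.arcstats.size")
print("ARC size %.1f MB of %.1f MB max (%.0f%% used)"
      % (arc_size / 1048576.0, arc_max / 1048576.0, 100.0 * arc_size / arc_max))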

My questions are:

  * is this a known problem?
  * what is the community's advice for production machines running
ZFS on FreeBSD, is manually limiting the ARC cache (to ensure
that there's enough actually free memory to handle a spike in
application memory usage) the best solution to this
spike-in-memory-means-crash problem?
  * has FreeBSD 9.0 / ZFS v28 solved this problem?
  * rather than setting a hard limit on the ARC cache size, is it
possible to adjust the auto-tuning variables to leave more free
memory for spiky memory situations?  e.g. set the auto-tuning to
make arc eat 80% of memory instead of ~95% like it is at
present?
  * could the arc reclaim thread be made to drop ARC pages with
higher priority before the system starts swapping out
application pages?

Thank you for any/all answers, and thank you for making FreeBSD
awesome :-)

Best Regards,
Luke Marsden

-- 
CTO, Hybrid Logic
+447791750420  |  +1-415-449-1165  | www.hybrid-cluster.com 



___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Swap on zvol - recommendable?

2012-02-12 Thread Luke Marsden
 On Feb 6, 2012, at 11:57 AM, Patrick M. Hausen wrote:
 
  Hi, all,
  
  is it possible to make a definite statement about swap on zvols?
  
  I found some older discussions about a resource starvation
  scenario when ZFS arc would be the cause of the system
  running out of memory, trying to swap, yet the ZFS would
  not be accessible until some memory was freed - leading to
  a deadlock.
  
  Is this still the case with RELENG_8? The various Root on
  ZFS guides mention both choices (dedicated or gmirror
  partition vs. zvol), yet don't say anything about the respective
  merits or risks. I am aware of the fact that I cannot dump to
  a raidz2 zvol ...
  

On Tue, 2012-02-07 at 20:53 +0100, Peter Ankerstål wrote:
 I can just tell you I had this problem still in 8.1 and it was a HUGE
 problem. System stalled every two weeks or so. Now when the swap is
 moved away from zfs it works fine.
 

I can confirm that this is still a problem on 8.2 and 9.0.

-- 
CTO, Hybrid Logic
+447791750420  |  +1-415-449-1165  | www.hybrid-cluster.com 


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: 8.1R possible zfs snapshot livelock?

2011-05-20 Thread Luke Marsden
On Wed, 2011-05-18 at 14:05 +0200, Borja Marcos wrote: 
 On May 17, 2011, at 1:29 PM, Jeremy Chadwick wrote:
 
  * ZFS send | ssh zfs recv results in ZFS subsystem hanging;
 8.1-RELEASE;
   February 2011:
 
 http://lists.freebsd.org/pipermail/freebsd-fs/2011-February/010602.html
 
 I found a reproducible deadlock condition actually. If you keep some
 I/O activity on a dataset on which you are receiving a ZFS incremental
 snapshot at the same time, it can deadlock.
 
 Imagine this situation: Two servers, A and B. A dataset on server A is
 replicated at regular intervals to B, so that you keep a reasonably up
 to date copy.
 
 Something like:
 
 (Runnning on server A):
 
 zfs snapshot thepool/thedataset@thistime
 zfs send -Ri thepool/thedataset@previoustime thepool/thedataset@thistime | ssh serverB zfs receive -d thepool
 
 It works, but I suffered a deadlock when one of the periodic daily
 scripts was running. Doing some tests, I saw that ZFS  can deadlock if
 you do a zfs receive onto a dataset which has some read activity.
 Disabling atime didn't help either.
 
 But if you make sure *not* to access the replicated dataset it works,
 I haven't seen it failing otherwise. 
 
 If  you wish to reproduce it, try creating a dataset for /usr/obj,
 running make buildworld on it, replicating at, say, 30 or 60 second
 intervals, and keep several scripts (or rsync) reading the target
 dataset files and just copying them to another place in the usual,
 classic way. (example: tar cf - . | ( cd /destination && tar xf - ))
 

Is there a PR for this?  I'd like to see it addressed, since read-only
I/O on a dataset which is being updated by `zfs recv` is an important
part of what we plan to do with ZFS on FreeBSD.

-- 
Best Regards,
Luke Marsden
CTO, Hybrid Logic Ltd.

Web: http://www.hybrid-cluster.com/
Hybrid Web Cluster - cloud web hosting

Phone: +447791750420


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: 8.1R possible zfs snapshot livelock?

2011-05-18 Thread Luke Marsden
Hi all,

On Tue, 2011-05-17 at 04:29 -0700, Jeremy Chadwick wrote:
 There are still some outstanding incidents that directly pertain to
 ZFS snapshots, or are related to ZFS snapshots (meaning things like
 send/recv which are commonly used alongside snapshots), which I
 remember reading about but really saw no answer to:
 
 * ZFS send | ssh zfs recv results in ZFS subsystem hanging;
   8.1-RELEASE;

   February 2011:
http://lists.freebsd.org/pipermail/freebsd-fs/2011-February/010602.html

As the original author of this post I wanted to chime in to say that our
problem was mis-diagnosed here as being related to snapshots and zfs
send/receive.  Instead, it was a bug [1] relating to force-unmounting a
ZFS filesystem which has active child nullfs mounts and active special
devices (FIFO).  There is a related kernel panic [1] which suggests that
this is a problem area.  I've been meaning to collect enough information
to submit a proper bug report -- I can at least reliably reproduce the
issue -- but have been rather too busy with the 1.0 release of our
application, and was put off by one response: "IMO this is expected".

[1] http://lists.freebsd.org/pipermail/freebsd-fs/2011-March/010983.html

Our application -- see HCFS at http://www.hybrid-cluster.com/tech/ --
makes very heavy use of ZFS snapshots and ZFS send/receive on FreeBSD
(currently 8.1), and since we engineered it so that it never attempts
foolish force-unmounts on busy filesystems we've seen no kernel hangs
over the course of hundreds of thousands of snapshot and zfs replication
events in testing.

I'm interested to know whether the OP's problem is fixed in 8.2 or
8-STABLE, since it could affect us.  Also, thanks for the links to the
backports for 8.2, Jeremy, I'll include those in our next system image.

-- 
Best Regards,
Luke Marsden
CTO, Hybrid Logic Ltd.

Web: http://www.hybrid-cluster.com/
Hybrid Web Cluster - cloud web hosting

Phone: +447791750420



___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Guaranteed kernel panic with ZFS + nullfs

2011-03-16 Thread Luke Marsden
Hi all,

The following script seems to cause a guaranteed kernel panic on 8.1-R,
8.2-R and 8-STABLE as of today (2011-03-16), with both ZFS v14/15, and
v28 on 8.2-R with mm@ patches from 2011-03. I suspect it may also affect
9-CURRENT but have not tested this yet.

#!/usr/local/bin/bash
export POOL=hpool # change this to your pool name
sudo zfs destroy -r $POOL/foo
sudo zfs create $POOL/foo
sudo zfs set mountpoint=/foo $POOL/foo
sudo mount -t nullfs /foo /bar
sudo touch /foo/baz
ls /bar # should see baz
sudo zfs umount -f $POOL/foo # seems okay (ls: /bar: Bad file descriptor)
sudo zfs mount $POOL/foo # PANIC!

Can anyone suggest a patch which fixes this? Preferably against
8-STABLE :-)

I also have a more subtle problem where, after mounting and then quickly
force-unmounting a ZFS filesystem (call it A) with two nullfs-mounted
filesystems and a devfs filesystem within it, running ls on the
mountpoint of the parent filesystem of A hangs.

I'm working on narrowing it down to a shell script like the above - as
soon as I have one I'll post a followup.

This latter problem is actually more of an issue for me - I can avoid
the behaviour which triggers the panic (if it hurts, don't do it), but
I need to be able to perform the actions which trigger the deadlock
(mounting and unmounting filesystems).

This also affects 8.1-R, 8.2-R, 8-STABLE and 8.2-R+v28.

It seems to be the zfs umount -f process which hangs and causes further
accesses to the parent filesystem to hang as well.  Note that I have
definitely correctly unmounted the nullfs and devfs mounts from within
the filesystem before I force the unmount. Unfortunately the -f is
necessary in my application.

After the hang:

hybrid@dev3:/opt/HybridCluster$ sudo ps ax |grep zfs
   41  ??  DL 0:00.11 [zfskern]
 3751  ??  D  0:00.03 /sbin/zfs unmount -f hpool/hcfs/filesystem1

hybrid@dev3:/opt/HybridCluster$ sudo procstat -kk 3751
  PID    TID COMM     TDNAME           KSTACK
 3751 100264 zfs      -                mi_switch+0x16f sleepq_wait+0x42
_sleep+0x31c zfsvfs_teardown+0x269 zfs_umount+0x1a7 dounmount+0x28a
unmount+0x3c8 syscall+0x1e7 Xfast_syscall+0xe1 

hybrid@dev3:/opt/HybridCluster$ sudo procstat -kk 41
  PID    TID COMM     TDNAME           KSTACK
   41 100058 zfskern  arc_reclaim_thre mi_switch+0x16f
sleepq_timedwait+0x42 _cv_timedwait+0x129 arc_reclaim_thread+0x2d1
fork_exit+0x118 fork_trampoline+0xe 
   41 100062 zfskern  l2arc_feed_threa mi_switch+0x16f
sleepq_timedwait+0x42 _cv_timedwait+0x129 l2arc_feed_thread+0x1be
fork_exit+0x118 fork_trampoline+0xe 
   41 100090 zfskern  txg_thread_enter mi_switch+0x16f
sleepq_wait+0x42 _cv_wait+0x111 txg_thread_wait+0x79 txg_quiesce_thread
+0xb5 fork_exit+0x118 fork_trampoline+0xe 
   41 100091 zfskern  txg_thread_enter mi_switch+0x16f
sleepq_timedwait+0x42 _cv_timedwait+0x129 txg_thread_wait+0x3c
txg_sync_thread+0x355 fork_exit+0x118 fork_trampoline+0xe 

I will continue to attempt to create a shell script which makes this
latter bug easily reproducible.

In the meantime, what further information can I gather? I will build a
debug kernel in the morning.

If it helps accelerate finding a solution to this problem, Hybrid Logic
Ltd might be able to fund a small bounty for a fix. Contact me off-list
if you can help in this way.

-- 
Best Regards,
Luke Marsden
CTO, Hybrid Logic Ltd.

Web: http://www.hybrid-cluster.com/
Hybrid Web Cluster - cloud web hosting

Phone: +441172232002 / +16179496062



___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


ZFS hanging with simultaneous zfs recv and zfs umount -f

2011-02-01 Thread Luke Marsden
Hi FreeBSD-{stable,current,fs},

I've reliably been able to cause the ZFS subsystem to hang under FreeBSD
8.1-RELEASE under the following conditions:

Another server is sending the server an incremental snapshot stream
which is in the process of being received with:

zfs send -I $OLD $FS@$NEW |ssh $HOST zfs recv -uF $FILESYSTEM

On the receiving server, we forcibly unmount the filesystem which is
being received into with:

zfs umount -f $FILESYSTEM

(the filesystem may or may not actually be mounted)

This causes any ZFS file operation (such as ls) to hang forever.  When we
attempt to reboot, the machine goes down and stops responding to pings, but
then hangs somewhere in the reboot process and needs a hard power cycle.
Unfortunately we don't have a remote console on this machine.

I understand this is a fairly harsh use case but the ideal behaviour
would be for the zfs recv to emit an error message (if necessary) rather
than rendering the entire machine unusable ;-)

Let me know if you need any further information. I appreciate that
providing a script to reliably reproduce the problem, testing on
-CURRENT and 8.2-PRE, and submitting a bug report will help... I will do
this in due course, but don't have time right now -- just wanted to get
this bug report out there first in case there's an obvious fix.

Thank you for supporting ZFS on FreeBSD!!

-- 
Best Regards,
Luke Marsden
CTO, Hybrid Logic Ltd.

Web: http://www.hybrid-cluster.com/
Hybrid Web Cluster - cloud web hosting

Phone: +441172232002 / +16179496062



___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Virtio drivers for FreeBSD on KVM

2010-12-30 Thread Luke Marsden
Hi everyone,

With more cloud infrastructure providers using KVM than ever before, the
importance of having FreeBSD perform well as a guest on these
infrastructures [1], [2], [3] is increasing.  It seems that using virtio
drivers gives a pretty significant performance boost [4], [5].

There was a NetBSD driver, and there seems to have been some work on
porting it to DragonFly BSD at [6] and [7] -- does anyone know whether
this code is stable, whether it has stalled, or whether anyone is still
working on it?

It may be possible to use the work done on the Xen paravirtualised
network and disk drivers, combined with the NetBSD code, as a starting
point for an implementation.

My company might soon be in a position to sponsor the work to get this
completed and available at some point in FreeBSD 8. I'd be very
interested to hear from anyone who's involved, or who might like to be.

-- 
Best Regards,
Luke Marsden
CTO, Hybrid Logic Ltd.

Web: http://www.hybrid-cluster.com/
Hybrid Web Cluster - cloud web hosting

Mobile: +447791750420


[1] http://www.elastichosts.com/
[2] http://www.cloudsigma.com/
[3] http://beta.brightbox.com/
[4] http://arstechnica.com/civs/viewtopic.php?f=16&t=34039
[5] blog.loftninjas.org/2008/10/22/kvm-virtio-network-performance/
[6] kerneltrap.org/mailarchive/dragonflybsd-kernel/2010/10/23/6884356
[7] http://gitorious.org/virtio-drivers


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Problem running 8.1R on KVM with AMD hosts

2010-09-30 Thread Luke Marsden
Hi FreeBSD-stable,

  1. Please, build your kernel with debug symbols.
  2. Show kgdb output

I could not convince the kernel to dump (it was looping forever but not
panicking), but I have managed to compile a kernel with debugging
symbols and DDB which immediately drops into the debugger when the
problem occurs; see the screenshot at:

http://lukemarsden.net/kvm-panic.png

Progress, I sense.

I tried typing 'panic' on the understanding that this should force a
panic and cause it to dump core to the configured swap device (I have
set dump* in /etc/rc.conf) so that I could get you the kgdb output, but
it just looped back into the debugger.  This issue seems to occur very
early in the boot process.

I would like to invite anyone with the skills and the inclination to
have a poke around with this directly over VNC to email me off-list and
I will turn on the VM and send you the VNC credentials. My email address
is: luke [at] hybrid-logic.co.uk

Or you can catch me on Skype at luke.marsden. I'm in GMT+1.

I look forward to hearing from you ;-)

-- 
Best Regards,
Luke Marsden
Hybrid Logic Ltd.

Web: http://www.hybrid-cluster.com/
Hybrid Web Cluster - cloud web hosting based on FreeBSD and ZFS

Mobile: +447791750420


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Problem running 8.1R on KVM with AMD hosts

2010-09-30 Thread Luke Marsden
On Thu, 2010-09-30 at 18:55 -0400, Jung-uk Kim wrote:
 It seems MCA capability is advertised by the CPUID translator but 
 writing to the MSRs causes GPF.  In other words, it seems like a CPU 
 emulator bug.  A simple workaround is 'set hw.mca.enabled=0' from the 
 loader prompt.  If it works, add hw.mca.enabled=0 
 in /boot/loader.conf to make it permanent.  MCA does not make any 
 sense in emulation any way.

Awesome, this allows us to boot 8.1R on Linux KVM with AMD hardware!

Thank you very much. This has just doubled our number of availability
zones.

-- 
Best Regards,
Luke Marsden
Hybrid Logic Ltd.

Web: http://www.hybrid-cluster.com/
Hybrid Web Cluster - cloud web hosting based on FreeBSD and ZFS

Mobile: +447791750420

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Problem running 8.1R on KVM with AMD hosts

2010-09-27 Thread Luke Marsden
Hi all,

Thanks for your responses.

 1. Please, build your kernel with debug symbols.
 2. Show kgdb output

I will build a debug kernel as per your instructions and post the
results as soon as I can. Likely in the next couple of days.

I have secured us test hardware at ElasticHosts to debug this as
necessary. As a reference point, 8.0R runs fine on this particular
infrastructure: Linux KVM on AMD hardware. More detail to follow.

Thank you.

-- 
Best Regards,
Luke Marsden
Hybrid Logic Ltd.

Web: http://www.hybrid-cluster.com/
Hybrid Web Cluster - cloud web hosting based on FreeBSD and ZFS

Mobile: +447791750420

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org