Jeremy Archer wrote:
> I started to look at ref counting to convince myself that the db_buf field
> in a cached dmu_buf_impl_t object is guaranteed to point at a valid
> arc_buf_t.
>
> I have seen a "deadbeef" crash on a busy system when zfs_write() is
> pre-pagefaulting in the file's pages.
>
Hi Jeremy,
Jeremy Archer wrote:
> Hello,
>
> I believe the following is true, correct me if it is not:
>
> If more than one object references a block (e.g. 2 files have the same
> block open), there must be multiple clones of the arc_buf_t (and associated
> dmu_buf_impl_t) records present, o
Jeremy Archer wrote:
> Thanks for the explanation.
>
>> a dirty
>> buffer goes onto the
>> list corresponding to the txg it belongs to.
>
> Ok. I see that all dirty buffers are put on a per txg list.
> This is for easy synchronization, makes sense.
>
> The per dmu_buf_impl_t details
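[To make the per-txg bookkeeping in the exchange above concrete, here is a
minimal self-contained sketch of the idea: one dirty list per open
transaction group, indexed by txg & TXG_MASK. The struct and function names
are invented for illustration; this is not the actual dbuf.c code.]

    /* Minimal sketch of per-txg dirty lists; names are illustrative. */
    #define TXG_SIZE 4                        /* concurrently open txgs      */
    #define TXG_MASK (TXG_SIZE - 1)

    struct dirty_buf {
            struct dirty_buf *db_next;        /* next dirty buf in same txg  */
            unsigned long     db_txg;         /* txg this dirty state is in  */
    };

    static struct dirty_buf *dirty_lists[TXG_SIZE];  /* one head per txg */

    static void
    dirty_list_insert(struct dirty_buf *db, unsigned long txg)
    {
            db->db_txg = txg;
            db->db_next = dirty_lists[txg & TXG_MASK];
            dirty_lists[txg & TXG_MASK] = db;
    }

[Syncing txg N then only has to walk dirty_lists[N & TXG_MASK], which is why
grouping dirty buffers per txg makes the synchronization easy.]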
Jeremy Archer wrote:
> Greets,
>
> I have read a couple of earlier posts by Jeff and Mark Maybee explaining how
> ARC reference counting works.
> These posts did help clarify this piece of code (a bit complex, to say the
> least).
> I would like to solicit more com
Ben Rockwood wrote:
> I need some help with clarification.
>
> My understanding is that there are 2 instances in which ZFS will write
> to disk:
> 1) TXG Sync
> 2) ZIL
>
> Post-snv_87 a TXG should sync out when the TXG is either overfilled or
> hits the timeout of 30 seconds.
>
> First question
Ben Rockwood wrote:
> Mark Maybee wrote:
>> Ben Rockwood wrote:
>>> I need some help with clarification.
>>>
>>> My understanding is that there are 2 instances in which ZFS will write
>>> to disk:
>>> 1) TXG Sync
>>> 2) ZIL
>>
Gack! Absolutely correct Jürgen. I have filed 6806627 to track this.
-Mark
Jürgen Keil wrote:
> It seems there is a bug introduced by the putback for
>
> author: Mark Maybee
> date: Wed Jan 28 11:04:37 2009 -0700 (2 weeks ago)
> files:usr/src/uts/comm
Pawel Jakub Dawidek wrote:
> On Sat, Apr 18, 2009 at 06:05:56PM -0700, Matthew.Ahrens at sun.com wrote:
>> Author: Matthew Ahrens
>> Repository: /hg/onnv/onnv-gate
>> Latest revision: f41cf682d0d3e3cf5c4ec17669b903ae621ef882
>> Total changesets: 1
>> Log message:
>> PSARC/2009/204 ZFS user/group q
J Duff wrote:
> I'm trying to understand the arc code.
>
> Can a read zio and a write zio share the same arc_buf_hdr_t?
>
No.
> If so, do they each have their own arc_buf_t both of which point back to the
> same arc_buf_hdr_t? In other words, do they each have their own copy of the
> data (arc
ZFS is designed to "sync" a transaction group about every 5 seconds
under normal workloads. So your system looks to be operating as
designed. Is there some specific reason why you need to reduce this
interval? In general, this is a bad idea, as there is somewhat of a
"fixed overhead" associated
J Duff wrote:
> I'm trying to understand the inner workings of the adaptive replacement cache
> (arc). I see there are arc_bufs and arc_buf_hdrs. Each arc_buf_hdr points to
> an arc_buf_t. The arc_buf_t is really one entry in a list of arc_buf_t
> entries. The multiple entries are accessed throu
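[As a rough picture of the structures being asked about (a simplified
sketch, not the real arc.c definitions): the header owns a list of arc_buf_t
entries, and each entry points back at the shared header while carrying its
own data pointer.]

    /* Simplified sketch of the header/buffer relationship; field names are
     * abbreviated and this is not the actual arc.c declaration. */
    struct arc_buf_hdr;

    typedef struct arc_buf {
            struct arc_buf_hdr *b_hdr;    /* back pointer to shared header  */
            struct arc_buf     *b_next;   /* next buffer sharing the header */
            void               *b_data;   /* this buffer's copy of the data */
    } arc_buf_t;

    typedef struct arc_buf_hdr {
            arc_buf_t *b_buf;             /* head of the list of arc_buf_t  */
            /* ... block pointer, state, reference counts, etc. ... */
    } arc_buf_hdr_t;

[So the "multiple clones" discussed earlier are simply extra arc_buf_t
entries on the header's b_buf list, each with its own b_data copy.]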
Ricardo M. Correia wrote:
> Hi Matthew,
>
> On Qui, 2008-02-07 at 12:48 -0800, Matthew Ahrens wrote:
>> on disk:
>> header:
>> struct sa_phys {
>> uint16_t sa_numattrs;
>> struct {
>> uint16_t sa_type; /* enum sssa_type */
>> uint16_t sa_leng
The arc_buf_hdr_t is an in-core-only data structure, so it has no
version impact. It does not matter where you add an extra field.
However, only add a field to this structure if you really need it...
increasing the size of this struct will likely have a negative
performance impact on the ARC.
-M
This fix for this bug is currently in test and will be pushed shortly.
-Mark
Pawel Jakub Dawidek wrote:
> On Tue, Jul 22, 2008 at 08:56:31AM -0600, Mark Shellenbaum wrote:
>> Pawel Jakub Dawidek wrote:
>>> On Tue, Jul 22, 2008 at 04:28:45PM +0200, Pawel Jakub Dawidek wrote:
Hi.
I j
This is a known bug:
6732083 arc_read() panic: rw_exit: lock not held
with a known cause. The fix you suggest works, but it is rather ugly. We
are working on a fix now.
-Mark
Pawel Jakub Dawidek wrote:
> On Tue, Jul 29, 2008 at 12:41:16PM +0200, Pawel Jakub Dawidek wrote:
>> Hi.
>>
>> We're test
Andreas,
We have explored the idea of increasing the dnode size in the past
and discovered that a larger dnode size has a significant negative
performance impact on the ZPL (at least with our current caching
and read-ahead policies). So we don't have any plans to increase
its size generically any
Andreas Dilger wrote:
> On Sep 13, 2007 15:27 -0600, Mark Maybee wrote:
>> We have explored the idea of increasing the dnode size in the past
>> and discovered that a larger dnode size has a significant negative
>> performance impact on the ZPL (at least with our current cach
Pawel,
I'm not quite sure I understand why thread #1 below is stalled. Is
there only a single thread available for IO completion?
-Mark
Pawel Jakub Dawidek wrote:
> Hi.
>
> I'm observing the following deadlock.
>
> One thread holds zfsvfs->z_hold_mtx[i] lock and waits for I/O:
>
> Tracing pi
2AM -0700, Mark Maybee wrote:
>> Pawel,
>>
>> I'm not quite sure I understand why thread #1 below is stalled. Is
>> there only a single thread available for IO completion?
>
> There are a few, but I believe thread #2 is trying to complete the very
> I/O request o
Pawel Jakub Dawidek wrote:
> On Wed, Nov 07, 2007 at 07:41:54AM -0700, Mark Maybee wrote:
>> Hmm, seems rather unlikely that these two IOs are related. Thread 1
>> is trying to read a dnode in order to extract the znode data from its
>> bonus buffer. Thread 2 is completing a
Chris Kirby wrote:
> Matthew Ahrens wrote:
>> So, we use RW_LOCK_HELD() to mean, "might this thread hold the lock?" and
>> it
>> is generally only used in assertions. Eg, some routine should only be
>> called
>> with the lock held, so we "ASSERT(RW_LOCK_HELD(lock))". The fact that
>> someti
Chris Kirby wrote:
> Mark Maybee wrote:
>> Chris Kirby wrote:
>>
>>> Matthew Ahrens wrote:
>>>
>>>> So, we use RW_LOCK_HELD() to mean, "might this thread hold the
>>>> lock?" and it is generally only used in assertions. Eg,
Yes, it's the same in Solaris. It's probably more correct to always do
the dmu_write(), as this keeps the page and file in sync.
-Mark
Pawel Jakub Dawidek wrote:
> Hi.
>
> I'm pondering this piece of mappedwrite():
>
> if (pp = page_lookup(vp, start, SE_SHARED)) {
> caddr_t
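[The "keep the page and file in sync" point can be sketched as follows. This
is a self-contained illustration of the pattern, not the real mappedwrite()
from zfs_vnops.c; lookup_page() and dmu_write_stub() are made-up stand-ins
for page_lookup() and dmu_write().]

    /* If the written range is mmap'ed, update the cached page AND push the
     * same bytes through the DMU so the two copies never diverge. */
    #include <stddef.h>
    #include <string.h>

    #define PAGESIZE 4096

    struct page { char data[PAGESIZE]; };

    struct page *lookup_page(unsigned long long off);             /* stand-in */
    void dmu_write_stub(unsigned long long off, const void *, size_t);

    static void
    mapped_write(unsigned long long off, const void *buf, size_t len)
    {
            struct page *pp = lookup_page(off);

            if (pp != NULL) {
                    /* Copy into the mapped page first ...                  */
                    memcpy(pp->data + (off & (PAGESIZE - 1)), buf, len);
            }
            /* ... and always write the same bytes via the DMU, keeping
             * the page cache and the file contents in sync.                */
            dmu_write_stub(off, buf, len);
    }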
Ricardo Correia wrote:
> On Wednesday 09 May 2007 04:57:53 Ricardo Correia wrote:
>> 2) In the end of zfs_zget(), if the requested object number is not found,
>> it allocates a new znode with that object number. This shouldn't happen in
>> any FUSE operation.
>
> Apparently, I didn't (and I still
Yup, Jürgen is correct. The problem here is that we are blocked in
arc_data_buf_alloc() while holding a hash_lock. This is bug 6457639.
One possibility, for this specific bug might be to drop the lock before
the allocate and then redo the read lookup (in case there is a race)
with the necessary b
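[The fix Mark sketches, dropping the hash lock around the blocking
allocation and then redoing the lookup in case another thread raced in,
looks roughly like this. The names are invented for illustration; this is
not a proposed arc.c patch.]

    /* Drop the lock before a blocking allocation, then re-check. */
    #include <pthread.h>
    #include <stdlib.h>

    static pthread_mutex_t hash_lock = PTHREAD_MUTEX_INITIALIZER;

    void *hash_lookup(const void *key);            /* stand-ins for the hash */
    void  hash_insert(const void *key, void *buf);

    static void *
    get_buf(const void *key, size_t size)
    {
            void *buf, *winner;

            pthread_mutex_lock(&hash_lock);
            buf = hash_lookup(key);
            if (buf != NULL) {
                    pthread_mutex_unlock(&hash_lock);
                    return (buf);
            }

            /* Don't block on the allocation while holding the hash lock.   */
            pthread_mutex_unlock(&hash_lock);
            buf = malloc(size);                    /* may block for a while  */

            /* Redo the lookup: somebody may have inserted while we slept.  */
            pthread_mutex_lock(&hash_lock);
            winner = hash_lookup(key);
            if (winner != NULL) {
                    pthread_mutex_unlock(&hash_lock);
                    free(buf);                     /* lost the race          */
                    return (winner);
            }
            hash_insert(key, buf);
            pthread_mutex_unlock(&hash_lock);
            return (buf);
    }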
1, VOP_INACTIVE() is
> called.
> 2) VOP_INACTIVE() calls zfs_inactive() which calls zfs_zinactive().
> 3) zfs_zinactive() calls dmu_buf_rele()
> 4) ??
> 5) znode_pageout_func() calls zfs_znode_free() which finally frees the vnode.
>
> As for step 4, Mark Maybee mentioned:
>
m the community :-).
-Mark
Pawel Jakub Dawidek wrote:
> On Fri, May 18, 2007 at 08:22:26AM -0600, Mark Maybee wrote:
>> Yup, Jürgen is correct. The problem here is that we are blocked in
>> arc_data_buf_alloc() while holding a hash_lock. This is bug 6457639.
>> One possibil
Darren J Moffat wrote:
> Does the ARC get flushed for a dataset when it is unmounted ?
Yes
> What does change when a dataset is unmounted ?
>
Pretty much everything associated with the dataset should be evicted...
it's possible that some of the meta-data may hang around I suppose (I
don't remembe
Darren J Moffat wrote:
> For an encrypted dataset it is possible that by the time we arrive in
> zio_write() [ zio_write_encrypt() ] that when we look up which key is
> needed to encrypt this data, that key isn't available to us.
>
> Is there some value of zio->io_error I can set that will not r
Atul Vidwansa wrote:
> Hi,
> I have a few questions about the way a transaction group is created.
>
> 1. Is it possible to group transactions related to multiple operations
> in the same group? For example, an "rmdir foo" followed by "mkdir bar",
> can these end up in the same transaction group?
>
Yes.
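[To make that "Yes" concrete: each operation still creates and assigns its
own DMU transaction, but if both are assigned while the same txg is open
they belong to the same group. The kernel-context sketch below uses the
public dmu_tx_* calls; the specific holds are only illustrative and error
handling is omitted.]

    /* Two independent operations that typically land in the same txg. */
    static void
    two_ops_same_txg(objset_t *os, uint64_t parent_dir_obj)
    {
            dmu_tx_t *tx;
            uint64_t txg1, txg2;

            /* "rmdir foo" */
            tx = dmu_tx_create(os);
            dmu_tx_hold_zap(tx, parent_dir_obj, FALSE, "foo");
            (void) dmu_tx_assign(tx, TXG_WAIT);
            txg1 = dmu_tx_get_txg(tx);
            /* ... remove the directory ... */
            dmu_tx_commit(tx);

            /* "mkdir bar" */
            tx = dmu_tx_create(os);
            dmu_tx_hold_zap(tx, parent_dir_obj, TRUE, "bar");
            (void) dmu_tx_assign(tx, TXG_WAIT);
            txg2 = dmu_tx_get_txg(tx);       /* frequently equal to txg1 */
            /* ... create the directory ... */
            dmu_tx_commit(tx);
    }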
Mike,
Please post this sort of query to zfs-discuss (rather than zfs-code).
zfs-code is a development discussion forum.
Without any form of replication that zfs knows about (RAIDZ or mirrors),
there is no way for ZFS to fix up data errors detected in a scrub.
RAID5 LUNs just look like normal devi
Daniel Rock wrote:
> Hi,
>
> I just triggered a kernel panic while trying to import a zpool.
>
> The disk in the zpool was residing on a Symmetrix and mirrored with SRDF. The
> host sees both devices though (one writeable device "R1" on one
> Symmetrix box and one write protected device "R2" on
Darren,
I looked a bit at your dumps... in both cases, the problem is that the
os_phys block that we read from the disk is garbage:
> 0x9377b000::print objset_phys_t
{
os_meta_dnode = {
dn_type = 0
dn_indblkshift = 0
dn_nlevels = 0
dn_nblkptr = 0
pipeline rather than the async_stages.
It's still not clear to me how this can result in your problems, but
then I don't yet understand how the SPA io pipeline works in all
circumstances.
-Mark
Mark Maybee wrote:
> Darren,
>
> I looked a bit at your dumps... in both cases, the probl
Jeremy Teo wrote:
> Heya,
>
> just a short blurb of what I understand from grokking dmu_zfetch.c
>
> Basically the current code issues a prefetch (i.e. creates a new
> prefetch stream) whenever a block (level 0, DB_RF_NOPREFETCH is not
> set) is read in dbuf_read.
>
> Since ZFS is multi-threade
Pawel Jakub Dawidek wrote:
> ZFS works really stably on FreeBSD, but my biggest problem is how to
> control ZFS memory usage. I have no idea how to leash that beast.
>
> FreeBSD has a backpressure mechanism. I can register my function so it
> will be called when there are memory problems, which I do
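[For reference, the registration Pawel alludes to on the FreeBSD side is the
vm_lowmem event handler; the handler body below is only a placeholder, and
the exact handler signature may differ between FreeBSD versions.]

    /* FreeBSD sketch: get called back when the VM system is short on memory. */
    #include <sys/param.h>
    #include <sys/eventhandler.h>

    static eventhandler_tag zfs_lowmem_tag;

    static void
    zfs_lowmem(void *arg __unused, int howto __unused)
    {
            /* placeholder: ask the ARC to shrink its cached buffers here */
    }

    static void
    zfs_lowmem_register(void)
    {
            zfs_lowmem_tag = EVENTHANDLER_REGISTER(vm_lowmem, zfs_lowmem,
                NULL, EVENTHANDLER_PRI_FIRST);
    }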
Pawel Jakub Dawidek wrote:
> On Tue, Nov 07, 2006 at 06:06:48PM -0700, Mark Maybee wrote:
>
>>The problem is that in ZFS the vnode holds onto more memory than just
>>the vnode itself. It's fine to place the vnode on a "free vnodes list"
>>after a VOP_INACTIVE(
Pawel Jakub Dawidek wrote:
> On Fri, Nov 10, 2006 at 06:36:07AM -0700, Mark Maybee wrote:
>
>>Pawel Jakub Dawidek wrote:
>>
>>>On Tue, Nov 07, 2006 at 06:06:48PM -0700, Mark Maybee wrote:
>>>
>>>>The problem is that in ZFS the vnode holds onto more
Pawel Jakub Dawidek wrote:
> Hi.
>
> FreeBSD's WITNESS mechanism for detecting lock order reversals reports
> LOR here:
>
> lock order reversal:
> 1st 0xc3f7738c zfs:dbuf (zfs:dbuf) @
> /zoo/pjd/zfstest/sys/modules/zfs/../../contrib/opensolaris/uts/common/fs/zfs/dnode_sync.c:410
> 2nd 0xc3fefc
Pawel Jakub Dawidek wrote:
> I had another one, can you analize it?
>
> lock order reversal:
> 1st 0xc44b9b00 zfs:dbuf (zfs:dbuf) @
> /zoo/pjd/zfstest/sys/modules/zfs/../../contrib/opensolaris/uts/common/fs/zfs/dbuf.c:1644
> 2nd 0xc45be898 zfs:dbufs (zfs:dbufs) @
> /zoo/pjd/zfstest/sys/modules
No, nested transactions are not allowed.
(your sense is correct :-)).
-Mark
Jeremy Teo wrote:
> Are nested transactions allowed/supported by the DMU?
>
> Namely, if I have function Foo and function Bar that both wrap their
> own operations using a transaction such that Foo and Bar are
> atomic/
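[Since nesting is ruled out (per Mark's answer above), the usual alternative
is to hoist the transaction one level up: the caller creates a single tx,
adds the holds both operations need, assigns it once, and passes it down. A
rough kernel-context sketch; the holds and the do_foo()/do_bar() helpers are
hypothetical.]

    /* One transaction covering both operations instead of nested ones. */
    static void
    foo_and_bar_atomically(objset_t *os, uint64_t obj_a, uint64_t obj_b)
    {
            dmu_tx_t *tx = dmu_tx_create(os);

            dmu_tx_hold_bonus(tx, obj_a);    /* what Foo would have held */
            dmu_tx_hold_bonus(tx, obj_b);    /* what Bar would have held */
            (void) dmu_tx_assign(tx, TXG_WAIT);

            do_foo(tx, obj_a);               /* both run under one tx ...  */
            do_bar(tx, obj_b);

            dmu_tx_commit(tx);               /* ... and commit together    */
    }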
Hi Darren,
Sorry about the slow response (from me). I was on vacation last
week (and am on semi-vacation this week).
I can't answer your question about using the zio transform stuff.
You will have to get Jeff or Bill's attention for that.
As far as the ARC "hook" goes: it doesn't yet exist. Yo
Sorry Pawel,
The team is rather slammed with work at the moment, so we may be a bit
slow at getting to things like this. We certainly appreciate getting
these patches though.
-Mark
Pawel Jakub Dawidek wrote:
> On Mon, Nov 13, 2006 at 03:14:04PM +0100, Pawel Jakub Dawidek wrote:
>
>>The patch b
Pawel Jakub Dawidek wrote:
>Hi.
>
>I'm currently working on snapshots and can't understand one thing.
>
>When someone lookups a snapshot directory it gets automatically mounted
>from zfsctl_snapdir_lookup() via domount().
>
>Ok, domount() ends up in zfs_domount(). zfs_domount() wants to open
>data