Re: [zfs-discuss] Reading ZFS config for an extended period

2010-10-12 Thread Anton Pomozov
I have a 1 TB mirrored, deduplicated pool.
snv_134 running on an x86 i7 PC with 8 GB RAM.
I destroyed a 30 GB ZFS volume and am now trying to import that pool from a
LiveUSB-booted OpenSolaris.
It has been running for 2 hours already; I'm still waiting ...

How can I see a progress bar, or some other indication of the current
import job's progress?


Re: [zfs-discuss] Reading ZFS config for an extended period

2010-10-12 Thread Roy Sigurd Karlsbakk
----- Original Message -----
> I have a 1 TB mirrored, deduplicated pool.
> snv_134 running on an x86 i7 PC with 8 GB RAM.
> I destroyed a 30 GB ZFS volume and am now trying to import that pool
> from a LiveUSB-booted OpenSolaris.
> It has been running for 2 hours already; I'm still waiting ...

It may even take longer. I've seen this take a while. It's a known bug. The fix 
is not to use dedup...

> How can I see a progress bar, or some other indication of the current
> import job's progress?

You can't. If you reboot, the system will likely hang until the volume is
removed. It should be possible to bring the system up in single-user mode,
but you should probably just wait.
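
That said, you can at least confirm from another shell that the import is
still making I/O progress. A minimal sketch; the DTraceToolkit install
path is illustrative, not canonical:

  # Per-device I/O statistics every 5 seconds; a steady stream of small
  # reads usually means the deferred destroy is still grinding the DDT.
  iostat -xn 5

  # I/O size/randomness breakdown, if you have the DTraceToolkit handy.
  /opt/DTT/iopattern 5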

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented
intelligibly. It is an elementary imperative for all pedagogues to avoid
excessive use of idioms of foreign origin. In most cases, adequate and
relevant synonyms exist in Norwegian.


Re: [zfs-discuss] Reading ZFS config for an extended period

2010-10-12 Thread Anton Pomozov
It finished!
Going to switch off dedup now ... if that's still possible.
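
For anyone following along, a minimal sketch of switching dedup off;
"tank" is an illustrative pool name. As far as I know this only affects
new writes: blocks that were already deduplicated stay referenced in the
DDT until they are rewritten or destroyed.

  # Disable dedup for the pool; child datasets inherit the setting.
  zfs set dedup=off tank

  # Confirm the setting took effect everywhere.
  zfs get -r dedup tank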


Re: [zfs-discuss] Reading ZFS config for an extended period

2010-02-17 Thread Khyron
Ugh!  If you received a direct response from me instead of via the list,
apologies for that.


Rob:

I'm just reporting the news.  The RFE is out there.  Just like SLOGs, I
happen to think it a good idea, personally, but that's my personal
opinion.  If it makes dedup more usable, I don't see the harm.


Taemun:

The issue, as I understand it, is not "use lots of CPU" or "just dies
from paging".  I believe it has more to do with all of the small, random
reads/writes involved in updating the DDT.

Remember, the DDT is stored within the pool, just as the ZIL is if you
don't have a SLOG.  (The "S" in SLOG stands for "separate".)  So all the
DDT updates are in competition for I/O with the actual data deletion.  If
the DDT could already be stored as a separate VDEV, I'm sure a way would
have been hacked together by someone (likely someone on this list).
Hence the need for the RFE to create this functionality where it does
not currently exist.  The DDT is separate from the ARC or L2ARC.
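
(An aside for anyone who wants to look at their own DDT: zdb can dump the
dedup table's entry counts and sizes. A sketch, against an illustrative
pool name:)

  # Summary dedup statistics: entry counts, on-disk and in-core sizes.
  zdb -D tank

  # More detail, including the full DDT histogram.
  zdb -DD tank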

Here's the bug:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6913566

If I'm incorrect, someone please let me know.


Markus:

Yes, the issue would appear to be dataset size vs. RAM size.  Sounds like
an area ripe for testing, much like RAID-Z3 performance.


Cheers all!

On Tue, Feb 16, 2010 at 00:20, taemun <tae...@gmail.com> wrote:

> The system in question has 8 GB of RAM. It never paged during the
> import (unless I was asleep at that point, but anyway).
>
> It ran for 52 hours, then started showing 47% kernel CPU usage. At this
> stage, dtrace stopped responding, and so iopattern died, as did
> iostat. It was also increasing RAM usage rapidly (15 MB/minute).
> After an hour of that, the CPU went up to 76%. An hour later, CPU
> usage stopped. Hard drives were churning throughout all of this
> (albeit at a rate that suggests each vdev is being handled by a
> single-threaded operation).
>
> I'm guessing that if you don't have enough RAM, it gets stuck in the
> use-lots-of-CPU phase, and just dies from too much paging. Of course,
> I have absolutely nothing to back that up.
>
> Personally, I think that if L2ARC devices were persistent, we would
> already have the mechanism in place for storing the DDT as a separate
> vdev. The problem is, there is nothing you can run at boot time to
> populate the L2ARC, so dedup writes are ridiculously slow until the
> cache is warm. If the cache stayed warm, or there were an option to
> forcibly warm up the cache, this could be somewhat alleviated.
>
> Cheers




-- 
You can choose your friends, you can choose the deals. - Equity Private

If Linux is faster, it's a Solaris bug. - Phil Harman

Blog - http://whatderass.blogspot.com/
Twitter - @khyron4eva


Re: [zfs-discuss] Reading ZFS config for an extended period

2010-02-17 Thread Miles Nordin
>>>>> "k" == Khyron <khyron4...@gmail.com> writes:

     k> The RFE is out there.  Just like SLOGs, I happen to think it a
     k> good idea, personally, but that's my personal opinion.  If it
     k> makes dedup more usable, I don't see the harm.

slogs and l2arcs, modulo the current longstanding ``cannot import pool
with attached missing slog'' bug, are disposable: you will lose either a
little data or no data if the device goes away (once the bug is finally
fixed).  This makes them less ponderous, because these days we are
looking for a raidz2 or raidz3 amount of redundancy, so a separate
device that wasn't disposable would need to be a 3- or 4-way mirror.  It
also makes their separateness more separable, since they can go away at
any time, so maybe they do deserve to be separate.  The two together
make the complexity more bearable.

Would an sddt be disposable, or would it be a critical top-level vdev
needed for import?  If it's critical, well, that's kind of annoying,
because now we need a 3-way mirror of the sddt to match the minimum
best-practice redundancy of the rest of the pool, and my first reaction
would be ``can you spread it throughout the normal raidz{,2,3} vdevs, at
least in backup form?''

Once I say a copy should be kept in the main pool even after it becomes
an sddt, well, what does that imply?

 * In the read case it means caching, so it could go in the l2arc.
   How is the DDT different from anything else in the l2arc?

 * In the write case it means sometimes committing it quickly, without
   waiting on the main pool, so we can release some lock or answer some
   RPC and continue.  Why not write it to the slog?  Then if we lose the
   slog we can do what we always do without the slog and roll back to
   the last valid txg, losing whatever writes were associated with that
   lost DDT update.

The two cases fit fine with the types of SSDs we're using for each role
and the type of special error recovery we have if we lose the device.
Why litter a landscape already so full of special cases and tricks (like
the ``import pool with missing slog'' bug that is taking so long to
resolve) with yet another kind of vdev, whose special cases will take a
year to discover and a half-decade to address?

Maybe there is a reason.  Are DDT write patterns different from slog
write patterns?  Is it possible to make a DDT read cache using less ARC
for pointers than the l2arc currently uses?  Is the DDT particularly
hard on the l2arc because of its small block sizes?  Will the sddt be
delivered with a separate offline ``not an fsck!!!'' tool for slowly
regenerating it from pool data if it's lost?  Or maybe, after an sddt
goes bad, the pool could be mounted space-wastingly (no dedup done, and
deletes not freeing space) with an empty DDT, and the sddt regenerated
by a scrub?  If the performance or recovery behavior is different from
what we're working towards with the optional slog and persistent l2arc,
then maybe the sddt does deserve to be another vdev type.

So, I dunno.  On one hand I'm clearly nowhere near informed enough to
weigh in on an architectural decision like this and shouldn't even be
discussing it, and the same applies to you, Khyron, in my view, since
our input seems obvious at best and misinformed at worst.  On the other
hand, another major architectural change (the slog) was delivered
incomplete, in a cripplingly bad and silly, trivial way, for what looks
like (AFAICT) nothing but lack of sufficient sysadmin bitching and
moaning, leaving heaps of multi-terabyte naked pools out there for half
a decade with fancy triple redundancy that will be totally lost if a
single SSD + zpool.cache goes bad.  So apparently thinking things
through, even at this trivial level, might have some value to the ones
actually doing the work.




Re: [zfs-discuss] Reading ZFS config for an extended period

2010-02-15 Thread taemun
Just thought I'd chime in for anyone who had read this - the import
operation completed this time, after 60 hours of disk grinding.

:)


Re: [zfs-discuss] Reading ZFS config for an extended period

2010-02-15 Thread Khyron
The DDT is stored within the pool, IIRC, but there is an RFE open to
allow you to store it on a separate top-level VDEV, like a SLOG.

The other thing I've noticed with all of the "destroyed a large dataset
with dedup enabled and it's taking forever to import/destroy/<insert
function here>" questions is that the process runs so, so, so much
faster with 8+ GiB of RAM.  Almost to a man, everyone who reports these
3-, 4-, or more-day destroys has < 8 GiB of RAM on the storage server.

Just some observations/thoughts.
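
(A quick sketch of checking how much memory ZFS actually has to work with
on such a box; both commands are stock OpenSolaris:)

  # Current ARC size, in bytes.
  kstat -p zfs:0:arcstats:size

  # Kernel vs. free memory breakdown.
  echo ::memstat | mdb -k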

On Mon, Feb 15, 2010 at 23:14, taemun <tae...@gmail.com> wrote:

> Just thought I'd chime in for anyone who had read this - the import
> operation completed this time, after 60 hours of disk grinding.
>
> :)




-- 
You can choose your friends, you can choose the deals. - Equity Private

If Linux is faster, it's a Solaris bug. - Phil Harman

Blog - http://whatderass.blogspot.com/
Twitter - @khyron4eva


Re: [zfs-discuss] Reading ZFS config for an extended period

2010-02-15 Thread Rob Logan

> RFE open to allow you to store [DDT] on a separate top-level VDEV

Hmm, add this to the spare, log, and cache vdevs, and it's getting to
the point of making another pool and thinly provisioning volumes to
maintain partitioning flexibility.
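
For the record, a minimal sketch of that approach; pool, disk, and volume
names are illustrative:

  # A second pool on its own mirror...
  zpool create fastpool mirror c2t0d0 c2t1d0

  # ...with a sparse (thinly provisioned) 100G volume carved out of it.
  zfs create -s -V 100G fastpool/vol1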

taemun: hey, thanks for closing the loop!

Rob


Re: [zfs-discuss] Reading ZFS config for an extended period

2010-02-15 Thread taemun
The system in question has 8 GB of RAM. It never paged during the
import (unless I was asleep at that point, but anyway).

It ran for 52 hours, then started showing 47% kernel CPU usage. At this
stage, dtrace stopped responding, and so iopattern died, as did
iostat. It was also increasing RAM usage rapidly (15 MB/minute).
After an hour of that, the CPU went up to 76%. An hour later, CPU
usage stopped. Hard drives were churning throughout all of this
(albeit at a rate that suggests each vdev is being handled by a
single-threaded operation).

I'm guessing that if you don't have enough RAM, it gets stuck in the
use-lots-of-CPU phase, and just dies from too much paging. Of course,
I have absolutely nothing to back that up.

Personally, I think that if L2ARC devices were persistent, we would
already have the mechanism in place for storing the DDT as a separate
vdev. The problem is, there is nothing you can run at boot time to
populate the L2ARC, so dedup writes are ridiculously slow until the
cache is warm. If the cache stayed warm, or there were an option to
forcibly warm up the cache, this could be somewhat alleviated.
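
For context, a sketch of attaching an L2ARC device today; the device name
is illustrative, and as noted above I know of no supported way to
force-warm it afterwards:

  # Attach an SSD to the pool as a cache (L2ARC) device.
  zpool add tank cache c3t0d0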

Cheers


Re: [zfs-discuss] Reading ZFS config for an extended period

2010-02-15 Thread Markus Kovero
> The other thing I've noticed with all of the "destroyed a large dataset
> with dedup enabled and it's taking forever to import/destroy/<insert
> function here>" questions is that the process runs so, so, so much
> faster with 8+ GiB of RAM.  Almost to a man, everyone who reports these
> 3-, 4-, or more-day destroys has < 8 GiB of RAM on the storage server.

I've witnessed destroys that take several days on systems with 24 GB+ of
RAM (dataset over 30 TB). I guess it's just a matter of how large the
dataset is vs. how much RAM you have.

Yours
Markus Kovero


Re: [zfs-discuss] Reading ZFS config for an extended period

2010-02-13 Thread taemun
After around four days the process appeared to have stalled (no audible
hard drive activity). I restarted with milestone=none, deleted
/etc/zfs/zpool.cache, restarted again, and ran "zpool import tank". (I
also allowed root login over ssh, so I could open new ssh sessions if
required.) Now I can watch the process from on the machine.
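
For anyone repeating this, a sketch of the sequence; "tank" is my pool
name, and the boot details may differ on your setup:

  # Boot with the kernel arguments amended to include: -m milestone=none
  # Then, from the root shell:
  rm /etc/zfs/zpool.cache   # so the next boot doesn't auto-import pools
  reboot

  # After the clean boot:
  zpool import tank         # re-import manually, in the foreground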

My present question is: how is the DDT stored? I believe the DDT has
around 10M entries for this dataset, as per:
DDT-sha256-zap-duplicate: 400478 entries, size 490 on disk, 295 in core
DDT-sha256-zap-unique: 10965661 entries, size 381 on disk, 187 in core
(taken just before the attempt to destroy the dataset)
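
Those figures come from zdb's dedup statistics. A back-of-the-envelope
using the reported per-entry in-core sizes, assuming they hold for every
entry:

  # Where the entry counts above come from:
  zdb -DD tank

  # Rough in-core footprint of the full DDT:
  #   10965661 * 187 + 400478 * 295  =~  2.17e9 bytes  =~  2.0 GiB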

A sample from iopattern shows:
%RAN %SEQ  COUNT    MIN    MAX    AVG     KR
 100    0    195    512    512    512     97
 100    0    414    512  65536    895    362
 100    0    261    512    512    512    130
 100    0    273    512    512    512    136
 100    0    247    512    512    512    123
 100    0    297    512    512    512    148
 100    0    292    512    512    512    146
 100    0    250    512    512    512    125
 100    0    274    512    512    512    137
 100    0    302    512    512    512    151
 100    0    294    512    512    512    147
 100    0    308    512    512    512    154
  98    2    286    512    512    512    143
 100    0    270    512    512    512    135
 100    0    390    512    512    512    195
 100    0    269    512    512    512    134
 100    0    251    512    512    512    125
 100    0    254    512    512    512    127
 100    0    265    512    512    512    132
 100    0    283    512    512    512    141

As the pool is composed of 2x 8-disk raidz vdevs, I presume each element
is stored twice (for the raidz redundancy). So at around 280 512-byte
read ops/s, that's roughly 140 entries per second.
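
If that rate holds, and the import has to touch every DDT entry exactly
once (which is only an assumption), a rough estimate of the total time:

  #   (10965661 + 400478) entries / 140 entries per second
  #   =~  81200 s  =~  22.5 hours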

Is the import of a semi-broken pool:
1. reading all the DDT markers for the dataset; or
2. reading all the DDT markers for the pool; or
3. reading all of the block markers for the dataset; or
4. reading all of the block markers for the pool,
prior to actually finalising what it needs to do to fix the pool? I'd
like to be able to estimate how long the import is likely to take.

Or should I tell it to roll back to the last valid txg - i.e. to before
the "zfs destroy <dataset>" command was issued - via "zpool import -F"?
Or is that likely to take as long as, or longer than, the present
import/fix?
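
For reference, a sketch of the rollback option; note that -F discards the
last few transaction groups, so anything written in them is lost:

  # Dry run: report whether recovery is possible and what would be lost.
  zpool import -F -n tank

  # The actual recovery-mode import, rolling back to the last good txg.
  zpool import -F tank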

Cheers.


Re: [zfs-discuss] Reading ZFS config for an extended period

2010-02-11 Thread Lori Alt

On 02/11/10 08:15, taemun wrote:

> Can anyone comment on whether the on-boot "Reading ZFS config" is any
> slower/better/whatever than deleting zpool.cache, rebooting and
> manually importing?
>
> I've been waiting more than 30 hours for this system to come up. There
> is a pool with 13 TB of data attached. The system locked up whilst
> destroying a 934 GB dedup'd dataset, and I was forced to reboot it. I
> can hear hard drive activity presently - i.e. it's doing *something* -
> but I am really hoping there is a better way :)
>
> Thanks
I think that this is a consequence of 6924390 ("ZFS destroy on de-duped
dataset locks all I/O"):
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6924390


This bug is closed as a duplicate of another bug which is not readable
from the opensolaris site (I'm not clear what makes some bugs readable
and some not).


While trying to reproduce 6924390 (or its equivalent) yesterday, my
system hung as yours did, and when I rebooted, it hung at "Reading ZFS
config".


Someone who knows more about the root cause of this situation (i.e., the
bug named above) might be able to tell you what's going on and how to
recover. It might be that the destroy has resumed and you have to wait
for it to complete, which I think it will, but it might take a long
time.


Lori



Re: [zfs-discuss] Reading ZFS config for an extended period

2010-02-11 Thread Bill Sommerfeld

On 02/11/10 10:33, Lori Alt wrote:

> This bug is closed as a dup of another bug which is not readable from
> the opensolaris site (I'm not clear what makes some bugs readable and
> some not).


the other bug in question was opened yesterday and probably hasn't had 
time to propagate.


- Bill




Re: [zfs-discuss] Reading ZFS config for an extended period

2010-02-11 Thread taemun
Do you think more RAM would help this progress faster? We've just hit
48 hours. No visible progress (although that doesn't really mean much).

It is presently in a system with 8 GB of RAM; I could try to move the
pool across to a system with 20 GB of RAM, if that is likely to
expedite the process. Of course, if it isn't going to make any
difference, I'd rather not restart the process.

Thanks

On 12 February 2010 06:08, Bill Sommerfeld <sommerf...@sun.com> wrote:
> On 02/11/10 10:33, Lori Alt wrote:
>> This bug is closed as a dup of another bug which is not readable from
>> the opensolaris site (I'm not clear what makes some bugs readable and
>> some not).
>
> the other bug in question was opened yesterday and probably hasn't had
> time to propagate.
>
>                                        - Bill


