Re: [zfs-discuss] Deleting large amounts of files

2010-07-21 Thread Hernan Freschi
On Tue, Jul 20, 2010 at 1:40 PM, Ulrich Graef ulrich.gr...@oracle.com wrote:

 When you are writing to a file and dedup is currently enabled, the
 data is entered into the dedup table of the pool.
 (There is one dedup table per pool, not per zfs.)

 Switching off dedup does not change this data.
Yes, I suppose so (just as enabling dedup or compression doesn't alter
existing on-disk data).

 After switching off dedup, the dedup table is used until this file is deleted
 or overwritten.

 Deleting or overwriting then accesses the dedup table and corrects the
 reference count.

Is there a way to see which files are using dedup? Or should I just
copy everything to a new ZFS dataset?
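
For what it's worth, a rough way to check whether a pool still carries deduplicated
blocks is to look at its dedup table (DDT) statistics with zdb; this is only a sketch,
using the pool name "tera" from the messages below rather than anything quoted in the
thread:

  # zdb -DD tera

An empty or absent DDT means no deduplicated blocks remain, so deletes no longer have
to walk the dedup table.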


Re: [zfs-discuss] Deleting large amounts of files

2010-07-19 Thread Hernan Freschi
Hi, thanks for answering,

 How large is your ARC / your main memory?
   Probably too small to hold all metadata (1/1000 of the data amount).
   => metadata has to be read again and again

Main memory is 8GB. ARC (according to arcstat.pl) usually stays at 5-7GB

 A recordsize smaller than 128k increases the problem.

recordsize is default, 128k

 It's a data volume, perhaps raidz or raidz2, and you are using an older ZPOOL
 version?
It's raidz, pool version is 22

   Reading is done for the whole RAID stripe when you are reading a block.

   => the whole raidz stripe has the attributes of a single disk (see Roch's
 blog).

 The number of files is not specified.
some 20 files deleted, each about 4GB in size

 Updating the dedup table needs random access of the table.
Dedup was enabled at some point, but I disabled it long ago. Does it
still matter? Should I copy all these files again (or zfs send) to
un-dedup those blocks?
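
Since dedup only applies at write time, the usual way to get rid of blocks that are
already deduplicated is to rewrite them with dedup off, for example with zfs send/recv.
A minimal sketch, using a hypothetical dataset name tera/data that is not named in the
thread:

  # zfs snapshot tera/data@undedup
  # zfs send tera/data@undedup | zfs recv tera/data.new
  (verify the copy, then:)
  # zfs destroy -r tera/data
  # zfs rename tera/data.new tera/data

The receive writes the blocks fresh, so they bypass the now-disabled dedup code;
freeing the old copies will still have to update the DDT for their references, though.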


 ~60 reads per second is normal for a SATA disk with 7200 RPM.

Shouldn't ~60 reads per second at about 128KB (not counting prefetch) be
about 7MB/s, instead of the 144KB/s (!) I'm getting?
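
(A rough back-of-the-envelope from the iostat numbers quoted below: 63 reads/s x 128KB
would indeed be around 8MB/s, but 144KB/s divided by 63 reads/s works out to only
~2.3KB per read, which is what you would expect if these are mostly small random
metadata/DDT reads rather than full 128KB records.)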


 so far nothing surprising...


 Regards,

    Ulrich



 - Original Message -
 From: drge...@gmail.com
 To: zfs-discuss@opensolaris.org
 Sent: Monday, July 19, 2010 5:14:03 PM GMT +01:00 Amsterdam / Berlin / Bern / 
 Rome / Stockholm / Vienna
 Subject: [zfs-discuss] Deleting large amounts of files

 Hello,
 I think this is the second time this has happened to me. A couple of years ago, I
 deleted a big (500G) zvol and the machine started to hang some 20
 minutes later (out of memory); even rebooting didn't help. But with the great
 support from Victor Latushkin, who helped me debug the problem over a weekend
 (abort the transaction and restart it, which required some black magic
 and recompiling of ZFS), it worked.

 Now I'm facing a similar problem. I was writing about 20GB (over CIFS) to a
 filesystem. While that was going on, I deleted some old files, freeing up about
 60GB in the process. After Windows was done deleting those (it was instant),
 I tried to delete another file, which I didn't have permission to. So I SSHed to
 the machine and removed it manually (pfexec rm file). And that's where the
 problems started.

 First, I noticed the rm wasn't instant. It was taking a long time (over 5 minutes). I
 tried Ctrl-C, Ctrl-Z, another SSH session and kill; nothing worked. After a while it
 died with "killed". I did a zfs list and noticed the free space wasn't
 updated.

 I tried sync; it also hangs. I try a reboot - it won't, I guess because it's
 waiting for the sync to finish. So I hard-reboot the machine. When it comes
 back I can access the ZFS pool again. I go to the directory where I tried to
 delete the files with rm: the files are still there (they weren't before the
 reboot).

 I try a sync again. Same result (hang). top shows a decreasing amount of 
 free memory. zpool iostat 5 shows:

 rpool       69.4G  79.6G      0      0      0      0
 tera        3.12T   513G     63      0   144K      0
 ----------  -----  -----  -----  -----  -----  -----
 rpool       69.4G  79.6G      0      0      0      0
 tera        3.12T   513G     63      0   142K      0
 ----------  -----  -----  -----  -----  -----  -----
 rpool       69.4G  79.6G      0      0      0      0
 tera        3.12T   513G     62      0   142K      0
 ----------  -----  -----  -----  -----  -----  -----
 rpool       69.4G  79.6G      0      0      0      0
 tera        3.12T   513G     64      0   144K      0
 ----------  -----  -----  -----  -----  -----  -----
 rpool       69.4G  79.6G      0      0      0      0
 tera        3.12T   513G     65      0   148K      0

 Could this be related to the fact that I THINK I enabled deduplication on
 this pool a while ago (but then disabled it for performance reasons)?

 What should I do? Do I have to wait for these reads to finish? Why are they 
 so slow anyway?

 Thanks,
 Hernan
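
Regarding the dedup question in the quoted message above: a quick, non-destructive
check is the pool's dedupratio property (a sketch, not something run in the thread);
anything above 1.00x means deduplicated blocks are still referenced:

  # zpool get dedupratio tera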


[zfs-discuss] problems with ludelete

2008-11-04 Thread Hernan Freschi
Hi, I'm not sure if this is the right place to ask. I'm having a little trouble
deleting old Solaris installs:

[EMAIL PROTECTED]:~]# lustatus
Boot Environment           Is       Active Active    Can    Copy
Name                       Complete Now    On Reboot Delete Status
-------------------------- -------- ------ --------- ------ ----------
b90                        yes      no     no        yes    -
snv95                      yes      no     no        yes    -
snv101                     yes      yes    yes       no     -
[EMAIL PROTECTED]:~]# lu
lu          lucancel    lucreate    ludelete    lufslist    lumount
lustatus    luupgrade
luactivate  lucompare   lucurr      ludesc      lumake      lurename
luumount    luxadm
[EMAIL PROTECTED]:~]# lustatus
Boot Environment           Is       Active Active    Can    Copy
Name                       Complete Now    On Reboot Delete Status
-------------------------- -------- ------ --------- ------ ----------
b90                        yes      no     no        yes    -
snv95                      yes      no     no        yes    -
snv101                     yes      yes    yes       no     -
[EMAIL PROTECTED]:~]# ludelete b90
System has findroot enabled GRUB
Checking if last BE on any disk...
ERROR: lulib_umount: failed to umount BE: snv95.
ERROR: This boot environment b90 is the last BE on the above disk.
ERROR: Deleting this BE may make it impossible to boot from this disk.
ERROR: However you may still boot solaris if you have BE(s) on other disks.
ERROR: You *may* have to change boot-device order in the BIOS to accomplish 
this.
ERROR: If you still want to delete this BE b90, please use the force option 
(-f).
Unable to delete boot environment.
[EMAIL PROTECTED]:~]# ludelete snv95
System has findroot enabled GRUB
Checking if last BE on any disk...
ERROR: lulib_umount: failed to umount BE: snv95.
ERROR: This boot environment snv95 is the last BE on the above disk.
ERROR: Deleting this BE may make it impossible to boot from this disk.
ERROR: However you may still boot solaris if you have BE(s) on other disks.
ERROR: You *may* have to change boot-device order in the BIOS to accomplish 
this.
ERROR: If you still want to delete this BE snv95, please use the force option 
(-f).
Unable to delete boot environment.
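
For what it's worth, the error text itself points at the escape hatch; a sketch only
(untested here, and forcing removal of a boot environment deserves care):

[EMAIL PROTECTED]:~]# ludelete -f b90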

If anyone could help me, I'd appreciate it.

Thanks,
Hernan


Re: [zfs-discuss] can anyone help me?

2008-06-03 Thread Hernan Freschi
No, it's a weird situation. I unplugged the disks from the controller (I have them
labeled) before upgrading to snv89. After the upgrade, the controller names
changed.
 
 


Re: [zfs-discuss] can anyone help me?

2008-06-02 Thread Hernan Freschi
Thanks for your answer, 
 after looking at your posts my suggestion would be to
 try the OpenSolaris 2008.05 Live CD and to import
 your pool using the CD. That CD is nv86 + some extra
 fixes.
I upgraded from snv85 to snv89 to see if it helped, but it didn't. I'll try to
download the 2008.05 CD again (the ISO is one of the things trapped in the
pool I can't import).
 
 But an upgrade from Sol10 to NV is untested and
 nothing I would recommend at all. A fresh install of snvXY is
 what I know works.

Didn't know that. I was simply following the N+2 rule, upgrading 10 to 11.
 
 


Re: [zfs-discuss] can anyone help me? [SOLVED]

2008-06-02 Thread Hernan Freschi
Well, I finally managed to solve my issue, thanks to the invaluable help of
Victor Latushkin, whom I can't thank enough.

I'll post a more detailed step-by-step record of what he and I did (well, all 
credit to him actually) to solve this. Actually, the problem is still there 
(destroying a huge zvol or clone is slow and takes a LOT of memory, and will 
die when it runs out of memory), but now I'm able to import my zpool and all is 
there.

What Victor did was hack ZFS (libzfs) to force a rollback to abort the
endless destroy, which was re-triggered every time the zpool was imported, as
it was inconsistent. With this custom version of libzfs, setting an environment
variable makes libzfs bypass the destroy and jump to a rollback, undoing the
last destroy command.

I'll be posting the long version of the story soon.

Hernán
 
 


Re: [zfs-discuss] can anyone help me?

2008-06-01 Thread Hernan Freschi
I'll provide you with the results of these commands soon. But for the record,
Solaris does hang (it dies out of memory; I can't type anything on the console,
etc.). What I can do is boot with -k and get to kmdb when it's hung (BREAK over
the serial line). I have a crashdump I can upload.

I checked the disks with the drive manufacturers' tests and found no errors.
The controller is an NForce4 SATA on-board. zpool version is the latest (10).
The non-default settings were removed; they were only for testing. No other
non-default eeprom settings (other than the serial console options, but those
were added after the problem started).
 
 


Re: [zfs-discuss] can anyone help me?

2008-06-01 Thread Hernan Freschi
Here's the output. Numbers may be a little off because I'm doing a nightly 
build and compressing a crashdump with bzip2 at the same time.

                    extended device statistics
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
    3.7   19.4    0.1    0.3  3.3  0.0  142.7    1.6   1   3 c0d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t1d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.1   12.6   0   0 c5t0d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.1   13.0   0   0 c5t1d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.1   12.6   0   0 c6t0d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.1   13.4   0   0 c6t1d0
                    extended device statistics
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
   25.9   12.0    1.3    0.3  0.0  0.2    0.0    4.4   0  14 c0d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t1d0
   75.2    0.0   75.2    0.0  0.0  1.0    0.1   12.7   0  96 c5t0d0
   68.2    0.0   68.2    0.0  0.0  0.9    0.1   13.1   0  89 c5t1d0
   71.7    0.0   71.7    0.0  0.0  0.9    0.1   13.1   0  94 c6t0d0
   62.8    0.0   62.8    0.0  0.0  0.9    0.1   14.0   0  88 c6t1d0
                    extended device statistics
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
   24.0   16.0    0.6    0.3  0.0  0.0    0.1    0.8   0   3 c0d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t1d0
   65.5    0.0   65.5    0.0  0.0  0.9    0.1   14.2   0  93 c5t0d0
   59.0    0.0   59.0    0.0  0.0  0.9    0.1   14.9   0  88 c5t1d0
   67.5    0.0   67.5    0.0  0.0  0.9    0.1   13.2   0  89 c6t0d0
   66.5    0.0   66.5    0.0  0.0  0.9    0.1   14.0   0  93 c6t1d0
                    extended device statistics
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
   47.0   15.5    0.8    0.2  0.1  0.1    1.9    1.6   3   5 c0d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t1d0
   55.5    0.0   55.5    0.0  0.0  0.8    0.1   14.5   0  80 c5t0d0
   73.0    0.0   73.0    0.0  0.0  1.0    0.1   13.2   0  96 c5t1d0
   72.5    0.0   72.5    0.0  0.0  1.0    0.1   13.3   0  96 c6t0d0
   68.0    0.0   68.0    0.0  0.0  1.0    0.1   14.3   0  97 c6t1d0
                    extended device statistics
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    9.5    0.0    0.2  0.0  0.0    0.0    0.3   0   0 c0d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t1d0
   65.0    0.0   65.0    0.0  0.0  0.9    0.1   14.5   0  94 c5t0d0
   73.5    0.0   73.5    0.0  0.0  0.9    0.1   12.8   0  94 c5t1d0
   75.0    0.0   75.0    0.0  0.0  0.9    0.1   11.8   0  89 c6t0d0
   68.5    0.0   68.5    0.0  0.0  0.9    0.1   13.9   0  95 c6t1d0
 
 


[zfs-discuss] can anyone help me?

2008-05-31 Thread Hernan Freschi
Seriously, can anyone help me? I've been asking for a week. No relevant
answers; just a couple of replies, but none solved my problem or even pointed me
in the right direction, and my posts were bumped down into oblivion.

I don't know how else to ask. My home server has been offline for over a week now
because of a ZFS issue. Please, can anyone help me? I refuse to believe that
"the world's most advanced filesystem" is so fragile that a simple, textbook
administration command can render it useless.
 
 


Re: [zfs-discuss] can anyone help me?

2008-05-31 Thread Hernan Freschi
fwiw, here are my previous posts:

http://www.opensolaris.org/jive/thread.jspa?threadID=61301&tstart=30
http://www.opensolaris.org/jive/thread.jspa?threadID=62120&tstart=0
 
 


[zfs-discuss] Dtracing ZFS/ZIL

2008-05-30 Thread Hernan Freschi
Hello. I'm still having problems with my array. It's been replaying the ZIL (I 
think) for a week now and it hasn't finished. Now I don't know if it will ever 
finish: is it starting from scratch every time?  I'm dtracing the ZIL and this 
is what I get:

  0  46882  dsl_pool_zil_clean:return
  0  46881  dsl_pool_zil_clean:entry
  0  46882  dsl_pool_zil_clean:return
  0  46881  dsl_pool_zil_clean:entry
  0  46882  dsl_pool_zil_clean:return
  0  46881  dsl_pool_zil_clean:entry
  0  46882  dsl_pool_zil_clean:return
  0  46881  dsl_pool_zil_clean:entry
  0  46882  dsl_pool_zil_clean:return
  0  46881  dsl_pool_zil_clean:entry
  0  46882  dsl_pool_zil_clean:return
  0  46881  dsl_pool_zil_clean:entry
  0  46882  dsl_pool_zil_clean:return
  0  46881  dsl_pool_zil_clean:entry
  0  46882  dsl_pool_zil_clean:return
  0  46881  dsl_pool_zil_clean:entry
  0  46882  dsl_pool_zil_clean:return

Does this mean that the ZIL is being updated? Or am I starting all over from
scratch every time it reboots? (Remember that I'm rebooting every 15 minutes,
because otherwise the machine hangs when it runs out of memory.)
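
For reference, output like the above is what a bare fbt probe pair on that function
prints; a guess at the sort of one-liner involved (the actual script wasn't quoted):

  # dtrace -n 'fbt::dsl_pool_zil_clean:entry, fbt::dsl_pool_zil_clean:return'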

Hernan
 
 


Re: [zfs-discuss] help with a BIG problem,

2008-05-25 Thread Hernan Freschi
Hello, thanks for your suggestion. I tried setting zfs_arc_max to 0x30000000
(768MB, out of 3GB). The system ran for almost 45 minutes before it froze.

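For reference (not quoted in the thread), on that vintage of Solaris/Nevada the ARC
cap is normally pinned with a single line in /etc/system, followed by a reboot; a
sketch using the 768MB value mentioned above:

  set zfs:zfs_arc_max = 0x30000000
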
Here's an interesting piece of arcstat.pl output, which I noticed just as it was
passing by:


Time       read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c
15:17:41    152   152    100   152  100     0    0   152  100     2G  805M
15:17:42    139   139    100   139  100     0    0   139  100     2G  805M
State Changed
15:17:43    188   188    100   188  100     0    0   188  100     2G  805M
15:17:44    150   150    100   150  100     0    0   150  100     2G  805M
15:17:45    151   151    100   151  100     0    0   151  100     2G  805M
15:17:46    149   149    100   149  100     0    0   149  100     2G  805M
15:17:47    161   161    100   161  100     0    0   161  100     2G  805M
15:17:48    153   153    100   153  100     0    0   153  100     2G  219M
15:17:49    140   140    100   140  100     0    0   140  100     2G  100M
15:17:50    143   143    100   143  100     0    0   143  100     2G  100M
15:17:51    145   145    100   145  100     0    0   145  100     2G  100M

Notice how it suddenly drops "c" from 805M to 100M in 2 seconds. Also, "arcsz"
is 2G, which is weird because it shouldn't grow beyond 0x30000000 (768M),
right? And it's also weird to get a 100% miss ratio.

Here's top just before it froze:

last pid:  5253;  load avg:  0.47,  0.37,  0.33;  up 0+00:44:53    15:20:14
77 processes: 75 sleeping, 1 running, 1 on cpu
CPU states: 57.5% idle,  1.0% user, 41.6% kernel,  0.0% iowait,  0.0% swap
Memory: 3072M phys mem, 28M free mem, 2055M swap, 1994M free swap

   PID USERNAME LWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
  1248 root       1  59    0 5940K 2736K sleep    0:14  0.82% arcstat.pl
  5206 root       9  59    0   47M 4892K sleep    0:01  0.35% java
   855 root       2  59    0 5076K 1588K sleep    0:09  0.33% apcupsd
  3134 root       1  59    0 5152K 1764K sleep    0:02  0.26% zpool
  1261 root       1  59    0 4104K  588K cpu      0:03  0.22% top
  3125 root       1  59    0 6352K 1536K sleep    0:00  0.06% sshd
  1151 root       1  59    0 6352K 1504K sleep    0:00  0.05% sshd
    62 root       1  59    0 1832K  540K sleep    0:01  0.05% powernowd
   849 root       1  59    0   11M 1100K sleep    0:00  0.05% snmpd
   465 proxy      1  59    0   15M 2196K run      0:00  0.04% squid
   271 daemon     1  59    0 6652K  264K sleep    0:00  0.03% rcapd
  1252 root       1  59    0 6352K 1292K sleep    0:00  0.02% sshd
     7 root      14  59    0   12M 5412K sleep    0:04  0.02% svc.startd
   880 root       1  59    0 6276K 2076K sleep    0:00  0.02% httpd
   847 root       1  59    0 2436K 1148K sleep    0:00  0.02% dhcpagent

and finally, zpool iostat 1:

tera        1.51T   312G    207      0  1.22M      0
tera        1.51T   312G    141      0   854K      0
tera        1.51T   312G     70      0   427K      0
tera        1.51T   312G    204      0  1.20M      0
tera        1.51T   312G    187      0  1.10M      0
tera        1.51T   312G    179      0  1.05M      0
tera        1.51T   312G    120      0   743K      0
tera        1.51T   312G     94      0   580K      0
tera        1.51T   312G     77      0   471K      0
tera        1.51T   312G    115      0   696K      0

Which shows very poor read performance for a 4x SATA2 array (it usually
saturates my gigabit ethernet). And it's not that the kernel is processing that
much data, because the CPU is 57% idle and I THINK powernowd is making it run at
900MHz.

Hernán
 
 


Re: [zfs-discuss] help with a BIG problem,

2008-05-24 Thread Hernan Freschi
No, this is a 64-bit system (Athlon 64) with a 64-bit kernel, of course.
 
 


Re: [zfs-discuss] help with a BIG problem,

2008-05-24 Thread Hernan Freschi
So, I think I've narrowed it down to two things:

* ZFS tries to destroy the dataset every time the pool is imported, because the
last time it didn't finish destroying it
* In this process, ZFS makes the kernel run out of memory and die

So I thought of two options, but I'm not sure if I'm right:

Option 1: Destroy is an atomic operation

If destroy is atomic, then I guess what it's trying to do is look up all the 
blocks that need to be deleted/unlinked/released/freed (not sure which is the 
word). After it has that list, it will write it to the ZIL (remember this is 
just what I suppose, correct me if I'm wrong!) and start to physically delete 
the blocks, until the operation is done and it's finally committed.

If this is the case, then the process will be restarted from scratch every time
the system is rebooted. But I read that, apparently, in previous versions,
rebooting while destroying a clone that is taking too long makes the clone
reappear intact the next time. This, and the fact that zpool iostat shows only reads
and few or no writes, is what led me to think this is how it works.

So if this is the case, I'd like to abort this destroy. After importing the
pool, I will have everything as it was, and maybe I can delete the snapshots before
the clone's parent snapshot; maybe this will speed up the destroy process, or I can
just leave the clone.

Option 2: Destroy is not atomic

By this I don't mean "not atomic" in the sense that a canceled operation would
finish in an incomplete state, but in the sense that if the system is
rebooted, the operation will RESUME from the point where it died.

If this is the case, maybe I can write a script that reboots the computer after a
fixed amount of time, and run it on boot:

zpool import xx &          # the import never returns here, so background it
sleep 20
rm /etc/zfs/zpool.cache    # so the next boot doesn't auto-import the pool
sleep 1800                 # give the destroy ~30 minutes
reboot

This will work under the assumption that the list of blocks to be deleted is
flushed to the ZIL (or somewhere) before the reboot, so the operation can resume
at the same point. This is a very nasty hack, and it would only do the trick in a
very slow fashion: zpool iostat shows 1MB/s of reads while it's doing the destroy.
The dataset in question is 450GB, which means the operation will take about 5
days to finish if it needs to read the whole dataset to destroy it, or about 7 days
if it also needs to go through the other snapshots (600GB total).

So, my only viable option seems to be to abort this. How can I do that?
Disable the ZIL, maybe? Delete the ZIL? Scrub afterwards?
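
For what it's worth, on builds of that era the ZIL could be switched off globally via
the zil_disable tunable (one line in /etc/system, reboot required); whether that would
actually abort a pending destroy is another question, so treat this purely as a sketch
of the knob being referred to:

  set zfs:zil_disable = 1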

Thanks,
Hernán
 
 


[zfs-discuss] help with a BIG problem, can't import my zpool anymore

2008-05-23 Thread Hernan Freschi
Hello, I'm having a big problem here, disastrous maybe. 

I have a zpool consisting of 4x500GB SATA drives. This pool was born on S10U4
and was recently upgraded to snv85 because of iSCSI issues with some initiator.
Last night I was doing housekeeping, deleting old snapshots. One snapshot
failed to delete because it had a dependent clone. So I tried to destroy that
clone. Everything went wrong from there.

The deletion was taking an excessively long time (over 40 minutes). zpool
status hung when I called it; zfs list too. zpool iostat showed disk activity.
Other services not dependent on the pool were running, but the iSCSI this
machine was serving was unbearably slow.

At one point, I lost iSCSI, SSH, web, and all other services. Ping still
worked. So I go to the server and notice that the fans are running at 100%. I
try to get a console (local VGA + keyboard) but the monitor shows no signal. No
disk activity seemed to be happening at that moment. So, I do the standard
procedure (reboot). Solaris boots but stops at "hostname: blah". I see disk
activity from the pool disks, so I let it boot. 30 minutes later, it still hadn't
finished. I thought (correctly) that the system was waiting to mount the ZFS
filesystems before booting, but for some reason it doesn't. I call it a day and let
the machine do its thing.

8 hours later I return. The CPU is cold, the disks are idle and... Solaris sits at
the same "hostname: blah". Time to reboot again, this time in failsafe mode. zpool
import shows that the devices are detected and online. I delete
/etc/zfs/zpool.cache and reboot. Solaris starts normally with all services
running, but of course no ZFS. zpool import shows the available pool, no
errors. I do zpool import -f pool... 20 minutes later I'm still waiting for the
pool to mount. zpool iostat shows activity:

              capacity     operations    bandwidth
pool        used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
tera        1.51T   312G    274      0  1.61M  2.91K
tera        1.51T   312G    308      0  1.82M      0
tera        1.51T   312G    392      0  2.31M      0
tera        1.51T   312G    468      0  2.75M      0

but the mountpoint /tera is still not populated (and zpool import still doesn't 
exit).

zpool status shows:

  pool: tera
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tera        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c1d0    ONLINE       0     0     0
            c2d0    ONLINE       0     0     0
            c3d0    ONLINE       0     0     0
            c4d0    ONLINE       0     0     0

errors: No known data errors

What's going on? Why is it taking so long to import?

Thanks in advance,
Hernan
 
 


Re: [zfs-discuss] help with a BIG problem, can't import my zpool anymore

2008-05-23 Thread Hernan Freschi
I got more info. I can run zpool history and this is what I get:

2008-05-23.00:29:40 zfs destroy tera/[EMAIL PROTECTED]
2008-05-23.00:29:47 [internal destroy_begin_sync txg:3890809] dataset = 152
2008-05-23.01:28:38 [internal destroy_begin_sync txg:3891101] dataset = 152
2008-05-23.07:01:36 zpool import -f tera
2008-05-23.07:01:40 [internal destroy_begin_sync txg:3891106] dataset = 152
2008-05-23.10:52:56 zpool import -f tera
2008-05-23.10:52:58 [internal destroy_begin_sync txg:3891112] dataset = 152
2008-05-23.12:17:49 [internal destroy_begin_sync txg:3891114] dataset = 152
2008-05-23.12:27:48 zpool import -f tera
2008-05-23.12:27:50 [internal destroy_begin_sync txg:3891120] dataset = 152
2008-05-23.13:03:07 [internal destroy_begin_sync txg:3891122] dataset = 152
2008-05-23.13:56:52 zpool import -f tera
2008-05-23.13:56:54 [internal destroy_begin_sync txg:3891128] dataset = 152

Apparently, it starts destroying dataset #152, which is the parent snapshot of
the clone I issued the destroy command for. I'm not sure how it works, but I ordered
the deletion of the CLONE, not the snapshot (which I was going to destroy
anyway).

The question is still: why does it hang the machine? Why can't I access the
filesystems? Isn't it supposed to import the zpool, mount the filesystems, and then
do the destroy in the background?
 
 


Re: [zfs-discuss] help with a BIG problem, can't import my zpool anymore

2008-05-23 Thread Hernan Freschi
I let it run for about 4 hours. When I returned, still the same: I can ping the
machine, but I can't SSH to it or use the console. Please, I need urgent help
with this issue!
 
 


Re: [zfs-discuss] help with a BIG problem, can't import my zpool anymore

2008-05-23 Thread Hernan Freschi
I let it run while watching top, and this is what I got just before it hung.
Look at the free memory. Is this memory allocated to the kernel? Can I allow the
kernel to swap?

last pid:  7126;  load avg:  3.36,  1.78,  1.11;  up 0+01:01:11    21:16:49
88 processes: 78 sleeping, 9 running, 1 on cpu
CPU states: 22.4% idle,  0.4% user, 77.2% kernel,  0.0% iowait,  0.0% swap
Memory: 3072M phys mem, 31M free mem, 2055M swap, 1993M free swap

   PID USERNAME LWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
  7126 root       9  58    0   45M 4188K run      0:00  0.71% java
  4821 root       1  59    0 5124K 1724K run      0:03  0.46% zfs
  5096 root       1  59    0 5124K 1724K run      0:03  0.45% zfs
  2470 root       1  59    0 4956K 1660K sleep    0:06  0.45% zfs
 
 


Re: [zfs-discuss] help with a BIG problem,

2008-05-23 Thread Hernan Freschi
I forgot to post arcstat.pl's output:

Time       read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c
22:32:37   556K  525K     94  515K   94    9K   98  515K   97     1G    1G
22:32:38     63    63    100    63  100     0    0    63  100     1G    1G
22:32:39     74    74    100    74  100     0    0    74  100     1G    1G
22:32:40     76    76    100    76  100     0    0    76  100     1G    1G
State Changed
22:32:41     75    75    100    75  100     0    0    75  100     1G    1G
22:32:42     77    77    100    77  100     0    0    77  100     1G    1G
22:32:43     72    72    100    72  100     0    0    72  100     1G    1G
22:32:44     80    80    100    80  100     0    0    80  100     1G    1G
State Changed
22:32:45     98    98    100    98  100     0    0    98  100     1G    1G

Sometimes "c" is 2G.

I tried the mkfile and swap, but I get:
[EMAIL PROTECTED]:/]# mkfile -n 4g /export/swap
[EMAIL PROTECTED]:/]# swap -a /export/swap
/export/swap may contain holes - can't swap on it.

/export is the only place where I have enough free space. I could add another 
drive if needed.
 
 


Re: [zfs-discuss] help with a BIG problem,

2008-05-23 Thread Hernan Freschi
Oops, replied too fast.
I ran it without -n, and the swap space was added successfully... but it didn't help:
it died out of memory again.
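
For reference, the sequence that ended up adding the swap space, per the two messages
above: mkfile without -n allocates the blocks up front, so swap(1M) will accept the
file.

[EMAIL PROTECTED]:/]# mkfile 4g /export/swap
[EMAIL PROTECTED]:/]# swap -a /export/swap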
 
 