[zfs-discuss] ZFS on Linux vs FreeBSD

2012-04-25 Thread Paul Archer
This may fall into the realm of a religious war (I hope not!), but recently 
several people on this list have said/implied that ZFS was only acceptable for 
production use on FreeBSD (or Solaris, of course) rather than Linux with ZoL.


I'm working on a project at work involving a large(-ish) amount of data, about 
5TB, working its way up to 12-15TB eventually, spread among a dozen or so 
nodes. There may or may not be a clustered filesystem involved (probably 
gluster if we use anything). I've been looking at ZoL as the primary 
filesystem for this data. We're a Linux shop, so I'd rather not switch to 
FreeBSD, or any of the Solaris-derived distros--although I have no problem 
with them, I just don't want to introduce another OS into the mix if I can 
avoid it.


So, the actual questions are:

Is ZoL really not ready for production use?

If not, what is holding it back? Features? Performance? Stability?

If not, then what kind of timeframe are we looking at to get past whatever is 
holding it back?



Re: [zfs-discuss] ZFS on Linux vs FreeBSD

2012-04-25 Thread Paul Archer

9:59am, Richard Elling wrote:


On Apr 25, 2012, at 5:48 AM, Paul Archer wrote:

  This may fall into the realm of a religious war (I hope not!), but 
recently several people on this list have
  said/implied that ZFS was only acceptable for production use on FreeBSD 
(or Solaris, of course) rather than Linux
  with ZoL.

  I'm working on a project at work involving a large(-ish) amount of data, 
about 5TB, working its way up to 12-15TB


This is pretty small by today's standards.  With 4TB disks, that is only 3-4 
disks + redundancy.

True. At my last job, we were used to researchers asking for individual 4-5TB 
filesystems, and 1-2TB increases in size. When I left, there was over 100TB 
online (in '07).




  eventually, spread among a dozen or so nodes. There may or may not be a 
clustered filesystem involved (probably
  gluster if we use anything).


I wouldn't dream of building a clustered file system that small. Maybe when you 
get into the
multiple-PB range, then it might make sense.

The point of a clustered filesystem was to be able to spread our data out 
among all nodes and still have access from any node without having to run NFS. 
Size of the data set (once you get past the point where you can replicate it 
on each node) is irrelevant.





  I've been looking at ZoL as the primary filesystem for this data. We're a 
Linux shop, so I'd rather not switch to
  FreeBSD, or any of the Solaris-derived distros--although I have no 
problem with them, I just don't want to
  introduce another OS into the mix if I can avoid it.

  So, the actual questions are:

  Is ZoL really not ready for production use?

  If not, what is holding it back? Features? Performance? Stability?


The computer science behind ZFS is sound. But it was also developed for Solaris, which 
is quite different from Linux under the covers. So the Linux and other OS ports have issues 
around virtual memory system differences and fault management differences. This is the 
classic case of "getting it to work" being 20% of the effort, and "getting it to work when 
all else is failing" being the other 80%.
 -- richard


I understand the 80/20 rule. But this doesn't really answer the question(s). 
If there weren't any major differences among operating systems, the project 
probably would have been done long ago.


To put it slightly differently, if I used ZoL in production, would I be likely 
to experience performance or stability problems? Or would it be lacking in 
features that I would likely need?


[zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)

2012-04-25 Thread Paul Archer




11:26am, Richard Elling wrote:


On Apr 25, 2012, at 10:59 AM, Paul Archer wrote:

  The point of a clustered filesystem was to be able to spread our data out 
among all nodes and still have access
  from any node without having to run NFS. Size of the data set (once you 
get past the point where you can replicate
  it on each node) is irrelevant.


Interesting, something more complex than NFS to avoid the complexities of NFS? 
;-)

We have data coming in on multiple nodes (with local storage) that is needed 
on other multiple nodes. The only way to do that with NFS would be with a 
matrix of cross mounts that would be truly scary.
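
(To make that concrete, here is roughly what the two approaches look like from one 
node. Hostnames, paths, and the gluster volume name are made up, and the exact mount 
syntax should be checked against your distro; this is only a sketch.)

# NFS cross-mounts: every node mounts every other node's local storage
node1# mount -t nfs node2:/export/data /data/node2
node1# mount -t nfs node3:/export/data /data/node3
# ...and so on for each of the other nodes, repeated on all of them

# clustered FS (e.g. gluster): every node mounts one shared namespace
node1# mount -t glusterfs node1:/datavol /data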



Re: [zfs-discuss] ZFS on Linux vs FreeBSD

2012-04-25 Thread Paul Archer

To put it slightly differently, if I used ZoL in production, would I be likely 
to experience performance or stability
problems?

I saw one team revert from ZoL (CentOS 6) back to ext on some backup servers 
for an application project. The killer was stat times (find running slow, etc.); 
perhaps more level-2 cache (L2ARC) could have solved the problem, but it was 
easier to deploy ext/lvm2.


Hmm... I've got 1.4TB in about 70K files in 2K directories, and a simple find 
on a cold FS took me about 6 seconds:


root@hoard22:/hpool/12/db# time find . -type d | wc
    2082    2082   32912

real    0m5.923s
user    0m0.052s
sys     0m1.012s


So I'd say I'm doing OK there. But I've got 10K RPM disks and a fast SSD for 
caching.
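
(If anyone wants to check whether a workload like that is actually being served out 
of ARC/L2ARC, the arcstats kstats are the quickest sanity check. A rough sketch only; 
counter names can differ slightly between releases.)

# Solaris/illumos:
kstat -m zfs -n arcstats | egrep 'hits|misses'
# Linux/ZoL:
egrep 'hits|misses' /proc/spl/kstat/zfs/arcstats
# comparing l2_hits vs l2_misses shows how much the SSD cache is actually contributing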


Re: [zfs-discuss] ZFS on Linux vs FreeBSD

2012-04-25 Thread Paul Archer

9:08pm, Stefan Ring wrote:


Sorry for not being able to contribute any ZoL experience. I've been
pondering whether it's worth trying for a few months myself already.
Last time I checked, it didn't support the .zfs directory (for
snapshot access), which you really don't want to miss after getting
used to it.

Actually, rc8 (or was it rc7?) introduced the .zfs directory. If you're 
upgrading, you need to reboot, but other than that, it works perfectly.
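
(For reference, snapshot access through that directory looks like this; the dataset 
mountpoint and snapshot name here are made up.)

ls /tank/home/.zfs/snapshot                             # list the snapshots of a dataset
cp /tank/home/.zfs/snapshot/daily-20120424/somefile .   # pull a single file back out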



Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)

2012-04-25 Thread Paul Archer

2:20pm, Richard Elling wrote:


On Apr 25, 2012, at 12:04 PM, Paul Archer wrote:

Interesting, something more complex than NFS to avoid the 
complexities of NFS? ;-)

  We have data coming in on multiple nodes (with local storage) that is 
needed on other multiple nodes. The only way
  to do that with NFS would be with a matrix of cross mounts that would be 
truly scary.


Ignoring lame NFS clients, how is that architecture different than what you 
would have 
with any other distributed file system? If all nodes share data to all other 
nodes, then...?
 -- richard



Simple. With a distributed FS, all nodes mount from a single DFS. With NFS, 
each node would have to mount from every other node. With 16 nodes, that's 
what, 240 mounts (16 x 15)? Not to mention your data is in 16 different 
mounts/directory structures, instead of being in a unified filespace.


Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)

2012-04-25 Thread Paul Archer

2:34pm, Rich Teer wrote:


On Wed, 25 Apr 2012, Paul Archer wrote:


Simple. With a distributed FS, all nodes mount from a single DFS. With NFS,
each node would have to mount from each other node. With 16 nodes, that's
what, 240 mounts? Not to mention your data is in 16 different mounts/directory
structures, instead of being in a unified filespace.


Perhaps I'm being overly simplistic, but in this scenario, what would prevent
one from having, on a single file server, /exports/nodes/node[0-15], and then
having each node NFS-mount /exports/nodes from the server?  Much simpler than
your example, and all data is available on all machines/nodes.
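
(In concrete terms, that suggestion amounts to something like the following; 'filer' 
and the paths are placeholders, and the Solaris-style share/mount syntax is just one 
way to express it.)

# on the single file server
share -F nfs -o rw /exports/nodes
# on each node
mount -F nfs filer:/exports/nodes /nodes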



That assumes the data set will fit on one machine, and that machine won't be a 
performance bottleneck.



Re: [zfs-discuss] cluster vs nfs

2012-04-25 Thread Paul Archer

Tomorrow, Ian Collins wrote:


On 04/26/12 10:34 AM, Paul Archer wrote:
That assumes the data set will fit on one machine, and that machine won't be a
performance bottleneck.


Aren't those general considerations when specifying a file server?

I suppose. But I meant specifically that our data will not fit on one single 
machine, and we are relying on spreading it across more nodes to get it on 
more spindles as well.



Re: [zfs-discuss] dedup causing problems with NFS? (was Re: snapshots taking too much space)

2010-04-15 Thread Paul Archer

3:08pm, Daniel Carosone wrote:


On Wed, Apr 14, 2010 at 08:48:42AM -0500, Paul Archer wrote:

So I turned deduplication on on my staging FS (the one that gets mounted
on the database servers) yesterday, and since then I've been seeing the
mount hang for short periods of time off and on. (It lights nagios up
like a Christmas tree 'cause the disk checks hang and timeout.)


Does it have enough (really, lots) of memory?  Do you have an l2arc
cache device attached (as well)?

Dedup has a significant memory requirement, or it has to go to disk
for lots of DDT entries.  While it's doing that, NFS requests can time
out.  Lengthening the timeouts on the client (for the fs mounted as a
backup destination) might help you around the edges of the problem.

As a related issue, are your staging (export) and backup filesystems
in the same pool?  If they are, moving from staging to final will
involve another round of updating lots of DDT entries.

What might be worthwhile trying:
- turning dedup *off* on the staging filesystem, so NFS isn't waiting
  for it, and then deduping later as you move to the backup area at
  leisure (effectively, asynchronously to the nfs writes).
- or, perhaps eliminating this double work by writing directly to the
  main backup fs.



Thanks for the info.

FWIW, I have turned off dedup on the staging filesystem, but the dedup'ed data 
is still there, so it's a bit late now.
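
(For the archive, the commands involved are roughly these; the staging dataset name 
is a placeholder. Note that dedup=off only affects blocks written afterwards, which 
is exactly why the existing data is still deduped.)

zfs set dedup=off bpool/staging      # stop deduplicating new writes
zfs get dedup bpool/staging          # confirm the property
zpool get dedupratio bpool           # ratio achieved on what's already written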


The reason I can't write directly to the main backup FS is that the backup 
process (RMAN run by my Oracle DBA) writes new files in place, and so my 
snapshots were taking up 500GB each, vs the 50GB I get if I use rsync instead.


I had dedup turned on on the staging FS so that I could compare snapshots of 
it (with dedup) against the final FS (without dedup, but populated via rsync) 
to see which gives the better space savings. I guess I'll have to wait until 
I can get some more RAM on the box.


Paul


Re: [zfs-discuss] dedup causing problems with NFS? (was Re: snapshots taking too much space)

2010-04-15 Thread Paul Archer

Yesterday, Erik Trimble wrote:


Daniel Carosone wrote:

On Wed, Apr 14, 2010 at 08:48:42AM -0500, Paul Archer wrote:

So I turned deduplication on on my staging FS (the one that gets mounted 
on the database servers) yesterday, and since then I've been seeing the 
mount hang for short periods of time off and on. (It lights nagios up like 
a Christmas tree 'cause the disk checks hang and timeout.)




Does it have enough (really, lots) of memory?  Do you have an l2arc
cache device attached (as well)?

The OP said he had 8GB of RAM, and I suspect that a cheap SSD in the 40-60GB 
range for L2ARC would actually be the best choice to speed things up in the 
future, rather than add another 8GB of RAM.


I think I'm going to try both. Easier to get one request for upgrades approved 
than get a second one approved if the first one doesn't cut it.



Paul


Re: [zfs-discuss] dedup screwing up snapshot deletion

2010-04-15 Thread Paul Archer

3:26pm, Daniel Carosone wrote:


On Wed, Apr 14, 2010 at 09:04:50PM -0500, Paul Archer wrote:

I realize that I did things in the wrong order. I should have removed the
oldest snapshot first, on to the newest, and then removed the data in the
FS itself.


For the problem in question, this is irrelevant.  As discussed in the
rest of the thread, you'll hit this when doing anything that requires
updating the ref counts on a large number of DDT entries.

The only way snapshot order can really make a big difference is if you
arrange for it to do so in advance.  If you know you have a large
amount of data to delete from a filesystem:
- snapshot at the start
- start deleting
- snapshot fast and frequently during the deletion
- let the snapshots go, later, at a controlled pace, to limit the
  rate of actual block frees.

That's a great idea. I wish I had thought of/heard of it before I deleted the 
data in my dedup'ed FS.
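
(A minimal sketch of that approach for later readers; pool, filesystem, and snapshot 
names are hypothetical.)

zfs snapshot tank/data@pre-delete              # pin everything before the delete starts
rm -rf /tank/data/old-stuff &                  # kick off the bulk delete
while kill -0 $! 2>/dev/null; do               # snapshot frequently while it runs
    zfs snapshot tank/data@del-$(date +%Y%m%d-%H%M%S)
    sleep 300
done
# later, destroy the snapshots one at a time, oldest first, pausing between them,
# so the actual block frees (and DDT updates) happen at a controlled pace:
zfs destroy tank/data@pre-delete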


Paul


[zfs-discuss] dedup causing problems with NFS? (was Re: snapshots taking too much space)

2010-04-14 Thread Paul Archer
So I turned deduplication on on my staging FS (the one that gets mounted on 
the database servers) yesterday, and since then I've been seeing the mount 
hang for short periods of time off and on. (It lights nagios up like a 
Christmas tree 'cause the disk checks hang and timeout.)


I haven't turned dedup off again yet, because I'd like to figure out how to 
get past this problem.


Can anyone give me an idea of why the mounts might be hanging, or where to 
look for clues? And has anyone had this problem with dedup and NFS before? 
FWIW, the clients are a mix of Solaris and Linux.


Paul




Yesterday, Paul Archer wrote:


Yesterday, Arne Jansen wrote:


Paul Archer wrote:


Because it's easier to change what I'm doing than what my DBA does, I
decided that I would put rsync back in place, but locally. So I changed
things so that the backups go to a staging FS, and then are rsync'ed
over to another FS that I take snapshots on. The only problem is that
the snapshots are still in the 500GB range.

So, I need to figure out why these snapshots are taking so much more
room than they were before.

This, BTW, is the rsync command I'm using (and essentially the same
command I was using when I was rsync'ing from the NetApp):

rsync -aPH --inplace --delete /staging/oracle_backup/
/backups/oracle_backup/


Try adding --no-whole-file to rsync. rsync disables block-by-block
comparison if used locally by default.



Thanks for the tip. I didn't realize rsync had that behavior. It looks like 
that got my snapshots back to the 50GB range. I'm going to try dedup on the 
staging FS as well, so I can do a side-by-side of which gives me the better 
space savings.
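
(For anyone finding this in the archives, the adjusted command ends up looking 
something like this, with the same paths as above:)

rsync -aPH --inplace --no-whole-file --delete /staging/oracle_backup/ /backups/oracle_backup/
# --no-whole-file forces rsync's delta algorithm even for local copies, so only
# the changed blocks get rewritten and the snapshots stay small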


Paul




[zfs-discuss] dedup screwing up snapshot deletion

2010-04-14 Thread Paul Archer
I have an approx 700GB (of data) FS that I had dedup turned on for. (See 
previous posts.) I turned on dedup after the FS was populated, and was not 
sure dedup was working. I had another copy of the data, so I removed the data, 
and then tried to destroy the snapshots I had taken. The first two didn't take 
too long, but the last one (the oldest) has taken literally hours now. I've 
rebooted and tried starting over, but it hasn't made a difference.
I realize that I did things in the wrong order. I should have removed the 
oldest snapshot first, on to the newest, and then removed the data in the FS 
itself. But still, it shouldn't take hours, should it?


I made sure the machine was otherwise idle, and did an 'iostat', which shows 
about 5KB/sec reads and virtually no writes to the pool. Any ideas where to 
look? I'd just remove the FS entirely at this point, but I'd have to destroy 
the snapshot first, so I'm in the same boat, yes?


TIA,


Paul


Re: [zfs-discuss] dedup screwing up snapshot deletion

2010-04-14 Thread Paul Archer

7:51pm, Richard Jahnel wrote:


This sounds like the known issue about the dedupe map not fitting in RAM.

When blocks are freed, dedupe scans the whole map to ensure each block is not 
in use before releasing it. This takes a veeery long time if the map doesn't 
fit in RAM.

If you can, try adding more RAM to the system.
--


Thanks for the info. Unfortunately, I'm not sure I'll be able to add more RAM 
any time soon. But I'm certainly going to try, as this is the primary backup 
server for our Oracle databases.


Thanks again,

Paul

PS It's got 8GB right now. You think doubling that to 16GB would cut it? Is 
there a way to see how big the map is, anyway?
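
(For the archive: zdb can give a rough idea of the table size, though the exact output 
varies by build, and the per-entry figure below is only a rule of thumb.)

zdb -D bpool      # summary: number of DDT entries plus on-disk and in-core sizes
zdb -DD bpool     # adds a histogram broken down by reference count
# rough sizing: entries x a few hundred bytes each needs to fit in ARC/L2ARC,
# or frees and snapshot destroys will crawl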



Re: [zfs-discuss] snapshots taking too much space

2010-04-13 Thread Paul Archer

Yesterday, Arne Jansen wrote:


Paul Archer wrote:


Because it's easier to change what I'm doing than what my DBA does, I
decided that I would put rsync back in place, but locally. So I changed
things so that the backups go to a staging FS, and then are rsync'ed
over to another FS that I take snapshots on. The only problem is that
the snapshots are still in the 500GB range.

So, I need to figure out why these snapshots are taking so much more
room than they were before.

This, BTW, is the rsync command I'm using (and essentially the same
command I was using when I was rsync'ing from the NetApp):

rsync -aPH --inplace --delete /staging/oracle_backup/
/backups/oracle_backup/


Try adding --no-whole-file to rsync. rsync disables block-by-block
comparison if used locally by default.



Thanks for the tip. I didn't realize rsync had that behavior. It looks like 
that got my snapshots back to the 50GB range. I'm going to try dedup on the 
staging FS as well, so I can do a side-by-side of which gives me the better 
space savings.


Paul


[zfs-discuss] snapshots taking too much space

2010-04-12 Thread Paul Archer

I've got a bit of a strange problem with snapshot sizes. First, some
background:

For ages our DBA backed up all the company databases to a directory NFS
mounted from a NetApp filer. That directory would then get dumped to tape.

About a year ago, I built an OpenSolaris (technically Nexenta) machine with 24
x 1.5TB drives, for about 24TB of usable space. I am using this to backup OS
images using backuppc.

I was also backing up the DBA's backup volume from the NetApp to the (ZFS)
backup server. This is a combination of rsync + snapshots. The snapshots were
using about 50GB/day. The backup volume is about 600GB total, so this
wasn't bad, especially on a box with 24TB of space available.

I decided to cut out the middleman, and save some of that expensive NetApp
disk space, by having the DBA backup directly to the backup server. I
repointed the NFS mounts on our DB servers to point to the backup server
instead of the NetApp. Then I ran a simple cron job to snapshot that ZFS
filesystem daily.

My problem is that the snapshots started taking around 500GB instead of 50GB.
After a bit of thinking, I realized that the backup system my DBA was using
must have been writing new files and moving them into place, or possibly 
writing out a whole new file even if only part of it changed.
I think this is the problem because ZFS never overwrites blocks in place; it 
allocates new blocks instead, so every rewritten block stays referenced by the 
previous snapshot. rsync, on the other hand, does a byte-by-byte comparison 
and only updates the blocks that have changed.


Because it's easier to change what I'm doing than what my DBA does, I decided 
that I would put rsync back in place, but locally. So I changed things so that 
the backups go to a staging FS, and then are rsync'ed over to another FS that 
I take snapshots on. The only problem is that the snapshots are still in the 
500GB range.


So, I need to figure out why these snapshots are taking so much more room than 
they were before.


This, BTW, is the rsync command I'm using (and essentially the same command I 
was using when I was rsync'ing from the NetApp):


rsync -aPH --inplace --delete /staging/oracle_backup/ /backups/oracle_backup/



This is the old system (rsync'ing from a NetApp and taking snapshots):
zfs list -t snapshot -r bpool/snapback
NAME                             USED  AVAIL  REFER  MOUNTPOINT
...
bpool/snapback@20100310-182713  53.7G      -   868G  -
bpool/snapback@20100312-000318  59.8G      -   860G  -
bpool/snapback@20100312-182552  54.0G      -   840G  -
bpool/snapback@20100313-184834  71.7G      -   884G  -
bpool/snapback@20100314-123024  17.5G      -   832G  -
bpool/snapback@20100315-173609  72.6G      -   891G  -
bpool/snapback@20100316-165527  24.3G      -   851G  -
bpool/snapback@20100317-171304  56.2G      -   884G  -
bpool/snapback@20100318-170250  50.9G      -   865G  -
bpool/snapback@20100319-181131  53.9G      -   874G  -
bpool/snapback@20100320-183617  80.8G      -   902G  -
...



This is from the new system (backing up directly to one volume, rsync'ing to 
and snapshotting another one):


r...@backup02:~# zfs list -t snapshot -r bpool/backups/oracle_backup
NAME                                          USED  AVAIL  REFER  MOUNTPOINT
bpool/backups/oracle_backup@20100411-023130   479G      -   681G  -
bpool/backups/oracle_backup@20100411-104428   515G      -   721G  -
bpool/backups/oracle_backup@20100412-144700      0      -   734G  -


Thanks for any help,

Paul


[zfs-discuss] zfs inotify?

2009-10-25 Thread Paul Archer

OK, so this may be a little off-topic, but here goes:
The reason I switched to OpenSolaris was primarily to take advantage of ZFS's 
features when storing my digital imaging collection.


I switched from a pretty stock Linux setup, but it left me at one 
disadvantage. I had been using inotify under Linux to trigger a series of Ruby 
scripts that would do all the basic ingestion/setup for me (renaming files, 
converting to DNG, adding bulk metadata). The scripts will run under 
OpenSolaris, except for the inotify part.


Question: Is there a facility similar to inotify that I can use to monitor a 
directory structure in OpenSolaris/ZFS, such that it will block until a file 
is modified (added, deleted, etc), and then pass the state along (STDOUT is 
fine)? One other requirement: inotify can handle subdirectories being added on 
the fly. So if you use it to monitor, for example, /data/images/incoming, and 
a /data/images/incoming/100canon directory gets created, then the files under 
that directory will automatically be monitored as well.


Thanks,

Paul Archer


Re: [zfs-discuss] zfs inotify?

2009-10-25 Thread Paul Archer

5:12pm, Cyril Plisko wrote:


Question: Is there a facility similar to inotify that I can use to monitor a
directory structure in OpenSolaris/ZFS, such that it will block until a file
is modified (added, deleted, etc), and then pass the state along (STDOUT is
fine)? One other requirement: inotify can handle subdirectories being added
on the fly. So if you use it to monitor, for example, /data/images/incoming,
and a /data/images/incoming/100canon directory gets created, then the files
under that directory will automatically be monitored as well.



while there is no inotify for Solaris, there are similar technologies available.

Check port_create(3C) and gam_server(1)

I can't find much on gam_server on Solaris (couldn't find too much on it at 
all, really), and port_create is apparently a system call. (I'm not a 
developer--if I can't write it in BASH, Perl, or Ruby, I can't write it.)

I appreciate the suggestions, but I need something a little more prêt-à-porter.

Does anyone have any dtrace experience? I figure this could probably be done 
with dtrace, but I don't know enough about it to write a dtrace script 
(although I may learn if that turns out to be the best way to go). I was 
hoping that there'd be a script out there already, but I haven't turned up 
anything yet.
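
(In case it helps someone later, here's a very crude starting point using the dtrace 
syscall provider. It prints every open() that has O_CREAT set, system-wide, and leaves 
filtering by path prefix to a wrapper script; 0x100 is Solaris's O_CREAT value. Treat 
it as a sketch, not a finished tool.)

dtrace -qn 'syscall::open*:entry /arg1 & 0x100/ { printf("%s %s\n", execname, copyinstr(arg0)); }'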


Paul


[zfs-discuss] dedup video

2009-10-13 Thread Paul Archer
Someone posted this link: https://slx.sun.com/1179275620 for a video on ZFS 
deduplication. But the site isn't responding (which is typical of Sun, since 
I've been dealing with them for the last 12 years).

Does anyone know of a mirror site, or if the video is on YouTube?

Paul


Re: [zfs-discuss] Comments on home OpenSolaris/ZFS server

2009-09-29 Thread Paul Archer

9:51am, Ware Adams wrote:


On Sep 29, 2009, at 9:32 AM, p...@paularcher.org wrote:


I am using an SC846xxx for a project here at work.
The hardware consists of an ASUS server-level motherboard with 2 quad-core
Xeons, 8GB of RAM, an LSI PCI-e SAS/SATA card, and 24 1.5TB HD, all in one
of these cases.
The drives are in one pool with 3x 7+1 raid-z sets. Raw is 32TB, usable is
about 24TB. Total price was about $6000. (It'd be about $800 less now that
1.5TB drives have dropped in price.)


If I can go with something like this it's going to be the easiest way to get 
lots of drives.  Do you have this outside of a server room?  Would the noise 
be manageable if say it were mounted in an enclosed rack with sound 
deadening?




It's in a server room, but I had it here in the office while I was putting 
it together. The case really isn't too loud. 24 hard drives make a fair 
bit of noise--but I think if you had it in a closet with some 
soundproofing, it wouldn't be bad. And if you went with a smaller 
enclosure (12 drives, for instance) that would help.


Paul



Re: [zfs-discuss] Comments on home OpenSolaris/ZFS server

2009-09-29 Thread Paul Archer

You don't like http://www.supermicro.com/products/nfo/chassis_storage.cfm ?
I must admit I don't have a price list of these.



I am using an SC846xxx for a project here at work.
The hardware consists of an ASUS server-level motherboard with 2 quad-core
Xeons, 8GB of RAM, an LSI PCI-e SAS/SATA card, and 24 1.5TB HD, all in one
of these cases.
The drives are in one pool with 3x 7+1 raid-z sets. Raw is 32TB, usable is
about 24TB. Total price was about $6000. (It'd be about $800 less now that
1.5TB drives have dropped in price.)

I built it for disk to disk backups. Right now, I'm using backuppc for
backing up the OS'es of our DB servers and such, and rsync and snapshots
for the databases themselves.
I get about 50MB/sec read and write speeds, but I think that's because the
version of the SC846 I got has a single backplane for the SAS/SATA drives,
and one connector to the LSI card. Of course, for what I'm doing, that's
fine.

Paul

Oh, I think the SC846 I got was about $1100.
http://www.cdw.com/shop/search/results.aspx?key=sc846&searchscope=All&sr=1&Find+it.x=0&Find+it.y=0



One thing I forgot to mention: there is a wart with this case. The 
connectors for the low-profile CDROM drive are too short, and the power 
connector for the internal drive hits the lid of the case. I actually had 
to find a low-profile molex power connector for the hard drive, and I can 
only use the CDROM drive if I open the case up and loosen the internal 
hard drive so I can plug the CDROM in. Otherwise, though, the case is very 
well built.


Paul


Re: [zfs-discuss] extremely slow writes (with good reads)

2009-09-28 Thread Paul Archer

Yesterday, Paul Archer wrote:



I estimate another 10-15 hours before this disk is finished resilvering and 
the zpool is OK again. At that time, I'm going to switch some hardware out 
(I've got a newer and higher-end LSI card that I hadn't used before because 
it's PCI-X, and won't fit on my current motherboard.)
I'll report back what I get with it tomorrow or the next day, depending on 
the timing on the resilver.


Paul Archer


And the hits just keep coming...
The resilver finished last night, so I rebooted the box as I had just 
upgraded to the latest Dev build. Not only did the upgrade fail (love that 
instant rollback!), but now the zpool won't come online:


r...@shebop:~# zpool import
  pool: datapool
id: 3410059226836265661
 state: UNAVAIL
status: The pool is formatted using an older on-disk version.
action: The pool cannot be imported due to damaged devices or data.
config:

datapool UNAVAIL  insufficient replicas
  raidz1 UNAVAIL  corrupted data
c7d0 ONLINE
c8d0s0   ONLINE
c9d0s0   ONLINE
c11d0s0  ONLINE
c10d0s0  ONLINE


I've tried renaming /etc/zfs/zpool.cache and rebooting, but no joy.
Is it OK to scream and tear my hair out now?

Paul

PS I don't suppose there's an RFE out there for giving useful data when a 
pool is unavailable. Or even better, allowing a pool to be imported (but no 
filesystems mounted) so it *can be fixed*.



Re: [zfs-discuss] extremely slow writes (with good reads)

2009-09-28 Thread Paul Archer

8:30am, Paul Archer wrote:


And the hits just keep coming...
The resilver finished last night, so I rebooted the box as I had just upgraded 
to the latest Dev build. Not only did the upgrade fail (love that instant 
rollback!), but now the zpool won't come online:


r...@shebop:~# zpool import
 pool: datapool
   id: 3410059226836265661
state: UNAVAIL
status: The pool is formatted using an older on-disk version.
action: The pool cannot be imported due to damaged devices or data.
config:

   datapool UNAVAIL  insufficient replicas
 raidz1 UNAVAIL  corrupted data
   c7d0 ONLINE
   c8d0s0   ONLINE
   c9d0s0   ONLINE
   c11d0s0  ONLINE
   c10d0s0  ONLINE


I've tried renaming /etc/zfs/zpool.cache and rebooting, but no joy.
Is it OK to scream and tear my hair out now?



A little more research came up with this:

r...@shebop:~# zdb -l /dev/dsk/c7d0

LABEL 0

failed to unpack label 0

LABEL 1

failed to unpack label 1

LABEL 2

failed to unpack label 2

LABEL 3

failed to unpack label 3


While 'zdb -l /dev/dsk/c7d0s0' shows normal labels. So the new question 
is: how do I tell ZFS to use c7d0s0 instead of c7d0? I can't do a 'zpool 
replace' because the zpool isn't online.


Paul


Re: [zfs-discuss] extremely slow writes (with good reads)

2009-09-28 Thread Paul Archer

7:56pm, Victor Latushkin wrote:


While 'zdb -l /dev/dsk/c7d0s0' shows normal labels. So the new question is: 
how do I tell ZFS to use c7d0s0 instead of c7d0? I can't do a 'zpool 
replace' because the zpool isn't online.


ZFS actually uses c7d0s0 and not c7d0 - it shortens the output to c7d0 when it 
controls the entire disk. Before the upgrade it looked like this:


   NAMESTATE READ WRITE CKSUM
   datapoolONLINE   0 0 0
 raidz1ONLINE   0 0 0
   c2d0s0  ONLINE   0 0 0
   c3d0s0  ONLINE   0 0 0
   c4d0s0  ONLINE   0 0 0
   c6d0s0  ONLINE   0 0 0
   c5d0s0  ONLINE   0 0 0

I guess something happened to the labeling of disk c7d0 (used to be c2d0) 
before, during or after upgrade.


It would be nice to show what zdb -l shows for this disk and some other disk 
too. output of 'prtvtoc /dev/rdsk/cXdYs0' can be helpful too.




This is from c7d0:


LABEL 0

version=13
name='datapool'
state=0
txg=233478
pool_guid=3410059226836265661
hostid=519305
hostname='shebop'
top_guid=7679950824008134671
guid=17458733222130700355
vdev_tree
type='raidz'
id=0
guid=7679950824008134671
nparity=1
metaslab_array=23
metaslab_shift=32
ashift=9
asize=7501485178880
is_log=0
children[0]
type='disk'
id=0
guid=17458733222130700355
path='/dev/dsk/c7d0s0'
devid='id1,c...@asamsung_hd154ui=s1y6j1ks742049/a'

phys_path='/p...@0,0/pci10de,3...@4/pci8086,3...@7/pci-...@0/i...@1/c...@0,0:a'
whole_disk=1
DTL=588
children[1]
type='disk'
id=1
guid=4735756507338772729
path='/dev/dsk/c8d0s0'
devid='id1,c...@asamsung_hd154ui=s1y6j1ks742050/a'

phys_path='/p...@0,0/pci10de,3...@4/pci8086,3...@7/pci-...@1/i...@0/c...@0,0:a'
whole_disk=0
DTL=467
children[2]
type='disk'
id=2
guid=10113358996255761229
path='/dev/dsk/c9d0s0'
devid='id1,c...@asamsung_hd154ui=s1y6j1ks742059/a'

phys_path='/p...@0,0/pci10de,3...@4/pci8086,3...@7/pci-...@1/i...@1/c...@0,0:a'
whole_disk=0
DTL=573
children[3]
type='disk'
id=3
guid=11460855531791764612
path='/dev/dsk/c11d0s0'
devid='id1,c...@asamsung_hd154ui=s1y6j1ks742048/a'

phys_path='/p...@0,0/pci10de,3...@4/pci8086,3...@7/pci-...@2/i...@1/c...@0,0:a'
whole_disk=0
DTL=571
children[4]
type='disk'
id=4
guid=14986691153111294171
path='/dev/dsk/c10d0s0'
devid='id1,c...@ast31500341as=9vs0ttwf/a'

phys_path='/p...@0,0/pci10de,3...@4/pci8086,3...@7/pci-...@2/i...@0/c...@0,0:a'
whole_disk=0
DTL=473


Labels 1-3 are identical

The other disks in the pool give identical results (except for the guid's, 
which match with what's above).




c8d0 - c11d0 are identical, so I've only included c8d0's output below:

r...@shebop:/tmp# prtvtoc /dev/rdsk/c7d0s0
* /dev/rdsk/c7d0s0 partition map
*
* Dimensions:
* 512 bytes/sector
* 2930264064 sectors
* 2930263997 accessible sectors
*
* Flags:
*   1: unmountable
*  10: read-only
*
* Unallocated space:
*       First     Sector    Last
*       Sector     Count    Sector
*           34       222       255
*
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       0      4    00        256 2930247391 2930247646
       8     11    00 2930247647      16384 2930264030
r...@shebop:/tmp#
r...@shebop:/tmp# prtvtoc /dev/rdsk/c8d0s0
* /dev/rdsk/c8d0s0 partition map
*
* Dimensions:
* 512 bytes/sector
* 2930264064 sectors
* 2930277101 accessible sectors
*
* Flags:
*   1: unmountable
*  10: read-only
*
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       0     17    00         34 2930277101 2930277134


 Thanks for the help!


Paul Archer


Re: [zfs-discuss] extremely slow writes (with good reads)

2009-09-28 Thread Paul Archer
In light of all the trouble I've been having with this zpool, I bought a 
2TB drive, and I'm going to move all my data over to it, then destroy the 
pool and start over.


Before I do that, what is the best way on an x86 system to format/label 
the disks?


Thanks,

Paul




Re: [zfs-discuss] extremely slow writes (with good reads)

2009-09-28 Thread Paul Archer

Cool.
FWIW, there appears to be an issue with the LSI 150-6 card I was using. I 
grabbed an old server m/b from work, and put a newer PCI-X LSI card in it, 
and I'm getting write speeds of about 60-70MB/sec, which is about 40x the 
write speed I was seeing with the old card.


Paul


Tomorrow, Robert Milkowski wrote:


Paul Archer wrote:
In light of all the trouble I've been having with this zpool, I bought a 
2TB drive, and I'm going to move all my data over to it, then destroy the 
pool and start over.


Before I do that, what is the best way on an x86 system to format/label the 
disks?





if the entire disk is going to be dedicated to one zfs pool then don't bother 
with manual labeling - when creating the pool, provide a disk name without a 
slice name (so for example c0d0 instead of c0d0s0) and zfs will automatically 
put an EFI label on it, with s0 representing the entire disk (minus a reserved area).
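
(In other words, when the pool gets rebuilt, something like this, using the device 
names from this thread - ZFS writes the EFI label itself:)

zpool create datapool raidz c7d0 c8d0 c9d0 c10d0 c11d0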


--
Robert Milkowski
http://milek.blogspot.com




Re: [zfs-discuss] extremely slow writes (with good reads)

2009-09-28 Thread Paul Archer

11:04pm, Paul Archer wrote:


Cool.
FWIW, there appears to be an issue with the LSI 150-6 card I was using. I 
grabbed an old server m/b from work, and put a newer PCI-X LSI card in it, 
and I'm getting write speeds of about 60-70MB/sec, which is about 40x the 
write speed I was seeing with the old card.


Paul


Small correction: I was seeing writes in the 60-70MB/sec range because I was 
writing to a single 2TB drive (on its own pool). When I tried writing back to 
the primary (4+1 raid-z) pool, I was getting between 100-120MB/sec. 
(That's for sequential writes, anyway.)


paul


Re: [zfs-discuss] extremely slow writes (with good reads)

2009-09-27 Thread Paul Archer
Problem is that while it's back, the performance is horrible. It's 
resilvering at about (according to iostat) 3.5MB/sec. And at some point, I 
was zeroing out the drive (with 'dd if=/dev/zero of=/dev/dsk/c7d0'), and 
iostat showed me that the drive was only writing at around 3.5MB/sec. *And* 
it showed reads of about the same 3.5MB/sec even during the dd.


This same hardware and even the same zpool have been run under linux with 
zfs-fuse and BSD, and with BSD at least, performance was much better. A 
complete resilver under BSD took 6 hours. Right now zpool is estimating 
this resilver to take 36.


Could this be a driver problem? Something to do with the fact that this is 
a very old SATA card (LSI 150-6)?


This is driving me crazy. I finally got my zpool working under Solaris so 
I'd have some stability, and I've got no performance.






It appears your controller is preventing ZFS from enabling write cache.

I'm not familiar with that model. You will need to find a way to enable the 
drives' write cache manually.




My controller, while normally a full RAID controller, has had its BIOS 
turned off, so it's acting as a simple SATA controller. Plus, I'm seeing 
this same slow performance with dd, not just with ZFS. And I wouldn't 
think that write caching would make a difference with using dd (especially 
writing in from /dev/zero).


The other thing that's weird is the writes. I am seeing writes in that 
3.5MB/sec range during the resilver, *and* I was seeing the same thing 
during the dd.
This is from the resilver, but again, the dd was similar. c7d0 is the 
device in question:


    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  238.0    0.0  476.0  0.0  1.0    0.0    4.1   0  99 c12d1
   30.8   37.8 3302.4 3407.2 14.1  2.0  206.0   29.2 100 100 c7d0
   80.4    0.0 3417.6    0.0  0.3  0.3    3.3    3.2   8  14 c8d0
   80.4    0.0 3417.6    0.0  0.3  0.3    3.4    3.2   9  14 c9d0
   80.6    0.0 3417.6    0.0  0.3  0.3    3.4    3.2   9  14 c10d0
   80.6    0.0 3417.6    0.0  0.3  0.3    3.3    3.1   9  14 c11d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c12t0d0


Paul Archer


Re: [zfs-discuss] extremely slow writes (with good reads)

2009-09-27 Thread Paul Archer


My controller, while normally a full RAID controller, has had its BIOS 
turned off, so it's acting as a simple SATA controller. Plus, I'm seeing 
this same slow performance with dd, not just with ZFS. And I wouldn't think 
that write caching would make a difference with using dd (especially 
writing in from /dev/zero).


I don't think you got what I said. Because the controller normally runs as a 
RAID controller, it controls the SATA drives' on-board write cache, and it may 
not allow the OS to enable/disable that cache.


I see what you're saying. I just think that with the BIOS turned off, this 
card is essentially acting like a dumb SATA controller, and therefore not 
doing anything with the drives' cache.




Using 'dd' to the raw disk will also show the same poor performance if the HD 
on-board write-cache is disabled.


The other thing that's weird is the writes. I am seeing writes in that 
3.5MB/sec range during the resilver, *and* I was seeing the same thing 
during the dd.


Was the 'dd' to the raw disk? Either way, it shows the HDs aren't set up 
properly.


This is from the resilver, but again, the dd was similar. c7d0 is the 
device in question:


    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  238.0    0.0  476.0  0.0  1.0    0.0    4.1   0  99 c12d1
   30.8   37.8 3302.4 3407.2 14.1  2.0  206.0   29.2 100 100 c7d0
   80.4    0.0 3417.6    0.0  0.3  0.3    3.3    3.2   8  14 c8d0
   80.4    0.0 3417.6    0.0  0.3  0.3    3.4    3.2   9  14 c9d0
   80.6    0.0 3417.6    0.0  0.3  0.3    3.4    3.2   9  14 c10d0
   80.6    0.0 3417.6    0.0  0.3  0.3    3.3    3.1   9  14 c11d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c12t0d0


Try using 'format -e' on the drives, go into 'cache' then 'write-cache' and 
display the current state. You can try to manually enable it from there.




I tried this, but the 'cache' menu item didn't show up. The man page says 
it only works for SCSI disks. Do you know of any other way to get/set 
those parameters?


Paul


Re: [zfs-discuss] extremely slow writes (with good reads)

2009-09-27 Thread Paul Archer

1:19pm, Richard Elling wrote:

The other thing that's weird is the writes. I am seeing writes in that 
3.5MB/sec range during the resilver, *and* I was seeing the same thing 
during the dd.
This is from the resilver, but again, the dd was similar. c7d0 is the 
device in question:


    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  238.0    0.0  476.0  0.0  1.0    0.0    4.1   0  99 c12d1
   30.8   37.8 3302.4 3407.2 14.1  2.0  206.0   29.2 100 100 c7d0


This is the bottleneck. 29.2 ms average service time is slow.
As you can see, this causes a backup in the queue, which is
seeing an average service time of 206 ms.

The problem could be the disk itself or anything in the path
to that disk, including software.  But first, look for hardware
issues via
iostat -E
fmadm faulty
fmdump -eV



I don't see anything in the output of these commands except for the ZFS 
errors from when I was trying to get the disk online and resilvered.
I estimate another 10-15 hours before this disk is finished resilvering 
and the zpool is OK again. At that time, I'm going to switch some hardware 
out (I've got a newer and higher-end LSI card that I hadn't used before 
because it's PCI-X, and won't fit on my current motherboard.)
I'll report back what I get with it tomorrow or the next day, depending on 
the timing on the resilver.


Paul Archer


[zfs-discuss] extremely slow writes (with good reads)

2009-09-25 Thread Paul Archer
Since I got my zfs pool working under solaris (I talked on this list 
last week about moving it from linux & bsd to solaris, and the pain that 
was), I'm seeing very good reads, but nada for writes.


Reads:

r...@shebop:/data/dvds# rsync -aP young_frankenstein.iso /tmp
sending incremental file list
young_frankenstein.iso
^C 1032421376  20%   86.23MB/s    0:00:44

Writes:

r...@shebop:/data/dvds# rsync -aP /tmp/young_frankenstein.iso yf.iso
sending incremental file list
young_frankenstein.iso
^C   68976640   6%    2.50MB/s    0:06:42


This is pretty typical of what I'm seeing.


r...@shebop:/data/dvds# zpool status -v
  pool: datapool
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
pool will no longer be accessible on older software versions.
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
datapoolONLINE   0 0 0
  raidz1ONLINE   0 0 0
c2d0s0  ONLINE   0 0 0
c3d0s0  ONLINE   0 0 0
c4d0s0  ONLINE   0 0 0
c6d0s0  ONLINE   0 0 0
c5d0s0  ONLINE   0 0 0

errors: No known data errors

  pool: syspool
 state: ONLINE
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
syspool ONLINE   0 0 0
  c0d1s0ONLINE   0 0 0

errors: No known data errors

(This is while running an rsync from a remote machine to a ZFS filesystem)
r...@shebop:/data/dvds# iostat -xn 5
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   11.1    4.8  395.8  275.9  5.8  0.1  364.7    4.3   2   5 c0d1
    9.8   10.9  514.3  346.4  6.8  1.4  329.7   66.7  68  70 c5d0
    9.8   10.9  516.6  346.4  6.7  1.4  323.1   66.2  67  70 c6d0
    9.7   10.9  491.3  346.3  6.7  1.4  324.7   67.2  67  70 c3d0
    9.8   10.9  519.9  346.3  6.8  1.4  326.7   67.2  68  71 c4d0
    9.8   11.0  493.5  346.6  3.6  0.8  175.3   37.9  38  41 c2d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t0d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0d1
   64.6   12.6 8207.4  382.1 32.8  2.0  424.7   25.9 100 100 c5d0
   62.2   12.2 7203.2  370.1 27.9  2.0  375.1   26.7  99 100 c6d0
   53.2   11.8 5973.9  390.2 25.9  2.0  398.8   30.5  98  99 c3d0
   49.4   10.6 5398.2  389.8 30.2  2.0  503.7   33.3  99 100 c4d0
   45.2   12.8 5431.4  337.0 14.3  1.0  247.3   17.9  52  52 c2d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t0d0


Any ideas?

Paul


Re: [zfs-discuss] extremely slow writes (with good reads)

2009-09-25 Thread Paul Archer
Oh, for the record, the drives are 1.5TB SATA, in a 4+1 raidz-1 config. 
All the drives are on the same LSI 150-6 PCI controller card, and the M/B 
is a generic something or other with a triple-core, and 2GB RAM.


Paul


3:34pm, Paul Archer wrote:

Since I got my zfs pool working under solaris (I talked on this list last 
week about moving it from linux & bsd to solaris, and the pain that was), I'm 
seeing very good reads, but nada for writes.





[zfs-discuss] moving files from one fs to another, splittin/merging

2009-09-24 Thread Paul Archer
I may have missed something in the docs, but if I have a file in one FS, 
and want to move it to another FS (assuming both filesystems are on the 
same ZFS pool), is there a way to do it outside of the standard 
mv/cp/rsync commands? For example, I have a pool with my home directory as 
a FS, and I have another FS with ISOs. I download an ISO of an OpenSolaris 
DVD (say, 3GB), but it goes into my home directory. Since ZFS is all about 
pools and shared storage, it seems like it would be natural to move the 
file via a 'zfs' command, rather than mv/cp/etc...


On a related(?) note, is there a way to split an existing filesystem? To 
use the example above, let's say I have an ISO directory in my home 
directory, but it's getting big, plus I'd like to share it out on my 
network. Is there a way to split my home directory's FS, so that the ISO 
directory becomes its own FS?


Paul Archer


Re: [zfs-discuss] moving files from one fs to another, splittin/merging

2009-09-24 Thread Paul Archer

Thanks for the info. Glad to hear it's in the works, too.

Paul


1:21pm, Mark J Musante wrote:


On Thu, 24 Sep 2009, Paul Archer wrote:

I may have missed something in the docs, but if I have a file in one FS, 
and want to move it to another FS (assuming both filesystems are on the 
same ZFS pool), is there a way to do it outside of the standard mv/cp/rsync 
commands?


Not yet.  CR 6483179 covers this.


On a related(?) note, is there a way to split an existing filesystem?


Not yet.  CR 6400399 covers this.


Regards,
markm





[zfs-discuss] SOLVED: Re: migrating from linux to solaris ZFS

2009-09-20 Thread Paul Archer

Thursday, Paul Archer wrote:


Tomorrow, Fajar A. Nugraha wrote:


There was a post from Ricardo on zfs-fuse list some time ago.
Apparently if you do a zpool create on whole disks, Linux and
Solaris behave differently:
- solaris will create EFI partition on that disk, and use the partition as 
vdev

- Linux will use the whole disk without any partition, just like with
a file-based vdev.

The result is that you might be unable to import the pool on *solaris or 
*BSD.


The recommended way to create a portable pool is to create the pool
on a partition setup recognizable on all those OS. He suggested a
simple DOS/MBR partition table.

So in short, if you had created the pool on top of sda1 instead of
sda, it will work. I'm surprised though that you could offline sda and
replace it with sda1 when previously you said "I see that if I try
to replace sda with sda1, zpool complains that sda1 is too small".



I was a bit surprised about that, too. But I found that a standard PC/Linux 
partition reserves around 24MB at the beginning of the disk, while an EFI (or 
actually, GPT) disklabel and partition only uses a few hundred KB.




As I mentioned above, I created GPT disklabels and partitions on all my 
disks, then one-by-one offlined the disk and replaced it with the 
partition from the same disk (eg 'zpool replace datapool ad1 ad1p1').
I did the first replacement with Linux and zfs-fuse. The resilver took 32 
hours. I did the rest in FreeBSD, which took 5-6 hours for each disk.


It was tedious, but the pool is available in Solaris (finally!), so 
hopefully no more NFS issues or kernel panics. (I had NFS issues with 
both Linux and BSD, and kernel panics with BSD.)


Paul

PS. Complicating matters was the fact that for some reason, BSD didn't 
like my LSI 150-6 SATA card (which is the only one Solaris plays nice 
with), so I had to keep switching cards every time I went from one OS to 
the other. Blech. OTOH, here's to Live CDs!



[zfs-discuss] migrating from linux to solaris ZFS

2009-09-17 Thread Paul Archer
I recently (re)built a fileserver at home, using Ubuntu and zfs-fuse to 
create a ZFS filesystem (RAIDz1) on five 1.5TB drives.


I had some serious issues with NFS not working properly (kept getting 
stale file handles), so I tried to switch to OpenSolaris/Nexenta, but my 
SATA controller wasn't supported.


I went to FreeBSD, and got ZFS working there, and was able to 
import the ZFS pool that I had created under Linux and zfs-fuse. But I had 
issues with kernel panics.


Finally, I found a SATA card that would work with Solaris (an old LSI 
150-6). I upgraded the firmware and turned off the BIOS (so it would act 
as a plain SATA card, rather than doing RAID), and I could finally access 
the drives under Solaris.


Now my problem is that even though Solaris can see the drives, and 
recognizes that I have a ZFS pool, it won't import it. This isn't a case 
of using -f to force the import. Rather, even though the drives are all 
online and showing as available, 'zpool import' says I have 
insufficient replicas and that the raidz is unavailable due to 
corrupted data. (I can post screen caps later today.)


I can reboot into Linux and import the pools, but haven't figured out why 
I can't import them in Solaris. I don't know if it makes a difference (I 
wouldn't think so), but zfs-fuse under Linux is using ZFS version 13, 
where Nexenta is using version 14.


Any ideas?


Paul


Re: [zfs-discuss] migrating from linux to solaris ZFS

2009-09-17 Thread Paul Archer

10:09pm, Fajar A. Nugraha wrote:


On Thu, Sep 17, 2009 at 8:55 PM, Paul Archer p...@paularcher.org wrote:

I can reboot into Linux and import the pools, but haven't figured out why I
can't import them in Solaris. I don't know if it makes a difference (I
wouldn't think so), but zfs-fuse under Linux is using ZFS version 13, where
Nexenta is using version 14.


Just a guess, but did you use the whole drive while creating the pool
on Linux? Something like zpool create poolname raidz sda sdb sdc ?

Yes, I did. I was under the impression that was the way to go. If it's not 
(ie it needs to be a single disk-sized partion), I can try moving. I'm 
assuming if I add a partition, I can do something like:

zpool replace datapool sda sda1

Paul


Re: [zfs-discuss] migrating from linux to solaris ZFS

2009-09-17 Thread Paul Archer

10:40am, Paul Archer wrote:

I can reboot into Linux and import the pools, but haven't figured out why I
can't import them in Solaris. I don't know if it makes a difference (I
wouldn't think so), but zfs-fuse under Linux is using ZFS version 13, where
Nexenta is using version 14.


Just a guess, but did you use the whole drive while creating the pool
on Linux? Something like zpool create poolname raidz sda sdb sdc ?

Yes, I did. I was under the impression that was the way to go. If it's not 
(ie it needs to be a single disk-sized partion), I can try moving. I'm 
assuming if I add a partition, I can do something like:

zpool replace datapool sda sda1

Or not. I see that if I try to replace sda with sda1, zpool complains that 
sda1 is too small.


Any suggestions (that don't include 'start over')?

Paul


Re: [zfs-discuss] migrating from linux to solaris ZFS

2009-09-17 Thread Paul Archer

5:08pm, Darren J Moffat wrote:


Paul Archer wrote:

10:09pm, Fajar A. Nugraha wrote:


On Thu, Sep 17, 2009 at 8:55 PM, Paul Archer p...@paularcher.org wrote:
I can reboot into Linux and import the pools, but haven't figured out why I
can't import them in Solaris. I don't know if it makes a difference (I
wouldn't think so), but zfs-fuse under Linux is using ZFS version 13, where
Nexenta is using version 14.


Just a guess, but did you use the whole drive while creating the pool
on Linux? Something like zpool create poolname raidz sda sdb sdc ?

Yes, I did. I was under the impression that was the way to go. If it's not 
(ie it needs to be a single disk-sized partion), I can try moving. I'm 
assuming if I add a partition, I can do something like:

zpool replace datapool sda sda1


What kind of partition table is on the disks, is it EFI ?  If not that might 
be part of the issue.


I don't believe there is any partition table on the disks. I pointed zfs 
to the raw disks when I setup the pool.


Paul


Re: [zfs-discuss] migrating from linux to solaris ZFS

2009-09-17 Thread Paul Archer

6:44pm, Darren J Moffat wrote:


Paul Archer wrote:
What kind of partition table is on the disks, is it EFI ?  If not that 
might be part of the issue.


I don't believe there is any partition table on the disks. I pointed zfs to 
the raw disks when I setup the pool.


If you run fdisk on OpenSolaris against this disk what does it show as the 
partition type eg:


fdisk -v /dev/rdsk/c7t4d0p0

Mine shows:

  1              EFI           0      45600    45601    100

Which tells me I have an EFI label on the disk.

My boot ZFS pool shows this:

one one side of the mirror:

  1              Diagnostic    0      3        4        0
  2     Active    Solaris2      4      45599    45596    100
and on the other:

  1     Active    Solaris2      1      45599    45599    100

--


I just took a look, and it seems that all the drives have a single 
partition on them. I'm looking under Linux, as I can't reboot it into 
Solaris again until I get home tonight.


r...@ubuntu:~# fdisk -l /dev/sda

Disk /dev/sda: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0xce13f90b

   Device Boot  Start End  Blocks   Id  System
/dev/sda1   1  182401  1465136001   83  Linux



Re: [zfs-discuss] migrating from linux to solaris ZFS

2009-09-17 Thread Paul Archer

7:37pm, Darren J Moffat wrote:


Paul Archer wrote:

r...@ubuntu:~# fdisk -l /dev/sda

Disk /dev/sda: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0xce13f90b

   Device Boot  Start End  Blocks   Id  System
/dev/sda1   1  182401  1465136001   83  Linux


That is good enough.  That is your problem right there.  Solaris doesn't 
recognise this partition type.  FreeBSD I think does.


I'm not sure what you can do to get Solaris to recognise this.  If there is a 
non destructive way under Linux to change this to an EFI partition that would 
be a good way to start.


I doubt that simply changing the tag from Linux (131) to Solaris2 (191) would 
be enough, since you would lack the vtoc in there. Plus ideally you want this 
as EFI unless you need to put OpenSolaris into that pool to boot from it - 
but it sounds like you don't.




I did a little research and found that parted on Linux handles EFI 
labelling. I used it to change the partition scheme on sda, creating an 
sda1. I then offlined sda and replaced it with sda1. I wish I had just 
tried a scrub instead of the replace, though, as I've gotta wait about 35 
hours for the resilver to finish. (1.5TB data on five disks with a single 
PCI controller card.)
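
(For the archive, the per-disk sequence looked roughly like this. The parted step is 
whatever your parted version uses to write a GPT label plus one full-size partition, 
so check that syntax rather than trusting this sketch.)

parted /dev/sda mklabel gpt             # new GPT label (destroys the old layout)
parted /dev/sda mkpart zfs 1MiB 100%    # one partition spanning (nearly) the disk
zpool offline datapool sda              # take the whole-disk vdev out of service
zpool replace datapool sda sda1         # resilver onto the partition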


Paul


Re: [zfs-discuss] migrating from linux to solaris ZFS

2009-09-17 Thread Paul Archer

Tomorrow, Fajar A. Nugraha wrote:


There was a post from Ricardo on zfs-fuse list some time ago.
Apparently if you do a zpool create on whole disks, Linux and
Solaris behave differently:
- solaris will create EFI partition on that disk, and use the partition as vdev
- Linux will use the whole disk without any partition, just like with
a file-based vdev.

The result is that you might be unable to import the pool on *solaris or *BSD.

The recommended way to create a portable pool is to create the pool
on a partition setup recognizable on all those OS. He suggested a
simple DOS/MBR partition table.

So in short, if you had created the pool on top of sda1 instead of
sda, it will work. I'm surprised though that you could offline sda and
replace it with sda1 when previously you said "I see that if I try
to replace sda with sda1, zpool complains that sda1 is too small".



I was a bit surprised about that, too. But I found that a standard 
PC/Linux partition reserves around 24MB at the beginning of the disk, while 
an EFI (or actually, GPT) disklabel and partition only uses a few hundred KB.


Paul