Re: [zfs-discuss] SSD and ZFS

2010-02-13 Thread Tracey Bernath
Thanks Brendan,
I was going to move it over to an 8 KB record size once I got through this index
rebuild. My thinking was that a mismatched block size would show up as
excessive I/O throughput, not a lack of throughput.
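
(For reference, a minimal sketch of that change -- the dataset name dpool/oracle is a
placeholder, and a new recordsize only applies to blocks written after the change, so
existing datafiles have to be copied or recreated to pick it up:)

  # zfs set recordsize=8k dpool/oracle     <- placeholder dataset name
  # zfs get recordsize dpool/oracle        <- confirm the property took effect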

The question about the cache comes from the fact that the 18 GB or so that it
says is in the cache IS the database. This was why I was thinking the index
rebuild should be CPU-constrained, and that I should see a spike in reading from
the cache.  If the entire file is cached, why would it go to the disks at
all for the reads?

The disks are delivering about 30 MB/s of reads, but this SSD is rated for
70 MB/s sustained, so there should be a chance to pick up a 100% gain.

I've seen lots of mention of kernel settings, but those only seem to apply
to cache flushes on sync writes.
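
(One way to check whether reads are actually being served from the cache device is to
sample the L2ARC counters in the arcstats kstat -- a minimal sketch; counter names may
vary slightly by build:)

  # kstat -p zfs:0:arcstats | grep l2_     <- compare l2_hits vs. l2_misses over time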

Any idea on where to look next? I've spent about a week tinkering with
it. I'm trying to get a major customer to switch over to ZFS and an open
storage solution, but I'm afraid that if I can't get it to work at the small
scale, I can't convince them about the large scale.

Thanks,
Tracey


On Fri, Feb 12, 2010 at 4:43 PM, Brendan Gregg - Sun Microsystems 
bren...@sun.com wrote:

 On Fri, Feb 12, 2010 at 02:25:51PM -0800, TMB wrote:
  I have a similar question, I put together a cheapo RAID with four 1TB WD
 Black (7200) SATAs, in a 3TB RAIDZ1, and I added a 64GB OCZ Vertex SSD, with
 slice 0 (5GB) for ZIL and the rest of the SSD  for cache:
  # zpool status dpool
pool: dpool
   state: ONLINE
   scrub: none requested
  config:
 
  NAME          STATE     READ WRITE CKSUM
  dpool         ONLINE       0     0     0
    raidz1      ONLINE       0     0     0
      c0t0d0    ONLINE       0     0     0
      c0t0d1    ONLINE       0     0     0
      c0t0d2    ONLINE       0     0     0
      c0t0d3    ONLINE       0     0     0
  [b]logs
    c0t0d4s0    ONLINE       0     0     0[/b]
  [b]cache
    c0t0d4s1    ONLINE       0     0     0[/b]
  spares
    c0t0d6      AVAIL
    c0t0d7      AVAIL
 
                 capacity     operations    bandwidth
  pool           used  avail   read  write   read  write
  -----------   -----  -----  -----  -----  -----  -----
  dpool         72.1G  3.55T    237     12  29.7M   597K
    raidz1      72.1G  3.55T    237      9  29.7M   469K
      c0t0d0        -      -    166      3  7.39M   157K
      c0t0d1        -      -    166      3  7.44M   157K
      c0t0d2        -      -    166      3  7.39M   157K
      c0t0d3        -      -    167      3  7.45M   157K
    c0t0d4s0        20K  4.97G      0      3      0   127K
  cache             -      -      -      -      -      -
    c0t0d4s1    17.6G  36.4G      3      1   249K   119K
  -----------   -----  -----  -----  -----  -----  -----
  I just don't seem to be getting the bang for the buck that I should be.  This
 was taken while rebuilding an Oracle index, all files stored in this pool.
  The WD disks are at 100%, and nothing is coming from the cache.  The cache
 does have the entire DB cached (17.6G used), but hardly anything is read from
 it.  I also am not seeing the spike of data flowing into the ZIL, although
 iostat shows there is just write traffic hitting the SSD:
 
                   extended device statistics                      cpu
  device    r/s   w/s    kr/s   kw/s  wait  actv  svc_t  %w  %b  us sy wt id
  sd0     170.0   0.4  7684.7    0.0   0.0  35.0  205.3   0 100  11  8  0 82
  sd1     168.4   0.4  7680.2    0.0   0.0  34.6  205.1   0 100
  sd2     172.0   0.4  7761.7    0.0   0.0  35.0  202.9   0 100
  sd3       0.0   0.0     0.0    0.0   0.0   0.0    0.0   0   0
  sd4     170.0   0.4  7727.1    0.0   0.0  35.0  205.3   0 100
  [b]sd5     1.6   2.6   182.4  104.8   0.0   0.5  117.8   0  31 [/b]
 
  Since this SSD is in a RAID array, and just presents as a regular disk
 LUN, is there a special incantation required to turn on the Turbo mode?
 
  Doesn't it seem that all this traffic should be maxing out the SSD? Reads
 from the cache, and writes to the ZIL? I have a second identical SSD I
 wanted to add as a mirror, but it seems pointless if there's no zip to be
 had.

 The most likely reason is that this workload has been identified as
 streaming
 by ZFS, which is prefetching from disk instead of the L2ARC
 (l2arc_noprefetch=1).
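
 (If that is the cause, the behaviour can be tested by flipping that tunable -- a
 sketch; l2arc_noprefetch=0 allows streamed/prefetched reads to be cached in and
 served from the L2ARC:)

   # echo l2arc_noprefetch/W0t0 | mdb -kw     <- live change, reverts at reboot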

 It also looks like you've used a 128 Kbyte ZFS record size.  Is Oracle doing
 128 Kbyte random I/O?  We usually tune that down before creating the database,
 which uses the L2ARC device more efficiently.

 Brendan

 --
 Brendan Gregg, Fishworks
 http://blogs.sun.com/brendan




-- 
Tracey Bernath
913-488-6284
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Another user looses his pool (10TB) in this case and 40 days work

2010-02-13 Thread Andy Stenger
I had a very similar problem: 8 external USB drives running OpenSolaris
natively. When I moved the machine into a different room and powered it back
up (there were a couple of reboots, a couple of broken USB cables, and some
drive shutdowns in between), I got the same error. Losing that much data
is definitely a shock.

I'm running raidz2, and I would have assumed that two levels of redundancy
should be enough to tolerate a fair amount of rough treatment of the pool.

After panicking a little, stressing my family out, and some playing with zdb
that led nowhere, I did a
zpool export mypool
zpool import mypool

It complained about being unable to mount because the mount point was not
empty, so I did
umount /mypool/mypool
zfs mount mypool/mypool
zfs status mypool

and to my relieved surprise it all seems fine.
ls /mypool/mypool

does show data.

Scrub is running right now to be on the safe side.

Thought that may help some folks out there.

Cheers!

Andy
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Another user looses his pool (10TB) in this case and 40 days work

2010-02-13 Thread Remco Lengers
I just have to say this, and I don't mean it in a bad way... if you 
really care about your data, why use USB drives with loose cables and 
(apparently) no backup?


USB-connected drives are okay for data backup, and for playing around and 
getting to know ZFS. Using them for online data that you care about and 
expecting them to be reliable... it's just not the right 
technology for that, IMHO.


..Remco

On 2/13/10 11:23 AM, Andy Stenger wrote:
I had a very similar problem: 8 external USB drives running 
OpenSolaris natively. When I moved the machine into a different room and 
powered it back up (there were a couple of reboots and a couple of 
broken USB cables and drive shutdowns in between), I got the same 
error. Losing that much data is definitely a shock.


I'm running raidz2, and I would have assumed that two levels of redundancy 
should be enough to tolerate a fair amount of rough treatment of the pool.


After panicking a little, stressing my family out, and some playing 
with zdb that led nowhere, I did a

zpool export mypool
zpool import mypool

It complained about being unable to mount because the mount point was 
not empty, so I did

umount /mypool/mypool
zfs mount mypool/mypool
zfs status mypool

and to my relieved surprise it all seems fine.
ls /mypool/mypool

does show data.

Scrub is running right now to be on the safe side.

Thought that may help some folks out there.

Cheers!

Andy


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Oracle Performance - ZFS vs UFS

2010-02-13 Thread Tony MacDoodle
Was wondering if anyone has had any performance issues with Oracle running
on ZFS as compared to UFS?

Thanks
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS performance benchmarks in various configurations

2010-02-13 Thread Edward Ned Harvey
I have a new server, with 7 disks in it.  I am performing benchmarks on it
before putting it into production, to substantiate claims I make, like
"striping mirrors is faster than raidz" and so on.  Would anybody like me to
test any particular configuration?  Unfortunately I don't have any SSD, so I
can't do any meaningful test on the ZIL etc.  Unless someone in the Boston
area has a 2.5" SAS SSD they wouldn't mind lending for a few hours.  ;-)

 

My hardware configuration:  Dell PE 2970 with 8 cores.  Normally 32G, but I
pulled it all out to get it down to 4G of RAM.  (Easier to benchmark disks
when the file operations aren't all cached.)  ;-)  Solaris 10 10/09.  PERC
6/i controller.  All disks are configured in the PERC for Adaptive ReadAhead,
Write Back, JBOD.  7 disks present, each SAS 15krpm 160G.  The OS is
occupying 1 disk, so I have 6 disks to play with.

 

I am currently running the following tests:

 

Will test, including the time to flush(), various record sizes inside file
sizes up to 16G, sequential write and sequential read.  Not doing any mixed
read/write requests.  Not doing any random read/write.

iozone -Reab somefile.wks -g 17G -i 1 -i 0
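
(A hedged sketch of how one of the configurations listed below might be created and
exercised -- the device names c1t1d0 through c1t4d0 are placeholders, not the actual
disks in this box:)

  # zpool destroy testpool                 <- tear down the previous layout, if any
  # zpool create testpool mirror c1t1d0 c1t2d0 mirror c1t3d0 c1t4d0   <- "two mirrors striped"
  # zfs create testpool/bench
  # cd /testpool/bench && iozone -Reab somefile.wks -g 17G -i 1 -i 0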

 

Configurations being tested:

· Single disk
· 2-way mirror
· 3-way mirror
· 4-way mirror
· 5-way mirror
· 6-way mirror
· Two mirrors striped (or concatenated)
· Three mirrors striped (or concatenated)
· 5-disk raidz
· 6-disk raidz
· 6-disk raidz2

 

Hypothesized results:

· N-way mirrors write at the same speed as a single disk
· N-way mirrors read n times faster than a single disk
· Two mirrors striped read and write 2x faster than a single mirror
· Three mirrors striped read and write 3x faster than a single mirror
· Raidz and raidz2:  No hypothesis.  Some people say they perform
comparably to many disks working together; some people say they're slower than
a single disk.  Waiting to see the results.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs import fails even though all disks are online

2010-02-13 Thread Victor Latushkin

Mark J Musante wrote:

On Thu, 11 Feb 2010, Cindy Swearingen wrote:


On 02/11/10 04:01, Marc Friesacher wrote:


fr...@vault:~# zpool import
  pool: zedpool
id: 10232199590840258590
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

        zedpool        ONLINE
          raidz1       ONLINE
            c4d0       ONLINE
            c5d0       ONLINE
            c6d0       ONLINE
            c7d0       ONLINE
        logs
        zedpool        ONLINE
          mirror       ONLINE
            c12t0d0p0  ONLINE
            c10t0d0p0  ONLINE


Is this the actual unedited config output?  I've never seen the name of 
the pool show up after logs.


I've looked into it and think this is

6599442 zpool import has faults in the display

which is fixed in build 116, whereas this system is running build 111b.

One thing you can try is to use dtrace to look at any 
ldi_open_by_name(), ldi_open_by_devid(), or ldi_open_by_dev() calls that 
zfs makes.  That may give a clue as to what's going wrong.
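
(A minimal DTrace sketch along those lines, assuming the first argument to
ldi_open_by_name() is the device path; these are generic fbt kernel-function probes,
so exact behaviour may vary by build:)

  # dtrace -n 'fbt::ldi_open_by_name:entry { printf("%s\n", stringof(arg0)); }' \
           -n 'fbt::ldi_open_by_devid:entry, fbt::ldi_open_by_dev:entry { trace(probefunc); }'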


fmdump -eV suggests that there are issues with pool-wide metadata objects, so the 
"device" in "cannot import 'zedpool': one or more devices is currently 
unavailable" probably refers to the raidz top-level vdev.


Pool recovery should help to recover this pool.

regards,
victor




Regards,
markm
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD and ZFS

2010-02-13 Thread Richard Elling
comment below...

On Feb 12, 2010, at 2:25 PM, TMB wrote:
 I have a similar question, I put together a cheapo RAID with four 1TB WD 
 Black (7200) SATAs, in a 3TB RAIDZ1, and I added a 64GB OCZ Vertex SSD, with 
 slice 0 (5GB) for ZIL and the rest of the SSD  for cache:
 # zpool status dpool
  pool: dpool
 state: ONLINE
 scrub: none requested
 config:
 
 NAME          STATE     READ WRITE CKSUM
 dpool         ONLINE       0     0     0
   raidz1      ONLINE       0     0     0
     c0t0d0    ONLINE       0     0     0
     c0t0d1    ONLINE       0     0     0
     c0t0d2    ONLINE       0     0     0
     c0t0d3    ONLINE       0     0     0
 [b]logs
   c0t0d4s0    ONLINE       0     0     0[/b]
 [b]cache
   c0t0d4s1    ONLINE       0     0     0[/b]
 spares
   c0t0d6      AVAIL   
   c0t0d7      AVAIL   
 
                capacity     operations    bandwidth
 pool           used  avail   read  write   read  write
 -----------   -----  -----  -----  -----  -----  -----
 dpool         72.1G  3.55T    237     12  29.7M   597K
   raidz1      72.1G  3.55T    237      9  29.7M   469K
     c0t0d0        -      -    166      3  7.39M   157K
     c0t0d1        -      -    166      3  7.44M   157K
     c0t0d2        -      -    166      3  7.39M   157K
     c0t0d3        -      -    167      3  7.45M   157K
   c0t0d4s0        20K  4.97G      0      3      0   127K
 cache             -      -      -      -      -      -
   c0t0d4s1    17.6G  36.4G      3      1   249K   119K
 -----------   -----  -----  -----  -----  -----  -----
 I just don't seem to be getting the bang for the buck that I should be.  This was 
 taken while rebuilding an Oracle index, all files stored in this pool.  The 
 WD disks are at 100%, and nothing is coming from the cache.  The cache does 
 have the entire DB cached (17.6G used), but hardly anything is read from it.  I 
 also am not seeing the spike of data flowing into the ZIL, although 
 iostat shows there is just write traffic hitting the SSD:
 
                  extended device statistics                      cpu
 device    r/s   w/s    kr/s   kw/s  wait  actv  svc_t  %w  %b  us sy wt id
 sd0     170.0   0.4  7684.7    0.0   0.0  35.0  205.3   0 100  11  8  0 82
 sd1     168.4   0.4  7680.2    0.0   0.0  34.6  205.1   0 100 
 sd2     172.0   0.4  7761.7    0.0   0.0  35.0  202.9   0 100 
 sd3       0.0   0.0     0.0    0.0   0.0   0.0    0.0   0   0 
 sd4     170.0   0.4  7727.1    0.0   0.0  35.0  205.3   0 100 
 [b]sd5     1.6   2.6   182.4  104.8   0.0   0.5  117.8   0  31 [/b]

iostat has a -n option, which is very useful for looking at device names :-)

The SSD here is performing well.  The rest are clobbered; a 205-millisecond
response time will be agonizingly slow.

By default, for this version of ZFS, up to 35 I/Os will be queued to each
disk, which is why you see 35.0 in the actv column. The combination
of actv=35 and svc_t > 200 indicates that this is the place to start working.
Begin by reducing zfs_vdev_max_pending from 35 to something like 1 to 4.
This will reduce the concurrent load on the disks, thus reducing svc_t.
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Device_I.2FO_Queue_Size_.28I.2FO_Concurrency.29
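
(For reference, a sketch of how that tunable is usually changed -- the value 4 is just
one point in the 1-4 range suggested above:)

  # echo zfs_vdev_max_pending/W0t4 | mdb -kw                   <- live change, reverts at reboot
  # echo "set zfs:zfs_vdev_max_pending = 4" >> /etc/system     <- persistent, applied at next boot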

 -- richard

 Since this SSD is in a RAID array, and just presents as a regular disk LUN, 
 is there a special incantation required to turn on the Turbo mode?
 
 Doesn't it seem that all this traffic should be maxing out the SSD? Reads from 
 the cache, and writes to the ZIL? I have a second identical SSD I wanted to 
 add as a mirror, but it seems pointless if there's no zip to be had.
 
 help?
 
 Thanks,
 Tracey
 -- 
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Oracle Performance - ZFS vs UFS

2010-02-13 Thread Richard Elling
On Feb 13, 2010, at 5:23 AM, Tony MacDoodle wrote:

 Was wondering if anyone has had any performance issues with Oracle running on 
 ZFS as compared to UFS?

The "ZFS for Databases" wiki is the place to collect information and advice 
for databases on ZFS.
http://www.solarisinternals.com/wiki/index.php/ZFS_for_Databases

I notice that it is missing some later research results and will try to update
it over the next few days.

ZFS can perform better or worse than UFS.  Follow the recommendations
for configuration with your database to avoid wasting time rediscovering
the new world :-)
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance benchmarks in various configurations

2010-02-13 Thread Richard Elling
Some thoughts below...

On Feb 13, 2010, at 6:06 AM, Edward Ned Harvey wrote:

 I have a new server, with 7 disks in it.  I am performing benchmarks on it 
 before putting it into production, to substantiate claims I make, like 
 “striping mirrors is faster than raidz” and so on.  Would anybody like me to 
 test any particular configuration?  Unfortunately I don’t have any SSD, so I 
 can’t do any meaningful test on the ZIL etc.  Unless someone in the Boston 
 area has a 2.5” SAS SSD they wouldn’t mind lending for a few hours.  ;-)
  
 My hardware configuration:  Dell PE 2970 with 8 cores.  Normally 32G, but I 
 pulled it all out to get it down to 4G of ram.  (Easier to benchmark disks 
 when the file operations aren’t all cached.)  ;-)  Solaris 10 10/09.  PERC 
 6/i controller.  All disks are configured in PERC for Adaptive ReadAhead, and 
 Write Back, JBOD.  7 disks present, each SAS 15krpm 160G.  OS is occupying 1 
 disk, so I have 6 disks to play with.

Put the memory back in and limit the ARC cache size instead. x86 boxes
have a tendency to change the memory bus speed depending on how much
memory is in the box.

Similarly, you can test primarycache settings rather than just limiting ARC 
size.
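
(A hedged sketch of both suggestions -- the 4 GB cap and the dataset name are
placeholders:)

  # echo "set zfs:zfs_arc_max = 0x100000000" >> /etc/system    <- cap the ARC at 4 GB; reboot to apply
  # zfs set primarycache=metadata testpool/bench               <- or =all / =none, per test run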

 I am currently running the following tests:
  
 Will test, including the time to flush(), various record sizes inside file 
 sizes up to 16G, sequential write and sequential read.  Not doing any mixed 
 read/write requests.  Not doing any random read/write.
 iozone -Reab somefile.wks -g 17G -i 1 -i 0

IMHO, sequential tests are a waste of time.  With default configs, it will be 
difficult to separate the raw performance from prefetched performance.
You might try disabling prefetch as an option.
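
(A sketch of that option; this is the file-level prefetch tunable, separate from the
l2arc_noprefetch setting discussed in the SSD thread:)

  # echo "set zfs:zfs_prefetch_disable = 1" >> /etc/system     <- reboot to apply
  # echo zfs_prefetch_disable/W0t1 | mdb -kw                   <- or flip it on the live system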

With sync writes, you will run into the zfs_immediate_write_sz boundary.

Perhaps someone else can comment on how often they find interesting 
sequential workloads which aren't backup-related.

 Configurations being tested:
 · Single disk
 · 2-way mirror
 · 3-way mirror
 · 4-way mirror
 · 5-way mirror
 · 6-way mirror
 · Two mirrors striped (or concatenated)
 · Three mirrors striped (or concatenated)
 · 5-disk raidz
 · 6-disk raidz
 · 6-disk raidz2

Please add some raidz3 tests :-)  We have little data on how raidz3 performs.

  
 Hypothesized results:
 · N-way mirrors write at the same speed of a single disk
 · N-way mirrors read n-times faster than a single disk
 · Two mirrors striped read and write 2x faster than a single mirror
 · Three mirrors striped read and write 3x faster than a single mirror
 · Raidz and raidz2:  No hypothesis.  Some people say they perform 
 comparable to many disks working together.  Some people say it’s slower than 
 a single disk.  Waiting to see the results.

Please post results (with raw data would be nice ;-).  If you would be so
kind as to collect samples of iosnoop -Da I would be eternally grateful :-)
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Oracle Performance - ZFS vs UFS

2010-02-13 Thread Jim Mauro

Using ZFS for Oracle can be configured to deliver very good performance.
Depending on what your priorities are in terms of critical metrics, keep in mind
that the most performant solution is to use Oracle ASM on raw disk devices.
That is not intended to imply anything negative about ZFS or UFS. The simple
fact is that when you put your Oracle datafiles on any file system, there's a much
longer code path involved in reading and writing files, along with the file
system's use of memory that needs to be considered. ZFS offers enterprise-class
features (the admin model, snapshots, etc.) that make it a great choice to deploy in
production, but, from a pure performance point of view, it's not going to be
the absolute fastest. Configured correctly, it can meet or exceed performance
requirements.

For Oracle, you need to do the following (a short sketch follows the list):
- Make sure you're on the latest Solaris 10 update release (update 8).
- For the datafiles, set the recordsize to align with the db_block_size (8k).
- Put the redo logs on a separate zpool, with the default 128k recordsize.
- Disable ZFS data caching (primarycache=metadata) and let Oracle cache the
  data in the SGA.
- Watch the space in your zpools - don't run them at 90% full.
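
(A minimal sketch of that layout -- pool, dataset, and device names are placeholders:)

  # zpool create orapool mirror c2t0d0 c2t1d0                      <- data pool; devices are placeholders
  # zfs create -o recordsize=8k -o primarycache=metadata orapool/oradata
  # zpool create redopool mirror c3t0d0 c3t1d0                     <- separate pool for the redo logs
  # zfs create redopool/redo                                       <- left at the default 128k recordsize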

Read the link Richard sent for some additional information.

Thanks,
/jim


Tony MacDoodle wrote:
Was wondering if anyone has had any performance issues with Oracle 
running on ZFS as compared to UFS?


Thanks


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
  

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance benchmarks in various configurations

2010-02-13 Thread Bob Friesenhahn

On Sat, 13 Feb 2010, Edward Ned Harvey wrote:


Will test, including the time to flush(), various record sizes inside file 
sizes up to 16G,
sequential write and sequential read.  Not doing any mixed read/write 
requests.  Not doing any
random read/write.

iozone -Reab somefile.wks -g 17G -i 1 -i 0


Make sure to also test with a command like

  iozone -m -t 8 -T -O -r 128k -o -s 12G

I am eager to read your test report.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance benchmarks in various configurations

2010-02-13 Thread Bob Friesenhahn

On Sat, 13 Feb 2010, Bob Friesenhahn wrote:


Make sure to also test with a command like

 iozone -m -t 8 -T -O -r 128k -o -s 12G


Actually, it seems that this is more than sufficient:

  iozone -m -t 8 -T -r 128k -o -s 4G

since it creates a 4GB test file for each thread, with 8 threads.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] available space

2010-02-13 Thread Charles Hedrick
I have the following pool:

NAME   SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
OIRT  6.31T  3.72T  2.59T    58%  ONLINE  /

zfs list shows the following for a typical file system:

NAME                    USED  AVAIL  REFER  MOUNTPOINT
OIRT/sakai/production  1.40T  1.77T  1.40T  /OIRT/sakai/production

Why is available lower when shown by zfs than zpool?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] available space

2010-02-13 Thread Thomas Burgess
One shows pool size, the other shows filesystem size.


The pool size is based on raw space.

The zfs list size shows how much is used and how much usable space is
available.

For instance, I use raidz2 with 1TB drives, so if I do zpool list I see ALL
the space, including parity, but if I do zfs list I only see how much space
the filesystem sees.

Two different tools for two different jobs.
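
(A hedged worked example with made-up round numbers, not the OIRT pool's actual
layout: with eight 1TB drives in a single raidz2 vdev,

  # zpool list mypool     ->  SIZE/AVAIL ~ 8T   (raw space, parity included)
  # zfs list mypool       ->  USED+AVAIL ~ 6T   (usable: 8 drives minus 2 for parity, less overhead)

so zfs list will always report less available space than zpool list on a
raidz/raidz2 pool.)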


On Sat, Feb 13, 2010 at 12:28 PM, Charles Hedrick hedr...@rutgers.eduwrote:

 I have the following pool:

 NAME   SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
 OIRT  6.31T  3.72T  2.59T    58%  ONLINE  /

 zfs list shows the following for a typical file system:

 NAME                    USED  AVAIL  REFER  MOUNTPOINT
 OIRT/sakai/production  1.40T  1.77T  1.40T  /OIRT/sakai/production

 Why is available lower when shown by zfs than zpool?
 --
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pool import with failed ZIL device now possible ?

2010-02-13 Thread Charles Hedrick
I have a similar situation. I have a system that is used for backup copies of 
logs and other non-critical things, where the primary copy is on a Netapp. Data 
gets written in batches a few times a day. We use this system because storage 
on it is a lot less expensive than on the Netapp. It's only non-critical data 
that is sent via NFS. Critical data is sent to this server either by zfs send | 
receive, or by an rsync running on the server that reads from the Netapp over 
NFS. Thus the important data shouldn't go through the ZIL.

I am seriously considering turning off the ZIL, because NFS write performance 
is so lousy.
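
(For reference, on builds of this vintage the ZIL can only be disabled globally -- a
hedged sketch; this affects every pool on the host and gives up synchronous-write
guarantees, so it is not a casual change:)

  # echo "set zfs:zil_disable = 1" >> /etc/system      <- applies to all pools at next boot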

I'd use SSD, except that I can't find a reasonable way of doing so. I have a 
pair of servers with Sun Cluster, sharing a J4200 JBOD. If there's a failure, 
operations move to the other server. Thus a local SSD is no better than ZIL 
disabled. I'd love to put an SSD in the J4200, but the claim that this was 
going to be supported seems to have vanished.

Someone once asked why I bother with redundant systems if I don't care about the 
data. The answer is that if the NFS mounts hang, my production services hang. 
Also, I do care about some of the data. It just happens not to go through the 
ZIL.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs/sol10u8 less stable than in sol10u5?

2010-02-13 Thread sean walmsley
We recently patched our X4500 from Sol10 U6 to Sol10 U8 and have not noticed 
anything like what you're seeing. We do not have any SSD devices installed.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance benchmarks in various configurations

2010-02-13 Thread Edward Ned Harvey
 IMHO, sequential tests are a waste of time.  With default configs, it will be
 difficult to separate the raw performance from prefetched performance.
 You might try disabling prefetch as an option.

 

Let me clarify:

 

Iozone does a nonsequential series of sequential tests, specifically for the
purpose of identifying the performance tiers, separating the various levels
of hardware accelerated performance from the raw disk performance.

 

This is the reason why I took out all but 4G of the system RAM.  In the
(incomplete) results I have so far, it's easy to see these tiers for a
single disk:

· For file sizes 0 to 4M, a single disk
writes 2.8 Gbit/sec and reads ~40-60 Gbit/sec.
This boost comes from writing to PERC cache, and reading from CPU L2 cache.

· For file sizes 4M to 128M, a single disk
writes 2.8 Gbit/sec and reads 24 Gbit/sec.
This boost comes from writing to PERC cache, and reading from system memory.

· For file sizes 128M to 4G, a single disk
writes 1.2 Gbit/sec and reads 24 Gbit/sec.
This boost comes from reading system memory.

· For file sizes 4G to 16G, a single disk
writes 1.2 Gbit/sec and reads 1.2 Gbit/sec.
This is the raw disk performance.  (SAS, 15krpm, 146G disks)

 

 

 Please add some raidz3 tests :-)  We have little data on how raidz3 performs.

 

Does this require a specific version of OS?  I'm on Solaris 10 10/09, and
man zpool doesn't seem to say anything about raidz3 ... I haven't tried
using it ... does it exist?

 

 

 Please post results (with raw data would be nice ;-).  If you would be so
 kind as to collect samples of iosnoop -Da I would be eternally grateful :-)

 

I'm guessing iosnoop is an OpenSolaris thing?  Is there an equivalent for
Solaris?

 

I'll post both the raw results and my simplified conclusions.  Most people
would not want the raw data.  Most people just want to know "What's the
performance hit I take by using raidz2 instead of raidz?" and so on.

 

Or... "What's faster, raidz or hardware RAID-5?"

 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Oracle Performance - ZFS vs UFS

2010-02-13 Thread Jason King
On Sat, Feb 13, 2010 at 9:58 AM, Jim Mauro james.ma...@sun.com wrote:
 Using ZFS for Oracle can be configured to deliver very good performance.
 Depending on what your priorities are in terms of critical metrics, keep in
 mind
 that the most performant solution is to use Oracle ASM on raw disk devices.
 That is not intended to imply anything negative about ZFS or UFS. The simple
 fact is that when you put your Oracle datafiles on any file system, there's
 a much
 longer code path involved in reading and writing files, along with the file
 systems
 use of memory that needs to be considered. ZFS offers enterprise-class
 features
 (the admin model, snapshots, etc) that make it a great choice to deploy in
 production, but, from a pure performance point-of-view, it's not going to be
 the absolute fastest. Configured correctly, it can meet or exceed
 performance
 requirements.

 For Oracle, you need to;
 - Make sure you're the latest Solaris 10 update release (update 8).
 - For the datafiles, set the recordsize to align with the db_block_size (8k)
 - Put the redo logs on a separate zpool, with the default 128k recordsize
 - Disable ZFS data caching (primarycache=metadata). Let Oracle cache the
 data
   in the SGA.
 - Watch your space in your zpools - don't run them at 90% full.

 Read the link Richard sent for some additional information.

There is of course the caveat of using raw devices with databases (it
becomes harder to track usage, especially as the number of LUNs
increases, and there is slightly less visibility into their usage statistics at the
OS level).  However, perhaps now someone can implement the CR I filed
a long time ago to add ASM support to libfstyp.so, which would allow
zfs, mkfs, format, etc. to identify ASM volumes =)


 Thanks,
 /jim


 Tony MacDoodle wrote:

 Was wondering if anyone has had any performance issues with Oracle running
 on ZFS as compared to UFS?

 Thanks
 

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Oracle Performance - ZFS vs UFS (Jason King)

2010-02-13 Thread Allen Eastwood

 There is of course the caveat of using raw devices with databases (it
 becomes harder to track usage, especially as the number of LUNs
 increases, slightly less visibility into their usage statistics at the
 OS level ).   However perhaps now someone can implement the CR I filed
 a long time ago to add ASM support to libfstyp.so that would allow
 zfs, mkfs, format, etc. to identify ASM volumes =)

While that would be nice, I would submit that if using ASM, usage becomes 
solely a DBA problem.  From the OS level, as a system admin, I don't really 
care…I refer any questions back to the DBA.  They should have tools to deal 
with all that.

OTOH, with more things stacked on more servers (zones, etc.) I might care if 
there's a chance of whatever Oracle is doing affecting performance elsewhere.

Thoughts?
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance benchmarks in various configurations

2010-02-13 Thread Bob Friesenhahn

On Sat, 13 Feb 2010, Edward Ned Harvey wrote:


 kind as to collect samples of iosnoop -Da I would be eternally 
 grateful :-)


I'm guessing iosnoop is an opensolaris thing?  Is there an equivalent for 
solaris?


Iosnoop is part of the DTrace Toolkit by Brendan Gregg, which does 
work on Solaris 10.  See http://www.brendangregg.com/dtrace.html


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Oracle Performance - ZFS vs UFS (Jason King)

2010-02-13 Thread Jason King
My problem is that when you have 100+ LUNs divided between OS and DB,
keeping track of what's for what can become problematic.  It becomes
even worse when you start adding LUNs -- the chance of accidentally
grabbing a DB LUN instead of one of the new ones is non-trivial (then
there's also the chance that your storage guy might make a mistake and
give you LUNs already mapped elsewhere by accident -- which I have
seen happen before).  And when you're forced to do it at 3am after
already working 12 hours that day... well, safeguards are a good
thing.


On Sat, Feb 13, 2010 at 2:13 PM, Allen Eastwood mi...@paconet.us wrote:

 There is of course the caveat of using raw devices with databases (it
 becomes harder to track usage, especially as the number of LUNs
 increases, slightly less visibility into their usage statistics at the
 OS level ).   However perhaps now someone can implement the CR I filed
 a long time ago to add ASM support to libfstyp.so that would allow
 zfs, mkfs, format, etc. to identify ASM volumes =)

 While that would be nice, I would submit that if using ASM, usage becomes 
 solely a DBA problem.  From the OS level, as a system admin, I don't really 
 care…I refer any questions back to the DBA.  They should have tools to deal 
 with all that.

 OTOH, with more things stacked on more servers (zones, etc.) I might care if 
 there's a chance of whatever Oracle is doing affecting performance elsewhere.

 Thoughts?
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Oracle Performance - ZFS vs UFS (Jason King)

2010-02-13 Thread Allen Eastwood
So, one of the tricks I've used in the past is to assign a volname in format as 
I use LUNs.  Dunno if that's an option with ASM?  ZFS seems to blow those away, 
the last time I looked.

-A

On Feb 13, 2010, at 14:32 , Jason King wrote:

 My problem is when you have 100+ luns divided between OS and DB,
 keeping track of what's for what can become problematic.   It becomes
 even worse when you start adding luns -- the chance of accidentally
 grabbing a DB lun instead of one of the new ones is non-trivial (then
 there's also the chance that your storage guy might make a mistake and
 give you luns already mapped elsewhere on accident -- which I have
 seen happen before).  And when you're forced to do it at 3am after
 already working 12 hours that day well safeguards are a good
 thing.
 
 
 On Sat, Feb 13, 2010 at 2:13 PM, Allen Eastwood mi...@paconet.us wrote:
 
 There is of course the caveat of using raw devices with databases (it
 becomes harder to track usage, especially as the number of LUNs
 increases, slightly less visibility into their usage statistics at the
 OS level ).   However perhaps now someone can implement the CR I filed
 a long time ago to add ASM support to libfstyp.so that would allow
 zfs, mkfs, format, etc. to identify ASM volumes =)
 
 While that would be nice, I would submit that if using ASM, usage becomes 
 solely a DBA problem.  From the OS level, as a system admin, I don't really 
 care…I refer any questions back to the DBA.  They should have tools to deal 
 with all that.
 
 OTOH, with more things stacked on more servers (zones, etc.) I might care if 
 there's a chance of whatever Oracle is doing affecting performance elsewhere.
 
 Thoughts?
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Oracle Performance - ZFS vs UFS

2010-02-13 Thread Brad
Don't use raidz for the raid type - go with a striped set
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance benchmarks in various configurations

2010-02-13 Thread Richard Elling
On Feb 13, 2010, at 10:54 AM, Edward Ned Harvey wrote:
  Please add some raidz3 tests :-)  We have little data on how raidz3
  performs.
  
 Does this require a specific version of OS?  I'm on Solaris 10 10/09, and 
 man zpool doesn't seem to say anything about raidz3 ... I haven't tried 
 using it ... does it exist?

Never mind. I have no interest in performance tests for Solaris 10.
The code is so old, that it does not represent current ZFS at all.
IMHO, if you want to do performance tests, then you need to be
on the very latest dev release.  Otherwise, the results can't be
carried forward to make a difference -- finding performance issues
that are already fixed isn't a good use of your time.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs import fails even though all disks are online

2010-02-13 Thread Marc Friesacher
The problem has been resolved by Victor.

Thank you again for your time and effort yesterday.
I don't think I would have ever been able to get my data back without your 
level of expertise and hands-on approach.

As discussed last night, the important data has been backed up already and come 
Monday I'll be building another OS server which will be hosting a complete 
duplicate of all the data. A bit more costly than relying on RAIDZ alone, but 
at least I should never have to tell my wife that our photos, including wedding 
and honeymoon, are gone forever.

This will also give me the opportunity to update server builds without fearing 
data loss, one of the reasons I was still on 111b.

Thank you also to Cindy and Mark for trying to help me. Just having some things 
to try kept me hoping that there would be a solution.

This community rocks and ZFS does too.

Marc.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Reading ZFS config for an extended period

2010-02-13 Thread taemun
After around four days the process appeared to have stalled (no
audible hard drive activity). I restarted with milestone=none, deleted
/etc/zfs/zpool.cache, restarted, and ran zpool import tank. (I also
allowed root login over ssh, so I could make new ssh sessions if
required.) Now I can watch the process from the machine itself.

My present question is: how is the DDT stored? I believe the DDT to
have around 10M entries for this dataset, as per:
DDT-sha256-zap-duplicate: 400478 entries, size 490 on disk, 295 in core
DDT-sha256-zap-unique: 10965661 entries, size 381 on disk, 187 in core
(taken just previous to the attempt to destroy the dataset)
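
(Those entry counts look like the per-class lines from zdb's dedup report; a sketch of
how they can be re-sampled, assuming the pool is healthy enough for zdb to walk it:)

  # zdb -DD tank      <- prints the DDT histogram plus the zap-duplicate / zap-unique entry counts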

A sample from iopattern shows:
%RAN  %SEQ  COUNT    MIN    MAX    AVG     KR
 100     0    195    512    512    512     97
 100     0    414    512  65536    895    362
 100     0    261    512    512    512    130
 100     0    273    512    512    512    136
 100     0    247    512    512    512    123
 100     0    297    512    512    512    148
 100     0    292    512    512    512    146
 100     0    250    512    512    512    125
 100     0    274    512    512    512    137
 100     0    302    512    512    512    151
 100     0    294    512    512    512    147
 100     0    308    512    512    512    154
  98     2    286    512    512    512    143
 100     0    270    512    512    512    135
 100     0    390    512    512    512    195
 100     0    269    512    512    512    134
 100     0    251    512    512    512    125
 100     0    254    512    512    512    127
 100     0    265    512    512    512    132
 100     0    283    512    512    512    141

As the pool is composed of 2x 8-disk raidz vdevs, I presume that each
element is stored twice (for the raidz redundancy). So at around 280 512-byte
read ops/s, that's 140 entries per second.
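
(A rough, hedged estimate from those numbers, assuming each 512-byte read corresponds
to one DDT entry and the rate stays constant:

  (10,965,661 unique + 400,478 duplicate) entries / ~140 entries per second
  ~= 11.4M / 140  ~=  81,000 seconds  ~=  roughly 22-23 hours per full pass over the DDT.)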

Is the import of a semi-broken pool:
1. reading all the DDT markers for the dataset; or
2. reading all the DDT markers for the pool; or
3. reading all of the block markers for the dataset; or
4. reading all of the block markers for the pool,
prior to actually finalising what it needs to do to fix the pool? I'd
like to be able to estimate the length of time likely before the
import finishes.

Or should I tell it to roll back to the last valid txg - i.e. before the
zfs destroy dataset command was issued (via zpool import -F)? Or is
this likely to take as long as or longer than the present import/fix?

Cheers.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss