Re: [zfs-discuss] Problem with AOC-SAT2-MV8

2008-07-01 Thread Marc Bevand
I remember a similar problem with an AOC-SAT2-MV8 controller in a system of mine: 
Solaris rebooted each time the marvell88sx driver tried to detect the disks 
attached to it. I don't remember whether it happened during installation or during 
the first boot after a successful install. I ended up spending a night reverse 
engineering the controller's firmware/BIOS to find and fix the bug. The system 
has been running fine since I reflashed the controller with my patched 
firmware.

To make a long story short, a lot of these controllers in the wild use a buggy 
firmware, version 1.0b [1]. During POST the controller's firmware scans the 
PCI bus to find the device it is supposed to initialize, i.e. the controller's 
Marvell 88SX6081 chip. It incorrectly assumes that the *first* device with one 
of these PCI device IDs is the 88SX6081: 5040 5041 5080 5081 6041 6042 6081 
7042 (the firmware is generic and supposed to support different chips). My 
system's motherboard happened to have a Marvell 88SX5041 chip onboard (device 
ID 5041) which was found first. So during POST the AOC-SAT2-MV8 firmware was 
initializing the disks connected to the 5041, leaving the 6081 disks in an 
uninitialized state. Then after POST, when Solaris was booting, I guess the 
marvell88sx driver barfed on this unexpected state and caused the kernel to 
reboot.

To fix the bug, I simply patched the firmware to remove 5041 from the device 
ID list. I used the Supermicro-provided tool to reflash the firmware [1].

You said your motherboard is a Supermicro H8DM8E-2. There is no such model; do 
you mean H8DM8-2 or H8DME-2? To determine whether one of your PCI devices 
has one of the device IDs I mentioned, run:
  $ /usr/X11/bin/scanpci
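
For example, to check for any of those device IDs in one go (the exact output
format of scanpci varies, so treat this grep pattern as a sketch):

  $ /usr/X11/bin/scanpci | egrep -i '5040|5041|5080|5081|6041|6042|6081|7042'

Any Marvell device other than the 88SX6081 on the AOC-SAT2-MV8 itself is a
candidate for confusing the 1.0b firmware.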

I have recently had to replace this AOC-SAT2-MV8 controller with another one 
(we accidentally broke a SATA connector during a maintenance operation). Its 
firmware version uses a totally different numbering scheme (it's probably 
more recent) and it worked right out of the box on the same motherboard. So it 
looks like Marvell or Supermicro fixed the bug in at least some later 
revisions of the AOC-SAT2-MV8. But they don't distribute this newer firmware 
on their FTP site.

Do you know if yours is using firmware 1.0b (displayed during POST)?

[1] ftp://ftp.supermicro.com/Firmware/AOC-SAT2-MV8




Re: [zfs-discuss] Problem with AOC-SAT2-MV8

2008-07-01 Thread Ross
Good point about the motherboard number; I replied but never spotted that.  I'd 
assumed it was the H8DM3-2, which is on the Sun HCL.  I hadn't realised 
Supermicro had quite so many similar model numbers.
 
 


Re: [zfs-discuss] ZFS configuration for VMware

2008-07-01 Thread Marc Bevand
Erik Trimble Erik.Trimble at Sun.COM writes:
 
 * Huge RAM drive in a 1U small case (ala Cisco 2500-series routers), 
 with SAS or FC attachment.

Almost what you want:
http://www.superssd.com/products/ramsan-400/
128 GB RAM-based device, 3U chassis, FC and Infiniband connectivity.

However, as a commenter pointed out [1], you would basically be buying RAM at 
~20x its street price... Plus the density is poor, and they could strip this 
device down much further (remove the backup drives, etc.)

[1] 
http://storagemojo.com/2008/03/07/flash-talking-and-a-wee-dram-with-texas-memory-systems/

-marc



Re: [zfs-discuss] Periodic flush

2008-07-01 Thread Roch - PAE
Robert Milkowski writes:

  Hello Roch,
  
  Saturday, June 28, 2008, 11:25:17 AM, you wrote:
  
  
  RB I suspect,  a single dd is cpu bound.
  
  I don't think so.
  

We're nearly CPU bound, as your numbers show. More below.

  Se below one with a stripe of 48x disks again. Single dd with 1024k
  block size and 64GB to write.
  
  bash-3.2# zpool iostat 1
 capacity operationsbandwidth
  pool used  avail   read  write   read  write
  --  -  -  -  -  -  -
  test 333K  21.7T  1  1   147K   147K
  test 333K  21.7T  0  0  0  0
  test 333K  21.7T  0  0  0  0
  test 333K  21.7T  0  0  0  0
  test 333K  21.7T  0  0  0  0
  test 333K  21.7T  0  0  0  0
  test 333K  21.7T  0  0  0  0
  test 333K  21.7T  0  0  0  0
  test 333K  21.7T  0  1.60K  0   204M
  test 333K  21.7T  0  20.5K  0  2.55G
  test4.00G  21.7T  0  9.19K  0  1.13G
  test4.00G  21.7T  0  0  0  0
  test4.00G  21.7T  0  1.78K  0   228M
  test4.00G  21.7T  0  12.5K  0  1.55G
  test7.99G  21.7T  0  16.2K  0  2.01G
  test7.99G  21.7T  0  0  0  0
  test7.99G  21.7T  0  13.4K  0  1.68G
  test12.0G  21.7T  0  4.31K  0   530M
  test12.0G  21.7T  0  0  0  0
  test12.0G  21.7T  0  6.91K  0   882M
  test12.0G  21.7T  0  21.8K  0  2.72G
  test16.0G  21.7T  0839  0  88.4M
  test16.0G  21.7T  0  0  0  0
  test16.0G  21.7T  0  4.42K  0   565M
  test16.0G  21.7T  0  18.5K  0  2.31G
  test20.0G  21.7T  0  8.87K  0  1.10G
  test20.0G  21.7T  0  0  0  0
  test20.0G  21.7T  0  12.2K  0  1.52G
  test24.0G  21.7T  0  9.28K  0  1.14G
  test24.0G  21.7T  0  0  0  0
  test24.0G  21.7T  0  0  0  0
  test24.0G  21.7T  0  0  0  0
  test24.0G  21.7T  0  14.5K  0  1.81G
  test28.0G  21.7T  0  10.1K  63.6K  1.25G
  test28.0G  21.7T  0  0  0  0
  test28.0G  21.7T  0  10.7K  0  1.34G
  test32.0G  21.7T  0  13.6K  63.2K  1.69G
  test32.0G  21.7T  0  0  0  0
  test32.0G  21.7T  0  0  0  0
  test32.0G  21.7T  0  11.1K  0  1.39G
  test36.0G  21.7T  0  19.9K  0  2.48G
  test36.0G  21.7T  0  0  0  0
  test36.0G  21.7T  0  0  0  0
  test36.0G  21.7T  0  17.7K  0  2.21G
  test40.0G  21.7T  0  5.42K  63.1K   680M
  test40.0G  21.7T  0  0  0  0
  test40.0G  21.7T  0  6.62K  0   844M
  test44.0G  21.7T  1  19.8K   125K  2.46G
  test44.0G  21.7T  0  0  0  0
  test44.0G  21.7T  0  0  0  0
  test44.0G  21.7T  0  18.0K  0  2.24G
  test47.9G  21.7T  1  13.2K   127K  1.63G
  test47.9G  21.7T  0  0  0  0
  test47.9G  21.7T  0  0  0  0
  test47.9G  21.7T  0  15.6K  0  1.94G
  test47.9G  21.7T  1  16.1K   126K  1.99G
  test51.9G  21.7T  0  0  0  0
  test51.9G  21.7T  0  0  0  0
  test51.9G  21.7T  0  14.2K  0  1.77G
  test55.9G  21.7T  0  14.0K  63.2K  1.73G
  test55.9G  21.7T  0  0  0  0
  test55.9G  21.7T  0  0  0  0
  test55.9G  21.7T  0  16.3K  0  2.04G
  test59.9G  21.7T  0  14.5K  63.2K  1.80G
  test59.9G  21.7T  0  0  0  0
  test59.9G  21.7T  0  0  0  0
  test59.9G  21.7T  0  17.7K  0  2.21G
  test63.9G  21.7T  0  4.84K  62.6K   603M
  test63.9G  21.7T  0  0  0  0
  test63.9G  21.7T  0  0  0  0
  test63.9G  21.7T  0  0  0  0
  test63.9G  21.7T  0  0  0  0
  test63.9G  21.7T  0  0  0  0
  test63.9G  21.7T  0  0  0  0
  test63.9G  21.7T  0  0  0  0
  ^C
  bash-3.2#
  
  bash-3.2# ptime dd if=/dev/zero of=/test/q1 bs=1024k count=65536
  65536+0 records in
  65536+0 records out
  
  real 1:06.312
  user0.074
  sys54.060
  bash-3.2#
  
  Doesn't look like it's CPU bound.
  

So counting sys time alone we're at about 81% of CPU saturation (54.1 s of sys
out of 66.3 s elapsed). Even if you pushed this to 100% you would still see
zeros in the zpool iostat output.

We 

Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-01 Thread jan damborsky
Hi Jeff,


Jeff Bonwick wrote:
 Neither swap or dump are mandatory for running Solaris.

 Dump is mandatory in the sense that losing crash dumps is criminal.

I think the installer should be tolerant on this point and shouldn't
refuse to proceed with the installation if the user doesn't provide enough
available disk space to create a dump device.

It should probably be documented (for example, mentioned in the release notes)
that when only the minimum disk space is provided for installation, swap and
dump are not created.


 Swap is more complex.  It's certainly not mandatory.  Not so long ago,
 swap was typically larger than physical memory.  But in recent years,
 we've essentially moved to a world in which paging is considered a bug.
 Swap devices are often only a fraction of physical memory size now,
 which raises the question of why we even bother.  On my desktop, which
 has 16GB of memory, the default OpenSolaris swap partition is 2GB.
 That's just stupid.  Unless swap space significantly expands the
 amount of addressable virtual memory, there's no reason to have it.

I agree with you on this point. Since the new formula for calculating
swap and dump sizes will take the amount of physical memory into account,
the values should make more sense.

That said, this is just a default value and certainly wouldn't be appropriate
in all situations. However, as this is something which can be changed at
will after installation is done, I would rather keep the formula as simple
as reasonable.


 There have been a number of good suggestions here:

 (1) The right way to size the dump device is to let dumpadm(1M) do it
 based on the dump content type.

To be honest, it is not quite clear to me how we might utilize
dumpadm(1M) to help us calculate/recommend the size of the dump device.
Could you please elaborate on this?


 (2) In a virtualized environment, a better way to get a crash dump
 would be to snapshot the VM.  This would require a little bit
 of host/guest cooperation, in that the installer (or dumpadm)
 would have to know that it's operating in a VM, and the kernel
 would need some way to notify the VM that it just panicked.
 Both of these ought to be doable.

Yes - I like this idea as well. But until the appropriate support is
provided by virtualization tools and/or implemented in the kernel, I think
(I might be wrong) that the installer will still need to use the standard
mechanisms for now.

Thank you,
Jan



Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-01 Thread jan damborsky
Mike Gerdts wrote:
 On Mon, Jun 30, 2008 at 9:19 AM, jan damborsky [EMAIL PROTECTED] wrote:
 Hi Mike,


 Mike Gerdts wrote:
 On Wed, Jun 25, 2008 at 11:09 PM, Jan Damborsky [EMAIL PROTECTED]
 wrote:
 Thank you very much all for this valuable input.

 Based on the collected information, I would take
 following approach as far as calculating size of
 swap and dump devices on ZFS volumes in Caiman
 installer is concerned.

 [1] Following formula would be used for calculating
   swap and dump sizes:

 size_of_swap = size_of_dump = MAX(512 MiB, MIN(physical_memory/2, 32 GiB))
 dump should scale with memory size, but the size given is completely
 overkill.  On very active (heavy kernel activity) servers with 300+ GB
 of RAM, I have never seen a (compressed) dump that needed more than 8
 GB.  Even uncompressed the maximum size I've seen has been in the 18
 GB range.  This has been without ZFS in the mix.  It is my
 understanding that at one time the ARC was dumped as part of kernel
 memory, but that was regarded as a bug and has since been fixed.  If
 the ARC is dumped, a dump size much closer to physical memory is
 likely to be appropriate.
 I would agree that, given that the user can customize this any time
 after installation, the smaller the upper bound the better. Would
 it be fine then to use 16 GiB, or would an even smaller value be more
 appropriate?

 By default, only kernel memory is dumped to the dump device.  Further,
 this is compressed.  I have heard that 3x compression is common and
 the samples that I have range from 3.51x - 6.97x.

 If you refer to InfoDoc 228921 (contract only - can that be opened or
 can a Sun employee get permission to post same info to an open wiki?)
 you will see a method for approximating the size of a crash dump.  On
 my snv_91 virtualbox instance (712 MB RAM configured), that method
 gave me an estimated (uncompressed) crash dump size of about 450 MB.
 I induced a panic to test the approximation.  In reality it was 323 MB
 and compress(1) takes it down to 106 MB.  My understanding is that the
 algorithm used in the kernel is a bit less aggressive than the
 algorithm used by compress(1) so maybe figure 120 - 150 MB in this
 case.  My guess is that this did not compress as well as my other
 samples because on this smaller system a higher percentage of my
 kernel pages were not full of zeros.

 Perhaps the right size for the dump device is more like:

 MAX(256 MiB, MIN(physical_memory/4, 16 GiB))

Thanks a lot for making this investigation and collecting
valuable data - I will modify the proposed formula according
to your suggestion.


 Further, dumpadm(1M) could be enhanced to resize the dump volume on
 demand.  The size that it would choose would likely be based upon what
 is being dumped (kernel, kernel+user, etc.), memory size, current
 estimate using InfoDoc 228921 logic, etc.

 As an aside, does the dedicated dump on all machines make it so that
 savecore no longer runs by default?  It just creates a lot of extra
 I/O during boot (thereby slowing down boot after a crash) and uses a
 lot of extra disk space for those that will never look at a crash
 dump.  Those that actually use it (not the majority target audience
 for OpenSolaris, I would guess) will be able to figure out how to
 enable (the yet non-existent) svc:/system/savecore:default.

 Looking at the savecore(1M) man pages, it seems that it is managed
 by svc:/system/dumpadm:default. Looking at the installed system,
 this service is online. If I understand correctly, you are recommending
 to disable it by default?

 dumpadm -n is really the right way to do this.

I see - thanks for clarifying it.

Jan



Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-01 Thread jan damborsky
Dave Miner wrote:
 I agree - I am just wondering whether it is fine in general to allow a
 normal, non-experienced user (who is the target audience for the Slim
 installer) to run the system without swap. To be honest, I don't know,
 since I am not very experienced in this area.
 If people agree that this is not an issue at all, I don't have any
 objections against making swap optional.


 Now that we don't have to reserve slices for it, making swap optional in 
 the space calculation is fine.  We don't place any lower limits on 
 memory, and it's just virtual memory, after all.  Besides which, we can 
 infer that the system works well enough for the user's purposes without 
 swap since the boot from the CD won't have used any swap.

That is a good point. Based on this and also on Jeff's comment
I will make swap optional as well.

Thank you,
Jan



[zfs-discuss] swap dump on ZFS volume - updated proposal

2008-07-01 Thread jan damborsky
Hi all,

Based on the further comments I received, the following approach will
be taken for calculating the default size of swap and dump devices on
ZFS volumes in the Caiman installer.

[1] The following formulas will be used for calculating
swap and dump sizes:

size_of_swap = MAX(512 MiB, MIN(physical_memory/2, 32 GiB))
size_of_dump = MAX(256 MiB, MIN(physical_memory/4, 16 GiB))

The user can reconfigure this on the live system after installation is
done with the zfs set command.
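
As a rough sketch of what that looks like on a live system (the dataset names
rpool/swap and rpool/dump are assumptions, and swap must be removed and re-added
for the kernel to see the new size):

  # prtconf reports e.g. "Memory size: 4096 Megabytes"
  MEM_MB=`prtconf | awk '/^Memory size/ {print $3}'`
  SWAP_MB=`expr $MEM_MB / 2`
  [ $SWAP_MB -lt 512 ] && SWAP_MB=512
  [ $SWAP_MB -gt 32768 ] && SWAP_MB=32768
  DUMP_MB=`expr $MEM_MB / 4`
  [ $DUMP_MB -lt 256 ] && DUMP_MB=256
  [ $DUMP_MB -gt 16384 ] && DUMP_MB=16384

  # resize the swap ZVOL (take it out of use first), then the dump ZVOL
  swap -d /dev/zvol/dsk/rpool/swap
  zfs set volsize=${SWAP_MB}m rpool/swap
  swap -a /dev/zvol/dsk/rpool/swap
  zfs set volsize=${DUMP_MB}m rpool/dump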

[2] dump and swap devices will be considered optional

Dump and swap devices will be considered optional during a
fresh installation and will be created only if there is
appropriate space available on the disk provided.

The minimum disk space required will not take dump and swap into
account, thus allowing the user to install on small disks.
This will need to be documented (e.g. as part of the release notes),
so that the user is aware of this behavior.

The recommended disk size (which now covers one full upgrade plus
2 GiB of space for additional software) will take dump and swap
into account.

Dump and swap devices will then be created if the user dedicates
at least the recommended disk space to the installation.

Thank you very much all for this valuable input.
Jan




Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-01 Thread Jürgen Keil
Mike Gerdts wrote

 By default, only kernel memory is dumped to the dump device.  Further,
 this is compressed.  I have heard that 3x compression is common and
 the samples that I have range from 3.51x - 6.97x.

My samples are in the range 1.95x - 3.66x.  And yes, I lost
a few crash dumps on a box with a 2GB swap slice, after
physical memory was upgraded from 4GB to 8GB.

% grep 'pages dumped' /var/adm/messages*
/var/adm/messages:Jun 27 13:43:56 tiger2 genunix: [ID 409368 kern.notice] 
^M100% done: 593680 pages dumped, compression ratio 3.51, 
/var/adm/messages.0:Jun 25 13:08:22 tiger2 genunix: [ID 409368 kern.notice] 
^M100% done: 234922 pages dumped, compression ratio 2.39, 
/var/adm/messages.1:Jun 12 13:22:53 tiger2 genunix: [ID 409368 kern.notice] 
^M100% done: 399746 pages dumped, compression ratio 1.95, 
/var/adm/messages.1:Jun 12 19:00:01 tiger2 genunix: [ID 409368 kern.notice] 
^M100% done: 245417 pages dumped, compression ratio 2.41, 
/var/adm/messages.1:Jun 16 19:15:37 tiger2 genunix: [ID 409368 kern.notice] 
^M100% done: 710001 pages dumped, compression ratio 3.48, 
/var/adm/messages.1:Jun 16 19:21:35 tiger2 genunix: [ID 409368 kern.notice] 
^M100% done: 315989 pages dumped, compression ratio 3.66, 
/var/adm/messages.2:Jun 11 15:40:32 tiger2 genunix: [ID 409368 kern.notice] 
^M100% done: 341209 pages dumped, compression ratio 2.68,
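
To pull out just the ratio numbers for a quick overview, something along these
lines works (a sketch, not tested against every syslog format):

  % grep 'compression ratio' /var/adm/messages* | \
        sed 's/.*compression ratio \([0-9.]*\),.*/\1/'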
 
 


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-01 Thread Darren J Moffat
Jeff Bonwick wrote:
 Neither swap or dump are mandatory for running Solaris.
 
 Dump is mandatory in the sense that losing crash dumps is criminal.

Agreed on that point. I remember all too well, from my days in Sun Service, 
when the first dump was always lost because savecore didn't 
use to be run!

 Swap is more complex.  It's certainly not mandatory.  Not so long ago,
 swap was typically larger than physical memory.  But in recent years,
 we've essentially moved to a world in which paging is considered a bug.
 Swap devices are often only a fraction of physical memory size now,
 which raises the question of why we even bother.  On my desktop, which
 has 16GB of memory, the default OpenSolaris swap partition is 2GB.
 That's just stupid.  Unless swap space significantly expands the
 amount of addressable virtual memory, there's no reason to have it.

What has always annoyed me about Solaris (and every Linux distro I've 
ever used) is that, unlike Windows and Mac OS X, we put swap management 
(devices and their size) into the hands of the admin.  The upside of 
this, though, is that it is easy to mirror swap using SVM.

Instead we should take it completely out of their hands and do it all 
dynamically when it is needed.  Now that we can swap on a ZVOL and ZVOLs 
can be extended this is much easier to deal with and we don't lose the 
benefit of protected swap devices (in fact we have much more than we had 
with SVM).
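
Today that still means a manual step, but at least it can be done on a live
system; e.g. adding more pool-backed swap on the fly (dataset name and size are
just examples):

  # zfs create -V 4G rpool/swap2
  # swap -a /dev/zvol/dsk/rpool/swap2
  # swap -l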


-- 
Darren J Moffat


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-01 Thread Mike Gerdts
On Tue, Jul 1, 2008 at 5:56 AM, Darren J Moffat [EMAIL PROTECTED] wrote:
 Instead we should take it completely out of their hands and do it all
 dynamically when it is needed.  Now that we can swap on a ZVOL and ZVOLs
 can be extended this is much easier to deal with and we don't lose the
 benefit of protected swap devices (in fact we have much more than we had
 with SVM).

Are you suggesting that if I have a system that has 500 MB swap free
and someone starts up another JVM with a 16 GB heap that swap should
automatically grow by 16+ GB right at that time?  I have seen times
where applications require X GB of RAM, make the reservation, then
never dirty more than X/2 GB of pages.  In these cases dynamically
growing swap to a certain point may be OK.

In most cases, however, I see this as a recipe for disaster.  I would
rather have an application die (and likely restart via SMF) because it
can't get the memory that it requested than have heavy paging bring
the system to such a crawl that transactions time out and it takes
tens of minutes for administrators to log in and shut down some
workload.  The app that can't start will likely do so during a
maintenance window.  The app that causes the system to crawl will,
with all likelihood, do so during peak production or when the admin is
in bed.

Perhaps bad paging activity (definition needed) should throw some
messages to FMA so that the nice GUI tool that answers the question
'why does my machine suck?' can say that it has been excessively short
on memory X times in recent history.  Any of these approaches is miles
above the Linux approach of finding a memory hog to kill.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-01 Thread Darren J Moffat
Mike Gerdts wrote:
 On Tue, Jul 1, 2008 at 5:56 AM, Darren J Moffat [EMAIL PROTECTED] wrote:
 Instead we should take it completely out of their hands and do it all
 dynamically when it is needed.  Now that we can swap on a ZVOL and ZVOLs
 can be extended this is much easier to deal with and we don't lose the
 benefit of protected swap devices (in fact we have much more than we had
 with SVM).
 
 Are you suggesting that if I have a system that has 500 MB swap free
 and someone starts up another JVM with a 16 GB heap that swap should
 automatically grow by 16+ GB right at that time?  I have seen times
 where applications require X GB of RAM, make the reservation, then
 never dirty more than X/2 GB of pages.  In these cases dynamically
 growing swap to a certain point may be OK.

Not at all, and I don't see how you could get that assumption from what 
I said.  I said 'dynamically when it is needed'.

 In most cases, however, I see this as a recipe for disaster.  I would
 rather have an application die (and likely restart via SMF) because it
 can't get the memory that it requested than have heavy paging bring
 the system to such a crawl that transactions time out and it takes
 tens of minutes for administrators to log in and shut down some
 workload.  The app that can't start will likely do so during a
 maintenance window.  The app that causes the system to crawl will,
 with all likelihood, do so during peak production or when the admin is
 in bed.

I would not favour a system where the admin had no control over swap.
I'm just suggesting that in many cases where swap is actually needed 
there is no real need for the admin to be involved in managing the swap 
and its size should not need to be predetermined.

-- 
Darren J Moffat


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-01 Thread Mike Gerdts
On Tue, Jul 1, 2008 at 7:31 AM, Darren J Moffat [EMAIL PROTECTED] wrote:
 Mike Gerdts wrote:

 On Tue, Jul 1, 2008 at 5:56 AM, Darren J Moffat [EMAIL PROTECTED]
 wrote:

 Instead we should take it completely out of their hands and do it all
 dynamically when it is needed.  Now that we can swap on a ZVOL and ZVOLs
 can be extended this is much easier to deal with and we don't lose the
 benefit of protected swap devices (in fact we have much more than we had
 with SVM).

 Are you suggesting that if I have a system that has 500 MB swap free
 and someone starts up another JVM with a 16 GB heap that swap should
 automatically grow by 16+ GB right at that time?  I have seen times
 where applications require X GB of RAM, make the reservation, then
 never dirty more than X/2 GB of pages.  In these cases dynamically
 growing swap to a certain point may be OK.

 Not at all, and I don't see how you could get that assumption from what I
 said.  I said dynamically when it is needed.

I think I came off wrong in my initial message.  I've seen times when
vmstat reports only megabytes of free swap while gigabytes of RAM were
available.  That is, reservations far outstripped actual usage.  Do
you have mechanisms in mind to be able to detect such circumstances
and grow swap to a point that the system can handle more load without
spiraling to a long slow death?

-- 
Mike Gerdts
http://mgerdts.blogspot.com/


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-01 Thread Jason King
On Tue, Jul 1, 2008 at 8:10 AM, Mike Gerdts [EMAIL PROTECTED] wrote:
 On Tue, Jul 1, 2008 at 7:31 AM, Darren J Moffat [EMAIL PROTECTED] wrote:
 Mike Gerdts wrote:

 On Tue, Jul 1, 2008 at 5:56 AM, Darren J Moffat [EMAIL PROTECTED]
 wrote:

 Instead we should take it completely out of their hands and do it all
 dynamically when it is needed.  Now that we can swap on a ZVOL and ZVOLs
 can be extended this is much easier to deal with and we don't lose the
 benefit of protected swap devices (in fact we have much more than we had
 with SVM).

 Are you suggesting that if I have a system that has 500 MB swap free
 and someone starts up another JVM with a 16 GB heap that swap should
 automatically grow by 16+ GB right at that time?  I have seen times
 where applications require X GB of RAM, make the reservation, then
 never dirty more than X/2 GB of pages.  In these cases dynamically
 growing swap to a certain point may be OK.

 Not at all, and I don't see how you could get that assumption from what I
 said.  I said dynamically when it is needed.

 I think I came off wrong in my initial message.  I've seen times when
 vmstat reports only megabytes of free swap while gigabytes of RAM were
 available.  That is, reservations far outstripped actual usage.  Do
 you have mechanisms in mind to be able to detect such circumstances
 and grow swap to a point that the system can handle more load without
 spiraling to a long slow death?

Having this dynamic would be nice with Oracle.  10g, at least, will use
DISM in the preferred configuration Oracle is now preaching to DBAs.
I ran into this a few months ago on an upgrade (Solaris 8 to 10,
Oracle 8 to 10g, and a hardware upgrade).  The side effect of using DISM is
that it reserves an amount equal to the SGA in swap, and it will fail to
start up if swap is too small.  In practice, I don't see the space ever
being touched (I suspect it's mostly there as a requirement for
dynamic reconfiguration with DISM, but I didn't bother to dig that far).


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-01 Thread Darren J Moffat
Mike Gerdts wrote:

 Not at all, and I don't see how you could get that assumption from what I
 said.  I said dynamically when it is needed.
 
 I think I came off wrong in my initial message.  I've seen times when
 vmstat reports only megabytes of free swap while gigabytes of RAM were
 available.  That is, reservations far outstripped actual usage. 

Ah, that makes it clearer.

  Do you have mechanisms in mind to be able to detect such circumstances
 and grow swap to a point that the system can handle more load without
 spiraling to a long slow death?

I don't as yet, because I haven't had time to think about this.  Maybe 
once I've finished with the ZFS Crypto project I'll spend some time 
looking at encrypted VM (other than by swapping on an encrypted ZVOL).
At the moment, while it annoys me, it isn't on my todo list to try and 
implement a fix.

-- 
Darren J Moffat


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-01 Thread Richard Elling
Darren J Moffat wrote:
 Mike Gerdts wrote:

   
 Not at all, and I don't see how you could get that assumption from what I
 said.  I said dynamically when it is needed.
   
 I think I came off wrong in my initial message.  I've seen times when
 vmstat reports only megabytes of free swap while gigabytes of RAM were
 available.  That is, reservations far outstripped actual usage. 
 

 Ah that makes it more clear.

   Do you have mechanisms in mind to be able to detect such circumstances
   
 and grow swap to a point that the system can handle more load without
 spiraling to a long slow death?
 

 I don't as yet because I haven't had time to think about this.  Maybe 
 once I've finished with the ZFS Crypto project and I spend some time 
 looking at encrypted VM (other than by swapping on an encrypted ZVOL).
 At the moment while it annoys me it isn't on my todo list to try and 
 implement a fix.

   

Here is a good start, BSD's dynamic_pager
http://developer.apple.com/documentation/Darwin/Reference/ManPages/man8/dynamic_pager.8.html

Mike, many people use this all day long and seem to be quite happy.
I think the slow death spiral might be overrated :-)
 -- richard



Re: [zfs-discuss] Some basic questions about getting the best performance for database usage

2008-07-01 Thread Christiaan Willemsen
 Why not go to 128-256 GBytes of RAM?  It isn't that
 expensive and would
 significantly help give you a big performance boost
 ;-)

Would be nice, but it's not that inexpensive, since we'd have to move up a 
class in server choice, which, besides the extra memory cost, also brings some 
more money with it.

 The database transaction log should be relatively
 small, so I would
 look for two LUNs (disks), mirrored.  Similarly, the
 ZIL should be
 relatively small -- two LUNs (disks), mirrored.  You
 will want ZFS to
 manage the redundancy here, so think about mirroring
 at the
 ZFS level.  The actual size needed will be based on
 the transaction
 load which causes writes.  For ZIL sizing, we like to
 see something
 like 20 seconds worth of write workload.  In most
 cases, this will
 fit into the write cache of a decent array, so you
 may not have to
 burn an actual pair of disks in the backing store.
  But since I don't
 now the array your using, it will be difficult to be
 specific.

Okay, so if the array cache is large enough, there is no actual need for a 
separate ZIL disk.

Another consideration could be the use of SSDs for all of this. You'd 
only need a few of them to get far better I/O performance than the 16 SAS 
disks could ever deliver. Also, you'd probably not need a ZIL disk, nor a disk for 
the transaction log.

It would cost about the same, but would probably give better performance.
 
 


Re: [zfs-discuss] Proper way to do disk replacement in an A1000 storage array and raidz2.

2008-07-01 Thread Cindy . Swearingen
Hi--

I'm not quite sure about the exact sequence of events here, but it
sounds like you had two spares and replaced the failed disk with one of
the spares, which you can do manually with the zpool replace command.

The remaining spare should drop back into the spare pool if you detached
it. Check your zpool status output to confirm the spare status. If you
need to add it back to the pool, then you can use zpool add.
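
A rough sketch of the whole sequence (pool and device names here are made up):

  # zpool status tank                  # see which spare is INUSE
  # zpool replace tank c1t4d0 c1t9d0   # replace the failed disk with the spare
  # zpool detach tank c1t4d0           # make the replacement permanent
  # zpool add tank spare c1t4d0        # once the bad disk has been swapped out,
                                       #  give the new disk back as a spare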

You can review more about managing spares, here:

http://docs.sun.com/app/docs/doc/817-2271/gcvcw?a=view

Your questions have reminded me that we need a better example of the
failed-disk/spare-replacement scenario in this guide, so I will add
one.

Thanks,

Cindy

Demian Phillips wrote:
 Thanks. I have another spare so I replaced with that and it put the used 
 spare back to spare status.
 
 I assume at this point once I replace the failed disk I just need to let 
 solaris see the change and then add it back into the pool as a spare (to 
 replace the spare I took out and used in the replace)?
 
 I see some odd behavior related to the FC array and controller but that is 
 not ZFS related so I will have to post elsewhere about that fun.
  
  


Re: [zfs-discuss] Problem with AOC-SAT2-MV8

2008-07-01 Thread Tim
So what version is on your new card?  Seems it'd be far easier to
request it from Supermicro if we knew what to ask for.





Re: [zfs-discuss] Some basic questions about getting the best performance for database usage

2008-07-01 Thread Richard Elling
Christiaan Willemsen wrote:
 Why not go to 128-256 GBytes of RAM?  It isn't that
 expensive and would
 significantly help give you a big performance boost
 ;-)
 

 Would be nice, but it not that much inexpensive since we'd have to move up a 
 class in server choise, and besides the extra memory cost, also brings some 
 more money with it.
   

It should cost less than a RAID array...
Advertisement: Sun's low-end servers have 16 DIMM slots.

Fast, inexpensive, reliable: pick 2.
 -- richard



Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-01 Thread Keith Bierman

On Jul 1, 2008, at 10:55 AM, Miles Nordin wrote:

 I don't think it's overrated at all.  People all around me are using
 this dynamic_pager right now, and they just reboot when they see too
 many pinwheels.  If they are ``quite happy,'' it's not with their
 pager.

I often exist in a sea of Mac users, and I've never seen them reboot  
other than after the periodic Apple Updates. Killing Firefox every  
couple of days, or after visiting certain demented sites, is not  
uncommon and is probably a good idea.
 

 They see demand as capacity rather than temperature but...the machine
 does need to run out of memory eventually.  Don't drink the
 dynamic_pager futuristic kool-aid.  It's broken, both in theory and in
 the day-to-day experience of the Mac users around me.


I've got Macs with uptimes of months ... admittedly not in the same  
territory as my old SunOS or Solaris boxes, but Apple has seldom  
resisted the temptation to drop a security update or a QuickTime  
update for longer.

-- 
Keith H. Bierman   [EMAIL PROTECTED]  | AIM kbiermank
5430 Nassau Circle East  |
Cherry Hills Village, CO 80113   | 303-997-2749
speaking for myself* Copyright 2008






Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-01 Thread Bob Friesenhahn
On Tue, 1 Jul 2008, Miles Nordin wrote:

 I don't think it's overrated at all.  People all around me are using
 this dynamic_pager right now, and they just reboot when they see too
 many pinwheels.  If they are ``quite happy,'' it's not with their
 pager.

While we have seen these pinwheels under OS X, the cause usually seems 
to be application lockup (due to poor application/library design) 
and not paging to death.  Paging to death causes lots of 
obvious disk churn.

Microsoft Windows includes a dynamic page file as well.

It is wrong to confuse total required paging space with thrashing. 
These are completely different issues.

Dynamic sizing of paging space seems to fit well with the new zfs 
root/boot strategy where everything is shared via a common pool.  If 
you don't use it, you don't lose it.  System resource limits can be 
used to block individual applications from consuming all resources.
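
For example, Solaris resource controls can cap the swap an individual project
may reserve (the project name and limit here are only illustrative):

  # projmod -s -K "project.max-swap=(privileged,4gb,deny)" user.dbuser
  # prctl -n project.max-swap -i project user.dbuser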

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/



Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-01 Thread Richard Elling
Miles Nordin wrote:
 re == Richard Elling [EMAIL PROTECTED] writes:
 

 re Mike, many people use this all day long and seem to be quite
 re happy.  I think the slow death spiral might be overrated :-)

 I don't think it's overrated at all.  People all around me are using
 this dynamic_pager right now, and they just reboot when they see too
 many pinwheels.  If they are ``quite happy,'' it's not with their
 pager.
   

If you run out of space, things fail.  Pinwheels are a symptom of
running out of RAM, not running out of swap.

 The pinwheel is part of a Mac user's daily vocabulary, and although
 they generally don't know this, it almost always appears because of
 programs that leak memory, grow, and eventually cause thrashing.  They
 do not even realize that restarting Mail or Firefox will fix the
 pinwheels.  They just reboot.  
   

...which frees RAM.

 so obviously it's an unworkable approach.  To them, being forced to
 reboot, even if it takes twenty minutes to shut down as long as it's a
 clean reboot, makes them feel more confident than Firefox unexpectedly
 crashing.  For us, exactly the opposite is true.

 I think dynamic_pager gets it backwards.  ``demand'' is a reason *NOT*
 to increase swap.  If all the allocated pages in swap are
 cold---colder than the disk's io capacity---then there is no
 ``demand'' and maybe it's ok to add some free pages which might absorb
 some warmer data.  If there are already warm pages in swap
 (``demand''), then do not satisfy more of it, instead let swap fill
 and return ENOMEM.
   

You will get more service calls for failures due to ENOMEM than
you will get for pinwheels.  Given the large size of disks in today's
systems, you may never see an ENOMEM.  The goodness here is
that it is one less thing that requires a service touch; even a local
sysadmin service touch costs real $$.
 -- richard



[zfs-discuss] Streaming video and audio over CIFS lags.

2008-07-01 Thread Juho Mäkinen
I built a NAS with three 750 GB SATA disks in a RAIDZ configuration and I've 
exported some filesystems using the Solaris in-kernel CIFS server.

Streaming video or even audio from the exported shares to Windows XP gives 
laggy performance. Seeking in a video can take ages, audio (playing MP3s with 
Winamp from the CIFS share) stops from time to time, and the video playback also 
lags and pauses from time to time. The videos work just fine when I play them 
from my local computer, or stream them from another Windows computer via CIFS.

I've searched these forums and found some other users reporting the same problems, 
but no good answers. I've tested network performance with iperf, which 
reports the network speed to be about 50 Mt/sec (1 Gbps network, so it should work 
much faster).

This http://www.opensolaris.org/jive/thread.jspa?messageID=232250#232250 
posting suggests trying to set zfetch_block_cap to 16..32, but I couldn't find 
any way to set it. Any ideas? How could I improve my NAS so that I can 
stream HD video from it?

 - Juho Mäkinen
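
For what it's worth, ZFS tunables of that kind are usually set in /etc/system
and picked up at the next reboot; assuming zfetch_block_cap exists in your
build, something like:

  set zfs:zfetch_block_cap = 32

On a live system the equivalent poke can be done with mdb -kw, but that is
strictly experiment-at-your-own-risk territory.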
 
 


Re: [zfs-discuss] Streaming video and audio over CIFS lags.

2008-07-01 Thread Will Murnane
On Tue, Jul 1, 2008 at 14:47, Juho Mäkinen [EMAIL PROTECTED] wrote:
 Streaming video or even audio from the exported shares to windows xp gives a 
 laggy performance. Seeking the video can take ages, audio (playing mp3 with 
 winamp from the cifs share) stops from time to time and also the video 
 playback lags and pauses from time to time. The videos work just fine when I 
 play them from my local computer, or stream them from another windows 
 computer via CIFS.

What does local disk performance look like?  Try bonnie++
(http://will.incorrige.us/solaris-packages/CSEEbonnie++.pkg.gz or
build it yourself, it's straightforward).  Also, what ethernet card is
this?  What does dladm show-dev report?  How does copying a large
file behave?  Do you get bursty transfers of a higher rate, or a
steady 50 megabits (look at Windows Task Manager's various tabs)?

Will


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-01 Thread Miles Nordin
 bf == Bob Friesenhahn [EMAIL PROTECTED] writes:
 re == Richard Elling [EMAIL PROTECTED] writes:

re If you run out of space, things fail.  Pinwheels are a symptom
re of running out of RAM, not running out of swap.

okay.  But what is the point?

Pinwheels are a symptom of thrashing.

Pinwheels are not showing up when the OS is returning ENOMEM.
Pinwheels are not ``things fail'', they are ``things are going slower
than some watcher thinks they should.''

AFAICT they show up when the application under the cursor has been
blocked for about five seconds, which is usually because it's
thrashing, though sometimes it's because it's trying to read from an
NFS share that went away (this also causes pinwheels).

bf While we have seen these pinwheels under OS-X, the cause
bf seems to be usually application lockup (due to poor
bf application/library design) and not due to paging to death.

that's simply not my experience.

bf Paging to death causes lots of obvious disk churn.

You can check for it in 'top' on OS X.  they list pageins and pageouts.

bf It is wrong to confuse total required paging space with
bf thrashing.  These are completely different issues.

and I did not.  I even rephrased the word ``demand'' in terms of
thrashing.  I am not confused.

bf Dynamic sizing of paging space seems to fit well with the new
bf zfs root/boot strategy where everything is shared via a common
bf pool.

yes, it fits extremely well.

What I'm saying is, do not do it just because it ``fits well''.  Even
if it fits really really well so it almost begs you like a sort of
compulsive taxonomical lust to put the square peg into the square
hole, don't do it, because it's a bad idea!

When applications request memory reservations that are likely to bring
the whole system down due to thrashing, they need to get ENOMEM.  It
isn't okay to change the memory reservation ceiling to the ZFS pool
size, or to any other unreasonably large and not-well-considered
amount, even if the change includes a lot of mealy-mouthed pandering
orbiting around the word ``dynamic''.




Re: [zfs-discuss] Some basic questions about getting the best performance for database usage

2008-07-01 Thread Johan Hartzenberg
On Mon, Jun 30, 2008 at 10:17 AM, Christiaan Willemsen 
[EMAIL PROTECTED] wrote:

 The question is: how can we maximize IO by using the best possible
 combination of hardware and ZFS RAID?

 Here are some generic concepts that still hold true:

More disks can handle more IOs.

Larger disks can put more data on the outer edge, where performance is
better.

If you use disks much larger than your required data set, then head seek
movement will also be minimized. (You can limit seeks further by forcing the
file system to live in a small slice of the disk, whose placement on the disk
you can control.)

Don't put all your disks on a single controller.  Just as more disks can
handle more IOs at a time, so can more controllers issue more instructions
at once.  On the other hand giving each disk a dedicated controller is a
waste because the controller will then be idle most of the time, waiting for
the disk to return results.

RAM, as mentioned before, is your friend.  ZFS will use it liberally.

You mentioned a 70 GB database, so: if you take, say, 10 x 146 GB 15K rpm SAS
disks, set those up in a 4-disk stripe and add a mirror to each disk, you'll
get pretty decent performance.  I read somewhere that ZFS automatically
gives preference to the outer cylinders of a disk when selecting free
blocks, but you could also restrict the ZFS pool to using only the outer, say,
20 GB of each disk by creating slices and adding those to the pool.
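
For example, a 4-wide stripe of mirrored pairs along those lines could be
created with something like this (device names are made up; put each side of a
mirror on a different controller):

  zpool create datapool mirror c1t0d0 c2t0d0 mirror c1t1d0 c2t1d0 \
      mirror c1t2d0 c2t2d0 mirror c1t3d0 c2t3d0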

Note that if you use slices instead of whole disks, you need to manually turn
on the disk write cache (format -e, then the cache options).

If you don't care about tracking file access times, turn it off. (zfs set
atime=off datapool)

Have you decided on a server model yet?  Storage subsystems?  HBAs?  The
specifics in your configuration will undoubtedly get lots of responses from
this list about how to tune each component!  Everything from memory
interleaving to spreading your HBAs across schizo chips.

However, much more important to your actual end result is your application
and DB setup, configuration, and how it is developed.  If the application
developers or the DBAs get it wrong, the system will always be a dog.


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-01 Thread Bob Friesenhahn
On Tue, 1 Jul 2008, Miles Nordin wrote:
 
 okay.  But what is the point?
 
 Pinwheels are a symptom of thrashing.

They seem like the equivalent of the meaningless hourglass icon to me.

 Pinwheels are not showing up when the OS is returning ENOMEM.
 Pinwheels are not ``things fail'', they are ``things are going slower
 than some watcher thinks they should.''

Not all applications demand instant response when they are processing. 
Sometimes they have actual work to do.

 bf It is wrong to confuse total required paging space with
 bf thrashing.  These are completely different issues.
 
 and I did not.  I even rephrased the word ``demand'' in terms of
 thrashing.  I am not confused.

You sound angry.

 When applications request memory reservations that are likely to bring
 the whole system down due to thrashing, they need to get ENOMEM.  It

What is the relationship between the size of the memory reservation 
and thrashing?  Are they somehow related?  I don't see the 
relationship.  It does not bother me if the memory reservation is 10X 
the size of physical memory as long as the access is orderly and not 
under resource contention (i.e. thrashing).  A few days ago I had a 
process running which consumed 48GB of virtual address space without 
doing any noticeable thrashing and with hardly any impact to usability 
of the desktop.

 isn't okay to change the memory reservation ceiling to the ZFS pool
 size, or to any other unreasonably large and not-well-considered
 amount, even if the change includes a lot of mealy-mouthed pandering
 orbiting around the word ``dynamic''.

I have seen mealy worms.  They are kind of ugly but fun to hold in 
your hand and show your friends.  I don't think I would want them 
in my mouth, and I am not sure how I would pander to a worm.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/



Re: [zfs-discuss] Some basic questions about getting the best performance for database usage

2008-07-01 Thread Bob Friesenhahn
On Tue, 1 Jul 2008, Johan Hartzenberg wrote:

 Larger disks can put more data on the outer edge, where performance is
 better.

On the flip side, disks with a smaller form factor produce less 
vibration and are less sensitive to it, so seeks stabilize faster with 
less chance of error.  The platters are also smaller, so they can seek 
faster and more reliably.  Less heat is produced and less energy is 
consumed.  The 2.5" form factor is the better choice if large storage 
capacity is not required.

 get pretty decent performance.  I read somewhere that ZFS automatically
 gives preferences to the outer cylinders of a disk when selecting free
 blocks, but you could also restrict the ZFS pool to using only the outer say
 20 GB of each disk by creating slices and adding those to the pool.

A more effective method would be to place a quota on the filesystem 
which assures that there will always be substantial free space in the 
pool.  Simply decide to not use a large part of the pool space.  With 
lots of free space in the pool, zfs won't have to look very hard for 
more free space to satisfy its COW requirements and it is more likely 
that the allocation is a good one (less fragmentation).
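
For example (the numbers and dataset name are purely illustrative):

  zfs set quota=100G datapool/db
  zfs list -o name,used,available,quota datapool/db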

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/



Re: [zfs-discuss] Some basic questions about getting the best performance for database usage

2008-07-01 Thread Mike Gerdts
On Mon, Jun 30, 2008 at 11:43 AM, Akhilesh Mritunjai
[EMAIL PROTECTED] wrote:
 I'll probably be having 16 Seagate 15K5 SAS disks,
 150 GB each.  Two in HW raid1 for the OS, two in HW
 raid 1 or 10 for the transaction log. The OS does not
 need to be on ZFS, but could be.

 Whatever you do, DO NOT mix zfs and HW RAID.

 ZFS likes to handle redundancy all by itself. It's much smarter than any HW 
 RAID and does NOT like it when it detects data corruption it can't fix 
 (i.e. no replicas). HW RAIDs can't fix data corruption, and that leads to a 
 very unhappy ZFS.

 Let ZFS handle all redundancy.

If you are dealing with a high-end storage array[1] that does RAID-5,
you probably want to do RAID-5 on there, as well as mirroring with
ZFS.  This allows disk replacements to be done using only the internal
paths of the array.  If you push the rebuild of a 1 TB disk to the
server, it causes an unnecessary amount of traffic across shared[2]
components such as CHIPP processors[3], inter-switch-links, etc.
Mirroring then allows zfs to have the bits needed to self-heal.
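
In other words, something like this, where each device is a RAID-5 LUN from a
different array (the LUN names are made up):

  zpool create dbpool mirror c6t0d0 c7t0d0 mirror c6t1d0 c7t1d0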


1. Typically as physically large as the combined size of your fridge,
your mom's fridge, and those of your three best friends that are out
of college and have fridges significantly larger than a keg.
2. Shared as in one server's behavior can and may be somewhat likely
to affect the performance of another.
3. Assuming Hitachi


-- 
Mike Gerdts
http://mgerdts.blogspot.com/


Re: [zfs-discuss] Some basic questions about getting the best performance for database usage

2008-07-01 Thread Akhilesh Mritunjai
I feel I'm being misunderstood.

RAID = Redundant Array of Inexpensive Disks.

I meant to state: let ZFS deal with the redundancy.

If you want to have an 'AID', by all means have your RAID controller do all the 
kinds of striping/mirroring it can to help with throughput or ease of managing 
drives.

Let ZFS deal with the redundancy part. I'm not counting the redundancy offered by 
traditional RAID, as you can see from posts in these forums that:
1. It doesn't work.
2. It bites when you least expect it to.
3. You can do nothing but resort to tapes and a LOT of aspirin when you get 
bitten.

- Akhilesh
 
 


Re: [zfs-discuss] Streaming video and audio over CIFS lags.

2008-07-01 Thread Juho Mäkinen
Here's bonnie++ output with default settings:
Version  1.03   --Sequential Output-- --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
MachineSize K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
sonas 4100M 138010  74 144083  33 76546  19 138071  90 185735  15 464.7   1
--Sequential Create-- Random Create
-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
 files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
 16 24307  99 + +++ + +++ 24438  99 + +++ + +++
sonas,4100M,138010,74,144083,33,76546,19,138071,90,185735,15,464.7,1,16,24307,99,+,+++,+,+++,24438,99,+,+++,+,+++

[EMAIL PROTECTED]:/etc# dladm show-dev
LINKSTATE  SPEEDDUPLEX
rge0up 1000Mb   full

A bit speed testing with dd:
[EMAIL PROTECTED]:/etc# dd if=/dev/zero of=/storagepool/users/test.dump bs=128k
2035941376 bytes (2.0 GB) copied, 23.6579 seconds, 86.1 MB/s

The network rate is about 35-45 MB/s when copying large files from Windows XP 
over CIFS to the ZFS share. At the same time, zpool iostat 5 shows a write speed 
which jumps around in the following manner (these are actual consecutive numbers 
for the write speed): 50.9M, 43.0M, 7.88M, 50.9M, 50.9M, 18.6M, 32.3M, 50.9M

Monitoring iostat -xnz 1 shows the following trend (while copying large files 
from Windows):

three seconds of writing 
r/sw/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
0.0  477.10.0 57656.7  2.8 31.65.9   66.3  97 100 c3t0d0
0.0  469.40.0 56671.8  2.8 31.66.0   67.4  97 100 c3t1d0
0.0  482.60.0 58648.3  2.8 31.65.8   65.5  97 100 c3t2d0

then three seconds without any writing, then again three seconds of writing 
followed by three seconds without writing.

My machine is based on a GIGABYTE GA-P35-DS3P motherboard with an Intel Core 2 Duo 
processor, 2 GB of fast 800 MHz DDR2 RAM in dual-channel configuration, and three 
Seagate 7200.11 750 GB disks connected to the motherboard SATA controller running 
in AHCI mode. The motherboard uses a Realtek 8111B network chip to provide the 
gigabit Ethernet.

Does that info help at all?

 - Juho Mäkinen
 
 


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-01 Thread Miles Nordin
 bf == Bob Friesenhahn [EMAIL PROTECTED] writes:

bf What is the relationship between the size of the memory
bf reservation and thrashing?

The problem is that size-capping is the only control we have over
thrashing right now.  Maybe there are better ways to predict thrashing
than through reservation size, and maybe it's possible to design swap
admission control that's safer and yet also more gracious to your
Java-like reservations of large cold datasets than the flat capping we
have now, but Mac OS doesn't have it.

I suspect there are even *cough* some programs that try to get an idea
how much memory pressure there is, and how often they need to gc, by
making big reservations until they get ENOMEM.  They develop tricks in 
an ecosystem, presuming some reasonable swap cap is configured, so
removing it will mess up their (admittedly clumsy) tricks.

To my view, if the goal is ``manual tuning is bad.  we want to
eliminate swap size as a manual tuneable,'' then the ``dynamic''
aspect of the tuning should be to grow the swap area until it gets too
hot: until the ``demand'' is excessive.  Some WFQ-ish thing might be
in order, too, like a complicated version of ulimit.  But this may be
tricky or impossible, and in any case none of that is on the table so
far: the type of autotuning you are trying to copy from other
operating systems is just to remove the upper limit on swap size
entirely, which is a step backwards.

I think it's the wrong choice for a desktop, but it is a somewhat
workable choice on a single-user machine, where it's often just as
irritating to the user if the one big application in which all his
work is stored crashes as it is if the whole machine grinds to a halt.
But that view is completely incompatible with most Solaris systems as
well as with this fault-isolation, resiliency marketing push with sol10.

so, if you are saying Mac users are happy with dynamic swap: (raises
hand) not happy!  And even if I were, it's not applicable to Solaris.

I think ZFS swap should stay with a fixed-sized (albeit manually
changeable!) cap until Java wizards can integrate some dynamically
self-disciplining swap concepts into their gc algorithms (meaning,
probably forever).

bf You sound angry.

Maybe I am and maybe I'm not, but wouldn't it be better not to bring
this up unless it's interfering with my ability to communicate?
Because if I were, saying I sound angry is poking the monkey through
the bars, likely to make me angrier, which is unpleasant for me and
wastes time for everyone---unless it amuses you or something.  This is
a technical list.  Let's not talk about our feelings, please.


pgpmus2PqUBNK.pgp
Description: PGP signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-01 Thread Jeff Bonwick
 To be honest, it is not quite clear to me, how we might utilize
 dumpadm(1M) to help us to calculate/recommend size of dump device.
 Could you please elaborate more on this ?

dumpadm(1M) -c specifies the dump content, which can be kernel, kernel plus
current process, or all memory.  If the dump content is 'all', the dump space
needs to be as large as physical memory.  If it's just 'kernel', it can be
some fraction of that.
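
For illustration, the three content settings map to dumpadm(1M) options like this 
(a quick sketch; run as root):

  # dumpadm -c kernel    (kernel pages only; the dump device can be a fraction of RAM)
  # dumpadm -c curproc   (kernel pages plus those of the current process)
  # dumpadm -c all       (all of memory; size the dump device at least as large as RAM)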

Jeff
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-01 Thread Jeff Bonwick
 The problem is that size-capping is the only control we have over
 thrashing right now.

It's not just thrashing, it's also any application that leaks memory.
Without a cap, the broken application would continue plowing through
memory until it had consumed every free block in the storage pool.

What we really want is dynamic allocation with lower and upper bounds
to ensure that there's always enough swap space, and that a reasonable
upper limit isn't exceeded.  As fortune would have it, that's exactly
what we get with quotas and reservations on zvol-based swap today.

If you prefer uncapped behavior, no problem -- unset the reservation
and grow the swap zvol to 16EB.
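
Expressed as commands, that amounts to something like this (a sketch, assuming 
the swap zvol is named rpool/swap; depending on how the volume was created, the 
property to clear may be reservation rather than refreservation):

  # zfs set refreservation=none rpool/swap   (volume name assumed)
  # zfs set volsize=16E rpool/swap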

(Ultimately it would be cleaner to express this more directly, rather
than via the nominal size of an emulated volume.  The VM 2.0 project
will address that, along with many other long-standing annoyances.)

Jeff
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-01 Thread Bob Friesenhahn
On Tue, 1 Jul 2008, Miles Nordin wrote:

bf What is the relationship between the size of the memory
bf reservation and thrashing?

 The problem is that size-capping is the only control we have over
 thrashing right now.  Maybe there are better ways to predict thrashing
 than through reservation size, and maybe it's possible to design swap

To be clear, thrashing as pertains to the paging device is due to 
the application making random access to virtual memory which is larger 
than the amount of physical memory on the machine.  This is very 
similar to random access to disk (i.e. not very efficient) and in fact 
it does cause random access to disk.  In a well-designed VM system 
(Solaris is probably second to none), sequential access to virtual 
memory causes reasonably sequential I/O requests to disk.  Stale or 
dirty pages are expunged as needed in order to clear space for new 
requests.  If multiple applications are fighting over the same VM, 
then there can still be thrashing even if their access is orderly.

If using more virtual address space than there is physical address 
space always leads to problems, then it would not have much value.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] behavior of disk identifiers and zpools.

2008-07-01 Thread Demian Phillips
I am using an LSI PCI-X dual port HBA, in a 2 chip opteron system.
Connected to the HBA is a SUN Storagetek A1000 populated with 14 36GB disks.

I have two questions that I think are related.

Initially I set up two raidz2 vdevs, one on each channel, so the pool looked like this:

share
  raidz2 
c2t3d0   
c2t4d0   
c2t5d0   
c2t6d0   
c2t7d0   
c2t8d0   
  raidz2 
c3t9d0   
c3t10d0  
c3t11d0  
c3t12d0  
c3t13d0  
c3t14d0  
spares
  c3t15d0AVAIL
  c2t2d0 AVAIL

With the mpt driver and alternate pathing turned on I could sustain 100MB/s 
throughput into the file systems I create on it.

I was learning the zpool commands and features when I unmounted the file 
systems and exported the pool. This worked and I ran the import according to 
the documentation and that worked, but it added all the disks on c2 instead of 
half on c2 and half on c3 like I had before. Now I am back down to 40MB/s at 
best throughput.

Why did it do that and how can I in such a setup export and import while 
keeping my paths how I want them?

Next question is more of a recent issue.

I posted here asking about replacing the disk, but didn't really find out whether I 
needed to do any work on the OS side.

I had a disk fail and the hot spare took over. I had another spare disk in the 
array and ran the replace using it (I removed it from the spares first). I then 
spun down the bad disk and popped in a replacement.

Bringing it back up I could not add the new disk into the pool (as a 
replacement for the spare I used for the replace) even after running the proper 
utils to scan the bus (and they did run and work).

So I shutdown and rebooted.

The system comes back up fine, and before I go to add the disk I do a zpool 
status and notice that after the boot the disks in the pools have re-arranged 
themselves.

Original zpool:
share
 raidz2
  c2t3d0
  c2t4d0
  c2t5d0
  c2t6d0
  c2t7d0
  c2t8d0  drive that failed
 raidz2
  c2t9d0
  c2t10d0
  c2t11d0
  c2t12d0
  c2t13d0
  c2t14d0
spares
 c2t2d0
 c2t16d0 I have no idea why it isn't t15

I removed the c2t2d0 spare and ran zpool replace using c2t2d0 to replace the 
dead c2t8d0.
I ran a scan just to be sure before I did anything and it checks out fine.
After rebooting it shows up like this (before I add the spare volume back):

share
 raidz2
  c2t3d0
  c2t5d0
  c2t6d0
  c2t7d0
  c2t8d0
  c2t2d0 
 raidz2
  c2t9d0
  c2t10d0
  c2t11d0
  c2t12d0
  c2t13d0
  c2t14d0
spares
 c2t16d0 

The device designated c2t4d0, which was not touched during the replacement, is 
now missing, but c2t8d0, which had failed and been replaced, is there now. I added 
c2t4d0 as a spare, got no errors, and even ran two rescans just to be sure.
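
For the record, the sequence above was essentially (a sketch, using the device 
names from the listing above):

  # zpool remove share c2t2d0           (take the spare out of the pool)
  # zpool replace share c2t8d0 c2t2d0   (use it to replace the failed disk)
  # zpool add share spare c2t4d0        (add c2t4d0 back in as a spare)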

Everything is working ok but I'd like to know why that happened. 

I feel like trying to understand the behavior of the devices is like trying to 
map R'lyeh. I suspect I should name the server/array Cthulhu (if that will fit 
on the little LCD) or maybe Hastur (nothing like seeing that name bounce around 
on the display).
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Streaming video and audio over CIFS lags.

2008-07-01 Thread MC
I mentioned this too, but on the performance forum: 
http://www.opensolaris.org/jive/thread.jspa?threadID=64907&tstart=0

Unfortunately the performance forum has tumbleweeds blowing through it, so that 
was probably the wrong place to complain.  Not that people don't care about 
performance, but the forum is dead.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Streaming video and audio over CIFS lags.

2008-07-01 Thread Richard Elling
MC wrote:
 I mentioned this too, but on the performance forum: 
 http://www.opensolaris.org/jive/thread.jspa?threadID=64907&tstart=0

 Unfortunately the performance forum has tumbleweeds blowing through it, so 
 that was probably the wrong place to complain.  Not that people don't care 
 about performance, but the forum is dead.
   

It looks pretty lively from my browser :-)
http://www.opensolaris.org/jive/forum.jspa?forumID=26&start=0

But it may be that you are experiencing network performance issues
specifically, which pretty quickly descends into hardware+driver
details.  It is unusual for a general performance group to know much
detail about more than a few such combinations.

Note: NIC device drivers can vary widely in their designs and
optimizations.  For large systems, we tend to tune for large-scale
efficiency which de-tunes for small-scale rapid response.  For
example, interrupt coalescing is important for big machines
handling many requests, but it works against rapid response for
a small workload.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-01 Thread Miles Nordin
 bf == Bob Friesenhahn [EMAIL PROTECTED] writes:

bf sequential access to virtual memory causes reasonably
bf sequential I/O requests to disk.

no, thrashing is not when memory is accessed randomly instead of
sequentially.  It's when the working set of pages is too big to fit in
physical RAM.  A program that allocates twice physical RAM size, then
just scans through the entire block over and over, sequentially, will
cause thrashing: the program will run orders of magnitude slower than
it would run if it had enough physical RAM for its working set.  

Yes, I am making assumptions:

 1. more than one program is running.  the other program might just be
xterm, but it's there.

 2. programs that allocate memory expect it to be about as fast as
memory usually is.

But, just read the assumptions.  They're not really assumptions.
They're just definitions of what is RAM, and what is a time-sharing
system.  They're givens.

To benefit, you need your program to loop tens of thousands of times
over one chunk of memory, then stop using that chunk and move on to a
different chunk.  This is typical, but it's not sequential.  It's
temporal and spatial locality.

A ``well-designed'' or ``second-to-none'' VM subsystem combined with
convenient programs that only use sequentially-accessed chunks of
memory does not avoid thrashing if the working set is larger than
physical RAM.

bf If using more virtual address space than there is physical
bf address space always leads to problems, then it would not have
bf much value.

It's useful when some of the pages are almost never used, like the
part of Mozilla's text segment where the mail reader lives, or large
contiguous chunks of memory that have leaked from buggy C daemons that
kill and restart themselves every hour but leak like upside-down
buckets until then, or the getty processes running on tty's with
nothing connected to them.

It's also useful when you tend to use some pages for a while, then use
other pages.  like chew chew chew chew swallow, chew chew chew chew
swallow: maybe this takes two or three times as long to run if the
swallower has to be paged in and out, but at least if you chew for a
few minutes, and if you stop chewing while you swallow like most
people, it won't run 100 times slower.  If you make chewing and
swallowing separate threads, then the working set is now the entire
program, it doesn't fit in physical RAM, and the program thrashes and
runs 100 times slower.

sorry for the tangent.  I'll shut up now.


pgpB8blizI6Om.pgp
Description: PGP signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] missing features? _FIOSATIME ioctl support

2008-07-01 Thread Timothy Baum
Richard L. Hamilton writes:
 _FIOSATIME - why doesn't zfs support this (assuming I didn't just miss it)?
 Might be handy for backups.

Roch Bourbonnais writes:
 Are these syscall sufficent ?
 int utimes(const char *path, const struct timeval times[2]);
 int futimesat(int fildes, const char *path, const struct timeval times[2]);

No, the difference is that utimes() and futimesat() also update ctime, while 
the _FIOSATIME ioctl only changes the atime (requires privilege).  This allows 
backup software or file scanning software to save and restore the original 
atime after processing, where changing the ctime is undesirable.   
Unfortunately, according to the Solaris source browser, the _FIOSATIME ioctl is 
only implemented for ufs, not for zfs.

For example, we use a locally-written file scanner to detect changes to file 
checksums, and would like to preserve atime and ctime.  I was very happy to 
discover the _FIOSATIME ioctl, even though it is not an officially supported 
interface.  Also, star backup software uses this capability.  Assuming 
_FIOSATIME continues to work on ufs, it would make sense to implement on zfs 
and other filesystems for consistency.

I see there is a separate thread on implementing O_NOATIME for open() or 
fcntl(), which would be compatible with Linux (and would avoid updating the 
atime at all, rather than updating it twice).  
http://www.opensolaris.org/jive/thread.jspa?messageID=195813
This presumably would also require support from the filesystem.

I assume that an existing _FIOSATIME ioctl could be implemented for zfs and 
available sooner than a new O_NOATIME flag for open/fcntl, although uniform 
support for O_NOATIME makes more sense in the long run, especially for 
cross-platform compatibility.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Streaming video and audio over CIFS lags.

2008-07-01 Thread MC
 It looks pretty lively from my browser :-)

Now that you showed up ;)

In my case it is OpenSolaris in VirtualBox so I was expecting more cooperation, 
or at least people striving to make them cooperate.

But like you said, this is likely just a case of OpenSolaris being optimized 
for big iron and not home computers.  Which is a shame, because OpenSolaris is 
for home computers.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zfs on top of 6140 FC array

2008-07-01 Thread Justin Vassallo
When set up with  multi-pathing to dual redundant controllers, is layering
zfs on top of the 6140 of any benefit? AFAIK this array does have internal
redundant paths up to the disk connection.

 

justin



smime.p7s
Description: S/MIME cryptographic signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-01 Thread Bob Friesenhahn
On Tue, 1 Jul 2008, Miles Nordin wrote:

 But, just read the assumptions.  They're not really assumptions.
 They're just definitions of what is RAM, and what is a time-sharing
 system.  They're givens.

In today's systems with two or three levels of cache in front of 
RAM, variable page sizes, and huge complexities, these are definitely 
not givens.

 A ``well-designed'' or ``second-to-none'' VM subsystem combined with
 convenient programs that only use sequentially-accessed chunks of
 memory does not avoid thrashing if the working set is larger than
 physical RAM.

This simplistic view was perhaps more appropriate 10 or 15 years ago 
than it is now, when typical systems come with 2GB or more RAM and 
small rack-mount systems can be fitted with 128GB of RAM.

The notion of chewing before moving on is interesting, but it is 
worth noting that it takes some time for applications to chew 
through 2GB or more of RAM, so the simplistic view of working set is now 
pretty dated.  The chew-and-move-on you describe becomes the normal 
case for sequential access.

Regardless, it seems that Solaris should be willing to supply a large 
virtual address space if the application needs it and the 
administrator should have the ability to apply limits. Dynamic 
reservation would decrease administrative overhead and would allow 
large programs to be run without requiring a permanent allocation. 
This would be good for me since then I don't have to permanently 
assign 32GB of space for swap in case I need it.
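
For reference, the permanent allocation I would rather avoid looks roughly like 
this today (a sketch, with a hypothetical volume name):

  # zfs create -V 32g rpool/swap2            (volume name is hypothetical)
  # swap -a /dev/zvol/dsk/rpool/swap2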

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs on top of 6140 FC array

2008-07-01 Thread Erik Trimble

On Wed, 2008-07-02 at 02:22 +0200, Justin Vassallo wrote:
 When set up with  multi-pathing to dual redundant controllers, is
 layering zfs on top of the 6140 of any benefit? AFAIK this array does
 have internal redundant paths up to the disk connection.
 
  
 
 justin
 

Multipathing and redundant controllers in the 6140 get you HARDWARE redundancy, 
so you are (less) susceptible to a piece of hardware killing your setup.

You should _still_ use ZFS redundancy to get all the other benefits, 
from self-healing to true block integrity, none of which is provided by the 
hardware 
(any hardware, for that matter).
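
For example (a sketch, with hypothetical LUN device names), you can mirror two 
LUNs presented by the array so that ZFS can detect and repair bad blocks on its own:

  # zpool create tank mirror c4t0d0 c5t0d0   (pool and device names are hypothetical)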


-- 
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Problem with AOC-SAT2-MV8

2008-07-01 Thread Marc Bevand
Marc Bevand m.bevand at gmail.com writes:
 
 I have recently had to replace this AOC-SAT2-MV8 controller with another one 
 (we accidentally broke a SATA connector during a maintainance operation). Its 
 firmware version is using a totally different numbering scheme (it's probably 
 more recent) and it worked right out-of-the-box on the same motherboard.

I found the time to reboot the aforementioned system today, and the firmware
version displayed during POST by the newer AOC-SAT2-MV8 is Driver Version
3.2.1.3.

-marc

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Checksum question.

2008-07-01 Thread Brian McBride
I have some questions from a customer about zfs checksums.
Could anyone answer some of these? Thanks.

Brian

Customer:
 I would like to know more about zfs's checksum feature.  I'm guessing 
it is something that is applied to the data and not the disks (as in 
raid-5).

 For performance reasons, I turned off checksum on our zfs filesystem 
(along with atime updates).  Because of a concern for possible data 
corruption (silent data corruption), I'm interested in turning checksum 
back on.  When I do so, will it create checksums for existing files or 
will they need to be rewritten?  And can you tell me the overhead 
involved with having checksum active (CPU time, additional space)?
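
For reference, both properties can be changed at any time on a per-filesystem 
basis (a sketch, with a hypothetical dataset name):

  # zfs set checksum=on tank/data            (dataset name is hypothetical)
  # zfs get checksum,atime tank/data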

-- 
Brian McBride
System Support Engineer
Sun Microsystems
Cell: 206-851-1028
Email: [EMAIL PROTECTED]

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss