Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume - updated proposal

2008-07-02 Thread jan damborsky
Dave Miner wrote:
 jan damborsky wrote:
 ...
 [2] dump and swap devices will be considered optional

 dump and swap devices will be considered optional during
 fresh installation and will be created only if there is
 appropriate space available on the disk provided.

 Minimum disk space required will not take into account
 dump and swap, thus allowing the user to install on small disks.
 This will need to be documented (e.g. as part of the release notes),
 so that the user is aware of such behavior.


 I'd like to at least consider whether a warning should be displayed in 
 the GUI about the lack of dump space if it won't be created, since it 
 does represent a serviceability issue.


This is a good suggestion - I think we might display the warning message
on the Summary screen before the user actually starts the installation
process. In this case there is the possibility to go back and change the
disk size. Or we might display the warning dialog earlier - when the user
decides to leave the Disk screen.
I will check with Niall and Frank in order to work out the right solution
from a UI point of view.

Thank you,
Jan

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-02 Thread jan damborsky
Jeff Bonwick wrote:
 To be honest, it is not quite clear to me how we might utilize
 dumpadm(1M) to help us calculate/recommend the size of the dump device.
 Could you please elaborate more on this?

 dumpadm(1M) -c specifies the dump content, which can be kernel, kernel plus
 current process, or all memory.  If the dump content is 'all', the dump space
 needs to be as large as physical memory.  If it's just 'kernel', it can be
 some fraction of that.

I see - thanks a lot for the clarification.

Jan

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-02 Thread David Magda
On Jun 30, 2008, at 19:19, Jeff Bonwick wrote:

 Dump is mandatory in the sense that losing crash dumps is criminal.

 Swap is more complex.  It's certainly not mandatory.  Not so long ago,
 swap was typically larger than physical memory.

These two statements kind of imply that dump and swap are two  
different slices. They certainly can be, but how often are they?

 On my desktop, which has 16GB of memory, the default OpenSolaris  
 swap partition is 2GB.
 That's just stupid.  Unless swap space significantly expands the
 amount of addressable virtual memory, there's no reason to have it.

Quite often swap and dump are the same device, at least in the  
installs that I've worked with, and I think the default for Solaris  
is that if dump is not explicitly specified it defaults to swap, yes?  
Is there any reason why they should be separate?

Having two just seems like a waste to me, even with disk sizes being
what they are (and growing). A separate dump device is only really
needed if something goes completely wrong; otherwise it's just
sitting there doing nothing. If you're panicking, then whatever is
in swap is no longer relevant, so overwriting it is no big deal.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-02 Thread Kyle McDonald
David Magda wrote:

 Quite often swap and dump are the same device, at least in the  
 installs that I've worked with, and I think the default for Solaris  
 is that if dump is not explicitly specified it defaults to swap, yes?  
 Is there any reason why they should be separate?

   
I believe there are technical limitations with ZFS Boot that stop them
from sharing the same zvol.
 Having two just seems like a waste to me, even with disk sizes being
 what they are (and growing). A separate dump device is only really
 needed if something goes completely wrong; otherwise it's just
 sitting there doing nothing. If you're panicking, then whatever is
 in swap is no longer relevant, so overwriting it is no big deal.
   
That said, with all the talk of dynamic sizing: if, during normal
operation, the swap zvol has space allocated and the dump zvol is sized
to 0, then during a panic could the swap volume be shrunk to 0 and the
dump volume expanded to whatever size is needed?

This, while still requiring two zvols, would allow (even when the
rest of the pool is short on space) a close approximation of the old
behavior of sharing the same slice for both swap and dump.

  -Kyle

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-02 Thread Darren J Moffat
David Magda wrote:
 On Jun 30, 2008, at 19:19, Jeff Bonwick wrote:
 
 Dump is mandatory in the sense that losing crash dumps is criminal.

 Swap is more complex.  It's certainly not mandatory.  Not so long ago,
 swap was typically larger than physical memory.
 
 These two statements kind of imply that dump and swap are two  
 different slices. They certainly can be, but how often are they?

If they are ZVOLs then they are ALWAYS different.

 Quite often swap and dump are the same device, at least in the  
 installs that I've worked with, and I think the default for Solaris  
 is that if dump is not explicitly specified it defaults to swap, yes?  

Correct.

 Is there any reason why they should be separate?

You might want dump but not swap.

They may be connected via completely different types of storage
interconnect.  For dump you ideally want the simplest possible route to
the disk.
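
For reference, zvol-backed swap and dump are configured independently
anyway; a minimal sketch, assuming an rpool root pool (names and sizes
are illustrative):

    # separate zvols for swap and dump
    zfs create -V 2g rpool/swap
    zfs create -V 2g rpool/dump
    # add the swap zvol as a swap device
    swap -a /dev/zvol/dsk/rpool/swap
    # point the dump subsystem at the dump zvol
    dumpadm -d /dev/zvol/dsk/rpool/dump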

-- 
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-02 Thread Mike Gerdts
On Wed, Jul 2, 2008 at 10:08 AM, David Magda [EMAIL PROTECTED] wrote:
 Quite often swap and dump are the same device, at least in the
 installs that I've worked with, and I think the default for Solaris
 is that if dump is not explicitly specified it defaults to swap, yes?
 Is there any reason why they should be separate?

Aside from what Kyle just said...

If they are separate you can avoid doing savecore if you are never
going to read it.  For most people, my guess is that savecore just
means that they cause a bunch of thrashing during boot (swap/dump is
typically on the same spindles as /var/crash), waste some space in
/var/crash, and never look at the crash dump.  If you come across a
time when you actually do want to look at it, you can manually run
savecore at some point in the future.

Also, last time I looked (and I've not seen anything to suggest it is
fixed) proper dependencies do not exist to prevent paging activity
after boot from trashing the crash dump in a shared swap+dump device -
even when savecore is enabled.  It is only by luck that you get
anything out of it.  Arguably this should be fixed by proper SMF
dependencies.
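
A minimal sketch of that workflow (the crash directory shown is just
illustrative):

    # tell dumpadm not to run savecore automatically on reboot
    dumpadm -n
    # later, if a particular crash dump turns out to be interesting:
    savecore -v /var/crash/`hostname`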

--
Mike Gerdts
http://mgerdts.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-02 Thread sanjay nadkarni (Laptop)
Mike Gerdts wrote:
 On Wed, Jul 2, 2008 at 10:08 AM, David Magda [EMAIL PROTECTED] wrote:
   
 Quite often swap and dump are the same device, at least in the
 installs that I've worked with, and I think the default for Solaris
 is that if dump is not explicitly specified it defaults to swap, yes?
 Is there any reason why they should be separate?
 

 Aside from what Kyle just said...

 If they are separate you can avoid doing savecore if you are never
 going to read it.  For most people, my guess is that savecore just
 means that they cause a bunch of thrashing during boot (swap/dump is
 typically on the same spindles as /var/crash), waste some space in
 /var/crash, and never look at the crash dump.  If you come across a
 time when you actually do want to look at it, you can manually run
 savecore at some point in the future.

 Also, last time I looked (and I've not seen anything to suggest it is
 fixed) proper dependencies do not exist to prevent paging activity
 after boot from trashing the crash dump in a shared swap+dump device -
 even when savecore is enabled.  It is only by luck that you get
 anything out of it.  Arguably this should be fixed by proper SMF
 dependencies.
   
Really? Back when I looked at it, dumps were written to the back end of
the swap device.  This would prevent paging from writing on top of a
valid dump.  Furthermore, when the system is coming up, savecore was
run very early to grab the core so that paging would not trash it.


-Sanjay

 --
 Mike Gerdts
 http://mgerdts.blogspot.com/
 ___
 caiman-discuss mailing list
 [EMAIL PROTECTED]
 http://mail.opensolaris.org/mailman/listinfo/caiman-discuss
   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-02 Thread Kyle McDonald
sanjay nadkarni (Laptop) wrote:
 Mike Gerdts wrote:
   
 On Wed, Jul 2, 2008 at 10:08 AM, David Magda [EMAIL PROTECTED] wrote:
   
 
 Quite often swap and dump are the same device, at least in the
 installs that I've worked with, and I think the default for Solaris
 is that if dump is not explicitly specified it defaults to swap, yes?
 Is there any reason why they should be separate?
 
   
 Aside from what Kyle just said...

 If they are separate you can avoid doing savecore if you are never
 going to read it.  For most people, my guess is that savecore just
 means that they cause a bunch of thrashing during boot (swap/dump is
 typically on the same spindles as /var/crash), waste some space in
 /var/crash, and never look at the crash dump.  If you come across a
 time when you actually do want to look at it, you can manually run
 savecore at some point in the future.

 Also, last time I looked (and I've not seen anything to suggest it is
 fixed) proper dependencies do not exist to prevent paging activity
 after boot from trashing the crash dump in a shared swap+dump device -
 even when savecore is enabled.  It is only by luck that you get
 anything out of it.  Arguably this should be fixed by proper SMF
 dependencies.
   
 
 Really? Back when I looked at it, dumps were written to the back end of
 the swap device.  This would prevent paging from writing on top of a
 valid dump.  Furthermore, when the system is coming up, savecore was
 run very early to grab the core so that paging would not trash it.

   
I'm guessing Mike is suggesting that making the swap device available
for paging should be dependent on savecore having already completed its
job.

-Kyle

 -Sanjay

   
 --
 Mike Gerdts
 http://mgerdts.blogspot.com/
 ___
 caiman-discuss mailing list
 [EMAIL PROTECTED]
 http://mail.opensolaris.org/mailman/listinfo/caiman-discuss
   
 

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-02 Thread George Wilson
Kyle McDonald wrote:
 David Magda wrote:
   
 Quite often swap and dump are the same device, at least in the  
 installs that I've worked with, and I think the default for Solaris  
 is that if dump is not explicitly specified it defaults to swap, yes?  
 Is there any reason why they should be separate?

   
 
 I believe there are technical limitations with ZFS Boot that stop them
 from sharing the same zvol.
   
Yes, there are. Swap zvols are ordinary zvols which still COW their
blocks and leverage checksumming, etc. Dump zvols don't have this luxury,
because when the system crashes you are limited in the number of tasks
that you can perform. So we solved this by changing the personality of a
zvol when it's added as a dump device. In particular, we needed to make
sure that all the blocks that the dump device cared about were available
at the time of a system crash. So we preallocate the dump device when it
gets created. We also follow a different I/O path when writing to a dump
device, allowing us to behave as if we were a separate partition on the
disk. The dump subsystem doesn't know the difference, which is exactly
what we wanted. :-)
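
As an illustration of the personality change George describes (pool and
volume names are just examples), it happens when the zvol is handed to
dumpadm:

    zfs create -V 1g rpool/dump
    # adding it as the dump device preallocates its blocks
    dumpadm -d /dev/zvol/dsk/rpool/dump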

 Having two just seems like a waste to me, even with disk sizes being
 what they are (and growing). A separate dump device is only really
 needed if something goes completely wrong; otherwise it's just
 sitting there doing nothing. If you're panicking, then whatever is
 in swap is no longer relevant, so overwriting it is no big deal.
   
 
 That said, with all the talk of dynamic sizing: if, during normal
 operation, the swap zvol has space allocated and the dump zvol is sized
 to 0, then during a panic could the swap volume be shrunk to 0 and the
 dump volume expanded to whatever size is needed?
   

Unfortunately that's not possible, for the reasons I mentioned. You can
resize the dump zvol to a smaller size, but unfortunately you can't make
it size 0, as there is a minimum size requirement.
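
Sketching that constraint (names and sizes illustrative): shrinking
works, but only down to the minimum:

    zfs set volsize=512m rpool/dump   # fine, if above the minimum
    zfs set volsize=0 rpool/dump      # fails: below the minimum size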

Thanks,
George
 This, while still requiring two zvols, would allow (even when the
 rest of the pool is short on space) a close approximation of the old
 behavior of sharing the same slice for both swap and dump.

   -Kyle

   
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   
 

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-01 Thread jan damborsky
Hi Jeff,


Jeff Bonwick wrote:
 Neither swap or dump are mandatory for running Solaris.

 Dump is mandatory in the sense that losing crash dumps is criminal.

I think the installer should be tolerant on this point and shouldn't
refuse to proceed with the installation if the user doesn't provide
enough available disk space to create a dump device.

It should probably be documented (for example, mentioned in the release
notes) that when only the minimum disk space is provided for installation,
swap and dump are not created.


 Swap is more complex.  It's certainly not mandatory.  Not so long ago,
 swap was typically larger than physical memory.  But in recent years,
 we've essentially moved to a world in which paging is considered a bug.
 Swap devices are often only a fraction of physical memory size now,
 which raises the question of why we even bother.  On my desktop, which
 has 16GB of memory, the default OpenSolaris swap partition is 2GB.
 That's just stupid.  Unless swap space significantly expands the
 amount of addressable virtual memory, there's no reason to have it.

I agree with you on this point. Since the new formula for calculating
swap and dump sizes will take into account the amount of physical memory,
the values should make more sense.

That said, this is just a default value and certainly wouldn't be feasible
in all situations. However, as this is something which can be changed at
will after installation is done, I would rather keep the formula as simple
as is reasonable.


 There have been a number of good suggestions here:

 (1) The right way to size the dump device is to let dumpadm(1M) do it
 based on the dump content type.

To be honest, it is not quite clear to me how we might utilize
dumpadm(1M) to help us calculate/recommend the size of the dump device.
Could you please elaborate more on this?


 (2) In a virtualized environment, a better way to get a crash dump
 would be to snapshot the VM.  This would require a little bit
 of host/guest cooperation, in that the installer (or dumpadm)
 would have to know that it's operating in a VM, and the kernel
 would need some way to notify the VM that it just panicked.
 Both of these ought to be doable.

Yes - I like this idea as well. But until the appropriate support is
provided by virtualization tools and/or implemented in the kernel, I think
(I might be wrong) that in the installer we will still need to use the
standard mechanisms for now.

Thank you,
Jan

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-01 Thread jan damborsky
Mike Gerdts wrote:
 On Mon, Jun 30, 2008 at 9:19 AM, jan damborsky [EMAIL PROTECTED] wrote:
 Hi Mike,


 Mike Gerdts wrote:
 On Wed, Jun 25, 2008 at 11:09 PM, Jan Damborsky [EMAIL PROTECTED]
 wrote:
 Thank you very much all for this valuable input.

 Based on the collected information, I would take
 following approach as far as calculating size of
 swap and dump devices on ZFS volumes in Caiman
 installer is concerned.

 [1] Following formula would be used for calculating
   swap and dump sizes:

 size_of_swap = size_of_dump = MAX(512 MiB, MIN(physical_memory/2, 32 GiB))
 dump should scale with memory size, but the size given is completely
 overkill.  On very active (heavy kernel activity) servers with 300+ GB
 of RAM, I have never seen a (compressed) dump that needed more than 8
 GB.  Even uncompressed, the maximum size I've seen has been in the 18
 GB range.  This has been without zfs in the mix.  It is my
 understanding that at one time the arc was dumped as part of kernel
 memory, but that was regarded as a bug and has since been fixed.  If
 the arc is dumped, a value of dump much closer to physical memory is
 likely to be appropriate.
 I would agree that, given that the user can customize this any time
 after installation, the smaller the upper bound the better. Would
 it be fine then to use 16 GiB, or would an even smaller one be more
 appropriate?

 By default, only kernel memory is dumped to the dump device.  Further,
 this is compressed.  I have heard that 3x compression is common and
 the samples that I have range from 3.51x - 6.97x.

 If you refer to InfoDoc 228921 (contract only - can that be opened or
 can a Sun employee get permission to post same info to an open wiki?)
 you will see a method for approximating the size of a crash dump.  On
 my snv_91 virtualbox instance (712 MB RAM configured), that method
 gave me an estimated (uncompressed) crash dump size of about 450 MB.
 I induced a panic to test the approximation.  In reality it was 323 MB
 and compress(1) takes it down to 106 MB.  My understanding is that the
 algorithm used in the kernel is a bit less aggressive than the
 algorithm used by compress(1) so maybe figure 120 - 150 MB in this
 case.  My guess is that this did not compress as well as my other
 samples because on this smaller system a higher percentage of my
 kernel pages were not full of zeros.

 Perhaps the right size for the dump device is more like:

 MAX(256 MiB, MIN(physical_memory/4, 16 GiB))

Thanks a lot for making this investigation and collecting
valuable data - I will modify the proposed formula according
to your suggestion.
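
For concreteness, a sketch of the revised calculation in shell (the
prtconf parsing is an assumption about how physical memory might be
read; prtconf reports it in megabytes):

    # MAX(256 MiB, MIN(physical_memory/4, 16 GiB)), sizes in MiB
    phys_mem=`prtconf | awk '/^Memory size/ {print $3}'`
    dump_size=`expr $phys_mem / 4`
    [ $dump_size -gt 16384 ] && dump_size=16384
    [ $dump_size -lt 256 ] && dump_size=256
    echo "dump size: $dump_size MiB"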


 Further, dumpadm(1M) could be enhanced to resize the dump volume on
 demand.  The size that it would choose would likely be based upon what
 is being dumped (kernel, kernel+user, etc.), memory size, current
 estimate using InfoDoc 228921 logic, etc.

 As an aside, does the dedicated dump on all machines make it so that
 savecore no longer runs by default?  It just creates a lot of extra
 I/O during boot (thereby slowing down boot after a crash) and uses a
 lot of extra disk space for those that will never look at a crash
 dump.  Those that actually use it (not the majority target audience
 for OpenSolaris, I would guess) will be able to figure out how to
 enable (the yet non-existent) svc:/system/savecore:default.

 Looking at the savecore(1M) man page, it seems that it is managed
 by svc:/system/dumpadm:default. Looking at the installed system,
 this service is online. If I understand correctly, you are recommending
 disabling it by default?

 dumpadm -n is really the right way to do this.

I see - thanks for clarifying it.

Jan

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-01 Thread jan damborsky
Dave Miner wrote:
 I agree - I am just wondering whether it is fine in general to allow a
 normal, non-experienced user (who is the target audience for the Slim
 installer) to run a system without swap. To be honest, I don't know,
 since I am not very experienced in this area.
 If people agree that this is not an issue at all, I don't have any
 objections against making swap optional.


 Now that we don't have to reserve slices for it, making swap optional in 
 the space calculation is fine.  We don't place any lower limits on 
 memory, and it's just virtual memory, after all.  Besides which, we can 
 infer that the system works well enough for the user's purposes without 
 swap since the boot from the CD won't have used any swap.

That is a good point. Based on this and also on Jeff's comment
I will make swap optional as well.

Thank you,
Jan

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-01 Thread Jürgen Keil
Mike Gerdts wrote:

 By default, only kernel memory is dumped to the dump device.  Further,
 this is compressed.  I have heard that 3x compression is common and
 the samples that I have range from 3.51x - 6.97x.

My samples are in the range 1.95x - 3.66x.  And yes, I lost
a few crash dumps on a box with a 2GB swap slice, after
physical memory was upgraded from 4GB to 8GB.

% grep 'pages dumped' /var/adm/messages*
/var/adm/messages:Jun 27 13:43:56 tiger2 genunix: [ID 409368 kern.notice] 
^M100% done: 593680 pages dumped, compression ratio 3.51, 
/var/adm/messages.0:Jun 25 13:08:22 tiger2 genunix: [ID 409368 kern.notice] 
^M100% done: 234922 pages dumped, compression ratio 2.39, 
/var/adm/messages.1:Jun 12 13:22:53 tiger2 genunix: [ID 409368 kern.notice] 
^M100% done: 399746 pages dumped, compression ratio 1.95, 
/var/adm/messages.1:Jun 12 19:00:01 tiger2 genunix: [ID 409368 kern.notice] 
^M100% done: 245417 pages dumped, compression ratio 2.41, 
/var/adm/messages.1:Jun 16 19:15:37 tiger2 genunix: [ID 409368 kern.notice] 
^M100% done: 710001 pages dumped, compression ratio 3.48, 
/var/adm/messages.1:Jun 16 19:21:35 tiger2 genunix: [ID 409368 kern.notice] 
^M100% done: 315989 pages dumped, compression ratio 3.66, 
/var/adm/messages.2:Jun 11 15:40:32 tiger2 genunix: [ID 409368 kern.notice] 
^M100% done: 341209 pages dumped, compression ratio 2.68,
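
(For scale, assuming 4 KiB x86 pages: the largest of those, 710001
pages, is about 2.7 GiB of kernel pages, written as roughly 800 MB at
its 3.48x ratio.)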
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-01 Thread Darren J Moffat
Jeff Bonwick wrote:
 Neither swap or dump are mandatory for running Solaris.
 
 Dump is mandatory in the sense that losing crash dumps is criminal.

Agreed on that point. I remember all too well, from when I was in Sun
Service, the days when the first dump was always lost because savecore
didn't use to be run!

 Swap is more complex.  It's certainly not mandatory.  Not so long ago,
 swap was typically larger than physical memory.  But in recent years,
 we've essentially moved to a world in which paging is considered a bug.
 Swap devices are often only a fraction of physical memory size now,
 which raises the question of why we even bother.  On my desktop, which
 has 16GB of memory, the default OpenSolaris swap partition is 2GB.
 That's just stupid.  Unless swap space significantly expands the
 amount of addressable virtual memory, there's no reason to have it.

What has always annoyed me about Solaris (and every Linux distro I've
ever used) is that, unlike Windows and MacOS X, we put swap management
(devices and their sizes) into the hands of the admin.  The upside of
this, though, is that it is easy to mirror swap using SVM.

Instead we should take it completely out of their hands and do it all
dynamically when it is needed.  Now that we can swap on a ZVOL, and ZVOLs
can be extended, this is much easier to deal with, and we don't lose the
benefit of protected swap devices (in fact we have much more than we had
with SVM).
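
Even today the manual version of that is at least straightforward; a
minimal sketch of growing swap on a live system (names and sizes
illustrative):

    # remove, grow, and re-add the swap zvol
    swap -d /dev/zvol/dsk/rpool/swap
    zfs set volsize=8g rpool/swap
    swap -a /dev/zvol/dsk/rpool/swap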


-- 
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-01 Thread Mike Gerdts
On Tue, Jul 1, 2008 at 5:56 AM, Darren J Moffat [EMAIL PROTECTED] wrote:
 Instead we should take it completely out of their hands and do it all
 dynamically when it is needed.  Now that we can swap on a ZVOL and ZVOLs
 can be extended this is much easier to deal with and we don't lose the
 benefit of protected swap devices (in fact we have much more than we had
 with SVM).

Are you suggesting that if I have a system that has 500 MB swap free
and someone starts up another JVM with a 16 GB heap that swap should
automatically grow by 16+ GB right at that time?  I have seen times
where applications require X GB of RAM, make the reservation, then
never dirty more than X/2 GB of pages.  In these cases dynamically
growing swap to a certain point may be OK.

In most cases, however, I see this as a recipe for disaster.  I would
rather have an application die (and likely restart via SMF) because it
can't get the memory that it requested than have heavy paging bring
the system to such a crawl that transactions time out and it takes
tens of minutes for administrators to log in and shut down some
workload.  The app that can't start will likely do so during a
maintenance window.  The app that causes the system to crawl will,
with all likelihood, do so during peak production or when the admin is
in bed.

Perhaps bad paging activity (definition needed) should throw some
messages to FMA so that the nice GUI tool that answers the question
"why does my machine suck?" can say that it has been excessively short
on memory X times in recent history.  Any of these approaches is miles
above the Linux approach of finding a memory hog to kill.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-01 Thread Darren J Moffat
Mike Gerdts wrote:
 On Tue, Jul 1, 2008 at 5:56 AM, Darren J Moffat [EMAIL PROTECTED] wrote:
 Instead we should take it completely out of their hands and do it all
 dynamically when it is needed.  Now that we can swap on a ZVOL and ZVOLs
 can be extended this is much easier to deal with and we don't lose the
 benefit of protected swap devices (in fact we have much more than we had
 with SVM).
 
 Are you suggesting that if I have a system that has 500 MB swap free
 and someone starts up another JVM with a 16 GB heap that swap should
 automatically grow by 16+ GB right at that time?  I have seen times
 where applications require X GB of RAM, make the reservation, then
 never dirty more than X/2 GB of pages.  In these cases dynamically
 growing swap to a certain point may be OK.

Not at all, and I don't see how you could get that assumption from what 
I said.  I said dynamically when it is needed.

 In most cases, however, I see this as a recipe for disaster.  I would
 rather have an application die (and likely restart via SMF) because it
 can't get the memory that it requested than have heavy paging bring
 the system to such a crawl that transactions time out and it takes
 tens of minutes for administrators to log in and shut down some
 workload.  The app that can't start will likely do so during a
 maintenance window.  The app that causes the system to crawl will,
 with all likelihood, do so during peak production or when the admin is
 in bed.

I would not favour a system where the admin had no control over swap.
I'm just suggesting that in many cases where swap is actually needed
there is no real need for the admin to be involved in managing the swap,
and its size should not need to be predetermined.

-- 
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-01 Thread Mike Gerdts
On Tue, Jul 1, 2008 at 7:31 AM, Darren J Moffat [EMAIL PROTECTED] wrote:
 Mike Gerdts wrote:

 On Tue, Jul 1, 2008 at 5:56 AM, Darren J Moffat [EMAIL PROTECTED]
 wrote:

 Instead we should take it completely out of their hands and do it all
 dynamically when it is needed.  Now that we can swap on a ZVOL and ZVOLs
 can be extended this is much easier to deal with and we don't lose the
 benefit of protected swap devices (in fact we have much more than we had
 with SVM).

 Are you suggesting that if I have a system that has 500 MB swap free
 and someone starts up another JVM with a 16 GB heap that swap should
 automatically grow by 16+ GB right at that time?  I have seen times
 where applications require X GB of RAM, make the reservation, then
 never dirty more than X/2 GB of pages.  In these cases dynamically
 growing swap to a certain point may be OK.

 Not at all, and I don't see how you could get that assumption from what I
 said.  I said dynamically when it is needed.

I think I came off wrong in my initial message.  I've seen times when
vmstat reports only megabytes of free swap while gigabytes of RAM were
available.  That is, reservations far outstripped actual usage.  Do
you have mechanisms in mind to be able to detect such circumstances
and grow swap to a point that the system can handle more load without
spiraling to a long slow death?

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-01 Thread Jason King
On Tue, Jul 1, 2008 at 8:10 AM, Mike Gerdts [EMAIL PROTECTED] wrote:
 On Tue, Jul 1, 2008 at 7:31 AM, Darren J Moffat [EMAIL PROTECTED] wrote:
 Mike Gerdts wrote:

 On Tue, Jul 1, 2008 at 5:56 AM, Darren J Moffat [EMAIL PROTECTED]
 wrote:

 Instead we should take it completely out of their hands and do it all
 dynamically when it is needed.  Now that we can swap on a ZVOL and ZVOLs
 can be extended this is much easier to deal with and we don't lose the
 benefit of protected swap devices (in fact we have much more than we had
 with SVM).

 Are you suggesting that if I have a system that has 500 MB swap free
 and someone starts up another JVM with a 16 GB heap that swap should
 automatically grow by 16+ GB right at that time?  I have seen times
 where applications require X GB of RAM, make the reservation, then
 never dirty more than X/2 GB of pages.  In these cases dynamically
 growing swap to a certain point may be OK.

 Not at all, and I don't see how you could get that assumption from what I
 said.  I said dynamically when it is needed.

 I think I came off wrong in my initial message.  I've seen times when
 vmstat reports only megabytes of free swap while gigabytes of RAM were
 available.  That is, reservations far outstripped actual usage.  Do
 you have mechanisms in mind to be able to detect such circumstances
 and grow swap to a point that the system can handle more load without
 spiraling to a long slow death?

Having this dynamic would be nice with Oracle.  10g at least will use
DISM in the preferred configuration Oracle is now preaching to DBAs.
I ran into this a few months ago on an upgrade (Solaris 8 to 10,
Oracle 8 to 10g, and a hw upgrade).  The side effect of using DISM is
that it reserves an amount equal to the SGA in swap, and will fail to
start up if swap is too small.  In practice, I don't see the space ever
being touched (I suspect it's mostly there as a requirement for
dynamic reconfiguration w/ DISM, but I didn't bother to dig that far).
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-01 Thread Darren J Moffat
Mike Gerdts wrote:

 Not at all, and I don't see how you could get that assumption from what I
 said.  I said dynamically when it is needed.
 
 I think I came off wrong in my initial message.  I've seen times when
 vmstat reports only megabytes of free swap while gigabytes of RAM were
 available.  That is, reservations far outstripped actual usage. 

Ah that makes it more clear.

  Do you have mechanisms in mind to be able to detect such circumstances
 and grow swap to a point that the system can handle more load without
 spiraling to a long slow death?

I don't as yet, because I haven't had time to think about this.  Maybe
once I've finished with the ZFS Crypto project I'll spend some time
looking at encrypted VM (other than by swapping on an encrypted ZVOL).
At the moment, while it annoys me, it isn't on my todo list to try and
implement a fix.

-- 
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-01 Thread Richard Elling
Darren J Moffat wrote:
 Mike Gerdts wrote:

   
 Not at all, and I don't see how you could get that assumption from what I
 said.  I said dynamically when it is needed.
   
 I think I came off wrong in my initial message.  I've seen times when
 vmstat reports only megabytes of free swap while gigabytes of RAM were
 available.  That is, reservations far outstripped actual usage. 
 

 Ah that makes it more clear.

   Do you have mechanisms in mind to be able to detect such circumstances
   
 and grow swap to a point that the system can handle more load without
 spiraling to a long slow death?
 

 I don't as yet because I haven't had time to think about this.  Maybe 
 once I've finished with the ZFS Crypto project and I spend some time 
 looking at encrypted VM (other than by swapping on an encrypted ZVOL).
 At the moment while it annoys me it isn't on my todo list to try and 
 implement a fix.

   

Here is a good start, BSD's dynamic_pager
http://developer.apple.com/documentation/Darwin/Reference/ManPages/man8/dynamic_pager.8.html

Mike, many people use this all day long and seem to be quite happy.
I think the slow death spiral might be overrated :-)
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-01 Thread Keith Bierman

On Jul 1, 2008, at 10:55 AM, Miles Nordin wrote:

 I don't think it's overrated at all.  People all around me are using
 this dynamic_pager right now, and they just reboot when they see too
 many pinwheels.  If they are ``quite happy,'' it's not with their
 pager.

I often exist in a sea of Mac users, and I've never seen them reboot
other than after the periodic Apple updates. Killing Firefox every
couple of days, or after visiting certain demented sites, is not
uncommon and is probably a good idea.
 

 They see demand as capacity rather than temperature but...the machine
 does need to run out of memory eventually.  Don't drink the
 dynamic_pager futuristic kool-aid.  It's broken, both in theory and in
 the day-to-day experience of the Mac users around me.


I've got macs with uptimes of months ... admittedly not in the same  
territory as my old SunOS or Solaris boxes, but Apple has seldom  
resisted the temptation to drop a security update or a quicktime  
update for longer.

-- 
Keith H. Bierman   [EMAIL PROTECTED]  | AIM kbiermank
5430 Nassau Circle East  |
Cherry Hills Village, CO 80113   | 303-997-2749
speaking for myself* Copyright 2008




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-01 Thread Bob Friesenhahn
On Tue, 1 Jul 2008, Miles Nordin wrote:

 I don't think it's overrated at all.  People all around me are using
 this dynamic_pager right now, and they just reboot when they see too
 many pinwheels.  If they are ``quite happy,'' it's not with their
 pager.

While we have seen these pinwheels under OS X, the cause usually seems
to be application lockup (due to poor application/library design)
and not paging to death.  Paging to death causes lots of
obvious disk churn.

Microsoft Windows includes a dynamic page file as well.

It is wrong to confuse total required paging space with thrashing. 
These are completely different issues.

Dynamic sizing of paging space seems to fit well with the new zfs 
root/boot strategy where everything is shared via a common pool.  If 
you don't use it, you don't lose it.  System resource limits can be 
used to block individual applications from consuming all resources.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-01 Thread Richard Elling
Miles Nordin wrote:
 re == Richard Elling [EMAIL PROTECTED] writes:
 

 re Mike, many people use this all day long and seem to be quite
 re happy.  I think the slow death spiral might be overrated :-)

 I don't think it's overrated at all.  People all around me are using
 this dynamic_pager right now, and they just reboot when they see too
 many pinwheels.  If they are ``quite happy,'' it's not with their
 pager.
   

If you run out of space, things fail.  Pinwheels are a symptom of
running out of RAM, not running out of swap.

 The pinwheel is part of a Mac user's daily vocabulary, and although
 they generally don't know this, it almost always appears because of
 programs that leak memory, grow, and eventually cause thrashing.  They
 do not even realize that restarting Mail or Firefox will fix the
 pinwheels.  They just reboot.  
   

...which frees RAM.

 so obviously it's an unworkable approach.  To them, being forced to
 reboot, even if it takes twenty minutes to shut down as long as it's a
 clean reboot, makes them feel more confident than Firefox unexpectedly
 crashing.  For us, exactly the opposite is true.

 I think dynamic_pager gets it backwards.  ``demand'' is a reason *NOT*
 to increase swap.  If all the allocated pages in swap are
 cold---colder than the disk's io capacity---then there is no
 ``demand'' and maybe it's ok to add some free pages which might absorb
 some warmer data.  If there are already warm pages in swap
 (``demand''), then do not satisfy more of it, instead let swap fill
 and return ENOMEM.
   

You will get more service calls for failures due to ENOMEM than
you will get for pinwheels.  Given the large size of disks in today's
systems, you may never see an ENOMEM.  The goodness here is
that it is one less thing that requires a service touch; even a local
sysadmin service touch costs real $$.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-01 Thread Miles Nordin
 bf == Bob Friesenhahn [EMAIL PROTECTED] writes:
 re == Richard Elling [EMAIL PROTECTED] writes:

re If you run out of space, things fail.  Pinwheels are a symptom
re of running out of RAM, not running out of swap.

okay.  But what is the point?

Pinwheels are a symptom of thrashing.

Pinwheels are not showing up when the OS is returning ENOMEM.
Pinwheels are not ``things fail'', they are ``things are going slower
than some watcher thinks they should.''

AFAICT they show up when the application under the cursor has been
blocked for about five seconds, which is usually because it's
thrashing, though sometimes it's because it's trying to read from an
NFS share that went away (this also causes pinwheels).

bf While we have seen these pinwheels under OS-X, the cause
bf seems to be usually application lockup (due to poor
bf application/library design) and not due to paging to death.

that's simply not my experience.

bf Paging to death causes lots of obvious disk churn.

You can check for it in 'top' on OS X; they list pageins and pageouts.

bf It is wrong to confuse total required paging space with
bf thrashing.  These are completely different issues.

and I did not.  I even rephrased the word ``demand'' in terms of
thrashing.  I am not confused.

bf Dynamic sizing of paging space seems to fit well with the new
bf zfs root/boot strategy where everything is shared via a common
bf pool.

yes, it fits extremely well.

What I'm saying is, do not do it just because it ``fits well''.  Even
if it fits really really well so it almost begs you like a sort of
compulsive taxonomical lust to put the square peg into the square
hole, don't do it, because it's a bad idea!

When applications request memory reservations that are likely to bring
the whole system down due to thrashing, they need to get ENOMEM.  It
isn't okay to change the memory reservation ceiling to the ZFS pool
size, or to any other unreasonably large and not-well-considered
amount, even if the change includes a lot of mealy-mouthed pandering
orbiting around the word ``dynamic''.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-01 Thread Bob Friesenhahn
On Tue, 1 Jul 2008, Miles Nordin wrote:
 
 okay.  But what is the point?
 
 Pinwheels are a symptom of thrashing.

They seem like the equivalent of the meaningless hourglass icon to me.

 Pinwheels are not showing up when the OS is returning ENOMEM.
 Pinwheels are not ``things fail'', they are ``things are going slower
 than some watcher thinks they should.''

Not all applications demand instant response when they are processing. 
Sometimes they have actual work to do.

 bf It is wrong to confuse total required paging space with
 bf thrashing.  These are completely different issues.
 
 and I did not.  I even rephrased the word ``demand'' in terms of
 thrashing.  I am not confused.

You sound angry.

 When applications request memory reservations that are likely to bring
 the whole system down due to thrashing, they need to get ENOMEM.  It

What is the relationship between the size of the memory reservation 
and thrashing?  Are they somehow related?  I don't see the 
relationship.  It does not bother me if the memory reservation is 10X 
the size of physical memory as long as the access is orderly and not 
under resource contention (i.e. thrashing).  A few days ago I had a 
process running which consumed 48GB of virtual address space without 
doing any noticeable thrashing and with hardly any impact to usability 
of the desktop.

 isn't okay to change the memory reservation ceiling to the ZFS pool
 size, or to any other unreasonably large and not-well-considered
 amount, even if the change includes a lot of mealy-mouthed pandering
 orbiting around the word ``dynamic''.

I have seen mealy worms.  They are kind of ugly but fun to hold in
your hand and show your friends.  I don't think I would want them
in my mouth and am not sure how I would pander to a worm.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-01 Thread Miles Nordin
 bf == Bob Friesenhahn [EMAIL PROTECTED] writes:

bf What is the relationship between the size of the memory
bf reservation and thrashing?

The problem is that size-capping is the only control we have over
thrashing right now.  Maybe there are better ways to predict thrashing
than through reservation size, and maybe it's possible to design swap
admission control that's safer and yet also more gracious to your
Java-like reservations of large cold datasets than the flat capping we
have now, but Mac OS doesn't have it.

I suspect there are even (cough) some programs that try to get an idea of
how much memory pressure there is, and how often they need to gc, by
making big reservations until they get ENOMEM.  They develop tricks in
an ecosystem presuming some reasonable swap cap is configured, so
removing it will mess up their (admittedly clumsy) tricks.

To my view, if the goal is ``manual tuning is bad.  we want to
eliminate swap size as a manual tuneable,'' then the ``dynamic''
aspect of the tuning should be to grow the swap area until it gets too
hot: until the ``demand'' is excessive.  Some WFQ-ish thing might be
in order, too, like a complicated version of ulimit.  But this may be
tricky or impossible, and in any case none of that is on the table so
far: the type of autotuning you are trying to copy from other
operating systems is just to remove the upper limit on swap size
entirely, which is a step backwards.

I think it's the wrong choice for a desktop, but it is a somewhat
workable choice on a single-user machine, where it's often just as
irritating to the user if the one big application in which all his
work is stored crashes as it is if the whole machine grinds to a halt.
But that view is completely incompatible with most Solaris systems as
well as with the fault-isolation, resiliency marketing push with sol10.

so, if you are saying Mac users are happy with dynamic swap: *raises
hand*, not happy!  And even if I were, it's not applicable to Solaris.

I think ZFS swap should stay with a fixed-size (albeit manually
changeable!) cap until Java wizards can integrate some dynamically
self-disciplining swap concepts into their gc algorithms (meaning,
probably forever).

bf You sound angry.

Maybe I am and maybe I'm not, but wouldn't it be better not to bring
this up unless it's interfering with my ability to communicate?
Because if I were, saying I sound angry is poking the monkey through
the bars, likely to make me angrier, which is unpleasant for me and
wastes time for everyone---unless it amuses you or something.  This is
a technical list.  Let's not talk about our feelings, please.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-01 Thread Jeff Bonwick
 To be honest, it is not quite clear to me how we might utilize
 dumpadm(1M) to help us calculate/recommend the size of the dump device.
 Could you please elaborate more on this?

dumpadm(1M) -c specifies the dump content, which can be kernel, kernel plus
current process, or all memory.  If the dump content is 'all', the dump space
needs to be as large as physical memory.  If it's just 'kernel', it can be
some fraction of that.
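
A sketch of inspecting and changing that with dumpadm(1M):

    dumpadm            # show current dump device, content type, savecore dir
    dumpadm -c kernel  # kernel pages only: dump device can be a fraction of RAM
    dumpadm -c all     # all memory: dump device must be ~physical memory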

Jeff
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-01 Thread Jeff Bonwick
 The problem is that size-capping is the only control we have over
 thrashing right now.

It's not just thrashing, it's also any application that leaks memory.
Without a cap, the broken application would continue plowing through
memory until it had consumed every free block in the storage pool.

What we really want is dynamic allocation with lower and upper bounds
to ensure that there's always enough swap space, and that a reasonable
upper limit isn't exceeded.  As fortune would have it, that's exactly
what we get with quotas and reservations on zvol-based swap today.

If you prefer uncapped behavior, no problem -- unset the reservation
and grow the swap zvol to 16EB.
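
A minimal sketch of both behaviors on a zvol (dataset name and sizes
illustrative; shown with refreservation):

    # capped: volsize bounds swap, the reservation guarantees the space
    zfs set volsize=4g rpool/swap
    zfs set refreservation=4g rpool/swap
    # effectively uncapped: drop the reservation and oversize the volume
    zfs set refreservation=none rpool/swap
    zfs set volsize=16t rpool/swap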

(Ultimately it would be cleaner to express this more directly, rather
than via the nominal size of an emulated volume.  The VM 2.0 project
will address that, along with many other long-standing annoyances.)

Jeff
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-01 Thread Bob Friesenhahn
On Tue, 1 Jul 2008, Miles Nordin wrote:

bf What is the relationship between the size of the memory
bf reservation and thrashing?

 The problem is that size-capping is the only control we have over
 thrashing right now.  Maybe there are better ways to predict thrashing
 than through reservation size, and maybe it's possible to design swap

To be clear, thrashing, as it pertains to the paging device, is due to
the application making random access to virtual memory which is larger
than the amount of physical memory on the machine.  This is very
similar to random access to disk (i.e. not very efficient) and in fact 
it does cause random access to disk.  In a well-designed VM system 
(Solaris is probably second to none), sequential access to virtual 
memory causes reasonably sequential I/O requests to disk.  Stale or 
dirty pages are expunged as needed in order to clear space for new 
requests.  If multiple applications are fighting over the same VM, 
then there can still be thrashing even if their access is orderly.

If using more virtual address space than there is physical address 
space always leads to problems, then it would not have much value.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-01 Thread Miles Nordin
 bf == Bob Friesenhahn [EMAIL PROTECTED] writes:

bf sequential access to virtual memory causes reasonably
bf sequential I/O requests to disk.

no, thrashing is not when memory is accessed randomly instead of
sequentially.  It's when the working set of pages is too big to fit in
physical RAM.  A program that allocates twice physical RAM size, then
just scans through the entire block over and over, sequentially, will
cause thrashing: the program will run orders of magnitude slower than
it would run if it had enough physical RAM for its working set.  

Yes, I am making assumptions:

 1. more than one program is running.  the other program might just be
xterm, but it's there.

 2. programs that allocate memory expect it to be about as fast as
memory usually is.

But, just read the assumptions.  They're not really assumptions.
They're just definitions of what is RAM, and what is a time-sharing
system.  They're givens.

To benefit, you need your program to loop tens of thousands of times
over one chunk of memory, then stop using that chunk and move on to a
different chunk.  This is typical, but it's not sequential.  It's
temporal and spatial locality.

A ``well-designed'' or ``second-to-none'' VM subsystem combined with
convenient programs that only use sequentially-accessed chunks of
memory does not avoid thrashing if the working set is larger than
physical RAM.

bf If using more virtual address space than there is physical
bf address space always leads to problems, then it would not have
bf much value.

It's useful when some of the pages are almost never used, like the
part of Mozilla's text segment where the mail reader lives, or large
contiguous chunks of memory that have leaked from buggy C daemons that
kill and restart themselves every hour but leak like upside-down
buckets until then, or the getty processes running on tty's with
nothing connected to them.

It's also useful when you tend to use some pages for a while, then use
other pages.  like chew chew chew chew swallow, chew chew chew chew
swallow: maybe this takes two or three times as long to run if the
swallower has to be paged in and out, but at least if you chew for a
few minutes, and if you stop chewing while you swallow like most
people, it won't run 100 times slower.  If you make chewing and
swallowing separate threads, then the working set is now the entire
program, it doesn't fit in physical RAM, and the program thrashes and
runs 100 times slower.

sorry for the tangent.  I'll shut up now.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-01 Thread Bob Friesenhahn
On Tue, 1 Jul 2008, Miles Nordin wrote:

 But, just read the assumptions.  They're not really assumptions.
 They're just definitions of what is RAM, and what is a time-sharing
 system.  They're givens.

In today's systems, with two or three levels of cache in front of
RAM, variable page sizes, and huge complexities, these are definitely
not givens.

 A ``well-designed'' or ``second-to-none'' VM subsystem combined with
 convenient programs that only use sequentially-accessed chunks of
 memory does not avoid thrashing if the working set is larger than
 physical RAM.

This simplistic view was perhaps more appropriate 10 or 15 years ago
than it is now, when typical systems come with 2GB or more RAM and
small rack-mount systems can be fitted with 128GB of RAM.

The notion of chewing before moving on is interesting, but it is
worth noting that it takes some time for applications to chew
through 2GB or more of RAM, so the simplistic view of working set is
now pretty dated.  The chew-and-move-on you describe becomes the normal
case for sequential access.

Regardless, it seems that Solaris should be willing to supply a large 
virtual address space if the application needs it and the 
administrator should have the ability to apply limits. Dynamic 
reservation would decrease administrative overhead and would allow 
large programs to be run without requiring a permanent allocation. 
This would be good for me since then I don't have to permanently 
assign 32GB of space for swap in case I need it.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-06-30 Thread jan damborsky
Hi Darren,


Darren J Moffat wrote:
 Jan Damborsky wrote:
 Thank you very much all for this valuable input.

 Based on the collected information, I would take
 following approach as far as calculating size of
 swap and dump devices on ZFS volumes in Caiman
 installer is concerned.

 [1] Following formula would be used for calculating
 swap and dump sizes:

 size_of_swap = size_of_dump = MAX(512 MiB, MIN(physical_memory/2, 32 GiB))

 The user can reconfigure this after installation on a live
 system with the zfs set command.

 If the min space isn't available, do NOT abort the install; just
 continue without creating swap space, but put a small warning
 somewhere suitable.
 Don't ask the user to confirm this and don't make a big deal about it.


I think it is necessary to have some absolute minimum and not allow
the installer to proceed if the user doesn't provide at least the
minimum required, as we have to make sure that the installation
doesn't fail because of space issues.

As this lower bound is not hard-coded but dynamically calculated
by the installer according to the size of the bits to be installed,
it reflects the actual minimum space needed - it is currently ~4 GiB.

However, the absolute minimum always includes the minimum swap space,
which is now 512 MiB. I think the algorithm might be modified
so that swap space is not created if space doesn't allow it,
but to be honest I don't know if this is what we want to allow
for a normal user.

Thank you,
Jan

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-06-30 Thread Darren J Moffat
jan damborsky wrote:
 I think it is necessary to have some absolute minimum
 and not allow installer to proceed if user doesn't
 provide at least minimum required, as we have to make
 sure that installation doesn't fail because of space
 issues.

I very strongly disagree.

Neither swap or dump are mandatory for running Solaris.

 As this lower bound is not hard-coded but dynamically calculated
 by the installer according to the size of the bits to be installed,
 it reflects the actual minimum space needed - currently ~4 GiB.

Which is unrealistically high given the amount of bits that a 
minimal install actually puts on disk.

 However, the absolute minimum always includes the minimum swap space,
 which is now 512 MiB. I think the algorithm might be modified
 so that swap space is not created if space doesn't allow it,
 but to be honest I don't know if this is what we want to allow

Why not?  Swap is not mandatory.

If there is enough space for the packages that will be installed but not 
enough for swap or dump, then the installation should proceed; it just 
wouldn't create swap or dump.

-- 
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-06-30 Thread jan damborsky
Darren J Moffat wrote:
 jan damborsky wrote:
 I think it is necessary to have some absolute minimum
 and not allow installer to proceed if user doesn't
 provide at least minimum required, as we have to make
 sure that installation doesn't fail because of space
 issues.

 I very strongly disagree.

 Neither swap or dump are mandatory for running Solaris.

I agree with you on this point. Actually, the posted
proposal already counts dump as optional.

I am sorry about the confusion - by minimum space required
I meant minimum disk space for installation, not minimum swap
or dump space.


 As this lower bound is not hard-coded but dynamically calculated
 by the installer according to the size of the bits to be installed,
 it reflects the actual minimum space needed - currently ~4 GiB.

 Which is unrealistically too high given that the actual amount of bits 
 that are put on disk by a minimal install.

The installer currently uses the following formula for
calculating the minimum required disk space:

min_size = image_size * 1.2 + MIN_SWAP_SPACE,

where MIN_SWAP_SPACE is 512MiB.
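
As a worked example, with a purely illustrative 2900 MiB image that
comes out at roughly 4 GiB:

$ echo '2900 * 1.2 + 512' | bc -l
3992.0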


 However, the absolute minimum always includes the minimum swap space,
 which is now 512 MiB. I think the algorithm might be modified
 so that swap space is not created if space doesn't allow it,
 but to be honest I don't know if this is what we want to allow

 Why not?  Swap is not mandatory.

I agree - I am just wondering whether it is fine in general to allow
a normal, non-experienced user (who is the target audience for the Slim
installer) to run a system without swap. To be honest, I don't know,
since I am not very experienced in this area.
If people agree that this is not an issue at all, I don't have any
objections against making swap optional.


 If there is enough space for the packages that will be installed but 
 not enough for swap or dump, then the installation should proceed; it 
 just wouldn't create swap or dump.

Please see above.

Thank you,
Jan

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-06-30 Thread jan damborsky
Hi Mike,


Mike Gerdts wrote:
 On Wed, Jun 25, 2008 at 11:09 PM, Jan Damborsky [EMAIL PROTECTED] wrote:
 Thank you very much all for this valuable input.

 Based on the collected information, I would take
 following approach as far as calculating size of
 swap and dump devices on ZFS volumes in Caiman
 installer is concerned.

 [1] Following formula would be used for calculating
swap and dump sizes:

 size_of_swap = size_of_dump = MAX(512 MiB, MIN(physical_memory/2, 32 GiB))

 dump should scale with memory size, but the size given is completely
 overkill.  On very active (heavy kernel activity) servers with 300+ GB
 of RAM, I have never seen a (compressed) dump that needed more than 8
 GB.  Even uncompressed the maximum size I've seen has been in the 18
 GB range.  This has been without zfs in the mix.  It is my
 understanding that at one time the arc was dumped as part of kernel
 memory but that was regarded as a bug and has since been fixed.  If
 the arc is dumped, a value of dump much closer to physical memory is
 likely to be appropriate.

I would agree that, given that the user can customize this any time
after installation, the smaller the upper bound the better. Would
it then be fine to use 16 GiB, or would an even smaller value be
more appropriate?


 As an aside, does the dedicated dump on all machines make it so that
 savecore no longer runs by default?  It just creates a lot of extra
 I/O during boot (thereby slowing down boot after a crash) and uses a
 lot of extra disk space for those that will never look at a crash
 dump.  Those that actually use it (not the majority target audience
 for OpenSolaris, I would guess) will be able to figure out how to
 enable (the yet non-existent) svc:/system/savecore:default.


Looking at the savecore(1M) man pages, it seems that it is managed
by svc:/system/dumpadm:default. Looking at the installed system,
this service is online. If I understand correctly, you are recommending
disabling it by default?

Thank you,
Jan

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-06-30 Thread Dave Miner

 I agree - I am just wondering whether it is fine in general to allow
 a normal, non-experienced user (who is the target audience for the Slim
 installer) to run a system without swap. To be honest, I don't know,
 since I am not very experienced in this area.
 If people agree that this is not an issue at all, I don't have any
 objections against making swap optional.
 

Now that we don't have to reserve slices for it, making swap optional in 
the space calculation is fine.  We don't place any lower limits on 
memory, and it's just virtual memory, after all.  Besides which, we can 
infer that the system works well enough for the user's purposes without 
swap since the boot from the CD won't have used any swap.

Dave

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-06-30 Thread Mike Gerdts
On Mon, Jun 30, 2008 at 9:19 AM, jan damborsky [EMAIL PROTECTED] wrote:
 Hi Mike,


 Mike Gerdts wrote:

 On Wed, Jun 25, 2008 at 11:09 PM, Jan Damborsky [EMAIL PROTECTED]
 wrote:

 Thank you very much all for this valuable input.

 Based on the collected information, I would take
 following approach as far as calculating size of
 swap and dump devices on ZFS volumes in Caiman
 installer is concerned.

 [1] Following formula would be used for calculating
   swap and dump sizes:

 size_of_swap = size_of_dump = MAX(512 MiB, MIN(physical_memory/2, 32 GiB))

 dump should scale with memory size, but the size given is completely
 overkill.  On very active (heavy kernel activity) servers with 300+ GB
 of RAM, I have never seen a (compressed) dump that needed more than 8
 GB.  Even uncompressed the maximum size I've seen has been in the 18
 GB range.  This has been without zfs in the mix.  It is my
 understanding that at one time the arc was dumped as part of kernel
 memory but that was regarded as a bug and has since been fixed.  If
 the arc is dumped, a value of dump much closer to physical memory is
 likely to be appropriate.

 I would agree that, given that the user can customize this any time
 after installation, the smaller the upper bound the better. Would
 it then be fine to use 16 GiB, or would an even smaller value be
 more appropriate?

By default, only kernel memory is dumped to the dump device.  Further,
this is compressed.  I have heard that 3x compression is common and
the samples that I have range from 3.51x - 6.97x.

If you refer to InfoDoc 228921 (contract only - can that be opened or
can a Sun employee get permission to post same info to an open wiki?)
you will see a method for approximating the size of a crash dump.  On
my snv_91 virtualbox instance (712 MB RAM configured), that method
gave me an estimated (uncompressed) crash dump size of about 450 MB.
I induced a panic to test the approximation.  In reality it was 323 MB
and compress(1) takes it down to 106 MB.  My understanding is that the
algorithm used in the kernel is a bit less aggressive than the
algorithm used by compress(1) so maybe figure 120 - 150 MB in this
case.  My guess is that this did not compress as well as my other
samples because on this smaller system a higher percentage of my
kernel pages were not full of zeros.
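
The InfoDoc itself is contract-only, but a rough public approximation
(my own back-of-the-envelope, not the InfoDoc method) is to read the
Kernel pages figure out of ::memstat and divide by an assumed
compression ratio.  With a hypothetical 115200 kernel pages, 4 KiB
pages, and an assumed 4x compression:

# echo '::memstat' | mdb -k
$ echo '115200 * 4096 / 1024 / 1024 / 4' | bc -l
112.50000000000000000000

i.e. about 450 MB uncompressed, or roughly 112 MB on the dump device.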

Perhaps the right size for the dump device is more like:

MAX(256 MiB, MIN(physical_memory/4, 16 GiB))

Further, dumpadm(1M) could be enhanced to resize the dump volume on
demand.  The size that it would choose would likely be based upon what
is being dumped (kernel, kernel+user, etc.), memory size, current
estimate using InfoDoc 228921 logic, etc.

 As an aside, does the dedicated dump on all machines make it so that
 savecore no longer runs by default?  It just creates a lot of extra
 I/O during boot (thereby slowing down boot after a crash) and uses a
 lot of extra disk space for those that will never look at a crash
 dump.  Those that actually use it (not the majority target audience
 for OpenSolaris, I would guess) will be able to figure out how to
 enable (the yet non-existent) svc:/system/savecore:default.


 Looking at the savecore(1M) man pages, it seems that it is managed
 by svc:/system/dumpadm:default. Looking at the installed system,
 this service is online. If I understand correctly, you are recommending
 disabling it by default?

dumpadm -n is really the right way to do this.
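
That is (a sketch; -n keeps the dump device configured but skips
savecore on reboot, and -y turns savecore back on later):

# /usr/sbin/dumpadm -n
# /usr/sbin/dumpadm -y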

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-06-30 Thread Jeff Bonwick
 Neither swap or dump are mandatory for running Solaris.

Dump is mandatory in the sense that losing crash dumps is criminal.

Swap is more complex.  It's certainly not mandatory.  Not so long ago,
swap was typically larger than physical memory.  But in recent years,
we've essentially moved to a world in which paging is considered a bug.
Swap devices are often only a fraction of physical memory size now,
which raises the question of why we even bother.  On my desktop, which
has 16GB of memory, the default OpenSolaris swap partition is 2GB.
That's just stupid.  Unless swap space significantly expands the
amount of addressable virtual memory, there's no reason to have it.

There have been a number of good suggestions here:

(1) The right way to size the dump device is to let dumpadm(1M) do it
based on the dump content type.

(2) In a virtualized environment, a better way to get a crash dump
would be to snapshot the VM.  This would require a little bit
of host/guest cooperation, in that the installer (or dumpadm)
would have to know that it's operating in a VM, and the kernel
would need some way to notify the VM that it just panicked.
Both of these ought to be doable.

Jeff
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-06-30 Thread John Levon
On Mon, Jun 30, 2008 at 04:19:15PM -0700, Jeff Bonwick wrote:

 (2) In a virtualized environment, a better way to get a crash dump
 would be to snapshot the VM.  This would require a little bit
 of host/guest cooperation, in that the installer (or dumpadm)
 would have to know that it's operating in a VM, and the kernel
 would need some way to notify the VM that it just panicked.
 Both of these ought to be doable.

This is trivial with xVM, btw: just make the panic routine call
HYPERVISOR_shutdown(SHUTDOWN_crash); and dom0 will automatically create a
full crash dump for the domain, which is readable directly in MDB.

As a refinement you might want to only do this if a (suitable) place to
crash dump isn't available.
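
From the dom0 side the same thing can also be done by hand (a sketch,
assuming the xm toolstack; domain name and path are illustrative):

# xm dump-core mydomain /var/tmp/mydomain.core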

regards
john
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-06-27 Thread Robert Milkowski
Hello Mike,

Wednesday, June 25, 2008, 9:36:16 PM, you wrote:

MG On Wed, Jun 25, 2008 at 3:09 PM, Robert Milkowski [EMAIL PROTECTED] wrote:
 Well, I've seen core dumps bigger than 10GB (even without ZFS)... :)

MG Was that the size in the dump device or the size in /var/crash?  If it
MG was the size in /var/crash, divide that by the compress ratio reported
MG on the console after the dump completed.


Good point - it was the file size in /var/crash, so uncompressed.

-- 
Best regards,
 Robert                        mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-06-25 Thread Jan Damborsky
Thank you very much all for this valuable input.

Based on the collected information, I would take the following
approach to calculating the size of the swap and dump devices
on ZFS volumes in the Caiman installer:

[1] The following formula would be used for calculating
swap and dump sizes:

size_of_swap = size_of_dump = MAX(512 MiB, MIN(physical_memory/2, 32 GiB))

The user can reconfigure this after installation on the live
system with the zfs set command.
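
As a sketch of what that formula works out to on a live system (this
reads prtconf's Memory size: line and is not installer code):

$ PHYS_MB=`/usr/sbin/prtconf 2>/dev/null | awk '/^Memory size/ {print $3}'`
$ SIZE_MB=`expr $PHYS_MB / 2`
$ [ $SIZE_MB -gt 32768 ] && SIZE_MB=32768
$ [ $SIZE_MB -lt 512 ] && SIZE_MB=512
$ echo "swap = dump = ${SIZE_MB} MiB"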

[2] dump device will be considered optional

The dump device will be created only if there is appropriate
space available on the disk provided.

The minimum disk space required will not take the dump device
into account, thus allowing the user to install on small disks.

The recommended disk size (which now covers one full upgrade plus
2 GiB of space for additional software) will take the dump device
into account as well. The dump device will then be created if the user
dedicates at least the recommended disk space for installation.

Please feel free to correct me, if I misunderstood some point.

Thank you very much again,
Jan


Dave Miner wrote:
 Peter Tribble wrote:
 On Tue, Jun 24, 2008 at 8:27 PM, Dave Miner [EMAIL PROTECTED] wrote:
 Keith Bierman wrote:
 A lot of developers use VMs of one sort or another these days, and
 few of them use jumpstart (especially when the entire point of the
 exercise is to get their feet wet on new platforms, or new versions
 of old platforms).

 Perhaps I travel in the wrong circles these days.
 All they'd have to do under my suggested solution is make the virtual
 disk large enough to get a dump pool created automatically.  Our
 recommended sizing would encompass that.

 So remind me again - what is our recommended sizing? (Especially
 in the light of this discussion.)



 Dynamically calculated based on info recorded in the image.

 http://src.opensolaris.org/source/xref/caiman/slim_source/usr/src/lib/liborchestrator/perform_slim_install.c#om_get_recommended_size
  


 It's in the 4+ GB range right now.

 Dave

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-06-25 Thread Darren J Moffat
Jan Damborsky wrote:
 Thank you very much all for this valuable input.
 
 Based on the collected information, I would take
 following approach as far as calculating size of
 swap and dump devices on ZFS volumes in Caiman
 installer is concerned.
 
 [1] Following formula would be used for calculating
 swap and dump sizes:
 
 size_of_swap = size_of_dump = MAX(512 MiB, MIN(physical_memory/2, 32 GiB))
 
 User can reconfigure this after installation is done on live
 system by zfs set command.

If the min space isn't available, do NOT abort the install; just continue 
without creating swap space, but put a small warning somewhere suitable.
Don't ask the user to confirm this and don't make a big deal about it.

-- 
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-06-25 Thread Mike Gerdts
On Wed, Jun 25, 2008 at 11:09 PM, Jan Damborsky [EMAIL PROTECTED] wrote:
 Thank you very much all for this valuable input.

 Based on the collected information, I would take
 following approach as far as calculating size of
 swap and dump devices on ZFS volumes in Caiman
 installer is concerned.

 [1] Following formula would be used for calculating
swap and dump sizes:

 size_of_swap = size_of_dump = MAX(512 MiB, MIN(physical_memory/2, 32 GiB))

dump should scale with memory size, but the size given is completely
overkill.  On very active (heavy kernel activity) servers with 300+ GB
of RAM, I have never seen a (compressed) dump that needed more than 8
GB.  Even uncompressed the maximum size I've seen has been in the 18
GB range.  This has been without zfs in the mix.  It is my
understanding that at one time the arc was dumped as part of kernel
memory but that was regarded as a bug and has since been fixed.  If
the arc is dumped, a value of dump much closer to physical memory is
likely to be appropriate.

As an aside, does the dedicated dump on all machines make it so that
savecore no longer runs by default?  It just creates a lot of extra
I/O during boot (thereby slowing down boot after a crash) and uses a
lot of extra disk space for those that will never look at a crash
dump.  Those that actually use it (not the majority target audience
for OpenSolaris, I would guess) will be able to figure out how to
enable (the yet non-existent) svc:/system/savecore:default.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-06-25 Thread Mike Gerdts
On Wed, Jun 25, 2008 at 3:09 PM, Robert Milkowski [EMAIL PROTECTED] wrote:
 Well, I've seen core dumps bigger than 10GB (even without ZFS)... :)

Was that the size in the dump device or the size in /var/crash?  If it
was the size in /var/crash, divide that by the compress ratio reported
on the console after the dump completed.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-06-25 Thread Mike Gerdts
On Wed, Jun 25, 2008 at 3:36 PM, Mike Gerdts [EMAIL PROTECTED] wrote:
 On Wed, Jun 25, 2008 at 3:09 PM, Robert Milkowski [EMAIL PROTECTED] wrote:
 Well, I've seen core dumps bigger than 10GB (even without ZFS)... :)

 Was that the size in the dump device or the size in /var/crash?  If it
 was the size in /var/crash, divide that by the compress ratio reported
 on the console after the dump completed.

I just did some digging for real life examples.  Here are some
extremes.  The first one is extreme in size and the second one is
extreme in compression ratio.  All of my samples (~20) had compression
ratios that ranged from 3.51 to 6.97.

100% done: 1946979 pages dumped, compression ratio 4.01, dump succeeded
100% done: 501196 pages dumped, compression ratio 6.97, dump succeeded

$ echo '1946979 * 8 / 1024 /1024 / 4.01' | bc -l
3.70430696634877649625
$ echo '501196 * 8 / 1024 /1024 / 6.97' | bc -l
.54861148084424318507

The first one is 14.8 GB uncompressed, but wrote 3.7 GB to dump.  The
second one was 3.8 GB uncompressed but wrote 0.5 GB to dump.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-06-25 Thread Robert Milkowski
Hello Mike,

Wednesday, June 25, 2008, 2:09:31 PM, you wrote:


MG dump should scale with memory size, but the size given is completely
MG overkill.  On very active (heavy kernel activity) servers with 300+ GB
MG of RAM, I have never seen a (compressed) dump that needed more than 8
MG GB.  Even uncompressed the maximum size I've seen has been in the 18
MG GB range.  This has been without zfs in the mix.  It is my
MG understanding that at one time the arc was dumped as part of kernel
MG memory but that was regarded as a bug and has since been fixed.  If
MG the arc is dumped, a value of dump much closer to physical memory is
MG likely to be appropriate.

Well, I've seen core dumps bigger than 10GB (even without ZFS)... :)



-- 
Best regards,
 Robert Milkowski   mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-06-25 Thread Robert Milkowski
Hello Darren,

Wednesday, June 25, 2008, 1:19:53 PM, you wrote:

DJM Jan Damborsky wrote:
 Thank you very much all for this valuable input.
 
 Based on the collected information, I would take
 following approach as far as calculating size of
 swap and dump devices on ZFS volumes in Caiman
 installer is concerned.
 
 [1] Following formula would be used for calculating
 swap and dump sizes:
 
 size_of_swap = size_of_dump = MAX(512 MiB, MIN(physical_memory/2, 32 GiB))
 
 User can reconfigure this after installation is done on live
 system by zfs set command.

DJM If the min space isn't available do NOT abort the install just continue
DJM without creating swap space, but put a small warning somewhere suitable.
DJM Don't ask the user to confirm this and don't make a big deal about it.


Yeah, I've just tried to install snv_91 on a 16GB CF card with ZFS as
the root file system... and I couldn't, because it wanted almost 40GB
of disk space and I couldn't override it. I know it is not Caiman. Such
behavior is irritating.


-- 
Best regards,
 Robert Milkowskimailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-06-24 Thread jan damborsky
Hi Lori,


Lori Alt wrote:
 Richard Elling wrote:
 Hi Jan, comments below...

 jan damborsky wrote:
   
 Hi folks,

 I am member of Solaris Install team and I am currently working
 on making Slim installer compliant with ZFS boot design specification:

 http://opensolaris.org/os/community/arc/caselog/2006/370/commitment-materials/spec-txt/

 After ZFS boot project was integrated into Nevada and support
 for installation on ZFS root delivered into legacy installer,
 some differences occurred between how Slim installer implements
 ZFS root and how it is done in legacy installer.

 One part that we need to change in the Slim installer is to create
 swap & dump on a ZFS volume instead of utilizing a UFS slice for this,
 as defined in the design spec and implemented in the SXCE installer.

 When reading through the specification and looking at SXCE
 installer source code, I have realized some points are not quite
 clear to me.

 Could I please ask you to help me clarify them in order to
 follow the right way as far as the implementation of those features
 is concerned?

 Thank you very much,
 Jan


 [i] Formula for calculating dump & swap size
 

 I have gone through the specification and found that the
 following formula should be used for calculating the default
 size of swap & dump during installation:

 o size of dump: 1/4 of physical memory
   
 

 This is a non-starter for systems with 1-4 TBytes of physical
 memory.  There must be a reasonable maximum cap, most
 likely based on the size of the pool, given that we regularly
 boot large systems from modest-sized disks.
 Actually, starting with build 90, the legacy installer sets the 
 default size of the
 swap and dump zvols to half the size of physical memory, but no more
 than 32 GB and no less than 512 MB.   Those are just the defaults.
 Administrators can use the zfs command to modify the volsize
 property of both the swap and dump zvols (to any value, including
 values larger than 32 GB).

Agreed - the formula [i] is mentioned in the PSARC document, but
the implementation I investigated in the latest SXCE installer
code is exactly what you are describing here.

Since that calculation is part of the PSARC case, I assumed that every
implementation of ZFS root should follow it in order to be
fully compliant with the ZFS root design?




 o size of swap: max of (512MiB, 1% of rpool size)

 However, looking at the source code, SXCE installer
 calculates default sizes using slightly different
 algorithm:

 size_of_swap = size_of_dump = MAX(512 MiB, MIN(physical_memory/2, 32 GiB))

 Are there any preferences which one should be used or is
 there any other possibility we might take into account ?
   
 

 zero would make me happy :-)  But there are some cases where swap
 space is preferred.  Again, there needs to be a reasonable cap.  In
 general, the larger the system, the less use for swap during normal
 operations, so for most cases there is no need for really large swap
 volumes.  These can also be adjusted later, so the default can be
 modest.  One day perhaps it will be fully self-adjusting like it is
 with other UNIX[-like] implementations.

   
 [ii] Procedure of creating dump & swap
 --

 Looking at the SXCE source code, I have discovered that the following
 commands should be used for creating swap & dump:

 o swap
 # /usr/sbin/zfs create -b PAGESIZE -V <size_in_mb>m rpool/swap
 # /usr/sbin/swap -a /dev/zvol/dsk/rpool/swap

 o dump
 # /usr/sbin/zfs create -b 128*1024 -V <size_in_mb>m rpool/dump
 # /usr/sbin/dumpadm -d /dev/zvol/dsk/rpool/dump

 
 The above commands for creating the swap and dump zvols match
 what the legacy installer does, as of build 90.

OK - then I will use this implementation in the Slim installer as well.


 Could you please let me know, if my observations are correct
 or if I should use different approach ?

 As far as setting of volume block size is concerned (-b option),
 how those numbers are to be determined? Will they be the same in
 different scenarios or are there plans to tune them in some way
 in future ?
 
 There are no plans to tune this.  The block sizes are appropriate
 for the way the zvols are to be used.

I see - thanks for clarification.


   
 

 Setting the swap blocksize to pagesize is interesting, but should be
 ok for most cases.  The reason I say it is interesting is because it
 is optimized for small systems, but not for larger systems which
 typically see more use of large page sizes.  OTOH larger systems
 should not swap, so it is probably a non-issue for them.  Small
 systems should see this as the best solution.

 Dump just sets the blocksize to the default, so it is a no-op.
  -- richard

   
 [iii] Is there anything else I should be aware of ?
 ---
   
 

 Installation should *not* fail due to running out of space because
 of large dump or swap allocations.  I think the algorithm should
 first take into account the space available in the pool after
 accounting for the OS.

Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-06-24 Thread Mike Gerdts
On Mon, Jun 23, 2008 at 11:58 AM, Lori Alt [EMAIL PROTECTED] wrote:
 The Caiman team can make their own decision here, but we
 decided to be more hard-nosed about disk space requirements in the
 legacy install.  If the pool is too small to accommodate the recommended
 swap and dump zvols, then maybe this system isn't a good candidate for
 a zfs root pool.  Basically, we decided that since you almost
 can't buy disks smaller than 60 GB these days, it's not worth much
 effort to facilitate the setup of zfs root pools on disks that are smaller
 than that.  If you really need to do so, Jumpstart can be used to
 set the dump and swap sizes to whatever you like, at the time
 of initial install.

This is extremely bad for virtualized environments.  If I have a
laptop with 150 GB disk, a dual core processor, and 4 GB of RAM I
would expect that I should have plenty of room to install 10+ virtual
machines, and be able to run up to 2 - 4 of them at a time.  Requiring
60 GB would mean that I could only install 2 virtual machines - which
is on par with what I was doing with my previous laptop that had a 30
GB disk.

The same argument can be made for VMware, LDoms, Xen, etc., but those
are much more likely to use jumpstart for installations than
laptop-based VM's.


-- 
Mike Gerdts
http://mgerdts.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-06-24 Thread Lori Alt



Mike Gerdts wrote:


On Mon, Jun 23, 2008 at 11:58 AM, Lori Alt [EMAIL PROTECTED] wrote:
 


The Caiman team can make their own decision here, but we
decided to be more hard-nosed about disk space requirements in the
legacy install.  If the pool is too small to accommodate the recommended
swap and dump zvols, then maybe this system isn't a good candidate for
a zfs root pool.  Basically, we decided that since you almost
can't buy disks smaller than 60 GB these days, it's not worth much
effort to facilitate the setup of zfs root pools on disks that are smaller
than that.  If you really need to do so, Jumpstart can be used to
set the dump and swap sizes to whatever you like, at the time
of initial install.
   



This is extremely bad for virtualized environments.  If I have a
laptop with 150 GB disk, a dual core processor, and 4 GB of RAM I
would expect that I should have plenty of room to install 10+ virtual
machines, and be able to run up to 2 - 4 of them at a time.  Requiring
60 GB would mean that I could only install 2 virtual machines - which
is on par with what I was doing with my previous laptop that had a 30
GB disk.

The same argument can be made for VMware, LDoms, Xen, etc., but those
are much more likely to use jumpstart for installations than
laptop-based VM's.

 


This is a good point.  Perhaps at some point we should add back the
capability of overriding the default swap/dump sizes in the interactive
install.  However, swap can't always be reduced by much.  The default swap
sizes we chose were not totally arbitrary.  But of course, environments
differ widely.  In some environments, it's probably reasonable to run
with little or no swap.


Right now, you have two options to override the default swap and dump
sizes: use Jumpstart to do the install, or modify the sizes of the swap
and dump zvols after the install completes (using the zfs set command
to modify the volsize).
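
Concretely (a sketch, assuming the default rpool/swap and rpool/dump
zvols; 1G is illustrative, and an in-use swap zvol should be removed
from the swap pool before shrinking it):

# zfs set volsize=1G rpool/dump
# swap -d /dev/zvol/dsk/rpool/swap
# zfs set volsize=1G rpool/swap
# swap -a /dev/zvol/dsk/rpool/swap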

The Caiman team may wish to offer more configurability in this
regard in their install procedure.

Lori
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-06-24 Thread Richard Elling
jan damborsky wrote:
 Hi Lori,


 Lori Alt wrote:

 The Caiman team can make their own decision here, but we
 decided to be more hard-nosed about disk space requirements in the
 legacy install.  If the pool is too small to accommodate the recommended
 swap and dump zvols, then maybe this system isn't a good candidate for
 a zfs root pool.  Basically, we decided that since you almost
 can't buy disks smaller than 60 GB these days, it's not worth much
 effort to facilitate the setup of zfs root pools on disks that are 
 smaller
 than that.  If you really need to do so, Jumpstart can be used to
 set the dump and swap sizes to whatever you like, at the time
 of initial install.

 I would agree with you as far as internal disks are concerned.
 However, since the Slim installer also allows installing on, for
 example, USB sticks, which are smaller, the minimum required space
 might be an issue.

With ZFS, the actual space used is difficult to predict, so there
should be some leeway allowed.  For USB sticks, I'm generally
using compression and copies=2, both of which radically change
the actual space used.  It is unlikely that we can fit 5 lbs of
flour in a 1 lb bag, but it may not be impossible.
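
For reference, a sketch of the two properties mentioned, applied to
the pool's top-level dataset (copies=2 only affects data written
after the property is set):

# zfs set compression=on rpool
# zfs set copies=2 rpool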


 Do we need to create two separate volumes for swap and dump, or
 might one ZFS volume be enough, shared by both swap and dump?

IMHO, you can make dump optional, with no dump being the default.
Before Sommerfeld pounces on me (again :-), let me defend myself:
the vast majority of people will never get a core dump, and if
they did, they wouldn't know what to do with it.  We will just end
up wasting a bunch of space.  As Solaris becomes more popular, this
problem becomes bigger.  OTOH, people who actually care about
core dumps can enable them quite easily. WWMSD?
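
Enabling them later really is just (a sketch, assuming a root pool
named rpool; the size is illustrative):

# /usr/sbin/zfs create -V 2G rpool/dump
# /usr/sbin/dumpadm -d /dev/zvol/dsk/rpool/dump
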
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-06-24 Thread Dave Miner
Lori Alt wrote:
 
 
 Mike Gerdts wrote:
 On Mon, Jun 23, 2008 at 11:58 AM, Lori Alt [EMAIL PROTECTED] wrote:
   
 The Caiman team can make their own decision here, but we
 decided to be more hard-nosed about disk space requirements in the
 legacy install.  If the pool is too small to accommodate the recommended
 swap and dump zvols, then maybe this system isn't a good candidate for
 a zfs root pool.  Basically, we decided that since you almost
 can't buy disks smaller than 60 GB these days, it's not worth much
 effort to facilitate the setup of zfs root pools on disks that are smaller
 than that.  If you really need to do so, Jumpstart can be used to
 set the dump and swap sizes to whatever you like, at the time
 of initial install.
 

 This is extremely bad for virtualized environments.  If I have a
 laptop with 150 GB disk, a dual core processor, and 4 GB of RAM I
 would expect that I should have plenty of room to install 10+ virtual
 machines, and be able to run up to 2 - 4 of them at a time.  Requiring
 60 GB would mean that I could only install 2 virtual machines - which
 is on par with what I was doing with my previous laptop that had a 30
 GB disk.

 The same argument can be made for VMware, LDoms, Xen, etc., but those
 are much more likely to use jumpstart for installations than
 laptop-based VM's.

   
 This is a good point.  Perhaps at some point we should add back the
 capability of overriding the default swap/dump sizes in the interactive
 install.  However, swap can't always be reduced by much.  The default swap
 sizes we chose were not totally arbitrary.  But of course, environments 
 differ
 widely.  In some environments, it's probably reasonable to run with little
 or no swap. 
 
 Right now, you have two options to override the default swap and dump
 sizes: use Jumpstart to do the install, or modify the sizes of the swap
 and dump zvols after the install completes (using the zfs set command
 to modify the volsize)
 
 The Caiman team may wish to offer more configurability in this
 regard in their install procedure.
 

I doubt we'd have interest in providing more configurability in the 
interactive installer.  As Richard sort of points out subsequently, most 
people wouldn't know what to do here, anyway, and the ones who do 
usually use automated provisioning like Jumpstart, where we can provide 
those options.

That said, I'd like to default to having a dump volume when space 
allows, so that we are in a position to gather crash dumps, since 
reproducing them is usually not easy, and almost always undesirable. 
It'd be lower priority than having enough space for 2-3 BE's plus swap, 
so might be automatically dropped when space is less than that.

Dave
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-06-24 Thread Bill Sommerfeld
On Tue, 2008-06-24 at 09:41 -0700, Richard Elling wrote:
 IMHO, you can make dump optional, with no dump being default. 
 Before Sommerfeld pounces on me (again :-))

actually, in the case of virtual machines, doing the dump *in* the
virtual machine into preallocated virtual disk blocks is silly.  If you
can break the abstraction barriers a little, I'd think it would make
more sense for the virtual machine infrastructure to create some sort of
snapshot at the time of failure, which could then be converted into a
form that mdb can digest...

- Bill






___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-06-24 Thread Dave Miner
Keith Bierman wrote:
 On Jun 24, 2008, at 11:01 AM, Dave Miner wrote:
 
 I doubt we'd have interest in providing more configurability in the
 interactive installer.  As Richard sort of points out subsequently,  
 most
 people wouldn't know what to do here, anyway, and the ones who do
 usually use automated provisioning like Jumpstart, where we can  
 provide
 those options.
 
 
 A lot of developers use VMs of one sort or another these days, and  
 few of them use jumpstart (especially when the entire point of the  
 exercise is to get their feet wet on new platforms, or new versions  
 of old platforms).
 
 Perhaps I travel in the wrong circles these days.
 

All they'd have to do under my suggested solution is make the virtual 
disk large enough to get a dump pool created automatically.  Our 
recommended sizing would encompass that.

I do like Bill's suggestion of getting VM's to snapshot the VM on panic, 
though.

Dave
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-06-23 Thread Lori Alt

Richard Elling wrote:

Hi Jan, comments below...

jan damborsky wrote:
  

Hi folks,

I am member of Solaris Install team and I am currently working
on making Slim installer compliant with ZFS boot design specification:

http://opensolaris.org/os/community/arc/caselog/2006/370/commitment-materials/spec-txt/

After ZFS boot project was integrated into Nevada and support
for installation on ZFS root delivered into legacy installer,
some differences occurred between how Slim installer implements
ZFS root and how it is done in legacy installer.

One part that we need to change in the Slim installer is to create
swap & dump on a ZFS volume instead of utilizing a UFS slice for this,
as defined in the design spec and implemented in the SXCE installer.

When reading through the specification and looking at SXCE
installer source code, I have realized some points are not quite
clear to me.

Could I please ask you to help me clarify them in order to
follow the right way as far as the implementation of those features
is concerned?

Thank you very much,
Jan


[i] Formula for calculating dump & swap size


I have gone through the specification and found that the
following formula should be used for calculating the default
size of swap & dump during installation:

o size of dump: 1/4 of physical memory
  



This is a non-starter for systems with 1-4 TBytes of physical
memory.  There must be a reasonable maximum cap, most
likely based on the size of the pool, given that we regularly
boot large systems from modest-sized disks.
Actually, starting with build 90, the legacy installer sets the default
size of the swap and dump zvols to half the size of physical memory,
but no more than 32 GB and no less than 512 MB.   Those are just the defaults.
Administrators can use the zfs command to modify the volsize
property of both the swap and dump zvols (to any value, including
values larger than 32 GB).




o size of swap: max of (512MiB, 1% of rpool size)

However, looking at the source code, SXCE installer
calculates default sizes using slightly different
algorithm:

size_of_swap = size_of_dump = MAX(512 MiB, MIN(physical_memory/2, 32 GiB))

Are there any preferences which one should be used or is
there any other possibility we might take into account ?
  



zero would make me happy :-)  But there are some cases where swap
space is preferred.  Again, there needs to be a reasonable cap.  In
general, the larger the system, the less use for swap during normal
operations, so for most cases there is no need for really large swap
volumes.  These can also be adjusted later, so the default can be
modest.  One day perhaps it will be fully self-adjusting like it is
with other UNIX[-like] implementations.

  

[ii] Procedure of creating dump & swap
--

Looking at the SXCE source code, I have discovered that the following
commands should be used for creating swap & dump:

o swap
# /usr/sbin/zfs create -b PAGESIZE -V <size_in_mb>m rpool/swap
# /usr/sbin/swap -a /dev/zvol/dsk/rpool/swap

o dump
# /usr/sbin/zfs create -b 128*1024 -V <size_in_mb>m rpool/dump
# /usr/sbin/dumpadm -d /dev/zvol/dsk/rpool/dump



The above commands for creating the swap and dump zvols match
what the legacy installer does, as of build 90.


Could you please let me know, if my observations are correct
or if I should use different approach ?

As far as setting of volume block size is concerned (-b option),
how those numbers are to be determined? Will they be the same in
different scenarios or are there plans to tune them in some way
in future ?


There are no plans to tune this.  The block sizes are appropriate
for the way the zvols are to be used.

  



Setting the swap blocksize to pagesize is interesting, but should be
ok for most cases.  The reason I say it is interesting is because it
is optimized for small systems, but not for larger systems which
typically see more use of large page sizes.  OTOH larger systems
should not swap, so it is probably a non-issue for them.  Small
systems should see this as the best solution.

Dump just sets the blocksize to the default, so it is a no-op.
 -- richard

  

[iii] Is there anything else I should be aware of ?
---
  



Installation should *not* fail due to running out of space because
of large dump or swap allocations.  I think the algorithm should
first take into account the space available in the pool after accounting
for the OS.


  

The Caiman team can make their own decision here, but we
decided to be more hard-nosed about disk space requirements in the
legacy install.  If the pool is too small to accommodate the recommended
swap and dump zvols, then maybe this system isn't a good candidate for
a zfs root pool.  Basically, we decided that since you almost
can't buy disks smaller than 60 GB these days, it's not worth much
effort to facilitate the setup of zfs root pools on disks that are smaller
than that.  If you really need to do so, Jumpstart can be used to
set the dump and swap sizes to whatever you like, at the time
of initial install.