Re: [zfs-discuss] ZFS on 32bit x86

2006-06-23 Thread Casper . Dik

AMD Geodes are 32-bit only. I haven't heard any mention that they will 
_ever_ be 64-bit.  But, honestly, this and the Via chip aren't really 
ever going to be targets for Solaris. That is,  they simply aren't (any 
substantial) part of the audience we're trying to reach with Solaris x86.

I'm not sure that we want to limit the Solaris target in that way; we
want laptops and small embedded systems as well as big iron.  The more
systems Solaris runs on, the bigger the ecosystem.  If that means no
high-throughput ZFS on such machines, that is fine with me, and I
certainly would not prioritize this.

And we don't want 1U systems; we want mini-ITX, nearly silent systems
which can fit in cars or which can be easily hidden away.

The price is not the objection; it's the form factor.

Casper
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS on 32bit x86

2006-06-23 Thread Darren J Moffat

Erik Trimble wrote:

Artem Kachitchkine wrote:


AMD Geodes are 32-bit only. I haven't heard any mention that they 
will _ever_ be 64-bit.  But, honestly, this and the Via chip aren't 
really ever going to be targets for Solaris. That is,  they simply 
aren't (any substantial) part of the audience we're trying to reach 
with Solaris x86.


Didn't know our audience was made up of CPUs :) I like to think (when 
in a good mood) that we are trying to reach creative people who can 
take OpenSolaris where Sun haven't imagined or been able to.


-Artem.

Then let those folks fix the problem. The issue here is what amount of 
effort _Sun_ can put into fixing 32-bit Solaris so as to enable ZFS to 
comfortably run on it.


This is an @opensolaris.org alias; it is about working together as a
community, identifying problems and discovering solutions.  I don't
think it is at all appropriate to bring up Sun business choices here.
Where that is appropriate is when Sun employees need to justify to their
manager what they are working on.


--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] recommended hardware for a zfs/nfs NAS?

2006-06-23 Thread Dick Davies

I was wondering if anyone could recommend hardware
for a ZFS-based NAS for home use.

The 'zfs on 32-bit' thread has scared me off a mini-itx fanless
setup, so I'm looking at sparc or opteron. Ideally it would:

a) run quiet (blade 100/150 is ok, x4100 ain't :) )
b) take advantage of cheap disks
  ( ide/sata, unless scsi suddenly got affordable)
c) come in around the 300-400 pounds mark

Don't need massive storage, it just needs to be reliable and reasonably
fast - I was thinking of maybe a 2-way 100Gb mirror set.

Graphics are a complete non-issue.
It only needs to saturate 100mbit (I'm not planning to use it for
anything else, so CPU isn't important).

Any used sun systems fit the bill, or should I be thinking of rolling
my own opteron? Thanks!

--
Rasputin :: Jack of All Trades - Master of Nuns
http://number9.hellooperator.net/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: ZFS and Virtualization

2006-06-23 Thread Dagobert Michelsen
Hello Nate,
  I have a few issues about ZFS and virtualization:

  [b]Virtualization and performance[/b]
  When filesystem traffic occurs on a zpool containing only spindles
  dedicated to this zpool, I/O can be distributed evenly. When the zpool
  is located on a LUN sliced from a RAID group shared by multiple
  systems, the capability of doing I/O from this zpool will be limited.
  Avoiding or limiting I/O to this LUN until the load from the other
  systems decreases would help overall performance for the local zpool.
  I heard some rumors recently about using SMI-S to de-virtualize the
  traffic and allow Solaris to peek through the virtualization layers,
  thus optimizing I/O target selection. Maybe someone has some rumors to
  add ;-)
  Virtualization with the 6920 has been briefly discussed at
  http://www.opensolaris.org/jive/thread.jspa?messageID=14984#14984
  but without conclusion or recommendations.

 I don't know the answer, but: Wouldn't the overhead of using SMI-S, or
 some other method, to determine the load on the raid group from the
 storage array negate any potential I/O benefits you could gain?
 Avoiding or limiting I/O to a heavily used LUN in your zpool would
 reduce the number of spindles in your zpool, thus reducing aggregate
 throughput anyway(?).

Yes, you may be right on this. The current implementation, which limits
outstanding I/O operations per LUN, now seems more appropriate to me too.

 Storage array layout best practices suggest, if at all possible,
 limiting the number of LUNs you create from a raid group, exactly
 because of the I/O limitations that you mention.

This is basically true; however, in a virtualized environment
you cannot always ensure this because of the complexity.
You have spindles grouped into a RAID group, LUNs sliced
from the RAID group, those LUNs virtualized e.g. with a 6920, and
the virtualized LUNs distributed, possibly to different hosts
or zpools. Knowing which LUNs lead to which spindles might
help to optimize vdev selection.

 I can understand building the smarts into ZFS to
 handle multipath LUNs (LUNs presented out of more
 than one controller on the array, active-active
 configurations, not simply dual-fabric multipathing)
 and load balance that way.  Does ZFS simply take
 advantage of MPxIO in Solaris for multipathing/load
 balancing or are there plans to build support for it
 into the file system itself?

This has already been discussed at
http://www.opensolaris.org/jive/thread.jspa?messageID=44278#44159
and
http://www.opensolaris.org/jive/thread.jspa?messageID=19248#19248


  [b]Volume mobility[/b]
  One of the major advantages of ZFS is sharing of the zpool capacity
  between filesystems. I often run applications in small application
  containers located on separate LUNs which are zoned to several hosts
  so they can be run on different hosts. The idea behind this is
  failover, testing and load adjustment. Because only complete zpools
  can be migrated, capacity sharing between movable containers is
  currently impossible.
  Are there any plans to allow zpools to be concurrently shareable
  between hosts?

 Clarification, you're not asking for shared file system behaviour, are
 you?

No. A shared filesystem is concurrently mountable on multiple servers at the
same time. I was thinking of mounting [i]different[/i] filesystems from the
same pool on different servers, so each filesystem is mounted on at most one
server at a time.

 Multiple systems zoned to
 see the same LUNs and simultaneously reading/writing
 to them?

Yes, the LUNs must be visible to each host and simultaneous writing will occur.

 but I assume if you coordinated which server had
 ownership of a zpool, there would be nothing
 stopping you from creating a zpool on servera with a
 set of LUNs, creating your zfs file systems within
 the pools, zoning the same set of LUNs to one or more
 other servers, and then coordinating who has
 ownership of the zpool.

This works out of the box with 'zpool export' and 'zpool import'.
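A minimal sketch of the hand-off, with a made-up pool name 'tank':

  servera# zpool export tank      # quiesce applications first; releases the pool
  serverb# zpool import tank      # serverb sees the same LUNs and takes over
  serverb# zpool import -f tank   # only if servera died without exporting

The -f import is the dangerous path; never force-import a pool that another
host may still have open.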

 Ex: You're testing an application/data installed on a
 ZFS file system on a 32-bit server (x86) system, then
 you want to test it on an Opteron.  So you zone the
 LUNs to the Opteron and stop using the zpool on the
 32-bit server and use it on the Opteron.  I may be
 completely incorrect about the above.

This, too, works already out of the box.

 Other than that scenario, I think your questions fit
 more closely to the shared file system topic that I
 brought up originally.

Do you mean 
http://www.opensolaris.org/jive/click.jspa?searchID=98699&messageID=16480 ?

 Still if you had production
 data in a ZFS file system in your pool as well as
 test data in a separate ZFS file system also using
 the same pool (your application container) the
 disks making up your common pool would still have to
 be visible to multiple servers and you probably would
 want to limit exposure to the other ZFS file systems
 within that pool on the 

[zfs-discuss] http://www.opensolaris.org/os/community/zfs/version/3

2006-06-23 Thread Chris Gerhard
Having just upgraded to nv42, zpool status tells me I need to upgrade the
on-disk version.

zpool version points me at 
http://www.opensolaris.org/os/community/zfs/version/3 :

: sigma TS 6 $; zpool upgrade -v
This system is currently running ZFS version 3.

The following versions are supported:

VER  DESCRIPTION
---  
 1   Initial ZFS version
 2   Ditto blocks (replicated metadata)
 3   Hot spares and double parity RAID-Z

For more information on a particular version, including supported releases, see:
http://www.opensolaris.org/os/community/zfs/version/N

Where 'N' is the version number.
: sigma TS 7 $; wget http://www.opensolaris.org/os/community/zfs/version/3
--11:14:30--  http://www.opensolaris.org/os/community/zfs/version/3
           => `3'
Resolving www.opensolaris.org... 72.5.124.63
Connecting to www.opensolaris.org|72.5.124.63|:80... connected.
HTTP request sent, awaiting response... 404
11:14:30 ERROR 404: (no description).

: sigma TS 8 $;

When will this page appear?  It would also be kind of neat if the page
http://www.opensolaris.org/os/community/zfs/version/N existed, so that the link
from the zpool command above would work when opened, perhaps giving an index of
all the versions.
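Aside, for anyone landing here via a search: once you have read whatever the
page eventually says, the upgrade itself should just be one of

  zpool upgrade tank    # upgrade a single pool, here named 'tank'
  zpool upgrade -a      # or upgrade every pool on the system

bearing in mind that older software can no longer import a pool after its
on-disk version has been upgraded.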

--chris
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: where has all my space gone? (with zfs mountroot + b38)

2006-06-23 Thread James C. McPherson

Mark Shellenbaum wrote:
...
So we have a bunch of stuff in the in-core delete queue, but no threads 
to process them.  The fact that we don't have the threads is related to 
the bug that Tabriz is working on.


Hi Mark,
after installing your fixes from three days ago and (cough!) ensuring
that my boot archive contained them, I then spent the next 7 or so
hours waiting for the delete queue to be flushed.

In that time my root disk (a Maxtor) decided it didn't like me much (I
was asking it to do too much I/O) so ZFS panicked... then, a few single-
user boots later (where each time the boot process was stuck in the
fs-usr service, flushing the queue), I'm finally back to having the
disk space that I think I should have.

My one remaining concern is that I'm not sure that I've got all my
ZFS bits totally sync'd with my kernel, so I'll be bfuing again tomorrow
just to make sure.


Thanks for your help with this, I really, really appreciate it.


best regards,
James C. McPherson
--
Solaris Datapath Engineering
Data Management Group
Sun Microsystems
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: Re: [zfs-discuss] 15 minute fdsync problem and ZFS: Solved

2006-06-23 Thread Joe Little

On 6/23/06, Roch [EMAIL PROTECTED] wrote:


Joe Little writes:
  On 6/22/06, Bill Moore [EMAIL PROTECTED] wrote:
   Hey Joe.  We're working on some ZFS changes in this area, and if you
   could run an experiment for us, that would be great.  Just do this:
  
   echo 'zil_disable/W1' | mdb -kw
  
   We're working on some fixes to the ZIL so it won't be a bottleneck when
   fsyncs come around.  The above command will let us know what kind of
   improvement is on the table.  After our fixes you could get from 30-80%
   of that improvement, but this would be a good data point.  This change
   makes ZFS ignore the iSCSI/NFS fsync requests, but we still push out a
   txg every 5 seconds.  So at most, your disk will be 5 seconds out of
   date compared to what it should be.  It's a pretty small window, but it
   all depends on your appetite for such windows.  :)
  
   After running the above command, you'll need to unmount/mount the
   filesystem in order for the change to take effect.
  
   If you don't have time, no big deal.
  
  
   --Bill
  
  
   On Thu, Jun 22, 2006 at 04:22:22PM -0700, Joe Little wrote:
On 6/22/06, Jeff Bonwick [EMAIL PROTECTED] wrote:
 a test against the same iscsi targets using linux and XFS and the
 NFS server implementation there gave me 1.25MB/sec writes. I was about
 to throw in the towel and deem ZFS/NFS as unusable until B41 came
 along and at least gave me 1.25MB/sec.

That's still super slow -- is this over a 10Mb link or something?

Jeff

I think the performance is in line with expectations for a small-file,
single-threaded, open/write/close NFS workload (NFS must commit on
close). Therefore I expect roughly:

(avg file size) / (I/O latency)

(for example, an 8 KB file every ~6.4 ms of latency works out to about
1.25 MB/s).

Joe, does this formula approach the 1.25 MB/s?



To this day, I still don't know how to calculate the i/o latency.
Average file size is always expected to be close to kernel page size
for NASes -- 4-8k. Always tune for that.






   
Nope, gig-e link (single e1000g, or aggregate, doesn't matter) to the
iscsi target, and single gig-e link (nge) to the NFS clients, who are
gig-e. Sun Ultra20 or AMD Quad Opteron, again with no difference.
   
Again, the issue is the multiple fsyncs that NFS requires, and likely
the serialization of those iscsi requests. Apparently, there is a
basic latency in iscsi that one could improve upon with FC, but we are
definitely in the all ethernet/iscsi camp for multi-building storage
pool growth and don't have interest in a FC-based SAN.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
  
 
  Well, following Bill's advice and the previous note on disabling zil,
  I ran my test on a B38 opteron initiator and if you do a time on the
  copy from the client, 6250 8k files transfer at 6MB/sec now. If you
  watch the entire commit on the backend using zpool iostat 1 I see
  that it takes a few more seconds, and the actual rate there is
  4MB/sec. Beats my best of 1.25MB/sec, and this is not B41.
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Joe, you know this, but for the benefit of others I have to
highlight that running any NFS server this way may cause
silent data corruption from the client's point of view.

Whenever a server keeps data in RAM this way and does not
commit it to stable storage upon request from clients, that
opens a time window for corruption. So a client writes to a
page, then reads the same page, and if the server suffered a
crash in between, the data may not match.

So this is performance at the expense of data integrity.

-r


Yes.. ZFS in its normal mode has better data integrity. However, this
may be a more ideal tradeoff if you have specific read/write patterns.
In my case, I'm going to use ZFS initially for my tier2 storage, with
nightly write periods (needs to be short duration rsync from tier1)
and mostly read periods throughout the rest of the day. I'd love to
use ZFS as a tier1 service as well, but then you'd have to perform as
a NetApp does. Same tricks, same NVRAM or initial write to local
stable storage before writing to backend storage. 6MB/sec is closer to
expected behavior for first tier at the expense of reliability. I
don't know what the answer is for Sun to make ZFS 1st Tier quality
with their NFS implementation and its sync happiness.
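For the record, the knob Bill mentioned earlier can be inspected and reverted
with the same mdb incantation. This pokes a live kernel global that may not
exist in all builds, so treat it as a sketch rather than a supported interface:

  echo 'zil_disable/D' | mdb -k     # show the current value (0 = ZIL enabled)
  echo 'zil_disable/W1' | mdb -kw   # disable the ZIL
  # unmount/remount the affected filesystem for the change to take effect
  echo 'zil_disable/W0' | mdb -kw   # re-enable when done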





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: recommended hardware for a zfs/nfs NAS?

2006-06-23 Thread Wes Williams
 I was wondering if anyone could recommend hardware
 for a ZFS-based NAS for home use.
 
 setup, so I'm looking at sparc or opteron. Ideally it
 would:
 
 a) run quiet (blade 100/150 is ok, x4100 ain't :) )

Not much space in a Blade 100/150 for multiple disks, but it is quiet and cheap.
For NAS use, I'd think RAID would be ideal...RAID in a Blade 100/150 isn't 
ideal.

 b) take advantage of cheap disks
( ide/sata, unless scsi suddenly got affordable)
 come in around the 300-400 pounds mark
 

Ummm...

 Don't need massive storage, it just needs to be
 reliable and reasonably

Reliable = redundant
Redundant = multiple disks

 fast - I was thinking of maybe a 2-way 100Gb mirror
 set.

Fast = $ (typically)
Speed costs money.  How fast do you want to go?  8)

 Graphics are a complete non-issue.
 It only needs to saturate 100mbit (I'm not planning

Saturating 100Mbit with a 64-bit CPU and redundant disks for $300-400 Pounds 
may be tough.

 to use it for
 anything else, so CPU isn't important).
 
True, but ZFS compression=on works very well, and potential on-disk
encryption support [down the road] combined with compression may change your mind.
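Turning it on is a one-liner; the dataset name below is made up:

  zfs set compression=on tank/export/data    # new writes are compressed from now on
  zfs get compression,compressratio tank/export/data

Existing data stays uncompressed until it is rewritten.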

 Any used sun systems fit the bill, or should I be
 thinking of rolling
 my own opteron? Thanks!

Roll your own with Opteron, new disks, etc.?  ...not feasible in that price 
range, IMHO.

Depending on what your current workstation is, I just might suggest adding
disks there and beefing it up.

Nice NAS:
A nice new Ultra 20 is inexpensive, quiet, and makes a good workstation.  I'd
suggest beefing up your existing workstation or replacing it with a U20 with
two fast SATA disks and ZFS mirrors or stripes.
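With two disks the pool setup is trivial (device names here are just an example):

  zpool create tank mirror c1d0 c2d0   # 2-way mirror
  zpool create tank c1d0 c2d0          # or a plain stripe, trading redundancy for space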

Cheap NAS:
The last option I suggest is a Blade 100/150 with only one IDE drive - you
could zip-tie another in the box carefully, I suppose - but this isn't the most
reliable place to save your personal data.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] add_install_client and ZFS and SMF incompatibility

2006-06-23 Thread Constantin Gonzalez
Hi,

I just set up an install server on my notebook and of course all the installer
data is on a ZFS volume. I love the ZFS compression=on property!

It seems that the standard ./add_install_client script from the S10U2 Tools
directory creates an entry in /etc/vfstab for a loopback mount of the Solaris
miniroot into the /tftpboot directory.

Unfortunately, at boot time (I'm using Nevada build 39), the mount_all
script tries to mount the loopback mount from /etc/vfstab before ZFS gets its
filesystems mounted.

So the SMF filesystem/local method fails and I have to either mount all ZFS
filesystems by hand and then re-run mount_all, or replace the vfstab entry with
a simple symlink, which only works until you run add_install_client the next
time.
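The manual recovery looks roughly like this; the exact service FMRI is from
memory, so double-check it with svcs:

  zfs mount -a                                         # get the ZFS filesystems in place
  /sbin/mountall -l                                    # retry the local vfstab mounts
  svcadm clear svc:/system/filesystem/local:default    # if the service went into maintenance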

Is this a known issue?

Best regards,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Client Solutionshttp://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: recommended hardware for a zfs/nfs NAS?

2006-06-23 Thread Casper . Dik


Saturating 100Mbit with a 64-bit CPU and redundant disks for $300-400 Pounds 
may be tough.

Anything on the market can saturate 100Mbit easily, even with a single
cheap IDE disk: 100Mbit/s is only about 12.5 MB/s, and the disks are
generally a factor of 5-10 faster than the 100Mbit network.

Casper

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Priorities (was: ZFS on 32bit x86)

2006-06-23 Thread Erik Trimble

Darren J Moffat wrote:
This is an @opensolaris.org alias; it is about working together as a
community, identifying problems and discovering solutions.  I don't
think it is at all appropriate to bring up Sun business choices here.
Where that is appropriate is when Sun employees need to justify to
their manager what they are working on.




Darren brings up a good point here, and I thank him for making me 
remember that this isn't just a Sun-only developer list.  However, this 
does bring to light a current problem:  who is working on what, and how 
do the various sponsoring entities prioritize work?


I've run into this problem on a couple of large Open Source projects, 
and we do need to make things a bit more transparent. We have the same 
problem over here in the Java group - how do we coordinate bugfixing and 
feature additions within a large community of developers and users, 
where developers may come from a variety of sources, and users may also 
be interested in providing not just feedback/RFEs, but actual 
sponsorship for developer time.


Obviously, a developer is going to be most interested in producing work 
that their sponsor thinks is important (and, naturally, it is very 
possible for a developer to be his or her own sponsor).   For a 
developer who doesn't have specific work directed by the sponsor, there 
needs to be some way for the community to prioritize work for that 
developer.  That is, we as the community need to be able to let the 
developers know what is important to us, in an organized way.


Personally, I'd like to have the ZFS community have an open bug and RFE 
system that looks like the one for Java (check out: 
http://bugs.sun.com/bugdatabase/index.jsp), or something that provides 
similar features. We (the users) would have a much easier way to hunt 
down things going on with developers' work, and developers would have a 
much easier time determining what is considered widely important to the 
user community.



I've previously bitched about the lack of visibility into feature schedules for
ZFS.   This would solve that problem, too.



How about it folks - would it be a good idea for me to explore what it 
takes to get such a bug/RFE setup implemented for the ZFS community on 
OpenSolaris.org?


-Erik

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] add_install_client and ZFS and SMF incompatibility

2006-06-23 Thread Rainer Orth
Constantin Gonzalez [EMAIL PROTECTED] writes:

 Is this a known issue?

Yes, I've raised this during ZFS Beta as SDR-0192.  For some reason, I
don't have a CR here.

Rainer

-- 
-
Rainer Orth, Faculty of Technology, Bielefeld University
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] recommended hardware for a zfs/nfs NAS?

2006-06-23 Thread Erik Trimble

Dick Davies wrote:

I was wondering if anyone could recommend hardware
forr a ZFS-based NAS for home use.

The 'zfs on 32-bit' thread has scared me of a mini-itx fanless
setup, so I'm looking at sparc or opteron. Ideally it would:

a) run quiet (blade 100/150 is ok, x4100 ain't :) )
b) take advantage of cheap disks
  ( ide/sata, unless scsi suddenly got affordable)
c) come in around the 300-400 pounds mark

Don't need massive storage, it just needs to be reliable and reasonably
fast - I was thinking of maybe a 2-way 100Gb mirror set.

Graphics are a complete non-issue.
It only needs to saturate 100mbit (I'm not planning to use it for
anything else, so CPU isn't important).

Any used sun systems fit the bill, or should I be thinking of rolling
my own opteron? Thanks!

If it's just going to be a NAS, look for an AMD Sempron or Intel Celeron
D (with the 64-bit extension, so you'll need the LGA775 socket version)
based motherboard with 4 SATA ports on-board - check with the OpenSolaris
folks for drivers. You should be able to get 4 mid-sized SATA drives
(say, in the 160GB range) and either RAID-Z or stripe/mirror them. That
will be more than enough to keep a 100Mbit interface fully occupied,
both reading and writing.


Example:

Socket 754 motherboard w/ 4 SATA ports (Biostar NF4 4X-A7)   $60
Sempron 2600+   $75
1GB RAM$50
mid-tower case   $50
(4) 80GB SATA drives   4 @ $50 each
CD-ROM$20

Total:  $455
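Once Solaris sees the four disks, creating the pool is one command; the
controller/target numbers below are whatever format(1M) reports on your box:

  zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0            # single parity, ~3 disks usable
  zpool create tank mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0    # or two mirrored pairs, ~2 disks usable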


-Erik

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Re: recommended hardware for a zfs/nfs NAS?

2006-06-23 Thread Wes Williams
 
 
 Saturating 100Mbit with a 64-bit CPU and redundant
 disks for $300-400 Pounds may be tough.
 
 Anything in the market can saturate 100Mbit easily;
 even with a single
 cheap IDE disk.  The disks are generally a factor
 5-10 faster than the
 100Mbit network.
 
 Casper
 
Indeed, I stand corrected - I must have been thinking of 1000Mbit.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] recommended hardware for a zfs/nfs NAS?

2006-06-23 Thread Richard Elling

Dick Davies wrote:

I was wondering if anyone could recommend hardware
forr a ZFS-based NAS for home use.

The 'zfs on 32-bit' thread has scared me of a mini-itx fanless
setup, so I'm looking at sparc or opteron. Ideally it would:


I think the issue with ZFS on 32-bit revolves around the
efficient use of memory.  If you have lots of memory, ZFS won't
use it.  By contrast, on 64-bit systems, when you have lots of
memory, ZFS will use it.  In either case, if you only have a
little bit of memory, ZFS may dominate it.  [my simplification; I'll
expect correction from the ZFS team if I'm wrong :-)]


a) run quiet (blade 100/150 is ok, x4100 ain't :) )
b) take advantage of cheap disks
  ( ide/sata, unless scsi suddenly got affordable)
c) come in around the 300-400 pounds mark


I don't think you will have much memory at that price.  I'd go for
2 GBytes, no matter what processor you get.  512 MBytes is too little
(I've got one of those here on the Ranch-net... for archive purposes
only)
 -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 15 minute fdsync problem and ZFS: Solved

2006-06-23 Thread Richard Elling

Joe Little wrote:

On 6/23/06, Roch [EMAIL PROTECTED] wrote:

Joe, you know this but for the benefit of  others, I have to
highlight that running  any NFS server  this way, may cause
silent data corruption from client's point of view.

Whenever a server keeps  data in RAM this  way and  does not
commit it to stable storage  upon request from clients, that
opens a time window for corruption. So  a client writes to a
page, then reads the same page, and if the server suffered a
crash in between, the data may not match.

So this is performance at the expense of data integrity.


I agree, as a RAS guy this line of reasoning makes me nervous...
I've never known anyone who regularly made this trade-off and
didn't get burned.


Yes.. ZFS in its normal mode has better data integrity. However, this
may be a more ideal tradeoff if you have specific read/write patterns.


The only pattern this makes sense for is the write-only pattern.
That pattern has near zero utility.


In my case, I'm going to use ZFS initially for my tier2 storage, with
nightly write periods (needs to be short duration rsync from tier1)
and mostly read periods throughout the rest of the day. I'd love to
use ZFS as a tier1 service as well, but then you'd have to perform as
a NetApp does. Same tricks, same NVRAM or initial write to local
stable storage before writing to backend storage. 6MB/sec is closer to
expected behavior for first tier at the expense of reliability. I
don't know what the answer is for Sun to make ZFS 1st Tier quality
with their NFS implementation and its sync happiness.


I know the answer will not compromise data integrity.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Fwd: Re: [zfs-discuss] 15 minute fdsync problem and ZFS: Solved

2006-06-23 Thread Tao Chen
I should copy this to the list.

-- Forwarded message --
On 6/23/06, Joe Little [EMAIL PROTECTED] wrote:

 I can post back to Roch what this latency is. I think the latency is a
 constant regardless of the zil or not. All that I do by disabling the
 zil is that I'm able to submit larger chunks at a time (faster) than
 doing 1k or worse blocks 3 times per file (the NFS fsync penalty).

Please send the script (I attached a modified version) along with the result.
They need to see how it works to trust (or dispute) the result.
Rule #1 in performance tuning is do not trust the report from an unproven tool :)

I have some comments on the output below. This is for a bit longer run
(16 trees of 6250 8k files, again with zil disabled):

Generating report from biorpt.sh.rec ...

=== Top 5 I/O types ===
DEVICE  T  BLKs  COUNT
------  -  ----  -----
sd2     W   256   3095
sd1     W   256   2843
sd1     W     2    201
sd2     W     2    197
sd1     W    32    185

This part tells me the majority of I/Os are 128KB writes on sd2 and sd1.

=== Top 5 worst I/O response time ===
DEVICE  T  BLKs     OFFSET  TIMESTAMP  TIME.ms
------  -  ----     ------  ---------  -------
sd2     W   175  529070671  85.933843  3559.55
sd1     W   256  521097680  47.561918  3097.21
sd1     W   256  521151968  54.944253  3090.42
sd1     W   256  521152224  54.944207  3090.23
sd1     W    64  521152480  54.944241  3090.21

The longest response times are more than 3 seconds, ouch.

=== Top 5 Devices with largest number of I/Os ===
DEVICE  READ  AVG.ms  MB  WRITE  AVG.ms   MB   IOs  SEEK
------  ----  ------  --  -----  ------  ---  ----  ----
sd1        6    0.34   0   4948  387.88  413  4954    0%
sd2        6    0.25   0   4230  387.07  405  4236    0%
cmdk0     23    8.11   0    152    0.84    0   175   10%

An average response time of > 300ms is bad.
I calculate the SEEK rate on a 512-byte block basis; since the I/Os are mostly
128K, the seek rate is less than 1% ( 0 ), in other words I consider this
mostly sequential I/O. I guess it's debatable whether the 512-byte-based
calculation is meaningful.

=== Top 5 Devices with largest amount of data transfer ===
DEVICE  READ  AVG.ms  MB  WRITE  AVG.ms   MB  Tol.MB  MB/s
------  ----  ------  --  -----  ------  ---  ------  ----
sd1        6    0.34   0   4948  387.88  413     413     4
sd2        6    0.25   0   4230  387.07  405     405     4
cmdk0     23    8.11   0    152    0.84    0       0     0

=== Report saved in biorpt.sh.rec.rpt ===

I calculate the MB/s on a per-second basis, meaning as long as there is at
least one finished I/O on the device in a second, that second is used in
calculating throughput.

Tao




biorpt.sh
Description: Bourne shell script
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Priorities

2006-06-23 Thread eric kustarz


How about it folks - would it be a good idea for me to explore what it 
takes to get such a bug/RFE setup implemented for the ZFS community on 
OpenSolaris.org?



what's wrong with http://bugs.opensolaris.org/bugdatabase/index.jsp for 
finding bugs?


I think we've been really good about taking reported problems and filing 
bugs - if others disagree, feel free to speak up.


I think what you're asking for should be solved at the opensolaris 
community level (if it's not already there) - not specifically for ZFS.


eric

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: Fwd: Re: [zfs-discuss] 15 minute fdsync problem and ZFS: Solved

2006-06-23 Thread Tao Chen
On 6/23/06, Richard Elling [EMAIL PROTECTED] wrote:

 comment on analysis below...

 Tao Chen wrote:

  === Top 5 Devices with largest number of I/Os ===
  DEVICE  READ  AVG.ms  MB  WRITE  AVG.ms   MB   IOs  SEEK
  ------  ----  ------  --  -----  ------  ---  ----  ----
  sd1        6    0.34   0   4948  387.88  413  4954    0%
  sd2        6    0.25   0   4230  387.07  405  4236    0%
  cmdk0     23    8.11   0    152    0.84    0   175   10%

  An average response time of > 300ms is bad.

 Average is totally useless with this sort of a distribution.
 I'd suggest using a statistical package to explore the distribution.
 Just a few 3-second latencies will skew the average quite a lot.
 -- richard

A summary report is nothing more than an indication of issues, or non-issues,
so I agree that an average is just, an average. However, a few 3-second
latencies will not spoil the result too much when there are more than 4000
I/Os sampled.

The script saves the raw data in a .rec file, so you can run whatever
statistics tool you have against it. I am currently more worried about how
accurate and useful the raw data is, which is generated from a DTrace
command in it.

The raw record is in this format:
- Timestamp (sec.microsec)
- DeviceName
- W/R
- BLK_NO (offset)
- BLK_CNT (I/O size)
- IO_Time (I/O elapsed time, msec.xx)

Tao
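P.S. For those who have not looked inside it, the DTrace portion of such a
script is essentially the standard io-provider pattern. A stripped-down
sketch (not the actual biorpt.sh code) that emits one record per completed
I/O would look like:

  #!/usr/sbin/dtrace -s
  #pragma D option quiet

  io:::start
  {
          /* remember when each I/O was issued */
          start[args[0]->b_edev, args[0]->b_blkno] = timestamp;
  }

  io:::done
  /start[args[0]->b_edev, args[0]->b_blkno]/
  {
          this->us = (timestamp - start[args[0]->b_edev, args[0]->b_blkno]) / 1000;
          printf("%s %s blkno=%d blks=%d time_us=%d\n",
              args[1]->dev_statname,
              args[0]->b_flags & B_READ ? "R" : "W",
              args[0]->b_blkno,
              args[0]->b_bcount / 512,
              this->us);
          start[args[0]->b_edev, args[0]->b_blkno] = 0;
  }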
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Priorities

2006-06-23 Thread Mike Kupfer
 EK == eric kustarz [EMAIL PROTECTED] writes:

EK what's wrong with http://bugs.opensolaris.org/bugdatabase/index.jsp
EK for finding bugs?

Unless they've fixed it recently, the keywords search doesn't actually
check against the Bugster keywords field.

And the information presented is pretty limited.

I think there are other issues, but these are the ones that annoy me the
most.

We (the OpenSolaris core team) have been working with the people who own
the b.o.o code to fix some of the most glaring issues in the short-term.
We've also been working with the Bugster folks to come up with a
long-term plan that puts the external community on more-or-less equal
footing with Sun employees.  (The difference would be that you have to
be a Sun employee to see confidential information, like customer account
names.)

EK I think what you're asking for should be solved at the opensolaris
EK community level (if its not already there) - not specifically for
EK ZFS.

Yes, please.  If we can't work within the b.o.o framework (which is not
an obvious conclusion to me), then at least let's implement something
for the entire site.  Having community-specific bug functionality is
just going to mean duplicated work and an uneven user experience.

mike

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Priorities

2006-06-23 Thread Dale Ghent

On Jun 23, 2006, at 1:09 PM, eric kustarz wrote:



How about it folks - would it be a good idea for me to explore  
what it takes to get such a bug/RFE setup implemented for the ZFS  
community on OpenSolaris.org?



what's wrong with http://bugs.opensolaris.org/bugdatabase/index.jsp  
for finding bugs?


There are a LOT of things wrong with how b.s.o is presented.

For us non-Sun people, b.s.o is a one-way ticket, and only when we're  
lucky.


First, yes, we can search on bug keywords and categories. Great. We used  
to need a Sunsolve acct for this. But once we do that, we can only  
hope that the bugs we want to read about in detail aren't comprised  
solely of "See Notes" and nothing else. It's like seeing "To be  
continued..." right before the climax of a movie. Useless and  
frustrating.


Second, while there is a way for Joe Random to submit a bug, there is  
zero way for Joe Random to interact with a bug. No voting to bump or  
drop a priority, no easy way to find hot topic bugs, no way to add  
one's own notes to the issue. I guess the desperate just have to clog  
the system with new bugs and have them marked as dups or badger  
someone with a sun.com email address to do it for us.


Third, much of end-to-end bug servicing from a non-Sun perspective is  
still an uphill battle, from acronyms and terms used to policies and  
coordination of work, e.g. "Is someone in Sun or elsewhere already  
working on this particular bug I'm interested in?" and the questions  
which would stem from that basic one.


In summary, the bug/RFE process is still a mystery after 1 year, and  
who knows if it'll stay the ginormous tease that it currently is.  
Really, it's still no better than if one had a Sunsolve account in  
years' past.


/dale
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Priorities

2006-06-23 Thread Stephen Hahn
* Erik Trimble [EMAIL PROTECTED] [2006-06-23 11:15]:
 It is a good start (yes, I know it's an interface to Bugster, just as
 the Java one I pointed out is too - in fact, it's probably the same
 code).  And, I'm certainly not complaining about how well people have
 been taking to and addressing bugs.
 
 However, there are some significant shortcomings with the interface that
 need to be fixed. And, yes, this is true w/r/t the OpenSolaris community
 as a whole.  Basically, the problem is that the OpenSolaris portal
 itself is extremely primitive, and really needs a big overhaul to make
 the information we have easily accessible in a coherent manner.

  Please come to either website-discuss or tools-discuss and share
  your thoughts for improvement (or at least de-primitivization).
 
 And, in addition, the bug portal isn't really useful for helping manage
 external (to Sun) contributors' work.  And it doesn't give any real
 insight into who's working on what, and what schedules might be.  

  I am not sure whether you are commenting on the lack of publication
  from the internal database (which may have this information), or the
  lack of this information more generally.

  - Stephen
  
-- 
Stephen Hahn, PhD  Solaris Kernel Development, Sun Microsystems
[EMAIL PROTECTED]  http://blogs.sun.com/sch/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Priorities

2006-06-23 Thread Nicolas Williams
On Fri, Jun 23, 2006 at 02:20:54PM -0400, Dale Ghent wrote:
 Second, while there is a way for Joe Random to submit a bug, there is  
 zero way for Joe Random to interact with a bug. No voting to bump or  
 drop a priority, no easy way to find hot topic bugs, no way to add  
 one's own notes to the issue. I guess the desperate just have to clog  
 the system with new bugs and have them marked as dups or badger  
 someone with a sun.com email address to do it for us.

Aside: we track bug severity and priority separately.  The former is for
customers to decide, and each customer may assert different severities
for the same bug, while the latter is for engineers and management to
decide.

As for the "See comments" problem, we engineers have been told to stop
doing that, so that you should see very few _new_ CRs of that sort.

 In summary, the bug/RFE process is still a mystery after 1 year, and  
 who knows if it'll stay the ginormous tease that it currently is.  

I hope not.

Nico
-- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Priorities (moving forums...)

2006-06-23 Thread Erik Trimble
Please refer all followups to this thread over to the
[EMAIL PROTECTED]  list.


On Fri, 2006-06-23 at 11:27 -0700, Stephen Hahn wrote:
 * Erik Trimble [EMAIL PROTECTED] [2006-06-23 11:15]:
  It is a good start (yes, I know it's an interface to Bugster, just as
  the Java one I pointed out is too - in fact, it's probably the same
  code).  And, I'm certainly not complaining about how well people have
  been taking to and addressing bugs.
  
  However, there are some significant shortcomings with the interface that
  need to be fixed. And, yes, this is true w/r/t the OpenSolaris community
  as a whole.  Basically, the problem is that the OpenSolaris portal
  itself is extremely primitive, and really needs a big overhaul to make
  the information we have easily accessible in a coherent manner.
 
   Please come to either website-discuss or tools-discuss and share
   your thoughts for improvement (or at least de-primitivization).
  
  And, in addition, the bug portal isn't really useful for helping manage
  external (to Sun) contributors' work.  And it doesn't give any real
  insight into who's working on what, and what schedules might be.  
 
   I am not sure whether you are commenting on the lack of publication
   from the internal database (which may have this information), or the
   lack of this information more generally.
 
   - Stephen


As several others have pointed out, the current Bug/RFE interface is
seriously broken for non-Sun users, and is missing quite a bit of
functionality (both in the interface and in the data being stored) even
for internal Sun folks.


Off the top of my head:

1.  The categories for bug submission and searching really need to be
rethought. At a minimum, the search function should probably be more
in line with the various communities on OS.org.  That is, you probably
should have main categories which line up with each of the O.S.
communities, with subcategories being more specific.

2.  Viewing bugs is a mess - access varies widely across external and
internal users, bugs aren't consistently found/displayed, etc.

3.  There is no development schedule information stored/available.  e.g.
when a particular Bug/RFE is expected to be fixed/included.

4.  Who is working on a bug/RFE isn't available.

5.  External users are effectively shut out of the bug/RFE database.  It
should be possible for a (properly authorized) external user to both
update a bug status, and/or take ownership of the bug/RFE. 

6.  A better community-centric bug/RFE prioritization method needs to be
developed.

7.  Bug/RFE bounties need to be considered, along with a method of
funding and payout for them

8.  The UI for the whole Bug/RFE setup needs a drastic overhaul to make
it simpler to view multiple/related bugs.





  
-- 
Erik Trimble
Java System Support
Mailstop:  usca14-102
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Bandwidth disparity between NFS and ZFS

2006-06-23 Thread Chris Csanady

While dd'ing to an NFS filesystem, half of the bandwidth is unaccounted
for.  What dd reports amounts to almost exactly half of what zpool iostat
or iostat show, even after accounting for the overhead of the two mirrored
vdevs.  Would anyone care to guess where it may be going?

(This is measured over 10 second intervals.  For 1 second intervals,
the bandwidth to the disks jumps around from 40MB/s to 240MB/s)

With a local dd, everything adds up.  This is with a b41 server, and a
MacOS 10.4 nfs client.  I have verified that the bandwidth at the network
interface is approximately that reported by dd, so the issue would appear
to be within the server.

Any suggestions would be welcome.
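In case it helps, the comparison is essentially the following; the pool name,
mount point and sizes are just placeholders:

  # on the client: write a fixed amount over NFS and note the rate dd reports
  dd if=/dev/zero of=/mnt/nfs/testfile bs=128k count=8192

  # on the server: watch what actually hits the pool over the same window
  zpool iostat tank 10
  iostat -xn 10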

Chris
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss