Re: [zfs-discuss] ZSF Solaris

2008-10-01 Thread Bob Friesenhahn
On Tue, 30 Sep 2008, Al Hopper wrote:

 I *suspect* that there might be something like a hash table that is
 degenerating into a singly linked list as the root cause of this
 issue.  But this is only my WAG.

That seems to be a reasonable conclusion.  FWIW, my million-file 
test directory uses this sort of file naming, but it has only been 
written once.

When making data multi-access safe, often it is easiest to mark old 
data entries as unused while retaining the allocation.  At some later 
time when it is convenient to do so, these old entries may be made 
available for reuse.  It seems like your algorithm is causing the 
directory size to grow quite large, with many stale entries.
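
Bob's point about retaining allocations can be made concrete with a toy 
sketch. This is illustrative Python, not ZFS code; the SlotTable class 
and its names are invented for the example:

```python
# Hypothetical sketch (not ZFS code): lazy deletion in a slot table.
# Removed entries become tombstones whose slots are reused later, so the
# table never shrinks - mirroring how a directory can stay large even
# after many files are deleted.

TOMBSTONE = object()

class SlotTable:
    def __init__(self):
        self.slots = []     # allocation is retained forever
        self.index = {}     # name -> slot position

    def add(self, name, value):
        # Reuse a tombstoned slot if one exists, else grow the table.
        for i, entry in enumerate(self.slots):
            if entry is TOMBSTONE:
                self.slots[i] = (name, value)
                self.index[name] = i
                return
        self.index[name] = len(self.slots)
        self.slots.append((name, value))

    def remove(self, name):
        # Mark unused but keep the allocation.
        i = self.index.pop(name)
        self.slots[i] = TOMBSTONE

table = SlotTable()
table.add("a.dat", 1)
table.add("b.dat", 2)
table.remove("a.dat")
table.add("c.dat", 3)       # reuses a.dat's old slot
print(len(table.slots))     # still 2 slots: the table did not shrink
```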

Another possibility is that the directory is becoming fragmented due 
to the limitations of block size.  The original directory was 
contiguous, but the updated directory is now fragmented.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZSF Solaris

2008-10-01 Thread Bob Friesenhahn
On Wed, 1 Oct 2008, Ian Collins wrote:

 A million files in ZFS is no big deal:

 But how similar were your file names?

The file names are like:

image.dpx[000]
image.dpx[001]
image.dpx[002]
image.dpx[003]
image.dpx[004]
.
.
.

So they would surely trip up the degenerate algorithm Al Hopper described.
It is pretty common for images arranged in sequences to have the common 
part up front so that sorting works.

Bob



Re: [zfs-discuss] ZSF Solaris

2008-10-01 Thread Bob Friesenhahn
On Wed, 1 Oct 2008, Ram Sharma wrote:
 So for storing 1 million MyISAM tables (MyISAM being a good performer when
 it comes to not very large data), I need to save 3 million data files in a
 single folder on disk. This is the way MyISAM saves data.
 I will never need to do an ls on this folder. This folder (~database) will be
 used just by the MySQL engine to execute my SQL queries and fetch me results.

As long as you do not need to list the files in the directory, I think 
that you will be ok with ZFS:

First access:
% ptime ls -l 'image.dpx[666]'
-r--r--r-- 8001 bfriesen home 12754944 Jun 16  2005 image.dpx[666]

real    0.023
user    0.000
sys     0.002

Second access:
% ptime ls -l 'image.dpx[666]'
-r--r--r-- 8001 bfriesen home 12754944 Jun 16  2005 image.dpx[666]

real    0.003
user    0.000
sys     0.002

Access to a file in a small directory:
% ptime ls -l .zprofile
-rwxr-xr-x 1 bfriesen home 236 Dec 30  2007 .zprofile

real    0.003
user    0.000
sys     0.002

Bob



Re: [zfs-discuss] ZSF Solaris

2008-10-01 Thread Toby Thain

On 1-Oct-08, at 1:56 AM, Ram Sharma wrote:

 Hi Guys,

 Thanks for so many good comments. Perhaps I got even more than what  
 I asked for!

 I am targeting 1 million users for my application. My DB will be on a
 Solaris machine. And the reason I am making one table per user is that
 it will be a simple design compared to keeping all the data in a
 single table.


You have a green light from ZFS experts, but there is no way you'd  
get that schema past a good DBA. This design will fail you long  
before you get near a million users.

--Toby

 In that case I need to worry about things like horizontal
 partitioning, which in turn will require a higher level of management.

 So for storing 1 million MyISAM tables (MyISAM being a good
 performer when it comes to not very large data), I need to save 3
 million data files in a single folder on disk. This is the way
 MyISAM saves data.
 I will never need to do an ls on this folder. This folder
 (~database) will be used just by the MySQL engine to execute my SQL
 queries and fetch me results.
 And now that ZFS allows me to do this easily, I believe I can go
 forward with this design. Correct me if I am missing something.
 --
 This message posted from opensolaris.org



[zfs-discuss] ZSF Solaris

2008-09-30 Thread Ram Sharma
Hi,

can anyone please tell me what is the maximum number of files that can be there 
in 1 folder in Solaris with the ZFS file system?

I am working on an application in which I have to support 1 million users. In 
my application I am using MySQL MyISAM, and in MyISAM there are 3 files created 
for 1 table. I have an application architecture in which each user will have a 
separate table, so the expected number of files in the database folder is 
3 million. I have read somewhere that each OS has a limit on the number of 
files that can be created in a folder.


Re: [zfs-discuss] ZSF Solaris

2008-09-30 Thread Mark J Musante
On Tue, 30 Sep 2008, Ram Sharma wrote:

 Hi,

 can anyone please tell me what is the maximum number of files that can 
 be there in 1 folder in Solaris with the ZFS file system.

By folder, I assume you mean directory and not, say, pool.  In any case, 
the 'limit' is 2^48, but that's effectively no limit at all.


Regards,
markm


Re: [zfs-discuss] ZSF Solaris

2008-09-30 Thread Marcelo Leal
ZFS has no limit on snapshots and filesystems either, but try to create a lot 
of snapshots and filesystems and you will have to wait a long time for your 
pool to import too... ;-)
 I think you should not think about the limits, but about performance. Any 
filesystem with too many entries per directory will suffer. So, my advice is to 
configure your app to create a better hierarchy.
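
The hierarchy Leal recommends is often implemented as a hashed fan-out of 
subdirectories. A hypothetical Python sketch (the /db root, the MD5 choice, 
and the two-level layout are all assumptions, not anything MySQL or ZFS 
prescribes):

```python
# Hypothetical fan-out layout: instead of one flat directory with millions
# of entries, hash each file name into a two-level hierarchy so no single
# directory grows too large (at most 256 entries per hashed level here).
import hashlib
import os

def fanout_path(root, name, width=2, depth=2):
    """Map a flat file name to root/ab/cd/name using its MD5 hex prefix."""
    digest = hashlib.md5(name.encode()).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(depth)]
    return os.path.join(root, *parts, name)

p = fanout_path("/db", "user_000001.MYD")
print(p)   # e.g. "/db/xx/yy/user_000001.MYD" for some hex pairs xx, yy
```

The same name always maps to the same subdirectory, so lookups stay direct 
while each directory stays small.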

 Leal.


Re: [zfs-discuss] ZSF Solaris

2008-09-30 Thread Toby Thain

On 30-Sep-08, at 7:50 AM, Ram Sharma wrote:

 Hi,

 can anyone please tell me what is the maximum number of files that
 can be there in 1 folder in Solaris with the ZFS file system.

 I am working on an application in which I have to support 1 million
 users. In my application I am using MySQL MyISAM, and in MyISAM
 there are 3 files created for 1 table. I have an application
 architecture in which each user will have a separate table, so the
 expected number of files in the database folder is 3 million.

That sounds like a disastrous schema design. Apart from that, you're  
going to run into problems on several levels, including O/S resources  
(file descriptors) and filesystem scalability.

--Toby

 I have read somewhere that each OS has a limit on the number of
 files that can be created in a folder.



Re: [zfs-discuss] ZSF Solaris

2008-09-30 Thread Nathan Kroenert
Actually, the one that'll hurt most is ironically the most closely 
related to bad database schema design... With a zillion files in the one 
directory, if someone does an 'ls' in that directory, it'll not only 
take ages, but steal a whole heap of memory and compute power...

Provided the only things that'll be doing *anything* in that directory 
are using indexed methods, there is no real problem from a ZFS 
perspective, but if something decides to list (or worse, list and sort) 
that directory, it won't be that pleasant.

Oh - That's of course assuming you have sufficient memory in the system 
to cache all that metadata somewhere... If you don't then that's another 
zillion I/O's you need to deal with each time you list the entire directory.

an ls -1rt on a directory with about 1.2 million files with names like 
afile1202899 takes minutes to complete on my box, and we see 'ls' get to 
in excess of 700MB rss... (and that's not including the memory zfs is 
using to cache whatever it can.)

My box has the ARC limited to about 1GB, so it's obviously undersized 
for such a workload, but still gives you an indication...

I generally look to keep directories to a size that allows the utilities 
that work on and in them to perform at a reasonable rate... which for the 
most part means around 100K files or fewer...

Perhaps you are using larger hardware than I am for some of this stuff? :)

Nathan.


-- 


//
// Nathan Kroenert  [EMAIL PROTECTED]   //
// Senior Systems Engineer  Phone:  +61 3 9869 6255 //
// Global Systems Engineering   Fax:+61 3 9869 6288 //
// Level 7, 476 St. Kilda Road  //
// Melbourne 3004   VictoriaAustralia   //
//


Re: [zfs-discuss] ZSF Solaris

2008-09-30 Thread Bob Friesenhahn
On Wed, 1 Oct 2008, Nathan Kroenert wrote:
 zillion I/O's you need to deal with each time you list the entire directory.

 an ls -1rt on a directory with about 1.2 million files with names like
 afile1202899 takes minutes to complete on my box, and we see 'ls' get to
 in excess of 700MB rss... (and that's not including the memory zfs is
 using to cache whatever it can.)

A million files in ZFS is no big deal:

% ptime ls -1rt > /dev/null

real   17.277
user    8.992
sys     8.231

% ptime ls -1rt | wc -l

real   17.045
user    8.607
sys     8.413
1000000

Maybe the problem is that you need to increase your screen's scroll 
rate. :-)

Bob



Re: [zfs-discuss] ZSF Solaris

2008-09-30 Thread Bob Friesenhahn
On Wed, 1 Oct 2008, Nathan Kroenert wrote:

 That being said, there is a large delta in your results and mine... If I get 
 a chance, I'll look into it...

 I suspect it's a cached versus I/O issue...

The first time I posted was the first time the directory had been read 
in well over a month, so it was not currently cached.

You might find this to be interesting since it shows that the 'rt' 
options are taking most of the time:

% ptime ls -1 | wc -l

real    5.497
user    4.825
sys     0.654
1000000

I will certainly agree that huge directories can cause problems for 
many applications, particularly ones that access the files over a 
network.

Bob



Re: [zfs-discuss] ZSF Solaris

2008-09-30 Thread Al Hopper
On Tue, Sep 30, 2008 at 6:30 PM, Nathan Kroenert
[EMAIL PROTECTED] wrote:
 Actually, the one that'll hurt most is ironically the most closely
 related to bad database schema design... With a zillion files in the one
 directory, if someone does an 'ls' in that directory, it'll not only
 take ages, but steal a whole heap of memory and compute power...

 Provided the only things that'll be doing *anything* in that directory
 are using indexed methods, there is no real problem from a ZFS
 perspective, but if something decides to list (or worse, list and sort)
 that directory, it won't be that pleasant.

 Oh - That's of course assuming you have sufficient memory in the system
 to cache all that metadata somewhere... If you don't then that's another
 zillion I/O's you need to deal with each time you list the entire directory.

 an ls -1rt on a directory with about 1.2 million files with names like
 afile1202899 takes minutes to complete on my box, and we see 'ls' get to
  ^^^

Here's your problem!

 in excess of 700MB rss... (and that's not including the memory zfs is
 using to cache whatever it can.)

 My box has the ARC limited to about 1GB, so it's obviously undersized
 for such a workload, but still gives you an indication...

 I generally look to keep directories to a size that allows the utilities
 that work on and in it to perform at a reasonable rate... which for the
 most part is around the 100K files or less...

 Perhaps you are using larger hardware than I am for some of this stuff? :)


I've seen this problem where Solaris has issues with many files
created with this type of file-naming pattern.  For example, the file-naming
pattern produced by tmpfile(3C).  I saw it originally on a
tmpfs, and it can be easily reproduced by:

[note: I'm writing this from memory - so don't beat me up over specific details]

1) Pick a number of files you want to test with (try
different numbers - start with 1,500 and then increase it).  Call this
test#.
2) cd /tmp
3) IMPORTANT: Make a test directory for this experiment - let's call it temp.
4) cd /tmp/temp  (your playground)
5) Using your favorite language, generate your test# of files with a
pattern similar to the one above by calling tmpfile().
6) ptime ls -al  - it will be quick the first time.
7) ptime rm *  - it will be quick the first time.
8) Repeat steps 5, 6 and 7.  Your ptimes will be a little slower.
9) Repeat steps 5, 6 and 7.  Your ptimes will be much slower.
10) Repeat steps 5, 6 and 7.  Your ptimes will be *really* slow.  Now
you'll understand that you have a problem.
11) Repeat steps 5, 6 and 7 a couple more times.  Notice how bad your ptimes
are now!
12) Look at the size of /tmp/temp using ls -ald /tmp/temp and you'll
notice that it has grown substantially.  The larger this directory
grows, the slower the filesystem operations will get.
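
For those who want to automate the steps above, here is a rough Python 
translation. It is a sketch under assumptions - tempfile.mkstemp stands in 
for tmpfile(3C)-style names, a throwaway directory stands in for /tmp/temp, 
and the printed timings are only indicative:

```python
# Rough automation of the repro steps above (an assumption-laden sketch,
# not Al's exact procedure).  Creates and deletes tmpfile()-style names
# in a scratch directory several times, timing the list-and-remove phase
# of each pass so the slowdown between passes can be compared.
import os
import tempfile
import time

def one_pass(workdir, nfiles):
    """Steps 5-7: create nfiles temp files, list the directory, remove all."""
    for _ in range(nfiles):
        fd, _path = tempfile.mkstemp(dir=workdir)   # tmpfile()-like names
        os.close(fd)
    t0 = time.time()
    entries = os.listdir(workdir)                   # stand-in for 'ls -al'
    for name in entries:
        os.unlink(os.path.join(workdir, name))      # stand-in for 'rm *'
    return time.time() - t0, len(entries)

scratch = tempfile.mkdtemp()        # step 3: never test in /tmp itself
timings = []
for _ in range(4):                  # steps 8-11: repeat and compare
    elapsed, count = one_pass(scratch, 1500)
    timings.append(elapsed)
os.rmdir(scratch)                   # per Al, rmdir is what resets performance
print(timings)
```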

This behavior is common to tmpfs, UFS and I tested it on early ZFS
releases.  I have no idea why - I have not made the time to figure it
out.  What I have observed is that all operations on your (victim)
test directory will max out (100% utilization) one CPU or one CPU core
- and all directory operations become single-threaded and limited by
the performance of one CPU (or core).

Now for the weird part: the *only* way to return everything to normal
performance levels (that I've found) is to rmdir the (victim)
directory.  This is why I recommend you perform this experiment in a
subdirectory.  If you do it in /tmp, you'll have to reboot the box to
get reasonable performance back - and you don't want to do it in your
home directory either!!

I'll try to set aside some time tomorrow to re-run this experiment.
But I'm nearly sure this is why your directory-related file ops are so
slow and *dramatically* slower than they should be.   This problem/bug
is insidious - because using tmpfile() in /tmp is a very common
practice, and the application(s) using /tmp will slow down dramatically
while maxing out (100% utilization) one CPU (or core).  And if your
system only has a single CPU...   :(

Let me know what you find out.  I know that the file name pattern is
what causes this bug to bite bigtime - and not so much the number of
files you use to test it.

I *suspect* that there might be something like a hash table that is
degenerating into a singly linked list as the root cause of this
issue.  But this is only my WAG.
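
Al's conjecture - a hash table degenerating into a linked list - is easy to 
illustrate in miniature. This is a toy Python model, not ZFS internals; 
BadKey is an invented class that forces every key into one bucket, so each 
insert and lookup must walk all prior colliding entries:

```python
# Illustration of the conjecture above (not ZFS code): when every key
# lands in the same hash bucket, a hash table degenerates into a linear
# scan and per-operation cost goes from O(1) to O(n).
class BadKey:
    """Key whose hash collides for every instance."""
    def __init__(self, name):
        self.name = name
    def __hash__(self):
        return 42                       # all keys share one bucket
    def __eq__(self, other):
        return self.name == other.name

# Each insert must compare against every prior colliding key before an
# empty slot is found, so building the table is quadratic overall.
table = {}
for i in range(2000):
    table[BadKey("afile%07d" % i)] = True   # O(n) per insert under collision

assert BadKey("afile0000000") in table      # still correct, just slow
```

With a well-distributed hash the same 2000 inserts would each touch only a 
handful of entries; the file-name pattern matters only if the hash treats 
similar names alike.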

Regards,

-- 
Al Hopper  Logical Approach Inc,Plano,TX [EMAIL PROTECTED]
   Voice: 972.379.2133 Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/


Re: [zfs-discuss] ZSF Solaris

2008-09-30 Thread Ian Collins
Bob Friesenhahn wrote:
 On Wed, 1 Oct 2008, Nathan Kroenert wrote:
   
 zillion I/O's you need to deal with each time you list the entire directory.

 an ls -1rt on a directory with about 1.2 million files with names like
 afile1202899 takes minutes to complete on my box, and we see 'ls' get to
 in excess of 700MB rss... (and that's not including the memory zfs is
 using to cache whatever it can.)
 

 A million files in ZFS is no big deal:

   
But how similar were your file names?

Ian



Re: [zfs-discuss] ZSF Solaris

2008-09-30 Thread Jens Elkner
On Tue, Sep 30, 2008 at 09:44:21PM -0500, Al Hopper wrote:
 
 This behavior is common to tmpfs, UFS and I tested it on early ZFS
 releases.  I have no idea why - I have not made the time to figure it
 out.  What I have observed is that all operations on your (victim)
 test directory will max out (100% utilization) one CPU or one CPU core
 - and all directory operations become single-threaded and limited by
 the performance of one CPU (or core).

And sometimes it's just a little bug: e.g., with a recent version of Solaris
(i.e. >= snv_95 || >= S10U5) on UFS:

SunOS graf 5.10 Generic_137112-07 i86pc i386 i86pc (X4600, S10U5)
=
admin.graf /var/tmp   time sh -c 'mkfile 2g xx ; sync'
0.05u 9.78s 0:29.42 33.4%
admin.graf /var/tmp  time sh -c 'mkfile 2g xx ; sync'
0.05u 293.37s 5:13.67 93.5%
admin.graf /var/tmp  rm xx
admin.graf /var/tmp  time sh -c 'mkfile 2g xx ; sync'
0.05u 9.92s 0:31.75 31.4%
admin.graf /var/tmp  time sh -c 'mkfile 2g xx ; sync'
0.05u 305.15s 5:28.67 92.8%
admin.graf /var/tmp  time dd if=/dev/zero of=xx bs=1k count=2048
2048+0 records in
2048+0 records out
0.00u 298.40s 4:58.46 99.9%
admin.graf /var/tmp  time sh -c 'mkfile 2g xx ; sync'
0.05u 394.06s 6:52.79 95.4%

SunOS kaiser 5.10 Generic_137111-07 sun4u sparc SUNW,Sun-Fire-V440 (S10, U5)
=
admin.kaiser /var/tmp  time mkfile 1g xx
0.14u 5.24s 0:26.72 20.1%
admin.kaiser /var/tmp  time mkfile 1g xx
0.13u 64.23s 1:25.67 75.1%
admin.kaiser /var/tmp  time mkfile 1g xx
0.13u 68.36s 1:30.12 75.9%
admin.kaiser /var/tmp  rm xx
admin.kaiser /var/tmp  time mkfile 1g xx
0.14u 5.79s 0:29.93 19.8%
admin.kaiser /var/tmp  time mkfile 1g xx
0.13u 66.37s 1:28.06 75.5%

SunOS q 5.11 snv_98 i86pc i386 i86pc (U40, S11b98)
=
elkner.q /var/tmp  time mkfile 2g xx
0.05u 3.63s 0:42.91 8.5%
elkner.q /var/tmp  time mkfile 2g xx
0.04u 315.15s 5:54.12 89.0%

SunOS dax 5.11 snv_79a i86pc i386 i86pc (U40, S11b79)
=
elkner.dax /var/tmp  time mkfile 2g xx
0.05u 3.09s 0:43.09 7.2%
elkner.dax /var/tmp  time mkfile 2g xx
0.05u 4.95s 0:43.62 11.4%

Regards,
jel.
-- 
Otto-von-Guericke University http://www.cs.uni-magdeburg.de/
Department of Computer Science   Geb. 29 R 027, Universitaetsplatz 2
39106 Magdeburg, Germany Tel: +49 391 67 12768


Re: [zfs-discuss] ZSF Solaris

2008-09-30 Thread Ram Sharma
Hi Guys,

Thanks for so many good comments. Perhaps I got even more than what I asked for!

I am targeting 1 million users for my application. My DB will be on a Solaris 
machine. And the reason I am making one table per user is that it will be a 
simple design compared to keeping all the data in a single table. In that case 
I would need to worry about things like horizontal partitioning, which in turn 
will require a higher level of management.

So for storing 1 million MyISAM tables (MyISAM being a good performer when it 
comes to not very large data), I need to save 3 million data files in a single 
folder on disk. This is the way MyISAM saves data.
I will never need to do an ls on this folder. This folder (~database) will be 
used just by the MySQL engine to execute my SQL queries and fetch me results.
And now that ZFS allows me to do this easily, I believe I can go forward with 
this design. Correct me if I am missing something.