Re: [zfs-discuss] Lots of metadata overhead on filesystems with 100M files

2009-06-22 Thread Bob Friesenhahn

On Mon, 22 Jun 2009, Thomas wrote:

I have a raidz1 consisting of 6 5400rpm drives in this zpool. I have
stored some media in one FS and 200k files in another. Neither FS is
written to much. The pool is 85% full.


Could this issue also be the reason that playback lags when I am
playing (reading) some media?


Check to see if you have automated snapshots running. If snapshots 
make your pool full, then your pool will also be more likely to 
fragment new/modified files.


Make sure that you are using the default zfs blocksize of 128K since 
smaller block sizes may increase fragmentation.


You may have a slow disk which is causing the whole pool to run slow. 
All it takes is one.
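For example, these stock commands cover all three checks (a quick sketch;
pool and dataset names such as tank and tank/media are placeholders):

  # How much space do snapshots pin, and is the pool close to full?
  zfs list -t snapshot -o name,used,referenced
  zpool list

  # Is the dataset still at the default 128K recordsize?
  zfs get recordsize tank/media

  # Watch per-disk service times; one drive that is consistently much
  # busier/slower than its raidz peers can hold the whole vdev back.
  iostat -xn 5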


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Lots of metadata overhead on filesystems with 100M files

2009-06-19 Thread Rainer Orth
Richard Elling richard.ell...@gmail.com writes:

 George would probably have the latest info, but there were a number of
 things which circled around the notorious "Stop looking and start ganging"
 bug report,
 http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6596237

Indeed: we were seriously bitten by this one, taking three Solaris 10
fileservers down for about a week until the problem was diagnosed by Sun
Service and an IDR provided.  Unfortunately, this issue (seriously
fragmented pools or pools beyond ca. 90% full cause file servers to grind
to a halt) was only announced/acknowledged publicly after our incident,
although the problem seems to have been reported almost two years ago.
While a fix has been integrated into snv_114, there's still no patch for
S10, only various IDRs.

It's unclear what the state of the related CR 4854312 (need to defragment
storage pool, submitted in 2003!) is.  I suppose this might be dealt with
by the vdev removal code, but overall it's scary that dealing with such
fundamental issues takes so long.

Rainer

-- 
-
Rainer Orth, Faculty of Technology, Bielefeld University
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Lots of metadata overhead on filesystems with 100M files

2009-06-19 Thread Roch Bourbonnais


On 18 Jun 09 at 20:23, Richard Elling wrote:


Cor Beumer - Storage Solution Architect wrote:

Hi Jose,

Well it depends on the total size of your Zpool and how often these  
files are changed.


...and the average size of the files.  For small files, it is likely  
that the default
recordsize will not be optimal, for several reasons.  Are these  
small files?

-- richard


Hey Richard, I have to correct that. For small files and big files alike,
there is no need to tune the recordsize
(files are stored as a single, perfectly sized record up to the
dataset recordsize property).


Only for big files accessed and updated in aligned application records
(RDBMS) does it help to tune the ZFS recordsize.
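For example, and only in that database-style case (a sketch; the dataset
names and the 8K size are illustrative, not from this thread):

  # Worth tuning only when large files are updated in fixed-size
  # application records, e.g. an 8K-page database:
  zfs get recordsize tank/db
  zfs set recordsize=8k tank/db     # affects newly written blocks only

  # For a general file store, keep or revert to the inherited default:
  zfs inherit recordsize tank/files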


-r





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Lots of metadata overhead on filesystems with 100M files

2009-06-18 Thread Louis Romero

hi Dirk,

How might we explain find run on a Linux client against an NFS-mounted
file system under the 7000 series taking significantly longer (i.e.
performance behaving as though the command were run from Solaris)?  I'm
not sure find would have the intelligence to differentiate between file
system types and run different sections of code based upon what it finds.


louis

On 06/17/09 11:38, Dirk Nitschke wrote:

Hi Louis!

Solaris /usr/bin/find and Linux (GNU-) find work differently! I have 
experienced dramatic runtime differences some time ago. The reason is 
that Solaris find and GNU find use different algorithms.


GNU find uses the st_nlink (number of links) field of the stat
structure to optimize its work. Solaris find does not use this kind
of optimization because the meaning of number of links is not well
defined and file system dependent.


If you are interested, take a look at, say,

CR 4907267 link count problem in hsfs
CR 4462534 RFE: pcfs should emulate link counts for directories

Dirk


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Lots of metadata overhead on filesystems with 100M files

2009-06-18 Thread Cor Beumer - Storage Solution Architect




Hi Jose,

Well it depends on the total size of your Zpool and how often these
files are changed.

I was at a customer, a huge internet provider, who had 40x X4500
with standard Solaris and using ZFS.
All the machines were equipped with 48x 1TB disks. The machines were
used to provide the email platform, so all
the user email accounts were on the system. This also meant millions
of files in one zpool.

What they noticed on the X4500 systems was that when the zpool became
filled up to about 50-60%, the performance of the system
dropped enormously.
They claim this has to do with fragmentation of the ZFS
filesystem. So we tried putting in an S7410 system with
about the same disk config, 44x 1TB SATA but with 4x 18GB WriteZilla (in
a stripe); we were able to get much, much more I/Os from that system
than from the comparable X4500. However, they put it in production for a
couple of weeks, and as soon as the ZFS filesystem came into the
range of about 50-60% full they saw the same problem.
The performance dropped enormously. NetApp has the same problem
with their WAFL filesystem (they also tested this); however, they do
provide a defragmentation tool for it. That is also NOT a nice
solution, because you have to run it, manually or scheduled, and it
takes a lot of system resources, but it helps.

I did hear Sun is denying that we have this problem in ZFS, and therefore
that we don't need some kind of defragmentation mechanism;
however, our customer experiences are different.

Maybe it would be good for the ZFS group to look at this (potential) problem.

The customer I am talking about is willing to share their experiences
with Sun engineering.

greetings,

Cor Beumer


-- 
Cor Beumer
Data Management & Storage

Sun Microsystems Nederland BV
Saturnus 1
3824 ME Amersfoort, The Netherlands
Phone +31 33 451 5172
Mobile +31 6 51 603 142
Email cor.beu...@sun.com
http://www.sun.com




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Lots of metadata overhead on filesystems with 100M files

2009-06-18 Thread Richard Elling

Cor Beumer - Storage Solution Architect wrote:

Hi Jose,

Well it depends on the total size of your Zpool and how often these 
files are changed.


...and the average size of the files.  For small files, it is likely 
that the default

recordsize will not be optimal, for several reasons.  Are these small files?
-- richard




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Lots of metadata overhead on filesystems with 100M files

2009-06-18 Thread Gary Mills
On Thu, Jun 18, 2009 at 12:12:16PM +0200, Cor Beumer - Storage Solution 
Architect wrote:
 
 What they noticed on the X4500 systems was that when the zpool became
 filled up to about 50-60%, the performance of the system
 dropped enormously.
 They claim this has to do with fragmentation of the ZFS
 filesystem. So we tried putting in an S7410 system with
 about the same disk config, 44x 1TB SATA but with 4x 18GB WriteZilla (in
 a stripe); we were able to get much, much more I/Os from that system
 than from the comparable X4500. However, they put it in production for a
 couple of weeks, and as soon as the ZFS filesystem came into the range
 of about 50-60% full they saw the same problem.

We had a similar problem with a T2000 and 2 TB of ZFS storage.  Once
the usage reached 1 TB, the write performance dropped considerably and
the CPU consumption increased.  Our problem was indirectly a result of
fragmentation, but it was solved by a ZFS patch.  I understand that
this patch, which fixes a whole bunch of ZFS bugs, should be released
soon.  I wonder if this was your problem.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Lots of metadata overhead on filesystems with 100M files

2009-06-18 Thread Richard Elling

Gary Mills wrote:


We had a similar problem with a T2000 and 2 TB of ZFS storage.  Once
the usage reached 1 TB, the write performance dropped considerably and
the CPU consumption increased.  Our problem was indirectly a result of
fragmentation, but it was solved by a ZFS patch.  I understand that
this patch, which fixes a whole bunch of ZFS bugs, should be released
soon.  I wonder if this was your problem.
  


George would probably have the latest info, but there were a number of
things which circled around the notorious "Stop looking and start ganging"
bug report,
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6596237
-- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Lots of metadata overhead on filesystems with 100M files

2009-06-17 Thread Roch Bourbonnais


On 16 Jun 09 at 19:55, Jose Martins wrote:



Hello experts,

IHAC that wants to put more than 250 Million files on a single
mountpoint (in a directory tree with no more than 100 files on each
directory).

He wants to share such filesystem by NFS and mount it through
many Linux Debian clients

We are proposing a 7410 Openstore appliance...

He is claiming that certain operations like find, even if taken from
the Linux clients on such NFS mountpoint take significant more
time than if such NFS share was provided by other NAS providers
like NetApp...



10%, 100%, 1% or more?  Knowing the magnitude helps diagnostics.
What kind of pool is this?

This should be a read performance test: pool type and total
disk rotation impact the resulting performance.

Can someone confirm if this is really a problem for ZFS  
filesystems?...



Nope


Is there any way to tune it?...

We thank any input

Best regards

Jose







___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Lots of metadata overhead on filesystems with 100M files

2009-06-17 Thread robert ungar


Jose,

I hope our openstorage experts weigh in on 'is this a good idea'; it
sounds scary to me, but I'm overly cautious anyway.  I did want to raise
the question of other client expectations for this opportunity: what are
the intended data protection requirements, how will they backup and
recover the files, do they intend to apply replication in support of a
disaster recovery plan, and are the intended data protection schemes
practical?  The other area that jumps out at me is concurrent access: in
addition to the 'find' by 'many' clients, does the client have any
performance requirements that must be met to ensure the solution is
successful?  Does any of the above have to happen at the same time?

I'm not in a position to evaluate these considerations for this
opportunity, simply sharing some areas that, often enough, are
overlooked as we address the chief complaint.

Regards,
Robert




Jose Martins wrote:


Hello experts,

IHAC that wants to put more than 250 Million files on a single
mountpoint (in a directory tree with no more than 100 files on each
directory).

He wants to share such filesystem by NFS and mount it through
many Linux Debian clients

We are proposing a 7410 Openstore appliance...

He is claiming that certain operations like find, even if taken from
the Linux clients on such NFS mountpoint take significantly more
time than if such NFS share was provided by other NAS providers
like NetApp...

Can someone confirm if this is really a problem for ZFS filesystems?...

Is there any way to tune it?...

We thank any input

Best regards

Jose








--


Robert C. Ungar  ABCP
Professional Services Delivery
Storage Solutions Specialist
Telephone 585-598-9020

Sun Microsystems
345 Woodcliff Drive
Fairport, NY 14450

www.sun.com/storage 


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Lots of metadata overhead on filesystems with 100M files

2009-06-17 Thread Eric D. Mudama

On Wed, Jun 17 at 13:49, Alan Hargreaves wrote:
Another question worth asking here is: is a find over the entire
filesystem something that they would expect to be executed with
sufficient regularity that the execution time would have a business
impact?


Exactly.  That's such an odd business workload on 250,000,000 files
that there isn't likely to be much of a shortcut other than just
throwing tons of spindles (or SSDs) at the problem, and/or having tons
of memory.

If the finds are just by name, that's easy for the system to cache, but
if you're expecting to run something against the output of find with
-exec to parse/process 250M files on a regular basis, you'll likely be
severely I/O bound.  Almost to the point of arguing for something like
Hadoop or another form of distributed map/reduce on your dataset with
a lot of nodes, instead of a single storage server.
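To make the contrast concrete, something like this (the paths and
patterns are hypothetical):

  # Name-only find: touches only directory metadata, which the ARC can
  # keep warm after the first pass.
  find /export/mail -name '*.msg' > /tmp/matches

  # find with -exec: also opens and reads every matching file, which
  # across 250M files will almost certainly be I/O bound.
  find /export/mail -name '*.msg' -exec grep -l 'X-Spam-Flag: YES' {} +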


--
Eric D. Mudama
edmud...@mail.bounceswoosh.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Lots of metadata overhead on filesystems with 100M files

2009-06-17 Thread Louis Romero

Jose,

I believe the problem is endemic to Solaris.  I have run into similar
problems doing a simple find(1) in /etc.  On Linux, a find operation in
/etc is almost instantaneous.  On Solaris, it has a tendency to spin
for a long time.  I don't know what their use of find might be but,
running updatedb on the Linux clients (with the NFS file system mounted,
of course) and using locate(1) will give you a work-around on the Linux
clients.

Caveat emptor: there is a staleness factor associated with this solution,
as any new files dropped in after updatedb runs will not show up until
the next updatedb is run.
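Roughly like this on a Linux client (a sketch; exact options differ
between the findutils and mlocate flavours of updatedb, and the paths are
only examples):

  # Build a private locate database covering only the NFS mount.
  updatedb -U /mnt/share -o /var/tmp/share.locatedb

  # Fast lookups afterwards, at the cost of staleness until the next run.
  locate -d /var/tmp/share.locatedb 'report-2009*'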


HTH

louis

On 06/16/09 11:55, Jose Martins wrote:


Hello experts,

IHAC that wants to put more than 250 Million files on a single
mountpoint (in a directory tree with no more than 100 files on each
directory).

He wants to share such filesystem by NFS and mount it through
many Linux Debian clients

We are proposing a 7410 Openstore appliance...

He is claiming that certain operations like find, even if taken from
the Linux clients on such NFS mountpoint take significantly more
time than if such NFS share was provided by other NAS providers
like NetApp...

Can someone confirm if this is really a problem for ZFS filesystems?...

Is there any way to tune it?...

We thank any input

Best regards

Jose





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Lots of metadata overhead on filesystems with 100M files

2009-06-17 Thread Dirk Nitschke

Hi Louis!

Solaris /usr/bin/find and Linux (GNU) find work differently! I
experienced dramatic runtime differences some time ago. The reason is
that Solaris find and GNU find use different algorithms.

GNU find uses the st_nlink (number of links) field of the stat
structure to optimize its work. Solaris find does not use this kind
of optimization because the meaning of the number of links is not well
defined and is file system dependent.
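GNU find calls this the leaf optimization, and it can be switched off
when the link counts can't be trusted; comparing the two runs shows what
the shortcut buys or breaks (the mount point is just an example):

  # Default: GNU find trusts st_nlink of directories and stops stat()ing
  # entries once all expected subdirectories have been seen.
  time find /mnt/nfs -name core

  # -noleaf disables that assumption, behaving more like Solaris find.
  time find /mnt/nfs -noleaf -name core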


If you are interested, take a look at, say,

CR 4907267 link count problem in hsfs
CR 4462534 RFE: pcfs should emulate link counts for directories

Dirk



--
Dirk Nitschke, Storage Architect
Sun Microsystems GmbH
Nagelsweg 55, 20097 Hamburg, Germany
Phone: +49-40-251523-413  Fax: +49-40-251523-425
Mobile: +49-172-847 62 66
dirk.nitsc...@sun.com
http://www.sun.de/
---
Sitz der Gesellschaft: Sun Microsystems GmbH, Sonnenallee 1,
D-85551 Kirchheim-Heimstetten - Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel
Vorsitzender des Aufsichtsrates: Martin Haering

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Lots of metadata overhead on filesystems with 100M files

2009-06-17 Thread Casper . Dik

Hi Louis!

Solaris /usr/bin/find and Linux (GNU-) find work differently! I have  
experienced dramatic runtime differences some time ago. The reason is  
that Solaris find and GNU find use different algorithms.

GNU find uses the st_nlink (number of links) field of the stat
structure to optimize its work. Solaris find does not use this kind
of optimization because the meaning of number of links is not well
defined and file system dependent.

But that's not what is under discussion: apparently the *same* clients
get different performance from an OpenStorage system vs. a NetApp
system.

I think we need to know much more, and I think OpenStorage can give
the information you need.

(Yes, I did have problems because of GNU find's shortcuts: they don't
work all the time.)

Casper

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Lots of metadata overhead on filesystems with 100M files

2009-06-17 Thread Joerg Schilling
Dirk Nitschke dirk.nitsc...@sun.com wrote:

 Solaris /usr/bin/find and Linux (GNU-) find work differently! I have  
 experienced dramatic runtime differences some time ago. The reason is  
 that Solaris find and GNU find use different algorithms.

Correct: Solaris find honors the POSIX standard, GNU find does not :-(


 GNU find uses the st_nlink (number of links) field of the stat  
 structure to optimize it's work. Solaris find does not use this kind  
 of optimization because the meaning of number of links is not well  
 defined and file system dependent.

GNU find makes illegal assumptions about the value of st_nlink for directories.

These assumptions are derived from implementation specifics found in historic
UNIX filesystem implementations, but there is no guarantee of the assumed behavior.


 If you are interested, take a look at, say,

 CR 4907267 link count problem in hsfs

Hsfs just shows you the numbers set up by the ISO-9660 filesystem creation
utility. If you use a recent original mkisofs (like the one that has come with
Solaris for the past 1.5 years), you get the same behavior for hsfs and UFS. The
related feature was implemented in mkisofs in October 2006.

If you use mkisofs from one of the non-OSS-friendly Linux distributions like
Debian, RedHat, Suse, Ubuntu or Mandriva, you get a 5-year-old mkisofs version
and thus the values in st_nlink for directories are random numbers.

The problems caused by programs that ignore POSIX rules have been discussed
several times on the POSIX mailing list. In order to solve the issue, I have
proposed several times to introduce a new pathconf() call that allows one to
ask whether a directory has historic UFS semantics for st_nlink.

Hsfs knows whether the filesystem was created by a recent mkisofs and thus
would be able to give the right return value. NFS clients would need to
implement an RPC that allows them to retrieve the value from the exported
filesystem on the server side.




 On 17.06.2009 at 18:08, Louis Romero wrote:

  Jose,
 
  I believe the problem is endemic to Solaris.  I have run into  
  similar problems doing a simple find(1) in /etc.  On Linux, a find  
  operation in /etc is almost instantaneous.  On solaris, it has a  

If you ignore standards you may get _apparent_ speed. On Linux this speed
is usually bought by giving up correctness.


  tendency to  spin for a long time.  I don't know what their use of  
  find might be but, running updatedb on the linux clients (with the  
  NFS file system mounted of course) and using locate(1) will give you  
  a work-around on the linux clients.

With NFS, things are even more complex and in principle similar to the hsfs 
case where the OS filesystem implementation just shows you the values set up 
by mkisofs.

On an NFS client, you see the numbers that have been set up by the NFS server,
but you don't know what filesystem type is under the NFS server.

Jörg

-- 
 EMail: jo...@schily.isdn.cs.tu-berlin.de (home)  Jörg Schilling, D-13353 Berlin
        j...@cs.tu-berlin.de (uni)
        joerg.schill...@fokus.fraunhofer.de (work)
 Blog:  http://schily.blogspot.com/
 URL:   http://cdrecord.berlios.de/private/  ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Lots of metadata overhead on filesystems with 100M files

2009-06-16 Thread Paisit Wongsongsarn

Hi Jose,

Would enabling the SSD (cache device) for metadata only help?
Assuming that you have a read-optimized SSD in place.


I have never tried it, but it seems worth trying by just turning it on.
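Something along these lines (a sketch; the pool name and cache device are
placeholders, and secondarycache assumes a ZFS build recent enough to have
that property):

  # Add a read-optimized SSD as an L2ARC cache device.
  zpool add tank cache c2t0d0

  # Cache only metadata on the SSD; leave the in-memory ARC policy as-is.
  zfs set secondarycache=metadata tank
  zfs get primarycache,secondarycache tank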

regards,
Paisit W.

Jose Martins wrote:


Hello experts,

IHAC that wants to put more than 250 Million files on a single
mountpoint (in a directory tree with no more than 100 files on each
directory).

He wants to share such filesystem by NFS and mount it through
many Linux Debian clients

We are proposing a 7410 Openstore appliance...

He is claiming that certain operations like find, even if taken from
the Linux clients on such NFS mountpoint take significantly more
time than if such NFS share was provided by other NAS providers
like NetApp...

Can someone confirm if this is really a problem for ZFS filesystems?...

Is there any way to tune it?...

We thank any input

Best regards

Jose






--
+-*-*-*-*-*-*-+
Paisit Wongsongsarn
Regional Technical Lead (DMA & PFT)
Storage Practice, Sun Microsystems Asia South
DID: +65 6-239-2626, Mobile: +65 9-154-1717, Email: pai...@sun.com
Blogs: blogs.sun.com/paisit
+-*-*-*-*-*-*-+

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss