Re: [CentOS] inquiry about limitation of file system

2018-11-03 Thread Gordon Messmer

On 11/3/18 12:44 AM, yf chu wrote:

I wonder whether the performance will be affected if there are too many files 
and directories on the server.



With XFS on modern CentOS systems, you probably don't need to worry:
https://www.youtube.com/watch?v=FegjLbCnoBw

For older systems, as best I understand it: As the directory tree grows, 
the answer to your question depends on how many entries are in the 
directories, how deep the directory structure is, and how random the 
access pattern is.  Ultimately, you want to minimize the number of 
individual disk reads required.


Directories with lots of entries are one situation where you may see 
performance degrade.  Typically around the time the directory grows 
larger than the maximum size of the direct block list [1] (48k), reading 
the directory starts to take a little longer. After the maximum size of 
the single indirect block list (4MB), it will tend to get slower again.  
File names impact directory size, so average filename length factors in, 
as well as the number of files.
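
As a very rough illustration of how filename length eats into that space, here
is a back-of-envelope sketch in Python.  It assumes the classic ext2/ext3
directory entry layout (an 8-byte header plus the name padded to a 4-byte
boundary) and 4 KiB blocks; ext4's hashed directories behave differently, so
treat the numbers as ballpark only:

    def dirent_size(name_len):
        # on-disk size of one classic ext2/ext3 directory entry:
        # 8-byte header (inode, rec_len, name_len, file_type) + padded name
        return 8 + (name_len + 3) // 4 * 4

    DIRECT_REGION = 12 * 4096   # the ~48k of direct blocks mentioned above

    for avg_name in (8, 16, 32, 64):
        print(avg_name, "byte names ->",
              DIRECT_REGION // dirent_size(avg_name),
              "entries before spilling past the direct blocks")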


A given file lookup will need to read each of the parent directories to 
locate the next item in the path.  If your path is very deep, then your 
directories are likely to be smaller on average, but you're increasing 
the number of lookups required for parent directories to reduce the 
length of the block list.  It might make your worst-case better, but 
your best-case is probably worse.


The system's cache means that accessing a few files in a large structure 
is not as expensive as accessing random files in that structure.  If you have a 
large structure, but users tend to access mostly the same files at any 
given time, then the system won't be reading the disk for every lookup.  
If accesses aren't random, then structure size becomes less important.


Hashed name directory structure has been mentioned, and those can be 
useful if you have a very large number of objects to store, and they all 
have the same permission set.  A hashed name structure typically 
requires that you store, in a database, a map between the original names 
(that users see) and their hashes.  You could hash each name at 
lookup, but that doesn't give you a good mechanism for dealing with 
collisions.  Hashed name directory structures typically have a worse 
best-case performance due to the lookup, but they offer predictable and 
even growth for lookup times for each file.  Where a free-form directory 
structure might have a large difference between the best-case and 
worst-case lookup, a hashed name directory structure should be roughly 
the same access time for all files.
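
A minimal sketch of that pattern, just to make it concrete (Python, with
sqlite standing in for whatever database holds the name map; the paths, table
name and two-level fan-out are invented for the example):

    import hashlib, os, sqlite3

    STORE = "/srv/objects"                      # invented storage root
    os.makedirs(STORE, exist_ok=True)
    db = sqlite3.connect(os.path.join(STORE, "map.db"))
    db.execute("CREATE TABLE IF NOT EXISTS names (orig TEXT PRIMARY KEY, hashed TEXT)")

    def put(orig_name, data):
        # hash the user-visible name; the database keeps the orig -> hash
        # map, which is also where you would deal with any collisions
        h = hashlib.sha256(orig_name.encode()).hexdigest()
        path = os.path.join(STORE, h[:2], h[2:4], h)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)
        db.execute("INSERT OR REPLACE INTO names VALUES (?, ?)", (orig_name, h))
        db.commit()

    def get(orig_name):
        # raises if the name was never stored; fine for a sketch
        (h,) = db.execute("SELECT hashed FROM names WHERE orig = ?",
                          (orig_name,)).fetchone()
        with open(os.path.join(STORE, h[:2], h[2:4], h), "rb") as f:
            return f.read()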



1: https://en.wikipedia.org/wiki/Inode_pointer_structure



Re: [CentOS] inquiry about limitation of file system

2018-11-03 Thread Keith Keller
On 2018-11-03, Jonathan Billings  wrote:
>
> Now, filesystem limits aside, software that tries to read those directories 
> with huge numbers of files is going to have performance issues. I/O 
> operations, memory limitations and time are going to be bottlenecks to web 
> operations. 

Just to be pedantic, it's only the case Jonathan describes that would be a
performance problem.  Typically, a web server doesn't need to read the
directory in order to retrieve a file and send it back to a client, so
that wouldn't necessarily be a performance issue.  But having too many
files in one directory would impact other operations that might be
important, like backups, finding files, or most other bulk file
operations, which would also have an effect on other processes like the
web server.  (And if the web server is generating directory listings on
the fly that would be a huge performance problem.)

And as others have mentioned, this issue isn't filesystem-specific.
There are ways to work around some of these issues, but in general it's
better to avoid them in the first place.

The typical ways of working around this issue are storing the files in a
hashed directory tree or storing the files as blobs in a database.
There are lots of tools to help with either approach.
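
For the database route, a trivial sketch of the idea (SQLite purely as a
stand-in; the file and table names are invented):

    import sqlite3

    db = sqlite3.connect("pages.db")    # invented database file
    db.execute("CREATE TABLE IF NOT EXISTS pages (path TEXT PRIMARY KEY, body BLOB)")

    def store(path, body):
        db.execute("INSERT OR REPLACE INTO pages VALUES (?, ?)", (path, body))
        db.commit()

    def fetch(path):
        row = db.execute("SELECT body FROM pages WHERE path = ?", (path,)).fetchone()
        if row is None:
            raise FileNotFoundError(path)
        return row[0]

    # e.g. store("/articles/2018/11/foo.html", b"<html>...</html>")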

--keith

-- 
kkel...@wombat.san-francisco.ca.us




Re: [CentOS] inquiry about limitation of file system

2018-11-03 Thread Frank Cox
On Sun, 4 Nov 2018 06:37:53 +0800 (CST)
yf chu wrote:

> Thank you for your advice. I know the issue depends on a lot of factors.
> Would you please give me some detailed information about how to tune these
> parameters, such as the size of the cache and the type of CPU? I am not quite
> familiar with these details.

Depending on the nature of these "millions of files", you may want to consider 
a database-backed application rather than simply dumping the files into a 
directory tree of some kind.   I assume that you'll have to index your files in 
some way to make all of these web pages useful, so a database might be what you 
want instead of a simple heap o' html files.

Then you won't be dealing with millions of files.  A properly constructed 
database can be very efficient; a lot of very smart people have put a lot of 
thought into making it so.

-- 
MELVILLE THEATRE ~ Real D 3D Digital Cinema ~ www.melvilletheatre.com


Re: [CentOS] inquiry about limitation of file system

2018-11-03 Thread yf chu
Thank you for your advice. I know the issue depends on a lot of factors. Would 
you please give me some detailed information about how to tune these parameters, 
such as the size of the cache and the type of CPU? I am not quite familiar with 
these details.








At 2018-11-03 22:39:55, "Stephen John Smoogen"  wrote:
>On Sat, 3 Nov 2018 at 04:17, yf chu  wrote:
>>
>> Thank you for your hint.
>> I really mean I am planning to store millions of files on the file system.
>> Then may I ask: what is the maximum number of files that could be 
>> stored in one directory without affecting the performance of the web server?
>>
>>
>
>There is no simple answer to that. It will depend on everything from
>the physical drives used, the hardware that connects the motherboard
>to the drives, the size of the cache and type of CPU on the system,
>any low level filesystem items (software/hardware raid, type of raid,
>redundancy of the raid, etc), the type of the file system, the size of
>the files, the layout of directory structure, and the metadata
>connected to those files and needing to be checked.
>
>Any one of those can partially affect the performance of the
>web server, and multiple combinations of them can severely affect it.
>This means a lot of benchmarking of the hardware and OS is needed to
>get an idea of whether tuning the number of files per directory will
>make things better or not. I have seen many systems where the hardware
>worked better with a certain type of RAID and it didn't matter if you
>had 10,000 or 100 files in each directory: the changes in performance
>were minimal, but moving from RAID10 to RAID6 or vice versa sped things
>up much more, as did adding more cache to the hardware controller, and
>so on.
>
>Assuming you have tuned all of that, then the number of files in the
>directory comes down to a 'gut' check. I have seen some people use some
>power-of-2 number of files per directory, but rarely go over 1024. If you do a
>3-level double hex tree <[0-f][0-f]>/<[0-f][0-f]>/<[0-f][0-f]>/ and
>lay them out using some sort of file hash method, you can easily fit
>256 files in each directory and have 2^32 files. You will probably
>end up with some hot spots depending on the hash method, so it would be
>good to test that first.
>
>>
>>
>>
>>
>>
>>
>> At 2018-11-03 16:03:56, "Walter H."  wrote:
>> >On 03.11.2018 08:44, yf chu wrote:
>> >> I have a website with millions of pages.
>> >>
>> >does 'millions of pages' also mean 'millions of files on the file system'?
>> >
>> >just a hint - has nothing to do with any file system as its universal:
>> >e.g. when you have 10,000 files
>> >don't store them in one folder, create 100 folders with 100 files in each;
>> >
>> >there is no file system that handles millions of files in one folder
>> >or with limited resources (e.g. RAM)
>> >
>
>
>
>-- 
>Stephen J Smoogen.


Re: [CentOS] inquiry about limitation of file system

2018-11-03 Thread Jonathan Billings
On Nov 3, 2018, at 04:16, yf chu  wrote:
> 
> Thank you for your hint.
> I really mean I am planning to store millions of files on the file system.
> Then may I ask: what is the maximum number of files that could be stored 
> in one directory without affecting the performance of the web server?

There are hard limits in each file system.

For ext4, there is no per-directory limit, but there is an upper limit on the 
total number of files (inodes, really) per file system: 2^32 - 1 (4,294,967,295). 
XFS also has no per-directory limit, and its inode limit is 2^64 
(18,446,744,073,709,551,616).
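
(If you want to see how close an existing file system is to its inode limit,
statvfs reports the counts; a quick sketch in Python, where the mount point is
only an example:)

    import os

    st = os.statvfs("/var/www")          # example mount point
    used = st.f_files - st.f_ffree
    print(f"{used} inodes in use out of {st.f_files}")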

If you are using ext2 or ext3, I think the limit per directory is around 10,000, 
and you start seeing heavy performance issues beyond that. Don't use them. 

Now, filesystem limits aside, software that tries to read those directories with 
huge numbers of files is going to have performance issues. I/O operations, 
memory limitations and time are going to be bottlenecks to web operations. 

You really need to reconsider how you want to serve these pages. 
--
Jonathan Billings


Re: [CentOS] inquiry about limitation of file system

2018-11-03 Thread Stephen John Smoogen
On Sat, 3 Nov 2018 at 04:17, yf chu  wrote:
>
> Thank you for your hint.
> I really mean I am planning to store millions of files on the file system.
> Then may I ask: what is the maximum number of files that could be stored 
> in one directory without affecting the performance of the web server?
>
>

There is no simple answer to that. It will depend on everything from
the physical drives used, the hardware that connects the motherboard
to the drives, the size of the cache and type of CPU on the system,
any low level filesystem items (software/hardware raid, type of raid,
redundancy of the raid, etc), the type of the file system, the size of
the files, the layout of directory structure, and the metadata
connected to those files and needing to be checked.

Any one of those can partially affect the performance of the
web server, and multiple combinations of them can severely affect it.
This means a lot of benchmarking of the hardware and OS is needed to
get an idea of whether tuning the number of files per directory will
make things better or not. I have seen many systems where the hardware
worked better with a certain type of RAID and it didn't matter if you
had 10,000 or 100 files in each directory: the changes in performance
were minimal, but moving from RAID10 to RAID6 or vice versa sped things
up much more, as did adding more cache to the hardware controller, and
so on.

Assuming you have tuned all of that, then the number of files in the
directory comes down to a 'gut' check. I have seen some people use some
power-of-2 number of files per directory, but rarely go over 1024. If you do a
3-level double hex tree <[0-f][0-f]>/<[0-f][0-f]>/<[0-f][0-f]>/ and
lay them out using some sort of file hash method, you can easily fit
256 files in each directory and have 2^32 files. You will probably
end up with some hot spots depending on the hash method, so it would be
good to test that first.
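
Roughly something along these lines, as a sketch only (sha256 is just one
possible choice of hash, the root path is invented, and keeping the original
file name at the leaf is one option among several):

    import hashlib, os

    ROOT = "/srv/files"                          # invented root of the tree

    def hashed_path(name):
        # map a name onto a 3-level <hh>/<hh>/<hh> tree: 16^6 leaf
        # directories at ~256 files each is roughly 2^32 files
        h = hashlib.sha256(name.encode()).hexdigest()
        return os.path.join(ROOT, h[0:2], h[2:4], h[4:6], name)

    path = hashed_path("page-123456.html")
    os.makedirs(os.path.dirname(path), exist_ok=True)
    # ... then write the file at `path`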

>
>
>
>
>
>
> At 2018-11-03 16:03:56, "Walter H."  wrote:
> >On 03.11.2018 08:44, yf chu wrote:
> >> I have a website with millions of pages.
> >>
> >does 'millions of pages' also mean 'millions of files on the file system'?
> >
> >just a hint - has nothing to do with any file system as its universal:
> >e.g. when you have 10,000 files
> >don't store them in one folder, create 100 folders with 100 files in each;
> >
> >there is no file system that handles millions of files in one folder
> >or with limited resources (e.g. RAM)
> >



-- 
Stephen J Smoogen.


Re: [CentOS] inquiry about limitation of file system

2018-11-03 Thread yf chu
Thank you for your hint.
I really mean I am planning to store millions of files on the file system.
Then may I ask: what is the maximum number of files that could be stored 
in one directory without affecting the performance of the web server?








At 2018-11-03 16:03:56, "Walter H."  wrote:
>On 03.11.2018 08:44, yf chu wrote:
>> I have a website with millions of pages.
>>
>does 'millions of pages' also mean 'millions of files on the file system'?
>
>just a hint - has nothing to do with any file system as its universal:
>e.g. when you have 10,000 files
>don't store them in one folder, create 100 folders with 100 files in each;
>
>there is no file system that handles millions of files in one folder
>or with limited resources (e.g. RAM)
>


Re: [CentOS] inquiry about limitation of file system

2018-11-03 Thread Walter H.

On 03.11.2018 08:44, yf chu wrote:

I have a website with millions of pages.


does 'millions of pages' also mean 'millions of files on the file system'?

just a hint - this has nothing to do with any particular file system, as it's universal:
e.g. when you have 10,000 files,
don't store them in one folder; create 100 folders with 100 files in each.
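
(A trivial sketch of that split for numbered files, with invented names, just
to show the idea:)

    import os

    def bucket(i):
        # file number i (0..9999) goes into folder 00..99, 100 files each
        return os.path.join(f"{i // 100:02d}", f"file-{i:05d}.html")

    print(bucket(1234))    # -> 12/file-01234.html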

there is no file system that handles millions of files in one folder well,
especially not with limited resources (e.g. RAM)
