On Jun 19, 2014, at 11:21 AM, Andy Colson wrote:

> I think it depends on how you are going to use them.  I, for example, have 
> lots of images that are served on a web page, after benchmarks I found it was 
> faster to store them on filesystem and let apache serve them directly.

I rarely store images like that locally now; I just toss them onto Amazon S3.

When I did have to store lots of images locally, I found this to be the best
method:

1. The Postgres record for the image is given a unique, random hash stored as a
hexdigest.
2. The image is saved on the filesystem in a directory path derived from that
hexdigest.

For example, it might look something like this:

        Postgres:
        id  | filename  | hash
        001 | image.jpg | abcdef123

        Filesystem:
        abc/def/123/abcdef123-image.jpg

        nginx/apache rewrite rule:
        abcdef123-image.jpg -> abc/def/123/abcdef123-image.jpg
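
For what it's worth, the write path is only a few lines. A minimal sketch in
Python, assuming a random hexdigest and the triplet layout above (store_image,
the /var/images root, and the digest length are illustrative, not from any
particular app):

        import os
        import secrets

        def store_image(src_path, root="/var/images"):
            # random hex keeps every character position uniformly distributed
            digest = secrets.token_hex(16)
            # the first three triplets become the directory path, e.g. abc/def/123
            d1, d2, d3 = digest[0:3], digest[3:6], digest[6:9]
            dirpath = os.path.join(root, d1, d2, d3)
            os.makedirs(dirpath, exist_ok=True)
            dest = os.path.join(dirpath, "%s-%s" % (digest, os.path.basename(src_path)))
            os.rename(src_path, dest)  # shutil.move if crossing filesystems
            return digest, dest

The digest and the original filename then go into the Postgres row, and the
rewrite rule maps the public URL back onto the triplet path.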

The reason for this has to do with the performance of various filesystems and
with how digits are distributed in sequences. It ties into Benford's Law
( http://en.wikipedia.org/wiki/Benford's_law ) as well: leading digits in
naturally occurring numbers skew heavily toward 1, so directories keyed on them
fill unevenly.

A handful of filesystems exhibit decreased performance as the number of items
in a directory grows. A few years ago, 1k-4k items was a safe maximum, but at
10x that some filesystems slowed down considerably. I think most modern
filesystems are still quick in the 5-10k range.

A hexdigest has more characters available per position (16) and a far more
uniform distribution than a series of sequential numbers or natural-language
filenames, so files spread evenly across the directory tree.
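
If you're curious, a rough way to see the skew is to count leading characters
of sequential ids versus their md5 hexdigests (the 50000 cutoff is arbitrary):

        import hashlib
        from collections import Counter

        ids = range(1, 50000)
        # leading digit of sequential ids piles up on 1-4
        print(Counter(str(i)[0] for i in ids).most_common(3))
        # leading hex digit of their hashes is roughly flat across 0-f
        print(Counter(hashlib.md5(str(i).encode()).hexdigest()[0]
                      for i in ids).most_common(3))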

And if you group a hexdigest into triplets, you get at most 4096 files/folders
in a directory, which is a decent sweet spot:

        16 * 16 * 16 = 4096
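
The scheme compounds if you nest all three triplet levels; back-of-envelope:

        per_dir = 16 ** 3         # 4096 possible names per directory level
        leaf_dirs = per_dir ** 3  # three levels deep: 68,719,476,736 leaf dirs
        print(per_dir, leaf_dirs)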

I haven't had to deal with this sort of stuff in almost 10 years now, but back
then archiving content like this was a considerable improvement to filesystem
performance and web serving.
