Re: Max files in unix folder from PIL process
Hi All--

Rowdy wrote:
> FreeDB (CD database) stores one file per CD in one directory per category. The misc category/directory on my FreeBSD 5.3 system currently contains 481,571 small files. The rock directory/category contains 449,208 files. As some have said, ls is *very* slow on these directories, but otherwise there don't seem to be any problems.

I assume you're all using Linux. The GNU version of ls does two things that slow it down. The System V and BSD versions were pretty much identical, in that they processed the argv array in whatever order the shell passed it in. The GNU version re-orders the argv array and stuffs all the arguments into a queue. That's no big deal if you're just doing a plain ls, but for ls with multiple directory names it can slow things down when argv is large and/or the ls is recursive or deep.

The other thing it does differently from SysV/BSD ls is that it honors default options set in an environment variable. If those settings specify always using color, that will slow directory processing _way_ down, identically to the -F option. That's because the color and -F options _require_ a stat() on each and every file in the directory.

Standard ls with no options (or the old SysV/BSD ls, which came with hardly any options) works nearly as fast as os.listdir() in Python, because it doesn't require a stat(). The only thing faster, from a shell user's viewpoint, is 'echo *'.

That may not be much help;-)

Metta,
Ivan
--
Ivan Van Laningham
God N Locomotive Works
http://www.andi-holmes.com/
http://www.foretec.com/python/workshops/1998-11/proceedings.html
Army Signal Corps: Cu Chi, Class of '70
Author: Teach Yourself Python in 24 Hours
--
http://mail.python.org/mailman/listinfo/python-list
Re: Max files in unix folder from PIL process
Yes, I'm talking Linux, not BSD, so with any luck you won't have the same 'ls' issue; it is not a crash, just painfully slow. The only other issue I recall is that wildcards fail if they expand to too many files (presumably a bash/kernel maximum command-line size). I would expect the various GUI file managers to give unpredictable results; I would also not rely on remotely mounting the big directory cross-platform.
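For what it's worth, the wildcard failure above is the shell's limit on the expanded argument list (ARG_MAX), not a filesystem limit. A minimal sketch of sidestepping it by matching in-process with the stdlib; the file names below are made up for illustration:

```python
import fnmatch
import os
import tempfile

def matching(dirpath, pattern):
    # A shell command like 'rm *.jpg' fails because the glob expands to
    # one huge argv; filtering with fnmatch never builds an argv at all.
    return [n for n in os.listdir(dirpath) if fnmatch.fnmatch(n, pattern)]

# Throwaway demonstration directory.
d = tempfile.mkdtemp()
for name in ("a.jpg", "b.jpg", "c.png"):
    open(os.path.join(d, name), "w").close()

result = sorted(matching(d, "*.jpg"))  # → ['a.jpg', 'b.jpg']
```

The same approach works for deleting or moving the matches, since each os.remove()/os.rename() call names one file at a time.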
Max files in unix folder from PIL process
Hi. I am creating a Python application that uses PIL to generate thumbnails and sized images. It is beginning to look like the volume of images will be large. This has got me thinking: is there a limit to the number of files Unix can handle in a single directory? I am using FreeBSD 4.x at the moment. I am thinking the number could be as high as 500,000 images in a single directory, but more likely in the range of 6,000 to 30,000 for most. I did not want to store these in Postgres. Should this pose a problem on the filesystem? I realize this is less a Python issue really, but I thought someone on the list might have an idea.

Regards,
David.
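For context, the thumbnailing step being described is straightforward with PIL. A minimal sketch; the function name, paths, and the 128-pixel bound are hypothetical choices, not from the original post:

```python
from PIL import Image

def make_thumbnail(src_path, dst_path, max_side=128):
    im = Image.open(src_path)
    # thumbnail() resizes in place, preserving aspect ratio, so that
    # neither dimension exceeds max_side.
    im.thumbnail((max_side, max_side))
    im.save(dst_path, "JPEG")
```

Each call produces one file on disk, which is why the directory in question fills up at the rate of one (or more, for multiple sizes) file per source image.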
Re: Max files in unix folder from PIL process
I ran into a similar situation with a massive directory of PIL-generated images (around 10k). No problems on the filesystem/Python side of things, but other tools (most notably 'ls') don't cope very well. As it happens my data has natural groups, so I broke the big dir into subdirs to sidestep the problem.
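When the data has no natural groups, one common scheme for the same split (an assumption here, not necessarily what was done above) is to bucket files into subdirectories by the leading hex digits of a hash of the name:

```python
import hashlib
import os

def bucketed_path(root, filename, levels=2):
    # e.g. 'cat.jpg' -> root/<h0>/<h1>/cat.jpg; two one-character levels
    # give 256 buckets, so no single directory grows unmanageably large.
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    return os.path.join(root, *digest[:levels], filename)
```

The mapping is deterministic, so a reader can reconstruct the path from the filename alone without any lookup table.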
Re: Max files in unix folder from PIL process
Hi Jason. Many thanks for your reply. This is good to know about ls. What did it do? Was it just slow, or did the server or machine die? My images will be going into the path of a web server. This is uncharted territory for me, and I don't know whether there will be speed and access problems, or how the filesystem copes with this kind of volume.

I am definitely planning to split the images into directories by size, and that will at least divide the number by a factor of the various sizes (but on the higher end this could still be between 150 and 175 thousand images, which is still a pretty big number). I don't know if this will be a problem, or whether there is really anything to worry about at all. But it is better to obtain advice from those that have been there, done that, or are at least a bit more familiar with pushing limits on Unix resources, than to wonder whether it will work.

Regards,
David

On Monday, March 28, 2005, at 07:18 PM, Kane wrote:
> I ran into a similar situation with a massive directory of PIL generated images (around 10k). No problems on the filesystem/Python side of things but other tools (most notably 'ls') don't cope very well. As it happens my data has natural groups so I broke the big dir into subdirs to sidestep the problem.
Re: Max files in unix folder from PIL process
Kane wrote:
> I ran into a similar situation with a massive directory of PIL generated images (around 10k). No problems on the filesystem/Python side of things but other tools (most notably 'ls') don't cope very well.

My experience suggests that 'ls' either has a lousy sort routine or takes a long time to get the metadata. When I've had to deal with a huge number of files in a directory, I can get the list very quickly in Python using os.listdir even though ls is slow. If you're in that situation again, see if the '-f' (unsorted) flag makes a difference, or use '-1' to see if it's all the stat calls.

Andrew
[EMAIL PROTECTED]
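That suggestion can be mimicked from Python: time the raw directory read (roughly what 'ls -f' does), the sort, and a per-file stat() pass separately. A rough sketch on a throwaway directory; the 2,000-file count is an arbitrary choice for the demonstration:

```python
import os
import tempfile
import time

def listing_times(path):
    t0 = time.time()
    names = os.listdir(path)        # the raw directory read
    t_read = time.time() - t0

    t0 = time.time()
    sorted(names)                   # what plain ls adds over ls -f
    t_sort = time.time() - t0

    t0 = time.time()
    for n in names:                 # what -F/--color adds: a stat() per file
        os.stat(os.path.join(path, n))
    t_stat = time.time() - t0

    return t_read, t_sort, t_stat

d = tempfile.mkdtemp()
for i in range(2000):
    open(os.path.join(d, "f%05d" % i), "w").close()

t_read, t_sort, t_stat = listing_times(d)
```

Comparing the three numbers on the actual big directory would show whether the sort or the stat() calls dominate.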
Re: Max files in unix folder from PIL process
David Pratt wrote:
> Hi. I am creating a python application that uses PIL to generate thumbnails and sized images. [...] I am thinking the number could be as high 500,000 images in a single directory but more likely in the range of 6,000 to 30,000 for most. [...] Should this pose a problem on the filesystem?

It all depends on the file system you are using, and somewhat on the operations you are typically performing. I assume this is ufs/ffs, so the directory is a linear list of all files. This causes some performance concerns for access: if you want to open an individual file, the kernel needs to scan the entire directory.

The size of a directory entry depends on the length of the name. Assuming file names of 10 characters, each entry is 20 bytes, so a directory with 500,000 image file names requires 10MB on disk. Each directory lookup could then potentially read 10MB from disk, which might be noticeable. For 6,000 entries, the directory size is 120kB, which might not be noticeable.

In 4.4+, there is a kernel compile-time option, UFS_DIRHASH, which causes creation of an in-memory hash table for directories, speeding up lookups significantly. This requires, of course, enough main memory to actually keep the hash table.

Regards,
Martin
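Those figures check out as simple arithmetic, taking the stated ~20 bytes per ufs/ffs entry for a 10-character name as the assumption:

```python
ENTRY_BYTES = 20  # assumed ufs/ffs directory entry size for a 10-char name

def dir_size_bytes(n_files):
    # A flat ufs/ffs directory is a linear list, so its on-disk size
    # grows linearly with the number of entries.
    return n_files * ENTRY_BYTES

big = dir_size_bytes(500_000)   # → 10_000_000 bytes, i.e. ~10MB
small = dir_size_bytes(6_000)   # → 120_000 bytes, i.e. ~120kB
```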
Re: Max files in unix folder from PIL process
David Pratt wrote:
> Hi. I am creating a python application that uses PIL to generate thumbnails and sized images. [...] I am thinking the number could be as high 500,000 images in a single directory but more likely in the range of 6,000 to 30,000 for most. [...] Should this pose a problem on the filesystem?

FreeDB (the CD database) stores one file per CD in one directory per category. The misc category/directory on my FreeBSD 5.3 system currently contains 481,571 small files. The rock directory/category contains 449,208 files. As some have said, ls is *very* slow on these directories, but otherwise there don't seem to be any problems.

Rowdy