On Sunday 11 May 2008, Ivan Vilata i Balaguer wrote:
> Dinesh B Vadhia (on 2008-05-10 at 10:10:29 -0700) said::
> > I'm using the OS filesystem to store 32,000 image files.  I'm now
> > going to move them into a datastore and the choices are pysqlite or
> > MySQL or PyTables.  The number of images will grow rapidly (to the
> > millions and more) and hence performance is critical.  Multiple
> > images will be accessed from the data store at a time.  There are
> > no write operations, just read-only access.
> >
> > The data schema is: image index (on the image filename), image
> > filename, image (jpg initially but will be other formats in the
> > future).
> >
> > Any and all suggestions would be appreciated.
>
> Well, I don't quite understand the data schema (are you describing a
> row of three fields in a table?), but you may have a look at the
> ``tables.nodes.filenode`` module, which contains a ``FileNode`` class
> that offers a Python file-like interface to a PyTables dataset (a
> one-dimensional ``EArray``) holding the bytes of the file.  It
> should be especially useful if you keep images stored in a file
> format like JPEG, PNG and the like.
>
> Also, I'd recommend not cramming all images under a single group to
> avoid performance problems when opening the group, but to pack them
> in groups of at most 4096 (see ``tables.parameters.MAX_GROUP_WIDTH``)
> images per group.

Yeah, I think Ivan is basically right in his assessment.  However,
I'd use a regular Array object for saving the images themselves instead
of a FileNode.  A FileNode is meant more for dealing with text where you
can add and delete lines, which is not the case for images.  For
cases where you don't need the append/remove features, an Array is
probably much more efficient.  And if you need compression, you may
want to use a CArray instead.  The image index and filename may be
saved as HDF5 attributes of the *Array objects.
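
A minimal sketch of that layout, in case it helps.  It assumes the
current snake_case PyTables API (``open_file``, ``create_array``,
``create_carray``), not the camelCase names in use in 2008.  The
helper names (``node_location``, ``store_image``, ``load_image``), the
``/images/gNNNNN`` group naming and the attribute names are all made up
for illustration; only the 4096-per-group limit comes from Ivan's
message.  Each image is kept as a one-dimensional uint8 array of its
raw file bytes:

```python
import numpy as np
import tables

# Bucket size taken from the 4096-images-per-group advice in this thread.
MAX_IMAGES_PER_GROUP = 4096


def node_location(index):
    """Return (group_path, node_name) for an image index (hypothetical naming)."""
    bucket = index // MAX_IMAGES_PER_GROUP
    return f"/images/g{bucket:05d}", f"img{index:08d}"


def store_image(h5file, index, filename, data, compress=False):
    """Save raw image bytes as a 1-D uint8 Array (or CArray if compress=True)."""
    group_path, name = node_location(index)
    payload = np.frombuffer(data, dtype=np.uint8)
    if compress:
        # CArray supports filters; zlib level 5 is an arbitrary choice here.
        filters = tables.Filters(complevel=5, complib="zlib")
        node = h5file.create_carray(group_path, name, obj=payload,
                                    filters=filters, createparents=True)
    else:
        node = h5file.create_array(group_path, name, payload,
                                   createparents=True)
    # Index and filename travel with the array as HDF5 attributes.
    node.attrs.image_index = index
    node.attrs.filename = filename
    return node


def load_image(h5file, index):
    """Return (filename, raw_bytes) for a stored image."""
    group_path, name = node_location(index)
    node = h5file.get_node(group_path, name)
    return node.attrs.filename, node.read().tobytes()
```

Usage would be along the lines of opening the file with
``tables.open_file("images.h5", mode="w")``, calling ``store_image``
once per image, and later reopening it with ``mode="r"`` and calling
``load_image``.  Note that JPEG and PNG data are already compressed,
so the CArray filters will probably gain little for those formats.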

Cheers,

-- 
Francesc Alted
