On Thursday 19 March 2009, Pete Gething wrote:
> Dear list,
>
> I'm trying to read a section of a large 4-d chunked array into memory
> and am getting some strange behaviour.
>
> The array is set up as:
>
> /realizations (CArray(3, 1978, 1556, 288), shuffle, zlib(1)) ''
>   atom := Float32Atom(shape=(), dflt=0.0)
>   maindim := 0
>   flavor := 'numpy'
>   byteorder := 'little'
>   chunkshape := (1, 1978, 1556, 1)
>
> so the overall dimensions are (3, 1978, 1556, 288), made up of chunks
> of (1, 1978, 1556, 1).
>
> Reading in a 2-d block (corresponding to a single chunk)...
>
> f_chunk = hr.realizations[0,:,:,0]
>
> ...takes under a second.
>
> Reading in a 3-d block of width=2 across the first dimension...
>
> f_chunk = hr.realizations[0:2:1,:,:,0]
>
> ...takes about a second.
>
> So far so good, but reading in a 3-d block of width=2 across the
> fourth dimension...
>
> f_chunk = hr.realizations[0,:,:,0:2:1]
>
> ...suddenly takes nearer 45 minutes!
>
> I'd be very grateful if anyone could explain why the latter operation
> requires this much time - clearly something is going on in PyTables
> that I don't currently understand...
That's a good question, and one that I'd like to know the answer to as well!
After some digging, I've tracked the problem down to the HDF5 library. I've
reported it to the hdf-forum list and I'll relay here whatever response the
HDF5 crew gives.
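In the meantime, the effect can be timed directly from Python with something
along these lines (a minimal sketch, not from the original post: the file name
and node path are assumptions based on the layout described above, and
openFile is the PyTables 2.x spelling; newer versions use tables.open_file):

# Time a width-2 read along the first axis vs. the last axis of the same
# chunked CArray.  File name and node path are assumptions; adjust to taste.
import time
import tables

h5 = tables.openFile("realizations.h5", mode="r")
arr = h5.root.realizations   # CArray(3, 1978, 1556, 288), chunks (1, 1978, 1556, 1)

t0 = time.time()
a = arr[0:2, :, :, 0]        # width 2 along the first (main) dimension
print("first-dimension slice:  %.2f s" % (time.time() - t0))

t0 = time.time()
b = arr[0, :, :, 0:2]        # width 2 along the fourth dimension
print("fourth-dimension slice: %.2f s" % (time.time() - t0))

h5.close()

Either slice covers exactly two chunks of shape (1, 1978, 1556, 1), which is
what makes the huge difference between the two reads so puzzling.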
For your reference, here is a copy of my report:
---------- Forwarded message ----------
Subject: Reading across multiple chunks is very slow
Date: Thursday 19 March 2009
From: Francesc Alted <fal...@pytables.org>
To: HDF Group <hdf-fo...@hdfgroup.org>
Hi,
A PyTables user has reported a performance problem when reading a
dataset in some cases. I've tracked the problem down to the HDF5
library, as the output of the attached script reveals:
Time for creating dataset with dims {3, 1978, 1556, 288} --> 0.000000
Time for writing hyperslice {2, 1978, 1556, 2} --> 12.010000
Time for reading hyperslice {2, 1978, 1556, 1} --> 0.020000
Time for reading hyperslice {1, 1978, 1556, 2} --> 2.490000
[This dataset has a chunksize of: {1, 1978, 1556, 1}]
The problem is: why does reading a hyperslab with a count of
{1, 1978, 1556, 2} take about 100x longer than one with a count of
{2, 1978, 1556, 1}?
I've tried to figure out what's happening, but as I can't come up with a
clear explanation, I think this may be a bug in HDF5. I've tried with
HDF5 1.6.5, 1.8.2 and 1.8.2-post8, all with similar results.
Thanks,
--
Francesc Alted
#include <stdio.h>
#include <time.h>
#include "hdf5.h"
#define H5FILE_NAME "/tmp/read-performance-problem.h5"
#define DATASETNAME "ChunkedArray"
#define RANK 4
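/* Note: the H5Dcreate/H5Dopen calls below use the HDF5 1.6 signatures;
 * when building against 1.8 or later you may need -DH5_USE_16_API. */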
static int data1[2][1978][1556][2];
int
main (void)
{
    hid_t   file;                 /* handles */
    hid_t   dataspace, dataset;
    hid_t   filespace;
    hid_t   cparms;
    hsize_t dims[RANK] = {3, 1978, 1556, 288};
    hsize_t maxdims[RANK] = {3, 1978, 1556, 288};
    hsize_t chunk_dims[RANK] = {1, 1978, 1556, 1};
    hsize_t size[RANK];
    hsize_t offset[RANK] = {0, 0, 0, 0};
    hsize_t count1[RANK] = {2, 1978, 1556, 2};
    hsize_t count2[RANK] = {2, 1978, 1556, 1};
    hsize_t count3[RANK] = {1, 1978, 1556, 2};
    int     i, j, k, l;
    float   t1;
    herr_t  status;
    int     fillvalue = 0;

    /* Fill up data1 with values different from fillvalue */
    for (i = 0; i < count1[0]; i++)
      for (j = 0; j < count1[1]; j++)
        for (k = 0; k < count1[2]; k++)
          for (l = 0; l < count1[3]; l++)
            data1[i][j][k][l] = 1;

    /*
     * Create the data space.
     */
    dataspace = H5Screate_simple(RANK, dims, maxdims);

    /*
     * Create a new file. If file exists its contents will be overwritten.
     */
    file = H5Fcreate(H5FILE_NAME, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

    /*
     * Modify dataset creation properties, i.e. enable chunking.
     */
    cparms = H5Pcreate(H5P_DATASET_CREATE);
    status = H5Pset_chunk(cparms, RANK, chunk_dims);
    status = H5Pset_fill_value(cparms, H5T_NATIVE_INT, &fillvalue);

    /*
     * Create a new dataset within the file using cparms
     * creation properties.
     */
    t1 = (float)clock();
    dataset = H5Dcreate(file, DATASETNAME, H5T_NATIVE_INT, dataspace, cparms);
    printf("Time for creating dataset with dims {3, 1978, 1556, 288} --> %f\n",
           (clock()-t1)/CLOCKS_PER_SEC);

    /*
     * Select a hyperslab.
     */
    filespace = H5Dget_space(dataset);
    status = H5Sselect_hyperslab(filespace, H5S_SELECT_SET, offset, NULL,
                                 count1, NULL);

    /*
     * Create a new data space.
     */
    H5Sclose(dataspace);
    dataspace = H5Screate_simple(RANK, count1, NULL);

    /*
     * Write the data to the hyperslab.
     */
    t1 = (float)clock();
    status = H5Dwrite(dataset, H5T_NATIVE_INT, dataspace, filespace,
                      H5P_DEFAULT, data1);
    printf("Time for writing hyperslice {2, 1978, 1556, 2} --> %f\n",
           (clock()-t1)/CLOCKS_PER_SEC);
    H5Sclose(dataspace);
    H5Sclose(filespace);
    H5Pclose(cparms);
    H5Fclose(file);

    /* Reopen the file in read mode */
    file = H5Fopen(H5FILE_NAME, H5F_ACC_RDONLY, H5P_DEFAULT);
    dataset = H5Dopen(file, DATASETNAME);

    /*
     * Read count2 hyperslab.
     */
    filespace = H5Dget_space(dataset);
    status = H5Sselect_hyperslab(filespace, H5S_SELECT_SET, offset, NULL,
                                 count2, NULL);
    dataspace = H5Screate_simple(RANK, count2, NULL);
    t1 = (float)clock();
    H5Dread(dataset, H5T_NATIVE_INT, dataspace, filespace, H5P_DEFAULT, data1);
    printf("Time for reading hyperslice {2, 1978, 1556, 1} --> %f\n",
           (clock()-t1)/CLOCKS_PER_SEC);

    /*
     * Read count3 hyperslab.
     */
    filespace = H5Dget_space(dataset);
    status = H5Sselect_hyperslab(filespace, H5S_SELECT_SET, offset, NULL,
                                 count3, NULL);
    dataspace = H5Screate_simple(RANK, count3, NULL);
    t1 = (float)clock();
    H5Dread(dataset, H5T_NATIVE_INT, dataspace, filespace, H5P_DEFAULT, data1);
    printf("Time for reading hyperslice {1, 1978, 1556, 2} --> %f\n",
           (clock()-t1)/CLOCKS_PER_SEC);

    /*
     * Close/release resources.
     */
    H5Dclose(dataset);
    H5Sclose(dataspace);
    H5Sclose(filespace);
    H5Fclose(file);

    return 0;
}
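
In case anyone wants to reproduce the timings locally, the script above
builds with the h5cc compiler wrapper that ships with HDF5 (for example,
h5cc repro.c -o repro, assuming it is saved as repro.c) and writes its test
file under /tmp.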