Querying H5 "flatten" data in Apache Drill in releases after 1.19.0

Tore Van Grembergen via user Thu, 05 Mar 2026 14:32:40 -0800

Hi Team,

I am looking into using Apache Drill to query H5 (HDF5) data.
The documentation on the site,
https://drill.apache.org/docs/hdf5-format-plugin/, works for version 1.19.0
but no longer as of 1.20.0.
The column into which the actual data is mapped seems to be no longer available.


E.g. the column int_data shown in the example below is no longer there.

apache drill> select * from dfs.test.`dset.h5`;
|-------|-----------|-----------|-----------|---------------|--------------|------------------|-------------------|------------|--------------------------------------------------------------------------|
| path  | data_type | file_name | data_size | element_count | is_timestamp | is_time_duration | dataset_data_type | dimensions | int_data                                                                 |
|-------|-----------|-----------|-----------|---------------|--------------|------------------|-------------------|------------|--------------------------------------------------------------------------|
| /dset | DATASET   | dset.h5   | 96        | 24            | false        | false            | INTEGER           | [4, 6]     | [[1,2,3,4,5,6],[7,8,9,10,11,12],[13,14,15,16,17,18],[19,20,21,22,23,24]] |
|-------|-----------|-----------|-----------|---------------|--------------|------------------|-------------------|------------|--------------------------------------------------------------------------|


I have read somewhere that setting the parameter "showPreview" : true in the
workspace definition should restore the original behaviour; however, when I
try to save this parameter, it is automagically removed.
(Remark: the environment runs the apache/drill image in a Docker
container; the config is stored on a mounted drive.)

The reason for needing this int_data or double_data column is that the field
often contains too many values, and it is not known upfront how many values
it will hold.
Hence the "column" approach in select * from table(xyz) is not workable.
It is necessary to be able to do e.g. select flatten(int_data) as int_data
from dfs.test.`dset.h5`;
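
To make the intended usage concrete, here is a minimal sketch of the kind of
query I have in mind. It assumes the int_data column is exposed as a nested
list, as in the 1.19.0 output above; the aliases int_row, int_value and t are
only illustrative names. As far as I understand, one flatten unnests only the
outer level of the list, so a second pass is needed to reach single values.

```sql
-- Sketch only: assumes int_data is available as a nested list (as in 1.19.0).
-- Inner query: one row per inner array of the 2-D dataset.
-- Outer query: one row per individual value.
SELECT FLATTEN(t.int_row) AS int_value
FROM (
  SELECT FLATTEN(int_data) AS int_row
  FROM dfs.test.`dset.h5`
) t;
```

The point is that this works without knowing upfront how many values the
dataset holds.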

Is there a way to get this (re-)activated in Apache Drill 1.22 and its successors?

All help is much appreciated.

Kind regards

Tore
