Columnar data model for JSON stored in HBase column?

MOIS Martin (MORPHO) Wed, 13 May 2015 02:14:18 -0700

Hello,

I am currently evaluating Apache Drill and have a few questions about the
implementation details of the HBase storage plugin.


The documentation explains that Drill optimizes storage and execution by using 
an in-memory data model that is hierarchical and columnar 
(http://drill.apache.org/docs/performance/). I understand the term "columnar" 
as it is described in the "Dremel" paper 
(http://research.google.com/pubs/pub36632.html).
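
To make the term concrete, here is a minimal sketch of what "columnar" means for flat JSON-like records: the values of each field are stored together, rather than record by record. It is only an illustration. The class and method names are hypothetical, the second sample record is made up, and it is neither Drill's actual in-memory format nor the full Dremel encoding, which additionally tracks repetition and definition levels for nested data.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ColumnarSketch {

    // Turns row-oriented records into one list of values per field name.
    // A field that is missing from a record is stored as null, so every
    // column has the same length and position i always refers to record i.
    static Map<String, List<String>> toColumns(List<Map<String, String>> rows) {
        Map<String, List<String>> columns = new LinkedHashMap<>();
        for (int i = 0; i < rows.size(); i++) {
            for (Map.Entry<String, String> field : rows.get(i).entrySet()) {
                List<String> column = columns.get(field.getKey());
                if (column == null) {
                    // First time this field appears: pad for earlier records.
                    column = new ArrayList<>();
                    for (int j = 0; j < i; j++) {
                        column.add(null);
                    }
                    columns.put(field.getKey(), column);
                }
                column.add(field.getValue());
            }
            // Pad the columns that this record did not fill.
            for (List<String> column : columns.values()) {
                if (column.size() <= i) {
                    column.add(null);
                }
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        Map<String, String> first = new LinkedHashMap<>();
        first.put("firstName", "Martin");
        first.put("lastName", "Mois");
        first.put("dateOfBirth", "2007-02-04");

        // Hypothetical second record without a dateOfBirth field.
        Map<String, String> second = new LinkedHashMap<>();
        second.put("firstName", "Anna");
        second.put("lastName", "Example");

        List<Map<String, String>> rows = new ArrayList<>();
        rows.add(first);
        rows.add(second);

        // A filter on dateOfBirth only needs to read this one column.
        System.out.println(toColumns(rows).get("dateOfBirth"));
    }
}
```

With this layout, a condition on a single field touches only that field's values, which is the property the questions below are about.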

In my use case I have an HBase table that stores JSON-formatted data in a
single column:

Put put = new Put(Bytes.toBytes("my-rowkey..."));
put.add(Bytes.toBytes("filterable"), Bytes.toBytes("filterable"),
        Bytes.toBytes("{\"firstName\": \"Martin\", \"lastName\": \"Mois\", ...}"));

As far as I understand, I have to convert the stored column value to JSON
before I can query it:

0: jdbc:drill:> select t.json.dateOfBirth from (select 
convert_from(p.filterable.filterable, 'JSON') json from hbase.person p) t;
+------------+
|   EXPR$0   |
+------------+
| 2007-02-04 |
...

If I now add a WHERE condition, I get the following error message:

select t.json.dateOfBirth from (select convert_from(p.filterable.filterable, 
'JSON') json from hbase.im_t_person p) t where t.json.dateOfBirth = 
'2014-09-07';
Query failed: SYSTEM ERROR: Unexpected exception during fragment 
initialization: null


[a2c6cdd8-e5bb-45ab-bd2a-39e728492e58 on trafodion.local:31010]
Error: exception while executing query: Failure while executing query. 
(state=,code=0)

The same happens when I create a view for the query above and apply filter
conditions to that view.

With the above use case in mind, I have the following questions:

1. Is it possible to query the JSON data inside a column of an HBase table
with conditions?

2. When I query an HBase table, does Apache Drill create a columnar data
structure in memory for the JSON data contained in the HBase column? Is this
in-memory structure reused by similar queries on the view?

3. If the column family "person" has been created with compression enabled,
when does decompression happen? Once, while the in-memory structure is built,
or again and again for each query?

4. Assuming that another process updates a row in my HBase table while the
query is running, how does Apache Drill sync the in-memory structure with
updates made to the underlying HBase storage?

Please note that data conversion using the option 'store.format', as explained
in the section "Data Type Conversion"
(http://drill.apache.org/docs/data-type-conversion/), is not an option, as I
want to use Apache Drill as a kind of OLAP system in which I can query the
data ad hoc without any further data conversions.

Is there any documentation (other than the source code itself) that explains
implementation details of this kind?

Best Regards,
Martin Mois
