Hello,

I am currently evaluating Apache Drill and have a few questions about implementation details when using the HBase storage plugin.
The documentation explains that Drill optimizes storage and execution by using an in-memory data model that is hierarchical and columnar (http://drill.apache.org/docs/performance/). I understand the term "columnar" as it is described in the "Dremel" paper (http://research.google.com/pubs/pub36632.html).

In my use case I have an HBase table that stores JSON data in one column:

    Put put = new Put(Bytes.toBytes("my-rowkey..."));
    put.add(Bytes.toBytes("filterable"), Bytes.toBytes("filterable"),
            Bytes.toBytes("{\"firstName\": \"Martin\", \"lastName\": \"Mois\", ...}"));

As far as I understand, I have to convert the data in the column to JSON in order to query it:

    0: jdbc:drill:> select t.json.dateOfBirth from (select convert_from(p.filterable.filterable, 'JSON') json from hbase.person p);
    +------------+
    | EXPR$0     |
    +------------+
    | 2007-02-04 |
    ...

If I now append a condition, I get the following error message:

    select t.json.dateOfBirth from (select convert_from(p.filterable.filterable, 'JSON') json from hbase.im_t_person p) t where t.json.dateOfBirth = '2014-09-07';
    Query failed: SYSTEM ERROR: Unexpected exception during fragment initialization: null
    [a2c6cdd8-e5bb-45ab-bd2a-39e728492e58 on trafodion.local:31010]
    Error: exception while executing query: Failure while executing query. (state=,code=0)

The same happens when I create a view for the query above and set filter conditions on this view.

With the above use case in mind, I have the following questions:

1. Is it possible to query the JSON data inside a column of an HBase table with conditions?
2. When I query an HBase table, does Apache Drill create a columnar data structure in memory for the JSON data contained in the HBase column? Is this in-memory structure reused by similar queries on the view?
3. If the column family "person" has been created with compression enabled, when does decompression happen? Once, while the in-memory structure is built, or again and again for each query?
4. Assuming that another process updates a row in my HBase table while the query is running, how does Apache Drill sync the in-memory structure with updates made to the underlying HBase storage?

Please note that data conversion using the option 'store.format', as explained in the section "Data Type Conversion" (http://drill.apache.org/docs/data-type-conversion/), is not an option, as I want to use Apache Drill as a kind of OLAP system in which I can query the data ad hoc without any further data conversions.

Is there any documentation (other than the source code itself) that explains these kinds of implementation details?

Best Regards,
Martin Mois
