Hi Lens Developers, I am testing 'narrow table' with Lens. If table1 and table2 has the same granularity and table2's columns is a subset of table1, then table2 is a narrow table of table1.
I defined a dimension dimension1: *dimension1.xml:* *<x_dimension name="dimension1" xmlns="uri:lens:cube:0.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance <http://www.w3.org/2001/XMLSchema-instance>"* * xsi:schemaLocation="uri:lens:cube:0.1 cube-0.1.xsd ">* * <attributes>* * <dim_attribute name="primary_key" _type="BIGINT"/>* * <dim_attribute name="attr1" _type="BIGINT"/>* * <dim_attribute name="attr2" _type="BIGINT"/>* * <dim_attribute name="attr3" _type="BIGINT"/>* * <dim_attribute name="attr4" _type="BIGINT"/>* * <dim_attribute name="attr5" _type="BIGINT"/>* * <dim_attribute name="attr6" _type="BIGINT"/>* * <dim_attribute name="attr7" _type="BIGINT"/>* * <dim_attribute name="attr8" _type="BIGINT"/>* * </attributes>* * <properties>* * <property name="dimension.dimension1.timed.dimension" value="dt"/>* * </properties>* *</x_dimension>* And defined two dimtables under this dimension: *dimension1_division2.xml* <x_dimension_table dimension_name="dimension1" table_name="dimension1_division2" weight="10.0" xmlns="uri:lens:cube:0.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="uri:lens:cube:0.1 cube-0.1.xsd "> <columns> <column name="primary_key" _type="BIGINT"/> * <column name="attr5" _type="BIGINT"/>* * <column name="attr6" _type="BIGINT"/>* * <column name="attr7" _type="BIGINT"/>* * <column name="attr8" _type="BIGINT"/>* </columns> <properties> <property name="dimension1.prop" value="d2"/> </properties> <storage_tables> ... </storage_tables> </x_dimension_table> *dimension1_division2_subset.xml* <x_dimension_table dimension_name="dimension1" table_name="dimension1_division2_subset" weight="10.0" xmlns="uri:lens:cube:0.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="uri:lens:cube:0.1 cube-0.1.xsd "> <columns> <column name="primary_key" _type="BIGINT"/> * <column name="attr5" _type="BIGINT"/>* * <column name="attr6" _type="BIGINT"/>* </columns> <properties> <property name="dimension1.prop" value="d2_subset"/> </properties> <storage_tables> ... </storage_tables> </x_dimension_table> Then, I run the query again attr5: *lens-shell>select primary_key, attr5 from dimension1* I expect it will use the table *dimension1_division2_subset *since it is a narrow table and hence will be more efficient. However, it picked up the table *dimension1_division2:* *25 Aug 2016 22:18:14 [Spring Shell] INFO cliLogger - Driver query: 'INSERT OVERWRITE DIRECTORY "hdfs://ltx1-holdemnn01.grid.linkedin.com:9000/user/dev_svc/lensreports/hdfsout/f20e5f20-3f09-4d32-a1b0-ac79d97e0f1a <http://ltx1-holdemnn01.grid.linkedin.com:9000/user/dev_svc/lensreports/hdfsout/f20e5f20-3f09-4d32-a1b0-ac79d97e0f1a>" SELECT ( dimension1 . primary_key ), ( dimension1 . attr5 ) FROM holdem_dimension1_division2 dimension1 WHERE ((((dimension1.dt = 'latest')))) ' and Driver handle: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=6ed9819c-d05f-427f-9ec3-740644072ce9]* Based on the server log, it seems the first candidate dimtable is picked: *[estimate-11] INFO org.apache.lens.cube.parse.CubeQueryContext - Available candidate dims are:[dimension1_division2, dimension1_division2_subset], picking up dimension1_division2 for querying.* Is it the expected behavior? If that is the case, then, do we have plan to support this type of optimization? Thanks, -- *Tao Yan* Software Engineer Data Analytics Infrastructure Tools and Services 206.250.5345 [email protected] https://www.linkedin.com/in/taousc
