The idea of partitions in dimtables is based on snapshots. A metadata is
either snapshotted (at some interval the entire metadata is added in a new
partition), or it's fixed (like you mentioned). In that case, I believe
removing the partition column declaration from the dimtable xml file should
work. We have the same use case and are using it in production without any
issue. Hive metadata is snpathotted hourly/daily and db metadata is fixed.


On Tue, Aug 30, 2016 at 2:02 AM Tao Yan <[email protected]> wrote:

> Hi Rajat,
>
> I created the storage and dimtable using the configurations you provided,
> and the query failed because no partition is added to the table, so, I
> added the following partition:
>
> *<x_partition fact_or_dimension_table_name="dimension1_subset"
> update_period="HOURLY"*
> *  xmlns="uri:lens:cube:0.1"
> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance
> <http://www.w3.org/2001/XMLSchema-instance>"*
> *  xsi:schemaLocation="uri:lens:cube:0.1 cube-0.1.xsd ">*
> *  <time_partition_spec>*
> *    <part_spec_element key="dt" value="2016-08-25T00:00:00"/>*
> *  </time_partition_spec>*
> *</x_partition>*
>
> And when I run the query again, Lens translate the Lens query to the
> following SQL:
>
>
>
> *lens-shell>select primary_key, attr1, attr5 from dimension129 Aug 2016
> 18:10:06 [Spring Shell] INFO  cliLogger - Query handle:
> ccb3b99a-22d6-4fbb-9bb8-9f06f633c4b029 Aug 2016 18:10:06 [Spring Shell]
> INFO  cliLogger - User query: 'select primary_key, attr1, attr5 from
> dimension1' was submitted to jdbc/mysql29 Aug 2016 18:10:06 [Spring Shell]
> INFO  cliLogger -  Driver query: 'SELECT ( dimension1 . primary_key ), (
> dimension1 . attr1 ), ( dimension1 . attr5 ) FROM mysql_dimension1_subset
> dimension1 WHERE ((((dimension1.dt = 'latest'))))' and Driver handle: 
> null**And
> it failed because of 'unknown column' error:*
>
> *29 Aug 2016 17:59:24 [Spring Shell] INFO  cliLogger - Query Status:
> Status : FAILED Message : Query execution failed! Progress : 1.0 Error :
> Unknown column 'dimension1_subset__lens_dimension1_subset_dimension1.dt' in
> 'where clause'*
>
> *This is expected because the column dt is not part of mysql table 
> **dimension1_subset.
> I added the column to the table and also set the value to 'latest', then,
> the query returned the result.*
>
>
>
> *​-----------------*
>
> *primary_key    attr1    attr5Results of query stored at :
> hdfs://ltx1-holdemnn01.grid.linkedin.com:9000/user/dev_svc/lensreports/ccb3b99a-22d6-4fbb-9bb8-9f06f633c4b0.csv
> <http://ltx1-holdemnn01.grid.linkedin.com:9000/user/dev_svc/lensreports/ccb3b99a-22d6-4fbb-9bb8-9f06f633c4b0.csv>
>  4 rows processed in (0) seconds.*
>
>
> *I think this is not the way Lens should work with MySQL because sometimes
> MySQL table cannot be changed, and it should not always be changed just
> because Lens is going to query it. In Hive, when a newer partition is
> added, Lens will automatically created a 'latest' partition, I wonder how
> that work in MySQL or other databases. *
>
> Is it possible to add a MySQL table to a partitioned dimtable without
> specifying the partition column, and assume the added MySQL table is the
> latest partition?
>
> Thanks,
>
> On Mon, Aug 29, 2016 at 12:46 AM, Rajat Khandelwal <[email protected]
> > wrote:
>
>> Use something like
>>
>> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
>> <x_storage xmlns="uri:lens:cube:0.1" name="mysql" 
>> classname="org.apache.lens.storage.db.DBStorage">
>> </x_storage>
>>
>>
>> <storage_table>
>>   <update_periods>
>>     <update_period>HOURLY</update_period>
>>   </update_periods>
>>   <storage_name>db</storage_name>
>>   <table_desc external="true"  
>> input_format="org.apache.hadoop.mapred.TextInputFormat" 
>> output_format="org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat" 
>> serde_class_name="org.apache.lens.storage.db.DBSerde" 
>> storage_handler_name="org.apache.lens.storage.db.DBStorageHandler" 
>> num_buckets="0" compressed="false">
>>     <part_cols>
>>       <column name="pt" _type="string" comment="date partition"/>
>>       <column name="dt" _type="string" comment="date partition"/>
>>       <column name="ot" _type="string" comment="date partition"/>
>>     </part_cols>
>>     <table_parameters>
>>       <property name="lens.metastore.native.table.name" 
>> value="table_name_in_mysql_db"/>
>>       <property name="lens.metastore.native.db.name" value="mysql_db_name"/>
>>       <property name="lens.metastore.native.table.column.mapping"
>>                 
>> value="pt=process_time,field1=field1_actual_name,field2=field2_actual_name"/>
>>       <property name="cube.storagetable.start.times" value="now - 4 days"/>
>>     </table_parameters>
>>     <time_part_cols>pt</time_part_cols>
>>     <time_part_cols>dt</time_part_cols>
>>     <time_part_cols>ot</time_part_cols>
>>   </table_desc>
>> </storage_table>
>>
>>
>> On Sat, Aug 27, 2016 at 3:42 AM Tao Yan <[email protected]> wrote:
>>
>>> Hi Lens Developers,
>>>
>>> I am trying to add mysql as a data source, so, I created the driver as
>>> follows:
>>>
>>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>>
>>> <configuration>
>>>   <property>
>>>     <name>lens.driver.jdbc.driver.class</name>
>>>     <value>com.mysql.jdbc.Driver</value>
>>>   </property>
>>>   <property>
>>>     <name>lens.driver.jdbc.db.uri</name>
>>>
>>> <value>jdbc:mysql://****mysql**hostname****/lens;user=****;passowrd=****</value>
>>>   </property>
>>>   <property>
>>>     <name>lens.driver.jdbc.db.user</name>
>>>     <value>****</value>
>>>   </property>
>>>   <property>
>>>     <name>lens.cube.query.driver.supported.storages</name>
>>>     <value>mysql</value>
>>>     <final>true</final>
>>>   </property>
>>>   <property>
>>>     <name>lens.driver.jdbc.query.rewriter</name>
>>>     <value>org.apache.lens.driver.jdbc.ColumnarSQLRewriter</value>
>>>   </property>
>>>   <property>
>>>     <name>lens.driver.jdbc.explain.keyword</name>
>>>     <value>explain plan for </value>
>>>   </property>
>>> </configuration>
>>>
>>> And created a test table with data in mysql:
>>>
>>> create table dimension1_subset (
>>> primary_key BIGINT(8) not null primary key,
>>> attr1 BIGINT(8),
>>> attr5 BIGINT(8)
>>> );
>>>
>>> [image: Screen Shot 2016-08-26 at 2.44.24 PM.png]
>>> ​
>>> And created the storage file:
>>>
>>> <?xml version="1.0" encoding="UTF-8"?>
>>>
>>> <x_storage classname="org.apache.lens.storage.db.DBStorage" name="mysql"
>>> xmlns="uri:lens:cube:0.1"
>>>   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance";
>>> xsi:schemaLocation="uri:lens:cube:0.1 cube-0.1.xsd ">
>>>   <properties>
>>>     *<property name="lens.storage.db.url"
>>> value="jdbc:mysql://lva1-db34.corp.linkedin.com/lens
>>> <http://lva1-db34.corp.linkedin.com/lens>"/>*
>>>   </properties>
>>> </x_storage>
>>>
>>> I am confused when set the value for *lens.storage.db.url *because* it
>>> is neither local filesystem nor HDFS.*
>>>
>>> And I don't know what should the storage table look like:
>>>
>>> <x_dimension_table dimension_name="dimension1"
>>> table_name="dimension1_table3" weight="5.0" xmlns="uri:lens:cube:0.1"
>>>   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance";
>>> xsi:schemaLocation="uri:lens:cube:0.1 cube-0.1.xsd ">
>>>   <columns>
>>>     <column name="primary_key" _type="BIGINT"/>
>>>     <column name="attr1" _type="BIGINT"/>
>>>     <column name="attr5" _type="BIGINT"/>
>>>   </columns>
>>>   <properties>
>>>     <property name="dimension1.prop" value="t3"/>
>>>   </properties>
>>>   <storage_tables>
>>>
>>>
>>> *    <storage_table>      ...    </storage_table>*
>>>   </storage_tables>
>>> </x_dimension_table>
>>>
>>> What should I use for the table_location, and what about part_cols? How
>>> do I add data to this table?
>>>
>>> Thanks,
>>> --
>>>
>>> *Tao Yan*
>>> Software Engineer
>>> Data Analytics Infrastructure Tools and Services
>>>
>>>
>>>
>>> 206.250.5345
>>> [email protected]
>>> https://www.linkedin.com/in/taousc
>>>
>>
>
>
> --
>
> *Tao Yan*
> Software Engineer
> Data Analytics Infrastructure Tools and Services
>
>
>
> 206.250.5345
> [email protected]
> https://www.linkedin.com/in/taousc
>

Reply via email to