Hi all,

I am trying to use HiveOutputModule to insert ingested data into a Hive external table. The table has already been created with the same location as the /dt.application.<app_name>.operator.hiveOutput.prop.filePath/ property, and the partition column is accessdate (a rough sketch of the table DDL is included below the directory listings). With the configuration shown below in the property file, the HDFS file structure I am expecting is:

/common/data/test/accessCounts
    |
    ----- accessdate=2017-05-15
    |           |
    |           ------- <fil1>
    |           ------- <fil2>
    |
    ----- accessdate=2017-05-16
                |
                ------- <fil1>
                ------- <fil2>

but the actual structure looks like:

/common/data/test/accessCounts/<yarn_application_id_for_apex_ingest_appl>/10
    |
    ----- 2017-05-15
    |           |
    |           ------- <fil1>
    |           ------- <fil2>
    |
    ----- 2017-05-16
                |
                ------- <fil1>
                ------- <fil2>
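
For reference, the external table was created along the lines of the sketch below. This is only an illustration, not the exact DDL I ran: the table name (access_counts) is a placeholder, the column names match the abbreviated col1..col4 used in the config further down, and the plain-JDBC wrapper is just there to make the snippet self-contained (row format / storage clauses are omitted, so Hive defaults apply).

Table DDL (sketch)
==================

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreateAccessCountsTable
{
  public static void main(String[] args) throws Exception
  {
    // Same Hive JDBC driver as the databaseDriver property below
    Class.forName("org.apache.hive.jdbc.HiveDriver");

    // <jdbc_url>, user and password correspond to the databaseUrl,
    // userName and password properties in the config below
    try (Connection con = DriverManager.getConnection("<jdbc_url>", "<user>", "<password>");
         Statement stmt = con.createStatement()) {

      // External table: LOCATION matches the filePath property and the
      // partition column matches hivePartitionColumns (accessdate)
      stmt.execute(
          "CREATE EXTERNAL TABLE IF NOT EXISTS access_counts ("
        + " col1 STRING, col2 STRING, col3 STRING, col4 STRING)"
        + " PARTITIONED BY (accessdate STRING)"
        + " LOCATION '/common/data/test/accessCounts'");
    }
  }
}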

Questions
1. Why are the yarn_application_id and the other extra directories created when they are nowhere specified in the config?
2. If I want to achieve the structure I expect, what other configurations do I need to set?

HiveOutputModule Configs
========================

<property>
  <name>dt.application.<app_name>.operator.hiveOutput.prop.filePath</name>
  <value>/common/data/test/accessCounts</value>
</property>
<property>
  <name>dt.application.<app_name>.operator.hiveOutput.prop.databaseUrl</name>
  <value><jdbc_url></value>
</property>
<property>
  <name>dt.application.<app_name>.operator.hiveOutput.prop.databaseDriver</name>
  <value>org.apache.hive.jdbc.HiveDriver</value>
</property>
<property>
  <name>dt.application.<app_name>.operator.hiveOutput.prop.tablename</name>
  <value><hive table name where records need to be inserted></value>
</property>
<property>
  <name>dt.application.<app_name>.operator.hiveOutput.prop.hivePartitionColumns</name>
  <value>{accessdate}</value>
</property>
<property>
  <name>dt.application.<app_name>.operator.hiveOutput.prop.password</name>
  <value><hive connection password></value>
</property>
<property>
  <name>dt.application.<app_name>.operator.hiveOutput.prop.userName</name>
  <value><hive connection user></value>
</property>
<property>
  <name>dt.application.<app_name>.operator.hiveOutput.prop.hiveColumns</name>
  <value>{col1,col2,col3,col4}</value>
</property>
<property>
  <name>dt.application.<app_name>.operator.hiveOutput.prop.hiveColumnDataTypes</name>
  <value>{STRING,STRING,STRING,STRING}</value>
</property>
<property>
  <name>dt.application.<app_name>.operator.hiveOutput.prop.hivePartitionColumnDataTypes</name>
  <value>{STRING}</value>
</property>
<property>
  <name>dt.application.<app_name>.operator.hiveOutput.prop.expressionsForHiveColumns</name>
  <value>{"getCol1()","getCol2()","getCol3()","getCol4()"}</value>
</property>
<property>
  <name>dt.application.<app_name>.operator.hiveOutput.prop.expressionsForHivePartitionColumns</name>
  <value>{"getAccessdate()"}</value>
</property>
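
In case it helps to see the same settings in one place, below is a rough Java equivalent of the property file above. I am actually configuring the module purely through properties, so treat this as a sketch only: the package of HiveOutputModule (assumed here to be org.apache.apex.malhar.hive), the ArrayList<String> argument types of the list-valued setters, and the omission of the data-type / expression properties are all assumptions based on the usual Apex prop.X -> setX() mapping, not verified signatures.

Equivalent Java wiring (sketch)
===============================

import java.util.ArrayList;
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;

import org.apache.apex.malhar.hive.HiveOutputModule;  // package is an assumption

import com.datatorrent.api.DAG;
import com.datatorrent.api.StreamingApplication;

public class AccessCountsApp implements StreamingApplication
{
  @Override
  public void populateDAG(DAG dag, Configuration conf)
  {
    HiveOutputModule hiveOutput = dag.addModule("hiveOutput", new HiveOutputModule());

    // Scalar properties -- same values as in the XML above
    hiveOutput.setFilePath("/common/data/test/accessCounts");
    hiveOutput.setDatabaseUrl("<jdbc_url>");
    hiveOutput.setDatabaseDriver("org.apache.hive.jdbc.HiveDriver");
    hiveOutput.setTablename("<hive table name where records need to be inserted>");
    hiveOutput.setUserName("<hive connection user>");
    hiveOutput.setPassword("<hive connection password>");

    // List-valued properties -- the ArrayList<String> argument type is assumed
    hiveOutput.setHiveColumns(new ArrayList<>(Arrays.asList("col1", "col2", "col3", "col4")));
    hiveOutput.setHivePartitionColumns(new ArrayList<>(Arrays.asList("accessdate")));

    // hiveColumnDataTypes, hivePartitionColumnDataTypes and the expression
    // properties are left to the property file here, since their setter
    // argument types may differ between Malhar versions.

    // The upstream POJO source and the stream feeding the module's input
    // port are omitted; in the real application the ingested records are
    // connected to that port.
  }
}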


