I have a PutHDFS processor that drops a file, followed by a long chain of
ReplaceText -> PutHiveQL processors that run a series of steps.
The five statements below take the file generated by NiFi in one format and
move it into the final table, which is ORC with several Timestamp columns
(which is why I'm not using AvroToORC, since I'd lose my Timestamps).

The exact HQL, all in one block, is roughly:

DROP TABLE `db.tbl_${filename}`;

CREATE TABLE `db.tbl_${filename}` (
   -- some list of columns goes here that exactly matches the schema of
   -- `prod_db.tbl`
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
STORED AS TEXTFILE;

LOAD DATA INPATH '${absolute.hdfs.path}/${filename}'
INTO TABLE `db.tbl_${filename}`;

INSERT INTO `prod_db.tbl`
SELECT * FROM `db.tbl_${filename}`;

DROP TABLE `db.tbl_${filename}`;

Right now I have to split this into 5 separate ReplaceText steps, each one
followed by a PutHiveQL. Is there a simpler way to push a multi-statement,
order-dependent script like this to Hive?
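
To make the ordering requirement concrete, here is a rough Python sketch of
the split I am effectively doing by hand today: one script in, five single
statements out, in their original order. The helper name `split_statements`
is hypothetical, the column list is left as a placeholder, and the naive
split on `;` assumes no semicolons appear inside string literals or comments.

```python
def split_statements(script: str) -> list[str]:
    """Split a multi-statement HQL script into ordered single statements.

    Naive approach: splits on every ';', so it assumes no semicolons
    occur inside string literals or comments.
    """
    parts = (chunk.strip() for chunk in script.split(";"))
    # Drop the empty piece left after the final ';' and keep the order.
    return [p for p in parts if p]


# Raw string so that '\001' reaches Hive as written.
# "( ... )" stands in for the real column list, which is not shown here.
script = r"""
DROP TABLE `db.tbl_${filename}`;
CREATE TABLE `db.tbl_${filename}` ( ... )
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' STORED AS TEXTFILE;
LOAD DATA INPATH '${absolute.hdfs.path}/${filename}' INTO TABLE `db.tbl_${filename}`;
INSERT INTO `prod_db.tbl` SELECT * FROM `db.tbl_${filename}`;
DROP TABLE `db.tbl_${filename}`;
"""

statements = split_statements(script)
for number, statement in enumerate(statements, start=1):
    print(f"-- statement {number}")
    print(statement + ";")
```

Each element of `statements` corresponds to what one ReplaceText/PutHiveQL
pair sends today, and they must run strictly in list order.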

Thanks,
  Peter
