Matt,

I put some thought into this option, but I was worried about whether the order of execution would be guaranteed, so I started looking at prioritized queues. If I use a prioritized queue and a max batch size of 1 on PutHiveQL, I think I could get it to work; however, I am not sure how to apply the correct priority attribute to the correct split. Does the split processor already add a split index attribute? (I haven't checked.)

Thanks,
Peter

-----Original Message-----
From: Matt Burgess [mailto:[email protected]]
Sent: Friday, September 23, 2016 6:34 AM
To: [email protected]
Subject: Re: PutHiveQL Multiple Ordered Statements

Peter,

Since each of your statements ends with a semicolon, I would think you could use SplitText with Enable Multiline Mode and a delimiter of ';' to get flowfiles containing a single statement apiece, then route those to a single PutHiveQL. Not sure what the exact regex would look like but on its face it looks possible :)

Regards,
Matt

On Fri, Sep 23, 2016 at 8:14 AM, Peter Wicks (pwicks) <[email protected]> wrote:
> I have a PutHDFS processor drop a file, I then have a long chain of
> ReplaceText -> PutHiveQL processors that runs a series of steps.
>
> The below ~4 steps allow me to take the file generated by NiFi in one
> format and move it into the final table, which is ORC with several
> Timestamp columns (thus why I’m not using AvroToORC, since I’d lose my
> Timestamps).
>
> The exact HQL, all in one block, is roughly:
>
> DROP TABLE `db.tbl_${filename}`;
>
> CREATE TABLE `db.tbl_${filename}`(
>   Some list of columns goes here that exactly matches the schema of
>   `prod_db.tbl`
> )
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\001'
> STORED AS TEXTFILE;
>
> LOAD DATA INPATH '${absolute.hdfs.path}/${filename}' INTO TABLE `db.tbl_${filename}`;
>
> INSERT INTO `prod_db.tbl`
> SELECT * FROM `db.tbl_${filename}`;
>
> DROP TABLE `db.tbl_${filename}`;
>
> Right now I’m having to split this into 5 separate ReplaceText steps,
> each one followed by a PutHiveQL. Is there a way I can push a
> multi-statement, order dependent, script like this to Hive in a simpler way?
>
> Thanks,
> Peter
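
For anyone following the thread, here is a rough sketch of the idea being discussed: split the script on semicolons and tag each statement with its position, so a priority-ordered queue could keep the statements in sequence. It is plain Python, not NiFi code. The function name `split_statements`, the `priority` key, and the table name `db.tbl_example` are made up for illustration, and the splitter assumes quotes are never escaped inside a statement.

```python
def split_statements(script: str) -> list[str]:
    """Split a script on ';' characters that sit outside quotes or backticks.

    Assumption: quote characters are never escaped inside a statement.
    """
    statements = []
    current = []
    quote = None  # the quote character we are currently inside, if any
    for ch in script:
        if quote:
            # Inside a quoted region: only the matching quote ends it.
            if ch == quote:
                quote = None
        elif ch in ("'", '"', "`"):
            quote = ch
        elif ch == ";":
            # Statement boundary: keep the text, drop the semicolon.
            text = "".join(current).strip()
            if text:
                statements.append(text)
            current = []
            continue
        current.append(ch)
    # Keep any trailing statement that has no closing semicolon.
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements


# Hypothetical script, loosely modelled on the one quoted above.
script = """
DROP TABLE `db.tbl_example`;
CREATE TABLE `db.tbl_example` (id INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\001'
STORED AS TEXTFILE;
INSERT INTO `prod_db.tbl` SELECT * FROM `db.tbl_example`;
"""

# Tag each statement with its position; "priority" is an illustrative key.
tagged = [
    {"priority": str(index), "statement": statement}
    for index, statement in enumerate(split_statements(script))
]

for item in tagged:
    print(item)
```

Whether the split processor already writes an index attribute that could play the role of `priority` here is worth confirming in the processor documentation before relying on it.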
