Yes, SplitText will write a "fragment.index" attribute (as well as
other attributes about the split) that you could use for priority, except
you may need to reverse it (${fragment.count:minus(${fragment.index})} or
something like that) for priority.

On Fri, Sep 23, 2016 at 9:46 AM, Peter Wicks (pwicks) <[email protected]> wrote:
> Matt,
>
> I put some thought into this option; but I was worried about guaranteed order
> of execution. So then I started looking at the prioritized queue. If I use a
> prioritized queue and a max batch size of 1 on PutHiveQL I think I could get
> it to work; however I am not really sure how to apply the correct priority
> attribute to the correct split. Does split already apply a split index? (I
> haven't checked)
>
> Thanks,
> Peter
>
> -----Original Message-----
> From: Matt Burgess [mailto:[email protected]]
> Sent: Friday, September 23, 2016 6:34 AM
> To: [email protected]
> Subject: Re: PutHiveQL Multiple Ordered Statements
>
> Peter,
>
> Since each of your statements ends with a semicolon, I would think you could
> use SplitText with Enable Multiline Mode and a delimiter of ';'
> to get flowfiles containing a single statement apiece, then route those to a
> single PutHiveQL. Not sure what the exact regex would look like but on its
> face it looks possible :)
>
> Regards,
> Matt
>
> On Fri, Sep 23, 2016 at 8:14 AM, Peter Wicks (pwicks) <[email protected]>
> wrote:
>> I have a PutHDFS processor drop a file, I then have a long chain of
>> ReplaceText -> PutHiveQL processors that runs a series of steps.
>>
>> The below ~4 steps allow me to take the file generated by NiFi in one
>> format and move it into the final table, which is ORC with several
>> Timestamp columns (thus why I’m not using AvroToORC, since I’d lose my
>> Timestamps).
>>
>> The exact HQL, all in one block, is roughly:
>>
>> DROP TABLE `db.tbl_${filename}`;
>>
>> CREATE TABLE `db.tbl_${filename}`(
>>
>> Some list of columns goes here that exactly matches the schema of
>> `prod_db.tbl`
>>
>> )
>>
>> ROW FORMAT DELIMITED
>>
>> FIELDS TERMINATED BY '\001'
>>
>> STORED AS TEXTFILE;
>>
>> LOAD DATA INPATH '${absolute.hdfs.path}/${filename}' INTO TABLE
>> `db.tbl_${filename}`;
>>
>> INSERT INTO `prod_db.tbl`
>>
>> SELECT * FROM `db.tbl_${filename}`;
>>
>> DROP TABLE `db.tbl_${filename}`;
>>
>> Right now I’m having to split this into 5 separate ReplaceText steps,
>> each one followed by a PutHiveQL. Is there a way I can push a
>> multi-statement, order dependent, script like this to Hive in a simpler way?
>>
>> Thanks,
>>
>> Peter
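
For readers following the thread, here is a rough Python sketch of the split-and-order idea discussed above. It is not NiFi code. It only mimics the logic: split a multi-statement script on ';', tag each piece with fragment-style attributes, and compute the reversed value suggested in the reply. The function names, the "reversed" key, the sample script, and the 1-based index are assumptions made for illustration, and the naive split would break on a ';' inside a quoted string.

```python
def split_statements(script):
    """Naively split on ';' and drop empty pieces (ignores ';' inside quotes)."""
    return [part.strip() for part in script.split(";") if part.strip()]


def tag_fragments(statements):
    """Attach fragment-style attributes to each statement.

    The 1-based index and the "reversed" key are illustrative assumptions.
    """
    count = len(statements)
    tagged = []
    for index, statement in enumerate(statements, start=1):
        tagged.append({
            "statement": statement,
            "fragment.index": index,
            "fragment.count": count,
            # Reversed position: fragment.count minus fragment.index.
            "reversed": count - index,
        })
    return tagged


# Hypothetical three-statement script standing in for the HQL in the thread.
script = "DROP TABLE t; CREATE TABLE t (c INT); INSERT INTO p SELECT * FROM t;"
fragments = tag_fragments(split_statements(script))

# Ascending fragment.index and descending reversed value give the same order.
by_index = sorted(fragments, key=lambda f: f["fragment.index"])
by_reversed = sorted(fragments, key=lambda f: f["reversed"], reverse=True)

for fragment in by_index:
    print(fragment["fragment.index"], fragment["reversed"], fragment["statement"])
```

Whether the plain index or the reversed value is the right one to use as the priority depends on how the chosen prioritizer orders values, so that is worth checking before wiring it into a flow.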
