Good catch, I got the two mixed together in my mind :P Any way to ensure each HiveQL statement is on a single line in the source file? If not, you could use ExecuteScript and have your own regex that does the multi-line-until-semicolon split. You could add whatever attributes you want in that case too.
Regards, Matt On Fri, Sep 23, 2016 at 11:19 AM, Peter Wicks (pwicks) <[email protected]> wrote: > Matt, > > I realized you meant ExtractText when I saw that SplitText doesn't allow you > to change the split option. > > SplitText does add an attribute for `text.line.count`, but ExtractText > doesn't have anything like that. Thoughts? > > --Peter > > > -----Original Message----- > From: Matt Burgess [mailto:[email protected]] > Sent: Friday, September 23, 2016 8:02 AM > To: [email protected] > Subject: Re: PutHiveQL Multiple Ordered Statements > > Yes SplitText will write a "fragment.index" attribute (as well as other > attributes about the split) you could use for priority, except you may need > to reverse it (${fragment.count:minus(fragment.index)} or something like > that) for priority. > > On Fri, Sep 23, 2016 at 9:46 AM, Peter Wicks (pwicks) <[email protected]> > wrote: >> Matt, >> >> I put some thought into this option; but I was worried about >> guaranteed order of execution. So then I started looking at the >> prioritized queue. If I use a prioritized queue and a max batch size >> of 1 on PutHiveQL I think I could get it to work; however I am not >> really sure how to apply the correct priority attribute to the correct >> split. Does split already apply a split index? (I haven't checked) >> >> Thanks, >> Peter >> >> -----Original Message----- >> From: Matt Burgess [mailto:[email protected]] >> Sent: Friday, September 23, 2016 6:34 AM >> To: [email protected] >> Subject: Re: PutHiveQL Multiple Ordered Statements >> >> Peter, >> >> Since each of your statements ends with a semicolon, I would think you could >> use SplitText with Enable Multiline Mode and a delimiter of ';' >> to get flowfiles containing a single statement apiece, then route >> those to a single PutHiveQL. Not sure what the exact regex would look >> like but on its face it looks possible :) >> >> Regards, >> Matt >> >> On Fri, Sep 23, 2016 at 8:14 AM, Peter Wicks (pwicks) <[email protected]> >> wrote: >>> I have a PutHDFS processor drop a file, I then have a long chain of >>> ReplaceText -> PutHiveQL processors that runs a series of steps. >>> >>> The below ~4 steps allow me to take the file generated by NiFi in one >>> format and move it into the final table, which is ORC with several >>> Timestamp columns (thus why I’m not using AvroToORC, since I’d lose my >>> Timestamps. >>> >>> >>> >>> The exact HQL, all in one block, is roughly: >>> >>> >>> >>> DROP TABLE `db.tbl_${filename}`; >>> >>> >>> >>> CREATE TABLE ` db.tbl _${filename}`( >>> >>> Some list of columns goes here that exactly matches the schema of >>> `prod_db.tbl` >>> >>> ) >>> >>> ROW FORMAT DELIMITED >>> >>> FIELDS TERMINATED BY '\001' >>> >>> STORED AS TEXTFILE; >>> >>> LOAD DATA INPATH '${absolute.hdfs.path}/${filename}' INTO TABLE ` >>> db.tbl _${filename}`; >>> >>> INSERT INTO `prod_db.tbl` >>> >>> SELECT * FROM ` db.tbl _${filename}`; >>> >>> DROP TABLE ` db.tbl _${filename}`; >>> >>> >>> >>> Right now I’m having to split this into 5 separate ReplaceText steps, >>> each one followed by a PutHiveQL. Is there a way I can push a >>> multi-statement, order dependent, script like this to Hive in a simpler way? >>> >>> >>> >>> Thanks, >>> >>> Peter
