Matt, I realized you meant ExtractText when I saw that SplitText doesn't allow you to change the split option.
SplitText does add a `text.line.count` attribute, but ExtractText doesn't have anything like that. Thoughts?

--Peter

-----Original Message-----
From: Matt Burgess [mailto:[email protected]]
Sent: Friday, September 23, 2016 8:02 AM
To: [email protected]
Subject: Re: PutHiveQL Multiple Ordered Statements

Yes, SplitText will write a "fragment.index" attribute (as well as other attributes about the split) you could use for priority, except you may need to reverse it (${fragment.count:minus(${fragment.index})} or something like that) for priority.

On Fri, Sep 23, 2016 at 9:46 AM, Peter Wicks (pwicks) <[email protected]> wrote:
> Matt,
>
> I put some thought into this option, but I was worried about guaranteed
> order of execution. So then I started looking at the prioritized queue.
> If I use a prioritized queue and a max batch size of 1 on PutHiveQL, I
> think I could get it to work; however, I am not really sure how to apply
> the correct priority attribute to the correct split. Does split already
> apply a split index? (I haven't checked.)
>
> Thanks,
> Peter
>
> -----Original Message-----
> From: Matt Burgess [mailto:[email protected]]
> Sent: Friday, September 23, 2016 6:34 AM
> To: [email protected]
> Subject: Re: PutHiveQL Multiple Ordered Statements
>
> Peter,
>
> Since each of your statements ends with a semicolon, I would think you
> could use SplitText with Enable Multiline Mode and a delimiter of ';'
> to get flowfiles containing a single statement apiece, then route those
> to a single PutHiveQL. Not sure what the exact regex would look like,
> but on its face it looks possible :)
>
> Regards,
> Matt
>
> On Fri, Sep 23, 2016 at 8:14 AM, Peter Wicks (pwicks) <[email protected]> wrote:
>> I have a PutHDFS processor drop a file; I then have a long chain of
>> ReplaceText -> PutHiveQL processors that run a series of steps.
>>
>> The below ~4 steps allow me to take the file generated by NiFi in one
>> format and move it into the final table, which is ORC with several
>> Timestamp columns (thus why I'm not using AvroToORC, since I'd lose my
>> Timestamps).
>>
>> The exact HQL, all in one block, is roughly:
>>
>> DROP TABLE `db.tbl_$(unknown)`;
>>
>> CREATE TABLE `db.tbl_$(unknown)`(
>>   -- some list of columns goes here that exactly matches the schema of
>>   -- `prod_db.tbl`
>> )
>> ROW FORMAT DELIMITED
>> FIELDS TERMINATED BY '\001'
>> STORED AS TEXTFILE;
>>
>> LOAD DATA INPATH '${absolute.hdfs.path}/$(unknown)' INTO TABLE `db.tbl_$(unknown)`;
>>
>> INSERT INTO `prod_db.tbl`
>> SELECT * FROM `db.tbl_$(unknown)`;
>>
>> DROP TABLE `db.tbl_$(unknown)`;
>>
>> Right now I'm having to split this into 5 separate ReplaceText steps,
>> each one followed by a PutHiveQL. Is there a way I can push a
>> multi-statement, order-dependent script like this to Hive in a simpler
>> way?
>>
>> Thanks,
>>
>> Peter
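[Editor's note] The split Matt suggests can be sketched outside NiFi. This is a plain-Python stand-in for SplitText with a ';' delimiter, not NiFi itself; the HQL mirrors the thread, with `$(unknown)` left exactly as Peter's placeholder and a single hypothetical `col1 TIMESTAMP` column standing in for the elided column list:

```python
# Stand-in for SplitText splitting a multi-statement HQL script on ';'
# so each statement could become its own flowfile for PutHiveQL.
# "$(unknown)" is Peter's placeholder, kept verbatim; the column list
# is a hypothetical single column, since the real one isn't in the thread.

hql = """
DROP TABLE `db.tbl_$(unknown)`;
CREATE TABLE `db.tbl_$(unknown)`(col1 TIMESTAMP)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\\001'
STORED AS TEXTFILE;
LOAD DATA INPATH '${absolute.hdfs.path}/$(unknown)' INTO TABLE `db.tbl_$(unknown)`;
INSERT INTO `prod_db.tbl`
SELECT * FROM `db.tbl_$(unknown)`;
DROP TABLE `db.tbl_$(unknown)`;
"""

# Naive split on ';'. A production flow would need a pattern that ignores
# semicolons inside quoted strings -- none appear in this script.
statements = [s.strip() for s in hql.split(";") if s.strip()]
```

Splitting this way yields the five statements Peter is currently modeling as five separate ReplaceText -> PutHiveQL pairs, which is why a single SplitText -> PutHiveQL chain could collapse the flow.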

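[Editor's note] The reversed-priority idea from Matt's reply can also be sketched in plain Python. This mimics the arithmetic of the NiFi EL expression `${fragment.count:minus(${fragment.index})}`; the assumption (hedged, since the thread doesn't spell it out) is a prioritizer that dequeues the lowest priority value first, so subtracting the index reverses the split order:

```python
# Sketch of the reversed-priority trick: SplitText writes fragment.index
# and fragment.count on each split, and a priority-attribute prioritizer
# is assumed here to pull the LOWEST value first. With
# priority = fragment.count - fragment.index, the LAST split dequeues
# first; to run splits first-to-last, fragment.index alone would do.

def priority_for(fragment_index: int, fragment_count: int) -> int:
    """Mimics ${fragment.count:minus(${fragment.index})} in NiFi EL."""
    return fragment_count - fragment_index

fragment_indexes = [1, 2, 3, 4, 5]   # five statements -> five splits
count = len(fragment_indexes)
priorities = [priority_for(i, count) for i in fragment_indexes]

# Lower value dequeues first under the assumed prioritizer.
order = [i for _, i in sorted(zip(priorities, fragment_indexes))]
```

With a max batch size of 1 on PutHiveQL, as Peter proposes, each statement would then execute before the next is pulled from the queue.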