Matt,

I put some thought into this option, but I was worried about guaranteed order 
of execution, so I started looking at prioritized queues. If I use a 
prioritized queue and a max batch size of 1 on PutHiveQL, I think I could get 
it to work. However, I am not sure how to apply the correct priority 
attribute to the correct split. Does the split processor already apply a split 
index? (I haven't checked.)
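
To make the ordering idea concrete, here is a minimal Python sketch (not NiFi 
code) of tagging each split with an index and draining the queue by a 
"priority" attribute. The FlowFile class and both helper functions are 
hypothetical stand-ins. The sketch assumes the queue prioritizer processes the 
lowest "priority" value first and that the split step exposes an index such as 
"fragment.index"; both assumptions should be checked against the NiFi 
documentation.

```python
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Hypothetical stand-in for a NiFi FlowFile: content plus attributes."""
    content: str
    attributes: dict = field(default_factory=dict)


def tag_with_priority(statements):
    """Give each statement a 1-based index and a zero-padded priority.

    Zero-padding keeps string ordering and numeric ordering in agreement,
    so "02" sorts before "10" either way.
    """
    width = len(str(len(statements)))
    return [
        FlowFile(stmt, {"fragment.index": str(i), "priority": str(i).zfill(width)})
        for i, stmt in enumerate(statements, start=1)
    ]


def drain_in_priority_order(flowfiles):
    """Return contents in the order a lowest-priority-first queue would emit."""
    ordered = sorted(flowfiles, key=lambda ff: ff.attributes["priority"])
    return [ff.content for ff in ordered]
```

With a batch size of 1 downstream, the statements would then be consumed one 
at a time in index order, regardless of the order in which the splits arrived 
in the queue.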

Thanks,
  Peter 

-----Original Message-----
From: Matt Burgess [mailto:[email protected]] 
Sent: Friday, September 23, 2016 6:34 AM
To: [email protected]
Subject: Re: PutHiveQL Multiple Ordered Statements

Peter,

Since each of your statements ends with a semicolon, I would think you could 
use SplitText with Enable Multiline Mode and a delimiter of ';'
to get flowfiles containing a single statement apiece, then route those to a 
single PutHiveQL. Not sure what the exact regex would look like but on its face 
it looks possible :)
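
As a rough illustration of the splitting step, here is a minimal Python sketch 
(not a NiFi processor configuration) that breaks a script into one statement 
per item on ';'. The function name split_statements is hypothetical. It 
assumes quotes are never escaped inside a literal, and it ignores semicolons 
that appear inside single quotes, double quotes, or backticks, which a plain 
delimiter split would not do.

```python
def split_statements(script):
    """Split a multi-statement script on ';' outside quoted sections.

    Assumption: quote characters are not escaped inside literals.
    Returns the statements in their original order, without the ';'.
    """
    statements = []
    current = []
    quote = None  # the quote character we are currently inside, if any

    for ch in script:
        if quote:
            if ch == quote:
                quote = None  # closing quote; still part of the statement
        elif ch in ("'", '"', "`"):
            quote = ch
        elif ch == ";":
            stmt = "".join(current).strip()
            if stmt:
                statements.append(stmt)
            current = []
            continue  # drop the delimiter itself
        current.append(ch)

    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements
```

The list order matches the script order, so each item could be sent on as its 
own flowfile as long as something downstream preserves that order.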

Regards,
Matt

On Fri, Sep 23, 2016 at 8:14 AM, Peter Wicks (pwicks) <[email protected]> wrote:
> I have a PutHDFS processor drop a file, followed by a long chain of 
> ReplaceText -> PutHiveQL processors that runs a series of steps.
>
> The ~4 steps below let me take the file generated by NiFi in one 
> format and move it into the final table, which is ORC with several 
> Timestamp columns (which is why I’m not using AvroToORC, since I’d lose 
> my Timestamps).
>
> The exact HQL, all in one block, is roughly:
>
> DROP TABLE `db.tbl_${filename}`;
>
> CREATE TABLE `db.tbl_${filename}`(
>    Some list of columns goes here that exactly matches the schema of 
>    `prod_db.tbl`
> )
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\001'
> STORED AS TEXTFILE;
>
> LOAD DATA INPATH '${absolute.hdfs.path}/${filename}' INTO TABLE 
> `db.tbl_${filename}`;
>
> INSERT INTO `prod_db.tbl`
> SELECT * FROM `db.tbl_${filename}`;
>
> DROP TABLE `db.tbl_${filename}`;
>
> Right now I’m having to split this into 5 separate ReplaceText steps, 
> each one followed by a PutHiveQL.  Is there a simpler way to push a 
> multi-statement, order-dependent script like this to Hive?
>
> Thanks,
>
>   Peter
