Re: PutHiveQL Multiple Ordered Statements

Matt Burgess Fri, 23 Sep 2016 07:02:59 -0700

Yes, SplitText will write a "fragment.index" attribute (as well as
other attributes about the split) that you could use for priority,
although you may need to reverse it with something like
${fragment.count:minus(${fragment.index})}.
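
To make the arithmetic concrete, here is a minimal Python sketch of the value that expression would produce for each split. It is only an illustration, not NiFi code: the function name is hypothetical, and it assumes "fragment.index" counts up from 1. Whether the reversed or the original index is the right priority depends on how the prioritizer on the connection orders values, so check that before relying on either.

```python
def priority_attributes(fragment_count):
    """Return one attribute dict per fragment, in split order.

    Hypothetical stand-in for the attributes on each split flowfile;
    fragment.index is assumed to be 1-based.
    """
    attrs = []
    for index in range(1, fragment_count + 1):
        attrs.append({
            "fragment.index": index,
            "fragment.count": fragment_count,
            # Same arithmetic as ${fragment.count:minus(${fragment.index})}
            "priority": fragment_count - index,
        })
    return attrs


# Five statements, as in the script discussed in this thread.
for ff in priority_attributes(5):
    print(ff["fragment.index"], ff["priority"])
```

The first statement gets the largest value and the last statement gets 0, which is the "reversed" ordering described above.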


On Fri, Sep 23, 2016 at 9:46 AM, Peter Wicks (pwicks) <[email protected]> wrote:
> Matt,
>
> I put some thought into this option, but I was worried about guaranteed order
> of execution. So then I started looking at the prioritized queue. If I use a
> prioritized queue and a max batch size of 1 on PutHiveQL, I think I could get
> it to work; however, I am not really sure how to apply the correct priority
> attribute to the correct split. Does the split already apply a split index? (I
> haven't checked.)
>
> Thanks,
>   Peter
>
> -----Original Message-----
> From: Matt Burgess [mailto:[email protected]]
> Sent: Friday, September 23, 2016 6:34 AM
> To: [email protected]
> Subject: Re: PutHiveQL Multiple Ordered Statements
>
> Peter,
>
> Since each of your statements ends with a semicolon, I would think you could 
> use SplitText with Enable Multiline Mode and a delimiter of ';'
> to get flowfiles containing a single statement apiece, then route those to a 
> single PutHiveQL. Not sure what the exact regex would look like but on its 
> face it looks possible :)
>
> Regards,
> Matt
>
> On Fri, Sep 23, 2016 at 8:14 AM, Peter Wicks (pwicks) <[email protected]> 
> wrote:
>> I have a PutHDFS processor drop a file; I then have a long chain of
>> ReplaceText -> PutHiveQL processors that runs a series of steps.
>>
>> The below ~4 steps allow me to take the file generated by NiFi in one
>> format and move it into the final table, which is ORC with several
>> Timestamp columns (which is why I’m not using AvroToORC, since I’d lose my
>> Timestamps).
>>
>>
>> The exact HQL, all in one block, is roughly:
>>
>> DROP TABLE `db.tbl_${filename}`;
>>
>> CREATE TABLE `db.tbl_${filename}`(
>>    Some list of columns goes here that exactly matches the schema of
>>    `prod_db.tbl`
>> )
>> ROW FORMAT DELIMITED
>> FIELDS TERMINATED BY '\001'
>> STORED AS TEXTFILE;
>>
>> LOAD DATA INPATH '${absolute.hdfs.path}/${filename}' INTO TABLE
>> `db.tbl_${filename}`;
>>
>> INSERT INTO `prod_db.tbl`
>> SELECT * FROM `db.tbl_${filename}`;
>>
>> DROP TABLE `db.tbl_${filename}`;
>>
>> Right now I’m having to split this into 5 separate ReplaceText steps,
>> each one followed by a PutHiveQL.  Is there a way I can push a
>> multi-statement, order-dependent script like this to Hive in a simpler way?
>>
>>
>>
>> Thanks,
>>
>>   Peter
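
For readers following the split-on-';' idea from this thread, the sketch below shows in plain Python what cutting a multi-statement script into ordered single statements looks like. It is an illustration under assumptions, not how any NiFi processor is implemented: the function name and the table names are hypothetical, and the split is deliberately naive, so a ';' inside a string literal or comment would also be treated as a delimiter.

```python
def split_statements(script):
    """Split an HQL script on ';' and drop empty fragments.

    Naive on purpose: a ';' inside a string literal or comment
    would be treated as a delimiter too.
    """
    parts = (part.strip() for part in script.split(";"))
    return [part for part in parts if part]


# Hypothetical script with placeholder table names.
script = """
DROP TABLE staging_tbl;
CREATE TABLE staging_tbl (id INT) STORED AS TEXTFILE;
INSERT INTO prod_tbl SELECT * FROM staging_tbl;
DROP TABLE staging_tbl;
"""

# Number the statements in script order, starting at 1.
for index, statement in enumerate(split_statements(script), start=1):
    print(index, statement)
```

The position of each statement in the returned list is what an index attribute on each split would need to preserve for the statements to run in order.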
