Re: Dynamic columns in Hive Table - Best Design for the problem

Edward Capriolo Sun, 29 Dec 2013 07:56:25 -0800

Basically, when you have data like this, it is best to treat all the
columns as a single string and write a tool to break the entire row apart.
You could use a UDF, or actually a UDTF. Look at something like parse_url
(or its UDTF counterpart, parse_url_tuple).


select myRow(row) as (id, events) ....    -- id string, events array<string>
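Here is a minimal sketch of the "treat the row as a single string and break it apart" idea. It assumes each raw row arrives as one pipe-delimited line in the `Ticket#|Date of booking|Price` layout from the question below, with dates like `20-Oct-13`. The names `split_row` and `Booking` are hypothetical. In Hive, an external script like this could be attached with the `TRANSFORM ... USING` clause, which streams rows to the script's standard input as tab-separated text. That wiring is an assumption and is not shown here.

```python
from datetime import date, datetime
from typing import NamedTuple


class Booking(NamedTuple):
    """Typed fields recovered from one raw row (hypothetical layout)."""
    ticket: str
    booked: date
    price: int


def split_row(line: str, sep: str = "|") -> Booking:
    """Break one raw row, stored as a single string, into typed fields.

    Assumes Ticket#|Date of booking|Price with dates such as 20-Oct-13
    (day, English abbreviated month, two-digit year).
    """
    ticket, booked, price = line.strip().split(sep)
    return Booking(
        ticket=ticket,
        booked=datetime.strptime(booked, "%d-%b-%y").date(),
        price=int(price),
    )


if __name__ == "__main__":
    print(split_row("100|20-Oct-13|54"))
```

Parsing the date into a real `date` matters later, because strings like `20-Oct-13` do not sort chronologically.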

A UDTF allows you to produce multiple columns and/or multiple rows from each input row.
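The Python analogy below (not Hive's API) illustrates what a UDTF does: one input row goes in and any number of multi-column rows come out. In Hive itself, a UDTF is a Java class extending `GenericUDTF` whose `process()` method calls `forward()` once per output row, and the built-in `explode` is a UDTF of this shape. The generator plays that role here. It assumes the `events` list holds `date,price` strings, and the name `explode_events` is hypothetical.

```python
from typing import Iterator, List, Tuple


def explode_events(ticket: str, events: List[str]) -> Iterator[Tuple[str, str, int]]:
    """UDTF-style: one (ticket, events) input row -> one output row per event.

    Each output row has three columns: ticket, booking date, price.
    """
    for event in events:
        booked, price = event.split(",")
        # Each yield corresponds to one forward() call in a GenericUDTF.
        yield ticket, booked, int(price)


if __name__ == "__main__":
    for row in explode_events("101", ["10-Sep-13,12", "13-Sep-13,14", "20-Oct-13,6"]):
        print(row)
```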

The other way is to write a UDF that returns a struct.
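For contrast, a UDF that returns a struct maps each input row to exactly one output value, and that value's fields can then be read by name. The Python sketch below is only an analogy: `TicketSummary` stands in for a Hive `struct`, and the names `summarize`, `initial`, and `deltas` are hypothetical. They are chosen to match the expected output in the question, where each delta is measured against the initial price.

```python
from typing import List, NamedTuple, Tuple


class TicketSummary(NamedTuple):
    """Stands in for a Hive struct<ticket, initial, deltas>."""
    ticket: str
    initial: Tuple[str, int]
    deltas: List[Tuple[str, int]]


def summarize(ticket: str, events: List[Tuple[str, int]]) -> TicketSummary:
    """UDF-style: one input row in, exactly one struct-like value out.

    Assumes `events` is non-empty and already ordered by booking date;
    each delta is that booking's price minus the initial price.
    """
    first_date, first_price = events[0]
    deltas = [(booked, price - first_price) for booked, price in events[1:]]
    return TicketSummary(ticket, (first_date, first_price), deltas)


if __name__ == "__main__":
    print(summarize("101", [("10-Sep-13", 12), ("13-Sep-13", 14), ("20-Oct-13", 6)]))
```

Unlike the UDTF route, the number of output rows never changes. The variable-length part lives inside the `deltas` list.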


On Sun, Dec 29, 2013 at 10:17 AM, Raj Hadoop wrote:

> Matt,
>
> Thanks for the suggestion. Can you please provide more details on what
> type of UDAF I should develop? I have never worked on a UDAF before, but
> I would like to explore it. Any tips on how to proceed?
>
> Thanks,
> Raj
>
>
>   On Saturday, December 28, 2013 2:47 PM, Matt Tucker wrote:
>  It looks like you're essentially doing a pivot. Your best bet
> is to write a custom UDAF or look at the windowing functions available in
> recent releases (Hive 0.11 and later).
> Matt
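Since the follow-up asks what kind of UDAF to write, here is a Python analogy of the lifecycle a Hive `GenericUDAFEvaluator` goes through: `iterate` consumes raw rows, `terminatePartial` and `merge` combine partial results computed on different nodes, and `terminate` produces the final value. This is not Hive's Java API. The class name `TicketPivotAgg` is hypothetical, and the sketch assumes the aggregate is grouped by ticket and returns the pivoted cells as one pipe-delimited string.

```python
from datetime import datetime
from typing import List, Optional, Tuple

Event = Tuple[str, int]  # (booking date as dd-Mon-yy, price)


class TicketPivotAgg:
    """Python analogy of a Hive UDAF evaluator's lifecycle (not Hive's Java API).

    One instance holds the aggregation buffer for one group (one ticket).
    """

    def __init__(self) -> None:
        self.events: List[Event] = []  # aggregation buffer

    def iterate(self, booked: str, price: int) -> None:
        """Consume one raw input row."""
        self.events.append((booked, price))

    def terminate_partial(self) -> List[Event]:
        """Emit a partial result, e.g. from one mapper."""
        return list(self.events)

    def merge(self, partial: List[Event]) -> None:
        """Fold another instance's partial result into this buffer."""
        self.events.extend(partial)

    def terminate(self) -> Optional[str]:
        """Produce 'date,price|date,delta|...', or None for an empty group."""
        if not self.events:
            return None
        # Partials can arrive in any order, so sort by the parsed date, not the string.
        ordered = sorted(self.events, key=lambda e: datetime.strptime(e[0], "%d-%b-%y"))
        first_date, first_price = ordered[0]
        cells = [f"{first_date},{first_price}"]
        cells += [f"{booked},{price - first_price}" for booked, price in ordered[1:]]
        return "|".join(cells)


if __name__ == "__main__":
    a, b = TicketPivotAgg(), TicketPivotAgg()
    a.iterate("20-Oct-13", 54)
    a.iterate("22-Oct-13", 54)
    b.iterate("21-Oct-13", 56)
    a.merge(b.terminate_partial())
    print(a.terminate())  # 20-Oct-13,54|21-Oct-13,2|22-Oct-13,0
```

Keeping every event in the buffer is the simplest design. It is reasonable when each ticket has a handful of bookings, but memory per group grows with the group size.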
> On Dec 28, 2013 12:57 PM, "Raj Hadoop" wrote:
>
> Dear All Hive Group Members,
>
> I have the following requirement.
>
> Input:
>
> Ticket#|Date of booking|Price
> 100|20-Oct-13|54
> 100|21-Oct-13|56
> 100|22-Oct-13|54
> 100|23-Oct-13|55
> 100|27-Oct-13|60
> 100|30-Oct-13|47
>
> 101|10-Sep-13|12
> 101|13-Sep-13|14
> 101|20-Oct-13|6
>
>
> Expected Output:
>
> Ticket#|Initial|Delta1|Delta2|Delta3|Delta4|Delta5
> 100|20-Oct-13,54|21-Oct-13,2|22-Oct-13,0|23-Oct-13,1|27-Oct-13,6|30-Oct-13,-7
> 101|10-Sep-13,12|13-Sep-13,2|20-Oct-13,-6|||
>
> The number of columns in the expected output is dynamic: it depends on
> the number of price changes of a ticket. Each Delta is the price on that
> date minus the initial price.
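The rule behind the expected output can be checked directly. Within each ticket, ordered by booking date, the first row is `Initial`, and every later row's delta is its price minus the initial price. In HiveQL with windowing functions, that is the value `price - FIRST_VALUE(price) OVER (PARTITION BY ticket ORDER BY <sortable booking date>)` would give, with `ROW_NUMBER()` numbering the Delta columns. The Python sketch below (the function name `pivot` is hypothetical) reproduces the question's output. It also pads shorter tickets with empty cells, so every row has as many Delta columns as the longest ticket history.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Tuple


def pivot(rows: List[Tuple[str, str, int]]) -> List[str]:
    """Turn (ticket, dd-Mon-yy date, price) rows into Ticket#|Initial|Delta1..N lines."""
    by_ticket: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for ticket, booked, price in rows:
        by_ticket[ticket].append((booked, price))

    # Dynamic width: the longest ticket history minus its initial booking.
    width = max((len(events) for events in by_ticket.values()), default=1) - 1
    header = ["Ticket#", "Initial"] + [f"Delta{i}" for i in range(1, width + 1)]
    lines = ["|".join(header)]

    for ticket in sorted(by_ticket):
        # Order by the parsed date; "20-Oct-13" strings do not sort chronologically.
        events = sorted(by_ticket[ticket], key=lambda e: datetime.strptime(e[0], "%d-%b-%y"))
        first_date, first_price = events[0]
        cells = [ticket, f"{first_date},{first_price}"]
        # Same value as price - FIRST_VALUE(price) over the ticket's ordered rows.
        cells += [f"{booked},{price - first_price}" for booked, price in events[1:]]
        cells += [""] * (width - (len(events) - 1))  # pad shorter tickets
        lines.append("|".join(cells))
    return lines


if __name__ == "__main__":
    data = [
        ("100", "20-Oct-13", 54), ("100", "21-Oct-13", 56), ("100", "22-Oct-13", 54),
        ("100", "23-Oct-13", 55), ("100", "27-Oct-13", 60), ("100", "30-Oct-13", 47),
        ("101", "10-Sep-13", 12), ("101", "13-Sep-13", 14), ("101", "20-Oct-13", 6),
    ]
    print("\n".join(pivot(data)))
```

The padding step is what makes the column count "dynamic". A Hive table still needs a fixed schema, so in practice the variable part would more likely be stored as a single string or an array column.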
>
> 1) What is the best design to solve the above problem in Hive?
> 2) How do we implement it?
>
> Please advise.
>
> Regards,
> Raj