Basically when you have data like this, it is best to treat the all the columns as a single string and write a tool to break the entire row apart. You could use a UDF or a UDTF actually. Look at something like parseUrl...
select myRow(row) as id string, events List<String> .... A UDTF allows you to produce columns and or rows. The other way is you write a UDF that returns a struct. On Sun, Dec 29, 2013 at 10:17 AM, Raj Hadoop <hadoop...@yahoo.com> wrote: > Matt, > > Thanks for the suggestion. Can you please provide more details on what > type of UDAF should I develop ? I have never worked on a UDAF earlier. But > would like to explore it. Any tips on how to proceed. > > Thanks, > Raj > > > On Saturday, December 28, 2013 2:47 PM, Matt Tucker < > matthewt...@gmail.com> wrote: > It looks like you're essentially doing a pivot function. Your best bet > is to write a custom UDAF or look at the windowing functions available in > recent releases. > Matt > On Dec 28, 2013 12:57 PM, "Raj Hadoop" <hadoop...@yahoo.com> wrote: > > Dear All Hive Group Members, > > I have the following requirement. > > Input: > > Ticket#|Date of booking|Price > 100|20-Oct-13|54 > 100|21-Oct-13|56 > 100|22-Oct-13|54 > 100|23-Oct-13|55 > 100|27-Oct-13|60 > 100|30-Oct-13|47 > > 101|10-Sep-13|12 > 101|13-Sep-13|14 > 101|20-Oct-13|6 > > > Expected Output: > > Ticket#|Initial|Delta1|Delta2|Delta3|Delta4|Delta5 > 100|20-Oct-13,54| > 21-Oct-13,2|22-Oct-13,0|23-Oct-3,1|27-Oct-13,6|30-Oct-13,-7 > 101|10-Sep-13,12|13-Sep-13,2|20-Oct-13,-6||| > > The number of columns in the expected output is a dynamic list depending > on the number of price changes of a ticket. > > 1) What is the best design to solve the above problem in Hive? > 2) How do we implement it? > > Please advise. > > Regards, > Raj > > > > > > > >