Thank you Selva for the response. But the data is too huge to be handled by Excel, thus I need to do it with hive.
On Thu, Oct 10, 2013 at 2:31 PM, Selvamohan Neethiraj < [email protected]> wrote: > If it is not so much data, you can use Excel's PivotTable to solve this > specific requirement: > > 1. Select the date and plz columns (w/o header) and create pivotTable on a > new WorkSheet > 2. Drag the column name: 'Date' from the 'Pivot Table' Builder to the > 'Column Label' section. > 3. Drag the column name: 'PLZ' from the 'Pivot Table' Builder to the > 'Row Labels' > 4. Drag the column name: 'Date' from the 'Pivot Table' Builder to the > 'Values' and ensure the selected summary function is 'Count' > > Hope, this helps, > > Thanks, > Selva- > > > On Oct 10, 2013, at 7:53 AM, Panshul Whisper <[email protected]> > wrote: > > Hello, > > I have a data manipulation query. > > I have my data in the following format: > > *Date PLZ Count* > date1 plz1 count1 > date1 plz1 count2 > date1 plz1 count3 > date1 plz2 count4 > date1 plz2 count5 > date1 plz3 count6 > date1 plz3 count7 > > date2 plz1 count8 > date2 plz1 count9 > date2 plz3 count10 > date2 plz3 count11 > > date3 plz1 count12 > date3 plz1 count13 > date3 plz2 count14 > date3 plz2 count15 > date3 plz2 count16 > > date4 plz1 count17 > date4 plz2 count18 > date4 plz3 count19 > > With Hive queries I have managed to get the data into the following > grouped and aggregated format: > > select plz, TO_DATE(time), sum(totalcount) from power_pad_part where > yearfolder='2013' and monthfolder in ('01') > and eco=0 and TO_DATE(time)>='Date1' and TO_DATE(time)<'Date4' > and plz in ('plz1','plz2','plz3') > group by plz, time > > *PLZ Date TotalCount* > plz1 date1 TC1 > plz1 date2 TC2 > plz1 date3 TC3 > plz1 date4 TC4 > > plz2 date1 TC5 > plz2 date3 TC6 > plz2 date4 TC7 > > plz3 date1 TC8 > plz3 date2 TC9 > plz3 date4 TC10 > > The above data is grouped by plz and date. There is no entry of a plz if > it does not exist for a date. > I used the following query to generate the above data: > > > > But I require the data to be in the following format: > > *PLZ Date1 Date2 Date3 Date4 ..... DateN .....* > plz1 TC1 TC2 TC3 TC4 ..... TC N > plz2 TC5 ------ TC6 TC7 ..... TC N > plz3 TC8 TC9 ------ TC10 .... TC N > > > This needs to be generated as 1 column for every date of the year. > > Any help to generate this format is welcome. Writing my own mapper and > reduce and calling in Hive as function is to be my last option. > > Thanking You, > > > -- > Regards, > Ouch Whisper > 010101010101 > > > -- Regards, Ouch Whisper 010101010101
<<Screen Shot 2013-10-10 at 8.28.29 AM.png>>
<<Screen Shot 2013-10-10 at 8.28.29 AM.png>>
