Also, I have a collect UDF, since collect_set removes duplicates:
https://github.com/edwardcapriolo/hive-collect

On Thu, Aug 23, 2012 at 1:26 PM, Philip Tromans <philip.j.trom...@gmail.com> wrote:
> insert into originalTable
> select uniqueId, collect_set(whatever) from explodedTable group by uniqueId
>
> will probably do the trick.
>
> Phil.
>
> On 23 August 2012 17:45, Mike Fleming <m...@obvious.com> wrote:
>> I see that hive has a way to take a table and produce multiple rows.
>>
>> Is there a built in way to do the reverse?
>>
>> Say I have a table with a unique key and an array. I do this:
>>
>>> insert into explodedTable select uniqueId, explode(arrayOfThings) from
>>> originalTable
>>
>> Now I have a table with a row for each (uniqueId, element in arrayOfThings).
>>
>> Is there any way to take the contents of explodedTable and essentially
>> produce the original table, reconstructing the arrayOfThings for each
>> uniqueId?
>>
>> It seems, conceptually, that if I "cluster by uniqueId" then a reducer knows
>> that it will get all rows for each uniqueId bundled together, so it ought to
>> be fairly feasible to simply emit an unexploded row. However, I can't seem
>> to find a built-in way to do this.
>>
>> Mike
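
To make the duplicates point concrete, here is a rough sketch in plain Python (not Hive code, and not how the UDF linked above is implemented) of what happens when an array with repeated elements is exploded and then regrouped. The sample data and the helper names explode, collect_set and collect_all are made up for illustration; they only model the behaviour discussed in this thread.

```python
from collections import defaultdict

# Hypothetical sample data standing in for originalTable:
# uniqueId -> arrayOfThings. Note the repeated "a" for id 1.
original = {1: ["a", "b", "a"], 2: ["c"]}

def explode(table):
    """Emit one (uniqueId, element) row per array element, like explodedTable."""
    return [(uid, item) for uid, items in table.items() for item in items]

def collect_set(rows):
    """Group by uniqueId, keeping each distinct element once.

    This model keeps first-seen order; Hive itself makes no ordering promise.
    """
    grouped = defaultdict(list)
    for uid, item in rows:
        if item not in grouped[uid]:
            grouped[uid].append(item)
    return dict(grouped)

def collect_all(rows):
    """Group by uniqueId, keeping every element, duplicates included."""
    grouped = defaultdict(list)
    for uid, item in rows:
        grouped[uid].append(item)
    return dict(grouped)

rows = explode(original)
deduped = collect_set(rows)   # the second "a" for id 1 is dropped
rebuilt = collect_all(rows)   # arrays come back with duplicates intact
```

So regrouping with set semantics only reproduces the original table when the arrays had no repeated elements to begin with; otherwise a duplicate-preserving collect is needed.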