I have a hive table partitioned by dates. It contains ecomm data in the format siteid,sitecatid,catid,subcatgid,pid,pname,pprice,pmrp,pdesc....
What I need to do is to run a query on table above in hive for top 10 products(count wise) in each sub category. What adds a bit more complexity is that I need all the information of the product. Now when I do group by with only subcatg,pid, I can only select the same fields. But I want all the data for that product coming in the same row as subcatg & prodid like prodname, proddesc,price, mrp,imageurl. And since some information like price & proddesc of a product keep on changing I want to pick the latest column values(according to a date field) for a pid if we are able to do a group by on subcatg,pid. I am not able to find a solution to my problem in hive. Any help would be much appreciated. Regards Mohit