The primary difference between Hive and Pig is the language. There are implementation differences that will result in performance differences, but it will be hard to figure out which aspect of the implementation is responsible for which improvement.
I think a more interesting project would be to compare the impact of various performance improvements in Hive. There are many features that you can turn on and off. For example:

- Hive vectorization
- file format: text vs RCFile vs ORC
- compressed vs uncompressed
- MapReduce vs Tez execution engine
- stats-optimized queries

On Thu, May 1, 2014 at 5:47 AM, Sarfraz Ramay <sarfraz.ra...@gmail.com> wrote:
>> Hi,
>>
>> It seems that both Hive and Pig are used for managing large data sets.
>> Hive is more SQL oriented whereas Pig is more for the data flows. I am doing
>> a master's thesis on the performance evaluation of both. Can some please
>> provide a list of tasks that would make for an interesting comparison ?
>>
>> What is Hive good at ?
>>
>> What is Pig good at ?
>>
>> Ideally, i would like to take what Hive is good at and test it in Pig and
>> vice versa. The competitive characteristics would make for an interesting
>> comparison.
>>
>> Regards,
>> Sarfraz Rasheed Ramay (DIT)
>> Dublin, Ireland.
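
As a rough illustration of the feature comparison suggested above, here is a minimal Python sketch that enumerates every combination of the toggles so that each one can be benchmarked in turn. The property names are standard Hive configuration keys, but the chosen values, the helper name benchmark_configs, and the use of hive.exec.compress.output to stand in for "compressed vs uncompressed" are my assumptions; storage-level compression (for example the orc.compress table property) would have to be set in the table DDL instead.

```python
from itertools import product

# Session-level Hive properties to toggle, each with its candidate values.
# The selection of properties is an assumption made for illustration.
SETTINGS = {
    "hive.vectorized.execution.enabled": ["true", "false"],
    "hive.exec.compress.output": ["true", "false"],
    "hive.execution.engine": ["mr", "tez"],
    "hive.compute.query.using.stats": ["true", "false"],
}

# File format is chosen per table in the DDL (STORED AS ...), not via SET.
FILE_FORMATS = ["TEXTFILE", "RCFILE", "ORC"]


def benchmark_configs():
    """Yield (file_format, [SET statements]) for every combination."""
    keys = list(SETTINGS)
    for fmt in FILE_FORMATS:
        for values in product(*(SETTINGS[k] for k in keys)):
            yield fmt, [f"SET {k}={v};" for k, v in zip(keys, values)]


if __name__ == "__main__":
    configs = list(benchmark_configs())
    print(len(configs), "configurations")
    fmt, statements = configs[0]
    print("STORED AS", fmt)
    print("\n".join(statements))
```

Each yielded pair could be prepended to the same test query, run against a table stored in the given format, and timed. A full cross product grows quickly, so for a thesis it may be more practical to vary one feature at a time against a fixed baseline.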