The primary difference between Hive and Pig is the language. There are
implementation differences that will result in performance
differences, but it will be hard to figure out which aspect of the
implementation is responsible for which improvement.

I think a more interesting project would be to compare the impact of
various performance improvements in Hive. There are many features that
you can turn on and off.

For example:
- Hive vectorization
- file format: text vs. RCFile vs. ORC
- compressed vs. uncompressed
- MapReduce vs. Tez execution engine
- stats-optimized queries
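One way to turn the list above into an experiment plan is to enumerate every combination of the toggles and run the same query set under each one. The Python sketch below does that. The property names (hive.vectorized.execution.enabled, hive.exec.compress.output, hive.execution.engine, hive.compute.query.using.stats) and the STORED AS formats are real Hive settings, but the grid itself, the helper name experiment_grid, and the assumption that compression is toggled only through hive.exec.compress.output are hypothetical choices for illustration. Note also that in Hive releases of this period vectorization only applies to ORC tables, so some combinations may turn out to be redundant.

```python
from itertools import product

# Hypothetical experiment grid: real Hive property names, illustrative values.
# Assumption: output compression is toggled via hive.exec.compress.output only.
SESSION_FACTORS = {
    "hive.vectorized.execution.enabled": ["true", "false"],
    "hive.exec.compress.output": ["true", "false"],
    "hive.execution.engine": ["mr", "tez"],
    "hive.compute.query.using.stats": ["true", "false"],
}

# File format is a table property (CREATE TABLE ... STORED AS <fmt>),
# not a session SET, so it is kept as a separate factor.
FILE_FORMATS = ["TEXTFILE", "RCFILE", "ORC"]


def experiment_grid():
    """Yield (file_format, [SET statements]) for every factor combination."""
    names = list(SESSION_FACTORS)
    for fmt in FILE_FORMATS:
        for values in product(*(SESSION_FACTORS[n] for n in names)):
            settings = [f"SET {n}={v};" for n, v in zip(names, values)]
            yield fmt, settings


if __name__ == "__main__":
    runs = list(experiment_grid())
    print(len(runs), "configurations")  # 3 formats * 2**4 session combos
    fmt, settings = runs[0]
    print("STORED AS", fmt)
    print("\n".join(settings))
```

Each yielded configuration would then be prefixed to the benchmark queries, and the timings compared one factor at a time, which keeps the attribution problem mentioned above manageable.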



On Thu, May 1, 2014 at 5:47 AM, Sarfraz Ramay <sarfraz.ra...@gmail.com> wrote:
>>
>> Hi,
>>
>> It seems that both Hive and Pig are used for managing large data sets.
>> Hive is more SQL-oriented, whereas Pig is geared more toward data flows. I
>> am doing a master's thesis on the performance evaluation of both. Can
>> someone please provide a list of tasks that would make for an interesting
>> comparison?
>>
>>
>> What is Hive good at?
>>
>> What is Pig good at?
>>
>> Ideally, I would like to take what Hive is good at and test it in Pig, and
>> vice versa. Contrasting their respective strengths would make for an
>> interesting comparison.
>>
>>
>> Regards,
>> Sarfraz Rasheed Ramay (DIT)
>> Dublin, Ireland.
>
>
