Re: What is implemented behind the PIG Joins

Thejas Nair Mon, 22 Aug 2011 13:59:10 -0700

Hi Byambajargal,
What version of Pig does your distribution use?
-Thejas


On 8/22/11 3:42 AM, byambaa wrote:

Hello,
I have a cluster of 11 nodes, each with 16 GB RAM, a 6-core CPU, and a
1 TB HDD, and I am using the Cloudera distribution CHD4b with Pig. I have
two Pig join queries, a Parallel and a Replicated version of the Pig join,
as well as MapReduce reduce-side and map-side joins.

In theory, the Replicated join should be faster than the Parallel join,
but in my case the Parallel join is faster.
I have two questions:

1. Why is the replicated join so slow? How does it work, and what is
implemented behind it?
2. The MR reduce-side join was faster than the parallel Pig join. What is
implemented behind the parallel Pig join? I guess Pig also implements an
MR reduce-side join.

Could you explain how the Pig joins work and what runs behind the Pig
scripts?
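
For readers following the thread, the two strategies being compared can be sketched in plain Python. This is only an illustration, not Pig's actual implementation: Pig's default join runs as a reduce-side join, while a join written with USING 'replicated' loads the smaller relation into memory in every map task. The function names and the sample records below are hypothetical.

```python
from collections import defaultdict


def reduce_side_join(left, right):
    """Shuffle-style join: group both inputs by key, then join per key.

    In MapReduce the grouping is done by the shuffle/sort phase, so both
    inputs are sent across the network before any rows are joined.
    """
    groups = defaultdict(lambda: ([], []))
    for key, value in left:
        groups[key][0].append(value)
    for key, value in right:
        groups[key][1].append(value)

    joined = []
    for key, (left_values, right_values) in groups.items():
        for lv in left_values:
            for rv in right_values:
                joined.append((key, lv, rv))
    return joined


def replicated_join(big, small):
    """Map-side join: hold the small input in an in-memory hash table.

    Each map task would build this table from its own copy of the small
    input and stream its split of the big input past it; no reduce phase.
    """
    table = defaultdict(list)
    for key, value in small:
        table[key].append(value)
    return [(key, bv, sv) for key, bv in big for sv in table.get(key, [])]


# Hypothetical sample records: (join key, payload)
annotations = [("c1", "a1"), ("c2", "a2"), ("c1", "a3")]
relation = [("c1", "r1"), ("c3", "r3")]

shuffle_result = sorted(reduce_side_join(annotations, relation))
replicated_result = sorted(replicated_join(annotations, relation))
```

Both functions produce the same rows. The replicated variant avoids the shuffle, but every task has to load the whole small input first, so it tends to pay off only when that input fits comfortably in each task's memory.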


Dataset                      Replicated Join  Replicated Join  MR Reduce  MR Joins
                             in HDFS          in Hbase         side join  (Singleton pattern)
obr_wp_annotation 1786MB     29 sec           50 sec           36 sec     19
obr_ct_annotation 5916MB     799 sec          523 sec          108 sec    69
obr_pm_annotation 16983MB    1794 sec         707 sec          248 sec    138

The relation file is 659MB.

Thank you very much.

Byambajargal
