The following page has been changed by PradeepKamath:
http://wiki.apache.org/pig/PigFRJoin
--
= Fragment Replicate Join =
Fragment Replicate Join (FRJ) is useful when we want to join a huge table with
a very small table (small enough to fit in memory) and the join doesn't expand
the data by much. The idea is to distribute the processing of the huge file by
fragmenting it and replicating the small file to all machines that receive a
fragment of the huge file. Because the entire small file is available on each
machine, the join becomes a trivial task that needs no break in the
pipeline.
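To make the idea concrete, here is a minimal single-process sketch in Python. It is not Pig's implementation: the function names, the tuple-based row layout, and the use of an in-memory hash index for the small input are all assumptions made for illustration. Each fragment of the huge input is processed independently against its own full copy of the small input, which is why no shuffle or other break in the pipeline is needed.

```python
from collections import defaultdict


def build_replicated_index(small_rows, key_index):
    """Load the whole small ("replicate") input into an in-memory hash index.

    Hypothetical helper: assumes rows are tuples and the join key sits at
    position key_index.
    """
    index = defaultdict(list)
    for row in small_rows:
        index[row[key_index]].append(row)
    return index


def join_fragment(fragment_rows, frag_key_index, index):
    """Stream one fragment of the huge input and emit joined rows.

    Each fragment row is looked up in the replicated index; rows without a
    match are dropped (inner join semantics).
    """
    for row in fragment_rows:
        for match in index.get(row[frag_key_index], ()):
            yield row + match


def fragment_replicate_join(fragments, frag_key_index, small_rows, small_key_index):
    """Simulate the join: every fragment gets its own copy of the small input."""
    results = []
    for fragment in fragments:
        # In a real cluster each machine would build this index locally
        # from its replicated copy of the small file.
        index = build_replicated_index(small_rows, small_key_index)
        results.extend(join_fragment(fragment, frag_key_index, index))
    return results


if __name__ == "__main__":
    huge_fragments = [[(1, "a"), (2, "b")], [(3, "c"), (1, "d")]]
    small = [(1, "x"), (3, "y")]
    for joined in fragment_replicate_join(huge_fragments, 0, small, 0):
        print(joined)
```

The sketch also shows why the size of the small input matters: every worker holds the complete index in memory, so the memory cost is paid once per fragment rather than once overall.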
- NOTE: In the initial version of the implementation, the first input in the
Join statement will be considered to be the "fragment" input and the other
inputs are considered to be the "replicate" inputs.
+ NOTE: In the initial version of the implementation, the first input in the
Join statement will be considered to be the "fragment" input and the other
inputs are considered to be the "replicate" inputs. In tests, around 80MB
appeared to be the practical maximum size for the "replicate" input. Above
that, there is a good probability of getting memory errors.
== Performance ==
The following is a set of parameters that we can alter to compare the
performance of the different types of join algorithms: