[Pig Wiki] Update of "PigFRJoin" by PradeepKamath

2009-01-08 Thread Apache Wiki

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 
notification.

The following page has been changed by PradeepKamath:
http://wiki.apache.org/pig/PigFRJoin

--
  = Fragment Replicate Join =
  Fragment Replicate Join (FRJ) is useful when we want to join a huge table 
with a very small table (small enough to fit in memory) and the join doesn't 
expand the data by much. The idea is to distribute the processing of the huge 
file by fragmenting it and replicating the small file to all machines 
receiving a fragment of the huge file. Because the entire small file is 
available on every machine, the join becomes a trivial task that needs no 
break in the pipeline.
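
The idea described above can be illustrated with a minimal in-memory sketch. This is not Pig's implementation; the function names (`build_lookup`, `join_fragment`), the tuple layout, and the sample data are all hypothetical and chosen only for illustration. The small input is loaded into a hash table once (standing in for the copy replicated to each machine), and each fragment of the large input is streamed against it independently.

```python
from collections import defaultdict


def build_lookup(replicated_rows, key_index=0):
    """Index the small ("replicate") input by join key; it must fit in memory."""
    table = defaultdict(list)
    for row in replicated_rows:
        table[row[key_index]].append(row)
    return table


def join_fragment(fragment_rows, lookup, key_index=0):
    """Stream one fragment of the large input and emit joined rows (inner join)."""
    for row in fragment_rows:
        # .get() avoids inserting empty entries for keys with no match.
        for match in lookup.get(row[key_index], ()):
            yield row + match


# Hypothetical sample data: the small table and two fragments of the large one.
small = [(1, "us"), (2, "de")]
fragments = [
    [(1, "alice"), (3, "carol")],
    [(2, "bob"), (1, "dave")],
]

# Each fragment would be handled on its own machine with its own copy of the
# lookup table; here the fragments are simply processed one after another.
lookup = build_lookup(small)
joined = [out for frag in fragments for out in join_fragment(frag, lookup)]
```

Because every fragment sees the whole small input, no fragment needs data from another one, which is why the join can run without a break in the pipeline.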
  
- NOTE: In the initial version of the implementation, the first input in the 
Join statement will be considered to be the "fragment" input and the other 
inputs are considered to be the "replicate" inputs.
+ NOTE: In the initial version of the implementation, the first input in the 
Join statement will be considered to be the "fragment" input and the other 
inputs are considered to be the "replicate" inputs. Tests suggest that around 
80 MB may be the maximum practical size for the "replicate" input. Above that, 
memory errors become likely.
  
  == Performance ==
  The following is a set of parameters that we can alter to compare the 
performance of the different types of join algorithms:

[Pig Wiki] Update of "PigFRJoin" by PradeepKamath

2009-01-06 Thread Apache Wiki

The following page has been changed by PradeepKamath:
http://wiki.apache.org/pig/PigFRJoin

--
  = Fragment Replicate Join =
  Fragment Replicate Join (FRJ) is useful when we want to join a huge table 
with a very small table (small enough to fit in memory) and the join doesn't 
expand the data by much. The idea is to distribute the processing of the huge 
file by fragmenting it and replicating the small file to all machines 
receiving a fragment of the huge file. Because the entire small file is 
available on every machine, the join becomes a trivial task that needs no 
break in the pipeline.
+ 
+ NOTE: In the initial version of the implementation, the first input in the 
Join statement will be considered to be the "fragment" input and the other 
inputs are considered to be the "replicate" inputs.
  
  == Performance ==
  The following is a set of parameters that we can alter to compare the 
performance of the different types of join algorithms:
