Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 
notification.

The following page has been changed by SriranjanManjunath:
http://wiki.apache.org/pig/PigSampler

------------------------------------------------------------------------------
   * Different operations need a different sampler. We thus need a generic 
sampling interface.
  
  == Proposed changes ==
-  * Addition of a "Sampler" interface that sample loaders must implement. The 
existing RandomSampleLoader will be modified to implement the same.
+  * An abstract "Sampler" class that defines the basic sampling operations. 
The existing RandomSampleLoader will be modified to extend the same.
   * Order By will continue to use the existing RandomSampleLoader, whereas 
SkewedJoin will define a new Sampler. The distinction is important because the 
two use different sample rates, and the sample rate for skewed join is not 
known during the compilation phase.
   * Skewed Join sampler will estimate the number of samples based on the size 
of the input.
   * Use a more uniform distribution for the skewed join sample loader 
instead of a random one. The distribution can be generated offline, stored in 
a file, and later used by the sample loader to pick the samples.
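
The proposed changes above could be sketched roughly as follows. This is only an illustration, not the actual Pig code: the class names RandomSampler and SkewedJoinSampler, every method signature, and the bytesPerSample parameter are hypothetical. Evenly spaced record positions computed in memory stand in for the offline-generated distribution file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: none of these signatures are taken from the Pig code base.
abstract class Sampler {
    /** Number of samples to draw; each subclass decides how this is determined. */
    abstract int numSamples(long inputSizeBytes);

    /** Chooses which record indexes in [0, totalRecords) to keep. */
    abstract List<Integer> pick(int totalRecords, int wanted);

    /** Shared driver: decide the sample count, then select the records. */
    final <T> List<T> sample(List<T> records, long inputSizeBytes) {
        int wanted = Math.min(numSamples(inputSizeBytes), records.size());
        List<T> out = new ArrayList<>();
        for (int i : pick(records.size(), wanted)) {
            out.add(records.get(i));
        }
        return out;
    }
}

/** Order By style: the sample count is fixed up front, positions are random. */
class RandomSampler extends Sampler {
    private final int fixedCount;
    private final Random rng;

    RandomSampler(int fixedCount, long seed) {
        this.fixedCount = fixedCount;
        this.rng = new Random(seed);
    }

    @Override
    int numSamples(long inputSizeBytes) {
        return fixedCount; // independent of the input size
    }

    @Override
    List<Integer> pick(int totalRecords, int wanted) {
        // Reservoir sampling over the indexes 0..totalRecords-1.
        List<Integer> reservoir = new ArrayList<>();
        for (int i = 0; i < totalRecords; i++) {
            if (i < wanted) {
                reservoir.add(i);
            } else {
                int j = rng.nextInt(i + 1);
                if (j < wanted) {
                    reservoir.set(j, i);
                }
            }
        }
        return reservoir;
    }
}

/** Skewed join style: the sample count is estimated from the input size. */
class SkewedJoinSampler extends Sampler {
    private final long bytesPerSample; // assumed tuning knob, not a Pig setting

    SkewedJoinSampler(long bytesPerSample) {
        this.bytesPerSample = bytesPerSample;
    }

    @Override
    int numSamples(long inputSizeBytes) {
        long estimate = Math.max(1, inputSizeBytes / bytesPerSample);
        return (int) Math.min(Integer.MAX_VALUE, estimate);
    }

    @Override
    List<Integer> pick(int totalRecords, int wanted) {
        // Evenly spaced positions instead of random ones.
        List<Integer> positions = new ArrayList<>();
        for (int k = 0; k < wanted; k++) {
            positions.add((int) ((long) k * totalRecords / wanted));
        }
        return positions;
    }
}
```

In this sketch, calling new SkewedJoinSampler(100).sample(records, 500) on a list of ten records asks for five samples and takes every second record, while new RandomSampler(3, seed).sample(records, 500) always returns three records regardless of the input size.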
