Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.

The following page has been changed by SriranjanManjunath:
http://wiki.apache.org/pig/PigSampler

------------------------------------------------------------------------------
   * Different operations need different samplers. We thus need a generic sampling interface.
  
  == Proposed changes ==
-  * Addition of a "Sampler" interface that sample loaders must implement. The existing RandomSampleLoader will be modified to implement the same.
+  * An abstract "Sampler" class that defines the basic sampling operations. The existing RandomSampleLoader will be modified to extend the same.
   * Order By will continue to use the existing RandomSampleLoader, whereas SkewedJoin will define a new Sampler. The distinction is important because the two use different sample rates, and the sample rate for skewed join will not be known during the compilation phase.
   * The Skewed Join sampler will estimate the number of samples based on the size of the input.
   * The skewed join sample loader will use a more uniform distribution instead of a random one. The distribution can be generated offline, stored in a file, and later used by the sample loader to pick the samples.
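
The proposed class hierarchy might be sketched roughly as follows. This is only an illustration under stated assumptions, not Pig's actual code: the names `Sampler`, `RandomSampler`, `UniformSampler`, `numSamples`, `pickPositions` and `recordsPerSample` are hypothetical, the input is an in-memory list rather than a loader reading from HDFS, and the "one sample per N records" rule is just one possible way to estimate the sample count from the input size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of the abstract "Sampler" class described above.
// Subclasses decide how many samples to take and which positions to pick.
abstract class Sampler<T> {
    /** Number of samples to draw for an input of the given size (in records). */
    abstract int numSamples(long inputSize);

    /** Record positions (0-based) to keep for an input of the given size. */
    abstract List<Long> pickPositions(long inputSize);

    /** Basic sampling operation shared by all samplers. */
    List<T> sample(List<T> input) {
        List<T> out = new ArrayList<>();
        for (long pos : pickPositions(input.size())) {
            out.add(input.get((int) pos));
        }
        return out;
    }
}

// Stand-in for the Order By case: the sample count is fixed up front
// (known at compile time) and positions are chosen at random.
class RandomSampler<T> extends Sampler<T> {
    private final int fixedCount;
    private final Random rng;

    RandomSampler(int fixedCount, long seed) {
        this.fixedCount = fixedCount;
        this.rng = new Random(seed);
    }

    @Override
    int numSamples(long inputSize) {
        return (int) Math.min(fixedCount, inputSize);
    }

    @Override
    List<Long> pickPositions(long inputSize) {
        List<Long> positions = new ArrayList<>();
        int n = numSamples(inputSize);
        for (int i = 0; i < n; i++) {
            // Random position in [0, inputSize); repeats are possible.
            positions.add((long) (rng.nextDouble() * inputSize));
        }
        return positions;
    }
}

// Stand-in for the skewed join case: the sample count is estimated from the
// input size at run time, and positions are spread evenly over the input.
class UniformSampler<T> extends Sampler<T> {
    private final long recordsPerSample; // assumed rule: one sample per N records

    UniformSampler(long recordsPerSample) {
        if (recordsPerSample <= 0) {
            throw new IllegalArgumentException("recordsPerSample must be positive");
        }
        this.recordsPerSample = recordsPerSample;
    }

    @Override
    int numSamples(long inputSize) {
        long n = (inputSize + recordsPerSample - 1) / recordsPerSample; // ceiling division
        return (int) Math.min(n, Integer.MAX_VALUE);
    }

    @Override
    List<Long> pickPositions(long inputSize) {
        List<Long> positions = new ArrayList<>();
        int n = numSamples(inputSize);
        for (int i = 0; i < n; i++) {
            positions.add(i * inputSize / n); // evenly spaced positions
        }
        return positions;
    }
}

class SamplerDemo {
    public static void main(String[] args) {
        List<Integer> input = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            input.add(i);
        }
        Sampler<Integer> skewed = new UniformSampler<>(10);
        System.out.println(skewed.sample(input)); // [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
        Sampler<Integer> orderBy = new RandomSampler<>(5, 42L);
        System.out.println(orderBy.sample(input).size()); // 5
    }
}
```

Because the positions produced by `UniformSampler.pickPositions` depend only on the input size, they could in principle be computed offline and written to a file for the sample loader to read, as the last bullet suggests; that step is left out of the sketch.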
