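Sorry, I was confusing SAMPLE with RandomSampleLoader, which uses reservoir
sampling. SAMPLE is rewritten to a filter with a less-than expression that uses
the sampling fraction (e.g. 0.1) as the predicate value. So every record is
still read, and each one is kept with that probability.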
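
To make the difference concrete, here is a rough Python sketch of the three
behaviours discussed in this thread. It is an illustration only, not Pig's
actual code: the function names (limit, sample_by_filter, reservoir_sample)
are made up for the example, and the filter version assumes the rewritten
predicate compares a uniform random number in [0, 1) against the fraction.

```python
import random
from itertools import islice


def limit(records, n):
    """LIMIT-like: return the first n records and stop reading."""
    return list(islice(records, n))


def sample_by_filter(records, fraction, rng):
    """SAMPLE-like (assumed): keep each record independently with
    probability `fraction`. Every record is read, and the output size
    is only approximately fraction * N."""
    return [r for r in records if rng.random() < fraction]


def reservoir_sample(records, k, rng):
    """Reservoir sampling (Algorithm R): one pass over all records,
    returns exactly k of them (or fewer if the input is shorter)."""
    reservoir = []
    for i, r in enumerate(records):
        if i < k:
            reservoir.append(r)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = r
    return reservoir


if __name__ == "__main__":
    rng = random.Random(42)
    data = range(100_000)
    print(limit(data, 5))                        # first five records only
    print(len(sample_by_filter(data, 0.1, rng)))  # close to 10,000, not exact
    print(len(reservoir_sample(data, 10, rng)))   # exactly 10
```

Note that both sampling variants scan the whole input; only the LIMIT-like
version can stop early. The filter version gives an approximate sample size,
while the reservoir version gives an exact one.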

Thanks
-- Prasanth

On Feb 28, 2013, at 5:01 AM, Gianmarco De Francisci Morales <[email protected]> 
wrote:

> Hi,
> LIMIT takes the first X records, so there are no statistical guarantees.
> SAMPLE takes X% of the records from the whole bag (uniformly), so you have
> statistical guarantees.
> No, SAMPLE does not use reservoir sampling.
> 
> Cheers,
> 
> --
> Gianmarco
> 
> 
> On Wed, Feb 27, 2013 at 12:23 AM, Prasanth J 
> <[email protected]> wrote:
> 
>> AFAIK, SAMPLE operator internally uses reservoir sampling. So it reads
>> entire data to randomly generate 10% data.
>> 
>> Thanks
>> -- Prasanth
>> 
>> On Feb 26, 2013, at 6:19 PM, Panshul Whisper <[email protected]>
>> wrote:
>> 
>>> Hello,
>>> 
>>> Can somebody please explain the difference between the LIMIT and SAMPLE
>>> statements?
>>> With SAMPLE set to 0.1, does it read the entire input file, or does it
>>> read randomly only until 10% of the data has been collected?
>>> 
>>> Thanking You for any help.
>>> 
>>> --
>>> Regards,
>>> Ouch Whisper
>>> 010101010101
>> 
>> 
