Sorry, I confused it with RandomSampleLoader, which uses reservoir sampling. SAMPLE is rewritten to a FILTER with a less-than expression whose predicate value is the sampling percentage.
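
To make the difference concrete, here is a rough Python sketch of the two techniques. It is only an illustration, not Pig's actual code: the function names `bernoulli_sample` and `reservoir_sample`, the seed, and the data sizes are all made up for this example. The first function mimics a filter that keeps a record when a random number is less than the sampling fraction; the second is a textbook reservoir sampler (Algorithm R).

```python
import random


def bernoulli_sample(records, fraction, rng):
    """Keep each record independently with probability `fraction`.

    Mimics a filter of the form `random() < fraction` (illustrative only).
    """
    return [r for r in records if rng.random() < fraction]


def reservoir_sample(records, k, rng):
    """Return exactly k records chosen uniformly (Algorithm R)."""
    reservoir = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(record)
        else:
            # Replace a slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = record
    return reservoir


rng = random.Random(42)          # fixed seed, chosen arbitrarily
data = list(range(10_000))       # stand-in for the input records

filtered = bernoulli_sample(data, 0.1, rng)
reservoir = reservoir_sample(data, 1000, rng)

# The filter yields roughly 10% of the records; the reservoir yields exactly k.
print(len(filtered), len(reservoir))
```

Note that in this sketch both functions iterate over every input record. The filter-style sample returns approximately 10% of the input, so its size varies from run to run, while the reservoir always returns exactly k records.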

Thanks
-- Prasanth

On Feb 28, 2013, at 5:01 AM, Gianmarco De Francisci Morales <[email protected]> wrote:

> Hi,
> LIMIT takes the first X records, so there are no statistical guarantees.
> SAMPLE takes X% of the records from the whole bag (uniformly), so you have
> statistical guarantees.
> No, SAMPLE does not use reservoir sampling.
>
> Cheers,
>
> --
> Gianmarco
>
>
> On Wed, Feb 27, 2013 at 12:23 AM, Prasanth J
> <[email protected]> wrote:
>
>> AFAIK, SAMPLE operator internally uses reservoir sampling. So it reads
>> entire data to randomly generate 10% data.
>>
>> Thanks
>> -- Prasanth
>>
>> On Feb 26, 2013, at 6:19 PM, Panshul Whisper <[email protected]>
>> wrote:
>>
>>> Hello,
>>>
>>> Can somebody please explain me the difference between Limit and Sample
>>> statements.
>>> Does it read the entire input file in case of Sample if the value is set to
>>> 0.1 or it reads randomly only till 10% of the data has been collected.
>>>
>>> Thanking You for any help.
>>>
>>> --
>>> Regards,
>>> Ouch Whisper
>>> 010101010101
>>
>>
