Hi, LIMIT takes the first X records it encounters, so the result carries no statistical guarantees. SAMPLE selects approximately X% of the records, drawn uniformly from the whole bag, so it does give you statistical guarantees; this also means it has to read the entire input. And no, SAMPLE does not use reservoir sampling.
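
To make the difference concrete, here is a small Python sketch. It is not Pig code and not Pig's actual implementation; the function names `limit`, `sample` and `reservoir` are made up for illustration. It assumes SAMPLE behaves like an independent coin flip per record, which matches the uniform, whole-bag behaviour described above, and it shows reservoir sampling only for contrast.

```python
import random
from itertools import islice


def limit(records, n):
    """LIMIT-like: keep the first n records and stop reading."""
    return list(islice(records, n))


def sample(records, fraction, rng):
    """SAMPLE-like (assumed model): scan every record and keep each one
    independently with probability `fraction`."""
    return [r for r in records if rng.random() < fraction]


def reservoir(records, k, rng):
    """Reservoir sampling (Algorithm R), for contrast only:
    one pass over the input, exactly k records kept (if at least k exist)."""
    kept = []
    for i, r in enumerate(records):
        if i < k:
            kept.append(r)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                kept[j] = r
    return kept


rng = random.Random(42)          # fixed seed so runs are repeatable
data = range(100_000)

first_ten = limit(data, 10)      # always the first 10 records
picked = sample(data, 0.1, rng)  # roughly 10% of the records, count varies
fixed = reservoir(data, 10, rng) # exactly 10 records from the whole input
```

Note that `sample` returns a count that only approximates 10% of the input, while `reservoir` returns a fixed number of records; both have to look at every record, whereas `limit` can stop early.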
Cheers,
--
Gianmarco

On Wed, Feb 27, 2013 at 12:23 AM, Prasanth J <[email protected]> wrote:

> AFAIK, SAMPLE operator internally uses reservoir sampling. So it reads
> entire data to randomly generate 10% data.
>
> Thanks
> -- Prasanth
>
> On Feb 26, 2013, at 6:19 PM, Panshul Whisper <[email protected]>
> wrote:
>
> > Hello,
> >
> > Can somebody please explain me the difference between Limit and Sample
> > statements.
> > Does it read the entire input file in case of Sample if the value is set to
> > 0.1 or it reads randomly only till 10% of the data has been collected.
> >
> > Thanking You for any help.
> >
> > --
> > Regards,
> > Ouch Whisper
> > 010101010101
