Since I'm reading a JSON file, I will try changing JSONRecordReader.DEFAULT_ROWS_PER_BATCH. Thanks for the advice!
Eric

On Wed, Jul 6, 2016 at 12:42 AM, Abdel Hakim Deneche <[email protected]> wrote:

> It depends on the data you are querying. For .json you could change the
> value of JSONRecordReader.DEFAULT_ROWS_PER_BATCH, which is set by default
> to 4096, but this will only affect the size of the batches produced by the
> reader; other operators may still alter the batch size.
>
> On Tue, Jul 5, 2016 at 7:30 PM, Eric Fukuda <[email protected]> wrote:
>
> > Thanks Abdel. Looking at the code, it looks like the maximum number of
> > records in a batch is 64k. I suspect the reason I'm having only 4k is that
> > it reached the capacity of the buffer in the batch. Is there a way to
> > relieve this capacity restriction? It doesn't have to be a configuration
> > option. I don't mind changing and compiling the code.
> >
> > On Tue, Jul 5, 2016 at 8:55 PM, Abdel Hakim Deneche <[email protected]> wrote:
> >
> > > Unfortunately I don't think there is a way to do it.
> > >
> > > On Tue, Jul 5, 2016 at 3:58 PM, Eric Fukuda <[email protected]> wrote:
> > >
> > > > I'm trying to see how performance differs with different batch sizes.
> > > > My table has 13 integer fields and 1 string field, and has 8M records.
> > > > Following the code with a debugger, there seem to be 4096 records in a
> > > > batch. Can this be 8192 or larger?
> > > >
> > > > On Tue, Jul 5, 2016 at 6:47 PM, Abdel Hakim Deneche <[email protected]> wrote:
> > > >
> > > > > hey Eric,
> > > > >
> > > > > Can you give more information about what you are trying to achieve?
> > > > >
> > > > > Thanks
> > > > >
> > > > > On Tue, Jul 5, 2016 at 3:41 PM, Eric Fukuda <[email protected]> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Does anyone know if there is a way to increase or specify the
> > > > > > number of records per batch manually?
> > > > > >
> > > > > > Thanks,
> > > > > > Eric
> > > > >
> > > > > --
> > > > > Abdelhakim Deneche
> > > > > Software Engineer
> > > > > <http://www.mapr.com/>
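
For a rough sense of what changing the rows-per-batch value means for the 8M-record table mentioned above, here is a minimal standalone sketch. It does not use any Drill classes: the class name BatchSizeSketch and the method batchCount are hypothetical, the 4096 default is the value quoted in the thread, and the 64k cap is only the limit Eric reports seeing in the code. It also only models the reader side; as noted above, other operators may still alter the batch size.

```java
public class BatchSizeSketch {
    // Default reported in the thread for JSONRecordReader.DEFAULT_ROWS_PER_BATCH.
    static final int DEFAULT_ROWS_PER_BATCH = 4096;
    // Upper bound Eric reports from reading the code (64k); an assumption here.
    static final int MAX_ROWS_PER_BATCH = 65536;

    // Number of batches a reader would emit if every batch except possibly
    // the last holds exactly rowsPerBatch rows (capped at the assumed maximum).
    static long batchCount(long totalRows, int rowsPerBatch) {
        if (totalRows < 0 || rowsPerBatch <= 0) {
            throw new IllegalArgumentException("totalRows must be >= 0 and rowsPerBatch > 0");
        }
        int effective = Math.min(rowsPerBatch, MAX_ROWS_PER_BATCH);
        return (totalRows + effective - 1) / effective; // ceiling division
    }

    public static void main(String[] args) {
        long totalRows = 8_000_000L; // "8M records" from the thread
        for (int size : new int[] {DEFAULT_ROWS_PER_BATCH, 8192, MAX_ROWS_PER_BATCH}) {
            System.out.printf("%d rows per batch -> %d batches%n",
                    size, batchCount(totalRows, size));
        }
    }
}
```

With these assumptions, 8M rows come out as 1954 batches at 4096 rows, 977 at 8192, and 123 at 65536, so doubling the constant roughly halves the number of batches the reader hands downstream.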
