It depends on the data you are querying. For .json you could change the value of JSONRecordReader.DEFAULT_ROWS_PER_BATCH, which is set to 4096 by default. Note that this only affects the size of the batches produced by the reader; other operators may still alter the batch size.
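To illustrate the point, here is a rough, self-contained sketch. It is not Drill code: the class BatchSizeSketch, its methods, and the "keep every other row" filter are all made up for illustration, and only the 4096 default comes from JSONRecordReader.DEFAULT_ROWS_PER_BATCH. It shows a reader capping each batch at a configurable row count, and a downstream operator emitting batches of a different size anyway.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSizeSketch {
    // Stand-in for JSONRecordReader.DEFAULT_ROWS_PER_BATCH (4096, per this thread).
    static final int DEFAULT_ROWS_PER_BATCH = 4096;

    /** Sizes of the batches a hypothetical reader emits for totalRows rows. */
    static List<Integer> readerBatchSizes(int totalRows, int rowsPerBatch) {
        List<Integer> sizes = new ArrayList<>();
        for (int remaining = totalRows; remaining > 0; remaining -= rowsPerBatch) {
            // Every batch is full except possibly the last one.
            sizes.add(Math.min(remaining, rowsPerBatch));
        }
        return sizes;
    }

    /**
     * Hypothetical downstream operator that keeps every other row,
     * so each outgoing batch is half the incoming one (rounded up).
     */
    static List<Integer> afterFilter(List<Integer> incoming) {
        List<Integer> outgoing = new ArrayList<>();
        for (int size : incoming) {
            outgoing.add((size + 1) / 2);
        }
        return outgoing;
    }

    public static void main(String[] args) {
        List<Integer> fromReader = readerBatchSizes(10_000, DEFAULT_ROWS_PER_BATCH);
        System.out.println("reader (4096):  " + fromReader);
        System.out.println("after filter:   " + afterFilter(fromReader));
        System.out.println("reader (8192):  " + readerBatchSizes(10_000, 8192));
    }
}
```

Raising the constant changes what the reader hands off, but anything downstream is free to reshape those batches, so measure batch sizes at the operator you actually care about.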

On Tue, Jul 5, 2016 at 7:30 PM, Eric Fukuda <[email protected]> wrote:

> Thanks Abdel. Looking at the code, it looks like the maximum number of
> records in a batch is 64k. I suspect the reason I'm having only 4k is that
> it reached the capacity of the buffer in the batch. Is there a way to
> relieve this capacity restriction? It doesn't have to be a configuration
> option. I don't mind changing and compiling the code.
>
> On Tue, Jul 5, 2016 at 8:55 PM, Abdel Hakim Deneche <[email protected]>
> wrote:
>
> > Unfortunately I don't think there is way to do it.
> >
> > On Tue, Jul 5, 2016 at 3:58 PM, Eric Fukuda <[email protected]> wrote:
> >
> > > I'm trying to see how performance differs with different batch sizes. My
> > > table has 13 integer fields and 1 string field, and has 8M records.
> > > Following the code with a debugger, there seem to be 4096 records in a
> > > batch. Can this be 8192 or larger?
> > >
> > > On Tue, Jul 5, 2016 at 6:47 PM, Abdel Hakim Deneche <[email protected]>
> > > wrote:
> > >
> > > > hey Eric,
> > > >
> > > > Can you give more information about what you are trying to achieve ?
> > > >
> > > > Thanks
> > > >
> > > > On Tue, Jul 5, 2016 at 3:41 PM, Eric Fukuda <[email protected]> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Does anyone know if there is a way to increase or specify the number of
> > > > > records per batch manually?
> > > > >
> > > > > Thanks,
> > > > > Eric

--
Abdelhakim Deneche
Software Engineer
<http://www.mapr.com/>
Now Available - Free Hadoop On-Demand Training
<http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>
