[ https://issues.apache.org/jira/browse/MADLIB-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jingyi Mei reassigned MADLIB-1224:
----------------------------------

    Assignee: Jingyi Mei

> Select default buffer size for mini-batch preprocessor
> ------------------------------------------------------
>
>                 Key: MADLIB-1224
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1224
>             Project: Apache MADlib
>          Issue Type: Improvement
>          Components: Module: Utilities
>            Reporter: Jingyi Mei
>            Assignee: Jingyi Mei
>            Priority: Major
>             Fix For: v1.14
>
>
> As a follow-up to https://issues.apache.org/jira/browse/MADLIB-1200
>
> In minibatch_preprocessor, we made buffer_size an optional parameter. If
> it is not set, a default value will be assigned. Current considerations
> are:
> # Within a segment, each cell has a 1 GB limit, so we can't pack so many
> rows into one super row that it exceeds the limit.
> # Among segments, data should be distributed as evenly as possible to
> avoid data skew, so that GPDB can work more efficiently.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
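
The two considerations in the issue can be combined into a rough heuristic. The sketch below is only an illustration and is not MADlib's actual default. The function name `default_buffer_size`, its parameters, and the assumption that every row takes the same number of bytes (`bytes_per_row`) are all hypothetical. It first caps the rows per buffer so that one packed cell stays under 1 GB, then rounds the target number of buffers up to a multiple of the segment count so that buffers can spread evenly.

```python
CELL_LIMIT_BYTES = 1 << 30  # the 1 GB per-cell limit mentioned in the issue


def _ceil_div(a, b):
    """Integer ceiling division, avoiding float rounding on large counts."""
    return -(-a // b)


def default_buffer_size(num_rows, bytes_per_row, num_segments,
                        cell_limit_bytes=CELL_LIMIT_BYTES):
    """Hypothetical heuristic for a default buffer size (rows per buffer).

    Assumes fixed-width rows; this is NOT the formula MADlib uses.
    """
    if num_rows <= 0 or bytes_per_row <= 0 or num_segments <= 0:
        raise ValueError("all arguments must be positive")

    # Consideration 1: the most rows one packed cell can hold within the limit.
    max_rows_per_buffer = max(1, cell_limit_bytes // bytes_per_row)

    # Fewest buffers that still respect the per-cell limit.
    min_buffers = _ceil_div(num_rows, max_rows_per_buffer)

    # Consideration 2: target a buffer count that is a multiple of the
    # segment count, so buffers can be spread evenly across segments.
    target_buffers = _ceil_div(min_buffers, num_segments) * num_segments

    # There cannot be more buffers than rows.
    target_buffers = min(target_buffers, num_rows)

    return _ceil_div(num_rows, target_buffers)
```

For example, under these assumptions `default_buffer_size(1_000_000, 8000, 4)` targets 8 buffers of 125000 rows each. Because the buffer size is rounded up, the number of buffers actually produced can be slightly below the target when the row count does not divide evenly.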