[jira] [Commented] (BEAM-3737) Key-aware batching function
[ https://issues.apache.org/jira/browse/BEAM-3737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16507150#comment-16507150 ]

Debasish Das commented on BEAM-3737:
---

I saw this being mentioned in TFMA. I am also not clear why BatchElements() is needed. groupByKey takes a combiner, which should run on both the map and reduce side. Am I missing something here? Is it the case that the Beam combiner does not run on the map side? [~robertwb], is that why you mentioned that we should run the combiner upfront in a ParDo and then run groupByKey to achieve map-side and reduce-side combine?

> Key-aware batching function
> ---
>
> Key: BEAM-3737
> URL: https://issues.apache.org/jira/browse/BEAM-3737
> Project: Beam
> Issue Type: New Feature
> Components: sdk-py-core
> Reporter: Chuan Yu Foo
> Priority: Major
>
> I have a CombineFn for which add_input has very large overhead. I would like
> to batch the incoming elements into a large batch before each call to
> add_input to reduce this overhead. In other words, I would like to do
> something like:
> {{elements | GroupByKey() | BatchElements() | CombineValues(MyCombineFn())}}
> Unfortunately, BatchElements is not key-aware, and can't be used after a
> GroupByKey to batch elements per key. I'm working around this by doing the
> batching within CombineValues, which makes the CombineFn rather messy. It
> would be nice if there were a key-aware BatchElements transform which could
> be used in this context.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
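The map-side versus reduce-side question raised in this comment can be pictured with a toy model. The sketch below is plain Python, not Beam: the helper names (`combine_bundle`, `shuffle`, `combine_lifted`) are hypothetical, and it only illustrates the general idea of combiner lifting, where each bundle is folded into one partial accumulator per key before the shuffle and the partial accumulators are merged afterwards. Whether and how a given runner applies this is not stated in the thread.

```python
from collections import defaultdict


def combine_bundle(bundle, create, add):
    """Map side: fold one bundle of (key, value) pairs into one accumulator per key."""
    accs = {}
    for key, value in bundle:
        if key not in accs:
            accs[key] = create()
        accs[key] = add(accs[key], value)
    return list(accs.items())


def shuffle(partials):
    """Stand-in for the grouping step: gather partial accumulators by key."""
    grouped = defaultdict(list)
    for key, acc in partials:
        grouped[key].append(acc)
    return dict(grouped)


def combine_lifted(bundles, create, add, merge, extract):
    """Pre-combine each bundle, shuffle the partials, then merge per key (reduce side)."""
    partials = [kv for bundle in bundles for kv in combine_bundle(bundle, create, add)]
    return {key: extract(merge(accs)) for key, accs in shuffle(partials).items()}


# Two bundles holding five elements in total; only four partial accumulators are shuffled.
bundles = [[("a", 1), ("b", 2), ("a", 3)], [("a", 4), ("b", 5)]]
totals = combine_lifted(
    bundles,
    create=lambda: 0,
    add=lambda acc, value: acc + value,
    merge=sum,
    extract=lambda acc: acc,
)
```

In this model `add` is still called once per element on the map side, which is why pre-combining alone does not remove a per-call overhead in add_input.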
[jira] [Commented] (BEAM-3737) Key-aware batching function
[ https://issues.apache.org/jira/browse/BEAM-3737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16426269#comment-16426269 ]

Robert Bradshaw commented on BEAM-3737:
---

It seems that GroupByKey() would already give you values batched per key, right? Or are you looking for something you can place before the GBK that enables combiner lifting?

> Key-aware batching function
> ---
>
> Key: BEAM-3737
> URL: https://issues.apache.org/jira/browse/BEAM-3737
> Project: Beam
> Issue Type: New Feature
> Components: sdk-py-core
> Reporter: Chuan Yu Foo
> Priority: Major
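If the first reading is the intended one, the values for each key are already together after GroupByKey(), and what remains is splitting them into bounded batches. A minimal sketch of that step follows, in plain Python; `batch_values` is a hypothetical helper, not an existing Beam transform, and the assumption is that each input is a `(key, iterable_of_values)` pair as produced by a grouping step.

```python
from itertools import islice


def batch_values(keyed_values, batch_size):
    """Yield (key, batch) pairs, splitting one key's values into lists of at most batch_size."""
    key, values = keyed_values
    iterator = iter(values)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield key, batch


# Stand-in for the output of a grouping step: one (key, values) pair per key.
grouped = [("a", [1, 2, 3, 4, 5]), ("b", [6, 7])]
batches = [pair for keyed in grouped for pair in batch_values(keyed, batch_size=2)]
```

A generator of this shape is the kind of function one would hand to a flat-map step placed after the grouping; wiring it into an actual pipeline is left out here and has not been tested.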
[jira] [Commented] (BEAM-3737) Key-aware batching function
[ https://issues.apache.org/jira/browse/BEAM-3737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387286#comment-16387286 ]

Kenneth Knowles commented on BEAM-3737:
---

Actually I may have misinterpreted this, but I think [~robertwb] has context for this. Unassigning for now.

> Key-aware batching function
> ---
>
> Key: BEAM-3737
> URL: https://issues.apache.org/jira/browse/BEAM-3737
> Project: Beam
> Issue Type: New Feature
> Components: sdk-py-core
> Reporter: Chuan Yu Foo
> Priority: Major
[jira] [Commented] (BEAM-3737) Key-aware batching function
[ https://issues.apache.org/jira/browse/BEAM-3737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16373867#comment-16373867 ]

Kenneth Knowles commented on BEAM-3737:
---

Would you be interested in contributing something here?

> Key-aware batching function
> ---
>
> Key: BEAM-3737
> URL: https://issues.apache.org/jira/browse/BEAM-3737
> Project: Beam
> Issue Type: New Feature
> Components: beam-model
> Reporter: Chuan Yu Foo
> Assignee: Kenneth Knowles
> Priority: Major
[jira] [Commented] (BEAM-3737) Key-aware batching function
[ https://issues.apache.org/jira/browse/BEAM-3737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16373866#comment-16373866 ]

Kenneth Knowles commented on BEAM-3737:
---

This is a very interesting scenario! Can you build a {{BatchingCombineFn}} that has an accumulator that just buffers elements for a while and compacts them as needed? Then in {{extractOutput}} you can do the final conversion.

> Key-aware batching function
> ---
>
> Key: BEAM-3737
> URL: https://issues.apache.org/jira/browse/BEAM-3737
> Project: Beam
> Issue Type: New Feature
> Components: beam-model
> Reporter: Chuan Yu Foo
> Assignee: Kenneth Knowles
> Priority: Major
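One way to read the suggestion above is sketched below. It is a plain-Python class that only mirrors the shape of a CombineFn (create_accumulator, add_input, merge_accumulators, extract_output) and does not import Beam, so it is an illustration under assumptions rather than the implementation discussed in the issue. The constructor arguments `process_batch`, `initial_state`, `merge_states` and `batch_size` are hypothetical names: `process_batch` stands for the expensive step, called once per buffered batch instead of once per element.

```python
class BatchingCombineFn:
    """Buffers inputs in the accumulator and folds them in batches (CombineFn-shaped sketch)."""

    def __init__(self, process_batch, initial_state, merge_states, batch_size=1000):
        self._process_batch = process_batch  # (state, list_of_elements) -> state, the costly call
        self._initial_state = initial_state  # () -> empty state
        self._merge_states = merge_states    # (state, state) -> state
        self._batch_size = batch_size

    def create_accumulator(self):
        # The accumulator is a pair: (compacted state, buffer of not-yet-processed elements).
        return self._initial_state(), []

    def _flush(self, state, buffer):
        # Compact the buffer into the state with a single expensive call.
        if buffer:
            state = self._process_batch(state, buffer)
        return state, []

    def add_input(self, accumulator, element):
        state, buffer = accumulator
        buffer.append(element)
        if len(buffer) >= self._batch_size:
            state, buffer = self._flush(state, buffer)
        return state, buffer

    def merge_accumulators(self, accumulators):
        state, buffer = self.create_accumulator()
        for other_state, other_buffer in accumulators:
            state = self._merge_states(state, other_state)
            buffer.extend(other_buffer)
            if len(buffer) >= self._batch_size:
                state, buffer = self._flush(state, buffer)
        return state, buffer

    def extract_output(self, accumulator):
        # Final conversion: process whatever is still buffered, then return the state.
        state, _ = self._flush(*accumulator)
        return state


calls = []  # records the size of every batch handed to the expensive step


def expensive_sum(state, batch):
    calls.append(len(batch))
    return state + sum(batch)


fn = BatchingCombineFn(expensive_sum, lambda: 0, lambda a, b: a + b, batch_size=4)
acc = fn.create_accumulator()
for x in range(10):
    acc = fn.add_input(acc, x)
result = fn.extract_output(acc)  # 45, computed with three batch calls of sizes 4, 4 and 2
```

Keeping the buffer inside the accumulator means the accumulator can grow up to `batch_size` elements before it is compacted, so the batch size trades per-call overhead against accumulator size.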