[jira] [Commented] (KAFKA-3511) Add common aggregation functions like Sum and Avg as built-ins in Kafka Streams DSL

2016-05-26 Thread Eno Thereska (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-3511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301718#comment-15301718 ]

Eno Thereska commented on KAFKA-3511:
-

Based on feedback, it's probably best to hold off on this one for now. Users can
already pass in their own aggregators (and with Java 8 they will also be able to
use lambda functions). It is not clear what value built-in aggregators would
add when it is so easy for users to write their own (e.g., a count or sum
aggregator).
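The point that count and sum aggregators are easy to write by hand can be illustrated with a minimal sketch, assuming Java 8 lambdas. So that it compiles without the Kafka jar, Initializer and Aggregator below are local stand-ins assumed to mirror the shape of the Kafka Streams callbacks of the same names; they are not the library's actual types. foldByKey is a hypothetical in-memory driver standing in for a real per-key aggregation over a stream.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class UserAggregators {

    // Local stand-ins shaped like the Kafka Streams Initializer / Aggregator
    // callbacks (assumed shapes), so the sketch compiles without Kafka.
    interface Initializer<VA> {
        VA apply();
    }

    interface Aggregator<K, V, VA> {
        VA apply(K key, V value, VA aggregate);
    }

    // User-written "count" and "sum": each is a one-line Java 8 lambda.
    static final Initializer<Long> ZERO = () -> 0L;
    static final Aggregator<String, Long, Long> COUNT = (key, value, agg) -> agg + 1;
    static final Aggregator<String, Long, Long> SUM = (key, value, agg) -> agg + value;

    // Hypothetical in-memory driver: folds each record into the running
    // aggregate for its key, starting from the initializer's value.
    static <K, V, VA> Map<K, VA> foldByKey(List<Map.Entry<K, V>> records,
                                           Initializer<VA> init,
                                           Aggregator<K, V, VA> agg) {
        Map<K, VA> state = new LinkedHashMap<>();
        for (Map.Entry<K, V> r : records) {
            K key = r.getKey();
            VA current = state.containsKey(key) ? state.get(key) : init.apply();
            state.put(key, agg.apply(key, r.getValue(), current));
        }
        return state;
    }

    // Helper to build a (key, value) record.
    static Map.Entry<String, Long> rec(String key, long value) {
        return new AbstractMap.SimpleEntry<>(key, value);
    }

    // Small sample input: two records for "a", one for "b".
    static List<Map.Entry<String, Long>> sample() {
        return Arrays.asList(rec("a", 3), rec("b", 5), rec("a", 4));
    }

    public static void main(String[] args) {
        System.out.println(foldByKey(sample(), ZERO, COUNT)); // {a=2, b=1}
        System.out.println(foldByKey(sample(), ZERO, SUM));   // {a=7, b=5}
    }
}
```

Running main prints {a=2, b=1} for the count aggregator and {a=7, b=5} for the sum aggregator. Each user-defined function is a single lambda, which is the crux of the argument for not shipping built-ins.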

> Add common aggregation functions like Sum and Avg as built-ins in Kafka 
> Streams DSL
> ---
>
> Key: KAFKA-3511
> URL: https://issues.apache.org/jira/browse/KAFKA-3511
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Eno Thereska
>  Labels: api
> Fix For: 0.10.1.0
>
>
> Currently we have the following aggregation APIs in the Streams DSL:
> {code}
> KStream.aggregateByKey(..)
> KStream.reduceByKey(..)
> KStream.countByKey(..)
> KTable.groupBy(...).aggregate(..)
> KTable.groupBy(...).reduce(..)
> KTable.groupBy(...).count(..)
> {code}
> It would be better to add common aggregation functions like Sum and Avg as
> built-ins in the Streams DSL. A few open questions, though:
> 1. Should we add these built-in functions as, for example,
> {{KTable.groupBy(...).sum(...)}} or as {{KTable.groupBy(...).aggregate(SUM, ...)}}?
> Please see the comments below for detailed pros and cons.
> 2. If we go with the second option, should we also replace the countByKey /
> count operators with aggregate(COUNT)? Personally I (Guozhang) feel this is
> not necessary: COUNT is a special aggregate function because it does not
> need to map over any value field. Spark takes the same approach, with Count
> built in as a first-class citizen of the DSL and the others built in as
> {{aggregate(SUM)}}, etc.
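The second option in question 1, a generic aggregate(SUM, ...) with COUNT kept as its own operator as question 2 suggests, could look roughly like the sketch below. Everything in it is hypothetical: Fn, aggregate, and count are illustrative names operating on an in-memory list of (key, value) records, not a proposed Kafka Streams signature. One detail it makes visible is that AVG cannot be folded from the previous average alone, so the running state has to carry both a sum and a count.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BuiltInAggregates {

    // Hypothetical set of built-ins for the aggregate(SUM, ...) style.
    enum Fn { SUM, AVG }

    // Per-key running state. AVG needs both fields; the previous
    // average alone is not enough to fold in a new value.
    static final class State {
        long sum;
        long count;
    }

    // Generic entry point: one accumulation path, the function picks the final result.
    static Map<String, Double> aggregate(Fn fn, List<Map.Entry<String, Long>> records) {
        Map<String, State> states = new LinkedHashMap<>();
        for (Map.Entry<String, Long> r : records) {
            State s = states.computeIfAbsent(r.getKey(), k -> new State());
            s.sum += r.getValue();
            s.count += 1;
        }
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, State> e : states.entrySet()) {
            State s = e.getValue();
            result.put(e.getKey(), fn == Fn.SUM ? (double) s.sum : (double) s.sum / s.count);
        }
        return result;
    }

    // COUNT kept as a first-class operator: it never reads the value field.
    static Map<String, Long> count(List<Map.Entry<String, Long>> records) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (Map.Entry<String, Long> r : records) {
            counts.merge(r.getKey(), 1L, Long::sum);
        }
        return counts;
    }

    // Helper to build a (key, value) record.
    static Map.Entry<String, Long> rec(String key, long value) {
        return new AbstractMap.SimpleEntry<>(key, value);
    }

    // Small sample input: two records for "a", one for "b".
    static List<Map.Entry<String, Long>> sample() {
        return Arrays.asList(rec("a", 3), rec("b", 5), rec("a", 4));
    }

    public static void main(String[] args) {
        System.out.println(aggregate(Fn.SUM, sample())); // {a=7.0, b=5.0}
        System.out.println(aggregate(Fn.AVG, sample())); // {a=3.5, b=5.0}
        System.out.println(count(sample()));              // {a=2, b=1}
    }
}
```

Running main prints {a=7.0, b=5.0}, {a=3.5, b=5.0} and {a=2, b=1}. Returning double for SUM is a simplification to keep a single result type; a real design would likely want type-specific results.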



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3511) Add common aggregation functions like Sum and Avg as built-ins in Kafka Streams DSL

2016-05-26 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-3511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301710#comment-15301710 ]

ASF GitHub Bot commented on KAFKA-3511:
---

Github user enothereska closed the pull request at:

https://github.com/apache/kafka/pull/1424




[jira] [Commented] (KAFKA-3511) Add common aggregation functions like Sum and Avg as built-ins in Kafka Streams DSL

2016-05-24 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-3511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15298244#comment-15298244 ]

ASF GitHub Bot commented on KAFKA-3511:
---

GitHub user enothereska opened a pull request:

https://github.com/apache/kafka/pull/1424

KAFKA-3511: Initial commit for aggregators [WiP]

Initial structure. Removed initialiser. Two simple aggregators.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/enothereska/kafka KAFKA-3511-sum-avg

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1424.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1424


commit 18416bb213b6eaa3fa5952af67dc5396204e247c
Author: Eno Thereska 
Date:   2016-05-24T14:25:47Z

Initial commit for aggregators



