Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/4634#issuecomment-76695972
  
    So, the argument here seems to be that `combineByKey` doesn't add much 
over `aggregateByKey`. I agree, although it is slightly more general: it lets 
you build the initial value as a function of an input, instead of providing a 
fixed zero value. But `combineByKey` also has all of the advanced options like 
`mapSideCombine`.
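
    To make that difference concrete, here is a minimal plain-Scala sketch. 
It is a model over a local `Seq`, not Spark's implementation: the object name 
`CombineVsAggregate` and the sample data are made up for illustration, and 
`mergeCombiners` / `combOp` is left out because the model has only one 
partition.

```scala
// Plain-Scala model of the two operations; an illustration, not Spark code.
object CombineVsAggregate {
  // Model of combineByKey: the first value seen for a key seeds the combiner.
  def combineByKey[K, V, C](pairs: Seq[(K, V)])(
      createCombiner: V => C,
      mergeValue: (C, V) => C): Map[K, C] =
    pairs.foldLeft(Map.empty[K, C]) { case (acc, (k, v)) =>
      val next = acc.get(k) match {
        case Some(c) => mergeValue(c, v)   // key already seen: fold the value in
        case None    => createCombiner(v)  // first value for this key
      }
      acc.updated(k, next)
    }

  // Model of aggregateByKey, written in terms of combineByKey: the zero value
  // is simply folded into the first element of each key.
  def aggregateByKey[K, V, U](pairs: Seq[(K, V)])(zero: U)(
      seqOp: (U, V) => U): Map[K, U] =
    combineByKey[K, V, U](pairs)((v: V) => seqOp(zero, v), seqOp)

  val data: Seq[(String, Int)] = Seq("a" -> 3, "b" -> 5, "a" -> 4)

  // A fixed zero value is enough for a sum.
  val sums: Map[String, Int] = aggregateByKey(data)(0)(_ + _)

  // Here the initial (min, max) pair is derived from the first input, so no
  // sentinel zero value is needed.
  val ranges: Map[String, (Int, Int)] =
    combineByKey[String, Int, (Int, Int)](data)(
      (v: Int) => (v, v),
      (c: (Int, Int), v: Int) => (math.min(c._1, v), math.max(c._2, v)))

  def main(args: Array[String]): Unit = {
    println(sums)   // a -> 7, b -> 5
    println(ranges) // a -> (3,4), b -> (5,5)
  }
}
```

    The `ranges` case is the "slightly more general" part: with a zero value 
you would have to start from something like `(Int.MaxValue, Int.MinValue)`.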
    
    So if you just need `aggregateByKey`, but do need to control advanced 
settings, you have to go down a step and use `combineByKey`. You have to 
provide a function that makes a zero value, instead of the zero value itself, 
which isn't a big deal. Of course, I don't think the API can be changed in the 
short term. Removing `combineByKey` would also lose a little bit of control: 
the zero value as a function and, as things stand now, control over things 
like map-side combine.
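
    As a rough sketch of what "going down a step" looks like in the Scala 
API today, assuming Spark is on the classpath: the object name 
`SumWithoutMapSideCombine`, the sample data and the local master setting are 
illustrative only, and the per-key sum stands in for whatever aggregation you 
actually need.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits on Spark versions before 1.3
import org.apache.spark.rdd.RDD

object SumWithoutMapSideCombine {
  // Same result as pairs.aggregateByKey(0L)(_ + _, _ + _), but routed through
  // combineByKey so that mapSideCombine can be switched off.
  def sumByKey(pairs: RDD[(String, Int)], numPartitions: Int): RDD[(String, Long)] = {
    val zero = 0L
    pairs.combineByKey[Long](
      (v: Int) => zero + v,            // createCombiner: zero value folded into the first element
      (acc: Long, v: Int) => acc + v,  // mergeValue, plays the role of seqOp
      (a: Long, b: Long) => a + b,     // mergeCombiners, plays the role of combOp
      new HashPartitioner(numPartitions),
      mapSideCombine = false)
  }

  def main(args: Array[String]): Unit = {
    // Local master and app name are placeholders for this sketch.
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("sketch"))
    try {
      val pairs = sc.parallelize(Seq("a" -> 3, "b" -> 5, "a" -> 4))
      sumByKey(pairs, 2).collect().foreach(println)
    } finally sc.stop()
  }
}
```

    The only extra ceremony compared with `aggregateByKey` is the 
`createCombiner` function and the explicit partitioner.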
    
    We're left with an argument for API consistency between Java and Scala, 
which is compelling. That is, they should at least match, irrespective of what 
changes may happen later.
    
    `groupByKey` vs `aggregateByKey` seems like a slightly different question 
that results in an alternative suggestion: add this `mapSideCombine` flag to 
`aggregateByKey`.
    
    1. Don't change the Scala API. Make `combineByKey` consistent in the Java 
API and expose `mapSideCombine`
    2. Add a new optional param to Scala `aggregateByKey`. Add it to Java 
`aggregateByKey` as well.
    
    I slightly prefer 1 because it's a strictly smaller change and leaves 
the API more consistent. The purpose of 2 seems to be to fix this by removing 
the need for `combineByKey` to exist, but it does exist, so that's moot to me.
    
    I'd like to proceed with this change, then. It passes tests and does not 
affect the API. I'd like to wait a couple days for @pwendell or @rxin since it 
has a core API question.

