Xiang Gao created SPARK-17185:
---------------------------------

Summary: Unify naming of API for RDD and Dataset
Key: SPARK-17185
URL: https://issues.apache.org/jira/browse/SPARK-17185
Project: Spark
Issue Type: Improvement
Components: Spark Core, SQL
Reporter: Xiang Gao
In RDD, groupByKey is used to generate key-list pairs and aggregateByKey is used to perform aggregation. In Dataset, aggregation is done by groupBy and groupByKey, and no API for generating key-list pairs is provided. The same names are therefore used for different operations in the two APIs, which may be confusing. Besides, it would be more convenient if Dataset also provided an API to generate key-list pairs.
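To make the two RDD behaviours contrasted above concrete, here is a minimal plain-Python model of their semantics. It does not use Spark: `group_by_key` and `aggregate_by_key` are hypothetical stand-ins that mimic what RDD `groupByKey` (key-list pairs) and `aggregateByKey` (per-key aggregation with a zero value, a within-partition `seqOp` and a cross-partition `combOp`) compute. Partitions are modelled simply as a list of lists of `(key, value)` tuples.

```python
from typing import Any, Callable, Dict, Hashable, Iterable, List, Tuple

Pair = Tuple[Hashable, Any]


def group_by_key(pairs: Iterable[Pair]) -> Dict[Hashable, List[Any]]:
    """Hypothetical model of RDD.groupByKey: gather every value of a key into a list."""
    groups: Dict[Hashable, List[Any]] = {}
    for key, value in pairs:
        groups.setdefault(key, []).append(value)
    return groups


def aggregate_by_key(
    partitions: Iterable[Iterable[Pair]],
    zero: Any,
    seq_op: Callable[[Any, Any], Any],
    comb_op: Callable[[Any, Any], Any],
) -> Dict[Hashable, Any]:
    """Hypothetical model of RDD.aggregateByKey.

    Within each partition, values are folded per key starting from `zero`
    with `seq_op`; the per-partition results are then merged across
    partitions with `comb_op`. `zero` is assumed to be immutable.
    """
    result: Dict[Hashable, Any] = {}
    for partition in partitions:
        local: Dict[Hashable, Any] = {}
        for key, value in partition:
            local[key] = seq_op(local.get(key, zero), value)
        for key, acc in local.items():
            result[key] = comb_op(result[key], acc) if key in result else acc
    return result


if __name__ == "__main__":
    partitions = [[("a", 1), ("b", 2)], [("a", 3)]]
    flat = [pair for part in partitions for pair in part]
    print(group_by_key(flat))  # key-list pairs
    print(aggregate_by_key(partitions, 0, lambda acc, v: acc + v, lambda x, y: x + y))  # per-key sums
```

The first call yields one list per key, while the second never materializes those lists and only keeps a running aggregate per key, which is the distinction the issue says Dataset's grouping API does not expose under consistent names.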