[jira] [Updated] (SPARK-27425) Add count_if functions
     [ https://issues.apache.org/jira/browse/SPARK-27425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chaerim Yeo updated SPARK-27425:
    Description:
Add aggregation function which returns the number of records satisfying a given condition.

For Presto, [{{count_if}}|https://prestodb.github.io/docs/current/functions/aggregate.html] function is supported, we can write concisely.

However, Spark does not support yet, we need to write like {{COUNT(CASE WHEN some_condition THEN 1 END)}} or {{SUM(CASE WHEN some_condition THEN 1 END)}}, which looks painful.

  was:
Add aggregation function which returns the number of records satisfying a given condition.

For Presto, [{{count_if}}|https://prestodb.github.io/docs/current/functions/aggregate.html] function is supported, we can write concisely.

However, Spark does not support yet, we need to write like {{count(case when some_condition then 1 end)}} or {{sum(case when some_condition then 1 end)}}, which looks painful.

> Add count_if functions
> ----------------------
>
>                 Key: SPARK-27425
>                 URL: https://issues.apache.org/jira/browse/SPARK-27425
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.1
>            Reporter: Chaerim Yeo
>            Priority: Minor
>
> Add aggregation function which returns the number of records satisfying a
> given condition.
> For Presto,
> [{{count_if}}|https://prestodb.github.io/docs/current/functions/aggregate.html]
> function is supported, we can write concisely.
> However, Spark does not support yet, we need to write like {{COUNT(CASE WHEN
> some_condition THEN 1 END)}} or {{SUM(CASE WHEN some_condition THEN 1 END)}},
> which looks painful.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
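The two workarounds named in the description can be compared side by side. The following is a minimal sketch, assuming a local Spark session is available; the `events` view, its `status` column, and the `CountIfWorkaround` object are hypothetical names introduced only for illustration and are not part of the issue. It also shows one difference between the two forms: when no row matches, `COUNT(CASE ...)` yields 0 while `SUM(CASE ...)` yields NULL.

```scala
import org.apache.spark.sql.{Row, SparkSession}

object CountIfWorkaround {
  // Runs both workarounds from the issue over a small, hypothetical table.
  def conditionalCounts(spark: SparkSession): Row = {
    import spark.implicits._

    // Hypothetical sample data: two rows with status 'ok', one with 'error'.
    Seq((1, "ok"), (2, "error"), (3, "ok"))
      .toDF("id", "status")
      .createOrReplaceTempView("events")

    spark.sql(
      """SELECT
        |  COUNT(CASE WHEN status = 'ok' THEN 1 END)      AS ok_count,
        |  SUM(CASE WHEN status = 'ok' THEN 1 END)        AS ok_sum,
        |  COUNT(CASE WHEN status = 'missing' THEN 1 END) AS none_count,
        |  SUM(CASE WHEN status = 'missing' THEN 1 END)   AS none_sum
        |FROM events""".stripMargin).head()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("count-if-workaround")
      .getOrCreate()
    try {
      val row = conditionalCounts(spark)
      // COUNT ignores the NULLs produced by non-matching rows: 2 and 0.
      println(s"ok_count=${row.getLong(0)}, none_count=${row.getLong(2)}")
      // SUM over only NULLs is NULL, so the no-match case must be checked.
      println(s"ok_sum=${row.getLong(1)}, none_sum is null: ${row.isNullAt(3)}")
    } finally spark.stop()
  }
}
```

Of the two, the `COUNT` form is the closer match to what a `count_if` aggregate is described as returning, since it never produces NULL for an empty match.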
[jira] [Updated] (SPARK-27425) Add count_if functions
     [ https://issues.apache.org/jira/browse/SPARK-27425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chaerim Yeo updated SPARK-27425:
    Description:
Add aggregation function which returns the number of records satisfying a given condition.

For Presto, [{{count_if}}|https://prestodb.github.io/docs/current/functions/aggregate.html] function is supported, we can write concisely.

However, Spark does not support yet, we need to write like {{count(case when some_condition then 1 end)}} or {{sum(case when some_condition then 1 end)}}, which looks painful.

  was:
Add aggregation function which returns the number of records satisfying a given condition.

For Presto, [{{count_if}}|https://prestodb.github.io/docs/current/functions/aggregate.html] function is supported, we can write concisely.

However, Spark does not support yet, we need to write like {{count(case when some_condition then 1)}} or {{sum(case when some_condition then 1 end)}}, which looks painful.

> Add count_if functions
> ----------------------
>
>                 Key: SPARK-27425
>                 URL: https://issues.apache.org/jira/browse/SPARK-27425
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.1
>            Reporter: Chaerim Yeo
>            Priority: Minor
>
> Add aggregation function which returns the number of records satisfying a
> given condition.
> For Presto,
> [{{count_if}}|https://prestodb.github.io/docs/current/functions/aggregate.html]
> function is supported, we can write concisely.
> However, Spark does not support yet, we need to write like {{count(case when
> some_condition then 1 end)}} or {{sum(case when some_condition then 1 end)}},
> which looks painful.
[jira] [Updated] (SPARK-27425) Add count_if functions
     [ https://issues.apache.org/jira/browse/SPARK-27425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chaerim Yeo updated SPARK-27425:
    Summary: Add count_if functions  (was: SQL count_if functions)

> Add count_if functions
> ----------------------
>
>                 Key: SPARK-27425
>                 URL: https://issues.apache.org/jira/browse/SPARK-27425
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.1
>            Reporter: Chaerim Yeo
>            Priority: Minor
>
> Add aggregation function which returns the number of records satisfying a
> given condition.
> For Presto,
> [{{count_if}}|https://prestodb.github.io/docs/current/functions/aggregate.html]
> function is supported, we can write concisely.
> However, Spark does not support yet, we need to write like {{count(case when
> some_condition then 1)}} or {{sum(case when some_condition then 1 end)}},
> which looks painful.
[jira] [Created] (SPARK-27425) SQL count_if functions
Chaerim Yeo created SPARK-27425:
-----------------------------------

             Summary: SQL count_if functions
                 Key: SPARK-27425
                 URL: https://issues.apache.org/jira/browse/SPARK-27425
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.4.1
            Reporter: Chaerim Yeo

Add aggregation function which returns the number of records satisfying a given condition.

For Presto, [{{count_if}}|https://prestodb.github.io/docs/current/functions/aggregate.html] function is supported, we can write concisely.

However, Spark does not support yet, we need to write like {{count(case when some_condition then 1)}} or {{sum(case when some_condition then 1 end)}}, which looks painful.
[jira] [Updated] (SPARK-25571) Add withColumnsRenamed method to Dataset
     [ https://issues.apache.org/jira/browse/SPARK-25571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chaerim Yeo updated SPARK-25571:
    External issue URL:   (was: https://github.com/apache/spark/pull/)
     External issue ID:   (was: 22591)

> Add withColumnsRenamed method to Dataset
> ----------------------------------------
>
>                 Key: SPARK-25571
>                 URL: https://issues.apache.org/jira/browse/SPARK-25571
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.2
>            Reporter: Chaerim Yeo
>            Priority: Major
>
> There are two general approaches to rename several columns.
>  * Using *withColumnRenamed* method
>  * Using *select* method
> {code}
> // Using withColumnRenamed
> ds.withColumnRenamed("first_name", "firstName")
>   .withColumnRenamed("last_name", "lastName")
>   .withColumnRenamed("postal_code", "postalCode")
> // Using select
> ds.select(
>   $"id",
>   $"first_name" as "firstName",
>   $"last_name" as "lastName",
>   $"address",
>   $"postal_code" as "postalCode"
> )
> {code}
> However, both approaches are still inefficient and redundant due to following
> limitations.
>  * withColumnRenamed: it is required to call method several times
>  * select: it is required to pass all columns to select method
> It is necessary to implement new method, such as *withColumnsRenamed*, which
> can rename many columns at once.
> {code}
> ds.withColumnsRenamed(
>   "first_name" -> "firstName",
>   "last_name" -> "lastName",
>   "postal_code" -> "postalCode"
> )
> // or
> ds.withColumnsRenamed(Map(
>   "first_name" -> "firstName",
>   "last_name" -> "lastName",
>   "postal_code" -> "postalCode"
> ))
> {code}
[jira] [Updated] (SPARK-25571) Add withColumnsRenamed method to Dataset
     [ https://issues.apache.org/jira/browse/SPARK-25571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chaerim Yeo updated SPARK-25571:
    External issue URL: https://github.com/apache/spark/pull/
     External issue ID: 22591

> Add withColumnsRenamed method to Dataset
> ----------------------------------------
>
>                 Key: SPARK-25571
>                 URL: https://issues.apache.org/jira/browse/SPARK-25571
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.2
>            Reporter: Chaerim Yeo
>            Priority: Major
>
> There are two general approaches to rename several columns.
>  * Using *withColumnRenamed* method
>  * Using *select* method
> {code}
> // Using withColumnRenamed
> ds.withColumnRenamed("first_name", "firstName")
>   .withColumnRenamed("last_name", "lastName")
>   .withColumnRenamed("postal_code", "postalCode")
> // Using select
> ds.select(
>   $"id",
>   $"first_name" as "firstName",
>   $"last_name" as "lastName",
>   $"address",
>   $"postal_code" as "postalCode"
> )
> {code}
> However, both approaches are still inefficient and redundant due to following
> limitations.
>  * withColumnRenamed: it is required to call method several times
>  * select: it is required to pass all columns to select method
> It is necessary to implement new method, such as *withColumnsRenamed*, which
> can rename many columns at once.
> {code}
> ds.withColumnsRenamed(
>   "first_name" -> "firstName",
>   "last_name" -> "lastName",
>   "postal_code" -> "postalCode"
> )
> // or
> ds.withColumnsRenamed(Map(
>   "first_name" -> "firstName",
>   "last_name" -> "lastName",
>   "postal_code" -> "postalCode"
> ))
> {code}
[jira] [Commented] (SPARK-25571) Add withColumnsRenamed method to Dataset
    [ https://issues.apache.org/jira/browse/SPARK-25571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632777#comment-16632777 ]

Chaerim Yeo commented on SPARK-25571:
-------------------------------------

I'm working on it now.

> Add withColumnsRenamed method to Dataset
> ----------------------------------------
>
>                 Key: SPARK-25571
>                 URL: https://issues.apache.org/jira/browse/SPARK-25571
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.2
>            Reporter: Chaerim Yeo
>            Priority: Major
>
> There are two general approaches to rename several columns.
>  * Using *withColumnRenamed* method
>  * Using *select* method
> {code}
> // Using withColumnRenamed
> ds.withColumnRenamed("first_name", "firstName")
>   .withColumnRenamed("last_name", "lastName")
>   .withColumnRenamed("postal_code", "postalCode")
> // Using select
> ds.select(
>   $"id",
>   $"first_name" as "firstName",
>   $"last_name" as "lastName",
>   $"address",
>   $"postal_code" as "postalCode"
> )
> {code}
> However, both approaches are still inefficient and redundant due to following
> limitations.
>  * withColumnRenamed: it is required to call method several times
>  * select: it is required to pass all columns to select method
> It is necessary to implement new method, such as *withColumnsRenamed*, which
> can rename many columns at once.
> {code}
> ds.withColumnsRenamed(
>   "first_name" -> "firstName",
>   "last_name" -> "lastName",
>   "postal_code" -> "postalCode"
> )
> // or
> ds.withColumnsRenamed(Map(
>   "first_name" -> "firstName",
>   "last_name" -> "lastName",
>   "postal_code" -> "postalCode"
> ))
> {code}
[jira] [Created] (SPARK-25571) Add withColumnsRenamed method to Dataset
Chaerim Yeo created SPARK-25571:
-----------------------------------

             Summary: Add withColumnsRenamed method to Dataset
                 Key: SPARK-25571
                 URL: https://issues.apache.org/jira/browse/SPARK-25571
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.3.2
            Reporter: Chaerim Yeo

There are two general approaches to rename several columns.
 * Using *withColumnRenamed* method
 * Using *select* method

{code}
// Using withColumnRenamed
ds.withColumnRenamed("first_name", "firstName")
  .withColumnRenamed("last_name", "lastName")
  .withColumnRenamed("postal_code", "postalCode")

// Using select
ds.select(
  $"id",
  $"first_name" as "firstName",
  $"last_name" as "lastName",
  $"address",
  $"postal_code" as "postalCode"
)
{code}

However, both approaches are still inefficient and redundant due to following limitations.
 * withColumnRenamed: it is required to call method several times
 * select: it is required to pass all columns to select method

It is necessary to implement new method, such as *withColumnsRenamed*, which can rename many columns at once.

{code}
ds.withColumnsRenamed(
  "first_name" -> "firstName",
  "last_name" -> "lastName",
  "postal_code" -> "postalCode"
)
// or
ds.withColumnsRenamed(Map(
  "first_name" -> "firstName",
  "last_name" -> "lastName",
  "postal_code" -> "postalCode"
))
{code}
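For the Spark version the issue targets, the call-site repetition described above can be approximated in user code. The following is a minimal sketch, not the implementation proposed in the issue: `RenameColumns`, `renameAll`, and `sample` are hypothetical names introduced for illustration, and the helper simply folds the existing `withColumnRenamed` over the requested pairs, so it still performs one rename step per pair internally.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object RenameColumns {
  // Hypothetical helper (not a Spark API): applies each (old -> new) pair in
  // order by chaining the existing withColumnRenamed method.
  def renameAll(df: DataFrame, renames: (String, String)*): DataFrame =
    renames.foldLeft(df) { case (acc, (from, to)) =>
      acc.withColumnRenamed(from, to)
    }

  // Hypothetical one-row dataset with the column names used in the issue.
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq((1, "Ada", "Lovelace", "12 Example St", "00000"))
      .toDF("id", "first_name", "last_name", "address", "postal_code")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("rename-columns-sketch")
      .getOrCreate()
    try {
      val renamed = renameAll(
        sample(spark),
        "first_name" -> "firstName",
        "last_name" -> "lastName",
        "postal_code" -> "postalCode"
      )
      // Columns not listed in the pairs (id, address) keep their names.
      println(renamed.columns.mkString(", "))
    } finally spark.stop()
  }
}
```

The helper takes the pairs as varargs so the call reads like the first form suggested in the issue; accepting a `Map` instead would mirror the second form, at the cost of leaving the application order unspecified.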