[jira] [Updated] (SPARK-27425) Add count_if functions

2019-04-10 Thread Chaerim Yeo (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaerim Yeo updated SPARK-27425:

Description: 
Add an aggregate function that returns the number of records satisfying a 
given condition.

Presto supports the 
[{{count_if}}|https://prestodb.github.io/docs/current/functions/aggregate.html] 
function, so such queries can be written concisely.

However, Spark does not support it yet, so we need to write 
{{COUNT(CASE WHEN some_condition THEN 1 END)}} or 
{{SUM(CASE WHEN some_condition THEN 1 END)}}, which is cumbersome.

  was:
Add an aggregate function that returns the number of records satisfying a 
given condition.

Presto supports the 
[{{count_if}}|https://prestodb.github.io/docs/current/functions/aggregate.html] 
function, so such queries can be written concisely.

However, Spark does not support it yet, so we need to write 
{{count(case when some_condition then 1 end)}} or 
{{sum(case when some_condition then 1 end)}}, which is cumbersome.


> Add count_if functions
> --
>
> Key: SPARK-27425
> URL: https://issues.apache.org/jira/browse/SPARK-27425
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.4.1
> Reporter: Chaerim Yeo
> Priority: Minor
>
> Add an aggregate function that returns the number of records satisfying a 
> given condition.
> Presto supports the 
> [{{count_if}}|https://prestodb.github.io/docs/current/functions/aggregate.html] 
> function, so such queries can be written concisely.
> However, Spark does not support it yet, so we need to write 
> {{COUNT(CASE WHEN some_condition THEN 1 END)}} or 
> {{SUM(CASE WHEN some_condition THEN 1 END)}}, which is cumbersome.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27425) Add count_if functions

2019-04-10 Thread Chaerim Yeo (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaerim Yeo updated SPARK-27425:

Description: 
Add an aggregate function that returns the number of records satisfying a 
given condition.

Presto supports the 
[{{count_if}}|https://prestodb.github.io/docs/current/functions/aggregate.html] 
function, so such queries can be written concisely.

However, Spark does not support it yet, so we need to write 
{{count(case when some_condition then 1 end)}} or 
{{sum(case when some_condition then 1 end)}}, which is cumbersome.

  was:
Add an aggregate function that returns the number of records satisfying a 
given condition.

Presto supports the 
[{{count_if}}|https://prestodb.github.io/docs/current/functions/aggregate.html] 
function, so such queries can be written concisely.

However, Spark does not support it yet, so we need to write 
{{count(case when some_condition then 1)}} or 
{{sum(case when some_condition then 1 end)}}, which is cumbersome.


> Add count_if functions
> --
>
> Key: SPARK-27425
> URL: https://issues.apache.org/jira/browse/SPARK-27425
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.4.1
> Reporter: Chaerim Yeo
> Priority: Minor
>
> Add an aggregate function that returns the number of records satisfying a 
> given condition.
> Presto supports the 
> [{{count_if}}|https://prestodb.github.io/docs/current/functions/aggregate.html] 
> function, so such queries can be written concisely.
> However, Spark does not support it yet, so we need to write 
> {{count(case when some_condition then 1 end)}} or 
> {{sum(case when some_condition then 1 end)}}, which is cumbersome.






[jira] [Updated] (SPARK-27425) Add count_if functions

2019-04-10 Thread Chaerim Yeo (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaerim Yeo updated SPARK-27425:

Summary: Add count_if functions  (was: SQL count_if functions)

> Add count_if functions
> --
>
> Key: SPARK-27425
> URL: https://issues.apache.org/jira/browse/SPARK-27425
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.4.1
> Reporter: Chaerim Yeo
> Priority: Minor
>
> Add an aggregate function that returns the number of records satisfying a 
> given condition.
> Presto supports the 
> [{{count_if}}|https://prestodb.github.io/docs/current/functions/aggregate.html] 
> function, so such queries can be written concisely.
> However, Spark does not support it yet, so we need to write 
> {{count(case when some_condition then 1)}} or 
> {{sum(case when some_condition then 1 end)}}, which is cumbersome.






[jira] [Created] (SPARK-27425) SQL count_if functions

2019-04-10 Thread Chaerim Yeo (JIRA)
Chaerim Yeo created SPARK-27425:
---

 Summary: SQL count_if functions
 Key: SPARK-27425
 URL: https://issues.apache.org/jira/browse/SPARK-27425
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.1
Reporter: Chaerim Yeo


Add an aggregate function that returns the number of records satisfying a 
given condition.

Presto supports the 
[{{count_if}}|https://prestodb.github.io/docs/current/functions/aggregate.html] 
function, so such queries can be written concisely.

However, Spark does not support it yet, so we need to write 
{{count(case when some_condition then 1)}} or 
{{sum(case when some_condition then 1 end)}}, which is cumbersome.
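
As a rough illustration of the semantics involved, the following plain-Scala sketch models the two workarounds and the requested function over an in-memory sequence. It does not use Spark: the names CountIfSketch, countCaseWhen, sumCaseWhen and countIf are hypothetical, and None stands in for SQL NULL. It assumes standard SQL aggregate behavior (COUNT ignores NULLs, and SUM over no non-NULL inputs yields NULL) and assumes that count_if would follow the COUNT form, as the Presto documentation describes it.

```scala
object CountIfSketch {
  // Models COUNT(CASE WHEN cond THEN 1 END): the CASE yields 1 or NULL,
  // and COUNT counts only the non-NULL results.
  def countCaseWhen[A](rows: Seq[A])(cond: A => Boolean): Long =
    rows.map(r => if (cond(r)) Some(1) else None).count(_.isDefined).toLong

  // Models SUM(CASE WHEN cond THEN 1 END): sums the non-NULL results,
  // and yields NULL (None here) when no row matches.
  def sumCaseWhen[A](rows: Seq[A])(cond: A => Boolean): Option[Long] = {
    val hits = rows.flatMap(r => if (cond(r)) Some(1L) else None)
    if (hits.isEmpty) None else Some(hits.sum)
  }

  // Models the requested count_if(cond): the number of rows for which
  // the condition is true (assumed equivalent to the COUNT form).
  def countIf[A](rows: Seq[A])(cond: A => Boolean): Long =
    rows.count(cond).toLong
}
```

Under these assumptions the two workarounds differ when no row matches: the COUNT form gives 0, while the SUM form gives NULL, so they are not fully interchangeable.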






[jira] [Updated] (SPARK-25571) Add withColumnsRenamed method to Dataset

2018-09-29 Thread Chaerim Yeo (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaerim Yeo updated SPARK-25571:

External issue URL:   (was: https://github.com/apache/spark/pull/)
 External issue ID:   (was: 22591)

> Add withColumnsRenamed method to Dataset
> 
>
> Key: SPARK-25571
> URL: https://issues.apache.org/jira/browse/SPARK-25571
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.3.2
> Reporter: Chaerim Yeo
> Priority: Major
>
> There are two general approaches to renaming several columns.
>  * Using the *withColumnRenamed* method
>  * Using the *select* method
> {code}
> // Using withColumnRenamed
> ds.withColumnRenamed("first_name", "firstName")
>   .withColumnRenamed("last_name", "lastName")
>   .withColumnRenamed("postal_code", "postalCode")
> // Using select
> ds.select(
>   $"id",
>   $"first_name" as "firstName",
>   $"last_name" as "lastName",
>   $"address",
>   $"postal_code" as "postalCode"
> )
> {code}
> However, both approaches are still inefficient and redundant due to the 
> following limitations.
>  * withColumnRenamed: the method must be called once for each renamed column
>  * select: every column must be passed, including those that are not renamed
> A new method, such as *withColumnsRenamed*, that can rename many columns at 
> once would address both limitations.
> {code}
> ds.withColumnsRenamed(
>   "first_name" -> "firstName",
>   "last_name" -> "lastName",
>   "postal_code" -> "postalCode"
> )
> // or
> ds.withColumnsRenamed(Map(
>   "first_name" -> "firstName",
>   "last_name" -> "lastName",
>   "postal_code" -> "postalCode"
> ))
> {code}
>  






[jira] [Updated] (SPARK-25571) Add withColumnsRenamed method to Dataset

2018-09-29 Thread Chaerim Yeo (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaerim Yeo updated SPARK-25571:

External issue URL: https://github.com/apache/spark/pull/
 External issue ID: 22591







[jira] [Commented] (SPARK-25571) Add withColumnsRenamed method to Dataset

2018-09-28 Thread Chaerim Yeo (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632777#comment-16632777
 ] 

Chaerim Yeo commented on SPARK-25571:
-

I'm working on it now.







[jira] [Created] (SPARK-25571) Add withColumnsRenamed method to Dataset

2018-09-28 Thread Chaerim Yeo (JIRA)
Chaerim Yeo created SPARK-25571:
---

 Summary: Add withColumnsRenamed method to Dataset
 Key: SPARK-25571
 URL: https://issues.apache.org/jira/browse/SPARK-25571
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.3.2
Reporter: Chaerim Yeo


There are two general approaches to renaming several columns.
 * Using the *withColumnRenamed* method
 * Using the *select* method

{code}
// Using withColumnRenamed
ds.withColumnRenamed("first_name", "firstName")
  .withColumnRenamed("last_name", "lastName")
  .withColumnRenamed("postal_code", "postalCode")

// Using select
ds.select(
  $"id",
  $"first_name" as "firstName",
  $"last_name" as "lastName",
  $"address",
  $"postal_code" as "postalCode"
)
{code}
However, both approaches are still inefficient and redundant due to the 
following limitations.
 * withColumnRenamed: the method must be called once for each renamed column
 * select: every column must be passed, including those that are not renamed

A new method, such as *withColumnsRenamed*, that can rename many columns at 
once would address both limitations.
{code}
ds.withColumnsRenamed(
  "first_name" -> "firstName",
  "last_name" -> "lastName",
  "postal_code" -> "postalCode"
)
// or
ds.withColumnsRenamed(Map(
  "first_name" -> "firstName",
  "last_name" -> "lastName",
  "postal_code" -> "postalCode"
))
{code}
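
The renaming logic behind such a method can be sketched in plain Scala, without Spark on the classpath. RenameSketch and renamedColumns are hypothetical names, not part of the Dataset API: the helper maps an existing column list through a rename Map in one pass and leaves unlisted columns untouched. With a real Dataset, the resulting names could presumably be applied with {{ds.toDF(newNames: _*)}}; that wiring is an assumption for illustration, not the proposed implementation.

```scala
object RenameSketch {
  // Hypothetical helper: returns the column names after applying `renames`.
  // Columns without an entry in `renames` keep their current name, so the
  // caller lists only the columns that actually change.
  def renamedColumns(columns: Seq[String], renames: Map[String, String]): Seq[String] =
    columns.map(name => renames.getOrElse(name, name))
}
```

Applied to the columns from the example above (id, first_name, last_name, address, postal_code) with the three renames, this yields id, firstName, lastName, address, postalCode, keeping the original column order.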
 


