[jira] [Commented] (SPARK-23899) Built-in SQL Function Improvement

2018-12-02 Thread Arseniy Tashoyan (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16706156#comment-16706156
 ] 

Arseniy Tashoyan commented on SPARK-23899:
--

What do you think about this one: SPARK-23693?

> Built-in SQL Function Improvement
> -
>
> Key: SPARK-23899
> URL: https://issues.apache.org/jira/browse/SPARK-23899
> Project: Spark
>  Issue Type: Umbrella
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Xiao Li
>Priority: Major
> Fix For: 2.4.0
>
>
> This umbrella JIRA is to improve compatibility with the other data processing 
> systems, including Hive, Teradata, Presto, Postgres, MySQL, DB2, Oracle, and 
> MS SQL Server.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23899) Built-in SQL Function Improvement

2018-09-19 Thread Georg Heiler (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16620230#comment-16620230
 ] 

Georg Heiler commented on SPARK-23899:
--

What about repartitioning based on properties of complex types, e.g. the size of an array column?
[https://stackoverflow.com/questions/46240688/how-to-equally-partition-array-data-in-spark-dataframe]

Assume the number of records n per data frame is roughly constant, while the number of array elements m per record determines the real computational cost. A regular repartition only ensures that each partition receives roughly the same number of records; it does not take the size of the arrays into account.

Ideally, I would want to make sure that arrays with many elements in particular do not end up in the same partition, in order to prevent data skew.
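The skew concern above can be made concrete with a small plain-Python sketch. It is not a Spark API: given the array size of each record, it assigns records to partitions with the greedy longest-processing-time heuristic (largest arrays first, each placed on the currently lightest partition), so large arrays tend to land in different partitions. The names `assign_partitions` and `partition_loads` are hypothetical, and applying the resulting ids in Spark (for example through a custom partitioner) is not shown.

```python
import heapq
from typing import List, Sequence


def assign_partitions(array_sizes: Sequence[int], num_partitions: int) -> List[int]:
    """Hypothetical helper: greedy size-aware assignment of records to partitions.

    Records are visited in order of decreasing array size, and each one is
    placed on the partition with the smallest total number of elements so far.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # Min-heap of (total elements assigned so far, partition id).
    heap = [(0, pid) for pid in range(num_partitions)]
    heapq.heapify(heap)
    assignment = [0] * len(array_sizes)
    # Largest arrays first, so they are spread out before small ones fill gaps.
    order = sorted(range(len(array_sizes)), key=lambda i: array_sizes[i], reverse=True)
    for idx in order:
        load, pid = heapq.heappop(heap)
        assignment[idx] = pid
        heapq.heappush(heap, (load + array_sizes[idx], pid))
    return assignment


def partition_loads(array_sizes: Sequence[int], assignment: Sequence[int],
                    num_partitions: int) -> List[int]:
    """Hypothetical helper: total number of array elements per partition."""
    loads = [0] * num_partitions
    for size, pid in zip(array_sizes, assignment):
        loads[pid] += size
    return loads


sizes = [100, 90, 5, 5, 5, 5]
parts = assign_partitions(sizes, 2)
print(parts, partition_loads(sizes, parts, 2))
```

With sizes `[100, 90, 5, 5, 5, 5]` and two partitions, the two large arrays end up in different partitions and each partition carries 105 elements; a repartition based only on record counts gives no such guarantee.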




[jira] [Commented] (SPARK-23899) Built-in SQL Function Improvement

2018-09-10 Thread Wenchen Fan (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609258#comment-16609258
 ] 

Wenchen Fan commented on SPARK-23899:
-

I'm resolving this, since only one subtask remains unfinished and it is minor
relative to the overall effort.




[jira] [Commented] (SPARK-23899) Built-in SQL Function Improvement

2018-05-22 Thread Alex Vayda (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484477#comment-16484477
 ] 

Alex Vayda commented on SPARK-23899:


What do you think about adding another set of convenience functions for
working with multi-dimensional arrays, e.g. matrix operations like
{{transpose}}, {{multiply}} and others?
Something similar to {{ml.linalg.Matrix}}.
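To illustrate the kind of semantics such functions might have, here is a plain-Python sketch that treats a two-dimensional array (roughly how an {{ARRAY<ARRAY<DOUBLE>>}} value could be viewed) as a list of rows. The names `transpose` and `multiply` follow the comment but are used here for illustration only and are not claimed to be Spark SQL built-ins; rejecting ragged arrays with an error is an assumed design choice.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def _check_rectangular(m: Sequence[Sequence[float]], name: str) -> None:
    """Raise if the rows of m do not all have the same length."""
    if m and any(len(row) != len(m[0]) for row in m):
        raise ValueError(f"{name} is ragged: all rows must have the same length")


def transpose(m: Sequence[Sequence[float]]) -> Matrix:
    """Hypothetical transpose: element [i][j] of the input becomes [j][i]."""
    _check_rectangular(m, "m")
    if not m:
        return []
    return [[row[j] for row in m] for j in range(len(m[0]))]


def multiply(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> Matrix:
    """Hypothetical matrix product of a (r x k) and b (k x c), giving r x c."""
    _check_rectangular(a, "a")
    _check_rectangular(b, "b")
    inner = len(a[0]) if a else 0
    if inner != len(b):
        raise ValueError("inner dimensions do not match")
    bt = transpose(b)  # columns of b as rows, so each entry is a row-by-row dot product
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


print(transpose([[1, 2, 3], [4, 5, 6]]))
print(multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Rejecting ragged input is only one option; a SQL function might instead return NULL for such rows, which is a semantic decision a proposal like this would need to settle.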
