[jira] [Commented] (SPARK-23899) Built-in SQL Function Improvement
[ https://issues.apache.org/jira/browse/SPARK-23899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16706156#comment-16706156 ]

Arseniy Tashoyan commented on SPARK-23899:
------------------------------------------

What do you think about this one: SPARK-23693?

> Built-in SQL Function Improvement
> ---------------------------------
>
> Key: SPARK-23899
> URL: https://issues.apache.org/jira/browse/SPARK-23899
> Project: Spark
> Issue Type: Umbrella
> Components: SQL
> Affects Versions: 2.3.0
> Reporter: Xiao Li
> Priority: Major
> Fix For: 2.4.0
>
>
> This umbrella JIRA is to improve compatibility with the other data processing
> systems, including Hive, Teradata, Presto, Postgres, MySQL, DB2, Oracle, and
> MS SQL Server.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23899) Built-in SQL Function Improvement
[ https://issues.apache.org/jira/browse/SPARK-23899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16620230#comment-16620230 ]

Georg Heiler commented on SPARK-23899:
--------------------------------------

What about repartitioning by complex types, e.g. by the size of an array? [https://stackoverflow.com/questions/46240688/how-to-equally-partition-array-data-in-spark-dataframe]

Assume the number of records n is roughly constant, but the m observations inside each record's array define the real computational cost. A regular repartition then only ensures that each partition receives roughly the same number of records; it does not take the size of the arrays into account. Ideally, I would want to make sure that arrays with many elements in particular do not end up in the same partition, in order to prevent data skew.
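The size-aware balancing asked for in this comment can be sketched outside Spark. The plain-Python example below is not Spark code and uses no Spark API; the names `balance_by_array_size` and `partition_loads` are hypothetical. It assumes the per-record array lengths are already known (in Spark SQL the built-in `size` function returns an array's length) and applies the greedy longest-processing-time heuristic: records are visited from the largest array to the smallest, and each one goes to the partition that currently holds the fewest elements.

```python
import heapq
from typing import List, Sequence


def balance_by_array_size(arrays: Sequence[Sequence[object]], num_partitions: int) -> List[List[int]]:
    """Assign record indices to partitions so that the total number of array
    elements per partition is roughly equal (greedy LPT heuristic)."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be >= 1")
    # Min-heap of (elements assigned so far, partition id).
    heap = [(0, p) for p in range(num_partitions)]
    heapq.heapify(heap)
    partitions: List[List[int]] = [[] for _ in range(num_partitions)]
    # Visit the largest arrays first so that they are spread across partitions.
    order = sorted(range(len(arrays)), key=lambda i: len(arrays[i]), reverse=True)
    for i in order:
        load, p = heapq.heappop(heap)  # currently least-loaded partition
        partitions[p].append(i)
        heapq.heappush(heap, (load + len(arrays[i]), p))
    return partitions


def partition_loads(arrays: Sequence[Sequence[object]], partitions: List[List[int]]) -> List[int]:
    """Total number of array elements in each partition."""
    return [sum(len(arrays[i]) for i in part) for part in partitions]
```

With array lengths 100, 90, 5, 5, 5 and 5 and two partitions, the two large arrays land in different partitions and each partition ends up with 105 elements, whereas a split by record count alone only guarantees three records per partition, regardless of their sizes.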
[jira] [Commented] (SPARK-23899) Built-in SQL Function Improvement
[ https://issues.apache.org/jira/browse/SPARK-23899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609258#comment-16609258 ]

Wenchen Fan commented on SPARK-23899:
-------------------------------------

I'm resolving it, since only one subtask remains unfinished, and it is minor relative to the umbrella as a whole.
[jira] [Commented] (SPARK-23899) Built-in SQL Function Improvement
[ https://issues.apache.org/jira/browse/SPARK-23899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484477#comment-16484477 ]

Alex Vayda commented on SPARK-23899:
-----------------------------------

What do you guys think about adding another set of convenience functions for working with multi-dimensional arrays? E.g. matrix operations like {{transpose}}, {{multiply}} and others? Something similar to {{ml.linalg.Matrix}}.
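To make this proposal concrete, here is a plain-Python sketch of the semantics that hypothetical `transpose` and `multiply` functions might have on arrays of arrays. It is not a Spark API and does not describe any existing built-in; it assumes a row-major layout (each inner array is one row) and rejects ragged input, since an array of arrays in SQL is not required to be rectangular.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def _check_rectangular(m: Sequence[Sequence[float]]) -> None:
    """Raise if the rows of m do not all have the same length."""
    if m and any(len(row) != len(m[0]) for row in m):
        raise ValueError("ragged array: all rows must have the same length")


def transpose(m: Sequence[Sequence[float]]) -> Matrix:
    """Transpose a rectangular matrix stored row-major as an array of arrays."""
    _check_rectangular(m)
    if not m:
        return []
    return [[m[r][c] for r in range(len(m))] for c in range(len(m[0]))]


def multiply(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> Matrix:
    """Matrix product a x b, where a is n x k and b is k x p."""
    _check_rectangular(a)
    _check_rectangular(b)
    inner = len(a[0]) if a else 0
    if inner != len(b):
        raise ValueError("inner dimensions do not match")
    cols_of_b = transpose(b)
    # Each output cell is the dot product of a row of a with a column of b.
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_of_b] for row in a]
```

For example, `multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `[[19, 22], [43, 50]]`. How null elements inside the arrays should be treated is left open in this sketch.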