[jira] [Commented] (SPARK-19653) `Vector` Type Should Be A First-Class Citizen In Spark SQL

2019-08-13 Thread Tanmay Binaykiya (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-19653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906751#comment-16906751
 ] 

Tanmay Binaykiya commented on SPARK-19653:
--

What is the status of this? 

> `Vector` Type Should Be A First-Class Citizen In Spark SQL
> --
>
> Key: SPARK-19653
> URL: https://issues.apache.org/jira/browse/SPARK-19653
> Project: Spark
> Issue Type: Improvement
> Components: ML, MLlib, SQL
> Affects Versions: 2.1.0, 2.2.0
> Reporter: Mike Dusenberry
> Priority: Major
> Labels: bulk-closed
>
> *Issue*: The {{Vector}} type in Spark MLlib (DataFrame-based API, informally 
> "Spark ML") should be added as a first-class citizen to Spark SQL.
> *Current Status*:  Currently, Spark MLlib adds a [{{Vector}} SQL datatype | 
> https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.linalg.SQLDataTypes$]
>  to allow DataFrames/DataSets to use {{Vector}} columns, which is necessary 
> for MLlib algorithms.  Although this allows a DataFrame/DataSet to contain 
> vectors, it does not allow one to make complete use of the rich set of 
> features made available by Spark SQL.  For example, it is not possible to use 
> any of the SQL functions, such as {{avg}}, {{sum}}, etc. on a {{Vector}} 
> column, nor is it possible to save a DataFrame with a {{Vector}} column as a 
> CSV file.  In any of these cases, an error message is returned with a note 
> that the operator is not supported on a {{Vector}} type.
> *Benefit*: Allow users to make use of all Spark SQL features that can be 
> reasonably applied to a vector.
> *Goal*:  Move the {{Vector}} type from Spark MLlib into Spark SQL as a 
> first-class citizen.
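
To make the request above concrete: the issue does not say what {{avg}} over a {{Vector}} column should return. One plausible reading, assumed here, is an element-wise mean over equal-length dense vectors. The plain-Python sketch below (no Spark involved; the function name is hypothetical) shows only that intended semantics, not how Spark SQL would implement it.

```python
from typing import List, Sequence


def elementwise_avg(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average equal-length dense vectors position by position.

    Assumption: this is one possible meaning of avg() on a vector column;
    the issue itself does not define it.
    """
    if not vectors:
        raise ValueError("cannot average an empty collection of vectors")
    size = len(vectors[0])
    if any(len(v) != size for v in vectors):
        raise ValueError("all vectors must have the same length")
    count = len(vectors)
    # For each position i, sum that component across all rows and divide.
    return [sum(v[i] for v in vectors) / count for i in range(size)]


rows = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(elementwise_avg(rows))  # prints [3.0, 4.0]
```

Sparse vectors and vectors of differing length would need their own rules, which is part of what a first-class type would have to settle.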



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19653) `Vector` Type Should Be A First-Class Citizen In Spark SQL

2017-03-13 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15923307#comment-15923307
 ] 

Joseph K. Bradley commented on SPARK-19653:
---

I agree it'd be nice to make it easier to work with linalg types in DataFrames.
There are 2 paths:
1. Make linalg types (at least Vector, ideally Matrix) into first-class 
citizens of Spark SQL.
2. Improve support for UDTs so that linalg types can stay in MLlib yet still be 
easy to work with in DataFrames.

For the purpose of ML, I'm OK with either.  For the purpose of making Spark 
more useful and powerful in general, one could argue that path 2 is the better 
choice, although it might be harder to design and implement.
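
As a rough illustration of what path 2 builds on: a user-defined type (UDT) keeps the domain class in the library and gives the SQL engine only a record of primitive fields, plus functions to convert in both directions. The plain-Python sketch below is a simplified model under stated assumptions. The class names, the record layout, and the tag values are illustrative, loosely inspired by how MLlib stores vectors, and are not Spark's actual API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass(frozen=True)
class DenseVec:
    """Hypothetical library-side dense vector."""
    values: Tuple[float, ...]


@dataclass(frozen=True)
class SparseVec:
    """Hypothetical library-side sparse vector."""
    size: int
    indices: Tuple[int, ...]
    values: Tuple[float, ...]


Vec = Union[DenseVec, SparseVec]
# Storage record built only from primitive fields: (kind, size, indices, values).
Record = Tuple[int, Optional[int], Optional[List[int]], List[float]]

SPARSE, DENSE = 0, 1  # illustrative tag values (assumption)


def serialize(vec: Vec) -> Record:
    """Turn a library-side vector into a record of primitive fields."""
    if isinstance(vec, DenseVec):
        return (DENSE, None, None, list(vec.values))
    return (SPARSE, vec.size, list(vec.indices), list(vec.values))


def deserialize(record: Record) -> Vec:
    """Rebuild the library-side vector from its storage record."""
    kind, size, indices, values = record
    if kind == DENSE:
        return DenseVec(tuple(values))
    if kind == SPARSE:
        if size is None or indices is None:
            raise ValueError("sparse record needs size and indices")
        return SparseVec(size, tuple(indices), tuple(values))
    raise ValueError(f"unknown vector kind: {kind}")


v = SparseVec(4, (0, 3), (1.5, 2.0))
assert deserialize(serialize(v)) == v  # round trip preserves the vector
```

In this model the engine can store and move the record, but anything that depends on its meaning as a vector still lives in the library. Closing that gap is what either path would have to do.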




[jira] [Commented] (SPARK-19653) `Vector` Type Should Be A First-Class Citizen In Spark SQL

2017-02-18 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15873103#comment-15873103
 ] 

Sean Owen commented on SPARK-19653:
---

Related to https://issues.apache.org/jira/browse/SPARK-19217




[jira] [Commented] (SPARK-19653) `Vector` Type Should Be A First-Class Citizen In Spark SQL

2017-02-17 Thread Liang-Chi Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872938#comment-15872938
 ] 

Liang-Chi Hsieh commented on SPARK-19653:
-

Actually, some Spark SQL functions, such as the {{avg}} and {{sum}} mentioned 
in the description, only support {{NumericType}}. Their lack of support for 
{{Vector}} is not solely because the {{Vector}} type isn't a first-class 
citizen in Spark SQL.
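
A toy model in plain Python (not Spark code; all names are hypothetical) may help show the distinction. An aggregate that accepts only numeric cells rejects a built-in list just as it rejects a library-defined vector, so where the type is defined is not what causes the rejection.

```python
from numbers import Real


class DenseVec:
    """Hypothetical stand-in for a library-defined vector type."""

    def __init__(self, values):
        self.values = list(values)


def numeric_sum(column):
    """Sum a column, accepting only real-number cells (a toy numeric-type check)."""
    total = 0.0
    for cell in column:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(cell, bool) or not isinstance(cell, Real):
            raise TypeError(
                f"sum requires a numeric input, got {type(cell).__name__}"
            )
        total += cell
    return total


def rejected(column):
    """Return True if numeric_sum refuses the column."""
    try:
        numeric_sum(column)
    except TypeError:
        return True
    return False


print(numeric_sum([1, 2.5, 3]))            # plain numbers are accepted
print(rejected([[1.0, 2.0]]))              # built-in list cell: rejected
print(rejected([DenseVec([1.0, 2.0])]))    # library-defined vector cell: rejected
```

Under this model, supporting vectors in {{sum}} or {{avg}} means widening what the function accepts and defining its result, regardless of which module owns the type.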




[jira] [Commented] (SPARK-19653) `Vector` Type Should Be A First-Class Citizen In Spark SQL

2017-02-17 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872901#comment-15872901
 ] 

Kazuaki Ishizaki commented on SPARK-19653:
--

cc: [~cloud_fan]




[jira] [Commented] (SPARK-19653) `Vector` Type Should Be A First-Class Citizen In Spark SQL

2017-02-17 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872847#comment-15872847
 ] 

Mike Dusenberry commented on SPARK-19653:
-

cc [~sethah]




[jira] [Commented] (SPARK-19653) `Vector` Type Should Be A First-Class Citizen In Spark SQL

2017-02-17 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872819#comment-15872819
 ] 

Xiao Li commented on SPARK-19653:
-

cc [~mengxr] [~josephkb]




[jira] [Commented] (SPARK-19653) `Vector` Type Should Be A First-Class Citizen In Spark SQL

2017-02-17 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872810#comment-15872810
 ] 

Mike Dusenberry commented on SPARK-19653:
-

cc [~mlnick], [~smilegator]
