[ 
https://issues.apache.org/jira/browse/SPARK-31400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17089317#comment-17089317
 ] 

JinxinTang commented on SPARK-31400:
------------------------------------

Thanks for reporting this issue. A fix is in progress:

[PR|https://github.com/apache/spark/pull/28291]

> The catalogString doesn't distinguish Vectors in ml and mllib
> -------------------------------------------------------------
>
>                 Key: SPARK-31400
>                 URL: https://issues.apache.org/jira/browse/SPARK-31400
>             Project: Spark
>          Issue Type: Bug
>          Components: ML, MLlib
>    Affects Versions: 2.4.5
>         Environment: Ubuntu 16.04
>            Reporter: Junpei Zhou
>            Priority: Major
>
> h2. Bug Description
> The `catalogString` is not detailed enough to distinguish 
> pyspark.ml.linalg.Vectors from pyspark.mllib.linalg.Vectors.
> h2. How to reproduce the bug
> [Here|https://spark.apache.org/docs/latest/ml-features#minmaxscaler] is an 
> example from the official documentation (Python code). Keep all other lines 
> untouched and modify only the Vectors import line, as follows:
> {code:python}
> # from pyspark.ml.linalg import Vectors
> from pyspark.mllib.linalg import Vectors
> {code}
> Or you can directly execute the following code snippet:
> {code:python}
> # Assumes an active SparkSession bound to `spark` (as in the pyspark shell).
> from pyspark.ml.feature import MinMaxScaler
> # from pyspark.ml.linalg import Vectors
> from pyspark.mllib.linalg import Vectors
> dataFrame = spark.createDataFrame([
>     (0, Vectors.dense([1.0, 0.1, -1.0]),),
>     (1, Vectors.dense([2.0, 1.1, 1.0]),),
>     (2, Vectors.dense([3.0, 10.1, 3.0]),)
> ], ["id", "features"])
> scaler = MinMaxScaler(inputCol="features", outputCol="scaledFeatures")
> scalerModel = scaler.fit(dataFrame)
> {code}
> It will raise an error:
> {code:java}
> IllegalArgumentException: 'requirement failed: Column features must be of 
> type struct<type:tinyint,size:int,indices:array<int>,values:array<double>> 
> but was actually 
> struct<type:tinyint,size:int,indices:array<int>,values:array<double>>.'
> {code}
> However, the actual struct and the desired struct are printed as exactly the 
> same string, which gives the programmer no useful information. I would 
> suggest making the catalogString distinguish pyspark.ml.linalg.Vectors from 
> pyspark.mllib.linalg.Vectors.
> Thanks!
>  
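
The identical strings in the error above can be reproduced with a small toy model. This is only a sketch, assuming that the message is built from each type's catalog string and that both vector types are stored in the same underlying struct. `ToyVectorUDT`, `mismatch_message`, and `qualified_string` are hypothetical names for illustration, not Spark's API, and the qualified format shown is not necessarily what the linked PR implements.

```python
# Toy model only: hypothetical stand-ins, not Spark's actual classes.

# The storage struct printed in the error message from the report.
SQL_TYPE = "struct<type:tinyint,size:int,indices:array<int>,values:array<double>>"


class ToyVectorUDT:
    """Stand-in for a vector user-defined type that lives in one package."""

    def __init__(self, package):
        self.package = package

    def __eq__(self, other):
        # Two types are equal only if they come from the same package.
        return isinstance(other, ToyVectorUDT) and self.package == other.package

    def catalog_string(self):
        # Describes only the storage struct, so the package is lost.
        return SQL_TYPE

    def qualified_string(self):
        # Hypothetical alternative that keeps the package visible.
        return f"{self.package}.VectorUDT:{SQL_TYPE}"


def mismatch_message(column, actual, expected, describe):
    """Return a type-mismatch message, or None when the types match."""
    if actual == expected:
        return None
    return (
        f"Column {column} must be of type {describe(expected)} "
        f"but was actually {describe(actual)}."
    )


ml = ToyVectorUDT("pyspark.ml.linalg")
mllib = ToyVectorUDT("pyspark.mllib.linalg")

# The types differ, yet both sides of the message print the same string.
vague = mismatch_message("features", mllib, ml, ToyVectorUDT.catalog_string)
# With a package-qualified description the mismatch becomes visible.
clear = mismatch_message("features", mllib, ml, ToyVectorUDT.qualified_string)

print(vague)
print(clear)
```

In this model the type comparison fails correctly, and only the description used in the message hides the difference between the two packages.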


