Junpei Zhou created SPARK-31400:
-----------------------------------

             Summary: The catalogString doesn't distinguish Vectors in ml and mllib
                 Key: SPARK-31400
                 URL: https://issues.apache.org/jira/browse/SPARK-31400
             Project: Spark
          Issue Type: Bug
          Components: ML, MLlib
    Affects Versions: 2.4.5
         Environment: Ubuntu 16.04
            Reporter: Junpei Zhou


h2. Bug Description

The {{catalogString}} is not detailed enough to distinguish pyspark.ml.linalg.Vectors 
from pyspark.mllib.linalg.Vectors.
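To illustrate why the two type strings collide: on the Python side, both VectorUDT 
classes serialize to the same underlying SQL struct, so their rendered type strings are 
identical even though the classes differ (quick check; the import aliases are only for 
illustration):
{code:python}
# Both UDTs map to the same underlying SQL struct, so the rendered
# type string is identical even though the Python classes differ.
from pyspark.ml.linalg import VectorUDT as MLVectorUDT
from pyspark.mllib.linalg import VectorUDT as MLlibVectorUDT

print(MLVectorUDT.sqlType().simpleString())
print(MLlibVectorUDT.sqlType().simpleString())
# Both print:
# struct<type:tinyint,size:int,indices:array<int>,values:array<double>>
{code}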
h2. How to reproduce the bug

[Here|https://spark.apache.org/docs/latest/ml-features#minmaxscaler] is an 
example from the official documentation (Python code). If I keep all other lines 
untouched and only change the Vectors import line, i.e.:

 
{code:python}
# from pyspark.ml.linalg import Vectors
from pyspark.mllib.linalg import Vectors
{code}
Or you can directly execute the following code snippet:

{code:python}
from pyspark.sql import SparkSession
from pyspark.ml.feature import MinMaxScaler
# from pyspark.ml.linalg import Vectors
from pyspark.mllib.linalg import Vectors

# Not needed in the pyspark shell, where `spark` already exists.
spark = SparkSession.builder.getOrCreate()

dataFrame = spark.createDataFrame([
    (0, Vectors.dense([1.0, 0.1, -1.0]),),
    (1, Vectors.dense([2.0, 1.1, 1.0]),),
    (2, Vectors.dense([3.0, 10.1, 3.0]),)
], ["id", "features"])

scaler = MinMaxScaler(inputCol="features", outputCol="scaledFeatures")
scalerModel = scaler.fit(dataFrame)  # raises the error below
{code}
It will raise an error:

 
{code}
IllegalArgumentException: 'requirement failed: Column features must be of type struct<type:tinyint,size:int,indices:array<int>,values:array<double>> but was actually struct<type:tinyint,size:int,indices:array<int>,values:array<double>>.'
{code}
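As a side note, the Python schema object does still carry the fully qualified UDT class, 
so one way to see which Vectors flavor a column actually holds is to inspect the schema 
directly (a quick sketch using the dataFrame from the snippet above):
{code:python}
# The Python-side schema keeps the fully qualified UDT class, even though
# the catalogString in the error message does not distinguish the two.
print(type(dataFrame.schema["features"].dataType))
# <class 'pyspark.mllib.linalg.VectorUDT'>  (with the mllib import)
# <class 'pyspark.ml.linalg.VectorUDT'>     (with the ml import)
{code}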
However, the required struct and the actual struct are rendered as exactly the same 
string, which gives the programmer no useful information. I would suggest making the 
{{catalogString}} distinguish pyspark.ml.linalg.Vectors from 
pyspark.mllib.linalg.Vectors.
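
For anyone who hits the same error, one possible workaround (assuming the data really 
does arrive as mllib vectors) is to convert the vector column before fitting:
{code:python}
# Convert legacy mllib vector columns to ml vectors before fitting.
from pyspark.mllib.util import MLUtils

converted = MLUtils.convertVectorColumnsToML(dataFrame, "features")
scalerModel = scaler.fit(converted)
{code}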

Thanks!

 



