Re: Spark 2.1.1: A bug in org.apache.spark.ml.linalg.* when using VectorAssembler.scala

2017-07-13 Thread Yan Facai
Hi, junjie.

As Nick said,
spark.ml does provide its own Vector, Vectors and VectorUDT; see:
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala:36:
sealed trait Vector extends Serializable

So what error do you actually get with VectorAssembler? Could you give more details?
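
For reference, here is a minimal sketch of the usage described above: ml.linalg vectors fed straight into VectorAssembler. It assumes Spark 2.x with the spark-mllib artifact on the classpath. The object name MlVectorsWithAssembler and the column names "x", "v" and "features" are made up for illustration.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical object and column names, for illustration only.
object MlVectorsWithAssembler {

  // Builds a small DataFrame with a numeric column and an ml.linalg vector
  // column, then assembles both into a single "features" vector column.
  def assemble(spark: SparkSession): DataFrame = {
    val df = spark.createDataFrame(Seq(
      (1.0, Vectors.dense(0.1, 0.2)),
      (2.0, Vectors.dense(0.3, 0.4))
    )).toDF("x", "v")

    new VectorAssembler()
      .setInputCols(Array("x", "v"))
      .setOutputCol("features")
      .transform(df)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("MlVectorsWithAssembler")
      .getOrCreate()
    assemble(spark).show(false)
    spark.stop()
  }
}
```

If the import of org.apache.spark.ml.linalg.Vectors does not resolve in your build, that would suggest a dependency or classpath problem rather than a missing class, but that is only a guess until we see the actual error.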

On Thu, Jul 13, 2017 at 5:15 PM,  wrote:

> Dear Developers:
>
> Here is a bug in org.apache.spark.ml.linalg.*:
> The classes Vector and Vectors are not included in
> org.apache.spark.ml.linalg.*, but they are used in VectorAssembler.scala
> as follows:
>
> import org.apache.spark.ml.linalg.{Vector, Vectors, VectorUDT}
>
> Therefore, an error was reported when I used VectorAssembler.
>
> Since org.apache.spark.mllib.linalg.* contains the classes {Vector,
> Vectors, VectorUDT}, I rewrote VectorAssembler.scala as
> XVectorAssembler.scala, mainly by changing
> "import org.apache.spark.ml.linalg.{Vector, Vectors, VectorUDT}" to
> "import org.apache.spark.mllib.linalg.{Vector, Vectors, VectorUDT}"
>
> But then the following error occurred:
>
> " Column v must be of type org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7
> but was actually org.apache.spark.mllib.linalg.VectorUDT@f71b0bce "
>
> Would you please help fix the bug?
>
> Thank you very much!
>
> Best regards
> --xiongjunjie



On Thu, Jul 13, 2017 at 6:08 PM, Nick Pentreath 
wrote:

> There are Vector classes under the ml.linalg package, and VectorAssembler
> and the other feature transformers all work with ml.linalg vectors.
>
> If you try to use mllib.linalg vectors instead, you will get an error
> because the SQL user-defined type (VectorUDT) does not match.


Re: Spark 2.1.1: A bug in org.apache.spark.ml.linalg.* when using VectorAssembler.scala

2017-07-13 Thread Nick Pentreath
There are Vector classes under the ml.linalg package, and VectorAssembler
and the other feature transformers all work with ml.linalg vectors.

If you try to use mllib.linalg vectors instead, you will get an error because
the SQL user-defined type (VectorUDT) does not match.
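
If the data already holds mllib.linalg vectors, one option is to convert the column instead of rewriting VectorAssembler. Below is a rough sketch, assuming Spark 2.x, using MLUtils.convertVectorColumnsToML. The object name ConvertToMlVectors and the column names "x", "v" and "features" are hypothetical.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.mllib.linalg.{Vectors => OldVectors}
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical object and column names, for illustration only.
object ConvertToMlVectors {

  def assemble(spark: SparkSession): DataFrame = {
    // A DataFrame whose "v" column holds old mllib.linalg vectors.
    val oldDf = spark.createDataFrame(Seq(
      (1.0, OldVectors.dense(0.1, 0.2)),
      (2.0, OldVectors.dense(0.3, 0.4))
    )).toDF("x", "v")

    // Convert the "v" column from mllib.linalg vectors to ml.linalg vectors,
    // so its SQL type becomes org.apache.spark.ml.linalg.VectorUDT.
    val mlDf = MLUtils.convertVectorColumnsToML(oldDf, "v")

    // The stock VectorAssembler now accepts the column.
    new VectorAssembler()
      .setInputCols(Array("x", "v"))
      .setOutputCol("features")
      .transform(mlDf)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("ConvertToMlVectors")
      .getOrCreate()
    assemble(spark).show(false)
    spark.stop()
  }
}
```

For a single vector rather than a whole column, the asML method on an mllib.linalg vector returns the ml.linalg equivalent.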


On Thu, 13 Jul 2017 at 11:23,  wrote:

> Dear Developers:
>
> Here is a bug in org.apache.spark.ml.linalg.*:
> The classes Vector and Vectors are not included in
> org.apache.spark.ml.linalg.*, but they are used in VectorAssembler.scala
> as follows:
>
> import org.apache.spark.ml.linalg.{Vector, Vectors, VectorUDT}
>
> Therefore, an error was reported when I used VectorAssembler.
>
> Since org.apache.spark.mllib.linalg.* contains the classes {Vector,
> Vectors, VectorUDT}, I rewrote VectorAssembler.scala as
> XVectorAssembler.scala, mainly by changing
> "import org.apache.spark.ml.linalg.{Vector, Vectors, VectorUDT}" to
> "import org.apache.spark.mllib.linalg.{Vector, Vectors, VectorUDT}"
>
> But then the following error occurred:
>
> " Column v must be of type org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7
> but was actually org.apache.spark.mllib.linalg.VectorUDT@f71b0bce "
>
> Would you please help fix the bug?
>
> Thank you very much!
>
> Best regards
> --xiongjunjie


Spark 2.1.1: A bug in org.apache.spark.ml.linalg.* when using VectorAssembler.scala

2017-07-13 Thread xiongjunjie
Dear Developers:

Here is a bug in org.apache.spark.ml.linalg.*:
The classes Vector and Vectors are not included in
org.apache.spark.ml.linalg.*, but they are used in VectorAssembler.scala
as follows:

import org.apache.spark.ml.linalg.{Vector, Vectors, VectorUDT}

Therefore, an error was reported when I used VectorAssembler.

Since org.apache.spark.mllib.linalg.* contains the classes {Vector, Vectors,
VectorUDT}, I rewrote VectorAssembler.scala as XVectorAssembler.scala,
mainly by changing
"import org.apache.spark.ml.linalg.{Vector, Vectors, VectorUDT}" to
"import org.apache.spark.mllib.linalg.{Vector, Vectors, VectorUDT}"

But then the following error occurred:

" Column v must be of type org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7
but was actually org.apache.spark.mllib.linalg.VectorUDT@f71b0bce "

Would you please help fix the bug?

Thank you very much!

Best regards
--xiongjunjie