Is this a bug?
On Sep 19, 2016 10:10 PM, "janardhan shetty" <janardhan...@gmail.com> wrote:

> Hi,
>
> I am hitting this issue. https://issues.apache.org/jira/browse/SPARK-10835
> .
>
> The issue seems to have been resolved, but it resurfaces in 2.0 ML. Any
> workaround is appreciated.
>
> Note:
> The pipeline has NGram before Word2Vec.
>
> Error:
> val word2Vec = new Word2Vec()
>   .setInputCol("wordsGrams")
>   .setOutputCol("features")
>   .setVectorSize(128)
>   .setMinCount(10)
>
> scala> word2Vec.fit(grams)
> java.lang.IllegalArgumentException: requirement failed: Column wordsGrams
> must be of type ArrayType(StringType,true) but was actually
> ArrayType(StringType,false).
>   at scala.Predef$.require(Predef.scala:224)
>   at org.apache.spark.ml.util.SchemaUtils$.checkColumnType(SchemaUtils.scala:42)
>   at org.apache.spark.ml.feature.Word2VecBase$class.validateAndTransformSchema(Word2Vec.scala:111)
>   at org.apache.spark.ml.feature.Word2Vec.validateAndTransformSchema(Word2Vec.scala:121)
>   at org.apache.spark.ml.feature.Word2Vec.transformSchema(Word2Vec.scala:187)
>   at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:70)
>   at org.apache.spark.ml.feature.Word2Vec.fit(Word2Vec.scala:170)
>
>
> GitHub code for NGram:
>
> override protected def validateInputType(inputType: DataType): Unit = {
>   require(inputType.sameType(ArrayType(StringType)),
>     s"Input type must be ArrayType(StringType) but got $inputType.")
> }
>
> override protected def outputDataType: DataType =
>   new ArrayType(StringType, false)
> }
>
>
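
The mismatch is exactly what the quoted source shows: NGram hard-codes its output element type to `containsNull = false`, while Word2Vec's schema check requires `ArrayType(StringType, true)`. One possible workaround, pending a proper fix in NGram itself, is to rebuild the DataFrame with a relaxed schema before calling `fit`. This is only a sketch under assumptions: the `grams`/`wordsGrams` names follow the snippet above, and the small DataFrame here is a stand-in for the real NGram output.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{ArrayType, StringType, StructField, StructType}

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("w2v-schema-workaround")
  .getOrCreate()

// Stand-in for the NGram output: note containsNull = false,
// which is what Word2Vec's schema check rejects.
val grams = spark.createDataFrame(
  spark.sparkContext.parallelize(Seq(Row(Seq("hi there", "there spark")))),
  StructType(Seq(StructField("wordsGrams", ArrayType(StringType, containsNull = false))))
)

// Rebuild the schema, flipping containsNull to true on the offending column only.
val patched = StructType(grams.schema.map {
  case StructField("wordsGrams", ArrayType(StringType, false), nullable, meta) =>
    StructField("wordsGrams", ArrayType(StringType, containsNull = true), nullable, meta)
  case other => other
})

// Same rows, relaxed schema; this DataFrame should pass Word2Vec's check.
val gramsFixed = spark.createDataFrame(grams.rdd, patched)
```

Recreating the DataFrame from its own RDD does not copy or transform the data; it only re-declares the column type, which is enough to satisfy `SchemaUtils.checkColumnType`.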
