Thanks Sean.
On Sep 20, 2016 7:45 AM, "Sean Owen" <so...@cloudera.com> wrote:

> Ah, I think that this was supposed to be changed with SPARK-9062. Let
> me see about reopening 10835 and addressing it.
>
> On Tue, Sep 20, 2016 at 3:24 PM, janardhan shetty <janardhan...@gmail.com> wrote:
> > Is this a bug?
> >
> > On Sep 19, 2016 10:10 PM, "janardhan shetty" <janardhan...@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> I am hitting this issue:
> >> https://issues.apache.org/jira/browse/SPARK-10835
> >>
> >> The issue seems to have been resolved, but it is resurfacing in the 2.0 ML
> >> API. Any workaround would be appreciated.
> >>
> >> Note: the pipeline has NGram before Word2Vec.
> >>
> >> Error:
> >> val word2Vec = new Word2Vec().setInputCol("wordsGrams").setOutputCol("features").setVectorSize(128).setMinCount(10)
> >>
> >> scala> word2Vec.fit(grams)
> >> java.lang.IllegalArgumentException: requirement failed: Column wordsGrams must be of type ArrayType(StringType,true) but was actually ArrayType(StringType,false).
> >>   at scala.Predef$.require(Predef.scala:224)
> >>   at org.apache.spark.ml.util.SchemaUtils$.checkColumnType(SchemaUtils.scala:42)
> >>   at org.apache.spark.ml.feature.Word2VecBase$class.validateAndTransformSchema(Word2Vec.scala:111)
> >>   at org.apache.spark.ml.feature.Word2Vec.validateAndTransformSchema(Word2Vec.scala:121)
> >>   at org.apache.spark.ml.feature.Word2Vec.transformSchema(Word2Vec.scala:187)
> >>   at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:70)
> >>   at org.apache.spark.ml.feature.Word2Vec.fit(Word2Vec.scala:170)
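The message suggests the schema check compares the column type for exact equality. Because `ArrayType` is a case class, its `containsNull` flag takes part in `==`, and the one-argument `ArrayType(StringType)` defaults that flag to `true`. A minimal sketch, assuming only spark-sql on the classpath (no SparkSession needed); the object name `ContainsNullCheck` is illustrative:

```scala
import org.apache.spark.sql.types.{ArrayType, DataType, StringType}

object ContainsNullCheck {
  def main(args: Array[String]): Unit = {
    // Type Word2Vec asks for: the one-argument apply sets containsNull = true.
    val expected: DataType = ArrayType(StringType)
    // Type NGram declares for its output column: containsNull = false.
    val ngramOutput: DataType = ArrayType(StringType, containsNull = false)

    println(expected)    // ArrayType(StringType,true)
    println(ngramOutput) // ArrayType(StringType,false)
    // Case-class equality includes containsNull, so an exact-equality check rejects NGram's output.
    println(expected == ngramOutput) // false
  }
}
```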
> >>
> >>
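> >> GitHub code for NGram:
> >>
> >>   // Excerpt from class NGram (org.apache.spark.ml.feature).
> >>   // sameType ignores nullability, so either input nullability is accepted...
> >>   override protected def validateInputType(inputType: DataType): Unit = {
> >>     require(inputType.sameType(ArrayType(StringType)),
> >>       s"Input type must be ArrayType(StringType) but got $inputType.")
> >>   }
> >>
> >>   // ...but the output column is declared with non-nullable elements.
> >>   override protected def outputDataType: DataType = new ArrayType(StringType, false)
> >> }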
> >>
> >
>
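Until the check is relaxed, one possible workaround is to cast the NGram output column to `ArrayType(StringType, true)` before Word2Vec sees it; widening element nullability from `false` to `true` is a cast Spark should permit. This is a sketch under assumptions, not tested against every 2.0.x release: the helper `toNullableStringArray`, the object name, the `words` input column and the toy data are hypothetical, while `wordsGrams` and the Word2Vec settings come from the thread (`minCount` is lowered to 1 only so the toy data has a vocabulary).

```scala
import org.apache.spark.ml.feature.{NGram, Word2Vec}
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{ArrayType, StringType}

object NGramWord2VecWorkaround {
  // Hypothetical helper (not part of Spark): re-declare an array<string> column
  // so its element type is nullable, i.e. ArrayType(StringType, true).
  def toNullableStringArray(df: DataFrame, colName: String): DataFrame =
    df.withColumn(colName, col(colName).cast(ArrayType(StringType, containsNull = true)))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("ngram-w2v").getOrCreate()
    import spark.implicits._

    // Toy tokenized input (assumed column name "words").
    val docs = Seq(Seq("a", "b", "c"), Seq("b", "c", "d")).toDF("words")
    val grams = new NGram().setN(2).setInputCol("words").setOutputCol("wordsGrams").transform(docs)
    println(grams.schema("wordsGrams").dataType) // NGram's declared type (containsNull = false)

    val fixed = toNullableStringArray(grams, "wordsGrams")
    println(fixed.schema("wordsGrams").dataType) // ArrayType(StringType,true)

    // With the cast applied, Word2Vec's exact-type schema check should pass.
    val model = new Word2Vec()
      .setInputCol("wordsGrams").setOutputCol("features")
      .setVectorSize(4).setMinCount(1)
      .fit(fixed)
    model.getVectors.show()

    spark.stop()
  }
}
```

Inside a `Pipeline`, the same cast would have to run as its own stage between NGram and Word2Vec; calling the helper on the DataFrame is simply the smallest way to show the idea.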
