Andrew Davidson created SPARK-12606:
---------------------------------------

             Summary: Scala/Java compatibility issue Re: how to extend java transformer from Scala UnaryTransformer ?
                 Key: SPARK-12606
                 URL: https://issues.apache.org/jira/browse/SPARK-12606
             Project: Spark
          Issue Type: Bug
          Components: ML
    Affects Versions: 1.5.2
         Environment: Java 8, Mac OS, Spark-1.5.2
            Reporter: Andrew Davidson



Hi Andy,

I suspect that you have hit a Scala/Java compatibility issue. I can also reproduce it, so could you file a JIRA to track it?

Yanbo

2016-01-02 3:38 GMT+08:00 Andy Davidson <a...@santacruzintegration.com>:
I am trying to write a trivial transformer to use in my pipeline. I am using Java and Spark 1.5.2. It was suggested that I use the Tokenizer.scala class as an example. This should be very easy, however I do not understand Scala and I am having trouble debugging the following exception.

Any help would be greatly appreciated.

Happy New Year

Andy

java.lang.IllegalArgumentException: requirement failed: Param null__inputCol does not belong to Stemmer_2f3aa96d-7919-4eaa-ad54-f7c620b92d1c.
        at scala.Predef$.require(Predef.scala:233)
        at org.apache.spark.ml.param.Params$class.shouldOwn(params.scala:557)
        at org.apache.spark.ml.param.Params$class.set(params.scala:436)
        at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:37)
        at org.apache.spark.ml.param.Params$class.set(params.scala:422)
        at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:37)
        at org.apache.spark.ml.UnaryTransformer.setInputCol(Transformer.scala:83)
        at com.pws.xxx.ml.StemmerTest.test(StemmerTest.java:30)



public class StemmerTest extends AbstractSparkTest {
    @Test
    public void test() {
        Stemmer stemmer = new Stemmer()
                                .setInputCol("raw") // line 30
                                .setOutputCol("filtered");
    }
}

/**
 * @see spark-1.5.1/mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala
 * @see https://chimpler.wordpress.com/2014/06/11/classifiying-documents-using-naive-bayes-on-apache-spark-mllib/
 * @see http://www.tonytruong.net/movie-rating-prediction-with-apache-spark-and-hortonworks/
 *
 * @author andrewdavidson
 */
public class Stemmer extends UnaryTransformer<List<String>, List<String>, Stemmer> implements Serializable {
    static Logger logger = LoggerFactory.getLogger(Stemmer.class);
    private static final long serialVersionUID = 1L;
    private static final ArrayType inputType = DataTypes.createArrayType(DataTypes.StringType, true);
    private final String uid = Stemmer.class.getSimpleName() + "_" + UUID.randomUUID().toString();

    @Override
    public String uid() {
        return uid;
    }

    /*
       override protected def validateInputType(inputType: DataType): Unit = {
         require(inputType == StringType, s"Input type must be string type but got $inputType.")
       }
     */
    @Override
    public void validateInputType(DataType inputTypeArg) {
        String msg = "inputType must be " + inputType.simpleString() + " but got " + inputTypeArg.simpleString();
        assert (inputType.equals(inputTypeArg)) : msg;
    }
    
    @Override
    public Function1<List<String>, List<String>> createTransformFunc() {
        // http://stackoverflow.com/questions/6545066/using-scala-from-java-passing-functions-as-parameters
        Function1<List<String>, List<String>> f = new AbstractFunction1<List<String>, List<String>>() {
            public List<String> apply(List<String> words) {
                for (String word : words) {
                    logger.error("AEDWIP input word: {}", word);
                }
                return words;
            }
        };

        return f;
    }

    @Override
    public DataType outputDataType() {
        return DataTypes.createArrayType(DataTypes.StringType, true);
    }
}
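
A guess at the root cause, based on the "Param null__inputCol" message: the Scala HasInputCol trait appears to build its inputCol Param inside the superclass constructor, which runs before any Java instance field initializers, so uid() returns null at that moment. The sketch below is a minimal Java-only reproduction of that initialization order (the class and field names are hypothetical stand-ins, not Spark's actual code), together with a lazy-uid workaround:

```java
import java.util.UUID;

// Minimal Java-only reproduction of the "null__inputCol" symptom (no Spark needed).
// Base plays the role of the Scala trait/superclass that constructs its inputCol
// Param in its own constructor, i.e. before any Java subclass field initializers run.
abstract class Base {
    final String inputColParamName;

    Base() {
        // Mirrors Param construction: the parent uid is read right here.
        inputColParamName = uid() + "__inputCol";
    }

    abstract String uid();
}

// Initializes uid the way the reported Stemmer does: as an instance field
// initializer, which runs only AFTER Base() has already called uid().
class BrokenStemmer extends Base {
    private final String uid = "Stemmer_" + UUID.randomUUID();

    @Override
    String uid() {
        return uid; // still null while Base() is running
    }
}

// Workaround: initialize uid lazily inside uid() so a value exists even when
// the superclass constructor asks for it.
class FixedStemmer extends Base {
    private String uid; // no initializer, so the lazily set value survives

    @Override
    String uid() {
        if (uid == null) {
            uid = "Stemmer_" + UUID.randomUUID();
        }
        return uid;
    }
}

public class InitOrderDemo {
    public static void main(String[] args) {
        System.out.println(new BrokenStemmer().inputColParamName); // null__inputCol
        System.out.println(new FixedStemmer().inputColParamName);
    }
}
```

If this is what is happening, moving the uid initialization into uid() (or otherwise making it available before the superclass constructor finishes) should make setInputCol find a param whose parent matches the transformer's uid.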




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
