Thanks for the quick reply, Ram. I will take a look at the Tokenizer code and try it out.
Dimple

On Tue, Jun 2, 2015 at 10:42 AM, Ram Sriharsha <sriharsha....@gmail.com> wrote:
> Hi
>
> We are in the process of adding examples for feature transformations
> (https://issues.apache.org/jira/browse/SPARK-7546) and this should be
> available shortly on Spark Master.
> In the meantime, the best place to start would be to look at how the
> Tokenizer works here:
>
> https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala
>
> You need to implement the Transformer interface as above. In this case, a
> UnaryTransformer, since the feature transformer acts on one column,
> transforms it and outputs another column.
>
> Here is an example of how to build a pipeline that includes a feature
> transformer (the HashingTF is the feature transformer analogous to what you
> would build):
>
> https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/ml/SimpleTextClassificationPipeline.scala
>
> But stay tuned, we should have examples in Python, Scala and Java soon.
>
> Ram
>
> On Tue, Jun 2, 2015 at 10:19 AM, dimple <dimp201...@gmail.com> wrote:
>
>> Hi,
>> I would like to embed my own transformer in the Spark.ml Pipeline but do
>> not see an example of it. Can someone share an example of which
>> classes/interfaces I need to extend/implement in order to do so? Thanks.
>>
>> Dimple
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Embedding-your-own-transformer-in-Spark-ml-Pipleline-tp23112.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
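
For anyone reading this thread later: the approach Ram describes above (extend UnaryTransformer the way Tokenizer does, then add the transformer as a pipeline stage) might look roughly like the sketch below. It assumes Spark 1.4 or later, where Identifiable.randomUID is available. The class name UpperCaser and the column names "text", "textUpper" and "words" are hypothetical and chosen only for illustration; they do not come from the Spark code base.

```scala
import org.apache.spark.ml.{Pipeline, UnaryTransformer}
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.types.{DataType, StringType}

// Hypothetical custom transformer: reads one string column and writes
// an upper-cased copy to another column. Modelled on how Tokenizer
// extends UnaryTransformer.
class UpperCaser(override val uid: String)
  extends UnaryTransformer[String, String, UpperCaser] {

  // No-arg constructor that generates a unique id for this stage.
  def this() = this(Identifiable.randomUID("upperCaser"))

  // The per-row function applied to the input column.
  override protected def createTransformFunc: String => String =
    _.toUpperCase

  // Reject non-string input columns when the schema is checked.
  override protected def validateInputType(inputType: DataType): Unit =
    require(inputType == StringType,
      s"Input type must be StringType but got $inputType.")

  // Data type of the column this transformer produces.
  override protected def outputDataType: DataType = StringType
}

object UpperCaserPipeline {
  // Builds (but does not fit) a pipeline that uses the custom stage
  // alongside the built-in Tokenizer and HashingTF stages.
  def build(): Pipeline = {
    val upper = new UpperCaser()
      .setInputCol("text")
      .setOutputCol("textUpper")
    val tokenizer = new Tokenizer()
      .setInputCol("textUpper")
      .setOutputCol("words")
    val hashingTF = new HashingTF()
      .setInputCol("words")
      .setOutputCol("features")
    new Pipeline().setStages(Array(upper, tokenizer, hashingTF))
  }
}
```

Calling fit on the resulting Pipeline with a DataFrame that has a string column named "text" should then run the custom stage first, followed by Tokenizer and HashingTF. Note that this sketch does not handle null values in the input column.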