Thanks for the quick reply, Ram. I'll take a look at the Tokenizer code and
try it out.

Dimple

On Tue, Jun 2, 2015 at 10:42 AM, Ram Sriharsha <sriharsha....@gmail.com>
wrote:

> Hi
>
> We are in the process of adding examples for feature transformations (
> https://issues.apache.org/jira/browse/SPARK-7546) and this should be
> available shortly on Spark master.
> In the meantime, the best place to start is to look at how the
> Tokenizer works here:
>
> https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala
>
> You need to extend Transformer, as Tokenizer does above; in this case, a
> UnaryTransformer, since the feature transformer acts on one column,
> transforms it, and outputs another column.
>
> Here is an example of how to build a pipeline that includes a feature
> transformer (HashingTF is the feature transformer analogous to the one you
> would build):
>
> https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/ml/SimpleTextClassificationPipeline.scala
>
> But stay tuned: we should have examples in Python, Scala, and Java soon.
>
> Ram
>
> On Tue, Jun 2, 2015 at 10:19 AM, dimple <dimp201...@gmail.com> wrote:
>
>> Hi,
>> I would like to embed my own transformer in a spark.ml Pipeline but do
>> not see an example of how to do it. Can someone share an example of which
>> classes/interfaces I need to extend/implement in order to do so? Thanks.
>>
>> Dimple
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Embedding-your-own-transformer-in-Spark-ml-Pipleline-tp23112.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
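For reference, here is a minimal Scala sketch of the UnaryTransformer approach described in the quoted reply. It assumes Spark 2.x or later (it uses SparkSession, which postdates this thread). The names UpperCaser and UpperCaserDemo are hypothetical; the overridden members (createTransformFunc, validateInputType, outputDataType, copy) mirror the ones Tokenizer.scala overrides.

```scala
import org.apache.spark.ml.UnaryTransformer
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{DataType, StringType}

// Hypothetical example transformer: upper-cases one string column into a new
// column, following the same structure as Spark's Tokenizer.
class UpperCaser(override val uid: String)
    extends UnaryTransformer[String, String, UpperCaser] {

  // No-arg constructor that generates a unique id, as Tokenizer does.
  def this() = this(Identifiable.randomUID("upperCaser"))

  // Function applied to each value of the input column.
  override protected def createTransformFunc: String => String = _.toUpperCase

  // Called during schema validation; reject non-string input columns.
  override protected def validateInputType(inputType: DataType): Unit =
    require(inputType == StringType, s"Input type must be StringType but got $inputType.")

  // Data type of the generated output column.
  override protected def outputDataType: DataType = StringType

  // Copy used by Pipeline; defaultCopy relies on the (uid: String) constructor.
  override def copy(extra: ParamMap): UpperCaser = defaultCopy(extra)
}

object UpperCaserDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("UpperCaserDemo").getOrCreate()
    import spark.implicits._

    val df = Seq((0L, "spark ml"), (1L, "hello world")).toDF("id", "text")
    val upper = new UpperCaser().setInputCol("text").setOutputCol("upperText")
    // The upperText column holds "SPARK ML" and "HELLO WORLD".
    upper.transform(df).show()
    spark.stop()
  }
}
```

Because it is an ordinary Transformer, an instance like this can be placed in a Pipeline's stages array next to built-in stages such as HashingTF.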
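A condensed sketch of a pipeline in the spirit of the linked SimpleTextClassificationPipeline example, again assuming Spark 2.x or later. The object name PipelineSketch, the helper buildPipeline, and the toy data are illustrative; Tokenizer, HashingTF, LogisticRegression, and Pipeline are the built-in spark.ml classes.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.sql.SparkSession

object PipelineSketch {
  // Tokenizer -> HashingTF -> LogisticRegression, as in the linked example.
  def buildPipeline(): Pipeline = {
    // Feature transformer 1: split "text" into a "words" array column.
    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    // Feature transformer 2: hash "words" into a term-frequency vector.
    val hashingTF = new HashingTF()
      .setNumFeatures(1000)
      .setInputCol(tokenizer.getOutputCol)
      .setOutputCol("features")
    // Estimator stage: reads the "features" and "label" columns by default.
    val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.01)
    // A custom UnaryTransformer would be added to this array the same way.
    new Pipeline().setStages(Array(tokenizer, hashingTF, lr))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("PipelineSketch").getOrCreate()
    import spark.implicits._

    // Toy labeled documents (illustrative only).
    val training = Seq(
      (0L, "a b c d e spark", 1.0),
      (1L, "b d", 0.0),
      (2L, "spark f g h", 1.0),
      (3L, "hadoop mapreduce", 0.0)
    ).toDF("id", "text", "label")

    // fit() runs the transformers and trains the estimator, returning a PipelineModel.
    val model = buildPipeline().fit(training)

    val test = Seq((4L, "spark i j k"), (5L, "l m n")).toDF("id", "text")
    model.transform(test).select("id", "text", "prediction").show()
    spark.stop()
  }
}
```

Keeping stage construction in a separate function makes it easy to swap a custom transformer in or out without touching the training code.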
