Re: Why can't a Transformer have multiple output columns?

2016-08-23 Thread Nicholas Chammas
Thanks for the pointer! A linked issue from the one you shared also appears to be relevant. SPARK-8418: "Add single- and multi-value support to ML Transformers"

On Tue, Aug 23, 2016 at 10:41 AM Nick Pentreath wrote:
> It's not impossible that a

Re: Why can't a Transformer have multiple output columns?

2016-08-23 Thread Nick Pentreath
It's not impossible that a Transformer could output multiple columns; it's simply that none of the current ones do. It's true that it might be a less common use case in general. But take StringIndexer, for example: it turns strings (categorical features) into ints (0-based indexes).
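
To make the StringIndexer behaviour mentioned above concrete, here is a minimal plain-Python sketch of the idea, not Spark's implementation. The function names `fit_string_index` and `transform` are hypothetical, and the ordering (most frequent label gets index 0, ties broken alphabetically) is an assumption made for the illustration.

```python
from collections import Counter


def fit_string_index(values):
    """Build a label -> 0-based index mapping, most frequent label first.

    Hypothetical helper: the frequency ordering and alphabetical
    tie-break are assumptions for this sketch.
    """
    counts = Counter(values)
    # Sort by descending count, then by label so the result is deterministic.
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: index for index, label in enumerate(ordered)}


def transform(values, mapping):
    """Replace each string with its index from the fitted mapping."""
    return [mapping[value] for value in values]


colors = ["red", "blue", "red", "green", "red", "blue"]
mapping = fit_string_index(colors)
indexed = transform(colors, mapping)

print(mapping)  # {'red': 0, 'blue': 1, 'green': 2}
print(indexed)  # [0, 1, 0, 2, 0, 1]
```

Each input column here yields exactly one output column, which is the single-column pattern the thread is asking about.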

Why can't a Transformer have multiple output columns?

2016-08-23 Thread Nicholas Chammas
If you create your own Spark 2.x ML Transformer, there are several mix-ins (is that the correct term?) in ml/param/shared.py that you can use to define its behavior. Among them are the following: