This is great feedback to hear. I think there was discussion about moving
Pipelines outside of ML at some point, but I'll have to spend more time
digging it up.
In the meantime, I thought I'd mention this JIRA here in case people have
feedback:
https://issues.apache.org/jira/browse/SPARK-14033
Hi Maciej,
Absolutely. We had to copy HasInputCol/s and HasOutputCol/s (along with a
couple of others, like HasProbabilityCol) into our repo. That is good enough
for most use cases, but for some (e.g. code that operates on any Transformer
accepting either our HasInputCol or Spark's) it makes the code clunky.
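A minimal sketch of what such a copied trait might look like, assuming the
Spark 2.x-or-later `org.apache.spark.ml.param` API. The names
`CopiedHasInputCol`, `CopiedHasOutputCol` and `LocalParamsHolder` are
hypothetical. Because the copies are separate types from Spark's own
`HasInputCol`/`HasOutputCol`, code typed against one set cannot accept
instances of the other, which is the clunkiness described above.

```scala
import org.apache.spark.ml.param.{Param, ParamMap, Params}
import org.apache.spark.ml.util.Identifiable

// Hypothetical local copy of Spark's shared inputCol param.
trait CopiedHasInputCol extends Params {
  final val inputCol: Param[String] =
    new Param[String](this, "inputCol", "input column name")

  final def getInputCol: String = $(inputCol)
}

// Hypothetical local copy of Spark's shared outputCol param.
trait CopiedHasOutputCol extends Params {
  final val outputCol: Param[String] =
    new Param[String](this, "outputCol", "output column name")

  final def getOutputCol: String = $(outputCol)
}

// A params-only class mixing in the copies. It is NOT an instance of
// Spark's own HasInputCol, so generic code typed against that trait
// cannot accept it.
class LocalParamsHolder(override val uid: String)
    extends CopiedHasInputCol with CopiedHasOutputCol {

  def this() = this(Identifiable.randomUID("localParams"))

  def setInputCol(value: String): this.type = set(inputCol, value)
  def setOutputCol(value: String): this.type = set(outputCol, value)

  // defaultCopy reflectively calls the (uid: String) constructor
  // and copies the current param values.
  override def copy(extra: ParamMap): LocalParamsHolder = defaultCopy(extra)
}
```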
Hi,
I've never developed a custom Transformer (or a UnaryTransformer in
particular), but I'd be for it if that's the case.
Jacek
On 28.03.2016 6:54 AM, "Maciej Szymkiewicz" wrote:
Hi Jacek,
In this context, don't you think it would be useful if at least some
traits from org.apache.spark.ml.param.shared.sharedParams were public?
HasInputCol(s) and HasOutputCol, for example. These are useful pretty
much every time you create a custom Transformer.
--
Regards,
Maciej
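For the common single-column case, a sketch that sidesteps the visibility
problem: Spark's public (developer API) `UnaryTransformer` already mixes in
the shared `inputCol`/`outputCol` params and provides `setInputCol` and
`setOutputCol`. The `Trimmer` class is a hypothetical example written
assuming the Spark 2.x-or-later API; it does not help with multi-column
transformers that would need something like `HasInputCols`.

```scala
import org.apache.spark.ml.UnaryTransformer
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.types.{DataType, StringType}

// Hypothetical single-column transformer that trims whitespace from a
// string column. inputCol/outputCol and their setters are inherited
// from UnaryTransformer, so no shared-param traits need copying.
class Trimmer(override val uid: String)
    extends UnaryTransformer[String, String, Trimmer] {

  def this() = this(Identifiable.randomUID("trimmer"))

  // Per-row function applied to the input column (null-safe).
  override protected def createTransformFunc: String => String =
    s => if (s == null) null else s.trim

  // Spark SQL type of the output column.
  override protected def outputDataType: DataType = StringType

  override def copy(extra: ParamMap): Trimmer = defaultCopy(extra)
}
```

Usage: `new Trimmer().setInputCol("raw").setOutputCol("trimmed").transform(df)`.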
The Spark ML Pipelines API (not just Transformers, but Estimators and custom
Pipeline classes as well) is definitely not specific to machine learning.
We use it heavily in our development. We're building machine learning
pipelines *BUT* many steps involve joining, schema manipulation,
Hi Joseph,
Thanks for the response. I'm one of those who don't understand all the
hype around (and need for) Machine Learning... yet, and I'm looking at the
ML space through Spark ML(lib) glasses. In the meantime, I've had a few
assignments (in a project with Spark and Scala) that have required quite
extensive dataset
There have been some comments about using Pipelines outside of ML, but I
have not yet seen a real need for it. If a user does want to use Pipelines
for non-ML tasks, they can still use Transformers + PipelineModels. Will
that work?
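A minimal sketch of that suggestion, assuming Spark 2.x or later: the
built-in `SQLTransformer` handles purely relational steps, and a `Pipeline`
whose stages are all Transformers trains nothing when fitted (it validates
the schema and wraps the stages in a `PipelineModel`). The object name,
column names and SQL statements are hypothetical.

```scala
import org.apache.spark.ml.{Pipeline, PipelineModel, PipelineStage}
import org.apache.spark.ml.feature.SQLTransformer
import org.apache.spark.sql.DataFrame

object NonMlPipeline {
  // Builds a PipelineModel made only of non-ML, relational steps.
  def buildModel(sample: DataFrame): PipelineModel = {
    // Step 1: derive a column; __THIS__ stands for the input dataset.
    val addTotal = new SQLTransformer()
      .setStatement("SELECT *, price * quantity AS total FROM __THIS__")

    // Step 2: project and rename.
    val project = new SQLTransformer()
      .setStatement("SELECT id, total AS order_total FROM __THIS__")

    // No stage is an Estimator, so fit() trains nothing: it checks the
    // schema and returns a PipelineModel wrapping the two stages.
    new Pipeline()
      .setStages(Array[PipelineStage](addTotal, project))
      .fit(sample)
  }
}
```

The returned model is then applied like any other: `model.transform(df)`.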
On Fri, Mar 25, 2016 at 8:05 AM, Jacek Laskowski
Hi,
After a few weeks with spark.ml, I've come to the conclusion that the
Transformer concept from the Pipeline API (spark.ml/MLlib) should be part
of DataFrame (SQL), where it fits better. Are there any plans to migrate
the Transformer API (ML) to DataFrame (SQL)?
Regards,
Jacek Laskowski
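To make the comparison concrete, a sketch of the DataFrame-side style this
question points toward, assuming Spark 2.x or later, where
`Dataset.transform` chains plain `DataFrame => DataFrame` functions. The
helper names are hypothetical. Unlike a `Transformer`, these functions
carry no `Params`, `uid` or `copy`, so they are lighter, but they cannot be
placed in a `Pipeline` or persisted as stages.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, trim}

object DataFrameSteps {
  // Each step is an ordinary curried function ending in DataFrame => DataFrame.
  def dropNullsIn(name: String)(df: DataFrame): DataFrame =
    df.filter(col(name).isNotNull)

  def trimColumn(name: String)(df: DataFrame): DataFrame =
    df.withColumn(name, trim(col(name)))

  // Dataset.transform applies each function in turn, much like chaining
  // Transformer stages, but without any Params/uid/copy machinery.
  def clean(df: DataFrame): DataFrame =
    df.transform(dropNullsIn("raw")).transform(trimColumn("raw"))
}
```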