Re: Any plans to migrate Transformer API to Spark SQL (closer to DataFrames)?

2016-03-29 Thread Joseph Bradley
This is great feedback to hear. I think there was discussion about moving Pipelines outside of ML at some point, but I'll have to spend more time to dig it up. In the meantime, I thought I'd mention this JIRA here in case people have feedback: https://issues.apache.org/jira/browse/SPARK-14033

Re: Any plans to migrate Transformer API to Spark SQL (closer to DataFrames)?

2016-03-28 Thread Michał Zieliński
Hi Maciej, Absolutely. We had to copy HasInputCol/s and HasOutputCol/s (along with a couple of others, like HasProbabilityCol) to our repo. For most use cases that is good enough, but for some (e.g. operating on any Transformer that accepts either our HasInputCol or Spark's) it makes the code clunky.

Re: Any plans to migrate Transformer API to Spark SQL (closer to DataFrames)?

2016-03-28 Thread Jacek Laskowski
Hi, I've never developed a custom Transformer (or UnaryTransformer in particular), but I'd be for it if that's the case. Jacek 28.03.2016 6:54 AM "Maciej Szymkiewicz" wrote: > Hi Jacek, > > In this context, don't you think it would be useful, if at least some > traits

Re: Any plans to migrate Transformer API to Spark SQL (closer to DataFrames)?

2016-03-27 Thread Maciej Szymkiewicz
Hi Jacek, In this context, don't you think it would be useful if at least some traits from org.apache.spark.ml.param.shared.sharedParams were public? HasInputCol(s) and HasOutputCol, for example. These are useful pretty much every time you create a custom Transformer. -- Regards, Maciej
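The workaround discussed in this part of the thread (keeping local copies of the shared param traits) might look roughly like the sketch below. It assumes Spark 2.x or later, where `transform` takes a `Dataset[_]`; the trait bodies are simplified local copies rather than Spark's own code, and `UpperCaser` and its column names are hypothetical.

```scala
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.{Param, ParamMap, Params}
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.functions.{col, upper}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Local, simplified copies of the shared param traits (assumption: the
// originals in org.apache.spark.ml.param.shared are not accessible).
trait HasInputCol extends Params {
  final val inputCol: Param[String] =
    new Param[String](this, "inputCol", "input column name")
  final def getInputCol: String = $(inputCol)
}

trait HasOutputCol extends Params {
  final val outputCol: Param[String] =
    new Param[String](this, "outputCol", "output column name")
  final def getOutputCol: String = $(outputCol)
}

// Hypothetical custom Transformer: upper-cases a string column.
class UpperCaser(override val uid: String)
    extends Transformer with HasInputCol with HasOutputCol {

  def this() = this(Identifiable.randomUID("upperCaser"))

  def setInputCol(value: String): this.type = set(inputCol, value)
  def setOutputCol(value: String): this.type = set(outputCol, value)

  // Appends the upper-cased copy of the input column.
  override def transform(dataset: Dataset[_]): DataFrame =
    dataset.withColumn($(outputCol), upper(col($(inputCol))))

  // Declares the extra output column without touching any data.
  override def transformSchema(schema: StructType): StructType =
    schema.add(StructField($(outputCol), StringType, nullable = true))

  override def copy(extra: ParamMap): UpperCaser = defaultCopy(extra)
}
```

A transformer built this way only works with the local traits, so code that wants to accept "anything with an input column" has to handle the local and the Spark versions separately, which is the clunkiness mentioned in the follow-up reply.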

Re: Any plans to migrate Transformer API to Spark SQL (closer to DataFrames)?

2016-03-26 Thread Michał Zieliński
The Spark ML Pipelines API (not just Transformers, but Estimators and custom Pipeline classes as well) is definitely not machine-learning specific. We use it heavily in our development. We're building machine learning pipelines *BUT* many steps involve joining, schema manipulation,

Re: Any plans to migrate Transformer API to Spark SQL (closer to DataFrames)?

2016-03-26 Thread Jacek Laskowski
Hi Joseph, Thanks for the response. I'm one of those who don't yet understand all the hype around (or need for) Machine Learning, and I'm looking at the ML space through Spark ML(lib) glasses. In the meantime I've had a few assignments (in a project with Spark and Scala) that have required quite extensive dataset

Re: Any plans to migrate Transformer API to Spark SQL (closer to DataFrames)?

2016-03-25 Thread Joseph Bradley
There have been some comments about using Pipelines outside of ML, but I have not yet seen a real need for it. If a user does want to use Pipelines for non-ML tasks, they still can use Transformers + PipelineModels. Will that work? On Fri, Mar 25, 2016 at 8:05 AM, Jacek Laskowski
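As an illustration of the "Transformers + PipelineModels" suggestion for a non-ML task, here is a minimal sketch that chains two `SQLTransformer` stages into a `Pipeline`. It assumes a Spark version that ships `SQLTransformer`; the object name `CleaningPipeline` and the columns `id`, `price` and `quantity` are hypothetical.

```scala
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.feature.SQLTransformer

object CleaningPipeline {
  // Stage 1: drop rows without an id. __THIS__ stands for the input dataset.
  val dropMissing: SQLTransformer = new SQLTransformer()
    .setStatement("SELECT * FROM __THIS__ WHERE id IS NOT NULL")

  // Stage 2: add a derived column.
  val addTotal: SQLTransformer = new SQLTransformer()
    .setStatement("SELECT *, price * quantity AS total FROM __THIS__")

  // No Estimator is involved; the pipeline is plain data preparation.
  val pipeline: Pipeline =
    new Pipeline().setStages(Array[PipelineStage](dropMissing, addTotal))

  // Usage, given a DataFrame `df` with the columns above:
  //   val model   = pipeline.fit(df)      // returns a PipelineModel
  //   val cleaned = model.transform(df)
}
```

Even with no learning step, `fit` still has to be called to obtain the `PipelineModel`, and the code depends on the ML module, which is part of what the original question is about.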

Any plans to migrate Transformer API to Spark SQL (closer to DataFrames)?

2016-03-25 Thread Jacek Laskowski
Hi, After a few weeks with spark.ml, I have come to the conclusion that the Transformer concept from the Pipeline API (spark.ml/MLlib) should be part of DataFrame (SQL), where it fits better. Are there any plans to migrate the Transformer API (ML) to DataFrame (SQL)? Regards, Jacek Laskowski