So basically I need something like:

    df.withColumn("score", new Column(new Expression {
      def eval(input: Row = null): EvaluatedType = myModel.score(input)
      ...
    }))

But I can't do this, so how can I make a UDF (or something like it) that can take in a Row and pass back a double value or some struct?

On Tue, Sep 8, 2015 at 5:33 PM, Night Wolf <nightwolf...@gmail.com> wrote:

> Not sure how that would work. Really I want to tack on an extra column
> onto the DF with a UDF that can take a Row object.
>
> On Tue, Sep 8, 2015 at 1:54 AM, Jörn Franke <jornfra...@gmail.com> wrote:
>
>> Can you use a map or list with different properties as one parameter?
>> Alternatively a string where parameters are comma-separated...
>>
>> On Mon, Sep 7, 2015 at 8:35 AM, Night Wolf <nightwolf...@gmail.com>
>> wrote:
>>
>>> Is it possible to have a UDF which takes a variable number of arguments?
>>>
>>> e.g. df.select(myUdf($"*")) fails with
>>>
>>> org.apache.spark.sql.AnalysisException: unresolved operator 'Project
>>> [scalaUDF(*) AS scalaUDF(*)#26];
>>>
>>> What I would like to do is pass in a generic data frame which can then
>>> be passed to a UDF which does scoring of a model. The UDF needs to know
>>> the schema to map column names in the model to columns in the DataFrame.
>>>
>>> The model has 100s of factors (very wide), so I can't just have a
>>> scoring UDF that has 500 parameters (for obvious reasons).
>>>
>>> Cheers,
>>> ~N
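One possible workaround (a sketch, not from the thread itself): instead of a variadic UDF or a custom Expression, pack all columns into a single struct column with struct(df.columns.map(col): _*), so the UDF receives one Row argument and can look up factors by name via the Row's schema. The score function, column names, and sample data below are hypothetical stand-ins for myModel; this also uses the Spark 2.x SparkSession API for brevity (the 2015-era SQLContext equivalent is analogous).

    // Sketch of the struct-based workaround (assumed APIs; verify on your Spark version).
    import org.apache.spark.sql.{Row, SparkSession}
    import org.apache.spark.sql.functions.{col, struct, udf}
    import org.apache.spark.sql.types.DoubleType

    val spark = SparkSession.builder.master("local[*]").getOrCreate()
    import spark.implicits._

    // Hypothetical wide-ish DataFrame standing in for the real one.
    val df = Seq((1.0, 2.0, "a"), (3.0, 4.0, "b")).toDF("f1", "f2", "label")

    // Hypothetical stand-in for myModel.score: picks factor columns by
    // name from the Row's schema and combines them.
    def score(row: Row): Double =
      row.schema.fieldNames.filter(_.startsWith("f"))
        .map(row.getAs[Double](_)).sum

    // The untyped udf(f, dataType) overload accepts a Row-typed function;
    // the typed udf(...) overloads cannot derive a schema for Row.
    val scoreUdf = udf((row: Row) => score(row), DoubleType)

    // struct over every column turns the whole row into one struct value,
    // so the UDF sees a single Row however many factor columns exist.
    val scored = df.withColumn("score", scoreUdf(struct(df.columns.map(col): _*)))

Because the Row handed to the UDF carries the struct's schema, the model can map its factor names to fields at runtime, which avoids declaring hundreds of UDF parameters.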