thanks Kumar, I want to know how to call a udf with multiple parameters, for example a udf that implements a substr function: how can I pass parameters for the begin and end index? I tried it and got errors. Can udf parameters only be of Column type?
2017-06-16
lk_spark

From: Pralabh Kumar <pralabhku...@gmail.com>
Sent: 2017-06-16 17:49
Subject: Re: how to call udf with parameters
To: "lk_spark" <lk_sp...@163.com>
Cc: "user.spark" <user@spark.apache.org>

sample UDF:

    val getlength = udf((data: String) => data.length())
    data.select(getlength(data("col1")))

On Fri, Jun 16, 2017 at 9:21 AM, lk_spark <lk_sp...@163.com> wrote:

hi, all:
    I define a udf with multiple parameters, but I don't know how to call it with a DataFrame.

UDF:

    def ssplit2 = udf { (sentence: String, delNum: Boolean, delEn: Boolean, minTermLen: Int) =>
      val terms = HanLP.segment(sentence).asScala
      .....

Call:

    scala> val output = input.select(ssplit2($"text", true, true, 2).as('words))
    <console>:40: error: type mismatch;
     found   : Boolean(true)
     required: org.apache.spark.sql.Column
    <console>:40: error: type mismatch;
     found   : Boolean(true)
     required: org.apache.spark.sql.Column
    <console>:40: error: type mismatch;
     found   : Int(2)
     required: org.apache.spark.sql.Column

    scala> val output = input.select(ssplit2($"text", $"true", $"true", $"2").as('words))
    org.apache.spark.sql.AnalysisException: cannot resolve '`true`' given input columns: [id, text];;
    'Project [UDF(text#6, 'true, 'true, '2) AS words#16]
    +- Project [_1#2 AS id#5, _2#3 AS text#6]
       +- LocalRelation [_1#2, _2#3]

I need help!!

2017-06-16
lk_spark
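[Editor's note, not part of the original thread: the errors above come from the fact that a Scala UDF applied to a DataFrame expects every argument to be a Column, and `$"true"` is a column *reference*, not a literal. The standard Spark SQL fix is to wrap constant arguments with org.apache.spark.sql.functions.lit(). A minimal sketch, assuming Spark 2.x; the object and function names here are hypothetical, and the Spark-specific calls are shown as comments so the string logic stays self-contained:]

```scala
// The ssplit2 call would become (Spark parts shown as comments):
//   import org.apache.spark.sql.functions.{udf, lit}
//   val output = input.select(ssplit2($"text", lit(true), lit(true), lit(2)).as("words"))
//
// The substr-style UDF asked about at the top can be built the same way.
// Its plain-Scala core, separable from Spark:
object SubstrUdfSketch {
  // Clamp begin/end so out-of-range indices do not throw
  // StringIndexOutOfBoundsException inside the UDF.
  def safeSubstr(s: String, begin: Int, end: Int): String = {
    val b = math.max(0, math.min(begin, s.length))
    val e = math.max(b, math.min(end, s.length))
    s.substring(b, e)
  }
  // Hypothetical Spark wrapper, passing the indices as literal Columns:
  //   val substrUdf = udf(safeSubstr _)
  //   df.select(substrUdf($"text", lit(0), lit(5)))
}
```

So to answer the question directly: yes, every argument in a UDF call must be a Column, but lit() turns any supported constant (Boolean, Int, String, ...) into one.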