Thanks Kumar, that's really helpful!
2017-06-16
lk_spark

From: Pralabh Kumar <pralabhku...@gmail.com>
Sent: 2017-06-16 18:30
Subject: Re: Re: how to call udf with parameters
To: "lk_spark" <lk_sp...@163.com>
Cc: "user.spark" <user@spark.apache.org>

val getlength = udf((idx1: Int, idx2: Int, data: String) => data.substring(idx1, idx2))

data.select(getlength(lit(1), lit(2), data("col1"))).collect

On Fri, Jun 16, 2017 at 10:22 AM, Pralabh Kumar <pralabhku...@gmail.com> wrote:

Use lit. Give me some time and I'll provide an example.

On 16-Jun-2017 10:15 AM, "lk_spark" <lk_sp...@163.com> wrote:

Thanks Kumar. I want to know how to call a UDF with multiple parameters. For example, for a UDF that implements a substr function, how can I pass the begin and end indexes as parameters? I tried it but got errors. Can UDF parameters only be of Column type?

2017-06-16
lk_spark

From: Pralabh Kumar <pralabhku...@gmail.com>
Sent: 2017-06-16 17:49
Subject: Re: how to call udf with parameters
To: "lk_spark" <lk_sp...@163.com>
Cc: "user.spark" <user@spark.apache.org>

Sample UDF:

val getlength = udf((data: String) => data.length())

data.select(getlength(data("col1")))

On Fri, Jun 16, 2017 at 9:21 AM, lk_spark <lk_sp...@163.com> wrote:

Hi all,

I defined a UDF with multiple parameters, but I don't know how to call it on a DataFrame.

UDF:

def ssplit2 = udf { (sentence: String, delNum: Boolean, delEn: Boolean, minTermLen: Int) =>
    val terms = HanLP.segment(sentence).asScala
.....
Call:

scala> val output = input.select(ssplit2($"text",true,true,2).as('words))
<console>:40: error: type mismatch;
 found   : Boolean(true)
 required: org.apache.spark.sql.Column
       val output = input.select(ssplit2($"text",true,true,2).as('words))
                                                 ^
<console>:40: error: type mismatch;
 found   : Boolean(true)
 required: org.apache.spark.sql.Column
       val output = input.select(ssplit2($"text",true,true,2).as('words))
                                                      ^
<console>:40: error: type mismatch;
 found   : Int(2)
 required: org.apache.spark.sql.Column
       val output = input.select(ssplit2($"text",true,true,2).as('words))
                                                           ^

scala> val output = input.select(ssplit2($"text",$"true",$"true",$"2").as('words))
org.apache.spark.sql.AnalysisException: cannot resolve '`true`' given input columns: [id, text];;
'Project [UDF(text#6, 'true, 'true, '2) AS words#16]
+- Project [_1#2 AS id#5, _2#3 AS text#6]
   +- LocalRelation [_1#2, _2#3]

I need help!!

2017-06-16
lk_spark
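
For reference, here is a minimal sketch of how the original four-parameter call could be written. The errors above come from passing plain Scala values where a Column is required, and from $"true", which looks up a column named "true" instead of creating a constant. The sketch shows two options: wrapping each constant in lit, or fixing the constants in a closure that returns a one-argument UDF. The body of ssplit2 is elided in the original post, so tokenize below is a hypothetical whitespace-based stand-in for the HanLP segmentation, and the names UdfWithParameters and ssplit2With are assumptions made for illustration only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.{lit, udf}

object UdfWithParameters {
  // Hypothetical stand-in for the elided HanLP-based body of ssplit2:
  // split on whitespace, optionally drop all-digit terms (delNum) and
  // all-ASCII-letter terms (delEn), then keep terms of at least minTermLen chars.
  def tokenize(sentence: String, delNum: Boolean, delEn: Boolean, minTermLen: Int): Seq[String] =
    sentence.split("\\s+").toSeq
      .filter(_.nonEmpty)
      .filterNot(t => delNum && t.forall(_.isDigit))
      .filterNot(t => delEn && t.forall(c => c.isLetter && c < 128))
      .filter(_.length >= minTermLen)

  // Option 1: a four-argument UDF. Every argument must be a Column when it is called,
  // so constants are wrapped in lit(...) at the call site.
  val ssplit2: UserDefinedFunction =
    udf((sentence: String, delNum: Boolean, delEn: Boolean, minTermLen: Int) =>
      tokenize(sentence, delNum, delEn, minTermLen))

  // Option 2: capture the constants in a closure and return a one-argument UDF.
  def ssplit2With(delNum: Boolean, delEn: Boolean, minTermLen: Int): UserDefinedFunction =
    udf((sentence: String) => tokenize(sentence, delNum, delEn, minTermLen))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("udf-with-parameters")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data with the same column names as in the error message above.
    val input = Seq((1, "Spark 2017 很 好用"), (2, "hello 42 world")).toDF("id", "text")

    // Option 1: constants passed as literal columns.
    val viaLit = input.select(ssplit2($"text", lit(true), lit(true), lit(2)).as("words"))

    // Option 2: constants bound before the UDF is applied to the column.
    val viaClosure = input.select(
      ssplit2With(delNum = true, delEn = true, minTermLen = 2)($"text").as("words"))

    viaLit.show(false)
    viaClosure.show(false)

    spark.stop()
  }
}
```

Both options should produce the same words column. The lit version keeps a single UDF whose flags can vary per call or even come from other columns, while the closure version keeps the call site shorter when the flags are fixed for the whole job.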