Re: how to call udf with parameters
What version of Spark are you using? I cannot reproduce your error:

scala> spark.version
res9: String = 2.1.1

scala> val dataset = Seq((0, "hello"), (1, "world")).toDF("id", "text")
dataset: org.apache.spark.sql.DataFrame = [id: int, text: string]

scala> import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.functions.udf

// define a method in a similar way to what you did
scala> def len = udf { (data: String) => data.length > 0 }
len: org.apache.spark.sql.expressions.UserDefinedFunction

// use it
scala> dataset.select(len($"text").as('length)).show
+------+
|length|
+------+
|  true|
|  true|
+------+

Yong

From: Pralabh Kumar
Sent: Friday, June 16, 2017 12:19 AM
To: lk_spark
Cc: user.spark
Subject: Re: how to call udf with parameters

sample UDF:
val getlength = udf((data: String) => data.length())
data.select(getlength(data("col1")))

On Fri, Jun 16, 2017 at 9:21 AM, lk_spark <lk_sp...@163.com> wrote:

hi, all
    I defined a udf with multiple parameters, but I don't know how to call it with a DataFrame.

UDF:

def ssplit2 = udf { (sentence: String, delNum: Boolean, delEn: Boolean, minTermLen: Int) =>
  val terms = HanLP.segment(sentence).asScala
  ...

Call:

scala> val output = input.select(ssplit2($"text",true,true,2).as('words))
<console>:40: error: type mismatch;
 found   : Boolean(true)
 required: org.apache.spark.sql.Column
       val output = input.select(ssplit2($"text",true,true,2).as('words))
                                                 ^
<console>:40: error: type mismatch;
 found   : Boolean(true)
 required: org.apache.spark.sql.Column
       val output = input.select(ssplit2($"text",true,true,2).as('words))
                                                      ^
<console>:40: error: type mismatch;
 found   : Int(2)
 required: org.apache.spark.sql.Column
       val output = input.select(ssplit2($"text",true,true,2).as('words))
                                                           ^

scala> val output = input.select(ssplit2($"text",$"true",$"true",$"2").as('words))
org.apache.spark.sql.AnalysisException: cannot resolve '`true`' given input columns: [id, text];;
'Project [UDF(text#6, 'true, 'true, '2) AS words#16]
+- Project [_1#2 AS id#5, _2#3 AS text#6]
   +- LocalRelation [_1#2, _2#3]

I need help!!

2017-06-16
lk_spark
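For reference, the type-mismatch errors arise because every argument in a UDF call must be a Column; org.apache.spark.sql.functions.lit wraps a plain constant into a Column, while $"true" instead looks up a column named true, hence the AnalysisException. A minimal sketch: the corrected Spark call is shown in comments because it needs a live session, and a simple whitespace tokenizer stands in for HanLP.segment.

```scala
// Hypothetical corrected call shape (requires a SparkSession and the
// original ssplit2 definition):
//   import org.apache.spark.sql.functions.{udf, lit}
//   val output = input.select(ssplit2($"text", lit(true), lit(true), lit(2)).as('words))
//
// At execution time the wrapped function receives plain Scala values per
// row. Stand-in body: split on spaces and keep terms of a minimum length
// (delNum/delEn are accepted but unused in this sketch).
val split2 = (sentence: String, delNum: Boolean, delEn: Boolean, minTermLen: Int) =>
  sentence.split(" ").filter(_.length >= minTermLen).toSeq

println(split2("a bb ccc", true, true, 2))
```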
Re: Re: Re: how to call udf with parameters
Thanks Kumar, that is really helpful!

2017-06-16
lk_spark

From: Pralabh Kumar
Sent: 2017-06-16 18:30
Subject: Re: Re: how to call udf with parameters
To: "lk_spark"
Cc: "user.spark"

val getlength = udf((idx1: Int, idx2: Int, data: String) => data.substring(idx1, idx2))
data.select(getlength(lit(1), lit(2), data("col1"))).collect
Re: Re: how to call udf with parameters
val getlength = udf((idx1: Int, idx2: Int, data: String) => data.substring(idx1, idx2))
data.select(getlength(lit(1), lit(2), data("col1"))).collect

On Fri, Jun 16, 2017 at 10:22 AM, Pralabh Kumar wrote:
> Use lit. Give me some time, I'll provide an example.
>
> On 16-Jun-2017 10:15 AM, "lk_spark" wrote:
>
>> Thanks Kumar. I want to know how to call a udf with multiple parameters,
>> for example a udf that works like a substr function: how can I pass the
>> begin and end index as parameters? I tried it and got errors. Can udf
>> parameters only be of Column type?
>>
>> 2017-06-16
>> lk_spark
>>
>> From: Pralabh Kumar
>> Sent: 2017-06-16 17:49
>> Subject: Re: how to call udf with parameters
>> To: "lk_spark"
>> Cc: "user.spark"
>>
>> sample UDF:
>> val getlength = udf((data: String) => data.length())
>> data.select(getlength(data("col1")))
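To see what lit(1) and lit(2) deliver: the function wrapped by the getlength UDF is ordinary Scala, the lit columns simply carry the Int constants into the plan, and Spark hands them to the function as plain values for every row. A plain-Scala sketch of just that body (no Spark session needed; the name getlengthBody is hypothetical):

```scala
// Same body as the getlength UDF above, minus the udf(...) wrapper.
// lit(1) and lit(2) would bind idx1 = 1 and idx2 = 2 for every row.
val getlengthBody = (idx1: Int, idx2: Int, data: String) => data.substring(idx1, idx2)

println(getlengthBody(1, 2, "hello"))
```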
Re: Re: how to call udf with parameters
Use lit. Give me some time, I'll provide an example.

On 16-Jun-2017 10:15 AM, "lk_spark" wrote:
> Thanks Kumar. I want to know how to call a udf with multiple parameters,
> for example a udf that works like a substr function: how can I pass the
> begin and end index as parameters? I tried it and got errors. Can udf
> parameters only be of Column type?
>
> 2017-06-16
> lk_spark
>
> From: Pralabh Kumar
> Sent: 2017-06-16 17:49
> Subject: Re: how to call udf with parameters
> To: "lk_spark"
> Cc: "user.spark"
>
> sample UDF:
> val getlength = udf((data: String) => data.length())
> data.select(getlength(data("col1")))
Re: Re: how to call udf with parameters
Thanks Kumar. I want to know how to call a udf with multiple parameters, for example a udf that works like a substr function: how can I pass the begin and end index as parameters? I tried it and got errors. Can udf parameters only be of Column type?

2017-06-16
lk_spark

From: Pralabh Kumar
Sent: 2017-06-16 17:49
Subject: Re: how to call udf with parameters
To: "lk_spark"
Cc: "user.spark"

sample UDF:
val getlength = udf((data: String) => data.length())
data.select(getlength(data("col1")))
Re: how to call udf with parameters
sample UDF:

val getlength = udf((data: String) => data.length())
data.select(getlength(data("col1")))

On Fri, Jun 16, 2017 at 9:21 AM, lk_spark wrote:
> hi, all
>     I defined a udf with multiple parameters, but I don't know how to
> call it with a DataFrame.
>
> UDF:
>
> def ssplit2 = udf { (sentence: String, delNum: Boolean, delEn: Boolean,
> minTermLen: Int) =>
>   val terms = HanLP.segment(sentence).asScala
>   ...
>
> Call:
>
> scala> val output = input.select(ssplit2($"text",true,true,2).as('words))
> <console>:40: error: type mismatch;
>  found   : Boolean(true)
>  required: org.apache.spark.sql.Column
>        val output = input.select(ssplit2($"text",true,true,2).as('words))
>                                                  ^
> <console>:40: error: type mismatch;
>  found   : Boolean(true)
>  required: org.apache.spark.sql.Column
>        val output = input.select(ssplit2($"text",true,true,2).as('words))
>                                                       ^
> <console>:40: error: type mismatch;
>  found   : Int(2)
>  required: org.apache.spark.sql.Column
>        val output = input.select(ssplit2($"text",true,true,2).as('words))
>                                                            ^
>
> scala> val output = input.select(ssplit2($"text",$"true",$"true",$"2").as('words))
> org.apache.spark.sql.AnalysisException: cannot resolve '`true`' given
> input columns: [id, text];;
> 'Project [UDF(text#6, 'true, 'true, '2) AS words#16]
> +- Project [_1#2 AS id#5, _2#3 AS text#6]
>    +- LocalRelation [_1#2, _2#3]
>
> I need help!!
>
> 2017-06-16
> lk_spark