thanks Kumar, I want to know how to call a UDF with multiple parameters, for
example a UDF that works like a substr function. How can I pass begin and end
index parameters? I tried it but got errors. Can UDF parameters only be of
Column type?
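For reference, a minimal sketch of such a substr-style UDF (the DataFrame `df` and column name `text` are hypothetical). Every argument to a registered UDF must be a Column, so scalar begin/end indices are wrapped with lit(...):

```scala
import org.apache.spark.sql.functions.{udf, lit}

// UDF taking a string column plus two scalar indices
val mySubstr = udf { (s: String, begin: Int, end: Int) =>
  s.substring(begin, math.min(end, s.length))
}

// Scalar arguments are passed as Columns via lit(...)
val out = df.select(mySubstr($"text", lit(0), lit(3)).as("sub"))
```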

2017-06-16 

lk_spark 



From: Pralabh Kumar <pralabhku...@gmail.com>
Sent: 2017-06-16 17:49
Subject: Re: how to call udf with parameters
To: "lk_spark" <lk_sp...@163.com>
Cc: "user.spark" <user@spark.apache.org>

sample UDF

val getLength = udf((data: String) => data.length())

data.select(getLength(data("col1")))



On Fri, Jun 16, 2017 at 9:21 AM, lk_spark <lk_sp...@163.com> wrote:

hi,all
     I defined a UDF with multiple parameters, but I don't know how to call it
with a DataFrame.

UDF:

def ssplit2 = udf { (sentence: String, delNum: Boolean, delEn: Boolean, minTermLen: Int) =>
    val terms = HanLP.segment(sentence).asScala
.....

Call :

scala> val output = input.select(ssplit2($"text",true,true,2).as('words))
<console>:40: error: type mismatch;
 found   : Boolean(true)
 required: org.apache.spark.sql.Column
       val output = input.select(ssplit2($"text",true,true,2).as('words))
                                                 ^
<console>:40: error: type mismatch;
 found   : Boolean(true)
 required: org.apache.spark.sql.Column
       val output = input.select(ssplit2($"text",true,true,2).as('words))
                                                      ^
<console>:40: error: type mismatch;
 found   : Int(2)
 required: org.apache.spark.sql.Column
       val output = input.select(ssplit2($"text",true,true,2).as('words))
                                                           ^

scala> val output = input.select(ssplit2($"text",$"true",$"true",$"2").as('words))
org.apache.spark.sql.AnalysisException: cannot resolve '`true`' given input 
columns: [id, text];;
'Project [UDF(text#6, 'true, 'true, '2) AS words#16]
+- Project [_1#2 AS id#5, _2#3 AS text#6]
   +- LocalRelation [_1#2, _2#3]
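The errors above say each argument must be a Column, and `$"true"` is parsed as a reference to a column literally named `true`, hence the AnalysisException. A sketch of the fix (assuming the same `input` DataFrame and `ssplit2` UDF) is to wrap the scalar arguments in lit(...):

```scala
import org.apache.spark.sql.functions.lit

// Literal (non-column) arguments must be wrapped in lit(...)
// so they become Columns of constant value.
val output = input.select(ssplit2($"text", lit(true), lit(true), lit(2)).as("words"))
```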


I need help!!


2017-06-16


lk_spark 
