Thanks Kumar, that's really helpful!

2017-06-16 

lk_spark 



From: Pralabh Kumar <pralabhku...@gmail.com>
Sent: 2017-06-16 18:30
Subject: Re: Re: how to call udf with parameters
To: "lk_spark" <lk_sp...@163.com>
Cc: "user.spark" <user@spark.apache.org>

import org.apache.spark.sql.functions.{lit, udf}

val getlength = udf((idx1: Int, idx2: Int, data: String) => data.substring(idx1, idx2))

// literal arguments have to be wrapped in lit() so they are passed as Columns
data.select(getlength(lit(1), lit(2), data("col1"))).collect()
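
Applying the same pattern to the ssplit2 UDF from your earlier mail should look roughly like this (a minimal sketch, assuming the ssplit2 definition and the input DataFrame with id and text columns shown below in the thread, and the spark-shell implicits for the $ syntax):

import org.apache.spark.sql.functions.lit

// every non-Column argument is wrapped in lit() so it becomes a Column
val output = input.select(ssplit2($"text", lit(true), lit(true), lit(2)).as("words"))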



On Fri, Jun 16, 2017 at 10:22 AM, Pralabh Kumar <pralabhku...@gmail.com> wrote:

Use lit. Give me some time, I'll provide an example.


On 16-Jun-2017 10:15 AM, "lk_spark" <lk_sp...@163.com> wrote:

Thanks Kumar. I want to know how to call a udf with multiple parameters, for example a udf that works like a substr function: how can I pass the begin and end index parameters? I tried it and got errors. Can udf parameters only be of Column type?

2017-06-16 

lk_spark 



From: Pralabh Kumar <pralabhku...@gmail.com>
Sent: 2017-06-16 17:49
Subject: Re: how to call udf with parameters
To: "lk_spark" <lk_sp...@163.com>
Cc: "user.spark" <user@spark.apache.org>

Sample UDF:

import org.apache.spark.sql.functions.udf

val getlength = udf((data: String) => data.length())

data.select(getlength(data("col1")))
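
To try it quickly in spark-shell, something like this should work (a minimal sketch; the test DataFrame and the column name col1 are just placeholders, not from the original mail):

// assumes spark-shell, where spark.implicits._ is already in scope
val data = Seq("spark", "udf").toDF("col1")
data.select(getlength(data("col1"))).show()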



On Fri, Jun 16, 2017 at 9:21 AM, lk_spark <lk_sp...@163.com> wrote:

Hi all,
     I defined a udf with multiple parameters, but I don't know how to call it with a DataFrame.

UDF:

def ssplit2 = udf { (sentence: String, delNum: Boolean, delEn: Boolean, minTermLen: Int) =>
    val terms = HanLP.segment(sentence).asScala
.....

Call:

scala> val output = input.select(ssplit2($"text",true,true,2).as('words))
<console>:40: error: type mismatch;
 found   : Boolean(true)
 required: org.apache.spark.sql.Column
       val output = input.select(ssplit2($"text",true,true,2).as('words))
                                                 ^
<console>:40: error: type mismatch;
 found   : Boolean(true)
 required: org.apache.spark.sql.Column
       val output = input.select(ssplit2($"text",true,true,2).as('words))
                                                      ^
<console>:40: error: type mismatch;
 found   : Int(2)
 required: org.apache.spark.sql.Column
       val output = input.select(ssplit2($"text",true,true,2).as('words))
                                                           ^

scala> val output = input.select(ssplit2($"text",$"true",$"true",$"2").as('words))
org.apache.spark.sql.AnalysisException: cannot resolve '`true`' given input columns: [id, text];;
'Project [UDF(text#6, 'true, 'true, '2) AS words#16]
+- Project [_1#2 AS id#5, _2#3 AS text#6]
   +- LocalRelation [_1#2, _2#3]


I need help!!


2017-06-16


lk_spark 
