In the end, the following code shows no performance loss. I think the key is to avoid calling the getString method. Thanks again, everyone.

//import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;
import java.net.URLDecoder;

public final class urldecode extends UDF {
    // Reused across calls to avoid allocating a new Text per row.
    private Text t = new Text();

    public Text evaluate(Text s) {
        if (s == null) { return null; }
        try {
            t.set( URLDecoder.decode( s.toString(), "UTF-8" ));
            return t;
        } catch ( Exception e) {
            return null;
        }
    }

    //public static void main(String args[]) {
        //String t = "%E5%A4%AA%E5%8E%9F-%E4%B8%89%E4%BA%9A";
        //System.out.println( getString(t) );
    //}
}

On Tue, Aug 16, 2011 at 10:47 AM, wd <w...@wdicc.com> wrote:
> Thanks for all your advice, I'll try it out.
>
> On Mon, Aug 15, 2011 at 9:02 PM, Edward Capriolo <edlinuxg...@gmail.com>
> wrote:
>>
>>
>> On Monday, August 15, 2011, Carl Steinbach <c...@cloudera.com> wrote:
>>> Converting it to a GenericUDF (i.e. extending GenericUDF instead of UDF)
>>> should help some with performance.
>>> On Mon, Aug 15, 2011 at 1:49 AM, wd <w...@wdicc.com> wrote:
>>>>
>>>> hi,
>>>>
>>>> I created a UDF to decode urlencoded things, but found the mapred job
>>>> takes 3 times as long as before (73 sec -> 213 sec). How to optimize it?
>>>>
>>>> package com.test.hive.udf;
>>>>
>>>> import org.apache.hadoop.hive.ql.exec.UDF;
>>>> import java.net.URLDecoder;
>>>>
>>>> public final class urldecode extends UDF {
>>>>
>>>>     public String evaluate(final String s) {
>>>>         if (s == null) { return null; }
>>>>         return getString(s);
>>>>     }
>>>>
>>>>     public static String getString(String s) {
>>>>         String a;
>>>>         try {
>>>>             a = URLDecoder.decode(s);
>>>>         } catch ( Exception e) {
>>>>             a = "";
>>>>         }
>>>>         return a;
>>>>     }
>>>>
>>>>     public static void main(String args[]) {
>>>>         String t = "%E5%A4%AA%E5%8E%9F-%E4%B8%89%E4%BA%9A";
>>>>         System.out.println( getString(t) );
>>>>     }
>>>> }
>>>
>>>
>>
>> Also you should use class level private members to save on object
>> instantiation and garbage collection.
>>
>> You also get benefits by matching the args with what you would normally
>> expect from upstream. Hive converts text to string when needed, but if the
>> data normally coming into the method is text you could try and match the
>> argument and see if it is any faster.
>
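
For anyone who wants to check the decoding step outside Hive, here is a minimal standalone sketch that needs only the JDK. The class name UrlDecodeCheck and the helper decodeOrNull are hypothetical names made up for this illustration; they are not part of the UDF above. The helper follows the same choices as the final UDF (explicit "UTF-8" charset, null for null or malformed input), but it does not reproduce the Text reuse, since that needs the Hadoop classes.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical standalone check, not the Hive UDF itself.
final class UrlDecodeCheck {

    // Decodes a URL-encoded string as UTF-8.
    // Returns null for null input or for malformed escapes such as "%zz",
    // matching the null-on-failure behaviour of the final UDF above.
    static String decodeOrNull(String s) {
        if (s == null) {
            return null;
        }
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (IllegalArgumentException | UnsupportedEncodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Same sample string as in the original message.
        String sample = "%E5%A4%AA%E5%8E%9F-%E4%B8%89%E4%BA%9A";
        System.out.println(decodeOrNull(sample));
        // Malformed input yields null instead of throwing.
        System.out.println(decodeOrNull("%zz"));
    }
}
```

Returning null rather than an empty string on failure is a design choice taken from the final UDF; the first version in the thread returned "" instead, which makes bad input indistinguishable from genuinely empty values.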