Re: slow performance when using udf

wd Mon, 15 Aug 2011 23:34:09 -0700

Finally, the following code shows no performance loss. I think the key
is to avoid using the getString method. Thanks again, everyone.


import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

import java.net.URLDecoder;

public final class urldecode extends UDF {

    // Reused output object: allocated once per UDF instance, not once per row.
    private final Text t = new Text();

    // Taking Text instead of String matches the values Hive passes for string columns.
    public Text evaluate(Text s) {
        if (s == null) { return null; }
        try {
            // Decode as UTF-8 and copy the result into the reused Text.
            t.set( URLDecoder.decode( s.toString(), "UTF-8" ));
            return t;
        } catch ( Exception e) {
            // Malformed escapes (IllegalArgumentException) yield NULL.
            return null;
        }
    }
}
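The reuse pattern in the code above can be tried without a Hive installation. The JDK-only sketch below is hypothetical: `ReusingDecoder` stands in for the UDF and a `StringBuilder` stands in for the reused `Text` field. It shows what the class-level field buys (one output holder per instance instead of one per row) and the catch that comes with it: every call returns the same object, so a caller that wants to keep an earlier result must copy it first. Inside a UDF this is normally fine, on the assumption that Hive consumes each returned value before calling evaluate again.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

/**
 * Hypothetical JDK-only stand-in for the urldecode UDF.
 * A StringBuilder plays the role of the reused Hadoop Text field.
 */
final class ReusingDecoder {

    // Allocated once per instance and refilled on every call.
    private final StringBuilder result = new StringBuilder();

    /** Decodes s as UTF-8 into the shared holder; returns null on null or malformed input. */
    CharSequence evaluate(String s) {
        if (s == null) {
            return null;
        }
        try {
            // URLDecoder still allocates a new String; only the holder is reused.
            String decoded = URLDecoder.decode(s, "UTF-8");
            result.setLength(0);   // drop the previous call's value
            result.append(decoded);
            return result;
        } catch (UnsupportedEncodingException | IllegalArgumentException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        ReusingDecoder d = new ReusingDecoder();
        CharSequence first = d.evaluate("a%20b");
        String firstCopy = first.toString();            // copy before the next call overwrites it
        CharSequence second = d.evaluate("c%2Bd");
        System.out.println(first == second);            // true: same object both times
        System.out.println(firstCopy + " / " + first);  // a b / c+d
    }
}
```

Note that only the holder allocation is saved here; the intermediate String produced by `URLDecoder.decode` is still created per call.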


On Tue, Aug 16, 2011 at 10:47 AM, wd <w...@wdicc.com> wrote:
> Thanks for all your advice, I'll try it out.
>
> On Mon, Aug 15, 2011 at 9:02 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>>
>>
>> On Monday, August 15, 2011, Carl Steinbach <c...@cloudera.com> wrote:
>>> Converting it to a GenericUDF (i.e. extending GenericUDF instead of UDF)
>>> should help some with performance.
>>> On Mon, Aug 15, 2011 at 1:49 AM, wd <w...@wdicc.com> wrote:
>>>>
>>>> hi,
>>>>
>>>> I created a UDF to decode URL-encoded strings, but found that the
>>>> MapReduce job became about 3 times slower (73 sec -> 213 sec). How can
>>>> I optimize it?
>>>>
>>>> package com.test.hive.udf;
>>>>
>>>> import org.apache.hadoop.hive.ql.exec.UDF;
>>>> import java.net.URLDecoder;
>>>>
>>>> public final class urldecode extends UDF {
>>>>
>>>>    public String evaluate(final String s) {
>>>>        if (s == null) { return null; }
>>>>        return getString(s);
>>>>    }
>>>>
>>>>    public static String getString(String s) {
>>>>        String a;
>>>>        try {
>>>>            a = URLDecoder.decode(s);
>>>>        } catch ( Exception e) {
>>>>            a = "";
>>>>        }
>>>>        return a;
>>>>    }
>>>>
>>>>    public static void main(String args[]) {
>>>>        String t = "%E5%A4%AA%E5%8E%9F-%E4%B8%89%E4%BA%9A";
>>>>        System.out.println( getString(t) );
>>>>    }
>>>> }
>>>
>>>
>>
>> Also, you should use class-level private members to save on object
>> instantiation and garbage collection.
>>
>> You can also benefit from matching the argument types to what you would
>> normally expect from upstream. Hive converts Text to String when needed,
>> but if the data normally coming into the method is Text, you could try
>> matching the argument type and see whether it is any faster.
>
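Edward's point about matching Text can be pushed one step further: a Text already holds its contents as UTF-8 bytes, so a UDF could percent-decode those bytes directly and skip the Text-to-String conversion. This is not what anyone in the thread did; it is a hypothetical JDK-only sketch. `BytePercentDecoder`, `decode` and `bytes` are illustrative names, and wiring it into the UDF (passing `s.getBytes()` and `s.getLength()`, then calling `t.set(d.bytes(), 0, n)`) is an untested assumption. Like `URLDecoder`, it turns `+` into a space and `%XY` into the byte `0xXY`.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper: percent-decodes raw bytes (e.g. the UTF-8 bytes held
 * by a Hadoop Text) into a reused buffer without building a String.
 */
final class BytePercentDecoder {

    // Class-level buffer, reused across calls and grown only when needed.
    private byte[] buf = new byte[64];

    /** Decodes src[0, srcLen) into the internal buffer and returns the decoded length. */
    int decode(byte[] src, int srcLen) {
        if (buf.length < srcLen) {
            buf = new byte[srcLen];   // decoded output is never longer than the input
        }
        int out = 0;
        for (int i = 0; i < srcLen; i++) {
            byte b = src[i];
            if (b == '+') {
                buf[out++] = ' ';
            } else if (b == '%') {
                if (i + 2 >= srcLen) {
                    throw new IllegalArgumentException("incomplete escape at index " + i);
                }
                int hi = hexValue(src[i + 1]);
                int lo = hexValue(src[i + 2]);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("illegal hex digits at index " + i);
                }
                buf[out++] = (byte) ((hi << 4) | lo);
                i += 2;               // skip the two hex digits
            } else {
                buf[out++] = b;       // ordinary byte, copied as-is
            }
        }
        return out;
    }

    /** Shared output buffer; only the first n bytes from the last decode are valid. */
    byte[] bytes() {
        return buf;
    }

    // Value of an ASCII hex digit, or -1 if c is not one.
    private static int hexValue(byte c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    public static void main(String[] args) {
        byte[] in = "%E5%A4%AA%E5%8E%9F-%E4%B8%89%E4%BA%9A".getBytes(StandardCharsets.US_ASCII);
        BytePercentDecoder d = new BytePercentDecoder();
        int n = d.decode(in, in.length);
        System.out.println(new String(d.bytes(), 0, n, StandardCharsets.UTF_8));
    }
}
```

One behavioral difference: invalid UTF-8 sequences produced by the escapes are passed through as raw bytes, whereas `URLDecoder` decodes them to characters (with replacement). Whether this is actually faster than the String-based version would have to be measured.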
