You might be using the wrong path to reference the distributed cache - I was 
under the impression that distributed cache files would be accessible using a 
local (relative) path, not something starting with '/'.

I suspect query 1 is working because fetch task conversion is running the 
select in a local task on the client, where it can see /data/MyData.txt. Try 
setting hive.fetch.task.conversion=none so the select query is run in an MR 
task, to see if you get the same results as the other queries. In fact, fetch 
task conversion doesn't work well with the distributed cache, since the query 
is not running in an MR task.


For the other queries try referencing the file as "MyData.txt".
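
To make that suggestion concrete, here is a minimal Java sketch of loading the 
lookup file so that it works whether the UDF is given the original path or just 
the file name. It assumes that files added with ADD FILE show up under their 
base name in the task's working directory, which I have not verified for your 
setup. CountryLookup, resolve and load are hypothetical names, not part of Hive 
or of the original UDF.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Hive or the original UDF.
final class CountryLookup {

    // Use the path as given if it points at a readable file; otherwise fall
    // back to the bare file name, resolved against the working directory
    // (assumption: that is where the distributed cache copy is visible).
    static File resolve(String mapFile) {
        File asGiven = new File(mapFile);
        if (asGiven.isFile()) {
            return asGiven;
        }
        return new File(asGiven.getName());
    }

    // Parse "key,value" lines into a map; malformed lines are skipped.
    static Map<Integer, String> load(String mapFile) throws IOException {
        Map<Integer, String> map = new HashMap<>();
        try (BufferedReader reader = Files.newBufferedReader(
                resolve(mapFile).toPath(), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split(",", 2);
                if (parts.length != 2) {
                    continue; // no comma on this line
                }
                try {
                    map.put(Integer.valueOf(parts[0].trim()), parts[1].trim());
                } catch (NumberFormatException e) {
                    // Key is not an integer (for example a header row); skip it.
                }
            }
        }
        return map;
    }
}
```

Inside evaluate, the map could then be built once with 
countryMap = CountryLookup.load(mapFile); letting the IOException propagate 
rather than swallowing it, so a missing file fails loudly instead of silently 
returning 'NA'.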



________________________________
From: Dayong <will...@gmail.com>
Sent: Tuesday, April 05, 2016 11:49 AM
To: user@hive.apache.org
Subject: Re: Hive UDF to fetch value from distributed cache not working with 
outer queries

What if you extend GenericUDF instead?

Thanks,
Dayong

On Apr 5, 2016, at 2:11 PM, Abhishek Dubey <abhishek.du...@xoriant.com> wrote:


Hi,


We have written a Hive UDF in Java to fetch a value from a file added to the 
distributed cache, which works perfectly from a select query like:

Query 1.

select country_key, MyFunction(country_key,"/data/MyData.txt") as capital from 
tablename;


But it does not work when trying to create a table from its output, like:

Query 2.

 create table new_table
    as
    select country_key, MyFunction(country_key,"/data/MyData.txt") as capital 
from tablename;


It does not even work from an outer select, like:

Query 3.

select t.capital from
(
select country_key, MyFunction(country_key,"/data/MyData.txt") as capital from 
tablename
) t;


Below is my UDF's evaluate function :

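import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.hive.ql.exec.UDF;

public class CountryMap extends UDF {

    // Lookup table, built lazily on the first call to evaluate()
    private Map<Integer, String> countryMap = null;

    public String evaluate(Integer keyCol, String mapFile) {

        if (countryMap == null) {
            countryMap = new HashMap<Integer, String>();
            // read comma delimited data from mapFile and build a hashmap,
            // calling countryMap.put(key, value) for each line
        }

        if (countryMap.containsKey(keyCol)) {
            return countryMap.get(keyCol);
        }
        return "NA";
    }
}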


I add the jar and the file, and create the temporary function in Hive, like:

ADD JAR /data/CountryMap-with-dependencies.jar;
ADD FILE /data/MyData.txt;
CREATE TEMPORARY FUNCTION MyFunction as 'CountryMap';


When I run query 1 I get the expected value from the Map, but when I run 
queries 2 and 3 I get 'NA'. When I returned Map.size() for queries 2 and 3 in 
place of 'NA', it was zero.

I am puzzled why the outer select or create table is not able to fetch the 
countryMap value and why the size of the Map becomes zero.



Thanks in advance,

Abhishek Dubey
