Hi

Try "set hive.fetch.task.conversion=minimal;" in the Hive CLI to get an MR job 
rather than a local fetch task.
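For example, a minimal session sketch reusing the table `a` and the file path from the original report (assumptions, not a tested fix):

```sql
-- Disable conversion of simple queries into local fetch tasks, so the
-- query runs as an MR job and ADD FILE resources are localized into
-- the task's working directory.
set hive.fetch.task.conversion=minimal;

add file hdfs:///tmp/testfile;
select in_file('a', './testfile') from a;
```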

hth
Gabriel Balan

On 6/30/2015 5:22 AM, Zsolt Tóth wrote:
Thank you for your answer. The plans are identical for Hive 1.0.0 and Hive 
1.1.0.

You're right, Hive-1.1.0 does not start a MapReduce job for the query, while 
Hive-1.0.0 does. Should I file a JIRA for this issue?

2015-05-07 21:17 GMT+02:00 Jason Dere <jd...@hortonworks.com 
<mailto:jd...@hortonworks.com>>:

    Is this on Hive CLI, or using HiveServer2?

    Can you run "explain select in_file('a', './testfile') from a;" on both 
Hive 1.0.0 and Hive 1.1.0 and see if the plans differ?
    One possible thing that might be happening here is that in Hive 1.1.0, 
this query is being executed without the need for a map/reduce job, in which 
case the working directory for the query is probably the local working 
directory from when Hive was invoked. I don't think the Distributed Cache will 
work correctly in this case, because the UDF is not running in a 
map/reduce task.

    If a map-reduce job is kicked off for the query and the UDF is running in 
this m/r task environment, then the distributed cache will likely be working 
fine.

    If there is a way to ensure the query with your UDF runs as part of a 
map/reduce job, that may do the trick. Adding an ORDER BY will do it, though 
others on this list may know a cleaner way of making this happen.
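A sketch of that workaround, reusing the table `a` and file from the report (the alias `found` is hypothetical; this is untested):

```sql
add file hdfs:///tmp/testfile;
-- The ORDER BY forces a reduce stage, so the query cannot be satisfied
-- by a local fetch task; the UDF then runs inside an MR task where
-- './testfile' resolves to the localized copy from the distributed cache.
select in_file('a', './testfile') as found from a order by found;
```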



    On May 7, 2015, at 3:28 AM, Zsolt Tóth <toth.zsolt....@gmail.com 
<mailto:toth.zsolt....@gmail.com>> wrote:

    Does this error occur for anyone else? It might be a serious issue.

    2015-05-05 13:59 GMT+02:00 Zsolt Tóth <toth.zsolt....@gmail.com 
<mailto:toth.zsolt....@gmail.com>>:

        Hi,

        I've just upgraded to Hive 1.1.0 and it looks like there is a problem 
with the distributed cache.
        I use ADD FILE, then a UDF that reads the file. The following 
syntax works in Hive 1.0.0, but Hive can't find the file in 1.1.0 (testfile 
exists on HDFS; the built-in UDF in_file is just an example):

        add file hdfs:///tmp/testfile;
        select in_file('a', './testfile') from a;

        However, it works with the local path:

        select in_file('a', 
'/tmp/462e6854-10f3-4a68-a290-615e6e9d60ff_resources/testfile') from a;

        When I try to list the files in the directory "./" in Hive 1.1.0, it 
lists the cluster's root directory. It looks like the working directory changed in Hive 
1.1.0. Is this intended? If so, how can I access the files in the distributed cache added 
with ADD FILE?

        Regards,
        Zsolt





--
The statements and opinions expressed here are my own and do not necessarily 
represent those of Oracle Corporation.
