Thanks so much for your quick response. I tried to work out the details over
the past two days, but it didn't work even after I gave it my best shot. You
are right, I should describe the problem more clearly.
Hadoop version 2.6.0; Hive version 1.2.1; Tez version 0.7.0
I tried the following approaches:
(1) For Hive, we can add third-party jars via hive.aux.jars.path, so I added
the Tez jars with this parameter. When I launch Hive from the CLI, all of
these jars are uploaded to the HDFS Tez session path
"/tmp/hadoop_yarn/root/_tez_session_dir/[long dynamic number]/". Result: failed.
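For reference, this is the kind of entry I used in hive-site.xml (the jar path
below is just an example of our local layout, not something from the docs):
  <property>
    <name>hive.aux.jars.path</name>
    <value>file:///opt/software/hhive/nblib/conf.jar</value>
  </property>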
(2) I attempted to use this parameter to load my own dependent configuration
directory and files (or more accurately, another team's product at our
company): [conf/hbasetable]. On the MR engine we put the dependent directory
"conf/hbasetable" under $HIVE_HOME/nbconf and it worked well. On the Spark
engine we encountered a similar problem and solved it by packaging everything
under $HIVE_HOME/nbconf/ into a jar file named conf.jar and putting it in
$HIVE_HOME/nblib/ (see the packaging command below); when Hive is launched, it
is uploaded to the HDFS destination directory (via hive.aux.jars.path).
Result: failed.
(3) I configured tez.aux.uris=hdfs://nnhost:nnport/apps/tmpfiles/ and uploaded
"conf/hbasetable" and the packaged conf.jar to that path. Result: failed.
(4) Following your suggestion, the related configuration in my Hadoop cluster
is:
yarn.nodemanager.delete.debug-delay-sec=1200;
yarn.nodemanager.local-dirs=/hdfsdata/1/yarndata/nm-local-dir;
I added some code to TezSessionState.refreshLocalResourcesFromConf():
----------------------------------------------------------------------
// Upload the local plugin config directory into the Tez session
// directory on HDFS if it is not already there.
FileSystem fs = FileSystem.get(conf);
Path src = new Path("/opt/software/hhive/nbconf/conf/hbasetable");
Path dest = new Path(dir + "/conf/hbasetable");
if (!fs.exists(dest)) { // fixed: FileSystem has exists(), not exist()
  fs.copyFromLocalFile(src, dest);
}
----------------------------------------------------------------------
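I now wonder whether copying into the session directory is enough on its own,
since nothing registers the new path as a YARN local resource for the AM. The
fragment below is only a sketch of what I plan to try next; the amResources
map, its key, and where it would be wired into the session are my assumptions,
not the actual Hive 1.2.1 code:
----------------------------------------------------------------------
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.api.records.LocalResourceType;
import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
import org.apache.hadoop.yarn.util.ConverterUtils;

// Register the uploaded HDFS path so the NodeManager will localize it
// into the container's working directory.
FileStatus destStatus = fs.getFileStatus(dest);
LocalResource lr = LocalResource.newInstance(
    ConverterUtils.getYarnUrlFromPath(destStatus.getPath()),
    LocalResourceType.FILE,              // plain file/dir, only symlinked
    LocalResourceVisibility.APPLICATION,
    destStatus.getLen(),                 // size and timestamp must match
    destStatus.getModificationTime());   // the HDFS file exactly
// Hypothetical map from resource name (the symlink that appears in the
// container's $CWD) to resource; Hive/Tez assemble such a map before
// launching the AM.
Map<String, LocalResource> amResources = new HashMap<String, LocalResource>();
amResources.put("hbasetable", lr);
----------------------------------------------------------------------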
This does create the directory in HDFS,
/tmp/hadoop_yarn/root/_tez_session_dir/[long dynamic number]/conf/hbasetable,
and the files are uploaded there. But after the Hive query "select count(*)
from h_im" failed, I looked at its runtime path (the AM container path):
/hdfsdata/1/yarndata/nm-local-dir/usercache/root/appcache/application_1452823977303_004/container_1452823977303_0004_01_000001/
All resources are there except "[conf/hbasetable]".
I am sure it should be there, because I printed some logs showing the path it
looks for at runtime, which was
"/hdfsdata/1/yarndata/nm-local-dir/usercache/root/appcache/application_1452823977303_004/container_1452823977303_0004_01_000001/conf/hbasetable",
and it printed these error messages:
"java.lang.ExceptionInInitializerError
…… [long error message of little value omitted] ……
Caused by java.lang.RuntimeException:[conf/hbasetable/] path not exsit or is
not a directory"
Now, my main questions are:
(1) Under the AM container path on the local disk there are symbolic links to
all of the real jars. I cannot understand why the YARN container did not
download "conf/hbasetable". Why?
(2) As mentioned earlier, I also packaged this "conf/hbasetable" into
conf.jar, and it was downloaded to the AM container path. Why was it not
parsed or decompressed? (My guess is sketched below.)
Is there any configuration option that can do this?
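For (2), from reading the YARN API my guess is that a jar is only unpacked in
the container when it is registered as an ARCHIVE rather than a FILE resource.
In the fragment below, confJarOnHdfs and jarStatus are hypothetical names for
the uploaded jar's HDFS path and its FileStatus. Is that right?
----------------------------------------------------------------------
// Guess: ARCHIVE resources are unpacked into the container's working
// directory; FILE resources are only symlinked, never extracted.
LocalResource jarRes = LocalResource.newInstance(
    ConverterUtils.getYarnUrlFromPath(confJarOnHdfs),
    LocalResourceType.ARCHIVE,
    LocalResourceVisibility.APPLICATION,
    jarStatus.getLen(),
    jarStatus.getModificationTime());
----------------------------------------------------------------------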
Best wishes, and thank you.
------LLBian
At 2016-01-14 11:18:55, "Hitesh Shah" <[email protected]> wrote:
>Hello
>
>You are right that when hive.compute.splits.in.am is true, the splits are
>computed in the cluster in the Tez AM container.
>
>Now, there are a bunch of options to consider but the general gist is that if
>you are familiar with MapReduce Distributed Cache or YARN local resources, you
>need to add the files that your custom input format needs to Tez’s version of
>the distributed cache. The simplest approach for you may be to just use “add
>jar” from Hive which will automatically add these files to the distributed
>cache ( this will copy them from local filesystem into HDFS and also make them
>available in the Tez AM container ). The other option is to upload all the
>necessary files to HDFS, say “/tmp/additionalfiles/“ and then specify
>“hdfs://nnhost:nnport/tmp/additionalfiles/“ for property “tez.aux.uris” in
>tez-site.xml. This will add all contents of this HDFS dir as part of the
>distributed cache. Please note that Tez does not do recursive searches in the
>dir but it supports a comma-separated list of files/dirs for tez.aux.uris
>
>Next, to debug this, you can do the following:
> - set "yarn.nodemanager.delete.debug-delay-sec” in yarn-site.xml to a value
> like 1200 to help debugging. This will require NodeManager restarts.
> - next, run your query.
> - Find the application on the YARN ResourceManager UI. This app page will
> also tell you which node the AM is running on or ran on.
> - Go to this node and search for launch_container.sh for the container in
> question ( these files will be found in one of the dirs configured for
> yarn.local-dirs based on your yarn-site.xml )
> - Looking inside launch_container.sh, look for $CWD and see the contents of
> the dir pointed to by $CWD. This will give you an idea of the localized files
> ( from distributed cache ).
>
>If you have more questions, can you first clarify what information/files are
>needed for your plugin to run?
>
>thanks
>— Hitesh
>
>
>
>On Jan 13, 2016, at 7:01 PM, LLBian <[email protected]> wrote:
>
>> And also, the log is in the YARN container.
>> I tried to solve this problem by packaging nbconf/ into a jar file under
>> $HIVE_HOME and putting it under $HIVE_HOME/nblib; it was uploaded to
>> /tmp/hadoop_yarn/root/_tez_session_dir/, but did not work.
>>
>> Best regards.
>> LLBian
>>
>> 01-14-2016
>>
>> At 2016-01-14 10:47:18, "LLBian" <[email protected]> wrote:
>> >Hi,all
>> > I'm new to Apache Tez. Recently I ran into some difficulty:
>> > our team has developed a plug-in for Hive. It is similar in function to
>> > HBaseHandler, but with customized code. Now my task is to ensure it is
>> > compatible with Tez. That is the background. My questions are:
>> >(1) I have a directory named nbconf, created under $HIVE_HOME; under it
>> >there is a sub-directory named conf/hbasetable.
>> >(2) I also have a directory named nblib, created under $HIVE_HOME, used
>> >for the Tez jars.
>> >(3) When I set hive.compute.splits.in.am=true, it throws an exception in
>> >the Hive log:
>> > ……
>> >[map1]java.lang.ExceptionInInitializerError:
>> >……
>> >……
>> >Caused by java.lang.RuntimeException:[conf/hbasetable/] path not exsit or
>> >is not a directory
>> >……
>> >
>> >But actually it exists! It is under the local $HIVE_HOME/nbconf. When I
>> >set hive.compute.splits.in.am=FALSE, it works well. So I guess it may be
>> >because the splits are computed in the cluster AM, not on the local disk.
>> >Maybe I should upload some files or directories (e.g. conf/hbasetable) to
>> >HDFS. If Tez expects that, where should I put them?
>> >The Tez session directory?
>> > /tmp/hadoop_yarn/root/_tez_session_dir/?
>> > /tmp/hadoop_yarn/root/_tez_session_dir/.tez/?
>> > /tmp/hadoop_yarn/root/_tez_session_dir/.tez/AppId/?
>> >I tried these, but none of them worked.
>> >
>> >Because it works when debugging locally, I don't know how to proceed. I
>> >don't know where to put this custom directory "[conf/hbasetable]" on
>> >HDFS.
>> >
>> >I am eager to get your guidance. Any help is greatly appreciated.
>> >(Please forgive my poor English)
>> >
>> >LLBian
>>
>>
>>
>>
>