Thanks so much for your quick response. I tried to work out the details over
the past two days, but it didn't work even after I gave it my best shot. You
are right, I should describe the problem more clearly.
Hadoop version 2.6.0; Hive version 1.2.1; Tez version 0.7.0
I tried the following approaches:
(1) For Hive, we can add third-party jars via hive.aux.jars.path, so I added
the Tez jars with this parameter. When I launch Hive from the CLI, all of
these jars are uploaded to the HDFS Tez session path
"/tmp/hadoop_yarn/root/_tez_session_dir/[long dynamic number]/". Result: failed.
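For reference, this is the kind of entry I used in hive-site.xml (the jar path
below is just an example of our local layout, not something from the docs):
  <property>
    <name>hive.aux.jars.path</name>
    <value>file:///opt/software/hhive/nblib/conf.jar</value>
  </property>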
(2) I attempted to use this parameter to load my own dependent configuration
directory and files (or more accurately, another team's product at our
company): [conf/hbasetable]. On the MR engine we put the dependent directory
"conf/hbasetable" under $HIVE_HOME/nbconf and it worked well. On the Spark
engine we encountered a similar problem and solved it by packaging everything
under $HIVE_HOME/nbconf/ into a jar file named conf.jar and putting it in
$HIVE_HOME/nblib/ (see the packaging command below); when Hive is launched, it
is uploaded to the HDFS destination directory (via hive.aux.jars.path).
Result: failed.
(3) I configured tez.aux.uris=hdfs://nnhost:nnport/apps/tmpfiles/ and uploaded
"conf/hbasetable" and the packaged conf.jar to that path. Result: failed.
(4) Following your suggestion, the related configuration in my Hadoop cluster
is:
yarn.nodemanager.delete.debug-delay-sec=1200;
yarn.nodemanager.local-dirs=/hdfsdata/1/yarndata/nm-local-dir;
I added some code to TezSessionState.refreshLocalResourcesFromConf():
----------------------------------------------------------------------
// Upload the local plugin config directory into the Tez session
// directory on HDFS if it is not already there.
FileSystem fs = FileSystem.get(conf);
Path src = new Path("/opt/software/hhive/nbconf/conf/hbasetable");
Path dest = new Path(dir + "/conf/hbasetable");
if (!fs.exists(dest)) { // fixed: FileSystem has exists(), not exist()
  fs.copyFromLocalFile(src, dest);
}
----------------------------------------------------------------------
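I now wonder whether copying into the session directory is enough on its own,
since nothing registers the new path as a YARN local resource for the AM. The
fragment below is only a sketch of what I plan to try next; the amResources
map, its key, and where it would be wired into the session are my assumptions,
not the actual Hive 1.2.1 code:
----------------------------------------------------------------------
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.api.records.LocalResourceType;
import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
import org.apache.hadoop.yarn.util.ConverterUtils;

// Register the uploaded HDFS path so the NodeManager will localize it
// into the container's working directory.
FileStatus destStatus = fs.getFileStatus(dest);
LocalResource lr = LocalResource.newInstance(
    ConverterUtils.getYarnUrlFromPath(destStatus.getPath()),
    LocalResourceType.FILE,              // plain file/dir, only symlinked
    LocalResourceVisibility.APPLICATION,
    destStatus.getLen(),                 // size and timestamp must match
    destStatus.getModificationTime());   // the HDFS file exactly
// Hypothetical map from resource name (the symlink that appears in the
// container's $CWD) to resource; Hive/Tez assemble such a map before
// launching the AM.
Map<String, LocalResource> amResources = new HashMap<String, LocalResource>();
amResources.put("hbasetable", lr);
----------------------------------------------------------------------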
This does create the directory in HDFS,
/tmp/hadoop_yarn/root/_tez_session_dir/[long dynamic number]/conf/hbasetable,
and the files are uploaded there. But after the Hive query "select count(*)
from h_im" failed, I looked at its runtime path (the AM container path):
/hdfsdata/1/yarndata/nm-local-dir/usercache/root/appcache/application_1452823977303_004/container_1452823977303_0004_01_000001/
All resources are there except "[conf/hbasetable]".
I am sure it should be there, because I printed some logs showing the path it
looks for at runtime, which was
"/hdfsdata/1/yarndata/nm-local-dir/usercache/root/appcache/application_1452823977303_004/container_1452823977303_0004_01_000001/conf/hbasetable",
and it printed these error messages:
"java.lang.ExceptionInInitializerError
…… [long error message of little value omitted] ……
Caused by java.lang.RuntimeException:[conf/hbasetable/] path not exsit or is
not a directory"
Now, my main questions are:
(1) Under the AM container path on the local disk there are symbolic links to
all of the real jars. I cannot understand why the YARN container did not
download "conf/hbasetable". Why?
(2) As mentioned earlier, I also packaged this "conf/hbasetable" into
conf.jar, and it was downloaded to the AM container path. Why was it not
parsed or decompressed? (My guess is sketched below.)
Is there any configuration option that can do this?
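For (2), from reading the YARN API my guess is that a jar is only unpacked in
the container when it is registered as an ARCHIVE rather than a FILE resource.
In the fragment below, confJarOnHdfs and jarStatus are hypothetical names for
the uploaded jar's HDFS path and its FileStatus. Is that right?
----------------------------------------------------------------------
// Guess: ARCHIVE resources are unpacked into the container's working
// directory; FILE resources are only symlinked, never extracted.
LocalResource jarRes = LocalResource.newInstance(
    ConverterUtils.getYarnUrlFromPath(confJarOnHdfs),
    LocalResourceType.ARCHIVE,
    LocalResourceVisibility.APPLICATION,
    jarStatus.getLen(),
    jarStatus.getModificationTime());
----------------------------------------------------------------------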
Best wishes, and thank you.
------LLBian
At 2016-01-14 11:18:55, "Hitesh Shah" <[email protected]> wrote:
>Hello
>
>You are right that when hive.compute.splits.in.am is true, the splits are
>computed in the cluster in the Tez AM container.
>
>Now, there are a bunch of options to consider but the general gist is that if
>you are familiar with MapReduce Distributed Cache or YARN local resources, you
>need to add the files that your custom input format needs to Tez’s version of
>the distributed cache. The simplest approach for you may be to just use “add
>jar” from Hive which will automatically add these files to the distributed
>cache ( this will copy them from local filesystem into HDFS and also make them
>available in the Tez AM container ). The other option is to upload all the
>necessary files to HDFS, say “/tmp/additionalfiles/“ and then specify
>“hdfs://nnhost:nnport/tmp/additionalfiles/“ for property “tez.aux.uris” in
>tez-site.xml. This will add all contents of this HDFS dir as part of the
>distributed cache. Please note that Tez does not do recursive searches in the
>dir but it supports a comma-separated list of files/dirs for tez.aux.uris
>
>Next, to debug this, you can do the following:
> - set "yarn.nodemanager.delete.debug-delay-sec” in yarn-site.xml to a value
> like 1200 to help debugging. This will require NodeManager restarts.
> - next, run your query.
> - Find the application on the YARN ResourceManager UI. This app page will
> also tell you which node the AM is running on or ran on.
> - Go to this node and search for launch_container.sh for the container in
> question ( these files will be found in one of the dirs configured for
> yarn.local-dirs based on your yarn-site.xml )
> - Looking inside launch_container.sh, look for $CWD and see the contents of
> the dir pointed to by $CWD. This will give you an idea of the localized files
> ( from distributed cache ).
>
>If you have more questions, can you first clarify what information/files are
>needed for your plugin to run?
>
>thanks
>— Hitesh
>
>
>
>On Jan 13, 2016, at 7:01 PM, LLBian <[email protected]> wrote:
>
>> And also, the log is in the YARN container.
>> I tried to solve this problem by packaging nbconf/ into a jar file under
>> $HIVE_HOME and putting it under $HIVE_HOME/nblib; it was uploaded to
>> /tmp/hadoop_yarn/root/_tez_session_dir/, but did not work.
>>
>> Best regards.
>> LLBian
>>
>> 01-14-2016
>>
>> At 2016-01-14 10:47:18, "LLBian" <[email protected]> wrote:
>> >Hi,all
>> > I'm new to Apache Tez. Recently I ran into some difficulty:
>> > our team has developed a plug-in for Hive. It is similar in function to
>> > HBaseHandler, but with customized code. Now my task is to ensure it is
>> > compatible with Tez. That is the background. My questions are:
>> >(1) I have a directory named nbconf, created under $HIVE_HOME; under it
>> >there is a sub-directory named conf/hbasetable.
>> >(2) I also have a directory named nblib, created under $HIVE_HOME, used
>> >for the Tez jars.
>> >(3) When I set hive.compute.splits.in.am=true, it throws an exception in
>> >the Hive log:
>> > ……
>> >[map1]java.lang.ExceptionInInitializerError:
>> >……
>> >……
>> >Caused by java.lang.RuntimeException:[conf/hbasetable/] path not exsit or
>> >is not a directory
>> >……
>> >
>> >But actually it exists! It is under the local $HIVE_HOME/nbconf. When I
>> >set hive.compute.splits.in.am=FALSE, it works well. So I guess it may be
>> >because the splits are computed in the cluster AM, not on the local disk.
>> >Maybe I should upload some files or directories (e.g. conf/hbasetable) to
>> >HDFS. If Tez expects that, where should I put them?
>> >The Tez session directory?
>> > /tmp/hadoop_yarn/root/_tez_session_dir/?
>> > /tmp/hadoop_yarn/root/_tez_session_dir/.tez/?
>> > /tmp/hadoop_yarn/root/_tez_session_dir/.tez/AppId/?
>> >I tried these, but none of them worked.
>> >
>> >Because it works when debugging locally, I don't know how to proceed. I
>> >don't know where to put this custom directory "[conf/hbasetable]" on
>> >HDFS.
>> >
>> >I am eager to get your guidance. Any help is greatly appreciated.
>> >(Please forgive my poor English)
>> >
>> >LLBian
>>
>>
>>
>>
>