We have a few dozen files that need to be made available to all
mappers/reducers in the cluster while running Hive transformation steps.
It seems that "add archive" does not unpack the archive's entries and make
them available directly on the default file path, which is what we are
looking for.
To illustrate:
add file modelfile.1;
add file modelfile.2;
..
add file modelfile.N;
Then the model that is invoked during the transformation step *does* have
correct access to its model files in the default path.
But those model files take a few *minutes* in total to load.
Instead, we tried:
add archive modelArchive.tgz;
The problem is that the archive apparently does not get unpacked.
For example, I have an archive that contains shell scripts under a "hive"
directory inside it. I am *not* able to access hive/my-shell-script.sh
after adding the archive. Specifically, the following fails:
$ tar -tvf appm*.tar.gz | grep launch-quixey_to_xml
-rwxrwxr-x stephenb/stephenb 664 2013-06-18 17:46 appminer/bin/launch-quixey_to_xml.sh
from (select transform (aappname,qappname)
using 'hive/parse_qx.py' as (aappname2 string, qappname2 string) from
eqx ) o insert overwrite table c select o.aappname2, o.qappname2;
Cannot run program "hive/parse_qx.py": java.io.IOException: error=2,
No such file or directory
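
One way to narrow this down is to look at what the task's working directory actually contains after "add archive". The sketch below is only a diagnostic, not a fix. It assumes a hypothetical helper script, list_task_dir.py, shipped with "add file" and run as the transform command. It prints every file path it can find relative to the working directory. One possibility worth checking (an assumption, not something confirmed here) is that the archive's contents show up under a directory named after the archive file itself rather than at the top level.

```python
import os
import sys


def list_task_dir(root=".", max_depth=3):
    """Return sorted file paths, relative to root, at most max_depth directories deep."""
    found = []
    root = os.path.abspath(root)
    # followlinks=True: assumption that cached files/archives may appear as symlinks.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=True):
        rel_dir = os.path.relpath(dirpath, root)
        depth = 0 if rel_dir == "." else rel_dir.count(os.sep) + 1
        if depth >= max_depth:
            dirnames[:] = []  # do not descend any further (also guards against link loops)
        for name in filenames:
            found.append(os.path.normpath(os.path.join(rel_dir, name)))
    return sorted(found)


if __name__ == "__main__":
    # Drain the rows Hive pipes in; they are not needed for the listing.
    if sys.stdin is not None and not sys.stdin.isatty():
        for _ in sys.stdin:
            pass
    # One path per output row, so the result can be read back as a single string column.
    for path in list_task_dir("."):
        print(path)
```

To try it, one would run "add file list_task_dir.py;" together with the "add archive" statement, then use 'python list_task_dir.py' as the transform command with a single output column such as (path string). If hive/parse_qx.py appears in the output under some prefix directory, that prefixed path is the one to pass to "using".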