Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)

Stephen Boesch Thu, 20 Jun 2013 05:33:42 -0700

We have a few dozen files that need to be made available to all
mappers/reducers in the cluster while running Hive transformation steps.


It seems that "add archive" does not unpack the archive entries and make
them available directly on the default file path, which is what we are
looking for.

To illustrate:

   add file modelfile.1;
   add file modelfile.2;
   ..
   add file modelfile.N;

Then our model, which is invoked during the transformation step, *does* have
correct access to its model files in the default path.

But loading all of those model files this way takes a few *minutes*.

Instead, when we try:
   add archive modelArchive.tgz;

The problem is that the archive apparently does not get unpacked.

For example, I have an archive that contains shell scripts under a "hive"
directory stored inside it. I am *not* able to access hive/my-shell-script.sh
after adding the archive. Specifically, the following fails:

$ tar -tvf appm*.tar.gz | grep launch-quixey_to_xml
-rwxrwxr-x stephenb/stephenb    664 2013-06-18 17:46 appminer/bin/launch-quixey_to_xml.sh

from (select transform (aappname, qappname)
      using 'hive/parse_qx.py' as (aappname2 string, qappname2 string)
      from eqx) o
insert overwrite table c select o.aappname2, o.qappname2;

Cannot run program "hive/parse_qx.py": java.io.IOException: error=2,
No such file or directory
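
One way to narrow this down would be to look at what the task's working
directory actually contains after "add archive". The following is only a
minimal diagnostic sketch, not something from our setup: the script name
list_cwd.py, the function list_entries and the max_depth parameter are all
hypothetical. It drains the rows Hive pipes in and emits one tab-separated
row per file found under the current working directory. If the archive is
unpacked at all, its contents may sit under a subdirectory (for example one
named after the archive file) rather than directly in the working directory;
that is an assumption to verify, not something confirmed here.

```python
#!/usr/bin/env python
"""Diagnostic transform script (hypothetical name: list_cwd.py)."""
import os
import sys


def list_entries(root, max_depth=4):
    """Return sorted file paths under root, relative to root.

    Symlinked directories are followed, since distributed resources are
    often symlinked into the working directory; max_depth bounds the walk
    so a symlink cycle cannot recurse forever.
    """
    root = os.path.abspath(root)
    found = []
    for dirpath, dirnames, filenames in os.walk(root, followlinks=True):
        if dirpath == root:
            depth = 0
        else:
            depth = os.path.relpath(dirpath, root).count(os.sep) + 1
        if depth >= max_depth:
            dirnames[:] = []  # do not descend any further
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.append(os.path.relpath(full, root))
    return sorted(found)


def main():
    # Consume the input rows so the writing side does not see a broken pipe.
    if not sys.stdin.isatty():
        for _ in sys.stdin:
            pass
    cwd = os.getcwd()
    for rel in list_entries(cwd):
        # Output columns: working directory, path relative to it.
        sys.stdout.write("%s\t%s\n" % (cwd, rel))


if __name__ == "__main__":
    main()
```

Such a script could be shipped with "add file list_cwd.py;" and called as
using 'python list_cwd.py' as (cwd string, relpath string) in the same kind
of transform query, right after the "add archive" statement.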
