What would be interesting would be to run a little experiment and find out what the default PATH is on your data nodes. How much of a pain would it be to run a little Python script that prints to stderr the values of the environment variables $PATH and $PWD (or the output of the shell command 'pwd')?
That's of course going through the normal channels of "add file". The thing is, given you're using a relative path "hive/parse_qx.py", you need to know what the "current directory" is when the process runs on the data nodes.

On Thu, Jun 20, 2013 at 5:32 AM, Stephen Boesch <java...@gmail.com> wrote:
>
> We have a few dozen files that need to be made available to all
> mappers/reducers in the cluster while running Hive transformation steps.
>
> It seems that "add archive" does not unarchive the entries and thus make
> them available directly on the default file path - and that is what we
> are looking for.
>
> To illustrate:
>
> add file modelfile.1;
> add file modelfile.2;
> ...
> add file modelfile.N;
>
> Then our model, which is invoked during the transformation step, *does*
> have correct access to its model files in the default path.
>
> But those model files take low *minutes* to all load. Instead, when we
> try:
>
> add archive modelArchive.tgz;
>
> the archive apparently does not get exploded.
>
> I have an archive, for example, that contains shell scripts under the
> "hive" directory stored inside. I am *not* able to access
> hive/my-shell-script.sh after adding the archive. Specifically, the
> following fails:
>
> $ tar -tvf appm*.tar.gz | grep launch-quixey_to_xml
> -rwxrwxr-x stephenb/stephenb 664 2013-06-18 17:46
> appminer/bin/launch-quixey_to_xml.sh
>
> from (select transform (aappname, qappname)
> *using* 'hive/parse_qx.py' as (aappname2 string, qappname2 string) from
> eqx) o insert overwrite table c select o.aappname2, o.qappname2;
>
> Cannot run program "hive/parse_qx.py": java.io.IOException: error=2, No
> such file or directory