Great suggestion, Craig! Could you open a Jira on this? thanx ben
On Friday 08 February 2008 01:26:11 Craig Macdonald wrote:
> Good morning,
>
> I've been playing with Pig using three setups:
> (a) local
> (b) hadoop mapred with hdfs
> (c) hadoop mapred with file:///path/to/shared/fs as the default file
> system
>
> In our local setup, various NFS filesystems are shared between all
> machines (including mapred nodes), e.g. /users, /local.
>
> I would like Pig to notice when input files are in a file:// directory
> that has been marked as shared, and hence not copy them to DFS.
>
> For comparison, the Torque PBS resource manager has a usecp directive,
> which notes when a filesystem location is shared between all nodes (and
> hence scp is not needed). See
> http://www.clusterresources.com/wiki/doku.php?id=torque:6.2_nfs_and_other_networked_filesystems
>
> It would be good to have a configurable setting in Pig that says when a
> filesystem is shared, and hence no copying between file:// and hdfs://
> is needed. If configuration commands were used, an example in our setup
> might be:
> sharedFS file:///local/
> sharedFS file:///users/
>
> Relatedly, if I use fs.default.name=file:///path/to/shared/fs, then the
> default file path for Pig job information is not suitable (e.g.
> /tmp/tempRANDOMINT is NOT shared on all nodes).
>
> C
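For illustration only, here is a minimal sketch of the kind of check Craig is describing: given a configured list of shared-filesystem prefixes, decide whether an input path can be read in place or must be copied to DFS. The class name, the property format ("file:///local/,file:///users/"), and the helper itself are hypothetical; nothing like this exists in Pig yet, which is exactly what the Jira would propose.

import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: skip the local-to-DFS copy when an input path
// lies under a filesystem prefix that the admin has marked as shared.
public class SharedFsCheck {

    private final List<String> sharedPrefixes;

    public SharedFsCheck(String sharedFsProperty) {
        // e.g. "file:///local/,file:///users/" (assumed property format)
        this.sharedPrefixes = Arrays.asList(sharedFsProperty.split(","));
    }

    /** Returns true if the path starts with any configured shared prefix. */
    public boolean isShared(String inputPath) {
        for (String prefix : sharedPrefixes) {
            if (inputPath.startsWith(prefix.trim())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        SharedFsCheck check = new SharedFsCheck("file:///local/,file:///users/");
        System.out.println(check.isShared("file:///users/craig/data.txt"));  // true: read in place
        System.out.println(check.isShared("file:///tmp/somefile"));          // false: copy to DFS
    }
}

The same prefix list could also be consulted when choosing the temporary job directory, so that a location like /tmp is never used as shared job state when fs.default.name points at a file:// path.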
