probably best to send the URL for JIRA...

On Feb 8, 2008, at 11:22 AM, Benjamin Reed wrote:

Great suggestion Craig! Could you open a Jira on this?

thanx
ben

On Friday 08 February 2008 01:26:11 Craig Macdonald wrote:
Good morning,

I've been playing with Pig using three setups:
 (a) local
 (b) hadoop mapred with hdfs
 (c) hadoop mapred with file:///path/to/shared/fs as the default file system

In our local setup, various NFS filesystems are shared between all
machines (including mapred nodes), e.g. /users and /local.

I would like Pig to note when input files are in a file:// directory
that has been marked as shared, and hence not copy them to DFS.

For comparison, the Torque PBS resource manager has a usecp directive,
which notes when a filesystem location is shared between all nodes (and
hence scp is not needed). See
http://www.clusterresources.com/wiki/doku.php?id=torque:6.2_nfs_and_other_networked_filesystems

It would be good to have a configurable setting in Pig that says when a filesystem is shared, and hence no copying between file:// and hdfs://
is needed.
An example in our setup might be:
 sharedFS file:///local/
 sharedFS file:///users/
if commands should be used.
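
To make the idea concrete, here is a rough sketch of the check such a setting would drive. It is written in Python for brevity and is not Pig code: the names SHARED_FS, is_shared and needs_copy_to_dfs are made up for illustration, and the prefix list simply mirrors the two sharedFS lines above.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical list of shared locations, mirroring the sharedFS lines above.
SHARED_FS = ["file:///local/", "file:///users/"]


def is_shared(uri, shared_prefixes=SHARED_FS):
    """Return True if uri is a file:// path at or below a shared prefix."""
    parsed = urlparse(uri)
    if parsed.scheme != "file":
        return False
    # Normalise so that ".." segments cannot escape a shared directory.
    path = posixpath.normpath(parsed.path)
    for prefix in shared_prefixes:
        root = posixpath.normpath(urlparse(prefix).path).rstrip("/")
        # Compare whole path components, so /localfoo does not match /local.
        if path == root or path.startswith(root + "/"):
            return True
    return False


def needs_copy_to_dfs(uri):
    """An input only has to be copied when it is not on a shared filesystem."""
    return not is_shared(uri)
```

With this, needs_copy_to_dfs("file:///users/craig/input.txt") is False, while a file under file:///tmp/ would still be copied.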

Relatedly, if I set fs.default.name=file:///path/to/shared/fs, then the
default file path for Pig job information is not suitable (e.g.
/tmp/tempRANDOMINT is NOT shared between all nodes).
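
One possible fix, sketched below in Python purely as an illustration, is to derive the temporary directory from the default file system when that file system is file://. The function name temp_dir_for and the "tmp" subdirectory are assumptions of mine, not existing Pig behaviour.

```python
import posixpath
import random
from urllib.parse import urlparse


def temp_dir_for(default_fs, rand=random):
    """Pick a job temp directory that lives on the default file system.

    default_fs is the value of fs.default.name. For a file:// default the
    temp directory is placed below that (shared) path; otherwise the usual
    /tmp/tempRANDOMINT style path is kept.
    """
    parsed = urlparse(default_fs)
    name = "temp%d" % rand.randint(0, 2**31 - 1)
    if parsed.scheme == "file":
        # Hypothetical layout: <shared root>/tmp/tempRANDOMINT
        return posixpath.join(parsed.path, "tmp", name)
    return posixpath.join("/tmp", name)
```

For fs.default.name=file:///path/to/shared/fs this yields a path under /path/to/shared/fs/tmp/, which every node can see.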

C


