[
https://issues.apache.org/jira/browse/PIG-102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577024#action_12577024
]
Craig Macdonald commented on PIG-102:
-------------------------------------
Hi Ben,
I looked at PigInputFormat, and this essentially looked OK: the correct
filesystem is identified for each path individually.
However, after this I'm a bit lost. A simple test case fails, as somewhere a
local path is being used with the wrong file system. I know what the
exception means, just not how to find out *where* it fails.
{noformat}
2008-03-10 15:01:27,698 [main] ERROR org.apache.pig.tools.grunt.Grunt - Wrong
FS: file:/path/to/url.sample, expected: hdfs://node04:56228
{noformat}
I just have no idea how to force a stack trace to be shown. Can anyone comment
here (Stefan?) on how to enable traces on log.error()?
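In case it helps, the usual fix is to pass the caught Throwable as a second argument to the logger, which makes it print the full stack trace rather than just the message. A minimal, self-contained sketch using JDK logging (commons-logging, which Pig uses, has the same two-argument idiom: `log.error("message", e)`); the exception and class name here are illustrative only:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class TraceDemo {
    private static final Logger LOG = Logger.getLogger(TraceDemo.class.getName());

    public static void main(String[] args) {
        try {
            throw new IllegalArgumentException(
                "Wrong FS: file:/path/to/url.sample, expected: hdfs://node04:56228");
        } catch (IllegalArgumentException e) {
            // Passing the Throwable as an extra argument makes the logger emit
            // the full stack trace instead of just the message. The analogous
            // commons-logging call is log.error("Failed to resolve path", e).
            LOG.log(Level.SEVERE, "Failed to resolve path", e);
        }
    }
}
```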
Benjamin, I was hopeful that if the proper scheme (i.e. file: after hadoopify)
is left on, then the proper file system would be selected by the Hadoop layer.
I suspect that HDataStorage, HFile, HDirectory etc. will have to change such
that they obtain the correct filesystem for each data storage element.
Generally speaking, the PIG-32 backend assumes a single file system for a
single backend, an assumption that this JIRA challenges.
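For what it's worth, the "Wrong FS" message comes from Hadoop's FileSystem path check: a FileSystem instance bound to one scheme/authority rejects paths from another, so the fix is to resolve the filesystem from each path (path.getFileSystem(conf)) rather than reusing the default FileSystem.get(conf). A self-contained sketch of that check (not Hadoop's actual code; class and method names are mine):

```java
import java.net.URI;

public class SchemeCheck {
    // Mimics the spirit of Hadoop's FileSystem.checkPath(): a FileSystem
    // bound to one scheme/authority rejects paths from another, which is
    // exactly the "Wrong FS: file:/..., expected: hdfs://node04:56228" error.
    static boolean belongsTo(URI path, URI fsRoot) {
        // A path with no scheme inherits the filesystem's scheme.
        String scheme = path.getScheme() == null ? fsRoot.getScheme() : path.getScheme();
        if (!scheme.equalsIgnoreCase(fsRoot.getScheme())) {
            return false;
        }
        String auth = path.getAuthority();
        return auth == null || auth.equalsIgnoreCase(fsRoot.getAuthority());
    }

    public static void main(String[] args) {
        URI hdfs = URI.create("hdfs://node04:56228");
        System.out.println(belongsTo(URI.create("file:/path/to/url.sample"), hdfs)); // false
        System.out.println(belongsTo(URI.create("hdfs://node04:56228/data"), hdfs)); // true
    }
}
```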
Craig
> Dont copy to DFS if source filesystem marked as shared
> ------------------------------------------------------
>
> Key: PIG-102
> URL: https://issues.apache.org/jira/browse/PIG-102
> Project: Pig
> Issue Type: New Feature
> Components: impl
> Environment: Installations with shared folders on all nodes (eg NFS)
> Reporter: Craig Macdonald
> Attachments: shared.patch
>
>
> I've been playing with Pig using three setups:
> (a) local
> (b) hadoop mapred with hdfs
> (c) hadoop mapred with file:///path/to/shared/fs as the default file system
> In our local setup, various NFS filesystems are shared between all machines
> (including mapred nodes) eg /users, /local
> I would like Pig to note when input files are in a file:// directory that has
> been marked as shared, and hence not copy them to DFS.
> Similarly, the Torque PBS resource manager has a usecp directive, which notes
> when a filesystem location is shared between all nodes (and hence scp is not
> needed; cp alone can be used). See
> http://www.clusterresources.com/wiki/doku.php?id=torque:6.2_nfs_and_other_networked_filesystems
> It would be good to have a configurable setting in Pig that says when a
> filesystem is shared, and hence no copying between file:// and hdfs:// is
> needed.
> An example in our setup, if such directives were to be used, might be:
> sharedFS file:///local/
> sharedFS file:///users/
> This command should be used with care. Obviously if you have 1000 nodes all
> accessing a shared file in NFS, then it would have been better to "hadoopify"
> the file.
> The likely area of code to patch is
> src/org/apache/pig/impl/io/FileLocalizer.java hadoopify(String, PigContext)
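The sharedFS check described in the quoted issue could be sketched roughly as below before hadoopify() decides to copy. This is only an illustration, not the attached patch; the class, method, and the two prefixes are hypothetical, taken from the example configuration above:

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;

public class SharedFsCheck {
    // Hypothetical sharedFS prefixes, as in the example configuration above.
    static final List<URI> SHARED = Arrays.asList(
            URI.create("file:///local/"), URI.create("file:///users/"));

    // Returns true if the input lives under a prefix marked shared, in which
    // case hadoopify() could skip the copy to DFS and use the file: path as-is.
    static boolean isShared(URI input) {
        for (URI prefix : SHARED) {
            if ("file".equals(input.getScheme())
                    && input.getPath().startsWith(prefix.getPath())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isShared(URI.create("file:///users/craig/data.txt"))); // true
        System.out.println(isShared(URI.create("file:///tmp/data.txt")));         // false
    }
}
```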
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.